Threshold Tuning Strategies

Threshold tuning decides the exact numbers OpenSearch Index State Management (ISM) uses to end an index’s hot life — the min_size, min_index_age, min_doc_count, and min_primary_shard_size values a rollover fires on — and getting them wrong is expensive in ways that only surface under load. Set the size guard too high and a single primary grows past the recovery-safe ceiling, so a node restart takes hours to rejoin; set the age guard too low and a day of logs fragments into dozens of tiny indices that burn heap on cluster state; leave the thresholds static and a seasonal traffic spike either overshoots the disk watermark before the next sweep or stalls transitions so indices pile up in hot. This guide treats threshold calibration as a measurable, closed-loop discipline: it maps thresholds to the hot-tier hardware that has to absorb them, explains the OR-semantics and precedence that decide which condition actually fires, derives the size and age values from real ingestion velocity, and wraps the whole thing in idempotent Python so the numbers track the workload instead of drifting away from it. It builds on the ISM Policy Implementation & Python Automation execution model and calibrates the exact conditions that Rollover Trigger Configuration fires on before an index enters the Phase Transition Logic chain.

Tier alignment for threshold budgets

Thresholds are not abstract numbers — every one of them is a claim on hot-node hardware. The size and shard-size guards must fit inside the disk headroom of the node that holds the write shard, and the age guard is only meaningful relative to how fast that node ingests. Before tuning a single value, align the thresholds to the tier the index actually lives on as it moves through its lifecycle. The node-role mechanics behind these routing attributes are covered under Node Role Allocation, and how the tier ratios are sized is the subject of Hot-Warm-Cold Tier Design.

Lifecycle role	Storage profile	vCPU : RAM ratio	Routing attribute	Threshold that governs it
Hot (write index)	Local NVMe SSD	1 : 4 (compute-heavy)	`node.attr.data: hot`	`min_size` / `min_primary_shard_size` sized to hot-disk headroom
Warm (post-roll)	SATA/SAS SSD	1 : 6	`node.attr.data: warm`	`min_rollover_age` before allocation to warm
Cold (long-term)	High-density HDD	1 : 8 (storage-heavy)	`node.attr.data: cold`	Age-based transition; watermark-sensitive
CCR follower (hot)	Mirrors leader hot	Matches leader	`node.attr.data: hot`	Leader `min_index_age` capped so replication keeps pace

The operational takeaway is that a threshold is a budget against a specific disk, not a rule of thumb. A single hot node hosting several write shards can roll several indices in the same window, and each freshly bootstrapped shard lands on the same disk the outgoing one still occupies during relocation — so the size guard must leave room for both. When hot capacity is short, the write index falls back per Fallback Routing Strategies, which is why the numbers below are always paired with a watermark headroom band.
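The disk budget can be checked with simple arithmetic before any threshold is set. The function below is a simplified, hypothetical model. It assumes that every write shard on the node can grow to the guard before rolling, and it takes the low watermark as a fraction.

```python
def max_shard_guard_gb(disk_total_gb: float, disk_used_gb: float,
                       low_watermark: float, write_shards_on_node: int) -> float:
    """Largest per-shard size guard that keeps a hot node under its low watermark.

    Simplified model: every write shard hosted on the node may grow to the guard
    before it rolls, so the free space below the watermark is split between them.
    """
    if not 0 < low_watermark < 1:
        raise ValueError("low_watermark must be a fraction such as 0.82")
    if write_shards_on_node < 1:
        raise ValueError("write_shards_on_node must be at least 1")
    headroom_gb = disk_total_gb * low_watermark - disk_used_gb
    if headroom_gb <= 0:
        return 0.0  # already past the watermark: no size guard is safe on this node
    return headroom_gb / write_shards_on_node


# Hypothetical hot node: 1000 GB disk, 650 GB used, 82% low watermark, 4 write shards.
guard = max_shard_guard_gb(1000, 650, 0.82, 4)
print(f"largest safe guard: {guard:.1f} GB")
```

The calibration section recommends a 10–15% buffer. Applying that buffer to this result gives a guard of roughly 36 to 38 GB for this node.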

How ISM evaluates rollover thresholds

ISM evaluates the rollover action’s conditions on the background job scheduler, not in real time. The scheduler polls index metadata on a fixed interval (plugins.index_state_management.job_interval, default 5 minutes) and rolls the index on the first sweep after any one condition is satisfied — the conditions are OR-ed, never AND-ed. This has two direct consequences for tuning. First, because evaluation is periodic, a high-throughput index overshoots its threshold between sweeps: treat every size value as a floor with headroom, not an exact ceiling. Second, because the conditions are OR-ed, the most aggressive threshold wins — an over-tight min_index_age fires the roll regardless of how small the shard still is, which quietly defeats a carefully sized min_primary_shard_size. Misaligned values therefore disrupt the downstream Phase Transition Logic, stranding indices in hot or pushing them into warm/delete prematurely.
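A threshold is only checked once per sweep. The worst-case overshoot is therefore roughly one interval's worth of ingestion into a single primary. The helpers below are a hypothetical back-of-the-envelope sketch of that bound. They assume writes spread evenly across primaries and ignore scheduler jitter.

```python
def worst_case_overshoot_gb(index_gb_per_hour: float, primary_shards: int,
                            job_interval_minutes: float) -> float:
    """Growth of one primary during a full sweep interval.

    If a threshold is crossed just after a sweep, the shard keeps ingesting for up
    to one interval before ISM notices. Assumes writes spread evenly across
    primaries and ignores scheduler jitter and step execution time.
    """
    per_shard_gb_per_minute = index_gb_per_hour / primary_shards / 60
    return per_shard_gb_per_minute * job_interval_minutes


def guard_below_ceiling_gb(hard_ceiling_gb: float, overshoot_gb: float) -> float:
    """Size guard that still rolls before the ceiling after a worst-case overshoot."""
    return max(hard_ceiling_gb - overshoot_gb, 0.0)


# Hypothetical index: 36 GB/h of primary data over 3 primaries, default 5-minute sweep.
overshoot = worst_case_overshoot_gb(36, 3, 5)
print(f"overshoot per primary: {overshoot:.2f} GB")
print(f"guard for a 50 GB ceiling: {guard_below_ceiling_gb(50, overshoot):.2f} GB")
```

At this rate the sweep costs about a gigabyte per primary, which a 10–15% buffer absorbs easily. A much hotter index is what justifies shortening job_interval.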

The four conditions map to distinct control goals, and precedence between them is what keeps a policy predictable:

min_size targets total primary-shard storage (replicas excluded) and is the usual driver for recovery-safe sizing, but it scales with primary count and can mask a single bloated shard.
min_primary_shard_size measures the largest single primary and is the more precise control on multi-primary indices — prefer it over min_size whenever the shard count is greater than one.
min_index_age guarantees a predictable roll boundary (one index per day, say) regardless of volume, which keeps retention math simple.
min_doc_count suits uniform, small documents but is a poor control for blob-heavy logs where document size varies wildly.
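The OR race can be modelled directly. For each configured condition, compute how long the index takes to reach it at a steady ingestion rate; the smallest time wins. The dataclass and function below are hypothetical illustrations that assume constant ingestion spread evenly across primaries. They do not call ISM.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RolloverConditions:
    """Hypothetical container mirroring the four ISM rollover conditions."""
    min_index_age_h: Optional[float] = None
    min_size_gb: Optional[float] = None                # summed over all primaries
    min_primary_shard_size_gb: Optional[float] = None  # largest single primary
    min_doc_count: Optional[int] = None


def first_trigger(cond: RolloverConditions, index_gb_per_hour: float,
                  docs_per_hour: float, primary_shards: int) -> Tuple[str, float]:
    """Return the condition that is met first and the hours until it is met.

    Conditions are OR-ed, so the smallest time-to-threshold wins. Assumes constant
    ingestion spread evenly across primaries.
    """
    hours = {}
    if cond.min_index_age_h is not None:
        hours["min_index_age"] = cond.min_index_age_h
    if cond.min_size_gb is not None and index_gb_per_hour > 0:
        hours["min_size"] = cond.min_size_gb / index_gb_per_hour
    if cond.min_primary_shard_size_gb is not None and index_gb_per_hour > 0:
        per_shard = index_gb_per_hour / primary_shards
        hours["min_primary_shard_size"] = cond.min_primary_shard_size_gb / per_shard
    if cond.min_doc_count is not None and docs_per_hour > 0:
        hours["min_doc_count"] = cond.min_doc_count / docs_per_hour
    if not hours:
        raise ValueError("no condition can fire with these inputs")
    winner = min(hours, key=hours.get)
    return winner, hours[winner]


# Hypothetical workload: 6 GB/h and 10M docs/h into a 3-primary index.
cond = RolloverConditions(12, 35, 40, 150_000_000)
print(first_trigger(cond, 6, 10_000_000, 3))
```

With these inputs, min_size fires in under six hours, long before any primary approaches 40 GB. This is the masking effect described above.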

Because ISM never overrides cluster-level allocation guards, thresholds must be validated against index.routing.allocation.total_shards_per_node and the disk watermarks (cluster.routing.allocation.disk.watermark.low/high/flood_stage). ISM only signals when a lifecycle action should execute; it will not rescue a policy whose size guard sits above the disk it has to allocate on.
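Validating against the watermarks starts with knowing their effective values, which may come from transient, persistent, or default settings. The sketch below assumes the JSON from GET _cluster/settings?include_defaults=true&flat_settings=true has already been fetched. The helper names are hypothetical, and only percentage or ratio watermarks are handled; absolute byte values are rejected.

```python
from typing import Optional

WATERMARK_LOW = "cluster.routing.allocation.disk.watermark.low"


def effective_setting(settings: dict, key: str) -> Optional[str]:
    """Resolve a flat cluster setting using transient > persistent > defaults precedence.

    `settings` is assumed to be the parsed JSON body of
    GET _cluster/settings?include_defaults=true&flat_settings=true
    """
    for scope in ("transient", "persistent", "defaults"):
        value = settings.get(scope, {}).get(key)
        if value is not None:
            return value
    return None


def watermark_fraction(value: str) -> float:
    """Turn a percentage ("82%") or ratio ("0.82") watermark into a fraction."""
    value = value.strip()
    if value.endswith("%"):
        return float(value[:-1]) / 100
    fraction = float(value)  # absolute values such as "50gb" raise ValueError here
    if not 0 <= fraction <= 1:
        raise ValueError(f"unsupported watermark value {value!r}")
    return fraction


def guard_clears_low_watermark(guard_gb: float, disk_total_gb: float,
                               disk_used_gb: float, low: float) -> bool:
    """True if growing by guard_gb keeps the node under the low watermark."""
    return disk_used_gb + guard_gb <= disk_total_gb * low


sample = {
    "persistent": {WATERMARK_LOW: "82%"},
    "transient": {},
    "defaults": {WATERMARK_LOW: "85%"},
}
low = watermark_fraction(effective_setting(sample, WATERMARK_LOW))
print(low, guard_clears_low_watermark(120, 1000, 650, low))
```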

Calibrating thresholds from ingestion velocity

Static defaults degrade under variable workloads, so calibration starts with measurement rather than a guessed number. Capture a baseline over a representative window (at least 72 hours, to span a weekday/weekend cycle) with _cat/indices/logs-*?v&h=index,pri,pri.store.size,docs.count,health. Read pri.store.size rather than store.size, because every rollover size condition counts primary storage only. Then pick a target primary-shard size from the query pattern and hardware IOPS:

Workload type	Target primary shard size	Rationale
High-cardinality logs	30–50 GB	Optimizes segment-merge frequency and reduces heap pressure
Time-series metrics	10–20 GB	Accelerates time-range filters and range queries
Audit / compliance trails	5–10 GB	Supports frequent wildcard/regex queries without segment bloat
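A single _cat/indices reading is a cumulative store size, not a rate. The baseline therefore needs two readings, one at the start and one at the end of the window. The function below is a hypothetical sketch that assumes both snapshots were fetched with format=json&bytes=b. It sums per-index growth and ignores indices deleted in between, so it gives an estimate rather than an exact ingest figure.

```python
GIB = 1024**3  # OpenSearch "gb" units are 1024-based


def hourly_primary_ingestion_gb(snap_start: list, snap_end: list, hours_between: float) -> float:
    """Estimate GB/h of primary data written between two _cat/indices snapshots.

    Each snapshot is assumed to be the parsed JSON of
    GET _cat/indices/logs-*?h=index,pri.store.size&format=json&bytes=b
    Indices deleted during the window are ignored; new indices count from zero.
    """
    if hours_between <= 0:
        raise ValueError("hours_between must be positive")
    before = {row["index"]: int(row.get("pri.store.size") or 0) for row in snap_start}
    grown = 0
    for row in snap_end:
        delta = int(row.get("pri.store.size") or 0) - before.get(row["index"], 0)
        grown += max(delta, 0)  # merges can shrink an index a little; ignore negative growth
    return grown / hours_between / GIB


start = [{"index": "logs-000001", "pri.store.size": str(10 * GIB)},
         {"index": "logs-000000", "pri.store.size": str(5 * GIB)}]
end = [{"index": "logs-000001", "pri.store.size": str(40 * GIB)},
       {"index": "logs-000002", "pri.store.size": str(20 * GIB)}]
print(hourly_primary_ingestion_gb(start, end, 10))
```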

With a target shard size in hand, derive the age threshold from the measured hourly ingestion per primary shard (index-wide primary ingestion divided by index.number_of_shards), so that a size-driven and an age-driven roll land at roughly the same moment:

$$
\text{Age threshold (hours)} = \frac{\text{Target primary shard size (GB)}}{\text{Average hourly ingestion per primary shard (GB)}}
$$

Apply a 10–15% buffer below the raw result to absorb traffic spikes and prevent thrashing during peak ingestion windows. The same buffer applies to size: set min_size and min_primary_shard_size 10–15% under the hard disk limit, so an index that overshoots between sweeps still rolls before it trips the watermark. In multi-shard deployments, anchor the policy on min_primary_shard_size. It caps every primary at a predictable size no matter how many primaries the index has, whereas min_size measures the sum across all primaries and can hide primary-shard bloat until recovery time.
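The formula and buffer can be applied in a few lines. This is a sketch with hypothetical names. It assumes the measured rate is index-wide primary ingestion and that writes spread evenly across primaries.

```python
def age_threshold_hours(target_shard_gb: float, index_gb_per_hour: float,
                        primary_shards: int, buffer: float = 0.10) -> float:
    """Hours for one primary to reach target_shard_gb, reduced by a safety buffer.

    index_gb_per_hour is primary ingestion for the whole index; writes are assumed
    to spread evenly, so each primary receives 1 / primary_shards of it.
    """
    if not 0 <= buffer < 1:
        raise ValueError("buffer must be a fraction in [0, 1)")
    per_shard_gb_per_hour = index_gb_per_hour / primary_shards
    return target_shard_gb / per_shard_gb_per_hour * (1 - buffer)


# Hypothetical: 40 GB target, 9 GB/h across 3 primaries, 10% buffer.
hours = age_threshold_hours(40, 9, 3)
print(f"min_index_age ~ {hours:.1f}h")
```

Round down to a whole hour (here 12h) before writing min_index_age. This keeps the age guard on the early side of the size guard.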

Step-by-step threshold configuration

Threshold values only take effect through three coordinated pieces — the hot-node attributes the write index allocates on, the index template that carries the rollover_alias, and the policy that holds the calibrated conditions. The four steps below stand those up and verify the numbers are live.

1. Node configuration

Confirm the hot nodes that host the write index carry the routing attribute the write path targets, and bound the sweep interval so threshold overshoot stays predictable.

YAML

# opensearch.yml on each hot data node
node.roles: [ data, ingest ]
node.attr.data: hot
# Bound the ISM sweep so overshoot between evaluations is predictable (cluster-wide, dynamic;
# the value is in minutes, default 5, lower it to 2 under heavy ingest):
# PUT _cluster/settings { "persistent": { "plugins.index_state_management.job_interval": 5 } }
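You can confirm that the attribute actually landed by checking _cat/nodeattrs. The filter below is a small sketch that assumes the response was fetched with format=json, where each row carries node, attr and value fields. The function name is hypothetical.

```python
def nodes_with_attr(rows: list, attr: str = "data", value: str = "hot") -> list:
    """Return sorted node names whose custom attribute matches.

    `rows` is assumed to be the parsed JSON of GET _cat/nodeattrs?format=json,
    where each row carries "node", "attr" and "value" fields.
    """
    return sorted({r["node"] for r in rows if r.get("attr") == attr and r.get("value") == value})


rows = [
    {"node": "hot-1", "attr": "data", "value": "hot"},
    {"node": "hot-2", "attr": "data", "value": "hot"},
    {"node": "warm-1", "attr": "data", "value": "warm"},
    {"node": "hot-1", "attr": "zone", "value": "a"},
]
print(nodes_with_attr(rows))
```

Node attributes are read at startup. A node missing from the result therefore either lacks node.attr.data: hot in opensearch.yml or has not been restarted since the attribute was added.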

2. Index template

The template pins the shard count the size thresholds are budgeted against and wires the rollover_alias ISM rolls. Keep number_of_shards explicit — changing it later invalidates every min_primary_shard_size you calibrated.

JSON

PUT _index_template/logs-threshold-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 3,
      "index.number_of_replicas": 1,
      "plugins.index_state_management.rollover_alias": "logs-write"
    }
  },
  "priority": 100
}

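The template priority must exceed that of any other composable (_index_template) template matching logs-*. Otherwise the higher-priority template silently wins and the rollover_alias never lands, which is the single most common reason a well-tuned policy “does nothing”. Legacy _template entries are not the competitor here, because they are ignored whenever a composable template matches.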
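ISM can only roll an alias that already points at a write index. Rollover without an explicit target name also expects that index's name to end in a number, so the next generation can be derived. The sketch below builds the bootstrap request under those assumptions. The six-digit logs-000001 naming is a common convention rather than a requirement, and the helper is hypothetical.

```python
import json


def bootstrap_write_index(alias: str, prefix: str, generation: int = 1) -> tuple:
    """Build the path and body that create the first write index behind an alias.

    The name ends in a zero-padded number so rollover can derive the next one.
    """
    if generation < 1:
        raise ValueError("generation must start at 1 or higher")
    name = f"{prefix}-{generation:06d}"
    body = {"aliases": {alias: {"is_write_index": True}}}
    return f"/{name}", body


path, body = bootstrap_write_index("logs-write", "logs")
print("PUT", path)
print(json.dumps(body, indent=2))
```

Send this request only after the policy from step 3 exists, because ism_template attaches a policy only to indices created after that policy.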

3. Policy JSON

Deploy the calibrated thresholds in an explicit, deterministic condition block, wrapped in a retry so transient cluster pressure does not strand the action. Every value here comes from the calibration step above, not from a default.

HTTP

PUT _plugins/_ism/policies/log_tiered_lifecycle
{
  "policy": {
    "description": "Tiered log lifecycle with calibrated thresholds",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "retry": { "count": 5, "backoff": "exponential", "delay": "2m" },
            "rollover": {
              "min_index_age": "12h",
              "min_size": "120gb",
              "min_doc_count": 150000000,
              "min_primary_shard_size": "40gb"
            }
          }
        ],
        "transitions": [
          {
            "state_name": "warm",
            "conditions": { "min_rollover_age": "12h" }
          }
        ]
      },
      {
        "name": "warm",
        "actions": [ { "replica_count": { "number_of_replicas": 1 } } ],
        "transitions": []
      }
    ],
    "ism_template": [
      { "index_patterns": ["logs-*"], "priority": 100 }
    ]
  }
}

Two rules keep this deterministic rather than merely valid. First, the retry block must live inside the action scope so ISM honours exponential backoff on transient failures. Second, the transition uses min_rollover_age (time since the roll) rather than min_index_age (time since creation). The 12h countdown to warm therefore starts when the index stops receiving writes, not when it was created.
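Beyond those two rules, a policy can pass validation and still defeat its own thresholds. The hypothetical pre-deploy lint below assumes the policy is held as a dict shaped like the JSON above. It flags two cases:

- A transition to a state the policy never defines, which ISM's policy validation rejects.
- A min_size smaller than primary count × min_primary_shard_size. On evenly filled shards this always fires first and leaves the per-shard guard unused.

```python
import re

_UNIT_BYTES = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def size_bytes(value: str) -> int:
    """Parse an ISM size string such as "40gb" into bytes (1024-based units)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([kmgt]?b)", value.strip().lower())
    if not match:
        raise ValueError(f"unrecognised size {value!r}")
    return int(float(match.group(1)) * _UNIT_BYTES[match.group(2)])


def lint_policy(policy: dict, primary_shards: int) -> list:
    """Flag transitions to undefined states and a min_size that masks min_primary_shard_size."""
    problems = []
    states = policy["policy"]["states"]
    names = {s["name"] for s in states}
    for state in states:
        for transition in state.get("transitions", []):
            if transition["state_name"] not in names:
                problems.append(f"{state['name']}: transition to undefined state {transition['state_name']!r}")
        for action in state.get("actions", []):
            rollover = action.get("rollover", {})
            if "min_size" in rollover and "min_primary_shard_size" in rollover:
                total = size_bytes(rollover["min_size"])
                per_shard = size_bytes(rollover["min_primary_shard_size"])
                if total < per_shard * primary_shards:
                    problems.append(
                        f"{state['name']}: min_size {rollover['min_size']} is below "
                        f"{primary_shards} x min_primary_shard_size {rollover['min_primary_shard_size']}"
                    )
    return problems


policy = {"policy": {"states": [
    {"name": "hot",
     "actions": [{"rollover": {"min_size": "35gb", "min_primary_shard_size": "40gb"}}],
     "transitions": [{"state_name": "warm"}]},
]}}
for problem in lint_policy(policy, primary_shards=3):
    print(problem)
```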

4. Verification

Confirm the policy attached, the thresholds are the ones you deployed, and the generation counter is advancing.

Shell

# Is the policy managing the write alias's backing indices, and in which state?
GET _plugins/_ism/explain/logs-*

# Are the live shard sizes tracking the threshold you set?
GET _cat/indices/logs-*?v&h=index,pri,pri.store.size,docs.count&s=index

# Check segment counts and sizes per shard on the rolled indices after a roll:
GET _cat/segments/logs-*?v&h=index,shard,segment,size

A healthy result shows three things:

- The current write index is in state hot with retry_info.failed set to false.
- Primary store sizes are climbing toward, but not past, your min_primary_shard_size.
- The highest-numbered index is the write target.

If a primary is already well past its threshold, the sweep is too coarse for the ingest rate; jump to the troubleshooting section.
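The same check can be automated per shard rather than per index, which matters because pri.store.size sums all primaries. The sketch below assumes _cat/shards output fetched with format=json&bytes=b and a hypothetical 10% tolerance for sweep overshoot. It reports primaries that have clearly run past the threshold.

```python
GIB = 1024**3


def oversized_primaries(rows: list, threshold_gb: float, tolerance: float = 0.10) -> list:
    """Return (index, shard, size_gb) for primaries past the threshold plus tolerance.

    `rows` is assumed to be the parsed JSON of
    GET _cat/shards/logs-*?h=index,shard,prirep,store&format=json&bytes=b
    Replicas and unassigned shards (no store size) are skipped.
    """
    limit = threshold_gb * (1 + tolerance) * GIB
    flagged = []
    for row in rows:
        if row.get("prirep") != "p" or not row.get("store"):
            continue
        size = int(row["store"])
        if size > limit:
            flagged.append((row["index"], int(row["shard"]), round(size / GIB, 1)))
    return flagged


rows = [
    {"index": "logs-000007", "shard": "0", "prirep": "p", "store": str(48 * GIB)},
    {"index": "logs-000007", "shard": "0", "prirep": "r", "store": str(48 * GIB)},
    {"index": "logs-000008", "shard": "1", "prirep": "p", "store": str(12 * GIB)},
    {"index": "logs-000008", "shard": "2", "prirep": "p", "store": None},
]
print(oversized_primaries(rows, threshold_gb=40))
```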

Automated threshold adjustment with Python

Static policies cannot follow seasonal traffic or recover from a pipeline stall, so production deployments wrap the thresholds in an orchestration layer that measures the current ingestion rate, recalculates the size and age values, and applies them idempotently. The script below reads raw byte counts to avoid fragile unit-suffix parsing, clamps the derived age to a sane range, and PUTs the policy with exponential backoff — the same idempotent pattern that slots into the CI/CD structure under Python Orchestration Frameworks. The exact DSL payloads it deploys are documented in the Configuring index size and age thresholds for rollover walkthrough.

Python

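import os
import json
import time
import logging
import requests
from urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
logger = logging.getLogger(__name__)

OPENSEARCH_HOST = os.getenv("OPENSEARCH_HOST", "https://localhost:9200")
OPENSEARCH_USER = os.getenv("OPENSEARCH_USER", "admin")
OPENSEARCH_PASS = os.getenv("OPENSEARCH_PASS", "admin")
POLICY_NAME = "log_tiered_lifecycle"
TARGET_SHARD_SIZE_GB = 35  # target size of a single primary shard
PRIMARY_SHARDS = 3         # must match index.number_of_shards in the index template
AGE_BUFFER = 0.10          # safety margin below the raw age estimate


def get_session() -> requests.Session:
    session = requests.Session()
    session.auth = (OPENSEARCH_USER, OPENSEARCH_PASS)
    session.verify = os.getenv("SSL_VERIFY", "false").lower() == "true"
    retry_strategy = Retry(total=3, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def fetch_ingestion_rate(session: requests.Session, index_pattern: str = "logs-*") -> float:
    """Estimate index-wide primary ingestion in GB/h from the newest matching index.

    The newest index is normally the current write index, so its primary store
    size divided by its age approximates how fast the write shards are filling.
    """
    # bytes=b returns raw integer byte counts, avoiding fragile unit-suffix parsing;
    # creation.date is reported in epoch milliseconds.
    resp = session.get(
        f"{OPENSEARCH_HOST}/_cat/indices/{index_pattern}"
        "?h=index,pri.store.size,creation.date&format=json&bytes=b"
    )
    resp.raise_for_status()
    indices = [idx for idx in resp.json() if idx.get("creation.date")]
    if not indices:
        return 0.0
    newest = max(indices, key=lambda idx: int(idx["creation.date"]))
    # Floor the age at one hour to avoid dividing by a near-zero age right after a roll.
    age_hours = max((time.time() * 1000 - int(newest["creation.date"])) / 3_600_000, 1.0)
    return int(newest.get("pri.store.size") or 0) / 1024**3 / age_hours


def calculate_thresholds(hourly_gb: float) -> dict:
    if hourly_gb <= 0:
        # Fall back to a baseline where each primary reaches its target in 24h.
        hourly_gb = TARGET_SHARD_SIZE_GB * PRIMARY_SHARDS / 24
    # Writes spread across primaries, so one shard fills at 1/PRIMARY_SHARDS of the rate.
    raw_age_hours = TARGET_SHARD_SIZE_GB / (hourly_gb / PRIMARY_SHARDS)
    # Apply the buffer, then clamp between 4 hours and 7 days so extreme rates
    # don't yield absurd ages.
    age_hours = min(168, max(4, round(raw_age_hours * (1 - AGE_BUFFER))))
    return {
        "min_index_age": f"{age_hours}h",
        # min_size sums every primary; scale it by the shard count or it always
        # fires before the per-shard guard can engage.
        "min_size": f"{TARGET_SHARD_SIZE_GB * PRIMARY_SHARDS}gb",
        "min_primary_shard_size": f"{TARGET_SHARD_SIZE_GB}gb",
    }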


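def update_ism_policy(session: requests.Session, thresholds: dict) -> None:
    url = f"{OPENSEARCH_HOST}/_plugins/_ism/policies/{POLICY_NAME}"
    payload = {
        "policy": {
            "description": "Auto-tuned tiered lifecycle",
            "default_state": "hot",
            "states": [
                {
                    "name": "hot",
                    "actions": [{
                        "retry": {"count": 5, "backoff": "exponential", "delay": "2m"},
                        "rollover": thresholds,
                    }],
                    "transitions": [{"state_name": "warm", "conditions": {"min_rollover_age": "12h"}}],
                },
                # Every transition must target a state defined in the same policy.
                {
                    "name": "warm",
                    "actions": [{"replica_count": {"number_of_replicas": 1}}],
                    "transitions": [],
                },
            ],
            # The PUT replaces the whole document; keep ism_template or new logs-*
            # indices stop attaching to the policy automatically.
            "ism_template": [{"index_patterns": ["logs-*"], "priority": 100}],
        }
    }
    # A PUT without if_seq_no/if_primary_term is treated as a create and returns 409
    # once the policy exists, so read the current version first.
    params = {}
    current = session.get(url)
    if current.status_code == 200:
        doc = current.json()
        params = {"if_seq_no": doc["_seq_no"], "if_primary_term": doc["_primary_term"]}
    elif current.status_code != 404:
        current.raise_for_status()
    resp = session.put(url, params=params, json=payload)
    resp.raise_for_status()
    logger.info("Policy updated successfully: %s", json.dumps(thresholds))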


def main():
    session = get_session()
    try:
        hourly_gb = fetch_ingestion_rate(session)
        if hourly_gb <= 0:
            logger.warning("Ingestion rate too low; skipping threshold update.")
            return
        thresholds = calculate_thresholds(hourly_gb)
        logger.info("Calculated thresholds: %s", thresholds)
        update_ism_policy(session, thresholds)
    except requests.exceptions.RequestException as exc:
        logger.error("Failed to update ISM policy: %s", exc)
        raise


if __name__ == "__main__":
    main()

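Schedule this via cron or a Kubernetes CronJob on a 6–12 hour cadence: frequent enough to track a workload shift, slow enough to avoid API thrashing. The update path must read the policy's _seq_no and _primary_term first. ISM only overwrites an existing policy when the PUT carries them as if_seq_no and if_primary_term; without them, the request is treated as a create and fails with a 409 conflict once the policy exists. With those parameters supplied, an unchanged calculation simply writes a new policy version with identical content. How the sweep dispatches the recalibrated action without blocking is detailed under Async Execution Patterns.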

Operational guardrails

Thresholds interact with disk watermarks, the sweep interval, and Cross-Cluster Replication (CCR) topology — tune one in isolation and another breaks. The settings below are the ones that keep calibrated thresholds deterministic under load.

Setting	Recommended	Why it matters for threshold tuning
`plugins.index_state_management.job_interval`	5m (2m under heavy ingest)	Upper bound on how far an index overshoots its threshold before rolling
`rollover.min_primary_shard_size`	25–40 GB	Keeps individual shards inside the recovery-safe window on restart
`rollover.min_size`	10–15% below hot-disk headroom	Total-index guard; must fit the disk during bootstrap of the next index
`retry.count` / `backoff` / `delay`	5 / exponential / 2m	Rides out transient thread-pool rejection without stranding the action
`cluster.routing.allocation.disk.watermark.low`	82%	Reserves headroom so a rolled index can allocate on a busy hot node
`cluster.routing.allocation.disk.watermark.high`	88%	Blocks relocation onto a hot node already near capacity mid-roll

The watermark numbers sit deliberately below the single-tier defaults (85% / 90%): a roll briefly needs room for both the outgoing and the newly bootstrapped index on the same hot disk, and the min_size guard must clear the low watermark with margin. In a CCR topology the leader’s thresholds also constrain the follower, because oversized leader shards extend the checkpoint-alignment window and can trip replication_lag alerts. Keep min_index_age aligned across leader and follower policies so rollover windows do not diverge, cap min_primary_shard_size at 50 GB for replicated indices to avoid full-segment transfers on initial sync, and watch _plugins/_replication/follower_stats for checkpoint_lag_bytes; if lag exceeds 20% of the roll interval, lower min_size by ~15%.

Troubleshooting threshold failures

Threshold problems are almost always calibration or timing issues, not bugs. The four below account for most production incidents; each pairs a diagnosis command with its fix.

1. Index grows far past min_size before rolling. The sweep interval is too coarse for the ingest rate, so the index overshoots between polls.

Shell

GET _plugins/_ism/explain/logs-write        # diagnose: compare last-evaluated timestamp vs now

Lower plugins.index_state_management.job_interval (the value is in minutes, for example from 5 to 2), or lower the size threshold so the roll fires with headroom to spare.

2. Indices roll while still tiny. An over-aggressive min_index_age is winning the OR race and firing before the size guard ever engages.

Shell

GET _cat/indices/logs-*?v&h=index,pri.store.size,creation.date.string   # diagnose: sub-target shards

Raise min_index_age to match the calibrated age formula, or remove it and let min_primary_shard_size drive the roll.
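When min_index_age keeps winning, the evidence is a run of rolled indices well below the target size. The sketch below is a hypothetical check that assumes _cat/indices JSON with the pri, pri.store.size and creation.date columns (bytes=b). It skips the newest index as the current write index and flags rolled ones whose average primary stayed under half the target.

```python
GIB = 1024**3


def undersized_rolled_indices(rows: list, target_shard_gb: float, floor: float = 0.5) -> list:
    """Names of rolled indices whose average primary stayed under floor * target.

    `rows` is assumed to be the parsed JSON of
    GET _cat/indices/logs-*?h=index,pri,pri.store.size,creation.date&format=json&bytes=b
    The newest index is treated as the current write index and skipped.
    """
    rows = [r for r in rows if r.get("creation.date")]
    if not rows:
        return []
    newest = max(rows, key=lambda r: int(r["creation.date"]))
    flagged = []
    for row in rows:
        if row is newest:
            continue
        avg_primary_gb = int(row.get("pri.store.size") or 0) / int(row["pri"]) / GIB
        if avg_primary_gb < target_shard_gb * floor:
            flagged.append(row["index"])
    return sorted(flagged)


rows = [
    {"index": "logs-000001", "pri": "3", "pri.store.size": str(3 * 38 * GIB), "creation.date": "1000"},
    {"index": "logs-000002", "pri": "3", "pri.store.size": str(3 * 6 * GIB), "creation.date": "2000"},
    {"index": "logs-000003", "pri": "3", "pri.store.size": str(1 * GIB), "creation.date": "3000"},
]
print(undersized_rolled_indices(rows, target_shard_gb=40))
```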

3. Rollover action stuck in a failed state. A transient thread-pool rejection or allocation stall left the action failed after exhausting retries.

Shell

GET _plugins/_ism/explain/logs-write        # diagnose: inspect retry_info (failed, consumed_retries) and info.message

Clear it with POST _plugins/_ism/retry/logs-write; if it recurs, widen the retry block or resolve the capacity pressure. Bounded recovery for this class is detailed under Error Handling & Retries.

4. Roll blocked by disk watermark despite a “safe” threshold. min_size sits too close to the low watermark, so the new index cannot allocate.

Shell

GET _cluster/allocation/explain            # diagnose: watermark decider blocking allocation

Lower min_size so it clears the low watermark by 10–15%, or add hot-node capacity; ISM will not override the allocation guard.

Frequently asked questions

Are multiple rollover thresholds AND-ed or OR-ed together?

They are OR-ed. ISM rolls the index as soon as any configured condition (min_size, min_index_age, min_doc_count, or min_primary_shard_size) crosses its threshold. When you combine several, the most aggressive one wins. If you want them to fire at roughly the same time, derive them all from the same measured ingestion rate.

Why does my index roll well past the min_size I set?

Thresholds are evaluated on the background sweep (job_interval, default 5 minutes), not in real time. Between two sweeps a high-throughput index keeps ingesting and overshoots. Treat min_size as a floor with 10–15% headroom and tighten the sweep interval if the overshoot is unacceptable.

Should I tune min_size or min_primary_shard_size?

Prefer min_primary_shard_size whenever an index has more than one primary. It caps each individual shard at a predictable size no matter how many primaries the index has. By contrast, min_size measures the total across all primaries and can hide a single bloated shard until recovery time exposes it.

How often should the Python recalibration job run?

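A 6–12 hour cadence tracks workload shifts without thrashing the ISM API. Re-running is safe as long as each update passes the current if_seq_no and if_primary_term; a PUT without them is treated as a create and returns a 409 conflict once the policy exists. With those parameters, an unchanged calculation simply writes a new policy version with the same content, so the constraint is API load, not correctness.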

Related

Rollover Trigger Configuration — the rollover action mechanics these thresholds fire on.
Phase Transition Logic — what happens to an index once a threshold rolls it out of hot.
Python Orchestration Frameworks — CI/CD structure for the recalibration script.
Async Execution Patterns — how the sweep dispatches a recalibrated action without blocking.
Configuring index size and age thresholds for rollover — the exact DSL payloads and boundary conditions this page introduces.
