Rollover Trigger Configuration
Rollover trigger configuration defines the conditions under which OpenSearch Index State Management (ISM) rolls an active write index over to a new one, after which the rolled index can move on to read-only or archived states. In high-throughput log pipelines and distributed search architectures, poorly calibrated triggers cause unbounded shard growth, replication lag, or premature index fragmentation. Effective deployment requires aligning trigger thresholds with underlying storage capacity, shard allocation limits, and Cross-Cluster Replication (CCR) synchronization windows. This guide covers the API payloads, a threshold calibration matrix, and the automation patterns needed for predictable rollover behavior. Teams standardizing policy deployment pipelines should integrate these patterns into their broader ISM Policy Implementation & Python Automation workflows.
flowchart TD
A["Active write index"] --> B{"Any rollover condition met?"}
B -- "min_size / min_index_age / min_doc_count / min_primary_shard_size" --> R["Roll over to new write index"]
B -- "none met" --> W["Keep writing; re-check next job cycle"]
R --> T["Transition rolled index to next phase"]
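The "any condition met" branch in the flowchart can be sketched in a few lines of Python. This is an illustrative model only, not ISM's implementation: `IndexStats`, `RolloverConditions`, and both functions are hypothetical names, thresholds are expressed in plain bytes and seconds rather than ISM's string units, and a rollover action with no conditions at all is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexStats:
    """Hypothetical snapshot of the figures compared against the thresholds."""
    total_primary_bytes: int
    largest_primary_shard_bytes: int
    doc_count: int
    age_seconds: float


@dataclass
class RolloverConditions:
    """Thresholds in plain units; None means the condition is not configured."""
    min_size_bytes: Optional[int] = None
    min_primary_shard_size_bytes: Optional[int] = None
    min_doc_count: Optional[int] = None
    min_index_age_seconds: Optional[float] = None


def met_conditions(stats: IndexStats, cond: RolloverConditions) -> List[str]:
    """Return the names of all configured conditions the index currently satisfies."""
    checks = [
        ("min_size", cond.min_size_bytes, stats.total_primary_bytes),
        ("min_primary_shard_size", cond.min_primary_shard_size_bytes,
         stats.largest_primary_shard_bytes),
        ("min_doc_count", cond.min_doc_count, stats.doc_count),
        ("min_index_age", cond.min_index_age_seconds, stats.age_seconds),
    ]
    return [name for name, threshold, actual in checks
            if threshold is not None and actual >= threshold]


def should_roll_over(stats: IndexStats, cond: RolloverConditions) -> bool:
    """Conditions are OR-ed: a single satisfied condition is enough."""
    return bool(met_conditions(stats, cond))
```

Because the conditions are OR-ed, adding a threshold can only make rollover happen earlier, never later. That is worth keeping in mind when combining a size limit with an age limit.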
API Payload Structure for Deterministic Triggers
OpenSearch ISM evaluates rollover conditions through a declarative JSON policy submitted via the `_plugins/_ism/policies/` endpoint. The payload must explicitly define state transitions, retry logic, and CCR-safe parameters. A production-grade configuration avoids implicit defaults and sets explicit thresholds to prevent race conditions during index lifecycle shifts. The rollover action also requires each managed index to carry the `plugins.index_state_management.rollover_alias` setting, which is normally supplied through an index template.
PUT _plugins/_ism/policies/log_rollover_policy
{
  "policy": {
    "description": "Production log rollover with CCR alignment",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "retry": {
              "count": 5,
              "backoff": "exponential",
              "delay": "2m"
            },
            "rollover": {
              "min_size": "50gb",
              "min_index_age": "1d",
              "min_doc_count": 50000000
            }
          }
        ],
        "transitions": [
          {
            "state_name": "warm",
            "conditions": {
              "min_rollover_age": "12h"
            }
          }
        ]
      },
      {
        "name": "warm",
        "actions": [
          {
            "replica_count": { "number_of_replicas": 1 }
          }
        ],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": ["logs-*"],
      "priority": 100
    }
  }
}
Critical implementation rules:
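- `min_rollover_age` in the transition block holds the rolled index in its current state for a fixed period after rollover, giving background replication tasks time to finalize before the state changes.
- The `retry` block must reside inside the action scope, not at the policy root, so that ISM applies exponential backoff to that action during transient cluster pressure.
- `ism_template.priority` resolves conflicts between ISM policies whose `index_patterns` overlap: the policy with the highest priority is attached to a newly created index. It is evaluated separately from index template priority, so set it above any other ISM policy that also matches `logs-*`.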
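The rules above lend themselves to a pre-deployment check. The following is a minimal sketch, assuming the policy is available as a Python dict in the same shape as the payload shown earlier; `lint_policy` is a hypothetical helper, not part of any OpenSearch client, and it covers only the structural rules that can be checked without contacting the cluster.

```python
from typing import List


def lint_policy(payload: dict) -> List[str]:
    """Return human-readable findings; an empty list means no rule was violated."""
    policy = payload.get("policy", {})
    states = policy.get("states", [])
    state_names = {state.get("name") for state in states}
    findings = []

    if "retry" in policy:
        findings.append("retry is set at the policy root; move it inside an action")
    if policy.get("default_state") not in state_names:
        findings.append("default_state does not name a defined state")

    for state in states:
        name = state.get("name")
        has_rollover = any("rollover" in action for action in state.get("actions", []))
        for transition in state.get("transitions", []):
            target = transition.get("state_name")
            if target not in state_names:
                findings.append(
                    f"state {name!r} transitions to undefined state {target!r}"
                )
            # Mirrors the min_rollover_age rule from the list above.
            if has_rollover and "min_rollover_age" not in transition.get("conditions", {}):
                findings.append(
                    f"state {name!r} rolls over but its transition to "
                    f"{target!r} has no min_rollover_age"
                )
    return findings
```

Running a check like this in CI before the `PUT` call catches misplaced keys early, since a structurally odd policy may otherwise only surface as unexpected behavior once indices are already managed.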
Threshold Calibration Matrix
Trigger thresholds require continuous calibration against shard sizing, JVM heap pressure, and replication throughput. The following matrix outlines operational boundaries for high-throughput ingestion clusters:
| Metric | Conservative (Low Risk) | Aggressive (High Throughput) | CCR Consideration |
|---|---|---|---|
| `min_size` | 30 GB | 50–75 GB | Must not exceed follower node storage headroom |
| `min_index_age` | 6h | 12–24h | Align with CCR checkpoint intervals |
| `min_doc_count` | 25M | 50M+ | Irrelevant for binary-heavy logs |
| `min_rollover_age` | 1h | 4–6h | Prevents warm-state transition mid-sync |
Calibrating these values requires understanding how Phase Transition Logic evaluates cluster health before executing state changes. Size thresholds that are too high for the available JVM heap can cause sustained garbage collection pressure, while document-count thresholds that are too low produce many small indices, which inflates shard counts and increases query latency across distributed nodes.
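A quick back-of-the-envelope calculation helps when picking a row from the matrix. The sketch below assumes a steady ingest rate measured in primary-storage GB per hour and an even spread of data across primary shards; both are simplifications, and the function names are hypothetical.

```python
from typing import Tuple


def per_shard_size_gb(min_size_gb: float, primary_shards: int) -> float:
    """Approximate primary shard size at rollover, assuming even distribution."""
    return min_size_gb / primary_shards


def first_trigger(
    min_size_gb: float,
    min_index_age_hours: float,
    ingest_gb_per_hour: float,
) -> Tuple[str, float]:
    """Return which condition fires first and after how many hours.

    Assumes a constant ingest rate; real traffic is usually bursty.
    """
    hours_to_size = min_size_gb / ingest_gb_per_hour
    if hours_to_size < min_index_age_hours:
        return ("min_size", hours_to_size)
    return ("min_index_age", min_index_age_hours)


if __name__ == "__main__":
    # Example: 50 GB threshold, 24 h age limit, 5 GB/h of primary data.
    condition, hours = first_trigger(50, 24, 5)
    print(f"{condition} fires first after about {hours:.1f} h")
    print(f"about {per_shard_size_gb(50, 3):.1f} GB per shard with 3 primaries")
```

With these example numbers the size condition fires well before the age condition, so the age threshold acts only as a safety net for quiet periods.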
Deterministic Execution & Async Handling
ISM does not evaluate triggers synchronously. The ISM background job polls index metadata at a configurable interval (plugins.index_state_management.job_interval, default: 5m). When a condition is met, the rollover action enters an asynchronous execution queue. In CCR environments, this introduces a critical synchronization dependency: the follower cluster must complete its replication checkpoint before the leader transitions the index to a read-only state, or risk data divergence.
To manage this, engineers should leverage Async Execution Patterns that decouple trigger evaluation from downstream orchestration. Polling the `_plugins/_ism/explain/` endpoint lets automation frameworks confirm the resulting index state explicitly instead of inferring it from the HTTP response code of the original request. This approach is essential when coordinating policy updates across geographically distributed data centers.
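The polling approach can be sketched as a small helper that is independent of any particular client. It assumes the explain response contains a per-index entry with a boolean `rolled_over` field; verify that field name against the output of your OpenSearch version. `wait_for_rollover` and its parameters are hypothetical, and `fetch_explain` is any callable that returns the parsed explain response for an index.

```python
import time
from typing import Callable


def wait_for_rollover(
    fetch_explain: Callable[[str], dict],
    index: str,
    timeout: float = 900.0,
    interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the explain entry for `index` reports rolled_over, or time out.

    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        # Assumed response shape: {"<index>": {"rolled_over": true, ...}, ...}
        entry = fetch_explain(index).get(index, {})
        if entry.get("rolled_over") is True:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With opensearch-py, `fetch_explain` could be `lambda idx: client.transport.perform_request("GET", f"/_plugins/_ism/explain/{idx}")`. The timeout should comfortably exceed the ISM job interval, since no state change can be observed between job cycles.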
Python Automation Integration
Manual policy deployment does not scale across multi-cluster environments. Python automation builders should wrap ISM API interactions in retry-aware clients that handle the case where a policy already exists. The following script shows how to deploy a rollover policy and confirm that ISM is managing the matching indices, using the official opensearch-py client:
import os
import time

from opensearchpy import OpenSearch, RequestsHttpConnection
from requests.auth import HTTPBasicAuth


def deploy_rollover_policy(client, policy_id, payload):
    """Create the policy and report whether OpenSearch acknowledged it.

    Updating an existing policy additionally requires the if_seq_no and
    if_primary_term query parameters; without them the PUT is rejected
    with a version conflict.
    """
    try:
        response = client.transport.perform_request(
            method="PUT",
            url=f"/_plugins/_ism/policies/{policy_id}",
            body=payload,
        )
    except Exception as e:
        raise RuntimeError(f"Policy deployment failed: {e}") from e
    # A successful response echoes the policy ID under "_id".
    return response.get("_id") == policy_id


def verify_trigger_execution(client, index_pattern, timeout=300):
    """Poll the ISM explain endpoint until at least one matching index is managed.

    This confirms policy attachment; it does not wait for a rollover to occur.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        explain = client.transport.perform_request(
            method="GET",
            url=f"/_plugins/_ism/explain/{index_pattern}",
        )
        if explain.get("total_managed_indices", 0) > 0:
            return True
        time.sleep(10)
    return False


# Example usage; credentials are read from the environment.
client = OpenSearch(
    hosts=[{"host": os.getenv("OPENSEARCH_HOST", "localhost"), "port": 9200}],
    http_auth=HTTPBasicAuth(os.getenv("OS_USER"), os.getenv("OS_PASS")),
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

policy_payload = {
    "policy": {
        "description": "Automated log rollover",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [{"rollover": {"min_size": "50gb", "min_index_age": "1d"}}],
                "transitions": [
                    {"state_name": "warm", "conditions": {"min_rollover_age": "12h"}}
                ],
            },
            {"name": "warm", "actions": [], "transitions": []},
        ],
        "ism_template": {"index_patterns": ["logs-*"], "priority": 100},
    }
}

if deploy_rollover_policy(client, "log_rollover_policy", policy_payload):
    print("Policy deployed successfully. Checking policy attachment...")
    if verify_trigger_execution(client, "logs-*"):
        print("ISM is managing indices that match logs-*.")
For teams building continuous deployment pipelines, this approach integrates seamlessly with Writing Python scripts for automated ISM rollover triggers to enforce version-controlled policy rollouts and automated drift detection.
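Drift detection can be approximated by comparing the version-controlled policy against what the cluster returns. The sketch below is one possible approach, not an established tool: it treats the desired policy as a subset that must be present in the deployed one, so server-added fields (timestamps, default retry settings, and similar) do not count as drift. `drifted_paths` is a hypothetical helper, and which fields the server adds is an assumption to check against your own `GET _plugins/_ism/policies/<policy_id>` output. It is intended for `default_state` and `states`; `ism_template` may come back in a different shape and would need normalizing first.

```python
from typing import Any, List


def drifted_paths(desired: Any, deployed: Any, path: str = "policy") -> List[str]:
    """Return the paths where `deployed` does not contain what `desired` specifies.

    Extra keys in `deployed` are ignored; lists must match in length and order.
    """
    if isinstance(desired, dict):
        if not isinstance(deployed, dict):
            return [path]
        found = []
        for key, value in desired.items():
            if key not in deployed:
                found.append(f"{path}.{key}")
            else:
                found.extend(drifted_paths(value, deployed[key], f"{path}.{key}"))
        return found
    if isinstance(desired, list):
        if not isinstance(deployed, list) or len(desired) != len(deployed):
            return [path]
        found = []
        for i, (want, have) in enumerate(zip(desired, deployed)):
            found.extend(drifted_paths(want, have, f"{path}[{i}]"))
        return found
    return [] if desired == deployed else [path]
```

A non-empty result points at the exact setting that diverged, which is usually more actionable in a pipeline log than a plain "policies differ" message.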
Operational Validation & Troubleshooting
Before promoting a Rollover Trigger Configuration to production, validate the following:
- Shard Count Alignment: `min_size` measures total primary storage for the index, so divide it by the number of primary shards and confirm the result stays within the recommended 50 GB per shard limit; `min_primary_shard_size` caps per-shard size directly. Oversized shards degrade recovery times and increase CCR replication latency.
- CCR Checkpoint Sync: Monitor `_plugins/_replication/follower_stats` to confirm follower nodes are not lagging behind leader rollover events.
- Stuck State Resolution: If an index fails to transition, query `GET _plugins/_ism/explain/<index>` and inspect the `retry_info` and `info` fields for the failed action and its error message. After fixing the cause, retry the failed step via `POST _plugins/_ism/retry/<index>`.
- Template Priority Conflicts: List policies with `GET _plugins/_ism/policies` and compare `ism_template.priority` wherever `index_patterns` overlap, so that the intended policy wins attachment.
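The stuck-state check can be automated by scanning an explain response for failed entries. This is a minimal sketch that assumes each per-index entry carries a `retry_info.failed` flag and an `info.message` string; treat both field names as assumptions and confirm them against the explain output of your OpenSearch version. `find_stuck_indices` is a hypothetical helper.

```python
from typing import Dict


def find_stuck_indices(explain: dict) -> Dict[str, str]:
    """Map each index whose managed action is marked failed to its reported message."""
    stuck = {}
    for index, meta in explain.items():
        # Skip top-level scalars such as total_managed_indices.
        if not isinstance(meta, dict):
            continue
        retry_info = meta.get("retry_info") or {}
        if retry_info.get("failed"):
            info = meta.get("info") or {}
            stuck[index] = info.get("message", "no message reported")
    return stuck
```

The returned mapping can feed an alert, or a follow-up call to the retry endpoint once the underlying cause has been addressed.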
Refer to the official OpenSearch ISM API documentation for endpoint specifications and to the Cross-Cluster Replication architecture guidelines for synchronization best practices. Properly configured triggers eliminate manual index management overhead while maintaining predictable storage and query performance across distributed clusters.