Index Lifecycle Basics

Index lifecycle management is the operational framework for automating index state transitions, shard allocation, and retention across distributed OpenSearch clusters. In production telemetry and search workloads, manual index rotation introduces configuration drift, inconsistent shard distribution, and uncontrolled storage growth. The Index State Management (ISM) plugin replaces external cron jobs with declarative, cluster-native policies that trigger transitions when measurable thresholds such as index age, size, or document count are met. This operational model builds on OpenSearch ISM Architecture & Fundamentals to enforce predictable data movement without external orchestration overhead.

Declarative Policy Architecture

An ISM policy is a JSON document that defines a finite state machine for index management. Each policy consists of states, and each state lists the actions to run and the transitions, with their conditions, that lead to other states. The cluster evaluates managed indices at the interval set by plugins.index_state_management.job_interval (default: 5 minutes). Once a state's actions have completed and a transition condition evaluates to true, the index moves to the next state, where that state's actions run in turn.

stateDiagram-v2
    [*] --> hot
    hot --> warm: min_index_age / min_size met
    warm --> cold: retention threshold met
    cold --> delete: expiry threshold met
    delete --> [*]
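
The state machine in the diagram can be modelled in a few lines. This is a minimal sketch, not how the plugin is implemented: it ignores actions entirely, reduces every condition to an index age in days, and uses illustrative thresholds. The names Transition, LIFECYCLE, next_state, and settle are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Transition:
    target: str
    min_age_days: int  # simplified stand-in for a min_index_age condition


# Illustrative thresholds only; a real policy defines its own.
LIFECYCLE: Dict[str, List[Transition]] = {
    "hot": [Transition("warm", 3)],
    "warm": [Transition("cold", 14)],
    "cold": [Transition("delete", 30)],
    "delete": [],
}


def next_state(current: str, index_age_days: int) -> str:
    """Evaluate one cycle: take the first transition whose condition holds."""
    for transition in LIFECYCLE[current]:
        if index_age_days >= transition.min_age_days:
            return transition.target
    return current


def settle(current: str, index_age_days: int) -> str:
    """Repeat evaluation cycles until no further transition fires."""
    while True:
        following = next_state(current, index_age_days)
        if following == current:
            return current
        current = following


print(settle("hot", 20))  # cold
```

An index that is 20 days old passes the hot and warm thresholds but not the 30-day one, so it settles in cold.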

Key architectural principles:

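  • Deterministic Execution: Actions within a state run sequentially, in the order listed. If an action fails (e.g., insufficient disk space for force_merge), the index remains in the current state and the action is retried according to its retry settings. Once retries are exhausted, the index stays in a failed status until an operator intervenes, for example with POST _plugins/_ism/retry/<index>.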
  • Stateless Evaluation: Conditions are evaluated against index metadata (_stats, _cat/indices, index.creation_date). No external state tracking is required.
  • Cluster-Native Scheduling: The ISM coordinator node manages the execution queue, distributing workloads across data nodes to prevent hot-spotting during bulk operations like shrink or force_merge.
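
ISM actions accept optional retry and timeout fields, which control how a failed action such as force_merge is re-attempted. The helper below is a minimal sketch that adds a retry block to every action that lacks one. The function name with_retry and the chosen values are assumptions for illustration, not recommendations.

```python
import copy
from typing import Optional

# Illustrative values; tune them per action and per cluster.
DEFAULT_RETRY = {"count": 3, "backoff": "exponential", "delay": "1m"}


def with_retry(policy_body: dict,
               retry: dict = DEFAULT_RETRY,
               timeout: Optional[str] = None) -> dict:
    """Return a copy of the policy with retry (and optionally timeout) on each action."""
    patched = copy.deepcopy(policy_body)  # never mutate the caller's policy
    for state in patched["policy"]["states"]:
        for action in state.get("actions", []):
            # setdefault keeps any retry/timeout the author already wrote.
            action.setdefault("retry", dict(retry))
            if timeout is not None:
                action.setdefault("timeout", timeout)
    return patched


sample = {
    "policy": {
        "states": [
            {"name": "warm", "actions": [{"force_merge": {"max_num_segments": 1}}]}
        ]
    }
}
patched = with_retry(sample, timeout="2h")
print(patched["policy"]["states"][0]["actions"][0])
```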

Production Policy Payload

The following policy implements a standard telemetry retention workflow. It handles rollover in the hot tier, optimizes storage in warm, reduces replicas in cold, and enforces hard deletion at 30 days.

JSON
PUT _plugins/_ism/policies/telemetry-lifecycle
{
  "policy": {
    "description": "Automated tiered retention for high-volume telemetry indices",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "rollover": {
              "min_index_age": "1d",
              "min_primary_shard_size": "45gb",
              "min_doc_count": 75000000
            }
          }
        ],
        "transitions": [
          {
            "state_name": "warm",
            "conditions": { "min_index_age": "3d" }
          }
        ]
      },
      {
        "name": "warm",
        "actions": [
          {
            "replica_count": { "number_of_replicas": 1 }
          },
          {
            "force_merge": { "max_num_segments": 1 }
          },
          {
            "allocation": {
              "require": { "data": "warm" }
            }
          }
        ],
        "transitions": [
          {
            "state_name": "cold",
            "conditions": { "min_index_age": "14d" }
          }
        ]
      },
      {
        "name": "cold",
        "actions": [
          {
            "replica_count": { "number_of_replicas": 0 }
          },
          {
            "allocation": {
              "require": { "data": "cold" }
            }
          }
        ],
        "transitions": [
          {
            "state_name": "delete",
            "conditions": { "min_index_age": "30d" }
          }
        ]
      },
      {
        "name": "delete",
        "actions": [
          { "delete": {} }
        ]
      }
    ]
  }
}
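
Before submitting a policy like this one, a pipeline can check its structure locally. The following is a minimal sketch of such a check, assuming the policy body has been loaded into a Python dict. The function validate_policy is hypothetical and covers only state naming and transition targets, not the full validation the cluster performs.

```python
from typing import List


def validate_policy(body: dict) -> List[str]:
    """Return a list of structural problems; an empty list means none were found."""
    policy = body.get("policy", {})
    states = policy.get("states", [])
    names = [state.get("name") for state in states]
    problems = []

    # The default state must be one of the defined states.
    if policy.get("default_state") not in names:
        problems.append(
            f"default_state {policy.get('default_state')!r} is not a defined state"
        )
    # State names must be unique.
    for name in sorted({n for n in names if names.count(n) > 1}, key=str):
        problems.append(f"state {name!r} is defined more than once")
    # Every transition must point at a defined state.
    for state in states:
        for transition in state.get("transitions", []):
            target = transition.get("state_name")
            if target not in names:
                problems.append(
                    f"state {state.get('name')!r} transitions to undefined state {target!r}"
                )
    return problems
```

Running it against the policy above should return an empty list. A typo in any state_name shows up as a message naming both the source state and the missing target.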

Automated Policy Attachment & Python Integration

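Policies must be attached explicitly to existing indices, or matched to new indices through an ism_template block in the policy (setting policy_id in an index template is deprecated). For dynamic environments, automation scripts ensure consistent policy application across newly created indices.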

Direct API Attachment

HTTP
POST _plugins/_ism/add/telemetry-logs-*
{
  "policy_id": "telemetry-lifecycle"
}
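
The add API only affects indices that already exist. For indices created later, a policy can carry an ism_template block so that matching indices pick it up at creation time. The sketch below adds such a block to a policy body held as a Python dict. The helper name with_ism_template and the priority value are assumptions for illustration.

```python
import copy
from typing import Iterable


def with_ism_template(policy_body: dict,
                      index_patterns: Iterable[str],
                      priority: int = 100) -> dict:
    """Return a copy of the policy that auto-attaches to matching new indices."""
    patched = copy.deepcopy(policy_body)  # leave the original untouched
    patched["policy"]["ism_template"] = {
        "index_patterns": list(index_patterns),
        # When several policies match, the higher priority is expected to win.
        "priority": priority,
    }
    return patched


base = {"policy": {"description": "example", "default_state": "hot", "states": []}}
templated = with_ism_template(base, ["telemetry-logs-*"])
print(templated["policy"]["ism_template"])
```

The resulting body would be sent with the same PUT _plugins/_ism/policies/<policy_id> request used to create the policy.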

Python Automation Script

The following script sketches an attachment handler with retry logic, environment-based credential injection, and basic error handling. It is designed for CI/CD pipelines and platform automation workflows.

Python
import os
import json
import time
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

class ISMManager:
    def __init__(self, host: str, policy_id: str, index_pattern: str):
        self.base_url = host.rstrip("/")
        self.policy_id = policy_id
        self.index_pattern = index_pattern
        self.session = requests.Session()
        self.session.auth = (os.getenv("OPENSEARCH_USER"), os.getenv("OPENSEARCH_PASS"))
        
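        # urllib3 does not retry POST by default, so allow it explicitly
        # (allowed_methods requires urllib3 >= 1.26).
        retry_strategy = Retry(
            total=3,
            backoff_factor=1,
            status_forcelist=[429, 500, 502, 503, 504],
            allowed_methods=["GET", "POST"],
        )
        adapter = HTTPAdapter(max_retries=retry_strategy)
        self.session.mount("https://", adapter)
        self.session.mount("http://", adapter)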

    def attach_policy(self) -> dict:
        endpoint = f"{self.base_url}/_plugins/_ism/add/{self.index_pattern}"
        payload = {"policy_id": self.policy_id}
        
        response = self.session.post(endpoint, json=payload)
        response.raise_for_status()
        return response.json()

    def verify_attachment(self) -> bool:
        endpoint = f"{self.base_url}/_plugins/_ism/explain/{self.index_pattern}"
        response = self.session.get(endpoint)
        response.raise_for_status()
        data = response.json()
        
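        # Check if any matching index has the policy attached. The response
        # also carries a top-level "total_managed_indices" integer, so skip
        # any value that is not a per-index dict.
        for idx, details in data.items():
            if not isinstance(details, dict):
                continue
            attached = details.get(
                "index.plugins.index_state_management.policy_id"
            ) or details.get("policy_id")
            if attached == self.policy_id:
                return True
        return False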

if __name__ == "__main__":
    manager = ISMManager(
        host=os.getenv("OPENSEARCH_ENDPOINT", "https://localhost:9200"),
        policy_id="telemetry-lifecycle",
        index_pattern="telemetry-logs-*"
    )
    
    try:
        manager.attach_policy()
        time.sleep(2)  # Allow cluster state propagation
        if manager.verify_attachment():
            print("Policy successfully attached and verified.")
        else:
            print("Attachment completed but verification pending.")
    except requests.exceptions.RequestException as e:
        print(f"ISM operation failed: {e}")
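
An HTTP 200 from the add endpoint does not by itself mean every index accepted the policy, because the response body reports per-index failures. The helper below is a minimal sketch for summarizing that body. It assumes the response carries updated_indices, failures, and failed_indices fields, with index_name and reason on each failed entry. Verify those names against the OpenSearch version in use; summarize_add_response is a hypothetical name.

```python
def summarize_add_response(body: dict) -> str:
    """Turn an ISM add-policy response body into a one-line summary."""
    if not body.get("failures"):
        return f"attached to {body.get('updated_indices', 0)} indices"
    # Collect "index: reason" pairs for every index that rejected the policy.
    reasons = [
        f"{entry.get('index_name')}: {entry.get('reason')}"
        for entry in body.get("failed_indices", [])
    ]
    return "failures: " + "; ".join(reasons)


# Hand-written sample bodies in the assumed shape, not captured cluster output.
clean = {"updated_indices": 2, "failures": False, "failed_indices": []}
partial = {
    "updated_indices": 1,
    "failures": True,
    "failed_indices": [
        {"index_name": "telemetry-logs-000001", "reason": "example reason"}
    ],
}
print(summarize_add_response(clean))
print(summarize_add_response(partial))
```

In the script above, the return value of attach_policy() could be passed through this helper before the verification step.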

Cross-Cluster Replication & Tier Routing Alignment

When deploying ISM alongside Cross-Cluster Replication (CCR), policy execution must account for replication topology. Follower indices operate under strict write-block constraints, so write-path actions such as rollover belong on the leader. To prevent replication conflicts, configure index.plugins.index_state_management.rollover_alias on the leader only, and attach to each follower, via POST _plugins/_ism/add/<follower_index>, a policy scoped to read-only tier transitions. Do not assume that a follower picks up the leader's policy automatically; confirm what is attached with the explain API.

Routing decisions during warm and cold transitions rely on node attributes. Aligning ISM allocation actions with Hot-Warm-Cold Tier Design ensures indices migrate to hardware optimized for their access patterns. Additionally, policy execution requires precise RBAC scoping. Restricting _plugins/_ism/* endpoints to platform engineering roles enforces Security & Access Boundaries and prevents accidental policy overrides by application-level service accounts.
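
Because allocation actions only succeed when some node carries the required attribute, it can help to compare a policy against the cluster's node attributes before rollout. The following is a minimal sketch, assuming the attribute rows come from GET _cat/nodeattrs?format=json (a list of objects with attr and value keys) and that tiers are expressed through a node attribute named data, as in the policy above. The helper names are hypothetical.

```python
from typing import List, Set


def required_tiers(policy_body: dict, attr: str = "data") -> Set[str]:
    """Collect every value that allocation.require demands for the given attribute."""
    tiers = set()
    for state in policy_body["policy"]["states"]:
        for action in state.get("actions", []):
            require = action.get("allocation", {}).get("require", {})
            if attr in require:
                tiers.add(require[attr])
    return tiers


def missing_tiers(policy_body: dict, nodeattrs: List[dict], attr: str = "data") -> Set[str]:
    """Return tiers the policy requires that no node currently advertises."""
    available = {row["value"] for row in nodeattrs if row.get("attr") == attr}
    return required_tiers(policy_body, attr) - available


# Hand-written sample inputs for illustration.
policy = {"policy": {"states": [
    {"name": "warm", "actions": [{"allocation": {"require": {"data": "warm"}}}]},
    {"name": "cold", "actions": [{"allocation": {"require": {"data": "cold"}}}]},
]}}
nodeattrs = [
    {"node": "n1", "attr": "data", "value": "hot"},
    {"node": "n2", "attr": "data", "value": "warm"},
]
print(missing_tiers(policy, nodeattrs))  # {'cold'}
```

A non-empty result signals that an allocation action would have nowhere to place shards, which is cheaper to discover in a pipeline than during a transition.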

Operational Validation & Execution Monitoring

After attachment, monitor policy execution using the explain API:

HTTP
GET _plugins/_ism/explain/telemetry-logs-2024.01.15

The response reports the current state (state.name), when the index entered it (state.start_time), the action and step in progress, and an info.message describing any failure. For automated alerting, check action.failed and step.step_status for each index, and trigger PagerDuty or Slack webhooks when step_status remains failed across multiple job cycles.
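
The alerting check described above can be sketched as a pure function over the explain response. It assumes each per-index entry exposes action.failed, step.step_status, and info.message, and that the response may also contain non-index keys such as total_managed_indices. Confirm the field names against the cluster's actual output; failed_indices is a hypothetical helper.

```python
from typing import Dict


def failed_indices(explain: dict) -> Dict[str, str]:
    """Map each index with a failed action or step to its info message."""
    failures = {}
    for name, details in explain.items():
        if not isinstance(details, dict):
            continue  # skip summary fields such as total_managed_indices
        action = details.get("action") or {}
        step = details.get("step") or {}
        if action.get("failed") or step.get("step_status") == "failed":
            failures[name] = (details.get("info") or {}).get("message", "")
    return failures


# Hand-written sample in the assumed response shape.
sample_explain = {
    "telemetry-logs-2024.01.15": {
        "policy_id": "telemetry-lifecycle",
        "state": {"name": "warm"},
        "action": {"name": "force_merge", "failed": True},
        "step": {"name": "attempt_call_force_merge", "step_status": "failed"},
        "info": {"message": "example failure message"},
    },
    "telemetry-logs-2024.01.16": {
        "policy_id": "telemetry-lifecycle",
        "state": {"name": "hot"},
        "action": {"name": "rollover", "failed": False},
        "step": {"name": "attempt_rollover", "step_status": "condition_not_met"},
    },
    "total_managed_indices": 2,
}
print(failed_indices(sample_explain))
```

An alerting job would call this on every cycle and page only when the same index appears in consecutive results.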

Advanced tuning requires balancing plugins.index_state_management.job_interval against cluster load and implementing index template versioning to prevent policy drift during cluster upgrades. Note that index.lifecycle.parse_origination_date is an Elasticsearch ILM setting and is not part of ISM, which computes min_index_age from the index creation date. For comprehensive configuration strategies and failure recovery workflows, consult the Best practices for OpenSearch index lifecycle management.

For additional API reference and cluster setting documentation, review the official OpenSearch Index State Management documentation and standardize JSON payload construction using Python’s native json module.