Hot-Warm-Cold Tier Design

Hot-warm-cold tier design is the practice of sizing distinct hardware pools to an index’s changing access pattern, then letting Index State Management (ISM) walk each index down those tiers as it ages. It is fundamentally a capacity-economics problem: the hot tier is optimized for expensive high-IOPS ingest, the warm tier trades throughput for density, and the cold tier maximizes bytes-per-dollar for data that is queried rarely but must be retained. When the topology is wrong — too little hot headroom, a warm tier that cannot absorb the migration wave, or cold nodes running above their flood stage — ISM transitions stall at the boundary, ingest back-pressures onto the hot tier, and disk watermarks lock indices read-only mid-flight. This guide covers the sizing ratios, node topology, ISM policy, watermark calibration, and Python automation needed to make each tier boundary transition cleanly, building on the OpenSearch ISM Architecture & Fundamentals execution model.

Tier hardware profiles and capacity ratios

Tier separation begins at the node role level, and a misaligned hardware profile causes silent allocation failures during rollover or cross-tier migration. Each tier maps to a storage medium, a compute-to-memory ratio, and a single canonical routing attribute that every index template and ISM allocation action must reference. The exact node-role mechanics behind this table — declaring the attributes and verifying they are live — are covered under Node Role Allocation; how ISM stamps those attributes onto an index at each phase is the subject of Data Tier Routing Patterns.

Tier	Storage profile	vCPU : RAM ratio	Routing attribute	Primary workload
Hot	Local NVMe SSD	1 : 4 (compute-heavy)	`node.attr.data: hot`	Ingest, rollover, real-time search, aggregations
Warm	SATA/SAS SSD or fast HDD	1 : 6	`node.attr.data: warm`	Recent history, force-merged reads, reduced writes
Cold	High-density HDD	1 : 8 (storage-heavy)	`node.attr.data: cold`	Compliance retention, infrequent queries
Frozen	Object storage / searchable snapshots	1 : 8 (minimal compute)	`node.attr.data: frozen`	Archival, rarely-searched snapshots

The design lever that matters most is the ratio of tier sizes, not any single node spec. A tier is sized to hold every index that will reside in it across its retention window, plus watermark headroom. For a steady ingest workload, the raw hot-tier disk you must provision is:

$$C_{\text{hot}} = R_{\text{ingest}} \times T_{\text{hot}} \times (1 + r) \times \frac{1}{w_{\text{high}}}$$

where $R_{\text{ingest}}$ is daily primary-shard growth (GB/day), $T_{\text{hot}}$ is hot retention in days, $r$ is the replica count, and $w_{\text{high}}$ is the high-watermark fraction. Dividing by $w_{\text{high}}$ reserves the disk above the high watermark, so steady-state data alone never pushes a node into relocation. The same formula sizes each downstream tier by substituting its retention window and replica count; because warm and cold typically run fewer replicas and force-merge, their effective $r$ (and, after force-merge, sometimes $R$) shrinks, which is exactly why density-optimized hardware pays off there.
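
As a worked example, the sketch below evaluates the formula for illustrative inputs (200 GB/day of primary growth, 7 hot days, one replica, a 0.90 high watermark). The function name is hypothetical and the numbers are assumptions, not a recommendation.

```python
def tier_capacity_gb(r_ingest_gb_day: float, retention_days: float,
                     replicas: int, w_high: float) -> float:
    """Raw disk a tier needs: R * T * (1 + r) / w_high."""
    if not 0 < w_high <= 1:
        raise ValueError("w_high must be a fraction in (0, 1]")
    return r_ingest_gb_day * retention_days * (1 + replicas) / w_high


# Illustrative inputs: 200 GB/day, 7 hot days, 1 replica, 90% high watermark.
hot = tier_capacity_gb(200, 7, 1, 0.90)
print(f"hot tier needs ~{hot:,.0f} GB")  # hot tier needs ~3,111 GB
```

The same call with a tier's own retention window and replica count sizes warm and cold.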

How an index walks the tiers

ISM does not move data itself; at each phase transition it rewrites the index’s index.routing.allocation.require attribute, and the allocation deciders then relocate the shards to nodes that match. The order of operations inside a state is what keeps a migration clean: running the allocation action first with wait_for: true lets relocation finish before anything else touches the index, and a force_merge placed after it (as in the warm state below) then rewrites segments on the warm nodes, keeping that merge I/O off the hot tier. The transition conditions themselves (min_index_age, min_rollover_age, min_doc_count, min_size) are grounded in Index Lifecycle Basics; min_primary_shard_size, by contrast, is a rollover condition. When a target tier has no eligible node, Fallback Routing Strategies decide whether the index degrades gracefully or blocks.
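
One way to keep that ordering consistent across states is to generate them. This is a minimal sketch with a hypothetical helper, tier_state, that emits an ISM state in which the allocation action always comes first with wait_for: true and any tier-specific actions follow it; the retry values mirror the warm state in the policy below and are assumptions to tune.

```python
def tier_state(name, tier, next_state=None, min_index_age=None, extra_actions=()):
    """Build one ISM state: relocate first, then run tier-specific actions."""
    actions = [
        {
            # Rewrites index.routing.allocation.require.data; wait_for holds
            # the following actions until the shards have relocated.
            "allocation": {"require": {"data": tier}, "wait_for": True},
            "retry": {"count": 3, "backoff": "exponential", "delay": "10m"},
        }
    ]
    actions.extend(extra_actions)  # e.g. force_merge, which then runs on the new tier
    transitions = []
    if next_state and min_index_age:
        transitions.append({"state_name": next_state,
                            "conditions": {"min_index_age": min_index_age}})
    return {"name": name, "actions": actions, "transitions": transitions}


warm = tier_state("warm", "warm", "cold", "30d",
                  extra_actions=[{"force_merge": {"max_num_segments": 1}}])
print([next(iter(a)) for a in warm["actions"]])  # ['allocation', 'force_merge']
```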

Step-by-step tier configuration

The four steps below stand up a hot-warm-cold topology for an app-logs-* index set. Apply them in order: attributes on the nodes, a template so new indices start hot, a policy so aging indices migrate, then a verification pass that confirms both the declared setting and the physical placement.

1. Node configuration

Declare the tier attribute on every data node in opensearch.yml. The value here is the exact string every template and policy will reference — a trailing space or a case mismatch produces a shard that can never route to that node.

YAML

# opensearch.yml — value matches the node's physical tier
node.name: os-data-hot-01
node.roles: [ data, ingest ]
node.attr.data: hot          # hot | warm | cold | frozen
node.attr.zone: us-east-1a   # optional: pair with allocation awareness
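
Why an exact string matters can be shown with a simplified model of the require filter. This sketch is a stand-in, not the real allocation decider: it compares values as exact, case-sensitive strings and ignores the wildcard patterns OpenSearch allocation filters also accept. The node names and the helper name are hypothetical.

```python
def eligible_nodes(require: dict, nodes: dict) -> list:
    """Return nodes whose attributes satisfy every key in a require filter.

    Simplified model: exact, case-sensitive string comparison, no wildcards.
    """
    return [
        name for name, attrs in nodes.items()
        if all(attrs.get(key) == value for key, value in require.items())
    ]


nodes = {
    "os-data-hot-01": {"data": "hot", "zone": "us-east-1a"},
    "os-data-warm-01": {"data": "warm", "zone": "us-east-1b"},
}
print(eligible_nodes({"data": "hot"}, nodes))   # ['os-data-hot-01']
print(eligible_nodes({"data": "Hot"}, nodes))   # [] -> no eligible node
print(eligible_nodes({"data": "hot "}, nodes))  # [] -> trailing space never matches
```

A filter value of "Hot" or "hot " in a template or policy therefore leaves the shard with nowhere to go, even though a hot node exists.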

Confirm the attribute is live before attaching any policy, and validate the tier distribution — you need enough nodes in each tier to hold that tier’s share of the retention window:

Shell

# Every data node should report a data attribute; a node missing from this list can never receive tier-routed shards
curl -s "https://<cluster>:9200/_cat/nodeattrs?v&h=node,attr,value&s=attr" | grep -E "data|zone"

# Confirm the hot/warm/cold split matches your capacity plan
curl -s "https://<cluster>:9200/_cat/nodes?v&h=name,node.role,disk.total&s=name"
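
The same check can be scripted against the JSON form of _cat/nodeattrs. A minimal sketch, assuming rows shaped like the output of _cat/nodeattrs?format=json&h=node,attr,value; the per-tier minimums are placeholders for your own capacity plan, and the helper names are hypothetical.

```python
from collections import Counter


def nodes_per_tier(rows: list) -> Counter:
    """Count nodes per tier from _cat/nodeattrs JSON rows."""
    return Counter(r["value"] for r in rows if r.get("attr") == "data")


def tier_shortfall(rows: list, minimum: dict) -> dict:
    """Return {tier: missing_nodes} for tiers below their planned minimum."""
    counts = nodes_per_tier(rows)
    return {tier: need - counts[tier] for tier, need in minimum.items()
            if counts[tier] < need}


rows = [
    {"node": "os-data-hot-01", "attr": "data", "value": "hot"},
    {"node": "os-data-hot-01", "attr": "zone", "value": "us-east-1a"},
    {"node": "os-data-hot-02", "attr": "data", "value": "hot"},
    {"node": "os-data-warm-01", "attr": "data", "value": "warm"},
]
print(tier_shortfall(rows, {"hot": 2, "warm": 2, "cold": 1}))  # {'warm': 1, 'cold': 1}
```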

2. Index template

Bake the baseline require filter into a template so freshly created or rolled-over indices land on the hot tier immediately, without waiting for ISM’s first evaluation cycle. The version field enforces nothing on its own, but it gives deployment tooling a number to compare, which makes configuration drift across environments easy to detect.

HTTP

PUT _index_template/app_logs_tiered
{
  "index_patterns": ["app-logs-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 3,
      "index.number_of_replicas": 1,
      "index.routing.allocation.require.data": "hot",              // start on NVMe
      "index.plugins.index_state_management.policy_id": "tiered_log_policy",
      "index.plugins.index_state_management.rollover_alias": "app-logs",  // write alias the hot-state rollover acts on
      "index.refresh_interval": "5s",
      "index.translog.durability": "async"                         // throughput over per-write durability
    },
    "mappings": {
      "properties": {
        "@timestamp":   { "type": "date" },
        "message":      { "type": "text" },
        "service_name": { "type": "keyword" }
      }
    }
  },
  "priority": 100,
  "version": 2
}
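
ISM’s rollover action rolls a write alias, so the first index has to exist with that alias marked as the write index before ingest starts. A minimal sketch, assuming an opensearch-py client and a hypothetical alias name, app-logs, that must match the indices’ plugins.index_state_management.rollover_alias setting; the numeric -000001 suffix gives rollover a counter to increment.

```python
def bootstrap_write_index(client, alias: str = "app-logs") -> dict:
    """Create <alias>-000001 with the alias marked as the write index.

    `client` is assumed to be an opensearchpy.OpenSearch instance; the next
    rollover would then create <alias>-000002.
    """
    body = {"aliases": {alias: {"is_write_index": True}}}
    return client.indices.create(index=f"{alias}-000001", body=body)
```

Run it once per alias before the first document is written; a second call fails because the index already exists.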

3. ISM policy JSON

The policy defines a state per tier. In each state the allocation action rewrites the routing attribute, and the tier-appropriate actions (force_merge, replica_count, snapshot) run after it, in order. wait_for: true on the allocation action holds the state’s remaining actions until relocation completes, so they never run against a half-migrated index. Give every action that depends on disk headroom or an external system an explicit retry block, so a briefly unreachable snapshot repository or a watermark blip degrades into bounded retries rather than a stuck index.

HTTP

PUT _plugins/_ism/policies/tiered_log_policy
{
  "policy": {
    "description": "Production hot-warm-cold lifecycle for application logs",
    "default_state": "hot",
    "ism_template": [
      { "index_patterns": ["app-logs-*"], "priority": 100 }
    ],
    "states": [
      {
        "name": "hot",
        "actions": [
          {
            "rollover": {
              "min_index_age": "1d",
              "min_primary_shard_size": "50gb"   // roll before shards get unwieldy
            }
          }
        ],
        "transitions": [
          { "state_name": "warm", "conditions": { "min_index_age": "7d" } }
        ]
      },
      {
        "name": "warm",
        "actions": [
          {
            "allocation": { "require": { "data": "warm" }, "wait_for": true },
            "retry": { "count": 3, "backoff": "exponential", "delay": "10m" }
          },
          { "replica_count": { "number_of_replicas": 1 } },
          { "force_merge": { "max_num_segments": 1 } }   // relocation + merge in one controlled pass
        ],
        "transitions": [
          { "state_name": "cold", "conditions": { "min_index_age": "30d" } }
        ]
      },
      {
        "name": "cold",
        "actions": [
          {
            "allocation": { "require": { "data": "cold" }, "wait_for": true },
            "retry": { "count": 3, "backoff": "exponential", "delay": "30m" }
          },
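          {
            "snapshot": { "repository": "s3-archive-repo", "snapshot": "cold-archive" },
            "retry": { "count": 3, "backoff": "exponential", "delay": "30m" }   // ride out a briefly unreachable repository
          },
          { "replica_count": { "number_of_replicas": 0 } }   // drop the replica only once the snapshot exists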
        ],
        "transitions": [
          { "state_name": "delete", "conditions": { "min_index_age": "365d" } }
        ]
      },
      {
        "name": "delete",
        "actions": [ { "delete": {} } ]
      }
    ]
  }
}
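
Because a missing wait_for or retry block only surfaces once an index stalls, it can help to lint a policy before deploying it. A minimal sketch with hypothetical helper names; it checks only the conventions this guide recommends (not ISM’s own validation): allocation actions without wait_for: true or a retry block, and a drop to zero replicas that comes before the snapshot in the same state, which leaves a single copy of the data until the snapshot completes.

```python
META_KEYS = {"retry", "timeout"}  # per-action settings, not action names


def action_name(action: dict) -> str:
    """Return the ISM action type of one action entry (e.g. 'allocation')."""
    return next(key for key in action if key not in META_KEYS)


def lint_policy(policy: dict) -> list:
    """Flag deviations from the conventions recommended in this guide."""
    warnings = []
    for state in policy["policy"]["states"]:
        actions = state.get("actions", [])
        names = [action_name(a) for a in actions]
        for action, name in zip(actions, names):
            if name == "allocation":
                if action["allocation"].get("wait_for") is not True:
                    warnings.append(f"{state['name']}: allocation without wait_for: true")
                if "retry" not in action:
                    warnings.append(f"{state['name']}: allocation without a retry block")
        zero = [i for i, a in enumerate(actions)
                if a.get("replica_count", {}).get("number_of_replicas") == 0]
        snaps = [i for i, n in enumerate(names) if n == "snapshot"]
        if zero and snaps and zero[0] < snaps[0]:
            warnings.append(f"{state['name']}: replicas dropped to 0 before the snapshot")
    return warnings


sample = {"policy": {"states": [
    {"name": "cold", "actions": [
        {"allocation": {"require": {"data": "cold"}, "wait_for": True},
         "retry": {"count": 3, "backoff": "exponential", "delay": "30m"}},
        {"replica_count": {"number_of_replicas": 0}},
        {"snapshot": {"repository": "s3-archive-repo", "snapshot": "cold-archive"}},
    ]},
]}}
print(lint_policy(sample))  # ['cold: replicas dropped to 0 before the snapshot']
```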

4. Verification

Never assume the allocation action took effect — confirm the declared setting, the physical placement, and ISM’s own view of the index, because those three diverge exactly when something is wrong.

Shell

# a) Confirm ISM wrote the expected require attribute for the current phase
curl -s "https://<cluster>:9200/app-logs-2026.07/_settings" | grep -o '"require":{[^}]*}'

# b) Confirm the shards physically sit on nodes of that tier
curl -s "https://<cluster>:9200/_cat/shards/app-logs-2026.07?v&h=index,shard,prirep,state,node"

# c) Ask ISM where the index is in its lifecycle and whether any action failed
curl -s "https://<cluster>:9200/_plugins/_ism/explain/app-logs-2026.07?pretty"
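
Checks a) and b) can be joined programmatically. A minimal sketch, assuming rows shaped like _cat/shards?format=json&h=index,shard,prirep,state,node (with node null for unassigned copies) and a node-to-tier map built from _cat/nodeattrs; the helper name and node names are hypothetical. It lists every shard copy that is not sitting on the tier the require filter names.

```python
def misplaced_copies(required_tier: str, shard_rows: list, node_tier: dict) -> list:
    """Return (shard, prirep, node) for shard copies not on the required tier."""
    misplaced = []
    for row in shard_rows:
        node = row.get("node")  # null for UNASSIGNED copies
        if not node or node_tier.get(node) != required_tier:
            misplaced.append((row["shard"], row["prirep"], node or "UNASSIGNED"))
    return misplaced


node_tier = {"os-data-warm-01": "warm", "os-data-hot-01": "hot"}
rows = [
    {"index": "app-logs-2026.07", "shard": "0", "prirep": "p", "state": "STARTED", "node": "os-data-warm-01"},
    {"index": "app-logs-2026.07", "shard": "0", "prirep": "r", "state": "STARTED", "node": "os-data-hot-01"},
    {"index": "app-logs-2026.07", "shard": "1", "prirep": "p", "state": "UNASSIGNED", "node": None},
]
print(misplaced_copies("warm", rows, node_tier))
# [('0', 'r', 'os-data-hot-01'), ('1', 'p', 'UNASSIGNED')]
```

An empty list means the declared setting and the physical placement agree; anything else is where check c) should start.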

Cross-cluster replication across tiers

When a hot-warm-cold topology spans geographically distributed clusters, Cross-Cluster Replication (CCR) adds a second routing surface. Leader clusters typically own ingest and hot-tier routing; follower clusters replicate the indices but run their own ISM state machines. A follower must have tier parity — matching node.attr.data labels — or it inherits a require attribute for which it has no eligible node, and the replicated shards sit UNASSIGNED.

To keep followers consistent, attach a read-optimized policy to replicated indices after the initial sync. Follower policies should skip rollover (the leader owns the write index) and focus on allocation, force_merge, and snapshot. The credentials that let the replication user execute ISM state transitions and snapshot APIs on the follower are governed by Security & Access Boundaries; a follower role missing the _plugins/_ism/* or snapshot permission leaves indices stranded in transitional states with no obvious error on the leader.
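
Deriving the follower policy from the leader’s keeps the two from drifting. A minimal sketch with a hypothetical helper, follower_policy: it deep-copies a leader policy, removes rollover actions, and drops ism_template so the policy is attached explicitly after the initial sync (for example with POST _plugins/_ism/add/<index> and a policy_id body) rather than auto-attached at index creation. The sample leader policy is trimmed for brevity.

```python
import copy


def follower_policy(leader: dict, description: str) -> dict:
    """Copy a leader ISM policy, dropping rollover (the leader owns the write index)."""
    follower = copy.deepcopy(leader)  # never mutate the leader's definition
    body = follower["policy"]
    body["description"] = description
    body.pop("ism_template", None)  # attach explicitly after the initial sync
    for state in body["states"]:
        state["actions"] = [a for a in state.get("actions", []) if "rollover" not in a]
    return follower


leader = {"policy": {
    "description": "leader", "default_state": "hot",
    "ism_template": [{"index_patterns": ["app-logs-*"], "priority": 100}],
    "states": [
        {"name": "hot", "actions": [{"rollover": {"min_index_age": "1d"}}],
         "transitions": [{"state_name": "warm", "conditions": {"min_index_age": "7d"}}]},
        {"name": "warm", "actions": [{"force_merge": {"max_num_segments": 1}}], "transitions": []},
    ],
}}
f = follower_policy(leader, "follower: read-optimized")
print([s["actions"] for s in f["policy"]["states"]])
# [[], [{'force_merge': {'max_num_segments': 1}}]]
```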

Python automation for tier deployment

Manual tier rollout does not scale, and a policy deployed against an OpenSearch cluster missing a whole tier fails silently: the problem only surfaces later, when an index reaches that tier’s state and stalls. The opensearch-py routine below runs a pre-flight topology check, refusing to deploy unless hot, warm, and cold nodes are all present, then deploys the policy and verifies attachment, with logging, transport retries, and TLS certificate verification for production use.

Python

import os
import logging
from opensearchpy import OpenSearch
from opensearchpy.exceptions import TransportError, NotFoundError

logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger(__name__)


class TierPolicyDeployer:
    def __init__(self, host: str, port: int = 9200, auth: tuple = None):
        self.client = OpenSearch(
            hosts=[{"host": host, "port": port}],
            http_auth=auth,
            use_ssl=True,
            verify_certs=True,
            timeout=30,
            max_retries=3,
            retry_on_timeout=True,
        )

    def verify_tier_topology(self, required=("hot", "warm", "cold")) -> bool:
        """Refuse to deploy unless every required tier has at least one node."""
        try:
            attrs = self.client.cat.nodeattrs(format="json", h="node,attr,value")
        except TransportError as exc:
            logger.error("Could not read node attributes: %s", exc)
            return False
        present = {a["value"] for a in attrs if a.get("attr") == "data"}
        missing = set(required) - present
        if missing:
            logger.error("Missing tier nodes: %s (found %s)", missing, present or "none")
            return False
        logger.info("Tier topology verified: %s", sorted(present))
        return True

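    def deploy_policy(self, name: str, payload: dict) -> bool:
        """Create an ISM policy, or update it in place if it already exists."""
        path = f"/_plugins/_ism/policies/{name}"
        params = {}
        try:
            try:
                current = self.client.transport.perform_request("GET", path)
                # Updating an existing policy requires its sequence number and primary term.
                params = {
                    "if_seq_no": current["_seq_no"],
                    "if_primary_term": current["_primary_term"],
                }
            except NotFoundError:
                pass  # first deployment: a plain PUT creates the policy
            self.client.transport.perform_request("PUT", path, params=params, body=payload)
            logger.info("Policy '%s' deployed.", name)
            return True
        except TransportError as exc:
            logger.error("Policy deployment failed: %s", exc)
            return False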

    def verify_attachment(self, index_pattern: str, name: str) -> int:
        """Count existing indices that ISM reports as managed by this policy."""
        attached = 0
        try:
            explain = self.client.transport.perform_request(
                "GET", f"/_plugins/_ism/explain/{index_pattern}"
            )
            for idx, state in explain.items():
                if isinstance(state, dict) and state.get("policy_id") == name:
                    attached += 1
        except NotFoundError:
            logger.warning("No indices match '%s' yet.", index_pattern)
        except TransportError as exc:
            logger.error("Attachment check failed: %s", exc)
        logger.info("Policy '%s' attached to %d existing indices.", name, attached)
        return attached


if __name__ == "__main__":
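    # Placeholder payload for illustration only: deploying it under the real
    # policy name would replace the full lifecycle, so load the step 3 JSON instead.
    POLICY = {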
        "policy": {
            "description": "Automated tier routing",
            "default_state": "hot",
            "ism_template": [{"index_patterns": ["app-logs-*"], "priority": 100}],
            "states": [{"name": "hot", "actions": [], "transitions": []}],
        }
    }
    deployer = TierPolicyDeployer(
        host=os.getenv("OPENSEARCH_HOST", "localhost"),
        port=int(os.getenv("OPENSEARCH_PORT", "9200")),
        auth=(os.getenv("OPENSEARCH_USER", "admin"), os.getenv("OPENSEARCH_PASS", "admin")),
    )
    if deployer.verify_tier_topology():
        if deployer.deploy_policy("tiered_log_policy", POLICY):
            deployer.verify_attachment("app-logs-*", "tiered_log_policy")

Run this as a step in your CI/CD pipeline so a topology gap is caught at deploy time rather than at the first stalled transition. The step-by-step interactive walkthrough, including advanced watermark tuning, is expanded in How to configure OpenSearch ISM hot-warm-cold architecture.

Operational guardrails and watermark calibration

Disk watermarks gate every allocation, so a tier at capacity rejects incoming shards even when the require filter is correct; this is the single most common cause of a stalled cold transition. The watermark values below are OpenSearch’s defaults, which are tuned for single-tier clusters and are often conservative for a cold tier designed to run dense. Tune them to your storage economics, but always keep the flood stage below 100%: it is the threshold that marks indices read-only before a runaway migration can fill a disk completely.

Setting	Recommended value	Effect on tier design
`cluster.routing.allocation.disk.watermark.low`	`85%`	Stops new shards routing to a filling node
`cluster.routing.allocation.disk.watermark.high`	`90%`	Triggers relocation off the node
`cluster.routing.allocation.disk.watermark.flood_stage`	`95%`	Marks every index with a shard on the node read-only (deletes still allowed)
`cluster.routing.allocation.node_concurrent_recoveries`	`3` (NVMe) / `1–2` (HDD)	Caps parallel relocations per node
`cluster.routing.allocation.disk.threshold_enabled`	`true`	Watermarks must stay enabled for tier safety

HTTP

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%",
    "cluster.routing.allocation.disk.threshold_enabled": true
  }
}
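
Before relying on these thresholds, it can help to see which nodes are already near them. A minimal sketch, assuming rows shaped like _cat/allocation?format=json&h=node,disk.percent (disk.percent as a string, and null on the UNASSIGNED pseudo-row) and the percentages shown above; the helper and node names are hypothetical.

```python
def watermark_status(rows: list, low: float = 85.0, high: float = 90.0,
                     flood: float = 95.0) -> dict:
    """Classify each node's disk usage against the three disk watermarks."""
    status = {}
    for row in rows:
        pct = row.get("disk.percent")
        if pct is None:  # e.g. the UNASSIGNED row has no disk figures
            continue
        pct = float(pct)
        if pct > flood:
            level = "flood_stage: indices on this node marked read-only"
        elif pct > high:
            level = "high: shards relocating off"
        elif pct > low:
            level = "low: no new shards allocated here (new primaries excepted)"
        else:
            level = "ok"
        status[row["node"]] = level
    return status


rows = [
    {"node": "os-data-cold-01", "disk.percent": "93"},
    {"node": "os-data-cold-02", "disk.percent": "71"},
    {"node": "UNASSIGNED", "disk.percent": None},
]
print(watermark_status(rows))
# {'os-data-cold-01': 'high: shards relocating off', 'os-data-cold-02': 'ok'}
```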

Watch shard-migration latency during warm-to-cold transitions in particular: cold nodes are the slowest to receive relocations, and if network or disk bandwidth is constrained the migration wave can back up onto the warm tier. When that headroom is tight, stagger transitions across index sets and lean on Fallback Routing Strategies so a blocked cold move never starves the hot tier of capacity.

Troubleshooting tier transitions

Each failure mode below pairs a diagnosis command with the corrective action.

Cold transition stalls at the watermark. The cold nodes are above watermark.low, so no shard can be allocated to them (and above watermark.high, shards are also relocated off). Identify the pressured nodes, then expand capacity or lower incoming volume:

Shell

curl -s "https://<cluster>:9200/_cat/allocation?v&h=node,disk.percent,disk.avail&s=disk.percent:desc"
# Fix: add cold-tier disk, or temporarily raise the low (and, if needed, high) watermark after confirming real headroom

ISM allocation action exhausts its retries. The action used up its retry budget and the index is parked in its current state. Read the failure reason, then retry the managed index:

Shell

curl -s "https://<cluster>:9200/_plugins/_ism/explain/app-logs-2026.07?pretty"
curl -s -X POST "https://<cluster>:9200/_plugins/_ism/retry/app-logs-2026.07"
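
To find every index parked this way rather than checking one at a time, a sketch can scan the explain response for a whole pattern. It assumes the response shape recent ISM versions return (per-index objects whose action object carries a failed flag and whose info object carries a message); verify that against your version’s explain output first. The sample response is trimmed and hypothetical, including its message text.

```python
def failed_indices(explain: dict) -> dict:
    """Map index name -> ISM info message for managed indices whose action failed."""
    failed = {}
    for index, state in explain.items():
        if not isinstance(state, dict):  # skips scalar keys such as total_managed_indices
            continue
        if state.get("action", {}).get("failed"):
            failed[index] = state.get("info", {}).get("message", "")
    return failed


explain = {
    "app-logs-2026.07": {"policy_id": "tiered_log_policy",
                         "action": {"name": "allocation", "failed": True},
                         "info": {"message": "example failure message"}},
    "app-logs-2026.08": {"policy_id": "tiered_log_policy",
                         "action": {"name": "rollover", "failed": False},
                         "info": {}},
    "total_managed_indices": 2,
}
for index, message in failed_indices(explain).items():
    print(f"POST _plugins/_ism/retry/{index}  # {message}")
```

Fix the cause each message points to before issuing the retries; otherwise the action simply fails again.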

Shards UNASSIGNED after a tier move. The target tier has no eligible node — usually a missing or mistyped attribute. Ask the decider exactly why, then fix the attribute or relax the filter:

Shell

curl -s -X POST "https://<cluster>:9200/_cluster/allocation/explain" \
  -H "Content-Type: application/json" \
  -d '{"index":"app-logs-2026.07","shard":0,"primary":true}'
# Fix: add node.attr.data to a node in that tier, or override the require filter

Hot tier fills faster than rollover fires. Shards grow past min_primary_shard_size before the index rolls, because ISM checks rollover conditions only once per job interval and a fast ingest rate adds a lot of data between checks. Check current shard sizes and tighten the rollover conditions:

Shell

curl -s "https://<cluster>:9200/_cat/shards/app-logs-*?v&h=index,shard,prirep,store&s=store:desc"
# Fix: lower min_primary_shard_size or shorten min_index_age; rollover fires as soon as either condition is met
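
How soon rollover fires follows from the ingest rate and the shard count. A minimal sketch with a hypothetical helper, assuming primary growth spreads evenly across primary shards and that rollover fires on whichever condition is met first; the inputs are illustrative, and the real rollover lands up to one ISM job interval after the computed time.

```python
def hours_until_rollover(r_ingest_gb_day: float, primary_shards: int,
                         max_shard_gb: float, max_age_hours: float) -> float:
    """Hours until the first rollover condition (size or age) is met."""
    per_shard_gb_per_hour = r_ingest_gb_day / primary_shards / 24
    hours_to_size = max_shard_gb / per_shard_gb_per_hour
    return min(hours_to_size, max_age_hours)


# Illustrative: 600 GB/day across 3 primaries, 50 GB shard target, 1-day max age.
print(round(hours_until_rollover(600, 3, 50, 24), 1))  # 6.0
```

Here the size condition fires four times a day, long before min_index_age; at a lower ingest rate the age condition would win instead.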

Snapshot action fails on the cold transition. The repository is unreachable or the role lacks snapshot permission, leaving the index stuck before delete. Verify the repository, then confirm access boundaries:

Shell

curl -s -X POST "https://<cluster>:9200/_snapshot/s3-archive-repo/_verify"
# Fix: repair the repository, or grant the ISM role snapshot permission (see Security & Access Boundaries)

Frequently asked questions

How many nodes should each tier have?

Size each tier to hold every index that resides in it across its retention window, plus watermark headroom: use the $C_{\text{hot}}$ formula above with that tier’s retention days and replica count. Node count then follows from per-node disk: divide the required tier capacity by usable disk per node, round up, and add a spare node so that losing a single node does not push the remaining nodes past watermark.high.
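
The node-count step as a short calculation, sketched under two stated assumptions (equal usable disk per node in the tier, and one spare node as the failure budget); the helper name and inputs are hypothetical.

```python
import math


def tier_node_count(required_gb: float, disk_per_node_gb: float, spare_nodes: int = 1) -> int:
    """Nodes needed to hold required_gb, rounded up, plus spares for node loss."""
    return math.ceil(required_gb / disk_per_node_gb) + spare_nodes


# Illustrative: ~3,111 GB of tier capacity on nodes with 1,000 GB of usable disk each.
print(tier_node_count(3111, 1000))  # 5
```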

Should cold-tier indices keep a replica?

Usually not. Cold data is typically snapshotted, so the snapshot repository provides durability and dropping to number_of_replicas: 0 roughly halves cold storage cost. Keep a replica only if your query SLA on cold data cannot tolerate the recovery time of restoring from a snapshot after a node loss.

Why does the warm transition run force_merge and allocation together?

force_merge does not move shards; it rewrites segments on whichever nodes currently hold them. Placing it after the allocation action in the same state, with wait_for: true on allocation, lets relocation finish first, so the merge runs on the warm nodes: its heavy I/O stays off the hot tier, and it never runs against a half-migrated index.

Can I add a frozen tier to this design?

Yes. Add a frozen state after cold that snapshots and then uses searchable snapshots on node.attr.data: frozen nodes backed by object storage. Frozen extends retention at near-archival cost while keeping data queryable, at the price of much higher query latency.

Related

Node Role Allocation — declare and verify the tier attributes this design routes to.
Data Tier Routing Patterns — how ISM enforces the routing attribute at each phase.
Fallback Routing Strategies — graceful degradation when a target tier has no eligible node.
Index Lifecycle Basics — the transition conditions that time each tier move.
Security & Access Boundaries — scoping the roles that run allocation and snapshot actions.
How to configure OpenSearch ISM hot-warm-cold architecture — the interactive step-by-step build.
