Polygon suffers hour-long RPC disruption affecting block production; Heimdall hotfix to blame


Polygon’s proof‑of‑stake (PoS) network suffered a one‑hour outage on July 30 that disrupted apps and users.

According to Polygon CEO and founder Sandeep Nailwal, the incident did not stop block production.

However, QuickNode Status reported that block production stopped for one hour at block height 74,592,238, echoing other reports made by users on X.

Consensus hiccup

In a post on X, Nailwal said several RPC providers “and hence corresponding apps and users” faced issues over a two‑to‑three‑hour window. At the same time, the chain “remained operational and continued to produce blocks and process user transactions” for unaffected RPCs. 

As of press time, Polygonscan data indicates that the network generated a new block two seconds after block 74,592,238, even though the block explorer had displayed stalled block production at that height.

Nailwal traced the incident to a hotfix and a temporary pause on the consensus (Heimdall) layer tied to a recent, complex upgrade. The execution layer (Bor) continued running, but some RPC nodes fell out of sync after the fix, creating app‑level failures that felt like a network halt. 
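To illustrate how an RPC node can fall out of sync while the chain keeps producing blocks, here is a minimal sketch of the kind of check an app operator might run. It assumes each provider's latest height has already been fetched with the standard `eth_blockNumber` JSON‑RPC call, which returns a hex string. The provider names, the two higher block heights, and the 30‑block lag threshold are hypothetical values chosen for illustration, not figures from Polygon or QuickNode.

```python
def parse_block_number(result: str) -> int:
    """Convert an eth_blockNumber result (a hex string such as '0x10') to an int."""
    return int(result, 16)


def find_lagging(heights: dict[str, int], max_lag: int = 30) -> list[str]:
    """Return the endpoints trailing the highest reported height by more than max_lag blocks."""
    if not heights:
        return []
    tip = max(heights.values())  # best-known chain tip among the endpoints checked
    return sorted(name for name, height in heights.items() if tip - height > max_lag)


# Hypothetical snapshot: one provider stuck at the height cited in the reports,
# two others still advancing (their heights are made up for this example).
heights = {
    "provider-a": parse_block_number(hex(74_592_238)),
    "provider-b": 74_592_900,
    "provider-c": 74_592_895,
}

print(find_lagging(heights))  # prints ['provider-a']
```

A check like this only compares endpoints against one another, so it cannot tell a lagging node from a genuine chain halt if every provider stalls at the same height.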

He apologized for the end‑user impact and said Polygon is working with providers “to bring everyone up to speed,” adding that he expects no follow‑on issues.

Halted block production

Infrastructure operators described the same symptoms. QuickNode reported the mainnet stalled from its vantage point at block height 74,592,238 and warned that the new Heimdall v0.2.16 upgrade was causing issues with execution clients Bor and Erigon. 

The company paused upgrades “until further notice,” advised operators not to proceed, and began resyncing full nodes while resetting Erigon instances to restore service.

Polygon’s status page isolated the fault to the Heimdall‑V2 network. The team said the mainnet Heimdall service became unresponsive, affecting validator and checkpoint visibility via Heimdall APIs, but emphasized there was no impact on the Bor layer. 

Engineers identified the issue and deployed a fix before marking the incident resolved.

The timeline shows how the disruption unfolded and spread across teams. Polygon opened the incident at 09:52 UTC on July 30, identified the issue by 09:57, and declared it resolved at 11:01. 

QuickNode then reported a stall at 11:28, paused the Heimdall v0.2.16 rollout at 11:51 pending guidance from the Polygon Foundation, and by 15:39 said it was resyncing and resetting nodes to bring services back online.

Nailwal characterized the episode as a coordination gap between consensus and infrastructure rather than a protocol failure.
