Prism - ASH-4 Offline – Incident details

ASH-4 Offline

Resolved
Major outage
Started 4 months ago
Lasted 1 day

Affected

ASH - Ashburn, US (US-East-1)

Major outage from 3:14 AM to 7:12 AM

ASH-4

Major outage from 3:14 AM to 7:12 AM

Updates
  • Resolved
    This incident has been resolved.
  • Update

    Access to the node has been restored (spare memory was located), and ~2 TB of server data is being migrated to another node for future stability.

  • Identified

    A DIMM (RAM module) experienced a catastrophic failure. Because of the [global memory shortage](https://en.wikipedia.org/wiki/2024–present_global_memory_supply_shortage), we haven't been able to yoink a replacement from repair stock or from a spare node, since there are none available. We've scheduled a tech appointment for later today to move the drives to a separate machine.

    Failures like this are extremely rare, and we understand how much unexpected downtime can hurt communities. Please open a ticket to discuss compensation.

    Under normal circumstances, we would have been able to bring the node back online on another machine in the same location within roughly an hour. Unfortunately, we currently have no spare memory stock in this datacenter, so the data will need to be transferred to a node in a different location (from Ashburn1, Equinix DC3, to Ashburn2, CoreSite). This will result in an IP change.

    This node was already scheduled to be migrated to our new Ashburn PoP the following day, so an IP change would have occurred regardless. The timing of the failure is simply unfortunate; it could just as easily have happened on any other day.

  • Investigating
    We are currently investigating this incident.