Resolved
AWS has reported that the upstream incident is resolved.
Monitoring
The turbopuffer service remains healthy in the aws-us-east-1 region. We are still monitoring the situation closely while AWS continues to report a service disruption in the region.
Monitoring
AWS is experiencing a major disruption in the us-east-1 region impacting multiple services (https://health.aws.amazon.com/health/status?eventID=arn:aws:health:us-east-1::event/MULTIPLE_SERVICES/AWS_MULTIPLE_SERVICES_OPERATIONAL_ISSUE/AWS_MULTIPLE_SERVICES_OPERATIONAL_ISSUE_BA540_514A652BE1A). This caused turbopuffer to become unavailable in that region starting at 7:50 UTC. We rolled out mitigations that brought turbopuffer to a full recovery at 8:20 UTC, despite the ongoing AWS outage. An existing mitigation that guards against EC2 instance launch issues has prevented the long-running EC2 degradation from further affecting turbopuffer availability. The AWS incident is ongoing, and we will continue monitoring the cluster's health.
Monitoring
We have deployed mitigations and are monitoring the situation.
Investigating
AWS us-east-1 is experiencing a major outage and many services are affected.
turbopuffer clients in aws-us-east-1 will see elevated error rates and latency during this outage.
We're looking at mitigating the impact.