Uptime ideals vs reality in the AI era. A recent post from Theo (t3.gg) calling out sub-90% uptime on a major AI service reignites the question: how seriously should we treat downtime for non-critical apps? In this episode Matt and Mike dig into SLAs, the real cost of monitoring and rapid support, why “always-on” isn’t free, and whether 24/7 expectations turn developers into shift workers instead of on-call responders.
Recently, Theo from t3.gg posted on X about Anthropic/Claude having raised billions of dollars while still somehow delivering sub-90% uptime. He has expressed disdain for downtime in the past; the main example I remember is an outage of his own T3 Chat service earlier this summer. Both cases involve relatively new AI technology, so one might say that downtime is inevitable.
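To put a figure like "sub-90%" in perspective, here is a quick back-of-the-envelope sketch in Python that converts an uptime percentage into hours of downtime. The function name `downtime_hours` and the 30-day month are our own assumptions for illustration, not anything taken from Theo's post or from a provider's SLA.

```python
def downtime_hours(uptime_percent: float, days: int = 30) -> float:
    """Hours of downtime implied by an uptime percentage over a period.

    Assumes a period of `days` full 24-hour days (30 by default).
    """
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_hours = days * 24
    return (100.0 - uptime_percent) / 100.0 * total_hours


if __name__ == "__main__":
    # Compare a few common uptime targets over a 30-day month.
    for pct in (90.0, 99.0, 99.9):
        hours = downtime_hours(pct)
        print(f"{pct}% uptime allows about {hours:.1f} hours of downtime")
```

Under these assumptions, 90% uptime works out to 72 hours, or three full days of downtime in a 30-day month, while 99.9% leaves only about 43 minutes.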
We’ve discussed uptime and downtime in past episodes, primarily in terms of the requirements our clients have for their websites. Unfortunately, clients rarely want to pay for monitoring and fast support, so there are times when things are not monitored and could therefore go down unnoticed.
Obviously, in a utopian world, nothing would go down and everything would work perfectly. But even with older website technology, there are a lot of moving parts, so downtime happens.