
Incompetent System Administrator Wrecks Amazon.com for 3 Hours Due to Typing Error

"Many individuals believe that this was inappropriate"

More than two decades ago, a typo made by a sysadmin named Ken caused one of the more memorable outages in Amazon's history. In this story, reminiscent of the tales told in the popular column **Who, Me?**, the primary database behind Amazon.com's bookstore went down for three hours.

The incident unfolded during an update carried out by Ken and his team. A typo in the configuration files they wrote for the update stopped the system from deleting logs after they had been backed up, so the partition holding the logs filled up. Once that partition was full, the database stopped working, even though the machines in the cluster kept running normally.
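The failure mode described above, where log cleanup silently stops and a partition fills, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Amazon's actual tooling: the source does not describe the configuration, the database, or the file layout, so the directory, the file names, the `backed_up` set, and the 90% alert threshold are all hypothetical. The cleanup function refuses to run against a directory that does not exist, so a mistyped path fails loudly instead of quietly deleting nothing, and a separate check reports how full the log partition is.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical alert level: warn when the log partition is 90% full.
USAGE_ALERT_THRESHOLD = 0.9


def purge_backed_up_logs(log_dir: Path, backed_up: set[str]) -> list[str]:
    """Delete files in log_dir whose names are in backed_up; return the deleted names.

    Files that have not been backed up yet are left in place.
    """
    if not log_dir.is_dir():
        # A mistyped path should fail loudly instead of silently deleting nothing.
        raise FileNotFoundError(f"log directory does not exist: {log_dir}")
    deleted = []
    for entry in sorted(log_dir.iterdir()):
        if entry.is_file() and entry.name in backed_up:
            entry.unlink()
            deleted.append(entry.name)
    return deleted


def partition_usage_fraction(path: Path) -> float:
    """Return the fraction (0.0 to 1.0) of the partition holding path that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_dir = Path(tmp)
        for name in ("redo_001.log", "redo_002.log", "redo_003.log"):
            (log_dir / name).write_text("log data\n")

        # Pretend the backup job has already copied the first two logs.
        removed = purge_backed_up_logs(log_dir, {"redo_001.log", "redo_002.log"})
        print(removed)  # ['redo_001.log', 'redo_002.log']
        print(sorted(p.name for p in log_dir.iterdir()))  # ['redo_003.log']

        # A typo in the configured path is reported instead of ignored.
        try:
            purge_backed_up_logs(log_dir / "lgos", set())
        except FileNotFoundError as exc:
            print("config error:", exc)

        # Hypothetical monitoring hook: alert before the partition fills up.
        if partition_usage_fraction(log_dir) >= USAGE_ALERT_THRESHOLD:
            print("warning: log partition is nearly full")
```

Raising an error on a missing directory is a deliberate design choice: a cleanup job that does nothing without complaint is exactly what lets a disk fill up unnoticed. In a real deployment the usage check would more likely feed a monitoring or alerting system than print to the console.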

Once they realised what had happened, Ken and a database administrator deleted the accumulated logs on the cluster, freeing space on the partition. The database came back to life, and Amazon.com with it. The damage had already been done, however, and the outage prompted a conference call with very senior staff, including then-CEO Jeff Bezos.

Ken, previously a Solaris admin, had been hired by Amazon.com for a Linux position although he was "completely unqualified" for it. The misstep notwithstanding, his quick thinking and problem-solving during the recovery were recognised, with his manager joking: "Congratulations, you're no longer a virgin."

Many people, including senior staff such as then-CEO Jeff Bezos, considered the incident "bad". It was serious enough to be remembered as having "brought down Amazon," a reminder of how much impact a small error can have on a large-scale system.

This story is a reminder of the importance of rigorous testing and verification in system administration. As Ken's case shows, a single typo can cause a major outage, which is why changes to production systems demand meticulous attention to detail.
