Yesterday, some users reported being unable to access Hotmail, Outlook.com and SkyDrive because of a service outage. Microsoft has now provided details about the outage and apologized to its users. In an earlier post, we reported on Outlook.com coming out of Preview and the upgrade of millions of Hotmail users to the new, modern Outlook.com.
It appears that people upgraded faster than expected and that the process was going very well. The vast majority of users had a smooth upgrade experience and are enjoying the modern Outlook.com. Yesterday, however, an issue caused the outage.
“…we do want to sincerely apologize to anyone that was unable to access their email during the interruption. Outages are something we take very seriously and invest a significant amount of our time and energy in doing our best to prevent.”
Outage Cause
At 1:35 PM PDT on March 12th, 2013 there was a service interruption that affected some people’s access to a small part of the SkyDrive service, but primarily Hotmail.com and Outlook.com. Availability was restored over the course of the afternoon and evening, and fully restored by 5:43 AM PDT on March 13th, 2013.
It all began on the afternoon of the 12th, when a firmware update was applied to a core part of the physical plant. Such updates had been performed successfully before, but this one failed unexpectedly. The failure resulted in a rapid and substantial temperature spike in the datacenter, significant enough to trigger safeguards on a large number of servers. These safeguards prevented access to mailboxes housed on those servers and also prevented other parts of the infrastructure from failing over automatically. This part of the datacenter housed parts of the Hotmail.com, Outlook.com and SkyDrive infrastructure, so some people trying to access these services were affected.
Restoration
The team was alerted as soon as the safeguards kicked in, and restoration work began. A mix of infrastructure software and human intervention was needed to bring the core infrastructure back online. Human intervention is not normally required for these services, and the need for it added significant time to the restoration.
The majority of the impacted mailboxes were fully restored before midnight, and the rest were completed by 5:30 AM.
“…we sincerely apologize and regret the impact this outage had on all of you. Now that we’re through the resolution, we’re also hard at work on ensuring this doesn’t happen again…”
If you are still facing any issues, the best way to check is https://status.live.com, which provides real-time information about any service issues. Yesterday, these services seemed to be working normally for me. This is what was seen when these services were affected:
SkyDrive was restored first and then Hotmail and Outlook.com.
I hope users do not encounter such issues again in the future.