AMO.NET America's Multimedia Online (Microsoft DOWN, Part 3)

Microsoft Explains Its Outages, Part 3
02/01/2001 9:12:00pm MST Albuquerque, NM
By Dustin D. Brand; Owner AMO

Three distinct outages brought Microsoft DOWN.

I have followed the Microsoft outages closely, having noticed the problems very early on. I quickly determined that the cause was a DNS problem, which I attributed to an attack on Microsoft's DNS structure. I was right, and I wrote two other reports on this, each covering Microsoft's explanation at the time.

You might want to read Part 1 and Part 2.
Part 3 of my Microsoft report begins here with the official response from Microsoft.

An Open Letter to Customers from Microsoft Chief Information Officer Rick Devenuti

You undoubtedly have seen a good deal of coverage about availability issues with some Microsoft Web sites through parts of last week. In fact, you may have experienced difficulty reaching our sites. We deeply regret any inconvenience, and want to take a moment to tell you what happened last week, provide you with an update of where we are today, and tell you what we have learned from the past week's events.

There were three distinct issues with our routing infrastructure last week that caused varying levels of difficulty for some customers:

Router Configuration Error: Tuesday, January 23, to Wednesday, January 24

At around 6:30 p.m. PST on Tuesday, January 23, we made some changes to the configuration of two routers which are housed in one of our Internet data centers. Routers help direct traffic on computer networks. Unfortunately, this configuration change had the unintended consequence of severely limiting the communication to our Domain Name System (DNS) network. DNS servers are used to connect domain names with strings of numbers called IP addresses, which point to the actual servers and networks that make up the Internet. As a result, many customers found themselves unable to reach our DNS servers and thus also some Microsoft Web sites. While the Web servers that host our Web properties and applications continued to function normally, customers were unable to reach these properties due to this problem with our routing network.

At about 4:00 p.m. PST Wednesday, we determined that the router configuration change had caused the problem. As soon as we rolled back the change, everything returned to normal and customers once again had full access to our Web sites.

We have taken steps already to ensure that we do not have a single point of access to DNS for the Microsoft Web properties.
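(Illustration, not part of Microsoft's letter.) The failure described above, healthy servers that nobody can reach because every name server sits behind the same broken route, can be sketched with a toy model. This is only a rough sketch: the NameServer class, the resolve function, the network labels "net-a" and "net-b", and the example hostname and address are all made up for illustration and say nothing about how Microsoft's network is actually laid out.

```python
from dataclasses import dataclass


@dataclass
class NameServer:
    name: str
    network: str   # hypothetical label for the routed network the server sits behind
    records: dict  # hostname -> IP address


def resolve(hostname, name_servers, unreachable_networks):
    """Return the IP from the first reachable name server, or None if none can be reached."""
    for ns in name_servers:
        if ns.network in unreachable_networks:
            # The server itself is healthy, but queries cannot get to it.
            continue
        if hostname in ns.records:
            return ns.records[hostname]
    return None


# Made-up record using a documentation-only address range.
RECORDS = {"www.example.com": "192.0.2.10"}

# Every name server behind one network: a single point of access.
single_network = [
    NameServer("ns1", "net-a", RECORDS),
    NameServer("ns2", "net-a", RECORDS),
]

# Name servers spread across two networks.
distributed = [
    NameServer("ns1", "net-a", RECORDS),
    NameServer("ns2", "net-b", RECORDS),
]

outage = {"net-a"}  # a routing problem cuts off net-a

print(resolve("www.example.com", single_network, outage))  # None
print(resolve("www.example.com", distributed, outage))     # 192.0.2.10
```

In the first case the Web server at the made-up address is still running, but no visitor can learn its address, which matches what the letter describes. In the second case one reachable name server is enough to keep the name resolving.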

Denial-of-Service Attack: Thursday, January 25

At about 8:30 a.m. PST the next day, some unknown person or persons initiated a distributed denial-of-service attack against the routers in front of our network. This kind of attack was not an attempt to penetrate our network; there was no intrusion of any kind and customers' data was completely secure at all times. A denial-of-service attack is simply an attempt to flood some point on a network with so much artificial traffic that legitimate traffic is blocked out.

Once again, our Web server infrastructure continued to function normally. However, due to the flood of illegitimate traffic to the routers, many customers once again were unable to access our Web sites.
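(Illustration, not part of Microsoft's letter.) The effect of a flood on legitimate visitors can be approximated with simple arithmetic. The sketch below assumes a router that can forward a fixed number of requests per interval and drops the excess without telling real traffic from attack traffic. The function name and all of the numbers are hypothetical; Microsoft did not publish traffic figures in this letter.

```python
def served_legitimate_fraction(legit, attack, capacity):
    """Fraction of legitimate requests forwarded in one interval.

    Assumes the router forwards at most `capacity` requests per interval and
    that drops fall evenly on all traffic, real or artificial.
    """
    total = legit + attack
    if total <= capacity:
        return 1.0
    return capacity / total


# Hypothetical numbers: 100 real requests per interval, capacity of 1000.
print(served_legitimate_fraction(legit=100, attack=0, capacity=1000))      # 1.0
print(served_legitimate_fraction(legit=100, attack=99900, capacity=1000))  # 0.01
```

Under these assumed numbers, only about one real request in a hundred gets through during the flood, even though nothing behind the router has failed. Once the artificial traffic is removed, the same router serves every legitimate request again.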

We reacted immediately. First, we isolated and blocked the illegitimate requests from our routers, allowing legitimate traffic to flow freely to the servers for successful access to Microsoft's Web sites.

Second, as a follow-up to the previous day's problem, we had already begun the process of distributing access to DNS resolution across more than one network. We completed this task on Thursday, providing an additional level of redundancy to our system.

Once it was clear that this was indeed a malicious attack against our network infrastructure, our security team notified the FBI, which opened an investigation.

While the timing of this attack was unfortunate, coming on the heels of Wednesday's outage, Microsoft's networking and security teams were able to quickly restore access to the sites by identifying the attack, confirming that it was not related to the previous day's networking issues, and taking a number of countermeasures to limit the attack. Our network was functioning properly and customers had full access to all Web sites by 12:30 p.m. PST, or four hours after this attack was launched.

Second Denial-of-Service Attack: Friday, January 26

At 10:15 a.m. PST Friday, a second distributed denial-of-service attack was launched against routers on a different part of the network. We were able to counteract this attack much more quickly, and the impact of the attack on customers was much less severe. Some customers experienced delays in accessing microsoft.com and certain other sites for two fifteen-minute periods, during which the sites were never fully inaccessible and many sites were unaffected. Once again, the problem was simply that the artificial traffic generated by the attack prevented legitimate traffic from reaching our Web servers, which were up and running normally throughout the incident.

We continue to closely monitor the situation, and are ready to respond should another attempt at malicious activity begin.

What we have learned

We have learned some important lessons this week and want to share that learning with you.

We are taking a hard look at our network architecture and have already made some changes to insulate customers from any unforeseen network events. In particular, we have distributed access to DNS for Microsoft so that there is no single network performing name resolution, and we plan to further enhance this solution shortly. With hindsight, we should have done this earlier. We are currently looking across the rest of the network to determine if there are other similar changes we can make to ensure the reliability and availability of Microsoft's Web sites.

In the past, Microsoft has focused on understanding and protecting against attacks on Microsoft products in order to provide a better set of Internet services to customers and a more robust and secure set of products for enterprise customers. Unfortunately, as we have learned over the last few days, we did not apply sufficient self-defense techniques to our use of some third-party products at the front-end of parts of our core network infrastructure, where we are a customer, not a vendor. We are currently looking closely at other aspects of the network to see where we can do a better job in the operation of our entire network, to provide an example for customers who are building demanding enterprise businesses on the Microsoft platform.

We want to assure you that there were no Microsoft products involved in any of the three incidents last week. In all three cases, we were dealing with issues or attacks on the routing infrastructure of our networks. During all of the episodes, Microsoft's Web servers, which run on Windows 2000, continued to function normally. The DNS servers, which also run on Windows, continued to operate through all of the incidents as well. As we mentioned above, we are committed to providing you with the most robust and secure enterprise products on the market.

We truly regret any inconvenience last week's events may have caused you. We are proud of the service we are able to offer millions of customers on the Web every day. The availability numbers for Microsoft's major Web properties are routinely among the best on the Internet, and you can be assured that last week's issues have made us redouble our efforts to maintain that kind of availability for our customers.

We will continue to monitor this situation closely and take appropriate action to deliver to our customers the access and services they need, continuing to provide you with a great Internet experience.

Thank you for your time.

Rick Devenuti
Vice President, Chief Information Officer
Microsoft Corp.