Outlook 365 having a bad day

I’ve seen scattered reports today that some mail to the Outlook 365 servers is failing. This has been confirmed by ZDNet. Only folks with an Office 365 account can log in and see the status messages, but some folks on the mailop list are posting updates from the website.
Attempts to mail affected domains result in this response:

421 4.3.2 The maximum number of concurrent server connections has exceeded a limit, closing transmission channel

MS is aware of the problem and is working to restore service. Senders should not remove addresses that bounce today with the 421 4.3.2 error from Outlook-hosted domains.


Holomaxx v. Yahoo and MS: The hearing

I visited Judge Fogel’s courtroom this morning to listen to the oral motions in the Holomaxx cases. This is a general impression, based on my notes. Nothing here is to be taken as direct quotes from any participant. Any errors are solely my own. With that disclaimer in mind, let’s go.
The judge is treating these two cases as basically a single case. When it came time for arguments, the cases were called together and both Yahoo and Microsoft’s lawyers were at the defendant’s table.
Oral arguments centered on the question of CDA immunity and to a lesser extent if there is an objective industry standard for blocking and dealing with blocks. Nothing at all was mentioned about the wiretapping arguments.
The judge opened the hearing with a quick summary of the case so far and what he wanted to hear from the lawyers.
Judge Fogel pointed out that current case law suggests that the CDA provides a robust immunity to ISPs to block mail. The plaintiff can’t just say that the blocks were done in bad faith, there has to be actual evidence to show bad faith. The law does permit subjective decisions by the ISPs. Also, that it is currently hard to see any proof of bad faith by the defendants.
The judge asked the plaintiff’s attorney for his “absolute best argument” as to the bad faith exhibited by the defendants.
The plaintiff responded that they are a competitor who is being stonewalled by the defendants. That their email is not spam (as it is CAN SPAM compliant) and it is wanted email. The defendants are not following the “objective industry standard” as defined by MAAWG.
The judge responded by clarifying that the plaintiff was really claiming he didn’t need to present any evidence. “Yes.” Judge Fogel mentioned the Twombly standard, which says that a plaintiff must plead enough facts to make their allegations plausible, not just possible.
Yahoo!’s lawyer pointed out that both case law and the statutes require a robust showing to overcome immunity under the CDA, and that the purpose of the CDA is to protect ISPs from second-guessing. She started to bring up the absolute numbers of emails but was interrupted and told the numbers weren’t relevant. My notes don’t say whether it was the judge or Holomaxx’s lawyer who interrupted, and the numbers discussion did come up again later.
Yahoo continued that the CAN SPAM compliance is not a litmus test for what is spam. The decision for what is and is not spam is left to the subjective judgement of the ISP. She also pointed out that the numbers are important. She defined the amount of spam as a tax on the network and a tax on users.
She also addressed the anti-competitive claim. Even if Holomaxx is right (and neither defendant was conceding the point, and it is doubtful the anti-competitive point can be proven), competition alone cannot establish bad faith. What evidence is there that either defendant exhibited bad faith? In Yahoo’s case there is zero advertiser overlap, and in the Microsoft case Holomaxx showed one shared customer.
She then pointed out that the MAAWG document was a stitched-together collection of experiences from various desks, and that the document itself says it is not a set of best practices. She also pointed out that there was nothing in the document about how to make spam blocking decisions; it was solely a recommendation on how to handle people who complain.
According to Yahoo!’s lawyer, the plaintiffs brought this suit because they disagreed with the ISPs’ standards for blocking and they were upset about how they were treated. The worst Holomaxx can say is that MS and Y! had bad customer service.
At this point there was some discussion between the judge and lawyers about how they were currently in a “grey area” between Rule 9(b) and Rule 12(b)(6). I am not totally sure what this was about (one of my lawyer readers can help me out?), but there was also mention of using these rules in the context of the ISPs’ robust immunity under the CDA.
Finally, the judge asked Microsoft’s lawyer if he had anything more to add. He reiterated that the MAAWG document was not a standard, it was a collection of options. He also brought up the volume issue again, asserting that even if it were a true standard, the volume of unwanted mail sent by Holomaxx did not mean the ISPs had to follow it.
Judge Fogel asked him if he meant there was no legal obligation for the ISPs to be warm and fuzzy.
The judge and defendant lawyers talked around a few general ideas about the MAAWG document. First that there was no obligation to tell senders enough information so that senders could reverse engineer spam filters. Microsoft also brought up the volume issue again, saying that the volume of unwanted 3rd party mail that the plaintiff was sending was, in itself, proof that the mail was bad.
Holomaxx interrupted, claiming that the volume is a red herring. Judge Fogel countered with “but the gross number of unwanted emails is a huge number of emails.” Holomaxx’s lawyer argued that both Yahoo and Microsoft had large, robust networks, so the volume is irrelevant. I thought this was funny, given how often both of them have outages due to volume. However, the Holomaxx lawyer did have a point: Facebook sends billions of emails a day, both Yahoo and Hotmail cope with that volume of mail, and it dwarfs what Holomaxx sends.
The judge asked if he should look at the percentage of complaints about the mail rather than the gross number. Holomaxx replied that both were just a drop in the bucket and neither number was relevant.
Holomaxx then claimed again that MAAWG was a standard. The judge pointed out it was a standard for customer service, not a standard for blocking. Holomaxx disagreed and said that the MAAWG document was a standard for both how to block and how to deal with blocks afterwards.
The judge asked Holomaxx if there was any actual evidence of their claims. He talked about a case he heard a few years ago. Some company was suing Google because their search results were not on the front page of Google results. That company didn’t prevail because they never offered any actual evidence that Google was deliberately singling them out. He asked Holomaxx how they were being singled out.
Holomaxx replied there was no industry standard to measure against.
The judge wrapped up the hearing by pointing out that he was being asked to show where the exceptions to the CDA were, and that he had to consider the implications of his ruling. He agreed that bad faith was clearly an exception to CDA protection, but the question was what burden of proof is required to show actual bad faith. He seemed to think this was the most important point and one that would take some deliberation.
Overall, the hearing took about 15 minutes, which seemed in line with the case immediately before this one.
My impression was that the judge was looking for Holomaxx to argue something, anything, with facts rather than assertion. But I am scientist enough to see that may be my own biases at work. The judge gave Holomaxx the opportunity to show their absolute best evidence, and Holomaxx provided exactly zero, instead falling back on “it’s true because we said it’s true.”
The judge will issue a written ruling; I’ll keep an eye out for it and post it when it’s out.


SNDS is back

For years now, Microsoft has maintained Smart Network Data Services (SNDS) for anyone sending mail to Hotmail/Outlook/Live.com. It’s a great way for anyone responsible for an IP address to monitor what traffic Hotmail is seeing from that address.
This morning I woke up to a number of people complaining that logins to the website were failing and the API was down. I contacted the person behind SNDS, who confirmed there was a problem and that they were fixing it.
Sometime this afternoon it was possible to log in to the SNDS interface again, so it looks like they did fix it.
A bit of a warning, though: don’t expect to see any data from the last few days. When SNDS is down, data isn’t collected or made available, and in past outages older data was not backfilled when the service came back.


DNS, SERVFAIL, firewalls and Microsoft

When you look up a host name, a mailserver or anything else there are three types of reply you can get. The way they’re described varies from tool to tool, but they’re most commonly referred to using the messages dig returns – NXDOMAIN, NOERROR and SERVFAIL.
NXDOMAIN is the simplest – it means that there’s no DNS record that matches your query (or any other query for the same host name).
NOERROR is usually what you’re hoping for – it means that there is a DNS record with the host name you asked about. There might be an exact match for your query, or there might not; you’ll need to look at the answer section of the response to see. For example, if you do “dig www.google.com MX” you’ll get a NOERROR response – because there is an A record for that hostname – but no answers, because there’s no MX record for it.
SERVFAIL is the all purpose “something went wrong” response. By far the most common cause for it is that there’s something broken or misconfigured with the authoritative DNS for the domain you’re querying so that your local DNS server sends out questions and never gets any answers back. After a few seconds of no responses it’ll give up and return this error.
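These three outcomes correspond to numeric response codes (RCODEs) in the DNS message header: NOERROR is 0, SERVFAIL is 2, and NXDOMAIN is 3. As a rough sketch of how a tool like dig decides which name to print, here's how you could pull the RCODE out of a raw response packet with Python's standard library (the packet at the bottom is hand-built for illustration, not captured from a real server):

```python
import struct

# RFC 1035 names for the RCODE values discussed above.
RCODE_NAMES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN"}

def rcode_of(packet: bytes) -> str:
    """Return the RCODE name from a raw DNS response packet.

    The RCODE lives in the low 4 bits of the flags field, which is
    bytes 2-3 of the 12-byte DNS header (RFC 1035, section 4.1.1).
    """
    if len(packet) < 12:
        raise ValueError("packet shorter than a DNS header")
    flags = struct.unpack("!H", packet[2:4])[0]
    rcode = flags & 0x000F
    return RCODE_NAMES.get(rcode, "RCODE %d" % rcode)

# A minimal fake response header: id=0x1234, flags=0x8003 (QR=1,
# RCODE=3, i.e. NXDOMAIN), and zeroed section counts -- just enough
# to exercise the parser.
fake = struct.pack("!HHHHHH", 0x1234, 0x8003, 0, 0, 0, 0)
print(rcode_of(fake))  # NXDOMAIN
```

Note that a SERVFAIL reported by dig may never arrive as a packet at all – as described above, your resolver often synthesizes it after its upstream queries time out.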
Microsoft
Over the past few weeks we’ve heard from a few people about significant amounts of delivery failures to domains hosted by Microsoft’s live.com / outlook.com, due to SERVFAIL DNS errors. But other people saw no issues – and even the senders whose mail was bouncing could resolve the domains when they queried Microsoft’s nameservers directly rather than via their local DNS resolvers. What’s going on?
A common cause for DNS failures is inconsistent data in the DNS resolution tree for the target domain. There are tools that can mechanically check for that, though, and they showed no issues with the problematic domains. So it’s not that.
Source ports and destination ports
If you’re even slightly familiar with the Internet you’ve heard of ports – they’re the numbered slots that servers listen on to provide services. Webservers listen on port 80, mailservers on port 25, DNS servers on port 53 and so on. But those are just the destination ports – each connection comes from a source port too (it’s the combination of source port and destination port that lets two communicating computers keep track of what data should go where).
Source ports are usually assigned to each connection pretty much randomly, and you don’t need to worry about them. But DNS has a history of the source port being relevant (it used to always use source port 53, but most servers have switched to using random source ports for security reasons). And there’s been an increasing amount of publicity about using DNS servers as packet amplifiers recently, with people being encouraged to lock them down. Did somebody tweak a firewall and break something?
Both source and destination ports range between 1 and 65535. There’s no technical distinction between them, just a common understanding that certain ports are expected to be used for particular services. Historically they’ve been divided into three ranges – 1 to 1023 are the “low ports” or “well known ports”, 1024-49151 are “registered ports” and 49152 and up are “ephemeral ports”. On some operating systems normal users are prevented from using ports less than 1024, so they’re sometimes treated differently by firewall configurations.
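Those historical range boundaries are easy to encode. A throwaway sketch (the cutoffs come straight from the paragraph above; the function name is my own):

```python
def port_class(port: int) -> str:
    """Classify a TCP/UDP port into the historical IANA ranges."""
    if not 1 <= port <= 65535:
        raise ValueError("port out of range")
    if port <= 1023:
        return "well-known"    # low ports; often need root to bind
    if port <= 49151:
        return "registered"
    return "ephemeral"

print(port_class(53))     # well-known
print(port_class(1337))   # registered
print(port_class(50000))  # ephemeral
```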
While source ports are usually generated randomly, some tools, including dig, let you assign them by hand. Adding the flag -b "0.0.0.0#1337" to dig will make it send queries from source port 1337. For ports below 1024 you need to run dig as root, but that’s easy enough to do.
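Under the hood, that flag just binds the UDP socket to the chosen source port before sending the query. A minimal Python sketch of the same idea – the query builder produces a bare-bones A query, and query_from_port (a hypothetical helper name, not part of any library) does the bind-and-send; actually sending needs network access and, for low ports, root:

```python
import socket
import struct

def build_query(name: str, qtype: int = 1, qid: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet (QTYPE 1 = A, QCLASS 1 = IN)."""
    # Header: id, flags 0x0100 (RD=1), one question, no other records.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label length-prefixed, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)

def query_from_port(name: str, server: str, src_port: int) -> bytes:
    """Send a query from a chosen source port, like dig's -b flag."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind(("0.0.0.0", src_port))   # the interesting part
        s.settimeout(5)
        s.sendto(build_query(name), (server, 53))
        data, _ = s.recvfrom(4096)
        return data

pkt = build_query("live.com")
print(len(pkt))  # 12-byte header + 10-byte QNAME + 4-byte question tail = 26
```

If a firewall along the path drops queries from the bound port, recvfrom simply times out – which is exactly the SERVFAIL-producing behaviour described below.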
A (slightly) broken firewall
“sudo dig -b "0.0.0.0#1024" live.com @ns2.msft.net” queries one of Microsoft’s nameservers for their live.com domain, and returns a good answer.
“sudo dig -b "0.0.0.0#1023" live.com @ns2.msft.net” times out. Trying other ports above and below 1024 at random gives similar results. So there’s a firewall or other packet filter somewhere that’s discarding either the queries coming from low ports or the replies going back to those low ports.
Older DNS servers always use port 53 as their source port – blocking that would have caused a lot of complaints.
But “sudo dig -b "0.0.0.0#53" live.com @ns2.msft.net” works perfectly. So the firewall, wherever it is, seems to block DNS queries from all low ports, except port 53. It’s definitely a DNS aware configuration.
DNS packets go through a lot of servers and routers and firewalls between me and Microsoft, though, so it’s possible it could be some sort of problem with my packet filters or firewall. Better to check.
“sudo dig -b "0.0.0.0#1000" google.com @ns1.google.com” works perfectly.
So does “sudo dig -b "0.0.0.0#1000" amazon.com @pdns1.ultradns.net”.
And “sudo dig -b "0.0.0.0#1000" yahoo.com @ns1.yahoo.com”.
The problem isn’t at my end of the connection, it’s near Microsoft.
Is this a firewall misconfiguration at Microsoft? Or should DNS queries not be coming from low ports (other than 53)? My take on it is that it’s the former – DNS servers are well within spec to use randomly assigned source ports, including ports below 1024, and discarding those queries is broken behaviour.
But using low source ports (other than 53) isn’t something most DNS servers will tend to do, as they’re hosted on unix and using those low ports on unix requires jumping through many more programming hoops and involves more security concerns than just limiting yourself to ports above 1023. There’s no real standard for DNS source port randomization, which is something that was added to many servers in a bit of a hurry in response to a vulnerability that was heavily publicized in 2008. Bind running on Windows seems to use low ports in some configurations. And even unix hosted nameservers behind a NAT might have their queries rewritten to use low source ports. So discarding DNS queries from low ports is one of the more annoying sorts of network bugs – one that won’t affect most people at all, but those it does affect will see it much of the time.
If you’re seeing DNS issues resolving Microsoft hosted domains, or you’re seeing patterns of unexpected SERVFAILs from other nameservers, check to see if they’re blocking queries from low ports. If they are, take a look and see what ranges of source ports your recursive DNS resolvers are configured to use.
(There’s been some discussion of this recently on the [mailop] mailing list.)
