Recent Posts

Social media connections are not opt-ins

It seems silly to have to say this, but connecting on social media is not permission to add an address to your newsletter or mailing list or prospecting list or spam list. Back in 2016, I wrote:

Still with the Microsoft problems

We took a quick trip to Dublin last week. I had every intention of blogging while on the trip, but… oops. I did get to meet with some clients, and had a great dinner while discussing email and delivery.

Coming back, I see a lot of folks still reporting delivery problems to Microsoft properties. I’ve been operating under the assumption this was temporary as kinks were worked out after the migration. I’m still pretty convinced not all of the problems are intentional. Even the best tested code can have issues that only show up under real load with real users. Reading between-some-lines tells me that the tech team is hard at work identifying and fixing issues. There will be changes and things will continue to improve.
With all that being said, I think it’s important to realize that delivering to the new system is not the same as delivering to the old system. This is a major overhaul of their email handling code, representing multiple years’ worth of planning and development inside Microsoft. It’s very likely that not all of the current delivery problems are the result of deployment. Some of the problems are likely a result of new standards and thresholds for reaching the inbox. What worked a year ago to get into the inbox just doesn’t any more.

AOL Changes

We’ve known for a while that AOL email infrastructure is going to be merging with Yahoo’s, but apparently it’s happening sooner than anyone expected.
The MXes for aol.com will be migrated to Yahoo infrastructure around February 1st. Reading between the lines I expect that this isn’t a flag day, and much of the rest of the AOL email infrastructure will be in use for a while yet, but primary delivery decisions will be made on Yahoo infrastructure.
The AOL and Yahoo postmaster teams are pretty smart so I assume they’ll have made sure that their reputation data is consistent, and be doing everything else they can do to make the migration as painless as possible. But it’s a major change affecting a lot of email, and I wouldn’t be surprised to see some bumpiness.
If you’ve done anything … unwise … with delivery to AOL addresses, such as hard-wiring MXes for delivery to aol.com, you should probably look at undoing that in the next week or so. I’m guessing the changeover will happen at the DNS level, so if you’ve nailed down delivery IPs for aol.com you might end up trying – and probably failing – to deliver to the old AOL infrastructure.
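One way to look for this kind of hard-wiring, as a rough sketch: if your MTA is Postfix, a transport map entry whose next hop is wrapped in square brackets skips the MX lookup entirely. The snippet below scans transport map text for watched domains pinned that way. The function name, the sample map and the hostnames in it are hypothetical, and other MTAs pin routes in other ways, so treat this as a starting point only.

```python
import re

# Postfix transport(5) entries look like "pattern  transport:nexthop".
# A next hop in [brackets] tells Postfix not to do an MX lookup.
PINNED_NEXTHOP = re.compile(r"^\S+:\[[^\]]+\](?::\d+)?$")


def find_pinned_routes(transport_map_text, domains=("aol.com",)):
    """Return (line_number, line) pairs that pin a watched domain to a fixed host."""
    watched = {d.lower() for d in domains}
    pinned = []
    for number, raw in enumerate(transport_map_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue  # not a "pattern destination" entry
        pattern, destination = fields[0].lower(), fields[1].strip()
        # Match both "aol.com" and the ".aol.com" subdomain form.
        if pattern.lstrip(".") in watched and PINNED_NEXTHOP.match(destination):
            pinned.append((number, line))
    return pinned


# Hypothetical transport map, for illustration only.
sample = """\
# hypothetical transport map
aol.com      smtp:[192.0.2.10]:25
example.org  smtp:mail.example.org
.aol.com     smtp:[mx-pinned.example.net]
"""

for number, line in find_pinned_routes(sample):
    print(f"line {number}: {line}")
```

Anything it reports is a route that will keep going to the same host after the DNS changes, which is exactly the case worth undoing.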

Tempo

When we say that you might just be sending too much email and fatiguing or annoying the recipient into unsubscribing or hitting spam, this is the sort of thing we mean.
Three emails (to the same email address) in four minutes might be a bit much.

If you can’t combine the content you want to send into a single personalized email, maybe spread deliveries out a bit? Or even not send all of it, perhaps.
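If you want to catch this in your own send logs, a minimal sketch might look like the following. It assumes you can get (recipient, timestamp) pairs out of your logs; the function name and the default thresholds (more than two messages in any four-minute window) are illustrative choices, not a recommendation from any mailbox provider.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_bursts(sends, max_messages=2, window=timedelta(minutes=4)):
    """Return {recipient: count} for recipients sent more than max_messages in any window."""
    by_recipient = defaultdict(list)
    for recipient, sent_at in sends:
        by_recipient[recipient.lower()].append(sent_at)

    bursty = {}
    for recipient, times in by_recipient.items():
        times.sort()
        start = 0
        worst = 0
        for end in range(len(times)):
            # Slide the start forward until the span fits inside the window.
            while times[end] - times[start] > window:
                start += 1
            worst = max(worst, end - start + 1)
        if worst > max_messages:
            bursty[recipient] = worst
    return bursty


# Hypothetical log entries: three messages to one address within four minutes.
t0 = datetime(2018, 1, 10, 9, 0)
log = [
    ("reader@example.com", t0),
    ("reader@example.com", t0 + timedelta(minutes=2)),
    ("reader@example.com", t0 + timedelta(minutes=4)),
    ("other@example.com", t0),
]
print(find_bursts(log))
```

Recipients that show up here are candidates for combining messages or spacing them out.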

Filters evolving

I started writing this blog post while sitting on a conference call with a bunch of senders discussing some industry-wide problems folks are having with delivery. Of course the issue of Microsoft comes up. A lot of senders are struggling with reaching the inbox there and no one has any real, clear guidance on how to resolve it. And the MS employees who regularly answer questions and help folks have been quiet during this time.

In some ways the current situation with Microsoft reminds me of what most deliverability was like a decade ago. Receivers were consistently making changes and they weren’t interacting with senders. There weren’t FBLs really. There weren’t postmaster pages. The reason knowing someone at an ISP was so important was because there was no other way to get information about blocking.
These days, we have a lot more institutional knowledge in the industry. The ISPs realized it was better to invest in infrastructure so senders could resolve issues without having to know the right person. Thus we ended up with postmaster pages and a proliferation of FBLs and best practices and collaboration between senders and receivers and the whole industry benefited.
It is challenging to attempt to troubleshoot deliverability without the benefit of having a contact inside ISPs. But it is absolutely possible. Many ISP folks have moved on over the years, in many cases due to layoffs or having their positions eliminated. The result is ISPs where there often isn’t anyone to talk to about filters.
The lack of contacts doesn’t mean there’s no one there and working. For instance, in the conference call one person asked if we thought Microsoft was going to fix their systems or if this is the new normal. I think both things are actually true. I think Microsoft is discovering all sorts of interesting things about their mail system code now that it’s under full load. I think they’re addressing issues as they come up and as fast as they can. I also think this is some level of a new normal. These are modern filters that implement the lessons learned over the past 20 years of spam filtering without the corresponding cruft.
Overall, I do think we’re in a period of accelerating filter evolution. Addressing filtering problems has always been a moving target, but we’ve usually been building on known information. Now, we’re kinda starting over. I don’t have a crystal ball and I don’t know exactly what the future will bring. But I think the world of deliverability is going to get challenging again.

That's not how you do it…

Got an email this morning from a company advertising their newest webinar “The Two Pillars of Effective Large-Scale Email: Security and Deliverability.” The message came to a tagged address, so clearly I’d given them one at some point. But I didn’t recognize the name or company or anything. I did a search to see when I may have interacted with this company in the past.

Looking through my old emails, it appears I contacted this company through their support form back in 2007. They were blocking a client’s newsletter. This is what I sent:

Oh, Microsoft

Things have been a little unsettled at Microsoft webmail properties over the last few months. A number of ESPs reported significantly increased deferrals from Microsoft properties starting sometime late in November. Others saw reduced open rates across their customer base starting in late October. More recently, people are noticing higher complaint rates as well as an increase in mail being dropped on the floor. Additionally, Return Path announced certification changes at the end of November lowering the Microsoft overall complaint rate to 0.2%, half of what it was previously.

Overall, sending mail to Microsoft is a challenge lately. This is all correlated with visible changes which may seem unrelated to deliverability, but actually are. What are the changes we know about?

That Should Be A Word

What … is your name?

For some reason otherwise legitimate ESPs have over the years picked up a habit of obfuscating who they are.
I don’t mean those cases where they use a customer’s subdomain for their infrastructure or bounce address. If the customer is Harper Collins then mail “from” @bounce.e.harpercollins.com sent from a server claiming to be mail3871.e.harpercollins.com isn’t unreasonable. (Though something in the headers that identified the ESP would be nice).
No, I mean random garbage domains created by an ESP to avoid using their real domains in the mail they send and in their network infrastructure. This isn’t exactly snowshoe behaviour. They’re not really hiding anything terribly effectively from someone determined to identify them – the domains are registered with real contact information, and the IP addresses the mail is sent from are mostly SWIPped accurately – but they do prevent a casual observer from identifying the sender.
Silverpop has registered over 9,000 domains in .com that are just “mkt” followed by some random digits that they use for infrastructure hostnames, bounce addresses and click-tracking links. Apart from anything else, it’s a terrible waste of domain name space to use links.mkt1572.com where they could just as well use links1572.silverpop.com or links.mkt1572.silverpop.com.
For what they’re paying just for domain name registration and management they could probably hire multiple full time employees.
And Marketo has registered over 17,000 domains in .com that are just “mkto-” followed by what looks like a location code.
(I’m not picking on Marketo and Silverpop in particular – several other notable ESPs do the exact same thing – they’re just relevant to the end of the story).
Using garbage domains like this makes you look more like a snowshoe spammer at first glance than a legitimate ESP.
It also makes it much harder for a human glancing at your headers to correctly identify a responsible party …
… which is probably why abuse@marketo are rather tired of receiving misdirected complaints about spam sent by Silverpop from machines called something like mkt1572.com.
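To make the confusion concrete, here is a minimal sketch of what an abuse desk or a curious recipient would need in order to route a complaint correctly: a lookup from naming pattern to ESP. The two regular expressions are assumptions built only from the descriptions above (“mkt” plus digits, “mkto-” plus something that looks like a location code); they are not an official list from either company, and the `mkto-xx00.com` hostname is invented purely for illustration.

```python
import re

# Assumed patterns, derived from the prose description only.
ESP_DOMAIN_PATTERNS = [
    ("Silverpop", re.compile(r"^mkt\d+\.com$")),
    ("Marketo", re.compile(r"^mkto-[a-z0-9]+\.com$")),
]


def guess_esp(hostname):
    """Guess which ESP owns a hostname from its registered .com domain, or None."""
    labels = hostname.lower().rstrip(".").split(".")
    # Taking the last two labels is only correct for simple TLDs such as .com.
    registered = ".".join(labels[-2:])
    for esp, pattern in ESP_DOMAIN_PATTERNS:
        if pattern.match(registered):
            return esp
    return None


print(guess_esp("links.mkt1572.com"))             # Silverpop
print(guess_esp("mkto-xx00.com"))                 # Marketo (hypothetical hostname)
print(guess_esp("mail3871.e.harpercollins.com"))  # None
```

That a lookup table is needed at all is the problem: the two patterns differ by a single character and a hyphen, which is easy for a regex and easy for a human to get wrong.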

Meltdown & Spectre, Oh My

If you follow any infosec sources you’ve probably already heard a lot about Meltdown and Spectre, KAISER and KPTI. If not, you’ve probably seen headlines like Major flaw in millions of Intel chips revealed or Intel sells off for a second day as massive security exploit shakes the stock.

What is it?
These are all about a cluster of related security issues that exploit features shared by almost all modern, high performance processors. The technical details of how they work are fascinating if you have a background in CPU architecture but the impact is pretty simple: they allow programs to read from memory that they’re not supposed to be able to read.
That might mean that a program running as a normal user can read kernel memory, allowing a malicious program to steal passwords, authentication cookies or even the entire state of the kernel’s random number generator, potentially allowing it to compromise encryption.
Or it might mean a program running on a virtual machine being able to escape from the sandbox the virtual machine’s hypervisor keeps it in and reading memory of other virtual machines that are running on the same hardware. A malicious user could sign up for a cloud service, such as Amazon EC2, Google Compute Engine or Microsoft Azure, repeatedly create temporary virtual machines and grovel through all the other virtual machines running on the same hardware to steal login credentials or TLS private keys.
Or it might mean a malicious piece of javascript running in a browser from a hostile website or a malicious banner ad being able to steal secrets and credentials not just from your web browser, but from any other software running on your laptop.
It’s pretty bad.
Meltdown and Spectre
One variant has been given the snappy name Meltdown. It (mostly) affects Intel CPUs, and is trivial to exploit reliably by unskilled skript kiddies. It can be mitigated at the operating system level, and all major operating system vendors are doing so, but that mitigation will have significant impact on performance – perhaps 20% slower for common workloads.
The other variant has been named Spectre. It’s more subtle, relying on measuring how long it takes to run carefully crafted code. Whether the code is fast or slow tells the malicious actor whether a particular bit of forbidden memory is zero or one, allowing them to step through reading everything they want. This is likely to be harder to exploit reliably, but is also going to be much harder to mitigate reliably in software (I’ve seen some speculation that it might be impossible to mitigate – I’m pretty sure that’s not true, but it is going to be difficult to do so reliably and will probably have significant performance impact). It affects pretty much everything, including AMD processors (despite what their PR flacks would like you to believe).
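The “fast or slow tells you zero or one” idea can be shown with a toy model. This is not an exploit and touches no real CPU behaviour: it is a pure simulation in which a hypothetical `probe` function returns a made-up latency that depends on one secret bit, and the “attacker” rebuilds the secret from latencies alone. The latency values, the fast-means-one convention and all the names are assumptions chosen for illustration.

```python
FAST_NS = 40    # hypothetical latency when the probed bit is 1
SLOW_NS = 300   # hypothetical latency when the probed bit is 0


def make_oracle(secret):
    """Return a probe that reveals only a simulated latency, never the data itself."""
    def probe(bit_index):
        byte, bit = divmod(bit_index, 8)
        is_set = (secret[byte] >> (7 - bit)) & 1  # bit 0 is the most significant bit
        return FAST_NS if is_set else SLOW_NS
    return probe


def recover(probe, n_bytes, threshold_ns=(FAST_NS + SLOW_NS) // 2):
    """Rebuild n_bytes of secret by classifying each probe as fast (1) or slow (0)."""
    out = bytearray()
    for byte in range(n_bytes):
        value = 0
        for bit in range(8):
            latency = probe(byte * 8 + bit)
            value = (value << 1) | (1 if latency < threshold_ns else 0)
        out.append(value)
    return bytes(out)


probe = make_oracle(b"hunter2")
print(recover(probe, 7))
```

In the real attack the timings are noisy and each bit has to be coaxed out through speculative execution and cache state, which is why it is harder to exploit reliably. The stepping-through-memory loop is the same idea, though.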
What should you do?

As a typical end user you should apply your security patches as normal to mitigate Meltdown. macOS was patched on December 6th, and the Windows kernel has mitigation in place. The latest release candidate of the Linux kernel has mitigation patches in place, which’ll presumably trickle out to various distributions over the next few days.
You should also update your browser. One nasty vector Spectre can use is timing attacks from malicious javascript. Chrome and Firefox have partial mitigation in their mainline development, and Microsoft have announced fixes for IE11 and Edge.
Keep updating your ‘phones. At least some of the ARM chips in iPhone and Android are vulnerable, and the more constrained ‘phone environment may make targeted attacks more likely.
If you’re using any virtual machines or cloud hosted services then your provider has probably already done rolling reboots so they can patch their hypervisors to mitigate Meltdown. You’ll still need to update your kernel yourself, to protect against attacks within your machine, even though your provider has patched their hypervisors.
Performance (and Email)
The operating system level mitigation for Meltdown works by having the CPU throw away a bunch of information every time the thread of execution goes from the kernel back to the application. Most common applications will switch between kernel code and application code a lot so this has a significant performance impact.
Initial tests with PostgreSQL show slowdowns as bad as 23%, but more realistic workloads look to be maybe 5-15% slower, depending on the workload and the hardware features available.
I wondered whether there’d be much impact on network service performance, so I set up a test network with a couple of mailservers running latest release candidates of the Linux kernel. I sent mail from one to the other, using postfix, smtp-source and smtp-sink – smtp-source and -sink are test tools distributed with postfix that make it easy to send mail or to receive and discard mail.
I wasn’t really expecting to find any performance impact for something that was likely network limited, but ran some tests anyway, slinging a few million emails from one machine to the other and turning mitigation on and off on the sender and receiver. There wasn’t any performance impact that I could measure – if it’s there it was well below the noise floor.
So you’ll probably see slight performance degradation for some things, especially disk-heavy workloads, but nothing to worry too much about.
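If you want to run the same kind of on/off comparison yourself, one simple way to decide whether a difference is “below the noise floor” is to compare the change in mean throughput against the run-to-run spread. The sketch below does that. The function name is hypothetical, the noise estimate is deliberately crude, and the numbers in the usage example are placeholders, not measurements from the test described above.

```python
from statistics import mean, stdev


def compare_runs(baseline, mitigated):
    """Compare two sets of throughput measurements (e.g. messages per second).

    Each list needs at least two runs so a standard deviation can be computed.
    """
    base_mean, mit_mean = mean(baseline), mean(mitigated)
    slowdown = (base_mean - mit_mean) / base_mean
    # Use the larger run-to-run spread, relative to baseline, as a crude noise floor.
    noise = max(stdev(baseline), stdev(mitigated)) / base_mean
    return {
        "slowdown": slowdown,
        "noise": noise,
        "measurable": abs(slowdown) > noise,
    }


# Placeholder numbers for illustration only.
result = compare_runs([1000, 1020, 980], [995, 1015, 985])
print(f"slowdown {result['slowdown']:.2%}, noise {result['noise']:.2%}, "
      f"measurable: {result['measurable']}")
```

For anything you intend to publish, a proper significance test over many more runs would be a better choice than this rule of thumb.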