BLOG

Industry News & Analysis

Gmail, machine learning, filters

I’m sure by now readers have seen the article from Gmail “Spam does not bring us joy — ridding Gmail of 100 million more spam messages with TensorFlow.” If you haven’t seen it, go read it. It’s not often companies write about their filtering philosophy and what tools they’re using to manage incoming bad mail.

There were a few parts of the article that confirmed some of my theories about Gmail and a few things that were unexpected.

Open source tools

It’s no surprise that Google uses a machine learning engine built in house. What I didn’t know was that it was called TensorFlow and was open sourced by Google. Many companies in the email space open source some of their tools. ExactTarget open sourced Fuel UX long before they were SFMC and maintain a GitHub account with a number of tools. Mailchimp also maintains an account with their open source code. Steve releases a bunch of tools and code he writes both for work and for fun.

Open source software runs a whole lot more of the internet than many people know. Some of the primary contributors do the work on their own time. But many companies, large and small, understand how vital open source tools are to their business. They hire and support open source developers to maintain and extend the software.

Catching the hard spam

Google catches a lot of spam, and they’re always trying to catch the stuff that falls through the cracks. My recent call volume about going to spam at Gmail told me that Gmail had implemented some new filters. Many people were telling me that things were fine and then, with no change in what they were doing, mail started going to bulk. Other delivery folks were also talking about their customers getting caught up in filters.

We’ve gotten to the point, particularly with Google but also with the other webmail providers, where the bulk of egregious spam is blocked. What’s left is not some spammer sending 10MM messages, but a much more difficult problem. Spam that reaches the inbox is sent in much smaller quantities. It’s also heavily targeted. Spammers are trying to look like legitimate marketers but still sending mail without permission.

This targeted spam is something I’ve been thinking about a lot lately. Mostly because anti-spammers did a pretty good job making not-spamming look like it was beneficial to senders. Many deliverability recommendations boil down to “stop spamming” but are phrased in a way that makes the advice more palatable. Much of the spam that’s getting caught in the new filters follows deliverability recommendations. The piece it misses is that it’s not being sent with the permission of the recipient.

Believe it or not, spam filters started out as protecting users from mail they didn’t ask for. As the internet has grown and email has become a channel for crime, the focus of filters has changed. But, fundamentally, deep down, the original purpose of keeping mailboxes useful by stopping unsolicited mail is still there. The ML filters are giving Google, and others, tools to actually address that mail better.

The trend is clear. Filters are getting more and more able to address unsolicited email in a complex sender and user environment. Machine learning is driving a lot of that, and Google is at the front of the pack. They’re doing their best to stop the small scale spammers that have avoided a lot of the last generation of filters.


No Comments

AOL FBL petering out

This is pretty clear evidence that AOL accounts are being transferred to the Oath / Verizon Media / Yahoo backend.

AOL has been slowly disabling their postmaster pages and I don’t trust them to provide accurate info any longer. I am also wary of using the contact forms at AOL these days. It’s possible that no one is monitoring them.

Looks like the last of the AOL mail infrastructure is nearly gone.

No Comments

Deliverability Help: Information checklist

When asking for assistance with email delivery, there are some pieces of information that are required before anyone can help. Be prepared with the information so you can get timely assistance. This advice is true whether you’re looking for help from peers or working with paid deliverability consultants.

What is the problem?

Be very specific about the problem you see. The fix for mail going to the bulk folder can be very different from the fix for a Spamhaus listing. The fix for a Spamhaus DBL listing is going to be different than the fix for a ROKSO listing. The more specific you can be about the problem the more likely people can answer your question.

Bad:

  • I’m having a delivery problem.
  • None of my mail is delivering.
  • I need delivery help.

Better:

  • My mail is going to spam.
  • My IP address is listed on the SBL.
  • ISP is deferring my mail.

Where are you seeing this?

It’s important to be specific about where the problem is happening. Are you actually having mail problems? Or did you drop your IP or sending domain into a webpage that came back and told you that it was listed somewhere?

Bad:

  • My mail is blacklisted
  • We’re being blocked

Better:

  • toolname is telling me that our IP is listed on blocklist
  • My delivery reports show that ISP is deferring mail from our IP address
  • We’re not getting any opens at ISP and my tests show mail from our domain is going to the bulk folder.

When did this start?

Many delivery problems are transient and will come and go in a matter of hours. Once the delivery problem is gone, it’s difficult to troubleshoot it. Waiting a few hours or even overnight will make it clear if this is something transient or if it’s the start of a real problem. Jumping at every little delivery problem is exhausting.

On the other hand, if a delivery problem goes on for a few days it’s unlikely to self resolve. You don’t want to let problems fester for weeks or months. The longer a delivery problem goes on the longer it’s going to take to repair any reputation damage.

Bad:

  • Worrying about delivery problems in the first few moments after sending mail
  • Waiting more than a year before addressing delivery problems.

Better:

  • Giving mail 12 – 24 hours before looking at delivery.
  • Monitoring delivery on an ongoing basis and addressing things within a few weeks of the first sign of problems.

Know your mail

Every, and I do mean EVERY, delivery person should know how to check headers and should be at a minimum able to identify the headers used in authentication. (A description of those headers is something I’ve been meaning to write but haven’t yet.) The really easy way to do it is to grab the information out of your Gmail inbox. Gmail provides an easy to read interface into headers that shows exactly what they’re seeing in terms of SPF, DKIM and DMARC. There’s also a handy “how long it took mail to get from the google.com mail servers into the user’s inbox” counter, which lets you know whether problems are on the sending or the receiving end.

Knowing your mail includes knowing what you’re using as a mail from (5321.from, bounce string). Is it your domain or the ESP’s domain? Are you signing with DKIM? What’s the from address your recipients see?

Do you know if you’re sending from a dedicated IP? A shared pool? Are you using an ESP, a mail relay service or are you sending out over self hosted servers?

One of the big use cases here is Google Postmaster Tools. Many senders are confused because they are seeing DKIM passing but SPF failing, only to discover that the domain authenticated by SPF is actually the ESP domain, not theirs.

Anyone answering questions is going to need to know the following information. And, yes, you’re probably going to have to share the actual IP address and domain if you really want folks to help.

Collect the information:

  • Sending IP
  • Pool type (shared or dedicated IP)
  • 5321.from
  • 5322.from
  • d= value
  • ESP or MTA
  • Email address sources (including those of less than squeaky clean provenance)
  • Frequency of mailing
  • How long the IP has been in use
  • Age of domain
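Several of the items on that list can be read straight out of one delivered message. Here is a minimal sketch using only Python's standard library. The sample message, its domains, and the helper names `domain_of` and `summarize_auth` are all made up for illustration, and it assumes the receiving server recorded the envelope sender in a Return-Path header, which is common but not guaranteed.

```python
from email import message_from_string
from email.utils import parseaddr
import re

# Hypothetical sample message; every domain here is a placeholder.
RAW = """\
Return-Path: <bounce-123@esp.example.net>
DKIM-Signature: v=1; a=rsa-sha256; d=brand.example.com; s=sel1;
 h=from:subject; bh=abc; b=xyz
From: Brand News <news@brand.example.com>
Subject: Hello

Body
"""

def domain_of(address):
    """Return the lowercased domain part of an email address, or ''."""
    _, addr = parseaddr(address or "")
    if "@" not in addr:
        return ""
    return addr.rpartition("@")[2].lower()

def summarize_auth(raw):
    """Collect the 5321.from domain, 5322.from domain and DKIM d= values."""
    msg = message_from_string(raw)
    dkim_domains = []
    for sig in msg.get_all("DKIM-Signature", []):
        # Unfold the header, then pull the d= tag out of the tag list.
        match = re.search(r"(?:^|;)\s*d=([^;\s]+)", " ".join(sig.split()))
        if match:
            dkim_domains.append(match.group(1).lower())
    return {
        "5321.from": domain_of(msg.get("Return-Path")),
        "5322.from": domain_of(msg.get("From")),
        "dkim_d": dkim_domains,
    }

info = summarize_auth(RAW)
# True only when the envelope domain is exactly the visible From domain.
spf_domain_is_yours = info["5321.from"] == info["5322.from"]
print(info, spf_domain_is_yours)
```

In this made-up message the DKIM signature uses the brand's domain while the 5321.from belongs to the ESP, which is the situation that confuses people looking at Google Postmaster Tools. The exact-match comparison is a simplification; real alignment checks also allow subdomains of the same organizational domain.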

How we help

These days there are very few magic wands to fix delivery problems, whether you’re peer sourcing delivery help or working with a paid professional. Anyone helping you troubleshoot and fix delivery problems needs to know the who, what, when, and where in order to understand the why. Only once they understand the why can they help you with the how to fix it.

No Comments

Share your average bounce rates

The question came up on Slack this morning about bounce rate benchmarks. What are the normal / average bounces that different ESPs see? Does region matter? What’s acceptable for bounce rates?

Bounce rate is an overall measure of address quality

Here’s the chance for ESPs to share the data from your customers. We’re interested in anything you care to share. But more detail is always helpful. Some suggestions:

  • Overall bounce rate for 2018.
  • Bounce rate for 2018 compared to 2017.
  • Rate by month.
  • Rate by region (US, EMEA, AsiaPac, etc).
  • Rate by industry.
  • Bounce rates that make you sit up and take notice.
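If you want to produce those breakdowns from raw send logs, the arithmetic is simple. This is a rough sketch that assumes bounce rate means bounced messages divided by messages sent; different ESPs define it differently, so say which definition you used when you share numbers. The records and the function name `bounce_rates` are invented for the example.

```python
from collections import defaultdict

# Hypothetical per-send records: (month, region, messages sent, messages bounced).
SENDS = [
    ("2018-01", "US", 10000, 120),
    ("2018-01", "EMEA", 4000, 90),
    ("2018-02", "US", 12000, 100),
    ("2018-02", "EMEA", 5000, 60),
]

def bounce_rates(sends, key_index):
    """Total sent and bounced counts by one column, then return bounced / sent."""
    totals = defaultdict(lambda: [0, 0])
    for row in sends:
        totals[row[key_index]][0] += row[2]   # sent
        totals[row[key_index]][1] += row[3]   # bounced
    # Groups with nothing sent are skipped to avoid dividing by zero.
    return {key: bounced / sent for key, (sent, bounced) in totals.items() if sent}

by_month = bounce_rates(SENDS, 0)
by_region = bounce_rates(SENDS, 1)
overall = sum(row[3] for row in SENDS) / sum(row[2] for row in SENDS)

for month, rate in sorted(by_month.items()):
    print(f"{month}: {rate:.2%}")
```

Summing counts before dividing matters: averaging the per-send rates instead would give a small send the same weight as a large one.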

Feel free to add any other numbers. This is just an informal poll of readers.

Anonymization / Privacy note for this post: Generally I don’t approve comments from clearly forged or fake email addresses. If you want your comment to stay anonymous, use an address that’s not been previously approved and it will go into moderation. Please let me know who you are in the text and I will edit before approving the post.

No Comments

Cousin domains

When I checked in on Facebook this morning there was a discussion from a couple people frustrated by cousin domains. I share their frustration.

[Image: a kitten running through a field, with the text “every time a marketing department registers a cousin domain, god kills a kitten”]

Cousin domains are a major problem for ISPs trying to protect their users from phishing and other fraud. Because so many companies use cousin domains in their legitimate mail, ISPs can not be strict with them. Instead, they have to expend time and energy to determine if this particular cousin domain is legitimate or not.

It’s time, energy and other resources that could be used better.


6 Comments

What SPF records should you publish?

When it comes to SPF records there seems to be a lot of confusion. I mean, a decade after I posted it, Authenticating SPF is still the most frequently visited post on the site. And, of course, there are hundreds of other pages out there that discuss SPF and what to publish. Still, there are common questions.

Most recently I’ve been addressing questions about what SPF records need to be published. In the older post, I talk about how to publish, but don’t talk about what domains should be included. It’s probably time to do that.

What is SPF?

SPF is Sender Policy Framework. It allows domain owners to specify what IP addresses are expected to send mail from that domain.

What does SPF authenticate?

Primarily, SPF authenticates the address in the Mail From:. This is not the domain that the final recipient sees in their mail client. The Mail From is also known as the envelope from, the bounce domain, the return path, the 5321.from, and other names.

The SPF spec also says that a lookup can be performed on the HELO value. During the SMTP transaction the first step is for the connecting server to introduce itself to the receiving server. These values can be things like “mx.wordtothewise.com” or “mail116-221.us32.msgfocus.com”. It’s not a requirement to check this value, but it is an option.

What records should you publish?

SPF records should be published for the mail from address. I typically recommend publishing ~all. Back in 2009, I said there wasn’t much difference in how ~all and -all were handled. This was true then, but recently I’ve become aware of a few large providers actively rejecting mail in cases where a -all is published and the sending IP is not in the SPF record.

It’s not uncommon for mail to be forwarded without altering the Mail From: causing SPF to become invalid. This is one of the reasons many ISPs didn’t reject based on SPF. With the advent of DMARC, it seems some ISPs are more confident in the accuracy of SPF records and thus are rejecting based on it. In light of these rejections, unless you really want mail rejected I still recommend using ~all.
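To see which of the two you are currently publishing, look at the last term of the record. Here is a small sketch of that check; the function name `spf_all_policy` and the example records are made up, and it only inspects the `all` mechanism rather than evaluating the whole record.

```python
def spf_all_policy(record):
    """Return the qualifier on the 'all' mechanism of an SPF record.

    '-' = fail, '~' = softfail, '?' = neutral, '+' = pass.
    Returns None if the text is not an SPF record or has no 'all' mechanism.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return None
    for term in terms[1:]:
        lowered = term.lower()
        if lowered in ("all", "+all", "-all", "~all", "?all"):
            # A bare 'all' carries an implicit '+' qualifier.
            return lowered[0] if lowered[0] in "+-~?" else "+"
    return None

# Hypothetical records for a bounce domain; the include target is a placeholder.
soft = "v=spf1 include:_spf.esp.example.net ip4:192.0.2.0/24 ~all"
hard = "v=spf1 include:_spf.esp.example.net ip4:192.0.2.0/24 -all"
print(spf_all_policy(soft), spf_all_policy(hard))
```

The only difference between the two example records is the final qualifier, and that one character decides whether mail from an unlisted IP (a forwarder, for instance) is merely flagged as a softfail or becomes a candidate for outright rejection.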

What about the from address?

There are still a lot of ESPs and tutorials out there that suggest publishing SPF for the email address in the friendly from. Many years ago this was best practice because Microsoft would check SPF for the From address. It’s not going to hurt anything if you publish a SPF record for this domain, but it’s usually unnecessary.

3 Comments

Undelivered mail without a bounce

A few weeks ago I wrote a blog post focusing on one small part of bounce handling. Today I want to talk about delivery failures that aren’t bounces. This is really the biggest issue for companies who have written their own bulk sending servers. Modern bulk MTA appliances and ESPs correctly handle these types of bounces. However, when you’re troubleshooting it’s important to know that sometimes there won’t be a SMTP level bounce.

When we talk about bounce handling, we’re usually talking about what happens during (or after) the SMTP transaction. That’s built into our terminology. User unknown, 5xy, 550, 4xx are all shorthand we’ve taken from the email protocol documents. There are some types of delivery failures that happen before a SMTP transaction happens. And, like the SMTP failures, they can be permanent or temporary.

Domain not found in DNS

The first thing an outgoing mail server does to send mail is look up the mail exchange (MX) record for the receiving domain. There are a few different responses a DNS server can return. The expected response is a domain name that is actively configured to receive mail for that system. The sending server can move on and attempt to deliver the mail.

When there is no MX configured there are a couple different responses returned. All of them contain a blank MX record, but they also include a status message. If a domain exists, but just doesn’t have a MX configured the DNS server returns a NOERROR status. If a domain or subdomain doesn’t exist in DNS the DNS server returns NXDOMAIN. For NXDOMAIN the email address should be marked as undeliverable and removed from future sends.

What to do in the case of NOERROR isn’t as clear cut. It is a perfectly valid configuration to have no MX published in DNS but to accept mail. What happens is a DNS lookup is performed for the A (or AAAA) record of the domain. If there is an A record then the mail server attempts to deliver to port 25 at that IP address. In the vast majority of cases there is a MX record and any domain publishing a blank MX is not accepting mail. This is even more true when we’re looking at the consumer email domains. However, it’s not 100% guaranteed that a domain will publish a MX record. Overall, I think that if there is a blank MX then mail to that domain should be suppressed as a domain with no email users. This recommendation may result in a very, very tiny fraction of email to hobby and personal domains being suppressed. But I think that overall, it’s better to suppress than attempt to deliver to domains that may not actually want mail.

Sometimes a DNS server will return a SERVFAIL response. In this case, current delivery will fail but the address can be retried in the future.
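For anyone writing their own sending code, the handling described in this section can be sketched as a small decision function. It takes the DNS status and the list of MX hosts as plain values, so how you perform the lookup is up to you. The function name and the three action labels are my own, and treating any other status as temporary is an assumption.

```python
def classify_mx_lookup(rcode, mx_hosts):
    """Map an MX lookup result to 'deliver', 'suppress', or 'retry'.

    rcode    -- DNS status string, e.g. 'NOERROR', 'NXDOMAIN', 'SERVFAIL'
    mx_hosts -- list of MX hostnames in the answer (empty if none)
    """
    rcode = rcode.upper()
    if rcode == "NXDOMAIN":
        return "suppress"      # the domain does not exist in DNS
    if rcode == "SERVFAIL":
        return "retry"         # this attempt fails, but try again later
    if rcode == "NOERROR":
        # No MX is treated as "no email users", even though falling back
        # to the A record is technically a valid configuration.
        return "deliver" if mx_hosts else "suppress"
    return "retry"             # assumption: anything else is temporary

print(classify_mx_lookup("NOERROR", ["mx.example.com"]))
print(classify_mx_lookup("NOERROR", []))
print(classify_mx_lookup("NXDOMAIN", []))
```

A stricter standards-following sender would try the A record in the NOERROR case before giving up; the sketch deliberately follows the suppress-instead recommendation.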

DotMX or localhost in DNS

There are a couple things that a domain owner does to identify a domain that does not accept email. One of those is using a single . (yes, it’s really just a dot) in the MX record (RFC 7505). This is a clear statement by the domain owner that this domain is never used for email. Some domain owners put in a record of 127.0.0.1 or localhost. In both cases, these responses mean mail will never deliver and all future mail to that address should be suppressed.
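Checking for these markers takes only a few lines. This is a sketch with a made-up function name, `mx_means_no_mail`, that takes the MX target as a string however you looked it up. Treating any loopback address (not just 127.0.0.1) as "no mail" is my assumption.

```python
import ipaddress

def mx_means_no_mail(mx_target):
    """True if an MX target signals that the domain never accepts email."""
    target = mx_target.strip().lower()
    if target == ".":
        return True                        # the single-dot MX
    host = target.rstrip(".")              # drop a trailing root dot
    if host == "localhost":
        return True
    try:
        # Catches 127.0.0.1 and other loopback addresses.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False                       # an ordinary hostname

for target in (".", "localhost.", "127.0.0.1", "mx.example.com."):
    print(target, mx_means_no_mail(target))
```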

No server at the MX

Sometimes the domain does respond with a MX but there is no mail server there. Behind the scenes, the sending mail server attempts to connect on port 25 but the connection attempt times out because there is no answer. This situation isn’t as clear cut as a NXDOMAIN or a dotmx. There are cases where a server might not answer temporarily. Ideally, the MTA will requeue the mail and attempt to send it over a few days. If the failure continues over a long period of time then it’s likely this is a dead domain and future mailings to the domain should be suppressed.
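One way to handle this is to remember when a domain first stopped answering and only give up once the failures have lasted long enough. The sketch below does that. The 30 day cutoff is purely an assumption on my part, since "a long period of time" is a judgment call, and the class and domain names are invented.

```python
from datetime import datetime, timedelta

# Assumption: how long a domain may stay unreachable before it is treated as dead.
DEAD_AFTER = timedelta(days=30)

class ConnectionTracker:
    """Track when each domain first started failing to answer on port 25."""

    def __init__(self):
        self.first_failure = {}

    def record(self, domain, connected, when):
        """Record one connection attempt; return 'ok', 'retry', or 'suppress'."""
        if connected:
            self.first_failure.pop(domain, None)   # any success resets the clock
            return "ok"
        started = self.first_failure.setdefault(domain, when)
        return "suppress" if when - started >= DEAD_AFTER else "retry"

tracker = ConnectionTracker()
start = datetime(2019, 1, 1)
print(tracker.record("dead.example", False, start))
print(tracker.record("dead.example", False, start + timedelta(days=31)))
```

Keying on the first failure rather than counting attempts means a server that is down for one busy afternoon is not suppressed, no matter how many messages were queued for it during the outage.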

Overall, these bounces are quietly handled by competent ESPs and modern bulk MTAs. But they do happen. It’s not a spam issue, but it is a data hygiene issue. Excessive numbers of people submitting addresses with no MX, or a dotMX or no server at that domain mean that they’re also likely submitting addresses that belong to unsuspecting third parties. Understanding what about the collection process is encouraging forgeries should result in cleaner deliverable data.

 

 

No Comments

One subscription should equal one unsubscription

One of the side effects of using tagged addresses to sign up for things is seeing exactly what companies do with your data once they get it.

For instance, 3 years ago I downloaded a white paper or something from an ESP. That white paper was apparently co-branded and the other company got my email address from the ESP. They’re now sending mail to that address. I unsubscribed from the ESP mail and haven’t gotten anything from them in the last 2 1/2 years.

There are multiple problems with this kind of sharing. The first is that recipients don’t know they’re giving permission for their data to be shared. Maybe it was in the fine print, but hiding permission in terms and conditions isn’t real permission.

Compounding the spam is the fact that I only gave one group my email address, but I have to unsubscribe multiple times. To me, this is the same as unsubscribing from one email only to have a sender add me to a different list of theirs.

I’m becoming more and more convinced that the only fair way to handle subscriptions in a truly opt-in fashion is that the number of unsubscribes necessary to stop mail should equal the number of subscribes. In my case it’s easy. Every subscription gets a unique address. When I give my address in one place, then I should be able to stop all mail to that address through a single unsubscribe.

I’m not against preference centers. If you want to add me to multiple segments or lists, all you have to do is tell me and let me choose. If you can’t do that, then take an unsubscribe request as a request to remove me from all mail. If you’re in the US, you’re required to do that under CAN SPAM and other laws.

No recipient should have to chase down every company their addresses have been shared with just to opt-out. Companies that share opt-ins for addresses should also share opt-outs. If that’s too much work for you, then how is it any less work for the recipient? You know who you’ve given the address to, I don’t. I just get to unsubscribe any time someone decides to mail the address of mine you gave them.
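Sharing opt-outs does not need to be complicated. As a rough illustration, not a description of how any particular ESP does it, the company that collected the address can keep a record of who it handed the address to and apply a single unsubscribe to all of them. Every name in this sketch is hypothetical.

```python
from collections import defaultdict

class SharedSubscriptions:
    """Remember who an address was shared with so one opt-out reaches everyone."""

    def __init__(self):
        self.shared_with = defaultdict(set)   # address -> companies holding it
        self.suppressed = defaultdict(set)    # company -> suppressed addresses

    def subscribe(self, address, collector, partners=()):
        """Record one signup and everyone the address was passed on to."""
        self.shared_with[address].update({collector, *partners})

    def unsubscribe(self, address):
        """A single unsubscribe suppresses the address at every holder."""
        for company in self.shared_with[address]:
            self.suppressed[company].add(address)

    def can_mail(self, company, address):
        return (company in self.shared_with[address]
                and address not in self.suppressed[company])

subs = SharedSubscriptions()
subs.subscribe("me+whitepaper@example.com", "esp", partners=["cobrand"])
subs.unsubscribe("me+whitepaper@example.com")
print(subs.can_mail("cobrand", "me+whitepaper@example.com"))
```

One signup, one unsubscribe: after the single `unsubscribe` call neither the original collector nor the co-branding partner can mail the address.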

Otherwise, it’s all just spam.

5 Comments

Spamtraps on the brain

I really dislike whoever it was that coined the term pristine spamtraps. I get what they were trying to do: explain the different kinds of spamtraps and how different traps get on your list in different ways. Except… any type of trap can end up on your list in any way.

For instance, not every recycled trap shows up on a list because bounce handling is bad. Sometimes, people input their old addresses they never use anymore into forms, not knowing that address is now a recycled spamtrap. In other cases, an address was entered into enough forms by random people that the original owner had to abandon it and then handed over the resulting mail feed to spamtrap maintainers.

Likewise, not every pristine trap is pristine. Unless a domain has been continuously owned by the spamtrap maintainer for the past two decades, they don’t know what the history of the address was. Maybe it was a domain that was never registered and it actually is a pristine domain. Alternatively, it could have belonged to a startup and been used for a couple years before falling back into the available domain pool.

In the early days of deliverability we often blamed spamtraps for blocking. It made sense. Senders couldn’t argue they had permission to mail spamtraps. They didn’t, they couldn’t, there was no one using that address in order to give meaningful permission.

What I’m seeing now among some senders, though, is an almost laser like focus on spamtraps as the one metric to rule them all. Senders, and ESPs, are heavily weighting the data they get back from the commercial sensor networks. And, let’s be honest here, while the public writings of the companies describe them as sensor networks and are careful to avoid the spamtrap terminology, almost everyone else calls them spamtraps.

Spamtraps are not the problem. They’re a signal. Spamtraps tell us that there is something wrong with how addresses are being collected or maintained. They indicate what problems we need to fix in order to get good delivery. No one really cares if spammers send mail to abandoned or unread email addresses. What really matters is that a subscription process lets any email address be added without doing anything to verify that address belongs to the person sending it.

Right now, none of the data hygiene tools do anything to link the address input into a form with the person providing the address. They’ll remove potential spamtraps and bad addresses, but that’s it. We can remove bouncing addresses and make lists look clean. Still, it’s not enough in the age of engagement based filters. In order to get to the inbox you need to send mail people want.

2 Comments

Recycled spamtraps

Spamtraps strike fear into the heart of senders. They’ve turned into this monster metric that can make or break a marketing program. They’ve become a measure and a goal and I think some senders put way too much emphasis on spamtraps instead of worrying about their overall data accuracy.

Recently I got a question from a client about the chances that any address they were currently mailing would turn into a recycled spamtrap. Assuming both a well behaved outbound mail server and a well behaved spamtrap maintainer the answer is never. Well behaved spamtrap maintainers will reject every email sent to one of their spamtrap feeds for 6 – 12 months. Some reject for longer. Well behaved mail servers will remove addresses that consistently bounce and never deliver.

Of course, not everyone is well behaved. There are maintainers who don’t actively reject mail, they simply pull the domain out of DNS for years and then start accepting mail. Well behaved mail servers can cope with this, they create a fake bounce when they get NXDOMAIN for an address and eventually remove the address from future mailings. There have been cases in the past where spamtrap maintainers purchase expired domains and turn them into spamtraps immediately. No amount of good behaviour on the part of the sender will cope with this situation.
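The "fake bounce" idea is easy to sketch: a DNS failure is fed into the same counter that handles real SMTP rejections, so an address at a vanished domain ages off the list like any other dead address. The threshold of three consecutive failures and the synthetic 550 code are assumptions for illustration, as are the class and address names.

```python
# Assumption: how many consecutive failed sends before an address is removed.
MAX_CONSECUTIVE_FAILURES = 3

class BounceTracker:
    def __init__(self):
        self.failures = {}       # address -> consecutive failure count
        self.removed = set()

    def record(self, address, smtp_code=None, dns_rcode=None):
        """Record one delivery attempt; return True once the address is removed."""
        if dns_rcode == "NXDOMAIN":
            smtp_code = 550      # synthesize a bounce: no SMTP conversation happened
        if smtp_code is not None and 500 <= smtp_code < 600:
            self.failures[address] = self.failures.get(address, 0) + 1
            if self.failures[address] >= MAX_CONSECUTIVE_FAILURES:
                self.removed.add(address)
        elif smtp_code is not None and 200 <= smtp_code < 300:
            self.failures.pop(address, None)   # a delivery resets the count
        return address in self.removed

tracker = BounceTracker()
for _ in range(3):
    gone = tracker.record("user@gone.example", dns_rcode="NXDOMAIN")
print(gone)
```

A server built this way stops mailing the address long before a domain that has been out of DNS for years comes back as a trap.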

On the flip side some MTAs never correctly handle any undeliverable address when the reason is anything other than a direct SMTP response. Generally these are built on the open source MTAs by people who don’t realise there are mail failures outside of SMTP failures.

There are three general cases where recycled spamtraps will show up on a list.

  1. A list has been improperly bounce handled.
  2. An address has not been mailed for more than a year.
  3. Someone signs up an address that’s a recycled spamtrap (same as how a pristine trap will get added to a list)
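The second case is the easiest one to check for before a send. A minimal sketch, assuming you keep a last-mailed date for each address; the function name, addresses and dates are invented:

```python
from datetime import date, timedelta

# "More than a year", taken from the list above.
STALE_AFTER = timedelta(days=365)

def stale_addresses(last_mailed, today):
    """Return addresses whose last send is more than a year before `today`.

    last_mailed -- dict of address -> date of most recent send (None if never)
    """
    return sorted(
        addr for addr, when in last_mailed.items()
        if when is None or today - when > STALE_AFTER
    )

history = {
    "old@example.com": date(2017, 1, 1),
    "recent@example.com": date(2018, 12, 1),
    "never@example.com": None,
}
print(stale_addresses(history, date(2019, 2, 1)))
```

Addresses flagged this way are candidates for removal or a careful re-permission pass rather than a place in the next full send. Treating never-mailed addresses as stale is my own conservative choice.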

ESPs have to worry about recycled spamtraps in another common case. A new customer brings over a list and decides to retry addresses that their previous ESP marked as bounced. (It happens. Regularly.)

Recycled addresses are a sign that there is a problem with the long term hygiene of a list. As with any spamtrap, they’re a sign of problems with data collection and maintenance. The traps aren’t the problem, they’re just a symptom. Fix the underlying issue with data maintenance and traps cease to be an actual issue.

No Comments