What really is "spam" anyway?

A few days ago I was reading about the attempt by e360 and Dave Linhardt to force Comcast to accept his mail and to stop people posting in the newsgroup news.admin.net-abuse.email from claiming he is a spammer. What stands out to me in his complaint is his belief that by complying with the minimal standards of the CAN SPAM act, he is not spamming.
The problem with this claim is that CAN SPAM lists the minimal standards an email must meet in order to avoid prosecution. CAN SPAM does not define what spam is; it only defines what senders must do to avoid violating the act. There is no legal definition of spam or of what is not spam.
To add to the confusion, there are a number of contradictory definitions of spam. Definitions people have used over the years include:

  • unsolicited bulk email
  • unsolicited commercial email
  • mail I don’t want
  • mail I don’t think my customers want
  • mail that is identical/similar to mail that hit my spamtrap
  • mail that was sent to a non-existent address at my domain
  • mail that contains HTML
  • unsolicited email
  • mail that advertises Viagra or porn sites or similar
  • mail that other people send

I rarely use the word spam. There are so many different definitions of spam that I have no way to know whether my clients understand what I am saying, so I avoid the term completely. I do think it is important for senders to understand the definitions of spam as used by entities responsible for filtering large amounts of incoming email.
Spamhaus and some other blocking lists use “unsolicited bulk email” as their definition. Generally, they have addresses that have never been used to sign up for email, and if a mailer sends mail to them, the mailer is sending unsolicited bulk email and is eligible for listing on the blocklist. The lists believe that if a mailer is sending one piece of email to a user who did not request it, then they are likely mailing many other users who did not request any mail. This definition centers around permission, and only sending email when you have the permission of the recipient.
Many of the large ISPs use “mail our users complain about” as their definition. With this definition, they do not have to argue permission status with a sender. The data shows that their customers complain about mail from that sender or with that URL. The ISPs are going to block, or deliver to the bulk folder, email that their users do not want.
Filters and some blocking lists use “mail that has characteristics of mail we know is unsolicited bulk mail” as their definition. These characteristics can be things like an invalid HELO string, or lack of reverse DNS on the connecting IP address, or badly formatted HTML. Mail that looks like spam, in the technical sense, is often treated like spam.
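To make the "looks like spam in the technical sense" idea concrete, here is a minimal sketch of the kind of check a filter might run on the HELO string and reverse DNS. The function names, the rule that a bare single-label name fails, and the flag wording are all assumptions for illustration; real filters use their own rules and weightings.

```python
import ipaddress
import re
from typing import List, Optional

# One DNS label: letters, digits and hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def helo_looks_valid(helo: str) -> bool:
    """Return True if the HELO argument is a plausible FQDN or address literal."""
    helo = helo.strip()
    if not helo:
        return False
    # Address literal such as [192.0.2.1] or [IPv6:2001:db8::1]
    if helo.startswith("[") and helo.endswith("]"):
        literal = helo[1:-1]
        if literal.lower().startswith("ipv6:"):
            literal = literal[5:]
        try:
            ipaddress.ip_address(literal)
            return True
        except ValueError:
            return False
    name = helo.rstrip(".")
    # A bare IP address without brackets is treated as invalid here.
    try:
        ipaddress.ip_address(name)
        return False
    except ValueError:
        pass
    labels = name.split(".")
    # Assumed rule: require at least two labels, so "localhost" fails.
    if len(labels) < 2:
        return False
    return all(_LABEL.match(label) for label in labels)


def technical_red_flags(helo: str, reverse_dns: Optional[str]) -> List[str]:
    """List the technical problems found; reverse_dns is None when no PTR exists."""
    flags = []
    if not helo_looks_valid(helo):
        flags.append("invalid HELO string")
    if not reverse_dns:
        flags.append("no reverse DNS")
    return flags
```

A sender connecting as "localhost" from an IP with no PTR record would collect both flags, while "mail.example.com" with matching reverse DNS would collect none.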
Resolving a block or listing requires first understanding the definition that entity is using. For blocklists, senders usually must make changes to eliminate any possibility an address will get on the list without permission of the owner of that address. For ISPs, senders must decrease the complaints from users, usually accomplished by improving the signup process, getting a feedback loop (FBL) from the ISP and sending more relevant email. For filters, fixing the technical issues, cleaning up HTML and sending mail that does not look like spam will resolve many of the issues.
Complying with the law is not sufficient to meet the standards of recipients. If e360 is sending mail users are complaining about, then the recipient ISPs are going to treat the mail as spam and filter or block it. If e360 is sending mail to people who have not requested it, then posters in NANAE are going to claim e360 is spamming. Is e360 sending mail that complies with CAN SPAM? I expect that they are. Does this mean they are not spamming as defined by some people? Of course not.

Related Posts

Why do ISPs limit emails per connection?

A few years ago it was “common knowledge” that if you were sending large amounts of email to an ISP the most polite way to do that, the way that would put the least load on the receiving mailserver, was to open a single SMTP session to the mailserver and then to send all the mail for that ISP down that single connection.
That’s because the receiving mailserver is concerned about two main resources when handling inbound email – the pool of “slots” assigned one per inbound SMTP session, and the bandwidth (network and disk, and related resources such as memory and CPU) consumed by the inbound mail – and this approach means the sender only uses one slot, and it allows the receiving mailserver to control the bandwidth used simply by accepting data on that one connection at a given rate. It also amortizes all the connection setup costs over multiple emails. It’s a beautiful thing – it just doesn’t get any more efficient than that.
That seems perfect for the receiving ISP – but ISPs don’t encourage bulk senders to do this. Instead many of them have been moving from “one connection, lots of mail through it” to “multiple connections, a few messages through each”. They’re even limiting the number of deliveries permitted over a single connection. Why would that be?
This is driven by three things. One is that the number of simultaneous inbound SMTP sessions that a mailserver can handle is quite tightly limited by the architecture of most mailservers. Another is that the amount of mail that’s being sent to large ISP mailservers keeps going up and up – so there are sometimes more inbound SMTP sessions asking for access than the mailserver can handle. The third is that ISPs know that there are different categories of email being sent to their users – 1:1 mail from their friends that they want to see as soon as possible, wanted bulk mail that their users want to see when it arrives, and spam; lots and lots of spam.
So ISPs want to be able to do things like accept 1:1 mail all the time, while deferring bulk mail and spam to allow them to shed traffic at times of peak load. But they can only make decisions about whether to accept or defer delivery in an efficient way at SMTP connection time – they pick and choose amongst the horde of inbound connection attempts to prioritize some and defer others, letting them keep within the number of inbound sessions that they can handle simultaneously.
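A rough sketch of what that connection-time decision could look like follows. It is not how any particular ISP does it: the traffic class names, the ceiling fractions and the `AdmissionController` class are all hypothetical. The idea is simply that lower priority traffic gets deferred once the session pool passes a certain fill level, while 1:1 mail is accepted until the pool is actually full.

```python
from typing import Dict, Optional

# Assumed ceilings: the fraction of the session pool that may be in use
# before new connections of each class are deferred.
DEFAULT_CEILINGS = {
    "one_to_one": 1.0,      # accepted until every slot is taken
    "wanted_bulk": 0.8,     # deferred once the pool is 80% full
    "suspected_spam": 0.5,  # deferred once the pool is 50% full
}


class AdmissionController:
    """Decide at SMTP connection time whether to accept or defer a sender."""

    def __init__(self, max_sessions: int,
                 ceilings: Optional[Dict[str, float]] = None) -> None:
        self.max_sessions = max_sessions
        self.ceilings = ceilings or DEFAULT_CEILINGS
        self.active = 0  # inbound sessions currently holding a slot

    def on_connect(self, traffic_class: str) -> str:
        limit = int(self.max_sessions * self.ceilings[traffic_class])
        if self.active < limit:
            self.active += 1
            return "220 service ready"
        # Temporary failure: a well-behaved sender will retry later.
        return "421 service not available, try again later"

    def on_disconnect(self) -> None:
        self.active = max(0, self.active - 1)
```

With a pool of 10 slots, suspected spam starts getting 421 responses once 5 sessions are active, wanted bulk once 8 are active, and 1:1 mail only when all 10 are in use.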
But once the ISP lets a bulk mailer connect to deliver their mail, they lose most of the ability to further control that delivery as the sender might send thousands of emails down that connection. (Even if the ISP has the ability to throttle bandwidth – as some do to control obvious spam – that just means that the sender would tie up an expensive inbound delivery slot for longer).
So, in order to allow them to prioritize inbound connections effectively the ISP needs to terminate the session after a few deliveries, and then make that sender start competing with other senders for a connection again.
So ISPs aren’t limiting the number of deliveries per SMTP connection to make things difficult for senders, or because they don’t understand how mail works. They’re doing it because that lets them prioritize wanted email to their users. The same is true when they defer your mail with a 4xx response.
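Both sides of a per-connection limit can be sketched in a few lines. The `InboundSession` class, the default of five deliveries and the `plan_connections` helper are assumptions made up for this example; actual limits vary by ISP and are often not published.

```python
from typing import Iterator, List, Sequence


class InboundSession:
    """Receiver side: accept a few messages, then end the session."""

    def __init__(self, max_deliveries: int = 5) -> None:
        self.max_deliveries = max_deliveries  # assumed limit
        self.delivered = 0
        self.open = True

    def deliver(self, message: str) -> str:
        if not self.open:
            raise ConnectionError("session already closed")
        if self.delivered >= self.max_deliveries:
            # Temporary failure: the sender must reconnect and compete
            # for a slot again before sending the rest of its queue.
            self.open = False
            return "421 too many messages on this connection, closing"
        self.delivered += 1
        return "250 OK"


def plan_connections(queue: Sequence[str],
                     per_connection: int) -> Iterator[List[str]]:
    """Sender side: split a queue into one batch per SMTP connection."""
    for start in range(0, len(queue), per_connection):
        yield list(queue[start:start + per_connection])
```

A sender that knows the limit can plan its batches up front. One that does not will see the 421, and should requeue the remaining mail and reconnect rather than treat it as a permanent failure.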
It might be annoying to have to deal with these limits on delivery, but for legitimate bulk mail senders all this throttling and prioritization is a good thing. Your mail may be given less priority than 1:1 mail – but, if you maintain a good reputation, you’re given higher priority than all the spam, higher priority than all the email-borne viruses, higher priority than all the junk email, higher priority than the 419 spams. And higher priority than mail from those of your competitors who have a worse reputation than yours.


Greylisting: that which Yahoo does not do

Over the last couple days multiple people have asserted to me that Yahoo is greylisting mail. The fact that Yahoo itself asserts it is not using greylisting as a technique to control mail seems to have no effect on the number of people who believe that Yahoo is greylisting.
Deeply held beliefs by many senders aside, Yahoo is not greylisting. Yahoo is using temporary failures (4xx) as a way to defer and control mail coming into their servers and their users.
I think much of the problem is that the definition of greylisting is not well understood by the people using the term. Greylisting generally refers to a process of refusing email with a 4xx response the first time delivery is attempted and accepting the email at the second delivery attempt. There are a number of ways to greylist: per message, per IP or per from address. The defining feature of greylisting is that the receiving MTA keeps track of the messages (or IPs or addresses) that it has rejected and allows the mail through the second time the mail is sent.
This technique for handling email is a direct response to some spamming software, particularly software that uses infected Windows machines to send email. The spam software will drop any email in response to a 4xx or 5xx response. Well-designed software will retry any email receiving a 4xx response. By rejecting anything on the first attempt with a 4xx, the receiving ISPs can trivially block mail from spambots.
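The bookkeeping that defines greylisting can be sketched as follows. This is an illustration, not any particular MTA's implementation: the `Greylist` class, the (IP, sender, recipient) key and the minimum retry delay are assumptions. Many real deployments add a delay like this, but the core idea is only "remember what was rejected, accept the retry".

```python
import time
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str]


class Greylist:
    """Defer the first attempt for each key, accept a later retry."""

    def __init__(self, min_delay: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_delay = min_delay  # assumed: seconds before a retry counts
        self.clock = clock
        self.first_seen: Dict[Key, float] = {}

    def check(self, ip: str, mail_from: str, rcpt_to: str) -> str:
        key = (ip, mail_from.lower(), rcpt_to.lower())
        now = self.clock()
        seen = self.first_seen.get(key)
        if seen is None:
            # First attempt: remember it and ask the sender to retry.
            self.first_seen[key] = now
            return "451 greylisted, try again later"
        if now - seen < self.min_delay:
            # Retried too quickly to count as a real second attempt.
            return "451 greylisted, try again later"
        return "250 OK"
```

The important part is the `first_seen` table. A receiver that hands out 4xx responses without remembering what it rejected, and without reliably accepting the retry, is deferring mail but is not greylisting.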
Where does this fit in with what Yahoo is doing? Yahoo is not keeping track of the mail it rejects and is not reliably allowing email through on the second attempt. There are a couple reasons why Yahoo is deferring mail.


Yahoo and Spamhaus

Yahoo has updated and modified their postmaster pages. They have also put a lot of work into clarifying their response codes. The changes should help senders identify and troubleshoot problems without relying on individual help from Yahoo.
There is one major change that deserves its own discussion. Yahoo is now using the SBL, XBL and PBL to block connections from listed IP addresses. These are public blocklists run by Spamhaus. Each of them targets a different type of spam source.
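For readers who have not seen how a DNS blocklist is consulted, here is a minimal sketch. DNS blocklists are generally queried by reversing the octets of the connecting IPv4 address and looking the result up under the list's zone. The zone name `dnsbl.example` below is a placeholder, not a real list, and the function names are made up for illustration. The actual zone to query, and the meaning of the addresses it returns, come from the blocklist operator's documentation.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a blocklist zone."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch handles IPv4 only")
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the lookup resolves to an A record (performs a DNS query)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No such name: the address is not listed in this zone.
        return False
```

For example, `dnsbl_query_name("192.0.2.99", "dnsbl.example")` gives `99.2.0.192.dnsbl.example`. A production check would also inspect the returned address rather than treating any answer as a listing, since lists use different return values for different kinds of entry.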
The SBL is the blocklist that addresses fixed spam sources. A sender gets listed on the SBL by sending email to people who have never requested it. Typically, this involves email sent to an address that has not opted in to the email. These addresses, known as spamtraps, are used as sentinel addresses. Any mail sent to them is, by definition, not opt-in. These addresses are never signed up to any email address lists by the person who owns the email address. Spamtraps can get onto a mailing list in a number of different ways, but none of them involve the owner of the address giving the sender permission to email them.
Additionally, the SBL will list spam gangs and spam supporters. Spam supporters include networks that provide services to spammers and do not take prompt action to remove the spammers from their services.
The XBL is a list of IP addresses which appear to be infected with trojans or spamware or can be used by hackers to send spam (open proxies or open relays). This list includes both the CBL and the NJABL open proxy list. The CBL lists machines which appear to be infected with spamware or trojans. The CBL works passively, looking only at those machines which actively make connections to CBL detectors. NJABL lists machines that are open proxies and open relays.
The Policy Block List (PBL) is Spamhaus’ newest list. Spamhaus describes this list as
