The variables are not independent

In my previous career I was a molecular biologist. Much of my work was done on bacteria, but after I left grad school I ended up working in a developmental biology lab. Bacteria were (mostly) simple: just about every trait was controlled by a single gene. We could study what that gene did by removing it from the bacteria or adding it to a well-characterised strain.

When I moved to developmental biology, the world got more complex. In higher organisms many traits are controlled by a whole bunch of genes, and there is a lot of redundancy, overlap and duplication. But there was still quite a bit of removing a gene to see what happened. The lab I was in was specifically studying teratogens – chemicals that interfere with development. The best-known teratogen is thalidomide. In fact, a lot of the work we were doing with vitamin A and alcohol involved many of the same pathways that were disrupted by thalidomide.

One of the important parts of development is controlled by a complex of genes called Hox genes. These do a lot of things, but one of the most important things they do is define what parts of the embryo will become the front and back, the top and bottom and the near and far.

OK, now that we have three paragraphs of background, here’s the story. There was one seminar we went to about Hox genes. The researchers were trying to assign specific activities to individual Hox genes by knocking them out. But because Hox genes are so redundant, knocking one of them out doesn’t actually change much. There was nothing really wrong with the single knockouts this lab was studying. So they ended up knocking out two Hox genes. At that point most things still worked, except… two vertebrae switched places.

That story has always stuck with me because you have these genes that are so important they exist in everything from worms to humans. And they’re so vital that higher vertebrates like humans have the same set of genes duplicated across four different chromosomes. You knock out two of these vital developmental genes… and the only real evidence of anything happening is that two vertebrae switch places.

Recently I’ve been blogging about how to troubleshoot delivery problems. And I realised that a lot of how I treat delivery problems is influenced by my time in research. Much of how I troubleshoot starts with the premise that the things we’re testing aren’t independent variables. Everything, or almost everything, is conditional.

Email filtering, particularly filtering driven by machine learning, is closer to molecular biology than I realised. We can imagine each individual rule as a gene. These genes all work together and, in some cases, modify each other. Some rules don’t get activated unless another rule is active, or inactive. In some cases, one rule is so dominant that none of the other rules matter. For instance, if an IP is listed on the SBL, your mail is blocked, no questions asked. But if the sending IP isn’t listed, then hundreds of rules act on the message. Or, at the other end, if a user has a rule that says “always deliver this to my inbox”, none of the other rules matter: that message will always go to the inbox.

Filtering variables aren’t independent. In order to troubleshoot delivery problems, we need to start looking at the whole picture and the whole system. We can’t troubleshoot things in a vacuum.