The variables are not independent
In my previous career I was a molecular biologist. Much of my work was done on bacteria but after I left grad school, I ended up working in a developmental biology lab. Bacteria were (mostly) simple: just about every trait was controlled by a single gene. We could study what that gene did by removing it from the bacteria or adding it to a well characterised bacteria.
When I moved to developmental biology, the world got more complex. In higher organisms many traits are controlled by a whole bunch of genes, and there is a lot of redundancy and overlap and duplication. But, there was still quite a bit of removing a gene to see what happens. The lab I was in was specifically studying teratogens – chemicals that interfere with development. The most well known teratogen is thalidomide. In fact, a lot of the work we were doing with vitamin A and alcohol involved many of the same pathways that were disrupted by thalidomide.
One of the important parts of development is controlled by a complex of genes called Hox genes. These do a lot of things, but one of the most important things they do is define what parts of the embryo will become the front and back, the top and bottom and the near and far.
OK, now that we have 3 paragraphs of background, here’s the story. There was one seminar we went about Hox genes. The research being done was trying to assign specific activities to Hox genes by knocking them out. But, because Hox genes are so redundant, knocking one of them out doesn’t actually change much. There was nothing really wrong with the single knockouts this lab was studying. So, they ended up knocking out two Hox genes. At that point most things still worked, except… 2 vertebra switched places.
That story has always stuck with me, because, you have these genes that are so important they exist in everything from worms to humans. And they’re so vital that higher vertebrates like humans have the same set of genes duplicated across 4 different chromosomes. You knock out two of these vital developmental genes… and the only real evidence of anything happening is two vertebra switch places.
Recently I’ve been blogging about how to troubleshoot delivery problems. And I realised that a lot of how I treat delivery problems is influenced by my time in research. Much of how I troubleshoot starts with the premise that the things we’re testing aren’t independent variables. Everything, or almost everything, is conditional.
Email filtering, particularly that driven by machine learning, is closer to molecular biology than I realised. We can imagine each individual rule like it’s a gene. And these genes all work together and, in some cases, modify each other. Some rules don’t get activated unless another rule is active, or inactive. In some cases, one rule is so dominant none of the other rules matter. For instance, if an IP is listed on the SBL, your mail is blocked, no questions asked. But, if the sending IP isn’t listed, then hundreds of rules act on the message. Or, on the other end, if a user has a rule that says “always deliver this to my inbox” none of the rules matter, that message will always go to the inbox.
Filtering variables aren’t independent. In order to troubleshoot delivery problems, we need to start looking at the whole picture and the whole system. We can’t troubleshoot things in a vacuum.