I recently read a book called “Anomaly!” by Tommaso Dorigo. The book covers part of the history of the CDF Experiment at Fermilab, more or less spanning the period from its inception in the early 1980s to the end of the first part of data taking (2000). It spurred me to look back at the mistakes that are routinely made during the scientific process, a fundamental ingredient without which science would never advance. Still, one would never publish articles to describe the things that went wrong. So let me tell you this story. I won’t be able to show you any plots (these are the intellectual property of the collaboration) but I’ll do my best to help your imagination.
Hadron jets and boosted particles
Before I tell the story, I need to make a short introduction to clarify what I was working on. To start with, in a collider experiment, two beams of particles travel either around a circle (in the case of a synchrotron like the LHC) or along a straight line (linear colliders such as the future ILC). The particles can be protons, antiprotons, electrons, positrons or possibly even muons. In the case of the LHC, the two beams are made of protons, which are made to collide at four points around the circumference. Built around each “interaction point” there is a detector, a sort of huge camera that takes a 3D picture every 25 ns.
The most likely outcome of these collisions is a set of collimated sprays of a certain class of particles called hadrons, including protons, neutrons, pions, kaons and other short-lived entities. Commonly, a clustering algorithm (the reader may be familiar with k-means, for example, or even with the FastJet package) is executed to separate these “sprays” into well-defined objects called jets. A jet is not a real particle, although it corresponds roughly to the fundamental particle that originated it during the collision. Typically, these are either quarks or gluons, which cannot be observed alone because of the way the strong force behaves. Similar to a spring or an elastic band, the more you try to separate two quarks, the more energy you add to their system. At some point the energy will be enough to “snap” their connection and pull a quark-antiquark pair out of the quantum vacuum. The result is the creation of two hadrons. The process goes on until there is not much energy left, and one ends up with the sprays of particles I mentioned above.
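To make the clustering step a bit more concrete, here is a minimal sketch of a sequential-recombination jet algorithm in plain Python. I am assuming the anti-kt distance measure, which is the usual choice at the LHC, but the text above does not say which algorithm was used. The function names, the radius of 0.4 and the toy particles are all made up for illustration. Real analyses would use FastJet, not a naive loop like this one.

```python
import math

def from_pt_eta_phi(pt, eta, phi):
    """Build a massless four-momentum (E, px, py, pz), in GeV."""
    return (pt * math.cosh(eta), pt * math.cos(phi),
            pt * math.sin(phi), pt * math.sinh(eta))

def transverse_momentum(p):
    return math.hypot(p[1], p[2])

def rapidity(p):
    return 0.5 * math.log((p[0] + p[3]) / (p[0] - p[3]))

def delta_r_squared(a, b):
    """Squared distance in the rapidity-azimuth plane."""
    dy = rapidity(a) - rapidity(b)
    dphi = abs(math.atan2(a[2], a[1]) - math.atan2(b[2], b[1]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return dy * dy + dphi * dphi

def jet_mass(p):
    m2 = p[0] ** 2 - p[1] ** 2 - p[2] ** 2 - p[3] ** 2
    return math.sqrt(max(m2, 0.0))

def cluster_anti_kt(particles, radius=0.4):
    """Naive anti-kt clustering; 'radius' is an assumed jet size parameter."""
    pseudojets = [tuple(p) for p in particles]
    jets = []
    while pseudojets:
        best = (float("inf"), None, None)
        for i, a in enumerate(pseudojets):
            inv_pt2_a = 1.0 / transverse_momentum(a) ** 2
            # Distance to the beam: if smallest, 'a' becomes a final jet.
            if inv_pt2_a < best[0]:
                best = (inv_pt2_a, i, None)
            for j in range(i + 1, len(pseudojets)):
                b = pseudojets[j]
                inv_pt2_b = 1.0 / transverse_momentum(b) ** 2
                d_ij = (min(inv_pt2_a, inv_pt2_b)
                        * delta_r_squared(a, b) / radius ** 2)
                if d_ij < best[0]:
                    best = (d_ij, i, j)
        _, i, j = best
        if j is None:
            jets.append(pseudojets.pop(i))
        else:
            # Merge the pair by adding four-momenta (j > i, so pop j first).
            b = pseudojets.pop(j)
            a = pseudojets.pop(i)
            pseudojets.append(tuple(x + y for x, y in zip(a, b)))
    return sorted(jets, key=transverse_momentum, reverse=True)

# Toy event: two nearby particles and one on the opposite side.
toy_event = [
    from_pt_eta_phi(50.0, 0.0, 0.0),
    from_pt_eta_phi(30.0, 0.1, 0.1),
    from_pt_eta_phi(40.0, 0.0, 3.0),
]
toy_jets = cluster_anti_kt(toy_event)
```

In this toy event the two nearby particles end up in one jet and the third one forms a jet of its own. Note that the merged jet acquires a non-zero mass even though its constituents are massless, which is one way to see that a jet is not a real particle.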
My research involved the study of fundamental particles called the top quark (discovered at Fermilab in 1995) and the Higgs boson (discovered at CERN in 2012), created with so much energy that they are kicked out of the interaction point with a velocity close to the speed of light. Shortly after being created, these particles decay into other species. If these are in turn quarks and gluons (as in the case I was studying), they will originate hadron jets of a particular class that I will refer to as boosted jets without much further explanation. Suffice it to say that by looking into the pattern of energy distribution inside these jets one can distinguish between a genuine top or Higgs jet and one created by some other process that would appear as a false positive. These boosted jets are at the center of many searches for new phenomena not described by the currently most-accepted theory, called the Standard Model. Anything new would be considered a huge step forward, but unfortunately so far nobody has obtained uncontroversial evidence on this front.
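For the curious, a common back-of-the-envelope rule (not something specific to my analysis) says that the decay products of a particle of mass m and transverse momentum pT are separated by an angular distance of roughly 2m/pT. The sketch below applies that rule to decide whether the decay products would fit inside a single jet. The jet radius of 1.0 and the function names are assumptions chosen for illustration, and the masses are approximate.

```python
TOP_MASS_GEV = 173.0    # approximate top quark mass
HIGGS_MASS_GEV = 125.0  # approximate Higgs boson mass

def approx_decay_separation(mass_gev, pt_gev):
    """Rule-of-thumb angular separation of the decay products: 2 m / pT."""
    return 2.0 * mass_gev / pt_gev

def fits_in_one_jet(mass_gev, pt_gev, jet_radius=1.0):
    """True if the decay products are expected to merge into one jet.

    'jet_radius' is an assumed value for a large-radius jet.
    """
    return approx_decay_separation(mass_gev, pt_gev) < jet_radius

# A top quark with pT = 500 GeV is expected to be contained in one jet,
# while a Higgs boson with pT = 200 GeV is not.
top_is_boosted = fits_in_one_jet(TOP_MASS_GEV, 500.0)
higgs_is_boosted = fits_in_one_jet(HIGGS_MASS_GEV, 200.0)
```

The higher the momentum, the closer the decay products get, until the clustering algorithm can no longer resolve them as separate jets.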
A serendipitous discovery or just a wrong calibration?
The first time I thought I had discovered a new phenomenon, I was trying to calibrate the parameters of an algorithm that classifies (“tags”) boosted jets as either originating from a top quark or not. My idea was to select events containing a pair of muons with opposite electric charge whose invariant mass is consistent with that of one of the carriers of the weak force, called the Z boson. This particle is easy to identify and is often regarded as a standard candle in the field of particle physics. Then, I selected a sub-sample of these events in which the Z was recoiling against a boosted jet. I thought it was a neat idea, since this way one could calibrate the momentum of the jet quite precisely. At this point, I wanted to look at the tag of the jet. Since events with a top quark produced in association with a Z are predicted to happen with an extremely small probability by Standard Model calculations, any of them appearing in my sample would necessarily be a false positive. And in fact there were a few. I was very happy! Next, I started to characterize their kinematics in order to provide what are called calibration constants. And then there was trouble. I noticed that some of these jets exhibited a mass distribution that was unusually high and peaked around a value roughly 500 times that of the proton (i.e. about 500 GeV). I also noticed that their momentum distribution was very different from what the simulations predicted, indicating a missing component that to my eyes really looked like the production of an unknown particle.
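The Z selection described above boils down to computing the invariant mass of the two muons and checking that it falls close to the Z mass. Here is a minimal sketch of that step. The `Muon` class, the function names and the 10 GeV half-width of the mass window are hypothetical choices of mine for illustration, not the values used in the actual analysis.

```python
import math
from dataclasses import dataclass

Z_MASS_GEV = 91.19       # approximate Z boson mass
MUON_MASS_GEV = 0.10566  # approximate muon mass

@dataclass
class Muon:
    pt: float    # transverse momentum in GeV
    eta: float   # pseudorapidity
    phi: float   # azimuthal angle in radians
    charge: int  # +1 or -1

def four_momentum(mu):
    """Return (E, px, py, pz) in GeV for a muon."""
    px = mu.pt * math.cos(mu.phi)
    py = mu.pt * math.sin(mu.phi)
    pz = mu.pt * math.sinh(mu.eta)
    energy = math.sqrt(px * px + py * py + pz * pz + MUON_MASS_GEV ** 2)
    return (energy, px, py, pz)

def dimuon_mass(mu1, mu2):
    """Invariant mass of the muon pair: m^2 = E^2 - |p|^2 of the sum."""
    e, px, py, pz = (a + b for a, b in
                     zip(four_momentum(mu1), four_momentum(mu2)))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def is_z_candidate(mu1, mu2, half_window_gev=10.0):
    """Opposite charge and mass within an assumed window around the Z."""
    if mu1.charge * mu2.charge >= 0:
        return False
    return abs(dimuon_mass(mu1, mu2) - Z_MASS_GEV) < half_window_gev

# Two back-to-back muons, each carrying about half of the Z mass.
mu_plus = Muon(pt=45.6, eta=0.0, phi=0.0, charge=+1)
mu_minus = Muon(pt=45.6, eta=0.0, phi=math.pi, charge=-1)
selected = is_z_candidate(mu_plus, mu_minus)
```

A pair with the same charge, or with a mass far from the Z peak, would be rejected by this selection.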
I discussed the matter with a colleague, who was intrigued but not as excited. This person suggested that I cross-check the result using another simulation, obtained with a program called Sherpa, which is known to give a more accurate description of a process called Z+3jets than the one I used (called POWHEG). If you haven’t heard of this story, it is because the suggestion was correct: the Sherpa simulation accounted perfectly for the missing component, which was in fact due to events containing a Z boson produced with very high momentum in association with three other particles (quarks and gluons). These were reconstructed as three jets so close to each other that they appeared as a single boosted jet. I put the bottle of champagne back in the fridge.
Do not trust QCD
The second time I thought I had discovered new physics, I was explicitly looking for new physics, which is a recipe for disaster. In fact, I’ve always steered clear of searches (as analyses of this kind are called in jargon) because of the intrinsic bias. Over many years, people have devised lots of statistical tools to avoid fooling themselves. Still, I didn’t want to trust my own judgement too much.
At the time I was supervising a student who was doing a great job looking for events in which a hypothetical particle called the vector-like top quark was created. In a fraction of the cases, this particle is predicted to decay into a top quark and a Higgs boson with high momentum. Armed with my experience with boosted jets, we started looking for this kind of signature in the data.
Soon I realized that a number of events featured a “bump” around 100 GeV in the mass distribution of the more energetic boosted jet. The feature could be made more evident by selecting a sub-sample of events in which the jet was tagged as originating from a Higgs boson and as containing at least one B-meson. That was the tell-tale sign of a particle decaying to a pair of bottom quarks. According to my expectation, there should have been none in the sample under scrutiny.
If you haven’t heard of this story, it is because my student was keen enough to run a simulation of QCD multi-jet events with high momentum. Although this kind of process is known not to be simulated correctly, owing to the difficulty of describing how the mass of the jets arises from more basic interactions, the distribution obtained matched the real data almost perfectly. Does this count as an explanation? I think so, under any reasonable standard. Still, if you asked me to explain exactly what happened, I couldn’t really say. Probably, the tagging algorithm picked up events in which the energy of the boosted jet was concentrated in a narrow region around its axis, and whose mass fell into the accepted “mass window”. I’ll never find out for sure.
I wanted to share these two stories because I think that one can learn more from a mistake than from a success. Now that I am no longer part of a scientific collaboration, I feel I am in a position to urge former collaborators to do the same when possible. This is true for science, and certainly for technology as well, where people working on neural networks and other wonders keep hitting their heads against a wall until something good happens. But the beauty of the process is in fact in the hundred times it didn’t work out. It’s by understanding why, and where our biases are, that we make real advancements.