According to folklore, more babies are born under a New Moon. I was also told that midwives have to cover more shifts during those days. I think this is a pretty simple statement that can be tested (falsified) with some basic statistics, the main problem being where to find good data. My answer? Facebook, of course!
First of all, what is a New Moon? It is the first phase (shape of the illuminated portion) of the Moon, and it happens when our satellite lies closest to the Sun in the sky as seen from the Earth. The mean time interval between new moons, called the synodic month, is approximately 29.5 days long.
Getting the data
To set up the analysis, we need two main components: the first is a sufficiently large collection of our friends’ birthdays, the second is a tool to match each birthday with the corresponding Moon phase.
Let’s start with the second, which is easier to deal with. According to Wikipedia, we can make use of an approximate formula. However, I decided to rely on a Python package called Astral. You can download it for free from its Git repository. The usage is pretty basic:
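from astral import *
import pytz
import datetime

a = Astral()  # assumed: the Astral object that provides moon_phase()
here = pytz.timezone("Europe/Rome")
# year, month, day: integers of the birthday being tested
moon_day = a.moon_phase(date=datetime.date(year, month, day), tz=here)
phase = moon_day // 7  # integer division: which quarter of the cycle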
You might need to install pytz as well (sudo easy_install <your package>); datetime is part of Python’s standard library.
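For comparison, here is a minimal sketch of what an approximate formula could look like. It is not the Wikipedia formula nor what Astral does internally: it simply counts the days elapsed since a reference new moon (assumed here to be 2000-01-06 18:14 UTC) and takes the remainder modulo the mean synodic month. The function names are made up for this example, and the result can be off by several hours because the real interval between new moons varies.

```python
import datetime

# Mean length of the synodic month, in days.
SYNODIC_MONTH = 29.530588853
# Assumed reference epoch: new moon of 2000-01-06 18:14 UTC.
REFERENCE_NEW_MOON = datetime.datetime(2000, 1, 6, 18, 14)


def moon_age_days(when):
    """Approximate days elapsed since the last new moon (naive UTC datetime)."""
    elapsed = (when - REFERENCE_NEW_MOON).total_seconds() / 86400.0
    return elapsed % SYNODIC_MONTH


def moon_quarter(when):
    """Map the moon age onto four quarters, 0 being the one that starts at new moon."""
    return min(3, int(4 * moon_age_days(when) / SYNODIC_MONTH))


print(moon_age_days(datetime.datetime(1985, 7, 14)))
print(moon_quarter(datetime.datetime(1985, 7, 14)))
```

The date in the usage lines is arbitrary. For a real analysis a dedicated package is the safer choice, since this shortcut ignores the variation of the lunar orbit.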
The tricky part is to get a list of birthdays from Facebook. I’ll try to cut a long story short: one has to make use of the Graph API to query the database and get a JSON array. I tried to put the instructions in my code, but I couldn’t find a quick way to make it work (dear app developers: how do I get the user access token?). Eventually, I copied and pasted the result from the Graph API web interface into a simple text file. To handle the array, I made use of Python’s json module:
import json

with open("birthdays.json") as f:
    birthdays = json.load(f)

for friend in birthdays['data']:
    if "birthday" not in friend:
        continue  # skip friends who do not share their birthday
    name = friend['name']
    fulldate = friend["birthday"].split("/")
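The split date still has to be turned into something usable. Below is a small sketch of that step, assuming the birthday strings come as MM/DD/YYYY and that some friends only share MM/DD (those are skipped, since without a year there is no Moon phase to compute). The helper name and the sample entries are invented for illustration.

```python
import datetime
import json


def parse_birthdays(raw_json):
    """Return (name, date) pairs for friends who list a full birthday."""
    parsed = []
    for friend in json.loads(raw_json).get("data", []):
        parts = friend.get("birthday", "").split("/")
        if len(parts) != 3:
            # Missing birthday, or month/day only: no year, so no Moon phase.
            continue
        # Assumption: the field is formatted as MM/DD/YYYY.
        month, day, year = (int(p) for p in parts)
        parsed.append((friend["name"], datetime.date(year, month, day)))
    return parsed


# Hypothetical sample, not real data.
sample = (
    '{"data": [{"name": "Ann", "birthday": "07/14/1985"},'
    ' {"name": "Bob", "birthday": "03/02"},'
    ' {"name": "Cid"}]}'
)
print(parse_birthdays(sample))
```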
What to Look For
This is of course the core of our analysis. The general statement can be translated into a “bump hunt”. This is a very important topic, especially in particle physics, but I think it might be overkill for our needs. At least for this post, I decided to use a simpler approach: to test a small number of models against our data. So what models do we want to test? Here’s my list:
- Constant function. There is no effect, birthdays are uniformly distributed and folklore is simply wrong.
- Main model: birthdays are more probable in the first synodic days. I’ll describe this distribution as a narrow gaussian peaked at zero with a width of 3 days, plus a constant function to account for the background.
- A less strict model: same as the main “folklore” model, but I consider the width of the gaussian as one of the parameters to be fitted.
- Periodic function. Maybe there are periodicities in the distribution that can be described in general by a cosine.
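The models in the list above can be written down as plain functions of the synodic day x. This is only a sketch with parameter names of my choosing (background, signal, width, amplitude, period, shift); the “less strict” model is the same function as the main one, with width left free in the fit instead of fixed at 3 days.

```python
import math

# Assumed length of the cycle in histogram days (one bin per day).
CYCLE_DAYS = 28.0


def constant_model(x, background):
    """No effect: the same expected count on every synodic day."""
    return background


def folklore_model(x, background, signal, width=3.0):
    """Constant background plus a gaussian bump centred on the New Moon (x = 0)."""
    return background + signal * math.exp(-0.5 * (x / width) ** 2)


def periodic_model(x, background, amplitude, period=CYCLE_DAYS, shift=0.0):
    """Constant background modulated by a cosine."""
    return background + amplitude * math.cos(2.0 * math.pi * (x - shift) / period)


# Arbitrary parameter values, just to show the shapes.
for day in (0, 3, 7, 14):
    print(day, folklore_model(day, 7.0, 3.0), periodic_model(day, 7.0, 1.0))
```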
To perform the fits, I decided to use CERN’s ROOT, a powerful open-source analysis framework. This might also be accomplished with Excel, but to my understanding fitting non-linear functions there is not trivial.
ROOT can be programmed in C/C++ or Python. The latter interface is just what we need. With this tool one can define histograms and fit functions, perform the fit, retrieve the parameters and estimate the level of agreement.
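from ROOT import TH1F

# one bin per day of the lunar cycle
h_counts_fine = TH1F("counts_fine", "counts (fine)", 28, 0.5, 28.5)
for friend in birthdays['data']:
    # year, month, day: integers parsed from friend["birthday"], as above
    moon_day = a.moon_phase(date=datetime.date(year, month, day), tz=here)
    h_counts_fine.Fill(moon_day)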
I performed three fits on the same data, aggregated in three different ways. First of all, I divided the birthdays into four categories, one for each Moon phase (New Moon, First Quarter, Full Moon, Last Quarter). This means that I re-binned the histogram to have exactly four bins. The result is quite clear: one can force the “folklore model” to fit the data, and it works, but with a penalty with respect to the simpler constant function (the uniform birthday distribution hypothesis).
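One common way to put a number on how well a model describes binned counts is the chi-square per degree of freedom. The sketch below does this for the constant model only, in plain Python rather than ROOT, so it is an illustration of the idea and not the fit used for this post. The counts are invented.

```python
def chi2_constant(counts):
    """Chi-square of observed counts against a flat (uniform) expectation."""
    expected = sum(counts) / float(len(counts))
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    # The flat level is taken from the data, which costs one degree of freedom.
    ndf = len(counts) - 1
    return chi2, ndf


toy_counts = [30, 25, 22, 23]  # illustrative numbers only, NOT the real data
chi2, ndf = chi2_constant(toy_counts)
print(chi2, ndf, chi2 / ndf)
```

A chi-square per degree of freedom close to 1 means the flat hypothesis is compatible with the counts; a model with more parameters has to improve the chi-square enough to justify the degrees of freedom it uses up.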
Still, one can ask whether our assumptions are too strong. Maybe a bump in the first phase is actually there, but our wide binning is hiding it. More bins mean a higher statistical uncertainty per bin, but let’s see what happens with 7 bins.
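To make the trade-off concrete, here is a sketch of re-binning a 28-day histogram and of the typical relative (Poisson) uncertainty per bin, 1/sqrt(N), for 195 birthdays spread evenly over the bins. The function names and the daily counts are placeholders of mine.

```python
import math


def rebin(counts, n_bins):
    """Merge adjacent bins so that len(counts) bins become n_bins bins."""
    if len(counts) % n_bins != 0:
        raise ValueError("n_bins must divide the number of input bins")
    group = len(counts) // n_bins
    return [sum(counts[i:i + group]) for i in range(0, len(counts), group)]


def relative_uncertainty(n_events, n_bins):
    """Poisson relative error, 1/sqrt(N), for the average bin content."""
    return 1.0 / math.sqrt(n_events / float(n_bins))


toy_daily = [7] * 28  # placeholder counts, NOT the real data
for n_bins in (4, 7, 14):
    print(n_bins, rebin(toy_daily, n_bins), relative_uncertainty(195, n_bins))
```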
We get an interesting result: we can force the folklore models to fit the data, but the signal strength goes basically to zero! Otherwise, we can fit the parameters of the gaussian and lower the background. In this case, though, the signal width is as large as a synodic month. The periodic function shows a similar behavior.
Last but not least, a very fine binning (14 bins) might yield some more insight into the actual distribution. It turns out that the conclusions are the same: no peak is found with our statistics.
Does this completely debunk the claim? My honest answer is “not exactly, but it’s very likely”. First, I had access to only 195 data points. Maybe they are enough, but someone more popular than me on social networks could lend me a larger dataset. Second, strictly speaking, the statistical probabilities (p-values) we obtained did not reject any model. A serious bump hunt would also assess the significance of the highest peak found. It could also be interesting to split the data by year or by month, and to look for broad-range periodicities.
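As a rough idea of how the significance of the highest peak could be assessed, one can throw many pseudo-experiments with uniformly distributed birthdays and count how often the fullest bin is at least as high as the one observed. This is a simple sketch of that approach, not something I ran for this post; the function name, the number of trials and the observed maximum in the usage line are all hypothetical.

```python
import random


def max_bin_pvalue(observed_max, n_events, n_bins, n_trials=2000, seed=1):
    """Fraction of uniform pseudo-experiments whose fullest bin reaches observed_max."""
    rng = random.Random(seed)
    hits = 0
    for _trial in range(n_trials):
        counts = [0] * n_bins
        for _event in range(n_events):
            # Under the null hypothesis every bin is equally likely.
            counts[rng.randrange(n_bins)] += 1
        if max(counts) >= observed_max:
            hits += 1
    return hits / float(n_trials)


# 195 birthdays in 14 bins; the observed maximum of 22 is a made-up value.
print(max_bin_pvalue(observed_max=22, n_events=195, n_bins=14))
```

Because the maximum is taken over all bins, the resulting p-value already accounts for the fact that a peak could have shown up anywhere in the cycle.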
p.s. In case you’re wondering, yes, I was born under a New Moon…