Showing posts with label birthdays. Show all posts
Showing posts with label birthdays. Show all posts

Thursday, 29 September 2016

For a Few Birthdays More

Okay, maybe it CAN be worse than Paint Your Wagon.

It's that time again. Time to solve some math problem involving birthdays. I've previously solved the Classic Birthday Problem, as well as the Near-Match, First Match (and First Near-Match), Same Birthday as You, and Matched Couples (and larger groupings) variations.

I'm going to call this the Cover All Birthdays variation. We're looking for the probability that a group of people has every day of the year covered for birthdays.

Let's start by looking at a simpler problem. Suppose that we're tossing rings at some pegs. In this scenario we're assuming that every peg has an equal probability of being hit and that every ring must land on a peg. For any given numbers of n rings and k pegs, what's the probability that every peg is hit at least once? It is impossible to hit k pegs with fewer than k rings, so there is an implicit assumption in this analysis that n k.

With only one peg, the problem is trivial because all rings necessarily land on that one peg. There is only one possible outcome irrespective of n: the probability of hitting all pegs is 100%.

Since rings can land on any peg, including on the same peg multiple times, the total number of possible outcomes in this ring toss scenario is equal to k to the n-th power. To calculate the probability of hitting all pegs we need to count all of the ways that we could have at least one lone peg. So let's look at a two-peg situation now. There are 2 to the n-th power possible outcomes with the ring tosses. If we have one lone peg, that means there's just one peg left to hit. We already know that there's only one way to hit just one peg. But now, either peg could be the one that gets all the rings, so with 2 pegs there are 2*1 = 2 possible ways to have a lone peg. That means there are 2^n - 2 ways of hitting each peg at least once. The probability of covering all the pegs is therefore [2^n - 2] / [2^n] = 1 - 2^(1-n).

Number of ways to throw n rings at 2 pegs, hitting all pegs at least once.

Probability of hitting all pegs at least once when throwing n rings at 2 pegs.


What happens when we add in a third peg? Again, we could have all the rings land on one peg. This time there are 3 different pegs that could get all the rings, so there are 3 * 1 = 3 ways to hit only one peg. We already figured out that there were 2^n - 2 ways of covering 2 pegs, but now there are three pegs we could choose the two from, or C(3,2) = 3 ways to pick two out of three pegs. That gives us a total of 3 * (2^n - 2) ways of covering exactly two out of three pegs. We're left with 3^n - 3*(2^n-2) - 3 ways of covering all the pegs, out of a total 3^n permutations. Therefore, the probability of covering all three pegs is:


Number of ways to throw n rings at 3 pegs, hitting all pegs at least once.

Probability of hitting all pegs at least once when throwing n rings at 3 pegs.

Let's now define a function F[n,k] as the number of ways of hitting each of the k pegs at least once from n tosses. That recursive relationship looks like this:

Recursive relationship defining the number of ways you can throw n rings hitting all k pegs at least once.

Probability of throwing n rings and hitting all k pegs at least once.

As you keep adding pegs you'll see that the equations quickly start to get lengthy. Using the recursive relationship we could take a brute force calculation approach to evaluate F[n,k], but that quickly gets to be very inefficient as k increases. We'd like a different form of the equation. If we expand and simplify for k = 4 and k = 5, we get the following:

Expanded and simplified expressions for F[n,k] and P[n,k] for k = 1 to 5.

Just for fun, let's plot those out as well so you can see how many tosses you need to have a reasonable chance of hitting all of the pegs.

Tabulated values of the probability of throwing n rings and hitting all k pegs at least once

Graph showing the probability of hitting all pegs at least once with n tosses.

In order to have at least 50% chance of hitting all of the pegs, you need only 2 tosses when there are 2 pegs, 5 tosses for 3 pegs, 7 tosses for 4 pegs, and 10 tosses for 5 pegs.

Now, going back to the expanded equations we wrote out, you might've spotted a pattern. We can define F[n,k] in closed-form using binomial coefficients and summation notation as follows:
  
Solution to the recursive relation for F[n,k].


Okay, so how does this relate to the Cover All Birthdays problem? Well if instead of n rings we have n randomly selected people and instead of k pegs we have 365 possible days a person could be born on, the problem is modeled exactly the same way. There are 365^n permutations of birthdays in a group of n people and there are F[n,365] ways of covering each possibility at least once:

Number of birthday combinations in a group of n people where every day of the year is covered at least once.


Dividing by 365^n to get the probability of covering all birthdays, we can make the following graph:


Probability of covering all 365 birthdays in a group of randomly selected people.

Size of group required to achieve the given threshold probabilities of covering all 365 birthdays.


For n < 365 we know the probability of covering all the birthdays is exactly zero. For n = 365, there's less than 1 in 6.873×10¹⁵⁶ chance of covering every day of the year. If we double that to n = 730, we increase the probability by a factor of nearly 3.45×10¹²⁸, but that's still a negligibly small chance of about 1 in 1.993×10²⁸. Chances of covering all birthdays are vanishingly small until approximately n = 1400. We need at least n = 2287 to exceed a 50% chance of covering all birthdays and at least n = 3828 to give us better than a 99% chance.

Saturday, 10 October 2015

A Fistful of Birthdays

What? Well, it can't be worse than Paint Your Wagon.

To commemorate my birthday, I'm going to solve yet another variation of the Birthday Problem. So far we've looked at the probability of any two people in a group having the same birthday, the probability of having a "near match", the probability of the first match or near-match as people join a group one at a time, and probability of matching or nearly matching a specific birthday (such as your own).

Suppose instead of looking to match any pair of individual birthdays, you want to match a pair of combinations of birthdays, like couples' birthdays or entire families' birthdays. For instance, how unusual would it be to discover that a couple has matching birthdays with another couple at a party?

Let's start with a little review from the classic birthday problem. What are the chances that two randomly selected people have the same birthday? Assuming that birthdays are evenly distributed among the 365 calendar days (let's neglect those leap-year birthdays) and everyone's birthdays are independent of each other, the probability is 1/365 (about 0.27%).

Well what about matching couples birthdays? We start by calculating the number of different possible combinations of birthdays that exist for two randomly selected people. Using the formula for combinations with repetition (Equation 1), where n = 365 days and m = 2 people, I calculate there are 66,795 possible combinations for a random couple. So there's a 1 in 66,795 (about 0.0015%) chance that two randomly selected couples will share the same birthdays between them.

Equation 1

This assumes that the order is not important; it is only necessary that the same birthdays be shared by both groups. For example, A1 & A2 are born January 3rd and February 5th. B1 & B2 would be considered a match if B1 is born January 3rd and B2 is born February 5th. B1 & B2 would also be considered a match if B2 is born January 3rd and B1 is born February 5th.

If there's a 1 in 66,795 that two couples have matching combinations of birthdays, that means there's a 66,794/66,795 probability that two couples don't have matching combinations of birthdays. Having taken up two combinations of birthdays with couples A and B means that a third couple, C, has a 66,793/66,795 chance of not sharing their combination of birthdays with either A or B. The probability of no shared combination of birthdays in the group of three couples is [66,794/66,795] * [66,793/66,795] (about 99.9955%). To find the probability of at least one successful match, simply subtract the probability of no shared birthdays from 100%. So about 0.0045% chance of a shared combination of birthdays among three randomly selected couples. Generalizing now to n randomly selected couples, we get this complicated-looking formula:

Equation 2: Probability of at least one shared combination of birthdays among n randomly selected couples.

While Equation 2 is correct, you're dealing with huge numbers that typical calculators simply can't handle, so it'd be good to have a different version of this expression. Using pi notation (product of a sequence), we can rewrite Equation 2 to get a version that's computationally easier to work with:

Equation 3: Alternate expression of Equation 2

And with that, we can crunch the numbers to figure out how many random couples you need to invite to the party to have a reasonable chance of a matching combination of birthdays.

Minimum number of couples required for a specified probability of at least one pair of couples with matching birthday combinations.

With 13 couples you have a little less than 0.12% chance of a match. You need 38 couples before there's at least 1% chance of a match, and 120 couples to have a 1 in 10 chance. At least one matching combination becomes more likely than not if there are at least 305 couples. There's less than 10% chance of not having a match if there are more than 555 couples. At 1,355 couples, you're practically guaranteed (better than 99.9999% probability) at least one match. Below is a pretty graph from n = just 1 couple (guaranteed no match since there are no other couples) all the way up to 66,796 couples (guaranteed to have at least one match by the pigeonhole principle).

Figure 1: Probability of at least one pair of couples with matching birthday combinations
in a group of
n randomly selected couples.
Note: horizontal axis is plotted on logarithmic scale. 

In case you were wondering, this analysis doesn't apply to groups of twins.
So the probability of finding at least two couples with matching birthday combinations is pretty unlikely in common scenarios like a dancing class or the office Christmas party, but there could be a reasonable chance at very large gatherings like a traditional Mediterranean wedding, and it's practically guaranteed at stadium-sized events like a concert or professional hockey game.

What if we want to know about matching birthday combinations involving a larger grouping, like a family of 4? Well if we go back to Equation 3 and replace the value 66,795 with the right-hand side of Equation 1, we'll have a general expression to calculate the probability that, among n randomly selected subgroups of m people, there's at least one pair of them having matching birthday combinations. That general formula looks like this:

Equation 4: General expression to find the probability of at least one pair of m-sized subgroups with matching birthday combinations among n subgroups.

So if you're interested in the probability of at least one pair of matching combinations of birthdays among 25,000 randomly selected families of 4, set m = 4 and n = 25,000 (p  34.0%). I've expanded the previous plot for m = 3, m = 4, and m = 6.

Figure 2: Probability of at least one pair of m-sized subgroups with matching birthday combinations
in a group of 
n randomly selected subgroups. 

As you probably guessed, as m increases, you need an increasingly large number of n subgroups to get to a reasonable probability of a match. It also quickly gets too difficult to solve with a personal computer unless you start using a more manageable numerical approximation. For m = 7 and up I switched to using Ramanujan's approximation of the factorial function (which is quite accurate) and used logarithms to reduce the size of the numbers involved.
By taking logarithms and using Ramanujan's approximation, the problem of solving for p begins to look more complicated, but is actually much easier for your computer to handle.

Fun fact: using logarithms simplifies math problems because exponentiation reduces to multiplication, multiplication reduces to addition, very large numbers become much smaller, and all the numbers you're working with become more similar in size. Before calculators and computers, most scientists and engineers were quite familiar with this trick and regularly referred to printed handbooks containing tables of logarithms in order to complete their calculations.


Equation 5: Calculate p after you've found the approximate solution to ln q.
I used the above approximations to calculate ln q and the false position method to find the minimum value of required to have either a 50% or 95% probability of finding a matching pair:

Minimum number n for 50% or 95% probability of a matching pair of birthday combinations for m-sized subgroups

I also decided to work out what's the largest subgroup size m where we would have a reasonable chance of finding a matching combination of birthdays in another subgroup, assuming all subgroups are selected from the entire population of Earth. Set n = 7,300,000,000/m, vary m, and solve for the probability of finding at least one matching pair on Earth.

Just based on the pattern observed in Figure 2, I was able to extrapolate and guess that the largest subgroup size m would probably be 9 (there's almost an order of magnitude increase in n to get to the steeply sloped part of the curve as m increases by 1). In fact, for m = 8 and n = 912,500,000 random groups of 8, the probability of a matching pair of birthday combinations is infinitesimally close to a mathematical guarantee (slightly more than 99.999999999999999999964%). For m = 9 and n = 811,111,111 random groups of 9, the probability of a matching pair of birthday combinations is just over 60.99%. For m = 10 and n = 730,000,000, the probability of finding a matching pair of birthday combinations is a little under 2.02%. At m = 11 and n = 663,636,364, the probability of finding a match is about 1 in 2,024.

In summary:  Finding a pair of matching individual birthdays at the office: fairly likely. Matching couples' birthdays at a party? Maybe, but it'd probably need to be a pretty large party. Matching the combination of birthdays in a group of 10 or more? Nearly impossible, even if the whole planet is involved.

Friday, 9 May 2014

The Birthday Problem: Round 4

We've looked at the probability of any two people having the same birthday in a group. We've also looked at the probability of two people having a "near match" in a group. We then looked at the probability of the first match or near-match as people join a group one at a time. But what happens when we want to match (or nearly match) a specific birthday (your own for example).

Matches to your birthday will follow a binomial distribution, which has a probability mass function:

Equation 1

where q is the number of trials, r is the number of successes, and p is the probability of success. If we're trying to match your birthday in a random group of people, r is any whole number greater than zero, p is (2k+1)/365 chance of an individual matching your birthday to within k days, and q is one less than the number of people in the group. If we substitute (n-1) = q, (2k+1)/365 = p, and r = 0 into Equation 1, we'll get the probability of NOT matching (or nearly matching) your birthday in a group of n people. That gives us Equation 2 below:

Equation 2

Subtract from 100% to get the probability of at least one match to your birthday within k days among a total of n people. After simplifying, we get Equation 3:

Equation 3
Plotting for different values of k and n (k = 0 if you're looking for a same-day birthday match) gives a graph like this:


It's hard to see the full picture with both k = 0 and k = 30 on the same plot, so here's another one:


We can also make a table showing the minimum size of the group for a corresponding probability that your birthday will be matched or nearly matched:


As you can see, matching your birthday is far less likely than matching any birthday. The results might be surprising though. Recall that we need only 23 people for there to be better than a 50% chance that two people in the group had matching birthdays. But in a group of you and 22 other people, there's only a 5.86% chance that your birthday is shared with someone else in the group. You need at least 253 other people in the group before it is more likely than not that your own birthday is shared with someone else.

We can also repeat the analysis from the third birthday post and figure out, if people entered randomly one at a time, who would be most likely to match your birthday. That perhaps isn't so interesting because the first person to enter after you always has the greatest chance of being the first person to match your birthday. The incremental probability of matching your birthday in a group of n versus n+1 people (i.e. the probability that the next person matches your birthday) is equal to:

Equation 4

It can be shown that Equation 4 is maximized by n = 1, irrespective of k. Therefore, the first random person joining you at the party is the most likely person to match your birthday, with a probability of a match simply equal to:
Equation 5

For completion, here is the plot of equation 4 and the tabulated data to show the probability of the first person entering the group after you matching your birthday.



In short, matching a specific birthday in a random group of people is uncommon and generally requires a large sample, but it's not unusual for any two people in a small group of random people to share a birthday. And now there will be no more talk of birthdays for a long time.

Friday, 2 May 2014

Another Variation of the Birthday Problem

In my first post on the Birthday Problem we determined the probability of there being a shared birthday in a group of n random people. In my second post, we expanded that to look at a more general problem of having a birthday within k days time of another birthday in a group of n random people.

This time, we'll be looking at the problem of determining which person in a list is most likely to share a birthday with the people before him. In other words, if randomly selected people enter a room one at a time, which person is most likely to be the first one to share a birthday with someone already in the room? Perhaps you'd like to know because you're looking for a gambling opportunity, betting your friends on which person arriving at a party will share a birthday with someone already there.

The probability of there being a shared birthday among a randomly selected group of n people can be calculated from the expression below:

Equation 1

Suppose now we add one more person. The probability of there being a shared birthday in a group of n+1 people is:

Equation 2

The incremental change in the probability of there being a shared birthday in a group of n people versus n+1 people is the probability that the (n+1)th person is the first person to share a birthday with someone already in the group. So subtracting the first expression from the second gives us the probability that the (n+1)th person added to a group of n people will share a birthday with someone already.

Equation 3

Now if we want to know what value of n maximizes this probability, we can plot the function to find out. I've done that here:



The peak is at n = 19. So, at 3.23%, the 20th person to enter a random group of people has the greatest chance of being the first one to share a birthday with someone already in the group. 

Suppose now we repeat this analysis for the more general case of birthdays within k days time of each other. The probability of there being birthdays with k days of each other among a group of n people is:


Equation 4

And in a group of n+1 people, the probability is:

Equation 5

Just as before, we can subtract Equation 4 from Equation 5 to get the probability that the (n+1)th person is the first person with a birthday close to one already in the group:

Equation 6

Plotting this beastly looking function will allow us to see which person entering the room is most likely to be the one having a birthday close to a birthday of a person already at the party. 



The table below summarizes the information from all the peaks in the above figure.


There you have it. As random people enter a room, the 3rd is most likely to be the first to have a birthday within a month of a person already at the party. The 6th is most likely to be the first to have a birthday within a week. The 12th person is most likely to have a birthday within one day, and the 20th is most likely to have the same birthday as someone already at the party. 









Friday, 25 April 2014

More Fun with Probability and Birthdays

In my previous post I examined the probability of there being a shared birthday in a random group of n people. Let's take this analysis a little further and look at the probability of there being birthdays within a certain proximity to each other, say for example within 4 days time. After all, it is common to hold birthday celebrations on a weekend when they're more convenient, so two people with a birthday within 4 days of each other could conceivably choose to celebrate their birthdays on the same day. This problem gets a little more complicated than the same-day birthday. The probability of two randomly selected people having a birthday within 4 days of each other is 9/365 (about 2.47%), because B could have his birthday on the same day as A, or in any of the four days before, or in any of the four days after. Just as before, it is simpler to calculate the probability of there being no shared birthday as we increase our group size and then subtract from 100%. When we add a third person we now have two possibilities. One possibility is that A and have their birthdays at least 9 days apart, in which case there are 18 days where C can't have his birthday. A second possibility is that A and have their birthday more than 4 days apart but less than 9 days apart, in which case there is some overlap and there could be as few as 14 days eliminated for C. This brackets the probability of there being no birthdays close together in a group of three people to somewhere between 6.20% and 7.28% (exact solution is 7.26%; formula is given a few lines down). Adding a fourth similarly means there are as few as 19 and as many as 27 days eliminated, bracketing the probability of there being no birthdays close together in a group of four people to between 11.09% and 14.13% (exact solution is 14.08%).

There is an exact solution that accounts for the probability of the overlaps in the spacing between birthdays, but the explanation of how that's accomplished would be a little math intensive and there's already more than enough math in this post. Just take my word for it that the total possible permutations of n people's birthdays spaced at least k days apart from each other is equal to:
which, in factorial notation, looks like this:

With this result we can move on to calculating the probability that no birthdays in a group of random people are within k days of each other. Subtract that from 1 and you get the the probability that there is at least one birthday in the group within k days of another person's. That equation looks like this:

It looks more complicated than the formula for probability of same-day birthdays, but the general form is the same. It converges to 1 pretty rapidly as n increases. And the larger k is, the faster it converges. If we take k = 0, the equation reduces to the earlier expression for birthdays on the same day. In the graph and table below you can see the results for yourself.



As you can see, among just 10 people there's a pretty good chance there are two or more birthdays in the same week. What I think is pretty cool is the huge difference between k = 0 and k = 1. For instance, in a group of 25 people, there's about 57% probability that two people have the same birthday, but almost 93% probability that two people have birthdays within one day of each other (i.e. on the same day or on consecutive days). Test it out among your co-workers or a sample of your Facebook friends and see for yourself. If you're clever, you might even be able to use this knowledge to make some bets and relieve a few naive people of their money.