## Sunday 27 July 2014

### Monte Carlo Simulation of Western 649

Someone recently expressed disbelief to me that Western 649 is actually a smarter gamble than a typical 50/50 draw, so I decided to run a short Monte Carlo simulation to prove that my previous analysis based on probability theory was correct. I used Excel to generate 500,000 random numbers between 1 and 13,983,816 to simulate one Western 649 draw with 500,000 random combinations played. The criteria for determining which of my simulated numbers were winners were as follows:

A total of 3,000 draws (each with 500,000 random combinations played) were simulated. The tables below summarize the results of my simulation:

 Number of winners in the simulated Western 649 draws

 Summary of WCLC's profit and house edge, and Western 649 expected value

In the first table, we see that the expected values generally agree well with the averages from the simulation, apart from there being a few more jackpot winners than expected from probability theory. At most I had three winners sharing the $50,000 prize, which from my previous analysis I calculated had about a 1 in 753 chance of occurring in a draw with 500,000 random combinations. I also had two winners sharing the $1,000,000 prize in one of my simulated draws, which I calculated previously to have about a 1 in 1,621 chance of occurring.
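A comparable simulation can be sketched in Python instead of Excel. The mapping from random number to prize tier below is an assumption made for this sketch: each number from 1 to 13,983,816 stands for one combination, and the lowest numbers are assigned to prize tiers in contiguous blocks sized by the number of winning combinations in each tier (1, 6, 252, 13,545 and 246,820). The names `TIERS`, `tier_for` and `simulate_draw` are illustrative only.

```python
import random
from collections import Counter

TOTAL_COMBINATIONS = 13_983_816  # C(49, 6)

# Assumed mapping: number 1 wins 6/6, the next 6 numbers win 5/6 + bonus, etc.
# Block sizes are the counts of winning combinations in each prize tier.
TIERS = [
    ("6/6", 1),
    ("5/6 + bonus", 6),
    ("5/6", 252),
    ("4/6", 13_545),
    ("3/6", 246_820),
]


def tier_for(number):
    """Return the prize tier for a number in 1..TOTAL_COMBINATIONS, or None for a loss."""
    upper = 0
    for name, size in TIERS:
        upper += size
        if number <= upper:
            return name
    return None


def simulate_draw(plays=500_000, rng=random):
    """Simulate one draw with `plays` random combinations; return winners per tier."""
    winners = Counter()
    for _ in range(plays):
        tier = tier_for(rng.randint(1, TOTAL_COMBINATIONS))
        if tier is not None:
            winners[tier] += 1
    return winners


if __name__ == "__main__":
    rng = random.Random(649)  # fixed seed so the run is repeatable
    print(simulate_draw(plays=500_000, rng=rng))
```

Repeating `simulate_draw` a few thousand times and averaging the counts per tier gives figures that can be compared against the expected number of winners from probability theory.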

### Probabilities and Prizes

There are 5 different ways to win a prize playing Western 649, ranging from $10 to $1,000,000. The first and second prizes are capped at a total of $1,000,000 and $50,000 respectively, meaning that if there are multiple winners, the prize is split between them.

There are 13,983,816 unique combinations of six numbers that can be chosen from the numbers 1 to 49. The number of combinations of items you can choose from a set without repetition is calculated using the following formula:

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

where k is the number of items chosen, n is the total number of items to choose from, and ! denotes the factorial function. In this case, n = 49 and k = 6.

The probability of matching some number of the winning 6 can be calculated from the hypergeometric distribution. The formula looks like this:

$$p = \frac{\binom{k}{r}\binom{n-k}{k-r}}{\binom{n}{k}}$$

Equation 1 (general lottery prize probability formula)

where p is the probability of winning, n is the size of the set of numbers the winners are drawn from, k is the size of a combination, and r is the number of matching numbers required to win the prize. For example, to calculate the probability of matching four numbers in Western 649, r = 4, k = 6, and n = 49. Plugging in the numbers gives the probability of approximately 0.0969%, or about 1 in 1,032.
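The worked example can be checked with a few lines of Python. This is a minimal sketch that assumes Equation 1 is the standard hypergeometric form (choose r of the k winning numbers and k − r of the n − k non-winning numbers); the function name `match_probability` is illustrative.

```python
from math import comb


def match_probability(n, k, r):
    """Probability that a k-number play matches exactly r of the k winning numbers drawn from n."""
    return comb(k, r) * comb(n - k, k - r) / comb(n, k)


# Matching four numbers in Western 649: r = 4, k = 6, n = 49
p = match_probability(n=49, k=6, r=4)
print(f"{p:.4%}, about 1 in {1 / p:,.0f}")
```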

The second prize adds a complication because of the bonus number. The formula to calculate the probability of matching r out of k winning numbers plus a bonus number (also drawn from the pool of n numbers) is a slight modification of Equation 1:

$$p = \frac{\binom{k}{r}\binom{n-k-1}{k-r-1}}{\binom{n}{k}}$$

Equation 2 (general lottery with bonus number prize probability formula)

Here is a table summarizing Western 649's payouts and their probabilities:

Since matching 5 out of 6 numbers has two different prizes depending on whether or not you match the bonus, the probability has to be adjusted for overcounting. The probability of matching 6 of 6, 4 of 6, or 3 of 6 numbers is calculated from Equation 1. The probability of matching 5 of 6 plus the bonus is calculated from Equation 2. The remaining prize, matching 5 of 6 but not the bonus, is calculated from Equation 1 minus Equation 2.

The last thing we need is the probability that your wager is just money thrown away. That's 100% minus the total of probabilities of winning the prizes in the table above, which works out to exactly 245,057/249,711 (about 98.1%). This means that a Western 649 number combination has only a 1.9% chance of not losing.
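The counting described above can be reproduced with exact fractions. The sketch below assumes the five prize tiers named in the text and counts winning combinations for each, subtracting the bonus overlap from the 5 of 6 tier; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

N, K = 49, 6
total = comb(N, K)  # 13,983,816 combinations


def ways(r):
    """Combinations matching exactly r of the 6 winning numbers (Equation 1 numerator)."""
    return comb(K, r) * comb(N - K, K - r)


# 5 of the 6 winning numbers, with the sixth number being the single bonus number
five_plus_bonus = comb(K, 5) * 1

winning_ways = {
    "6/6": ways(6),
    "5/6 + bonus": five_plus_bonus,
    "5/6": ways(5) - five_plus_bonus,  # remove the overlap with the bonus tier
    "4/6": ways(4),
    "3/6": ways(3),
}

p_lose = 1 - Fraction(sum(winning_ways.values()), total)
print(winning_ways)
print(p_lose, float(p_lose))
```

Using `Fraction` keeps the result exact, so the losing probability can be compared directly with 245,057/249,711.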

### Expected Value

The precise expected value is a little tricky because the payout on the top two prizes depends on how many winners there were. Let's first start by calculating the contribution to the expected value coming from the guaranteed prize amounts. Each contribution is simply the net gain (prize minus wager) multiplied by the probability.

Next, let's calculate how much the losing tickets contribute to the expected value. Again, it's the net gain multiplied by the probability, or (-$1) * (245,057/249,711). That's about -$0.9814. At this point, the expected value of a ticket, ignoring the top two prizes, is about -$0.5321.

The last thing we have to do is figure out how much the top two prizes contribute to the expected value. That depends on how many winners there are. We don't know how many winners there are, but if we know how many combinations are played, we can estimate how likely there are to be 0, 1, 2, etc. winners. This is calculated using the binomial distribution. If the probability of winning the prize is p, the probability of not winning the prize is (1-p), and N combinations are played, then the probability of there being s winners of the shared pot is:

$$P(s) = \binom{N}{s}\, p^{s}\, (1-p)^{N-s}$$

What do we need this for? Well, the prize depends on the number of winners, and the number of winners is a probabilistic function of the number of tickets sold. Therefore, we need to estimate an expected value of the prize. I've estimated that the number of combinations played in each Western 649 draw is on the order of 500,000. Accordingly, I've estimated the expected top prize based on there being up to 5 winners (more than 5 winners is too unlikely to be worth consideration). I've also checked N = 100,000 and N = 2,000,000 to show that the final result is not that sensitive to the accuracy of my estimate of N.

Probability of s winners of the top prize in a draw where N random combinations are played.

Expected prize amount estimated based on N combinations played.

The expected value of the top prize is $1,999,373.32 if 500,000 combinations are played in each draw. Remember that we're working with normalized values, so that's the prize money for a winning combination divided by the cost to play one combination.
So for the top prize, the effect of prize sharing is pretty small, assuming my estimate of 500,000 combinations played per draw is reasonable. As you can see from the table, even if I was off by a factor of four and 2,000,000 combinations are played, the expected prize is still within 0.5% of the single-winner prize amount.
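The expected prize calculation can be sketched as follows. The weighting scheme here is an assumption inferred from the quoted figure, which it reproduces: each per-winner share (pot divided by s) is weighted by the binomial probability of s winners, and the s = 0 term uses the full pot. The normalized pot of 2,000,000 is likewise implied by the quoted expected prize rather than stated directly, and the function names are illustrative.

```python
from math import comb


def prob_winners(s, plays, p):
    """Binomial probability of exactly s winners among `plays` random combinations."""
    return comb(plays, s) * p**s * (1 - p) ** (plays - s)


def expected_prize(pot, plays, p, max_winners=5):
    """Weight the per-winner share (pot / s) by the probability of s winners.

    Assumption: the s = 0 term uses the full pot, i.e. the amount a lone
    winner would receive.
    """
    return sum(
        prob_winners(s, plays, p) * pot / max(s, 1)
        for s in range(max_winners + 1)
    )


p_top = 1 / 13_983_816
for plays in (100_000, 500_000, 2_000_000):
    # Assumed normalized top prize pot of 2,000,000 per $1 wagered
    print(plays, round(expected_prize(2_000_000, plays, p_top), 2))
```

Calling `expected_prize` with a different pot, win probability and `max_winners` applies the same weighting to any other shared prize.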

Working through the same process for the second prize, this time considering up to 8 winners because the likelihood of multiple second prize winners is higher, we get the following table:

Probability of s winners of the second prize in a draw where N random combinations are played.

Expected prize amount estimated based on N combinations played.

You can see from the difference between N = 500,000 and N = 2,000,000 that the expected payout for the second prize is a little more sensitive to the accuracy of my estimate of N. But it is still not excessively sensitive, so our estimate is probably still pretty close to correct.

Now that we have the expected prize amounts for the top two prizes, we can finally calculate the contribution of the top two prizes to the expected value of a single combination played. Based on 500,000 combinations played per draw, the top prize contributes ($1,999,373.32 - $1) * (1/13,983,816) = $0.1430 to the expected value. The second prize contributes ($98,977.43 - $1) * (1/2,330,636) = $0.0425 to the expected value. Summing up all components, we get the expected value of -$0.3467 per $1 wagered. This means that, on average, for every dollar you spend playing Western 649, you're giving about 34.7 cents to the WCLC.
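The final sum can be checked directly from the figures quoted above. This is only arithmetic on the stated numbers; the variable names are illustrative.

```python
# Figures quoted in the text, normalized to a $1 wager
ev_without_top_two = -0.5321               # guaranteed prizes plus losing tickets
top = (1_999_373.32 - 1) / 13_983_816      # top prize contribution
second = (98_977.43 - 1) / 2_330_636       # second prize contribution

expected_value = ev_without_top_two + top + second
print(round(top, 4), round(second, 4), round(expected_value, 4))
```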

What happens if our estimate of N is wrong? Here's a table to show you:

Basically, the expected value gets worse with increased ticket sales because of prize sharing. However, as long as we're within a reasonable range of estimates of N, the expected value of Western 649 doesn't stray far from about -$0.35.

From our analysis we can draw the following conclusions:

1. A Western 649 number has less than a 2% chance of winning anything.
2. Western 649's expected value is a function of the number of tickets sold, but is probably around -$0.35 per $1 wagered.
3. Based on expected value, Western 649 is actually one of the smarter purchases among different lottery tickets. The expected value is significantly better than, say, Pick 3 or Extra.

### Addendum

In response to someone's disbelief in the accuracy of my analysis, I've verified the above results with a simple Monte Carlo simulation in this post.

## Saturday 19 July 2014

### Gambling and Expected Value: Encore (OLG)

In this post on Gambling and Expected Value, we look at the "Encore" lottery offered by the OLG. Click here to find similar posts on other lotteries and games of chance.

### Encore (OLG)

Encore is a lottery offered by the Ontario Lottery and Gaming Corporation (OLG).

### How the Game Works

Encore is an add-on lottery you can play if you are already playing another lottery (Lotto 649, Lotto Max, Lottario, Ontario 49, Pick 2, Pick 3, Pick 4, or Daily Keno). The player chooses how many wagers he or she wishes to make (you can choose to play 1 to 10 combinations per ticket). The player doesn't get to choose the numbers though; a random string of seven digits (from 0 to 9) is chosen for the player for each play. Winnings are determined by matching particular digits in the sequence.

### Probabilities and Prizes

There are 22 different ways to win a prize playing Encore, ranging from $2 to $1,000,000 on a $1 wager. Figuring out their respective probabilities isn't too difficult, but does require some care to avoid overcounting.
I've made the table below to help you see how to calculate how many possible sequences of numbers could win a particular prize.

Since the sequence consists of 7 randomly selected digits that can be repeated, there are a total of 10,000,000 (10 raised to the 7th power) possible number combinations that can be played. The probability of winning a particular prize is the number of ways to get a match divided by the total number of possible plays. For instance, the probability of winning the prize for matching only the last 4 digits is 891 divided by 10,000,000 (i.e. 0.00891%, or approximately 1 in 11,223). The table below shows the payout and the likelihood of winning for each of the 22 ways a player can win at Encore.

Subtracting the probabilities of the 22 ways to win from 100% gives the probability of losing at Encore. The probability of losing works out to exactly 891 in 1,000, or 89.1%.
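The two figures quoted above (891 sequences for "last 4 only" and an 89.1% chance of losing) can be reproduced by enumerating match patterns instead of all 10,000,000 sequences. This sketch assumes that a matching run at the start of the sequence counts toward a prize only from 2 digits, while a matching run at the end counts from 1 digit; that rule is inferred from the quoted figures, not taken from the prize table. The function names are illustrative.

```python
from itertools import product

DIGITS = 7


def runs(pattern):
    """Lengths of the matching runs at the start and end of a match pattern."""
    lead = 0
    while lead < DIGITS and pattern[lead]:
        lead += 1
    trail = 0
    while trail < DIGITS and pattern[DIGITS - 1 - trail]:
        trail += 1
    return lead, trail


def count_sequences(condition):
    """Count 7-digit sequences whose (lead, trail) match runs satisfy `condition`."""
    total = 0
    for pattern in product((True, False), repeat=DIGITS):
        if condition(*runs(pattern)):
            # each mismatched position can hold any of the 9 wrong digits
            total += 9 ** pattern.count(False)
    return total


# Assumption: a leading run counts only from 2 digits, a trailing run from 1 digit.
only_last_4 = count_sequences(lambda lead, trail: trail == 4 and lead < 2)
losing = count_sequences(lambda lead, trail: trail == 0 and lead < 2)
print(only_last_4, losing / 10**DIGITS)
```

Each of the 128 match patterns is weighted by the number of digit sequences that produce it, which avoids overcounting because every sequence belongs to exactly one pattern.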

Something you should note looking at the table of payouts and probabilities is that the prize doesn't consistently scale with the probability of occurrence. For instance, matching the First 2 and Last 4 digits has the same probability as matching the Last 6 digits, but the former pays only about 0.1% as much as the latter.

### Example

The following four combinations were played in an Encore draw where the winning number was 5554321.
'A' = 5556789
'B' = 5556781
'C' = 7777321
'D' = 4321555

'A' wins a $10 payout for matching only the first 3 digits. 'B' wins a $12 payout for matching the first 3 and the last 1 digits. 'C' wins a $10 payout for matching only the last 3 digits. 'D' loses.

### Expected Value

The expected value is the sum of the products of the probability of each outcome and the net monetary gain of that outcome. The net monetary gain is the payout less the wager. Encore has 23 possible outcomes (22 ways to win and 1 way to lose), so the expected value of Encore is the sum of 23 terms. The expected value of Encore is:

Expected value calculation for the OLG lottery "Encore".

The expected value of Encore is -$0.486 per $1 wagered. This means that, on average, for every dollar you spend playing Encore, you lose 48.6 cents. As always, it's smarter not to gamble at all, though an expected value better than -$0.50 per $1 wager is slightly better than most lotteries. Therefore, if you are going to buy lottery tickets anyway, Encore appears to be one of the better choices.

From our analysis we can draw the following conclusions:

1. Encore has an expected value of slightly better than -$0.50 per $1 wager, which is comparable to other lotteries and typical 50/50 draws.
2. Accordingly, Encore is similar to other lotteries in that you will generally lose money rapidly if you play.
3. Encore is neither the best choice, nor the worst choice, as far as OLG lotteries go.
4. Most of Encore's 22 ways to win have a very low probability of occurrence.
5. Encore's "22 ways to win" is a clever marketing trick to disguise the fact that there's an 89.1% chance you'll lose playing Encore.
6. Encore's prizes do not scale with their probability of occurrence. Most of the prizes are much smaller than one should expect given their probability of occurrence.

## Sunday 13 July 2014

### Will Women Outrun Men?

In 1992, Drs.
Brian Whipp and Susan Ward of the University of California published an article in the journal Nature claiming that women would be beating the men at the marathon at the world elite level as early as 1998. In fact, they claimed that women would be outrunning the men in all the events by the middle of the 21st century. Needless to say, their predictions were bad. But that didn't discourage Tatem et al., a group of doctors from the University of Oxford, from publishing their own article in Nature in 2004. Tatem et al. essentially repeated the same analysis, and arrived at essentially the same conclusion, as Whipp and Ward. Tatem et al. focused on the 100 m dash, though, and had a few more years' worth of data to work with. However, they still concluded that women will beat men in the 100 m event at the 2156 Olympics.

Tiki Gelana won the 2012 Olympic marathon in Olympic record time, but still 15 minutes behind the winner of the men's event. She would have placed 64th in the men's race.

It's now 2014 and a woman has yet to beat all of the men at any event at any world-class track & field meet. Twenty-two years after the blunderous predictions of Whipp and Ward, women still aren't threatening to break any of the men's world records any time soon.

So how exactly did these doctors arrive at such bad conclusions? They were guilty of a gross misuse of statistics. How did bad statistics get published in Nature in 1992? I don't know. My guess is that the authors lucked out and got a peer reviewer who also knew nothing about statistics. How did essentially the same argument get published in Nature again in 2004? I wish I knew that too. It was a "double fail" for Nature's peer review process.

Anyway, I'm going to walk you through how to analyze world record progressions just like Tatem et al. and Whipp & Ward, then provide some reasoning to demonstrate how ridiculous those authors' conclusions were.

Step 1: Obtain the historical progression of world records from today to as far back as you can go. Whipp & Ward used records from the early 1900s up to 1992 (the IAAF started keeping records in 1912). Tatem et al. used 1912 to 2004. I'll go from as far back as I can find data for, though arguments can be made for ignoring data prior to 1912. The main one is that the IAAF hadn't yet formed and therefore older records are not ratified as world records. A weaker argument is that Excel doesn't recognize dates from before the year 1900. That problem is easily overcome though; you just have to get a little creative with how you plot the dates.

By the way, finding the data for this analysis was harder to do back when our learned doctors were drafting papers for Nature, but today we have Wikipedia and all the world record progressions (at least the ratified records anyway) are found with ease. For example, here's the men's 100 m record progression. I gathered my data from here, which includes many non-ratified records. I've cross-checked some of the results to verify accuracy using this extensive database of track results.

Step 2: Plot the record progression on a graph in Excel. Plot either the result or the average velocity on the y-axis and the date the record was set on the x-axis. Do this for both men's and women's records. I've plotted the 100 m and marathon world record progressions below:

Normalized world record progression for the 100 m dash and marathon.

Step 3: Use the "Add trendline" feature in Excel to add best fit lines to the men's and women's records. Here's my plot again, but with the trendlines added:

Normalized world record progression with trendline forecasting for the 100 m dash and marathon.

Step 4: Extrapolate the best-fit lines and calculate the date when the men's line intersects the women's line. Boldly conclude that women will be beating men at the Olympics by the date you've calculated.
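Steps 3 and 4 can be sketched outside of Excel as an ordinary least-squares fit followed by solving for where the two lines cross. The numbers below are synthetic and made up purely to exercise the code; they are not real world records, and the function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def intersection_year(line_a, line_b):
    """Year at which two fitted lines, each given as (slope, intercept), cross."""
    (m1, b1), (m2, b2) = line_a, line_b
    return (b2 - b1) / (m1 - m2)


# Synthetic, made-up average speeds in m/s (NOT real world records).
years = [1950, 1970, 1990, 2010]
men = [9.0, 9.2, 9.4, 9.6]      # improves 0.01 m/s per year
women = [8.0, 8.4, 8.8, 9.2]    # improves 0.02 m/s per year

year = intersection_year(fit_line(years, men), fit_line(years, women))
print(round(year))
```

The mechanics are trivial, which is part of the problem: nothing in the calculation checks whether a straight line is a sensible model for the data.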
I've calculated the intersections of all my trendlines and generated the table below:

Predicted dates for intersection of men's and women's world records.

Step 5: Write up your findings in a short article and submit it for publication in Nature. Here is a list of conclusions you can pretend are supported by the data:

1. The 5000 m world record will be the last of the men's records to fall to the women. It probably won't happen until 2178.
2. Women's records will exceed the men's records in eleven of the thirteen events by the year 2050.
3. We are 20 years overdue for the intersection of the men's and women's records in sprint hurdles, based on average speed to complete the event (the sprint hurdles are 100 m for women and 110 m for men).
4. We are 10 years overdue for the intersection of the men's and women's records in the 10,000 m.
5. We are 8 years overdue for the intersection of the men's and women's records in the 4x400 m relay.

So, there you have it. Writing a paper for Nature can be just that easy. But you've probably already realized that something's amiss. Let's look at a few reasons why this kind of "analysis" is completely irrational.

Reason #1: The models predict that marathon runners will eventually run faster than sprinters.

If you look at the slopes of the best-fit lines for each record progression, you'll find that most of the long distance events have steeper slopes than the sprint events. This suggests that the marathon record will eventually represent a higher average speed than the 100 m world record. I guess the marathoners will be content to use their incredible speed and super stamina only to win marathons, leaving the 100 m event to slower, less capable humans. Here are all the slopes of the lines:

Average rate of improvement in world record performances, expressed in metres per second per year.

The women's 3000 m steeplechase record has the fastest rate of improvement, suggesting that a woman running nearly two miles and over several 30-inch high barriers will one day be the fastest human on the planet. The men's 100 m record has the lowest rate of improvement, suggesting that the men's 100 m record holders will eventually have the slowest average speed of all male and female record holders in any track event. Below are all the dates of intersection with the men's 100 m world record progression:

As you can see, in most of the events, our models predict that a woman will be outpacing the men's 100 m world record holder by the year 2100. The men running in other events will also outpace the men's 100 m world record, but it will take them, on average, 100 years longer to do it than the women.

Reason #2: A linear model to predict how fast the world's fastest human can run at a given time makes no physical sense and has no basis in reality.

Why does a linear model not make sense? Well, to start, a sloped line has to cross zero somewhere, meaning that the model suggests there was a time in history when the world's fastest human was stationary. If you go further back in time, the model predicts negative speeds. Speed is a non-negative quantity, so negative speed has no physical meaning.

A linear model also increases without bound; it suggests that there is no upper limit to how fast a human can run. Obviously, that cannot be true. The speed of light is definitely beyond reach, but a linear model suggests that we'd get there (eventually). Of course, there are more stringent restrictions related to our biology and physiology that cap human speed to far more modest levels, but we needn't get into that. The point is there are obvious limits to how fast we can run and a linear model ignores them.
So here are some important and completely absurd milestones predicted for the women's 3000 m steeplechase record:

Reason #3: A model with no physical basis cannot be trusted to give meaningful results if you extrapolate beyond the data.

The average extrapolation to intersection of the trendlines was 32 years for the men and 42 years for the women. That's quite significant. The men's records often went back to the late 1800s, but the women's records rarely went back as far as the 1920s. Several of the women's records only go back to around the 1970s. In the case of the 3000 m steeplechase, the records only go back to 1996 because the IAAF didn't permit women to compete at that event previously.

How far you have to extrapolate the trendlines to reach a predicted win for women over men.

Extrapolating so far beyond your data is not a meaningful prediction. All it can tell you is what might be if the general trend you see now just happens to continue in exactly the same way long into the future.

Reason #4: In the past, female steroid users could get away with more significant performance enhancement than male steroid users.

Drug testing in the 1970s and 1980s wasn't nearly as sensitive as it is today. Steroid use was rampant and often went undetected. Several communist countries had state-sponsored programs to enhance athletic performance (often without the athlete's knowledge or consent). East Germany and the Soviet Union were quite successful at it. This isn't to say they were the only cheaters, but they definitely had spent considerable effort researching the best way to cheat. Many American and Chinese athletes were also cheating.

The reason steroid use is comparatively advantageous for females has to do with our biology. Anabolic steroids mimic the hormones that make men strong and muscular. Women have these hormones too, but in much smaller quantities (typically less than 10% of male levels).
What this means is that for the same quantity of anabolic steroid, women will have a much higher increase in their relative hormone levels, and therefore experience comparatively greater enhancement of their athletic performance. I've depicted this graphically below:

Graphical comparison of an equal dose of steroids in men and women.

What I've done is assumed that a male athlete typically has 100 units of testosterone naturally and a female athlete typically has one tenth that amount. I have then shown the effects of adding 10 units from steroids. In the male athlete, it's a 10% increase in his normal testosterone level. In the female athlete, it's a 100% increase in her normal testosterone level.

What does all this talk of steroid use mean? Well, it means that a level of steroid use too small to detect in 1980 could potentially still provide significant performance enhancement to female athletes. That might help explain why women's records are so much older than men's records: the women's records have been set almost impossibly high by the steroid-fed women of the 70s and 80s. Looking only at official Olympic running events, the average age of a men's world record is currently 8.8 years. The average age of a women's world record is more than twice that at 18.5 years.

Dates the current world records were set.

Women simply aren't breaking records like they used to, but our prediction models don't know that.

Reason #5: Historically, fewer women have been able to train and compete in athletics. This strongly influences the slope of the best-fit line.

Many of the women's races didn't appear as Olympic events until long after the first modern Olympiad in 1896. Therefore, the men have a longer history of world-class competition in these events.
Here's a list showing when each event first appeared at the Olympic games:

Furthermore, men were competing in some of these events and maintaining statistics long before the first Olympics, so the world records were well-established. Women in the past haven't pursued athletic endeavours due to various gender biases and ill-conceived notions of female physical limitations. For instance, after six women collapsed upon completing the 800 m race at the 1928 Olympics, it became widely believed that this event was simply too much for feminine strength. Some doctors warned that women who participated in such feats of endurance would grow old too quickly. It didn't seem to occur to everyone that these women simply hadn't trained for this event and that's why it was so hard for them.

So up until around the first half of the 20th century, very few women even had the opportunity to pursue athletics and many of the women's records were just beginning to be tracked by the IAAF. Therefore, the initial women's records improved quite rapidly, since they were set by athletes who were comparatively not as well trained as male athletes at the time, in competition against a comparatively smaller pool of talent. This rapid progression early in the data set will inflate the predicted average rate of progression (i.e. increase the slope of the best-fit line).

To show you what I mean, here are the women's 100 m and marathon world record progressions again, but split up to show how much faster the records were improving at the beginning.

Women's record progression in the 100 m and marathon.

As you can see, for both events, the rate of improvement in the world record decreases in the latter half of the record progression.

To summarize, two papers prepared by medical doctors and published in Nature suggested that women would soon outpace men in world-class athletics events.
The authors (and reviewers) demonstrated poor understanding of the subject matter and did not appreciate the limitations of the analytical methods used to arrive at their conclusions. As a result, their predictions were wildly inaccurate. The moral of the story is: don't conduct an analysis that you're incompetent to perform.

And to answer the question "Will women outrun men?", the answer is "probably not". Men naturally produce more testosterone, hence are larger and stronger, ultimately making them more capable athletes. Even in the marathon, their increased size and strength gives them a bit of an edge.

References

Tatem, A. J., Guerra, C. A., Atkinson, P. M., and Hay, S. I. (2004). Athletics: Momentous sprint at the 2156 Olympics? Nature, 431, p. 525.

Whipp, B. J. and Ward, S. A. (1992). Will women soon outrun men? Nature, 355, p. 25.

## Tuesday 1 July 2014

### Analysis of Price Inflation of Alcoholic Beverages in Canada

I enjoy indulging in beer, wine, or scotch once in a while. But drinking alcohol's an expensive habit, even if you're just drinking the cheapest available beer on the market. It's been almost 9 years now since I bought my first drink, which got me thinking about how the price of booze has changed over time. First, I realized that reminiscing about how much I once could buy with my dollar is something old people like to talk about with their children and grandchildren. But then I got past the harsh reality that I am beginning to think like an old man and searched for the data.

Statistics Canada has been interested in the month-to-month changes in the price of alcohol for years. The data's available here.

Let's start with the national average. I took the price indices published by Statistics Canada, normalized them to September 1978, and then plotted the graph below. The graph shows how much $1 worth of beer, wine, or liquor in 1978 has increased in price over time.

On average, $1 worth of beer in Canada in 1978 would cost nearly $5 today.

 Inflation of the average price of alcohol in Canada since 1978

What's interesting is that beer prices have increased at a significantly higher rate than wine or liquor. There was also a period from the mid-1990s to the mid-2000s where the inflation of the price of beer bought at the liquor store was much higher than for other alcohol bought anywhere or for beer bought at the bar. That's since tapered off and now the inflation for beer prices is about equal for bars and liquor stores.

What's also interesting is that the rate of inflation of the price of liquor at the store is comparatively low, even lower than the Consumer Price Index (CPI). You can think of the national CPI as the average inflation for stuff the average Canadian typically spends his or her money on. It's a weighted average of all the stuff you spend money on: food, clothes, gasoline, rent/mortgage, etc.

Basically, compared to 1978, the price of beer has inflated much faster than the CPI while liquor at the liquor store has been maintaining below average inflation.

Okay, that's fine, but if you're a consumer of alcohol in Canada, you're probably well aware that the same drink costs significantly more in Alberta than in Quebec. So how do the provinces compare to each other? Statistics Canada does have some data sorted by province, but not the "served" alcohol stats. But we can at least look at how the value of a dollar spent at the liquor store has changed.

On average, $1 worth of beer in Alberta in 1978 costs $7.29 today.

 Inflation of the price of beer bought at liquor stores in Canada since 1978

 Inflation of the price of wine bought at liquor stores in Canada since 1978

 Inflation of the price of liquor bought at liquor stores in Canada since 1978

We can also get the average annual rate of inflation from 1978 to 2014:

 Average annual rate of inflation of alcohol prices in Canada from 1978 to 2014
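One common way to turn a total price increase into an average annual rate is the compound annual growth rate. The sketch below assumes that definition (the figures in the table may have been computed differently) and treats 1978 to 2014 as 36 years; the function name is illustrative.

```python
def average_annual_rate(price_ratio, years):
    """Compound annual growth rate implied by a total price ratio over `years`."""
    return price_ratio ** (1 / years) - 1


# $1 of beer in Alberta in 1978 costing $7.29 in 2014, taken as 36 years
rate = average_annual_rate(7.29, 36)
print(f"{rate:.2%} per year")
```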

As you can probably see, Alberta has experienced the highest inflation in both alcohol prices and the CPI since 1978. The most significant inflation, however, was in beer prices. Interestingly, the prairie provinces have all experienced relatively high inflation in the price of beer. I haven't looked into why that might be, but perhaps it has to do with the price of the raw ingredients, which are typically grown in the prairies.

I think it's also interesting that the basic trend in each province is the same. Beer prices have inflated more than wine or liquor prices in every province and have inflated at a rate ahead of the CPI. Liquor prices have also inflated at lower rates than beer and wine in every province, except in New Brunswick where wine just barely beat liquor. Liquor prices have also inflated at a rate below the CPI in every province.

What all this means is drinking beer has gotten comparatively less affordable everywhere in Canada since 1978. Meanwhile, drinking liquor has gotten comparatively more affordable. To borrow part of a Corb Lund lyric, I guess it's time to switch to whiskey.

In summary,