
Saturday, 4 April 2015

Understanding Return Periods

When discussing extreme events like floods, earthquakes, heavy snow, or strong winds, people often refer to a return period (also known as a recurrence interval): the “100 year flood” or the “30 year wind”, for example. The 2010 National Building Code of Canada prescribes how loads on buildings should be calculated. The starting points for wind and snow load calculations are based on 50 year return periods. Discussing extreme events in terms of the return period is a convenient way for engineers and scientists to think about the statistical likelihood of these events. However, it’s also easy to misinterpret what a return period really means.

The terms “return period” and “recurrence interval” are confusing because they are not real durations of time. In other words, the return period is not the amount of time that should elapse between similar events. The return period is really an estimate of the likelihood of an event’s occurrence.

Let’s take a 30 year snow load as an example. The 30 year snow load is not an event that repeats regularly, every 30 years like clockwork. Snow loading is random, so there’s no reason to expect big snow loads to recur at a regular interval. If you could look at many years of data, you’d probably find a few clusters of big snow loads and long periods with comparatively low snow loads. But you should find that, overall, the 30 year snow load is reached or exceeded in only about 3.3% (1/30) of the years.
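
If it helps to see the arithmetic, here's a quick sketch in Python (assuming, as the statistics do, that each year is independent with a 1-in-30 chance of exceedance). Note that a 30 year load is far from guaranteed to show up in any particular 30 year window:

    # Probability that a T-year event occurs at least once in n years,
    # assuming independent years, each with exceedance probability 1/T.
    def prob_at_least_one(T, n):
        return 1 - (1 - 1 / T) ** n

    print(prob_at_least_one(30, 1))    # ~0.033: the annual probability
    print(prob_at_least_one(30, 30))   # ~0.64: not a sure thing in 30 years
    print(prob_at_least_one(50, 50))   # ~0.64 for the NBCC's 50 year loads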


There are few places in the world where this wouldn't be considered a very rare snow load. (Source: Snow-Blow.com)

Another confusing aspect of return periods is that these all sound like rare events, yet we hear about them on the news all the time. Are return periods being exaggerated?

The big issue here is that a return period is limited to the specific area where the statistical data is valid. The 100 year flood they’re talking about applies only to a particular area along a particular river. At a specific location, the 100 year flood is rare. But across all the rivers in the world, it’s actually pretty likely that a 100 year flood will take place somewhere in any given year.
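
A quick calculation shows how fast "rare" becomes "routine" when you watch many locations at once. The count of 1,000 river reaches below is purely illustrative, not a real statistic:

    # A locally rare event becomes globally routine across many sites.
    p_single = 0.01   # annual chance of the 100 year flood at one site
    n_sites = 1000    # hypothetical number of independent river reaches
    p_somewhere = 1 - (1 - p_single) ** n_sites
    print(p_somewhere)  # ~0.99996: a "100 year flood" happens somewhere
                        # almost every single year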

Another issue is that the assumptions that went into estimating a return period might be wrong. There’s no guarantee that future conditions won’t change, affecting the likelihood of the event. For example, if a new town pops up and it discharges some of its storm sewers into the river, the characteristics of that river have changed. If it’s a small river and a big town, the river’s new 100 year flood level could be significantly higher than it was before.


An area of Morningside Creek in Toronto, ON at normal flow. (Source: Geocaching.com)

This is what happens to Morningside Creek when it gets a sudden influx of stormwater from the outfalls.
(Source: Friends of the Rouge Watershed)

Rouge Park, Meadowvale Road, Scarborough, ON
An area of Morningside Creek where the damage caused by stormwater outfalls is quite apparent. Regular flooding from stormwater outfalls added in the relatively recent past has caused rapid erosion of the banks. Mature trees have toppled into the river as its new floodplain gets carved into the earth. (Source: E. Victor C.)

To summarize, return periods are not real durations of time. They are just a different way to describe the probability of extreme events such as floods and snow loads. Return periods are estimated for specific locations from historical data. An extreme event may be rare at any given location, yet it is fairly likely that one will occur somewhere in the world in any given year. Furthermore, factors influencing some extreme events can change over time. Therefore, return periods estimated from past events don’t always accurately reflect the probability of future events.

References
Benjamin, J. R. and Cornell, C. A. (1970). Probability, Statistics and Decision for Civil Engineers. McGraw-Hill, New York, NY.
Mays, L. W. (2005). Water Resources Engineering. John Wiley & Sons, New York, NY.
NRCC. (2010). National Building Code of Canada 2010. National Research Council of Canada, Ottawa, ON.



Sunday, 13 July 2014

Will Women Outrun Men?

In 1992, Drs. Brian Whipp and Susan Ward of the University of California published an article in the journal Nature claiming that women would be beating the men in the marathon at the world elite level as early as 1998. In fact, they claimed that women would be outrunning the men in all the events by the middle of the 21st century. Needless to say, their predictions were bad. But that didn't discourage Tatem et al., a group of doctors from the University of Oxford, from publishing their own article in Nature in 2004. Tatem et al. essentially repeated the same analysis, and arrived at essentially the same conclusion, as Whipp and Ward. Tatem et al. focused on the 100 m dash, though, and had a few more years' worth of data to work with. Still, they concluded that women would beat men in the 100 m event at the 2156 Olympics.

Tika Gelana won the 2012 Olympic marathon in Olympic record time, but still 15 minutes behind the winner of the men's event. She would have placed 64th in the men's race.

It's now 2014 and a woman has yet to beat all of the men at any event at any world-class track & field meet. Twenty-two years after the blunderous predictions of Whipp and Ward, women still aren't threatening to break any of the men's world records any time soon. So how exactly did these doctors arrive at such bad conclusions? They were guilty of a gross misuse of statistics. How did bad statistics get published in Nature in 1992? I don't know. My guess is that the authors lucked out and got a peer reviewer who also knew nothing about statistics. How did essentially the same argument get published in Nature again in 2004? I wish I knew that too. It was a "double fail" for Nature's peer review process.

Anyway, I'm going to walk you through how to analyze world record progressions just like Tatem et al. and Whipp & Ward, then provide some reasoning to demonstrate how ridiculous those authors' conclusions were.

Step 1: 
Obtain the historical progression of world records from today to as far back as you can go. Whipp & Ward used records from the early 1900s up to 1992 (the IAAF started keeping records in 1912). Tatem et al. used 1912 to 2004. I'll go as far back as I can find data for, though arguments can be made for ignoring data prior to 1912. The main one is that the IAAF hadn't yet formed, so older records were never ratified as world records. A weaker argument is that Excel doesn't recognize dates from before the year 1900. That problem is easily overcome, though; you just have to get a little creative with how you plot the dates.

By the way, finding the data for this analysis was harder to do back when our learned doctors were drafting papers for Nature, but today we have Wikipedia, and all the world record progressions (at least the ratified ones) can be found with ease. For example, here's the men's 100 m record progression. I gathered my data from here, which includes many non-ratified records. I've cross-checked some of the results against this extensive database of track results to verify accuracy.

Step 2:
Plot the record progression on a graph in Excel. Plot either the result or the average velocity on the y-axis and the date the record was set on the x-axis. Do this for both men's and women's records. I've plotted the 100 m and marathon world record progressions below:

Normalized world record progression for the 100 m dash and marathon.

Step 3:
Use the "Add trendline" feature in Excel to add best fit lines to the men's and women's records. Here's my plot again, but with the trendlines added:

Normalized world record progression with trendline forecasting for the 100 m dash and marathon.

Step 4:
Extrapolate the best-fit lines and calculate the date when the men's line intersects the women's line. Boldly conclude that women will be beating men at the Olympics by the date you've calculated. I've calculated the intersections of all my trendlines and generated the table below:

Predicted dates for intersection of men's and women's world records.
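
If you'd rather script steps 2 through 4 than click through Excel, here's a rough sketch of the whole procedure. The arrays below are placeholder points, not the actual record progressions, and converting dates to decimal years conveniently sidesteps Excel's pre-1900 date problem:

    import numpy as np

    # Placeholder record progressions: (decimal year, average speed in m/s).
    # Real data would come from the record tables linked in Step 1.
    men   = np.array([[1912.5, 9.43], [1968.8, 10.05], [2009.6, 10.44]])
    women = np.array([[1922.6, 7.81], [1977.5, 9.19],  [1988.7, 9.53]])

    # Step 3: least-squares lines, speed = slope*year + intercept.
    m_slope, m_int = np.polyfit(men[:, 0], men[:, 1], 1)
    w_slope, w_int = np.polyfit(women[:, 0], women[:, 1], 1)

    # Step 4: extrapolate to where the two lines intersect.
    year_equal = (m_int - w_int) / (w_slope - m_slope)
    print(f"The lines cross in {year_equal:.0f}")  # the dubious 'prediction'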

Step 5:
Write up your findings in a short article and submit it for publication in Nature.

Here is a list of conclusions you can pretend are supported by the data:
  1. The 5000 m world record will be the last of the men's records to fall to the women. It probably won't happen until 2178.
  2. Women's records will exceed the men's records in eleven of the thirteen events by the year 2050.
  3. We are 20 years overdue for the intersection of the men's and women's records in sprint hurdles, based on average speed to complete the event (the sprint hurdles are 100 m for women and 110 m for men). 
  4. We are 10 years overdue for the intersection of the men's and women's records in the 10,000 m. 
  5. We are 8 years overdue for the intersection of the men's and women's records in the 4x400 m relay.
So, there you have it. Writing a paper for Nature can be just that easy. But you've probably already realized that something's amiss. Let's look at a few reasons why this kind of "analysis" is completely irrational.

Reason #1:  The models predict that marathon runners will eventually run faster than sprinters.

If you look at the slopes of the best-fit lines for each record progression, you'll find that most of the long distance events have steeper slopes than the sprint events. This suggests that the marathon record will eventually represent a higher average speed than the 100 m world record. I guess the marathoners will be content to use their incredible speed and super stamina only to win marathons, leaving the 100 m event to slower, less capable humans. Here are all the slopes of the lines:

Average rate of improvement in world record performances, expressed in metres per second per year.

The women's 3000 m steeplechase record has the fastest rate of improvement, suggesting that a woman running nearly two miles and over several 30-inch high barriers will one day be the fastest human on the planet. The men's 100 m record has the lowest rate of improvement, suggesting that the men's 100 m record holders will eventually have the slowest average speed of all male and female record holders in any track event. Below are all the dates of intersection with the men's 100 m world record progression:


As you can see, in most of the events, our models predict that a woman will be outpacing the men's 100 m world record holder by the year 2100. The men running in other events will also outpace the men's 100 m world record, but it will take them, on average, 100 years longer to do it than the women.

Reason #2:  A linear model to predict how fast the world's fastest human can run at a given time makes no physical sense and has no basis in reality.

Why does a linear model not make sense? Well, to start, a linear model crosses zero: it implies there was a moment in history when the world's fastest human was stationary. Go back further, and the model predicts negative speeds. Speed is the magnitude of velocity, an absolute quantity, so negative speed has no physical meaning. A linear model also increases without bound; it suggests that there is no upper limit to how fast a human can run. Obviously, that cannot be true. The speed of light is definitely beyond reach, but a linear model suggests that we'd get there (eventually). Of course, there are more stringent restrictions related to our biology and physiology that cap human speed at far more modest levels, but we needn't get into that. The point is that there are obvious limits to how fast we can run, and a linear model ignores them. So here are some important and completely absurd milestones predicted for the women's 3000 m steeplechase record:


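To make the absurdity concrete, here's a toy version of that table. The slope and intercept below are invented for illustration, not the actual fit to the steeplechase records:

    # Take a linear model, speed = slope*year + intercept, at face value.
    slope, intercept = 0.01, -15.0   # invented coefficients, m/s per year

    def year_at_speed(v):
        return (v - intercept) / slope

    print(year_at_speed(0.0))          # ~1500: fastest woman stood still
    print(year_at_speed(10.44))        # ~2544: outruns Bolt's 100 m pace
    print(year_at_speed(299_792_458))  # ...and eventually, the speed of light
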
Reason #3:  A model with no physical basis cannot be trusted to give meaningful results if you extrapolate beyond the data.

The average extrapolation to the intersection of the trendlines was 32 years for the men and 42 years for the women. That's quite significant. The men's records often went back to the late 1800s, but the women's records rarely went back further than the 1920s. Several of the women's records only go back to around the 1970s. In the case of the 3000 m steeplechase, the records only go back to 1996 because the IAAF didn't permit women to compete in that event previously.

How far you have to extrapolate the trendlines to reach a predicted win for women over men.

Extrapolating so far beyond your data is not a meaningful prediction. All it can tell you is what might be if the general trend you see now just happens to continue in exactly the same way long into the future.

Reason #4:  In the past, female steroid users could get away with more significant performance enhancement than male steroid users.

Drug testing in the 1970s and 1980s wasn't nearly as sensitive as it is today. Steroid use was rampant and often went undetected. Several communist countries had state-sponsored programs to enhance athletic performance (often without the athlete's knowledge or consent). East Germany and the Soviet Union were quite successful at it. This isn't to say they were the only cheaters, but they had certainly spent considerable effort researching the best ways to cheat. Many American and Chinese athletes were also cheating.

The reason steroid use is comparatively advantageous for females has to do with our biology. Anabolic steroids mimic the hormones that make men strong and muscular. Women have these hormones too, but in much smaller quantities (typically less than 10% of male levels). What this means is that for the same quantity of anabolic steroid, women will have a much higher increase in their relative hormone levels, and therefore experience comparatively greater enhancement of their athletic performance. I've depicted this graphically below:

Graphical comparison of an equal dose of steroids in men and women.

What I've done is assumed that a male athlete typically has 100 units of testosterone naturally and a female athlete typically has one tenth that amount. I have then shown the effects of adding 10 units from steroids. In the male athlete, it's a 10% increase in his normal testosterone level. In the female athlete, it's a 100% increase in her normal testosterone level.

What does all this talk of steroid use mean? Well, it means that a level of steroid use too small to detect in 1980 could potentially still provide significant performance enhancement to female athletes. That might help explain why women's records are so much older than men's records: the women's records have been set almost impossibly high by the steroid-fed women of the 70s and 80s. Looking only at official Olympic running events, the average age of a men's world record is currently 8.8 years. The average age of a women's world record is more than twice that at 18.5 years.

Dates the current world records were set.

Women simply aren't breaking records like they used to, but our prediction models don't know that.

Reason #5: Historically, fewer women have been able to train and compete in athletics. This strongly influences the slope of the best-fit line.

Many of the women's races didn't appear as Olympic events until long after the first modern Olympiad in 1896. Therefore, the men have a longer history of world-class competition in these events. Here's a list showing when each event first appeared at the Olympic games:


Furthermore, men were competing in some of these events and maintaining statistics long before the first Olympics, so the men's world records were well established. Women in the past were discouraged from athletic endeavours by various gender biases and ill-conceived notions of female physical limitations. For instance, after six women collapsed upon completing the 800 m race at the 1928 Olympics, it became widely believed that the event was simply too much for feminine strength. Some doctors warned that women who participated in such feats of endurance would grow old too quickly. It didn't seem to occur to anyone that these women simply hadn't trained for the event and that's why it was so hard for them.

So until around the middle of the 20th century, very few women even had the opportunity to pursue athletics, and many of the women's records were only beginning to be tracked by the IAAF. The initial women's records therefore improved quite rapidly, since they were set by athletes who were not as well trained as their male counterparts at the time, competing against a comparatively smaller pool of talent. This rapid progression early in the data set inflates the estimated average rate of progression (i.e. increases the slope of the best-fit line). To show you what I mean, here are the women's 100 m and marathon world record progressions again, split up to show how much faster the records were improving at the beginning.

Women's record progression in the 100 m and marathon.

As you can see, for both events, the rate of improvement in the world record decreases in the latter half of the record progression.

To summarize, two papers prepared by medical doctors and published in Nature suggested that women would soon outpace men in world-class athletics events. The authors (and reviewers) demonstrated poor understanding of the subject matter and did not appreciate the limitations of the analytical methods used to arrive at their conclusions. As a result, their predictions were wildly inaccurate. The moral of the story is: don't conduct an analysis that you're incompetent to perform.

And to answer the question "Will women outrun men?", the answer is "probably not". Men naturally produce more testosterone, hence are larger and stronger, ultimately making them more capable athletes. Even in the marathon, their increased size and strength gives them a bit of an edge.

References

Tatem, A. J., Guerra, C. A., Atkinson, P. M., and Hay, S. I. (2004). Athletics: Momentous sprint at the 2156 Olympics? Nature, 431, p. 525.

Whipp, B. J. and Ward, S. A. (1992). Will women soon outrun men? Nature, 355, p. 25.

Tuesday, 1 July 2014

Analysis of Price Inflation of Alcoholic Beverages in Canada

I enjoy indulging in beer, wine, or scotch once in a while. But drinking alcohol's an expensive habit, even if you're just drinking the cheapest available beer on the market. It's been almost 9 years now since I bought my first drink, which got me thinking about how the price of booze has changed over time. First, I realized that reminiscing about how much I once could buy with my dollar is something old people like to talk about with their children and grandchildren. But then I got past the harsh reality that I am beginning to think like an old man and searched for the data. Statistics Canada has been interested in the month-to-month changes in the price of alcohol for years. The data's available here.

Let's start with the national average. I took the price indices published by Statistics Canada, normalized them to September 1978, and then plotted the graph below. The graph shows how much $1 worth of beer, wine, or liquor in 1978 has increased in price over time.

On average, $1 worth of beer in Canada in 1978 would cost nearly $5 today.

Inflation of the average price of alcohol in Canada since 1978
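
For anyone who wants to reproduce the normalization, here's a minimal sketch using pandas. The index values shown are placeholders, not the actual Statistics Canada figures:

    import pandas as pd

    # Price index for beer (placeholder values, monthly frequency).
    idx = pd.Series(
        [38.1, 71.0, 102.3, 183.9],
        index=pd.PeriodIndex(["1978-09", "1990-01", "2002-01", "2014-01"],
                             freq="M"),
    )

    # Normalize so that September 1978 = 1.00; each value then reads as
    # "what $1 worth of 1978 beer costs at that date".
    normalized = idx / idx[pd.Period("1978-09", freq="M")]
    print(normalized.round(2))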

What's interesting is that beer prices have increased at a significantly higher rate than wine or liquor. There was also a period from the mid-1990s to the mid-2000s where the inflation of the price of beer bought at the liquor store was much higher than for other alcohol bought anywhere or for beer bought at the bar. That's since tapered off, and now the inflation for beer prices is about equal for bars and liquor stores.

What's also interesting is that the rate of inflation of the price of liquor at the store is comparatively low, even lower than the Consumer Price Index (CPI). You can think of the national CPI as the average inflation for the stuff a typical Canadian spends his or her money on: a weighted average across spending categories like food, clothing, gasoline, and rent/mortgage.

Basically, compared to 1978, the price of beer has inflated much faster than the CPI while liquor at the liquor store has been maintaining below average inflation.

Okay, that's fine, but if you're a consumer of alcohol in Canada, you're probably well aware that the same drink that's cheap in Quebec can cost significantly more in Alberta. So how do the provinces compare to each other? Statistics Canada does have some data sorted by province, but not for "served" alcohol. But we can at least look at how the value of a dollar spent at the liquor store has changed.


On average, $1 worth of beer in Alberta in 1978 costs $7.29 today. 

Inflation of the price of beer bought at liquor stores in Canada since 1978


Inflation of the price of wine bought at liquor stores in Canada since 1978

 
Inflation of the price of liquor bought at liquor stores in Canada since 1978

We can also get the average annual rate of inflation from 1978 to 2014:

Average annual rate of inflation of alcohol prices in Canada from 1978 to 2014
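
The annualized rates fall out of the cumulative ratios with one line of arithmetic. Using the Alberta beer figure from the caption above:

    # Average annual inflation implied by a price ratio over 36 years.
    ratio = 7.29            # $1 of 1978 Alberta beer costs $7.29 in 2014
    years = 2014 - 1978
    annual = ratio ** (1 / years) - 1
    print(f"{annual:.2%}")  # ~5.67% per year, compounded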

As you can probably see, Alberta has experienced the highest inflation in both alcohol prices and the CPI since 1978. The most significant inflation, however, was in beer prices. Interestingly, the prairie provinces have all experienced relatively high inflation in the price of beer. I haven't looked into why that might be, but perhaps it has to do with the price of the raw ingredients, which are typically grown in the prairies. I think it's also interesting that the basic trend in each province is the same. Beer prices have inflated more than wine or liquor prices in every province and have inflated at a rate ahead of the CPI. Liquor prices have inflated at lower rates than beer and wine in every province, except in New Brunswick where wine just barely beat liquor. Liquor prices have also inflated at a rate below the CPI in every province. What all this means is that drinking beer has gotten comparatively less affordable everywhere in Canada since 1978. Meanwhile, drinking liquor has gotten comparatively more affordable. To borrow part of a Corb Lund lyric, I guess it's time to switch to whiskey.

In summary, beer prices have inflated faster than the CPI in every province since 1978, while liquor prices have lagged behind it.


Saturday, 28 September 2013

Criticism of the Body Mass Index

The body mass index (BMI) is a metric proposed by Adolphe Quetelet during the mid-nineteenth century to assess human body shape. It is defined as a person's body mass in kilograms divided by the square of their height in metres. 

Since the 1800s, the BMI has seen continued use by health professionals as a quick assessment of one's health. Today, the BMI is still used to judge if a person is obese (BMI > 30 kg/m²), overweight (25 < BMI < 30), normal weight (18.5 < BMI < 25), or underweight (BMI < 18.5 kg/m²). But is the BMI a reasonable metric? Does it make sense for one's body mass to be proportional to the square of one's height? People of above average height, particularly men, often find that their BMI seems high despite being lean and fit. Shorter people, particularly women, similarly may find that their BMI seems low even with noticeable excess weight in the midsection. The BMI standards are most applicable for people who are close to average human height, but the standards lose their usefulness for everyone else. Considering that the average man and woman are already naturally about 8 cm above and below the average human height, respectively, it would seem that BMI immediately tends toward classifying men as overweight and women as underweight.
The BMI assumes this is true. The evidence says it isn't.
The simplest approach to estimating a power relation between mass and height would be to assume that the density of the human body is independent of size and that our bodies exhibit isometry. In simple terms, isometry means that height, breadth, and thickness all change in equal proportion: if a person gets 10% taller, they also get 10% wider across the shoulders and 10% thicker from front to back. The density assumption means that the average density of a tall person is the same as that of a short person. If both assumptions are true, then we would expect mass to be proportional to the cube of height.
What we'd expect if all humans were scale copies of each other.
A lesser known index, known as Rohrer's Index or the Ponderal Index, actually makes the assumptions I've just mentioned above. Rohrer's Index is defined as mass divided by height cubed. For reference, the equivalent standards for underweight, overweight, and obese using Rohrer's Index instead of BMI are < 11.0 kg/m³, 14.9 < RI < 17.8, and > 17.8 kg/m³, respectively.

While people generally get wider and thicker as they grow taller, humans don't exhibit true isometry. There is a tendency for taller people to be narrower relative to their height than their shorter counterparts. Babies, with their comparatively large heads and short legs, are also far from being miniature adults. 

Their big heads and tiny legs are adorable, but they also force us to abandon the isometry hypothesis.

If you look at actual data, you find that neither index is very good, though Rohrer's Index seems to work better, especially in pediatrics. We'd like to have a working power law relation between mass and height because it would make the whole mass-to-height type of index applicable to more people, and probably a more useful health metric as a result. To find a power law relation, we can simply plot mass as a function of height and use the power curve fit option in Excel (or, equivalently, fit a straight line to log(m) versus log(h); the slope of that line is the exponent p).
We're looking for a value of 'p' that has better correlation with data.
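
If Excel isn't handy, the same fit takes a few lines of Python. The height and mass arrays here are invented placeholders standing in for the survey data discussed below:

    import numpy as np

    # Fit m = a * h**p by least squares on the log-log form:
    # log(m) = log(a) + p*log(h); the slope of the fitted line is p.
    h = np.array([0.75, 1.10, 1.40, 1.60, 1.75, 1.90])  # heights (m)
    m = np.array([9.5, 18.0, 34.0, 50.0, 66.0, 84.0])   # masses (kg)

    p, log_a = np.polyfit(np.log(h), np.log(m), 1)
    print(f"p = {p:.2f}, a = {np.exp(log_a):.1f}")  # p lands between 2 and 3
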
From Vital and Health Statistics (Series 11, No. 252), which contains anthropometric data from American adults and children collected between 2007 and 2010, we find that p = 2.48 with R² = 0.98 for males and p = 2.50 with R² = 0.97 for females. According to the data used to create the CDC Growth Charts (published in 2000), p = 2.52 and 2.54 for males and females, respectively (R² = 0.99 and 0.98). Data from Britain's 2003 Health Survey suggests that p = 2.49 and 2.69 fit best for males and females, respectively (R² = 0.97 for both sexes). 
Mass vs. Height of Males (2000 CDC Growth Chart Data)
Even the data which appears in Quetelet's Treatise on Man and the Development of His Faculties indicates 2 < p < 3. Using the data tables he gives for height and weight at different ages, p = 2.35 and 2.40 for males and females, respectively (R² = 0.97 for both sexes). Quetelet presents a separate table showing average weight for a given height. Based on that table, p = 2.21 for males (R² = 0.98) and p = 2.27 for females (R² = 0.96). The exponent appears to be about the same in males as in females, so if we simply take the average of all the values of p we get 2.45. Therefore, the mass index formula we should be using is MI = m/h^2.45, with mass in kilograms and height in metres.
With this formula, the standards become:
  • Underweight (MI < 14.6)
  • Normal weight (14.6 < MI < 19.8)
  • Overweight (19.8 < MI < 23.7)
  • Obese (MI > 23.7)
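These cut-offs are consistent with rescaling the familiar BMI cut-offs so that the two indices agree at a reference height of roughly 1.69 m; that reference height is my inference (about the average adult human height), not something stated explicitly:

    # Rescale BMI cut-offs into MI = m/h**2.45 cut-offs, assuming both
    # indices should agree at a reference height h0. The value of h0
    # is a guess at the average human height used for the standards.
    h0 = 1.685
    for bmi_cut in (18.5, 25.0, 30.0):
        mi_cut = bmi_cut * h0**2 / h0**2.45  # = bmi_cut / h0**0.45
        print(f"BMI {bmi_cut} -> MI {mi_cut:.1f}")
    # prints 14.6, 19.8, 23.7, matching the standards above
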
With a correct power law, we eliminate the issue of classifying people as overweight or underweight simply because they are significantly taller or shorter than the average human. While the mass index derived from statistical analysis is an improvement over the BMI, it still doesn't overcome the other serious flaws. First, women naturally have a higher body fat percentage than men. Basically, female hormones cause women to grow breasts full of fatty tissue while male hormones cause men to grow larger muscles. The result is, on average, fat accounts for more of a woman's body weight than a man's by about 6 percentage points.

Second, the index doesn't distinguish between lean mass and body fat. Muscle tissue is about 17% denser than fatty tissue, so athletes and gym rats can be lean and fit but still have a total mass that suggests they are overweight according to the BMI standards.

Finally, lean mass accounts for most of a person's mass (except perhaps in a few extreme cases). Any mass index therefore should only correlate well with body fat percentage among the morbidly obese, but for the majority of people mass index and adiposity will correlate poorly.

Health professionals are aware of issues with using a mass index. Romero-Corral et al. (2008) published a study of over 13,000 Americans in the International Journal of Obesity to assess the accuracy of BMI as a diagnostic tool. Their discussion of the usefulness of BMI revolves around its inability to distinguish between lean mass and fat. They found that BMI > 30 classified 21% of the men and 31% of the women as obese. However, 50% of the men and 62% of the women were actually obese (defined as having greater than 25% or 35% body fat for men and women, respectively).

The accuracy of diagnostic tests is often assessed by positive and negative predictive values. Positive predictive value (PPV) is the probability that a positive test result indicates a correct diagnosis. Negative predictive value (NPV) is the probability that a negative result is correct. In Romero-Corral et al., PPV indicates the probability that BMI > 30 correctly identifies a person as obese and NPV indicates the probability that BMI < 30 correctly identifies a person as not obese. They found that the 30 kg/m² benchmark for obesity has a PPV of 87% for men and 99% for women. The NPV was 60% for men and 54% for women. What this all amounts to is a few false positives and a lot of false negatives; more than half of the truly obese are missed by the BMI-defined threshold for obesity.
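
These quoted rates are internally consistent; here's a short sketch that reconstructs the full confusion matrix, as population fractions, from them:

    # Reconstruct the confusion matrix from test-positive rate, PPV, NPV.
    def confusion(test_pos, ppv, npv):
        tp = test_pos * ppv                  # true positives
        fp = test_pos * (1 - ppv)            # false positives
        tn = (1 - test_pos) * npv            # true negatives
        fn = (1 - test_pos) * (1 - npv)      # false negatives (missed obesity)
        return tp, fp, tn, fn

    men   = confusion(0.21, 0.87, 0.60)  # TP 0.18, FP 0.03, TN 0.47, FN 0.32
    women = confusion(0.31, 0.99, 0.54)  # TP 0.31, FP 0.00, TN 0.37, FN 0.32

Adding TP and FN recovers the 50% and 62% obesity prevalences quoted above, and averaging the sexes gives about 1.3 false negatives for every true positive, which is where the figure in the caption below comes from.
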
Obesity diagnoses of the American population using BMI > 30 kg/m².
For every true positive obesity diagnosis using this test, there are 1.31 false negatives.
Despite well-known flaws and such poor accuracy as a diagnostic tool for obesity, the BMI remains commonplace. There are plenty of online BMI calculators. BMI is often included as part of a fitness assessment (I recall that it was calculated during fitness testing in high school gym class). All I can say is that I hope it goes away and that, however BMI classifies you, there's a pretty good chance that it's wrong.


Monday, 16 September 2013

Worldwide Beer Production and Consumption

Which countries produce the most beer? Which countries drink the most? Which nation is the drunkest? If you've ever wondered about these questions, this post is for you.

Beer is the world's third-most popular beverage, after water and tea.

I obtained most of my data from the Kirin Institute of Food and Lifestyle, which has been collecting and publishing beer production and consumption data since 1975. When I checked a Wikipedia article on beer consumption, Kirin was listed as the source, but the article on Wikipedia contains several errors, including one which moved Canada up 18 places in the worldwide per capita beer consumption rankings.

This ancient Egyptian model of a brewery is over 4,000 years old.
Chemical evidence reveals that beer existed at least 7,000 years ago in ancient Iran.

Let's look at beer production first. According to Kirin, the top ten beer producing countries in 2010 by total volume were:

  1. China (44,252,936 m³)
  2. USA (22,898,177 m³)
  3. Brazil (12,769,662 m³)
  4. Russia (10,240,000 m³)
  5. Germany (9,568,300 m³)
  6. Mexico (7,988,900 m³)
  7. Japan (5,850,450 m³)
  8. United Kingdom (4,499,700 m³)
  9. Poland (3,600,000 m³)
  10. Spain (3,337,500 m³)
Canada was 18th with 1,964,700 m³. One cubic metre is equal to 1,000 litres, which is about 2,933 bottles (341 mL bottles). In other words, Canada produced over 5.7 billion bottles worth of beer in 2010, about 8.6% of what the Americans produced and only about 4.4% of the amount of beer produced by the Chinese. I created a pie chart to visualize how much each of the top beer-producing countries contribute to the world's beer supply. As you can see, the top five beer-producing countries account for more than half of the world's beer.

Half of the top 25 beer-producing countries contribute only about 1% each to the world's total beer production.

Not surprisingly, it turns out that the countries producing the most beer tend to also be consuming the most beer. According to Kirin, the ten largest consumers of beer in 2010 were:
  1. China (44,683,000 m³)
  2. USA (24,138,000 m³)
  3. Brazil (12,170,000 m³)
  4. Russia (9,389,000 m³)
  5. Germany (8,787,000 m³)
  6. Mexico (6,419,000 m³)
  7. Japan (5,813,000 m³)
  8. United Kingdom (4,587,000 m³)
  9. Spain (3,251,000 m³)
  10. Poland (3,215,000 m³)

As you can see, all of the top ten beer-producing countries were also the top ten beer-consuming countries, the only difference being that Poland and Spain traded 9th and 10th places. Canada consumed 2,311,000 m³ of beer in 2010, which placed us at 14th in the world. Here's another pie chart, this time showing consumption.

More than half of the world's beer is being drunk in just five countries.

Overall, this chart is pretty similar to the previous one. Some interesting changes are that the Netherlands and Belgium, the 14th and 20th biggest beer-producers, respectively, don't even crack the top 25 when it comes to beer consumption. Apparently the Dutch like to sell beer abroad much more than they like drinking the stuff. Similarly, Argentina, which didn't appear among the 25 biggest producers, is the 20th biggest consumer of beer worldwide. This gave me the idea of looking at relative national beer surplus or deficit, shown below:

Belgium produces more than twice the amount of beer that it consumes.

It looks like most of the countries we've looked at consume roughly the same amount of beer as they produce. Seventeen of the 26 countries shown consume within 10% of what they produce. The Netherlands and Belgium are interesting because they each consume only about half of the beer that they produce, a much larger disparity than in any of the other countries I have data for. The French had the largest relative beer deficit, consuming about 26% more beer than they produce. We Canadians also have an appreciable beer deficit, drinking 18% more beer than we produce.

Now to answer the most important question of all: which country's beer consumption has their population most intoxicated? To answer this, we need to look at beer consumption per capita. China's total beer consumption might be about 19 times that of Canada's, but they've also got about 40 times more people to do the drinking. Based on the 2010 data from Kirin, the ten countries boasting the top beer-drinking peoples are:
  1. Czech Republic (131.7 L/person/year)
  2. Germany (106.8 L/person/year)
  3. Austria (105.8 L/person/year)
  4. Ireland (103.7 L/person/year)
  5. Estonia (90.6 L/person/year)
  6. Lithuania (85.7 L/person/year)
  7. Poland (83.6 L/person/year)
  8. Australia (83.4 L/person/year)
  9. Venezuela (83.0 L/person/year)
  10. Finland (82.7 L/person/year)
This chart shows the top 35 consumers of beer per capita:

Americans, and most Europeans, drink more beer than us.

You might be wondering what happened to China, the country drinking the biggest share of the world's beer. Because of their large population, they rank only 49th in terms of beer consumption per capita. Canada was only 23rd with 68.4 L/person/year (not 5th with 98.5 L/person/year as Wikipedia had first led me to believe). The Czech Republic's 131.7 L/person/year is really quite impressive when you realize this works out to an average of a little more than a bottle of beer per day! In one year, the average Czech drinks about 111 more Imperial pints of beer than the average Canadian. We can't even take pride in beating the Americans; they were 12th with 78.2 L/person/year. My guess is that Americans, having been brainwashed into believing that baseball is a noteworthy pastime and not wishing for others to see them as "un-American", have to drink a lot of beer in order to suffer through all the baseball games they watch. I mean, I'd drink more too if I thought I had to watch dozens of 3-hour games of baseball to prove my allegiance to my country.

Watching baseball's only slightly more entertaining than watching golf. 

Of course, not all beer has the same alcohol content. Americans like light beers with comparatively low alcohol content. Eastern Europeans generally tend to prefer stronger beers. So perhaps you're wondering if there are any stats on the actual quantity of alcohol consumed from beer-drinking? The answer is 'yes'. The World Health Organization's Global Health Observatory makes global data related to several health topics readily available to anyone. This includes the Global Information System on Alcohol and Health. They've been collecting their own per capita beer consumption data, but report it in terms of litres of pure alcohol per year, which is a better indicator of which countries are the most intoxicated from their beer-drinking. According to WHO, the ten countries most drunk on beer in 2010 were:
  1. Czech Republic (6.79 litres of pure alcohol per person per year)
  2. Austria (6.10 L/person/year)
  3. Germany (6.01 L/person/year)
  4. Lithuania (6.00 L/person/year)
  5. Poland (5.90 L/person/year)
  6. Ireland (5.73 L/person/year)
  7. Serbia (5.01 L/person/year)
  8. Spain (4.87 L/person/year)
  9. Estonia (4.68 L/person/year)
  10. Slovenia (4.59 L/person/year)
Here's yet another bar chart, this time plotted using the data from WHO.

On average, Czechs get more than 101 Calories/day from the alcohol in the beer they drink.
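
The caption's energy figure checks out with some quick arithmetic (ethanol has a density of about 0.789 g/mL and supplies roughly 7 kcal per gram):

    # Daily Calories from the alcohol in beer, per capita.
    litres_per_year = 6.79          # Czech pure-alcohol intake from beer
    grams = litres_per_year * 1000 * 0.789
    kcal_per_day = grams * 7 / 365
    print(f"{kcal_per_day:.0f} kcal/day")  # ~103, i.e. "more than 101"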

There are some apparent discrepancies between the Kirin and WHO data sets. For instance, Serbians aren't even in the top 35 beer-drinking peoples according to Kirin, but are 7th on WHO's list. Spain also shows up 8th on WHO's list but was only 22nd on Kirin's list. I doubt that the Spanish are drinking such strong beer that it would make up the difference. But for the most part, the two lists agree pretty well. The Czechs are still decisively ahead of everyone else, followed most closely by Austria and Germany. It appears that the American taste for light beer sends them down to 16th place with 4.28 L/person/year. Canadians on the other hand move up to 18th place with 4.20 L/person/year. 

To summarize:
  • China produces and consumes nearly one quarter of the world's beer
  • Canada isn't as big of a beer-drinking nation as some of us would like to believe
  • Don't challenge a Czech to a beer-drinking contest


Friday, 2 August 2013

Guessing on Multiple Choice Tests

You've probably heard someone give advice along the lines of "when in doubt, guess 'C'" when taking multiple choice tests. The alleged reasoning behind it is that there's a statistical advantage in guessing 'C' consistently rather than randomizing your answers.

I decided to investigate this claim. Let's first check the case where the professor distributes the correct answers evenly and randomly, not favouring any one letter. Take for example a test with 30 questions, where each answer is independent of the others and there are four choices per question (A, B, C, or D). Assuming that you really have no clue which answer is correct, any guess has a 25% probability of being correct. This is true whether you choose randomly (let's ignore for the moment that humans are notoriously bad at generating random values on their own) or whether you choose the same letter consistently. Here's why:

The instructor randomly chooses which letter is correct, without bias, so the probability of any letter happening to be the correct answer on a particular question is 25%. If you also guess randomly without favouring any letter (i.e. each letter is guessed, on average, 25% of the time), you should then expect that 25% of the correct answers are 'A', 25% of your guesses are 'A', thus 6.25% of your guesses (25% of 25%) are correct. The same is true of 'B', 'C', and 'D', so that overall you expect 4*6.25% = 25% of your guesses to be correct. Now if you consistently choose 'C' for the same random test, the probability that 'A' is correct is 25% but the probability that you guess 'A' is 0%. This is also true of 'B' and 'D'. When you choose 'C' 100% of the time, overall you should expect that 3*0.25*0 + 1*0.25*1 = 25% of your guesses are correct.

Since each guess is either right or wrong, this is a problem for the binomial distribution. The mean is n*p and the standard deviation is sqrt[n*p*(1-p)], where n is the number of trials and p is the probability of the desired outcome. In our example multiple-choice test, whether you guess randomly or choose 'C' every time, you would expect, on average, to get 30*0.25 = 7.5 correct answers. To reinforce the point that both approaches to guessing are equal here, I simulated 50,000 of these tests in Excel and generated the following plot.
Probability distribution for correct guesses on tests with random answers to four-choice questions.
The average was 7.49 correct answers with a standard deviation of 2.37 when all guesses were random. When 'C' was guessed consistently, the average was 7.50 with a standard deviation of 2.38. The binomial distribution with n = 30 and p = 0.25 predicts an average of 7.50 and a standard deviation of 2.37. It's pretty clear that both guessing schemes give the same results when the correct answers are random and evenly distributed among the possible choices. 
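
If you don't have Excel handy, the same experiment is a few lines of Python. This is a sketch of the simulation, not the exact spreadsheet:

    import random

    # Simulate tests with uniformly random answer keys, comparing random
    # guessing against always answering 'C'.
    CHOICES = "ABCD"
    N_TESTS, N_Q = 50_000, 30

    random_scores, always_c_scores = [], []
    for _ in range(N_TESTS):
        key = [random.choice(CHOICES) for _ in range(N_Q)]
        random_scores.append(sum(random.choice(CHOICES) == k for k in key))
        always_c_scores.append(key.count("C"))

    mean = lambda s: sum(s) / len(s)
    print(mean(random_scores), mean(always_c_scores))  # both come out ~7.5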

Now that it's clear that there's no advantage if answers are randomly and evenly distributed, let's investigate the case where the instructor favours one letter over the others. It turns out that no matter how biased the instructor's distribution of correct answers might be, if you randomly guess each letter 25% of the time, your probability of choosing the right answer is still 25%. Let's assign unknown probabilities for each letter being assigned as correct by the instructor: pA, pB, pC, and pD. The sum of these unknown probabilities must be unity. So when the probability of guessing A, B, C, or D is 25% each, the probability of being correct is then 0.25pA+0.25pB+0.25pC+0.25pD = 0.25(pA+pB+pC+pD) = 0.25*1 = 25%. That all changes if you consistently choose 'C' though. In this case, your probability of getting the right answer is 0pA+0pB+1*pC+0pD = pC. I varied pC between the most unfavourable case (where 'C' is never the correct answer) to the most favourable (where 'C' is always the correct answer) to generate the following plot:

Probability distribution for correct answers when 'C' is guessed consistently, for different values of pC.
Obviously, there's some benefit to choosing 'C' if you know that the instructor favours 'C' over the other letters. But you also get screwed if the instructor doesn't like to use 'C' or simply chooses to favour another letter because he wants to penalize the people who always guess 'C'. While it looks pretty nice that more of the curves I plotted are shifted to the right rather than left of the curve for random guessing, if the choice of which letter gets favoured is random, the chance that 'C' is never the right answer is higher than the chance that it is always correct. To illustrate, I ran a simulation which I think is slightly more realistic than the examples above.

I assumed that pB + pC is 54% on average, which is just a guess on my part but I feel it is reasonable because answers aren't randomly assigned a letter. Numerical answers are often ordered from smallest to largest and statements like "All of the above" are reserved only for 'D' because they'd be confusing if they weren't. I used the normal distribution to generate the random variations in individual tests, so that pB + pC can vary from 0 to 1, but is usually close to the average. I similarly used normally distributed random numbers to split up pB and pC, so that on average pC is half of (pB + pC), but can vary from 0 to 100%. Same idea with pA and pD. Random guesses by the test taker are still equally distributed on average between the four choices. Plotted below are the results of 250,000 simulated tests.
Probability distribution for correct answers based on "realistic" multiple choice tests.
With random guessing we expected to be right 25% of the time on average and to see a binomial distribution after many tests. When 'C' is guessed consistently, things get more complicated, but a simple approximation can be found using the normal distribution. As you can see in the plot, the approximation works fairly well. It is clear that random guessing gives you more consistent results than guessing 'C' all the time. Always guessing 'C' increases your chances of getting 1 in 3 guesses right, but it also increases your chances of doing worse than 1 in 6, simply because your results are influenced by how the distribution of answers was biased by the instructor for the particular test. Overall, consistently guessing 'C' resulted in 2% more correct answers. But of the 250,000 simulated tests, random guessing beat consistently guessing 'C' on a total of 125,568 tests (i.e. 50.23% of the time).
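
Here's a rough re-creation of that "realistic" simulation. The standard deviations of the normal draws aren't specified above, so the 0.1 values below are arbitrary choices for the sketch:

    import random

    N_TESTS, N_Q = 250_000, 30

    def clip01(x):
        return min(max(x, 0.0), 1.0)

    def biased_probs():
        p_bc = clip01(random.gauss(0.54, 0.1))        # pB + pC, mean 54%
        p_c = p_bc * clip01(random.gauss(0.5, 0.1))   # pC's share of pB + pC
        p_a = (1 - p_bc) * clip01(random.gauss(0.5, 0.1))
        return [p_a, p_bc - p_c, p_c, 1 - p_bc - p_a]  # [pA, pB, pC, pD]

    wins_random = 0
    for _ in range(N_TESTS):
        key = random.choices("ABCD", weights=biased_probs(), k=N_Q)
        score_rand = sum(random.choice("ABCD") == k for k in key)
        wins_random += score_rand > key.count("C")

    print(wins_random / N_TESTS)  # random guessing wins about half the time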

To summarize, if you happen to know that the creators of the test favour 'C' for correct answers, guessing 'C' consistently gives you an edge. In the long run, sticking with 'C' probably gives you a very slim advantage over random guessing, though guessing randomly gives you better consistency in the results of your guesses.