Sample Size

5 comments, last by LilBudyWizer 16 years, 7 months ago
Statistics is a weak topic for me, so I thought I would set up a program to see the effect of varying the sample size. I'm testing a yes/no condition with a certain percent chance of being true. I run through a bunch of rolls and count how many times it's true in 10, 20, 30, 50 and 70 rolls, then repeat with those sizes times 10, 100 and 1000. The samples don't overlap: that's rolls 1-10 for the first ten and 11-20 for the second ten, not 1-10 for the first ten and 2-11 for the second ten. It wouldn't be all that hard to change it to use a moving window, but that would seem to be about the same as drastically increasing the sample sizes.

So I run my little program and it kicks out the number of samples, mean, variance and standard deviation for each sample size. Those are percentages: what percent of the time the condition was true in 10, 20, 30, etc. rolls. With a 10% chance of being true I get a 0.31% standard deviation with a sample size of 10k and 2100 samples. That seems high to me, but I'm not really sure how to calculate what it should be.

What got me started on this is that I play WoW. One of the things I do in the game is try to fathom the mechanics. Some of those mechanics are well established with a fairly high degree of certainty. An example is that your chance of dodging an attack is given on your character sheet. My experience from actually testing is that it takes about 3k-5k incoming attacks to get within about 0.1% of the expected chance. That is, if your chance to dodge is 20% then within 3k-5k attacks you'll be between 19.9% and 20.1%, rather than between 19.98% and 20.02%. It's sort of an oddity of WoW that +10% to a 20% chance is a 30% chance rather than 22%.

So I'm wondering if someone with a bit more familiarity with statistics could say whether the numbers I'm getting from my program are reasonable. If someone could actually calculate, or link to a site explaining how to calculate, what the expected standard deviation would be, I would greatly appreciate it.

Also, if you see something wrong with my approach please point it out. 3k-5k is a manageable number to test with, but my program is saying 70k isn't enough to get within 0.1% with a high level of confidence. I'm inclined to think I have a bug in the program.
Keys to success: Ability, ambition and opportunity.
Coded up a quick test procedure

Taking 70,000 samples from a population of 80% 0.0's and 20% 1.0's

20 trials:

Mean Variance StdDev
0.2014857 0.1609664 0.401206195406986
0.1996857 0.1599074 0.399884178112337
0.2010429 0.1606011 0.400750663586625
0.1990714 0.159518 0.399396993929926
0.2015714 0.1610052 0.40125456851961
0.2001857 0.1600791 0.400098815275696
0.1997857 0.1599209 0.399901151364217
0.2011714 0.1606231 0.400778066611335
0.2018 0.1610822 0.40135051757648
0.2002429 0.1601123 0.400140395708758
0.2011143 0.1606374 0.40079596866105
0.2010857 0.1606197 0.400773920955161
0.1998571 0.1599891 0.399986379361709
0.2012143 0.1606368 0.400795150723979
0.1971429 0.1583627 0.39794809540198
0.2009571 0.1605544 0.400692374802388
0.2036857 0.1620932 0.402608043936565
0.1998571 0.1599889 0.39998613720952
0.1964 0.1577306 0.397153074361838
0.2004286 0.1602151 0.400268833896718


I am using the sample standard deviation methodology (a denominator of N-1 instead of N)
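
For anyone who wants to reproduce a table like the one above, here is a minimal Python sketch. The original test code was not posted, so the helper name `run_trial`, the seed, and the choice of the standard `random` and `statistics` modules are all assumptions; only the 70,000 samples, the 20% rate of 1.0's, the 20 trials and the N-1 denominator come from the post.

```python
import random
import statistics

def run_trial(n=70_000, p=0.2, rng=random):
    """Draw n samples that are 1.0 with probability p, else 0.0 (hypothetical helper)."""
    samples = [1.0 if rng.random() < p else 0.0 for _ in range(n)]
    mean = statistics.fmean(samples)
    variance = statistics.variance(samples)  # sample variance, N-1 denominator
    return mean, variance, variance ** 0.5

if __name__ == "__main__":
    rng = random.Random(12345)  # arbitrary seed, only so reruns match
    print("Mean Variance StdDev")
    for _ in range(20):
        mean, variance, std_dev = run_trial(rng=rng)
        print(f"{mean:.7f} {variance:.7f} {std_dev:.7f}")
```

Note that the StdDev column measures the spread of the individual 0/1 rolls (about 0.4 for p = 0.2), not the spread of the trial means. The means themselves should vary by roughly 0.4/sqrt(70000), or about 0.0015, from trial to trial.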
Your samples follow a binomial distribution. Just keep in mind that the binomial distribution counts the number of successes, not the percentage of successes. The mean is np and variance is np(1-p), so with p=.1 and n=10,000 the mean is 1000 and the variance is 900. In percentages, this means a standard deviation of sqrt(900)/10,000 = 30/10,000 = .3%.

So the distribution you were sampling has a known statistical standard deviation of .3%, and your program came up with .31%. I think your program is working correctly.

Note that .31% is less than 1%. If the true value of p was .1, then after taking 10,000 samples it's extremely likely that your result would be between 9% and 11%, less than 1 percentage point off from the true value of 10%. Over 99.9% of the time the result would be in that range.
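
The arithmetic above can be checked with a short Python sketch. The function names are made up for illustration, and the "within 1 percentage point" probability uses a normal approximation to the binomial, which is an assumption on my part rather than something stated in the post.

```python
import math

def proportion_std_dev(p, n):
    """Standard deviation of the observed success fraction over n independent rolls."""
    return math.sqrt(n * p * (1 - p)) / n

def prob_within(p, n, tolerance):
    """Normal-approximation probability that the observed fraction
    lands within +/- tolerance of the true value p."""
    z = tolerance / proportion_std_dev(p, n)
    return math.erf(z / math.sqrt(2))

if __name__ == "__main__":
    sd = proportion_std_dev(0.1, 10_000)
    print(f"std dev of the fraction: {sd:.4%}")
    print(f"P(result between 9% and 11%): {prob_within(0.1, 10_000, 0.01):.4%}")
```

With p = 0.1 and n = 10,000 the tolerance of one percentage point is a little over three standard deviations, which is where the "over 99.9%" figure comes from.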
Thank you very much for that explanation. That helps a great deal. The sample size needed is rather disappointing, but it will help a great deal in interpreting the samples I can collect.
In general, increasing the sample size by a factor of x will decrease the standard deviation by a factor of sqrt(x).

To calculate the increase in sample size needed to go from a standard deviation of .3% to .1%:

.3 / sqrt(x) = .1
x = 9

Since a sample size of 10,000 was needed to get a standard deviation of .3%, a sample of 90,000 is needed to get a standard deviation of .1%.

On the other hand, only 900 samples are needed to get a standard deviation of 1%, so at that point you can be confident that you are within 3 percentage points of the true value. Note that you don't actually know the true value, and that the standard deviation depends on it: for a fixed sample size it is highest when p is .5 and gets lower as p moves toward 0 or 1.
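
The sample-size arithmetic in this post can be sketched in Python as follows. The function names are hypothetical, and p = 0.1 is assumed because that is the value used earlier in the thread.

```python
def scale_factor(current_sd, target_sd):
    """Factor x by which the sample size must grow so that
    current_sd / sqrt(x) == target_sd."""
    return (current_sd / target_sd) ** 2

def samples_needed(p, target_sd):
    """Rolls needed so the observed fraction has the given standard deviation.
    Solves sqrt(p * (1 - p) / n) == target_sd for n."""
    return p * (1 - p) / target_sd ** 2

if __name__ == "__main__":
    print(f"scale factor from .3% to .1%: {scale_factor(0.003, 0.001):.1f}")
    print(f"n for a .1% std dev at p=.1: {samples_needed(0.1, 0.001):,.0f}")
    print(f"n for a 1% std dev at p=.1:  {samples_needed(0.1, 0.01):,.0f}")
```

The results are returned as floats rather than rounded up, so treat them as estimates of the order of magnitude needed rather than exact cutoffs.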
Here are some test results I got from two hours of combat. The only attacks I used were a normal melee auto-swing and a special attack, Shield Slam. Both can be either a normal hit or a critical when they land. When they miss it can be due to a straight-up miss or due to the mob dodging or parrying. The outcomes are mostly mutually exclusive, with the exception of block. Only normal melee hits can be blocked, but with special attacks such as Shield Slam a critical can also be blocked. Regrettably the mod I use for collecting the statistics doesn't break blocks out by normal and critical.

The tables list the counts, what percentage those are of the total, and the standard deviation, i.e. sqrt(n*p*(1-p))/n. In the interest of keeping this post short I'll just post the stats first.

Counts       Melee      Shield Slam  Total
..........................................
Normal        2287           822      3109
Criticals      218            51       269
------------------------------------------
Hits          2505           873      3378
..........................................
Block          115            37       152
..........................................
Miss            47            20        67
Dodge          134            69       203
Parry          140            49       189
------------------------------------------
Misses         321           138       459
==========================================
Total         2826          1011      3837
==========================================


% Total      Melee      Shield Slam  Total
...........................................
Normal       80.93%        81.31%    81.03%
Criticals     7.71%         5.04%     7.01%
-------------------------------------------
Hits         88.64%        86.35%    88.04%
...........................................
Block         4.07%         3.66%     3.96%
...........................................
Miss          1.66%         1.98%     1.75%
Dodge         4.74%         6.82%     5.29%
Parry         4.95%         4.85%     4.93%
-------------------------------------------
Misses       11.36%        13.65%    11.96%
===========================================
Total       100.00%       100.00%   100.00%
===========================================


Std Dev      Melee      Shield Slam  Total
...........................................
Normal        0.74%         1.23%     0.63%
Criticals     0.50%         0.69%     0.41%
-------------------------------------------
Hits          0.60%         1.08%     0.52%
...........................................
Block         0.37%         0.59%     0.31%
...........................................
Miss          0.24%         0.44%     0.21%
Dodge         0.40%         0.79%     0.36%
Parry         0.41%         0.68%     0.35%
-------------------------------------------
Misses        0.60%         1.08%     0.52%
===========================================
Total         0.00%         0.00%     0.00%
===========================================
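
As a cross-check, the percentage and standard deviation columns can be recomputed from the raw counts with the formula quoted above, sqrt(n*p*(1-p))/n. This is only a sketch: the dictionary layout and the function name are my own, and only a few rows of the Counts table are copied in.

```python
import math

# A few rows copied from the Counts table: (Melee, Shield Slam)
COUNTS = {
    "Normal": (2287, 822),
    "Criticals": (218, 51),
    "Hits": (2505, 873),
    "Misses": (321, 138),
}
TOTALS = (2826, 1011)  # total Melee and Shield Slam attacks

def rate_and_std_dev(count, n):
    """Observed fraction and its standard deviation, sqrt(n*p*(1-p))/n."""
    p = count / n
    return p, math.sqrt(n * p * (1 - p)) / n

if __name__ == "__main__":
    for label, row in COUNTS.items():
        cells = []
        for count, n in zip(row, TOTALS):
            p, sd = rate_and_std_dev(count, n)
            cells.append(f"{p:7.2%} +/- {sd:.2%}")
        print(f"{label:<10}" + "   ".join(cells))
```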
Now my understanding of that standard deviation is that it is the standard deviation of the means of samples of size n given a population mean of p. That can be used with the sample mean to construct a confidence interval around the sample mean within which the population mean should lie.
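
A minimal Python sketch of such a confidence interval, using the melee critical counts from the tables. It assumes the simple normal approximation (sample mean plus or minus z standard deviations) and z = 1.96 for roughly 95% confidence; neither choice comes from the thread, and other interval methods exist.

```python
import math

def confidence_interval(successes, n, z=1.96):
    """Normal-approximation interval around the observed rate.
    z=1.96 corresponds to roughly 95% confidence (an assumed default)."""
    p_hat = successes / n
    sd = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * sd, p_hat + z * sd

if __name__ == "__main__":
    low, high = confidence_interval(218, 2826)  # melee crits out of melee attacks
    print(f"melee crit rate is plausibly between {low:.2%} and {high:.2%}")
```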

So from my character sheet I have a 6.91% chance of getting a critical hit against a level 70 creature. I tested against a level 69 though and the assumption is that one level improves my chance to crit by 0.2% so I should have a 7.11% chance to crit against this mob. I crit'd 7.71% of the time on melee (auto-swing) attacks. The standard deviation is 0.5% so (7.71-7.11)/0.5 = 1.2 standard deviations. So there's a good chance the actual chance is 7.11% as expected.

With Shield Slam, though, I got 5.04% with a standard deviation of 0.69%, so that's (7.11-5.04)/0.69 = 3 standard deviations, which says it isn't very likely that the chance is 7.11% in that case. So would it be an F-test for evaluating the chance that the crit chance is different between the types of attacks, as well as each against the expected?
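
The two comparisons above can be sketched in Python like this. The function names are hypothetical, the standard deviation is computed from the observed rate as in the Std Dev table, and the conversion to a two-sided probability assumes a normal approximation, which the post does not state.

```python
import math

def z_score(successes, n, expected_p):
    """Observed rate minus expected rate, in units of the standard deviation
    computed from the observed rate."""
    p_hat = successes / n
    sd = math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - expected_p) / sd

def two_sided_p_value(z):
    """Chance of a deviation at least this large in either direction,
    under a normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))

if __name__ == "__main__":
    for label, crits, attacks in (("Melee", 218, 2826), ("Shield Slam", 51, 1011)):
        z = z_score(crits, attacks, 0.0711)
        print(f"{label}: z = {z:+.2f}, p = {two_sided_p_value(z):.4f}")
```

A small probability here only says the observed rate would be unusual if the true chance were 7.11%; it does not say which alternative explanation is right.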

One theory is that special attacks are evaluated with two rolls instead of one. If it evaluated hit/miss first and then determined the type of hit, then your crits as a percent of total attacks would be lower, i.e. 7.11% of 873 rather than 7.11% of 1011. That would give about 62 expected crits, or a 6.14% chance to crit per attack. That would give a 0.75% standard deviation, or (6.14-5.04)/0.75 = about 1.5 standard deviations. That's close enough that you can't eliminate that possibility.

Another possibility is that it is 5%, always 5%, never different from 5% on a special attack, but still a single roll. There the result is right on 5%. Retesting with a higher crit chance would eliminate one of the two, though not prove the other. That brings up the question of how much higher the crit chance has to be to say that it is definitely not one of them.

Just to be clear, I'm not asking anything about how WoW works, but rather how to use this type of test data to make inferences about how the game works. Most of the explanation is just to provide context for the results. Mainly I'm wondering if there are fundamental flaws in how I'm viewing this, and how to fill in some of the holes. I don't need long, detailed explanations since I have 3-4 books on probability and statistics to do that. Rather, it's more a question of which topics are relevant to brush up on.
