SHAZAM Unbiased Estimation

Unbiased Estimators and their Sampling Distribution

Consider the random variables X1, X2, ..., Xn as a random sample from a population with mean µ. The average value of these observations is the sample mean. The sample mean is a random variable that is an estimator of the population mean. The expected value of the sample mean is equal to the population mean µ. Therefore, the sample mean is an unbiased estimator of the population mean.
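The unbiasedness claim follows from the linearity of expectation. A short derivation is sketched below, writing \bar{X} for the sample mean (notation introduced here, not in the original text) and assuming only that each observation X_i has expected value µ:

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i
\qquad\Longrightarrow\qquad
E[\bar{X}] = \frac{1}{n}\sum_{i=1}^{n} E[X_i]
           = \frac{1}{n}\cdot n\mu
           = \mu
```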

How does this work in practice? Suppose that a data set is collected with n numerical observations x1, x2, ..., xn. A numerical estimate of the population mean can be calculated as the average of these observations. Since only a sample of observations is available, the estimate can be either less than or greater than the true population mean. If the sampling experiment were repeated, a different set of numerical observations would be obtained, and the estimate of the population mean would generally differ from the estimate calculated from the first sample. However, the average of the estimates calculated over many repetitions of the sampling experiment will be close to the true population mean, and it approaches the population mean as the number of repetitions grows.

This can be illustrated with a computer simulation. Suppose that a sample of 8 observations is drawn from a population that has a uniform distribution on the interval [0,4]. The population mean is the midpoint of the interval, (0 + 4)/2 = 2.

A computer program is used to generate 1000 different samples of 8 observations. An estimate of the mean is calculated for each sample. The results for the first 50 trials are shown below.

```
       ---------------- Sample Observations ---------------      Sample
Trial     x1     x2     x3     x4     x5     x6     x7     x8      Mean
    1  0.884  3.816  0.663  0.412  0.523  3.934  3.425  0.553     1.776
    2  2.033  0.538  2.475  3.411  3.647  2.608  3.875  3.183     2.721
    3  1.083  0.111  0.804  3.485  1.739  3.021  2.601  2.469     1.914
    4  2.579  1.017  0.362  3.455  1.312  0.280  0.906  0.295     1.276
    5  2.733  3.816  3.824  3.573  2.394  2.991  2.409  3.264     3.126
    6  0.376  0.346  2.247  0.884  2.836  1.334  2.225  2.217     1.558
    7  1.753  2.217  3.492  3.006  1.260  2.859  1.230  2.888     2.338
    8  3.522  2.792  3.360  1.069  3.301  2.549  2.380  2.586     2.695
    9  1.260  2.152  3.699  0.789  1.385  0.671  2.093  3.050     1.887
   10  0.214  3.345  2.085  0.273  1.415  0.907  2.292  3.080     1.701
   11  1.739  2.483  2.189  2.321  1.047  3.794  0.627  1.010     1.901
   12  2.785  1.282  0.619  2.932  2.336  0.789  0.405  1.341     1.561
   13  0.030  0.744  2.034  2.262  1.024  1.496  2.262  1.290     1.393
   14  0.111  2.446  2.903  1.650  2.615  2.431  0.361  1.540     1.757
   15  3.521  1.856  1.024  0.832  1.724  1.142  2.578  0.973     1.706
   16  2.917  2.954  3.839  3.183  3.699  3.801  2.748  2.579     3.215
   17  1.106  2.225  2.984  2.520  1.828  3.596  3.316  3.854     2.678
   18  3.853  3.588  0.848  0.664  3.176  1.761  1.717  2.314     2.240
   19  3.603  0.804  3.714  2.218  2.734  1.423  1.431  1.188     2.139
   20  3.317  3.943  2.167  1.791  2.801  2.535  1.666  1.828     2.506
   21  2.263  3.508  2.079  2.602  2.072  0.532  0.805  0.068     1.741
   22  3.273  1.122  0.989  0.841  3.972  3.162  3.449  2.536     2.418
   23  1.482  2.469  0.628  1.541  0.142  1.401  3.346  1.512     1.565
   24  2.050  3.346  1.328  2.691  1.586  3.236  0.503  0.260     1.875
   25  0.260  1.233  1.380  3.538  3.288  2.949  0.260  1.807     1.839
   26  3.604  1.483  2.743  2.426  1.630  0.186  1.336  3.163     2.071
   27  0.430  1.866  3.546  0.651  2.684  2.625  1.078  0.304     1.648
   28  2.500  1.004  0.356  0.231  0.415  3.899  1.534  3.501     1.680
   29  1.564  2.890  1.741  0.886  3.641  0.363  2.433  0.989     1.814
   30  0.268  1.873  1.343  0.120  3.184  0.238  0.216  2.897     1.267
   31  3.987  2.455  1.962  1.431  1.048  0.827  0.009  0.805     1.566
   32  0.378  2.757  2.883  2.956  2.905  1.174  2.013  2.595     2.208
   33  1.292  2.080  0.290  2.934  0.695  1.373  0.621  1.874     1.395
   34  0.327  2.979  3.200  3.885  3.656  3.929  2.743  3.848     3.071
   35  3.752  0.040  0.290  2.051  2.987  1.543  2.950  0.084     1.712
   36  1.506  1.749  2.198  3.200  0.998  2.294  2.147  3.856     2.243
   37  0.799  1.108  0.990  0.799  2.979  1.336  2.721  1.639     1.546
   38  3.952  2.773  3.819  1.336  0.011  0.578  0.025  3.171     1.958
   39  0.187  3.996  0.173  2.876  2.309  3.885  0.813  3.686     2.241
   40  2.912  1.690  1.602  2.927  0.939  3.244  3.871  3.650     2.604
   41  0.703  1.845  3.466  0.504  3.370  3.370  1.374  1.028     1.957
   42  3.105  0.446  1.705  1.779  3.599  2.339  0.976  0.342     1.786
   43  1.654  2.494  2.759  2.663  1.787  3.223  1.035  1.448     2.133
   44  3.511  2.258  3.356  2.604  0.564  0.549  1.175  3.533     2.194
   45  2.339  3.503  1.919  3.820  0.004  0.203  2.803  2.899     2.186
   46  1.904  0.122  0.262  1.190  0.387  1.713  2.560  2.052     1.274
   47  2.855  2.111  2.796  1.403  2.862  1.728  2.435  1.971     2.270
   48  3.599  3.937  3.525  2.177  0.269  1.175  2.994  1.926     2.450
   49  3.356  0.387  2.472  0.144  1.757  0.277  3.901  1.617     1.739
   50  2.634  3.864  2.162  0.844  3.356  0.070  3.805  2.354     2.386
```

By viewing the final column that lists the estimates of the mean it can be seen that some estimates are less than the population mean of 2 and some estimates are greater than 2. A total of 1000 estimates was calculated and the average was obtained as:

`2.00780`

The closeness of the average to 2 (the true population mean) reflects that the estimates are generated from an unbiased estimation procedure.
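For readers without SHAZAM, the same experiment can be sketched in Python using only the standard library. This is an illustrative sketch, not the program used to produce the results above: the function name `simulate_sample_means`, its parameters, and the seed are assumptions made for this example, and the random numbers differ from SHAZAM's, so the average will not reproduce 2.00780 exactly.

```python
import random
import statistics


def simulate_sample_means(n_reps=1000, sample_size=8, upper=4.0, seed=12345):
    """Return n_reps sample means, each from sample_size draws of Uniform(0, upper).

    The name, defaults and seed are illustrative choices for this sketch.
    """
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    means = []
    for _ in range(n_reps):
        # One repetition of the sampling experiment
        sample = [rng.uniform(0.0, upper) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


sample_means = simulate_sample_means()

# Average of the estimates over all repetitions
average_of_means = statistics.fmean(sample_means)
print(f"Average of {len(sample_means)} sample means: {average_of_means:.5f}")
```

Individual sample means vary noticeably around 2, but their average over the 1000 repetitions should lie very close to 2.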

The sampling distribution of an estimator is the distribution of the estimator in all possible samples of the same size drawn from the population. For the sample mean, the central limit theorem gives the result that, as the sample size increases, the sampling distribution of the sample mean tends to the normal distribution. To see this result, the 1000 estimates of the mean were sorted into a number of groups. The number of estimates in each group is displayed in the histogram below.

The above histogram is centered near 2 (the value of the population mean) and its shape is close to that of a normal distribution.
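The grouping step can also be sketched in Python as a text histogram. The sketch below makes several assumptions of its own: it uses 10 equal-width groups spanning the observed range of the estimates, a seed chosen for this example, and variable names that do not come from the SHAZAM program. It also compares the spread of the estimates with the theoretical value: a uniform distribution on [0,4] has variance (4 - 0)^2/12 = 4/3, so the sample mean of 8 observations has variance (4/3)/8 = 1/6 and standard deviation of about 0.408.

```python
import math
import random
import statistics

rng = random.Random(2024)  # illustrative seed, chosen for this sketch

# 1000 sample means, each from 8 draws of Uniform(0, 4)
means = []
for _ in range(1000):
    sample = [rng.uniform(0.0, 4.0) for _ in range(8)]
    means.append(statistics.fmean(sample))

# Sort the estimates into 10 equal-width groups over their observed range
n_groups = 10
lo, hi = min(means), max(means)
width = (hi - lo) / n_groups
counts = [0] * n_groups
for m in means:
    k = min(int((m - lo) / width), n_groups - 1)  # the maximum goes in the last group
    counts[k] += 1

# Text histogram: one asterisk per 5 estimates
for k, c in enumerate(counts):
    left = lo + k * width
    print(f"{left:6.3f} to {left + width:6.3f} | {'*' * (c // 5)} {c}")

# Compare the observed spread with the theoretical standard deviation
theoretical_sd = math.sqrt((4.0 ** 2 / 12.0) / 8)
observed_sd = statistics.stdev(means)
print(f"Theoretical sd of the sample mean: {theoretical_sd:.3f}")
print(f"Observed sd of the 1000 estimates: {observed_sd:.3f}")
```

The counts should be largest in the middle groups and fall away toward both ends, which is the bell shape described above.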

SHAZAM command file

The SHAZAM commands for the above demonstration are as follows.

```
SAMPLE 1 8
GEN1 NREP=1000
* Repeated sampling of observations from a uniform distribution
* with sample size 8
DIM SAMPMEAN NREP
SET NODOECHO NOOUTPUT RANFIX
DO #=1,NREP
* Generate the sample
GENR X=UNI(4)
* Calculate the sample mean
STAT X / MEAN=MEAN
* Save the results
MATRIX I=$DO
MATRIX RESULTS=(I|X'|MEAN)
FORMAT(1X,F5.0,8F7.3,3X,F7.3)
IF (I.LE.50) PRINT RESULTS / FORMAT NONAMES
GEN1 SAMPMEAN:#=MEAN
ENDO
* Get the average from all the replications
SET OUTPUT
SAMPLE 1 NREP
STAT SAMPMEAN / MEAN=MEAN
PRINT MEAN
* Display the sampling distribution with a histogram
GRAPH SAMPMEAN / HISTO GROUPS=10 RANGE
STOP
```
