SHAZAM Unbiased Estimation

Unbiased Estimators and their Sampling Distribution


Consider the random variables X1, X2, ..., Xn as a random sample from a population with mean µ. The average value of these observations is the sample mean. The sample mean is a random variable that is an estimator of the population mean. The expected value of the sample mean is equal to the population mean µ. Therefore, the sample mean is an unbiased estimator of the population mean.
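The unbiasedness claim follows from the linearity of expectation. A short derivation, using only the definitions above and writing \bar{X} for the sample mean (a notation introduced here for convenience):

```latex
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i ,
\qquad
E[\bar{X}] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\, n\mu = \mu .
```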

How does this work in practice? Suppose that a data set is collected with n numerical observations x1, x2, ..., xn. A numerical estimate of the population mean can be calculated. Since only a sample of observations is available, the estimate of the mean can be either less than or greater than the true population mean. If the sampling experiment were repeated a second time, a different set of numerical observations would be obtained, so the estimate of the population mean would generally differ from the estimate calculated from the first sample. However, the average of the estimates calculated over many repetitions of the sampling experiment will be close to the true population mean, and it gets closer as the number of repetitions grows.

This can be illustrated with a computer simulation. Suppose that a sample of 8 observations is drawn from a population that has a uniform distribution on the interval [0,4]. The population mean is therefore 2, the midpoint of the interval.
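The population mean, and the amount by which a single estimate can be expected to stray from it, follow from standard results for the uniform distribution on [a, b]. The variance and standard-deviation figures below are not given in the text; they are added as a worked check, with a = 0, b = 4, n = 8 and \bar{X} denoting the sample mean:

```latex
\mu = \frac{a+b}{2} = \frac{0+4}{2} = 2,
\qquad
\sigma^2 = \frac{(b-a)^2}{12} = \frac{16}{12} = \frac{4}{3},
\qquad
\operatorname{sd}(\bar{X}) = \sqrt{\frac{\sigma^2}{n}} = \sqrt{\frac{4/3}{8}} = \sqrt{\frac{1}{6}} \approx 0.408 .
```

With a sample of only 8 observations, individual estimates that miss 2 by several tenths are therefore to be expected.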

A computer program is used to generate 1000 different samples of 8 observations. An estimate of the mean is calculated for each sample. The results for the first 50 trials are shown below.

       
       ----------------  Sample Observations  ---------------     Sample
Trial   x1     x2     x3     x4     x5     x6     x7     x8       Mean
   1   0.884  3.816  0.663  0.412  0.523  3.934  3.425  0.553     1.776
   2   2.033  0.538  2.475  3.411  3.647  2.608  3.875  3.183     2.721
   3   1.083  0.111  0.804  3.485  1.739  3.021  2.601  2.469     1.914
   4   2.579  1.017  0.362  3.455  1.312  0.280  0.906  0.295     1.276
   5   2.733  3.816  3.824  3.573  2.394  2.991  2.409  3.264     3.126
   6   0.376  0.346  2.247  0.884  2.836  1.334  2.225  2.217     1.558
   7   1.753  2.217  3.492  3.006  1.260  2.859  1.230  2.888     2.338
   8   3.522  2.792  3.360  1.069  3.301  2.549  2.380  2.586     2.695
   9   1.260  2.152  3.699  0.789  1.385  0.671  2.093  3.050     1.887
  10   0.214  3.345  2.085  0.273  1.415  0.907  2.292  3.080     1.701
  11   1.739  2.483  2.189  2.321  1.047  3.794  0.627  1.010     1.901
  12   2.785  1.282  0.619  2.932  2.336  0.789  0.405  1.341     1.561
  13   0.030  0.744  2.034  2.262  1.024  1.496  2.262  1.290     1.393
  14   0.111  2.446  2.903  1.650  2.615  2.431  0.361  1.540     1.757
  15   3.521  1.856  1.024  0.832  1.724  1.142  2.578  0.973     1.706
  16   2.917  2.954  3.839  3.183  3.699  3.801  2.748  2.579     3.215
  17   1.106  2.225  2.984  2.520  1.828  3.596  3.316  3.854     2.678
  18   3.853  3.588  0.848  0.664  3.176  1.761  1.717  2.314     2.240
  19   3.603  0.804  3.714  2.218  2.734  1.423  1.431  1.188     2.139
  20   3.317  3.943  2.167  1.791  2.801  2.535  1.666  1.828     2.506
  21   2.263  3.508  2.079  2.602  2.072  0.532  0.805  0.068     1.741
  22   3.273  1.122  0.989  0.841  3.972  3.162  3.449  2.536     2.418
  23   1.482  2.469  0.628  1.541  0.142  1.401  3.346  1.512     1.565
  24   2.050  3.346  1.328  2.691  1.586  3.236  0.503  0.260     1.875
  25   0.260  1.233  1.380  3.538  3.288  2.949  0.260  1.807     1.839
  26   3.604  1.483  2.743  2.426  1.630  0.186  1.336  3.163     2.071
  27   0.430  1.866  3.546  0.651  2.684  2.625  1.078  0.304     1.648
  28   2.500  1.004  0.356  0.231  0.415  3.899  1.534  3.501     1.680
  29   1.564  2.890  1.741  0.886  3.641  0.363  2.433  0.989     1.814
  30   0.268  1.873  1.343  0.120  3.184  0.238  0.216  2.897     1.267
  31   3.987  2.455  1.962  1.431  1.048  0.827  0.009  0.805     1.566
  32   0.378  2.757  2.883  2.956  2.905  1.174  2.013  2.595     2.208
  33   1.292  2.080  0.290  2.934  0.695  1.373  0.621  1.874     1.395
  34   0.327  2.979  3.200  3.885  3.656  3.929  2.743  3.848     3.071
  35   3.752  0.040  0.290  2.051  2.987  1.543  2.950  0.084     1.712
  36   1.506  1.749  2.198  3.200  0.998  2.294  2.147  3.856     2.243
  37   0.799  1.108  0.990  0.799  2.979  1.336  2.721  1.639     1.546
  38   3.952  2.773  3.819  1.336  0.011  0.578  0.025  3.171     1.958
  39   0.187  3.996  0.173  2.876  2.309  3.885  0.813  3.686     2.241
  40   2.912  1.690  1.602  2.927  0.939  3.244  3.871  3.650     2.604
  41   0.703  1.845  3.466  0.504  3.370  3.370  1.374  1.028     1.957
  42   3.105  0.446  1.705  1.779  3.599  2.339  0.976  0.342     1.786
  43   1.654  2.494  2.759  2.663  1.787  3.223  1.035  1.448     2.133
  44   3.511  2.258  3.356  2.604  0.564  0.549  1.175  3.533     2.194
  45   2.339  3.503  1.919  3.820  0.004  0.203  2.803  2.899     2.186
  46   1.904  0.122  0.262  1.190  0.387  1.713  2.560  2.052     1.274
  47   2.855  2.111  2.796  1.403  2.862  1.728  2.435  1.971     2.270
  48   3.599  3.937  3.525  2.177  0.269  1.175  2.994  1.926     2.450
  49   3.356  0.387  2.472  0.144  1.757  0.277  3.901  1.617     1.739
  50   2.634  3.864  2.162  0.844  3.356  0.070  3.805  2.354     2.386

The final column, which lists the estimates of the mean, shows that some estimates are less than the population mean of 2 and some are greater than 2. A total of 1000 estimates were calculated and their average was obtained as:

      2.00780

The closeness of this average to 2 (the true population mean) is what would be expected when the estimates are generated by an unbiased estimation procedure.
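The repeated-sampling experiment can also be sketched outside SHAZAM. The following is a minimal Python illustration, not the program that produced the figures above: the function name, the seed value and the use of Python's standard random number generator are all assumptions made for this sketch, so the printed average will not reproduce 2.00780 exactly, although it should land close to 2.

```python
import random
import statistics


def simulate_sample_means(n_reps=1000, sample_size=8, upper=4.0, seed=12345):
    """Draw n_reps samples from Uniform(0, upper); return the mean of each sample.

    The name, defaults and seed are hypothetical choices for this sketch.
    """
    rng = random.Random(seed)  # fixed seed so that repeated runs give the same draws
    means = []
    for _ in range(n_reps):
        # One sampling experiment: sample_size independent uniform observations
        sample = [rng.uniform(0.0, upper) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


sample_means = simulate_sample_means()

# Average of the estimates over all replications
grand_mean = statistics.fmean(sample_means)
print(f"average of {len(sample_means)} sample means: {grand_mean:.5f}")
```

Increasing n_reps should pull the average closer to 2, while increasing sample_size narrows the spread of the individual estimates.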

The sampling distribution of an estimator is the distribution of the estimator over all possible samples of the same size drawn from the population. For the sample mean, the central limit theorem states that the sampling distribution tends to the normal distribution as the sample size increases. To see this result, the 1000 estimates of the mean were sorted into a number of groups. The number of estimates in each group is displayed in the histogram below.

The above histogram is centered near 2 (the value of the population mean) and its shape is close to that of a normal distribution.
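The grouping step behind such a histogram can be sketched in Python. This is an illustration under stated assumptions rather than a reproduction of the SHAZAM graph: the helper name group_counts, the seed, and the choice of equal-width groups spanning the smallest to the largest estimate are hypothetical, and the use of 10 groups simply mirrors the GROUPS=10 option in the command file.

```python
import random
import statistics


def group_counts(values, n_groups=10):
    """Count how many values fall into each of n_groups equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_groups
    counts = [0] * n_groups
    for v in values:
        # The largest value would index one past the end, so clamp it to the last group
        k = min(int((v - lo) / width), n_groups - 1)
        counts[k] += 1
    edges = [lo + k * width for k in range(n_groups + 1)]
    return edges, counts


# Regenerate 1000 sample means (8 uniform draws on [0, 4] each); the seed is arbitrary
rng = random.Random(2024)
means = [
    statistics.fmean([rng.uniform(0.0, 4.0) for _ in range(8)])
    for _ in range(1000)
]

edges, counts = group_counts(means)

# Text histogram: one star per 5 estimates in the group
for k, c in enumerate(counts):
    print(f"{edges[k]:5.2f} to {edges[k + 1]:5.2f} | {'*' * (c // 5)} {c}")
```

The counts should be largest in the groups around 2 and fall away towards both ends, which is the bell shape described above.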


SHAZAM command file

The SHAZAM commands for the above demonstration are as follows.

SAMPLE 1 8
GEN1 NREP=1000
* Repeated sampling of observations from a uniform distribution
* with sample size 8
DIM SAMPMEAN NREP
SET NODOECHO NOOUTPUT RANFIX
DO #=1,NREP
* Generate the sample
GENR X=UNI(4)
* Calculate the sample mean
STAT X / MEAN=MEAN
* Save the results
MATRIX I=$DO
MATRIX RESULTS=(I|X'|MEAN)
FORMAT(1X,F5.0,8F7.3,3X,F7.3) 
IF (I.LE.50)
PRINT RESULTS / FORMAT NONAMES
GEN1 SAMPMEAN:#=MEAN
ENDO
* Get the average from all the replications
SET OUTPUT
SAMPLE 1 NREP
STAT SAMPMEAN / MEAN=MEAN
PRINT MEAN 
* Display the sampling distribution with a histogram
GRAPH SAMPMEAN / HISTO GROUPS=10 RANGE
STOP

