quote:The normal distribution, also called the Gaussian distribution, is an extremely important probability distribution in many fields. It is a family of distributions of the same general form, differing in their location and scale parameters: the mean ("average") and standard deviation ("variability"), respectively. The standard normal distribution is the normal distribution with a mean of zero and a standard deviation of one (the green curves in the plots to the right). It is often called the bell curve because the graph of its probability density resembles a bell.

The fundamental importance of the normal distribution as a model of quantitative phenomena in the natural and behavioral sciences is due to the central limit theorem (the proof of which requires rather advanced undergraduate mathematics). A variety of psychological test scores and physical phenomena like photon counts can be well approximated by a normal distribution. While the mechanisms underlying these phenomena are often unknown, the use of the normal model can be theoretically justified if one assumes that many small, independent effects contribute to each observation in an additive fashion.

The normal distribution also arises in many areas of statistics: for example, for sufficiently large samples the sampling distribution of the mean is approximately normal, even if the population the sample is taken from is not normally distributed. In addition, the normal distribution maximizes information entropy among all distributions with known mean and variance, which makes it the natural choice of underlying distribution for data summarized in terms of sample mean and variance. The normal distribution is the most widely used family of distributions in statistics, and many statistical tests are based on the assumption of normality. In probability theory, normal distributions arise as the limiting distributions of several continuous and discrete families of distributions.
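The claim that sample means are approximately normal even when the population is not can be checked informally by simulation. Below is a minimal Python sketch using only the standard library; the uniform population, the sample size of 30 and the 5,000 repetitions are arbitrary choices for illustration, not anything taken from the quoted text.

```python
import random
import statistics

def sample_means(n: int, trials: int, seed: int = 1) -> list[float]:
    """Draw `trials` samples of size n from a uniform(0, 1) population
    (flat, clearly not normal) and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]

n = 30
means = sample_means(n, trials=5000)

# Uniform(0, 1) has mean 1/2 and standard deviation sqrt(1/12).
pop_mean, pop_sd = 0.5, (1 / 12) ** 0.5
se = pop_sd / n ** 0.5  # theoretical standard error of the mean

print(f"mean of sample means: {statistics.fmean(means):.4f} (theory {pop_mean:.4f})")
print(f"sd of sample means:   {statistics.stdev(means):.4f} (theory {se:.4f})")

# If the means are roughly normal, about 68% should lie within one
# standard error of the population mean and about 95% within two.
within_1 = sum(abs(m - pop_mean) < se for m in means) / len(means)
within_2 = sum(abs(m - pop_mean) < 2 * se for m in means) / len(means)
print(f"within 1 SE: {within_1:.3f}, within 2 SE: {within_2:.3f}")
```

The 68%/95% comparison is only a rough check on the shape of the distribution of the means, not a formal normality test.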
quote:The term 'normal distribution' refers to a particular way in which observations tend to pile up around a particular value rather than being spread evenly across a range of values (a tendency explained by the Central Limit Theorem). It is generally most applicable to continuous data and is intrinsically associated with parametric statistics (e.g. ANOVA, t tests, regression analysis). Graphically the normal distribution is best described by a 'bell-shaped' curve. This curve is described in terms of the point at which its height is maximum (its 'mean') and how wide it is (its 'standard deviation'). In the above example, the most common measurement (i.e. 9) is the same in curves A and B, but there is a greater range of values for A than for B. Curve C has the same spread (standard deviation) as A, but its most common measurement (i.e. 18) is twice that of curve A. All of these distributions are normal and can be described by:

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) $$

where \mu is the mean and \sigma is the standard deviation.

Example
The antennae lengths of a sample of 32 woodlice were measured and found to have a mean of 4 mm and a standard deviation of 2.37 mm. Using these parameters and the equation above, the expected frequency at each of the lengths encountered was calculated (expected frequency = 32 × f(x), since the lengths are recorded in 1 mm steps).

Length (mm)  Observed   Cumulative  Estimated  Estimated
             frequency  observed    frequency  cumulative
                        frequency              frequency
0            2          2           1.3        1.3
1            3          5           2.4        3.7
2            4          9           3.8        7.5
3            3          12          4.9        12.4
4            4          16          5.4        17.8
5            7          23          4.9        22.7
6            3          26          3.8        26.5
7            3          29          2.4        28.9
8            2          31          1.3        30.2
9            1          32          0.6        30.8

When the observed frequencies (bars) are plotted against the predicted normal distribution (red line), there is rough agreement between the two. When the cumulative frequencies are plotted against each other, the points lie close to a straight line, which suggests that this sample may have a distribution close enough to normal to allow the use of parametric statistics. To test for normality properly you would have to use something like a Kolmogorov-Smirnov test (see below).
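The estimated columns of the woodlouse table can be reproduced with Python's standard library. This is a sketch under one stated assumption: each expected count is N × f(x), treating each whole-millimetre length as a 1 mm wide bin, which matches the estimated frequencies in the table to one decimal place.

```python
from itertools import accumulate
from statistics import NormalDist

# Values quoted in the woodlouse example: sample size, mean and SD (mm).
N, MEAN, SD = 32, 4.0, 2.37
lengths = range(10)                          # 0 to 9 mm
observed = [2, 3, 4, 3, 4, 7, 3, 3, 2, 1]   # observed frequency per length

dist = NormalDist(mu=MEAN, sigma=SD)

# Assumption: each whole-millimetre length stands for a 1 mm wide bin,
# so the expected count is roughly N * f(x) * 1.
expected = [N * dist.pdf(x) for x in lengths]

cum_observed = list(accumulate(observed))
cum_expected = list(accumulate(expected))

print("length  obs  cum_obs  est  cum_est")
for row in zip(lengths, observed, cum_observed, expected, cum_expected):
    print("{:>6} {:>4} {:>8} {:>4.1f} {:>8.1f}".format(*row))
```

The estimated cumulative total reaches only about 30.8 rather than 32 because part of the fitted curve lies outside the 0 to 9 mm range (including, unrealistically, below 0 mm). An alternative would be N × (F(x + 0.5) - F(x - 0.5)) using `dist.cdf`, which gives slightly different numbers.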
Deviations from Normality
The above describes deviations from the normal distribution that are found occasionally.

Tests for Normality
The simplest method of assessing normality is to look at the frequency distribution histogram. The most important things to look at are the symmetry and peakedness of the curve. In addition, be wary of curves with two or more peaks: these indicate a bimodal (or multimodal) distribution, which is not very friendly in statistics. Visual appraisal should only be used as an indication of the distribution and should be followed up with a more rigorous method. Values of skew and kurtosis, as found in Excel's Function Wizard (SKEW and KURT respectively), are another good indicator (both are close to 0 for normally distributed data, since KURT reports excess kurtosis), but they can be over-optimistic regarding the data's match with normality. Before the advent of good computers and statistical programs, users could be forgiven for trying to avoid any surplus calculations. Now that both are available and much easier to use, tests for normality (and homogeneity of variance) should always be carried out as best practice in statistics. SPSS and Minitab contain the Kolmogorov-Smirnov test, which is the principal goodness-of-fit test for normal and uniform data sets. Alternatively, if you are a whizz on the calculator or in Excel and have a day or two spare, or have access to UNISTAT, you may wish to use the Shapiro-Wilk test, which is more reliable when n < 50.

Both of the above tests use the same hypotheses:
H0: there is no difference between the distribution of the data set and a normal one
HA: there is a difference between the distribution of the data set and a normal one

SPSS or Minitab will provide the P-value; if it is below 0.05, reject H0 (i.e. conclude that the data are not normally distributed).
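As a rough illustration of the numerical checks above, here is a Python sketch that computes Excel-style skewness and excess kurtosis for the 32 woodlouse lengths from the example, plus the Kolmogorov-Smirnov D statistic against a normal curve with the example's mean (4 mm) and standard deviation (2.37 mm). The function names are hypothetical; `excel_skew` and `excel_kurt` follow the formulas Microsoft documents for SKEW and KURT. The sketch computes only D, not the P-value, which still has to come from SPSS, Minitab or tables. Because the lengths are recorded to whole millimetres (many tied values) and the parameters were estimated from the same sample, D here should be treated only as a rough guide.

```python
from statistics import NormalDist, fmean, stdev

# Woodlouse antenna lengths (mm) rebuilt from the frequency table:
# length 0 occurs twice, length 1 three times, and so on.
counts = [2, 3, 4, 3, 4, 7, 3, 3, 2, 1]
lengths = [x for x, f in enumerate(counts) for _ in range(f)]

def excel_skew(xs):
    """Sample skewness, same formula as Excel's SKEW (0 for symmetric data)."""
    n, m, s = len(xs), fmean(xs), stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)

def excel_kurt(xs):
    """Excess kurtosis, same formula as Excel's KURT (0 for a normal curve)."""
    n, m, s = len(xs), fmean(xs), stdev(xs)
    scaled = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scaled * sum(((x - m) / s) ** 4 for x in xs) - correction

def ks_d(xs, dist):
    """Kolmogorov-Smirnov D: largest vertical gap between the empirical CDF
    of xs and dist.cdf. Checking both sides of each step of the empirical
    CDF also handles tied values correctly."""
    xs = sorted(xs)
    n = len(xs)
    return max(
        max((i + 1) / n - dist.cdf(x), dist.cdf(x) - i / n)
        for i, x in enumerate(xs)
    )

skew = excel_skew(lengths)
kurt = excel_kurt(lengths)
d = ks_d(lengths, NormalDist(mu=4.0, sigma=2.37))
print(f"skew = {skew:.3f}, excess kurtosis = {kurt:.3f}, KS D = {d:.3f}")
```

For these data the skewness comes out very close to zero, while the excess kurtosis is negative, meaning the histogram is somewhat flatter than a normal curve. This is the kind of detail a visual check of the bars can miss.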