Statistics in Analytical Chemistry

Sample Measurements, Histograms, and Probability Distributions:

When we make a measurement, we expect to get the true value. This is known as the expected value E(x), or the population (true) mean μ. We rarely get this value, however, as there is always some degree of random error or fluctuation in the system. The result we get will probably not be the true value, but somewhere close to it. If we repeat the measurements enough times, we expect that the average will be close to the true value, with the actual results spread around it.
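The idea that the average of repeated measurements approaches the true value can be illustrated with a short simulation. This is only a sketch: the true value, the size of the random error, and the names `TRUE_VALUE`, `SIGMA`, and `mean_of_measurements` are assumptions made for illustration and are not taken from this tutorial.

```python
import random
import statistics

# Hypothetical values chosen for illustration; they are not data from this tutorial.
TRUE_VALUE = 1.000   # assumed population (true) mean, mu
SIGMA = 0.050        # assumed standard deviation of the random error

def mean_of_measurements(n, seed=1):
    """Average n simulated measurements that scatter normally about TRUE_VALUE."""
    rng = random.Random(seed)
    return statistics.fmean(rng.gauss(TRUE_VALUE, SIGMA) for _ in range(n))

# The average generally settles closer to TRUE_VALUE as n grows.
for n in (5, 50, 500, 5000):
    print(f"n = {n:4d}   mean = {mean_of_measurements(n):.4f}")
```

Individual simulated results still scatter about the true value; only their average tends towards it as the number of measurements increases.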

As an example, let's determine the average weight of a single sugar-coated chocolate from a box of Smarties™. The following histogram shows the distribution of experimentally determined masses for a large sample (n=97):

Histogram of the measured individual masses of 97 Smarties™

Here the frequency is the number of times that a particular result occurs. From this plot, we see that the values are distributed relatively evenly around a point somewhere between 1.00 and 1.05, so the mean value of these measurements is probably around 1.02 or 1.03. (In fact, the mean and standard deviation for this sample are 1.02₉ ± 0.07₁ g.)

Note that we can also plot the vertical axis as the relative frequency by dividing each count by the total number of samples. Also note that when the sample size is small, the histogram can look quite different, and can vary from one set of samples to the next. You can explore this in the exercise included below.
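As a rough sketch of the relative-frequency calculation, the Python below counts values per bin and divides each count by the total. The masses are simulated from the mean and standard deviation quoted above (1.029 g and 0.071 g); they are not the tutorial's actual data, and the function names and the 0.05 g bin width are assumptions made for this example.

```python
import random
from collections import Counter

def bin_counts(values, bin_width):
    """Count values per bin; each bin is labelled by its lower edge."""
    counts = Counter()
    for v in values:
        lower_edge = round((v // bin_width) * bin_width, 6)
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))

def relative_frequencies(counts):
    """Divide each count by the total number of samples."""
    total = sum(counts.values())
    return {edge: c / total for edge, c in counts.items()}

def simulated_masses(n, seed):
    """Simulated masses in g (NOT the tutorial's data), drawn from N(1.029, 0.071)."""
    rng = random.Random(seed)
    return [rng.gauss(1.029, 0.071) for _ in range(n)]

# Relative frequencies for a large simulated sample (n = 97).
large = relative_frequencies(bin_counts(simulated_masses(97, seed=0), 0.05))
for edge, freq in large.items():
    print(f"{edge:.2f} g  {freq:.3f}")

# Two small samples (n = 10) drawn the same way can look quite different.
for seed in (1, 2):
    print(bin_counts(simulated_masses(10, seed), 0.05))
```

The relative frequencies always sum to one, whatever the sample size, which makes histograms of different-sized samples easier to compare.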

Plotted this way, the histogram approximates a probability distribution and, as you can see, it has a bell shape. The shape becomes increasingly apparent as we increase the number of measurements, that is, as we progress from measuring a sample towards measuring the whole population. In fact, this type of distribution is sometimes called a bell curve or, more commonly, the Normal Distribution.

Exercise 1: Generating a Histogram

Earlier versions of Excel™ did not directly support generating and plotting histograms. Histograms are now included as a chart type (Excel 2016 for Windows or 2019 for Mac). However, there is currently no built-in function to generate the corresponding data table. To explore histograms, start by downloading and opening the sample data file linked in the sidebar.

Current version of Excel:

To insert a histogram of the data, select all the values including the data label. You can then simply use the menu (Insert → Chart → Histogram) or expand the Statistical chart type option in the Insert tab of the toolbar as shown below:

Location of the histogram chart in the Excel toolbar

Once the histogram plot is inserted, expand it for a better view, then right-click on the histogram and choose Format Data Series... (or click on the Format tab and choose Series "Mass (g)" in the button bar). Select the Bins drop-down menu in the formatting pane as shown below. Change the number of bins and observe how the histogram changes appearance as this number is reduced or increased.

Adjusting the histogram bin size

The bin size of a histogram specifies the range of values that are counted together; a bin size of one (1 g) will include all the values in a single bar. As the bin size is decreased, we can see more detail in the sample distribution. You can think of bin size as analogous to measurement resolution: we would see only a single value if the masses had been measured to ±1 g; conversely, we would see even more variation if the masses had been measured to ±1 mg (0.001 g). Other features of the plot can be formatted as described previously.
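The effect of bin size can also be sketched outside Excel. The Python below is a minimal illustration, assuming that bins start at the smallest value in the data set; the masses are simulated from the quoted mean and standard deviation rather than read from the sample data file, and the `histogram` helper is hypothetical.

```python
import math
import random

def histogram(values, bin_width):
    """Return counts per bin, with the first bin starting at the smallest value."""
    start = min(values)
    n_bins = max(1, math.ceil((max(values) - start) / bin_width))
    counts = [0] * n_bins
    for v in values:
        # Clamp so that the largest value always falls in the last bin.
        k = min(int((v - start) / bin_width), n_bins - 1)
        counts[k] += 1
    return counts

# Simulated masses in g (NOT the tutorial's data), drawn from N(1.029, 0.071).
rng = random.Random(0)
masses = [rng.gauss(1.029, 0.071) for _ in range(97)]

for width in (1.0, 0.1, 0.02):
    print(f"bin width = {width} g")
    for k, count in enumerate(histogram(masses, width)):
        print(f"  {min(masses) + k * width:.3f}  " + "#" * count)
```

With a 1 g bin every simulated mass lands in one bar; narrowing the bins reveals progressively more of the distribution's shape, at the cost of fewer counts per bar.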

Older versions of Excel:

For older versions of Excel, you will need to generate a data table that can then be plotted as a vertical bar chart. You can do this in a number of ways:

Use the COUNTIF function to generate a table from a set of bin values (see the provided example file for details)
Use the Histogram tool from the Analysis ToolPak add-in to generate the data table
Use the FREQUENCY array function to generate the same data table
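The COUNTIF approach can be sketched as follows. One common pattern is to count the values at or below each bin limit and then subtract the count for the previous limit; this is an assumption about how such a table might be built, not a description of the provided example file. The masses and bin limits are made up for the example, and `countif_le` and `histogram_table` are hypothetical helpers.

```python
def countif_le(values, limit):
    """Python stand-in for COUNTIF(range, "<=" & limit)."""
    return sum(1 for v in values if v <= limit)

def histogram_table(values, upper_limits):
    """Return (upper_limit, count) rows from differences of cumulative counts.

    The first row counts everything at or below the first limit; each later
    row counts values above the previous limit and at or below its own.
    """
    rows = []
    previous_cumulative = 0
    for limit in upper_limits:
        cumulative = countif_le(values, limit)
        rows.append((limit, cumulative - previous_cumulative))
        previous_cumulative = cumulative
    return rows

# Hypothetical masses in grams, made up for this example.
masses = [0.92, 0.98, 1.01, 1.03, 1.03, 1.07, 1.12]
bin_limits = [0.95, 1.00, 1.05, 1.10, 1.15]

for limit, count in histogram_table(masses, bin_limits):
    print(f"<= {limit:.2f} g: {count}")
```

The resulting table of bin limits and counts is what would then be plotted as a vertical bar chart.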

The Normal Distribution:

A normal distribution implies that if you take a large enough number of measurements of the same property, for the same sample, under the same conditions, subject only to random (indeterminate) error, the values will be distributed around the expected value, or mean. The frequency with which a particular result (i.e. value) occurs becomes lower the farther the result is from the mean.

Put another way, a normal distribution is a probability curve where there is a high probability of an event (i.e. a particular value) occurring near the mean value, with a decreasing chance of an event occurring as we move away from the mean. The normal distribution curve and equation look like this:

Note: to convert y values to probabilities, P(x), the data must be normalized to give unit area.
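The original figure and equation are not reproduced in this text. For reference, the standard form of the normal distribution, written with the population mean μ and an assumed symbol σ for the population standard deviation, is:

```latex
y = \frac{1}{\sigma\sqrt{2\pi}}\,
    \exp\!\left[-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right]
```

With the prefactor 1/(σ√(2π)) included, the area under the curve is already equal to one; without it, the curve has the same bell shape but must be normalized before its area can be read as a probability.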

The important thing to know about the Normal Distribution is that the probability of getting a certain result decreases the farther that result is from the mean. The concept of the normal distribution will be important when we talk about 1- and 2-tailed tests, confidence levels, and statistical tests, such as the t-test and F-test.
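To put numbers on this, the sketch below evaluates the normal curve at increasing distances from the mean, along with the fraction of a normally distributed population lying within a given number of standard deviations. It assumes the standard normal form with mean μ and standard deviation σ; the function names are hypothetical.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x (unit-area form)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fraction_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within ±{k}σ: {fraction_within(k):.4f}   y at μ+{k}σ: {normal_pdf(k):.4f}")
```

Roughly 68% of results are expected within one standard deviation of the mean, about 95% within two, and about 99.7% within three, which is why results far from the mean are rare.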

There are many other types of distributions, but we will only consider the normal distribution here. This is because we assume that the measurements we perform in this course will be normally distributed about the mean, and that the random errors will also be normally distributed. Generally, this is a good assumption, though there are many situations where it does not apply.

Stats Tutorial - Instrumental Analysis and Calibration
