It is possible to estimate the probability that a set of lab data does not contradict the assumed form of the PDF for a random variable such as a measured quantity. The fundamental notion is that one can find a statistic $\kappa$ representing the deviation between the data set and the assumed theoretical distribution. This quantity is then computed for the data set, giving some value $\kappa_d$, and the probability $\alpha$ that $\kappa$ would exceed this value is compared against a chosen (typically small) confidence limit.











Think about it: this represents the probability that a given set of data will have a deviation from a theoretically predicted form greater than the deviation measured for your data. If this probability is nearly one, your fit is very good. From the figure (or by extremizing the integrand) you can see that the most probable value for $\xi = \chi^2$ is $N - 2$, for $N$ statistical degrees of freedom. It is up to you how low you are willing to go in declaring a fit good.
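The two claims in this paragraph (the tail probability as an integral of the density, and the most probable value $N - 2$) can be checked numerically. The following is only a minimal sketch using the standard $\chi^2$ density; the function names `chi2_pdf`, `chi2_tail` and `chi2_mode` are illustrative choices, and the midpoint-rule integration assumes at least two degrees of freedom so that the integrand stays bounded near zero.

```python
import math

def chi2_pdf(x, n):
    """Density of the chi-squared distribution with n degrees of freedom."""
    if x <= 0:
        return 0.0
    return x ** (n / 2 - 1) * math.exp(-x / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def chi2_tail(x_d, n, steps=20000):
    """P(chi^2 >= x_d) as 1 minus a midpoint-rule integral over [0, x_d] (n >= 2)."""
    if x_d <= 0:
        return 1.0
    h = x_d / steps
    area = sum(chi2_pdf((i + 0.5) * h, n) for i in range(steps)) * h
    return 1.0 - area

def chi2_mode(n, x_max=50.0, steps=50000):
    """Grid search for the maximum of the density (meaningful for n > 2)."""
    grid = [x_max * i / steps for i in range(1, steps + 1)]
    return max(grid, key=lambda x: chi2_pdf(x, n))

print(chi2_mode(6))          # grid estimate of the most probable value for n = 6
print(chi2_tail(0.2008, 4))  # tail probability for a small deviation, n = 4
```

A small deviation gives a tail probability close to one, and the grid maximum lands at $N - 2$ to within the grid spacing.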
Consider now the simple problem of determining whether or not, and to what degree of confidence, a set of data points $\{(x_1,y_1),(x_2,y_2),\ldots,(x_N,y_N)\}$ conforms to a hypothesis such as $y = ax + b$, for example a linear fit. We take the following steps.
1. Find a curve of best fit $f_{\text{best}}(x) = y$. This is done by minimizing the deviation of the data from the curve with respect to the curve's parameters. The deviation is the sum of the squared residuals, $\sum_{i=1}^{N} \bigl(y_i - f_{\text{best}}(x_i)\bigr)^2$.

2. We next construct a standard statistic, $\chi_d^2$, for our data, and determine the probability that a variable following the $\chi^2$ probability distribution exceeds our experimental value $\chi_d^2$. We decide upon an acceptable significance level.
A good fit will have a small $\chi^2$ (a perfect fit has no deviation from the theoretical curve), and so will have $\wp_N(\chi^2) \approx 1$. Generally we are willing to accept much less than this. I think that a very reasonable criterion for goodness of fit is $\chi^2 < N - 2$, the most likely value of the variable.
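Step 1 can be sketched for the straight-line hypothesis $y = ax + b$. This is a minimal illustration that assumes the deviation being minimized is the unweighted sum of squared residuals; the helper name `fit_line` is hypothetical, and the five data points are the ones tabulated in the regression example of this section.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for the line y = a*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sums of squares about the means give the closed-form minimizer.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = y_mean - a * x_mean
    return a, b

xs = [2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 3.1, 3.9, 5.0, 6.0]
a, b = fit_line(xs, ys)
print(f"y = {a:.6f} x + {b:.6f}")
```

The slope and intercept agree with the line of best fit quoted in that example up to rounding in the last digit.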
Example
We have a set of data that we hope conforms to the theoretical curve



| $\chi_d^2$ | $\wp(\chi^2 \ge \chi_d^2)$ |
| 0.001000 | 1.000000 |
| 0.050950 | 0.999681 |
| 0.100900 | 0.998769 |
| 0.150850 | 0.997295 |
| 0.200800 | 0.995285 |
which is pretty much as we expected; a good fit between experiment and theory will have α close to
one.
Example
Suppose we take the exact same data, and run a linear regression on it to find the slope a and intercept b
of the line that best fits the data, and use that line as the theoretical curve. We are now measuring a
goodness of fit rather than testing a hypothesis. We obtain a line of best fit y = 0.039999 + 0.990000x, and
construct the chi-squared statistic
| $x$ | $y$ | $y - (ax + b)$ | $(y-(ax+b))^2 / \lvert ax+b \rvert$ |
| 2.000000 | 2.000000 | -0.020000 | 0.000198 |
| 3.000000 | 3.100000 | 0.090000 | 0.002691 |
| 4.000000 | 3.900000 | -0.100000 | 0.002500 |
| 5.000000 | 5.000000 | 0.010000 | 0.000020 |
| 6.000000 | 6.000000 | 0.020000 | 0.000067 |
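obtaining $\chi_d^2 = 0.005476$. The goodness of fit for a $\chi^2$ distribution with $2 = N - 1 - 2$ degrees of freedom (we have used up two degrees of freedom from our data to get the slope and intercept) is

$$\alpha = \wp(\chi^2 \ge \chi_d^2) = \int_{\chi_d^2}^{\infty} \tfrac{1}{2}\, e^{-\xi/2}\, d\xi = e^{-\chi_d^2/2} \approx 0.997 .$$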
Based on these two examples, we will call $\alpha$ computed in this way our confidence in the agreement between experiment and theory for our data set. In the last example we are 99.7 percent confident that the hypothesis is supported by our data.
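The arithmetic of the regression example can be reproduced in a few lines. This sketch takes the rounded slope and intercept from the text as given, forms each term as (observed minus predicted) squared over the absolute predicted value, as in the table, and uses the fact that for two degrees of freedom the $\chi^2$ tail probability reduces to $e^{-\chi^2/2}$. The variable names are illustrative.

```python
import math

xs = [2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 3.1, 3.9, 5.0, 6.0]
a, b = 0.99, 0.04  # slope and intercept quoted in the text, rounded

# One term per data point: (y - prediction)^2 / |prediction|.
terms = [(y - (a * x + b)) ** 2 / abs(a * x + b) for x, y in zip(xs, ys)]
chi2_d = sum(terms)

# With two degrees of freedom the tail probability is exactly exp(-chi2 / 2).
alpha = math.exp(-chi2_d / 2)
print(f"chi2_d = {chi2_d:.6f}, alpha = {alpha:.4f}")
```

The closed form used for `alpha` holds only for two degrees of freedom; other cases need the general $\chi^2$ tail integral.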
Example
Consider the data for a nuclear counting experiment, with each count taken over 10 s,
| counts $j$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| frequency $m_j$ | 57 | 205 | 383 | 525 | 532 | 408 | 273 | 139 | 45 | 27 | 16 |
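The total number of events counted is $\mathcal{N} = \sum_{j=0}^{\infty} m_j = 2608$. We suppose that this data agrees with the proposition that the counts are Poisson distributed,

$$\wp(j,a) = \frac{a^j e^{-a}}{j!},$$

with the parameter $a$ replaced by its estimate $a_{\text{est}}$ obtained from the data.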



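| $j$ | $\wp(j,a_{\text{est}})$ | $\mathcal{N}\wp(j,a_{\text{est}})$ | $m_j - \mathcal{N}\wp(j,a_{\text{est}})$ | $(m_j-\mathcal{N}\wp(j,a_{\text{est}}))^2 / \mathcal{N}\wp(j,a_{\text{est}})$ |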
| 0 | 0.020858 | 54.398627 | 2.601373 | 0.124399 |
| 1 | 0.080722 | 210.522688 | -5.522688 | 0.144878 |
| 2 | 0.156197 | 407.361402 | -24.361402 | 1.456883 |
| 3 | 0.201494 | 525.496208 | -0.496208 | 0.000469 |
| 4 | 0.194945 | 508.417582 | 23.582418 | 1.093846 |
| 5 | 0.150888 | 393.515208 | 14.484792 | 0.533167 |
| 6 | 0.097323 | 253.817309 | 19.182691 | 1.449766 |
| 7 | 0.053805 | 140.324712 | -1.324712 | 0.012506 |
| 8 | 0.026028 | 67.882080 | -22.882080 | 7.713222 |
| 9 | 0.011192 | 29.189294 | -2.189294 | 0.164204 |
| 10 | 0.004331 | 11.296257 | 4.703743 | 1.958631 |
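The columns of this table can be regenerated with a short script. This is a sketch under two stated assumptions: the value `a_est = 3.87` is not given in the surviving text and is inferred here from the tabulated $\wp(0,a_{\text{est}}) = e^{-a_{\text{est}}} \approx 0.020858$, and the total `n_total = 2608` is taken as quoted in the text rather than recomputed from the listed frequencies.

```python
import math

freq = [57, 205, 383, 525, 532, 408, 273, 139, 45, 27, 16]  # m_j as tabulated
n_total = 2608  # total quoted in the text (not recomputed from freq)
a_est = 3.87    # assumption: inferred from the tabulated P(0) = exp(-a_est)

def poisson(j, a):
    """Poisson probability of observing j counts when the mean is a."""
    return a ** j * math.exp(-a) / math.factorial(j)

chi2_d = 0.0
for j, m in enumerate(freq):
    expected = n_total * poisson(j, a_est)
    term = (m - expected) ** 2 / expected
    chi2_d += term
    print(f"{j:2d} {poisson(j, a_est):.6f} {expected:11.6f} {m - expected:11.6f} {term:.6f}")

print(f"chi2_d = {chi2_d:.6f}")
```

The largest single contribution comes from the class $j = 8$, where the observed frequency falls well below the Poisson expectation.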
In this case we find that $\chi_d^2 = 14.651970$. Pearson's test consists of computing the probability that a variable following the standard $\chi^2$ distribution will exceed this value; namely, we compute $\wp(\chi^2 \ge \chi_d^2)$, tabulated here near our value:



| $\chi_d^2$ | $\wp(\chi^2 \ge \chi_d^2)$ |
| 13.500000 | 0.141256 |
| 13.750000 | 0.131500 |
| 14.000000 | 0.122325 |
| 14.250000 | 0.113706 |
| 14.500000 | 0.105618 |
| 14.750000 | 0.098036 |
| 15.000000 | 0.090936 |
| 15.250000 | 0.084294 |
| 15.500000 | 0.078086 |
| 15.750000 | 0.072289 |
Since we will accept a confidence of 10%, and the probability that $\chi^2$ exceeds our value 14.651970 is approximately 0.1 (or 10%) from the table, according to Sveshnikov's statement of the Goodness of Fit Criterion we could conclude that the deviation from a predicted Poisson distribution is insignificant; we have a reasonably good fit under our acceptance criteria.
Why were we willing to go so low? We only have 2608 events, a pretty small set from which to build up a
frequency plot. If we had tens of thousands, we could get a much smoother frequency plot that is
less sensitive to the statistical fluctuations that overwhelm small data sets. Large data sets
are good, small are not; you should always be thinking about the Central Limit theorem.
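Rather than reading the tail probability off a table, it can be computed directly. The sketch below uses a standard recurrence for the $\chi^2$ tail with an integer number of degrees of freedom. It assumes nine degrees of freedom for the counting example (eleven classes, minus one, minus one fitted parameter); this number is not stated in the text, but it is consistent with the tabulated values. The name `chi2_sf` is illustrative.

```python
import math

def chi2_sf(x, k):
    """P(chi^2 >= x) for an integer number k >= 1 of degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2
    if k % 2 == 0:
        q, start = math.exp(-half), 2             # exact tail for k = 2
    else:
        q, start = math.erfc(math.sqrt(half)), 1  # exact tail for k = 1
    # Recurrence: Q(n + 2) = Q(n) + (x/2)^(n/2) * exp(-x/2) / Gamma(n/2 + 1).
    for n in range(start, k, 2):
        q += half ** (n / 2) * math.exp(-half) / math.gamma(n / 2 + 1)
    return q

chi2_d = 14.651970
dof = 9  # assumption: 11 count classes - 1 - 1 fitted parameter
p_value = chi2_sf(chi2_d, dof)
print(f"P(chi^2 >= {chi2_d}) = {p_value:.4f}")
```

The result falls between the tabulated entries for 14.50 and 14.75, that is, close to the 10% acceptance level used above.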
Material relevant to hypothesis testing and the χ2 test can be found in A. A. Sveshnikov Problems in
Probability Theory, Mathematical Statistics and the Theory of Random Functions, Dover (1968). The
radioactive counting example is Example 43.1 of this reference.
21. Examine the formula for the chi-squared statistic. Which experimental data points will dominate this
formula (contribute most to it)? Is that realistic or appropriate in some way? Explain this carefully since
this issue is actually quite important.
22. Prove that the mode of the chi-squared distribution for N degrees of freedom is N - 2.
23. Examine the decay-counting example due to Sveshnikov studied in this section. Which data points are
most strongly affected by the experimenter forgetting to account for ambient background
radiation? Explain the relevance of this question to the problem of setting up a reasonable
value of α to use in deciding on whether or not a hypothesis is supported by experimental
data.
The purpose of this experiment is to derive an understanding of the fact that a result of a measurement is
a random number distributed about some mean (average), with some standard deviation. In addition we
will test this principle as a hypothesis using the χ2-test.
The apparatus is simply a collection of ten pennies.
When a handful of pennies is tossed, there is no way that you could predict with complete certainty what the outcome will be, but we accept as a hypothesis that the outcome of the measurement is a random number that has some distribution about a mean value. In this case the distribution is very simple: it is binomial. When you toss a single penny, you have a probability $p$ that it will turn up "heads" and $1 - p$ that it will turn up "tails". We certainly know $p = 0.5$, but let's keep it variable. The generating function for an $N$-coin toss is

$$\bigl(pz + 1 - p\bigr)^N = \sum_{m=0}^{N} \binom{N}{m}\, p^m (1-p)^{N-m} z^m ,$$

where the coefficient of $z^m$ is the probability of obtaining $m$ heads.
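The binomial probabilities for the ten-penny apparatus are easy to tabulate. The following is a small sketch; the function name `binomial_pmf` is an illustrative choice, and `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_pmf(m, n, p):
    """Probability of exactly m heads in n independent tosses with P(heads) = p."""
    return math.comb(n, m) * p ** m * (1 - p) ** (n - m)

n_coins = 10
probs = [binomial_pmf(m, n_coins, 0.5) for m in range(n_coins + 1)]
for m, pr in enumerate(probs):
    print(f"{m:2d} {pr:.6f}")

print(sum(probs))  # the probabilities should add up to one
```

Keeping `p` as an argument mirrors the text's choice to leave it variable; setting it to 0.5 gives the fair-coin case.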




Perform 20 tosses of all of the coins at once into a box, each time recording the number of
heads.
Now sort your data: determine the number of tosses $N(m)$ that resulted in $m$ heads, for $m = 0, 1, 2, \ldots, 10$, and compute the experimental probability of getting $m$ heads,

$$\wp_{\text{exp}}(m) = \frac{N(m)}{20} .$$
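Do these experimental measurements of the probabilities confirm the hypothesis that the probability is given by $\wp_{\text{theor}}(m) = \binom{10}{m} / 2^{10}$? To decide this, compute $\chi^2$ for your data,

$$\chi_d^2 = \sum_{m=0}^{10} \frac{\bigl(N(m) - 20\,\wp_{\text{theor}}(m)\bigr)^2}{20\,\wp_{\text{theor}}(m)} .$$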
Perform the Pearson $\chi^2$ test. With what confidence does your data support the hypothesis? This experiment is a grossly simplified version of the nuclear counting experiment that you will eventually perform. I would suggest writing up your experiment and producing a lab report, with a cover page, a table of raw data, tables of processed data, calculations, and conclusions. Include a graph of the binomial distribution (the theoretical distribution) with your data points superimposed on it.
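Before tossing real pennies, the data reduction can be rehearsed on simulated tosses. This is only a sketch: the tallies come from a pseudo-random generator rather than from a measurement, the seed and the function names `toss_experiment` and `pearson_chi2` are arbitrary illustrative choices, and the statistic is assumed to have the same Pearson form as in the counting example.

```python
import math
import random

def toss_experiment(n_coins=10, n_tosses=20, p=0.5, seed=1):
    """Simulate tossing all coins n_tosses times; tally[m] counts tosses with m heads."""
    rng = random.Random(seed)
    tally = [0] * (n_coins + 1)
    for _ in range(n_tosses):
        heads = sum(rng.random() < p for _ in range(n_coins))
        tally[heads] += 1
    return tally

def pearson_chi2(tally, n_coins=10, p=0.5):
    """Sum over m of (observed - expected)^2 / expected for binomial expectations."""
    n_tosses = sum(tally)
    chi2 = 0.0
    for m, observed in enumerate(tally):
        expected = n_tosses * math.comb(n_coins, m) * p ** m * (1 - p) ** (n_coins - m)
        chi2 += (observed - expected) ** 2 / expected
    return chi2

tally = toss_experiment()
p_exp = [count / sum(tally) for count in tally]  # experimental probabilities
print(tally)
print(p_exp)
print(pearson_chi2(tally))
```

With only 20 tosses, the classes $m = 0$ and $m = 10$ have very small expected counts, so a single observation there contributes a large term to the sum.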