Inference using chi-squared tests
This last lecture in the course deals with some essential points
about using chi-squared tests in the analysis of categorical
data. In all cases, it is not the classification that is
tested, but the distribution of observed frequencies, which
is compared with the expected frequency distribution derived
from an appropriate null hypothesis.
As already shown, there are two types of
research question involving frequency data:
- Goodness-of-fit
- Contingency table analysis
The procedures differ in some respects but share a common
framework that we have already encountered in other statistical
procedures: Data, Assumptions, Comparison, Inference. First I
shall outline the common features, and then deal with the
specific details.
You are not expected to do the actual chi-squared calculations for
this section of the course. What is required is to know how
to interpret the results based on a clear understanding of
topics dealt with earlier:
- the purpose of the analysis, which determines the null
hypothesis and degrees of freedom
- observed and expected frequencies
- percentages
- classification and counting
General
The following points always apply. Refer to the textbook for
more details.
- Data
- A table of frequencies is obtained by sorting the data and
counting sub-totals.
- Assumptions
- Classification should be mutually exclusive and
exhaustive. The observations should be independent: No
person should be counted twice. This is not the same as
classifying a person on more than one dimension
(see contingency tables below). Lack of independence may
also occur in more subtle ways. For example, the
distribution of males and females in a particular
situation may be determined by a quota system.
- A null hypothesis. This depends on the type of problem
(see below).
- The level of significance as determined by convention,
usually alpha = .05.
- Compare
- The observed distribution of frequencies is compared with an
expected distribution based on a null hypothesis. The
amount of discrepancy is indicated by:
- The calculated value of chi-squared
- The formula for calculating chi-squared is
always the same: For each cell (row i, column j)
in the table, you find the difference
between observed and expected frequencies. This
value is then squared and divided by the expected
frequency. You then sum these values for the
entire table to obtain the result. That is,
chi-squared = sum over all cells of (O - E)^2 / E,
where O is the observed and E the expected frequency of a cell.
- The critical value.
- The calculated value of chi-square is compared
to the critical value for a particular
significance level and degrees of freedom.
- Decide
- The result is significant if the critical value is exceeded.
Note that it is a non-directional test: Chi-squared gets
bigger whether the observed frequencies are greater or less
than the expected, because the differences are squared.
- Infer
- A significant result indicates that the population
distribution is not as expected in terms of the null
hypothesis.
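The compare and decide steps above can be sketched in a few lines of Python. This is only an illustration, not something you are required to do for the course. The function name `chi_squared` and the counts are hypothetical, chosen to show that the test is non-directional. The critical value of 3.84 is the standard tabled value for df = 1 at alpha = .05.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all cells, given as flat lists."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


CRITICAL_DF1 = 3.84  # tabled critical value for df = 1, alpha = .05

# Hypothetical counts: 100 observations, 50/50 expected under the null.
expected = [50, 50]
too_many_first = chi_squared([60, 40], expected)  # first category above expectation
too_few_first = chi_squared([40, 60], expected)   # first category below expectation

# Squaring removes the sign, so both departures give the same statistic.
print(too_many_first, too_few_first)

# Decide: the result is significant only if the critical value is exceeded.
print(too_many_first > CRITICAL_DF1)
```

Both departures from the expected distribution yield the same calculated value, which is why the direction of a significant result has to be read from the frequencies themselves.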
Goodness-of-fit
A test on the distribution of frequencies in a one-way
classification.
Null Hypothesis
Specified by you on the basis of some prior information. This
provides the expected frequencies. For example, in the case
of handedness it was 1/10 "left" versus 9/10 "right". If
no prior information is available you assume that all
categories are equally probable.
Degrees of freedom
Always one less than the number of categories (k): df =
k - 1. For example, if you have 55 females in a sample
of 100 people, then there must be 45 males because the
categories are mutually exclusive and exhaustive. Only one
of the two frequencies is free to vary, so df = 2 - 1 = 1.
Inference
If the result is significant, you conclude the observed
distribution is not as expected. If it is not significant, you
conclude the observed distribution is a "good fit" to the
expected distribution.
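A goodness-of-fit test can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `goodness_of_fit` is my own, the equal-probability null for the 55/45 example is assumed for the sake of the demonstration, and the handedness counts (16 left, 84 right) are hypothetical. Only the 1/10 versus 9/10 null hypothesis comes from the handedness example.

```python
def goodness_of_fit(observed, null_proportions):
    """Return (chi_squared, df) for a one-way table of counts."""
    n = sum(observed)
    # Expected frequencies come from the null hypothesis proportions.
    expected = [n * p for p in null_proportions]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # df = k - 1
    return chi2, df


CRITICAL_DF1 = 3.84  # tabled critical value for df = 1, alpha = .05

# 55 females and 45 males, tested against equal probability (assumed null).
chi2_sex, df_sex = goodness_of_fit([55, 45], [0.5, 0.5])

# Hypothetical handedness counts tested against 1/10 left, 9/10 right.
chi2_hand, df_hand = goodness_of_fit([16, 84], [0.1, 0.9])

for label, chi2, df in [("sex", chi2_sex, df_sex), ("handedness", chi2_hand, df_hand)]:
    print(label, round(chi2, 2), "df =", df, "significant:", chi2 > CRITICAL_DF1)
```

With these numbers the first test falls well short of the critical value, while the second just exceeds it. Note that the same sample size gives very different expected frequencies depending on the null hypothesis.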
Contingency tables
Also known as two-way tables, because there are
two dimensions of classification.
The purpose is to detect an
association. That is what contingency refers to:
When two sets of attributes are contingent then they are
associated. Cross-classification and inspection of the frequency
distribution will reveal if attributes tend to coincide.
Conversely, independence implies no contingency, lack of
association.
Null hypothesis
Mutual independence. If this is true, then the distribution of
cell frequencies will reflect the marginal row and column
totals.
- Expected frequency:
- For each cell in the table:
E_ij = (Row_i x Col_j) / T
where Row_i is the ith row marginal total,
Col_j is the jth column marginal total
and T is the grand total. In the rape conviction example,
if the null hypothesis were true, then for any row a similar
distribution of cell frequencies should be found (irrespective of attributions
about the victim's blame). The marginal total was 258
guilty versus 100 not guilty verdicts, which is roughly
2.6:1. If the null hypothesis were true, then at each level
of blame we should expect approximately a 2.6:1 distribution
of guilty versus not guilty verdicts. The same principle
applies for any column:
Table of Expected frequencies
| Verdict | High blame | Low blame | Total |
|---|---|---|---|
| Guilty | 130.4 | 127.6 | 258 |
| Not Guilty | 50.6 | 49.4 | 100 |
| Total | 181 | 177 | 358 |
(note that the expected frequencies by definition give the
same row and column totals as the observed frequencies).
- Degrees of freedom:
df = (R - 1) x (C - 1)
where R is the number of rows in the table and C is the number of
columns in the table. Given the total for any row all
but one of the cell frequencies are free to vary, and
for any column the same applies.
You can see this in the rape conviction example where
df = 1. Suppose all cell frequencies were
unknown except the number of "not guilty" verdicts for
"low blame" victims (i.e. 24). Given the total of 100
"not guilty" cases, there must be 76 in the "high blame"
cell. The column totals (181 and 177) then fix the two
"guilty" cells at 105 and 153, so only one cell is free
to vary.
Note that the total frequencies are irrelevant to the
truth or falsity of the null hypothesis.
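The expected frequencies and the degrees of freedom for the rape conviction example can be checked with a short sketch. The function names are my own. The marginal totals and the single known cell (24) are taken from the text above. The rounded results should agree with the table of expected frequencies to within rounding.

```python
def expected_frequencies(row_totals, col_totals):
    """E_ij = (Row_i x Col_j) / T for every cell of the table."""
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def degrees_of_freedom(n_rows, n_cols):
    """df = (R - 1) x (C - 1)."""
    return (n_rows - 1) * (n_cols - 1)


row_totals = [258, 100]  # guilty, not guilty
col_totals = [181, 177]  # high blame, low blame

expected = expected_frequencies(row_totals, col_totals)
for label, row in zip(["Guilty", "Not Guilty"], expected):
    print(label, [round(e, 1) for e in row])

df = degrees_of_freedom(len(row_totals), len(col_totals))
print("df =", df)

# With df = 1, one known cell plus the marginal totals fixes all the others.
not_guilty_low = 24
not_guilty_high = row_totals[1] - not_guilty_low  # rest of the "not guilty" row
guilty_high = col_totals[0] - not_guilty_high     # rest of the "high blame" column
guilty_low = col_totals[1] - not_guilty_low       # rest of the "low blame" column
print(guilty_high, guilty_low, not_guilty_high, not_guilty_low)
```

The ratio of the two expected frequencies in each column is the same as the ratio of the row totals, which is exactly what mutual independence means.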
Significance
If the chi-squared test is significant then there must
be some association or contingency. In the rape prosecution
example the calculated value of chi-square was 35.93 and the
result was significant. The conclusion is that verdict is
contingent on blame. Inspection of the data table shows a
distribution of approximately six guilty to one not guilty
verdict when there is low blame, but less than two to one when
there is high blame. Differences between observed and expected
frequencies are quite large. This is most clearly seen in the
table of calculated chi-squared values for each cell (remember that the
calculated chi-squared value is the sum total of chi-squared
for each cell):
Table of calculated chi-squared values
| Verdict | High blame | Low blame |
|---|---|---|
| Guilty | 4.96 | 5.07 |
| Not Guilty | 12.80 | 13.09 |
The large chi-squared values in the "not guilty" row indicate
that the largest discrepancies were found there. Comparison
between the observed and expected frequencies for these cells
reveals more "not guilty" verdicts observed than expected for
high blame, and fewer than expected for low blame.
This example assumes a causal relation between attributions of
blame and resulting verdict. That assumption is supported by
the experimental design of the study. Clearly attribution of
blame had an effect.
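The per-cell values and the total for this example can be reproduced with the sketch below. The observed frequencies are assumptions in one sense: only the 24 "not guilty, low blame" cases and the marginal totals are stated in this section, and the other three counts are obtained by subtraction. The critical value of 3.84 is the standard tabled value for df = 1 at alpha = .05.

```python
observed = [[105, 153],  # guilty: high blame, low blame
            [76, 24]]    # not guilty: high blame, low blame

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Contribution of each cell: (O - E)^2 / E, with E from the marginal totals.
cells = []
for i, row in enumerate(observed):
    cells.append([])
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        cells[i].append((o - e) ** 2 / e)

chi2 = sum(sum(row) for row in cells)

CRITICAL_DF1 = 3.84  # tabled critical value for df = 1, alpha = .05

for label, row in zip(["Guilty", "Not Guilty"], cells):
    print(label, [round(c, 2) for c in row])
print("chi-squared =", round(chi2, 2), "significant:", chi2 > CRITICAL_DF1)
```

Listing the contribution of each cell separately, rather than only the total, is what allows you to see where the discrepancies lie.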
Another example
The distribution of male and female students in the PSY307F
class has already been noted. Data were obtained from class
lists for 2002, as shown in the previous lecture. Inspection of
the table showed declining numbers of males as students
progress from 2nd year through to Honours. This was more
evident from the percentages.
Percentages were calculated on the assumption that
the independent variable is the level of study. To test if
the distribution of males versus female students depends on the
level of study the chi-square test for a contingency table will
be applied. Given the null hypothesis of mutual independence
expected frequencies for these data are:
Table of Expected Frequencies
| | PSY206F | PSY307F | PSY400W |
|---|---|---|---|
| Male | 74.7 | 48.9 | 6.3 |
| Female | 244.3 | 160.1 | 20.7 |

Table of calculated chi-squared values
| | PSY206F | PSY307F | PSY400W |
|---|---|---|---|
| Male | 3.12 | 3.43 | 0.85 |
| Female | 0.96 | 1.05 | 0.26 |
The calculated total chi-squared value was 9.67. With df =
2 the critical value is 5.99, so the null hypothesis is
rejected. It seems the distribution of male versus female
students depends on the level of study. Examination of the
table above shows large chi-squared values for males in PSY206F
and PSY307F. Comparing observed and expected frequencies for
these cells reveals that relative to the null hypothesis, there
were "too many" males in PSY206F and "too few" in PSY307F.
This could be interpreted as a "bulge" in the number of
males in PSY206F.
Note that the test does not indicate anything about the
decline in numbers across levels of study, nor about the
disproportion of males versus females: It only indicates that
the disproportion is not the same across levels of study.
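The arithmetic behind this decision can be laid out as a short sketch. It uses only the calculated chi-squared values from the table above and the critical value quoted in the text. The variable names are my own.

```python
# Calculated chi-squared values per cell, copied from the table above.
cell_chi2 = {
    "Male":   {"PSY206F": 3.12, "PSY307F": 3.43, "PSY400W": 0.85},
    "Female": {"PSY206F": 0.96, "PSY307F": 1.05, "PSY400W": 0.26},
}

# Contribution of each row to the total (rounded to avoid float noise).
row_sums = {sex: round(sum(cols.values()), 2) for sex, cols in cell_chi2.items()}
total = round(sum(row_sums.values()), 2)

# df = (R - 1) x (C - 1)
df = (len(cell_chi2) - 1) * (len(cell_chi2["Male"]) - 1)

CRITICAL_DF2 = 5.99  # critical value for df = 2, alpha = .05
significant = total > CRITICAL_DF2

print(row_sums, total, "df =", df, "significant:", significant)
```

Most of the total comes from the male row, which is why the interpretation concentrates on the male cells.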
The data for 2000 are given below for comparison:
Table of Observed frequencies in 2000
| | PSY206F | PSY307F | PSY400W | Total |
|---|---|---|---|---|
| Male | 58 | 23 | 4 | 85 |
| Female | 225 | 114 | 30 | 369 |
| Total | 283 | 137 | 34 | 454 |
Table of Percentages in 2000
| | PSY206F | PSY307F | PSY400W |
|---|---|---|---|
| Male | 20 | 17 | 12 |
| Female | 80 | 83 | 88 |
| Total | 100 | 100 | 100 |
The calculated value of chi-square for these data is 1.99,
with two degrees of freedom. This is less than the critical
value (5.99). We cannot reject the null hypothesis of
independence, so we cannot conclude that the distribution of
male versus female students depends on the level of study.
Although there appears to be a pattern, we have
insufficient evidence to claim that it is statistically
significant.
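The whole procedure for the 2000 data can be sketched in one function. The function name `contingency_chi_squared` is my own. The observed frequencies are those in the table above. Run on these data, the sketch gives a value of about 2.0, close to the 1.99 reported. A difference of this size is what you would expect from rounding intermediate values, though I cannot confirm that this is the source.

```python
def contingency_chi_squared(table):
    """Return (chi_squared, df) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under mutual independence.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Observed frequencies in 2000: PSY206F, PSY307F, PSY400W.
observed_2000 = [[58, 23, 4],      # male
                 [225, 114, 30]]   # female

chi2, df = contingency_chi_squared(observed_2000)

CRITICAL_DF2 = 5.99  # critical value for df = 2, alpha = .05
print(round(chi2, 2), "df =", df, "significant:", chi2 > CRITICAL_DF2)
```

The same function applies to any two-way table, because the marginal totals and the degrees of freedom are worked out from the table itself.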
P.S. In 2001 the same calculations were done, and the
opposite pattern appeared in the observed frequencies - i.e.
there appeared to be an increasing number of males. However,
the chi-square test again was not significant. We do not have
strong evidence that the proportion of males versus females
depends on the level of study in psychology.
Sample size, number of categories, and expected
frequencies
Howell cautions against using chi-squared tests when you have expected
frequencies that are less than five. This is a rule-of-thumb,
but quite important because it determines sample size. You can
see that a contingency table can have very many cells.
With small samples that inevitably leads to problems with small
expected frequencies. On the other hand, using large numbers of
categories increases the degrees of freedom, which is generally
a good thing. But if you inspect the table of critical
values for chi-square, it will be obvious that the critical
value increases with degrees of freedom: It becomes more
difficult to reject the null hypothesis as the critical value
increases, which reduces power. So there is a tradeoff to be
considered when planning the research. Increasing the number of
categories will yield more information, but this may reduce the
power unless there are also enough heads to be
counted (sample size). Fortunately, counting is not very costly.
Categorical data often come cheaply, and relatively large
samples can be obtained quickly.
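Both sides of this tradeoff can be made concrete with a rough planning sketch. The function names and the idea of projecting a minimum sample size from marginal proportions are my own illustration, not a procedure from Howell. The critical values are the standard tabled values at alpha = .05. The expected frequencies are those of the 2002 class data, and the marginal proportions are taken from the 2000 table.

```python
import math

# Tabled critical values of chi-squared at alpha = .05 for df = 1 to 6.
CRITICAL_05 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49, 5: 11.07, 6: 12.59}


def small_expected_cells(expected, minimum=5):
    """Return (row, column, value) for each expected frequency below `minimum`."""
    return [(i, j, e)
            for i, row in enumerate(expected)
            for j, e in enumerate(row)
            if e < minimum]


def min_sample_size(row_props, col_props, minimum=5):
    """Smallest N at which the smallest expected frequency reaches `minimum`,
    assuming the marginal proportions stay as given."""
    smallest_cell_prob = min(row_props) * min(col_props)
    return math.ceil(minimum / smallest_cell_prob)


# Expected frequencies for the 2002 class data (male row, female row).
expected_2002 = [[74.7, 48.9, 6.3],
                 [244.3, 160.1, 20.7]]
flagged = small_expected_cells(expected_2002)
print("cells below five:", flagged)

# Marginal proportions from the 2000 table (85 males, 34 Honours students of 454).
row_props = [85 / 454, 369 / 454]
col_props = [283 / 454, 137 / 454, 34 / 454]
n_needed = min_sample_size(row_props, col_props)
print("minimum N for all expected frequencies >= 5:", n_needed)

# More cells mean more degrees of freedom and a higher critical value.
for df in sorted(CRITICAL_05):
    print("df =", df, "critical value =", CRITICAL_05[df])
```

The smallest expected frequency in the 2002 data (6.3) is only just above five, which shows how quickly a small category can become a problem when the sample shrinks or the table gains cells.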
Another consideration in this tradeoff is the fact that
frequency distributions across multiple categories can be very
difficult to interpret. Clarity and economy can be gained by
reducing the numbers of categories. Simplicity is generally a
good thing, as Popper's quote
in my earlier lecture would suggest.
Copyright 2002, University of Cape Town