Introduction to Chi-square tests
This last section of the course deals with one of the most
useful statistical tests available to psychologists. Here I
shall first give an introduction to some general principles,
followed by the details of statistical analysis.
Qualitative versus Quantitative Research Methods
This test is so useful because psychologists often
use qualitative data. Note, however, that we are not
dealing here with qualitative research methods. In terms
of methodology, the chi-square test belongs on the quantitative
side of the divide.
As Trochim points out, all qualitative data can be reduced to
quantitative data by assigning numbers in appropriate ways.
There is much debate about this topic, and you can refer to
Trochim for more information.
A distinction between research methods and data analysis can
make this issue less controversial: Researchers who use
qualitative methods often reject the very notion of
DATA. The word "data" literally means "that which
is given". Some researchers argue that it is false to believe
in such a notion because it negates the constructive nature of
perception and the social construction of knowledge. If there
are no data, there can be no data analysis.
Qualitative data analysis
Trochim's point about qualitative data is valid if one accepts
the notion of data as "given". The distinction between
qualitative and quantitative then reduces to methods that avoid
direct measurements, and use only classification and counting.
For example, interview transcripts can be content-analysed.
Trochim's point is simply that as soon as one begins to count,
one is using quantity. When the results of a
content analysis are tabulated to show the frequency of certain
categories, quantitative data analysis is being used.
For the purpose of this section of the course it is assumed that
it is valid to think of "data as given", and that counting
frequencies of whatever categories are employed is the method of
analysis.
The chi-square test is useful whenever the data are in
categories and it is possible to count category instances.
Classification and counting
Basic Principles
- Mutually exclusive categories
  The simplest classification is a dichotomy, such as "figure"
  versus "ground". Each category excludes the other.
  This is essential for the purpose of chi-square analysis.
- Exhaustive categories
  This means that all possibilities are included,
  which is not always obvious. Exhaustive classification
  is the basic requirement that conflicting information
  not be ignored. For example, consider the Great Dictator
  who claims to be democratically elected on the basis of
  99% of the votes, when in fact the election was boycotted by
  the vast majority of the population! The problem would
  disappear if the categories were "in favour"; "against";
  "abstain".
- Independence
  The classification of one event should not be affected
  by the prior classification of another event. This is
  an even more subtle problem than the previous one.
  Repeated measures illustrate how this could be
  violated: Consider the example comparing PSY206F and
  PSY307F given below. Non-independence occurs when the
  same people are counted twice. Howell describes a
  simple check to control this.
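The Great Dictator example can be made concrete with a short Python sketch. The figures are invented purely for illustration (the notes give no actual numbers beyond the claimed 99%), and the variable names are hypothetical.

```python
# Hypothetical figures, invented only to illustrate the point.
eligible_voters = 1000
ballots = {"in favour": 99, "against": 1}

# Non-exhaustive classification: people who stayed away are not counted at all.
votes_cast = sum(ballots.values())
share_of_votes_cast = 100 * ballots["in favour"] / votes_cast

# Exhaustive classification: every eligible voter falls into exactly one category.
exhaustive = dict(ballots, abstain=eligible_voters - votes_cast)
share_of_electorate = 100 * exhaustive["in favour"] / eligible_voters

print(share_of_votes_cast)   # 99.0
print(share_of_electorate)   # 9.9
```

With the "abstain" category included, the category counts add up to the whole population, and the claimed 99% shrinks accordingly.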
Multi-category classification
Often a dichotomy is too simplistic. The dichotomous "yes" versus
"no" options seen in questionnaires can be extended by including
an "undecided" option.
Apartheid-style racial classification of
"black" versus "white" quickly leads to intermediate categories
such as "coloured", and to several strange labels such as "honorary
white"! Whatever the purpose, multi-category classification is
more flexible than dichotomous systems. However, the same
requirement of exclusion pertains: Categories must be
mutually exclusive.
Multi-way classification
This is not simply an extension of the number of categories
employed. In multi-way classification the same object is
considered on two or more dimensions. A dimension is
like a point of view. For example, in a two-way classification an
interview response may be classified in terms of an attitude
dimension and at the same time in terms of a demographic
dimension: Attitude responses are cross-classified by
demographic category to yield information about the distribution
of attitudes across various groups. We call this contingency
table analysis.
The purpose of contingency table analysis is to investigate
relationships between dimensions.
For example, consider the Rape Conviction data cited in Howell.
Data were classified in two dimensions: The verdict of a
jury in decisions about rape, and the level of blame
attributed to the victim. In this example (but not always) there
was a distinction between cause and effect: Attribution of
blame was found to influence verdict.
The data are shown below:
| Verdict | Blame: High | Blame: Low | Total |
|---|---|---|---|
| Guilty | 105 | 153 | 258 |
| Not Guilty | 76 | 24 | 100 |
| Total | 181 | 177 | 358 |
The table shows observed frequencies in the cells, as well as
row and column marginal totals. The grand total is shown at the
bottom right hand corner.
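As a check on the table, the marginal totals can be recomputed from the four observed cell frequencies. This is a minimal Python sketch; the dictionary layout and variable names are illustrative choices, not part of the course material.

```python
# Observed frequencies from the table: rows are verdicts, columns are blame levels.
observed = {
    "Guilty": {"High": 105, "Low": 153},
    "Not Guilty": {"High": 76, "Low": 24},
}

# Row marginal totals: sum across blame levels within each verdict.
row_totals = {verdict: sum(cells.values()) for verdict, cells in observed.items()}

# Column marginal totals: sum across verdicts within each blame level.
col_totals = {}
for cells in observed.values():
    for blame, count in cells.items():
        col_totals[blame] = col_totals.get(blame, 0) + count

# The grand total is the sum of either set of marginal totals.
grand_total = sum(row_totals.values())

print(row_totals)   # {'Guilty': 258, 'Not Guilty': 100}
print(col_totals)   # {'High': 181, 'Low': 177}
print(grand_total)  # 358
```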
Fixed versus random categories
This reflects the distinction between dependent and
independent variables. In the above example verdict is a
dependent variable and the categories "guilty" versus "not
guilty" are fixed. Blame is an independent variable and the
two categories can be viewed as a random sample of all possible
levels of blame. Thus random categories are not exhaustive, but
fixed categories must be exhaustive. This is further explained
in the example comparing PSY307F and PSY400W given below.
Percentages
In the previous example the data become much clearer when
percentages are reported. However, in a contingency table these
can be calculated for rows or for columns, and will be quite
different depending on which. Whenever there is a meaningful
distinction between cause and effect in a contingency table, one
should calculate percentages so they add to 100 across levels
of the DV for each category of the IV, for example:
| Verdict | Blame: High | Blame: Low |
|---|---|---|
| Guilty | 58% | 86% |
| Not Guilty | 42% | 14% |
| Total | 100% | 100% |
If you calculate percentages as though verdict were the cause of
the attribution of blame, then you get:
| Verdict | Blame: High | Blame: Low | Total |
|---|---|---|---|
| Guilty | 41% | 59% | 100% |
| Not Guilty | 76% | 24% | 100% |
To help you remember how to calculate percentages in tables where
you have a DV, I suggest
- The Anchor Rule
  Use column percentages and make the DV the rows.
This follows the convention of plotting the DV on the vertical
axis and the IV on the horizontal axis, and is therefore easy to
remember. The percentages are "anchored" in the columns because
they will add to 100 downwards. This is merely a convenient way to
remember, and the actual orientation of the
table is not essential so long as the percentages are calculated
appropriately. As another example, check Howell's exhibit 19.6
(4th edition).
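Both sets of percentages for the rape conviction data can be reproduced from the observed frequencies. The following is a minimal Python sketch; the function names `column_percentages` and `row_percentages` are illustrative and do not come from the course material.

```python
# Observed frequencies: DV (verdict) in the rows, IV (blame) in the columns.
observed = {
    "Guilty": {"High": 105, "Low": 153},
    "Not Guilty": {"High": 76, "Low": 24},
}

def column_percentages(table):
    """Percentages that add to 100 down each column (the Anchor Rule)."""
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    return {
        row: {col: round(100 * count / col_totals[col]) for col, count in cells.items()}
        for row, cells in table.items()
    }

def row_percentages(table):
    """Percentages that add to 100 across each row."""
    return {
        row: {col: round(100 * count / sum(cells.values())) for col, count in cells.items()}
        for row, cells in table.items()
    }

print(column_percentages(observed))
# {'Guilty': {'High': 58, 'Low': 86}, 'Not Guilty': {'High': 42, 'Low': 14}}
print(row_percentages(observed))
# {'Guilty': {'High': 41, 'Low': 59}, 'Not Guilty': {'High': 76, 'Low': 24}}
```

Only the column percentages answer the question of interest here: of the cases with high (or low) blame, what share ended in each verdict?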
Significance
Consider the following table showing the distribution of
Psychology students in 2002 classified by sex and by level of
study:
| Sex | PSY206F | PSY307F | PSY400W | Total |
|---|---|---|---|---|
| Male | 90 | 36 | 4 | 130 |
| Female | 229 | 173 | 23 | 425 |
| Total | 319 | 209 | 27 | 555 |
Observed versus Expected Frequencies
It was observed that in 2002 there were 36 males and 173 females
in the PSY307F class of 209 students. We may ask, what was the
expected number of males?
According to the population distribution, there should be
approximately 50% of either sex. This could form the basis of
a null hypothesis for a significance test in which we compare
the observed and expected frequencies. The latter could be
obtained from the null hypothesis: 50% of 209 = 104.5
(note that fractions of persons are allowed when dealing in
expected frequencies!). However, we may use any reasonable null
hypothesis: In the case of left- versus right-handed persons the
population distribution is not 50/50 but approximately 1 in 10
left-handed, so in a class of 209 students it would be unexpected
to find that half are left-handed. The expected frequency would be about 20.9.
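The arithmetic behind these expected frequencies is simply the class size multiplied by the proportion that the null hypothesis assigns to each category. A minimal Python sketch, in which the helper name `expected_frequencies` is an illustrative assumption:

```python
def expected_frequencies(total, proportions):
    """Expected count per category = total x proportion under the null hypothesis."""
    return {category: total * p for category, p in proportions.items()}

class_size = 209  # PSY307F in 2002

# Null hypothesis 1: the sexes are equally represented.
by_sex = expected_frequencies(class_size, {"male": 0.5, "female": 0.5})

# Null hypothesis 2: about 1 person in 10 is left-handed.
by_hand = expected_frequencies(class_size, {"left": 0.1, "right": 0.9})

print({k: round(v, 1) for k, v in by_sex.items()})   # {'male': 104.5, 'female': 104.5}
print({k: round(v, 1) for k, v in by_hand.items()})  # {'left': 20.9, 'right': 188.1}
```

The expected frequencies need not be whole numbers, but they always add up to the observed total.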
A different kind of null hypothesis, but an equally reasonable one, is that
the male/female distribution is proportionately the same in
PSY206F, in PSY307F, and in PSY400W. For example, since there
are 90 males versus 229 females in PSY206F, we might expect the
same proportions in PSY307F. The observed frequency of 90 males out of a
total of 319 gives 28% male versus 72% female in PSY206F. In
PSY307F we observed only 17% versus 83%.
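To turn this comparison into expected frequencies, the PSY206F proportions can be applied to the PSY307F class size. The sketch below is a minimal Python illustration of that step only; the variable names are assumptions made for the example.

```python
psy206f = {"male": 90, "female": 229}
psy307f = {"male": 36, "female": 173}

n206 = sum(psy206f.values())
n307 = sum(psy307f.values())

# Observed percentages in each class.
pct206 = {sex: round(100 * count / n206) for sex, count in psy206f.items()}
pct307 = {sex: round(100 * count / n307) for sex, count in psy307f.items()}

# Null hypothesis: PSY307F has the same sex proportions as PSY206F.
expected_307 = {sex: n307 * count / n206 for sex, count in psy206f.items()}

print(pct206)  # {'male': 28, 'female': 72}
print(pct307)  # {'male': 17, 'female': 83}
print({sex: round(e, 1) for sex, e in expected_307.items()})  # {'male': 59.0, 'female': 150.0}
```

Under this null hypothesis about 59 males would be expected in PSY307F, against the 36 actually observed.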
The discrepancy between observed and expected frequencies is the
subject of the chi-square test of significance. Note that the
expected frequencies are based on a null hypothesis about the
population frequency distribution.
The topic of statistical inference will be dealt with later.
Here we note that the two kinds of null hypothesis described
above give different models for the expected frequencies:
- Goodness-of-fit test
  Mostly used with a single dimension. The null hypothesis takes
  one of two forms:
  - Observations distribute themselves at random into
    the categories, and all categories are equally likely. The expected
    outcome is equal frequencies.
  - The expected outcome is based on some prior
    knowledge of the population distribution.
- Test of association
  Only used with two dimensions. The null hypothesis is that the
  joint classifications by rows and columns are mutually independent.
  You can understand this with reference to interaction in a factorial
  ANOVA design: mutual independence implies a null
  hypothesis of no interaction. With categorical data this
  requires contingency table analysis.
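For the test of association, the expected frequency in each cell under independence is conventionally taken as (row total x column total) / grand total. The Python sketch below applies this to the 2002 sex-by-level table. Two things are assumptions here rather than statements from this section: the expected-frequency formula, and the final step of summing the Pearson discrepancies (O - E)^2 / E, which is the standard form of the statistic but is left to the later part of the course.

```python
# Observed frequencies for 2002: rows are Male, Female; columns are
# PSY206F, PSY307F, PSY400W.
observed = [
    [90, 36, 4],
    [229, 173, 23],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency under independence: row total x column total / grand total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# Assumed standard Pearson statistic: sum of (O - E)^2 / E over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

for exp_row in expected:
    print([round(e, 2) for e in exp_row])
print(round(chi_square, 2))
```

Each row and column of expected frequencies adds up to the same marginal total as the observed table; only the distribution within the table differs.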
Non-independence
In the above example, the expected frequency for PSY307F was
based on the observed frequency in PSY206F in the same year.
Consider the following table in which the data were for PSY206F
in 2000, PSY307F in 2001, and PSY400W in 2002:
| Sex | PSY206F (2000) | PSY307F (2001) | PSY400W (2002) | Total |
|---|---|---|---|---|
| Male | 58 | 31 | 4 | 93 |
| Female | 225 | 142 | 23 | 390 |
| Total | 283 | 173 | 27 | 483 |
Here independence is violated: The same students would very
likely have been counted several times, and the total of 483 for
the table would not be the same as the actual total number of
different people (see Howell). Therefore the chi-square test
could not be used on such data, because the null hypothesis is
by definition false.
Copyright 2002, University of Cape Town