Introduction to Chi-square tests

This last section of the course deals with one of the most useful statistical tests available to psychologists. Here I shall first give an introduction to some general principles, followed by the details of statistical analysis.

Qualitative versus Quantitative Research Methods

This test is so useful because psychologists often work with qualitative (categorical) data. Note, however, that we are not dealing here with qualitative research methods. In terms of methodology, the chi-square test belongs on the quantitative side of the divide.

As Trochim points out, all qualitative data can be reduced to quantitative data by assigning numbers in appropriate ways. There is much debate about this topic, and you can refer to Trochim for more information.

A distinction between research methods and data analysis can make this issue less controversial: Researchers who use qualitative methods often reject the very notion of DATA. The word "data" literally means "that which is given". Some researchers argue that this notion is mistaken because it denies the constructive nature of perception and the social construction of knowledge. If there are no data, there can be no data analysis.

Qualitative data analysis

Trochim's point about qualitative data is valid if one accepts the notion of data as "given". Qualitative data analysis then reduces to methods that avoid direct measurement and use only classification and counting. For example, interview transcripts can be content-analysed. Trochim's point is simply that as soon as one begins to count, one is using quantity: when the results of a content analysis are tabulated to show the frequency of certain categories, quantitative data analysis is being used.

For the purposes of this section of the course, it is assumed that it is valid to think of "data as given", and that the method of analysis is to count the frequencies of whatever categories are employed.

The chi-square test is useful whenever the data are in categories and it is possible to count category instances.
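A minimal sketch of what "counting category instances" amounts to, assuming the responses have already been classified. The response labels below are hypothetical, and the tallying is done with `collections.Counter` from the Python standard library:

```python
from collections import Counter

# Hypothetical questionnaire responses, each already assigned to one category
responses = ["yes", "no", "yes", "undecided", "yes", "no"]

# Count how many instances fall into each category
counts = Counter(responses)

for category, frequency in sorted(counts.items()):
    print(f"{category}: {frequency}")
```

The category frequencies always add up to the number of classified responses, which is the quantity the chi-square test works with.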

Classification and counting

Basic Principles

Mutually exclusive categories
The simplest classification is a dichotomy, such as "figure" versus "ground". Each category excludes the other. This is essential for the purpose of chi-square analysis.
Exhaustive categories
This means that all possibilities are included, which is not always obvious. Exhaustive classification is the basic requirement that conflicting information not be ignored. For example, consider the Great Dictator who claims to be democratically elected on the basis of 99% of the votes, when in fact the election was boycotted by the vast majority of the population! The problem disappears if the categories are "in favour", "against" and "abstain".
Independence
The classification of one event should not be affected by the prior classification of another event. This is an even more subtle problem than the previous one. Repeated measures illustrate how it can be violated: Consider the example comparing PSY206F and PSY307F given below. Non-independence occurs when the same people are counted twice. Howell describes a simple check to control this.
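The first two requirements can be sketched as a check applied to each item as it is classified. In this hypothetical Python example each category is defined by a test; the `RULES` dictionary and the `classify` helper are illustrative names, not part of any library. An item that matches no category exposes a non-exhaustive scheme, and an item that matches more than one exposes overlapping categories:

```python
# Hypothetical classification scheme: each category is defined by a test
RULES = {
    "in favour": lambda ballot: ballot == "yes",
    "against": lambda ballot: ballot == "no",
    "abstain": lambda ballot: ballot is None,  # no vote cast
}

def classify(item, rules):
    """Return the single category that item belongs to.

    Raises ValueError if no category matches (the scheme is not exhaustive)
    or if several match (the categories are not mutually exclusive).
    """
    matches = [name for name, belongs in rules.items() if belongs(item)]
    if not matches:
        raise ValueError(f"{item!r} fits no category: scheme is not exhaustive")
    if len(matches) > 1:
        raise ValueError(f"{item!r} fits {matches}: categories overlap")
    return matches[0]

ballots = ["yes", "yes", None, "no", None]
print([classify(b, RULES) for b in ballots])
```

Removing the "abstain" entry from `RULES` makes `classify(None, ...)` raise a `ValueError`, which is the Great Dictator problem in miniature. The check says nothing about independence, which depends on how the observations were collected rather than on the categories themselves.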

Multi-category classification

Often a dichotomy is too simplistic. The dichotomous "yes" versus "no" options seen in questionnaires can be extended by including an "undecided" option. Apartheid-style racial classification of "black" versus "white" quickly leads to intermediate categories such as "coloured", and several strange labels such as "honorary white"! Whatever the purpose, multi-category classification is more flexible than dichotomous systems. However, the same requirement of exclusion pertains: Categories must be mutually exclusive.

Multi-way classification

This is not simply an extension of the number of categories employed. In multi-way classification the same object is considered on two or more dimensions. A dimension is like a point of view. For example, in a two-way classification an interview response may be classified in terms of an attitude dimension and at the same time in terms of a demographic dimension: Attitude responses are cross-classified by demographic category to yield information about the distribution of attitudes across various groups. We call this contingency table analysis.
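Two-way classification can be sketched by counting pairs of labels, one label per dimension. The responses and group names below are hypothetical; each distinct (attitude, group) pair is one cell of the contingency table:

```python
from collections import Counter

# Hypothetical interview responses, each classified on two dimensions:
# (attitude, demographic group)
responses = [
    ("agree", "group A"), ("disagree", "group A"), ("agree", "group B"),
    ("agree", "group A"), ("disagree", "group B"), ("disagree", "group B"),
]

# Count each joint classification (one cell of the contingency table)
cells = Counter(responses)

attitudes = sorted({attitude for attitude, _ in responses})
groups = sorted({group for _, group in responses})

# Print attitudes as rows and demographic groups as columns
print("".ljust(10) + "".join(g.rjust(10) for g in groups))
for a in attitudes:
    print(a.ljust(10) + "".join(str(cells[(a, g)]).rjust(10) for g in groups))
```

A `Counter` returns 0 for a pair that never occurred, so empty cells still print as zero.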

The purpose of contingency table analysis is to investigate relationships between dimensions.

For example, consider the Rape Conviction data cited in Howell. Data were classified in two dimensions: The verdict of a jury in decisions about rape, and the level of blame attributed to the victim. In this example (but not always) there was a distinction between cause and effect: Attribution of blame was found to influence verdict. The data are shown below:
Not Guilty     76     24     100
The table shows observed frequencies in the cells, as well as row and column marginal totals. The grand total is shown at the bottom right hand corner.
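Marginal totals are simple sums. The Python sketch below appends a row total to each row and then adds a row of column totals whose last entry is the grand total. The `with_margins` name and the counts are hypothetical, chosen only to show the arithmetic:

```python
def with_margins(table):
    """Return a copy of table with row totals appended and a totals row added.

    table: list of rows of observed frequencies, all rows the same length.
    The last entry of the final row is the grand total.
    """
    rows = [list(row) + [sum(row)] for row in table]
    column_totals = [sum(column) for column in zip(*rows)]
    return rows + [column_totals]

# Hypothetical 2 x 2 table of observed frequencies
observed = [[10, 20],
            [30, 40]]

for row in with_margins(observed):
    print(row)
```

Because the row totals are appended before the columns are summed, the bottom right-hand entry is the sum of all the cells, matching the layout described above.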

Fixed versus random categories

This reflects the distinction between dependent and independent variables. In the above example verdict is a dependent variable and the categories "guilty" versus "not guilty" are fixed. Blame is an independent variable and the two categories can be viewed as a random sample of all possible levels of blame. Thus random categories are not exhaustive, but fixed categories must be exhaustive. This is further explained in the example comparing PSY307F and PSY400W given below.


In the previous example the data become much clearer when percentages are reported. However, in a contingency table percentages can be calculated either for rows or for columns, and the two sets of figures can be quite different. Whenever there is a meaningful distinction between cause and effect in a contingency table, calculate percentages so that they add to 100 across the levels of the DV within each category of the IV, for example:
Not Guilty     42%     14%
If you calculate percentages as though verdict were the cause of the attribution of blame, you get:
Not Guilty     76%     24%     100%

To help you remember how to calculate percentages in tables where you have a DV, I suggest

The Anchor Rule:
Use column percentages and make DV the rows.
This follows the convention of plotting the DV on the vertical axis and the IV on the horizontal axis, and is therefore easy to remember. The percentages are "anchored" in the columns because they will add to 100 downwards. This is merely a convenient way to remember, and the actual orientation of the table is not essential so long as the percentages are calculated appropriately. As another example, check Howell's exhibit 19.6 (4th edition).
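The difference between the two kinds of percentage can be sketched in Python. Following the Anchor Rule, the hypothetical table below has DV categories as rows and IV categories as columns, so `column_percentages` gives the figures to report. The function names are illustrative and the counts are invented:

```python
def column_percentages(table):
    """Percentages that add to 100 down each column (the Anchor Rule)."""
    column_totals = [sum(column) for column in zip(*table)]
    return [[100 * cell / total for cell, total in zip(row, column_totals)]
            for row in table]

def row_percentages(table):
    """Percentages that add to 100 across each row."""
    return [[100 * cell / sum(row) for cell in row] for row in table]

# Hypothetical counts: rows are DV categories, columns are IV categories
table = [[60, 20],
         [40, 80]]

print("Column %:", column_percentages(table))
print("Row %:   ", row_percentages(table))
```

As a check against the text, `row_percentages([[76, 24]])` returns `[[76.0, 24.0]]`, the row percentages of the Not Guilty row.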


Consider the following table showing the distribution of Psychology students in 2002 classified by sex and by level of study:

Observed versus Expected Frequencies

It was observed that in 2002 there were 36 males and 173 females in the PSY307F class of 209 students. We may ask, what was the expected number of males?

According to the population distribution, there should be approximately 50% of either sex. This could form the basis of a null hypothesis for a significance test in which we compare the observed and expected frequencies. The expected frequencies follow from the null hypothesis: 50% of 209 = 104.5 (note that fractions of persons are allowed when dealing with expected frequencies!). However, we may use any reasonable null hypothesis. For handedness the population distribution is not 50/50: roughly 1 person in 10 is left-handed, so in a class of 209 students it would be surprising to find that half are left-handed. The expected frequency of left-handers would be about 10% of 209 = 20.9.
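Expected frequencies under this kind of null hypothesis are the class size multiplied by the hypothesised proportion for each category. A Python sketch that reproduces the 104.5 and 20.9 above; the `expected_frequencies` helper is a hypothetical name:

```python
def expected_frequencies(n, proportions):
    """Expected count in each category if the null hypothesis proportions hold."""
    return {category: n * p for category, p in proportions.items()}

n = 209  # PSY307F class size in 2002

# Equal split between the sexes
print(expected_frequencies(n, {"male": 0.5, "female": 0.5}))
# Roughly 1 in 10 left-handed
print(expected_frequencies(n, {"left-handed": 0.1, "right-handed": 0.9}))
```

The expected frequencies are not rounded to whole persons, in line with the remark above that fractions are allowed.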

A different, but equally reasonable, null hypothesis is that the male/female distribution is proportionately the same in PSY206F, in PSY307F and in PSY400W. For example, since there are 90 males versus 229 females in PSY206F, we might expect the same proportions in PSY307F. In PSY206F, 90 males out of 90 + 229 students gives about 28% male versus 72% female. In PSY307F we observed only 17% versus 83%.
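This second kind of null hypothesis can be sketched the same way, taking the PSY206F proportions (90 males out of 90 + 229 students) as the model for the 209 PSY307F students. The dictionary names are illustrative; the counts are the ones quoted in the text:

```python
# Observed PSY206F counts, used as the reference distribution
psy206f = {"male": 90, "female": 229}
# Observed PSY307F counts
psy307f = {"male": 36, "female": 173}

n_206 = sum(psy206f.values())
n_307 = sum(psy307f.values())

# Expected PSY307F counts if its sex ratio matched PSY206F's
expected_307 = {sex: n_307 * count / n_206 for sex, count in psy206f.items()}

for sex in psy307f:
    print(f"{sex}: observed {psy307f[sex]}, expected {expected_307[sex]:.1f}")
```

Under this model the expected number of males in PSY307F is close to 59, against the 36 observed.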

The discrepancy between observed and expected frequencies is the subject of the chi-square test of significance. Note that the expected frequencies are based on a null hypothesis about the population frequency distribution. The topic of statistical inference will be dealt with later. Here we note that the two kinds of null hypothesis described above give different models for the expected frequencies:

Goodness-of-fit test
Mostly used with a single dimension.
  1. Observations fall at random into the categories, each category being equally likely. The expected outcome is equal frequencies in all categories.
  2. The expected outcome is based on some prior knowledge of the population distribution.
Test of association
Only used with two dimensions: The null hypothesis is that the row and column classifications are mutually independent. You can understand this by analogy with interaction in a factorial ANOVA design: Mutual independence implies a null hypothesis of no interaction. With categorical data this requires contingency table analysis.
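Ahead of the detailed treatment of inference, the discrepancy between observed and expected frequencies can be sketched with Pearson's chi-square statistic, the sum of (O - E)^2 / E over all cells. The Python below (hypothetical helper names) applies it to the PSY307F goodness-of-fit example with equal expected frequencies, and computes expected frequencies for a test of association as row total * column total / grand total, using an invented 2 x 2 table. It stops at the statistic; converting it into a significance level is not shown:

```python
def chi_square(observed, expected):
    """Pearson's statistic: sum of (O - E)**2 / E over matching cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def expected_under_independence(table):
    """Expected frequency for every cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]

# Goodness of fit: PSY307F males and females against an equal 50/50 split
gof = chi_square([36, 173], [104.5, 104.5])
print(f"Goodness of fit: chi-square = {gof:.2f}")

# Test of association: hypothetical 2 x 2 table, flattened cell by cell
table = [[60, 20],
         [40, 80]]
expected = expected_under_independence(table)
flat_observed = [cell for row in table for cell in row]
flat_expected = [cell for row in expected for cell in row]
association = chi_square(flat_observed, flat_expected)
print(f"Association: chi-square = {association:.2f}")
```

The larger the statistic, the further the observed frequencies lie from what the null hypothesis predicts.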


In the above example, the expected frequency for PSY307F was based on the observed frequency in PSY206F in the same year. Consider the following table in which the data were for PSY206F in 2000, PSY307F in 2001, and PSY400W in 2002:

                 PSY206F (2000)   PSY307F (2001)   PSY400W (2002)   Total

Here independence is violated: The same students would very likely have been counted several times, so the table total of 483 would not equal the actual number of different people (see Howell). Therefore the chi-square test cannot be used on such data, because the null hypothesis is by definition false.
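A check in the spirit of the one described above, comparing a table's grand total with the number of different people, can be sketched in Python. The student IDs and records are hypothetical, and this is an illustration rather than Howell's exact procedure:

```python
# Hypothetical enrolment records: (student ID, course). The same student
# appears once for every course they took, as in the table above.
records = [
    ("s1", "PSY206F"), ("s2", "PSY206F"), ("s3", "PSY206F"),
    ("s1", "PSY307F"), ("s2", "PSY307F"),
    ("s1", "PSY400W"),
]

# What the contingency table's grand total would be: one count per record
table_total = len(records)
# The number of different people actually observed
distinct_people = len({student for student, _ in records})

if table_total != distinct_people:
    print(f"Not independent: {table_total} counts but only "
          f"{distinct_people} different people")
else:
    print("Each person is counted once")
```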

Copyright 2002, University of Cape Town