As Trochim points out, all qualitative data can be reduced to quantitative data by assigning numbers in appropriate ways. There is much debate about this topic, and you can refer to Trochim for more information.

A distinction between research methods and data analysis can make this issue less controversial: Researchers who use qualitative methods often reject the very notion of DATA. The word "data" literally means "that which is given". Some researchers argue that it is false to believe in such a notion because it negates the constructive nature of perception and the social construction of knowledge. If there are no data, there can be no data analysis.

For the purpose of this section of the course it is assumed that it is valid to think of "data as given", and that counting frequencies of whatever categories are employed is the method of analysis.

The chi-square test is useful whenever the data are in categories and it is possible to count category instances.

**Mutually exclusive categories** - The simplest classification is a dichotomy, such as "figure" versus "ground". Each category excludes the other. This is essential for the purpose of chi-square analysis.

**Exhaustive categories** - This means that all possibilities are included, which is not always obvious. Exhaustive classification is the basic requirement that conflicting information not be ignored. For example, consider the Great Dictator who claims to be democratically elected on the basis of 99% of the votes, when in fact the election was boycotted by the vast majority of the population! The problem would disappear if the categories were "in favour", "against" and "abstain".

**Independence** - The classification of one event should not be affected by the prior classification of another event. This is an even more subtle problem than the previous one. *Repeated measures* illustrate how this could be violated: Consider the example comparing PSY206F and PSY307F given below. Non-independence occurs when the same people are counted twice. Howell describes a simple check for this.

The purpose of contingency table analysis is to investigate relationships between dimensions.

For example, consider the Rape Conviction data cited in Howell. Data were classified in two dimensions: The verdict of a jury in decisions about rape, and the level of blame attributed to the victim. In this example (but not always) there was a distinction between cause and effect: Attribution of blame was found to influence verdict. The data are shown below:

Verdict | Blame: High | Blame: Low | Total |
---|---|---|---|
Guilty | 105 | 153 | 258 |
Not Guilty | 76 | 24 | 100 |
Total | 181 | 177 | 358 |

Verdict | Blame: High | Blame: Low |
---|---|---|
Guilty | 58% | 86% |
Not Guilty | 42% | 14% |
Total | 100% | 100% |

Verdict | Blame: High | Blame: Low | Total |
---|---|---|---|
Guilty | 41% | 59% | 100% |
Not Guilty | 76% | 24% | 100% |

To help you remember how to calculate percentages in tables where you have a dependent variable (DV), I suggest

**The Anchor Rule**: Use column percentages and make the DV the rows.
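The Anchor Rule can be tried out on the rape conviction counts given earlier. The following is a minimal Python sketch, not part of the original notes; the function name `column_percentages` and the dictionary layout are assumptions chosen for illustration. Verdict (the DV) forms the rows, and each cell is expressed as a percentage of its column total.

```python
# Observed counts from the rape conviction table.
# Rows are the verdict (the DV); columns are the level of blame.
counts = {
    "Guilty":     {"High": 105, "Low": 153},
    "Not Guilty": {"High": 76,  "Low": 24},
}


def column_percentages(table):
    """Return each cell as a percentage of its column total."""
    columns = list(next(iter(table.values())))
    col_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        label: {c: 100 * row[c] / col_totals[c] for c in columns}
        for label, row in table.items()
    }


col_pct = column_percentages(counts)
for label, row in col_pct.items():
    # Rounded to whole percentages, as in the tables above.
    print(label, {c: round(p) for c, p in row.items()})
```

With the DV in the rows, each column sums to 100%, so the two blame conditions can be compared directly along a row.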

Sex | PSY206F | PSY307F | PSY400W | Total |
---|---|---|---|---|
Male | 90 | 36 | 4 | 130 |
Female | 229 | 173 | 23 | 425 |
Total | 319 | 209 | 27 | 555 |

According to the population distribution, there should be approximately 50% of each sex. This could form the basis of a null hypothesis for a significance test in which we compare the observed and expected frequencies. The latter could be obtained from the null hypothesis: for the 209 students in PSY307F, 50% of 209 = 104.5 (note that fractions of persons are allowed when dealing in expected frequencies!). However, we may use any reasonable null hypothesis: In the case of left- versus right-handed persons the population distribution is not 50/50; approximately 1 person in 10 is left-handed, so in a class of 209 students it would be unexpected to find that half are left-handed. The expected frequency of left-handers would be about 20.9.
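The arithmetic in this paragraph can be written as a small Python sketch. It is an illustration only: the helper `expected_frequencies` is a hypothetical name, and the 0.1 proportion for left-handedness is the approximate figure quoted above.

```python
def expected_frequencies(n, proportions):
    """Expected count per category: sample size times hypothesised proportion."""
    return [n * p for p in proportions]


class_size = 209  # total for PSY307F in the table above

# Null hypothesis 1: both sexes are equally common in the population.
expected_sex = expected_frequencies(class_size, [0.5, 0.5])

# Null hypothesis 2: roughly 1 person in 10 is left-handed.
expected_hand = expected_frequencies(class_size, [0.1, 0.9])

print([round(e, 1) for e in expected_sex])   # [104.5, 104.5]
print([round(e, 1) for e in expected_hand])  # [20.9, 188.1]
```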

A different, but equally reasonable, null hypothesis is that the male/female distribution is proportionately the same in PSY206F, in PSY307F, and in PSY400W. For example, since there are 90 males versus 229 females in PSY206F, we might expect the same proportions in PSY307F. The 90 males out of a total of 319 give 28% male versus 72% female in PSY206F. In PSY307F we observed only 17% male versus 83% female.

The discrepancy between observed and expected frequencies is the subject of the chi-square test of significance. Note that the expected frequencies are based on a null hypothesis about the population frequency distribution. The topic of statistical inference will be dealt with later. Here we note that the two kinds of null hypothesis described above give different models for the expected frequencies:

**Goodness-of-fit test** - Mostly used with a single dimension.
- Observations distribute themselves at random into the categories, and are equally likely. The expected outcome is equal frequencies.
- The expected outcome is based on some prior knowledge of the population distribution.
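The two goodness-of-fit models can be sketched in Python as follows. This section has not yet defined the test statistic, so the code assumes the usual Pearson form, the sum over categories of (observed - expected)² / expected. The function name `chi_square_gof` and its `proportions` parameter are hypothetical, introduced only for illustration.

```python
def chi_square_gof(observed, proportions=None):
    """Pearson chi-square statistic for a single dimension.

    proportions=None  -> first model: all categories equally likely.
    proportions=[...] -> second model: population proportions known in advance.
    Returns the statistic and the list of expected frequencies.
    """
    n = sum(observed)
    k = len(observed)
    if proportions is None:
        proportions = [1 / k] * k
    expected = [n * p for p in proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected


# PSY307F: 36 males and 173 females, tested against equal frequencies.
stat, expected = chi_square_gof([36, 173])
print(round(stat, 2), expected)  # 89.8 [104.5, 104.5]
```

Whether a value of this size is significant is a question of statistical inference, which is dealt with later in the course.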

**Test of association** - Only used with two dimensions: The joint classifications by rows and columns are mutually independent. You can understand this with reference to *interaction* in a factorial ANOVA design: Mutual independence implies a null hypothesis of no interaction. With categorical data this requires *contingency table analysis*.
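Under the null hypothesis of independence, the expected frequency for a cell is conventionally taken as (row total × column total) / grand total. The Python sketch below applies this to the sex-by-course counts tabulated earlier, assuming each student is counted in only one course. The function names are hypothetical, and the Pearson statistic is again an assumption, since these notes have not yet defined it.

```python
# Observed counts: rows are sex, columns are PSY206F, PSY307F, PSY400W.
observed = [
    [90, 36, 4],     # Male
    [229, 173, 23],  # Female
]


def expected_under_independence(table):
    """Expected cell counts if rows and columns are independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_association(table):
    """Pearson chi-square statistic summed over every cell of the table."""
    expected = expected_under_independence(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


expected = expected_under_independence(observed)
for row in expected:
    print([round(e, 1) for e in row])
print(round(chi_square_association(observed), 2))
```

The expected counts keep the same row and column totals as the observed table; only the way those totals are distributed across the cells changes.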

Sex | PSY206F (2000) | PSY307F (2001) | PSY400W (2002) | Total |
---|---|---|---|---|
Male | 58 | 31 | 4 | 93 |
Female | 225 | 142 | 23 | 390 |
Total | 283 | 173 | 27 | 483 |

Here independence is violated: The same students would very likely have been counted several times, and the total of 483 for the table would not be the same as the actual number of different people (see Howell). Therefore the chi-square test could not be used on such data, because the null hypothesis is by definition false.

Copyright 2002, University of Cape Town