Only the seven largest departments are included here, but the example is representative of the original larger dataset.

The videos show how to set this up for analysis in Excel, using a PivotTable to generate the crosstabulations and then a chi-squared analysis to test for independence between admission and the sex of the applicants. The result is a very strong rejection of independence: the aggregate data suggest a strong bias in favor of male applicants. If applicants are assumed to be equally qualified, this would constitute illegal discrimination against female applicants. However, including a Department variable as a page variable in the PivotTable allows filtering to give subgroup results, and testing for sex discrimination within departmental subsets gives a different result. This outcome is a classic example of Simpson's Paradox, wherein aggregate data show a relationship that is the reverse of the one in the subsets. It is caused by lurking variables that are not controlled for in the aggregation.
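For readers who want to see the reversal outside a spreadsheet, the following is a minimal Python sketch of the same two steps: crosstabulate, then compute a chi-squared statistic for independence. The counts are hypothetical, invented only to make the effect visible; they are not the admissions data in sexbias2.xls, and the department labels "X" and "Y" and all function names are assumptions of this sketch. The statistic is the plain Pearson chi-squared for a 2x2 table, compared against 3.841, the 5% critical value for one degree of freedom.

```python
# Hypothetical counts, invented for illustration only (NOT the course dataset).
# Layout: department -> sex -> (admitted, rejected)
counts = {
    "X": {"male": (80, 20), "female": (18, 2)},    # department with a high admission rate
    "Y": {"male": (4, 16), "female": (30, 70)},    # department with a low admission rate
}

CRITICAL_5PCT_DF1 = 3.841  # chi-squared critical value, 1 degree of freedom, alpha = 0.05


def admit_rate(admitted, rejected):
    """Fraction of applicants admitted."""
    return admitted / (admitted + rejected)


def chi_squared_2x2(table):
    """Pearson chi-squared statistic (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def aggregate(by_department):
    """Sum the per-department counts, i.e. the crosstab with no department filter."""
    totals = {"male": [0, 0], "female": [0, 0]}
    for by_sex in by_department.values():
        for sex, (admitted, rejected) in by_sex.items():
            totals[sex][0] += admitted
            totals[sex][1] += rejected
    return totals


# Aggregate view: all departments pooled.
agg = aggregate(counts)
agg_rates = {sex: admit_rate(*cells) for sex, cells in agg.items()}
agg_chi2 = chi_squared_2x2([agg["male"], agg["female"]])

# Subset view: one crosstab per department, as a page-variable filter would give.
dept_rates = {
    dept: {sex: admit_rate(*cells) for sex, cells in by_sex.items()}
    for dept, by_sex in counts.items()
}

print("aggregate rates:", agg_rates)
print("aggregate chi-squared:", round(agg_chi2, 2),
      "significant at 5%:", agg_chi2 > CRITICAL_5PCT_DF1)
for dept, rates in dept_rates.items():
    print("department", dept, "rates:", rates)
```

With these made-up counts, the pooled table favors male applicants and rejects independence, while within each department the female admission rate is the higher one. The reversal arises because most of the hypothetical female applicants apply to the department that admits few people of either sex.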

Since we usually don't measure everything when we examine relationships
statistically, the disturbing question that comes out of this is: "How can
you be sure that a relationship seen in any crosstabulation or correlation
is meaningful and not just the result of some unmeasured variable or
interaction that has not been taken into account?" The answer is, "We
can't." Thus, statistical significance is not "proof" of anything.

Even very high confidence in a conclusion is not “proof.” Our understandings
of situations may change as a result of digging further into the data.
Sometimes, aggregations of data may be inappropriate because there are
other variables lurking in the subsets that could change the interpretation
substantially.

Despite its limitations, Microsoft EXCEL is available on most people's
desktops and provides rapid analysis capabilities in an understandable
interface.

Excel's PivotTable tool is very convenient for crosstabulation and for
examining aggregates versus subsets. The page variable allows drilling
further down into the data in a very intuitive way, using filters.

Screencaptures showing formulas | Exposition.doc |

Dataset in EXCEL | sexbias2.xls |

Setting up a Pivottable | Video berkpivt.html |

Calculating percents of admissions | Video berkpct.html |

Flexible Chi-squared calculation and interpretation | Video berkchi2.html |

The partial dataset was obtained from Gerstman B.B. (2000), **Data
Analysis with Epi Info**.

http://www.google.com/search?hl=en&q=simpson%27s+paradox

http://core.ecu.edu/psyc/wuenschk/StatHelp/Reversal-Paradox.txt

http://repository.upenn.edu/cgi/viewcontent.cgi?article=1014&context=wharton_research_scholars

http://wolfweb.unr.edu/homepage/jerryj/NNN/Aggregates.pdf
