Example: paired data method for correction for other variables
Sex and Salaries: Are they significantly different?
There is a common belief that women are paid less than men for
equivalent work.
When pay is studied across the broad range of different kinds of work,
such a difference may not be apparent, because the overall means of the
two distributions are not vastly different. But if other factors affecting
pay are controlled and adjusted for (the Bill Cosby school of detecting
causality), a sex difference may be detected.
Since we don't know which way a difference might show, we will use
two-tailed tests in the t-tests.
The data are available as a comma-delimited ASCII file, sexsal.txt,
which you can copy and paste into EXCEL. Immediately do a Save As
to your A: disk, specifying file type EXCEL, before you start on the project.
It's not enough to add the .xls extension to the file name.
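If you want to check your EXCEL work outside the spreadsheet, the file can also be read with a short script. The sketch below is only an illustration: it assumes two numeric columns per row with the male salary first, which is a guess about the layout of sexsal.txt, and it uses made-up stand-in rows rather than the real data.

```python
import csv
import io

def read_pairs(handle):
    """Read two numeric columns; skip rows that do not parse (e.g. a header)."""
    pairs = []
    for row in csv.reader(handle):
        if len(row) < 2:
            continue
        try:
            pairs.append((float(row[0]), float(row[1])))
        except ValueError:
            continue
    return pairs

# Made-up stand-in for the file contents (NOT values from sexsal.txt).
# Column order (male, female) is an assumption.
sample = io.StringIO("male,female\n30.0,28.5\n42.0,40.0\n55.0,53.5\n")
pairs = read_pairs(sample)

males = [m for m, f in pairs]
females = [f for m, f in pairs]
differences = [m - f for m, f in pairs]  # the "third column" of differences
print(len(pairs), differences)

# For the real file, something like:
# with open("sexsal.txt", newline="") as fh:
#     pairs = read_pairs(fh)
```

The header row is skipped because it does not parse as numbers; if the real file has no header, the same function still works.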
-
Comment on the results of each analysis. This might work well if you put
the analyses down the page next to the data, and then short comments to
the right of each analysis. Answer the questions by typing the answers
down at the bottom of the sheet or by the relevant data.
Use the following tools in EXCEL (you don't have to do them in exactly
this order; if you are not sure about some of them, do the easy ones first):
-
Add a third column containing the differences between the pairs.
-
Derive tables of descriptive statistics for each of the two samples, including
95% confidence intervals. Comment on the means and confidence intervals.
Is there strong statistical evidence from this analysis that these two
samples came from different populations? Why or why not?
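As a cross-check on the descriptive statistics tool, the mean and 95% confidence interval can be computed by hand as mean plus or minus t times the standard error. The sketch below uses five made-up pairs (not the sexsal.txt values) and a critical value looked up in a t table for 4 degrees of freedom; for the real data the degrees of freedom and critical value will differ.

```python
import math
import statistics

# Hypothetical salaries in thousands (NOT values from sexsal.txt).
males = [30.0, 42.0, 55.0, 61.0, 78.0]
females = [28.5, 40.0, 53.5, 58.0, 76.0]

# Two-tailed 95% critical value of t for n - 1 = 4 degrees of freedom,
# from a standard t table. In EXCEL, TINV(0.05, n-1) gives this value.
T_CRIT_DF4 = 2.776

def mean_ci(sample, t_crit):
    """Return (mean, lower limit, upper limit) of the confidence interval."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)  # sample sd / sqrt(n)
    half_width = t_crit * std_err
    return mean, mean - half_width, mean + half_width

m_mean, m_low, m_high = mean_ci(males, T_CRIT_DF4)
f_mean, f_low, f_high = mean_ci(females, T_CRIT_DF4)
print(f"male:   mean={m_mean:.2f}, 95% CI=({m_low:.2f}, {m_high:.2f})")
print(f"female: mean={f_mean:.2f}, 95% CI=({f_low:.2f}, {f_high:.2f})")
```

With these made-up numbers the two intervals overlap almost completely, which is the kind of pattern the question above asks you to interpret in your own data.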
-
Conduct an F test (alpha=0.05) on the variances of the two distributions
and decide whether to do a t-test assuming equal or unequal variances.
Comment on the results. What does the P value mean?
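The F statistic itself is just a ratio of the two sample variances, so it is easy to verify by hand. This sketch uses the same made-up pairs as an illustration and puts the larger variance on top, which is one common convention and an assumption here (EXCEL may report the ratio the other way round). It does not compute the P value, since the Python standard library has no F distribution; take that from the EXCEL output.

```python
import statistics

# Hypothetical salaries in thousands (NOT values from sexsal.txt).
males = [30.0, 42.0, 55.0, 61.0, 78.0]
females = [28.5, 40.0, 53.5, 58.0, 76.0]

var_m = statistics.variance(males)    # sample variance, n - 1 in the denominator
var_f = statistics.variance(females)

# Convention assumed here: larger variance in the numerator, so F >= 1.
f_ratio = max(var_m, var_f) / min(var_m, var_f)

# Both samples have the same size, so both degrees of freedom are n - 1.
df_num = df_den = len(males) - 1

print(f"var(male)={var_m:.3f}, var(female)={var_f:.3f}")
print(f"F={f_ratio:.4f} with ({df_num}, {df_den}) degrees of freedom")
```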
-
Conduct the appropriate unpaired t-test to compare the two distributions
(H0: difference=0, alpha=0.05). Comment on the results. Is there a significant
difference? What is the probability of getting a result at least this extreme
if H0 is true? Do you conclude with 95% confidence from this analysis that males
and females have different average salaries?
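To see what the unpaired t-test tool is doing, the t statistic can be computed directly. The sketch below shows both the equal-variance (pooled) form and the unequal-variance (Welch) form on the same made-up pairs; the function names are hypothetical helpers, not EXCEL functions. With equal sample sizes the two t values coincide and only the degrees of freedom differ.

```python
import math
import statistics

# Hypothetical salaries in thousands (NOT values from sexsal.txt).
males = [30.0, 42.0, 55.0, 61.0, 78.0]
females = [28.5, 40.0, 53.5, 58.0, 76.0]

def pooled_t(a, b):
    """Two-sample t assuming equal variances; returns (t, degrees of freedom)."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / std_err, n1 + n2 - 2

def welch_t(a, b):
    """Two-sample t assuming unequal variances; returns (t, approximate df)."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

t_p, df_p = pooled_t(males, females)
t_w, df_w = welch_t(males, females)
print(f"pooled: t={t_p:.4f}, df={df_p}")
print(f"Welch:  t={t_w:.4f}, df={df_w:.2f}")
# For df = 8, the two-tailed 5% critical value from a t table is 2.306.
```

In this illustration the difference in means is small compared with the spread of salaries within each group, so t is far below the critical value.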
-
Conduct a t-test for the difference of the means of the two distributions as
paired variables, with alpha=0.05. Comment on the results. Are they
"significant at the 5% level"? What does that mean? Are they "significant
at the 1% level"?
-
Derive descriptive statistics and a confidence interval (95% confidence)
on the column of differences. Calculate a t value for the mean of
this distribution compared with an H0 that the mean=0. How does this compare
to the t-test on paired variables?
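The t value asked for here is the mean of the differences divided by its standard error. A minimal sketch on the same made-up pairs (not the sexsal.txt values), with the critical value for 4 degrees of freedom taken from a t table:

```python
import math
import statistics

# Hypothetical salaries in thousands (NOT values from sexsal.txt).
males = [30.0, 42.0, 55.0, 61.0, 78.0]
females = [28.5, 40.0, 53.5, 58.0, 76.0]

# Column of differences, one per pair.
diffs = [m - f for m, f in zip(males, females)]

n = len(diffs)
mean_d = statistics.mean(diffs)
std_err_d = statistics.stdev(diffs) / math.sqrt(n)

# One-sample t against H0: mean difference = 0, with n - 1 degrees of freedom.
t_value = (mean_d - 0.0) / std_err_d

# 95% confidence interval; 2.776 is the two-tailed critical t for df = 4.
T_CRIT_DF4 = 2.776
ci_low = mean_d - T_CRIT_DF4 * std_err_d
ci_high = mean_d + T_CRIT_DF4 * std_err_d

print(f"mean difference={mean_d:.3f}, t={t_value:.3f}, df={n - 1}")
print(f"95% CI for the mean difference=({ci_low:.3f}, {ci_high:.3f})")
```

Compare the t value you get this way on your own data with the t Stat reported by the paired t-test tool.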
-
Why didn't we see the difference when we did the analysis without pairing
the observations? What did the pairing do for us mathematically so that
we could identify a difference with higher confidence? Construct a histogram
of the pooled observations (all 200 treated as one group). Make sure the X
axis is labeled with the correct values.
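One way to explore the pairing question numerically is to compare the variance of the differences with the variances of the two original columns. The sketch below, again on made-up pairs, checks the standard identity Var(male - female) = Var(male) + Var(female) - 2 Cov(male, female); sample_cov is a hypothetical helper written out by hand.

```python
import statistics

# Hypothetical salaries in thousands (NOT values from sexsal.txt).
males = [30.0, 42.0, 55.0, 61.0, 78.0]
females = [28.5, 40.0, 53.5, 58.0, 76.0]
n = len(males)

def sample_cov(x, y):
    """Sample covariance with n - 1 in the denominator."""
    mx, my = statistics.mean(x), statistics.mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

var_m = statistics.variance(males)
var_f = statistics.variance(females)
cov_mf = sample_cov(males, females)

# Variance of the differences, computed directly and via the identity.
var_d = statistics.variance([m - f for m, f in zip(males, females)])
var_d_identity = var_m + var_f - 2 * cov_mf

# Squared standard errors used by the unpaired and paired t statistics.
se2_unpaired = var_m / n + var_f / n
se2_paired = var_d / n

print(f"Var(d) direct={var_d:.4f}, via identity={var_d_identity:.4f}")
print(f"SE^2 unpaired={se2_unpaired:.4f}, SE^2 paired={se2_paired:.4f}")
```

Look at how large the covariance term is relative to the two variances, and what that does to the denominator of t.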
-
Why is it okay to use a t test rather than a Z test even though there is
a large number of observations?
-
Highlight both original columns again, click on the chart icon, and this
time select an XY scatter chart. Click Next a few times, making sure that
the chart appears as an object in the same spreadsheet, then move it to
a convenient location. Right-click on the middle of the data and click
Add Trendline. This will bring up a dialogue where you can select a linear
trendline under Type and then, on the Options tab, check the boxes for
"display equation on chart" and "set intercept = 0". Consider the result.
What does this mean in comparing male to female salaries when values are
paired this way? If you wanted to predict a female's salary knowing only
the salary of her pair partner, how would you calculate your best guess?
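A linear trendline forced through the origin is a least-squares fit of y = b*x, whose slope is the sum of x*y divided by the sum of x squared. The sketch below reproduces that calculation on made-up pairs, assuming the male salary is on the X axis and the female salary on the Y axis (this depends on the column order in your sheet); the partner salary of 50 is an arbitrary example value.

```python
# Hypothetical salaries in thousands (NOT values from sexsal.txt).
# Assumption: male salaries are the X values, female salaries the Y values.
males = [30.0, 42.0, 55.0, 61.0, 78.0]
females = [28.5, 40.0, 53.5, 58.0, 76.0]

# Least-squares slope for a line through the origin: b = sum(x*y) / sum(x*x).
sum_xy = sum(m * f for m, f in zip(males, females))
sum_xx = sum(m * m for m in males)
slope = sum_xy / sum_xx

def predict_female(male_salary):
    """Predicted female salary from her pair partner's salary: y = b * x."""
    return slope * male_salary

print(f"trendline: y = {slope:.4f}x")
print(f"predicted partner salary for x=50: {predict_female(50.0):.2f}")
```

The slope printed here should match the coefficient in the equation EXCEL displays on the chart when the same data and the same axis assignment are used.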