The elements of a test of a hypothesis remain the same no matter what kind of quantitative data you have and what kind of statistical test you are running. However, the application of the elements depends on a number of things, including whether your data are nominal, ordinal, or interval.
In addition to the type of data, you have to consider what the research question is. Are you asking if the mean weight of one group is significantly greater than that of the other? Are you asking if scores on the Eating Attitudes Test increase as a woman’s weight decreases?
You also need to look at the assumptions of the test you are considering using. Most statistical tests assume that you have randomly selected your subjects and that each observation is independent. Additional assumptions are described with the specific statistical tests.
What follows is a discussion of some of the most common statistical tests.
Comparing Means
Two Sample T-test
One of the first statistical tests you learned about was the t-test. This statistical test is used to compare the means of two small groups. (Since research is very expensive, we frequently use fewer than 100 subjects and sometimes fewer than 20.) The t-test compares the difference between the two means to the standard error of that difference, which is computed from the pooled variance. (The pooled variance is a weighted average of the two sample variances.) This test is used with interval data.
Using a calculation you don’t want to remember, the computer calculates a t-value and compares it to the critical t-value in a table for the correct number of degrees of freedom.
Degrees of freedom?????!!!!!
Yes. Don’t ask why this term is used; just learn what it means. The shape of the t-distribution varies depending on the number of observations you have, so the computer must use the correct t-table for the number of observations. The degrees of freedom for a two sample t-test are the total number of observations in both groups minus 2.
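For readers who do want to see the calculation, here is a minimal sketch of the equal-variance two sample t-value using only Python's standard library. The function name `pooled_t` and the weight data are made up for illustration; they do not come from any study mentioned here.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (t, degrees of freedom) for an equal-variance two sample t-test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: each sample variance weighted by its own degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    # Standard error of the difference between the two means.
    se_diff = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / se_diff
    return t, df


# Hypothetical weights (kg) for two small groups; illustration only.
low_fat = [68, 72, 65, 70, 75]
high_fat = [74, 78, 71, 80, 77]

t_value, df = pooled_t(low_fat, high_fat)
print(f"t = {t_value:.3f}, df = {df}")
```

The sketch stops at the t-value and degrees of freedom; turning them into a probability requires the t-distribution, which is the part we leave to the statistics package.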
Four assumptions have to be satisfied in order to use this test:
1. Both sampled populations have relative frequency distributions that are approximately normally distributed. (The statistical tests used for normally distributed populations are called parametric tests.)
2. The population variances are equal.
3. The samples are randomly selected from the populations.
4. The samples are independent. (Results from one sample do not affect the observations in the other sample.)
Generally we let the computer do the work and read the printout, which lists the degrees of freedom, the t-value for a one-tailed test, the t-value for a two-tailed test, and the probability of each. We have to decide ahead of time whether we need a one-tailed test (we predict the direction of the difference) or a two-tailed test (we can’t predict the direction of the difference). Then we look at the probability (p-value) for the test we selected. If the probability is less than the alpha we selected (often .05), we reject the null hypothesis in favor of the alternative hypothesis. If the probability is larger than alpha, we fail to reject the null hypothesis; we have not found a significant difference between the groups. If the printout lists the two-tailed probability only, and you know you are doing a one-tailed test, just divide the two-tailed probability by two, provided the observed difference is in the direction you predicted.
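The decision rule can be written out as a short sketch. The function name `decide` and its parameters are hypothetical, and it assumes a symmetric test statistic such as t, so that halving the two-tailed probability is legitimate only when the observed difference points in the predicted direction.

```python
def decide(p_two_tailed, alpha=0.05, one_tailed=False, direction_as_predicted=True):
    """Return (p-value used, decision) for a test with a symmetric statistic."""
    if one_tailed:
        if direction_as_predicted:
            # The difference went the way we predicted: halve the two-tailed p.
            p = p_two_tailed / 2
        else:
            # The difference went the wrong way: the one-tailed p is large.
            p = 1 - p_two_tailed / 2
    else:
        p = p_two_tailed
    decision = "reject H0" if p < alpha else "fail to reject H0"
    return p, decision


# A two-tailed p of .08 is not significant at alpha = .05 ...
print(decide(0.08))
# ... but the matching one-tailed test (predicted direction) is.
print(decide(0.08, one_tailed=True))
```

This is also why the choice between one and two tails has to be made before looking at the data: the same printout can lead to different decisions.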
Paired T-test
In the independent samples t-test, we typically compare an experimental group to a control group. Instead of randomly dividing people into two groups and having one group eat a high fat diet and the other eat a low fat diet, we could let our subjects serve as their own controls. How high is each person’s cholesterol on a low fat diet versus on a high fat diet? Each subject spends several weeks or months consuming each diet.
Or we might want to look at pairs of subjects. For example, do adolescent sons eat significantly more kcalories than their fathers? (Most parents will answer, "Yes!" without computing the statistics.) We need to pair the sons with their dads, rather than having a group of randomly selected sons and another of randomly selected dads.
Let’s think about an example related to weight loss. Some people lose weight readily no matter what type of plan they follow. (Do you have any friends like this?) Others are "easy keepers", whose bodies hang on to every possible kcalorie. What if mostly easy losers ended up in one group and mostly easy keepers were in the other group? The results would not be related to the diet plan, but to chance error. So instead we have each person serve as his or her own control. Those who lose weight easily will lose more regardless of the plan and vice versa. But in this case we aren’t comparing the means of two groups. Instead we compute, for each subject, the difference in weight loss between the two treatments, and then look at the average of those differences.
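A minimal sketch of the paired calculation, assuming made-up weight losses for six subjects who each tried two plans. The function name `paired_t` is hypothetical. Note that the degrees of freedom here are the number of pairs minus 1, a standard result that is not spelled out above.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, degrees of freedom) for a paired t-test."""
    # One difference per subject (or per matched pair).
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # Standard error of the mean difference.
    se = stdev(diffs) / sqrt(n)
    return mean(diffs) / se, n - 1


# Hypothetical weight loss (kg) for the same six subjects on two plans.
plan_a = [4.0, 6.5, 3.0, 8.0, 5.5, 7.0]
plan_b = [3.0, 6.0, 1.5, 7.5, 4.0, 6.5]

t_value, df = paired_t(plan_a, plan_b)
print(f"t = {t_value:.3f}, df = {df}")
```

The subjects differ a lot from each other (3.0 to 8.0 kg on plan A), but the test only looks at the within-subject differences, which is what makes pairing useful.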
If you are running a paired t-test on the computer, you need to specify this test and tell the computer how to pair your observations. The printout, however, will look the same as for the independent samples t-test, and you will interpret it the same way.
ANOVA
The previous two tests work fine if you only have two groups, but what if you have more than two? For example, Bert, a dietetics student, wants to lower the fat in cheesecake, but doesn’t know if a reduced fat cheesecake will be acceptable. So he makes one type of cheesecake with 50% less fat and one with 75% less fat, and compares them to the control. How can the three cakes be compared?
You could run three t-tests: 50% versus 75%; 50% versus control; 75% versus control. Unfortunately this could result in SPURIOUS results. (Isn’t that a great word?) In simple, non-statistician terms, if you run a number of tests on the same data, eventually SOMETHING will end up being significant. But the results will be questionable. (Not as good a word as spurious, is it?)
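A rough sketch of why repeated testing is risky: if each test has a 5% chance of a false positive, the chance of at least one false positive grows with the number of tests. The calculation below treats the three tests as independent, which is a simplifying assumption (the three cheesecake comparisons share groups), so read the number as an approximation.

```python
alpha = 0.05      # chance of a false positive on any single test
comparisons = 3   # 50% vs 75%, 50% vs control, 75% vs control

# Probability that at least one of the tests is significant by chance alone,
# assuming the tests are independent.
familywise_error = 1 - (1 - alpha) ** comparisons

print(f"Chance of at least one spurious result: {familywise_error:.3f}")
```

With three tests the risk is already close to three times the alpha we thought we were using, and it keeps climbing as more groups are added.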
So should we tell Bert to compare just two cheesecakes? No. He can run an ANOVA, which stands for "analysis of variance". This test compares the variability within each group to the variability between the groups. The resulting test statistic is called an F-value. In doing this calculation, the variability within groups gets squared and the variability between groups gets squared. The calculation results in all positive numbers for the values of F, so the resulting distribution looks like this:
[Figure: F-distribution, starting at zero and skewed to the right]
Just as with the t-test, the exact shape of the F-distribution will change, depending on the degrees of freedom. In this case, we actually need two numbers for degrees of freedom: the number of groups being compared minus one and the total number of observations minus the number of groups. (How in the world do statisticians come up with these things? You can get a statistics book that goes through all the mathematical background or just accept it by faith.)
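For the curious, here is a minimal sketch of the F calculation for a one-way ANOVA, using only the standard library. The function name `one_way_anova` and the acceptability scores are invented for illustration; they are not results from the cheesecake example.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    # Between-group variability: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variability: how far each score is from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical acceptability scores (1 to 9) from four tasters per cheesecake.
control = [8, 7, 9, 8]
fat_50 = [7, 7, 8, 6]
fat_75 = [5, 4, 6, 5]

f_value, df_between, df_within = one_way_anova([control, fat_50, fat_75])
print(f"F = {f_value:.2f}, df = ({df_between}, {df_within})")
```

Both sums are sums of squares, which is why F can never be negative.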
The computer printout from this test will include an ANOVA table, which lists the within error, the between error, degrees of freedom, the F-value and the probability of getting an F that large if all the groups came from the same population. As with all tests of hypotheses, if the p-value is greater than .05 (or whatever alpha was chosen) you fail to reject the null hypothesis of no differences among groups. If the p-value is less than .05, reject the null hypothesis; there is a difference among the groups… Somewhere. But we don’t know if all the groups are different from each other or if only two are different. So if the ANOVA shows significance, we have to go on to a second step: multiple comparison tests. These are the equivalent of t-tests.
SO WHY DID WE GO THROUGH ALL THE BOTHER OF RUNNING AN ANOVA?????
The multiple comparison tests are "protected", because the ANOVA already showed there was a significant difference somewhere, so the results are much less likely to be spurious. Among the various multiple comparison tests are Scheffe’s, Fisher’s LSD (no kidding), and Tukey-b. The computer printouts from a multiple comparison will indicate which groups are significantly different from each other.
Four assumptions have to be satisfied in order to use this test:
1. All sampled populations have relative frequency distributions that are approximately normally distributed.
2. The population variances are approximately equal.
3. The samples are randomly selected from the populations.
4. The samples are independently selected from the populations.
Chi-square
None of the above tests can be used for nominal data. What is an average eye color? Can’t do it. Average ethnicity? Nope. Even if we have used numbers on a survey to represent various eye colors or ethnicities, those numbers only represent the data; they don’t describe it. So if you need to determine if there is a relationship between two nominal factors, or between a nominal factor and an ordinal factor, you need to use the Chi-square test of independence.
Is the choice between engineering and dietetics dependent on gender? In order to determine this, we travel to Unique College, a school with 50% male students that has only two majors: dietetics (40% of the students) and engineering (60% of the students). We plan to randomly select 50 students. We make a cross-tabulation table and insert the numbers that we expect to see if major is not dependent on gender:
| | Engineering | Dietetics |
| Female | 15 | 10 |
| Male | 15 | 10 |
Since this school has a 1:1 female to male ratio, we would expect to get the same number of males and females in our random sample, 25 of each. Since 60% of the students at this school are engineering majors, we would expect that 60% of the females (15) will be in engineering and 40% of them (10) will be in dietetics. The same results are expected for the males. Now we take our random sample and record the numbers that we actually observe:
| | Engineering | Dietetics |
| Female | Expected: 15, Observed: 8 | Expected: 10, Observed: 17 |
| Male | Expected: 15, Observed: 22 | Expected: 10, Observed: 3 |
The Chi-square statistic is then calculated as follows:
1. For the first cell:
a. find the difference between the observed frequency and the expected frequency
b. square the difference
c. divide the squared difference by the expected frequency
2. Repeat step 1 for each of the cells in the table
3. Add up all the numbers you have calculated
4. Compare the total to the critical Chi-square value on a Chi-square table
Since you squared the differences, the total can never be negative. The distribution of the Chi-square statistic depends on the number of rows and the number of columns. (Degrees of freedom, again!) The degrees of freedom for this test are: (number of rows-1) times (number of columns-1).
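The steps can be sketched in a few lines of Python using the observed counts from the Unique College sample. One assumption to flag: the sketch computes the expected counts from the row and column totals of the sample, which is the usual approach; in this example those totals happen to give the same expected counts (15 and 10) as the school-wide percentages. The variable names are illustrative.

```python
rows = ["Female", "Male"]
cols = ["Engineering", "Dietetics"]

# Observed counts from the random sample of 50 students.
observed = {
    ("Female", "Engineering"): 8,
    ("Female", "Dietetics"): 17,
    ("Male", "Engineering"): 22,
    ("Male", "Dietetics"): 3,
}

n = sum(observed.values())
row_totals = {r: sum(observed[(r, c)] for c in cols) for r in rows}
col_totals = {c: sum(observed[(r, c)] for r in rows) for c in cols}

# Expected count for a cell if major does not depend on gender.
expected = {(r, c): row_totals[r] * col_totals[c] / n for r in rows for c in cols}

# Steps 1 to 3: (observed - expected) squared, divided by expected, summed over cells.
chi_square = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

# Degrees of freedom: (rows - 1) times (columns - 1).
df = (len(rows) - 1) * (len(cols) - 1)

print(f"Chi-square = {chi_square:.4f}, df = {df}")
```

Step 4, looking up the probability for this total, is left to the Chi-square table or the statistics package.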
The hypothetical printout for the above example gave the following result:
| Chi-square | df | Significance |
| 16.3334 | 1 | .00001 |
A chi-square this large would occur only .001% of the time if major were not dependent on gender in the population. Since this probability is so low, we reject the null hypothesis and conclude that choice of major is dependent on gender. (Keep in mind that this doesn’t mean that men never choose dietetics as a major.)
Assumptions
Linear Regression
Sometimes a dependent variable is influenced by one or more other variables, called independent variables. For example, we might want to predict weight loss per month on our new diet plan. Caloric intake, energy expenditure, health status, and other factors will impact weight loss. We collect data on many randomly selected
Correlation
As the number of paved roads in a country increases, the incidence of cardiovascular heart disease increases. This is an example of a correlation. Correlations do not show causation! Paved roads do not cause heart disease! (However, paved roads do indicate that people drive more and often get less exercise than in countries with few paved roads.)
A positive correlation shows that in general as one factor in your sample increases another factor also increases. Negative correlations show that as one factor in your sample increases another factor decreases. The advantage of a correlation is that it provides a measure of the strength of the relationship between two variables.
You may have plotted out data like this when you were in elementary school. The data may have looked like this:
[Figure: scatter plot showing a positive relationship]
You were then asked to draw a line that best represented the relationship. It seemed like an exercise in futility, didn’t it? While we might agree that the relationship is positive (as one factor increases, so does the other), it seemed as though many different lines would be possible. Fortunately, the computer can calculate this "best fit" line and determine how variable all the points are from the selected line.
A computer printout of the results of a correlation will indicate the correlation coefficient, which runs from -1 (a perfect negative correlation) through 0 (no correlation) to 1 (a perfect positive correlation). The strength of the correlation is the size of the coefficient, ignoring its sign. The printout will also include the p-value, which you compare to the selected alpha. If the probability is less than alpha, the correlation is significant.
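A minimal sketch of how the Pearson correlation coefficient is calculated, using only the standard library. The function name `pearson_r` and the data are made up for illustration. The second call flips the sign of one variable to show that the sign gives the direction of the relationship while the size gives its strength.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Return the Pearson product moment correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    # How the two variables vary together ...
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # ... relative to how much each varies on its own.
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical interval data for five subjects.
exercise_hours = [1, 2, 3, 4, 5]
fitness_score = [2, 4, 5, 4, 5]

r_positive = pearson_r(exercise_hours, fitness_score)
r_negative = pearson_r(exercise_hours, [-s for s in fitness_score])

print(f"r = {r_positive:.3f}")
print(f"r = {r_negative:.3f} (same strength, opposite direction)")
```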
Some statistical tests for correlations, such as Pearson’s product moment correlation, are used with interval data. Others, such as Spearman’s rank order correlation coefficient, can be used with ordinal data.
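As a rough sketch of the rank order approach, the Spearman coefficient can be calculated from the differences between each subject's two ranks. The shortcut formula below assumes there are no tied values; the function names and the ratings are invented for illustration.

```python
def ranks(values):
    """Rank the values from 1 (smallest) upward; assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Return Spearman's rank order correlation (no-ties shortcut formula)."""
    n = len(xs)
    # Squared difference between each subject's rank on x and rank on y.
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ordinal ratings of six cheesecakes by two judges.
judge_1 = [1, 2, 3, 4, 5, 6]
judge_2 = [2, 1, 4, 3, 6, 5]

rho = spearman_rho(judge_1, judge_2)
print(f"rho = {rho:.3f}")
```

Because only the ranks are used, the calculation makes no claim about how far apart the ratings are, which is why it suits ordinal data.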
