The elements of a test of a hypothesis remain the same no matter what kind of quantitative data you have and what kind of statistical test you are running. However, the application of the elements depends on a number of things, including whether your data are nominal, ordinal, or interval.

In addition to the type of data, you have to consider what the research question is. Are you asking if the mean weights of one group are significantly greater than those of the other? Are you asking if scores on the Eating Attitudes Test increase as a woman’s weight decreases?

You also need to look at the assumptions of the test you are considering using. Most statistical tests assume that you have randomly selected your subjects and that each observation is independent. Additional assumptions are described with the specific statistical tests.

What follows is a discussion of some of the most common statistical tests.

Comparing Means

Two Sample T-test

One of the first statistical tests you learned about was the t-test. This statistical test is used to compare the means of two small groups. (Since research is very expensive, we frequently use fewer than 100 subjects and sometimes fewer than 20.) The t-test compares the difference between the two means to the standard error of that difference, which is calculated from the pooled variance. (The pooled variance is a weighted average of the two sample variances.) This test is used with interval data.

Using a calculation you don’t want to remember, the computer calculates a t-value and compares it to the critical t-value in a table for the correct number of degrees of freedom.

Degrees of freedom?????!!!!!
Yes. Don’t ask why this term is used; just learn what it means. The shape of the t-distribution varies depending on the number of observations you have, so the computer must use the correct t-table for the number of observations. The degrees of freedom for a t-test is the total number of observations in both groups minus 2.
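
If you are curious about what the computer is doing, here is a minimal sketch in Python of the calculation, assuming the two populations have equal variances. The function name pooled_t and the weights are made up for illustration; they do not come from any particular statistics package.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (t, df) for a two-sample t-test that assumes equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2  # total observations in both groups minus 2
    # Pooled variance: each sample variance weighted by its own degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    # Standard error of the difference between the two means.
    se_diff = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se_diff, df


# Hypothetical weights (kg) for two small groups.
group_a = [70, 72, 68, 75, 71]
group_b = [65, 66, 70, 64, 68]

t_value, df = pooled_t(group_a, group_b)
print(f"t = {t_value:.3f}, df = {df}")
```

The t-value still has to be compared to the t-distribution for that number of degrees of freedom, which is the part a statistics program or a t-table handles for you.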

Four assumptions have to be satisfied in order to use this test:

1. Both sampled populations have relative frequency distributions that are approximately normally distributed. (The statistical tests used for normally distributed populations are called parametric tests.)

2. The population variances are equal.

3. The samples are randomly selected from the populations.

4. The samples are independent. (Results from one sample do not affect the observations in the other sample.)

Generally we let the computer do the work and read the printout, which lists the degrees of freedom, t-value for a one-tailed test, t-value for a two-tailed test, and the probability of each.

We have to decide ahead of time whether we need a one-tailed test (we predict the direction of the difference) or a two-tailed test (we can’t predict the direction of the difference).

Then we look at the probability (p-value) for the test we selected. If the probability is less than the alpha we selected (often .05), we reject the null hypothesis in favor of the alternative hypothesis. If the probability is larger than alpha, we fail to reject the null hypothesis; we have not shown a significant difference between the groups.

If the printout lists the two-tailed probability only, and you know you are doing a one-tailed test, just divide the two-tailed probability by two, provided the difference you observed is in the direction you predicted.
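
The decision rule can be sketched in a few lines of Python. The function names and the two-tailed probability of .08 are hypothetical. The sketch halves the two-tailed probability only when the observed difference is in the predicted direction, which is what you would expect for a symmetric statistic such as t.

```python
def one_tailed_p(two_tailed_p, observed_diff, predicted_sign):
    """Convert a two-tailed p-value to a one-tailed one (symmetric statistic)."""
    same_direction = (observed_diff > 0) == (predicted_sign > 0)
    # Half the two-tailed probability lies in each tail.
    return two_tailed_p / 2 if same_direction else 1 - two_tailed_p / 2


def decide(p_value, alpha=0.05):
    """Apply the usual decision rule for a test of a hypothesis."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


# Hypothetical printout: two-tailed p = .08, and we predicted a positive difference.
p_predicted = one_tailed_p(0.08, observed_diff=4.6, predicted_sign=+1)
p_opposite = one_tailed_p(0.08, observed_diff=-4.6, predicted_sign=+1)

print(p_predicted, decide(p_predicted))
print(p_opposite, decide(p_opposite))
```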

Paired T-test

In the independent samples t-test, we typically compare an experimental group to a control group.

Instead of randomly dividing people into two groups and having one group eat a high fat diet and the other eat a low fat diet, we could let our subjects serve as their own controls. How high is each person’s cholesterol on a low fat diet versus on a high fat diet? Each subject spends several weeks or months consuming each diet.

Or we might also want to look at pairs of subjects. For example, do adolescent sons eat significantly more kcalories than their fathers? (Most parents will answer, "Yes!" without computing the statistics.) We need to pair the sons with their dads, rather than having a group of randomly selected sons and another of randomly selected dads.

Let’s think about an example related to weight loss. Some people lose weight readily no matter what type of plan they follow. (Do you have any friends like this?) Others are "easy keepers", whose bodies hang on to every possible kcalorie.

What if mostly easy losers ended up in one group and mostly easy keepers were in the other group? The results would not be related to the diet plan, but to chance error. So instead we have each person serve as his or her own control. Those who lose weight easily will lose more regardless of the plan and vice versa.

But in this case we aren’t comparing the means of two groups. Instead we find the difference in weight loss between the two treatments for each subject, and test whether the average of those differences is different from zero.

If you are running a paired t-test on the computer, you need to specify this test and tell the computer how to pair your observations. The printout, however, will look the same as for the independent samples t-test and you will interpret it the same way, except that the degrees of freedom is the number of pairs minus 1.
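
Here is a rough sketch in Python of what the paired calculation involves. The function name paired_t and the weight-loss numbers are invented for illustration; the point is that the test works on one column of differences, one per subject.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for a paired t-test on two equally long lists."""
    if len(first) != len(second):
        raise ValueError("each observation needs a partner")
    # One difference per subject (or per pair of subjects).
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # Standard error of the mean difference.
    se = stdev(diffs) / sqrt(n)
    return mean(diffs) / se, n - 1


# Hypothetical weight loss (kg) for six subjects on each of two plans.
plan_a = [5, 8, 3, 10, 6, 7]
plan_b = [3, 7, 2, 7, 5, 4]

t_value, df = paired_t(plan_a, plan_b)
print(f"t = {t_value:.3f}, df = {df}")
```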

 

ANOVA

The previous two tests work fine if you only have two groups, but what if you have more than two?

For example, a dietetics student named Bert wants to lower the fat in cheesecake, but doesn’t know if a reduced fat cheesecake will be acceptable. So he makes one type of cheesecake with 50% less fat and one with 75% less fat, and compares them to the control. How can the three cakes be compared?

You could run three t-tests: 50% versus 75%; 50% versus control; 75% versus control. Unfortunately this could result in SPURIOUS results. (Isn’t that a great word?) In simple, non-statistician terms, if you run a number of tests on the same data, eventually SOMETHING will end up being significant. But the results will be questionable. (Not as good a word as spurious, is it?)
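
To see how quickly the problem grows, here is a small Python sketch of the chance of at least one false alarm across several tests. It assumes the tests are independent, which t-tests run on the same data are not exactly, so treat the numbers as a rough illustration only. The function name is made up.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    # Each test avoids a false positive with probability (1 - alpha).
    return 1 - (1 - alpha) ** n_tests


one_test = familywise_error(0.05, 1)
three_tests = familywise_error(0.05, 3)

print(f"1 test:  {one_test:.3f}")
print(f"3 tests: {three_tests:.3f}")
```

Under that assumption, three tests at an alpha of .05 give roughly a 14% chance of at least one spurious "significant" result.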

So should we tell Bert to compare just two cheesecakes? No. He can run an ANOVA, which stands for "analysis of variance". This test compares the variability within the groups to the variability between the groups. The resulting test statistic is called an F-value. In doing this calculation, the variability within groups gets squared and the variability between groups gets squared. The calculation results in all positive numbers for the values of F, so the resulting distribution starts at zero and is skewed to the right.

[Figure: shape of the F-distribution]


Just as with the t-test, the exact shape of the F-distribution will change, depending on the degrees of freedom. In this case, we actually need two numbers for degrees of freedom: the number of groups being compared minus one and the total number of observations minus the number of groups. (How in the world do statisticians come up with these things? You can get a statistics book that goes through all the mathematical background or just accept it on faith.)
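
For the curious, here is a minimal Python sketch of how an F-value and its two degrees of freedom can be computed for a one-way ANOVA. The function name and the acceptability ratings for the three cheesecakes are hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    group_means = [mean(group) for group in groups]
    # Between-group variability: how far each group mean is from the grand mean.
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2 for group, m in zip(groups, group_means)
    )
    # Within-group variability: how far each observation is from its own group mean.
    ss_within = sum(
        (x - m) ** 2 for group, m in zip(groups, group_means) for x in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical acceptability ratings (1 to 9) from four tasters per cheesecake.
control = [8, 7, 9, 8]
fat_50 = [7, 6, 8, 7]
fat_75 = [5, 4, 6, 5]

f_value, df_between, df_within = one_way_anova([control, fat_50, fat_75])
print(f"F = {f_value:.2f}, df = ({df_between}, {df_within})")
```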

The computer printout from this test will include an ANOVA table, which lists the within error, the between error, degrees of freedom, the F-value and the probability of getting an F that large if all the groups came from the same population. As with all tests of hypotheses, if the p-value is greater than .05 (or whatever alpha was chosen) you fail to reject the null hypothesis of no differences among groups. If the p-value is less than .05, reject the null hypothesis; there is a difference among the groups… Somewhere. But we don’t know if all the groups are different from each other or if only two are different. So if the ANOVA shows significance, we have to go on to a second step—multiple comparison tests. These are the equivalent of t-tests.

SO WHY DID WE GO THROUGH ALL THE BOTHER OF RUNNING AN ANOVA?????


The multiple comparison tests are "protected", because they are run only after the ANOVA has already shown that there is a significant difference somewhere. The results are much less likely to be spurious. Among the various multiple comparison tests are Scheffé’s, Fisher’s LSD (no kidding), and Tukey-b. The computer printouts from a multiple comparison will indicate which groups are significantly different from each other.

Four assumptions have to be satisfied in order to use ANOVA:

1. All sampled populations have relative frequency distributions that are approximately normally distributed.

2. The population variances are approximately equal.

3. The samples are randomly selected from the populations.

4. The samples are independently selected from the populations.

 

Chi-square

None of the above tests can be used for nominal data. What is an average eye color? Can’t do it. Average ethnicity? Nope. Even if we have used numbers on a survey to represent various eye colors or ethnicities, those numbers only represent the data; they don’t describe it. So if you need to determine if there is a relationship between two nominal factors, or between a nominal factor and an ordinal factor, you need to use the Chi-square test of independence.

Is the choice between engineering and dietetics dependent on gender? In order to determine this, we travel to Unique College, a school where 50% of the students are male and there are only two majors: dietetics (40% of the students) and engineering (60% of the students). We plan to randomly select 50 students. We make a cross-tabulation table and insert the numbers that we expect to see if major is not dependent on gender:

 

            Engineering    Dietetics
Female      15             10
Male        15             10


Since this school has a 1/1 female to male ratio, we would expect to get the same number of males and females in our random sample, 25 of each. Since 60% of the students at this school are engineering majors, we would expect that 60% of the females (15) will be in engineering and 40% of them (10) will be in dietetics. The same results are expected for the males. Now we take our random sample and record the numbers that we actually observe:

 

            Engineering                 Dietetics
Female      Expected 15, Observed 8     Expected 10, Observed 17
Male        Expected 15, Observed 22    Expected 10, Observed 3
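
The expected counts in the tables above can be reproduced with a short Python sketch: multiply the sample size by the share of each gender and the share of each major. The variable names are made up; the percentages are the ones given for Unique College.

```python
sample_size = 50
gender_share = {"Female": 0.5, "Male": 0.5}
major_share = {"Engineering": 0.6, "Dietetics": 0.4}

# If major does not depend on gender, each cell is just the product of the shares.
expected = {
    (gender, major): sample_size * g_share * m_share
    for gender, g_share in gender_share.items()
    for major, m_share in major_share.items()
}

for (gender, major), count in expected.items():
    print(f"{gender:6} {major:11} {count:.0f}")
```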


The Chi-square statistic is then calculated as follows:

1. For each cell:

a. find the difference between the observed frequency and the expected frequency

b. square the difference

c. divide the squared difference by the expected frequency

2. Repeat step 1 for each of the cells in the table

3. Add up all the numbers you have calculated

4. Compare the total to the critical value of the Chi-square statistic in a Chi-square table

Since you squared the differences, the total will always be positive. The distribution of the Chi-square statistic depends on the number of rows and the number of columns. (Degrees of freedom, again!) The degrees of freedom for this test is: (number of rows-1) times (number of columns-1).
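
The steps above can be sketched in Python using the observed and expected counts from the Unique College example. The function name is made up. The probability line uses a shortcut that holds only when there is 1 degree of freedom, as there is for a table with two rows and two columns; for larger tables you would need a Chi-square table or a statistics package.

```python
from math import erfc, sqrt

# Rows: Female, Male. Columns: Engineering, Dietetics.
observed = [[8, 17], [22, 3]]
expected = [[15, 10], [15, 10]]


def chi_square(observed, expected):
    """Sum (observed - expected)^2 / expected over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


chi2 = chi_square(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Shortcut for df == 1 only: a Chi-square with 1 df is a squared standard normal.
p_value = erfc(sqrt(chi2 / 2))

print(f"Chi-square = {chi2:.4f}, df = {df}, p = {p_value:.5f}")
```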

The hypothetical printout for the above example gave the following result:

Chi-square    df    Significance
16.3333       1     .00005

A chi-square this large would occur about .005% of the time if major is not dependent on gender in the population. Since this probability is so low, we reject the null hypothesis and conclude that choice of major is dependent on gender. (Keep in mind that this doesn’t mean that men never choose dietetics as a major.)

Assumptions

Linear Regression

Sometimes a dependent variable is influenced by one or more other variables, called independent variables. For example, we might want to predict weight loss per month on our new diet plan. Caloric intake, energy expenditure, health status, and other factors will impact weight loss. We collect data on many randomly selected

Correlation

As the number of paved roads in a country increases, the incidence of cardiovascular heart disease increases. This is an example of a correlation. Correlations do not show causation! Paved roads do not cause heart disease! (However, paved roads do indicate that people drive more and often get less exercise than in countries with few paved roads.) A positive correlation shows that in general as one factor in your sample increases another factor also increases. Negative correlations show that as one factor in your sample increases another factor decreases. The advantage of a correlation is that it provides a measure of the strength of the relationship between two variables.

You may have plotted out data like this when you were in elementary school, with one factor along each axis and a point for each observation.

[Figure: scatter plot of the data]

You were then asked to draw a line that best represented the relationship. It seemed like an exercise in futility, didn’t it?

While we might agree that the relationship is positive (as one factor increases, so does the other), it seemed as though many different lines would be possible. Fortunately, the computer can calculate this "best fit" line and determine how variable all the points are from the selected line.
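
One common way to define the "best fit" line is least squares: pick the line that makes the sum of the squared vertical distances from the points as small as possible. The sketch below assumes that definition; the function name and the data points are invented for illustration.

```python
from statistics import mean


def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data points.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = best_fit_line(xs, ys)
# How far the points fall from the line: the sum of squared vertical distances.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

print(f"y = {intercept:.2f} + {slope:.2f}x, squared error = {sse:.2f}")
```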

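A computer printout of the results of a correlation will indicate the direction and strength of the correlation. The correlation coefficient runs from -1 (a perfect negative correlation) through 0 (no correlation) to +1 (a perfect positive correlation), so the strength, ignoring the sign, runs from 0 to 1.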
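
As a rough sketch, this is how a Pearson correlation coefficient can be calculated by hand in Python. The function name and the data are hypothetical. The second data set shows that the coefficient comes out negative when one factor falls as the other rises.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equally long lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
ys_rising = [2, 4, 5, 4, 5]
ys_falling = [5, 4, 5, 4, 2]

r_positive = pearson_r(xs, ys_rising)
r_negative = pearson_r(xs, ys_falling)

print(f"rising:  r = {r_positive:.3f}")
print(f"falling: r = {r_negative:.3f}")
```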

The printout will also include the p-value, which you compare to the selected alpha. If the probability is less than alpha, the correlation is significant.

Some statistical tests for correlations, such as Pearson’s product moment correlation, are used with interval data. Others, such as Spearman’s rank order correlation coefficient, can be used with ordinal data.
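
One way to see the difference between the two is that Spearman’s coefficient can be calculated by replacing each value with its rank and then running the Pearson calculation on the ranks. The Python sketch below takes that approach, giving tied values the average of their ranks. All function names and data are made up for illustration.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equally long lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data that always rise together, but not along a straight line.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]

r = pearson_r(xs, ys)
rho = spearman_rho(xs, ys)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Because the ranks line up perfectly, Spearman’s coefficient is 1 here, while Pearson’s falls a little short of 1 because the points do not lie on a straight line.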