Crosstabulation with Nominal Variables
- To learn how to perform a crosstabulation and practice formulating
hypotheses.
- To learn how to interpret crosstabs where at least one variable is
nominal.
- To learn how to measure the strength of the relationship between two
variables.
- To learn how to apply the basic measures of association: phi and
Cramer's V.
- Crosstabulation brings together two variables and displays the
relationship between them in a single table. Each column in the crosstab
corresponds to a category of the independent variable, and each row
corresponds to a category in the dependent variable. Hence the dependent
variable goes on the left, and the independent variable goes on the top.
- Each cell represents a unique combination of categories from each of the
variables. For example, in the table below, the cell "G" represents all
the respondents who selected Category I for the independent variable and
Category III for the dependent variable.
- The percentage in each cell is calculated by dividing the number of
respondents in the cell by the total number of respondents for the
column. Note: the cell-percentage values will be wrong if the
missing values are not eliminated. Pay attention to the percentages in
each cell rather than the number (n) of respondents in each cell.
- To interpret crosstabs, compare the column-percentages
across the rows to see whether they differ. For instance, in the table below, compare the percentage values for cells A, B, and C, then compare D, E, and F,
and finally compare G, H, and I. If the column-percentages of cells
A-B-C, and/or D-E-F, and/or G-H-I differ markedly from
one another, then you have found a relationship.
| DEPENDENT VARIABLE | INDEPENDENT VARIABLE: Category I | INDEPENDENT VARIABLE: Category II | INDEPENDENT VARIABLE: Category III |
|---|---|---|---|
| Category I | A | B | C |
| Category II | D | E | F |
| Category III | G | H | I |
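The column-percentage calculation described above can be sketched in a few lines of Python. This is only an illustration, not part of the Webstats procedure: the function name `column_percentages` is hypothetical, and the counts are the valid cases (missing values already excluded) from the Example #1 output later in this handout.

```python
# Counts from the Example #1 output in this handout.
# Rows: dependent variable (Undesirable, Desirable).
# Columns: independent variable (Excuse to enforce its will,
#          Seeking to protect itself).
counts = [
    [757, 731],
    [726, 2621],
]


def column_percentages(table):
    """Return each cell as a percentage of its column total."""
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [100.0 * cell / total for cell, total in zip(row, col_totals)]
        for row in table
    ]


col_pct = column_percentages(counts)
for row in col_pct:
    # Compare the values printed on the same line (i.e. across a row).
    print("  ".join(f"{p:5.1f}" for p in row))
```

Rounded to one decimal, the first row comes out as 51.0 and 21.8 and the second as 49.0 and 78.2, which are the column-percentages shown in the Example #1 crosstab.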
- Measures of Association calculate the strength, and for
ordinal variables the direction, of the relationship between two variables.
- PHI: Used to measure the strength of the association
between two variables, each of which has only two categories. (It
applies to 2 X 2 nominal tables only).
- CRAMER'S V: Used to measure the strength of the association between one nominal variable with either another nominal variable, or with an ordinal variable. Both of the variables can have more than 2
categories. (It applies to either nominal X nominal crosstabs, or ordinal X nominal crosstabs, with no restriction on the number of categories.)
- Interpreting the value of the Level of Association:
| LEVEL OF ASSOCIATION | Verbal Description | COMMENTS |
|---|---|---|
| 0.00 | No Relationship | Knowing the independent variable does not help in predicting the dependent variable. |
| .00 to .15 | Very Weak | Not generally acceptable |
| .15 to .20 | Weak | Minimally acceptable |
| .20 to .25 | Moderate | Acceptable |
| .25 to .30 | Moderately Strong | Desirable |
| .30 to .35 | Strong | Very Desirable |
| .35 to .40 | Very Strong | Extremely Desirable |
| .40 to .50 | Worrisomely Strong | Either an extremely good relationship or the two variables are measuring the same concept |
| .50 to .99 | Redundant | The two variables are probably measuring the same concept. |
| 1.00 | Perfect Relationship | If we know the independent variable, we can perfectly predict the dependent variable. |
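For reference, both measures are commonly defined from the Pearson chi-square of the crosstab. The notation below is an assumption added for illustration and does not come from the handout: \chi^2 is the Pearson chi-square, n is the number of valid cases, and r and c are the numbers of rows and columns in the table.

```latex
\phi = \sqrt{\frac{\chi^2}{n}}, \qquad
V = \sqrt{\frac{\chi^2}{n \,\min(r - 1,\; c - 1)}}
```

In a 2 X 2 table, \min(r - 1, c - 1) = 1, so the two measures have the same magnitude. In larger tables, phi computed this way can exceed 1, which is one reason Cramer's V is preferred there.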
Crosstabulating Nominal Data
- Select a Dataset for this exercise, possibly Leger 2013 or Forum Research 2013.
- Enter the Questionnaire for the chosen dataset from the Codebooks link on the POL242Y website.
- Hypothesize a relationship between two nominal variables in the dataset.
- For example, using the Euro2002 data one might suspect
that support for US leadership in world affairs is related to
perceptions of the post-9/11 motivations of the US. Details on the relevant
variables are offered below.
- Enter Webstats to select your chosen dataset.
- Perform separate trial-runs of the Frequency distribution for each of the variables. Based on the Frequency output, decide how to recode each variable and identify the missing values.
- Set the Analysis in Webstats to Bivariate Crosstabs and hit Proceed.
- In 'Step 1,' enter the dependent variable first,
followed by the independent variable. Be sure to put the
dependent or independent variable in the correct entry box. If the
variables are placed appropriately, the dependent variable will appear on the left of the crosstab and the independent variable will appear across the top (See diagram above).
- Select "Phi and Cramer's V (PHI)" in the 'Step 2' entry box. This
section lists other measures of association that you can choose but since we
are working with nominal data select "Phi and Cramer's V (PHI)".
- Enter any recodes (if necessary) in 'Step 4' and hit Run.
- When evaluating the measures of association, you should look at only Phi
for 2 by 2 tables and Cramer's V for other nominal tables.
- Determine whether there is a relationship between the variables based on
the column-percentages in the crosstab. Then, looking at the value of
the measure of association, use the above guidelines to interpret the strength
of the relationship.
- Repeat the analysis until you find a set of variables with a relationship
that has at least a moderate degree of association (> .20).
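The last interpretation step, turning a Phi or Cramer's V value into one of the verbal descriptions from the Level of Association table above, can be sketched as a small lookup. The function name `describe_association` is hypothetical. Because the table lists each boundary twice (for example, .15 ends one band and starts the next), the sketch assumes a boundary value falls in the higher band.

```python
# Upper limits and labels copied from the Level of Association table.
# Assumption: a value exactly on a shared boundary (e.g. .15) is placed
# in the higher band.
BANDS = [
    (0.15, "Very Weak"),
    (0.20, "Weak"),
    (0.25, "Moderate"),
    (0.30, "Moderately Strong"),
    (0.35, "Strong"),
    (0.40, "Very Strong"),
    (0.50, "Worrisomely Strong"),
    (1.00, "Redundant"),
]


def describe_association(value):
    """Map a Phi or Cramer's V value to the handout's verbal description."""
    strength = abs(value)  # Phi can be negative; only its size matters here
    if strength > 1.0:
        raise ValueError("a measure of association cannot exceed 1 in size")
    if strength == 0.0:
        return "No Relationship"
    if strength == 1.0:
        return "Perfect Relationship"
    for upper_limit, label in BANDS:
        if strength < upper_limit:
            return label


print(describe_association(0.29210))  # the Phi value from Example #1
print(describe_association(0.18155))  # the Cramer's V value from Example #2
```

With the values reported in the two examples in this handout, the sketch returns "Moderately Strong" for .29210 and "Weak" for .18155.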
Example #1:
Using phi with two dichotomous variables
- Dataset: Euro2002
- Dependent Variable:
- [Q6AGG] From your point of view, how desirable is it that the US exert strong leadership in world affairs? Desirable, or undesirable?
- Independent Variable:
- [Q28] Do you think the United
States is using these attacks [9/11] as an excuse to enforce its will around
the globe or is it genuinely seeking to protect itself from further
attacks?
- Excuse to enforce its will
- Seeking to protect itself from further attacks
- Arrow Diagram:
- X --> Y
- US motive --> desirability of US leadership
- Syntax:
    get file="/homes/josephf/webstats/Euro2002.sav".
    missing values Q6AGG (5,6).
    missing values Q28 (3 to 6).
    recode Q6AGG (1=2)(2=1).
    value labels Q6AGG 1 'Undesirable' 2 'Desirable'.
    crosstab tables=Q6AGG by Q28
      /cells=column count /statistics=PHI.
- Syntax Legend:
- Missing Values and Recodes: Determined by the trial-run of the Frequencies output.
- Crosstab command: This tells SPSS which variables to use in the table. Enter the Dependent Variable first, then the Independent.
- /cells = This tells SPSS to put column percentages and frequencies in each cell.
- /statistics = This is the section of syntax that needs to be included after the crosstab syntax in order to calculate the Measures of Association. In this case we want to calculate Phi.
- Output:
    Q6AGG  The US exert strong leadership in world
    by Q28  United States using these attacks to enf

                        Q28                  Page 1 of 1
                Count  |
               Col Pct |Excuse t Seeking
                       |o enforc to prote    Row
                       |      1 |      2 |  Total
    Q6AGG      --------+--------+--------+
                    1  |    757 |    731 |   1488
      Undesirable      |   51.0 |   21.8 |   30.8
                       +--------+--------+
                    2  |    726 |   2621 |   3347
      Desirable        |   49.0 |   78.2 |   69.2
                       +--------+--------+
                Column     1483     3352     4835
                 Total     30.7     69.3    100.0

                                                     Approximate
    Statistic               Value      ASE1   Val/ASE0  Significance
    --------------------  ---------  -------- --------  ------------
    Phi                     .29210                      .00000 *1
    Cramer's V              .29210                      .00000 *1

    *1 Pearson chi-square probability

    Number of Missing Observations: 1166
- Crosstab Legend:
- The most important aspects of the crosstab are highlighted in
colour. The number at the top of each cell is the number of cases (n),
and the number at the bottom of each cell is the
column-percentage.
The column-percentages are highlighted. (You may
find that the row total figures differ slightly from the figures you
would get from individual Frequency analyses. This is because some of
the people who responded to one variable did not respond to the second and
hence are eliminated by the missing values statement. So you can
expect that the number of missing cases will be slightly higher in the crosstab than in the individual Frequency analyses.)
Of all the figures in the output, the most important for your
assessment is the column-percentage for each cell.
- Measures of Association Legend:
- For the present time, the only aspects of the measures of association
output that you have to note are the 'Statistic' and the 'Value' columns.
- Ignore the 'ASE1' and the 'Val/ASE0' columns.
- While you can ignore the 'Approximate Significance' column for the time
being, this will become important after we learn its meaning later in the
course.
- Interpretation of Crosstab:
- Comparing the column-percentages for the cells in the 'Desirable' row,
we can see that there is a marked difference. The
column-percentages in the 'Desirable' row are 49.0% and 78.2%. A
difference can also be observed in the 'Undesirable' row (51.0% versus 21.8%).
- This indicates that individuals who believe that the US used
the 9/11 attacks as an excuse to enforce its will across the globe are less
likely to believe that it is desirable for the US to exert strong leadership in
the world.
- By contrast, individuals who believe that the US's actions are genuine
attempts to prevent further attacks are more likely to believe that it is
desirable for the US to exert strong leadership in the world.
- Since the crosstab is a 2 X 2 table, we know that Phi is the appropriate
measure of association. The value of Phi is .29, which means that this
is a Moderately Strong Association.
- The value of Phi may be negative if the variables are coded in a
particular way. The meaning of a negative measure of association will
be discussed below. For the time being, recognize that the positive
phi value here means that most
of the cases are on the main diagonal of the table.
- The Phi of .29 allows us to
confirm the conclusion drawn in the crosstab analysis, namely, that
individuals who believe that the US was using the 9/11 attacks as an
excuse to enforce its will across the globe are less likely to believe that
it is desirable for the US to exert strong leadership in the world than
individuals who believe that the US was genuinely seeking to protect
itself from further attacks.
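The Phi value in Example #1 can be checked by hand from the four cell counts. The sketch below uses the standard 2 X 2 formula (the difference of the diagonal products divided by the square root of the product of the row and column totals). The function name `phi_2x2` is hypothetical, and this is an illustration rather than a description of how SPSS computes the statistic internally.

```python
import math


def phi_2x2(table):
    """Signed Phi for a 2 x 2 table of counts [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # A positive result means the main diagonal (a and d) dominates.
    return (a * d - b * c) / math.sqrt(row1 * row2 * col1 * col2)


# Counts from the Example #1 output: rows are Undesirable / Desirable,
# columns are "Excuse to enforce its will" / "Seeking to protect itself".
example_1 = [
    [757, 731],
    [726, 2621],
]

phi = phi_2x2(example_1)
print(f"{phi:.5f}")
```

This prints 0.29210, matching the value in the output. Swapping the two rows (that is, reversing the recode of Q6AGG) would give the same magnitude with a negative sign.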
Example #2:
Using Cramer's V with variables that have more than two categories
- Dataset: CES2004
- Dependent Variable:
- [CPS_I1_3] Do you favour or oppose same-sex marriage, or do you have no opinion on this?
- Independent Variable:
- [CPS_Q1B] In federal politics do you usually think of yourself as a:
Liberal, Conservative, NDP, Bloc Quebecois, or none of these?
- Arrow Diagram:
- partisanship --> opinion on same-sex marriage
- Syntax:
    get file="/homes/josephf/webstats/CES2004.sav".
    fre var= CPS_Q1B CPS_I1_3.
    missing values CPS_I1_3 (9).
    missing values CPS_Q1B (98,99).
    recode CPS_Q1B (0 thru 4=copy)(6 thru 10=5) into pid.
    value labels pid 0 'none' 1 'Lib' 2 'Cons' 3 'NDP' 4 'BQ' 5 'oth'.
    recode CPS_I1_3 (1=1)(3=2)(8=3) into ssm.
    value labels ssm 1 'favour' 2 'oppose' 3 'dk'.
    crosstab tables=ssm by pid
      /cells=column count /statistics=PHI.
- Syntax Legend:
- Missing Values: Determined by the trial-run of the Frequencies output.
- Recodes: Each recode goes into a new variable name with new value labels.
- Crosstab function: Enter the Dependent Variable first, then the Independent Variable.
- /statistics: This is the section of syntax that calculates measures of association, in this case phi and Cramer's V.
- Output:
    SSM by PID

                        PID
                Count  |
               Col Pct |none     Lib      Cons     NDP      BQ       oth
                       |                                                        Row
                       |    .00 |   1.00 |   2.00 |   3.00 |   4.00 |   5.00 |  Total
    SSM        --------+--------+--------+--------+--------+--------+--------+
                 1.00  |    192 |    185 |     62 |     78 |     78 |     21 |    616
      favour           |   29.9 |   31.8 |   17.3 |   51.3 |   48.8 |   36.8 |   31.6
                       +--------+--------+--------+--------+--------+--------+
                 2.00  |    221 |    202 |    199 |     30 |     29 |     18 |    699
      oppose           |   34.4 |   34.8 |   55.6 |   19.7 |   18.1 |   31.6 |   35.8
                       +--------+--------+--------+--------+--------+--------+
                 3.00  |    229 |    194 |     97 |     44 |     53 |     18 |    635
      dk               |   35.7 |   33.4 |   27.1 |   28.9 |   33.1 |   31.6 |   32.6
                       +--------+--------+--------+--------+--------+--------+
                Column      642      581      358      152      160       57     1950
                 Total     32.9     29.8     18.4      7.8      8.2      2.9    100.0

    Statistic               Value
    --------------------  ---------
    Phi                     .25674
    Cramer's V              .18155
- Interpretation of Crosstab:
- Start by looking at the column percentages and compare across the
rows. In the 'favour' row, which
represents the percentage of people in each partisan grouping who favour
same sex marriage, we see for example that 17.3% of Conservatives favour
same sex marriage compared to 51.3% of NDP respondents. This indicates a
less favourable response from Conservatives than from the NDP. When we look at the
'oppose' row, we can see
that this difference is reversed: a greater portion of the Conservatives
than of the NDP are opposed (55.6% versus 19.7%). Of course, there are numerous additional
comparisons that can be made in these rows, as well as in the
percentages of 'don't knows' (dk), where the
differences are relatively small across parties. But keep in mind we are
comparing column percentages, not row percentages. Hence it is, for
example, incorrect to observe that 51.3% of those who favour same
sex marriage are NDP. Taken together, the column
percentages show substantial differences in partisan support for same
sex marriage, with the NDP and BQ being most supportive, the
Conservatives least, and those with other, Liberal and no party
affiliation being in the middle.
- Interpretation of Cramer's V:
- Since this crosstab involves a nominal variable and an ordinal variable,
the appropriate measure of association to use in summarizing the
relationship is Cramer's V. We do not use
Phi because it is only appropriate for 2 X 2 tables. The Cramer's V value is
.18. Using the standards above, this relationship is Weak.
- We have found weak differences among partisans in their support for and
opposition to same sex marriage.
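The Phi and Cramer's V values in the Example #2 output can be reproduced from the cell counts. The sketch below assumes the usual chi-square based definitions (phi as the square root of chi-square over n, and V dividing additionally by the smaller of rows minus one and columns minus one). The function names are hypothetical and are not part of Webstats or SPSS.

```python
import math


def chi_square(table):
    """Pearson chi-square and number of cases for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def phi_and_cramers_v(table):
    """Return (phi, Cramer's V) computed from the chi-square."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0])) - 1  # smaller dimension minus one
    return math.sqrt(chi2 / n), math.sqrt(chi2 / (n * k))


# Counts from the Example #2 output. Rows: favour, oppose, dk.
# Columns: none, Lib, Cons, NDP, BQ, oth.
example_2 = [
    [192, 185, 62, 78, 78, 21],
    [221, 202, 199, 30, 29, 18],
    [229, 194, 97, 44, 53, 18],
]

phi, v = phi_and_cramers_v(example_2)
print(f"Phi = {phi:.3f}, Cramer's V = {v:.3f}")
```

To three decimals the results agree with the reported .25674 and .18155. With three rows, V equals phi divided by the square root of 2, which is why the two statistics differ in this table but not in the 2 X 2 table of Example #1.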
- Did you discover a relevant relationship in your crosstab based on the
column-percentages? If so, was it evident in only one row of the table
or in all rows?
- Can you compare the magnitude of a Phi-value from one relationship to the
magnitude of a Cramer's V value for another relationship?
- Would the strength of the relationship be affected if you looked only at
the results for the major parties?
- Would the strength of the relationship be affected if you considered
'don't know' responses to be a middle position? What if those who say
'don't know' are excluded from the analysis?
- When you find a cell that has a substantially different column-percentage
from the other cells in that row, there are usually other rows in the table
that also have a difference. For example, if you find a difference in
the column-percentage for cells A-B-C, then there is probably also a
difference between D-E-F, or G-H-I. This happens because the
column-percentage in any given cell influences the column-percentage of the
other cells in that column.
| DEPENDENT VARIABLE | INDEPENDENT VARIABLE: Category I | INDEPENDENT VARIABLE: Category II | INDEPENDENT VARIABLE: Category III |
|---|---|---|---|
| Category I | A | B | C |
| Category II | D | E | F |
| Category III | G | H | I |
- We can readily compare two values of the same measure. But be
cautious about comparing different measures of association to each
other. E.g., you can compare two Phi values to one another,
but be cautious about comparing a Phi value to a Cramer's V value.
- Find out by declaring scores of 0 and 5 missing on party identification
(pid).
- Find out by making the appropriate recodes or declaring the appropriate
missing values.
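For the last two questions, a rough preview is possible without rerunning Webstats, because declaring a category missing simply removes its column or row from the Example #2 crosstab. The sketch below recomputes Cramer's V on the reduced tables. It is an approximation of the suggested reruns, not a substitute for them, and the names `cramers_v`, `major_parties`, and `without_dk` are hypothetical.

```python
import math


def cramers_v(table):
    """Cramer's V from a table of counts (rows = dependent variable)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


# Counts from the Example #2 output. Rows: favour, oppose, dk.
# Columns: none (0), Lib (1), Cons (2), NDP (3), BQ (4), oth (5).
ssm_by_pid = [
    [192, 185, 62, 78, 78, 21],
    [221, 202, 199, 30, 29, 18],
    [229, 194, 97, 44, 53, 18],
]

v_all = cramers_v(ssm_by_pid)

# Scores 0 and 5 on pid declared missing: keep only columns 1 to 4.
major_parties = [row[1:5] for row in ssm_by_pid]
v_major = cramers_v(major_parties)

# 'dk' declared missing on ssm: keep only the favour and oppose rows.
without_dk = ssm_by_pid[:2]
v_without_dk = cramers_v(without_dk)

print(f"all categories:    {v_all:.3f}")
print(f"major parties:     {v_major:.3f}")
print(f"dk excluded:       {v_without_dk:.3f}")
```

Restricting the table to the four major parties gives a larger Cramer's V than the full table. Note also that merely reordering the rows, for instance placing 'dk' between 'favour' and 'oppose', leaves Cramer's V unchanged, because the chi-square on which it is based ignores the order of the categories.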