This post is adapted from a statistics project I completed as part of the Data Analysis and Statistical Inference course on Coursera in April 2015.
Introduction
The purpose of this report is to explore how Congressional approval rating varies by US state. Nationwide, the vast majority of Americans disapprove of Congress. I want to explore whether this varies by state, and whether any specific groups of states disapprove of Congress more than the rest of the country. This question should interest all Americans, since it is in our interest to have a Congress that meets the needs of the electorate.
Data
The data for this exercise come from the American National Election Studies (ANES). ANES topics cover voting behaviour and elections, together with public opinion and the attitudes of the electorate. In all ANES Time Series studies, an interview is completed just after the election (the Post-election or “Post” interview). In presidential election years, an interview is also completed just before the election (the Pre-election or “Pre” interview). Since 2012 was a presidential election year, every “case” in the data set is a person who was interviewed both before and after the 2012 election.
This data set constitutes a retrospective observational study. The two variables I use are categorical: the US state where the individual resides (and voted), and their Congressional approval rating (“Approve” versus “Disapprove”). Because the survey used random sampling, the results should be generalisable to the population of each US state sampled. One possible source of bias is non-response bias: people who declined to take part may view Congress differently from those who did. Because there was no random assignment, this study cannot reveal a causal relationship (e.g., that living in Maryland makes people hate Congress more).
Exploratory Data Analysis
The data set contains 6300 respondents who have given a Congressional approval rating. Of these respondents, only 1643 (26.1%) actually approve of Congress. This is an alarming statistic by itself.
The number of respondents in each state varies from 3 to 745, with a median of 82. The two states with fewer than 5 respondents are Alaska and Wyoming. These states will be excluded from the inference study below.
Figure 1 below shows the percentage of respondents who approve of Congress in each US state, with bars ordered alphabetically by state abbreviation. Approval by state ranges from 8.7% (MT) to 42.2% (NM). There is clearly a lot of variation around the 26.1% overall average, which motivates a formal test for differences between states.
Figure 1: Percentage of respondents who approve of Congress, by US state.
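As a minimal sketch of how the overall and per-state approval percentages can be tallied, the snippet below assumes each respondent is a hypothetical (state, rating) pair with the rating "Approve" or "Disapprove". The function name `approval_rates` and the sample records are made up for illustration; they are not the ANES variables or data.

```python
from collections import Counter, defaultdict


def approval_rates(responses):
    """Return (overall_rate, {state: rate}) from (state, rating) pairs.

    `responses` is an iterable of (state_abbreviation, rating) tuples, where
    rating is "Approve" or "Disapprove" (a hypothetical input format).
    """
    counts = defaultdict(Counter)  # state -> Counter({"Approve": a, "Disapprove": d})
    for state, rating in responses:
        counts[state][rating] += 1

    total_approve = sum(c["Approve"] for c in counts.values())
    total = sum(sum(c.values()) for c in counts.values())
    overall = total_approve / total

    # Sorted by state abbreviation, matching the bar order in Figure 1.
    per_state = {
        state: c["Approve"] / sum(c.values())
        for state, c in sorted(counts.items())
    }
    return overall, per_state


# Hypothetical responses, for illustration only.
sample = [("MD", "Disapprove"), ("MD", "Disapprove"), ("MD", "Approve"),
          ("NM", "Approve"), ("NM", "Disapprove")]
overall, per_state = approval_rates(sample)
print(f"overall: {overall:.1%}")
for state, rate in per_state.items():
    print(f"{state}: {rate:.1%}")
```

Applied to the totals reported in this post, the overall rate is 1643 / 6300 ≈ 0.261, the 26.1% used throughout.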
Inference
Now it is time to make an inference as to whether there are real differences in approval rating between states. The null hypothesis is that approval of Congress is independent of state: every state shares the same underlying approval rate, and any deviation from the 26.1% overall approval rating is due to chance. My alternative hypothesis is that approval rating does depend on state, so that at least one state's approval rate differs from the overall average.
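In symbols, writing $p_s$ for the underlying proportion of people in state $s$ who approve of Congress and $p$ for the common proportion under the null hypothesis (notation introduced here for illustration), the two hypotheses are:

```latex
\begin{aligned}
H_0 &: p_s = p \quad \text{for every state } s \quad \text{(approval is independent of state)},\\
H_A &: p_s \neq p \quad \text{for at least one state } s.
\end{aligned}
```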
The appropriate test is the chi-square test of independence, which requires the observations to be independent. This requirement is satisfied because:
- the survey used random sampling,
- the number of people sampled is <10% of the population,
- each case in the survey only contributes to one cell in a contingency table of approval rating versus state.
The chi-square test also has a sample size condition, usually stated as at least 5 expected cases in each cell of the table. Here I apply it by requiring every valid state to have at least 5 approval and at least 5 disapproval ratings. Three states have fewer than 5 of both ratings (AK, HI, WY). Eight more have fewer than 5 approval ratings but at least 5 disapproval ratings (DC, ID, ME, MT, ND, NH, SD, VT); in some of these (ID, ME, MT, NH) the number of disapproval ratings is 10 or more. These are all discarded as well, leaving 40 valid states to consider.
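A minimal sketch of the screening rule described above, assuming per-state counts are held in a hypothetical dict that maps each state abbreviation to an (approvals, disapprovals) pair. `valid_states` applies the observed-count rule used in this post; `min_expected_count` reports the smallest expected cell count among the kept states, since that is how the condition is usually phrased. The state codes "AA" to "DD" are placeholders, not real data.

```python
def valid_states(counts, min_count=5):
    """Keep states with at least `min_count` approvals AND disapprovals.

    `counts` maps a state abbreviation to an (approvals, disapprovals) tuple.
    This mirrors the observed-count rule used in the post.
    """
    return {s: (a, d) for s, (a, d) in counts.items()
            if a >= min_count and d >= min_count}


def min_expected_count(counts, p):
    """Smallest expected cell count in the table, given an approval rate p."""
    return min(min(p * (a + d), (1 - p) * (a + d)) for a, d in counts.values())


# Hypothetical counts, for illustration only (not the ANES figures).
counts = {"AA": (2, 1), "BB": (3, 20), "CC": (12, 40), "DD": (30, 70)}
kept = valid_states(counts)
print(sorted(kept))                                      # states passing the screen
print(round(min_expected_count(kept, 1643 / 6300), 2))   # smallest expected count among them
```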
Given the overall approval rating of 26.1%, I calculate the expected numbers of approvals and disapprovals for each state from the number of responses received in that state. I then calculate the chi-square statistic by summing, over both cells of every state, the squared difference between the observed and expected counts divided by the expected count. The resulting chi-square statistic is 99.96. There are 39 degrees of freedom: (40 - 1) × (2 - 1) for 40 valid states and two rating categories.
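The calculation described above can be sketched as follows, assuming the same hypothetical dict of (approvals, disapprovals) counts per state. The function name `chi_square_by_state` is illustrative. The `p` argument lets you mirror the post by passing the overall 26.1% (1643 / 6300); if it is left as `None`, the sketch uses the pooled approval rate of the states in the table, which is the textbook choice for a test of independence.

```python
def chi_square_by_state(counts, p=None):
    """Chi-square test of independence for a states x {approve, disapprove} table.

    counts: dict mapping state -> (approvals, disapprovals).
    p: approval rate used for expected counts. If None, the pooled rate of
       `counts` is used.
    Returns (statistic, degrees_of_freedom, per_state_contributions).
    """
    if p is None:
        total_a = sum(a for a, _ in counts.values())
        total_n = sum(a + d for a, d in counts.values())
        p = total_a / total_n

    contributions = {}
    for state, (a, d) in counts.items():
        n = a + d
        exp_a, exp_d = p * n, (1 - p) * n  # expected approvals / disapprovals
        contributions[state] = (a - exp_a) ** 2 / exp_a + (d - exp_d) ** 2 / exp_d

    statistic = sum(contributions.values())
    df = (len(counts) - 1) * (2 - 1)  # (rows - 1) x (columns - 1)
    return statistic, df, contributions


# Hypothetical two-state table, for illustration only.
stat, df, contrib = chi_square_by_state({"AA": (10, 30), "BB": (30, 30)}, p=0.25)
print(stat, df)
for state, c in sorted(contrib.items(), key=lambda kv: kv[1], reverse=True):
    print(state, round(c, 2))
```

With 40 valid states, the same degrees-of-freedom formula gives (40 - 1) × 1 = 39.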
The appropriate chi-square distribution is visualized in Figure 2.
Figure 2: Chi-square distribution with 39 degrees of freedom.
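The p-value is the upper-tail area of this distribution beyond the observed statistic. In practice one would call a library routine such as SciPy's `scipy.stats.chi2.sf(99.96, 39)`. The sketch below instead uses only the Python standard library, via the closed-form tail sums that exist for integer degrees of freedom; the helper name `chi2_sf` is hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: P = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: P = erfc(sqrt(x/2))
    #           + sqrt(2/pi) * exp(-x/2) * sum_{r=1}^{(df-1)/2} x^(r-1/2) / (1*3*...*(2r-1))
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


p_value = chi2_sf(99.96, 39)  # statistic and degrees of freedom reported in this post
print(f"p-value: {p_value:.2e}")
```

With the post's statistic and degrees of freedom, the resulting p-value is far below conventional significance levels such as 0.05, which is what justifies rejecting the null hypothesis.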
Which states contribute most to the chi-square statistic? The table below lists each state's contribution to the statistic and its approval rate, in order of decreasing contribution.
State   Chi-square   Approval rate
MD           11.83          0.1154
TX            8.89          0.2948
CA            8.27          0.2763
SC            7.51          0.3462
NM            6.37          0.3649
LA            5.07          0.3388
VA            4.58          0.1774
TN            3.66          0.1835
MS            3.21          0.3846
FL            2.97          0.2854
MA            2.80          0.3209
CT            2.36          0.1875
CO            2.35          0.1954
KY            2.30          0.1928
OR            2.20          0.2000
NY            1.90          0.2889
IA            1.85          0.1944
UT            1.70          0.1923
AL            1.69          0.3214
AZ            1.68          0.2143
NV            1.57          0.1935
OH            1.54          0.2469
WA            1.48          0.2805
KS            1.45          0.1714
NC            1.10          0.2797
IL            1.03          0.2672
OK            1.02          0.2273
MN            1.00          0.2273
IN            0.93          0.2627
DE            0.88          0.3333
PA            0.85          0.2580
RI            0.77          0.2857
MI            0.59          0.2568
AR            0.51          0.2143
NE            0.49          0.2750
WV            0.38          0.2162
WI            0.31          0.2481
GA            0.30          0.2616
NJ            0.29          0.2532
MO            0.28          0.2523
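As a quick consistency check, the per-state contributions in the table should add up to the overall statistic of 99.96. The snippet below re-enters the table values, sums them, and shows how much of the statistic comes from the five largest contributors.

```python
# Per-state chi-square contributions, copied from the table above.
contributions = {
    "MD": 11.83, "TX": 8.89, "CA": 8.27, "SC": 7.51, "NM": 6.37, "LA": 5.07, "VA": 4.58, "TN": 3.66,
    "MS": 3.21, "FL": 2.97, "MA": 2.80, "CT": 2.36, "CO": 2.35, "KY": 2.30, "OR": 2.20, "NY": 1.90,
    "IA": 1.85, "UT": 1.70, "AL": 1.69, "AZ": 1.68, "NV": 1.57, "OH": 1.54, "WA": 1.48, "KS": 1.45,
    "NC": 1.10, "IL": 1.03, "OK": 1.02, "MN": 1.00, "IN": 0.93, "DE": 0.88, "PA": 0.85, "RI": 0.77,
    "MI": 0.59, "AR": 0.51, "NE": 0.49, "WV": 0.38, "WI": 0.31, "GA": 0.30, "NJ": 0.29, "MO": 0.28,
}

total = sum(contributions.values())
ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
top5 = sum(c for _, c in ranked[:5])

print(f"states: {len(contributions)}, total chi-square: {total:.2f}")
print(f"top five ({', '.join(s for s, _ in ranked[:5])}) contribute {top5 / total:.1%}")
```

The 40 rounded contributions sum to the reported 99.96, and the five states at the top of the table (MD, TX, CA, SC, NM) account for roughly 43% of it.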
Conclusion
I have found strong evidence of state-by-state variation in Congressional approval rating: a chi-square statistic of 99.96 on 39 degrees of freedom is sufficient to reject the hypothesis that every state simply shares the overall average approval rating. Rather, some states have approval ratings well above this average (e.g., NM) and others well below it (e.g., MD). This raises the question of why Congressional approval in some states is so much higher or lower than the overall average. Obvious variables to consider are markers of economic success, such as statewide employment rates, income, and health care coverage.
Citation
The American National Election Studies (ANES). The ANES 2012 Time Series Study [dataset]. Stanford University and the University of Michigan [producers].
These materials are based on work supported by the National Science Foundation under grants SES-0937727 and SES-0937715, Stanford University, and the University of Michigan.
Any opinions, findings and conclusions or recommendations expressed in these materials are those of the author(s) and do not necessarily reflect the views of the funding organizations.
Link to data: “http://bit.ly/dasi_anes_data”