Link to the assignment's solutions
knitr::opts_chunk$set(cache = TRUE, autodep = TRUE)
- Fork this repository to your GitHub account.
- Write your solutions in R Markdown in a file named
solutions.Rmd
. - When you are ready to submit your assignment, initiate a pull request. Title your pull request "Submission".
To update your fork from the upstream repository:
- On your fork, e.g.
https://github.com/jrnold/Assignment_04
click on "New Pull reqest" - Set your fork
jrnold/Assignment_04
as the base fork on the left, andUW-POLS503/Assignment_04
as the head fork on the right. In both cases the branch will be master. This means, compare any chanes in the head fork that are not in the base fork. You will see differences between theUS-POLS503
repo and your fork. Click on "Create Pull Request", and if there are no issues, "Click Merge" A quick way is to use this link, but change thejrnold
to your own username:https://github.com/jrnold/Assignment_04/compare/master...UW-POLS503:master
.
library("pols503")
library("rio")
library("ggplot2")
library("dplyr")
library("broom")
If you do not have the pols503 package installed, you can install it with,
library("devtools")
install_github("UW-POLS503/r-pols503")
In the social sciences the effect of a variable income inequality
(political mobilization
(
In this assignment we will use replication data for Kastner's (2007) "When Do Conflicting Political Relations Affect International Trade". The replication dataset is from Berry, Golder, and Smith's (2012) "Improving Tests of Theories Positing Interaction".
Use the import()
function of the rio
package to import the STATA file TradConflict.dta
. The file contains 74,415 rows and 43 columns/variables. Each row contains information about a country pair or dyad. The authors built the dataset using information from 76 countries from 1960 to 1992.
db <- import("TradeConflict.dta")
Kastner (2007) argues that "conflicting political interests between countries can have a detrimental effect on their economic relations" but that the "effects of international political conflict on trade are less severe in cases where internationalist economic interests have relatively strong political clout domestically."
Dependent Variable:
-
Trade (
lnrtrade
): The log of the bilateral trade between the two countries$i$ and$j$ in year$t$ (constant 1992 dollars).
Independent Variables of interest:
-
Conflict (
logUNsun
): Whether two countries have similar voting patterns in the UN General Assembly. The log of a 1-3 scale, where 1 =most similar' and 3 =
most dissimilar'. -
Trade Barriers (
avpctBCFE
): Trade Barriers as a proxy for the domestic political power of economic elites with internationalist economic interests. Logged form of the Hiscox and Kastner (2002) index that evaluates trade barriers. Higher numbers mean more closed trade policies, range: {1.61 , 60.53}.
This model is similar to their Model 1 in Table 1 (p. 676):
mod1 <- lm(lnrtrade ~ lnrpciab + avremote + landlocked + island +
landratio + pciratio + jointdem + laglnrtrade +
lnrgdpab + lndist + logUNsun * avpctBCFE,
data = db)
A: Create a new variable avpctBCFEcat3
by splitting the variable avpctBCFE
into 3 categories.
B: Run a new version of mod1
(mod2
) but in this case ignore the interaction effect between the variables logUNsun
and avpctBCFE
, and substitute the variable avpctBCFE
for the new categorical you just created.
C: Plot the predicted values of the model mod2
against the covariate logUNsun
. Draw a linear regression line on it.
D: If you used geom_point()
in the previous plot, you probably saw that there are a lot of data points. Replicate the same plot using stat_binhex()
instead of geom_point()
. You can find the documentation here.
E: Take a look at the plot and at the coefficient for logUNsun
in mod2
. What can you say about the relationship betweeh this covariate and the outcome variable lnrtrade
?
F: Replicate the same plot (logUNsun
v. fitted values of mod2
) but in this case use again geom_point()
and color the dots differently depending on their values for avpctBCFEcat3
. Make sure you also plot 3 different lines describing the relationship between logUNsun
and the predicted values of lnrtrade
for each group of avpctBCFEcat3
. What do you see? How would you interpret this new plot?
G: Run a new model (mod3
) similar to mod2
but in this case interact the variables logUNsun
and avpctBCFEcat3
.
H: Keeping all the control variables at their means, calculate the predicted values for the following scenarios:
# | logUNsun |
avpctBCFEcat3 |
---|---|---|
1 | 0 | low |
2 | 1 | low |
3 | 0 | medium |
4 | 1 | medium |
5 | 0 | high |
6 | 1 | high |
I: Calculate the following:
- `dif1`: Difference between the predicted values of scenarios 2 and 1: (`logUNsun` == 1 & `avpctBCFEcat3` == low) - (`logUNsun` == 0 & `avpctBCFEcat3` == low).
- `dif2`: Difference between the predicted values of scenarios 4 and 3: (`logUNsun` == 1 & `avpctBCFEcat3` == medium) - (`logUNsun` == 0 & `avpctBCFEcat3` == medium).
- `dif3`: Difference between the predicted values of scenarios 6 and 5: (`logUNsun` == 1 & `avpctBCFEcat3` == high) - (`logUNsun` == 0 & `avpctBCFEcat3` == high).
- `dif4`: Difference between the predicted values of scenarios 3 and 1: (`logUNsun` == 0 & `avpctBCFEcat3` == medium) - (`logUNsun` == 0 & `avpctBCFEcat3` == low).
- `dif5`: Difference between the predicted values of scenarios 5 and 1: (`logUNsun` == 0 & `avpctBCFEcat3` == high) - (`logUNsun` == 0 & `avpctBCFEcat3` == low).
- `dif6`: Difference between `dif2` and `dif1`.
- `dif7`: Difference between `dif3` and `dif1`.
J: Explain in your own words what do all these differences represent.
K: Create a dataset with all these differences
L: Create and print a table showing the mod1
coefficients, standard errors, t-statistic and p.value for only the Intercept
and the covariates: logUnsun
, avpctBCFEcat3
, and their interactions.
M: Compare the coefficients to the differences
you previously calculated. Can you now interpret the coefficients?
N: Keeping all the other covariates at their mean, use mod3
to predict (+ 95% confidence interval) the following 300 scenarios. Hint: create a new dataset (scenarios2
) containing the information of all these scenarios and use it for the newdata
argument in the predict()
function.
# | logUNsun |
avpctBCFEcat3 |
---|---|---|
1 | min(logUNsun) |
low |
... | ... | low |
100 | max(logUNsun) |
low |
101 | min(logUNsun) |
medium |
... | ... | medium |
200 | max(logUNsun) |
medium |
201 | min(logUNsun) |
high |
... | ... | high |
300 | max(logUNsun) |
high |
O: Plot the predicted values against the logUNsun
values. You should plot 3 lines, one for each group of avpctBCFEcat3
(low, medium, high). You should also include a 95% confidence interval around each line. Hint: You need to merge first the dataset scenarios2
with the resulting dataset from the predictions.
P: Explain in your own words what the plot is showing.
Q: Keeping all the other covariates at their mean, use now mod1
(where avpctBCFE
is contious and not categorical) to predict (+ 95% confidence interval) the following 110 scenarios. Hint: create a new dataset (scenarios3
) containing the information of all these scenarios and use it for the newdata
argument in the predict()
function.
# | logUNsun |
avpctBCFE |
---|---|---|
1 | min(logUNsun) |
quantile(avpctBCFE, 0.0) |
... | ... | quantile(avpctBCFE, 0.0) |
10 | max(logUNsun) |
quantile(avpctBCFE, 0.0) |
11 | min(logUNsun) |
quantile(avpctBCFE, 0.05) |
... | ... | quantile(avpctBCFE, 0.05) |
20 | max(logUNsun) |
quantile(avpctBCFE, 0.05) |
201 | min(logUNsun) |
quantile(avpctBCFE, 1) |
... | ... | quantile(avpctBCFE, 1) |
210 | max(logUNsun) |
quantile(avpctBCFE, 1) |
R: Plot the predicted values against the logUNsun
values. You should plot a different line for each different value of avpctBCFE
. You don't need to include a 95% confidence interval around these lines. Hint: Although now we are using the continuous instead of the categorical representation of the variable avpctBCFE
, to plot different lines according to different values of avpctBCFE
, you will need to define the variable as a factor()
in the ggplot's aesthetics.