Applied Econometrics Exercise 1:
Education Production and Regression Analysis
1. Import the dataset excersize_edu.dta. This dataset contains data from a REAP project (slightly altered for pedagogical purposes). The dataset includes the following variables:
2. Find the mean, standard deviation, median, min, and max of the variables (hint: use the summarize command; its detail option also reports the median).
3. Make a scatterplot of class_mean_score and classsize.
a) Next, add a regression line to the graph.
b) What do you conclude about the relationship between student scores and class size?
4. Look at the graph. Are there any classes that look like they would strongly influence the relationship you would estimate between class size and student scores?
a) What would happen if you dropped this class from the analysis?
b) Now drop the influential class and make the plot again to confirm your intuition.
5. Now let's use the individual student data. Regress rawscore on classsize.
a) Look at the coefficient on classsize. Is this relationship what you would expect based on the graphs? What does this say (in words)?
b) What is the predicted score when classsize is 10? 15? 20?
c) Does this regression capture a causal relationship between class size and scores? Explain.
d) How much of the variation in scores is explained by class size?
e) Interpret the constant in the regression. What does it mean? What would happen if you estimated the model without a constant? (Go ahead, give it a try!) How does this change your estimated coefficient on class size? What's going on?
6. Now, look back at your original regression (with the constant).
a) Compute the fitted values and residuals for each observation and verify that the residuals (approximately) sum to 0.
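The mechanics behind 6(a) can be sketched in a few lines of Python. This is only an illustration: the numbers below are made up (they are not from the exercise dataset), and ols_simple is a hypothetical helper written for this sketch. It fits a line with a constant, builds fitted values and residuals, and prints the residual sum, which should be zero up to floating-point rounding whenever a constant is included.

```python
# Toy data: NOT the exercise dataset; values are invented for illustration.
classsize = [12, 15, 18, 20, 22, 25, 28, 30]
rawscore = [71.0, 68.5, 70.0, 66.0, 67.5, 63.0, 64.5, 60.0]


def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = ols_simple(classsize, rawscore)

# Fitted value = constant + slope * classsize; residual = actual - fitted.
fitted = [intercept + slope * x for x in classsize]
residuals = [y - f for y, f in zip(rawscore, fitted)]

# With a constant in the model, the residuals sum to zero up to rounding.
print(f"sum of residuals: {sum(residuals):.2e}")

# The same formula gives a predicted score at any class size.
for size in (10, 15, 20):
    print(size, round(intercept + slope * size, 2))
```

In Stata, the analogous step after regress is the predict command (with the xb option for fitted values and the residuals option for residuals), followed by summarize on the residual variable.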
7. Normalize (standardize) the rawscore variable and store the result in a variable called stdscore (hint: use egen with the std() function).
a) Now, compare the distributions of rawscore and stdscore (hint: use hist or kdensity). How do they differ?
b) Now, re-estimate your regression. What does the coefficient on class size mean now?
c) Why might we want to normalize the math scores?
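As a rough illustration of what standardizing does in exercise 7, the sketch below uses made-up numbers (not the exercise data) and a hypothetical helper named slope. It assumes that standardizing means subtracting the mean and dividing by the sample standard deviation (n - 1 denominator); check the egen documentation to confirm that std() does the same. The last lines show that the slope on class size is simply rescaled by 1 / SD of the raw score.

```python
import statistics

# Toy data: NOT the exercise dataset; values are invented for illustration.
classsize = [12, 15, 18, 20, 22, 25, 28, 30]
rawscore = [71.0, 68.5, 70.0, 66.0, 67.5, 63.0, 64.5, 60.0]

mean_score = statistics.mean(rawscore)
sd_score = statistics.stdev(rawscore)  # sample SD (n - 1 denominator), an assumption

# z-score: distance from the mean, measured in standard deviations.
stdscore = [(s - mean_score) / sd_score for s in rawscore]


def slope(x, y):
    """Slope of the least-squares line of y on x (constant included)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


b_raw = slope(classsize, rawscore)  # score points per extra student
b_std = slope(classsize, stdscore)  # standard deviations per extra student

print("mean and SD of stdscore:",
      round(statistics.mean(stdscore), 10),
      round(statistics.stdev(stdscore), 10))
print("b_raw / SD:", b_raw / sd_score, " b_std:", b_std)
```

The shape of the distribution is unchanged by this transformation; only the location and scale of the axis move.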
8. Compute the regression of stdscore on classsize, male, han, age, grade5, momedu, asset, and teacher_age.
a) How did your estimate of the relationship between student scores and class size change? Why do you think this happened?
b) Interpret the coefficients for the other variables. Are the signs of the coefficients what you would expect?
c) Check your regression by computing the multivariate classsize coefficient manually in two steps: (i) regress classsize on all other covariates and save the residuals; (ii) regress stdscore on the residuals. Mathematically, why does this work?
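The two-step check in 8(c) is an application of the Frisch-Waugh-Lovell theorem. The sketch below tries it on made-up numbers with a single control (age) instead of the full covariate list, so it is a simplified stand-in for the exercise, not a solution to it. The ols function is a hypothetical helper that solves the normal equations directly; it is written here only to keep the example self-contained.

```python
def ols(columns, y):
    """OLS of y on a constant plus the given regressor columns.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    and returns [constant, b_1, ..., b_k].
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for p in range(k):
        # Partial pivoting for numerical stability
        best = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[best] = A[best], A[p]
        pivot = A[p][p]
        A[p] = [v / pivot for v in A[p]]
        for r in range(k):
            if r != p:
                factor = A[r][p]
                A[r] = [vr - factor * vp for vr, vp in zip(A[r], A[p])]
    return [A[i][k] for i in range(k)]


# Toy data: NOT the exercise dataset; values are invented for illustration.
classsize = [12, 15, 18, 20, 22, 25, 28, 30]
age = [10, 11, 10, 12, 11, 12, 10, 11]
stdscore = [0.9, 0.4, 0.7, -0.1, 0.2, -0.6, -0.3, -1.2]

# Full regression: stdscore on classsize and age.
b_full = ols([classsize, age], stdscore)  # [const, b_classsize, b_age]

# Step (i): regress classsize on the other covariate(s), keep the residuals.
g = ols([age], classsize)
resid = [c - (g[0] + g[1] * a) for c, a in zip(classsize, age)]

# Step (ii): regress stdscore on those residuals.
b_fwl = ols([resid], stdscore)  # [const, slope on residualized classsize]

print("classsize coefficient, full regression:", b_full[1])
print("classsize coefficient, two-step:       ", b_fwl[1])
```

The two printed coefficients should agree up to rounding. The residuals from step (i) are the part of classsize that is uncorrelated with the controls, which is the only variation the multivariate coefficient uses.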
9. Re-estimate the regression in (8) separately for boys and girls.
a) Does the relationship between class size and scores differ for boys and girls? What would you conclude?
b) How about the other variables? What seems to differ between girl and boy students in educational production? Why do you think this is?
10. Now, estimate the model from (9) in a single regression, but allow the relationship between class size and stdscore to differ for boys and girls. (Hint: create a new variable equal to classsize*male.)
a) Does the relationship between class size and scores differ significantly for boys and girls?
b) What is the predicted score when class size is 10 for girls? 15?
c) What is the predicted score when class size is 10 for boys? 15?
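To see how the interaction term in exercise 10 works, here is a minimal sketch on made-up numbers (not the exercise data). It assumes a stripped-down model with only classsize, male, and classsize*male, so it omits the other controls from (8); with those extra controls in the model, the pooled and split-sample slopes need not match exactly. The ols and predict functions are hypothetical helpers written for this sketch.

```python
def ols(columns, y):
    """OLS of y on a constant plus the given columns: [const, b_1, ..., b_k]."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(A[r][p]))  # partial pivoting
        A[p], A[best] = A[best], A[p]
        pivot = A[p][p]
        A[p] = [v / pivot for v in A[p]]
        for r in range(k):
            if r != p:
                factor = A[r][p]
                A[r] = [vr - factor * vp for vr, vp in zip(A[r], A[p])]
    return [A[i][k] for i in range(k)]


# Toy data: NOT the exercise dataset; values are invented for illustration.
classsize = [12, 15, 18, 20, 22, 25, 28, 30, 14, 26]
male = [0, 1, 0, 1, 0, 1, 0, 1, 1, 0]
stdscore = [0.9, 0.3, 0.6, -0.2, 0.3, -0.7, -0.1, -1.3, 0.5, 0.0]

# New variable: classsize * male (equals classsize for boys, 0 for girls).
interaction = [c * m for c, m in zip(classsize, male)]
const, b_size, b_male, b_inter = ols([classsize, male, interaction], stdscore)


def predict(size, is_male):
    """Predicted stdscore from the pooled model with the interaction term."""
    return const + b_size * size + b_male * is_male + b_inter * size * is_male


# Separate regressions by gender, for comparison with the pooled model.
girls_fit = ols([[c for c, m in zip(classsize, male) if m == 0]],
                [s for s, m in zip(stdscore, male) if m == 0])
boys_fit = ols([[c for c, m in zip(classsize, male) if m == 1]],
               [s for s, m in zip(stdscore, male) if m == 1])

print("slope for girls:", b_size, "(separate regression:", girls_fit[1], ")")
print("slope for boys: ", b_size + b_inter, "(separate regression:", boys_fit[1], ")")
for size in (10, 15):
    print(size, "girls:", round(predict(size, 0), 3), "boys:", round(predict(size, 1), 3))
```

In this stripped-down model the coefficient on classsize is the slope for girls, and the coefficient on the interaction is the difference between the boys' and girls' slopes, which is why a test on the interaction coefficient answers 10(a).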