Prac3_Exercises_with_ANSWERS

Week 3: Multiple Linear Regression

Last week we ended by looking at simple linear regression, a statistical modelling technique used when we want to analyse the relationship between one continuous independent variable and one continuous dependent variable. In this week’s session we’re going to continue to look at linear regression models but focus on how this technique can be expanded to when we have hypotheses involving more than one independent variable impacting on the same dependent variable.

The relevant chapters for this week in the Andy Field Textbook are:

Chapter 6: Bias

Chapter 8: The Linear Model (regression)

If we consider a scenario where we have 2 (or more) hypothesised predictors (AKA independent variables) and we are interested in the effects these variables have on a single specific outcome (AKA the dependent variable) there are a number of different questions we could ask. For example:

Does variability in response in a group of predictor variables help to explain variance in an outcome variable?

What are the relative contributions of specific predictors, within a group of predictors, to explaining variance in an outcome variable?

To what extent does a specific predictor, within a larger group, help us explain variability in an outcome variable, which the others cannot?

We can answer all these questions using Multiple Linear Regression (MLR) models, but we need to look at slightly different parts of an MLR model, or run slightly different types of MLR, to answer each of these questions.

In Exercise 1 we’ll look at how we can use MLR models to investigate questions 1 and 2 above.

In Exercise 2 we’ll build on this to see how we can use a slightly different type of MLR methodology to investigate question 3.

Exercise 1: Multiple Linear Regression (Forced Entry method)

Open up Exercise1_dataset.sav, which you’ll notice is a slightly modified version of the dataset we used in the final exercise last week. To briefly remind you, this dataset included measures of several personality traits (Conscientiousness, Agreeableness and Extraversion) that had been recorded for a sample of patients as they were being discharged from a mental health ward. The dataset also included a measure of these patients’ medication adherence following discharge.

In this updated dataset we’ve added in measures of two further personality traits (i.e. we now have measures of all the ‘Big 5’ personality traits). Imagine we want to test hypotheses about these independent variables’ relationships with medication adherence (the dependent variable).

Looking back at the questions in the intro (page 1), how would you express the first question in that list as an experimental hypothesis that tests the hypothesised relationships between the independent and dependent variables in this dataset?

Similarly, how might you rephrase the second question as a more specific hypothesis relating to this dataset? In particular, imagine you had prior reason to believe Conscientiousness was likely to have the greatest influence of all the personality traits on medication adherence:

All MLR models are made up of ‘blocks’ (or as you might otherwise hear them called: ‘steps’ or ‘nested models’).

The simplest MLR model is made up of a single block, and within this block the influence of all the independent variables on the dependent variable is considered (see illustration below). Entering all your independent variables in a single block like this is known as a ‘Forced Entry’ technique, which refers to the fact that all your hypothesised predictors are ‘forced’ by you into being considered simultaneously in a single block.

[Illustration: Baseline Model (Block 0) contains no predictors; Block 1 contains Conscientiousness, Agreeableness, Extraversion, Neuroticism and Openness.]

Forced entry MLRs are useful when we want to:

Only assess the collective contribution a set of predictors makes to explaining variance in an outcome

(After having established a collective effect) explore what the relative contributions of each predictor in the model are to explaining variance in the outcome.¹

The only difference between this type of model and the Simple Linear Regression you conducted last week is the number of variables you enter in the single block (i.e. last week you only entered 1 personality trait into block 1; this time you’re going to run a model where you enter all 5 traits into block 1). As such, it allows us to ask the same sorts of questions as we did last week about:

Effect size

Overall Model Fit

Parameter Estimates (i.e. regression coefficients)

Effect size and Overall Model Fit help us to evaluate the “collective contribution” referred to earlier, whilst the Parameter Estimates are used in understanding the “relative contributions”.

To run this type of analysis follow these steps:

Select Analyze > Regression > Linear…

Into the Dependent variable box enter the variable we want to predict (i.e. Medication Adherence)

Into the Independent(s) variable box enter all 5 variables we want to use to try and predict the dependent variable (i.e. Conscientiousness, Agreeableness, Neuroticism, Openness and Extraversion)

Select OK
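If you want to see what SPSS is computing when you run these steps, the sketch below fits a forced entry model by ordinary least squares in Python. It is only an illustration: it assumes NumPy is installed, the function name fit_ols is made up for this example, and the data are randomly generated stand-ins rather than the values in Exercise1_dataset.sav, so the numbers will not match your SPSS output.

```python
import numpy as np


def fit_ols(X, y):
    """Fit y = b0 + X @ b by least squares (all predictors in one block)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])      # add the intercept column
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coefs
    ss_res = float(residuals @ residuals)          # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())    # total variation
    r2 = 1.0 - ss_res / ss_tot                     # effect size (R squared)
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1)) # overall model fit
    # Standardised betas: b multiplied by sd(predictor) / sd(outcome)
    betas = coefs[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)
    return {"b": coefs, "beta": betas, "r2": r2, "F": f_stat, "df": (k, n - k - 1)}


# Hypothetical stand-in data: 200 "patients", 5 "traits" (NOT the real dataset)
rng = np.random.default_rng(0)
traits = rng.normal(size=(200, 5))
adherence = 2.0 + 0.8 * traits[:, 0] + 0.2 * traits[:, 1] + rng.normal(size=200)

model = fit_ols(traits, adherence)
print("R squared:", round(model["r2"], 3))
print("F:", round(model["F"], 2), "df:", model["df"])
print("Standardised betas:", np.round(model["beta"], 3))
```

R squared and F correspond to the ‘collective contribution’ of the block, whilst the standardised betas are what you would compare to judge the ‘relative contributions’ of each predictor.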

The first question we’re interested in answering is whether this group of predictors, as a quintet (i.e. all 5 personality traits together), explains a meaningful and statistically significant amount of variation in the outcome (Medication Adherence). Based on last week’s exercises, which aspects of the output tell you about this overall contribution, and what conclusions would you draw from your output? (i.e. what’s the evidence relating to the first hypothesis you proposed, back on page 2?)

Presuming we establish that this block of predictors explains a significant amount of variation in the outcome, our next question is: what are the relative contributions of each predictor in the model?

To examine this:

First, we need to distinguish between those Independent Variables that have statistically significant relationships with the outcome and those that don’t. From the Coefficients table in your output identify which (if any) of the personality traits are associated with medication adherence, answering below:

Second, do the results in your output support the second hypothesis you formulated (i.e. about the relative contribution of conscientiousness compared to other factors)?

To conclude this exercise, write up an APA-formatted summary of this regression model and its findings (use the box at the top of the next page). Hint: formatting is similar to what you did last week in the self-test question, only this time you have more regression coefficients to report.

Looking ahead to the next exercise, a limitation of Forced Entry methods is that whilst we can compare standardised betas we cannot do more than this to “formally test” the relative contribution of different independent variables within these types of regression models.

If we specifically want to test the unique contribution of one predictor to explaining variance in the outcome, after controlling for the effects of all other predictor variables, then a Hierarchical MLR model is required. Usage of this model will be the focus of Exercise 2.

Exercise 2a: Simple to Multiple Linear Regression

In the box below is a description of an observational study, run by a university admissions team.

Task 1: Is Intrinsic Motivation Inventory Score a useful predictor of exam performance?

To investigate whether Intrinsic Motivation is a good predictor of Entrance exam score we will use simple linear regression to explore the dataset: Exercise2a_dataset.sav. To do this we will:

Run the regression model

AND, this time, we’ll also run through some additional steps you should follow to check the assumptions for running a valid regression model are met

Note: I skipped over checking assumptions in the first exercise, but it is important you run these checks whenever using regression models. We’ll introduce a few absolutely key assumptions you should check here, then discuss a few more in Exercise 2b to give you a comprehensive list:

First (before starting any linear regression model), we need to check the assumption that there’s a plausible linear relationship between each independent variable (i.e. the two measures of Intrinsic Motivation) and the outcome (i.e. Exam Performance). Using the skills learned in previous weeks, is it safe to say we satisfy this assumption of “linearity”?

Next, we set up the regression model. Normally, as we do this, we request a number of further checks of our assumptions. However, we’ll leave this until we get onto Exercise 2b. For now just request the following (which amounts to doing a simple linear regression, like in Exercise 1):

Select Analyze > Regression > Linear…

Into the Dependent variable box enter the variable we want to predict (i.e. Entrance Exam Performance [Entr_Exam])

Into the Independent(s) variable box enter the variable we want to use to try and predict the dependent variable (i.e. Intrinsic Motivation Inventory score [IM])

Select OK

Based on the output from this model can you answer the following question?

Looking at the ANOVA table, is our model a good fit for the data? What value(s) specifically suggest this?

Task 2 - Is Intrinsic Motivation Inventory Score a useful predictor of exam performance, after controlling for variability in exam score that is explained by Hours of Independent Study?

Now we’re going to run a hierarchical multiple linear regression that builds on top of the simple linear regression model we just created for Task 1.

Hierarchical MLRs are different from Forced Entry models in that they consist of multiple nested blocks. This means the researcher must choose the order in which variables are entered into the model. This requires the researcher to be hypothesis driven about the order of entry (usually deciding the order based on prior research or theories). This has the advantage of allowing for comparisons to be made between blocks, enabling the unique contribution of specific predictors added later in a series of steps/blocks to be quantified and tested.

At the same time, it is still possible to ask questions about the overall fit of the model, as we did with Forced Entry (see illustration on next page). This illustration depicts a hierarchical model developed to predict weight from height, gender and age. The blue text boxes highlight the different types of questions that can be asked using this approach, by comparing between different blocks.

[Illustration: a hierarchical model predicting weight. Baseline Model (Block 0): no predictors. Block 1: Height. Block 2: Height + Gender. Block 3: Height, Gender + Age.]

The term ‘nested’ is often used to describe the blocks/steps in a hierarchical regression model because each block builds on the previous one (i.e. it is ‘nested within it’). For example, the Independent Variables from block 1 are carried over to block 2, where some more are added in. This allows investigation of the impact of incrementally adding ‘one more thing’ at each stage of the hierarchical regression model.

Getting back to the example in this exercise, in the first Block of our hierarchical model we want to enter Independent Study Hours, to control for its influence on Exam Performance. Then we want to add in Intrinsic Motivation at the second step, to see if it is still a significant predictor of performance after controlling for variation already explained by Independent Study Hours. To run this analysis:

Select Analyze > Regression > Linear…

Into the Dependent variable box enter the variable we want to predict (i.e. Entrance Exam Performance)

Into the Independent(s) variable box enter the variable we want to control for (i.e. Independent Study Hours)

Then click on the Next button to move to the next ‘Block’ of our hierarchical model. Into this next step add the Independent(s) variable that we want to evaluate as a predictor, after controlling for the variable(s) already entered in Step 1 (note SPSS terms these ‘steps’ as Blocks). So that’ll be: Intrinsic Motivation Inventory score [IM]

Next, click on the Statistics… option and make sure the option for R squared change is selected. This will ensure we get some extra values in our output that test whether the increase in explanatory power of the model from one block/step to the next is statistically significant. In other words: do the Independent Variable(s) added in later blocks explain further ‘unique’ variance, over and above the proportion of variability in the outcome already explained by the independent variables added at earlier steps of the nested models?

To finish, click Continue and then click OK
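The block-by-block logic of these steps can be sketched in Python as follows. Treat this as a rough illustration rather than a reproduction of the SPSS procedure: it assumes NumPy is installed, the helper r_squared and the variable names are invented for this example, and the data are simulated rather than read from Exercise2a_dataset.sav.

```python
import numpy as np


def r_squared(X, y):
    """R squared for a least-squares model of y on the columns of X (plus intercept)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coefs
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())


# Simulated stand-in data (hypothetical, NOT the real exercise dataset)
rng = np.random.default_rng(1)
n = 150
study_hours = rng.normal(size=n)
motivation = 0.5 * study_hours + rng.normal(size=n)   # correlated with study hours
exam = 1.0 * study_hours + 0.5 * motivation + rng.normal(size=n)

# Block 1: the control variable only
r2_block1 = r_squared(study_hours.reshape(-1, 1), exam)
# Block 2: the control variable is carried over and the new predictor is added
r2_block2 = r_squared(np.column_stack([study_hours, motivation]), exam)
# R squared change: extra variance explained by the predictor added in Block 2
delta_r2 = r2_block2 - r2_block1

print("Block 1 R squared:", round(r2_block1, 3))
print("Block 2 R squared:", round(r2_block2, 3))
print("R squared change:", round(delta_r2, 3))
```

Because Block 2 contains everything in Block 1 plus one more predictor, its R squared can never be lower than Block 1’s; the question SPSS tests with ‘R squared change’ is whether the increase is larger than we would expect by chance.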

Based on the output answer the following questions:

Looking at the ANOVA table, do both the Step 1 and Step 2 regression models explain a significant amount of variability in the outcome? What values indicate this?

Looking at the Model Summary, how much variability does the model containing only Independent Study explain (Hint: it’ll be an R Square value you’re interested in)?

Looking at the Model Summary, how much variability does the model containing Independent Study and Intrinsic Motivation explain (Hint: it’ll be an R Square value you’re interested in)?

Looking at the Model Summary, how much more variability does the addition of Intrinsic Motivation explain, and is this increase statistically significant (Hint: you’ll be interested in ‘Change Statistics’ values for Model 2)?

If you’re struggling to understand the difference between what the F value in the ANOVA table represents and what the F Change in the Model Summary represents, consider this illustration below:
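One way to make the distinction concrete is to compute both values from R squared directly. The sketch below uses the standard formulas for the overall F (a model compared against the baseline with no predictors) and F Change (a model compared against the previous block). The function names and the example numbers are hypothetical and chosen only for illustration.

```python
def f_change(r2_prev, r2_new, n, k_prev, k_new):
    """F for the increase in R squared when predictors are added to a model.

    r2_prev, k_prev: R squared and number of predictors in the earlier block
    r2_new, k_new:   R squared and number of predictors in the later block
    n:               sample size
    """
    df1 = k_new - k_prev      # number of predictors added
    df2 = n - k_new - 1       # residual degrees of freedom of the larger model
    f = ((r2_new - r2_prev) / df1) / ((1.0 - r2_new) / df2)
    return f, df1, df2


def f_overall(r2, n, k):
    """Overall F: the same comparison, but against the baseline model (R squared = 0)."""
    return f_change(0.0, r2, n, 0, k)


# Hypothetical values: Block 1 (1 predictor) R squared = .10,
# Block 2 (2 predictors) R squared = .25, sample size 100
overall = f_overall(0.25, 100, 2)
change = f_change(0.10, 0.25, 100, 1, 2)
print("Overall F for Block 2:", round(overall[0], 2), "df =", overall[1:])  # 16.17, (2, 97)
print("F Change for Block 2:", round(change[0], 2), "df =", change[1:])     # 19.4, (1, 97)
```

So the F in the ANOVA table asks ‘is this block better than no predictors at all?’, whereas F Change asks ‘is this block better than the block before it?’. For the first block the two comparisons are the same.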

Looking at the Coefficients table, which of the two Independent variables in Step 2 is having the greater influence in this model on exam performance (Hint: use standardised betas to make comparisons)? Also, what surprising finding is there regarding one of these regression coefficients?

Because of the tiered (AKA ‘hierarchical’) nature of entry in a Hierarchical Regression Model we would typically describe and interpret the regression coefficient for Intrinsic Motivation, entered at step 2, as ‘the relationship between Exam Score and Intrinsic Motivation after controlling for² Hours of Independent Study’. In what way does this allow us to say more about the contribution specific independent variables are making within our regression model, compared to if we’d used only a Forced Entry approach?

Hint: recall the “limitation” discussed at the bottom of page 4

Exercise 2b: Multiple Linear Regression

Below is a description of the data from an observational study, which is suitable to be analysed using a slightly more complicated hierarchical multiple regression design.

In this exercise we’re going to practise: (i) the proper formatting for presenting results from hierarchical regression models and (ii) checking the full set of assumptions we would normally check when running a regression model:

Task 1: run a comprehensive regression model, which includes checking all assumptions

For this exercise use the following dataset (i.e. Exercise2b_dataset.sav)

To carry out the analysis, enter the variables in two steps (corresponding to two sets) using the Hierarchical linear regression technique we used in Exercise 2a.

In Block 1 enter: loc, positive, negative

Then click Next and in the next Block enter: compul, control, impair, delegate, selfw

Specify disaff as the Dependent variable

We’re also going to request a full diagnostic check of all our assumptions³. Specifically:

In the linear regression window in Statistics make sure the following options are selected:

In Regression Coefficients: Estimates and Confidence Intervals

In Residuals: Durbin-Watson and Casewise diagnostics, with Outliers outside 2 standard deviations specified.

Model Fit, R Squared change, Descriptives, and Collinearity diagnostics.

In Plots request plots for:

*ZRESID (y-axis) against *ZPRED (x-axis) to check assumptions of independent errors, homoscedasticity and linearity.

Also tick Histogram and Normal probability plot under Standardized Residual Plots

In the Save menu ask SPSS to create new variables for each case in your dataset:

Standardized residuals

Mahalanobis, Cook’s and Leverage Value distances

DFBeta(s) and Covariance ratio Influence Statistics

To run the model, come out of the side menus and select OK.

Task 2: interpret your results

Looking at your output, I want you first of all to take a look at and familiarise yourself with:

Descriptive Stats: useful for reporting the data and for getting a feel for the distributions.

Correlation Matrix: Examine the (zero order) r values for marital disaffection against the predictor variables. You’ll see comparatively strong associations with control and impair (r > .300), whilst all others apart from locus of control seem to be statistically significant but relatively small contributors in terms of their effect sizes.

The Hierarchical Regression Analysis: The model summary shows a statistically significant relationship with marital disaffection being indicated in Model 1 (although R² suggests this has a small effect size). There’s then a significant increase in model fit with the addition of the second block of variables for Model 2 (see the jump in the R² value from model 1 to model 2). The coefficients output shows the regression weights (AKA the Standardized Regression Coefficients) for each variable within each given step/block of the model. This shows, for block 1, the only significant contributor is negative affect, whilst in block 2 (our main focus in this study), both control and impair seem to be significant predictors (negative affect is no longer so). Also note, we now have a confidence interval for each of these parameter estimates because we asked for them in step 1a when setting up this regression model.

This analysis could be written up as demonstrated below:

“To examine a unique contribution that workaholism may make to explaining marital disaffection, a hierarchical multiple regression analysis was performed. Variables that explain marital disaffection were entered in two steps. In step 1, marital disaffection was the dependent variable and locus of control (LOC), positive affect, and negative affect were the independent variables. In step 2, the subscale scores from the WART (Workaholism questionnaire) were added to the regression model.

The results of step 1 indicated that the variance accounted for (R²) by the first three independent variables (LOC, positive and negative affect) equalled .04 (adjusted R² = .03), which was statistically significant (F(3, 290) = 3.47, p = .017). Negative affect was the only statistically significant independent variable, b_Negative = 2.27 [0.44, 4.11], p = .015. In step 2, the five subscales of the WART were entered into the regression equation. The change in variance accounted for (ΔR²) was equal to .16, which was significantly different from zero (ΔF(5, 285) = 11.56, p < .001). At this step, only two of the subscales of workaholism contributed significantly to the explanation of marital disaffection though, namely, control (b_Control = 3.93 [1.47, 6.40], p = .002) and impaired communication (b_Impair = 5.38 [2.79, 7.97], p < .001). Standardised beta values for these coefficients were similar in size (b*_Control = .233, b*_Impair = .269)”

Notes: Instead of writing ‘change’ to indicate when an R² or F is referring to a change in model fit it is technically more correct to use the “Δ” symbol.

Here I’ve again reported b’s and standardised betas (b*), just to give you both. Notice that for the b’s I’ve also included the confidence interval in brackets, which is again optional but considered good practice.

Can you see where in the output values have come from and why they are being used at certain points in the summary above to support the specific interpretive statements? If anything appears inconsistent/odd/unclear to you in this summary ask questions in the practical!

On the next pages is a complete (APA formatted) multiple regression summary table. Note how within the F, ∆F and β columns a legend is used to indicate values where the corresponding p-value is statistically significant (e.g. * p<.05; ** p<.01; *** p<.001). Also, see that values for ∆R² and ∆F for Model 1 are omitted as these are identical to the overall model fit for this block. In other words, the overall fit of the first block compared to the baseline model is one and the same as the ‘change’ in fit between block 1 and the baseline model, so you don’t write it twice!

¹ After all, some predictors entered within the same block may be particularly strongly related to the outcome, whilst others might not be related at all!
² i.e. holding constant
³ Except linearity because we showed you how to check this in Exercise 2a (i.e. with scatterplots)

Using the format demonstrated in this ‘example’ table, enter the missing values into the APA formatted table on page 14 to report your hierarchical regression results for this Exercise.

Table 1. Hierarchical Linear Regression reporting predictors of Exam Score

	Model 1		Model 2
Variable	B [SE]	β	B [SE]	β

Constant	134.14 [7.54]		-26.61 [17.35]
A Level points	0.10 [0.01]	.58*	0.09 [0.01]	.51**
Hours Studied			3.37 [0.28]	.51*
Interest in course			11.09 [2.44]	.19***

R²	.34		.67
F	99.59***		129.50***
∆R²			.33
∆F			96.45**

Note: N = 242. SE = Standard Error

* p<.05; ** p<.01; *** p<.001

Additional notes on reporting Regression results in a table:

Instead of b and b*, the regression coefficient and its standardised equivalent are represented as B and β in tables. I’ve no idea why this change in notation is necessary but it’s how it’s formatted on page 219 of the APA manual (7th Edition). To be honest, I’d accept either notation.

In addition to the formatting above it is now also common to report the confidence intervals around the parameter estimates (i.e. the B’s) as they’re a more informative ‘translation’ of what the SE value is trying to represent. I’ve omitted these above but this is something Andy Field recommends reporting as well (see Chapter 9, Section 9.14) and you have these in your output already, in the Coefficients table.

If you’re being particularly thorough you can calculate bias corrected confidence intervals using bootstrapping; how to do this is explained in Chapter 9, Section 9.11.5.

Table 1. Hierarchical Linear Regression reporting predictors of Marital Disaffection

	Step 1		Step 2
Variable	B [SE]	β	B [SE]	β

Constant	42.21 [4.97]		26.84 [5.30]
Locus of Control	0.12 [0.17]	.04	0.07 [0.16]	.02
Positive Affect	-1.36 [1.11]	-.07	-0.78 [1.03]	-.04
Negative Affect	2.27 [0.93]	.15*	1.37 [0.88]	.09
Compulsivity			-1.52 [1.14]	-.09
Control			3.93 [1.25]	.23
Impairment			5.38 [1.31]	.27***
Delegation			1.40 [0.77]	.10
Self Worth			-0.76 [0.91]	-.05

R²	.04		.20
F	3.47*		8.77*
∆R²			.16
∆F			11.58*

Note: N = 294. SE = Standard Error

* p < .05; ** p < .01; *** p < .001

Task 3: Checking your regression model’s assumptions

I would like you to spend a little time checking the assumptions underpinning your model. If any of these assumptions haven’t been met then you may need to consider whether it is appropriate to generalise this model’s findings beyond the specific sample it describes. In other words, significant results in your sample are much less likely to be generalisable to the wider population if your model violates certain assumptions, which is why it is important when you’re running a regression ‘for real’ that you check all the things we’re about to discuss!

Andy Field spends a considerable amount of time in his Regression chapter (Chapter 9) explaining the various different ways you can check the necessary assumptions within regression models, and I’ve referenced the relevant sections next to each heading on the following three pages…

Errors are Independent (Chapter 9, section 9.4.1)

Inspect the Durbin-Watson value in the table labelled Model Summary

Check that the value is not less than 1 or greater than 3. Ideally it should be close to 2.

Do your results suggest you should have any concerns?
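For reference, the Durbin-Watson statistic is simple enough to compute by hand from the residuals in case order. The sketch below is a minimal illustration; the function names are made up for this example, and the residual lists are toy values chosen to show the two extremes rather than anything from the exercise dataset.

```python
def durbin_watson(residuals):
    """Sum of squared differences between successive residuals,
    divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def durbin_watson_concern(dw):
    """Apply the rule of thumb used above: worry if the value is below 1 or above 3."""
    return dw < 1 or dw > 3


# Toy residuals that flip sign on every case (strong negative autocorrelation)
print(durbin_watson([1, -1, 1, -1]))   # 3.0
# Toy residuals that never change (strong positive autocorrelation)
print(durbin_watson([1, 1, 1, 1]))     # 0.0
print(durbin_watson_concern(2.1))      # False
```

Values near 2 arise when neighbouring residuals are unrelated, which is why 2 is the ideal.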

Multicollinearity (Chapter 9, sections 9.4.1, 9.9.3 and 9.11.5)

Inspect the VIF values from the table labelled Coefficients

Check that none are greater than ten and that the average of these VIF values is not substantially greater than 1.

Do your results suggest you should have any concerns?
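For the curious, the VIF for each predictor can be reproduced outside SPSS as the corresponding diagonal element of the inverse of the predictors’ correlation matrix. The sketch below uses only the Python standard library; the helper names and the example scores are hypothetical, and it is meant as an illustration of the idea rather than a replacement for the Coefficients table.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(predictors):
    """predictors: dict of name -> list of scores. Returns dict of name -> VIF."""
    names = list(predictors)
    corr = [[pearson(predictors[a], predictors[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Hypothetical scores for three predictors (not the real dataset)
example = {
    "control": [2, 4, 3, 5, 6, 7, 5, 8],
    "impairment": [1, 3, 2, 5, 4, 6, 6, 7],
    "delegation": [5, 3, 6, 2, 7, 4, 5, 3],
}
result = vifs(example)
mean_vif = sum(result.values()) / len(result)
print(result, f"mean VIF = {mean_vif:.2f}")
print("any VIF above 10:", any(v > 10 for v in result.values()))
```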

Bias in the model: model assumptions (Chapter 9, sections 9.4.1, 9.10.3, 9.11.7)

Inspect the plots produced for this regression model:

The *ZRESID against *ZPRED graph looks problematic. In this case the residual errors this graph represents seem to be more spread out towards the right and upper areas of the graph. This potentially indicates issues with heteroscedasticity (i.e. a violation of the assumption that the variance of the residual errors is constant across the range of predicted values in the model).

The histogram of the residuals is also slightly skewed (although I’ve seen worse!) and the P-P plot shows an S-shaped curve that bends away from the ‘normal’ line. So I’d be concerned that there are non-normally distributed errors here too.

How we might address these violations is something we’ll come back to in Exercise 3…

Bias in the model: casewise diagnostics of residuals (Chapter 9, sections 9.3, 9.10.4, 9.11.6)

Here we are mostly examining the numerous newly saved variables that will have been added to your dataset following this analysis:

Firstly, in the output, check the Casewise Diagnostics table (which flags up potentially outlying participants in terms of their residuals). Ensure no more than 5% of the overall sample appears in this table and no more than 1% of the sample has a Std. Residual with an absolute value > 3.

Do your results suggest you should have any concerns?
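The two percentage checks above can be sketched in a few lines of Python if you export the saved standardised residuals. Note the assumption: I’ve taken the Casewise Diagnostics table to list cases with standardised residuals beyond ±2 (the `flag_at` argument), which depends on how the table was requested in SPSS, so adjust it to match your own settings. The function name is hypothetical.

```python
def residual_outlier_summary(std_residuals, flag_at=2.0, extreme_at=3.0):
    """Percentage of cases beyond +/-flag_at and +/-extreme_at, plus an overall verdict.

    flag_at=2.0 is an ASSUMED cut-off for appearing in the Casewise Diagnostics table.
    """
    n = len(std_residuals)
    flagged = sum(1 for z in std_residuals if abs(z) > flag_at)
    extreme = sum(1 for z in std_residuals if abs(z) > extreme_at)
    return {
        "pct_flagged": 100 * flagged / n,
        "pct_extreme": 100 * extreme / n,
        # Rule of thumb from this handout: at most 5% flagged, at most 1% beyond 3
        "concern": flagged / n > 0.05 or extreme / n > 0.01,
    }


# Hypothetical standardised residuals for 20 cases (made up for illustration)
example = [0.1, -0.4, 1.2, -2.3, 0.7, 0.0, -1.1, 0.5, 3.2, -0.8,
           0.3, -0.2, 1.6, -0.9, 0.4, 0.2, -1.4, 0.6, -0.5, 0.9]
print(residual_outlier_summary(example))
```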

In the data view find the column for:

Cook’s distance (hint: full names of these new columns are labelled clearly in Variable View) and check none of these values are > 1.

Leverage: calculate the average leverage (hint: use Analyze > Descriptive Statistics > Descriptives to work out the mean of this variable) and check for values 2 or 3 times larger than this average.

Mahalanobis distances and, given our sample is quite large, check there are no participants for whom this value is greater than 25.

Standardised DfBeta (there’s one of these for each predictor and the intercept) and check that within all of these there are no values > 1.

Covariance Ratio values and check none exceed the upper or lower limit of an acceptable threshold which is based on a multiplier of the number of independent variables offset by the number of observations (Chapter 9, Section 9.3.2). Specifically, the calculation is: 1 ± [3(k+1)/n], where k is the number of independent variables in your final model and n is the number of observations in the sample

Do your results suggest you should have any concerns?
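To make the arithmetic behind a couple of these checks concrete, here is a small Python sketch. It works out the covariance ratio limits from 1 ± [3(k+1)/n] and flags cases whose leverage is a given multiple of the average. The function names are hypothetical, n = 294 comes from the table note above, and k = 6 is only a placeholder: substitute the number of predictors in your own final model.

```python
def covariance_ratio_limits(k, n):
    """Lower and upper limits: 1 -/+ 3(k + 1)/n."""
    margin = 3 * (k + 1) / n
    return 1 - margin, 1 + margin


def high_leverage_cases(leverage, multiple=2):
    """Row positions (0-based) whose leverage exceeds `multiple` times the mean leverage."""
    mean_leverage = sum(leverage) / len(leverage)
    return [i for i, h in enumerate(leverage) if h > multiple * mean_leverage]


def outside_limits(values, lower, upper):
    """Row positions (0-based) whose value falls below `lower` or above `upper`."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# n = 294 is from the table note; k = 6 is a PLACEHOLDER for your own model
lower, upper = covariance_ratio_limits(k=6, n=294)
print(f"Covariance ratio should lie between {lower:.3f} and {upper:.3f}")

# Hypothetical saved values for five cases (made up for illustration)
example_cvr = [1.01, 0.98, 1.09, 1.00, 0.91]
example_leverage = [0.01, 0.02, 0.01, 0.09, 0.02]
print("Covariance ratio outside limits:", outside_limits(example_cvr, lower, upper))
print("Leverage more than twice the mean:", high_leverage_cases(example_leverage))
```

The same `outside_limits` helper could be pointed at the Cook’s distance, Mahalanobis or standardised DfBeta columns with the cut-offs listed above.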

Finally, having checked all these assumptions, what would be your next steps with this model? Would you be keen to publish the results of your model, reporting your findings regarding certain aspects of Workaholism’s relationship with Marital Disaffection?

Exercise 3: Tackling Sources of Bias in a Regression Model

There are several different ways of trying to tackle violated assumptions in Regression Models. These include:

In some instances extreme cases (i.e. ‘outliers’) will be driving your violated assumptions, and we discussed how to manage these in different ways in Practical 1. In regression modelling it is still important to check for implausible outliers (e.g. using boxplots), but it is also important to bear in mind that if you spot plausible outliers in these plots, whose casewise diagnostics (Exercise 2b) subsequently reveal they have little impact, then you should definitely not be excluding them automatically from further analysis. Doing so is not justifiable as the diagnostics suggest their inclusion is having little to no impact on the overall fit of the model (see Chapter 9, Section 9.3.3).

A ‘classic’ approach to trying to improve model fit when issues of heteroscedasticity or non-normally distributed residuals arise is to look for non-normally distributed variables in your model and see whether mathematically transforming these variables will improve the normality of their distributions (see Chapter 6, Section 6.12.4). We’ll use the outcome variable from Exercise 2b to demonstrate how to do this:

First, Explore the distribution of the Marital Disaffection (disaff) outcome variable and note how skewed it appears on a histogram

This variable is what we’d call ‘positively’ skewed because there is a long ‘tail’ of values at the upper end of this measurement scale, whilst the majority of values are clustered towards the lower end. If the tail ran in the opposite direction we would term this data ‘negatively’ skewed.

We can reduce skew in positively skewed variables by applying a mathematical transformation to every observed value. For example, we could calculate the:

Square root…

log (natural or to the base 10)…

or reciprocal (i.e. 1/x) … of every observed value in a skewed variable

Experiment with applying each of these transformations to disaff using the Compute Variable function. The steps you need to follow in each case will be:

Transform > Compute Variable

Give your Target Variable an informative name that indicates the type of transformation you’re applying (e.g. SqrRt_disaff)

Depending on which of the four transforms you are applying, enter the corresponding syntax in the Numeric Expression box. If creating a…

… Square Root transform use: SQRT(disaff)

… Natural Log transform use: LN(disaff)

… Log₁₀ transform use: LG10(disaff)

… Reciprocal transform use: 1/(disaff)

Click OK

Use the Explore function again to evaluate the distribution of the outcome variable with these different types of transforms applied. Do any of the above appear to have been particularly effective at reducing the skew originally observed?
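If it helps to see the four transforms side by side, the Python sketch below applies each one to a small, made-up positively skewed variable and compares the skewness before and after. The scores are hypothetical (not the real disaff data), the target names other than SqrRt_disaff are my own invention, and the skewness here is the simple moment-based version, so it will differ slightly from the adjusted figure SPSS reports in Explore. Note that the log and reciprocal transforms only work when every value is greater than zero, and that the reciprocal reverses the order of the scores.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Python equivalents of the SPSS expressions SQRT(), LN(), LG10() and 1/()
TRANSFORMS = {
    "SqrRt_disaff": math.sqrt,
    "Ln_disaff": math.log,
    "Lg10_disaff": math.log10,
    "Recip_disaff": lambda x: 1 / x,
}


def compare_transforms(values):
    """Skewness of the raw variable and of each transformed version."""
    result = {"disaff": skewness(values)}
    for name, transform in TRANSFORMS.items():
        result[name] = skewness([transform(v) for v in values])
    return result


# Hypothetical positively skewed scores (all greater than zero)
disaff = [1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 9, 12, 16, 25]
for name, value in compare_transforms(disaff).items():
    print(f"{name:>14}: skewness = {value:.2f}")
```

A skewness close to zero suggests a more symmetrical distribution; you would still want to look at the histogram before settling on a transform.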

Important Note: Having applied a transform that gives the outcome variable a more normal distribution you’d have to re-run your regression model, substituting in this transformed version for the original outcome variable. You’d then be looking to see if this normalised the casewise diagnostics and residual graphs that we were concerned about after 2b’s analysis.

In other words, it is not a ‘sure thing’ that transforming your data will fix problems of non-normality or heteroscedasticity. This is because, technically, the problem lies with the lack of normality in the residual errors in your model, not a lack of normality in the actual variables themselves. Only sometimes will transforming your variables, so that their distributions are more normal, in turn have a beneficial effect on the distribution of the residuals in your regression model.

One further option, which cannot be done easily in SPSS without installing an additional software package called R, which SPSS then needs to use as a plugin (see Chapter 4, Section 4.13.2), is to use ‘Robust’ Regression.

Robust Regression is an alternative to traditional “assumption dependent” linear regression modelling techniques (see Chapter 6, Section 6.12.3). As its name suggests, its advantage is that the results it reports are more ‘robust’ to violations of assumptions and outliers (i.e. it is a method that can allow you to ignore violated assumptions, if you find you’re having problems with them).

We will discuss some of the differences between how robust and normal parametric tests work in a little more detail in the Research Methods presentation, but for now I’ll simply demonstrate what a robust regression output in SPSS looks like for Exercise 2b, and talk you through interpreting its output:

On this page are two screen shots. The uppermost one shows the regression coefficients and corresponding t-statistics for the original model calculated in Exercise 2b, the one below it and to the left shows these same statistics calculated using Robust methods. You can see there’s some variation in the size of the coefficients and the t-values (i.e. the test statistic that, if above a certain value, we’d judge to be statistically significant).

Important Notes:

Installing R alongside SPSS and its plug-in is not something I expect you to do during your Masters, and it is becoming increasingly problematic (e.g. the steps described in the Field textbook published two years ago are already out of date). If you want to try this analysis for yourself then steps for installing the software are described here and Andy Field’s textbook describes how to run Robust Regression in Chapter 9, Section 9.12. Based on several frustrating hours this week I have a feeling the current versions of these two programs don’t speak to each other and a workaround equivalent to the one described in Field has yet to be developed. My preferred strategy would be to program Robust Regression models directly in R. If time allows, we will give a brief introduction to using R in week 10 but this will be pitched at a much more introductory level (i.e. simply getting used to using this alternative data management and statistical modelling software).

The robust regression output in SPSS, somewhat unhelpfully, doesn’t tell us the exact significance values for each regression coefficient’s t-value BUT we can work around this quite easily. In large samples (n > 100), if the absolute value of the t-statistic is 1.984 or larger we have significance at the .05 level, and if it is larger than 2.626 we have significance at the .01 threshold (see Andy Field textbook Appendix A.2 p1000 for corroboration)¹.

Don’t believe me? You can confirm this statement for yourself by looking at the original regression output (i.e. where you have t and significance reported). You’ll see that these rules match the significance threshold that the exact p-values would be ‘rounded up’ to for the various t-values.
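If you would rather check these cut-offs numerically than by reading a table, the Python sketch below approximates the two-tailed p-value for a given t-statistic by integrating the t-distribution’s density with Simpson’s rule. It uses only the standard library and the function names are hypothetical. The example uses 100 degrees of freedom because that is the row of the table the 1.984 and 2.626 cut-offs correspond to; for your own model the residual degrees of freedom would be n − k − 1.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_const - (df + 1) / 2 * math.log(1 + x * x / df))


def two_tailed_p(t_value, df, steps=2000):
    """Approximate two-tailed p-value: 1 minus twice the area between 0 and |t|.

    The area is computed with Simpson's rule; `steps` must be even.
    """
    upper = abs(t_value)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)


# The two cut-offs quoted above, evaluated at 100 degrees of freedom
for t_value in (1.984, 2.626):
    print(f"t = {t_value}: two-tailed p is approximately {two_tailed_p(t_value, 100):.4f}")
```

With more than 100 degrees of freedom the same t-values give slightly smaller p-values, so using these cut-offs in a larger sample errs on the cautious side.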