
An Introduction to R

This is an introduction to R (“GNU S”), a language and environment for statistical computing and graphics. R is similar to the award-winning1 S system, which was developed at Bell Laboratories by John Chambers et al. It provides a wide variety of statistical and graphical techniques (linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, ...).

This manual provides information on data types, programming elements, statistical modelling and graphics.

This manual is for R, version 4.4.1 (2024-06-14).

Copyright © 1990 W. N. Venables

Copyright © 1992 W. N. Venables & D. M. Smith

Copyright © 1997 R. Gentleman & R. Ihaka

Copyright © 1997, 1998 M. Maechler

Copyright © 1999–2024 R Core Team

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Core Team.



Preface

This introduction to R is derived from an original set of notes describing the S and S-PLUS environments written in 1990–2 by Bill Venables and David M. Smith when at the University of Adelaide. We have made a number of small changes to reflect differences between the R and S programs, and expanded some of the material.

We would like to extend warm thanks to Bill Venables (and David Smith) for granting permission to distribute this modified version of the notes in this way, and for being a supporter of R from way back.

Comments and corrections are always welcome. Please address email correspondence to R-help@R-project.org.

Suggestions to the reader

Most R novices will start with the introductory session in Appendix A. This should give some familiarity with the style of R sessions and more importantly some instant feedback on what actually happens.

Many users will come to R mainly for its graphical facilities. See Graphical procedures, which can be read at almost any time and need not wait until all the preceding sections have been digested.


1 Introduction and preliminaries


1.1 The R environment

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. Among other things it has

  • an effective data handling and storage facility,
  • a suite of operators for calculations on arrays, in particular matrices,
  • a large, coherent, integrated collection of intermediate tools for data analysis,
  • graphical facilities for data analysis and display either directly at the computer or on hardcopy, and
  • a well developed, simple and effective programming language (called ‘S’) which includes conditionals, loops, user defined recursive functions and input and output facilities. (Indeed most of the system supplied functions are themselves written in the S language.)

The term “environment” is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software.

R is very much a vehicle for newly developing methods of interactive data analysis. It has developed rapidly, and has been extended by a large collection of packages. However, most programs written in R are essentially ephemeral, written for a single piece of data analysis.


1.3 R and statistics

Our introduction to the R environment did not mention statistics, yet many people use R as a statistics system. We prefer to think of it as an environment within which many classical and modern statistical techniques have been implemented. A few of these are built into the base R environment, but many are supplied as packages. There are about 25 packages supplied with R (called “standard” and “recommended” packages) and many more are available through the CRAN family of Internet sites (via https://CRAN.R-project.org) and elsewhere. More details on packages are given later (see Packages).

Most classical statistics and much of the latest methodology are available for use with R, but users may need to be prepared to do a little work to find them.

There is an important difference in philosophy between S (and hence R) and the other main statistical systems. In S a statistical analysis is normally done as a series of steps, with intermediate results being stored in objects.

Thus whereas SAS and SPSS will give copious output from a regression or discriminant analysis, R will give minimal output and store the results in a fit object for subsequent interrogation by further R functions.
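
As a small illustration of this step-by-step style, the sketch below fits a regression, stores the result, and then interrogates the stored fit object. It assumes the cars data set that ships with R and the standard functions lm(), coef(), residuals() and summary(); the object names fit, coefs and res are purely illustrative.

```r
# Fit a simple linear regression; the assignment itself prints nothing.
fit <- lm(dist ~ speed, data = cars)

# Interrogate the stored fit object with further R functions.
coefs <- coef(fit)        # estimated intercept and slope
res   <- residuals(fit)   # one residual per observation
print(coefs)
summary(fit)              # a fuller report, produced only on request
```

Because the fit is an ordinary object, it can be passed on to other functions later in the session rather than re-running the analysis.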


1.4 R and the window system

The most convenient way to use R is at a graphics workstation running a windowing system. This guide is aimed at users who have this facility.

In particular we will occasionally refer to the use of R on an X window system although the vast bulk of what is said applies generally to any implementation of the R environment.

Most users will find it necessary to interact directly with the operating system on their computer from time to time. In this guide, we mainly discuss interaction with the operating system on UNIX machines.

If you are running R under Windows or macOS you will need to make some small adjustments.

Setting up a workstation to take full advantage of the customizable features of R is a straightforward if somewhat tedious procedure, and will not be considered further here. Users in difficulty should seek local expert help.


1.5 Using R interactively

When you use the R program it issues a prompt when it expects input commands. The default prompt is ‘>’, which on UNIX might be the same as the shell prompt, and so it may appear that nothing is happening. However, as we shall see, it is easy to change to a different R prompt if you wish. We will assume that the UNIX shell prompt is ‘$’.

In using R under UNIX the suggested procedure for the first occasion is as follows:

  1. Create a separate sub-directory, say work, to hold data files on which you will use R for this problem. This will be the working directory whenever you use R for this particular problem.
    $ mkdir work
    $ cd work
    
  2. Start the R program with the command
    $ R
    
  3. At this point R commands may be issued (see later).
  4. To quit the R program the command is
    > q()
    

    At this point you will be asked whether you want to save the data from your R session. On some systems this will bring up a dialog box, and on others you will receive a text prompt to which you can respond yes, no or cancel (a single letter abbreviation will do) to save the data before quitting, quit without saving, or return to the R session. Data which is saved will be available in future R sessions.

Further R sessions are simple.

  1. Make work the working directory and start the program as before:
    $ cd work
    $ R
    
  2. Use the R program, terminating with the q() command at the end of the session.

To use R under Windows the procedure to follow is basically the same. Create a folder as the working directory, and set that in the Start In field in your R shortcut. Then launch R by double clicking on the icon.

1.6 An introductory session

Readers wishing to get a feel for R at a computer before proceeding are strongly advised to work through the introductory session given in A sample session.


1.7 Getting help with functions and features

R has an inbuilt help facility similar to the man facility of UNIX. To get more information on any specific named function, for example solve, the command is

> help(solve)

An alternative is

> ?solve

For a feature specified by special characters, the argument must be enclosed in double or single quotes, making it a “character string”: This is also necessary for a few words with syntactic meaning including if, for and function.

> help("[[")

Either form of quote mark may be used to escape the other, as in the string "It's important". Our convention is to use double quote marks for preference.

On most R installations help is available in HTML format by running

> help.start()

which will launch a Web browser that allows the help pages to be browsed with hyperlinks. On UNIX, subsequent help requests are sent to the HTML-based help system. The ‘Search Engine and Keywords’ link in the page loaded by help.start() is particularly useful as it contains a high-level concept list which searches through available functions. It can be a great way to get your bearings quickly and to understand the breadth of what R has to offer.

The help.search command (alternatively ??) allows searching for help in various ways. For example,

> ??solve

Try ?help.search for details and more examples.

The examples on a help topic can normally be run by

> example(topic)

Windows versions of R have other optional help systems: use

> ?help

for further details.


1.8 R commands, case sensitivity, etc.

Technically R is an expression language with a very simple syntax. It is case sensitive as are most UNIX based packages, so A and a are different symbols and would refer to different variables. The set of symbols which can be used in R names depends on the operating system and country within which R is being run (technically on the locale in use). Normally all alphanumeric symbols are allowed2 (and in some countries this includes accented letters) plus ‘.’ and ‘_’, with the restriction that a name must start with ‘.’ or a letter, and if it starts with ‘.’ the second character must not be a digit. Names are effectively unlimited in length.

Elementary commands consist of either expressions or assignments. If an expression is given as a command, it is evaluated, printed (unless specifically made invisible), and the value is lost. An assignment also evaluates an expression and passes the value to a variable but the result is not automatically printed.

Commands are separated either by a semi-colon (‘;’), or by a newline. Elementary commands can be grouped together into one compound expression by braces (‘{’ and ‘}’). Comments can be put almost3 anywhere: starting with a hash mark (‘#’), everything to the end of the line is a comment.
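
A minimal sketch of these three points (separators, braces and comments) follows. The variable names a, b and total are illustrative only.

```r
a <- 1; b <- 2        # two commands on one line, separated by a semi-colon

# Braces group elementary commands into one compound expression;
# its value is the value of the last expression evaluated inside it.
total <- {
  a <- a + 10
  a + b
}
print(total)          # a comment runs from the hash mark to the end of the line
```

Here total holds 13, and a has been updated to 11 by the assignment inside the braces.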

If a command is not complete at the end of a line, R will give a different prompt, by default

+

on second and subsequent lines and continue to read input until the command is syntactically complete. This prompt may be changed by the user. We will generally omit the continuation prompt and indicate continuation by simple indenting.

Command lines entered at the console are limited4 to about 4095 bytes (not characters).


1.9 Recall and correction of previous commands

Under many versions of UNIX and on Windows, R provides a mechanism for recalling and re-executing previous commands. The vertical arrow keys on the keyboard can be used to scroll forward and backward through a command history. Once a command is located in this way, the cursor can be moved within the command using the horizontal arrow keys, and characters can be removed with the DEL key or added with the other keys. More details are provided later: see The command-line editor.

The recall and editing capabilities under UNIX are highly customizable. You can find out how to do this by reading the manual entry for the readline library.

Alternatively, the Emacs text editor provides more general support mechanisms (via ESS, Emacs Speaks Statistics) for working interactively with R. See R and Emacs in R FAQ.


1.10 Executing commands from or diverting output to a file

If commands5 are stored in an external file, say commands.R in the working directory work, they may be executed at any time in an R session with the command

> source("commands.R")

On Windows, Source is also available on the File menu. The function sink,

> sink("record.lis")

will divert all subsequent output from the console to an external file, record.lis. The command

> sink()

restores it to the console once again.
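
The two mechanisms can be tried together in a self-contained way. The sketch below is an illustration under stated assumptions: it uses temporary files created with tempfile() instead of commands.R and record.lis, so that nothing is written to the working directory, and the names cmd_file, out_file, total and recorded are hypothetical.

```r
# Write a one-line command file, then execute it with source().
cmd_file <- tempfile(fileext = ".R")
writeLines("total <- sum(1:10)", cmd_file)
source(cmd_file)

# Divert console output to a file, print something, then restore the console.
out_file <- tempfile(fileext = ".lis")
sink(out_file)
print(total)
sink()

# The diverted output is now in the file rather than on the console.
recorded <- readLines(out_file)
print(recorded)
```

Calling sink() with no arguments is what returns output to the console; forgetting it is a common reason for a session that appears to print nothing.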


1.11 Data permanency and removing objects

The entities that R creates and manipulates are known as objects. These may be variables, arrays of numbers, character strings, functions, or more general structures built from such components.

During an R session, objects are created and stored by name (we discuss this process in the next section). The R command

> objects()

(alternatively, ls()) can be used to display the names of (most of) the objects which are currently stored within R. The collection of objects currently stored is called the workspace.

To remove objects the function rm is available:

> rm(x, y, z, ink, junk, temp, foo, bar)
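
As a quick check of how objects() and rm() interact, the following sketch creates two scratch objects, lists the workspace before and after removing them, and compares the two listings. The names tmp_a, tmp_b, before and after are illustrative only.

```r
tmp_a <- 1:3
tmp_b <- "scratch"

before <- objects()          # names stored at this point
rm(tmp_a, tmp_b)             # remove the two scratch objects
after <- objects()           # names stored after the removal

# The names that disappeared are exactly the two that were removed.
print(setdiff(before, after))
```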

All objects created during an R session can be stored permanently in a file for use in future R sessions. At the end of each R session you are given the opportunity to save all the currently available objects.

If you indicate that you want to do this, the objects are written to a file called .RData (see footnote 6) in the current directory, and the command lines used in the session are saved to a file called .Rhistory.

When R is started at a later time from the same directory it reloads the workspace from this file. At the same time the associated command history is reloaded.

It is recommended that you use separate working directories for analyses conducted with R. It is quite common for objects with names x and y to be created during an analysis. Names like this are often meaningful in the context of a single analysis, but it can be quite hard to decide what they might be when several analyses have been conducted in the same directory.


2 Simple manipulations; numbers and vectors


2.1 Vectors and assignment

R operates on named data structures. The simplest such structure is the numeric vector, which is a single entity consisting of an ordered collection of numbers. To set up a vector named x, say, consisting of five numbers, namely 10.4, 5.6, 3.1, 6.4 and 21.7, use the R command

> x <- c(10.4, 5.6, 3.1, 6.4, 21.7)

This is an assignment statement using the function c() which in this context can take an arbitrary number of vector arguments and whose value is a vector got by concatenating its arguments end to end.7

A number occurring by itself in an expression is taken as a vector of length one.

Notice that the assignment operator (‘<-’) consists of the two characters ‘<’ (“less than”) and ‘-’ (“minus”) occurring strictly side-by-side, and that it ‘points’ to the object receiving the value of the expression. In most contexts the ‘=’ operator can be used as an alternative.

Assignment can also be made using the function assign(). An equivalent way of making the same assignment as above is with:

> assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7))

The usual operator, <-, can be thought of as a syntactic short-cut to this.

Assignments can also be made in the other direction, using the obvious change in the assignment operator. So the same assignment could be made using

> c(10.4, 5.6, 3.1, 6.4, 21.7) -> x
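
To confirm that the four forms of assignment described above produce the same object, they can be compared directly. In this sketch the names x1 to x4 are illustrative stand-ins for x, chosen so the four results can coexist.

```r
x1 <- c(10.4, 5.6, 3.1, 6.4, 21.7)           # leftward arrow
x2 = c(10.4, 5.6, 3.1, 6.4, 21.7)            # equals sign
assign("x3", c(10.4, 5.6, 3.1, 6.4, 21.7))   # function form; the name is a character string
c(10.4, 5.6, 3.1, 6.4, 21.7) -> x4           # rightward arrow

# All four objects hold the same numeric vector.
identical(x1, x2) && identical(x1, x3) && identical(x1, x4)
```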

If an expression is used as a complete command, the value is printed and lost8. So now if we were to use the command

> 1/x

the reciprocals of the five values would be printed at the terminal (and the value of x would, of course, remain unchanged).

The further assignment

> y <- c(x, 0, x)

would create a vector y with 11 entries consisting of two copies of x with a zero in the middle place.


2.2 Vector arithmetic

Vectors can be used in arithmetic expressions, in which case the operations are performed element by element. Vectors occurring in the same expression need not all be of the same length.

If they are not, the value of the expression is a vector with the same length as the longest vector which occurs in the expression. Shorter vectors in the expression are recycled as often as need be (perhaps fractionally) until they match the length of the longest vector. In particular a constant is simply repeated. So with the above assignments the command

> v <- 2*x + y + 1

generates a new vector v of length 11 constructed by adding together, element by element, 2*x repeated 2.2 times, y repeated just once, and 1 repeated 11 times.
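
The recycling rule can be made visible by writing the repetition out by hand and comparing. This is a sketch rather than a description of R's internals: v_explicit is an illustrative name, and rep() with length.out is used here only to spell out "repeated 2.2 times". R issues a warning for the original expression because 11 is not a whole multiple of 5; the sketch suppresses it to keep the output short.

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
y <- c(x, 0, x)                        # length 11

# Recycling done implicitly by R (warning suppressed for brevity).
v <- suppressWarnings(2*x + y + 1)

# The same computation with the recycling written out explicitly.
v_explicit <- 2 * rep(x, length.out = 11) + y + rep(1, 11)

length(v)
all.equal(v, v_explicit)
```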

The elementary arithmetic operators are the usual +, -, *, / and ^ for raising to a power. In addition all of the common arithmetic functions are available. log, exp, sin, cos, tan, sqrt, and so on, all have their usual meaning. max and min select the largest and smallest elements of a vector respectively. range is a function whose value is a vector of length two, namely c(min(x), max(x)). length(x) is the number of elements in x, sum(x) gives the total of the elements in x, and prod(x) their product.

Two statistical functions are mean(x) which calculates the sample mean, which is the same as sum(x)/length(x), and var(x) which gives

sum((x-mean(x))^2)/(length(x)-1)

or sample variance. If the argument to var() is an n-by-p matrix the value is a p-by-p sample covariance matrix got by regarding the rows as independent p-variate sample vectors.
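
The two formulas can be checked numerically against the built-in functions. In the sketch below, mean_by_hand, var_by_hand and the small matrix m are illustrative and not part of the original text.

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
n <- length(x)

mean_by_hand <- sum(x) / n
var_by_hand  <- sum((x - mean(x))^2) / (n - 1)

all.equal(mean(x), mean_by_hand)
all.equal(var(x), var_by_hand)

# A matrix argument: 4 rows (observations) and 3 columns (variables)
# give a 3-by-3 sample covariance matrix.
m <- matrix(c(1, 2, 3, 4,
              2, 1, 4, 3,
              5, 7, 6, 8), ncol = 3)
dim(var(m))
```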

sort(x) returns a vector of the same size as x with the elements arranged in increasing order; however there are other more flexible sorting facilities available (see order() or sort.list() which produce a permutation to do the sorting).

Note that max and min select the largest and smallest values in their arguments, even if they are given several vectors. The parallel maximum and minimum functions pmax and pmin return a vector (of length equal to their longest argument) that contains in each element the largest (smallest) element in that position in any of the input vectors.
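
A short sketch contrasting these functions may help; the vectors a and b and the name idx are illustrative.

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)

sort(x)            # the values in increasing order
idx <- order(x)    # the permutation that sorts x
x[idx]             # indexing by that permutation gives the same result as sort(x)

a <- c(1, 5, 3)
b <- c(4, 2, 6)
max(a, b)          # a single number: the largest value across both vectors
pmax(a, b)         # element-by-element maxima
pmin(a, b)         # element-by-element minima
```

The permutation from order() is the more flexible tool because it can be used to rearrange other vectors in step with x.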

For most purposes the user will not be concerned if the “numbers” in a numeric vector are integers, reals or even complex. Internally calculations are done as double precision real numbers, or double precision complex numbers if the input data are complex.

To work with complex numbers, supply an explicit complex part. Thus

sqrt(-17)

will give NaN and a warning, but

sqrt(-17+0i)

will do the computations as complex numbers.
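
The difference between the two calls can be inspected as follows. The names r_real and r_complex are illustrative, and the warning from the real-valued call is suppressed only to keep the sketch quiet.

```r
r_real    <- suppressWarnings(sqrt(-17))   # real arithmetic: the result is NaN
r_complex <- sqrt(-17+0i)                  # complex arithmetic: a purely imaginary result

is.nan(r_real)
Re(r_complex)      # real part
Im(r_complex)      # imaginary part; its square recovers 17
```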


2.3 Generating regular sequences

R has a number of facilities for generating commonly used sequences of numbers. For example 1:30 is the vector c(1, 2, …, 29, 30). The colon operator has high priority within an expression, so, for example 2*1:15 is the vector c(2, 4, …, 28, 30). Put n <- 10 and compare the sequences 1:n-1 and 1:(n-1).
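
Carrying out the suggested comparison shows the effect of the colon operator's priority. The names a, b and doubled are illustrative.

```r
n <- 10

a <- 1:n-1        # parsed as (1:n) - 1, so it runs from 0 to 9
b <- 1:(n-1)      # runs from 1 to 9

print(a)
print(b)

doubled <- 2*1:15 # parsed as 2 * (1:15)
print(doubled)
```

When in doubt, adding parentheses around the intended range avoids the off-by-one surprise.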

The construction 30:1 may be used to generate a sequence backwards.

The function seq() is a more general facility for generating sequences. It has five arguments, only some of which may be specified in any one call.

The first two arguments, if given, specify the beginning and end of the sequence, and if these are the only two arguments given the result is the same as the colon operator. That is seq(2,10) is the same vector as 2:10.

Arguments to seq(), and to many other R functions, can also be given in named form, in which case the order in which they appear is irrelevant. The first two arguments may be named from=value and to=value; thus seq(1,30), seq(from=1, to=30) and seq(to=30, from=1) are all the same as 1:30. The next two arguments to seq() may be named by=value and length=value, which specify a step size and a length for the sequence respectively. If neither of these is given, the default by=1 is assumed.

For example

> seq(-5, 5, by=.2) -> s3

generates in s3 the vector c(-5.0, -4.8, -4.6, …, 4.6, 4.8, 5.0). Similarly

> s4 <- seq(length=51, from=-5, by=.2)

generates the same vector in s4.
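
That the two calls agree can be verified directly. One assumption to flag: the sketch spells the length argument in full as length.out, which is its formal name in seq(); the shorter length= used in the text works through partial matching of argument names.

```r
s3 <- seq(-5, 5, by = 0.2)                        # start, end and step size
s4 <- seq(from = -5, by = 0.2, length.out = 51)   # start, step size and length

length(s3)
all.equal(s3, s4)   # equal up to floating-point tolerance
```

all.equal() is used rather than an exact comparison because the two sequences are computed by different arithmetic and may differ in the last few bits.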

The fifth argument may be named along=vector, which is normally used as the only argument to create the sequence 1, 2, …, length(vector), or the empty sequence if the vector is empty (as it can be).

A related function is rep() which can be used for replicating an object in various complicated ways. The simplest form is

> s5 <- rep(x, times=5)

which will put five copies of x end-to-end in s5. Another useful version is

> s6 <- rep(x, each=5)

which repeats each element of x five times before moving on to the next.
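
The difference between times and each is easiest to see on a short vector. The sketch below uses a three-element x and two repetitions purely to keep the printed results readable.

```r
x <- c(10.4, 5.6, 3.1)     # shortened vector, for readability only

s5 <- rep(x, times = 2)    # whole vector repeated end-to-end
s6 <- rep(x, each = 2)     # each element repeated before moving on

print(s5)
print(s6)
```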


2.4 Logical vectors

As well as numerical vectors, R allows manipulation of logical quantities. The elements of a logical vector can have the values TRUE, FALSE, and NA (for “not available”, see below). The first two are often abbreviated as T and F, respectively. Note however that T and F are just variables which are set to TRUE and FALSE by default, but are not reserved words and hence can be overwritten by the user. Hence, you should always use TRUE and FALSE.

Logical vectors are generated by conditions. For example

> temp <- x > 13

sets temp as a vector of the same length as x with values FALSE corresponding to elements of x where the condition is not met and TRUE where it is.

The logical operators are <, <=, >, >=, == for exact equality and != for inequality. In addition if c1 and c2 are logical expressions, then c1 & c2 is their intersection (“and”), c1 | c2 is their union (“or”), and !c1 is the negation of c1.

Logical vectors may be used in ordinary arithmetic, in which case they are coerced into numeric vectors, FALSE becoming 0 and TRUE becoming 1. However there are situations where logical vectors and their coerced numeric counterparts are not equivalent, for example see the next subsection.
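
A brief sketch of conditions, logical operators and coercion, using the vector x from earlier. The names big_or_small, n_big and prop_small are illustrative.

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)

temp <- x > 13                    # TRUE only where the condition is met
big_or_small <- x > 13 | x < 4    # union ("or") of two conditions
print(temp)
print(big_or_small)
print(!temp)                      # negation

# In arithmetic, FALSE counts as 0 and TRUE as 1.
n_big      <- sum(temp)           # how many elements exceed 13
prop_small <- mean(x < 10)        # proportion of elements below 10
```

Summing or averaging a logical vector in this way is a common idiom for counting cases and computing proportions.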


2.5 Missing values

In some cases the components of a vector may not be completely known. When an element or value is “not available” or a “missing value” in the statistical sense, a place within a vector may be reserved for it by assigning it the special value NA. In general any operation on an NA becomes an NA. The motivation for this rule is simply that if the specification of an operation is incomplete, the result cannot be known and hence is not available.

The function is.na(x) gives a logical vector of the same size as x with value TRUE if and only if the corresponding element in x is NA.

> z <- c(1:3,NA);  ind <- is.na(z)

Notice that the logical expression x == NA is quite different from is.na(x) since NA is not really a value but a marker for a quantity that is not available. Thus x == NA is a vector of the same length as x all of whose values are NA as the logical expression itself is incomplete and hence undecidable.
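
The contrast between is.na(z) and z == NA can be seen by running both on the vector defined above. The name cmp is illustrative, and the na.rm argument of sum() shown at the end goes beyond the original text; it is included only as one common way of working around missing values.

```r
z <- c(1:3, NA)

ind <- is.na(z)    # FALSE FALSE FALSE TRUE
cmp <- z == NA     # NA NA NA NA: every comparison with NA is itself NA

print(ind)
print(cmp)

sum(z)                  # NA, because one element is not available
sum(z, na.rm = TRUE)    # 6, after dropping the missing element
```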

Note that there is a second kind of “missing” values which are produced by numerical computation, the so-called Not a Number, NaN, values. Examples are

> 0/0

or  

> Inf - Inf

which both give NaN since the result cannot be defined sensibly.

In summary, is.na(xx) is TRUE both for NA and NaN values. To differentiate these, is.nan(xx) is only TRUE for NaNs.
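
The summary can be checked on a single vector that mixes an ordinary number, an NA and two NaN values. The name vals is illustrative.

```r
vals <- c(1, NA, 0/0, Inf - Inf)

is.na(vals)     # FALSE  TRUE  TRUE  TRUE
is.nan(vals)    # FALSE FALSE  TRUE  TRUE
```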

Missing values are sometimes printed as <NA> when character vectors are printed without quotes.


2.6 Character vectors

Character quantities and character vectors are used frequently in R, for example as plot labels. Where needed they are denoted by a sequence of characters delimited by the double quote character, e.g., "x-values", "New iteration results".

Character strings are entered using either matching double (") or single (') quotes, but are printed using double quotes (or sometimes without quotes). They use C-style escape sequences, using \ as the escape character, so \ is entered and printed as \\, and inside double quotes " is entered as \". Other useful escape sequences are \n, newline, \t, tab and \b, backspace—see ?Quotes for a full list.
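
The difference between how a string is entered, how print() shows it and what it actually contains can be seen in a few lines. The names s and quoted are illustrative.

```r
s <- "a\\b"               # entered with an escaped backslash
quoted <- "say \"hi\""    # double quotes escaped inside double quotes

nchar(s)         # 3: the string holds a single backslash
nchar(quoted)    # 8

print(s)         # print() shows the escaped form, in double quotes
cat(s, "\n")     # cat() writes the characters themselves
cat("col1\tcol2\n")   # \t is a tab and \n a newline

# Single quotes may delimit a string too; the result is the same object.
identical('It\'s important', "It's important")
```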

Character vectors may be concatenated into a vector by the c() function; examples of their use will emerge frequently.

The paste() function takes an arbitrary number of arguments and concatenates them one by one into character strings. Any numbers given among the arguments are coerced into character strings in the evident way, that is, in the same way they would be if they were printed.

The arguments are by default separated in the result by a single blank character, but this can be changed by the named argument, sep=string, which changes it to string, possibly empty.

For example

> labs <- paste(c("X","Y"), 1:10, sep="")

makes labs into the character vector

c("X1", "Y2", "X3", "Y4", "X5", "Y6", "X7", "Y8", "X9", "Y10")

Note particularly that recycling of short lists takes place here too; thus c("X", "Y") is repeated 5 times to match the sequence 1:10. 9
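
The effect of sep and of recycling can be verified with a short sketch. The names labs_dash and mixed are illustrative only.

```r
# c("X", "Y") is recycled five times against 1:10.
labs <- paste(c("X", "Y"), 1:10, sep = "")
labs

# A different separator string.
labs_dash <- paste(c("X", "Y"), 1:10, sep = "-")
labs_dash

# Numbers are coerced to character; the default separator is one blank.
mixed <- paste("a", 1.5)
mixed
```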


2.7 Index vectors; selecting and modifying subsets of a data set

Subsets of the elements of a vector may be selected by appending to the name of the vector an index vector in square brackets. More generally any expression that evaluates to a vector may have subsets of its elements similarly selected by appending an index vector in square brackets immediately after the expression.

Such index vectors can be any of four distinct types.

  1. A logical vector. In this case the index vector is recycled to the same length as the vector from which elements are to be selected. Values corresponding to TRUE in the index vector are selected and those corresponding to FALSE are omitted. For example
    > y <- x[!is.na(x)]
    

    creates (or re-creates) an object y which will contain the non-missing values of x, in the same order. Note that if x has missing values, y will be shorter than x. Also

    > (x+1)[(!is.na(x)) & x>0] -> z
    

    creates an object z and places in it the values of the vector x+1 for which the corresponding value in x was both non-missing and positive.

  2. A vector of positive integral quantities. In this case the values in the index vector must lie in the set {1, 2, …, length(x)}. The corresponding elements of the vector are selected and concatenated, in that order, in the result. The index vector can be of any length and the result is of the same length as the index vector. For example x[6] is the sixth component of x and
    > x[1:10]
    

    selects the first 10 elements of x (assuming length(x) is not less than 10). Also

    > c("x","y")[rep(c(1,2,2,1), times=4)]
    

    (an admittedly unlikely thing to do) produces a character vector of length 16 consisting of "x", "y", "y", "x" repeated four times.

  3. A vector of negative integral quantities. Such an index vector specifies the values to be excluded rather than included. Thus
    > y <- x[-(1:5)]
    

    gives y all but the first five elements of x.

  4. A vector of character strings. This possibility only applies where an object has a names attribute to identify its components. In this case a sub-vector of the names vector may be used in the same way as the positive integral labels in item 2 further above.
    > fruit <- c(5, 10, 1, 20)
    > names(fruit) <- c("orange", "banana", "apple", "peach")
    > lunch <- fruit[c("apple","orange")]
    

    The advantage is that alphanumeric names are often easier to remember than numeric indices. This option is particularly useful in connection with data frames, as we shall see later.
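
The four kinds of index vector described above can be compared side by side on one small named vector. This is a sketch only; the vector x and its names are hypothetical.

```r
# A named vector with one missing value (illustrative data).
x <- c(a = 5, b = NA, c = -2, d = 8, e = 0)

by_logical  <- x[!is.na(x)]                 # 1. logical index
pos_plus1   <- (x + 1)[(!is.na(x)) & x > 0] # 1. logical index on an expression
by_position <- x[c(1, 4)]                   # 2. positive integers
by_negative <- x[-(1:2)]                    # 3. negative integers exclude
by_name     <- x[c("d", "a")]               # 4. character strings use names(x)
```

Note that the result follows the order of the index vector, so by_name holds the element named d before the one named a.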

An indexed expression can also appear on the receiving end of an assignment, in which case the assignment operation is performed only on those elements of the vector. The expression must be of the form vector[index_vector] as having an arbitrary expression in place of the vector name does not make much sense here.

For example

> x[is.na(x)] <- 0

replaces any missing values in x by zeros and

> y[y < 0] <- -y[y < 0]

has the same effect as

> y <- abs(y)

2.8 Other types of objects

Vectors are the most important type of object in R, but there are several others which we will meet more formally in later sections.

  • matrices or more generally arrays are multi-dimensional generalizations of vectors. In fact, they are vectors that can be indexed by two or more indices and will be printed in special ways. See Arrays and matrices.
  • factors provide compact ways to handle categorical data. See Ordered and unordered factors.
  • lists are a general form of vector in which the various elements need not be of the same type, and are often themselves vectors or lists. Lists provide a convenient way to return the results of a statistical computation. See Lists.
  • data frames are matrix-like structures, in which the columns can be of different types. Think of data frames as ‘data matrices’ with one row per observational unit but with (possibly) both numerical and categorical variables.

    Many experiments are best described by data frames: the treatments are categorical but the response is numeric. See Data frames.
  • functions are themselves objects in R which can be stored in the project’s workspace. This provides a simple and convenient way to extend R. See Writing your own functions.

3 Objects, their modes and attributes


3.1 Intrinsic attributes: mode and length

The entities R operates on are technically known as objects. Examples are vectors of numeric (real) or complex values, vectors of logical values and vectors of character strings. These are known as “atomic” structures since their components are all of the same type, or mode, namely numeric10, complex, logical, character and raw.

Vectors must have their values all of the same mode. Thus any given vector must be unambiguously either logical, numeric, complex, character or raw. (The only apparent exception to this rule is the special “value” listed as NA for quantities not available, but in fact there are several types of NA). Note that a vector can be empty and still have a mode. For example the empty character string vector is listed as character(0) and the empty numeric vector as numeric(0).

R also operates on objects called lists, which are of mode list. These are ordered sequences of objects which individually can be of any mode. lists are known as “recursive” rather than atomic structures since their components can themselves be lists in their own right.

The other recursive structures are those of mode function and expression. Functions are the objects that form part of the R system along with similar user written functions, which we discuss in some detail later. Expressions as objects form an advanced part of R which will not be discussed in this guide, except indirectly when we discuss formulae used with modeling in R.

By the mode of an object we mean the basic type of its fundamental constituents. This is a special case of a “property” of an object. Another property of every object is its length. The functions mode(object) and length(object) can be used to find out the mode and length of any defined structure 11.

Further properties of an object are usually provided by attributes(object), see Getting and setting attributes. Because of this, mode and length are also called “intrinsic attributes” of an object.

For example, if z is a complex vector of length 100, then in an expression mode(z) is the character string "complex" and length(z) is 100.

R caters for changes of mode almost anywhere it could be considered sensible to do so, (and a few where it might not be). For example with

> z <- 0:9

we could put

> digits <- as.character(z)

after which digits is the character vector c("0", "1", "2", …, "9"). A further coercion, or change of mode, reconstructs the numerical vector again:

> d <- as.integer(digits)

Now d and z are the same.12 There is a large collection of functions of the form as.something() for either coercion from one mode to another, or for investing an object with some other attribute it may not already possess. The reader should consult the different help files to become familiar with them.
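
The round trip through character mode can be confirmed with mode() and identical(). The following sketch simply replays the steps from the text.

```r
z <- 0:9
digits <- as.character(z)   # "0", "1", ..., "9"
d <- as.integer(digits)     # back to integers

mode(z)          # "numeric"
mode(digits)     # "character"
identical(d, z)  # the reconstruction is exact
```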


3.2 Changing the length of an object

An “empty” object may still have a mode. For example

> e <- numeric()

makes e an empty vector structure of mode numeric. Similarly character() is an empty character vector, and so on. Once an object of any size has been created, new components may be added to it simply by giving it an index value outside its previous range. Thus

> e[3] <- 17

now makes e a vector of length 3, (the first two components of which are at this point both NA). This applies to any structure at all, provided the mode of the additional component(s) agrees with the mode of the object in the first place.

This automatic adjustment of lengths of an object is used often, for example in the scan() function for input. (see The scan() function.)

Conversely to truncate the size of an object requires only an assignment to do so. Hence if alpha is an object of length 10, then

> alpha <- alpha[2 * 1:5]

makes it an object of length 5 consisting of just the former components with even index. (The old indices are not retained, of course.) We can then retain just the first three values by

> length(alpha) <- 3

and vectors can be extended (by missing values) in the same way.
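
The growing and shrinking described in this section can be traced step by step. In this sketch the starting values of alpha are invented for illustration.

```r
# Growing: assigning beyond the current length pads with NA.
e <- numeric()
e[3] <- 17
e                      # NA NA 17

# Shrinking by subsetting: keep the even-indexed components.
alpha <- (1:10) * 10   # hypothetical data: 10, 20, ..., 100
alpha <- alpha[2 * 1:5]
alpha                  # 20 40 60 80 100

# Shrinking, then extending, by assigning to length().
length(alpha) <- 3
length(alpha) <- 5     # the two new components are NA
alpha
```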


3.3 Getting and setting attributes

The function attributes(object) returns a list of all the non-intrinsic attributes currently defined for that object. The function attr(object, name) can be used to select a specific attribute. These functions are rarely used, except in rather special circumstances when some new attribute is being created for some particular purpose, for example to associate a creation date or an operator with an R object.

The concept, however, is very important.

Some care should be exercised when assigning or deleting attributes since they are an integral part of the object system used in R.

When attr() is used on the left-hand side of an assignment, it can be used either to associate a new attribute with object or to change an existing one. For example

> attr(z, "dim") <- c(10,10)

allows R to treat z as if it were a 10-by-10 matrix.
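
A short sketch of both uses of attr() follows. It assumes z has exactly 100 elements, and the attribute name "created" is purely hypothetical.

```r
z <- 1:100                      # a plain vector of length 100
attr(z, "dim") <- c(10, 10)     # now treated as a 10-by-10 matrix

is.matrix(z)
z[2, 3]                         # row 2, column 3 of the column-major layout

# Attaching an arbitrary, user-defined attribute (hypothetical name).
attr(z, "created") <- "2024-07-15"
names(attributes(z))
```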


3.4 The class of an object

All objects in R have a class, reported by the function class. For simple vectors this is just the mode, for example "numeric", "logical", "character" or "list", but "matrix", "array", "factor" and "data.frame" are other possible values.

A special attribute known as the class of the object is used to allow for an object-oriented style13 of programming in R. For example if an object has class "data.frame", it will be printed in a certain way, the plot() function will display it graphically in a certain way, and other so-called generic functions such as summary() will react to it as an argument in a way sensitive to its class.

To remove temporarily the effects of class, use the function unclass(). For example if winter has the class "data.frame" then

> winter

will print it in data frame form, which is rather like a matrix, whereas

> unclass(winter)

will print it as an ordinary list. Only in rather special situations do you need to use this facility, but one is when you are learning to come to terms with the idea of class and generic functions.
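
To see the effect of unclass() concretely, one can build a tiny data frame and compare classes. The contents of winter below are invented for this sketch.

```r
# A hypothetical two-row data frame.
winter <- data.frame(temp = c(-3, 1), snow = c(TRUE, FALSE))

class(winter)        # "data.frame"
u <- unclass(winter)
class(u)             # "list": the underlying structure
names(u)             # the column names survive

print(winter)        # data frame layout
print(u)             # ordinary list layout
```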

Generic functions and classes will be discussed further in Classes, generic functions and object orientation, but only briefly.


4 Ordered and unordered factors

A factor is a vector object used to specify a discrete classification (grouping) of the components of other vectors of the same length. R provides both ordered and unordered factors. While the “real” application of factors is with model formulae (see Contrasts), we here look at a specific example.

4.1 A specific example

Suppose, for example, we have a sample of 30 tax accountants from all the states and territories of Australia14 and their individual state of origin is specified by a character vector of state mnemonics as

> state <- c("tas", "sa",  "qld", "nsw", "nsw", "nt",  "wa",  "wa",
             "qld", "vic", "nsw", "vic", "qld", "qld", "sa",  "tas",
             "sa",  "nt",  "wa",  "vic", "qld", "nsw", "nsw", "wa",
             "sa",  "act", "nsw", "vic", "vic", "act")

Notice that in the case of a character vector, “sorted” means sorted in alphabetical order.

A factor is similarly created using the factor() function:

> statef <- factor(state)

The print() function handles factors slightly differently from other objects:

> statef
 [1] tas sa  qld nsw nsw nt  wa  wa  qld vic nsw vic qld qld sa
[16] tas sa  nt  wa  vic qld nsw nsw wa  sa  act nsw vic vic act
Levels:  act nsw nt qld sa tas vic wa

To find out the levels of a factor the function levels() can be used.

> levels(statef)
[1] "act" "nsw" "nt"  "qld" "sa"  "tas" "vic" "wa"

4.2 The function tapply() and ragged arrays

To continue the previous example, suppose we have the incomes of the same tax accountants in another vector (in suitably large units of money)

> incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56,
               61, 61, 61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46,
               59, 46, 58, 43)

To calculate the sample mean income for each state we can now use the special function tapply():

> incmeans <- tapply(incomes, statef, mean)

giving a means vector with the components labelled by the levels

   act    nsw     nt    qld     sa    tas    vic     wa
44.500 57.333 55.500 53.600 55.000 60.500 56.000 52.250

The function tapply() is used to apply a function, here mean(), to each group of components of the first argument, here incomes, defined by the levels of the second component, here statef15, as if they were separate vector structures. The result is a structure of the same length as the levels attribute of the factor containing the results. The reader should consult the help document for more details.

Suppose further we needed to calculate the standard errors of the state income means. To do this we need to write an R function to calculate the standard error for any given vector. Since there is a built-in function var() to calculate the sample variance, such a function is a very simple one-liner, specified by the assignment:

> stdError <- function(x) sqrt(var(x)/length(x))

(Writing functions will be considered later in Writing your own functions. Note that R’s built-in function sd() is something different.) After this assignment, the standard errors are calculated by

> incster <- tapply(incomes, statef, stdError)

and the values calculated are then

> incster
act    nsw  nt    qld     sa tas   vic     wa
1.5 4.3102 4.5 4.1061 2.7386 0.5 5.244 2.6575

As an exercise you may care to find the usual 95% confidence limits for the state mean incomes. To do this you could use tapply() once more with the length() function to find the sample sizes, and the qt() function to find the percentage points of the appropriate t-distributions. (You could also investigate R’s facilities for t-tests.)
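
One possible approach to this exercise is sketched below, under the assumption that “usual” means t-based limits of the form mean ± t × standard error with n − 1 degrees of freedom per state. The names n, tcrit, lower and upper are illustrative; the data are those given earlier in this chapter.

```r
state <- c("tas", "sa",  "qld", "nsw", "nsw", "nt",  "wa",  "wa",
           "qld", "vic", "nsw", "vic", "qld", "qld", "sa",  "tas",
           "sa",  "nt",  "wa",  "vic", "qld", "nsw", "nsw", "wa",
           "sa",  "act", "nsw", "vic", "vic", "act")
incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56,
             61, 61, 61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46,
             59, 46, 58, 43)
statef <- factor(state)
stdError <- function(x) sqrt(var(x)/length(x))

incmeans <- tapply(incomes, statef, mean)
incster  <- tapply(incomes, statef, stdError)
n        <- tapply(incomes, statef, length)   # sample size per state

# Upper 2.5% point of the t-distribution with n - 1 degrees of freedom.
tcrit <- qt(0.975, df = n - 1)
lower <- incmeans - tcrit * incster
upper <- incmeans + tcrit * incster
```

With only two observations in some states, the resulting intervals are very wide, which is expected for such small samples.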

The function tapply() can also be used to handle more complicated indexing of a vector by multiple categories. For example, we might wish to split the tax accountants by both state and sex. However in this simple instance (just one factor) what happens can be thought of as follows.

The values in the vector are collected into groups corresponding to the distinct entries in the factor. The function is then applied to each of these groups individually. The value is a vector of function results, labelled by the levels attribute of the factor.
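
The grouping step just described can be mimicked by hand with split() and sapply(), which may help in picturing what tapply() does in the one-factor case. This is a sketch on a small made-up sample, not a description of how tapply() is implemented.

```r
# Hypothetical miniature version of the incomes example.
incomes <- c(60, 49, 40, 61, 64)
statef  <- factor(c("tas", "sa", "qld", "sa", "tas"))

by_tapply <- tapply(incomes, statef, mean)

# Collect the values into one group per level, then apply mean() to each.
groups  <- split(incomes, statef)
by_hand <- sapply(groups, mean)

by_tapply
by_hand
```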

The combination of a vector and a labelling factor is an example of what is sometimes called a ragged array, since the subclass sizes are possibly irregular. When the subclass sizes are all the same the indexing may be done implicitly and much more efficiently, as we see in the next section.


4.3 Ordered factors

The levels of factors are stored in alphabetical order, or in the order they were specified to factor if they were specified explicitly.

Sometimes the levels will have a natural ordering that we want to record and want our statistical analysis to make use of. The ordered() function creates such ordered factors but is otherwise identical to factor. For most purposes the only difference between ordered and unordered factors is that the former are printed showing the ordering of the levels, but the contrasts generated for them in fitting linear models are different.
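
A brief sketch of the difference between default and explicitly ordered levels follows. The vector sizes and its values are hypothetical.

```r
sizes <- c("small", "large", "medium", "small")   # illustrative data

f <- factor(sizes)
levels(f)            # alphabetical by default

# Levels given explicitly, in their natural order.
o <- ordered(sizes, levels = c("small", "medium", "large"))
o                    # printed with Levels: small < medium < large
is.ordered(o)

# Ordered factors support comparisons against a level.
below_large <- o < "large"
below_large
```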


5 Arrays and matrices


5.1 Arrays

An array can be considered as a multiply subscripted collection of data entries, for example numeric. R allows simple facilities for creating and handling arrays, and in particular the special case of matrices.

A dimension vector is a vector of non-negative integers. If its length is k then the array is k-dimensional, e.g. a matrix is a 2-dimensional array. The dimensions are indexed from one up to the values given in the dimension vector.

A vector can be used by R as an array only if it has a dimension vector as its dim attribute. Suppose, for example, z is a vector of 1500 elements. The assignment

> dim(z) <- c(3,5,100)

gives it the dim attribute that allows it to be treated as a 3 by 5 by 100 array.

Other functions such as matrix() and array() are available for simpler and more natural looking assignments, as we shall see in The array() function.

The values in the data vector give the values in the array in the same order as they would occur in FORTRAN, that is “column major order,” with the first subscript moving fastest and the last subscript slowest.

For example if the dimension vector for an array, say a, is c(3,4,2) then there are 3 * 4 * 2 = 24 entries in a and the data vector holds them in the order a[1,1,1], a[2,1,1], …, a[2,4,2], a[3,4,2].
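
Filling an array with 1:24 makes the column-major ordering visible, because each entry then equals its position in the data vector. A minimal sketch:

```r
a <- array(1:24, dim = c(3, 4, 2))

a[1, 1, 1]   # first entry of the data vector
a[2, 1, 1]   # the first subscript moves fastest
a[1, 2, 1]   # next column starts after 3 rows
a[2, 4, 2]   # second-to-last entry
a[3, 4, 2]   # last entry
```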

Arrays can be one-dimensional: such arrays are usually treated in the same way as vectors (including when printing), but the exceptions can cause confusion.


5.2 Array indexing. Subsections of an array

Individual elements of an array may be referenced by giving the name of the array followed by the subscripts in square brackets, separated by commas.

More generally, subsections of an array may be specified by giving a sequence of index vectors in place of subscripts; however if any index position is given an empty index vector, then the full range of that subscript is taken.

Continuing the previous example, a[2,,] is a 4 * 2 array with dimension vector c(4,2) and data vector containing the values

c(a[2,1,1], a[2,2,1], a[2,3,1], a[2,4,1],
  a[2,1,2], a[2,2,2], a[2,3,2], a[2,4,2])

in that order. a[,,] stands for the entire array, which is the same as omitting the subscripts entirely and using a alone.
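
The slice a[2,,] can be checked against the explicit list of elements given above. In this sketch a is filled with 1:24 purely for illustration.

```r
a <- array(1:24, dim = c(3, 4, 2))

s <- a[2, , ]        # empty positions take the full range
dim(s)               # 4 2

explicit <- c(a[2,1,1], a[2,2,1], a[2,3,1], a[2,4,1],
              a[2,1,2], a[2,2,2], a[2,3,2], a[2,4,2])
all(as.vector(s) == explicit)

# A single index vector ignores the dimension vector.
a[5]
```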

For any array, say Z, the dimension vector may be referenced explicitly as dim(Z) (on either side of an assignment).

Also, if an array name is given with just one subscript or index vector, then the corresponding values of the data vector only are used; in this case the dimension vector is ignored. This is not the case, however, if the single index is not a vector but itself an array, as we next discuss.


5.3 Index matrices

As well as an index vector in any subscript position, a matrix may be used with a single index matrix in order either to assign a vector of quantities to an irregular collection of elements in the array, or to extract an irregular collection as a vector.

A matrix example makes the process clear. In the case of a doubly indexed array, an index matrix may be given consisting of two columns and as many rows as desired. The entries in the index matrix are the row and column indices for the doubly indexed array.

Suppose for example we have a 4 by 5 array X and we wish to do the following:

  • Extract elements X[1,3], X[2,2] and X[3,1] as a vector structure, and
  • Replace these entries in the array X by zeroes.

In this case we need a 3 by 2 subscript array, as in the following example.

> x <- array(1:20, dim=c(4,5))   # Generate a 4 by 5 array.
> x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20
> i <- array(c(1:3,3:1), dim=c(3,2))
> i                             # i is a 3 by 2 index array.
     [,1] [,2]
[1,]    1    3
[2,]    2    2
[3,]    3    1
> x[i]                          # Extract those elements
[1] 9 6 3
> x[i] <- 0                     # Replace those elements by zeros.
> x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    0   13   17
[2,]    2    0   10   14   18
[3,]    0    7   11   15   19
[4,]    4    8   12   16   20
>

Negative indices are not allowed in index matrices. NA and zero values are allowed: rows in the index matrix containing a zero are ignored, and rows containing an NA produce an NA in the result.

As a less trivial example, suppose we wish to generate an (unreduced) design matrix for a block design defined by factors blocks (b levels) and varieties (v levels). Further suppose there are n plots in the experiment. We could proceed as follows:

> Xb <- matrix(0, n, b)
> Xv <- matrix(0, n, v)
> ib <- cbind(1:n, blocks)
> iv <- cbind(1:n, varieties)
> Xb[ib] <- 1
> Xv[iv] <- 1
> X <- cbind(Xb, Xv)

To construct the incidence matrix, N say, we could use

> N <- crossprod(Xb, Xv)

However a simpler direct way of producing this matrix is to use table():

> N <- table(blocks, varieties)

Index matrices must be numerical: any other form of matrix (e.g. a logical or character matrix) supplied as a matrix is treated as an indexing vector.
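
The block design construction earlier in this section can be run end to end once blocks and varieties are given concrete values. The sketch below uses a made-up design with six plots; n, b and v are derived from the factors rather than supplied separately.

```r
# Hypothetical design: 3 blocks, 3 varieties, 6 plots.
blocks    <- factor(c(1, 1, 2, 2, 3, 3))
varieties <- factor(c("A", "B", "A", "C", "B", "C"))
n <- length(blocks)
b <- nlevels(blocks)
v <- nlevels(varieties)

Xb <- matrix(0, n, b)
Xv <- matrix(0, n, v)
ib <- cbind(1:n, blocks)      # factors contribute their integer codes
iv <- cbind(1:n, varieties)
Xb[ib] <- 1                   # one indicator per plot and block
Xv[iv] <- 1                   # one indicator per plot and variety
X <- cbind(Xb, Xv)

N  <- crossprod(Xb, Xv)       # incidence matrix via indicator matrices
N2 <- table(blocks, varieties)
```

Each row of X contains exactly two ones, and the counts in N agree with those in N2.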


5.4 The array() function

As well as giving a vector structure a dim attribute, arrays can be constructed from vectors by the array function, which has the form

> Z <- array(data_vector, dim_vector)

For example, if the vector h contains 24 or fewer numbers, then the command

> Z <- array(h, dim=c(3,4,2))

would use h to set up a 3 by 4 by 2 array in Z. If the size of h is exactly 24 the result is the same as

> Z <- h ; dim(Z) <- c(3,4,2)

However if h is shorter than 24, its values are recycled from the beginning again to make it up to size 24 (see Mixed vector and array arithmetic. The recycling rule) but dim(h) <- c(3,4,2) would signal an error about mismatching length. As an extreme but common example

> Z <- array(0, c(3,4,2))

makes Z an array of all zeros.
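
The difference between array(), which recycles a short data vector, and a direct dim assignment, which does not, can be demonstrated as follows. The vector h here is a hypothetical short vector, and tryCatch() is used only to capture the error message instead of stopping.

```r
h <- 1:6                               # shorter than 24

Z <- array(h, dim = c(3, 4, 2))        # h is recycled four times
Z[1:24]

# Assigning dim directly to a vector of the wrong length fails.
err <- tryCatch({
  dim(h) <- c(3, 4, 2)
  NULL
}, error = function(e) conditionMessage(e))
err

# The extreme case: a single value recycled to fill the array.
Z0 <- array(0, c(3, 4, 2))
```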

At this point dim(Z) stands for the dimension vector c(3,4,2), and Z[1:24] stands for the data vector as it was in h, and Z[] with an empty subscript or Z with no subscript stands for the entire array as an array.

Arrays may be used in arithmetic expressions and the result is an array formed by element-by-element operations on the data vector. The dim attributes of operands generally need to be the same, and this becomes the dimension vector of the result. So if A, B and C are all similar arrays, then

> D <- 2*A*B + C + 1

makes D a similar array with its data vector being the result of the given element-by-element operations. However the precise rule concerning mixed array and vector calculations has to be considered a little more carefully.


5.4.1 Mixed vector and array arithmetic. The recycling rule

The precise rule affecting element by element mixed calculations with vectors and arrays is somewhat quirky and hard to find in the references. From experience we have found the following to be a reliable guide.

  • The expression is scanned from left to right.
  • Any short vector operands are extended by recycling their values until they match the size of any other operands.
  • As long as short vectors and arrays only are encountered, the arrays must all have the same dim attribute or an error results.
  • Any vector operand longer than a matrix or array operand generates an error.
  • If array structures are present and no error or coercion to vector has been precipitated, the result is an array structure with the common dim attribute of its array operands.
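
Some of the rules above can be probed with small examples. In this sketch the matrices and vectors are invented, and tryCatch() merely records whether an error occurred.

```r
A <- matrix(1:6, nrow = 2)    # 2 by 3
v <- c(10, 20)                # short vector

# The short vector is recycled along the data vector of A;
# the result keeps the dim attribute of A.
R <- A + v
R

# A vector longer than the array operand gives an error.
long_err <- tryCatch(A + 1:12, error = function(e) "error")

# Arrays with different dim attributes give an error.
B <- matrix(1:6, nrow = 3)    # 3 by 2
dim_err <- tryCatch(A + B, error = function(e) "error")
```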

5.5 The outer product of two arrays

An important operation on arrays is the outer product. If a and b are two numeric arrays, their outer product is an array whose dimension vector is obtained by concatenating their two dimension vectors (order is important), and whose data vector is got by forming all possible products of elements of the data vector of a with those of b. The outer product is formed by the special operator %o%:

> ab <- a %o% b

An alternative is

> ab <- outer(a, b, "*")

The multiplication function can be replaced by an arbitrary function of two variables. For example if we wished to evaluate the function f(x; y) = cos(y)/(1 + x^2) over a regular grid of values with x- and y-coordinates defined by the R vectors x and y respectively, we could proceed as follows:

> f <- function(x, y) cos(y)/(1 + x^2)
> z <- outer(x, y, f)

In particular the outer product of two ordinary vectors is a doubly subscripted array (that is a matrix, of rank at most 1). Notice that the outer product operator is of course non-commutative. Defining your own R functions will be considered further in Writing your own functions.
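
With a concrete grid the result of outer() is easy to inspect. The grid values for x and y below are arbitrary choices for this sketch.

```r
x <- c(0, 1, 2)          # hypothetical x-coordinates
y <- c(0, pi)            # hypothetical y-coordinates

f <- function(x, y) cos(y)/(1 + x^2)
z <- outer(x, y, f)      # z[i, j] is f(x[i], y[j])
dim(z)                   # length(x) by length(y)
z[2, 1]                  # f(1, 0)

# %o% is the same as outer() with "*".
ab1 <- 1:3 %o% 1:2
ab2 <- outer(1:3, 1:2, "*")

# Non-commutative: swapping the operands transposes the result.
ba <- 1:2 %o% 1:3
```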

An example: Determinants of 2 by 2 single-digit matrices

As an artificial but cute example, consider the determinants of 2 by 2 matrices [a, b; c, d] where each entry is a non-negative integer in the range 0, 1, ..., 9, that is a digit.

The problem is to find the determinants, ad - bc, of all possible matrices of this form and represent the frequency with which each value occurs as a high density plot. This amounts to finding the probability distribution of the determinant if each digit is chosen independently and uniformly at random.

A neat way of doing this uses the outer() function twice:

> d <- outer(0:9, 0:9)
> fr <- table(outer(d, d, "-"))
> plot(fr, xlab="Determinant", ylab="Frequency")

Notice that plot() here uses a histogram-like plot method, because it “sees” that fr is of class "table". The “obvious” way of doing this problem with for loops, to be discussed in Grouping, loops and conditional execution, is so inefficient as to be impractical.

It is also perhaps surprising that about 1 in 20 such matrices is singular.
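
The proportion of singular matrices can be checked directly from the same construction. The sketch below keeps the array of determinants in a variable called dets (a name introduced here, not used elsewhere in this manual) and counts the zeros.

```r
d <- outer(0:9, 0:9)            # all products of two digits
dets <- outer(d, d, "-")        # all 10^4 values of ad - bc
fr <- table(dets)               # frequency of each determinant value

range(dets)                     # -81 81
n_singular <- sum(dets == 0)    # 570 matrices have determinant zero
n_singular / length(dets)       # 0.057
```

So the exact proportion is 570 in 10000, a little more than 1 in 20.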


5.6 Generalized transpose of an array

The function aperm(a, perm) may be used to permute an array, a. The argument perm must be a permutation of the integers {1, ..., k}, where k is the number of subscripts in a. The result of the function is an array of the same size as a but with old dimension given by perm[j] becoming the new j-th dimension. The easiest way to think of this operation is as a generalization of transposition for matrices. Indeed if A is a matrix, (that is, a doubly subscripted array) then B given by

> B <- aperm(A, c(2,1))

is just the transpose of A. For this special case a simpler function t() is available, so we could have used B <- t(A).
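
A small sketch may help to fix the meaning of perm. The three-way array a below is invented for illustration; with perm = c(3, 1, 2) the old third dimension becomes the new first one, so that b[k, i, j] is a[i, j, k].

```r
a <- array(1:24, dim = c(2, 3, 4))
b <- aperm(a, c(3, 1, 2))     # new dim 1 is old dim 3, and so on

dim(b)                        # 4 2 3
b[4, 2, 3] == a[2, 3, 4]      # TRUE: b[k, i, j] is a[i, j, k]

A <- matrix(1:6, nrow = 2)
all(aperm(A, c(2, 1)) == t(A))   # TRUE: the matrix special case
```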


5.7 Matrix facilities

As noted above, a matrix is just an array with two subscripts. However it is such an important special case it needs a separate discussion. R contains many operators and functions that are available only for matrices. For example t(X) is the matrix transpose function, as noted above. The functions nrow(A) and ncol(A) give the number of rows and columns in the matrix A respectively.


5.7.1 Matrix multiplication

The operator %*% is used for matrix multiplication. An n by 1 or 1 by n matrix may of course be used as an n-vector if in the context such is appropriate.

Conversely, vectors which occur in matrix multiplication expressions are automatically promoted either to row or column vectors, whichever is multiplicatively coherent, if possible, (although this is not always unambiguously possible, as we see later).

If, for example, A and B are square matrices of the same size, then

> A * B

is the matrix of element by element products and

> A %*% B

is the matrix product. If x is a vector, then

> x %*% A %*% x

is a quadratic form.16

The function crossprod() forms “cross products”, meaning that crossprod(X, y) is the same as t(X) %*% y but the operation is more efficient. If the second argument to crossprod() is omitted it is taken to be the same as the first.

The meaning of diag() depends on its argument. diag(v), where v is a vector, gives a diagonal matrix with elements of the vector as the diagonal entries. On the other hand diag(M), where M is a matrix, gives the vector of main diagonal entries of M. This is the same convention as that used for diag() in MATLAB. Also, somewhat confusingly, if k is a single numeric value then diag(k) is the k by k identity matrix!
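
The operations of this subsection can be tried on a small example. The matrices A and B and the vector x below are hypothetical values chosen so that the results are easy to check by hand.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # a symmetric 2 by 2 matrix
B <- diag(2)                           # the 2 by 2 identity matrix
x <- c(1, 2)

A * B                   # element by element: off-diagonal entries become 0
A %*% B                 # matrix product: equals A
q <- x %*% A %*% x      # quadratic form, returned as a 1 by 1 matrix
drop(q)                 # 18

# crossprod(A, x) agrees with t(A) %*% x
all.equal(crossprod(A, x), t(A) %*% x)

diag(x)                 # vector argument: 2 by 2 diagonal matrix
diag(A)                 # matrix argument: the diagonal entries 2 and 3
diag(3)                 # single number: the 3 by 3 identity matrix
```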


5.7.2 Linear equations and inversion

Solving linear equations is the inverse of matrix multiplication. When after

> b <- A %*% x

only A and b are given, the vector x is the solution of that linear equation system. In R,

> solve(A,b)

solves the system, returning x (up to some accuracy loss). Note that in linear algebra, formally x = A^{-1} %*% b where A^{-1} denotes the inverse of A, which can be computed by

solve(A)

but rarely is needed. Numerically, it is both inefficient and potentially unstable to compute x <- solve(A) %*% b instead of solve(A,b).

The quadratic form x %*% A^{-1} %*% x, which is used in multivariate computations, should be computed by something like17 x %*% solve(A,x), rather than computing the inverse of A.
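
As a sketch of the advice in this subsection, the following builds a small system with a known solution x0 (a name introduced here for illustration) and recovers it both ways. For so small and well-conditioned a matrix the two answers agree; the preference for solve(A, b) matters for larger or nearly singular systems.

```r
A <- matrix(c(4, 1, 1, 3), nrow = 2)
x0 <- c(1, 2)             # the solution we hope to recover
b <- A %*% x0

x <- solve(A, b)          # preferred: solves A x = b directly
xinv <- solve(A) %*% b    # same answer here, but forms the inverse first
all.equal(x, xinv)        # TRUE for this well-conditioned example

# Quadratic form x0' A^{-1} x0 without forming the inverse
qf <- x0 %*% solve(A, x0)
drop(qf)                  # 15/11
```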


5.7.3 Eigenvalues and eigenvectors

The function eigen(Sm) calculates the eigenvalues and eigenvectors of a symmetric matrix Sm. The result of this function is a list of two components named values and vectors. The assignment

> ev <- eigen(Sm)

will assign this list to ev. Then ev$val is the vector of eigenvalues of Sm and ev$vec is the matrix of corresponding eigenvectors. Had we only needed the eigenvalues we could have used the assignment:

> evals <- eigen(Sm)$values

evals now holds the vector of eigenvalues and the second component is discarded. If the expression

> eigen(Sm)

is used by itself as a command the two components are printed, with their names. For large matrices it is better to avoid computing the eigenvectors if they are not needed by using the expression

> evals <- eigen(Sm, only.values = TRUE)$values
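
For a concrete check, here is a sketch with a hypothetical 2 by 2 symmetric matrix whose eigenvalues are 3 and 1. It also verifies the defining relation between the matrix, the eigenvectors and the eigenvalues.

```r
Sm <- matrix(c(2, 1, 1, 2), nrow = 2)    # hypothetical symmetric matrix
ev <- eigen(Sm)
ev$values                                # 3 1, in decreasing order

# Each column v of ev$vectors satisfies Sm %*% v = lambda * v.
all.equal(Sm %*% ev$vectors, ev$vectors %*% diag(ev$values))

evals <- eigen(Sm, only.values = TRUE)$values
```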

5.7.4 Singular value decomposition and determinants

The function svd(M) takes an arbitrary matrix argument, M, and calculates the singular value decomposition of M. This consists of a matrix of orthonormal columns U with the same column space as M, a second matrix of orthonormal columns V whose column space is the row space of M and a diagonal matrix of positive entries D such that M = U %*% D %*% t(V). D is actually returned as a vector of the diagonal elements. The result of svd(M) is actually a list of three components named d, u and v, with evident meanings.

If M is in fact square, then it is not hard to see that

> absdetM <- prod(svd(M)$d)

calculates the absolute value of the determinant of M. If this calculation were needed often with a variety of matrices it could be defined as an R function

> absdet <- function(M) prod(svd(M)$d)

after which we could use absdet() as just another R function. As a further trivial but potentially useful example, you might like to consider writing a function, say tr(), to calculate the trace of a square matrix. [Hint: You will not need to use an explicit loop. Look again at the diag() function.]

R has a builtin function det to calculate a determinant, including the sign, and another, determinant, to give the sign and modulus (optionally on log scale).
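
The functions of this subsection can be compared on a small matrix. The matrix M below is a hypothetical example with determinant -2, and the definition of tr() is only one possible answer to the exercise suggested above.

```r
M <- matrix(c(1, 3, 2, 4), nrow = 2)    # rows (1, 2) and (3, 4)

s <- svd(M)
all.equal(M, s$u %*% diag(s$d) %*% t(s$v))   # TRUE: M = U D V'

absdet <- function(M) prod(svd(M)$d)
absdet(M)                               # 2, up to rounding error
det(M)                                  # -2, sign included
determinant(M, logarithm = FALSE)       # modulus 2 and sign -1

# One possible answer to the exercise: trace as the sum of the diagonal.
tr <- function(M) sum(diag(M))
tr(M)                                   # 5
```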


5.7.5 Least squares fitting and the QR decomposition

The function lsfit() returns a list giving results of a least squares fitting procedure. An assignment such as

> ans <- lsfit(X, y)

gives the results of a least squares fit where y is the vector of observations and X is the design matrix. See the help facility for more details, and also for the follow-up function ls.diag() for, among other things, regression diagnostics. Note that a grand mean term is automatically included and need not be included explicitly as a column of X. Further note that you almost always will prefer using lm(.) (see Linear models) to lsfit() for regression modelling.

Another closely related function is qr() and its allies. Consider the following assignments

> Xplus <- qr(X)
> b <- qr.coef(Xplus, y)
> fit <- qr.fitted(Xplus, y)
> res <- qr.resid(Xplus, y)

These compute the orthogonal projection of y onto the range of X in fit, the projection onto the orthogonal complement in res and the coefficient vector for the projection in b, that is, b is essentially the result of the MATLAB ‘backslash’ operator.

It is not assumed that X has full column rank. Redundancies will be discovered and removed as they are found.

This alternative is the older, low-level way to perform least squares calculations. Although still useful in some contexts, it would now generally be replaced by the statistical models features, as will be discussed in Statistical models in R.
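
The two routes can be compared on simulated data. Everything in the sketch below (the seed, the sample size and the true coefficients 1, 2 and -1) is an assumption made for illustration. Note that lsfit() adds the intercept itself, whereas qr() works on exactly the matrix it is given, so a column of 1s is bound on first.

```r
set.seed(1)                              # simulated, hypothetical data
n <- 20
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
y <- 1 + 2 * X[, "x1"] - X[, "x2"] + rnorm(n, sd = 0.1)

ans <- lsfit(X, y)                       # intercept added automatically
ans$coefficients                         # roughly 1, 2 and -1

# qr() adds nothing, so the column of 1s is supplied explicitly.
Xplus <- qr(cbind(1, X))
b <- qr.coef(Xplus, y)
fit <- qr.fitted(Xplus, y)
res <- qr.resid(Xplus, y)

all.equal(unname(b), unname(ans$coefficients))   # TRUE
all.equal(unname(fit + res), unname(y))          # TRUE: y = fit + res
```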


5.8 Forming partitioned matrices, cbind() and rbind()

As we have already seen informally, matrices can be built up from other vectors and matrices by the functions cbind() and rbind(). Roughly cbind() forms matrices by binding together matrices horizontally, or column-wise, and rbind() vertically, or row-wise.

In the assignment

> X <- cbind(arg_1, arg_2, arg_3, ...)

the arguments to cbind() must be either vectors of any length, or matrices with the same column size, that is the same number of rows. The result is a matrix with the concatenated arguments arg_1, arg_2, … forming the columns.

If some of the arguments to cbind() are vectors they may be shorter than the column size of any matrices present, in which case they are cyclically extended to match the matrix column size (or the length of the longest vector if no matrices are given).

The function rbind() does the corresponding operation for rows. In this case any vector arguments, possibly cyclically extended, are of course taken as row vectors.

Suppose X1 and X2 have the same number of rows. To combine these by columns into a matrix X, together with an initial column of 1s we can use

> X <- cbind(1, X1, X2)

The result of rbind() or cbind() always has matrix status. Hence cbind(x) and rbind(x) are possibly the simplest ways explicitly to allow the vector x to be treated as a column or row matrix respectively.
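
A short sketch of these rules, with X1, X2 and x given small hypothetical values:

```r
X1 <- matrix(1:6, nrow = 3)
X2 <- matrix(7:9, nrow = 3)
X <- cbind(1, X1, X2)     # the single 1 is recycled down a column of 1s
dim(X)                    # 3 4

x <- 1:3
dim(cbind(x))             # 3 1: a column matrix
dim(rbind(x))             # 1 3: a row matrix

cbind(1:2, 1:4)           # with no matrices, 1:2 is extended to length 4
```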


5.9 The concatenation function, c(), with arrays

It should be noted that whereas cbind() and rbind() are concatenation functions that respect dim attributes, the basic c() function does not, but rather clears numeric objects of all dim and dimnames attributes. This is occasionally useful in its own right.

The official way to coerce an array back to a simple vector object is to use as.vector():

> vec <- as.vector(X)

However a similar result can be achieved by using c() with just one argument, simply for this side-effect:

> vec <- c(X)

There are slight differences between the two, but ultimately the choice between them is largely a matter of style (with the former being preferable).
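
One of the slight differences can be seen with a named vector. In this sketch (the objects X and nv are made up for illustration) both functions drop the dim attribute of a matrix, but only as.vector() also drops names.

```r
X <- matrix(1:6, nrow = 2)
identical(as.vector(X), c(X))    # TRUE: both drop the dim attribute

# One of the slight differences: names survive c() but not as.vector().
nv <- c(a = 1, b = 2)
names(c(nv))                     # "a" "b"
names(as.vector(nv))             # NULL
```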


5.10 Frequency tables from factors

Recall that a factor defines a partition into groups. Similarly a pair of factors defines a two way cross classification, and so on. The function table() allows frequency tables to be calculated from equal length factors. If there are k factor arguments, the result is a k-way array of frequencies.

Suppose, for example, that statef is a factor giving the state code for each entry in a data vector. The assignment

> statefr <- table(statef)

gives in statefr a table of frequencies of each state in the sample. The frequencies are ordered and labelled by the levels attribute of the factor. This simple case is equivalent to, but more convenient than,

> statefr <- tapply(statef, statef, length)

Further suppose that incomef is a factor giving a suitably defined “income class” for each entry in the data vector, for example with the cut() function:

> factor(cut(incomes, breaks = 35+10*(0:7))) -> incomef

Then to calculate a two-way table of frequencies:

> table(incomef,statef)
         statef
incomef   act nsw nt qld sa tas vic wa
  (35,45]   1   1  0   1  0   0   1  0
  (45,55]   1   1  1   1  2   0   1  3
  (55,65]   0   3  1   3  2   2   2  1
  (65,75]   0   1  0   0  0   0   1  0

Extension to higher-way frequency tables is immediate.
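
The table printed above comes from the full data used earlier in this manual. For a version that can be run on its own, the sketch below uses a much smaller, hypothetical sample of six people.

```r
# Hypothetical data: six people, their state and income.
statef  <- factor(c("nsw", "vic", "nsw", "qld", "vic", "nsw"))
incomes <- c(60, 49, 40, 61, 64, 58)
incomef <- factor(cut(incomes, breaks = 35 + 10 * (0:7)))

statefr <- table(statef)                 # nsw 3, qld 1, vic 2
tapply(statef, statef, length)           # the same counts

tab <- table(incomef, statef)            # two-way table
tab["(55,65]", "nsw"]                    # 2
```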


6 Lists and data frames


6.1 Lists

An R list is an object consisting of an ordered collection of objects known as its components.

There is no particular need for the components to be of the same mode or type, and, for example, a list could consist of a numeric vector, a logical value, a matrix, a complex vector, a character array, a function, and so on. Here is a simple example of how to make a list:

> Lst <- list(name="Fred", wife="Mary", no.children=3,
              child.ages=c(4,7,9))

Components are always numbered and may always be referred to as such. Thus if Lst is the name of a list with four components, these may be individually referred to as Lst[[1]], Lst[[2]], Lst[[3]] and Lst[[4]]. If, further, Lst[[4]] is a vector subscripted array then Lst[[4]][1] is its first entry.

If Lst is a list, then the function length(Lst) gives the number of (top level) components it has.

Components of lists may also be named, and in this case the component may be referred to either by giving the component name as a character string in place of the number in double square brackets, or, more conveniently, by giving an expression of the form

> name$component_name

for the same thing.

This is a very useful convention as it makes it easier to get the right component if you forget the number.

So in the simple example given above:

Lst$name is the same as Lst[[1]] and is the string "Fred",

Lst$wife is the same as Lst[[2]] and is the string "Mary",

Lst$child.ages[1] is the same as Lst[[4]][1] and is the number 4.

Additionally, one can also use the names of the list components in double square brackets, i.e., Lst[["name"]] is the same as Lst$name. This is especially useful when the name of the component to be extracted is stored in another variable, as in

> x <- "name"; Lst[[x]]

It is very important to distinguish Lst[[1]] from Lst[1]. ‘[[]]’ is the operator used to select a single element, whereas ‘[]’ is a general subscripting operator. Thus the former is the first object in the list Lst, and if it is a named list the name is not included. The latter is a sublist of the list Lst consisting of the first entry only. If it is a named list, the names are transferred to the sublist.

The names of components may be abbreviated down to the minimum number of letters needed to identify them uniquely. Thus Lst$coefficients may be minimally specified as Lst$coe and Lst$covariance as Lst$cov.

The vector of names is in fact simply an attribute of the list like any other and may be handled as such. Other structures besides lists may, of course, similarly be given a names attribute also.
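
The distinctions drawn in this section can be tried on the list from the start of it. The sketch below uses only that list; the abbreviation chi is chosen here because it identifies child.ages uniquely.

```r
Lst <- list(name = "Fred", wife = "Mary", no.children = 3,
            child.ages = c(4, 7, 9))

length(Lst)            # 4
Lst[[1]]               # "Fred", a character string
Lst[1]                 # a list of length 1 that keeps the name
class(Lst[[1]])        # "character"
class(Lst[1])          # "list"

x <- "child.ages"
Lst[[x]][1]            # 4, same as Lst$child.ages[1]

Lst$chi                # 4 7 9: $ accepts a unique abbreviation
Lst[["chi"]]           # NULL: [[ ]] matches names exactly by default
```

The last two lines suggest that abbreviation is best kept for interactive use with $; in code it is safer to spell names out in full.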


6.2 Constructing and modifying lists

New lists may be formed from existing objects by the function list(). An assignment of the form

> Lst <- list(name_1=object_1, ..., name_m=object_m)

sets up a list Lst of m components using object_1, …, object_m for the components and giving them names as specified by the argument names, (which can be freely chosen). If these names are omitted, the components are numbered only. The components used to form the list are copied when forming the new list and the originals are not affected.

Lists, like any subscripted object, can be extended by specifying additional components. For example

> Lst[5] <- list(matrix=Mat)

6.2.1 Concatenating lists

When the concatenation function c() is given list arguments, the result is an object of mode list also, whose components are those of the argument lists joined together in sequence.

> list.ABC <- c(list.A, list.B, list.C)

Recall that with vector objects as arguments the concatenation function similarly joined together all arguments into a single vector structure. In this case all other attributes, such as dim attributes, are discarded.
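
A brief sketch of extending and concatenating lists follows. The objects Mat, list.A and list.B are given hypothetical contents, and the component name notes is invented for the example. The last lines illustrate that the list holds a copy of Mat, so later changes to Mat do not reach it.

```r
Mat <- diag(2)                        # hypothetical matrix to store
Lst <- list(name = "Fred", no.children = 3)

Lst[3] <- list(matrix = Mat)          # extend by position
Lst[["notes"]] <- "added by name"     # extend by name
length(Lst)                           # 4

list.A <- list(a = 1, b = "x")
list.B <- list(c = TRUE)
list.AB <- c(list.A, list.B)
names(list.AB)                        # "a" "b" "c"

Mat[1, 1] <- 99                       # original changed afterwards
Lst[[3]][1, 1]                        # still 1: the list holds a copy
```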


6.3 Data frames

A data frame is a list with class "data.frame". There are restrictions on lists that may be made into data frames, namely

  • The components must be vectors (numeric, character, or logical), factors, numeric matrices, lists, or other data frames.
  • Matrices, lists, and data frames provide as many variables to the new data frame as they have columns, elements, or variables, respectively.
  • Vector structures appearing as variables of the data frame must all have the same length, and matrix structures must all have the same number of rows.

A data frame may for many purposes be regarded as a matrix with columns possibly of differing modes and attributes. It may be displayed in matrix form, and its rows and columns extracted using matrix indexing conventions.


6.3.1 Making data frames

Objects satisfying the restrictions placed on the columns (components) of a data frame may be used to form one using the function data.frame:

> accountants <- data.frame(home=statef, loot=incomes, shot=incomef)

A list whose components conform to the restrictions of a data frame may be coerced into a data frame using the function as.data.frame().
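
To make both constructions runnable on their own, the sketch below uses three hypothetical accountants in place of the full data from earlier chapters, and a small list invented for the as.data.frame() case.

```r
statef  <- factor(c("nsw", "vic", "qld"))
incomes <- c(60, 49, 40)
incomef <- cut(incomes, breaks = 35 + 10 * (0:7))

accountants <- data.frame(home = statef, loot = incomes, shot = incomef)
dim(accountants)              # 3 3
accountants[2, "loot"]        # 49, matrix-style indexing
accountants$home              # list-style access to a column

# A conforming list coerced with as.data.frame()
df2 <- as.data.frame(list(u = 1:3, v = c("a", "b", "c")))
class(df2)                    # "data.frame"
```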

The simplest way to construct a data frame from scratch is to use the read.table() function to read an entire data frame from an external file. This is discussed further in Reading data from files.


6.3.2 attach() and detach()

The $ notation, such as accountants$home, for list components is not always very convenient. A useful facility would be somehow to make the components of a list or data frame temporarily visible as variables under their component name, without the need to quote the list name explicitly each time.

The attach() function takes a ‘database’ such as a list or data frame as its argument. Thus suppose lentils is a data frame with three variables lentils$u, lentils$v, lentils$w. The attach

> attach(lentils)

places the data frame in the search path at position 2, and provided there are no variables u, v or w in position 1, u, v and w are available as variables from the data frame in their own right. At this point an assignment such as

> u <- v+w

does not replace the component u of the data frame, but rather masks it with another variable u in the workspace at position 1 on the search path. To make a permanent change to the data frame itself, the simplest way is to resort once again to the $ notation:

> lentils$u <- v+w

However the new value of component u is not visible until the data frame is detached and attached again.

To detach a data frame, use the function

> detach()

More precisely, this statement detaches from the search path the entity currently at position 2. Thus in the present context the variables u, v and w would be no longer visible, except under the list notation as lentils$u and so on. Entities at positions greater than 2 on the search path can be detached by giving their number to detach, but it is much safer to always use a name, for example by detach(lentils) or detach("lentils").

Note: In R lists and data frames can only be attached at position 2 or above, and what is attached is a copy of the original object. You can alter the attached values via assign, but the original list or data frame is unchanged.
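
The whole sequence can be followed in a short session. The contents of lentils below are hypothetical; the point of the sketch is to show which copy of u each step touches.

```r
lentils <- data.frame(u = 1:3, v = 4:6, w = 7:9)

attach(lentils)               # a copy goes to position 2
u <- v + w                    # creates u in the workspace
lentils$u                     # 1 2 3: the data frame is untouched

lentils$u <- v + w            # changes the data frame itself
get("u", pos = 2)             # 1 2 3: the attached copy is stale

detach("lentils")
rm(u)                         # clean up the masking variable
lentils$u                     # 11 13 15
```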


6.3.3 Working with data frames

A useful convention that allows you to work with many different problems comfortably together in the same workspace is

  • gather together all variables for any well defined and separate problem in a data frame under a suitably informative name;
  • when working with a problem attach the appropriate data frame at position 2, and use the workspace at level 1 for operational quantities and temporary variables;
  • before leaving a problem, add any variables you wish to keep for future reference to the data frame using the $ form of assignment, and then detach();
  • finally remove all unwanted variables from the workspace and keep it as clean of left-over temporary variables as possible.

In this way it is quite simple to work with many problems in the same directory, all of which have variables named x, y and z, for example.


6.3.4 Attaching arbitrary lists

attach() is a generic function that allows not only directories and data frames to be attached to the search path, but other classes of object as well. In particular any object of mode "list" may be attached in the same way:

> attach(any.old.list)

Anything that has been attached can be detached by detach, by position number or, preferably, by name.


6.3.5 Managing the search path

The function search shows the current search path and so is a very useful way to keep track of which data frames and lists (and packages) have been attached and detached. Initially it gives

> search()
[1] ".GlobalEnv"   "Autoloads"    "package:base"

where .GlobalEnv is the workspace.18

After lentils is attached we have

> search()
[1] ".GlobalEnv"   "lentils"      "Autoloads"    "package:base"
> ls(2)
[1] "u" "v" "w"

and as we see ls (or objects) can be used to examine the contents of any position on the search path.

Finally, we detach the data frame and confirm it has been removed from the search path.

> detach("lentils")
> search()
[1] ".GlobalEnv"   "Autoloads"    "package:base"

7 Reading data from files

Large data objects will usually be read as values from external files rather than entered during an R session at the keyboard. R input facilities are simple and their requirements are fairly strict and even rather inflexible.

There is a clear presumption by the designers of R that you will be able to modify your input files using other tools, such as file editors or Perl19, to fit in with the requirements of R. Generally this is very simple.

If variables are to be held mainly in data frames, as we strongly suggest they should be, an entire data frame can be read directly with the read.table() function. There is also a more primitive input function, scan(), that can be called directly.

For more details on importing data into R and also exporting data, see the R Data Import/Export manual.


7.1 The read.table() function

To read an entire data frame directly, the external file will normally have a special form.

  • The first line of the file should have a name for each variable in the data frame.
  • Each additional line of the file has as its first item a row label and the values for each variable.

If the file has one fewer item in its first line than in its second, this arrangement is presumed to be in force. So the first few lines of a file to be read as a data frame might look as follows.

Input file form with names and row labels:

     Price    Floor     Area   Rooms     Age  Cent.heat
01   52.00    111.0      830     5       6.2      no
02   54.75    128.0      710     5       7.5      no
03   57.50    101.0     1000     5       4.2      no
04   57.50    131.0      690     6       8.8      no
05   59.75     93.0      900     5       1.9     yes
...

By default numeric items (except row labels) are read as numeric variables and non-numeric variables, such as Cent.heat in the example, as character variables. This can be changed if necessary.

The function read.table() can then be used to read the data frame directly

> HousePrice <- read.table("houses.data")

Often you will want to omit including the row labels directly and use the default labels. In this case the file may omit the row label column as in the following.

Input file form without row labels:

Price    Floor     Area   Rooms     Age  Cent.heat
52.00    111.0      830     5       6.2      no
54.75    128.0      710     5       7.5      no
57.50    101.0     1000     5       4.2      no
57.50    131.0      690     6       8.8      no
59.75     93.0      900     5       1.9     yes
...

The data frame may then be read as

> HousePrice <- read.table("houses.data", header=TRUE)

where the header=TRUE option specifies that the first line is a line of headings, and hence, by implication from the form of the file, that no explicit row labels are given.
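
Both file forms can be tried without a real houses.data. The sketch below writes abridged copies of the two layouts shown above to temporary files (the file handles f1 and f2 are names introduced here) and reads them back.

```r
# A temporary file stands in for "houses.data".
f1 <- tempfile()
writeLines(c("     Price  Floor  Area  Rooms  Age  Cent.heat",
             "01   52.00  111.0   830      5  6.2         no",
             "02   54.75  128.0   710      5  7.5         no",
             "03   57.50  101.0  1000      5  4.2         no"), f1)
HousePrice <- read.table(f1)      # header detected: line 1 is one item short
dim(HousePrice)                   # 3 6
rownames(HousePrice)              # labels taken from the first column

# The same data without row labels needs header = TRUE.
f2 <- tempfile()
writeLines(c("Price  Floor  Area  Rooms  Age  Cent.heat",
             "52.00  111.0   830      5  6.2         no",
             "54.75  128.0   710      5  7.5         no"), f2)
HousePrice2 <- read.table(f2, header = TRUE)
names(HousePrice2)
```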


7.2 The scan() function

Suppose the data vectors are of equal length and are to be read in parallel. Further suppose that there are three vectors, the first of mode character and the remaining two of mode numeric, and the file is input.dat. The first step is to use scan() to read in the three vectors as a list, as follows

> inp <- scan("input.dat", list("",0,0))

The second argument is a dummy list structure that establishes the mode of the three vectors to be read. The result, held in inp, is a list whose components are the three vectors read in. To separate the data items into three separate vectors, use assignments like

> label <- inp[[1]]; x <- inp[[2]]; y <- inp[[3]]

More conveniently, the dummy list can have named components, in which case the names can be used to access the vectors read in. For example

> inp <- scan("input.dat", list(id="", x=0, y=0))

If you wish to access the variables separately they may either be re-assigned to variables in the working frame:

> label <- inp$id; x <- inp$x; y <- inp$y

or the list may be attached at position 2 of the search path (see Attaching arbitrary lists).

If the second argument is a single value and not a list, a single vector is read in, all components of which must be of the same mode as the dummy value.

> X <- matrix(scan("light.dat", 0), ncol=5, byrow=TRUE)
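
As a self-contained sketch of both uses of scan(), the following writes two small temporary files with invented contents in place of input.dat and light.dat, and reads them back.

```r
# Temporary files stand in for "input.dat" and "light.dat".
f1 <- tempfile()
writeLines(c("a 1 10", "b 2 20", "c 3 30"), f1)
inp <- scan(f1, list(id = "", x = 0, y = 0))
inp$id                     # "a" "b" "c"
inp$x + inp$y              # 11 22 33

f2 <- tempfile()
writeLines(c("1 2 3 4 5", "6 7 8 9 10"), f2)
X <- matrix(scan(f2, 0), ncol = 5, byrow = TRUE)
X[2, 1]                    # 6: each file line became a row
```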

There are more elaborate input facilities available and these are detailed in the manuals.


7.3 Accessing builtin datasets

Around 100 datasets are supplied with R (in package datasets), and others are available in packages (including the recommended packages supplied with R). To see the list of datasets currently available use

data()

All the datasets supplied with R are available directly by name. However, many packages still use the obsolete convention in which data was also used to load datasets into R, for example

data(infert)

and this can still be used with the standard packages (as in this example). In most cases this will load an R object of the same name. However, in a few cases it loads several objects, so see the on-line help for the object to see what to expect.

7.3.1 Loading data from other R packages

To access data from a particular package, use the package argument, for example

data(package="rpart")
data(Puromycin, package="datasets")

If a package has been attached by library, its datasets are automatically included in the search.
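As a small sketch of the calls above, the listing returned by data() can be inspected programmatically, and a dataset can be loaded into an environment of your choosing rather than the workspace. The names info and e are introduced here only for illustration.

```r
# data(package = ) returns an object whose 'results' component is a
# character matrix with one row per dataset.
info <- data(package = "datasets")
head(info$results[, c("Item", "Title")])

# Load one dataset into a separate environment via the envir argument.
e <- new.env()
data(Puromycin, package = "datasets", envir = e)
str(e$Puromycin)
```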

User-contributed packages can be a rich source of datasets.


7.4 Editing data

When invoked on a data frame or matrix, edit brings up a separate spreadsheet-like environment for editing. This is useful for making small changes once a data set has been read. The command

> xnew <- edit(xold)

will allow you to edit your data set xold, and on completion the changed object is assigned to xnew. If you want to alter the original dataset xold, the simplest way is to use fix(xold), which is equivalent to xold <- edit(xold).

Use

> xnew <- edit(data.frame())

to enter new data via the spreadsheet interface.


8 Probability distributions


8.1 R as a set of statistical tables

One convenient use of R is to provide a comprehensive set of statistical tables. Functions are provided to evaluate the cumulative distribution function P(X <= x), the probability density function and the quantile function (given q, the smallest x such that P(X <= x) >= q), and to simulate from the distribution.

Distribution         R name      additional arguments
beta                 beta        shape1, shape2, ncp
binomial             binom       size, prob
Cauchy               cauchy      location, scale
chi-squared          chisq       df, ncp
exponential          exp         rate
F                    f           df1, df2, ncp
gamma                gamma       shape, scale
geometric            geom        prob
hypergeometric       hyper       m, n, k
log-normal           lnorm       meanlog, sdlog
logistic             logis       location, scale
negative binomial    nbinom      size, prob
normal               norm        mean, sd
Poisson              pois        lambda
signed rank          signrank    n
Student’s t          t           df, ncp
uniform              unif        min, max
Weibull              weibull     shape, scale
Wilcoxon             wilcox      m, n

Prefix the name given here by ‘d’ for the density, ‘p’ for the CDF, ‘q’ for the quantile function and ‘r’ for simulation (random deviates). The first argument is x for dxxx, q for pxxx, p for qxxx and n for rxxx (except for rhyper, rsignrank and rwilcox, for which it is nn). The non-centrality parameter ncp is currently available in most but not all cases: see the on-line help for details.
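As a brief illustration of the naming scheme, the following sketch applies the four prefixes to the normal distribution and then shows additional arguments being passed for the binomial. The particular argument values are arbitrary choices of ours.

```r
# The four functions for the normal distribution ('norm' in the table).
dnorm(0)                        # density at 0
pnorm(1.96)                     # CDF: P(X <= 1.96)
qnorm(0.975)                    # quantile function, the inverse of the CDF
set.seed(1)                     # make the simulation reproducible
rnorm(3, mean = 10, sd = 2)     # three random deviates

# Additional arguments follow the first one, e.g. size and prob for binom.
p3  <- pbinom(3, size = 10, prob = 0.5)    # P(X <= 3) = 176/1024
med <- qbinom(0.5, size = 10, prob = 0.5)  # smallest x with P(X <= x) >= 0.5
c(p3 = p3, med = med)
```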

The pxxx and qxxx functions all have logical arguments lower.tail and log.p and the dxxx ones have log. This allows, e.g., getting the cumulative (or “integrated”) hazard function, H(t) = - log(1 - F(t)), by

 - pxxx(t, ..., lower.tail = FALSE, log.p = TRUE)

or more accurate log-likelihoods (by dxxx(..., log = TRUE)), directly.
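To see why the direct form matters, consider the exponential distribution, for which F(t) = 1 - exp(-rate * t) and hence H(t) = rate * t. The sketch below (with values of times and rate chosen by us) compares the direct computation with the naive one.

```r
times <- c(0.5, 1, 2, 50)
rate  <- 3

# Cumulative hazard via the log of the upper tail probability.
H <- -pexp(times, rate = rate, lower.tail = FALSE, log.p = TRUE)
H                 # for the exponential this equals rate * times

# The naive form loses all accuracy once F(t) rounds to 1.
H_naive <- -log(1 - pexp(times, rate = rate))
H_naive           # the last element is Inf
```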

In addition there are functions ptukey and qtukey for the distribution of the studentized range of samples from a normal distribution, and dmultinom and rmultinom for the multinomial distribution. Further distributions are available in contributed packages, notably SuppDists.

Here are some examples

> ## 2-tailed p-value for t distribution
> 2*pt(-2.43, df = 13)
> ## upper 1% point for an F(2, 7) distribution
> qf(0.01, 2, 7, lower.tail = FALSE)

See the on-line help on RNG for how random-number generation is done in R.


8.2 Examining the distribution of a set of data

Given a (univariate) set of data we can examine its distribution in a large number of ways. The simplest is to examine the numbers. Two slightly different summaries are given by summary and fivenum and a display of the numbers by stem (a “stem and leaf” plot).

> attach(faithful)
> summary(eruptions)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.600   2.163   4.000   3.488   4.454   5.100
> fivenum(eruptions)
[1] 1.6000 2.1585 4.0000 4.4585 5.1000
> stem(eruptions)

  The decimal point is 1 digit(s) to the left of the |

  16 | 070355555588
  18 | 000022233333335577777777888822335777888
  20 | 00002223378800035778
  22 | 0002335578023578
  24 | 00228
  26 | 23
  28 | 080
  30 | 7
  32 | 2337
  34 | 250077
  36 | 0000823577
  38 | 2333335582225577
  40 | 0000003357788888002233555577778
  42 | 03335555778800233333555577778
  44 | 02222335557780000000023333357778888
  46 | 0000233357700000023578
  48 | 00000022335800333
  50 | 0370

A stem-and-leaf plot is like a histogram, and R has a function hist to plot histograms.

> hist(eruptions)
## make the bins smaller, make a plot of density
> hist(eruptions, seq(1.6, 5.2, 0.2), prob=TRUE)
> lines(density(eruptions, bw=0.1))
> rug(eruptions) # show the actual data points

More elegant density plots can be made by density, and we added a line produced by density in this example. The bandwidth bw was chosen by trial-and-error as the default gives too much smoothing (it usually does for “interesting” densities). (Better automated methods of bandwidth choice are available, and in this example bw = "SJ" gives a good result.)
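The effect of the bandwidth choice can be checked directly. The following sketch compares the default bandwidth with the one selected by bw = "SJ" and overlays both estimates; the object names d_default and d_sj are ours.

```r
x <- faithful$eruptions

d_default <- density(x)             # default bandwidth rule
d_sj      <- density(x, bw = "SJ")  # Sheather-Jones bandwidth

# Compare the bandwidths actually used.
c(default = d_default$bw, SJ = d_sj$bw)

# Overlay both estimates on the histogram.
hist(x, seq(1.6, 5.2, 0.2), prob = TRUE)
lines(d_default, lty = 2)   # dashed: default, smoother
lines(d_sj)                 # solid: SJ
```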

[Figure: histogram of eruptions with the density estimate and rug overlaid]

We can plot the empirical cumulative distribution function by using the function ecdf.

> plot(ecdf(eruptions), do.points=FALSE, verticals=TRUE)

This distribution is obviously far from any standard distribution. How about the right-hand mode, say eruptions of longer than 3 minutes? Let us fit a normal distribution and overlay the fitted CDF.

> long <- eruptions[eruptions > 3]
> plot(ecdf(long), do.points=FALSE, verticals=TRUE)
> x <- seq(3, 5.4, 0.01)
> lines(x, pnorm(x, mean=mean(long), sd=sqrt(var(long))), lty=3)
[Figure: empirical CDF of long with the fitted normal CDF overlaid]

Quantile-quantile (Q-Q) plots can help us examine this more carefully.

par(pty="s")       # arrange for a square figure region
qqnorm(long); qqline(long)

which shows a reasonable fit but a shorter right tail than one would expect from a normal distribution. Let us compare this with some simulated data from a t distribution

[Figure: normal Q-Q plot of long]
x <- rt(250, df = 5)
qqnorm(x); qqline(x)

which will usually (if it is a random sample) show longer tails than expected for a normal. We can make a Q-Q plot against the generating distribution by

qqplot(qt(ppoints(250), df = 5), x, xlab = "Q-Q plot for t dsn")
qqline(x)

Finally, we might want a more formal test of agreement with normality (or not). R provides the Shapiro-Wilk test

> shapiro.test(long)

         Shapiro-Wilk normality test

data:  long
W = 0.9793, p-value = 0.01052

and the Kolmogorov-Smirnov test

> ks.test(long, "pnorm", mean = mean(long), sd = sqrt(var(long)))

         One-sample Kolmogorov-Smirnov test

data:  long
D = 0.0661, p-value = 0.4284
alternative hypothesis: two.sided

(Note that the distribution theory is not valid here as we have estimated the parameters of the normal distribution from the same sample.)
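One way to allow for the estimated parameters is to simulate the null distribution of the statistic, re-estimating the mean and standard deviation for every simulated sample. This is a technique we are adding for illustration (in the spirit of the Lilliefors test), not something ks.test does itself; the helper ks_fitted, the seed and the number of replicates (500) are our own choices.

```r
long <- faithful$eruptions[faithful$eruptions > 3]
n <- length(long)

# KS distance between a sample and the normal fitted to that same sample.
# Warnings about ties in the observed data are suppressed here.
ks_fitted <- function(x) {
  d <- suppressWarnings(
    ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$statistic)
  unname(d)
}

d_obs <- ks_fitted(long)

# Null distribution of the distance when the parameters are re-estimated
# for each simulated normal sample of the same size.
set.seed(42)
d_sim <- replicate(500, ks_fitted(rnorm(n)))

# Simulated p-value: share of simulated distances at least as large.
p_sim <- mean(d_sim >= d_obs)
c(D = d_obs, p.simulated = p_sim)
```

Because the statistic is unchanged by shifts and rescaling of the data, simulating from the standard normal is sufficient.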


8.3 One- and two-sample tests

So far we have compared a single sample to a normal distribution. A much more common operation is to compare aspects of two samples. Note that in R, all “classical” tests including the ones used below are in package stats which is normally loaded.

Consider the following sets of data on the latent heat of the fusion of ice (cal/gm) from Rice (1995, p.490)

Method A: 79.98 80.04 80.02 80.04 80.03 80.03 80.04 79.97
          80.05 80.03 80.02 80.00 80.02
Method B: 80.02 79.94 79.98 79.97 79.97 80.03 79.95 79.97

Boxplots provide a simple graphical comparison of the two samples.

A <- scan()
79.98 80.04 80.02 80.04 80.03 80.03 80.04 79.97
80.05 80.03 80.02 80.00 80.02

B <- scan()
80.02 79.94 79.98 79.97 79.97 80.03 79.95 79.97

boxplot(A, B)

which indicates that the first group tends to give higher results than the second.

[Figure: boxplots of the ice data for methods A and B]

To test for the equality of the means of the two samples, we can use an unpaired t-test by

> t.test(A, B)

         Welch Two Sample t-test

data:  A and B
t = 3.2499, df = 12.027, p-value = 0.00694
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.01385526 0.07018320
sample estimates:
mean of x mean of y
 80.02077  79.97875

which does indicate a significant difference, assuming normality. By default the R function does not assume equality of variances in the two samples. We can use the F test to test for equality in the variances, provided that the two samples are from normal populations.
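The non-integer degrees of freedom in the output above arise because the unequal-variance (Welch) form of the test is used. As a sketch, the statistic and the Welch-Satterthwaite degrees of freedom can be reproduced by hand; the names vA, vB, t_welch and df_welch are ours.

```r
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97,
       80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

# Squared standard errors of the two sample means.
vA <- var(A) / length(A)
vB <- var(B) / length(B)

# Welch statistic and Welch-Satterthwaite degrees of freedom.
t_welch  <- (mean(A) - mean(B)) / sqrt(vA + vB)
df_welch <- (vA + vB)^2 /
  (vA^2 / (length(A) - 1) + vB^2 / (length(B) - 1))
c(t = t_welch, df = df_welch)

# Compare with the values stored in the t.test() result.
tt <- t.test(A, B)
all.equal(unname(tt$statistic), t_welch)
all.equal(unname(tt$parameter), df_welch)
```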

> var.test(A, B)

         F test to compare two variances

data:  A and B
F = 0.5837, num df = 12, denom df =  7, p-value = 0.3938
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.1251097 2.1052687
sample estimates:
ratio of variances
         0.5837405

which shows no evidence of a significant difference, and so we can use the classical t-test that assumes equality of the variances.

> t.test(A, B, var.equal=TRUE)

         Two Sample t-test

data:  A and B
t = 3.4722, df = 19, p-value = 0.002551
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.01669058 0.06734788
sample estimates:
mean of x mean of y
 80.02077  79.97875

All these tests assume normality of the two samples. The two-sample Wilcoxon (or Mann-Whitney) test only assumes a common continuous distribution under the null hypothesis.

> wilcox.test(A, B)

         Wilcoxon rank sum test with continuity correction

data:  A and B
W = 89, p-value = 0.007497
alternative hypothesis: true location shift is not equal to 0

Warning message:
Cannot compute exact p-value with ties in: wilcox.test(A, B)

Note the warning: there are several ties in each sample, which suggests strongly that these data are from a discrete distribution (probably due to rounding).

There are several ways to compare graphically the two samples. We have already seen a pair of boxplots. The following

> plot(ecdf(A), do.points=FALSE, verticals=TRUE, xlim=range(A, B))
> plot(ecdf(B), do.points=FALSE, verticals=TRUE, add=TRUE)

will show the two empirical CDFs, and qqplot will perform a Q-Q plot of the two samples. The Kolmogorov-Smirnov test is of the maximal vertical distance between the two ecdfs, assuming a common continuous distribution:

> ks.test(A, B)

         Two-sample Kolmogorov-Smirnov test

data:  A and B
D = 0.5962, p-value = 0.05919
alternative hypothesis: two-sided

Warning message:
cannot compute correct p-values with ties in: ks.test(A, B)

9 Grouping, loops and conditional execution


9.1 Grouped expressions

R is an expression language in the sense that its only command type is a function or expression which returns a result.

Even an assignment is an expression whose result is the value assigned, and it may be used wherever any expression may be used; in particular multiple assignments are possible.

Commands may be grouped together in braces, {expr_1; ...; expr_m}, in which case the value of the group is the result of the last expression in the group evaluated. Since such a group is also an expression it may, for example, be itself included in parentheses and used as part of an even larger expression, and so on.
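A few lines are enough to see these rules at work. The variable names in this sketch are arbitrary.

```r
# An assignment is an expression, so assignments can be chained.
a <- b <- 0

# The value of a braced group is the value of its last expression.
total <- { u <- 2; v <- 3; u * v }
total

# A group can itself sit inside a larger expression.
bigger <- 10 + { w <- 4; w^2 }
bigger
```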


9.2 Control statements


9.2.1 Conditional execution: if statements

The language has available a conditional construction of the form

> if (expr_1) expr_2 else expr_3

where expr_1 must evaluate to a single logical value and the result of the entire expression is then evident.

The “short-circuit” operators && and || are often used as part of the condition in an if statement. Whereas & and | apply element-wise to vectors, && and || apply to vectors of length one, and only evaluate their second argument if necessary.

There is a vectorized version of the if/else construct, the ifelse function. This has the form ifelse(condition, a, b) and returns a vector of the same length as condition, with elements a[i] if condition[i] is true, otherwise b[i] (where a and b are recycled as necessary).
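The difference between the scalar and the element-wise forms is easiest to see side by side. The following sketch uses a small vector x of our own choosing.

```r
x <- c(-2, 0, 3)

# if needs a single logical value; && stops as soon as the answer is known,
# so x[1] would never be examined if x were empty.
if (length(x) > 0 && x[1] < 0) {
  first <- "negative"
} else {
  first <- "non-negative"
}
first

# & works element by element and returns a logical vector.
inside <- x > -3 & x < 1
inside

# ifelse picks element by element; the scalar 0 is recycled.
clipped <- ifelse(x < 0, 0, x)
clipped
```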


9.2.2 Repetitive execution: for loops, repeat and while

There is also a for loop construction which has the form

> for (name in expr_1) expr_2

where name is the loop variable. expr_1 is a vector expression, (often a sequence like 1:20), and expr_2 is often a grouped expression with its sub-expressions written in terms of the dummy name. expr_2 is repeatedly evaluated as name ranges through the values in the vector result of expr_1.

As an example, suppose ind is a vector of class indicators and we wish to produce separate plots of y versus x within classes. One possibility here is to use coplot(), which will produce an array of plots corresponding to each level of the factor. Another way to do this, now putting all plots on the one display, is as follows:

> xc <- split(x, ind)
> yc <- split(y, ind)
> for (i in 1:length(yc)) {
    plot(xc[[i]], yc[[i]])
    abline(lsfit(xc[[i]], yc[[i]]))
  }

(Note the function split() which produces a list of vectors obtained by splitting a larger vector according to the classes specified by a factor. This is a useful function, mostly used in connection with boxplots. See the help facility for further details.)

Warning: for() loops are used in R code much less often than in compiled languages. Code that takes a ‘whole object’ view is likely to be both clearer and faster in R.
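As a minimal illustration of the ‘whole object’ view, the sketch below computes the same result with an explicit loop and with vectorized arithmetic. The data are made up.

```r
x <- c(1.5, 2, 4, 8)

# Element-by-element loop.
sq_loop <- numeric(length(x))
for (i in seq_along(x)) {
  sq_loop[i] <- x[i]^2
}

# Whole-object version: arithmetic is already vectorized.
sq_vec <- x^2

identical(sq_loop, sq_vec)
```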

Other looping facilities include the

> repeat expr

statement and the

> while (condition) expr

statement.

The break statement can be used to terminate any loop, possibly abnormally. This is the only way to terminate repeat loops.

The next statement can be used to discontinue one particular cycle and skip to the “next”.
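The following sketch shows repeat with break and while with next in two small, made-up computations.

```r
# repeat runs until break is reached.
k <- 0
repeat {
  k <- k + 1
  if (2^k > 1000) break
}
k   # smallest k with 2^k > 1000

# while tests its condition before each cycle; next skips the rest of one.
total <- 0
i <- 0
while (i < 10) {
  i <- i + 1
  if (i %% 2 == 0) next   # skip even numbers
  total <- total + i
}
total   # sum of the odd numbers 1, 3, 5, 7, 9
```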

Control statements are most often used in connection with functions which are discussed in Writing your own functions, and where more examples will emerge.


10 Writing your own functions

As we have seen informally along the way, the R language allows the user to create objects of mode function. These are true R functions that are stored in a special internal form and may be used in further expressions and so on.

In the process, the language gains enormously in power, convenience and elegance, and learning to write useful functions is one of the main ways to make your use of R comfortable and productive.

It should be emphasized that most of the functions supplied as part of the R system, such as mean(), var(), postscript() and so on, are themselves written in R and thus do not differ materially from user written functions.

A function is defined by an assignment of the form

> name <- function(arg_1, arg_2, ...) expression

The expression is an R expression, (usually a grouped expression), that uses the arguments, arg_i, to calculate a value. The value of the expression is the value returned for the function.

A call to the function then usually takes the form name(expr_1, expr_2, …) and may occur anywhere a function call is legitimate.


10.1 Simple examples

As a first example, consider a function to calculate the two sample t-statistic, showing “all the steps”. This is an artificial example, of course, since there are other, simpler ways of achieving the same end.

The function is defined as follows:

> twosam <- function(y1, y2) {
    n1  <- length(y1); n2  <- length(y2)
    yb1 <- mean(y1);   yb2 <- mean(y2)
    s1  <- var(y1);    s2  <- var(y2)
    s <- ((n1-1)*s1 + (n2-1)*s2)/(n1+n2-2)
    tst <- (yb1 - yb2)/sqrt(s*(1/n1 + 1/n2))
    tst
  }

With this function defined, you could perform two sample t-tests using a call such as

> tstat <- twosam(data$male, data$female); tstat
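Since data$male and data$female are not defined in this manual, the function can be checked on the ice data from the section on one- and two-sample tests instead. The sketch below repeats the definition so that it runs on its own, and compares the result with the equal-variance t-test.

```r
twosam <- function(y1, y2) {
  n1  <- length(y1); n2  <- length(y2)
  yb1 <- mean(y1);   yb2 <- mean(y2)
  s1  <- var(y1);    s2  <- var(y2)
  s <- ((n1-1)*s1 + (n2-1)*s2)/(n1+n2-2)     # pooled variance
  tst <- (yb1 - yb2)/sqrt(s*(1/n1 + 1/n2))
  tst
}

A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97,
       80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

tstat <- twosam(A, B)
tstat

# The same statistic as reported by the classical t-test.
all.equal(tstat, unname(t.test(A, B, var.equal = TRUE)$statistic))
```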

As a second example, consider a function to emulate directly the MATLAB backslash command, which returns the coefficients of the orthogonal projection of the vector y onto the column space of the matrix, X. (This is ordinarily called the least squares estimate of the regression coefficients.) This would ordinarily be done with the qr() function; however this is sometimes a bit tricky to use directly and it pays to have a simple function such as the following to use it safely.

Thus given an n by 1 vector y and an n by p matrix X then X \ y is defined as (X’X)^{-}X’y, where (X’X)^{-} is a generalized inverse of X’X.

> bslash <- function(X, y) {
  X <- qr(X)
  qr.coef(X, y)
}

After this object is created it may be used in statements such as

> regcoeff <- bslash(Xmat, yvar)

and so on.  
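For a concrete check, the sketch below applies bslash() to made-up data in which yvar is an exact linear function of the columns of Xmat, so the coefficients are known in advance. The data and the comparison with lm.fit() are our additions.

```r
bslash <- function(X, y) {
  X <- qr(X)
  qr.coef(X, y)
}

# Made-up data: intercept 2 and slope 3, with no noise.
Xmat <- cbind(1, c(1, 2, 3, 4, 5))
yvar <- 2 + 3 * Xmat[, 2]

regcoeff <- bslash(Xmat, yvar)
regcoeff

# lm.fit() is handed the same design matrix for comparison.
lm.fit(Xmat, yvar)$coefficients
```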

The classical R function lsfit() does this job quite well, and more. It in turn uses the functions qr() and qr.coef() in the slightly counterintuitive way above to do this part of the calculation. Hence there is probably some value in having just this part isolated in a simple to use function if it is going to be in frequent use.

If so, we may wish to make it a matrix binary operator for even more convenient use.


10.2 Defining new binary operators

Had we given the bslash() function a different name, namely one of the form

%anything%

it could have been used as a binary operator in expressions rather than in function form. Suppose, for example, we choose ! for the internal character. The function definition would then start as

> "%!%" <- function(X, y) { ... }

(Note the use of quote marks.) The function could then be used as X %!% y. (The backslash symbol itself is not a convenient choice as it presents special problems in this context.)

The matrix multiplication operator, %*%, and the outer product matrix operator %o% are other examples of binary operators defined in this way.
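A complete definition might look like the following sketch, which fills in the body with the qr-based least squares calculation and uses made-up data. The second operator, %+%, is a hypothetical example of ours and is unrelated to matrices.

```r
# Backticks (or quote marks) are needed around the name when defining it.
`%!%` <- function(X, y) qr.coef(qr(X), y)

Xmat <- cbind(1, c(1, 2, 3, 4, 5))
yvar <- 2 + 3 * Xmat[, 2]
coefs <- Xmat %!% yvar
coefs

# A second user-defined operator: paste two strings together.
`%+%` <- function(a, b) paste0(a, b)
joined <- "foo" %+% "bar"
joined
```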


10.3 Named arguments and defaults

As first noted in Generating regular sequences, if arguments to called functions are given in the “name=object” form, they may be given in any order. Furthermore the argument sequence may begin in the unnamed, positional form, and specify named arguments after the positional arguments.

Thus if there is a function fun1 defined by

> fun1 <- function(data, data.frame, graph, limit) {
    [function body omitted]
  }

then the function may be invoked in several ways, for example

> ans <- fun1(d, df, TRUE, 20)
> ans <- fun1(d, df, graph=TRUE, limit=20)
> ans <- fun1(data=d, limit=20, graph=TRUE, data.frame=df)

are all equivalent.

In many cases arguments can be given commonly appropriate default values, in which case they may be omitted altogether from the call when the defaults are appropriate. For example, if fun1 were defined as

> fun1 <- function(data, data.frame, graph=TRUE, limit=20) { ... }

it could be called as

> ans <- fun1(d, df)

which is now equivalent to the three cases above, or as

> ans <- fun1(d, df, limit=10)

which changes one of the defaults.

It is important to note that defaults may be arbitrary expressions, even involving other arguments to the same function; they are not restricted to be constants as in our simple example here.
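As a sketch of a default that depends on another argument, consider the hypothetical function clip() below, whose upper limit defaults to half the largest value of x.

```r
# 'upper' defaults to an expression involving the argument 'x'.
clip <- function(x, lower = 0, upper = max(x) / 2) {
  pmin(pmax(x, lower), upper)
}

v <- c(-1, 2, 5, 10)
r1 <- clip(v)                         # defaults: lower = 0, upper = 5
r2 <- clip(v, upper = 3)              # override one default by name
r3 <- clip(upper = 3, x = v, lower = 1)   # named arguments in any order
r1; r2; r3
```

The default for upper is only evaluated when it is first needed inside the function, which is why it can refer to x.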


10.4 The ‘...’ argument

Another frequent requirement is to allow one function to pass on argument settings to another. For example many graphics functions use the function par() and functions like plot() allow the user to pass on graphical parameters to par() to control the graphical output. (See Permanent changes: The par() function, for more details on the par() function.) This can be done by including an extra argument, literally ‘...’, of the function, which may then be passed on. An outline example is given below.

fun1 <- function(data, data.frame, graph=TRUE, limit=20, ...) {
  [omitted statements]
  if (graph)
    par(pch="*", ...)
  [more omissions]
}

Less frequently, a function will need to refer to components of ‘...’. The expression list(...) evaluates all such arguments and returns them in a named list, while ..1, ..2, etc. evaluate them one at a time, with ‘..n’ returning the n-th unmatched argument.
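The different ways of referring to the unmatched arguments can be seen together in a small sketch. The function show_dots() is hypothetical and exists only for this illustration.

```r
show_dots <- function(x, ...) {
  args <- list(...)              # all unmatched arguments, evaluated
  list(n      = length(args),
       names  = names(args),
       first  = ..1,             # the first unmatched argument
       pasted = paste(x, ...))   # pass them all on to another function
}

res <- show_dots("a", "b", sep = "-")
res$n        # number of unmatched arguments
res$names    # unnamed arguments get the name ""
res$first
res$pasted
```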


10.5 Assignments within functions

Note that any ordinary assignments done within the function are local and temporary and are lost after exit from the function. Thus the assignment X <- qr(X) does not affect the value of the argument in the calling program.

To understand completely the rules governing the scope of R assignments the reader needs to be familiar with the notion of an evaluation frame. This is a somewhat advanced, though hardly difficult, topic and is not covered further here.

If global and permanent assignments are intended within a function, then either the ‘superassignment’ operator, <<- or the function assign() can be used. See the help document for details.
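The three kinds of assignment can be contrasted in a short sketch. All names here (f, make_counter, nxt, e, g, flag) are made up for the illustration.

```r
x <- 1
f <- function() {
  x <- 99      # ordinary assignment: creates a local x
  x
}
f()
x              # unchanged by the call

# <<- assigns in an enclosing environment where the name already exists.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # updates count in make_counter's frame
    count
  }
}
nxt <- make_counter()
nxt(); nxt()

# assign() names its target environment explicitly.
e <- new.env()
g <- function() assign("flag", TRUE, envir = e)
g()
get("flag", envir = e)
```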


10.6 More advanced examples


10.6.1 Efficiency factors in block designs

As a more complete, if a little pedestrian, example of a function, consider finding the efficiency factors for a block design. (Some aspects of this problem have already been discussed in Index matrices.)

A block design is defined by two factors, say blocks (b levels) and varieties (v levels). If R and K are the v by v and b by b replications and block size matrices, respectively, and N is the b by v incidence matrix, then the efficiency factors are defined as the eigenvalues of the matrix E = I_v - R^{-1/2}N’K^{-1}NR^{-1/2} = I_v - A’A, where A = K^{-1/2}NR^{-1/2}. One way to write the function is given below.

> bdeff <- function(blocks, varieties) {
    blocks <- as.factor(blocks)             # minor safety move
    b <- length(levels(blocks))
    varieties <- as.factor(varieties)       # minor safety move
    v <- length(levels(varieties))
    K <- as.vector(table(blocks))           # remove dim attr
    R <- as.vector(table(varieties))        # remove dim attr
    N <- table(blocks, varieties)
    A <- 1/sqrt(K) * N * rep(1/sqrt(R), rep(b, v))
    sv <- svd(A)
    list(eff=1 - sv$d^2, blockcv=sv$u, varietycv=sv$v)
}

It is numerically slightly better to work with the singular value decomposition on this occasion rather than the eigenvalue routines.

The result of the function is a list giving not only the efficiency factors as the first component, but also the block and variety canonical contrasts, since sometimes these give additional useful qualitative information.


10.6.2 Dropping all names in a printed array

For printing purposes with large matrices or arrays, it is often useful to print them in close block form without the array names or numbers. Removing the dimnames attribute will not achieve this effect, but rather the array must be given a dimnames attribute consisting of empty strings. For example to print a matrix, X

> temp <- X
> dimnames(temp) <- list(rep("", nrow(X)), rep("", ncol(X)))
> temp; rm(temp)

This can be much more conveniently done using a function, no.dimnames(), shown below, as a “wrap around” to achieve the same result. It also illustrates how some effective and useful user functions can be quite short.

no.dimnames <- function(a) {
  ## Remove all dimension names from an array for compact printing.
  d <- list()
  l <- 0
  for(i in dim(a)) {
    d[[l <- l + 1]] <- rep("", i)
  }
  dimnames(a) <- d
  a
}

With this function defined, an array may be printed in close format using

> no.dimnames(X)

This is particularly useful for large integer arrays, where patterns are the real interest rather than the values.


10.6.3 Recursive numerical integration

Functions may be recursive, and may themselves define functions within themselves. Note, however, that such functions, or indeed variables, are not inherited by called functions in higher evaluation frames as they would be if they were on the search path.

The example below shows a naive way of performing one-dimensional numerical integration. The integrand is evaluated at the end points of the range and in the middle.

If the one-panel trapezium rule answer is close enough to the two panel, then the latter is returned as the value. Otherwise the same process is recursively applied to each panel.

The result is an adaptive integration process that concentrates function evaluations in regions where the integrand is farthest from linear.

There is, however, a heavy overhead, and the function is only competitive with other algorithms when the integrand is both smooth and very difficult to evaluate.

The example is also given partly as a little puzzle in R programming.

area <- function(f, a, b, eps = 1.0e-06, lim = 10) {
  fun1 <- function(f, a, b, fa, fb, a0, eps, lim, fun) {
    ## function ‘fun1’ is only visible inside ‘area’
    d <- (a + b)/2
    h <- (b - a)/4
    fd <- f(d)
    a1 <- h * (fa + fd)
    a2 <- h * (fd + fb)
    if(abs(a0 - a1 - a2) < eps || lim == 0)
      return(a1 + a2)
    else {
      return(fun(f, a, d, fa, fd, a1, eps, lim - 1, fun) +
             fun(f, d, b, fd, fb, a2, eps, lim - 1, fun))
    }
  }
  fa <- f(a)
  fb <- f(b)
  a0 <- ((fa + fb) * (b - a))/2
  fun1(f, a, b, fa, fb, a0, eps, lim, fun1)
}

10.7 Scope

The discussion in this section is somewhat more technical than in other parts of this document. However, it details one of the major differences between S-PLUS and R.

The symbols which occur in the body of a function can be divided into three classes; formal parameters, local variables and free variables. The formal parameters of a function are those occurring in the argument list of the function.

Their values are determined by the process of binding the actual function arguments to the formal parameters. Local variables are those whose values are determined by the evaluation of expressions in the body of the functions. Variables which are not formal parameters or local variables are called free variables.

Free variables become local variables if they are assigned to. Consider the following function definition.

f <- function(x) {
  y <- 2*x
  print(x)
  print(y)
  print(z)
}

In this function, x is a formal parameter, y is a local variable and z is a free variable.

In R the free variable bindings are resolved by first looking in the environment in which the function was created. This is called lexical scope. First we define a function called cube.

cube <- function(n) {
  sq <- function() n*n
  n*sq()
}

The variable n in the function sq is not an argument to that function. Therefore it is a free variable and the scoping rules must be used to ascertain the value that is to be associated with it. Under static scope (S-PLUS) the value is that associated with a global variable named n. Under lexical scope (R) it is the parameter to the function cube since that is the active binding for the variable n at the time the function sq was defined. The difference between evaluation in R and evaluation in S-PLUS is that S-PLUS looks for a global variable called n while R first looks for a variable called n in the environment created when cube was invoked.

## first evaluation in S
S> cube(2)
Error in sq(): Object "n" not found
Dumped
S> n <- 3
S> cube(2)
[1] 18
## then the same function evaluated in R
R> cube(2)
[1] 8
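
The R half of this session can be reproduced directly. The sketch below defines a global n first, to make it visible that R ignores it in favour of the n bound when cube was invoked.

```r
cube <- function(n) {
  sq <- function() n * n   # n is free in sq
  n * sq()
}

n <- 3                     # a global n, as in the S session
result <- cube(2)          # R uses the n bound inside cube: 2 * (2 * 2)
print(result)
```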

Lexical scope can also be used to give functions mutable state. In the following example we show how R can be used to mimic a bank account. A functioning bank account needs to have a balance or total, a function for making withdrawals, a function for making deposits and a function for stating the current balance.

We achieve this by creating the three functions within open.account and then returning a list containing them. When open.account is invoked it takes a numerical argument total and returns a list containing the three functions. Because these functions are defined in an environment which contains total, they will have access to its value.

The special assignment operator, <<-, is used to change the value associated with total. This operator looks back in enclosing environments for an environment that contains the symbol total and, when it finds such an environment, replaces the value in that environment with the value of the right-hand side. If the global or top-level environment is reached without finding the symbol total then that variable is created and assigned to there. For most users <<- creates a global variable and assigns the value of the right-hand side to it (see footnote 22). Only when <<- has been used in a function that was returned as the value of another function will the special behavior described here occur.

open.account <- function(total) {
  list(
    deposit = function(amount) {
      if(amount <= 0)
        stop("Deposits must be positive!\n")
      total <<- total + amount
      cat(amount, "deposited.  Your balance is", total, "\n\n")
    },
    withdraw = function(amount) {
      if(amount > total)
        stop("You don't have that much money!\n")
      total <<- total - amount
      cat(amount, "withdrawn.  Your balance is", total, "\n\n")
    },
    balance = function() {
      cat("Your balance is", total, "\n\n")
    }
  )
}

ross <- open.account(100)
robert <- open.account(200)

ross$withdraw(30)
ross$balance()
robert$balance()

ross$deposit(50)
ross$balance()
ross$withdraw(500)
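
To see where the state lives, the following sketch uses a stripped-down, silent variant of the account generator. The names open_account_quiet, set_flag and flag_sketch are hypothetical and not part of the manual. It checks that each account keeps its own total, and that <<- falls through to the global environment when no enclosing binding exists.

```r
# Hypothetical, simplified variant of open.account with silent functions.
open_account_quiet <- function(total) {
  list(
    deposit = function(amount) {
      total <<- total + amount   # updates total in the enclosing environment
      invisible(total)
    },
    balance = function() total
  )
}

a1 <- open_account_quiet(100)
a2 <- open_account_quiet(200)
a1$deposit(50)

bal1 <- a1$balance()             # only a1's total has changed
bal2 <- a2$balance()             # a2 has its own environment
stored <- get("total", envir = environment(a1$deposit))

# With no enclosing binding, <<- creates the variable in the global environment.
set_flag <- function() flag_sketch <<- TRUE
set_flag()
made_global <- exists("flag_sketch", envir = globalenv(), inherits = FALSE)
```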

10.8 Customizing the environment

Users can customize their environment in several different ways. There is a site initialization file and every directory can have its own special initialization file. Finally, the special functions .First and .Last can be used.

The location of the site initialization file is taken from the value of the R_PROFILE environment variable. If that variable is unset, the file Rprofile.site in the R home subdirectory etc is used. This file should contain the commands that you want to execute every time R is started under your system. A second, personal, profile file named .Rprofile (see footnote 23) can be placed in any directory. If R is invoked in that directory then that file will be sourced. This file gives individual users control over their workspace and allows for different startup procedures in different working directories. If no .Rprofile file is found in the startup directory, then R looks for a .Rprofile file in the user’s home directory and uses that (if it exists). If the environment variable R_PROFILE_USER is set, the file it points to is used instead of the .Rprofile files.
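
The lookup just described can be approximated from within a running session. This is only a sketch of the rules as stated in the text: it uses the current working directory as a stand-in for the startup directory and ignores command-line options that suppress profiles. The variable names are illustrative.

```r
# Site profile: R_PROFILE if set, otherwise etc/Rprofile.site under R's home.
site_profile <- Sys.getenv("R_PROFILE",
                           unset = file.path(R.home("etc"), "Rprofile.site"))

# User profile: R_PROFILE_USER if set, otherwise the first .Rprofile found
# in the current directory and then in the home directory.
user_override <- Sys.getenv("R_PROFILE_USER", unset = NA)
candidates <- c(file.path(getwd(), ".Rprofile"),
                file.path(path.expand("~"), ".Rprofile"))
user_profile <- if (!is.na(user_override)) {
  user_override
} else {
  found <- candidates[file.exists(candidates)]
  if (length(found) > 0) found[1] else NA_character_
}

cat("site profile:", site_profile, "\n")
cat("user profile:", user_profile, "\n")
```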

Any function named .First() in either of the two profile files or in the .RData image has a special status. It is automatically performed at the beginning of an R session and may be used to initialize the environment. For example, the definition in the example below alters the prompt to $ and sets up various other useful things that can then be taken for granted in the rest of the session.

Thus, the sequence in which files are executed is Rprofile.site, the user profile, .RData and then .First(). A definition in later files will mask definitions in earlier files.

> .First <- function() {
  options(prompt="$ ", continue="+\t")  # $ is the prompt
  options(digits=5, length=999)         # custom numbers and printout
  x11()                                 # for graphics
  par(pch = "+")                        # plotting character
  source(file.path(Sys.getenv("HOME"), "R", "mystuff.R"))
                                        # my personal functions
  library(MASS)                         # attach a package
}

Similarly a function .Last(), if defined, is (normally) executed at the very end of the session. An example is given below.

> .Last <- function() {
  graphics.off()                        # a small safety measure.
  cat(paste(date(),"\nAdios\n"))        # Is it time for lunch?
}

10.9 Classes, generic functions and object orientation

The class of an object determines how it will be treated by what are known as generic functions. Put the other way round, a generic function performs a task or action on its arguments specific to the class of the argument itself. If the argument lacks any class attribute, or has a class not catered for specifically by the generic function in question, there is always a default action provided.

An example makes things clearer. The class mechanism offers the user the facility of designing and writing generic functions for special purposes. Among the other generic functions are plot() for displaying objects graphically, summary() for summarizing analyses of various types, and anova() for comparing statistical models.

The number of generic functions that can treat a class in a specific way can be quite large. For example, the functions that can accommodate in some fashion objects of class "data.frame" include

[     [[<-    any    as.matrix
[<-   mean    plot   summary

A currently complete list can be got by using the methods() function:

> methods(class="data.frame")

Conversely the number of classes a generic function can handle can also be quite large. For example the plot() function has a default method and variants for objects of classes "data.frame", "density", "factor", and more. A complete list can be got again by using the methods() function:

> methods(plot)

For many generic functions the function body is quite short, for example

> coef
function (object, ...)
UseMethod("coef")

The presence of UseMethod indicates this is a generic function. To see what methods are available we can use methods()

> methods(coef)
[1] coef.aov*         coef.Arima*       coef.default*     coef.listof*
[5] coef.nls*         coef.summary.nls*

   Non-visible functions are asterisked

In this example there are six methods, none of which can be seen by typing its name. We can read these by either of

> getAnywhere("coef.aov")
A single object matching ‘coef.aov’ was found
It was found in the following places
  registered S3 method for coef from namespace stats
  namespace:stats
with value

function (object, ...)
{
    z <- object$coef
    z[!is.na(z)]
}

> getS3method("coef", "aov")
function (object, ...)
{
    z <- object$coef
    z[!is.na(z)]
}

A function named gen.cl will be invoked by the generic gen for class cl, so do not name functions in this style unless they are intended to be methods.
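
The gen.cl naming rule can be tried out with a small generic of one's own. In the sketch below the generic area and the classes "circle" and "square" are hypothetical names invented for illustration.

```r
# Hypothetical generic: dispatches on the class of its first argument.
area <- function(shape, ...) UseMethod("area")

area.circle <- function(shape, ...) pi * shape$r^2
area.square <- function(shape, ...) shape$side^2
area.default <- function(shape, ...) {
  stop("no area method for class ", class(shape)[1])
}

circ <- structure(list(r = 1), class = "circle")
sq   <- structure(list(side = 2), class = "square")

print(area(circ))   # dispatches to area.circle
print(area(sq))     # dispatches to area.square
```

Calling area() on an object of any other class falls through to area.default, which here signals an error.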

The reader is referred to the R Language Definition for a more complete discussion of this mechanism.


11 Statistical models in R

This section presumes the reader has some familiarity with statistical methodology, in particular with regression analysis and the analysis of variance.

Later we make some rather more ambitious presumptions, namely that something is known about generalized linear models and nonlinear regression.

The requirements for fitting statistical models are sufficiently well defined to make it possible to construct general tools that apply in a broad spectrum of problems.

R provides an interlocking suite of facilities that make fitting statistical models very simple. As we mention in the introduction, the basic output is minimal, and one needs to ask for the details by calling extractor functions.


11.1 Defining statistical models; formulae

The template for a statistical model is a linear regression model with independent, homoscedastic errors

y_i = sum_{j=0}^p beta_j x_{ij} + e_i,     i = 1, ..., n,

where the e_i are NID(0, sigma^2). In matrix terms this would be written

y = X  beta + e

where y is the response vector, X is the model matrix or design matrix and has columns x_0, x_1, ..., x_p, the determining variables. Very often x_0 will be a column of ones defining an intercept term.
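
The model matrix can be inspected with model.matrix(). The sketch below uses the cars data set that ships with R, an arbitrary choice for illustration, with dist in the role of y and speed as the single determining variable.

```r
# cars ships with R; dist plays the role of y and speed of x_1.
X <- model.matrix(dist ~ speed, data = cars)

print(head(X, 3))      # the first column is the column of ones, x_0
print(colnames(X))
```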

Examples

Before giving a formal specification, a few examples may usefully set the picture.

Suppose y, x, x0, x1, x2, … are numeric variables, X is a matrix and A, B, C, … are factors. The following formulae on the left side below specify statistical models as described on the right.

y ~ x
y ~ 1 + x

Both imply the same simple linear regression model of y on x. The first has an implicit intercept term, and the second an explicit one.

y ~ 0 + x
y ~ -1 + x
y ~ x - 1

Simple linear regression of y on x through the origin (that is, without an intercept term).

log(y) ~ x1 + x2

Multiple regression of the transformed variable, log(y), on x1 and x2 (with an implicit intercept term).

y ~ poly(x,2)
y ~ 1 + x + I(x^2)

Polynomial regression of y on x of degree 2. The first form uses orthogonal polynomials, and the second uses explicit powers, as basis.
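
That the two bases describe the same model can be checked numerically. The following sketch, again using the built-in cars data purely for illustration, fits both forms and compares fitted values; the coefficients differ because the bases differ.

```r
fit_orth <- lm(dist ~ poly(speed, 2), data = cars)          # orthogonal basis
fit_raw  <- lm(dist ~ 1 + speed + I(speed^2), data = cars)  # explicit powers

# Same column space, so the fitted values agree up to rounding error.
same_fit <- all.equal(unname(fitted(fit_orth)), unname(fitted(fit_raw)))
print(same_fit)

print(coef(fit_orth))
print(coef(fit_raw))
```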

y ~ X + poly(x,2)

Multiple regression of y with model matrix consisting of the matrix X as well as polynomial terms in x to degree 2.

y ~ A

Single classification analysis of variance model of y, with classes determined by A.

y ~ A + x

Single classification analysis of covariance model of y, with classes determined by A, and with covariate x.

y ~ A*B
y ~ A + B + A:B
y ~ B %in% A
y ~ A/B

Two factor non-additive model of y on A and B. The first two specify the same crossed classification and the second two specify the same nested classification. In abstract terms all four specify the same model subspace.

y ~ (A + B + C)^2
y ~ A*B*C - A:B:C

Three factor experiment but with a model containing main effects and two factor interactions only. Both formulae specify the same model.
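
How R expands these shorthand forms can be checked without any data, since terms() works on the formula symbolically. In this sketch term_labels is a hypothetical helper name.

```r
# Hypothetical helper: the expanded term labels of a formula.
term_labels <- function(formula) attr(terms(formula), "term.labels")

crossed_1 <- term_labels(y ~ A * B)
crossed_2 <- term_labels(y ~ A + B + A:B)
nested    <- term_labels(y ~ A / B)
two_way_1 <- term_labels(y ~ (A + B + C)^2)
two_way_2 <- term_labels(y ~ A * B * C - A:B:C)

print(crossed_1)
print(nested)
print(two_way_1)
```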

y ~ A * x
y ~ A/x
y ~ A/(1 + x) - 1

Separate simple linear regression models of y on x within the levels of A, with different codings. The last form produces explicit estimates of as many different intercepts and slopes as there are levels in A.

y ~ A*B + Error(C)

An experiment with two treatment factors, A and B, and error strata determined by factor C. For example a split plot experiment, with whole plots (and hence also subplots), determined by factor C.

The operator ~ is used to define a model formula in R. The form, for an ordinary linear model, is

response ~ op_1 term_1 op_2 term_2 op_3 term_3 ...

where

response

is a vector or matrix, (or expression evaluating to a vector or matrix) defining the response variable(s).

op_i

is an operator, either + or -, implying the inclusion or exclusion of a term in the model, (the first is optional).

term_i

is either

  • a vector or matrix expression, or 1,
  • a factor, or
  • a formula expression consisting of factors, vectors or matrices connected by formula operators.

In all cases each term defines a collection of columns either to be added to or removed from the model matrix. A 1 stands for an intercept column and is by default included in the model matrix unless explicitly removed.

The formula operators are similar in effect to the Wilkinson and Rogers notation used by such programs as Glim and Genstat. One inevitable change is that the operator ‘.’ becomes ‘:’ since the period is a valid name character in R.

The notation is summarized below (based on Chambers & Hastie, 1992, p.29):

Y ~ M

Y is modeled as M.

M_1 + M_2

Include M_1 and M_2.

M_1 - M_2

Include M_1 leaving out terms of M_2.

M_1 : M_2

The tensor product of M_1 and M_2. If both terms are factors, then the “subclasses” factor.

M_1 %in% M_2

Similar to M_1:M_2, but with a different coding.

M_1 * M_2

M_1 + M_2 + M_1:M_2.

M_1 / M_2

M_1 + M_2 %in% M_1.

M^n

All terms in M together with “interactions” up to order n.

I(M)

Insulate M. Inside M all operators have their normal arithmetic meaning, and that term appears in the model matrix.

Note that inside the parentheses that usually enclose function arguments all operators have their normal arithmetic meaning. The function I() is an identity function used to allow terms in model formulae to be defined using arithmetic operators.

Note particularly that the model formulae specify the columns of the model matrix, the specification of the parameters being implicit. This is not the case in other contexts, for example in specifying nonlinear models.


11.1.1 Contrasts

We need at least some idea how the model formulae specify the columns of the model matrix. This is easy if we have continuous variables, as each provides one column of the model matrix (and the intercept will provide a column of ones if included in the model).

What about a k-level factor A? The answer differs for unordered and ordered factors. For unordered factors k - 1 columns are generated for the indicators of the second, …, k-th levels of the factor. (Thus the implicit parameterization is to contrast the response at each level with that at the first.) For ordered factors the k - 1 columns are the orthogonal polynomials on 1, ..., k, omitting the constant term.

Although the answer is already complicated, it is not the whole story. First, if the intercept is omitted in a model that contains a factor term, the first such term is encoded into k columns giving the indicators for all the levels. Second, the whole behavior can be changed by the options setting for contrasts. The default setting in R is

options(contrasts = c("contr.treatment", "contr.poly"))

The main reason for mentioning this is that R and S have different defaults for unordered factors, S using Helmert contrasts. So if you need to compare your results to those of a textbook or paper which used S-PLUS, you will need to set

options(contrasts = c("contr.helmert", "contr.poly"))

This is a deliberate difference, as treatment contrasts (R’s default) are thought easier for newcomers to interpret.
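
The contrast matrices themselves are ordinary R objects and can be printed. The sketch below assumes R's default contrasts option is in force, and the three-level factor A is a made-up illustration. It also shows the column count changing when the intercept is removed.

```r
print(getOption("contrasts"))   # the current pair of defaults

print(contr.treatment(3))       # indicators for levels 2 and 3
print(contr.helmert(3))         # Helmert contrasts
print(contr.poly(3))            # orthogonal polynomials for ordered factors

A <- factor(c("a", "b", "c"))   # hypothetical three-level factor
with_int    <- colnames(model.matrix(~ A))      # intercept plus k - 1 columns
without_int <- colnames(model.matrix(~ A - 1))  # k indicator columns
print(with_int)
print(without_int)
```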

We have still not finished, as the contrast scheme to be used can be set for each term in the model using the functions contrasts and C.

We have not yet considered interaction terms: these generate the products of the columns introduced for their component terms.

Although the details are complicated, model formulae in R will normally generate the models that an expert statistician would expect, provided that marginality is preserved.

Fitting, for example, a model with an interaction but not the corresponding main effects will in general lead to surprising results, and is for experts only.


11.2 Linear models

The basic function for fitting ordinary multiple regression models is lm(), and a streamlined version of the call is as follows:

> fitted.model <- lm(formula, data = data.frame)

For example

> fm2 <- lm(y ~ x1 + x2, data = production)

would fit a multiple regression model of y on x1 and x2 (with implicit intercept term).

The important (but technically optional) parameter data = production specifies that any variables needed to construct the model should come first from the production data frame. This is the case regardless of whether data frame production has been attached on the search path or not.
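
The precedence of the data frame over the workspace can be demonstrated with a deliberately misleading workspace variable. The production data frame is not available here, so this sketch substitutes the built-in cars data.

```r
speed <- rep(0, nrow(cars))          # a misleading variable in the workspace
fm <- lm(dist ~ speed, data = cars)  # speed is taken from cars, not the workspace

# Had the all-zero workspace speed been used, the slope would be NA.
print(coef(fm))
```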


11.3 Generic functions for extracting model information

The value of lm() is a fitted model object; technically a list of results of class "lm". Information about the fitted model can then be displayed, extracted, plotted and so on by using generic functions that orient themselves to objects of class "lm". These include

add1    deviance   formula      predict  step
alias   drop1      kappa        print    summary
anova   effects    labels       proj     vcov
coef    family     plot         residuals

A brief description of the most commonly used ones is given below.

anova(object_1, object_2)

Compare a submodel with an outer model and produce an analysis of variance table.

coef(object)

Extract the regression coefficient (matrix).

Long form: coefficients(object).

deviance(object)

Residual sum of squares, weighted if appropriate.

formula(object)

Extract the model formula.

plot(object)

Produce four plots, showing residuals, fitted values and some diagnostics.

predict(object, newdata=data.frame)

The data frame supplied must have variables specified with the same labels as the original. The value is a vector or matrix of predicted values corresponding to the determining variable values in data.frame.

print(object)

Print a concise version of the object. Most often used implicitly.

residuals(object)

Extract the (matrix of) residuals, weighted as appropriate.

Short form: resid(object).

step(object)

Select a suitable model by adding or dropping terms and preserving hierarchies. The model with the smallest value of AIC (Akaike’s An Information Criterion) discovered in the stepwise search is returned.

summary(object)

Print a comprehensive summary of the results of the regression analysis.

vcov(object)

Returns the variance-covariance matrix of the main parameters of a fitted model object.
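
A short session exercising several of these extractors might look as follows. The model and the built-in cars data are arbitrary choices made for this sketch.

```r
fm <- lm(dist ~ speed, data = cars)

print(coef(fm))                    # regression coefficients
rss <- deviance(fm)                # residual sum of squares
rss_check <- sum(resid(fm)^2)      # the same quantity computed from the residuals
print(formula(fm))                 # the model formula
V <- vcov(fm)                      # variance-covariance matrix of the coefficients

# newdata must use the same variable name as the original fit.
pred <- predict(fm, newdata = data.frame(speed = c(10, 20)))
print(pred)

print(summary(fm))                 # comprehensive summary
```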


11.4 Analysis of variance and model comparison

The model fitting function aov(formula, data=data.frame) operates at the simplest level in a very similar way to the function lm(), and most of the generic functions listed in the table in Generic functions for extracting model information apply.

It should be noted that in addition aov() allows an analysis of models with multiple error strata such as split plot experiments, or balanced incomplete block designs with recovery of inter-block information. The model formula

response ~ mean.formula + Error(strata.formula)

specifies a multi-stratum experiment with error strata defined by the strata.formula. In the simplest case, strata.formula is simply a factor, when it defines a two strata experiment, namely between and within the levels of the factor.

For example, with all determining variables factors, a model formula such as that in:

> fm <- aov(yield ~ v + n*p*k + Error(farms/blocks), data=farm.data)

would typically be used to describe an experiment with mean model v + n*p*k and three error strata, namely “between farms”, “within farms, between blocks” and “within blocks”.
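
The farm.data frame is not supplied with R, so a runnable illustration has to use something else. As a sketch, the built-in npk data set (a factorial experiment in six blocks) gives the simplest multistratum case, with a single factor in Error() and hence a between-block and a within-block stratum.

```r
# npk ships with R: yield, block, and the treatment factors N, P and K.
fm_npk <- aov(yield ~ N * P * K + Error(block), data = npk)

print(names(fm_npk))     # one component per stratum
print(summary(fm_npk))   # a separate ANOVA table for each stratum
```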


11.4.1 ANOVA tables

Note also that the analysis of variance table (or tables) is for a sequence of fitted models. The sums of squares shown are the decreases in the residual sum of squares resulting from the inclusion of that term in the model at that place in the sequence. Hence only for orthogonal experiments will the order of inclusion be inconsequential.

For multistratum experiments the procedure is first to project the response onto the error strata, again in sequence, and to fit the mean model to each projection. For further details, see Chambers & Hastie (1992).

A more flexible alternative to the default full ANOVA table is to compare two or more models directly using the anova() function.

> anova(fitted.model.1, fitted.model.2, ...)

The display is then an ANOVA table showing the differences between the fitted models when fitted in sequence. The fitted models being compared would usually be a hierarchical sequence, of course.

This does not give different information to the default, but rather makes it easier to comprehend and control.
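
A direct comparison of two nested models might look like this. The models and the built-in cars data are illustrative choices, not taken from the manual.

```r
fm1 <- lm(dist ~ speed, data = cars)               # submodel
fm2 <- lm(dist ~ speed + I(speed^2), data = cars)  # adds a quadratic term

tab <- anova(fm1, fm2)   # one row per model, with a test of the added term
print(tab)
```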


11.5 Updating fitted models

The update() function is largely a convenience function that allows a model to be fitted that differs from one previously fitted usually by just a few additional or removed terms. Its form is

> new.model <- update(old.model, new.formula)

In the new.formula the special name consisting of a period, ‘.’, only, can be used to stand for “the corresponding part of the old model formula”. For example,

> fm05 <- lm(y ~ x1 + x2 + x3 + x4 + x5, data = production)
> fm6  <- update(fm05, . ~ . + x6)
> smf6 <- update(fm6, sqrt(.) ~ .)

would fit a five variate multiple regression with variables (presumably) from the data frame production, fit an additional model including a sixth regressor variable, and fit a variant on the model where the response had a square root transform applied.

Note especially that if the data= argument is specified on the original call to the model fitting function, this information is passed on through the fitted model object to update() and its allies.

The name ‘.’ can also be used in other contexts, but with slightly different meaning. For example

> fmfull <- lm(y ~ . , data = production)

would fit a model with response y and regressor variables all other variables in the data frame production.
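
Since the production data frame is not available, the same pattern is sketched here with the built-in mtcars data. The choice of variables is arbitrary.

```r
fm_a <- lm(mpg ~ wt, data = mtcars)
fm_b <- update(fm_a, . ~ . + hp)      # add a regressor
fm_c <- update(fm_b, sqrt(.) ~ .)     # transform the response

print(formula(fm_b))
print(formula(fm_c))

# "." on the right-hand side of lm(): all other columns of the data frame.
fm_all <- lm(mpg ~ ., data = mtcars)
print(names(coef(fm_all)))
```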

Other functions for exploring incremental sequences of models are add1(), drop1() and step(). The names of these give a good clue to their purpose, but for full details see the on-line help.


11.6 Generalized linear models

Generalized linear modeling is a development of linear models to accommodate both non-normal response distributions and transformations to linearity in a clean and straightforward way.

A generalized linear model may be described in terms of the following sequence of assumptions:

  • There is a response, y, of interest and stimulus variables x_1, x_2, …, whose values influence the distribution of the response.
  • The stimulus variables influence the distribution of y through a single linear function, only. This linear function is called the linear predictor, and is usually written

    eta = beta_1 x_1 + beta_2 x_2 + ... + beta_p x_p,

    hence x_i has no influence on the distribution of y if and only if beta_i is zero.

  • The distribution of y is of the form

    f_Y(y; mu, phi)
      = exp((A/phi) * (y lambda(mu) - gamma(lambda(mu))) + tau(y, phi))

    where phi is a scale parameter (possibly known), and is constant for all observations, A represents a prior weight, assumed known but possibly varying with the observations, and mu is the mean of y.

    So it is assumed that the distribution of y is determined by its mean and possibly a scale parameter as well.

  • The mean, mu, is a smooth invertible function of the linear predictor:

    mu = m(eta),    eta = m^{-1}(mu) = ell(mu)

    and this inverse function, ell(), is called the link function.

These assumptions are loose enough to encompass a wide class of models useful in statistical practice, but tight enough to allow the development of a unified methodology of estimation and inference, at least approximately.
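
The relationship between the mean and the linear predictor can be made concrete with make.link(), which returns the link function and its inverse as a pair. The logit link and the values of mu below are arbitrary choices for this sketch.

```r
logit <- make.link("logit")

mu   <- c(0.1, 0.5, 0.9)      # illustrative means
eta  <- logit$linkfun(mu)     # eta = ell(mu) = log(mu / (1 - mu))
back <- logit$linkinv(eta)    # mu = m(eta) recovers the means

print(eta)
print(back)
```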

The reader is referred to any of the current reference works on the subject for full details, such as McCullagh & Nelder (1989) or Dobson (1990).


11.6.1 Families

The class of generalized linear models handled by facilities supplied in R includes gaussian, binomial, poisson, inverse gaussian and gamma response distributions and also quasi-likelihood models where the response distribution is not explicitly specified. In the latter case the variance function must be specified as a function of the mean, but in other cases this function is implied by the response distribution.

Each response distribution admits a variety of link functions to connect the mean with the linear predictor. Those automatically available are shown in the following table:

Family name        Link functions
binomial           logit, probit, log, cloglog
gaussian           identity, log, inverse
Gamma              identity, inverse, log
inverse.gaussian   1/mu^2, identity, inverse, log
poisson            identity, log, sqrt
quasi              logit, probit, cloglog, identity, inverse, log, 1/mu^2, sqrt

The combination of a response distribution, a link function and various other pieces of information that are needed to carry out the modeling exercise is called the family of the generalized linear model.
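
A family is an ordinary R object whose components can be inspected. The following sketch looks at a few of them; the particular families and links are chosen only for illustration.

```r
fam <- binomial(link = "probit")
print(fam$family)              # the response distribution
print(fam$link)                # the chosen link

print(poisson()$link)          # default link for poisson
print(Gamma()$link)            # default link for Gamma

# The variance function implied by the poisson distribution: variance = mean.
pois_var <- poisson()$variance(c(1, 2, 4))
print(pois_var)

# For quasi, both the link and the variance function are stated explicitly.
print(quasi(link = "log", variance = "mu"))
```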


11.6.2 The glm() function

Since the distribution of the response depends on the stimulus variables through a single linear function only, the same mechanism as was used for linear models can still be used to specify the linear part of a generalized model. The family has to be specified in a different way.

The R function to fit a generalized linear model is glm() which uses the form

> fitted.model <- glm(formula, family=family.generator, data=data.frame)

The only new feature is the family.generator, which is the instrument by which the family is described. It is the name of a function that generates a list of functions and expressions that together define and control the model and estimation process.

Although this may seem a little complicated at first sight, its use is quite simple.

The names of the standard, supplied family generators are given under “Family Name” in the table in Families. Where there is a choice of links, the name of the link may also be supplied with the family name, in parentheses as a parameter. In the case of the quasi family, the variance function may also be specified in this way.

Some examples make the process clear.

The gaussian family

A call such as

> fm <- glm(y ~ x1 + x2, family = gaussian, data = sales)

achieves the same result as

> fm <- lm(y ~ x1+x2, data=sales)

but much less efficiently. Note that the gaussian family uses the identity link by default, so no parameter is needed here; it also accepts the log and inverse links. If a problem requires a gaussian family with some other nonstandard link, this can usually be achieved through the quasi family, as we shall see later.
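The equivalence of the two calls can be checked directly. The following is a minimal sketch; the sales data frame is simulated here purely for illustration and is an assumption, not data from this manual.

```r
# Hypothetical data standing in for the 'sales' data frame used above
set.seed(42)
sales <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
sales$y <- 1 + 2 * sales$x1 - 0.5 * sales$x2 + rnorm(30, sd = 0.3)

fm_glm <- glm(y ~ x1 + x2, family = gaussian, data = sales)
fm_lm  <- lm(y ~ x1 + x2, data = sales)

# The two sets of coefficients agree up to numerical tolerance
all.equal(coef(fm_glm), coef(fm_lm))
```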

The binomial family

Consider a small, artificial example, from Silvey (1970).
考虑一个小的,人为的例子,来自Silvey(1970)。

On the Aegean island of Kalythos the male inhabitants suffer from a congenital eye disease, the effects of which become more marked with increasing age. Samples of islander males of various ages were tested for blindness and the results recorded. The data is shown below:
在爱琴海的Kalythos岛上,男性居民患有先天性眼病,其影响随着年龄的增长而变得更加明显。对不同年龄段的岛民男性进行了失明测试,并记录了结果。数据如下所示:

Age:         20  35  45  55  70
No. tested:  50  50  50  50  50
No. blind:    6  17  26  37  44

The problem we consider is to fit both logistic and probit models to this data, and to estimate for each model the LD50, that is the age at which the chance of blindness for a male inhabitant is 50%.
我们考虑的问题是将logistic和probit模型拟合到该数据,并估计每个模型的LD 50,即男性居民失明的可能性为50%的年龄。

If y is the number of blind at age x and n the number tested, both models have the form y ~ B(n, F(beta_0 + beta_1 x)) where for the probit case, F(z) = Phi(z) is the standard normal distribution function, and in the logit case (the default), F(z) = e^z/(1+e^z).
如果y是x岁时的盲数,n是测试的盲数,则两个模型的形式为y ~ B(n,F(beta_0 + beta_1 x)),其中对于概率单位情况,F(z)= Phi(z)是标准正态分布函数,而在对数单位情况(默认值)下,F(z)= e^z/(1+e^z)。

In both cases the LD50 is LD50 = - beta_0/beta_1 that is, the point at which the argument of the distribution function is zero.
在这两种情况下,LD 50都是LD 50 = - beta_0/beta_1,即分布函数的自变量为零的点。

The first step is to set the data up as a data frame
第一步是将数据设置为数据框

> kalythos <- data.frame(x = c(20,35,45,55,70), n = rep(50,5),
                         y = c(6,17,26,37,44))

To fit a binomial model using glm() there are three possibilities for the response:
要使用 glm() 拟合二项式模型,响应有三种可能性:

  • If the response is a vector it is assumed to hold binary data, and so must be a 0/1 vector.
    如果响应是一个向量,则假设它保存的是二进制数据,因此必须是一个0/1向量。
  • If the response is a two-column matrix it is assumed that the first column holds the number of successes for the trial and the second holds the number of failures.
    如果响应是一个两列矩阵,则假设第一列为试验的成功次数,第二列为失败次数。
  • If the response is a factor, its first level is taken as failure (0) and all other levels as ‘success’ (1).
    如果响应是一个因子,则其第一个水平被视为失败(0),所有其他水平被视为“成功”(1)。

Here we need the second of these conventions, so we add a matrix to our data frame:
在这里,我们需要这些约定中的第二个,所以我们在数据框架中添加了一个矩阵:

> kalythos$Ymat <- cbind(kalythos$y, kalythos$n - kalythos$y)

To fit the models we use
以适应我们使用的模型

> fmp <- glm(Ymat ~ x, family = binomial(link=probit), data = kalythos)
> fml <- glm(Ymat ~ x, family = binomial, data = kalythos)

Since the logit link is the default the parameter may be omitted on the second call. To see the results of each fit we could use
由于logit链接是默认的,因此在第二次调用时可以省略该参数。为了查看每个拟合的结果,我们可以使用

> summary(fmp)
> summary(fml)

Both models fit (all too) well. To find the LD50 estimate we can use a simple function:
这两种模式都很适合。为了计算LD 50估计值,我们可以使用一个简单的函数:

> ld50 <- function(b) -b[1]/b[2]
> ldp <- ld50(coef(fmp)); ldl <- ld50(coef(fml)); c(ldp, ldl)

The actual estimates from this data are 43.663 years and 43.601 years respectively.
根据这些数据的实际估计值分别为43.663年和43.601年。
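The whole Kalythos analysis can be collected into one script. As a sanity check on the LD50 formula, this sketch also evaluates each fitted model at its own LD50, where the fitted probability should be one half. The names p_probit and p_logit are introduced here for illustration only.

```r
kalythos <- data.frame(x = c(20, 35, 45, 55, 70), n = rep(50, 5),
                       y = c(6, 17, 26, 37, 44))
# Two-column response: successes (blind), failures (not blind)
kalythos$Ymat <- cbind(kalythos$y, kalythos$n - kalythos$y)

fmp <- glm(Ymat ~ x, family = binomial(link = "probit"), data = kalythos)
fml <- glm(Ymat ~ x, family = binomial, data = kalythos)

ld50 <- function(b) unname(-b[1] / b[2])
ldp <- ld50(coef(fmp))
ldl <- ld50(coef(fml))
c(probit = ldp, logit = ldl)

# Fitted probability of blindness at each model's LD50
p_probit <- predict(fmp, newdata = data.frame(x = ldp), type = "response")
p_logit  <- predict(fml, newdata = data.frame(x = ldl), type = "response")
c(probit = unname(p_probit), logit = unname(p_logit))
```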

Poisson models

With the Poisson family the default link is the log, and in practice the major use of this family is to fit surrogate Poisson log-linear models to frequency data, whose actual distribution is often multinomial. This is a large and important subject we will not discuss further here.
使用Poisson系列,默认链接是 log ,实际上,该系列的主要用途是将替代Poisson对数线性模型拟合到频率数据,其实际分布通常是多项式。这是一个重大而重要的问题,我们在这里不再讨论。

It even forms a major part of the use of non-gaussian generalized models overall.
它甚至构成了非高斯广义模型整体使用的主要部分。

Occasionally genuinely Poisson data arises in practice and in the past it was often analyzed as gaussian data after either a log or a square-root transformation.
在实践中偶尔会出现真正的泊松数据,在过去,它通常被分析为高斯数据后,无论是对数或平方根变换。

As a graceful alternative to the latter, a Poisson generalized linear model may be fitted as in the following example:
作为后者的一个优雅的替代方案,泊松广义线性模型可以像下面的例子那样拟合:

> fmod <- glm(y ~ A + B + x, family = poisson(link=sqrt),
              data = worm.counts)
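Since worm.counts is not defined in this manual, the following sketch simulates a small stand-in data set with a single covariate x (the factors A and B are omitted for brevity). It illustrates that under the square-root link the fitted mean is the square of the linear predictor.

```r
# Hypothetical stand-in for 'worm.counts': counts whose square-root mean is linear in x
set.seed(1)
worm.counts <- data.frame(x = runif(40, 0, 5))
worm.counts$y <- rpois(40, lambda = (2 + 0.5 * worm.counts$x)^2)

fmod <- glm(y ~ x, family = poisson(link = "sqrt"), data = worm.counts)
coef(fmod)

# With the sqrt link, fitted mean = (linear predictor)^2
eta <- predict(fmod, type = "link")
all.equal(unname(fitted(fmod)), unname(eta^2))
```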

Quasi-likelihood models
准似然模型¶

For all families the variance of the response will depend on the mean and will have the scale parameter as a multiplier. The form of dependence of the variance on the mean is a characteristic of the response distribution; for example for the Poisson distribution Var(y) = mu.
对于所有族,响应的方差将取决于均值,并将尺度参数作为乘数。方差对均值的依赖形式是响应分布的一个特征;例如泊松分布Var(y)= mu。

For quasi-likelihood estimation and inference the precise response distribution is not specified, but rather only a link function and the form of the variance function as it depends on the mean.
对于拟似然估计和推断,没有指定精确的响应分布,而只是一个链接函数和方差函数的形式,因为它取决于均值。

Since quasi-likelihood estimation uses techniques formally identical to those for the gaussian distribution, this family incidentally provides a way of fitting gaussian models with non-standard link functions or variance functions.
由于拟似然估计使用与高斯分布相同的技术,这个家族提供了一种用非标准链接函数或方差函数拟合高斯模型的方法。

For example, consider fitting the non-linear regression y = theta_1 z_1 / (z_2 - theta_2) + e which may be written alternatively as y = 1 / (beta_1 x_1 + beta_2 x_2) + e where x_1 = z_2/z_1, x_2 = -1/z_1, beta_1 = 1/theta_1, and beta_2 = theta_2/theta_1.
例如,考虑拟合非线性回归y = theta_1 z_1 /(z_2 - theta_2)+ e,其可替代地写作y = 1 /(beta_1 x_1 + beta_2 x_2)+ e,其中x_1 = z_2/z_1,x_2 = -1/z_1,beta_1 = 1/theta_1,且beta_2 = theta_2/theta_1。

Supposing a suitable data frame to be set up we could fit this non-linear regression as
假设要建立一个合适的数据框架,我们可以将这个非线性回归拟合为

> nlfit <- glm(y ~ x1 + x2 - 1,
               family = quasi(link=inverse, variance=constant),
               data = biochem)

The reader is referred to the manual and the help document for further information, as needed.
读者可根据需要查阅手册和帮助文件以获取更多信息。
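To see the reparametrization at work, one can simulate data from the non-linear model with known parameters and recover them from the fitted coefficients. This is only a sketch: the biochem data frame, the true values theta = (2, 1) and the noise level are all assumptions made for illustration.

```r
# Hypothetical data generated from y = theta_1 z_1 / (z_2 - theta_2) + e
set.seed(1)
n     <- 50
z1    <- runif(n, 1, 5)
z2    <- runif(n, 3, 8)
theta <- c(2, 1)
biochem <- data.frame(
  y  = theta[1] * z1 / (z2 - theta[2]) + rnorm(n, sd = 0.05),
  x1 = z2 / z1,
  x2 = -1 / z1
)

nlfit <- glm(y ~ x1 + x2 - 1,
             family = quasi(link = "inverse", variance = "constant"),
             data = biochem)

# Back-transform: theta_1 = 1 / beta_1, theta_2 = beta_2 / beta_1
beta <- coef(nlfit)
theta_hat <- unname(c(1 / beta["x1"], beta["x2"] / beta["x1"]))
theta_hat
```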


11.7 Nonlinear least squares and maximum likelihood models
11.7非线性最小二乘和最大似然模型¶

Certain forms of nonlinear model can be fitted by Generalized Linear Models (glm()). But in the majority of cases we have to approach the nonlinear curve fitting problem as one of nonlinear optimization. R’s nonlinear optimization routines are optim(), nlm() and nlminb(). We seek the parameter values that minimize some index of lack-of-fit, and these routines do this by trying out various parameter values iteratively. Unlike linear regression, for example, there is no guarantee that the procedure will converge on satisfactory estimates.
某些形式的非线性模型可以用广义线性模型( glm() )拟合。但在大多数情况下,我们不得不接近的非线性曲线拟合问题作为一个非线性优化。R的非线性优化例程是 optim()nlm()nlminb() ,我们寻求使某些失拟指数最小化的参数值,并且他们通过迭代地尝试各种参数值来做到这一点。例如,与线性回归不同,不能保证该过程将收敛于令人满意的估计。

All the methods require initial guesses about what parameter values to try, and convergence may depend critically upon the quality of the starting values.
所有的方法都需要对尝试什么参数值进行初始猜测,并且收敛性可能严重依赖于初始值的质量。


11.7.1 Least squares
11.7.1最小二乘法¶

One way to fit a nonlinear model is by minimizing the sum of the squared errors (SSE) or residuals. This method makes sense if the observed errors could have plausibly arisen from a normal distribution.
拟合非线性模型的一种方法是最小化平方误差(SSE)或残差的总和。如果观测到的误差可能是由正态分布引起的,那么这种方法是有意义的。

Here is an example from Bates & Watts (1988), page 51. The data are:
这里有一个例子,来自Bates & Watts(1988),第51页。数据如下:

> x <- c(0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56,
         1.10, 1.10)
> y <- c(76, 47, 97, 107, 123, 139, 159, 152, 191, 201, 207, 200)

The fit criterion to be minimized is:
要最小化的拟合标准为:

> fn <- function(p) sum((y - (p[1] * x)/(p[2] + x))^2)

In order to do the fit we need initial estimates of the parameters. One way to find sensible starting values is to plot the data, guess some parameter values, and superimpose the model curve using those values.
为了进行拟合,我们需要对参数进行初始估计。找到合理的起始值的一种方法是绘制数据,猜测一些参数值,并使用这些值绘制模型曲线。

> plot(x, y)
> xfit <- seq(.02, 1.1, .05)
> yfit <- 200 * xfit/(0.1 + xfit)
> lines(spline(xfit, yfit))

We could do better, but these starting values of 200 and 0.1 seem adequate. Now do the fit:
我们可以做得更好,但这些200和0.1的起始值似乎足够了。现在做fit:

> out <- nlm(fn, p = c(200, 0.1), hessian = TRUE)

After the fitting, out$minimum is the SSE, and out$estimate are the least squares estimates of the parameters. To obtain the approximate standard errors (SE) of the estimates we do:
拟合后, out$minimum 是SSE, out$estimate 是参数的最小二乘估计值。为了获得估计值的近似标准误差(SE),我们这样做:

> sqrt(diag(2*out$minimum/(length(y) - 2) * solve(out$hessian)))

The 2 which is subtracted in the line above represents the number of parameters. A 95% confidence interval would be the parameter estimate +/- 1.96 SE. We can superimpose the least squares fit on a new plot:
在上面的行中减去的 2 表示参数的数量。95%置信区间为参数估计值+/- 1.96 SE。我们可以在一个新的图上使用最小二乘拟合:

> plot(x, y)
> xfit <- seq(.02, 1.1, .05)
> yfit <- 212.68384222 * xfit/(0.06412146 + xfit)
> lines(spline(xfit, yfit))
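The fitting steps above, from the criterion to the approximate confidence intervals, can be gathered into a single script. The names est, se and ci are introduced here for illustration.

```r
x <- c(0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56,
       1.10, 1.10)
y <- c(76, 47, 97, 107, 123, 139, 159, 152, 191, 201, 207, 200)

# Sum of squared errors for the model y = p[1] * x / (p[2] + x)
fn <- function(p) sum((y - (p[1] * x) / (p[2] + x))^2)

out <- nlm(fn, p = c(200, 0.1), hessian = TRUE)
est <- out$estimate

# Approximate standard errors; the 2 is the number of parameters
se <- sqrt(diag(2 * out$minimum / (length(y) - 2) * solve(out$hessian)))

# Approximate 95% confidence intervals
ci <- cbind(estimate = est, lower = est - 1.96 * se, upper = est + 1.96 * se)
ci
```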

The standard package stats provides much more extensive facilities for fitting non-linear models by least squares. The model we have just fitted is the Michaelis-Menten model, so we can use
标准的stats包提供了更广泛的工具,用于通过最小二乘法拟合非线性模型。我们刚刚拟合的模型是Michaelis-Menten模型,因此我们可以使用

> df <- data.frame(x=x, y=y)
> fit <- nls(y ~ SSmicmen(x, Vm, K), df)
> fit
Nonlinear regression model
  model:  y ~ SSmicmen(x, Vm, K)
   data:  df
          Vm            K
212.68370711   0.06412123
 residual sum-of-squares:  1195.449
> summary(fit)

Formula: y ~ SSmicmen(x, Vm, K)

Parameters:
    Estimate Std. Error t value Pr(>|t|)
Vm 2.127e+02  6.947e+00  30.615 3.24e-11
K  6.412e-02  8.281e-03   7.743 1.57e-05

Residual standard error: 10.93 on 10 degrees of freedom

Correlation of Parameter Estimates:
      Vm
K 0.7651
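Once the nls() fit is available, the usual extractor functions apply to it. A brief sketch, repeating the data so that it runs on its own (the new x values passed to predict() are arbitrary illustrations):

```r
x <- c(0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56,
       1.10, 1.10)
y <- c(76, 47, 97, 107, 123, 139, 159, 152, 191, 201, 207, 200)
df <- data.frame(x = x, y = y)

fit <- nls(y ~ SSmicmen(x, Vm, K), data = df)

coef(fit)        # parameter estimates
deviance(fit)    # residual sum of squares
# Predicted responses at a few new x values
predict(fit, newdata = data.frame(x = c(0.05, 0.5, 1)))
```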

11.7.2 Maximum likelihood
11.7.2最大似然¶

Maximum likelihood is a method of nonlinear model fitting that applies even if the errors are not normal. The method finds the parameter values which maximize the log likelihood, or equivalently which minimize the negative log-likelihood.
最大似然法是一种非线性模型拟合的方法,即使误差不是正态分布也适用。该方法找到最大化对数似然的参数值,或者等效地最小化负对数似然的参数值。

Here is an example from Dobson (1990), pp. 108–111. This example fits a logistic model to dose-response data, which clearly could also be fit by glm(). The data are:
这里有一个例子,来自多布森(1990),pp. 108-111.该示例将逻辑模型拟合到剂量-反应数据,其显然也可以由 glm() 拟合。数据如下:

> x <- c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113,
         1.8369, 1.8610, 1.8839)
> y <- c( 6, 13, 18, 28, 52, 53, 61, 60)
> n <- c(59, 60, 62, 56, 63, 59, 62, 60)

The negative log-likelihood to minimize is:
要最小化的负对数似然为:

> fn <- function(p)
   sum( - (y*(p[1]+p[2]*x) - n*log(1+exp(p[1]+p[2]*x))
           + log(choose(n, y)) ))

We pick sensible starting values and do the fit:
我们选择合理的起始值并进行拟合:

> out <- nlm(fn, p = c(-50,20), hessian = TRUE)

After the fitting, out$minimum is the negative log-likelihood, and out$estimate are the maximum likelihood estimates of the parameters. To obtain the approximate SEs of the estimates we do:
拟合后, out$minimum 是负对数似然, out$estimate 是参数的最大似然估计。为了获得估计的近似SE,我们执行以下操作:

> sqrt(diag(solve(out$hessian)))

A 95% confidence interval would be the parameter estimate +/- 1.96 SE.
95%置信区间为参数估计值+/- 1.96 SE。
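As noted above, this model could also be fitted by glm(), which gives a convenient cross-check on the nlm() result. The sketch below places the two sets of estimates side by side; the intermediate variable eta and the object names are introduced here for illustration.

```r
x <- c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113,
       1.8369, 1.8610, 1.8839)
y <- c( 6, 13, 18, 28, 52, 53, 61, 60)
n <- c(59, 60, 62, 56, 63, 59, 62, 60)

# Negative binomial log-likelihood with a logistic link
fn <- function(p) {
  eta <- p[1] + p[2] * x
  sum(-(y * eta - n * log(1 + exp(eta)) + log(choose(n, y))))
}

out <- nlm(fn, p = c(-50, 20), hessian = TRUE)
se  <- sqrt(diag(solve(out$hessian)))

# The same model fitted by glm() for comparison
fit <- glm(cbind(y, n - y) ~ x, family = binomial)
rbind(nlm = out$estimate, glm = unname(coef(fit)))
```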


11.8 Some non-standard models
11.8一些非标准模型¶

We conclude this chapter with just a brief mention of some of the other facilities available in R for special regression and data analysis problems.
在结束本章时,我们简单地提到了R中用于特殊回归和数据分析问题的其他一些工具。

  • Mixed models. The recommended nlme package provides functions lme() and nlme() for linear and non-linear mixed-effects models, that is linear and non-linear regressions in which some of the coefficients correspond to random effects. These functions make heavy use of formulae to specify the models.
    混合模型。推荐的nlme软件包为线性和非线性混合效应模型提供了函数 lme()nlme() ,即线性和非线性回归,其中一些系数对应于随机效应。这些函数大量使用公式来指定模型。
  • Local approximating regressions. The loess() function fits a nonparametric regression by using a locally weighted regression. Such regressions are useful for highlighting a trend in messy data or for data reduction to give some insight into a large data set.
    局部近似回归。 loess() 函数通过使用局部加权回归拟合非参数回归。这种回归对于突出混乱数据的趋势或数据简化以提供对大型数据集的一些洞察是有用的。

    Function loess is in the standard package stats, together with code for projection pursuit regression.
    函数 loess 在标准包stats中,与投影追踪回归的代码一起。

  • Robust regression. There are several functions available for fitting regression models in a way resistant to the influence of extreme outliers in the data. Function lqs in the recommended package MASS provides state-of-the-art algorithms for highly-resistant fits. Less resistant but statistically more efficient methods are available in packages, for example function rlm in package MASS.
    稳健回归。有几个函数可用于拟合回归模型,以抵抗数据中极端离群值的影响。推荐软件包中的函数 lqs MASS为高阻力配合提供了最先进的算法。封装中的方法阻力较小,但在统计上更有效,例如封装MASS中的功能 rlm
  • Additive models. This technique aims to construct a regression function from smooth additive functions of the determining variables, usually one for each determining variable. Functions avas and ace in package acepack and functions bruto and mars in package mda provide some examples of these techniques in user-contributed packages to R. An extension is Generalized Additive Models, implemented in user-contributed packages gam and mgcv.
    加法模型。该技术旨在从确定变量的平滑加性函数构造回归函数,通常每个确定变量一个。acepack包中的函数 avasace 以及mda包中的函数 brutomars 在用户贡献给R的包中提供了这些技术的一些示例。一个扩展是广义加性模型,在用户贡献的软件包gam和mgcv中实现。
  • Tree-based models. Rather than seek an explicit global linear model for prediction or interpretation, tree-based models seek to bifurcate the data, recursively, at critical points of the determining variables in order to partition the data ultimately into groups that are as homogeneous as possible within, and as heterogeneous as possible between.
    基于树的模型而不是寻求一个明确的全局线性模型进行预测或解释,基于树的模型寻求分叉的数据,递归地,在决定变量的临界点,以分区的数据最终成组,是尽可能同质的内部,和尽可能异质之间。

    The results often lead to insights that other data analysis methods tend not to yield.
    这些结果通常会带来其他数据分析方法无法产生的见解。

    Models are again specified in the ordinary linear model form. The model fitting function is tree(), but many other generic functions such as plot() and text() are well adapted to displaying the results of a tree-based model fit in a graphical way.
    模型再次以普通线性模型的形式被指定。模型拟合函数是 tree() ,但许多其他通用函数(如 plot()text() )也适用于以图形方式显示基于树的模型拟合的结果。

    Tree models are available in R via the user-contributed packages rpart and tree.
    树模型在R中通过用户贡献的包rpart和tree可用。
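As one small illustration of the facilities listed above, the following sketch fits a local regression with loess() to the cars data set shipped with R and overlays the smoothed trend. The choice of data set and the 50-point prediction grid are assumptions made for the example.

```r
# Local regression of stopping distance on speed
fit <- loess(dist ~ speed, data = cars)

# Evaluate the smooth on a regular grid within the observed range
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed),
                               length.out = 50))
smooth <- predict(fit, newdata = grid)

plot(dist ~ speed, data = cars)
lines(grid$speed, smooth)
```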


12 Graphical procedures

Graphical facilities are an important and extremely versatile component of the R environment. It is possible to use the facilities to display a wide variety of statistical graphs and also to build entirely new types of graph.
图形化工具是R环境中一个重要且用途极其广泛的组件。可以使用这些工具来显示各种各样的统计图表,也可以构建全新类型的图表。

The graphics facilities can be used in both interactive and batch modes, but in most cases, interactive use is more productive. Interactive use is also easy because at startup time R initiates a graphics device driver which opens a special graphics window for the display of interactive graphics. Although this is done automatically, it may be useful to know that the command used is X11() under UNIX, windows() under Windows and quartz() under macOS. A new device can always be opened by dev.new().
图形工具可以在交互式和批处理模式下使用,但在大多数情况下,交互式使用更具生产力。交互式使用也很容易,因为在启动时,R启动一个图形设备驱动程序,打开一个特殊的图形窗口,用于显示交互式图形。虽然这是自动完成的,但了解UNIX下使用的命令是 X11() ,Windows下使用的命令是 windows() ,macOS下使用的命令是 quartz() 可能很有用。新设备始终可以通过 dev.new() 打开。

Once the device driver is running, R plotting commands can be used to produce a variety of graphical displays and to create entirely new kinds of display.
一旦设备驱动程序运行,R绘图命令可以用来产生各种图形显示,并创建全新的显示类型。

Plotting commands are divided into three basic groups:
打印命令分为三个基本组:

  • High-level plotting functions create a new plot on the graphics device, possibly with axes, labels, titles and so on.
  • Low-level plotting functions add more information to an existing plot, such as extra points, lines and labels.
  • Interactive graphics functions allow you to add information to, or extract information from, an existing plot, using a pointing device such as a mouse.

In addition, R maintains a list of graphical parameters which can be manipulated to customize your plots.
此外,R维护了一个图形参数列表,可以操纵这些参数来自定义绘图。

This manual only describes what are known as ‘base’ graphics. A separate graphics sub-system in package grid coexists with base – it is more powerful but harder to use. There is a recommended package lattice which builds on grid and provides ways to produce multi-panel plots akin to those in the Trellis system in S.
本手册仅描述了所谓的“基本”图形。在包网格中有一个单独的图形子系统与基础共存-它更强大,但更难使用。有一个推荐的包lattice,它建立在网格上,并提供了生成类似于S中的网格系统的多面板图的方法。


12.1 High-level plotting commands
12.1高级绘图命令¶

High-level plotting functions are designed to generate a complete plot of the data passed as arguments to the function.
高级绘图函数用于生成作为参数传递给函数的数据的完整绘图。

Where appropriate, axes, labels and titles are automatically generated (unless you request otherwise.) High-level plotting commands always start a new plot, erasing the current plot if necessary.
在适当的情况下,轴、标签和标题会自动生成(除非您另有要求)。高级打印命令总是开始一个新的打印,如有必要,会擦除当前打印。


12.1.1 The plot() function
12.1.1 plot() 函数¶

One of the most frequently used plotting functions in R is the plot() function. This is a generic function: the type of plot produced is dependent on the type or class of the first argument.
R中最常用的绘图函数之一是 plot() 函数。这是一个泛型函数:生成的图的类型取决于第一个参数的类型或类。

plot(x, y)
plot(xy)

If x and y are vectors, plot(x, y) produces a scatterplot of y against x. The same effect can be produced by supplying one argument (second form) as either a list containing two elements x and y or a two-column matrix.
如果x和y是向量,则 plot(x, y) 生成y相对于x的散点图。通过提供一个参数(第二种形式)作为包含两个元素x和y的列表或两列矩阵也可以产生相同的效果。

plot(x)

If x is a time series, this produces a time-series plot. If x is a numeric vector, it produces a plot of the values in the vector against their index in the vector. If x is a complex vector, it produces a plot of imaginary versus real parts of the vector elements.
如果x是一个时间序列,这将产生一个时间序列图。如果x是一个数字向量,它会生成一个向量中的值与向量中的索引的关系图。如果x是一个复向量,它会产生一个向量元素虚部与真实的部的关系图。

plot(f)
plot(f, y)

f is a factor object, y is a numeric vector. The first form generates a bar plot of f; the second form produces boxplots of y for each level of f.
f是一个因子对象,y是一个数值向量。第一种形式生成f的条形图;第二种形式为f的每个水平生成y的箱线图。

plot(df)
plot(~ expr)
plot(y ~ expr)

df is a data frame, y is any object, expr is a list of object names separated by ‘+’ (e.g., a + b + c). The first two forms produce distributional plots of the variables in a data frame (first form) or of a number of named objects (second form). The third form plots y against every object named in expr.
df是数据帧,y是任何对象,expr是由' + '分隔的对象名称的列表(例如, a + b + c )。前两种形式生成数据框中变量的分布图(第一种形式)或多个命名对象的分布图(第二种形式)。第三种形式将y绘制到expr中命名的每个对象上。
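The forms of plot() described above can be tried on a few small objects. The vectors, factor and data frame below are made up for illustration.

```r
# Hypothetical objects for illustration
set.seed(1)
x  <- 1:20
y  <- x + rnorm(20)
f  <- factor(rep(c("a", "b", "c", "d"), each = 5))
df <- data.frame(x = x, y = y, z = y^2)

plot(x, y)                   # scatterplot of y against x
plot(list(x = x, y = y))     # same plot from a list
plot(cbind(x, y))            # same plot from a two-column matrix
plot(f)                      # bar plot of the factor
plot(f, y)                   # boxplots of y for each level of f
plot(df)                     # all pairs of variables in the data frame
plot(~ x + z, data = df)     # the named variables only
plot(y ~ x + z, data = df)   # y against x, then y against z
```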


12.1.2 Displaying multivariate data
12.1.2处理多变量数据¶

R provides two very useful functions for representing multivariate data. If X is a numeric matrix or data frame, the command
R提供了两个非常有用的函数来表示多变量数据。如果 X 是数值矩阵或数据框,则命令

> pairs(X)

produces a pairwise scatterplot matrix of the variables defined by the columns of X, that is, every column of X is plotted against every other column of X and the resulting n(n-1) plots are arranged in a matrix with plot scales constant over the rows and columns of the matrix.
产生由 X 的列定义的变量的成对散点图矩阵,即, X 的每一列相对于 X 的每隔一列绘制,并且所得的n(n-1)个图以矩阵的行和列上的图标度恒定的方式排列在矩阵中。

When three or four variables are involved a coplot may be more enlightening. If a and b are numeric vectors and c is a numeric vector or factor object (all of the same length), then the command
当涉及三个或四个变量时,共图可能更有启发性。如果 ab 是数值向量,而 c 是数值向量或因子对象(长度都相同),则命令

> coplot(a ~ b | c)

produces a number of scatterplots of a against b for given values of c. If c is a factor, this simply means that a is plotted against b for every level of c. When c is numeric, it is divided into a number of conditioning intervals and for each interval a is plotted against b for values of c within the interval. The number and position of intervals can be controlled with the given.values= argument to coplot(); the function co.intervals() is useful for selecting intervals. You can also use two given variables with a command like
对于给定的 c 值,生成多个 ab 的散点图。如果 c 是一个因子,这仅仅意味着对于 c 的每个水平, a 都相对于 b 作图。当 c 为数值时,将其划分为多个调节区间,对于每个区间,针对区间内的 c 值,将 ab 作图。间隔的数量和位置可以通过 given.values= 参数到 coplot() 来控制-功能 co.intervals() 对于选择间隔很有用。您也可以使用两个给定的变量与命令,如

> coplot(a ~ b | c + d)

which produces scatterplots of a against b for every joint conditioning interval of c and d.
其针对 cd 的每个关节调节间隔产生 ab 的散点图。

The coplot() and pairs() functions both take an argument panel= which can be used to customize the type of plot which appears in each panel. The default is points() to produce a scatterplot but by supplying some other low-level graphics function of two vectors x and y as the value of panel= you can produce any type of plot you wish. An example panel function useful for coplots is panel.smooth().
coplot()pairs() 函数都接受一个参数 panel= ,该参数可用于自定义每个面板中显示的绘图类型。默认值为 points() 以生成散点图,但通过提供两个向量 xy 的其他低级图形函数作为 panel= 的值,您可以生成任何类型的图。一个对coplot有用的面板函数示例是 panel.smooth()
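A short sketch of these functions using the iris and quakes data sets shipped with R. The choice of four intervals with little overlap is arbitrary and only for illustration.

```r
# Scatterplot matrix of the four numeric iris measurements,
# with a smooth curve added in each panel
pairs(iris[, 1:4], panel = panel.smooth)

# Choose four conditioning intervals for earthquake depth
iv <- co.intervals(quakes$depth, number = 4, overlap = 0.1)
iv

# Latitude against longitude, conditioned on depth
coplot(lat ~ long | depth, data = quakes,
       given.values = iv, panel = panel.smooth)
```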


12.1.3 Display graphics
12.1.3显示图形¶

Other high-level graphics functions produce different types of plots. Some examples are:
其他高级图形函数生成不同类型的绘图。例如:

qqnorm(x)
qqline(x)
qqplot(x, y)

Distribution-comparison plots. The first form plots the numeric vector x against the expected Normal order scores (a normal scores plot) and the second adds a straight line to such a plot by drawing a line through the distribution and data quartiles. The third form plots the quantiles of x against those of y to compare their respective distributions.
分布比较图。第一种形式绘制了数字向量 x 与预期正态序评分的关系图(正态评分图),第二种形式通过绘制一条穿过分布和数据四分位数的直线,向该图添加了一条直线。第三种形式绘制 x 的分位数与 y 的分位数,以比较它们各自的分布。

hist(x)
hist(x, nclass=n)
hist(x, breaks=b, …)

Produces a histogram of the numeric vector x. A sensible number of classes is usually chosen, but a recommendation can be given with the nclass= argument. Alternatively, the breakpoints can be specified exactly with the breaks= argument. If the probability=TRUE argument is given, the bars represent relative frequencies divided by bin width instead of counts.
生成数字向量 x 的直方图。通常会选择合理数量的类,但可以使用 nclass= 参数给出建议。或者,可以使用 breaks= 参数精确指定断点。如果给定了 probability=TRUE 参数,则条形表示相对频率除以区间宽度而不是计数。

dotchart(x, …)

Constructs a dot chart of the data in x. In a dot chart the y-axis gives a labelling of the data in x and the x-axis gives its value. For example it allows easy visual selection of all data entries with values lying in specified ranges.
构建 x 中数据的点图。在点图中,y轴给出 x 中数据的标签,x轴给出其值。例如,它允许轻松地可视化选择所有值位于指定范围内的数据条目。

image(x, y, z, …)
contour(x, y, z, …)
persp(x, y, z, …)

Plots of three variables. The image plot draws a grid of rectangles using different colours to represent the value of z, the contour plot draws contour lines to represent the value of z, and the persp plot draws a 3D surface.
三个变量的图。 image 图使用不同颜色绘制矩形网格以表示 z 的值, contour 图绘制等高线以表示 z 的值,而 persp 图绘制3D表面。
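A brief sketch of several of these functions, using simulated normal data and the volcano matrix shipped with R. It also checks that with probability=TRUE the histogram bars have total area one.

```r
set.seed(1)
x <- rnorm(200)   # simulated data for illustration

# Normal scores plot with reference line
qqnorm(x)
qqline(x)

# Histogram on the density scale; bar areas sum to one
h <- hist(x, probability = TRUE)
sum(h$density * diff(h$breaks))

# Three views of the same surface
image(volcano)
contour(volcano, add = TRUE)
persp(volcano, theta = 30, phi = 30)
```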


12.1.4 Arguments to high-level plotting functions
12.1.4高级绘图函数的参数¶

There are a number of arguments which may be passed to high-level graphics functions, as follows:
有许多参数可以传递给高级图形函数,如下所示:

add=TRUE

Forces the function to act as a low-level graphics function, superimposing the plot on the current plot (some functions only).
强制函数作为低级图形函数,将绘图叠加在当前绘图上(仅限某些函数)。

axes=FALSE

Suppresses generation of axes—useful for adding your own custom axes with the axis() function. The default, axes=TRUE, means include axes.
禁止生成轴-这对于使用 axis() 函数添加您自己的自定义轴很有用。默认值 axes=TRUE 表示包含轴。

log="x"
log="y"
log="xy"

Causes the x, y or both axes to be logarithmic. This will work for many, but not all, types of plot.
使x、y轴或两个轴都为对数。这将适用于许多,但不是所有类型的情节。

type=

The type= argument controls the type of plot produced, as follows:
type= 参数控制生成的图的类型,如下所示:

type="p"

Plot individual points (the default)
绘制单个点(默认)

type="l"

Plot lines

type="b"

Plot points connected by lines (both)
绘制由线连接的点(两者)

type="o"

Plot points overlaid by lines
绘制被线覆盖的点

type="h"

Plot vertical lines from points to the zero axis (high-density)
绘制从点到零轴的垂直线(高密度)

type="s"
type="S"

Step-function plots. In the first form, the top of the vertical defines the point; in the second, the bottom.
阶跃函数图。在第一种形式中,垂直线的顶部定义了点;在第二种形式中,底部定义了点。

type="n"

No plotting at all. However axes are still drawn (by default) and the coordinate system is set up according to the data. Ideal for creating plots with subsequent low-level graphics functions.

xlab=string
ylab=string

Axis labels for the x and y axes. Use these arguments to change the default labels, usually the names of the objects used in the call to the high-level plotting function.
x和y轴的轴标签。使用这些参数更改默认标签,通常是调用高级绘图函数时使用的对象的名称。

main=string

Figure title, placed at the top of the plot in a large font.
图标题,以大字体显示在图的顶部。

sub=string

Sub-title, placed just below the x-axis in a smaller font.
副标题,以较小的字体放置在x轴的正下方。
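Several of these arguments can be combined in one call. The sketch below uses made-up data; the variable is_log_y is introduced only to record the axis state after the first plot.

```r
x <- 1:10
y <- 2^x   # illustrative data

# Points joined by lines, logarithmic y axis, custom labels and titles
plot(x, y, type = "b", log = "y",
     xlab = "Step", ylab = "Value (log scale)",
     main = "Doubling sequence", sub = "type = \"b\", log = \"y\"")
is_log_y <- par("ylog")   # whether the current plot has a log y axis

# Set up the coordinate system only, then add axes by hand
plot(x, y, type = "n", axes = FALSE)
axis(1)
axis(2)
box()
```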


12.2 Low-level plotting commands
12.2低级绘图命令¶

Sometimes the high-level plotting functions don’t produce exactly the kind of plot you desire. In this case, low-level plotting commands can be used to add extra information (such as points, lines or text) to the current plot.
有时候,高级绘图函数并不能精确地生成您想要的那种绘图。在这种情况下,可以使用低级绘图命令向当前绘图添加额外信息(如点、线或文本)。

Some of the more useful low-level plotting functions are:
一些更有用的低级绘图函数是:

points(x, y)
lines(x, y)

Adds points or connected lines to the current plot. plot()’s type= argument can also be passed to these functions (and defaults to "p" for points() and "l" for lines().)
将点或连接线添加到当前图中。 plot()type= 参数也可以传递给这些函数(对于 points() ,默认为 "p" ;对于 lines() ,默认为 "l" )。

text(x, y, labels, …)

Add text to a plot at points given by x, y. Normally labels is an integer or character vector in which case labels[i] is plotted at point (x[i], y[i]). The default is 1:length(x).
将文本添加到图中 x, y 给出的点。通常 labels 是一个整数或字符向量,在这种情况下, labels[i] 被绘制在点 (x[i], y[i]) 。默认值为 1:length(x)

Note: This function is often used in the sequence
注意:此函数经常在序列中使用

> plot(x, y, type="n"); text(x, y, names)

The graphics parameter type="n" suppresses the points but sets up the axes, and the text() function supplies special characters, as specified by the character vector names for the points.
图形参数 type="n" 抑制点但设置轴,而 text() 函数提供特殊字符,如由点的字符向量 names 指定的。

abline(a, b)
abline(h=y)
abline(v=x)
abline(lm.obj)

Adds a line of slope b and intercept a to the current plot. h=y may be used to specify y-coordinates for the heights of horizontal lines to go across a plot, and v=x similarly for the x-coordinates for vertical lines. Also lm.obj may be a list with a coefficients component of length 2 (such as the result of model-fitting functions), which is taken as an intercept and slope, in that order.

polygon(x, y, …)

Draws a polygon defined by the ordered vertices in (x, y) and (optionally) shades it in with hatch lines, or fills it if the graphics device allows the filling of figures.
绘制由( xy )中的有序顶点定义的多边形,并(可选)使用阴影线对其进行着色,或者如果图形设备允许填充图形,则填充它。

legend(x, y, legend, …)

Adds a legend to the current plot at the specified position. Plotting characters, line styles, colors etc., are identified with the labels in the character vector legend. At least one other argument v (a vector the same length as legend) with the corresponding values of the plotting unit must also be given, as follows:
将图例添加到当前图的指定位置。绘制字符、线条样式、颜色等,用字符向量 legend 中的标签来标识。还必须给出至少一个其他参数v(与 legend 长度相同的向量)以及绘图单位的相应值,如下所示:

legend( , fill=v)

Colors for filled boxes

legend( , col=v)

Colors in which points or lines will be drawn
绘制点或线时使用的颜色

legend( , lty=v)

Line styles

legend( , lwd=v)

Line widths

legend( , pch=v)

Plotting characters (character vector)
打印字符(字符向量)

title(main, sub)

Adds a title main to the top of the current plot in a large font and (optionally) a sub-title sub at the bottom in a smaller font.
将标题 main 以大字体添加到当前绘图的顶部,并(可选)将子标题 sub 以小字体添加到底部。

axis(side, …)

Adds an axis to the current plot on the side given by the first argument (1 to 4, counting clockwise from the bottom.) Other arguments control the positioning of the axis within or beside the plot, and tick positions and labels. Useful for adding custom axes after calling plot() with the axes=FALSE argument.
在第一个参数给定的一侧向当前图添加一个轴(1到4,从底部顺时针计数)。其他参数控制轴在图中或图旁的位置,以及标记位置和标签。用于在使用 axes=FALSE 参数调用 plot() 后添加自定义轴。

Low-level plotting functions usually require some positioning information (e.g., x and y coordinates) to determine where to place the new plot elements. Coordinates are given in terms of user coordinates which are defined by the previous high-level graphics command and are chosen based on the supplied data.
低级绘图功能通常需要一些定位信息(例如,x和y坐标)来确定放置新绘图元素的位置。坐标以用户坐标的形式给出,这些坐标由之前的高级图形命令定义并根据提供的数据进行选择。

Where x and y arguments are required, it is also sufficient to supply a single argument being a list with elements named x and y. Similarly a matrix with two columns is also valid input. In this way functions such as locator() (see below) may be used to specify positions on a plot interactively.
在需要 xy 参数的情况下,提供单个参数(包含名为 xy 的元素的列表)也就足够了。类似地,具有两列的矩阵也是有效输入。以这种方式,可以使用诸如 locator() (见下文)的函数来交互地指定绘图上的位置。
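The low-level functions above are typically used together to build up a plot step by step. A minimal sketch with simulated data follows; the labels and titles are placeholders.

```r
set.seed(2)
x <- 1:15
y <- 3 + 0.8 * x + rnorm(15)   # simulated data for illustration
fit <- lm(y ~ x)

# Empty plot: coordinate system only, no axes
plot(x, y, type = "n", axes = FALSE, xlab = "x", ylab = "y")
axis(1)                        # bottom axis
axis(2)                        # left axis
box()

points(x, y, pch = 16)
abline(fit, lty = 2)                  # fitted line from the lm object
abline(h = mean(y), col = "grey")     # horizontal reference line
text(x[which.max(y)], max(y), labels = "max", pos = 1)
legend("topleft", legend = c("data", "lm fit"),
       pch = c(16, NA), lty = c(NA, 2))
title(main = "Building a plot from low-level commands",
      sub = "simulated data")
```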


12.2.1 Mathematical annotation
12.2.1数学注释¶

In some cases, it is useful to add mathematical symbols and formulae to a plot. This can be achieved in R by specifying an expression rather than a character string in any one of text, mtext, axis, or title. For example, the following code draws the formula for the Binomial probability function:
在某些情况下,向图中添加数学符号和公式非常有用。在R中,这可以通过在 textmtextaxistitle 中指定表达式而不是字符串来实现。例如,下面的代码绘制二项式概率函数的公式:

> text(x, y, expression(paste(bgroup("(", atop(n, x), ")"), p^x, q^{n-x})))

More information, including a full listing of the features available can be obtained from within R using the commands:
更多信息,包括可用功能的完整列表,可以从R中使用命令获得:

> help(plotmath)
> example(plotmath)
> demo(plotmath)
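Expressions can be supplied for axis labels and titles as well as for text(). A small sketch, plotting binomial probabilities with n = 10 and p = 0.3 (values chosen only for illustration) and placing the formula on the plot:

```r
k <- 0:10
ylab_expr <- expression(P(X == x))
main_expr <- expression(paste("Binomial probabilities, ",
                              n == 10, ", ", p == 0.3))

plot(k, dbinom(k, 10, 0.3), type = "h",
     xlab = "x", ylab = ylab_expr, main = main_expr)

# The Binomial probability function as a plot annotation
text(7, 0.2,
     expression(paste(bgroup("(", atop(n, x), ")"), p^x, q^{n-x})))
```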

12.2.2 Hershey vector fonts
12.2.2 Hershey矢量字体¶

It is possible to specify Hershey vector fonts for rendering text when using the text and contour functions. There are three reasons for using the Hershey fonts:
使用 textcontour 函数时,可以指定用于呈现文本的Hershey矢量字体。使用Hershey字体有三个原因:

  • Hershey fonts can produce better output, especially on a computer screen, for rotated and/or small text.
    好时字体可以产生更好的输出,特别是在计算机屏幕上,旋转和/或小文本。
  • Hershey fonts provide certain symbols that may not be available in the standard fonts. In particular, there are zodiac signs, cartographic symbols and astronomical symbols.
    Hershey字体提供了某些在标准字体中可能无法使用的符号。特别是,有十二生肖,制图符号和天文符号。
  • Hershey fonts provide Cyrillic and Japanese (Kana and Kanji) characters.
    Hershey字体提供西里尔和日语(假名和汉字)字符。

More information, including tables of Hershey characters can be obtained from within R using the commands:
更多信息,包括Hershey字符表,可以从R中使用命令获得:

> help(Hershey)
> demo(Hershey)
> help(Japanese)
> demo(Japanese)

12.3 Interacting with graphics
12.3与图形交互¶

R also provides functions which allow users to extract or add information to a plot using a mouse. The simplest of these is the locator() function:
R还提供了允许用户使用鼠标提取或添加信息到图中的功能。其中最简单的是 locator() 函数:

locator(n, type)

Waits for the user to select locations on the current plot using the left mouse button. This continues until n (default 512) points have been selected, or another mouse button is pressed. The type argument allows for plotting at the selected points and has the same effect as for high-level graphics commands; the default is no plotting. locator() returns the locations of the points selected as a list with two components x and y.
等待用户使用鼠标左键选择当前图上的位置。此操作一直持续到选择了 n (默认值为512)个点或按下另一个鼠标按钮。 type 参数允许在选定点处打印,其效果与高级图形命令相同;默认值为不打印。 locator() 返回所选点的位置,作为包含两个组件 xy 的列表。

locator() is usually called with no arguments. It is particularly useful for interactively selecting positions for graphic elements such as legends or labels when it is difficult to calculate in advance where the graphic should be placed.
locator() 通常不带参数调用。当难以预先计算图形的放置位置时,它对于交互式地选择图例或标签等图形元素的位置特别有用。

For example, to place some informative text near an outlying point, the command
例如,要在外围点附近放置一些信息性文本,

> text(locator(1), "Outlier", adj=0)

may be useful. (locator() will be ignored if the current device, such as postscript, does not support interactive pointing.)
可能有用(如果当前设备(例如 postscript )不支持交互式指向,则会忽略 locator() 。)

identify(x, y, labels)

Allows the user to highlight any of the points defined by x and y (using the left mouse button) by plotting the corresponding component of labels nearby (or the index number of the point if labels is absent). Returns the indices of the selected points when another button is pressed.

Sometimes we want to identify particular points on a plot, rather than their positions. For example, we may wish the user to select some observation of interest from a graphical display and then manipulate that observation in some way. Given a number of (x, y) coordinates in two numeric vectors x and y, we could use the identify() function as follows:

> plot(x, y)
> identify(x, y)

The identify() function performs no plotting itself, but simply allows the user to move the mouse pointer and click the left mouse button near a point. If there is a point near the mouse pointer it will be marked with its index number (that is, its position in the x/y vectors) plotted nearby. Alternatively, you could use some informative string (such as a case name) as a highlight by using the labels argument to identify(), or disable marking altogether with the plot = FALSE argument. When the process is terminated (see above), identify() returns the indices of the selected points; you can use these indices to extract the selected points from the original vectors x and y.
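
As a minimal sketch of how the returned indices might be used, the following assumes an interactive session with a screen device. The vectors x, y and case_names are made-up illustrative data, not part of the manual; outside an interactive session the selection is simply left empty.

```r
# Illustrative data (an assumption for this sketch): one outlying point.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 25)
case_names <- c("a", "b", "c", "d", "e")

selected <- integer(0)
if (interactive() && dev.interactive(orNone = TRUE)) {
  plot(x, y)
  # Click near points with the left button; press another button to stop.
  selected <- identify(x, y, labels = case_names)
}

# Use the returned indices to pull the chosen observations out of x and y.
chosen <- data.frame(name = case_names[selected],
                     x = x[selected],
                     y = y[selected])
chosen
```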


12.4 Using graphics parameters

When creating graphics, particularly for presentation or publication purposes, R’s defaults do not always produce exactly that which is required. You can, however, customize almost every aspect of the display using graphics parameters. R maintains a list of a large number of graphics parameters which control things such as line style, colors, figure arrangement and text justification among many others. Every graphics parameter has a name (such as ‘col’, which controls colors) and a value (a color number, for example).

A separate list of graphics parameters is maintained for each active device, and each device has a default set of parameters when initialized.

Graphics parameters can be set in two ways: either permanently, affecting all graphics functions which access the current device; or temporarily, affecting only a single graphics function call.


12.4.1 Permanent changes: The par() function

The par() function is used to access and modify the list of graphics parameters for the current graphics device.

par()

Without arguments, returns a list of all graphics parameters and their values for the current device.

par(c("col", "lty"))

With a character vector argument, returns only the named graphics parameters (again, as a list).

par(col=4, lty=2)

With named arguments (or a single list argument), sets the values of the named graphics parameters, and returns the original values of the parameters as a list.

Setting graphics parameters with the par() function changes the value of the parameters permanently, in the sense that all future calls to graphics functions (on the current device) will be affected by the new value.

You can think of setting graphics parameters in this way as setting “default” values for the parameters, which will be used by all graphics functions unless an alternative value is given.

Note that calls to par() always affect the global values of graphics parameters, even when par() is called from within a function. This is often undesirable behavior: usually we want to set some graphics parameters, do some plotting, and then restore the original values so as not to affect the user’s R session. You can do this by saving the result of par() when making changes, and passing the saved values back to par() when plotting is complete.

> oldpar <- par(col=4, lty=2)
  ... plotting commands ...
> par(oldpar)

To save and restore all settable24 graphical parameters use

> oldpar <- par(no.readonly=TRUE)
  ... plotting commands ...
> par(oldpar)
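
Inside a function, the same save-and-restore pattern is often combined with on.exit(), so that the old values come back even if plotting stops with an error. The sketch below is one way this might look. The helper name plot_dashed is hypothetical, on.exit() is a general R idiom rather than something this section prescribes, and pdf(NULL) is used only as a throwaway device that writes no file.

```r
# Hypothetical helper: draws a dashed, coloured line plot without leaving
# the changed parameters behind on the device.
plot_dashed <- function(x, y) {
  oldpar <- par(col = 4, lty = 2)  # par() returns the previous values
  on.exit(par(oldpar))             # restore them when the function exits
  plot(x, y, type = "l")
}

pdf(NULL)                          # throwaway device: no file is created
before <- par("col", "lty")
plot_dashed(1:10, (1:10)^2)
after <- par("col", "lty")
invisible(dev.off())

identical(before, after)           # were the settings restored?
```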

12.4.2 Temporary changes: Arguments to graphics functions

Graphics parameters may also be passed to (almost) any graphics function as named arguments. This has the same effect as passing the arguments to the par() function, except that the changes only last for the duration of the function call. For example:

> plot(x, y, pch="+")

produces a scatterplot using a plus sign as the plotting character, without changing the default plotting character for future plots.

Unfortunately, this is not implemented entirely consistently and it is sometimes necessary to set and reset graphics parameters using par().
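
A small check can make the temporary nature of such arguments visible. This sketch assumes a throwaway pdf(NULL) device and compares the device's stored pch value before and after a call that passes pch inline.

```r
pdf(NULL)                       # throwaway device: no file is created
pch_before <- par("pch")
plot(1:5, (1:5)^2, pch = "+")   # "+" applies to this call only
pch_after <- par("pch")
invisible(dev.off())

identical(pch_before, pch_after)
```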


12.5 Graphics parameters list

The following sections detail many of the commonly-used graphical parameters. The R help documentation for the par() function provides a more concise summary; this is provided as a somewhat more detailed alternative.

Graphics parameters will be presented in the following form:

name=value

A description of the parameter’s effect. name is the name of the parameter, that is, the argument name to use in calls to par() or a graphics function. value is a typical value you might use when setting the parameter.

Note that axes is not a graphics parameter but an argument to a few plot methods: see xaxt and yaxt.


12.5.1 Graphical elements

R plots are made up of points, lines, text and polygons (filled regions). Graphical parameters exist which control how these graphical elements are drawn, as follows:

pch="+"

Character to be used for plotting points. The default varies with graphics drivers, but it is usually a circle. Plotted points tend to appear slightly above or below the appropriate position unless you use "." as the plotting character, which produces centered points.

pch=4

When pch is given as an integer between 0 and 25 inclusive, a specialized plotting symbol is produced. To see what the symbols are, use the command

> legend(locator(1), as.character(0:25), pch = 0:25)

Those from 21 to 25 may appear to duplicate earlier symbols, but can be coloured in different ways: see the help on points and its examples.

In addition, pch can be a character or a number in the range 32:255 representing a character in the current font.

lty=2

Line types. Alternative line styles are not supported on all graphics devices (and vary on those that do) but line type 1 is always a solid line, line type 0 is always invisible, and line types 2 and onwards are dotted or dashed lines, or some combination of both.

lwd=2

Line widths. Desired width of lines, in multiples of the “standard” line width. Affects axis lines as well as lines drawn with lines(), etc. Not all devices support this, and some have restrictions on the widths that can be used.

col=2

Colors to be used for points, lines, text, filled regions and images. A number from the current palette (see ?palette) or a named colour.

col.axis
col.lab
col.main
col.sub

The color to be used for axis annotation, x and y labels, main and sub-titles, respectively.

font=2

An integer which specifies which font to use for text. If possible, device drivers arrange so that 1 corresponds to plain text, 2 to bold face, 3 to italic, 4 to bold italic and 5 to a symbol font (which includes Greek letters).

font.axis
font.lab
font.main
font.sub

The font to be used for axis annotation, x and y labels, main and sub-titles, respectively.

adj=-0.1

Justification of text relative to the plotting position. 0 means left justify, 1 means right justify and 0.5 means to center horizontally about the plotting position. The actual value is the proportion of text that appears to the left of the plotting position, so a value of -0.1 leaves a gap of 10% of the text width between the text and the plotting position.

cex=1.5

Character expansion. The value is the desired size of text characters (including plotting characters) relative to the default text size.

cex.axis
cex.lab
cex.main
cex.sub

The character expansion to be used for axis annotation, x and y labels, main and sub-titles, respectively.
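
To see several of these parameters working together, a short sketch along the following lines may help. The data, the title and the temporary output file are illustrative assumptions. The second page draws the 26 special symbols without needing locator(), so it also works on non-interactive devices.

```r
out <- tempfile(fileext = ".pdf")   # illustrative output location
pdf(out)

x <- 1:10
y <- x^2

# Page 1: points, lines and text drawn with non-default parameters.
plot(x, y,
     pch = 4, col = 2, cex = 1.5,          # symbol, colour, size of points
     col.axis = "blue", font.lab = 2,      # axis annotation and label style
     cex.main = 1.2, main = "Graphical elements")
lines(x, y, lty = 2, lwd = 2, col = 4)     # dashed line, double width
text(x[10], y[10], "last point", adj = 1.1, font = 3)  # right-justified, italic

# Page 2: a chart of the plotting symbols 0 to 25.
plot(0:25, rep(1, 26), pch = 0:25, axes = FALSE, xlab = "pch", ylab = "")
axis(1, at = 0:25)

invisible(dev.off())
file.exists(out)
```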


12.5.2 Axes and tick marks

Many of R’s high-level plots have axes, and you can construct axes yourself with the low-level axis() graphics function. Axes have three main components: the axis line (line style controlled by the lty graphics parameter), the tick marks (which mark off unit divisions along the axis line) and the tick labels (which mark the units). These components can be customized with the following graphics parameters.

lab=c(5, 7, 12)

The first two numbers are the desired number of tick intervals on the x and y axes respectively. The third number is the desired length of axis labels, in characters (including the decimal point). Choosing a too-small value for this parameter may result in all tick labels being rounded to the same number!

las=1

Orientation of axis labels. 0 means always parallel to axis, 1 means always horizontal, and 2 means always perpendicular to the axis.

mgp=c(3, 1, 0)

Positions of axis components. The first component is the distance from the axis label to the axis position, in text lines. The second component is the distance to the tick labels, and the final component is the distance from the axis position to the axis line (usually zero).

Positive numbers measure outside the plot region, negative numbers inside.

tck=0.01

Length of tick marks, as a fraction of the size of the plotting region. When tck is small (less than 0.5) the tick marks on the x and y axes are forced to be the same size. A value of 1 gives grid lines. Negative values give tick marks outside the plotting region. Use tck=0.01 and mgp=c(1,-1.5,0) for internal tick marks.

xaxs="r"
yaxs="i"

Axis styles for the x and y axes, respectively. With styles "i" (internal) and "r" (the default), tick marks always fall within the range of the data; however, style "r" leaves a small amount of space at the edges.
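
The difference between the two axis styles can be inspected through par("usr"), which holds the extremes of the plotting region in user coordinates. The sketch below uses made-up data on a throwaway pdf(NULL) device and sets a few of the axis parameters described above.

```r
pdf(NULL)                                  # throwaway device: no file is created
oldpar <- par(las = 1,                     # horizontal tick labels
              mgp = c(2, 0.5, 0),          # labels closer to the axes
              tck = 0.01,                  # short internal tick marks
              xaxs = "i", yaxs = "r")      # exact fit in x, padded in y
plot(0:10, (0:10)^2, type = "b", xlab = "x", ylab = "x squared")
usr <- par("usr")                          # c(x1, x2, y1, y2)
par(oldpar)
invisible(dev.off())

usr[1:2]   # x limits under style "i"
usr[3:4]   # y limits under style "r"
```

With data running from 0 to 10 in x and 0 to 100 in y, the x limits should coincide with the data range while the y limits extend a little beyond it.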


12.5.3 Figure margins

A single plot in R is known as a figure and comprises a plot region surrounded by margins (possibly containing axis labels, titles, etc.) and (usually) bounded by the axes themselves.

A typical figure is

images/fig11

Graphics parameters controlling figure layout include:

mai=c(1, 0.5, 0.5, 0)

Widths of the bottom, left, top and right margins, respectively, measured in inches.

mar=c(4, 2, 2, 1)

Similar to mai, except the measurement unit is text lines.

mar and mai are equivalent in the sense that setting one changes the value of the other. The default values chosen for this parameter are often too large; the right-hand margin is rarely needed, and neither is the top margin if no title is being used.

The bottom and left margins must be large enough to accommodate the axis and tick labels. Furthermore, the default is chosen without regard to the size of the device surface: for example, using the postscript() driver with the height=4 argument will result in a plot which is about 50% margin unless mar or mai is set explicitly. When multiple figures are in use (see below) the margins are reduced; however, this may not be enough when many figures share the same page.
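
The link between mar and mai can be checked directly. This sketch, run on a throwaway pdf(NULL) device, sets mar and then reads back mai. The ratio between the two is the height of one margin line in inches on that device, so it should be the same for all four sides.

```r
pdf(NULL)                              # throwaway device: no file is created
mai_default <- par("mai")              # margins in inches before any change
oldpar <- par(mar = c(4, 2, 2, 1))     # set margins in text lines
mai_new <- par("mai")                  # mai has been updated to match
ratio <- mai_new / par("mar")          # inches per margin line, per side
par(oldpar)
invisible(dev.off())

mai_new
ratio
```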


12.5.4 Multiple figure environment

R allows you to create an n by m array of figures on a single page. Each figure has its own margins, and the array of figures is optionally surrounded by an outer margin, as shown in the following figure.

images/fig12

The graphical parameters relating to multiple figures are as follows:

mfcol=c(3, 2)
mfrow=c(2, 4)

Set the size of a multiple figure array. The first value is the number of rows; the second is the number of columns. The only difference between these two parameters is that setting mfcol causes figures to be filled by column; mfrow fills by rows.

The layout in the Figure could have been created by setting mfrow=c(3,2); the figure shows the page after four plots have been drawn.

Setting either of these can reduce the base size of symbols and text (controlled by par("cex") and the pointsize of the device). In a layout with exactly two rows and columns the base size is reduced by a factor of 0.83: if there are three or more of either rows or columns, the reduction factor is 0.66.

mfg=c(2, 2, 3, 2)

Position of the current figure in a multiple figure environment. The first two numbers are the row and column of the current figure; the last two are the number of rows and columns in the multiple figure array. Set this parameter to jump between figures in the array.

You can even use different values for the last two numbers than the true values for unequally-sized figures on the same page.

fig=c(4, 9, 1, 4)/10

Position of the current figure on the page. Values are the positions of the left, right, bottom and top edges respectively, as a fraction of the page measured from the bottom left corner. The example value would be for a figure in the bottom right of the page.

Set this parameter for arbitrary positioning of figures within a page. If you want to add a figure to a current page, use new=TRUE as well (unlike S).

oma=c(2, 0, 3, 0)
omi=c(0, 0, 0.8, 0)

Size of outer margins. Like mar and mai, the first measures in text lines and the second in inches, starting with the bottom margin and working clockwise.

Outer margins are particularly useful for page-wise titles, etc. Text can be added to the outer margins with the mtext() function with argument outer=TRUE. There are no outer margins by default, however, so you must create them explicitly using oma or omi.

More complicated arrangements of multiple figures can be produced by the split.screen() and layout() functions, as well as by the grid and lattice packages.
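
The following sketch puts the multiple-figure parameters together: a 3 by 2 array filled by rows, an outer top margin, and a page-wise title added with mtext(). The plotted data, titles and temporary file are illustrative assumptions. After four plots the current figure is the one in row 2, column 2, which is what par("mfg") reports.

```r
out <- tempfile(fileext = ".pdf")              # illustrative output location
pdf(out)
oldpar <- par(mfrow = c(3, 2),                 # 3 rows, 2 columns, filled by rows
              oma = c(0, 0, 3, 0))             # 3 text lines of outer top margin

for (i in 1:4) {
  plot(1:10, (1:10)^i, main = paste("Power", i))
}

current <- par("mfg")                          # row, column, nrows, ncols
cex_in_layout <- par("cex")                    # base size reduced in this layout
mtext("Four of six panels filled", outer = TRUE, cex = 1.2)

par(oldpar)
invisible(dev.off())

current
cex_in_layout
```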


12.6 Device drivers

R can generate graphics (of varying levels of quality) on almost any type of display or printing device. Before this can begin, however, R needs to be informed what type of device it is dealing with. This is done by starting a device driver. The purpose of a device driver is to convert graphical instructions from R (“draw a line,” for example) into a form that the particular device can understand.

Device drivers are started by calling a device driver function. There is one such function for every device driver: type help(Devices) for a list of them all. For example, issuing the command

> postscript()

causes all future graphics output to be sent to the printer in PostScript format. Some commonly-used device drivers are:

X11()

For use with the X11 window system on Unix-alikes

windows()

For use on Windows

quartz()

For use on macOS

postscript()

For printing on PostScript printers, or creating PostScript graphics files.

pdf()

Produces a PDF file, which can also be included in other PDF files.

png()

Produces a bitmap PNG file. (Not always available: see its help page.)

jpeg()

Produces a bitmap JPEG file, best used for image plots. (Not always available: see its help page.)

When you have finished with a device, be sure to terminate the device driver by issuing the command

> dev.off()

This ensures that the device finishes cleanly; for example in the case of hardcopy devices this ensures that every page is completed and has been sent to the printer. (This will happen automatically at the normal end of a session.)
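
A complete open, draw and close cycle might look like the sketch below. It uses the pdf() driver with a temporary file name, which is an assumption made for illustration; any of the file-based drivers listed above follows the same pattern.

```r
out_pdf <- tempfile(fileext = ".pdf")   # illustrative output location

pdf(out_pdf, width = 6, height = 4)     # start the device driver
dev <- dev.cur()                        # the new device is now current
plot(1:10, main = "Sent to the pdf() device")
invisible(dev.off())                    # finish the file cleanly

names(dev)
file.exists(out_pdf)
```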


12.6.1 PostScript diagrams for typeset documents

By passing the file argument to the postscript() device driver function, you may store the graphics in PostScript format in a file of your choice. The plot will be in landscape orientation unless the horizontal=FALSE argument is given, and you can control the size of the graphic with the width and height arguments (the plot will be scaled as appropriate to fit these dimensions). For example, the command

> postscript("file.ps", horizontal=FALSE, height=5, pointsize=10)

will produce a file containing PostScript code for a figure five inches high, perhaps for inclusion in a document. It is important to note that if the file named in the command already exists, it will be overwritten.

This is the case even if the file was only created earlier in the same R session.

Many usages of PostScript output will be to incorporate the figure in another document. This works best when encapsulated PostScript is produced: R always produces conformant output, but only marks the output as such when the onefile=FALSE argument is supplied. This unusual notation stems from S-compatibility: it really means that the output will be a single page (which is part of the EPSF specification). Thus to produce a plot for inclusion use something like

> postscript("plot1.eps", horizontal=FALSE, onefile=FALSE,
             height=8, width=6, pointsize=10)
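
For a quick check of what such a call produces, the sketch below repeats it with a temporary file name (an assumption for illustration, so that no existing file is overwritten), draws one plot, closes the device and reads back the first line of the file, where the PostScript header is written.

```r
eps_file <- tempfile(fileext = ".eps")      # illustrative output location

postscript(eps_file, horizontal = FALSE, onefile = FALSE,
           height = 8, width = 6, pointsize = 10)
plot(1:10, main = "Figure for inclusion")
invisible(dev.off())                        # completes the file

header <- readLines(eps_file, n = 1)        # first line of the PostScript file
is_marked_eps <- grepl("EPSF", header)      # is the output marked as encapsulated?
header
is_marked_eps
```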

12.6.2 Multiple graphics devices

In advanced use of R it is often useful to have several graphics devices in use at the same time. Of course only one graphics device can accept graphics commands at any one time, and this is known as the current device. When multiple devices are open, they form a numbered sequence with names giving the kind of device at any position.

The main commands used for operating with multiple devices, and their meanings, are as follows:

X11()

[UNIX]

windows()
win.printer()
win.metafile()

[Windows]

quartz()

[macOS]

postscript()
pdf()
png()
jpeg()
tiff()
bitmap()

Each new call to a device driver function opens a new graphics device, thus extending by one the device list. This device becomes the current device, to which graphics output will be sent.

dev.list()

Returns the number and name of all active devices. The device at position 1 on the list is always the null device which does not accept graphics commands at all.

dev.next()
dev.prev()

Returns the number and name of the graphics device next to, or previous to the current device, respectively.

dev.set(which=k)

Can be used to change the current graphics device to the one at position k of the device list. Returns the number and label of the device.

dev.off(k)

Terminate the graphics device at position k of the device list. For some devices, such as postscript devices, this will either print the file immediately or correctly complete the file for later printing, depending on how the device was initiated.

dev.copy(device, …, which=k)
dev.print(device, …, which=k)

Make a copy of the device k. Here device is a device function, such as postscript, with extra arguments, if needed, specified by ‘…’. dev.print is similar, but the copied device is immediately closed, so that end actions, such as printing hardcopies, are immediately performed.

graphics.off()

Terminate all graphics devices on the list, except the null device.
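
The commands above can be tried out without any screen device. This sketch opens two throwaway pdf(NULL) devices, switches back to the first, and then closes both. The variable names are illustrative, and devices already open in the session are left untouched.

```r
n_before <- length(dev.list())      # devices already open, if any

pdf(NULL); first  <- dev.cur()      # open a device and note its number
pdf(NULL); second <- dev.cur()      # a second one becomes the current device
n_open <- length(dev.list())

invisible(dev.set(which = first))   # make the first device current again
plot(1:3)                           # this output goes to the first device
current <- dev.cur()

invisible(dev.off(second))          # close the two devices opened here
invisible(dev.off(first))
n_after <- length(dev.list())

c(first = unname(first), second = unname(second), current = unname(current))
```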


12.7 Dynamic graphics

R does not have built-in capabilities for dynamic or interactive graphics, e.g. rotating point clouds or “brushing” (interactively highlighting) points. However, extensive dynamic graphics facilities are available in the system GGobi by Swayne, Cook and Buja, available from

http://ggobi.org/

and these can be accessed from R via the package rggobi, described at http://ggobi.org/rggobi.html.

Also, package rgl provides ways to interact with 3D plots, for example of surfaces.


13 Packages

All R functions and datasets are stored in packages. Only when a package is loaded are its contents available. This is done both for efficiency (the full list would take more memory and would take longer to search than a subset), and to aid package developers, who are protected from name clashes with other code.

The process of developing packages is described in Creating R packages in Writing R Extensions. Here, we will describe them from a user’s point of view.

To see which packages are installed at your site, issue the command

> library()

with no arguments. To load a particular package (e.g., the boot package containing functions from Davison & Hinkley (1997)), use a command like

> library(boot)

Users connected to the Internet can use the install.packages() and update.packages() functions (available through the Packages menu in the Windows and macOS GUIs, see Installing packages in R Installation and Administration) to install and update packages.

To see which packages are currently loaded, use

> search()

to display the search list. Some packages may be loaded but not available on the search list (see Namespaces): these will be included in the list given by

> loadedNamespaces()

To see a list of all available help topics in an installed package, use

> help.start()

to start the HTML help system, and then navigate to the package listing in the Reference section.


13.1 Standard packages

The standard (or base) packages are considered part of the R source code. They contain the basic functions that allow R to work, and the datasets and standard statistical and graphical functions that are described in this manual. They should be automatically available in any R installation.

For a complete list, see R packages in R FAQ.


13.2 Contributed packages and CRAN

There are thousands of contributed packages for R, written by many different authors. Some of these packages implement specialized statistical methods, others give access to data or hardware, and others are designed to complement textbooks. Some (the recommended packages) are distributed with every binary distribution of R. Most are available for download from CRAN (https://CRAN.R-project.org/ and its mirrors) and other repositories such as Bioconductor (https://www.bioconductor.org/). The R FAQ contains a list of CRAN packages current at the time of release, but the collection of available packages changes very frequently.


13.3 Namespaces

Packages have namespaces, which do three things: they allow the package writer to hide functions and data that are meant only for internal use, they prevent functions from breaking when a user (or other package writer) picks a name that clashes with one in the package, and they provide a way to refer to an object within a particular package.

For example, t() is the transpose function in R, but users might define their own function named t. Namespaces prevent the user’s definition from taking precedence, and breaking every function that tries to transpose a matrix.

There are two operators that work with namespaces. The double-colon operator :: selects definitions from a particular namespace. In the example above, the transpose function will always be available as base::t, because it is defined in the base package. Only functions that are exported from the package can be retrieved in this way.

The triple-colon operator ::: may be seen in a few places in R code: it acts like the double-colon operator but also allows access to hidden objects. Users are more likely to use the getAnywhere() function, which searches multiple packages.

Packages are often inter-dependent, and loading one may cause others to be automatically loaded. The colon operators described above will also cause automatic loading of the associated package.

When packages with namespaces are loaded automatically they are not added to the search list.
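
The points made in this section can be tried out in a few lines. The sketch below defines a user function named t, shows that base::t still reaches the transpose function, and then uses the double-colon operator on the tools package (chosen here only as an example of a standard package that is normally not on the search list) to show a namespace being loaded without being attached.

```r
m <- matrix(1:6, nrow = 2)

t <- function(x) "user-defined t"   # a user definition that clashes with base
masked <- t(m)                      # calls the user's function
transposed <- base::t(m)            # always the transpose from package base
rm(t)                               # remove the clashing definition again

# Using :: loads the namespace of 'tools' if needed, but does not attach it.
ext <- tools::file_ext("plot1.eps")
tools_loaded   <- "tools" %in% loadedNamespaces()
tools_attached <- "package:tools" %in% search()

dim(transposed)
c(loaded = tools_loaded, attached = tools_attached)
```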


14 OS facilities

R has quite extensive facilities to access the OS under which it is running: this allows it to be used as a scripting language and that ability is much used by R itself, for example to install packages.

Because R’s own scripts need to work across all platforms, considerable effort has gone into making the scripting facilities as platform-independent as is feasible.


14.1 Files and directories

There are many functions to manipulate files and directories. Here are pointers to some of the more commonly used ones.

To create an (empty) file or directory, use file.create or dir.create. (These are the analogues of the POSIX utilities touch and mkdir.) For temporary files and directories in the R session directory see tempfile.

Files can be removed by either file.remove or unlink: the latter can remove directory trees.

For directory listings use list.files (also available as dir) or list.dirs. These can select files using a regular expression: to select by wildcards use Sys.glob.

Many types of information on a filepath (including for example if it is a file or directory) can be found by file.info.

There are several ways to find out if a file ‘exists’ (a file can exist on the filesystem and not be visible to the current user). There are functions file.exists, file.access and file_test with various versions of this test: file_test is a version of the POSIX test command for those familiar with shell scripting.

Function file.copy is the R analogue of the POSIX command cp.

Choosing files can be done interactively by file.choose: the Windows port has the more versatile functions choose.files and choose.dir and there are similar functions in the tcltk package: tk_choose.files and tk_choose.dir.

Functions file.show and file.edit will display and edit one or more files in a way appropriate to the R port, using the facilities of a console (such as RGui on Windows or R.app on macOS) if one is in use.

There is some support for links in the filesystem: see functions file.link and Sys.readlink.


14.2 Filepaths

With a few exceptions, R relies on the underlying OS functions to manipulate filepaths. Some aspects of this are allowed to depend on the OS, and do, even down to the version of the OS.

There are POSIX standards for how OSes should interpret filepaths and many R users assume POSIX compliance: but Windows does not claim to be compliant and other OSes may be less than completely compliant.

The following are some issues which have been encountered with filepaths.

  • POSIX filesystems are case-sensitive, so foo.png and Foo.PNG are different files. However, the defaults on Windows and macOS are to be case-insensitive, and FAT filesystems (commonly used on removable storage) are not normally case-sensitive (and all filepaths may be mapped to lower case).
  • Almost all the Windows’ OS services support the use of slash or backslash as the filepath separator, and R converts the known exceptions to the form required by Windows.
  • The behaviour of filepaths with a trailing slash is OS-dependent. Such paths are not valid on Windows and should not be expected to work. POSIX-2008 requires such paths to match only directories, but earlier versions allowed them to also match files. So they are best avoided.
  • Multiple slashes in filepaths such as /abc//def are valid on POSIX filesystems and treated as if there was only one slash. They are usually accepted by Windows’ OS functions. However, leading double slashes may have a different meaning.
  • Windows’ UNC filepaths (such as \\server\dir1\dir2\file and \\?\UNC\server\dir1\dir2\file) are not supported, but they may work in some R functions. POSIX filesystems are allowed to treat a leading double slash specially.
  • Windows allows filepaths containing drives and relative to the current directory on a drive, e.g. d:foo/bar refers to d:/a/b/c/foo/bar if the current directory on drive d: is /a/b/c. It is intended that these work, but the use of absolute paths is safer.

Functions basename and dirname select parts of a file path: the recommended way to assemble a file path from components is file.path. Function path.expand does ‘tilde expansion’, substituting values for home directories (the current user’s, and perhaps those of other users).
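A short sketch of assembling and splitting a path. The components are invented, and the path does not need to exist:

```r
# file.path() joins components with "/" on all platforms.
p <- file.path("project", "data", "input.csv")

base <- basename(p)   # "input.csv"
dir  <- dirname(p)    # "project/data"

# Tilde expansion: the result depends on the current user and platform.
home <- path.expand("~")
```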

On filesystems with links, a single file can be referred to by many filepaths. Function normalizePath will find a canonical filepath.

Windows has the concepts of short (‘8.3’) and long file names: normalizePath will return an absolute path using long file names and shortPathName will return a version using short names. The latter does not contain spaces and uses backslash as the separator, so is sometimes useful for exporting names from R.

File permissions are a related topic. R has support for the POSIX concepts of read/write/execute permission for owner/group/all but this may be only partially supported on the filesystem, so for example on Windows only read-only files (for the account running the R session) are recognized.

Access Control Lists (ACLs) are employed on several filesystems, but do not have an agreed standard and R has no facilities to control them. Use Sys.chmod to change permissions.
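As a hedged sketch, the following sets owner read/write and group/other read permission on an illustrative temporary file `f`. On Windows only the read-only attribute is affected, so the reported mode will differ there:

```r
f <- tempfile()
file.create(f)

# use_umask = FALSE applies the mode exactly rather than masking it
# with the session's umask.
ok <- Sys.chmod(f, mode = "0644", use_umask = FALSE)

# file.mode() returns the permissions as an octal mode.
mode_now <- format(file.mode(f))

unlink(f)
```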


14.3 System commands

Functions system and system2 are used to invoke a system command and optionally collect its output. system2 is a little more general but its main advantage is that it is easier to write cross-platform code using it.

system behaves differently on Windows from other OSes (because the API C call of that name does). Elsewhere it invokes a shell to run the command: the Windows port of R has a function shell to do that.

To find out if the OS includes a command, use Sys.which, which attempts to do this in a cross-platform way (unfortunately it is not a standard OS service).

Function shQuote will quote filepaths as needed for commands in the current OS.
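The functions in this section can be combined as in the sketch below. It assumes an Rscript executable is present in R's own bin directory (usually the case, but not guaranteed by this manual) and uses it as the command, so that the example does not depend on other installed software:

```r
# Locate the Rscript front end that ships with the running R.
rscript <- file.path(R.home("bin"), "Rscript")

# Run a command and collect its standard output as a character vector.
# shQuote() protects the expression from the shell.
out <- system2(rscript,
               c("--vanilla", "-e", shQuote("cat(1 + 1)")),
               stdout = TRUE)

# Sys.which() returns the full path of a command, or "" if it is not found.
where_r <- Sys.which("R")

# Quoting a file name containing a space, using Bourne-shell rules.
quoted <- shQuote("my file.txt", type = "sh")   # "'my file.txt'"
```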


14.4 Compression and Archives

Recent versions of R have extensive facilities to read and write compressed files, often transparently. Reading of files in R is to a very large extent done by connections, and the file function, which is used to open a connection to a file (or a URL), is able to identify the compression used from the ‘magic’ header of the file.

The type of compression which has been supported for longest is gzip compression, and that remains a good general compromise. Files compressed by the earlier Unix compress utility can also be read, but these are becoming rare. Two other forms of compression, those of the bzip2 and xz utilities are also available. These generally achieve higher rates of compression (depending on the file, much higher) at the expense of slower decompression and much slower compression.
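A minimal sketch of transparent decompression: a file is written through a gzfile connection and read back through a plain file connection, which recognizes the gzip header. The file name `f` is illustrative:

```r
f <- tempfile(fileext = ".gz")

# Write two lines with gzip compression.
con <- gzfile(f, "w")
writeLines(c("alpha", "beta"), con)
close(con)

# Read back through file(): the compression is detected from the header.
con <- file(f, "r")
back <- readLines(con)
close(con)

# The first two bytes of a gzip file are the 'magic' values 1f 8b.
raw_head <- readBin(f, "raw", n = 2)

unlink(f)
```

bzfile and xzfile can be used in the same way for the other two formats.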

There is some confusion between xz and lzma compression (see https://en.wikipedia.org/wiki/Xz and https://en.wikipedia.org/wiki/LZMA): R can read files compressed by most versions of either.

File archives are single files which contain a collection of files, the most common ones being ‘tarballs’ and zip files as used to distribute R packages. R can list and unpack both (see functions untar and unzip) and create both (for zip with the help of an external program).
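The following sketch creates, lists and unpacks a gzip-compressed tarball. All file and directory names are invented for illustration, and `tar = "internal"` is specified so that R's built-in implementation is used rather than an external tar program:

```r
# A source directory holding one small file.
src <- tempfile("src_")
dir.create(src)
writeLines("x,y", file.path(src, "data.csv"))

# Create the archive from inside 'src' so that paths are stored relative to it.
tarball <- tempfile(fileext = ".tar.gz")
old <- setwd(src)
tar(tarball, files = "data.csv", compression = "gzip", tar = "internal")
setwd(old)

# List the contents, then unpack into a fresh directory.
listed <- untar(tarball, list = TRUE, tar = "internal")
dest <- tempfile("dest_")
dir.create(dest)
untar(tarball, exdir = dest, tar = "internal")
unpacked <- file.exists(file.path(dest, "data.csv"))

unlink(c(src, dest, tarball), recursive = TRUE)
```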


Appendix A A sample session

The following session is intended to introduce to you some features of the R environment by using them. Many features of the system will be unfamiliar and puzzling at first, but this puzzlement will soon disappear.

Start R appropriately for your platform (see Invoking R).

The R program begins, with a banner.

(Within R code, the prompt on the left hand side will not be shown to avoid confusion.)

help.start()

Start the HTML interface to on-line help (using a web browser available on your machine). You should briefly explore the features of this facility with the mouse.

Iconify the help window and move on to the next part.

x <- rnorm(50)
y <- rnorm(x)

Generate two pseudo-random normal vectors of x- and y-coordinates.

plot(x, y)

Plot the points in the plane. A graphics window will appear automatically.

ls()

See which R objects are now in the R workspace.

rm(x, y)

Remove objects no longer needed. (Clean up).

x <- 1:20

Make x = (1, 2, ..., 20).

w <- 1 + sqrt(x)/2

A ‘weight’ vector of standard deviations.

dummy <- data.frame(x=x, y= x + rnorm(x)*w)
dummy

Make a data frame of two columns, x and y, and look at it.

fm <- lm(y ~ x, data=dummy)
summary(fm)

Fit a simple linear regression and look at the analysis. With y to the left of the tilde, we are modelling y dependent on x.

fm1 <- lm(y ~ x, data=dummy, weights=1/w^2)
summary(fm1)

Since we know the standard deviations, we can do a weighted regression.

attach(dummy)

Make the columns in the data frame visible as variables.

lrf <- lowess(x, y)

Make a nonparametric local regression function.

plot(x, y)

Standard point plot.

lines(x, lrf$y)

Add in the local regression.

abline(0, 1, lty=3)

The true regression line: (intercept 0, slope 1).

abline(coef(fm))

Unweighted regression line.

abline(coef(fm1), col = "red")

Weighted regression line.

detach()

Remove data frame from the search path.

plot(fitted(fm), resid(fm),
     xlab="Fitted values",
     ylab="Residuals",
     main="Residuals vs Fitted")

A standard regression diagnostic plot to check for heteroscedasticity. Can you see it?

qqnorm(resid(fm), main="Residuals Rankit Plot")

A normal scores plot to check for skewness, kurtosis and outliers. (Not very useful here.)

rm(fm, fm1, lrf, x, dummy)

Clean up again.

The next section will look at data from the classical experiment of Michelson to measure the speed of light. This dataset is available in the morley object, but we will read it to illustrate the read.table function.

filepath <- system.file("data", "morley.tab" , package="datasets")
filepath

Get the path to the data file.

file.show(filepath)

Optional. Look at the file.

mm <- read.table(filepath)
mm

Read in the Michelson data as a data frame, and look at it. There are five experiments (column Expt) and each has 20 runs (column Run); column Speed is the recorded speed of light, suitably coded.

mm$Expt <- factor(mm$Expt)
mm$Run <- factor(mm$Run)

Change Expt and Run into factors.

attach(mm)

Make the data frame visible at position 2 (the default).

plot(Expt, Speed, main="Speed of Light Data", xlab="Experiment No.")

Compare the five experiments with simple boxplots.

fm <- aov(Speed ~ Run + Expt, data=mm)
summary(fm)

Analyze as a randomized block, with ‘runs’ and ‘experiments’ as factors.

fm0 <- update(fm, . ~ . - Run)
anova(fm0, fm)

Fit the sub-model omitting ‘runs’, and compare using a formal analysis of variance.

detach()
rm(fm, fm0)

Clean up before moving on.

We now look at some more graphical features: contour and image plots.

x <- seq(-pi, pi, len=50)
y <- x

x is a vector of 50 equally spaced values in the interval [-pi, pi]. y is the same.

f <- outer(x, y, function(x, y) cos(y)/(1 + x^2))

f is a square matrix, with rows and columns indexed by x and y respectively, of values of the function cos(y)/(1 + x^2).

oldpar <- par(no.readonly = TRUE)
par(pty="s")

Save the plotting parameters and set the plotting region to “square”.

contour(x, y, f)
contour(x, y, f, nlevels=15, add=TRUE)

Make a contour map of f; add in more lines for more detail.

fa <- (f-t(f))/2

fa is the “asymmetric part” of f. (t() is transpose).

contour(x, y, fa, nlevels=15)

Make a contour plot, …

par(oldpar)

… and restore the old graphics parameters.

image(x, y, f)
image(x, y, fa)

Make some high density image plots, (of which you can get hardcopies if you wish), …

objects(); rm(x, y, f, fa)

… and clean up before moving on.

R can do complex arithmetic, also.

th <- seq(-pi, pi, len=100)
z <- exp(1i*th)

1i is used for the complex number i.

par(pty="s")
plot(z, type="l")

Plotting complex arguments means plotting imaginary versus real parts. This should be a circle.

w <- rnorm(100) + rnorm(100)*1i

Suppose we want to sample points within the unit circle. One method would be to take complex numbers with standard normal real and imaginary parts …

w <- ifelse(Mod(w) > 1, 1/w, w)

… and to map any outside the circle onto their reciprocal.

plot(w, xlim=c(-1,1), ylim=c(-1,1), pch="+",xlab="x", ylab="y")
lines(z)

All points are inside the unit circle, but the distribution is not uniform.

w <- sqrt(runif(100))*exp(2*pi*runif(100)*1i)
plot(w, xlim=c(-1,1), ylim=c(-1,1), pch="+", xlab="x", ylab="y")
lines(z)

The second method uses the uniform distribution. The points should now look more evenly spaced over the disc.
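One way to see why the square root of a uniform radius gives an even spread: for points uniform over the unit disc, the probability of lying within radius r is r^2 (the ratio of the areas), so about a quarter of the points should fall within radius 0.5. A small numerical check, with illustrative names `u` and `frac` and a seed fixed here only for reproducibility:

```r
set.seed(1)
n <- 1e5

# Same construction as above, with a larger sample and a separate name.
u <- sqrt(runif(n)) * exp(2*pi*runif(n)*1i)

# Fraction of points within radius 0.5; should be close to 0.5^2 = 0.25.
frac <- mean(Mod(u) <= 0.5)

rm(u)
```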

rm(th, w, z)

Clean up again.

q()

Quit the R program. You will be asked if you want to save the R workspace, and for an exploratory session like this, you probably do not want to save it.


Appendix B Invoking R

Users of R on Windows or macOS should read the OS-specific section first, but command-line use is also supported.


B.1 Invoking R from the command line

When working at a command line on UNIX or Windows, the command ‘R’ can be used both for starting the main R program in the form

R [options] [<infile] [>outfile],

or, via the R CMD interface, as a wrapper to various R tools (e.g., for processing files in R documentation format or manipulating add-on packages) which are not intended to be called “directly”.

At the Windows command-line, Rterm.exe is preferred to R.

You need to ensure that either the environment variable TMPDIR is unset or it points to a valid place to create temporary files and directories.

Most options control what happens at the beginning and at the end of an R session. The startup mechanism is as follows (see also the on-line help for topic ‘Startup’ for more information, and the section below for some Windows-specific details).

  • Unless --no-environ was given, R searches for user and site files to process for setting environment variables. The name of the site file is the one pointed to by the environment variable R_ENVIRON; if this is unset, R_HOME/etc/Renviron.site is used (if it exists). The user file is the one pointed to by the environment variable R_ENVIRON_USER if this is set; otherwise, files .Renviron in the current or in the user’s home directory (in that order) are searched for. These files should contain lines of the form ‘name=value’. (See help("Startup") for a precise description.) Variables you might want to set include R_PAPERSIZE (the default paper size), R_PRINTCMD (the default print command) and R_LIBS (specifies the list of R library trees searched for add-on packages).
  • Then R searches for the site-wide startup profile unless the command line option --no-site-file was given. The name of this file is taken from the value of the R_PROFILE environment variable. If that variable is unset, the default R_HOME/etc/Rprofile.site is used if this exists.
  • Then, unless --no-init-file was given, R searches for a user profile and sources it. The name of this file is taken from the environment variable R_PROFILE_USER; if unset, a file called .Rprofile in the current directory or in the user’s home directory (in that order) is searched for.
  • It also loads a saved workspace from file .RData in the current directory if there is one (unless --no-restore or --no-restore-data was specified).
  • Finally, if a function .First() exists, it is executed. This function (as well as .Last() which is executed at the end of the R session) can be defined in the appropriate startup profiles, or reside in .RData.
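As an illustration of the last step, a user profile might define the two hook functions along these lines. The messages are hypothetical; only the names .First and .Last are significant to R:

```r
# Possible contents of a user's .Rprofile (a sketch, not a recommended setup).

# Run once the startup files and any saved workspace have been processed.
.First <- function() {
  message("Session started at ", format(Sys.time()))
}

# Run at the end of the session, e.g. on q().
.Last <- function() {
  message("Session ended at ", format(Sys.time()))
}
```

Starting R with --no-init-file (or --vanilla) skips the user profile, so hooks defined there are not set up.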

In addition, there are options for controlling the memory available to the R process (see the on-line help for topic ‘Memory’ for more information). Users will not normally need to use these unless they are trying to limit the amount of memory used by R.

R accepts the following command-line options.

--help
-h

Print short help message to standard output and exit successfully.

--version

Print version information to standard output and exit successfully.

--encoding=enc

Specify the encoding to be assumed for input from the console or stdin. This needs to be an encoding known to iconv: see its help page. (--encoding enc is also accepted.) The input is re-encoded to the locale R is running in and needs to be representable in the latter’s encoding (so e.g. you cannot re-encode Greek text in a French locale unless that locale uses the UTF-8 encoding).

RHOME

Print the path to the R “home directory” to standard output and exit successfully. Apart from the front-end shell script and the man page, R installation puts everything (executables, packages, etc.) into this directory.

--save
--no-save

Control whether data sets should be saved or not at the end of the R session. If neither is given in an interactive session, the user is asked for the desired behavior when ending the session with q(); in non-interactive use one of these must be specified or implied by some other option (see below).

--no-environ

Do not read any user file to set environment variables.

--no-site-file

Do not read the site-wide profile at startup.

--no-init-file

Do not read the user’s profile at startup.

--restore
--no-restore
--no-restore-data

Control whether saved images (file .RData in the directory where R was started) should be restored at startup or not. The default is to restore. (--no-restore implies all the specific --no-restore-* options.)

--no-restore-history

Control whether the history file (normally file .Rhistory in the directory where R was started, but can be set by the environment variable R_HISTFILE) should be restored at startup or not. The default is to restore.

--no-Rconsole

(Windows only) Prevent loading the Rconsole file at startup.

--vanilla

Combine --no-save, --no-environ, --no-site-file, --no-init-file and --no-restore. Under Windows, this also includes --no-Rconsole.

-f file
--file=file

(not Rgui.exe) Take input from file: ‘-’ means stdin. Implies --no-save unless --save has been set. On a Unix-alike, shell metacharacters should be avoided in file (but spaces are allowed).

-e expression

(not Rgui.exe) Use expression as an input line. One or more -e options can be used, but not together with -f or --file. Implies --no-save unless --save has been set. (There is a limit of 10,000 bytes on the total length of expressions used in this way. Expressions containing spaces or shell metacharacters will need to be quoted.)

--no-readline

(UNIX only) Turn off command-line editing via readline. This is useful when running R from within Emacs using the ESS (“Emacs Speaks Statistics”) package. See The command-line editor, for more information. Command-line editing is enabled for default interactive use (see --interactive). This option also affects tilde-expansion: see the help for path.expand.

--min-vsize=N
--min-nsize=N

For expert use only: set the initial trigger sizes for garbage collection of vector heap (in bytes) and cons cells (number) respectively. Suffix ‘M’ specifies megabytes or millions of cells respectively. The defaults are 6Mb and 350k respectively and can also be set by environment variables R_VSIZE and R_NSIZE.

--max-ppsize=N

Specify the maximum size of the pointer protection stack as N locations. This defaults to 10000, but can be increased to allow large and complicated calculations to be done. Currently the maximum value accepted is 100000.

--quiet
--silent
-q

Do not print out the initial copyright and welcome messages.

--no-echo

Make R run as quietly as possible. This option is intended to support programs which use R to compute results for them. It implies --quiet and --no-save.

--interactive

(UNIX only) Assert that R really is being run interactively even if input has been redirected: use if input is from a FIFO or pipe and fed from an interactive program. (The default is to deduce that R is being run interactively if and only if stdin is connected to a terminal or pty.) Using -e, -f or --file asserts non-interactive use even if --interactive is given.

Note that this does not turn on command-line editing.

--ess

(Windows only) Set Rterm up for use by R-inferior-mode in ESS, including asserting interactive use (without the command-line editor) and no buffering of stdout.

--verbose

Print more information about progress, and in particular set R’s option verbose to TRUE. R code uses this option to control the printing of diagnostic messages.

--debugger=name
-d name

(UNIX only) Run R through debugger name. For most debuggers (the exceptions are valgrind and recent versions of gdb), further command line options are disregarded, and should instead be given when starting the R executable from inside the debugger.

--gui=type
-g type

(UNIX only) Use type as graphical user interface (note that this also includes interactive graphics). Currently, possible values for type are ‘X11’ (the default) and, provided that ‘Tcl/Tk’ support is available, ‘Tk’. (For back-compatibility, ‘x11’ and ‘tk’ are accepted.)

--arch=name

(UNIX only) Run the specified sub-architecture.

--args

This flag does nothing except cause the rest of the command line to be skipped: this can be useful to retrieve values from it with commandArgs(TRUE).
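A sketch of how a script might pick up such values. The file name args.R is hypothetical; it could be run as, for example, `R --no-echo --no-restore --file=args.R --args 10 20`:

```r
# args.R (hypothetical script name)

# With trailingOnly = TRUE, only the arguments after --args are returned.
args <- commandArgs(trailingOnly = TRUE)

# Arguments arrive as character strings; convert as needed.
nums <- as.numeric(args)
cat("Number of arguments:", length(args), "\n")
cat("Sum of arguments:", sum(nums), "\n")
```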

Note that input and output can be redirected in the usual way (using ‘<’ and ‘>’), but the line length limit of 4095 bytes still applies. Warning and error messages are sent to the error channel (stderr).

The command R CMD allows the invocation of various tools which are useful in conjunction with R, but not intended to be called “directly”. The general form is

R CMD command args

where command is the name of the tool and args the arguments passed on to it.

Currently, the following tools are available.

BATCH

Run R in batch mode. Runs R --restore --save with possibly further options (see ?BATCH).

COMPILE

(UNIX only) Compile C, C++, Fortran … files for use with R.

SHLIB

Build shared library for dynamic loading.

INSTALL

Install add-on packages.

REMOVE

Remove add-on packages.

build

Build (that is, package) add-on packages.

check

Check add-on packages.

LINK

(UNIX only) Front-end for creating executable programs.

Rprof

Post-process R profiling files.

Rdconv
Rd2txt

Convert Rd format to various other formats, including HTML, LaTeX, plain text, and extracting the examples. Rd2txt can be used as shorthand for Rdconv -t txt.

Rd2pdf

Convert Rd format to PDF.

Stangle

Extract S/R code from Sweave or other vignette documentation

Sweave

Process Sweave or other vignette documentation

Rdiff

Diff R output ignoring headers etc

config

Obtain configuration information

javareconf

(Unix only) Update the Java configuration variables

rtags

(Unix only) Create Emacs-style tag files from C, R, and Rd files

open

(Windows only) Open a file via Windows’ file associations

texify

(Windows only) Process (La)TeX files with R’s style files

Use

R CMD command --help

to obtain usage information for each of the tools accessible via the R CMD interface.

In addition, you can use options --arch=, --no-environ, --no-init-file, --no-site-file and --vanilla between R and CMD: these affect any R processes run by the tools. (Here --vanilla is equivalent to --no-environ --no-site-file --no-init-file.) However, note that R CMD does not of itself use any R startup files (in particular, neither user nor site Renviron files), and all of the R processes run by these tools (except BATCH) use --no-restore. Most use --vanilla and so invoke no R startup files: the current exceptions are INSTALL, REMOVE, Sweave and SHLIB (which uses --no-site-file --no-init-file).

R CMD cmd args

This form also works for any other executable cmd on the path or given by an absolute filepath: this is useful for running a program in the same environment as R or the specific commands run under, for example to run ldd or pdflatex. Under Windows cmd can be an executable or a batch file, or if it has extension .sh or .pl the appropriate interpreter (if available) is called to run it.


B.2 Invoking R under Windows

There are two ways to run R under Windows. Within a terminal window (e.g. cmd.exe or a more capable shell), the methods described in the previous section may be used, invoking by R.exe or more directly by Rterm.exe. For interactive use, there is a console-based GUI (Rgui.exe).

The startup procedure under Windows is very similar to that under UNIX, but references to the ‘home directory’ need to be clarified, as this is not always defined on Windows. If the environment variable R_USER is defined, that gives the home directory. Next, if the environment variable HOME is defined, that gives the home directory. After those two user-controllable settings, R tries to find system defined home directories. It first tries to use the Windows "personal" directory (typically My Documents in recent versions of Windows). If that fails, and environment variables HOMEDRIVE and HOMEPATH are defined (and they normally are) these define the home directory. Failing all those, the home directory is taken to be the starting directory.

You need to ensure that the environment variables TMPDIR, TMP and TEMP are either unset or one of them points to a valid place to create temporary files and directories.

Environment variables can be supplied as ‘name=value’ pairs on the command line.

If there is an argument ending .RData (in any case) it is interpreted as the path to the workspace to be restored: it implies --restore and sets the working directory to the parent of the named file. (This mechanism is used for drag-and-drop and file association with RGui.exe, but also works for Rterm.exe. If the named file does not exist it sets the working directory if the parent directory exists.)

The following additional command-line options are available when invoking RGui.exe.

--mdi
--sdi
--no-mdi

Control whether Rgui will operate as an MDI program (with multiple child windows within one main window) or an SDI application (with multiple top-level windows for the console, graphics and pager). The command-line setting overrides the setting in the user’s Rconsole file.

--debug

Enable the “Break to debugger” menu item in Rgui, and trigger a break to the debugger during command line processing.

Under Windows with R CMD you may also specify your own .bat, .exe, .sh or .pl file. It will be run under the appropriate interpreter (Perl for .pl) with several environment variables set appropriately, including R_HOME, R_OSTYPE, PATH, BSTINPUTS and TEXINPUTS. For example, if you already have latex.exe on your path, then

R CMD latex.exe mydoc

will run LaTeX on mydoc.tex, with the path to R’s share/texmf macros appended to TEXINPUTS. (Unfortunately, this does not help with the MiKTeX build of LaTeX, but R CMD texify mydoc will work in that case.)
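From inside R, the directories involved can be inspected as in the following sketch. It assumes the macros sit in a `texmf` directory under `R.home("share")`, following the `share/texmf` path named above; some installations may arrange this differently, so the last line merely reports whether the directory is present.

```r
# R.home() reports the installation directory that R CMD exports as R_HOME.
r_home <- R.home()

# Assumed location of the LaTeX macros, following the share/texmf path above.
texmf <- file.path(R.home("share"), "texmf")

r_home
texmf
dir.exists(texmf)
```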


B.3 Invoking R under macOS

There are two ways to run R under macOS. Within a Terminal.app window, R can be started by invoking R, and the methods described in the first subsection apply. There is also a console-based GUI (R.app) that by default is installed in the Applications folder on your system. It is a standard double-clickable macOS application.

The startup procedure under macOS is very similar to that under UNIX, but R.app does not make use of command-line arguments.

The ‘home directory’ is the one inside the R.framework, but the startup and current working directory are set as the user’s home directory unless a different startup directory is given in the Preferences window accessible from within the GUI.


B.4 Scripting with R

If you just want to run a file foo.R of R commands, the recommended way is to use R CMD BATCH foo.R. If you want to run this in the background or as a batch job use OS-specific facilities to do so: for example in most shells on Unix-alike OSes R CMD BATCH foo.R & runs a background job.
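The same batch run can be driven from an R session, as in this sketch. The script content and file names are made up for illustration, and the temporary paths are assumed to contain no spaces or shell metacharacters.

```r
# Write a one-line script to a temporary file (hypothetical content).
script <- tempfile(fileext = ".R")
outfile <- tempfile(fileext = ".Rout")
writeLines("cat(6 * 7, '\\n')", script)

# Equivalent to typing: R CMD BATCH --vanilla <script> <outfile>
r_bin <- file.path(R.home("bin"), "R")
status <- system2(r_bin, c("CMD", "BATCH", "--vanilla", script, outfile))

status               # 0 indicates success
readLines(outfile)   # echoed commands followed by their output
```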

You can pass parameters to scripts via additional arguments on the command line: for example (where the exact quoting needed will depend on the shell in use)

R CMD BATCH "--args arg1 arg2" foo.R &

will pass arguments to a script which can be retrieved as a character vector by

args <- commandArgs(TRUE)

This is made simpler by the alternative front-end Rscript, which can be invoked by

Rscript foo.R arg1 arg2

and this can also be used to write executable script files like (at least on Unix-alikes, and in some Windows shells)

#! /path/to/Rscript
args <- commandArgs(TRUE)
...
q(status=<exit status code>)

If this is entered into a text file runfoo and this is made executable (by chmod 755 runfoo), it can be invoked for different arguments by

runfoo arg1 arg2

For further options see help("Rscript"). This writes R output to stdout and stderr, and this can be redirected in the usual way for the shell running the command.
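As an illustration of the script skeleton shown earlier, the body of such a script might look like the sketch below. The function `main()` and its exit codes are hypothetical choices made for this example, not a convention imposed by Rscript.

```r
# Hypothetical script body: add up the numbers given on the command line.
main <- function(args = commandArgs(trailingOnly = TRUE)) {
  if (length(args) == 0L) {
    message("usage: runfoo number [number ...]")
    return(2L)
  }
  values <- suppressWarnings(as.numeric(args))
  if (anyNA(values)) {
    message("all arguments must be numeric")
    return(1L)
  }
  cat(sum(values), "\n")
  0L
}

# In an executable script the last line could then be:
#   q(status = main())
# Here the function is called directly so the sketch can be tried interactively.
main(c("1.5", "2.5"))
```

Keeping the logic in a function that returns the status, rather than calling q() in several places, makes the script easier to test from an interactive session.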

If you do not wish to hardcode the path to Rscript but have it in your path (which is normally the case for an installed R except on Windows, but e.g. macOS users may need to add /usr/local/bin to their path), use

#! /usr/bin/env Rscript
...

At least in Bourne and bash shells, the #! mechanism does not allow extra arguments like #! /usr/bin/env Rscript --vanilla.

One thing to consider is what stdin() refers to. It is commonplace to write R scripts with segments like

chem <- scan(n=24)
2.90 3.10 3.40 3.40 3.70 3.70 2.80 2.50 2.40 2.40 2.70 2.20
5.28 3.37 3.03 3.03 28.95 3.77 3.40 2.20 3.50 3.60 3.70 3.70

and stdin() refers to the script file to allow such traditional usage. If you want to refer to the process’s stdin, use "stdin" as a file connection, e.g. scan("stdin", ...).
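The distinction can be made concrete with a small sketch. The helper `read_numbers()` is hypothetical; a text connection stands in for the process's standard input so that the example runs in any session.

```r
# Hypothetical helper: read whitespace-separated numbers from any connection.
read_numbers <- function(con) {
  scan(con, what = numeric(), quiet = TRUE)
}

# In a script run as `Rscript foo.R < data.txt`, the process's standard input
# would be opened with file("stdin"):
#   values <- read_numbers(file("stdin"))

# A text connection stands in for standard input here.
demo <- textConnection("2.90 3.10 3.40\n3.40 3.70")
values <- read_numbers(demo)
close(demo)
mean(values)
```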

Another way to write executable script files (suggested by François Pinard) is to use a here document like

#!/bin/sh
[environment variables can be set here]
R --no-echo [other options] <<EOF

   R program goes here...

EOF

but here stdin() refers to the program source and "stdin" will not be usable.

Short scripts can be passed to Rscript on the command-line via the -e flag. (Empty scripts are not accepted.)
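The -e flag can be tried out from within R itself, as sketched below. The example assumes that Rscript is installed alongside the running R in `R.home("bin")`, and picks the quoting style by platform.

```r
# Locate the Rscript front-end that belongs to the running R (assumed layout).
rscript <- file.path(R.home("bin"), "Rscript")

# Quote the expression for the shell that system2() will use on this platform.
quote_type <- if (.Platform$OS.type == "windows") "cmd" else "sh"
expr <- shQuote("cat(1+1)", type = quote_type)

# Run the one-line script and capture what it writes to stdout.
out <- system2(rscript, c("--vanilla", "-e", expr), stdout = TRUE)
out
```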

Note that on a Unix-alike the input filename (such as foo.R) should not contain spaces nor shell metacharacters.
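A conservative check along these lines is sketched below. The helper `is_plain_path()` is hypothetical, and its whitelist (letters, digits, dot, underscore, slash, hyphen) is an assumption that is stricter than the rule stated above.

```r
# Hypothetical check: TRUE when a path uses only letters, digits, and . _ / -
is_plain_path <- function(path) {
  grepl("^[A-Za-z0-9._/-]+$", path)
}

# The second name contains a space and the third a shell metacharacter.
is_plain_path(c("foo.R", "my script.R", "run;rm.R"))
```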


Appendix C The command-line editor

C.1 Preliminaries

When the GNU readline library is available at the time R is configured for compilation under UNIX, an inbuilt command line editor allowing recall, editing and re-submission of prior commands is used. Note that other versions of readline exist and may be used by the inbuilt command line editor: this is most common on macOS. You can find out which version (if any) is available by running extSoftVersion() in an R session.
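For example, the following lines report the readline entry and whether command-line editing is available in the current session. The exact strings returned depend on how R was built.

```r
# Version information for the readline library in use (may be an empty string).
extSoftVersion()[["readline"]]

# TRUE when command-line editing is available in the current session.
capabilities("cledit")
```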

It can be disabled (useful for usage with ESS; see footnote 25) using the startup option --no-readline.

Windows versions of R have somewhat simpler command-line editing: see ‘Console’ under the ‘Help’ menu of the GUI, and the file README.Rterm for command-line editing under Rterm.exe.

When using R with GNU readline capabilities (see footnote 26), the functions described below are available, as well as others (probably) documented in man readline or info readline on your system.

Many of these use either Control or Meta characters. Control characters, such as Control-m, are obtained by holding CTRL down while you press the m key, and are written as C-m below. Meta characters, such as Meta-b, are typed by holding down META (see footnote 27) and pressing b, and are written as M-b in the following. If your terminal does not have a META key enabled, you can still type Meta characters using two-character sequences starting with ESC. Thus, to enter M-b, you could type ESC followed by b. The ESC character sequences are also allowed on terminals with real Meta keys. Note that case is significant for Meta characters.

Some but not all versions (see footnote 28) of readline will recognize resizing of the terminal window, so resizing is best avoided.

C.2 Editing actions

The R program keeps a history of the command lines you type, including the erroneous lines, and commands in your history may be recalled, changed if necessary, and re-submitted as new commands.

In Emacs-style command-line editing any straight typing you do while in this editing phase causes the characters to be inserted in the command you are editing, displacing any characters to the right of the cursor. In vi mode character insertion mode is started by M-i or M-a, characters are typed and insertion mode is finished by typing a further ESC. (The default is Emacs-style, and only that is described here: for vi mode see the readline documentation.)

Pressing the RET command at any time causes the command to be re-submitted.

Other editing actions are summarized in the following table.

C.3 Command-line editor summary

Command recall and vertical motion

C-p

Go to the previous command (backwards in the history).

C-n

Go to the next command (forwards in the history).

C-r text

Find the last command with the text string in it. This can be cancelled by C-g (and on some versions of R by C-c).

On most terminals, you can also use the up and down arrow keys instead of C-p and C-n, respectively.

Horizontal motion of the cursor

C-a

Go to the beginning of the command.

C-e

Go to the end of the line.

M-b

Go back one word.

M-f

Go forward one word.

C-b

Go back one character.

C-f

Go forward one character.

On most terminals, you can also use the left and right arrow keys instead of C-b and C-f, respectively.

Editing and re-submission

text

Insert text at the cursor.

C-f text

Append text after the cursor.

DEL

Delete the previous character (left of the cursor).

C-d

Delete the character under the cursor.

M-d

Delete the rest of the word under the cursor, and “save” it.

C-k

Delete from cursor to end of command, and “save” it.

C-y

Insert (yank) the last “saved” text here.

C-t

Transpose the character under the cursor with the next.

M-l

Change the rest of the word to lower case.

M-c

Capitalize the word from the cursor onward (GNU readline binds M-c to capitalize-word; changing the rest of the word to upper case is M-u, upcase-word).

RET

Re-submit the command to R.

The final RET terminates the command line editing sequence.

The readline key bindings can be customized in the usual way via a ~/.inputrc file. These customizations can be conditioned on application R, that is by including a section like

$if R
  "\C-xd": "q('no')\n"
$endif

Appendix D Function and variable index

Index entry    Section

-
-    Vector arithmetic

:
:    Generating regular sequences
::    Namespaces
:::    Namespaces

!
!    Logical vectors
!=    Logical vectors

?
?    Getting help
??    Getting help

.
.    Updating fitted models
.First    Customizing the environment
.Last    Customizing the environment

*
*    Vector arithmetic

/
/    Vector arithmetic

&
&    Logical vectors
&&    Conditional execution

%
%*%    Multiplication
%o%    The outer product of two arrays

^
^    Vector arithmetic

+
+    Vector arithmetic

<
<    Logical vectors
<<-    Scope
<=    Logical vectors

=
==    Logical vectors

>
>    Logical vectors
>=    Logical vectors

|
|    Logical vectors
||    Conditional execution

~
~    Formulae for statistical models

A
abline    Low-level plotting commands
ace    Some non-standard models
add1    Updating fitted models
anova    Generic functions for extracting model information; ANOVA tables
aov    Analysis of variance and model comparison
aperm    Generalized transpose of an array
array    The array() function
as.data.frame    Making data frames
as.vector    The concatenation function c() with arrays
attach    attach() and detach()
attr    Getting and setting attributes
attributes    Getting and setting attributes
avas    Some non-standard models
axis    Low-level plotting commands

B
boxplot    One- and two-sample tests
break    Repetitive execution
bruto    Some non-standard models

C
c    Vectors and assignment; Character vectors; The concatenation function c() with arrays; Concatenating lists
C    Contrasts
cbind    Forming partitioned matrices
coef    Generic functions for extracting model information
coefficients    Generic functions for extracting model information
contour    Display graphics
contrasts    Contrasts
coplot    Displaying multivariate data
cos    Vector arithmetic
crossprod    Index matrices; Multiplication
cut    Frequency tables from factors

D
data    Accessing builtin datasets
data.frame    Making data frames
density    Examining the distribution of a set of data
det    Singular value decomposition and determinants
detach    attach() and detach()
determinant    Singular value decomposition and determinants
dev.list    Multiple graphics devices
dev.next    Multiple graphics devices
dev.off    Multiple graphics devices
dev.prev    Multiple graphics devices
dev.set    Multiple graphics devices
deviance    Generic functions for extracting model information
diag    Multiplication
dim    Arrays
dotchart    Display graphics
drop1    Updating fitted models

E
ecdf    Examining the distribution of a set of data
edit    Editing data
eigen    Eigenvalues and eigenvectors
else    Conditional execution
Error    Analysis of variance and model comparison
example    Getting help
exp    Vector arithmetic

F
F    Logical vectors
factor    Factors
FALSE    Logical vectors
fivenum    Examining the distribution of a set of data
for    Repetitive execution
formula    Generic functions for extracting model information
function    Writing your own functions

G
getAnywhere    Object orientation
getS3method    Object orientation
glm    The glm() function

H
help    Getting help
help.search    Getting help
help.start    Getting help
hist    Examining the distribution of a set of data; Display graphics

I
identify    Interacting with graphics
if    Conditional execution
ifelse    Conditional execution
image    Display graphics
is.na    Missing values
is.nan    Missing values

J
jpeg    Device drivers

K
ks.test    Examining the distribution of a set of data

L
legend    Low-level plotting commands
length    Vector arithmetic; The intrinsic attributes mode and length
levels    Factors
lines    Low-level plotting commands
list    Lists
lm    Linear models
lme    Some non-standard models
locator    Interacting with graphics
loess    Some non-standard models
log    Vector arithmetic
lqs    Some non-standard models
lsfit    Least squares fitting and the QR decomposition

M
mars    Some non-standard models
max    Vector arithmetic
mean    Vector arithmetic
methods    Object orientation
min    Vector arithmetic
mode    The intrinsic attributes mode and length

N
NA    Missing values
NaN    Missing values
ncol    Matrix facilities
next    Repetitive execution
nlm    Nonlinear least squares and maximum likelihood models; Least squares; Maximum likelihood
nlme    Some non-standard models
nlminb    Nonlinear least squares and maximum likelihood models
nrow    Matrix facilities

O
optim    Nonlinear least squares and maximum likelihood models
order    Vector arithmetic
ordered    Ordered factors
outer    The outer product of two arrays

P
pairs    Displaying multivariate data
par    The par() function
paste    Character vectors
pdf    Device drivers
persp    Display graphics
plot    Generic functions for extracting model information; The plot() function
pmax    Vector arithmetic
pmin    Vector arithmetic
png    Device drivers
points    Low-level plotting commands
polygon    Low-level plotting commands
postscript    Device drivers
predict    Generic functions for extracting model information
print    Generic functions for extracting model information
prod    Vector arithmetic

Q
qqline    Examining the distribution of a set of data; Display graphics
qqnorm    Examining the distribution of a set of data; Display graphics
qqplot    Display graphics
qr    Least squares fitting and the QR decomposition
quartz    Device drivers

R
range    Vector arithmetic
rbind    Forming partitioned matrices
read.table    The read.table() function
rep    Generating regular sequences
repeat    Repetitive execution
resid    Generic functions for extracting model information
residuals    Generic functions for extracting model information
rlm    Some non-standard models
rm    Data permanency and removing objects

S
scan    The scan() function
sd    The function tapply() and ragged arrays
search    Managing the search path
seq    Generating regular sequences
shapiro.test    Examining the distribution of a set of data
sin    Vector arithmetic
sink    Executing commands from or diverting output to a file
solve    Linear equations and inversion
sort    Vector arithmetic
source    Executing commands from or diverting output to a file
split    Repetitive execution
sqrt    Vector arithmetic
stem    Examining the distribution of a set of data
step    Generic functions for extracting model information; Updating fitted models
sum    Vector arithmetic
summary    Examining the distribution of a set of data; Generic functions for extracting model information
svd    Singular value decomposition and determinants

T
T    Logical vectors
t    Generalized transpose of an array
t.test    One- and two-sample tests
table    Index matrices; Frequency tables from factors
tan    Vector arithmetic
tapply    The function tapply() and ragged arrays
text    Low-level plotting commands
title    Low-level plotting commands
tree    Some non-standard models
TRUE    Logical vectors

U
unclass    The class of an object
update    Updating fitted models

V
var    Vector arithmetic; The function tapply() and ragged arrays
var.test    One- and two-sample tests
vcov    Generic functions for extracting model information
vector    Vectors and assignment

W
while    Repetitive execution
wilcox.test    One- and two-sample tests
windows    Device drivers

X
X11    Device drivers


Appendix E Concept index

Index entry    Section

A
Accessing builtin datasets    Accessing builtin datasets
Additive models    Some non-standard models
Analysis of variance    Analysis of variance and model comparison
Arithmetic functions and operators    Vector arithmetic
Arrays    Arrays
Assignment    Vectors and assignment
Attributes    Objects

B
Binary operators    Defining new binary operators
Box plots    One- and two-sample tests

C
Character vectors    Character vectors
Classes    The class of an object; Object orientation
Concatenating lists    Concatenating lists
Contrasts    Contrasts
Control statements    Control statements
CRAN    Contributed packages and CRAN
Customizing the environment    Customizing the environment

D
Data frames    Data frames
Default values    Named arguments and defaults
Density estimation    Examining the distribution of a set of data
Determinants    Singular value decomposition and determinants
Diverting input and output    Executing commands from or diverting output to a file
Dynamic graphics    Dynamic graphics

E
Eigenvalues and eigenvectors    Eigenvalues and eigenvectors
Empirical CDFs    Examining the distribution of a set of data

F
Factors    Factors; Contrasts
Families    Families
Formulae    Formulae for statistical models

G
Generalized linear models    Generalized linear models
Generalized transpose of an array    Generalized transpose of an array
Generic functions    Object orientation
Graphics device drivers    Device drivers
Graphics parameters    The par() function
Grouped expressions    Grouped expressions

I
Indexing of and by arrays    Array indexing
Indexing vectors    Index vectors

K
Kolmogorov-Smirnov test    Examining the distribution of a set of data

L
Least squares fitting    Least squares fitting and the QR decomposition
Linear equations    Linear equations and inversion
Linear models    Linear models
Lists    Lists
Local approximating regressions    Some non-standard models
Loops and conditional execution    Loops and conditional execution

M
Matrices    Arrays
Matrix multiplication    Multiplication
Maximum likelihood    Maximum likelihood
Missing values    Missing values
Mixed models    Some non-standard models

N
Named arguments    Named arguments and defaults
Namespace    Namespaces
Nonlinear least squares    Nonlinear least squares and maximum likelihood models

O
Object orientation    Object orientation
Objects    Objects
One- and two-sample tests    One- and two-sample tests
Ordered factors    Factors; Contrasts
Outer products of arrays    The outer product of two arrays

P
Packages    R and statistics; Packages
Probability distributions    Probability distributions

Q
QR decomposition    Least squares fitting and the QR decomposition
Quantile-quantile plots    Examining the distribution of a set of data

R
Reading data from files    Reading data from files
Recycling rule    Vector arithmetic; The recycling rule
Regular sequences    Generating regular sequences
Removing objects    Data permanency and removing objects
Robust regression    Some non-standard models

S
Scope    Scope
Search path    Managing the search path
Shapiro-Wilk test    Examining the distribution of a set of data
Singular value decomposition    Singular value decomposition and determinants
Statistical models    Statistical models in R
Student’s t test    One- and two-sample tests

T
Tabulation    Frequency tables from factors
Tree-based models    Some non-standard models

U
Updating fitted models    Updating fitted models

V
Vectors    Simple manipulations; numbers and vectors

W
Wilcoxon test    One- and two-sample tests
Workspace    Data permanency and removing objects
Writing functions    Writing your own functions


Appendix F References

D. M. Bates and D. G. Watts (1988), Nonlinear Regression Analysis and Its Applications. John Wiley & Sons, New York.

Richard A. Becker, John M. Chambers and Allan R. Wilks (1988), The New S Language. Chapman & Hall, New York. This book is often called the “Blue Book”.

John M. Chambers and Trevor J. Hastie eds. (1992), Statistical Models in S. Chapman & Hall, New York. This is also called the “White Book”.

John M. Chambers (1998) Programming with Data. Springer, New York. This is also called the “Green Book”.

A. C. Davison and D. V. Hinkley (1997), Bootstrap Methods and Their Applications, Cambridge University Press.

Annette J. Dobson (1990), An Introduction to Generalized Linear Models, Chapman and Hall, London.

Peter McCullagh and John A. Nelder (1989), Generalized Linear Models. Second edition, Chapman and Hall, London.

John A. Rice (1995), Mathematical Statistics and Data Analysis. Second edition. Duxbury Press, Belmont, CA.

S. D. Silvey (1970), Statistical Inference. Penguin, London.


Footnotes

(1)

ACM Software Systems award, 1998: https://awards.acm.org/award_winners/chambers_6640862.cfm.

(2)

For portable R code (including that to be used in R packages) only A–Z, a–z, and 0–9 should be used.

(3)

not inside strings, nor within the argument list of a function definition

(4)

some of the consoles will not allow you to enter more, and amongst those which do some will silently discard the excess and some will use it as the start of the next line.

(5)

of unlimited length.

(6)

The leading “dot” in this file name makes it invisible in normal file listings in UNIX, and in default GUI file listings on macOS and Windows.

(7)

With other than vector types of argument, such as list mode arguments, the action of c() is rather different. See Concatenating lists.

(8)

Actually, it is still available as .Last.value before any other statements are executed.

(9)

paste(..., collapse=ss) joins the arguments into a single character string putting ss in between, e.g., ss <- "|". There are more tools for character manipulation, see the help for sub and substring.
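A short illustration of the functions named in this footnote, using made-up strings:

```r
ss <- "|"

# Join the elements into a single string with ss between them.
joined <- paste(c("a", "b", "c"), collapse = ss)
joined                              # "a|b|c"

# sub() replaces the first match only.
sub("b", "B", "abcabc")             # "aBcabc"

# substring() extracts characters 2 to 4.
substring("abcdef", 2, 4)           # "bcd"
```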

(10)

numeric mode is actually an amalgam of two distinct modes, namely integer and double precision, as explained in the manual.

(11)

Note however that length(object) does not always contain intrinsic useful information, e.g., when object is a function.

(12)

In general, coercion from numeric to character and back again will not be exactly reversible, because of roundoff errors in the character representation.
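A small example of such a non-reversible round trip, with a value chosen for illustration:

```r
x <- 0.1 + 0.2            # stored as a double slightly above 0.3
s <- as.character(x)      # the character form keeps about 15 significant digits
y <- as.numeric(s)

s                         # "0.3"
x == y                    # FALSE: the low-order bits of x were lost
```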

(13)

A different style using ‘formal’ or ‘S4’ classes is provided in package methods.

(14)

Readers should note that there are eight states and territories in Australia, namely the Australian Capital Territory, New South Wales, the Northern Territory, Queensland, South Australia, Tasmania, Victoria and Western Australia.

(15)

Note that tapply() also works in this case when its second argument is not a factor, e.g., ‘tapply(incomes, state)’, and this is true for quite a few other functions, since arguments are coerced to factors when necessary (using as.factor()).
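The coercion can be seen with a small made-up data set. The values below are invented for illustration and are not the manual's `incomes` and `state` vectors.

```r
incomes <- c(60, 49, 40, 61)            # made-up values for illustration
state <- c("tas", "sa", "tas", "sa")    # a character vector, not a factor

# state is coerced to a factor internally, so grouping works as usual.
means <- tapply(incomes, state, mean)
means
```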

(16)

Note that x %*% x is ambiguous, as it could mean either x’x or x x’, where x is the column form. In such cases the smaller matrix seems implicitly to be the interpretation adopted, so the scalar x’x is in this case the result. The matrix x x’ may be calculated either by cbind(x) %*% x or x %*% rbind(x) since the result of rbind() or cbind() is always a matrix. However, the best way to compute x’x or x x’ is crossprod(x) or x %o% x respectively.
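The alternatives mentioned in this footnote can be compared on a small vector:

```r
x <- 1:3

x %*% x          # 1 x 1 matrix holding x'x
crossprod(x)     # the same value, computed directly
x %o% x          # 3 x 3 outer product x x'
cbind(x) %*% x   # the same 3 x 3 matrix, via an explicit column matrix
```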

(17)

Even better would be to form a matrix square root B with A = BB’ and find the squared length of the solution of By = x , perhaps using the Cholesky or eigendecomposition of A.
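A minimal sketch of this approach using the Cholesky factor follows. The matrix `A` and vector `x` are made up, and the sketch assumes the quantity of interest is the quadratic form x'A^{-1}x, which equals the squared length of y when By = x and A = BB'.

```r
A <- matrix(c(4, 2, 2, 3), nrow = 2)   # small symmetric positive definite example
x <- c(1, 2)

R <- chol(A)                            # upper triangular with A = t(R) %*% R, so B = t(R)
y <- backsolve(R, x, transpose = TRUE)  # solves t(R) %*% y = x, that is B y = x

sum(y^2)                                # squared length of the solution y
drop(x %*% solve(A, x))                 # the same quadratic form, computed directly
```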

(18)

See the on-line help for autoload for the meaning of the second term.

(19)

Under UNIX, the utilities sed or awk can be used.

(20)

to be discussed later, or use xyplot from package lattice.

(21)

See also the methods described in Statistical models in R.

(22)

In some sense this mimics the behavior in S-PLUS since in S-PLUS this operator always creates or assigns to a global variable.

(23)

So it is hidden under UNIX.

(24)

Some graphics parameters such as the size of the current device are for information only.

(25)

The ‘Emacs Speaks Statistics’ package; see the URL https://ESS.R-project.org/

(26)

It is possible to build R using an emulation of GNU readline, such as one based on NetBSD’s editline (also known as libedit), in which case only a subset of the capabilities may be provided.

(27)

On a PC keyboard this is usually the Alt key, occasionally the ‘Windows’ key. On a Mac keyboard normally no meta key is available.

(28)

In particular, not versions 6.3 or later: this is worked around as from R 3.4.0.