This is an introduction to R (“GNU S”), a language and environment for
statistical computing and graphics. R is similar to the
award-winning[1] S
system, which was developed at Bell Laboratories by John Chambers et al.
It provides a wide variety of statistical and graphical techniques
(linear and nonlinear modelling, statistical tests, time series
analysis, classification, clustering, ...).
This manual provides information on data types, programming elements,
statistical modelling and graphics.
This manual is for R, version 4.4.1 (2024-06-14).
Copyright © 1990 W. N. Venables
Copyright © 1992 W. N. Venables & D. M. Smith
Copyright © 1997 R. Gentleman & R. Ihaka
Copyright © 1997, 1998 M. Maechler
Copyright © 1999–2024 R Core Team
Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.
Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.
Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Core Team.
This introduction to R is derived from an original set of notes
describing the S and S-PLUS environments written in 1990–2 by
Bill Venables and David M. Smith when at the University of
Adelaide. We have made a number of small changes to reflect differences
between the R and S programs, and expanded some of the material.
We would like to extend warm thanks to Bill Venables (and David
Smith) for granting permission to distribute this modified version of
the notes in this way, and for being a supporter of R from way back.
Comments and corrections are always welcome. Please address email
correspondence to R-help@R-project.org.
Most R novices will start with the introductory session in Appendix
A. This should give some familiarity with the style of R sessions
and more importantly some instant feedback on what actually happens.
Many users will come to R mainly for its graphical facilities.
See Graphical procedures, which can be read at almost any time and need not wait
until all the preceding sections have been digested.
R is an integrated suite of software facilities for data
manipulation, calculation and graphical display. Among other things it
has
The term “environment” is intended to characterize it as a fully
planned and coherent system, rather than an incremental accretion of
very specific and inflexible tools, as is frequently the case with other
data analysis software.
R is very much a vehicle for newly developing methods of interactive
data analysis. It has developed rapidly, and has been extended by a
large collection of packages. However, most programs written in
R are essentially ephemeral, written for a single piece of data
analysis.
R can be regarded as an implementation of the S language which
was developed at Bell Laboratories by Rick Becker, John Chambers
and Allan Wilks, and also forms the basis of the S-PLUS systems.
The evolution of the S language is characterized by four books by
John Chambers and coauthors. For R, the basic reference is The
New S Language: A Programming Environment for Data Analysis and
Graphics by Richard A. Becker, John M. Chambers and
Allan R. Wilks. The new features of the 1991 release of S
are covered in Statistical Models in S edited by John M.
Chambers and Trevor J. Hastie. The formal methods and classes of the
methods package are based on those described in Programming
with Data by John M. Chambers. See References, for precise
references.
There are now a number of books which describe how to use R for data
analysis and statistics, and documentation for S/S-PLUS can
typically be used with R, keeping the differences between the S
implementations in mind.
See “What documentation exists for R?” in the R FAQ.
Our introduction to the R environment did not mention
statistics, yet many people use R as a statistics system. We
prefer to think of it as an environment within which many classical and
modern statistical techniques have been implemented. A few of these are
built into the base R environment, but many are supplied as
packages. There are about 25 packages supplied with R (called
“standard” and “recommended” packages) and many more are available
through the CRAN family of Internet sites (via
https://CRAN.R-project.org) and elsewhere. More details on
packages are given later (see Packages).
Most classical statistics and much of the latest methodology is
available for use with R, but users may need to be prepared to do a
little work to find it.
There is an important difference in philosophy between S (and hence
R) and the other main statistical systems. In S a statistical
analysis is normally done as a series of steps, with intermediate
results being stored in objects.
Thus whereas SAS and SPSS will give
copious output from a regression or discriminant analysis, R will
give minimal output and store the results in a fit object for subsequent
interrogation by further R functions.
The most convenient way to use R is at a graphics workstation running
a windowing system. This guide is aimed at users who have this
facility.
In particular we will occasionally refer to the use of R
on an X window system although the vast bulk of what is said applies
generally to any implementation of the R environment.
Most users will find it necessary to interact directly with the
operating system on their computer from time to time. In this guide, we
mainly discuss interaction with the operating system on UNIX machines.
If you are running R under Windows or macOS you will need to make
some small adjustments.
Setting up a workstation to take full advantage of the customizable
features of R is a straightforward if somewhat tedious procedure, and
will not be considered further here. Users in difficulty should seek
local expert help.
When you use the R program it issues a prompt when it expects input
commands. The default prompt is ‘>’, which on UNIX might be
the same as the shell prompt, and so it may appear that nothing is
happening. However, as we shall see, it is easy to change to a
different R prompt if you wish. We will assume that the UNIX shell
prompt is ‘$’.
In using R under UNIX the suggested procedure for the first occasion
is as follows:
$ mkdir work     # create a working directory for this problem
$ cd work
$ R              # start R
> q()            # quit R
At this point you will be asked whether you want to save the data from
your R session. On some systems this will bring up a dialog box, and
on others you will receive a text prompt to which you can respond
yes, no or cancel (a single letter abbreviation will
do) to save the data before quitting, quit without saving, or return to
the R session. Data which is saved will be available in future R
sessions.
Further R sessions are simple.
$ cd work
$ R
Use R as before, issuing the q() command at the end of the session.
To use R under Windows the procedure to
follow is basically the same. Create a folder as the working directory,
and set that in the Start In field in your R shortcut.
Then launch R by double clicking on the icon.
Readers wishing to get a feel for R at a computer before proceeding
are strongly advised to work through the introductory session
given in A sample session.
R has an inbuilt help facility similar to the man facility of
UNIX. To get more information on any specific named function, for
example solve, the command is
> help(solve)
An alternative is
> ?solve
For a feature specified by special characters, the argument must be
enclosed in double or single quotes, making it a “character string”.
This is also necessary for a few words with syntactic meaning, including
if, for and function. For example:
> help("[[")
Either form of quote mark may be used to enclose the other, as in the
string "It's important". Our convention is to prefer
double quote marks.
On most R installations help is available in HTML format by
running
> help.start()
which will launch a Web browser that allows the help pages to be browsed
with hyperlinks. On UNIX, subsequent help requests are sent to the
HTML-based help system. The ‘Search Engine and Keywords’ link in the
page loaded by help.start() is particularly useful as it
contains a high-level concept list which searches through available
functions. It can be a great way to get your bearings quickly and to
understand the breadth of what R has to offer.
The help.search command (alternatively ??) allows searching for help in various
ways. For example,
> ??solve
Try ?help.search for details and more examples.
The examples on a help topic can normally be run by
> example(topic)
Windows versions of R have other optional help systems: use
> ?help
for further details.
Technically R is an expression language with a very simple
syntax. It is case sensitive as are most UNIX based packages, so
A and a are different symbols and would refer to different
variables. The set of symbols which can be used in R names depends
on the operating system and country within which R is being run
(technically on the locale in use). Normally all alphanumeric
symbols are allowed[2] (and in some countries this includes accented
letters) plus ‘.’ and ‘_’, with the restriction that a name must start
with ‘.’ or a letter, and if it starts with ‘.’ the
second character must not be a digit. Names are effectively
unlimited in length.
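A short sketch of these naming rules. It uses the standard function make.names(), which returns a name unchanged if it is already syntactically valid and alters it otherwise, as a checker. The candidate names are purely illustrative:

```r
# Names are case sensitive: A and a are two different variables
A <- 1
a <- 2
A == a    # FALSE

# make.names() leaves a valid name unchanged and alters an invalid one
candidates <- c("x", "x.1", ".hidden", "my_var", "2x", ".2x", "_x")
valid <- make.names(candidates) == candidates
valid     # TRUE TRUE TRUE TRUE FALSE FALSE FALSE
```

The last three fail because they start with a digit, with ‘.’ followed by a digit, or with ‘_’.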
Elementary commands consist of either expressions or
assignments. If an expression is given as a command, it is
evaluated, printed (unless specifically made invisible), and the value
is lost. An assignment also evaluates an expression and passes the
value to a variable but the result is not automatically printed.
Commands are separated either by a semi-colon (‘;’), or by a
newline. Elementary commands can be grouped together into one compound
expression by braces (‘{’ and ‘}’).
Comments can be put almost[3] anywhere: everything from a hash mark
(‘#’) to the end of the line is a comment.
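A minimal sketch of these rules at the prompt. An expression is printed and an assignment is not, although wrapping an assignment in parentheses is a common way to print it as well. Braces turn several commands into one expression whose value is that of the last command. The variable names are illustrative only:

```r
2 + 3; a <- 2 + 3   # two commands on one line: only the first prints ([1] 5)
(a <- 2 + 3)        # parentheses make the assignment print its value too
invisible(a)        # evaluated, but explicitly not printed

# Braces group commands into one compound expression
b <- {
  s <- a * 2        # intermediate result (s is also created)
  s + 1             # the value of the whole block
}
b                   # 11
```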
If a command is not complete at the end of a line, R will
give a different prompt, by default ‘+’, on second and subsequent lines
and continue to read input until the
command is syntactically complete. This prompt may be changed by the
user. We will generally omit the continuation prompt
and indicate continuation by simple indenting.
Command lines entered at the console are limited[4] to about 4095 bytes (not characters).
Under many versions of UNIX and on Windows, R provides a mechanism
for recalling and re-executing previous commands. The vertical arrow
keys on the keyboard can be used to scroll forward and backward through
a command history. Once a command is located in this way, the
cursor can be moved within the command using the horizontal arrow keys,
and characters can be removed with the DEL key or added with the
other keys. More details are provided later: see The command-line editor.
The recall and editing capabilities under UNIX are highly customizable.
You can find out how to do this by reading the manual entry for the
readline library.
Alternatively, the Emacs text editor provides more general support
mechanisms (via ESS, Emacs Speaks Statistics) for
working interactively with R.
See “R and Emacs” in the R FAQ.
If commands[5] are stored in an external
file, say commands.R in the working directory work, they
may be executed at any time in an R session with the command
> source("commands.R")
On Windows, Source is also available on the
File menu. The function sink,
> sink("record.lis")
will divert all subsequent output from the console to an external file,
record.lis. The command
> sink()
restores it to the console once again.
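The sketch below exercises source() and sink() end to end. To stay self-contained it uses temporary files created with tempfile() in place of commands.R and record.lis. The two-line script is made up for illustration:

```r
# Write a tiny script (stands in for commands.R)
script <- tempfile(fileext = ".R")
writeLines(c("u <- 1:5", "print(sum(u))"), script)

# Divert console output to a file (stands in for record.lis)
out <- tempfile(fileext = ".lis")
sink(out)
source(script)   # runs both commands; the printed result goes to 'out'
sink()           # output goes to the console again

readLines(out)   # "[1] 15"
```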
The entities that R creates and manipulates are known as
objects. These may be variables, arrays of numbers, character
strings, functions, or more general structures built from such
components.
During an R session, objects are created and stored by name (we
discuss this process in the next section). The R command
> objects()
(alternatively, ls()) can be used to display the names of (most
of) the objects which are currently stored within R. The collection
of objects currently stored is called the workspace.
To remove objects the function rm is available:
> rm(x, y, z, ink, junk, temp, foo, bar)
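As an illustration (the object names are arbitrary), ls() does not list names beginning with ‘.’ unless asked to. This is why it shows only “most” of the workspace:

```r
junk <- "temp"
.hidden <- 3
c("junk", ".hidden") %in% ls()        # TRUE FALSE
".hidden" %in% ls(all.names = TRUE)   # TRUE
rm(junk, .hidden)
exists("junk")                        # FALSE
```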
All objects created during an R session can be stored permanently in
a file for use in future R sessions. At the end of each R session
you are given the opportunity to save all the currently available
objects.
If you indicate that you want to do this, the objects are
written to a file called .RData[6] in the
current directory, and the command lines used in the session are saved
to a file called .Rhistory.
When R is started at a later time from the same directory it reloads
the workspace from this file. At the same time the associated command
history is reloaded.
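A sketch of a similar save-and-reload cycle done explicitly with save.image() and load(). It uses a temporary file instead of .RData in the working directory, so that nothing in the current directory is touched:

```r
keep <- c(10.4, 5.6)
f <- tempfile(fileext = ".RData")
save.image(file = f)   # save the whole workspace, much as answering "yes" to q() does
rm(keep)
load(f)                # restore the saved objects into the workspace
keep                   # 10.4 5.6
```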
We recommend using a separate working directory for each analysis
conducted with R. It is quite common for objects with names
x and y to be created during an analysis. Names like this
are often meaningful in the context of a single analysis, but it can be
quite hard to decide what they might be when several analyses have
been conducted in the same directory.
R operates on named data structures. The simplest such
structure is the numeric vector, which is a single entity
consisting of an ordered collection of numbers. To set up a vector
named x, say, consisting of five numbers, namely 10.4, 5.6, 3.1,
6.4 and 21.7, use the R command
> x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
This is an assignment statement using the function
c(), which in this context can take an arbitrary number of vector
arguments and whose value is a vector got by concatenating its
arguments end to end.[7]
A number occurring by itself in an expression is taken as a vector of
length one.
Notice that the assignment operator (‘<-’) consists
of the two characters ‘<’ (“less than”) and ‘-’ (“minus”) occurring
strictly side-by-side, and that it ‘points’ to the object receiving the value of the expression.
In most contexts the ‘=’ operator can be used as an alternative.
Assignment can also be made using the function assign(). An
equivalent way of making the same assignment as above is with:
> assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7))
The usual operator, <-, can be thought of as a syntactic
short-cut to this.
Assignments can also be made in the other direction, using the obvious
change in the assignment operator. So the same assignment could be made
using
> c(10.4, 5.6, 3.1, 6.4, 21.7) -> x
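A quick check that the assignment forms discussed so far give the same result (x1 to x4 are illustrative names). It also checks that a lone number is a vector of length one:

```r
x1 <- c(10.4, 5.6, 3.1, 6.4, 21.7)
assign("x2", c(10.4, 5.6, 3.1, 6.4, 21.7))
c(10.4, 5.6, 3.1, 6.4, 21.7) -> x3
x4 = c(10.4, 5.6, 3.1, 6.4, 21.7)
identical(x1, x2) && identical(x2, x3) && identical(x3, x4)  # TRUE
length(7)   # 1: a number on its own is a vector of length one
```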
If an expression is used as a complete command, the value is printed
and lost[8]. So now if we
were to use the command
> 1/x
the reciprocals of the five values would be printed at the terminal (and
the value of x, of course, unchanged).
The further assignment
> y <- c(x, 0, x)
would create a vector y with 11 entries consisting of two copies
of x with a zero in the middle place.
Vectors can be used in arithmetic expressions, in which case the
operations are performed element by element. Vectors occurring in the
same expression need not all be of the same length.
If they are not,
the value of the expression is a vector with the same length as the
longest vector which occurs in the expression. Shorter vectors in the
expression are recycled as often as need be (perhaps
fractionally) until they match the length of the longest vector. In
particular a constant is simply repeated. So with the above assignments
the command
> v <- 2*x + y + 1
generates a new vector v of length 11 constructed by adding
together, element by element, 2*x repeated 2.2 times, y
repeated just once, and 1 repeated 11 times.
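Working through the example numerically helps. In the sketch below, element 6 of v combines element 1 of x (because 2*x has wrapped around), element 6 of y (the zero) and 1. R also emits a warning here because 11 is not a multiple of 5. suppressWarnings() is used only to keep the output quiet:

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
y <- c(x, 0, x)                        # length 11, zero in position 6
v <- suppressWarnings(2*x + y + 1)     # 2*x is recycled to length 11
length(v)                              # 11
v[6]                                   # 2*10.4 + 0 + 1 = 21.8
```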
The elementary arithmetic operators are the usual +, -, *, /
and ^ for raising to a power.
In addition all of the common arithmetic functions are available.
log, exp, sin, cos, tan, sqrt,
and so on, all have their usual meaning.
max and min select the largest and smallest elements of a
vector respectively.
range is a function whose value is a vector of length two, namely
c(min(x), max(x)).
length(x) is the number of elements in x,
sum(x) gives the total of the elements in x,
and prod(x) their product.
Two statistical functions are mean(x), which calculates the sample
mean, which is the same as sum(x)/length(x),
and var(x), which gives
sum((x-mean(x))^2)/(length(x)-1)
or sample variance. If the argument to var() is an
n-by-p matrix the value is a p-by-p sample
covariance matrix got by regarding the rows as independent
p-variate sample vectors.
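The identities stated above for range(), mean() and var() can be checked directly. The 4-by-2 matrix m is made up for illustration:

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
identical(range(x), c(min(x), max(x)))                       # TRUE
c(length(x), sum(x), prod(x))
all.equal(mean(x), sum(x) / length(x))                       # TRUE
all.equal(var(x), sum((x - mean(x))^2) / (length(x) - 1))    # TRUE

m <- cbind(a = c(1, 2, 3, 4), b = c(2, 4, 5, 9))  # n = 4 rows, p = 2 columns
dim(var(m))                                                  # 2 2
```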
sort(x) returns a vector of the same size as x with the
elements arranged in increasing order; however there are other more
flexible sorting facilities available (see order() or
sort.list(), which produce a permutation to do the sorting).
Note that max and min select the largest and smallest
values in their arguments, even if they are given several vectors. The
parallel maximum and minimum functions pmax and
pmin return a vector (of length equal to their longest argument)
that contains in each element the largest (smallest) element in that
position in any of the input vectors.
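A small sketch contrasting these functions on the example vector. The two short vectors passed to max() and pmax() are arbitrary:

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
sort(x)                   # 3.1 5.6 6.4 10.4 21.7
order(x)                  # 3 2 4 1 5: the permutation that sorts x
identical(x[order(x)], sort(x))   # TRUE

max(c(1, 7), c(3, 2))     # 7: a single value over all arguments
pmax(c(1, 7), c(3, 2))    # 3 7: element-wise ("parallel") maximum
pmin(c(1, 7), c(3, 2))    # 1 2
```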
For most purposes the user will not be concerned if the “numbers” in a
numeric vector are integers, reals or even complex. Internally
calculations are done as double precision real numbers, or double
precision complex numbers if the input data are complex.
To work with complex numbers, supply an explicit complex part. Thus
sqrt(-17)
will give NaN and a warning, but
sqrt(-17+0i)
will do the computations as complex numbers.
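A quick check of the difference. suppressWarnings() only hides the expected warning:

```r
r <- suppressWarnings(sqrt(-17))   # real arithmetic: NaN
is.nan(r)                          # TRUE
z <- sqrt(-17+0i)                  # complex arithmetic
z                                  # purely imaginary, about 4.123i
Im(z)^2                            # 17, up to rounding
```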
R has a number of facilities for generating commonly used sequences
of numbers. For example 1:30 is the vector c(1, 2, …, 29, 30).
The colon operator has high priority within an expression, so, for
example 2*1:15 is the vector c(2, 4, …, 28, 30).
Put n <- 10 and compare the sequences 1:n-1 and 1:(n-1).
The construction 30:1 may be used to generate a sequence
backwards.
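The precedence point is easiest to see by running the suggested comparison:

```r
n <- 10
1:n-1      # 0 1 2 ... 9: parsed as (1:n) - 1
1:(n-1)    # 1 2 ... 9
2*1:15     # 2 4 ... 30: parsed as 2 * (1:15)
30:1       # 30 29 ... 1
```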
The function seq() is a more general facility for generating
sequences. It has five arguments, only some of which may be specified
in any one call.
The first two arguments, if given, specify the
beginning and end of the sequence, and if these are the only two
arguments given the result is the same as the colon operator. That is
seq(2,10) is the same vector as 2:10.
Arguments to seq(), and to many other R functions, can also
be given in named form, in which case the order in which they appear is
irrelevant. The first two arguments may be named
from=value and to=value; thus
seq(1,30), seq(from=1, to=30) and seq(to=30, from=1)
are all the same as 1:30. The next two arguments to
seq() may be named by=value and
length=value, which specify a step size and a length for
the sequence respectively. If neither of these is given, the default
by=1 is assumed.
For example
> seq(-5, 5, by=.2) -> s3
generates in s3 the vector c(-5.0, -4.8, -4.6, …,
4.6, 4.8, 5.0). Similarly
> s4 <- seq(length=51, from=-5, by=.2)
generates the same vector in s4.
The fifth argument may be named along=vector, which is
normally used as the only argument to create the sequence 1, 2,
…, length(vector), or the empty sequence if the vector is
empty (as it can be).
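The calls described above can be checked against each other. The sketch uses the argument names as written in this section. seq() matches them partially to its full argument names (length.out, along.with):

```r
s3 <- seq(-5, 5, by = .2)
s4 <- seq(length = 51, from = -5, by = .2)
length(s3)                         # 51
all.equal(s3, s4)                  # TRUE
all(seq(2, 10) == 2:10)            # TRUE
seq(along = c("a", "b", "c"))      # 1 2 3
seq(along = character(0))          # integer(0): empty input, empty sequence
```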
A related function is rep(), which can be used for replicating an object in various complicated ways.
The simplest form is
> s5 <- rep(x, times=5)
which will put five copies of x end-to-end in s5. Another
useful version is
> s6 <- rep(x, each=5)
which repeats each element of x five times before moving on to
the next.
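A small-scale sketch of the two forms. It uses a three-element x and two copies instead of five, so the results are easy to read:

```r
x <- c(1, 2, 3)
rep(x, times = 2)   # 1 2 3 1 2 3: whole copies end to end
rep(x, each = 2)    # 1 1 2 2 3 3: each element repeated in place
```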
As well as numerical vectors, R allows manipulation of logical
quantities. The elements of a logical vector can have the values
TRUE, FALSE, and NA (for “not available”, see
below). The first two are often abbreviated as T and F,
respectively. Note however that T and F are just
variables which are set to TRUE and FALSE by default, but
are not reserved words and hence can be overwritten by the user. Hence,
you should always use TRUE and FALSE.
Logical vectors are generated by conditions. For example
> temp <- x > 13
sets temp as a vector of the same length as x with values
FALSE corresponding to elements of x where the condition
is not met and TRUE where it is.
The logical operators are <, <=, >, >=,
== for exact equality and != for inequality.
In addition if c1 and c2 are logical expressions, then
c1 & c2 is their intersection (“and”), c1 | c2
is their union (“or”), and !c1 is the negation of c1.
Logical vectors may be used in ordinary arithmetic, in which case they
are coerced into numeric vectors, FALSE becoming 0
and TRUE becoming 1. However there are situations where
logical vectors and their coerced numeric counterparts are not
equivalent, for example see the next subsection.
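The sketch below generates logical vectors from conditions, combines them, and uses them in arithmetic. It also shows why T is unsafe. The thresholds 5, 11 and 13 are arbitrary:

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
temp <- x > 13          # FALSE FALSE FALSE FALSE TRUE
(x > 5) & (x < 11)      # TRUE TRUE FALSE TRUE FALSE
!temp                   # TRUE TRUE TRUE TRUE FALSE
sum(temp)               # 1: TRUE counts as 1, FALSE as 0

T <- 0                  # legal: T is an ordinary variable
isTRUE(T)               # FALSE now, which is why TRUE should be written out
rm(T)                   # T refers to the built-in TRUE again
```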
In some cases the components of a vector may not be completely
known. When an element or value is “not available” or a “missing
value” in the statistical sense, a place within a vector may be
reserved for it by assigning it the special value NA.
In general any operation on an NA becomes an NA. The
motivation for this rule is simply that if the specification of an
operation is incomplete, the result cannot be known and hence is not
available.
The function is.na(x) gives a logical vector of the same size as
x with value TRUE if and only if the corresponding element
in x is NA.
> z <- c(1:3,NA); ind <- is.na(z)
Notice that the logical expression x == NA is quite different
from is.na(x) since NA is not really a value but a marker
for a quantity that is not available. Thus x == NA is a vector
of the same length as x all of whose values are NA
as the logical expression itself is incomplete and hence undecidable.
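The difference between is.na() and comparison with NA shows up immediately on the z from the example above:

```r
z <- c(1:3, NA)
is.na(z)     # FALSE FALSE FALSE TRUE
z == NA      # NA NA NA NA: the comparison itself is unknown
z + 1        # 2 3 4 NA: arithmetic on NA gives NA
```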
Note that there is a second kind of “missing” values which are
produced by numerical computation, the so-called Not a Number,
NaN, values. Examples are
> 0/0
or
> Inf - Inf
which both give NaN since the result cannot be defined sensibly.
In summary, is.na(xx) is TRUE both for NA
and NaN values. To differentiate these, is.nan(xx) is only
TRUE for NaNs.
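A minimal check of the summary above:

```r
v <- c(1, NA, 0/0, Inf - Inf)
is.na(v)     # FALSE TRUE TRUE TRUE
is.nan(v)    # FALSE FALSE TRUE TRUE
```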
Missing values are sometimes printed as <NA> when character
vectors are printed without quotes.
Character quantities and character vectors are used frequently in R,
for example as plot labels. Where needed they are denoted by a sequence
of characters delimited by the double quote character, e.g.,
"x-values", "New iteration results".
Character strings are entered using either matching double (") or
single (') quotes, but are printed using double quotes (or
sometimes without quotes). They use C-style escape sequences, using
\ as the escape character, so \ is entered and printed as
\\, and inside double quotes " is entered as \".
Other useful escape sequences are \n, newline, \t, tab and
\b, backspace; see ?Quotes for a full list.
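The sketch below shows a few of these escapes in action. cat() is used because it writes the characters themselves, rather than the quoted and escaped form that print() shows. The strings are arbitrary examples:

```r
nchar("\\")                    # 1: "\\" is a single backslash
s <- "say \"hi\"\tok"          # embedded double quotes and a tab
nchar(s)                       # 11
cat(s, "\n")                   # shows: say "hi", a tab, then ok
identical('It\'s', "It's")     # TRUE: both quote styles give the same string
print('It\'s')                 # printed with double quotes: [1] "It's"
```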
Character vectors may be concatenated into a vector by the c()
function; examples of their use will emerge frequently.
The paste() function takes an arbitrary number of arguments and
concatenates them one by one into character strings. Any numbers given
among the arguments are coerced into character strings in the evident
way, that is, in the same way they would be if they were printed.
The arguments are by default separated in the result by a single blank
character, but this can be changed by the named argument,
sep=string, which changes it to string,
possibly empty.
For example
> labs <- paste(c("X","Y"), 1:10, sep="")
makes labs into the character vector
c("X1", "Y2", "X3", "Y4", "X5", "Y6", "X7", "Y8", "X9", "Y10")
Note particularly that recycling of short lists takes place here too;
thus c("X", "Y") is repeated 5 times to match the sequence
1:10.[9]
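A few more calls in the same spirit (the particular arguments are arbitrary). They show the default blank separator, and numbers and logicals being converted as they would print:

```r
labs <- paste(c("X","Y"), 1:10, sep="")
labs[1:4]                          # "X1" "Y2" "X3" "Y4"
paste("Iteration", 3, "of", 10)    # "Iteration 3 of 10"
paste(1.5, TRUE, sep = "/")        # "1.5/TRUE"
```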
Subsets of the elements of a vector may be selected by appending to the
name of the vector an index vector in square brackets. More
generally any expression that evaluates to a vector may have subsets of
its elements similarly selected by appending an index vector in square
brackets immediately after the expression.
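A brief sketch of indexing both a named vector and a general expression, using the vector x from earlier:

```r
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
x[2]           # 5.6
x[c(1, 5)]     # 10.4 21.7
(x + 1)[2]     # 6.6: the index follows a whole expression
sort(x)[1]     # 3.1: the smallest value
```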
Such index vectors can be any of four distinct types.
1. A logical vector. Values corresponding to TRUE in the index vector are selected and those corresponding to FALSE are omitted. For example
> y <- x[!is.na(x)]
creates (or re-creates) an object y which will contain the non-missing values of x, in the same order. Note that if x has missing values, y will be shorter than x. Also
> (x+1)[(!is.na(x)) & x>0] -> z
creates an object z
and places in it the values of the vector
x+1
for which the corresponding value in x
was both
non-missing and positive.
2. A vector of positive integral quantities. In this case the values in the index vector must lie in the set {1, 2, ..., length(x)}. The corresponding elements of the vector are selected and concatenated, in that order, in the result. The index vector can be of any length and the result is of the same length as the index vector. For example x[6] is the sixth component of x and
> x[1:10]
selects the first 10 elements of x
(assuming length(x)
is
not less than 10). Also
> c("x","y")[rep(c(1,2,2,1), times=4)]
(an admittedly unlikely thing to do) produces a character vector of
length 16 consisting of "x", "y", "y", "x"
repeated four times.
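3. A vector of negative integral quantities. Such an index vector specifies the values to be excluded rather than included. Thus
> y <- x[-(1:5)]
gives y all but the first five elements of x.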
4. A vector of character strings. This possibility applies only where an object has a names attribute to identify its components. In this case a sub-vector of the names vector may be used in the same way as the positive integral labels in item 2 further above.
> fruit <- c(5, 10, 1, 20)
> names(fruit) <- c("orange", "banana", "apple", "peach")
> lunch <- fruit[c("apple","orange")]
The advantage is that alphanumeric names are often easier to
remember than numeric indices. This option is particularly
useful in connection with data frames, as we shall see later.
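The four kinds of index vector described above can be compared on one small vector. This is a sketch with made-up values. The last line relies on the fact that a short logical index is recycled to the length of the vector being indexed:

```r
x <- c(3, NA, -1, 7, 0, 5)
x[!is.na(x)]          # logical index: 3 -1 7 0 5
x[c(4, 1, 1)]         # positive integers, repeats allowed: 7 3 3
x[-(1:2)]             # negative integers exclude elements: -1 7 0 5
fruit <- c(orange = 5, banana = 10, apple = 1, peach = 20)
fruit[c("apple", "orange")]   # character index by name: apple 1, orange 5
x[c(TRUE, FALSE)]     # recycled logical index: elements 1, 3, 5
```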
An indexed expression can also appear on the receiving end of an
assignment, in which case the assignment operation is performed
only on those elements of the vector. The expression must be of
the form vector[index_vector]
as having an arbitrary
expression in place of the vector name does not make much sense here.
For example
> x[is.na(x)] <- 0
replaces any missing values in x
by zeros and
> y[y < 0] <- -y[y < 0]
has the same effect as
> y <- abs(y)
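This sketch runs the two assignments above on small made-up vectors. Only the indexed elements change, and the second assignment gives the same result as abs():

```r
x <- c(1, NA, -3, NA, 5)
x[is.na(x)] <- 0           # assign only to the missing elements
x                          # 1 0 -3 0 5
y <- c(-2, 4, -6)
y[y < 0] <- -y[y < 0]      # flip the sign of the negative entries only
identical(y, abs(c(-2, 4, -6)))   # TRUE
```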
Vectors are the most important type of object in R, but there are
several others which we will meet more formally in later sections.
The entities R operates on are technically known as objects. Examples are vectors of numeric (real) or complex values, vectors of logical values and vectors of character strings. These are known as “atomic” structures since their components are all of the same type, or mode, namely numeric[10], complex, logical, character and raw.
Vectors must have their values all of the same mode. Thus any given vector must be unambiguously either logical, numeric, complex, character or raw. (The only apparent exception to this rule is the special “value” listed as NA for quantities not available, but in fact there are several types of NA.) Note that a vector can be empty and still have a mode. For example the empty character string vector is listed as character(0) and the empty numeric vector as numeric(0).
R also operates on objects called lists, which are of mode list. These are ordered sequences of objects which individually can be of any mode. Lists are known as “recursive” rather than atomic structures since their components can themselves be lists in their own right.
The other recursive structures are those of mode function and expression. Functions are the objects that form part of the R system along with similar user-written functions, which we discuss in some detail later. Expressions as objects form an advanced part of R which will not be discussed in this guide, except indirectly when we discuss formulae used with modeling in R.
By the mode of an object we mean the basic type of its fundamental constituents. This is a special case of a “property” of an object. Another property of every object is its length. The functions mode(object) and length(object) can be used to find out the mode and length of any defined structure[11].
Further properties of an object are usually provided by attributes(object); see Getting and setting attributes. Because of this, mode and length are also called “intrinsic attributes” of an object.
For example, if z is a complex vector of length 100, then in an expression mode(z) is the character string "complex" and length(z) is 100.
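This sketch checks the intrinsic attributes of a complex vector of length 100, built here only for illustration, and of a few other objects:

```r
z <- complex(real = 1:100, imaginary = 0)  # illustrative complex vector, length 100
mode(z)              # "complex"
length(z)            # 100
mode(TRUE)           # "logical"
mode(list(1, "a"))   # "list"
attributes(1:3)      # NULL: a plain vector has no non-intrinsic attributes
```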
R caters for changes of mode almost anywhere it could be considered sensible to do so (and a few where it might not be). For example with
> z <- 0:9
we could put
> digits <- as.character(z)
after which digits is the character vector c("0", "1", "2", …, "9"). A further coercion, or change of mode, reconstructs the numerical vector again:
> d <- as.integer(digits)
Now d and z are the same.[12] There is a large collection of functions of the form as.something() for either coercion from one mode to another, or for investing an object with some other attribute it may not already possess. The reader should consult the different help files to become familiar with them.
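The round trip above can be checked directly. This is a sketch. The last line shows a coercion that is not sensible: R returns NA and issues a warning instead of failing:

```r
z <- 0:9
digits <- as.character(z)   # "0" "1" ... "9"
d <- as.integer(digits)     # back to integer mode
identical(d, z)             # TRUE
as.numeric("3.5")           # 3.5
as.integer("abc")           # NA, with a "NAs introduced by coercion" warning
```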
An “empty” object may still have a mode. For example
> e <- numeric()
makes e an empty vector structure of mode numeric. Similarly character() is an empty character vector, and so on. Once an object of any size has been created, new components may be added to it simply by giving it an index value outside its previous range. Thus
> e[3] <- 17
now makes e a vector of length 3 (the first two components of which are at this point both NA). This applies to any structure at all, provided the mode of the additional component(s) agrees with the mode of the object in the first place.
This automatic adjustment of lengths of an object is used often, for
example in the scan()
function for input. (see The scan()
function.)
Conversely, to truncate the size of an object requires only an assignment to do so. Hence if alpha is an object of length 10, then
> alpha <- alpha[2 * 1:5]
makes it an object of length 5 consisting of just the former components
with even index. (The old indices are not retained, of course.) We can
then retain just the first three values by
> length(alpha) <- 3
and vectors can be extended (by missing values) in the same way.
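This sketch grows, truncates and re-extends a vector as described above. Here alpha is simply 1:10, chosen for illustration:

```r
e <- numeric()            # empty numeric vector
length(e)                 # 0
e[3] <- 17                # assigning beyond the end extends the vector
e                         # NA NA 17
alpha <- 1:10
alpha <- alpha[2 * 1:5]   # even-indexed components: 2 4 6 8 10
length(alpha) <- 3        # truncate: 2 4 6
length(alpha) <- 5        # extend with missing values: 2 4 6 NA NA
```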
The function attributes(object)
returns a list of all the non-intrinsic attributes currently defined for
that object. The function attr(object, name)
can be used to select a specific attribute. These functions are rarely
used, except in rather special circumstances when some new attribute is
being created for some particular purpose, for example to associate a
creation date or an operator with an R object.
The concept, however,
is very important.
Some care should be exercised when assigning or deleting attributes
since they are an integral part of the object system used in R.
When attr() is used on the left-hand side of an assignment, it can be used either to associate a new attribute with object or to change an existing one. For example
> attr(z, "dim") <- c(10,10)
allows R to treat z as if it were a 10-by-10 matrix (this requires z to have exactly 100 elements).
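This sketch sets attributes with attr(). The vector 1:100 is used so that the 10 by 10 shape fits. The "created" attribute is a hypothetical user-defined attribute, included only to show that arbitrary names are allowed:

```r
z <- 1:100                       # 100 elements, so a 10 x 10 shape fits
attr(z, "dim") <- c(10, 10)
is.matrix(z)                     # TRUE
z[2, 3]                          # 22, because storage is column-major
attr(z, "created") <- "example"  # hypothetical user-defined attribute
names(attributes(z))             # "dim" "created"
```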
All objects in R have a class, reported by the function class. For simple vectors this is just the mode, for example "numeric", "logical", "character" or "list", but "matrix", "array", "factor" and "data.frame" are other possible values.
A special attribute known as the class of the object is used to allow for an object-oriented style[13] of programming in R. For example if an object has class "data.frame", it will be printed in a certain way, the plot() function will display it graphically in a certain way, and other so-called generic functions such as summary() will react to it as an argument in a way sensitive to its class.
To remove temporarily the effects of class, use the function unclass(). For example if winter has the class "data.frame" then
> winter
will print it in data frame form, which is rather like a matrix, whereas
> unclass(winter)
will print it as an ordinary list. Only in rather special situations do
you need to use this facility, but one is when you are learning to come
to terms with the idea of class and generic functions.
Generic functions and classes will be discussed further in Classes, generic functions and object orientation, but only briefly.
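This sketch shows class() and unclass() on a small, hypothetical winter data frame. The text lists "numeric" as a typical class, but an integer vector such as 1:3 reports "integer". Since R 4.0.0 a matrix reports two classes:

```r
winter <- data.frame(temp = c(-2, 1, 3), snow = c(TRUE, FALSE, TRUE))  # hypothetical data
class(winter)              # "data.frame"
class(unclass(winter))     # "list": the class attribute has been dropped
class(1:3)                 # "integer"
class(matrix(1:4, 2))      # "matrix" "array"
```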
A factor is a vector object used to specify a discrete
classification (grouping) of the components of other vectors of the same length.
R provides both ordered and unordered factors.
While the “real” application of factors is with model formulae
(see Contrasts), we here look at a specific example.
Suppose, for example, we have a sample of 30 tax accountants from all the states and territories of Australia[14] and their individual state of origin is specified by a character vector of state mnemonics as
> state <- c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa", "qld", "vic", "nsw", "vic", "qld", "qld", "sa", "tas", "sa", "nt", "wa", "vic", "qld", "nsw", "nsw", "wa", "sa", "act", "nsw", "vic", "vic", "act")
Notice that in the case of a character vector, “sorted” means sorted in alphabetical order: the levels of a factor created from such a vector are its distinct values in that order.
A factor is similarly created using the factor()
function:
> statef <- factor(state)
The print()
function handles factors slightly differently from
other objects:
> statef
 [1] tas sa qld nsw nsw nt wa wa qld vic nsw vic qld qld sa
[16] tas sa nt wa vic qld nsw nsw wa sa act nsw vic vic act
Levels: act nsw nt qld sa tas vic wa
To find out the levels of a factor the function levels()
can be
used.
> levels(statef)
[1] "act" "nsw" "nt" "qld" "sa" "tas" "vic" "wa"
tapply() and ragged arrays
To continue the previous example, suppose we have the incomes of the same tax accountants in another vector (in suitably large units of money)
> incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56, 61, 61, 61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46, 59, 46, 58, 43)
To calculate the sample mean income for each state we can now use the special function tapply():
> incmeans <- tapply(incomes, statef, mean)
giving a means vector with the components labelled by the levels
   act    nsw     nt    qld     sa    tas    vic     wa
44.500 57.333 55.500 53.600 55.000 60.500 56.000 52.250
The function tapply() is used to apply a function, here mean(), to each group of components of the first argument, here incomes, defined by the levels of the second argument, here statef[15], as if they were separate vector structures. The result is a structure of the same length as the levels attribute of the factor containing the results. The reader should consult the help document for more details.
Suppose further we needed to calculate the standard errors of the state income means. To do this we need to write an R function to calculate the standard error for any given vector. Since there is a built-in function var() to calculate the sample variance, such a function is a very simple one-liner, specified by the assignment:
> stdError <- function(x) sqrt(var(x)/length(x))
(Writing functions will be considered later in Writing your own functions. Note that R's built-in function sd() computes the standard deviation, which is something different.)
After this assignment, the standard errors are calculated by
> incster <- tapply(incomes, statef, stdError)
and the values calculated are then
> incster
   act    nsw     nt    qld     sa    tas    vic     wa
   1.5 4.3102    4.5 4.1061 2.7386    0.5  5.244 2.6575
As an exercise you may care to find the usual 95% confidence limits for
the state mean incomes. To do this you could use tapply()
once
more with the length()
function to find the sample sizes, and the
qt()
function to find the percentage points of the appropriate
t-distributions. (You could also investigate R’s facilities
for t-tests.)
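One possible answer to the exercise is sketched below. It follows the suggested route: tapply() with length() gives the sample sizes, and qt() gives the two-sided 95% t quantiles. The data from the earlier steps are re-created so the sketch runs on its own, and the name ci is an arbitrary choice:

```r
state <- c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa", "qld", "vic",
           "nsw", "vic", "qld", "qld", "sa", "tas", "sa", "nt", "wa", "vic",
           "qld", "nsw", "nsw", "wa", "sa", "act", "nsw", "vic", "vic", "act")
incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69, 70, 42, 56, 61, 61,
             61, 58, 51, 48, 65, 49, 49, 41, 48, 52, 46, 59, 46, 58, 43)
statef <- factor(state)
stdError <- function(x) sqrt(var(x)/length(x))

incmeans <- tapply(incomes, statef, mean)
incster  <- tapply(incomes, statef, stdError)
n        <- tapply(incomes, statef, length)   # sample size for each state
tcrit    <- qt(0.975, df = n - 1)             # 97.5% point of t with n - 1 df
ci <- cbind(lower = incmeans - tcrit * incster,
            upper = incmeans + tcrit * incster)
round(ci, 2)
```

With only two accountants in some states, those intervals are very wide, because the t quantile with 1 degree of freedom is large.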
The function tapply() can also be used to handle more complicated indexing of a vector by multiple categories. For example, we might wish to split the tax accountants by both state and sex. However, in this simple instance (just one factor), what happens can be thought of as follows.
The values in the vector are collected into groups
corresponding to the distinct entries in the factor. The function is
then applied to each of these groups individually. The value is a
vector of function results, labelled by the levels
attribute of
the factor.
The combination of a vector and a labelling factor is an example of what
is sometimes called a ragged array, since the subclass sizes are
possibly irregular. When the subclass sizes are all the same the
indexing may be done implicitly and much more efficiently, as we see in
the next section.
The levels of factors are stored in alphabetical order, or in the order
they were specified to factor
if they were specified explicitly.
Sometimes the levels will have a natural ordering that we want to record and want our statistical analysis to make use of. The ordered() function creates such ordered factors but is otherwise identical to factor. For most purposes the only difference between ordered and unordered factors is that the former are printed showing the ordering of the levels, but the contrasts generated for them in fitting linear models are different.
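This sketch contrasts the default alphabetical levels, explicitly specified levels, and an ordered factor. The sizes vector is hypothetical. Comparisons such as < are meaningful only for the ordered version:

```r
sizes <- c("small", "large", "medium", "small")            # hypothetical data
f <- factor(sizes)
levels(f)             # "large" "medium" "small": alphabetical
g <- factor(sizes, levels = c("small", "medium", "large"))
levels(g)             # "small" "medium" "large": as specified
o <- ordered(sizes, levels = c("small", "medium", "large"))
o                     # printed with Levels: small < medium < large
o[1] < o[2]           # TRUE: "small" comes before "large"
```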
An array can be considered as a multiply subscripted collection of data entries, for example numeric. R allows simple facilities for creating and handling arrays, and in particular the special case of matrices.
A dimension vector is a vector of non-negative integers. If its length is
k then the array is k-dimensional, e.g. a matrix is a
2-dimensional array. The dimensions are indexed from one up to
the values given in the dimension vector.
A vector can be used by R as an array only if it has a dimension
vector as its dim attribute. Suppose, for example, z
is a
vector of 1500 elements. The assignment
> dim(z) <- c(3,5,100)
gives it the dim attribute that allows it to be treated as a
3 by 5 by 100 array.
Other functions such as matrix()
and array()
are available
for simpler and more natural looking assignments, as we shall see in
The array()
function.
The values in the data vector give the values in the array in the same
order as they would occur in FORTRAN, that is “column major order,”
with the first subscript moving fastest and the last subscript slowest.
For example if the dimension vector for an array, say a, is c(3,4,2) then there are 3 * 4 * 2 = 24 entries in a and the data vector holds them in the order a[1,1,1], a[2,1,1], …, a[2,4,2], a[3,4,2].
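Column-major order can be checked by filling an array with 1:24, so that each entry equals its position in the data vector. This sketch uses the dimension vector from the example above:

```r
a <- array(1:24, dim = c(3, 4, 2))
a[2, 1, 1]    # 2: the first subscript moves fastest
a[1, 2, 1]    # 4: moving one step in the second subscript skips 3 entries
a[1, 1, 2]    # 13: moving one step in the third subscript skips 3 * 4 entries
a[3, 4, 2]    # 24: the last entry of the data vector
```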
Arrays can be one-dimensional: such arrays are usually treated in the
same way as vectors (including when printing), but the exceptions can
cause confusion.
Individual elements of an array may be referenced by giving the name of
the array followed by the subscripts in square brackets, separated by
commas.
More generally, subsections of an array may be specified by giving a
sequence of index vectors in place of subscripts; however
if any index position is given an empty index vector, then the
full range of that subscript is taken.
Continuing the previous example, a[2,,] is a 4 * 2 array with dimension vector c(4,2) and data vector containing the values
c(a[2,1,1], a[2,2,1], a[2,3,1], a[2,4,1], a[2,1,2], a[2,2,2], a[2,3,2], a[2,4,2])
in that order. a[,,]
stands for the entire array, which is the
same as omitting the subscripts entirely and using a
alone.
For any array, say Z, the dimension vector may be referenced explicitly as dim(Z) (on either side of an assignment).
Also, if an array name is given with just one subscript or index
vector, then the corresponding values of the data vector only are used;
in this case the dimension vector is ignored. This is not the case,
however, if the single index is not a vector but itself an array, as we
next discuss.
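This sketch illustrates three of the points above: empty subscripts take the full range, a single subscript indexes the data vector directly, and dim() can be assigned to. The 3 by 4 by 2 array from the running example is used:

```r
a <- array(1:24, dim = c(3, 4, 2))
b <- a[2, , ]        # empty subscripts take the full range
dim(b)               # 4 2
as.vector(b)         # 2 5 8 11 14 17 20 23
a[5]                 # one subscript: 5th element of the data vector, ignoring dim
dim(a) <- c(6, 4)    # dim() on the left of an assignment reshapes the array
dim(a)               # 6 4
```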
As well as an index vector in any subscript position, a matrix may be
used with a single index matrix in order either to assign a vector
of quantities to an irregular collection of elements in the array, or to
extract an irregular collection as a vector.
A matrix example makes the process clear. In the case of a doubly
indexed array, an index matrix may be given consisting of two columns
and as many rows as desired. The entries in the index matrix are the
row and column indices for the doubly indexed array.
Suppose for example we have a 4 by 5 array X and we wish to do the following:
- Extract elements X[1,3], X[2,2] and X[3,1] as a vector structure, and
- Replace these entries in the array X by zeroes.
In this case we need a 3 by 2 subscript array, as in the following example.
> x <- array(1:20, dim=c(4,5)) # Generate a 4 by 5 array.
> x
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
> i <- array(c(1:3,3:1), dim=c(3,2))
> i # i is a 3 by 2 index array.
[,1] [,2]
[1,] 1 3
[2,] 2 2
[3,] 3 1
> x[i] # Extract those elements
[1] 9 6 3
> x[i] <- 0 # Replace those elements by zeros.
> x
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 0 13 17
[2,] 2 0 10 14 18
[3,] 0 7 11 15 19
[4,] 4 8 12 16 20
>
Negative indices are not allowed in index matrices. NA
and zero
values are allowed: rows in the index matrix containing a zero are
ignored, and rows containing an NA
produce an NA
in the
result.
As a less trivial example, suppose we wish to generate an (unreduced) design matrix for a block design defined by factors blocks (b levels) and varieties (v levels). Further suppose there are n plots in the experiment. We could proceed as follows:
> Xb <- matrix(0, n, b)
> Xv <- matrix(0, n, v)
> ib <- cbind(1:n, blocks)
> iv <- cbind(1:n, varieties)
> Xb[ib] <- 1
> Xv[iv] <- 1
> X <- cbind(Xb, Xv)
To construct the incidence matrix, N
say, we could use
> N <- crossprod(Xb, Xv)
However a simpler direct way of producing this matrix is to use table():
> N <- table(blocks, varieties)
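The steps above need concrete blocks and varieties to run. This sketch uses a hypothetical design with 6 plots, 3 blocks and 3 varieties. cbind() turns each factor into its integer codes, so each index matrix picks one cell per row:

```r
blocks    <- factor(c(1, 1, 2, 2, 3, 3))   # hypothetical: b = 3 blocks
varieties <- factor(c(1, 2, 1, 3, 2, 3))   # hypothetical: v = 3 varieties
n <- length(blocks); b <- nlevels(blocks); v <- nlevels(varieties)

Xb <- matrix(0, n, b)
Xv <- matrix(0, n, v)
Xb[cbind(1:n, blocks)] <- 1     # row i gets a 1 in the column of plot i's block
Xv[cbind(1:n, varieties)] <- 1  # likewise for varieties
X <- cbind(Xb, Xv)              # the unreduced design matrix

N  <- crossprod(Xb, Xv)         # incidence matrix: plots per block/variety pair
N2 <- table(blocks, varieties)  # the same counts, with labelled margins
all(as.vector(N) == as.vector(N2))   # TRUE
```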
Index matrices must be numerical: any other form of matrix (e.g. a
logical or character matrix) supplied as a matrix is treated as an
indexing vector.
The array() function
As well as giving a vector structure a dim attribute, arrays can be constructed from vectors by the array function, which has the form
> Z <- array(data_vector, dim_vector)
For example, if the vector h contains 24 or fewer numbers, then the command
> Z <- array(h, dim=c(3,4,2))
would use h to set up a 3 by 4 by 2 array in Z. If the size of h is exactly 24 the result is the same as
> Z <- h ; dim(Z) <- c(3,4,2)
However if h is shorter than 24, its values are recycled from the beginning again to make it up to size 24 (see Mixed vector and array arithmetic. The recycling rule) but dim(h) <- c(3,4,2) would signal an error about mismatching length.
As an extreme but common example
> Z <- array(0, c(3,4,2))
makes Z
an array of all zeros.
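This sketch contrasts the two behaviours just described. array() recycles a short data vector, while assigning dim() to the same short vector fails. h here is a hypothetical vector of only 5 values:

```r
h <- 1:5                          # hypothetical: fewer than 24 values
Z <- array(h, dim = c(3, 4, 2))   # values recycled to fill all 24 cells
Z[1:8]                            # 1 2 3 4 5 1 2 3
res <- tryCatch({ dim(h) <- c(3, 4, 2); "ok" },
                error = function(e) conditionMessage(e))
res                               # the error message about mismatched length
Z0 <- array(0, c(3, 4, 2))        # an array of all zeros
all(Z0 == 0)                      # TRUE
```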
At this point dim(Z) stands for the dimension vector c(3,4,2), and Z[1:24] stands for the data vector as it was in h, and Z[] with an empty subscript or Z with no subscript stands for the entire array as an array.
Arrays may be used in arithmetic expressions and the result is an array formed by element-by-element operations on the data vector. The dim attributes of operands generally need to be the same, and this becomes the dimension vector of the result. So if A, B and C are all similar arrays, then
> D <- 2*A*B + C + 1
makes D
a similar array with its data vector being the result of
the given element-by-element operations. However the precise rule
concerning mixed array and vector calculations has to be considered a
little more carefully.
The precise rule affecting element by element mixed calculations with
vectors and arrays is somewhat quirky and hard to find in the
references. From experience we have found the following to be a reliable
guide.
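- When array operands are combined, they must all have the same dim attribute or an error results.
- If array structures are present and no error has occurred, the result is an array structure with the common dim attribute of its array operands.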
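This sketch illustrates these rules with a 2 by 3 matrix, chosen arbitrarily. Short vectors are recycled down the columns and the result keeps the common dim attribute. Arrays with different dim attributes cannot be combined, and neither can a vector longer than the array (R signals an error in that case):

```r
A <- matrix(1:6, nrow = 2)      # a 2 by 3 array
A + 10                          # length-1 vector recycled; dim stays c(2, 3)
A * c(1, 10)                    # length-2 vector recycled down each column
dim(A + A)                      # 2 3: the common dim attribute is kept
B <- matrix(1:6, nrow = 3)      # 3 by 2: a different dim attribute
tryCatch(A + B,   error = function(e) "error: non-conformable arrays")
tryCatch(A + 1:7, error = function(e) "error: vector longer than the array")
```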
An important operation on arrays is the outer product. If a and b are two numeric arrays, their outer product is an array whose dimension vector is obtained by concatenating their two dimension vectors (order is important), and whose data vector is formed from all possible products of elements of the data vector of a with those of b. The outer product is formed by the special operator %o%:
> ab <- a %o% b
An alternative is
> ab <- outer(a, b, "*")
The multiplication function can be replaced by an arbitrary function of
two variables. For example if we wished to evaluate the function
f(x; y) = cos(y)/(1 + x^2)
over a regular grid of values with x- and y-coordinates
defined by the R vectors x
and y
respectively, we could
proceed as follows:
> f <- function(x, y) cos(y)/(1 + x^2)
> z <- outer(x, y, f)
In particular the outer product of two ordinary vectors is a doubly
subscripted array (that is a matrix, of rank at most 1). Notice that
the outer product operator is of course non-commutative. Defining your
own R functions will be considered further in Writing your own functions.
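This sketch checks the outer product forms above on small vectors and evaluates f over a grid. The grid vectors x and y are hypothetical choices. outer() calls f once on expanded vectors, so f must be vectorized, as cos() and ^ are:

```r
a <- 1:3
b <- 1:2
ab <- a %o% b                      # 3 by 2 matrix with ab[i, j] = a[i] * b[j]
identical(ab, outer(a, b, "*"))    # TRUE
dim(outer(b, a))                   # 2 3: the order of the arguments matters
x <- seq(-1, 1, length.out = 5)    # hypothetical grid in x
y <- seq(0, pi, length.out = 4)    # hypothetical grid in y
f <- function(x, y) cos(y)/(1 + x^2)
z <- outer(x, y, f)                # z[i, j] = f(x[i], y[j])
dim(z)                             # 5 4
```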
As an artificial but cute example, consider the determinants of 2 by 2 matrices [a, b; c, d] where each entry is a non-negative integer in the range 0, 1, ..., 9, that is, a digit.
The problem is to find the determinants, ad - bc, of all possible
matrices of this form and represent the frequency with which each value
occurs as a high density plot. This amounts to finding the
probability distribution of the determinant if each digit is chosen
independently and uniformly at random.
A neat way of doing this uses the outer()
function twice:
> d <- outer(0:9, 0:9)
> fr <- table(outer(d, d, "-"))
> plot(fr, xlab="Determinant", ylab="Frequency")
Notice that plot() here uses a histogram-like plot method, because it “sees” that fr is of class "table". The “obvious” way of doing this problem with for loops, to be discussed in Grouping, loops and conditional execution, is so inefficient as to be impractical.
It is also perhaps surprising that about 1 in 20 such matrices is
singular.
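The claim about singular matrices can be checked by counting zero determinants directly, reusing the outer() construction above. This is a sketch. The count is 570 out of 10^4, or 0.057, which is roughly 1 in 17.5:

```r
d    <- outer(0:9, 0:9)          # all 100 products of two digits
dets <- outer(d, d, "-")         # all 10^4 values of a*d - b*c
length(dets)                     # 10000
range(dets)                      # -81 81
sum(dets == 0) / length(dets)    # 0.057: proportion of singular matrices
```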
The function aperm(a, perm) may be used to permute an array, a. The argument perm must be a permutation of the integers {1, ..., k}, where k is the number of subscripts in a. The result of the function is an array of the same size as a but with old dimension given by perm[j] becoming the new j-th dimension. The easiest way to think of this operation is as a generalization of transposition for matrices. Indeed if A is a matrix (that is, a doubly subscripted array), then B given by
> B <- aperm(A, c(2,1))
is just the transpose of A. For this special case a simpler function t() is available, so we could have used B <- t(A).
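This sketch shows aperm() on the 3 by 4 by 2 array from earlier. With perm = c(3, 1, 2) the old third dimension becomes the first, so p[i, j, k] equals a[j, k, i]. For a matrix, aperm(A, c(2, 1)) agrees with t(A):

```r
a <- array(1:24, dim = c(3, 4, 2))
p <- aperm(a, c(3, 1, 2))        # new dimension j is old dimension perm[j]
dim(p)                           # 2 3 4
p[2, 3, 4] == a[3, 4, 2]         # TRUE: subscripts are permuted the same way
A <- matrix(1:6, nrow = 2)
all(aperm(A, c(2, 1)) == t(A))   # TRUE: aperm generalizes transposition
```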
As noted above, a matrix is just an array with two subscripts. However
it is such an important special case it needs a separate discussion.
R contains many operators and functions that are available only for
matrices. For example t(X)
is the matrix transpose function, as
noted above. The functions nrow(A)
and ncol(A)
give the
number of rows and columns in the matrix A
respectively.
The operator %*%
is used for matrix multiplication.
An n by 1 or 1 by n matrix may of course be
used as an n-vector if in the context such is appropriate.
Conversely, vectors which occur in matrix multiplication expressions are
automatically promoted either to row or column vectors, whichever is
multiplicatively coherent, if possible, (although this is not always
unambiguously possible, as we see later).
If, for example, A
and B
are square matrices of the same
size, then
> A * B
is the matrix of element by element products and
> A %*% B
is the matrix product. If x
is a vector, then
> x %*% A %*% x
is a quadratic form.[16]
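This sketch shows the difference between * and %*% on a small 2 by 2 example, chosen arbitrarily. It also shows that the quadratic form comes back as a 1 by 1 matrix:

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # the matrix [2 1; 1 3]
B <- diag(2)                           # 2 by 2 identity matrix
A * B            # element by element: off-diagonal entries become 0
A %*% B          # matrix product: equal to A
x <- c(1, 2)
x %*% A %*% x    # quadratic form, a 1 by 1 matrix containing 18
nrow(A); ncol(A) # 2 2
```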
The function crossprod()
forms “cross products”, meaning that
crossprod(X, y)
is the same as t(X) %*% y
but the
operation is more efficient. If the second argument to
crossprod()
is omitted it is taken to be the same as the first.
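This sketch checks the two forms of crossprod() against the explicit transpose-and-multiply versions. X and y are arbitrary:

```r
X <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)   # 3 by 2
y <- c(1, 0, 2)
all.equal(crossprod(X, y), t(X) %*% y)       # TRUE: both give a 2 by 1 matrix (7, 16)
all.equal(crossprod(X), t(X) %*% X)          # TRUE: second argument defaults to X
```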
The meaning of diag() depends on its argument. diag(v), where v is a vector, gives a diagonal matrix with elements of the vector as the diagonal entries. On the other hand diag(M), where M is a matrix, gives the vector of main diagonal entries of M. This is the same convention as that used for diag() in MATLAB. Also, somewhat confusingly, if k is a single numeric value then diag(k) is the k by k identity matrix!
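The three meanings of diag() are sketched below. The last line shows the nrow argument, which the text does not mention. It avoids the single-number surprise when a 1 by 1 diagonal matrix is actually wanted:

```r
diag(c(1, 2, 3))      # 3 by 3 diagonal matrix with 1, 2, 3 on the diagonal
M <- matrix(1:9, 3)
diag(M)               # 1 5 9: the main diagonal of M
diag(3)               # 3 by 3 identity matrix, since 3 is a single number
diag(5, nrow = 1)     # 1 by 1 matrix containing 5
```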
Solving linear equations is the inverse of matrix multiplication.
When after
> b <- A %*% x
only A
and b
are given, the vector x
is the
solution of that linear equation system. In R,
> solve(A,b)
solves the system, returning x (up to some accuracy loss). Note that in linear algebra, formally x = A^{-1} %*% b where A^{-1} denotes the inverse of A, which can be computed by solve(A) but is rarely needed. Numerically, it is both inefficient and potentially unstable to compute x <- solve(A) %*% b instead of solve(A,b).
The quadratic form x %*% A^{-1} %*% x, which is used in multivariate computations, should be computed by something like[17] x %*% solve(A,x), rather than computing the inverse of A.
在多变量计算中使用的二次型 x %*% A^{-1} %*%
x
应该通过类似于 17 x %*% solve(A,x)
的东西来计算,而不是计算 A
的逆。
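Here is a minimal sketch using an arbitrary nonsingular 2 by 2 matrix chosen for illustration. It compares the recommended route with the explicit-inverse route, then computes the quadratic form without forming the inverse.

```r
A <- matrix(c(4, 1, 2, 3), nrow = 2)   # illustrative nonsingular matrix
x_true <- c(1, -2)
b <- A %*% x_true

x1 <- solve(A, b)          # preferred: solve the system directly
x2 <- solve(A) %*% b       # same answer here, but forms the inverse explicitly
qf <- drop(x_true %*% solve(A, x_true))   # quadratic form x' A^{-1} x
```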
The function eigen(Sm) calculates the eigenvalues and eigenvectors of a symmetric matrix Sm. The result of this function is a list of two components named values and vectors. The assignment
> ev <- eigen(Sm)
will assign this list to ev. Then ev$val is the vector of eigenvalues of Sm and ev$vec is the matrix of corresponding eigenvectors. Had we only needed the eigenvalues we could have used the assignment:
> evals <- eigen(Sm)$values
evals now holds the vector of eigenvalues and the second component is discarded. If the expression
> eigen(Sm)
is used by itself as a command, the two components are printed with their names. For large matrices it is better to avoid computing the eigenvectors if they are not needed, by using the expression
> evals <- eigen(Sm, only.values = TRUE)$values
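This small sketch uses an illustrative symmetric matrix. It checks the defining relation Sm %*% v = lambda * v for the first eigenpair, and shows that with only.values = TRUE the vectors component is left empty.

```r
Sm <- matrix(c(2, 1, 1, 2), nrow = 2)   # illustrative symmetric matrix
ev <- eigen(Sm)
lambda1 <- ev$values[1]                 # eigenvalues come in decreasing order
v1      <- ev$vectors[, 1]
resid   <- Sm %*% v1 - lambda1 * v1     # should be numerically zero
evals   <- eigen(Sm, only.values = TRUE)$values
```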
The function svd(M) takes an arbitrary matrix argument, M, and calculates the singular value decomposition of M. This consists of a matrix of orthonormal columns U with the same column space as M, a second matrix of orthonormal columns V whose column space is the row space of M, and a diagonal matrix of non-negative entries D such that M = U %*% D %*% t(V). D is actually returned as a vector of the diagonal elements. The result of svd(M) is actually a list of three components named d, u and v, with evident meanings.
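The decomposition can be checked numerically by rebuilding M from its three components. Here M is random illustrative data. Passing nrow to diag() guards against the single-singular-value case noted for diag() above.

```r
set.seed(2)
M <- matrix(rnorm(6), nrow = 3)     # illustrative 3 x 2 matrix
s <- svd(M)                         # list with components d, u and v
D <- diag(s$d, nrow = length(s$d))  # diagonal matrix built from the vector d
M_rebuilt <- s$u %*% D %*% t(s$v)
```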
If M is in fact square, then it is not hard to see that
> absdetM <- prod(svd(M)$d)
calculates the absolute value of the determinant of M. If this calculation were needed often with a variety of matrices, it could be defined as an R function
> absdet <- function(M) prod(svd(M)$d)
after which we could use absdet() as just another R function. As a further trivial but potentially useful example, you might like to consider writing a function, say tr(), to calculate the trace of a square matrix. [Hint: You will not need to use an explicit loop. Look again at the diag() function.]
R has a builtin function det to calculate a determinant, including the sign, and another, determinant, to give the sign and modulus (optionally on log scale).
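The three approaches compare as follows for an illustrative 2 by 2 matrix whose determinant is -2. By default, determinant() returns the modulus on the log scale.

```r
M <- matrix(c(1, 3, 2, 4), nrow = 2)   # rows (1, 2) and (3, 4); determinant -2
absdet <- function(M) prod(svd(M)$d)
a  <- absdet(M)        # absolute value only
d  <- det(M)           # includes the sign
dl <- determinant(M)   # list holding the log modulus and the sign
```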
The function lsfit() returns a list giving results of a least squares fitting procedure. An assignment such as
> ans <- lsfit(X, y)
gives the results of a least squares fit where y is the vector of observations and X is the design matrix. See the help facility for more details, and also for the follow-up function ls.diag() for, among other things, regression diagnostics. Note that a grand mean term is automatically included and need not be included explicitly as a column of X. Further note that you almost always will prefer using lm(.) (see Linear models) to lsfit() for regression modelling.
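This sketch uses simulated illustrative data to show that lsfit() and lm() produce the same coefficients. Only the coefficient names differ.

```r
set.seed(3)
X <- cbind(x1 = rnorm(20), x2 = rnorm(20))   # design matrix without a column of 1s
y <- 1 + 2 * X[, "x1"] - X[, "x2"] + rnorm(20, sd = 0.1)

ans <- lsfit(X, y)   # intercept term added automatically
fit <- lm(y ~ X)     # the usual modelling interface
cbind(lsfit = ans$coefficients, lm = coef(fit))
```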
Another closely related function is qr() and its allies. Consider the following assignments
> Xplus <- qr(X)
> b <- qr.coef(Xplus, y)
> fit <- qr.fitted(Xplus, y)
> res <- qr.resid(Xplus, y)
These compute the orthogonal projection of y onto the range of X in fit, the projection onto the orthogonal complement in res, and the coefficient vector for the projection in b; that is, b is essentially the result of the MATLAB ‘backslash’ operator.
It is not assumed that X has full column rank. Redundancies will be discovered and removed as they are found.
This alternative is the older, low-level way to perform least squares calculations. Although still useful in some contexts, it would now generally be replaced by the statistical models features, as will be discussed in Statistical models in R.
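The projection interpretation can be checked numerically: the fitted values and residuals add back to y, and the residuals are orthogonal to every column of X. The data are simulated for illustration. The last lines show a duplicated column being detected through the rank.

```r
set.seed(4)
X <- cbind(1, rnorm(10), rnorm(10))   # illustrative full-rank design
y <- rnorm(10)

Xplus <- qr(X)
b   <- qr.coef(Xplus, y)
fit <- qr.fitted(Xplus, y)
res <- qr.resid(Xplus, y)

orth     <- max(abs(crossprod(X, res)))   # residuals orthogonal to X: about 0
X_dup    <- cbind(X, X[, 2])              # add a redundant copy of column 2
rank_dup <- qr(X_dup)$rank                # 3, not 4
```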
cbind() and rbind()
As we have already seen informally, matrices can be built up from other vectors and matrices by the functions cbind() and rbind(). Roughly, cbind() forms matrices by binding together matrices horizontally, or column-wise, and rbind() vertically, or row-wise.
In the assignment
> X <- cbind(arg_1, arg_2, arg_3, ...)
the arguments to cbind() must be either vectors of any length, or matrices with the same column size, that is, the same number of rows. The result is a matrix with the concatenated arguments arg_1, arg_2, … forming the columns.
If some of the arguments to cbind() are vectors they may be shorter than the column size of any matrices present, in which case they are cyclically extended to match the matrix column size (or the length of the longest vector if no matrices are given).
The function rbind() does the corresponding operation for rows. In this case any vector argument, possibly cyclically extended, is of course taken as a row vector.
Suppose X1 and X2 have the same number of rows. To combine these by columns into a matrix X, together with an initial column of 1s, we can use
> X <- cbind(1, X1, X2)
The result of rbind() or cbind() always has matrix status. Hence cbind(x) and rbind(x) are possibly the simplest ways explicitly to allow the vector x to be treated as a column or row matrix respectively.
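This short sketch illustrates the binding and recycling rules described above, using small illustrative matrices and vectors.

```r
X1 <- matrix(1:6, nrow = 3)   # illustrative 3 x 2 matrix
X2 <- matrix(7:9, nrow = 3)   # illustrative 3 x 1 matrix
X  <- cbind(1, X1, X2)        # scalar 1 is recycled into a column of 1s
Rb <- rbind(X1, 100)          # 100 is recycled into a row of length 2
V  <- cbind(1:3, 1:6)         # no matrices: shorter vector recycled to length 6

x <- 1:3
col_dim <- dim(cbind(x))      # 3 1: a column matrix
row_dim <- dim(rbind(x))      # 1 3: a row matrix
```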
c(), with arrays
It should be noted that whereas cbind() and rbind() are concatenation functions that respect dim attributes, the basic c() function does not, but rather clears numeric objects of all dim and dimnames attributes. This is occasionally useful in its own right.
The official way to coerce an array back to a simple vector object is to use as.vector():
> vec <- as.vector(X)
However, a similar result can be achieved by using c() with just one argument, simply for this side-effect:
> vec <- c(X)
There are slight differences between the two, but ultimately the choice between them is largely a matter of style (with the former being preferable).
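For an ordinary numeric matrix, the two routes give the same plain vector. The matrix X below, with row names, is illustrative.

```r
X <- matrix(1:6, nrow = 2, dimnames = list(c("a", "b"), NULL))
vec1 <- as.vector(X)   # the recommended way
vec2 <- c(X)           # same result here, via the side-effect of c()
```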
Recall that a factor defines a partition into groups. Similarly, a pair of factors defines a two-way cross classification, and so on. The function table() allows frequency tables to be calculated from equal length factors. If there are k factor arguments, the result is a k-way array of frequencies.
Suppose, for example, that statef is a factor giving the state code for each entry in a data vector. The assignment
> statefr <- table(statef)
gives in statefr a table of frequencies of each state in the sample. The frequencies are ordered and labelled by the levels attribute of the factor. This simple case is equivalent to, but more convenient than,
> statefr <- tapply(statef, statef, length)
Further suppose that incomef is a factor giving a suitably defined “income class” for each entry in the data vector, for example with the cut() function:
> factor(cut(incomes, breaks = 35+10*(0:7))) -> incomef
Then to calculate a two-way table of frequencies:
> table(incomef,statef)
         statef
incomef   act nsw nt qld sa tas vic wa
  (35,45]   1   1  0   1  0   0   1  0
  (45,55]   1   1  1   1  2   0   1  3
  (55,65]   0   3  1   3  2   2   2  1
  (65,75]   0   1  0   0  0   0   1  0
Extension to higher-way frequency tables is immediate.
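This self-contained sketch uses a handful of invented observations; statef, incomes and incomef are hypothetical stand-ins for the data used above. Note that cut() already returns a factor. Wrapping it in factor(), as in the assignment above, additionally drops any income classes that happen to be empty.

```r
# Hypothetical observations, invented for illustration only
statef  <- factor(c("nsw", "vic", "nsw", "qld", "vic", "nsw"))
incomes <- c(42, 61, 48, 55, 70, 44)
incomef <- cut(incomes, breaks = 35 + 10 * (0:7))   # 7 income classes

statefr <- table(statef)                    # one-way table: nsw 3, qld 1, vic 2
same    <- tapply(statef, statef, length)   # the less convenient equivalent
two_way <- table(incomef, statef)           # 7 x 3 array of frequencies
```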
An R list is an object consisting of an ordered collection of objects known as its components.
There is no particular need for the components to be of the same mode or type, and, for example, a list could consist of a numeric vector, a logical value, a matrix, a complex vector, a character array, a function, and so on. Here is a simple example of how to make a list:
> Lst <- list(name="Fred", wife="Mary", no.children=3, child.ages=c(4,7,9))
Components are always numbered and may always be referred to as such. Thus if Lst is the name of a list with four components, these may be individually referred to as Lst[[1]], Lst[[2]], Lst[[3]] and Lst[[4]]. If, further, Lst[[4]] is a vector subscripted array then Lst[[4]][1] is its first entry.
If Lst is a list, then the function length(Lst) gives the number of (top level) components it has.
Components of lists may also be named, and in this case the component may be referred to either by giving the component name as a character string in place of the number in double square brackets, or, more conveniently, by giving an expression of the form
> name$component_name
for the same thing.
This is a very useful convention as it makes it easier to get the right component if you forget the number.
So in the simple example given above:
Lst$name is the same as Lst[[1]] and is the string "Fred",
Lst$wife is the same as Lst[[2]] and is the string "Mary",
Lst$child.ages[1] is the same as Lst[[4]][1] and is the number 4.
Additionally, one can also use the names of the list components in double square brackets, i.e., Lst[["name"]] is the same as Lst$name. This is especially useful when the name of the component to be extracted is stored in another variable, as in
> x <- "name"; Lst[[x]]
It is very important to distinguish Lst[[1]] from Lst[1]. ‘[[…]]’ is the operator used to select a single element, whereas ‘[…]’ is a general subscripting operator. Thus the former is the first object in the list Lst, and if it is a named list the name is not included. The latter is a sublist of the list Lst consisting of the first entry only. If it is a named list, the names are transferred to the sublist.
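These distinctions can be checked directly with the example list Lst.

```r
Lst <- list(name = "Fred", wife = "Mary", no.children = 3,
            child.ages = c(4, 7, 9))

first_obj <- Lst[[1]]          # the component itself: "Fred", with no name
first_sub <- Lst[1]            # a list of length one, keeping the name "name"
key       <- "wife"
by_var    <- Lst[[key]]        # component chosen by a name held in a variable
first_age <- Lst$child.ages[1]
n_comp    <- length(Lst)       # number of top-level components
```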
The names of components may be abbreviated down to the minimum number of letters needed to identify them uniquely. Thus Lst$coefficients may be minimally specified as Lst$coe and Lst$covariance as Lst$cov.
The vector of names is in fact simply an attribute of the list like any other and may be handled as such. Other structures besides lists may, of course, similarly be given a names attribute also.
New lists may be formed from existing objects by the function list(). An assignment of the form
> Lst <- list(name_1=object_1, ..., name_m=object_m)
sets up a list Lst of m components using object_1, …, object_m for the components and giving them names as specified by the argument names (which can be freely chosen). If these names are omitted, the components are numbered only. The components used to form the list are copied when forming the new list and the originals are not affected.
Lists, like any subscripted object, can be extended by specifying additional components. For example
> Lst[5] <- list(matrix=Mat)
When the concatenation function c() is given list arguments, the result is an object of mode list also, whose components are those of the argument lists joined together in sequence.
> list.ABC <- c(list.A, list.B, list.C)
Recall that with vector objects as arguments the concatenation function similarly joined together all arguments into a single vector structure. In this case all other attributes, such as dim attributes, are discarded.
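Here is a sketch of extending and concatenating lists, where list.A and list.B are small illustrative lists. The last two lines show c() discarding the dim attribute of a matrix argument.

```r
Lst <- list(name = "Fred", wife = "Mary")
Lst$no.children <- 3                # add a named component with $
Lst[["child.ages"]] <- c(4, 7, 9)   # or with [[ ]] and a name

list.A  <- list(a = 1)
list.B  <- list(b = "x", c = TRUE)
list.AB <- c(list.A, list.B)        # a list with components a, b, c in order

m1   <- matrix(1:4, nrow = 2)
flat <- c(m1, 5:6)                  # vector arguments: dim is discarded
```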
A data frame is a list with class "data.frame". There are restrictions on lists that may be made into data frames, namely
A data frame may for many purposes be regarded as a matrix with columns possibly of differing modes and attributes. It may be displayed in matrix form, and its rows and columns extracted using matrix indexing conventions.
Objects satisfying the restrictions placed on the columns (components) of a data frame may be used to form one using the function data.frame:
> accountants <- data.frame(home=statef, loot=incomes, shot=incomef)
A list whose components conform to the restrictions of a data frame may be coerced into a data frame using the function as.data.frame().
The simplest way to construct a data frame from scratch is to use the read.table() function to read an entire data frame from an external file. This is discussed further in Reading data from files.
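This minimal sketch uses invented stand-ins for statef, incomes and incomef (three observations only), plus a list coerced with as.data.frame().

```r
statef  <- factor(c("nsw", "vic", "qld"))         # hypothetical values
incomes <- c(42, 61, 55)
incomef <- cut(incomes, breaks = 35 + 10 * (0:7))
accountants <- data.frame(home = statef, loot = incomes, shot = incomef)

lst <- list(id = 1:3, score = c(2.5, 3.1, 4.0))   # equal-length components
df  <- as.data.frame(lst)
```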
attach() and detach()
The $ notation, such as accountants$home, for list components is not always very convenient. A useful facility would be somehow to make the components of a list or data frame temporarily visible as variables under their component name, without the need to quote the list name explicitly each time.
The attach() function takes a ‘database’ such as a list or data frame as its argument. Thus suppose lentils is a data frame with three variables lentils$u, lentils$v, lentils$w. The call
> attach(lentils)
places the data frame in the search path at position 2, and provided there are no variables u, v or w in position 1, u, v and w are available as variables from the data frame in their own right. At this point an assignment such as
> u <- v+w
does not replace the component u of the data frame, but rather masks it with another variable u in the workspace at position 1 on the search path. To make a permanent change to the data frame itself, the simplest way is to resort once again to the $ notation:
> lentils$u <- v+w
However, the new value of component u is not visible until the data frame is detached and attached again.
To detach a data frame, use the function
> detach()
More precisely, this statement detaches from the search path the entity currently at position 2. Thus in the present context the variables u, v and w would be no longer visible, except under the list notation as lentils$u and so on. Entities at positions greater than 2 on the search path can be detached by giving their number to detach, but it is much safer to always use a name, for example by detach(lentils) or detach("lentils").
Note: In R, lists and data frames can only be attached at position 2 or above, and what is attached is a copy of the original object. You can alter the attached values via assign, but the original list or data frame is unchanged.
A useful convention that allows you to work with many different problems comfortably together in the same workspace is
add any variables you wish to keep for future reference to the data frame using the $ form of assignment, and then detach();
In this way it is quite simple to work with many problems in the same directory, all of which have variables named x, y and z, for example.
attach() is a generic function that allows not only directories and data frames to be attached to the search path, but other classes of object as well. In particular any object of mode "list" may be attached in the same way:
> attach(any.old.list)
Anything that has been attached can be detached by detach, by position number or, preferably, by name.
The function search shows the current search path and so is a very useful way to keep track of which data frames and lists (and packages) have been attached and detached. Initially it gives
> search()
[1] ".GlobalEnv"   "Autoloads"    "package:base"
where .GlobalEnv is the workspace.[18]
After lentils is attached we have
> search()
[1] ".GlobalEnv"   "lentils"      "Autoloads"    "package:base"
> ls(2)
[1] "u" "v" "w"
and as we see, ls (or objects) can be used to examine the contents of any position on the search path.
Finally, we detach the data frame and confirm it has been removed from the search path.
> detach("lentils")
> search()
[1] ".GlobalEnv"   "Autoloads"    "package:base"
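The masking behaviour described above can be reproduced with a small illustrative stand-in for lentils; the values are invented.

```r
# Illustrative stand-in for the lentils data frame (values invented)
lentils <- data.frame(u = 1:3, v = c(2, 4, 6), w = c(10, 20, 30))

attach(lentils)                  # a copy of lentils now sits at position 2
on_path <- "lentils" %in% search()

u <- v + w                       # creates a new u in the workspace (position 1)
u_in_frame <- lentils$u          # the data frame's own u is untouched

lentils$u <- v + w               # permanent change to the data frame itself
detach("lentils")                # detach by name rather than by position
still_on_path <- "lentils" %in% search()
```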
Large data objects will usually be read as values from external files rather than entered during an R session at the keyboard. R input facilities are simple and their requirements are fairly strict and even rather inflexible.
There is a clear presumption by the designers of R that you will be able to modify your input files using other tools, such as file editors or Perl[19], to fit in with the requirements of R. Generally this is very simple.
If variables are to be held mainly in data frames, as we strongly suggest they should be, an entire data frame can be read directly with the read.table() function. There is also a more primitive input function, scan(), that can be called directly.
For more details on importing data into R and also exporting data, see the R Data Import/Export manual.
read.table() function
To read an entire data frame directly, the external file will normally have a special form.
If the file has one fewer item in its first line than in its second, this arrangement is presumed to be in force. So the first few lines of a file to be read as a data frame might look as follows.
Input file form with names and row labels:

     Price  Floor   Area  Rooms  Age  Cent.heat
01   52.00  111.0    830      5  6.2         no
02   54.75  128.0    710      5  7.5         no
03   57.50  101.0   1000      5  4.2         no
04   57.50  131.0    690      6  8.8         no
05   59.75   93.0    900      5  1.9        yes
...
By default numeric items (except row labels) are read as numeric variables and non-numeric variables, such as Cent.heat in the example, as character variables. This can be changed if necessary.
The function read.table() can then be used to read the data frame directly
> HousePrice <- read.table("houses.data")
Often you will want to omit the row labels and use the default labels instead. In this case the file may omit the row label column, as in the following.
Input file form without row labels:

Price  Floor   Area  Rooms  Age  Cent.heat
52.00  111.0    830      5  6.2         no
54.75  128.0    710      5  7.5         no
57.50  101.0   1000      5  4.2         no
57.50  131.0    690      6  8.8         no
59.75   93.0    900      5  1.9        yes
...
The data frame may then be read as
> HousePrice <- read.table("houses.data", header=TRUE)
where the header=TRUE option specifies that the first line is a line of headings, and hence, by implication from the form of the file, that no explicit row labels are given.
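To experiment without creating houses.data, you can pass the file contents to read.table() as a character string through its text argument. The rows below are a subset of the sample files above. When the first line has one fewer field, read.table() infers both the header and the row labels.

```r
txt_labels <- "Price Floor Area Rooms Age Cent.heat
01 52.00 111.0 830 5 6.2 no
02 54.75 128.0 710 5 7.5 no
05 59.75 93.0 900 5 1.9 yes"
hp1 <- read.table(text = txt_labels)   # header and row labels inferred

txt_plain <- "Price Floor Area Rooms Age Cent.heat
52.00 111.0 830 5 6.2 no
54.75 128.0 710 5 7.5 no"
hp2 <- read.table(text = txt_plain, header = TRUE)   # default row labels
```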
scan() function
Suppose the data vectors are of equal length and are to be read in parallel. Further suppose that there are three vectors, the first of mode character and the remaining two of mode numeric, and the file is input.dat. The first step is to use scan() to read in the three vectors as a list, as follows
> inp <- scan("input.dat", list("",0,0))
The second argument is a dummy list structure that establishes the mode of the three vectors to be read. The result, held in inp, is a list whose components are the three vectors read in. To separate the data items into three separate vectors, use assignments like
> label <- inp[[1]]; x <- inp[[2]]; y <- inp[[3]]
More conveniently, the dummy list can have named components, in which case the names can be used to access the vectors read in. For example
> inp <- scan("input.dat", list(id="", x=0, y=0))
If you wish to access the variables separately they may either be re-assigned to variables in the working frame:
> label <- inp$id; x <- inp$x; y <- inp$y
or the list may be attached at position 2 of the search path (see Attaching arbitrary lists).
If the second argument is a single value and not a list, a single vector is read in, all components of which must be of the same mode as the dummy value.
> X <- matrix(scan("light.dat", 0), ncol=5, byrow=TRUE)
There are more elaborate input facilities available and these are detailed in the manuals.
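You can try the same calls without input.dat or light.dat by passing the data through the text argument of scan(); the contents below are invented. Setting quiet = TRUE suppresses the message reporting how many items were read.

```r
txt <- "a 1 2
b 3 4
c 5 6"
inp <- scan(text = txt, what = list(id = "", x = 0, y = 0), quiet = TRUE)
label <- inp$id; x <- inp$x; y <- inp$y

nums <- "1 2 3 4 5 6 7 8 9 10"
X <- matrix(scan(text = nums, what = 0, quiet = TRUE), ncol = 5, byrow = TRUE)
```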
Around 100 datasets are supplied with R (in package datasets), and others are available in packages (including the recommended packages supplied with R). To see the list of datasets currently available use
data()
All the datasets supplied with R are available directly by name. However, many packages still use the obsolete convention in which data was also used to load datasets into R, for example
data(infert)
and this can still be used with the standard packages (as in this example). In most cases this will load an R object of the same name. However, in a few cases it loads several objects, so see the on-line help for the object to see what to expect.
To access data from a particular package, use the package argument, for example
data(package="rpart")
data(Puromycin, package="datasets")
If a package has been attached by library, its datasets are automatically included in the search.
User-contributed packages can be a rich source of datasets.
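Here is a brief sketch using objects from the datasets package.

```r
data(infert)                            # old-style call: copies infert into the workspace
data(Puromycin, package = "datasets")   # naming the package explicitly
n_eruptions <- nrow(faithful)           # datasets objects are usable directly by name
```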
When invoked on a data frame or matrix, edit brings up a separate spreadsheet-like environment for editing. This is useful for making small changes once a data set has been read. The command
> xnew <- edit(xold)
will allow you to edit your data set xold, and on completion the changed object is assigned to xnew. If you want to alter the original dataset xold, the simplest way is to use fix(xold), which is equivalent to xold <- edit(xold).
Use
> xnew <- edit(data.frame())
to enter new data via the spreadsheet interface.
One convenient use of R is to provide a comprehensive set of statistical tables. Functions are provided to evaluate the cumulative distribution function P(X <= x), the probability density function and the quantile function (given q, the smallest x such that P(X <= x) >= q), and to simulate from the distribution.
Distribution        R name     additional arguments
beta                beta       shape1, shape2, ncp
binomial            binom      size, prob
Cauchy              cauchy     location, scale
chi-squared         chisq      df, ncp
exponential         exp        rate
F                   f          df1, df2, ncp
gamma               gamma      shape, scale
geometric           geom       prob
hypergeometric      hyper      m, n, k
log-normal          lnorm      meanlog, sdlog
logistic            logis      location, scale
negative binomial   nbinom     size, prob
normal              norm       mean, sd
Poisson             pois       lambda
signed rank         signrank   n
Student’s t         t          df, ncp
uniform             unif       min, max
Weibull             weibull    shape, scale
Wilcoxon            wilcox     m, n
Prefix the name given here by ‘d’ for the density, ‘p’ for the CDF, ‘q’ for the quantile function and ‘r’ for simulation (random deviates). The first argument is x for dxxx, q for pxxx, p for qxxx and n for rxxx (except for rhyper, rsignrank and rwilcox, for which it is nn). The non-centrality parameter ncp is not currently available in all cases: see the on-line help for details.
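The sketch below applies the four prefixes to the normal distribution, and also calls rhyper() with its nn first argument. The numeric values are illustrative.

```r
x <- 1.5
d <- dnorm(x)                         # density at 1.5
p <- pnorm(x)                         # CDF: P(X <= 1.5)
q <- qnorm(p)                         # quantile function: recovers 1.5
set.seed(1)
r <- rnorm(4, mean = 10, sd = 2)      # four random deviates
h <- rhyper(3, m = 5, n = 5, k = 3)   # first argument is nn, the number of draws
```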
The pxxx and qxxx functions all have logical arguments lower.tail and log.p, and the dxxx ones have log. This allows, e.g., getting the cumulative (or “integrated”) hazard function, H(t) = - log(1 - F(t)), by
- pxxx(t, ..., lower.tail = FALSE, log.p = TRUE)
or more accurate log-likelihoods (by dxxx(..., log = TRUE)), directly.
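Here are two illustrations of these arguments. For the exponential distribution, F(t) = 1 - exp(-rate * t), so H(t) = rate * t, which the first call should reproduce. The second pair shows why log = TRUE matters far out in the tail, where the density itself underflows to zero.

```r
rate  <- 0.5
times <- c(1, 2, 4)
# Cumulative hazard H(t) = -log(1 - F(t)) via the upper tail on the log scale
H <- -pexp(times, rate = rate, lower.tail = FALSE, log.p = TRUE)
# For the exponential, H(t) = rate * t, so H should equal 0.5, 1, 2

ll_direct <- dnorm(40, log = TRUE)   # finite log-density
ll_naive  <- log(dnorm(40))          # dnorm(40) underflows to 0, giving -Inf
```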
In addition there are functions ptukey and qtukey for the distribution of the studentized range of samples from a normal distribution, and dmultinom and rmultinom for the multinomial distribution. Further distributions are available in contributed packages, notably SuppDists.
Here are some examples
> ## 2-tailed p-value for t distribution
> 2*pt(-2.43, df = 13)
> ## upper 1% point for an F(2, 7) distribution
> qf(0.01, 2, 7, lower.tail = FALSE)
See the on-line help on RNG for how random-number generation is done in R.
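Here is a minimal sketch of reproducible simulation with set.seed(). RNGkind() reports the generators in use.

```r
set.seed(123)        # fix the generator state
a <- runif(3)
set.seed(123)        # resetting to the same seed ...
b <- runif(3)        # ... reproduces the same numbers
kinds <- RNGkind()   # names of the generators currently in use
```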
Given a (univariate) set of data we can examine its distribution in a large number of ways. The simplest is to examine the numbers. Two slightly different summaries are given by summary and fivenum, and a display of the numbers by stem (a “stem and leaf” plot).
> attach(faithful)
> summary(eruptions)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.600   2.163   4.000   3.488   4.454   5.100
> fivenum(eruptions)
[1] 1.6000 2.1585 4.0000 4.4585 5.1000
> stem(eruptions)

  The decimal point is 1 digit(s) to the left of the |

  16 | 070355555588
  18 | 000022233333335577777777888822335777888
  20 | 00002223378800035778
  22 | 0002335578023578
  24 | 00228
  26 | 23
  28 | 080
  30 | 7
  32 | 2337
  34 | 250077
  36 | 0000823577
  38 | 2333335582225577
  40 | 0000003357788888002233555577778
  42 | 03335555778800233333555577778
  44 | 02222335557780000000023333357778888
  46 | 0000233357700000023578
  48 | 00000022335800333
  50 | 0370
A stem-and-leaf plot is like a histogram, and R has a function hist to plot histograms.
> hist(eruptions)
## make the bins smaller, make a plot of density
> hist(eruptions, seq(1.6, 5.2, 0.2), prob=TRUE)
> lines(density(eruptions, bw=0.1))
> rug(eruptions) # show the actual data points
More elegant density plots can be made by density, and we added a line produced by density in this example. The bandwidth bw was chosen by trial-and-error, as the default gives too much smoothing (it usually does for “interesting” densities). (Better automated methods of bandwidth choice are available, and in this example bw = "SJ" gives a good result.)
We can plot the empirical cumulative distribution function by using the function ecdf.
> plot(ecdf(eruptions), do.points=FALSE, verticals=TRUE)
This distribution is obviously far from any standard distribution. How about the right-hand mode, say eruptions of longer than 3 minutes? Let us fit a normal distribution and overlay the fitted CDF.
> long <- eruptions[eruptions > 3]
> plot(ecdf(long), do.points=FALSE, verticals=TRUE)
> x <- seq(3, 5.4, 0.01)
> lines(x, pnorm(x, mean=mean(long), sd=sqrt(var(long))), lty=3)
Quantile-quantile (Q-Q) plots can help us examine this more carefully.
par(pty="s")       # arrange for a square figure region
qqnorm(long); qqline(long)
which shows a reasonable fit but a shorter right tail than one would expect from a normal distribution. Let us compare this with some simulated data from a t distribution
x <- rt(250, df = 5)
qqnorm(x); qqline(x)
which will usually (if it is a random sample) show longer tails than expected for a normal. We can make a Q-Q plot against the generating distribution by
qqplot(qt(ppoints(250), df = 5), x, xlab = "Q-Q plot for t dsn")
qqline(x)
Finally, we might want a more formal test of agreement with normality (or not). R provides the Shapiro-Wilk test
> shapiro.test(long)

        Shapiro-Wilk normality test

data:  long
W = 0.9793, p-value = 0.01052
and the Kolmogorov-Smirnov test
> ks.test(long, "pnorm", mean = mean(long), sd = sqrt(var(long)))

        One-sample Kolmogorov-Smirnov test

data:  long
D = 0.0661, p-value = 0.4284
alternative hypothesis: two.sided
(Note that the distribution theory is not valid here as we
have estimated the parameters of the normal distribution from the same
sample.)
So far we have compared a single sample to a normal distribution. A
much more common operation is to compare aspects of two samples. Note
that in R, all “classical” tests including the ones used below are
in package stats which is normally loaded.
Consider the following sets of data on the latent heat of the fusion of
ice (cal/gm) from Rice (1995, p.490)
Method A: 79.98 80.04 80.02 80.04 80.03 80.03 80.04 79.97
          80.05 80.03 80.02 80.00 80.02
Method B: 80.02 79.94 79.98 79.97 79.97 80.03 79.95 79.97
Boxplots provide a simple graphical comparison of the two samples.
A <- scan()
79.98 80.04 80.02 80.04 80.03 80.03 80.04 79.97
80.05 80.03 80.02 80.00 80.02

B <- scan()
80.02 79.94 79.98 79.97 79.97 80.03 79.95 79.97

boxplot(A, B)
which indicates that the first group tends to give higher results than
the second.
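scan() reads from the console until it meets an empty line, which is awkward in a script. A non-interactive way to enter the same two samples is to type them with c(); the sketch below does that and prints the sample sizes and means, which should match the sample estimates reported by t.test() further on.

```r
# Non-interactive alternative to scan(): enter the samples with c()
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97,
       80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

# Sample sizes and means
c(nA = length(A), nB = length(B))
c(meanA = mean(A), meanB = mean(B))
```

After this, boxplot(A, B) draws the same comparison as before.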
To test for the equality of the means of the two samples, we can use an unpaired t-test:
> t.test(A, B)

        Welch Two Sample t-test

data:  A and B
t = 3.2499, df = 12.027, p-value = 0.00694
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.01385526 0.07018320
sample estimates:
mean of x mean of y
 80.02077  79.97875
which does indicate a significant difference, assuming normality. By
default the R function does not assume equality of variances in the
two samples.
We can use the F test to test for equality in the variances,
provided that the two samples are from normal populations.
> var.test(A, B)

        F test to compare two variances

data:  A and B
F = 0.5837, num df = 12, denom df = 7, p-value = 0.3938
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.1251097 2.1052687
sample estimates:
ratio of variances
         0.5837405
which shows no evidence of a significant difference, and so we can use
the classical t-test that assumes equality of the variances.
> t.test(A, B, var.equal=TRUE)

        Two Sample t-test

data:  A and B
t = 3.4722, df = 19, p-value = 0.002551
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.01669058 0.06734788
sample estimates:
mean of x mean of y
 80.02077  79.97875
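To make the default Welch test and the F test less of a black box, the sketch below computes their statistics by hand: the Welch t statistic, its approximate degrees of freedom (the Welch-Satterthwaite formula), and the variance ratio used by var.test(). It is an illustration for checking against the output above, not a replacement for the built-in functions; the samples are re-entered with c() so that it runs on its own.

```r
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97,
       80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)
nA <- length(A); nB <- length(B)

# Welch statistic (the t.test() default): each sample keeps its own variance
t_welch <- (mean(A) - mean(B)) / sqrt(var(A)/nA + var(B)/nB)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (var(A)/nA + var(B)/nB)^2 /
  ((var(A)/nA)^2/(nA - 1) + (var(B)/nB)^2/(nB - 1))

# var.test() statistic: the ratio of the two sample variances
F_ratio <- var(A) / var(B)

round(c(t = t_welch, df = df_welch, F = F_ratio), 4)
```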
All these tests assume normality of the two samples. The two-sample
Wilcoxon (or Mann-Whitney) test only assumes a common continuous
distribution under the null hypothesis.
> wilcox.test(A, B)

        Wilcoxon rank sum test with continuity correction

data:  A and B
W = 89, p-value = 0.007497
alternative hypothesis: true location shift is not equal to 0

Warning message:
Cannot compute exact p-value with ties in: wilcox.test(A, B)
Note the warning: there are several ties in each sample, which suggests
strongly that these data are from a discrete distribution (probably due
to rounding).
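The ties behind that warning are easy to count. This short sketch uses duplicated() to count repeated values within each sample, and shows that ranking the pooled data produces averaged, non-integer ranks, which is what prevents an exact p-value.

```r
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97,
       80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

# Number of values that repeat an earlier value in the same sample
sum(duplicated(A))   # 6
sum(duplicated(B))   # 2

# Tied observations in the pooled sample share an averaged rank
r <- rank(c(A, B))
any(r != round(r))   # TRUE
```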
There are several ways to compare the two samples graphically. We have already seen a pair of boxplots. The following
> plot(ecdf(A), do.points=FALSE, verticals=TRUE, xlim=range(A, B))
> plot(ecdf(B), do.points=FALSE, verticals=TRUE, add=TRUE)
will show the two empirical CDFs, and qqplot will perform a Q-Q plot of the two samples. The Kolmogorov-Smirnov test statistic is the maximal vertical distance between the two ECDFs, assuming a common continuous distribution:
> ks.test(A, B)

        Two-sample Kolmogorov-Smirnov test

data:  A and B
D = 0.5962, p-value = 0.05919
alternative hypothesis: two-sided

Warning message:
cannot compute correct p-values with ties in: ks.test(A, B)
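The statistic D can be reproduced directly from the two ECDFs. Because both are step functions that jump only at observed values, the largest vertical gap is reached at one of the pooled data points, so evaluating both ECDFs there is enough. This is a sketch for checking the printed value, not how ks.test() is implemented internally.

```r
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97,
       80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

FA <- ecdf(A)   # ecdf() returns a function that can be evaluated
FB <- ecdf(B)

# Evaluate both step functions at every distinct observed value
pts <- sort(unique(c(A, B)))
D <- max(abs(FA(pts) - FB(pts)))
D   # 0.5961538 (= 31/52), printed as 0.5962 by ks.test()
```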
R is an expression language in the sense that its only command type
is a function or expression which returns a result.
Even an assignment
is an expression whose result is the value assigned, and it may be used
wherever any expression may be used; in particular multiple assignments
are possible.
Commands may be grouped together in braces, {expr_1; …; expr_m}, in which case the value of the group is the result of the last expression in the group evaluated. Since such a group is also an expression it may, for example, be itself included in parentheses and used as part of an even larger expression, and so on.
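The two points above, that an assignment has a value and that a braced group takes the value of its last expression, can be seen in a few lines. The variable names are arbitrary.

```r
# An assignment is an expression whose value is the value assigned,
# so assignments can be chained (they are evaluated right to left)
a <- b <- 10

# ...and used inside a larger expression
y <- (z <- 3) * 2              # z is 3, y is 6

# A braced group evaluates to its last expression
v <- { u <- 4; u^2 + 1 }       # v is 17

# Being an expression itself, a group can sit inside a larger one
w <- 10 * ({ k <- 2; k + 1 })  # w is 30
```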
if statements

The language has available a conditional construction of the form
> if (expr_1) expr_2 else expr_3
where expr_1 must evaluate to a single logical value and the
result of the entire expression is then evident.
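Because if/else is an expression, its value can be assigned directly. The sketch also shows what happens when the condition is not a single logical value: since R 4.2.0 a condition of length greater than one is an error (older versions only warned), so the check in the test is guarded by the R version.

```r
x <- -3
# The whole if/else expression has a value, which can be assigned
label <- if (x > 0) "positive" else "not positive"
label   # "not positive"

# A condition of length two is rejected in R 4.2.0 and later
res <- tryCatch(if (c(TRUE, FALSE)) "ran" else "skipped",
                error = function(e) "error")
res
```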
The “short-circuit” operators && and || are often used as part of the condition in an if statement. Whereas & and | apply element-wise to vectors, && and || apply to vectors of length one, and only evaluate their second argument if necessary.
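A quick way to see that && and || skip their second argument is to put a call to stop() there: it is never reached. The typical use is guarding a test that would misbehave on an empty vector.

```r
# The second argument is not evaluated when the first decides the result
FALSE && stop("never evaluated")   # FALSE
TRUE  || stop("never evaluated")   # TRUE

# Guard: x[1] > 0 is never compared when x is empty
x <- numeric(0)
ok <- length(x) > 0 && x[1] > 0    # FALSE

# & and | work element-wise on whole vectors
c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)   # TRUE FALSE FALSE
```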
There is a vectorized version of the if/else construct, the ifelse function. This has the form ifelse(condition, a, b) and returns a vector of the same length as condition, with elements a[i] if condition[i] is true, otherwise b[i] (where a and b are recycled as necessary).
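A small illustration of ifelse(): the choice is made element by element, and a length-one argument such as 0 is recycled to the length of the condition.

```r
x <- c(-2, 5, 0, 3)

# Element-wise choice between two values
ifelse(x > 0, "positive", "not positive")
# "not positive" "positive" "not positive" "positive"

# 'no' has length one here and is recycled for every FALSE element
ifelse(x > 0, x, 0)   # 0 5 0 3
```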
for loops, repeat and while

There is also a for loop construction which has the form
> for (name in expr_1) expr_2
where name is the loop variable. expr_1 is a vector expression (often a sequence like 1:20), and expr_2 is often a grouped expression with its sub-expressions written in terms of the dummy name. expr_2 is repeatedly evaluated as name ranges through the values in the vector result of expr_1.
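Two small for loops illustrate the description: the loop variable takes each value of expr_1 in turn, and expr_1 may be any vector, not only an integer sequence.

```r
# i takes the values 1, 2, ..., 20 in turn
total <- 0
for (i in 1:20) {
  total <- total + i^2
}
total   # 2870

# expr_1 can be a character vector as well
lens <- integer(0)
for (w in c("alpha", "beta", "gamma")) {
  lens <- c(lens, nchar(w))
}
lens    # 5 4 5
```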
As an example, suppose ind is a vector of class indicators and we wish to produce separate plots of y versus x within classes. One possibility here is to use coplot(),20 which will produce an array of plots corresponding to each level of the factor. Another way to do this, now putting all plots on the one display, is as follows:
> xc <- split(x, ind)
> yc <- split(y, ind)
> for (i in 1:length(yc)) {
    plot(xc[[i]], yc[[i]])
    abline(lsfit(xc[[i]], yc[[i]]))
  }
(Note the function split() which produces a list of vectors obtained by splitting a larger vector according to the classes specified by a factor. This is a useful function, mostly used in connection with boxplots. See the help facility for further details.)
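To see what split() hands to the loop, here is a sketch with small hypothetical data (ind, x and y are made up, three classes of five points lying exactly on straight lines). It shows the list structure and computes the per-class least-squares coefficients that the loop above passes to abline(); plotting is left out.

```r
# Hypothetical data: three classes given by the factor ind
ind <- factor(rep(c("a", "b", "c"), each = 5))
x <- rep(1:5, times = 3)
y <- c(2 * (1:5), 10 - (1:5), 3 + 0.5 * (1:5))

xc <- split(x, ind)   # list with one numeric vector per class
yc <- split(y, ind)
names(yc)             # "a" "b" "c"

# Per-class fits, one column of (intercept, slope) per class
coefs <- sapply(seq_along(yc), function(i) lsfit(xc[[i]], yc[[i]])$coefficients)
coefs
```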
Warning: for() loops are used in R code much less often than in compiled languages. Code that takes a ‘whole object’ view is likely to be both clearer and faster in R.
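As an illustration of the ‘whole object’ view, the running total below is computed once with an explicit loop and once with the vectorized function cumsum(); both give the same vector, but the second is a single call.

```r
x <- c(3, 1, 4, 1, 5, 9, 2, 6)

# Loop version: build the running total element by element
running <- numeric(length(x))
acc <- 0
for (i in seq_along(x)) {
  acc <- acc + x[i]
  running[i] <- acc
}

# Whole-object version
cumsum(x)   # 3 4 8 9 14 23 25 31
```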
Other looping facilities include the

> repeat expr

statement and the

> while (condition) expr

statement.
The break statement can be used to terminate any loop, possibly abnormally. This is the only way to terminate repeat loops.
The next statement can be used to discontinue one particular cycle and skip to the “next”.
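The sketch below exercises while, repeat, break and next. The while loop runs the Collatz iteration from 6, chosen only because it stops after a known number of steps; the repeat loop sums odd numbers, using next to skip even ones and break as its only exit.

```r
# while: keep going as long as the condition holds
n <- 6
steps <- 0
while (n != 1) {
  n <- if (n %% 2 == 0) n / 2 else 3 * n + 1
  steps <- steps + 1
}
steps     # 8  (6 3 10 5 16 8 4 2 1)

# repeat: runs until break; next skips the rest of the current cycle
i <- 0
odd_sum <- 0
repeat {
  i <- i + 1
  if (i > 10) break          # the only way out of a repeat loop
  if (i %% 2 == 0) next      # skip even numbers
  odd_sum <- odd_sum + i
}
odd_sum   # 25 = 1 + 3 + 5 + 7 + 9
```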
Control statements are most often used in connection with functions, which are discussed in Writing your own functions and where more examples will emerge.
As we have seen informally along the way, the R language allows the
user to create objects of mode function. These are true R
functions that are stored in a special internal form and may be used in
further expressions and so on.
In the process, the language gains
enormously in power, convenience and elegance, and learning to write
useful functions is one of the main ways to make your use of R
comfortable and productive.
It should be emphasized that most of the functions supplied as part of the R system, such as mean(), var(), postscript() and so on, are themselves written in R and thus do not differ materially from user written functions.
A function is defined by an assignment of the form
> name <- function(arg_1, arg_2, ...) expression
The expression is an R expression (usually a grouped expression) that uses the arguments, arg_i, to calculate a value. The value of the expression is the value returned for the function.
A call to the function then usually takes the form name(expr_1, expr_2, …) and may occur anywhere a function call is legitimate.
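A minimal instance of the definition form above, using a hypothetical function hypot(): the braced body's last expression is the value returned, arithmetic inside it is vectorized, and a call can appear inside a larger expression.

```r
# Hypothetical example function of two arguments
hypot <- function(a, b) {
  sqrt(a^2 + b^2)
}

hypot(3, 4)                 # 5
hypot(c(3, 5), c(4, 12))    # 5 13
h <- hypot(6, 8) + 1        # a call used inside a larger expression: 11
```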
As a first example, consider a function to calculate the two sample
t-statistic, showing “all the steps”. This is an artificial
example, of course, since there are other, simpler ways of achieving the
same end.
The function is defined as follows:
> twosam <- function(y1, y2) {
    n1  <- length(y1); n2  <- length(y2)
    yb1 <- mean(y1);   yb2 <- mean(y2)
    s1  <- var(y1);    s2  <- var(y2)
    s <- ((n1-1)*s1 + (n2-1)*s2)/(n1+n2-2)
    tst <- (yb1 - yb2)/sqrt(s*(1/n1 + 1/n2))
    tst
  }
With this function defined, you could perform two sample t-tests
using a call such as
> tstat <- twosam(data$male, data$female); tstat
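Since data$male and data$female are placeholders, here is a runnable check that uses the latent-heat samples A and B from the two-sample tests earlier instead. twosam() uses a pooled variance, so its result should agree with t.test(A, B, var.equal = TRUE), which reported t = 3.4722.

```r
twosam <- function(y1, y2) {
  n1  <- length(y1); n2  <- length(y2)
  yb1 <- mean(y1);   yb2 <- mean(y2)
  s1  <- var(y1);    s2  <- var(y2)
  s <- ((n1-1)*s1 + (n2-1)*s2)/(n1+n2-2)   # pooled variance
  tst <- (yb1 - yb2)/sqrt(s*(1/n1 + 1/n2))
  tst
}

A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97,
       80.05, 80.03, 80.02, 80.00, 80.02)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

tstat <- twosam(A, B); tstat
t.test(A, B, var.equal = TRUE)$statistic
```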
As a second example, consider a function to emulate directly the MATLAB backslash command, which returns the coefficients of the orthogonal projection of the vector y onto the column space of the matrix, X. (This is ordinarily called the least squares estimate of the regression coefficients.) This would ordinarily be done with the qr() function; however this is sometimes a bit tricky to use directly and it pays to have a simple function such as the following to use it safely.
Thus given an n by 1 vector y and an n by p matrix X then X \ y is defined as $(X'X)^{-}X'y$, where $(X'X)^{-}$ is a generalized inverse of $X'X$.
> bslash <- function(X, y) {
    X <- qr(X)
    qr.coef(X, y)
  }
After this object is created it may be used in statements such as
> regcoeff <- bslash(Xmat, yvar)
and so on.
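As a check that bslash() really returns the least squares coefficients, the sketch below applies it to small hypothetical data (Xmat has an intercept column and one covariate; the yvar values are made up) and compares the result with the normal-equations solution of $(X'X)b = X'y$ and with lm(), which also relies on a QR decomposition.

```r
bslash <- function(X, y) {
  X <- qr(X)
  qr.coef(X, y)
}

# Hypothetical data: intercept column plus one covariate
Xmat <- cbind(1, c(1, 2, 3, 4, 5))
yvar <- c(2.1, 3.9, 6.2, 7.8, 10.1)

regcoeff <- bslash(Xmat, yvar)
regcoeff

# Same coefficients from the normal equations and from lm()
b_normal <- solve(crossprod(Xmat), crossprod(Xmat, yvar))
b_lm <- coef(lm(yvar ~ Xmat[, 2]))
```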
The classical R function lsfit() does this job quite well, and more21. It in turn uses the functions qr() and qr.coef() in the slightly counterintuitive way above to do this part of the calculation. Hence there is probably some value in having just this part isolated in a simple to use function if it is going to be in frequent use.
If so, we may wish to make it a matrix binary operator
for even more convenient use.
Had we given the bslash() function a different name, namely one of the form

%anything%

it could have been used as a binary operator in expressions rather than in function form. Suppose, for example, we choose ! for the internal character. The function definition would then start as
> "%!%" <- function(X, y) { ... }
(Note the use of quote marks.) The function could then be used as X %!% y. (The backslash symbol itself is not a convenient choice as it presents special problems in this context.)
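A complete version of the operator sketched above: the body repeats the QR computation of bslash(), and the data are chosen so that y lies exactly on the line y = -1 + 2x, making the result easy to check.

```r
# Binary operator form of bslash(); the quotes make %!% a valid name
"%!%" <- function(X, y) {
  qr.coef(qr(X), y)
}

X <- cbind(1, 1:4)     # intercept and slope columns
y <- c(1, 3, 5, 7)     # exactly y = -1 + 2 * x
X %!% y                # -1  2
```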
The matrix multiplication operator, %*%, and the outer product matrix operator %o% are other examples of binary operators defined in this way.
As first noted in Generating regular sequences, if arguments to called functions are given in the “name=object” form, they may be given in any order. Furthermore the argument sequence may begin in the unnamed, positional form, and specify named arguments after the positional arguments.
Thus if there is a function fun1 defined by
> fun1 <- function(data, data.frame, graph, limit) {
[function body omitted]
}
then the function may be invoked in several ways, for example
> ans <- fun1(d, df, TRUE, 20)
> ans <- fun1(d, df, graph=TRUE, limit=20)
> ans <- fun1(data=d, limit=20, graph=TRUE, data.frame=df)
are all equivalent.
In many cases arguments can be given commonly appropriate default values, in which case they may be omitted altogether from the call when the defaults are appropriate. For example, if fun1 were defined as
> fun1 <- function(data, data.frame, graph=TRUE, limit=20) { ... }
it could be called as
> ans <- fun1(d, df)
which is now equivalent to the three cases above, or as
> ans <- fun1(d, df, limit=10)
which changes one of the defaults.
It is important to note that defaults may be arbitrary expressions, even
involving other arguments to the same function; they are not restricted
to be constants as in our simple example here.
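Here is a small hypothetical function, clamp(), whose default for hi is an expression involving another argument, lo. Because defaults are evaluated only when needed, changing lo in the call also changes the default hi. The last call shows named arguments given in an arbitrary order.

```r
# Hypothetical function: the default for 'hi' depends on 'lo'
clamp <- function(x, lo = 0, hi = lo + 1) {
  pmin(pmax(x, lo), hi)
}

clamp(c(-1, 0.5, 3))            # 0.0 0.5 1.0  (lo = 0, hi = 1)
clamp(c(-1, 0.5, 3), lo = 2)    # 2 2 3        (hi now defaults to 3)
clamp(hi = 10, x = 12)          # 10           (named, any order)
```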
Another frequent requirement is to allow one function to pass on argument settings to another. For example many graphics functions use the function par() and functions like plot() allow the user to pass on graphical parameters to par() to control the graphical output. (See Permanent changes: The par() function, for more details on the par() function.) This can be done by including an extra argument, literally ‘...’, of the function, which may then be passed on. An outline example is given below.
fun1 <- function(data, data.frame, graph=TRUE, limit=20, ...) {
  [omitted statements]
  if (graph)
    par(pch="*", ...)
  [more omissions]
}
Less frequently, a function will need to refer to components of ‘...’. The expression list(...) evaluates all such arguments and returns them in a named list, while ..1, ..2, etc. evaluate them one at a time, with ‘..n’ returning the n-th unmatched argument.
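The sketch below covers both uses of ‘...’ described above with two hypothetical functions: my_mean() simply forwards any extra arguments to mean(), and show_dots() inspects them with list(...) and ..1.

```r
# Passing '...' on: extra arguments reach mean() unchanged
my_mean <- function(x, ...) {
  mean(x, ...)
}
my_mean(c(1, NA, 3), na.rm = TRUE)   # 2

# Inspecting '...': list(...) collects them, ..1 is the first one
show_dots <- function(...) {
  args <- list(...)
  list(n = length(args), names = names(args), first = ..1)
}
res <- show_dots(a = 1, b = "two", 3)
res$n       # 3
res$names   # "a" "b" ""
res$first   # 1
```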
Note that any ordinary assignments done within the function are local and temporary and are lost after exit from the function. Thus the assignment X <- qr(X) does not affect the value of the argument in the calling program.
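A minimal demonstration of local assignment: modify() changes its argument inside the function, but the caller's vector is unchanged afterwards.

```r
# Assignments inside a function act on a local copy
modify <- function(v) {
  v[1] <- 100
  v
}

orig <- c(1, 2, 3)
changed <- modify(orig)
changed   # 100 2 3
orig      # still 1 2 3
```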
To understand completely the rules governing the scope of R assignments
the reader needs to be familiar with the notion of an evaluation
frame. This is a somewhat advanced, though hardly difficult,
topic and is not covered further here.
If global and permanent assignments are intended within a function, then either the ‘superassignment’ operator, <<-, or the function assign() can be used. See the help document for details.
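Both mechanisms in a short sketch. When run at top level, bump() updates counter in the global environment through <<-, and the hypothetical helper set_global() uses assign() with an explicit environment to the same effect.

```r
counter <- 0
bump <- function() {
  counter <<- counter + 1   # assigns in the enclosing environment
  invisible(counter)
}
bump(); bump()
counter     # 2

# assign() with an explicit target environment
set_global <- function(name, value) {
  assign(name, value, envir = globalenv())
}
set_global("made_here", 42)
made_here   # 42
```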
As a more complete, if a little pedestrian, example of a function,
consider finding the efficiency factors for a block design. (Some
aspects of this problem have already been discussed in Index matrices.)
A block design is defined by two factors, say blocks (b levels) and varieties (v levels). If R and K are the v by v and b by b replications and block size matrices, respectively, and N is the b by v incidence matrix, then the efficiency factors are defined as the eigenvalues of the matrix

$$E = I_v - R^{-1/2}N'K^{-1}NR^{-1/2} = I_v - A'A,$$

where $A = K^{-1/2}NR^{-1/2}$. One way to write the function is given below.
> bdeff <- function(blocks, varieties) {
    blocks <- as.factor(blocks)             # minor safety move
    b <- length(levels(blocks))
    varieties <- as.factor(varieties)       # minor safety move
    v <- length(levels(varieties))
    K <- as.vector(table(blocks))           # remove dim attr
    R <- as.vector(table(varieties))        # remove dim attr
    N <- table(blocks, varieties)
    A <- 1/sqrt(K) * N * rep(1/sqrt(R), rep(b, v))
    sv <- svd(A)
    list(eff=1 - sv$d^2, blockcv=sv$u, varietycv=sv$v)
  }
It is numerically slightly better to work with the singular value
decomposition on this occasion rather than the eigenvalue routines.
The result of the function is a list giving not only the efficiency
factors as the first component, but also the block and variety canonical
contrasts, since sometimes these give additional useful qualitative
information.
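To try bdeff() on a case with a known answer, the sketch uses a small hypothetical balanced incomplete block design: 3 varieties in 3 blocks of size 2, every pair of varieties appearing together once. For a balanced incomplete block design the non-trivial efficiency factors all equal λv/(rk), here 1·3/(2·2) = 0.75, and one efficiency factor is zero, corresponding to the overall mean.

```r
bdeff <- function(blocks, varieties) {
  blocks <- as.factor(blocks)             # minor safety move
  b <- length(levels(blocks))
  varieties <- as.factor(varieties)       # minor safety move
  v <- length(levels(varieties))
  K <- as.vector(table(blocks))           # block sizes
  R <- as.vector(table(varieties))        # replications
  N <- table(blocks, varieties)           # incidence matrix
  A <- 1/sqrt(K) * N * rep(1/sqrt(R), rep(b, v))
  sv <- svd(A)
  list(eff=1 - sv$d^2, blockcv=sv$u, varietycv=sv$v)
}

# Hypothetical design: blocks {1,2}, {1,3}, {2,3}
blocks    <- c(1, 1, 2, 2, 3, 3)
varieties <- c(1, 2, 1, 3, 2, 3)

res <- bdeff(blocks, varieties)
round(res$eff, 10)   # 0.00 0.75 0.75
```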
For printing purposes with large matrices or arrays, it is often useful to print them in close block form without the array names or numbers. Removing the dimnames attribute will not achieve this effect, but rather the array must be given a dimnames attribute consisting of empty strings. For example to print a matrix, X:
> temp <- X
> dimnames(temp) <- list(rep("", nrow(X)), rep("", ncol(X)))
> temp; rm(temp)
This can be much more conveniently done using a function, no.dimnames(), shown below, as a “wrap around” to achieve the same result. It also illustrates how some effective and useful user functions can be quite short.
no.dimnames <- function(a) {
## Remove all dimension names from an array for compact printing.
d <- list()
l <- 0
for(i in dim(a)) {
d[[l <- l + 1]] <- rep("", i)
}
dimnames(a) <- d
a
}
With this function defined, an array may be printed in close format
using
> no.dimnames(X)
This is particularly useful for large integer arrays, where patterns are
the real interest rather than the values.
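Applied to a small labelled matrix, no.dimnames() keeps the values but replaces every dimension name with an empty string. The sketch also includes a shorter variant built with lapply(); this variant is an alternative written for illustration, not part of the original text.

```r
no.dimnames <- function(a) {
  ## Remove all dimension names from an array for compact printing.
  d <- list()
  l <- 0
  for(i in dim(a)) {
    d[[l <- l + 1]] <- rep("", i)
  }
  dimnames(a) <- d
  a
}

X <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
Y <- no.dimnames(X)
Y   # printed without row or column labels

# Alternative with the same effect: one empty-string vector per dimension
no.dimnames2 <- function(a) {
  dimnames(a) <- lapply(dim(a), function(n) rep("", n))
  a
}
```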
Functions may be recursive, and may themselves define functions within
themselves. Note, however, that such functions, or indeed variables,
are not inherited by called functions in higher evaluation frames as
they would be if they were on the search path.
The example below shows a naive way of performing one-dimensional
numerical integration. The integrand is evaluated at the end points of
the range and in the middle.
If the one-panel trapezium rule answer is close enough to the two-panel answer, then the latter is returned as the value. Otherwise the same process is recursively applied to each panel.
The
result is an adaptive integration process that concentrates function
evaluations in regions where the integrand is farthest from linear.
There is, however, a heavy overhead, and the function is only
competitive with other algorithms when the integrand is both smooth and
very difficult to evaluate.
The example is also given partly as a little puzzle in R programming.
area <- function(f, a, b, eps = 1.0e-06, lim = 10) {
fun1 <- function(f, a, b, fa, fb, a0, eps, lim, fun) {
## function ‘fun1’ is only visible inside ‘area’
d <- (a + b)/2
h <- (b - a)/4
fd <- f(d)
a1 <- h * (fa + fd)
a2 <- h * (fd + fb)
if(abs(a0 - a1 - a2) < eps || lim == 0)
return(a1 + a2)
else {
return(fun(f, a, d, fa, fd, a1, eps, lim - 1, fun) +
fun(f, d, b, fd, fb, a2, eps, lim - 1, fun))
}
}
fa <- f(a)
fb <- f(b)
a0 <- ((fa + fb) * (b - a))/2
fun1(f, a, b, fa, fb, a0, eps, lim, fun1)
}
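A usage sketch for area(): two integrals with known values, the integral of sin over [0, π] (exactly 2) and of x² over [0, 1] (exactly 1/3), with R's built-in integrate() alongside for comparison. With the default eps the trapezium answers agree with the exact values only to a few decimal places, so the checks use a loose tolerance.

```r
area <- function(f, a, b, eps = 1.0e-06, lim = 10) {
  fun1 <- function(f, a, b, fa, fb, a0, eps, lim, fun) {
    ## function 'fun1' is only visible inside 'area'
    d <- (a + b)/2
    h <- (b - a)/4
    fd <- f(d)
    a1 <- h * (fa + fd)
    a2 <- h * (fd + fb)
    if(abs(a0 - a1 - a2) < eps || lim == 0)
      return(a1 + a2)
    else {
      return(fun(f, a, d, fa, fd, a1, eps, lim - 1, fun) +
             fun(f, d, b, fd, fb, a2, eps, lim - 1, fun))
    }
  }
  fa <- f(a)
  fb <- f(b)
  a0 <- ((fa + fb) * (b - a))/2
  fun1(f, a, b, fa, fb, a0, eps, lim, fun1)
}

area(sin, 0, pi)                 # close to 2
area(function(x) x^2, 0, 1)      # close to 1/3
integrate(sin, 0, pi)$value      # built-in adaptive quadrature
```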
The discussion in this section is somewhat more technical than in other
parts of this document. However, it details one of the major differences
between S-PLUS and R.
The symbols which occur in the body of a function can be divided into three classes: formal parameters, local variables and free variables. The formal parameters of a function are those occurring in the argument list of the function.
Their values are determined by the process of
binding the actual function arguments to the formal parameters.
Local variables are those whose values are determined by the evaluation
of expressions in the body of the functions. Variables which are not
formal parameters or local variables are called free variables.
Free
variables become local variables if they are assigned to. Consider the
following function definition.
f <- function(x) {
  y <- 2*x
  print(x)
  print(y)
  print(z)
}
In this function, x is a formal parameter, y is a local variable and z is a free variable.
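Calling f() shows the three kinds of symbol at work: x is bound from the call, y is created locally, and z is looked up outside the function at the moment f() runs, so changing z between calls changes what is printed. capture.output() is used only to collect the printed lines.

```r
f <- function(x) {
  y <- 2*x
  print(x)
  print(y)
  print(z)
}

z <- 10
out1 <- capture.output(f(1))   # "[1] 1" "[1] 2" "[1] 10"

z <- 20                        # the free variable is resolved at call time
out2 <- capture.output(f(1))   # last line is now "[1] 20"
```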
In R the free variable bindings are resolved by first looking in the environment in which the function was created. This is called lexical scope. First we define a function called cube.
cube <- function(n) {
  sq <- function() n*n
  n*sq()
}
The variable n in the function sq is not an argument to that function. Therefore it is a free variable and the scoping rules must be used to ascertain the value that is to be associated with it. Under static scope (S-PLUS) the value is that associated with a global variable named n. Under lexical scope (R) it is the parameter to the function cube since that is the active binding for the variable n at the time the function sq was defined. The difference between evaluation in R and evaluation in S-PLUS is that S-PLUS looks for a global variable called n while R first looks for a variable called n in the environment created when cube was invoked.
## first evaluation in S
S> cube(2)
Error in sq(): Object "n" not found
Dumped
S> n <- 3
S> cube(2)
[1] 18
## then the same function evaluated in R
R> cube(2)
[1] 8
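The R side of the comparison can be run directly: even with a global n defined, cube(2) gives 8 because sq() finds n in the environment of the call to cube(). The same rule lets a function returned from another function remember the arguments of the call that created it; make_power() is a hypothetical example of this.

```r
cube <- function(n) {
  sq <- function() n*n   # n is free in sq; found in cube's environment
  n*sq()
}

n <- 3        # a global n, which lexical scoping does not use here
cube(2)       # 8

# A function factory: each result keeps its own p
make_power <- function(p) function(x) x^p
square  <- make_power(2)
cube_fn <- make_power(3)
square(4)     # 16
cube_fn(2)    # 8
```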
Lexical scope can also be used to give functions mutable state.
In the following example we show how R can be used to mimic a bank
account. A functioning bank account needs to have a balance or total, a
function for making withdrawals, a function for making deposits and a
function for stating the current balance.
We achieve this by creating the three functions within account and then returning a list containing them. When account is invoked it takes a numerical argument total and returns a list containing the three functions. Because these functions are defined in an environment which contains total, they will have access to its value.
The special assignment operator, <<-, is used to change the value associated with total. This operator looks back in enclosing environments for an environment that contains the symbol total and when it finds such an environment it replaces the value, in that environment, with the value of the right-hand side. If the global or top-level environment is reached without finding the symbol total then that variable is created and assigned to there. For most users <<- creates a global variable and assigns the value of the right-hand side to it22. Only when <<- has been used in a function that was returned as the value of another function will the special behavior described here occur.
open.account <- function(total) {
  list(
    deposit = function(amount) {
      if(amount <= 0)
        stop("Deposits must be positive!\n")
      total <<- total + amount
      cat(amount, "deposited. Your balance is", total, "\n\n")
    },
    withdraw = function(amount) {
      if(amount > total)
        stop("You don't have that much money!\n")
      total <<- total - amount
      cat(amount, "withdrawn. Your balance is", total, "\n\n")
    },
    balance = function() {
      cat("Your balance is", total, "\n\n")
    }
  )
}

ross <- open.account(100)
robert <- open.account(200)

ross$withdraw(30)
ross$balance()
robert$balance()

ross$deposit(50)
ross$balance()
ross$withdraw(500)
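To check the behavior programmatically rather than by reading printed messages, here is a hypothetical variant, open_account(), whose functions return the balance instead of printing it. It confirms that each account keeps its own total in the environment created by its call, and that the final overdraft attempt signals an error.

```r
# Variant of open.account() that returns values instead of printing
open_account <- function(total) {
  list(
    deposit = function(amount) {
      if (amount <= 0) stop("Deposits must be positive!")
      total <<- total + amount
      invisible(total)
    },
    withdraw = function(amount) {
      if (amount > total) stop("You don't have that much money!")
      total <<- total - amount
      invisible(total)
    },
    balance = function() total
  )
}

ross <- open_account(100)
robert <- open_account(200)
ross$withdraw(30)
ross$deposit(50)
ross$balance()     # 120
robert$balance()   # 200: a separate 'total'

# The state lives in the environment created by the call
get("total", envir = environment(ross$balance))   # 120

overdraw <- tryCatch(ross$withdraw(500), error = conditionMessage)
```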
Users can customize their environment in several different ways. There is a site initialization file and every directory can have its own special initialization file. Finally, the special functions .First and .Last can be used.
The location of the site initialization file is taken from the value of the R_PROFILE environment variable. If that variable is unset, the file Rprofile.site in the R home subdirectory etc is used. This file should contain the commands that you want to execute every time R is started under your system. A second, personal, profile file named .Rprofile23 can be placed in any directory. If R is invoked in that directory then that file will be sourced. This file gives individual users control over their workspace and allows for different startup procedures in different working directories. If no .Rprofile file is found in the startup directory, then R looks for a .Rprofile file in the user’s home directory and uses that (if it exists). If the environment variable R_PROFILE_USER is set, the file it points to is used instead of the .Rprofile files.
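The lookup order just described can be mirrored in a few lines, which is handy for finding out which files a given session would consult. The helper profile_files() is hypothetical and follows the text only; it is a sketch, not R's own startup code, and platform details (for example, options that skip profiles entirely) are ignored.

```r
# Hypothetical helper following the lookup order described in the text
profile_files <- function() {
  site <- Sys.getenv("R_PROFILE", unset = NA)
  if (is.na(site)) site <- file.path(R.home("etc"), "Rprofile.site")

  user <- Sys.getenv("R_PROFILE_USER", unset = NA)
  if (is.na(user)) {
    # startup directory first, then the home directory
    user <- if (file.exists(".Rprofile")) ".Rprofile" else path.expand("~/.Rprofile")
  }
  c(site = site, user = user)
}

profile_files()
```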
Any function named .First() in either of the two profile files or in the .RData image has a special status. It is automatically performed at the beginning of an R session and may be used to initialize the environment. For example, the definition in the example below alters the prompt to $ and sets up various other useful things that can then be taken for granted in the rest of the session.
Thus, the sequence in which files are executed is Rprofile.site, the user profile, .RData and then .First(). A definition in later files will mask definitions in earlier files.
> .First <- function() {
options(prompt="$ ", continue="+\t")  # $ is the prompt
options(digits=5, length=999) # custom numbers and printout
x11() # for graphics
par(pch = "+") # plotting character
source(file.path(Sys.getenv("HOME"), "R", "mystuff.R"))
# my personal functions
library(MASS) # attach a package
}
Similarly a function .Last(), if defined, is (normally) executed at the very end of the session. An example is given below.
> .Last <- function() {
    graphics.off()                        # a small safety measure.
    cat(paste(date(),"\nAdios\n"))        # Is it time for lunch?
  }
The class of an object determines how it will be treated by what are
known as generic functions. Put the other way round, a generic
function performs a task or action on its arguments specific to
the class of the argument itself. If the argument lacks any class
attribute, or has a class not catered for specifically by the generic
function in question, there is always a default action provided.
An example makes things clearer. The class mechanism offers the user the facility of designing and writing generic functions for special purposes. Among the other generic functions are plot() for displaying objects graphically, summary() for summarizing analyses of various types, and anova() for comparing statistical models.
The number of generic functions that can treat a class in a specific way can be quite large. For example, the functions that can accommodate in some fashion objects of class "data.frame" include
[ [[<- any as.matrix [<- mean plot summary
A currently complete list can be got by using the methods() function:
> methods(class="data.frame")
Conversely the number of classes a generic function can handle can also be quite large. For example the plot() function has a default method and variants for objects of classes "data.frame", "density", "factor", and more. A complete list can be got again by using the methods() function:
> methods(plot)
For many generic functions the function body is quite short, for example
> coef
function (object, ...)
UseMethod("coef")
The presence of UseMethod indicates this is a generic function. To see what methods are available we can use methods()
> methods(coef)
[1] coef.aov*         coef.Arima*       coef.default*     coef.listof*
[5] coef.nls*         coef.summary.nls*

   Non-visible functions are asterisked
In this example there are six methods, none of which can be seen by
typing its name. We can read these by either of
> getAnywhere("coef.aov")
A single object matching ‘coef.aov’ was found
It was found in the following places
  registered S3 method for coef from namespace stats
  namespace:stats
with value

function (object, ...)
{
    z <- object$coef
    z[!is.na(z)]
}

> getS3method("coef", "aov")
function (object, ...)
{
    z <- object$coef
    z[!is.na(z)]
}
A function named gen.cl will be invoked by the generic gen for class
cl, so do not name functions in this style unless they are intended to
be methods.
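A minimal sketch of the naming convention just described. The generic describe(), its methods and the class "temperature" are hypothetical names made up for illustration: UseMethod() looks for a function named describe.<class> and falls back to describe.default when none exists.

```r
# Hypothetical generic: dispatches on the class of its first argument.
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists.
describe.default <- function(x, ...) "an object without a describe() method"

# Method picked up automatically for objects of class "temperature".
describe.temperature <- function(x, ...) sprintf("%.1f degrees", unclass(x))

t1 <- structure(21.5, class = "temperature")
describe(t1)   # "21.5 degrees", via describe.temperature()
describe(3)    # no numeric method, so describe.default() is used

# getS3method() performs the same lookup explicitly.
getS3method("describe", "temperature")(t1)
```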
The reader is referred to the R Language Definition for a more
complete discussion of this mechanism.
This section presumes the reader has some familiarity with statistical
methodology, in particular with regression analysis and the analysis of
variance.
Later we make some rather more ambitious presumptions, namely
that something is known about generalized linear models and nonlinear
regression.
The requirements for fitting statistical models are sufficiently well
defined to make it possible to construct general tools that apply in a
broad spectrum of problems.
R provides an interlocking suite of facilities that make fitting
statistical models very simple. As we mention in the introduction, the
basic output is minimal, and one needs to ask for the details by calling
extractor functions.
The template for a statistical model is a linear regression model with
independent, homoscedastic errors
y_i = sum_{j=0}^p beta_j x_{ij} + e_i, i = 1, ..., n,
where the e_i are NID(0, sigma^2).
In matrix terms this would be written
y = X beta + e
where y is the response vector, X is the model matrix or design matrix
and has columns x_0, x_1, ..., x_p, the determining variables. Very
often x_0 will be a column of ones defining an intercept term.
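As a sketch of the matrix form, the least-squares estimate beta-hat = (X'X)^{-1} X'y can be computed directly from the model matrix that lm() builds. The built-in mtcars data and the regressors wt and hp are assumptions chosen only for illustration.

```r
fm <- lm(mpg ~ wt + hp, data = mtcars)
X  <- model.matrix(fm)   # columns: (Intercept), wt, hp; x_0 is the column of ones
y  <- mtcars$mpg

# Solve the normal equations X'X beta = X'y. Adequate for a small,
# well-conditioned X; lm() itself uses a QR decomposition, which is safer.
beta <- drop(solve(crossprod(X), crossprod(X, y)))

cbind(normal_equations = beta, lm = coef(fm))
```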
Before giving a formal specification, a few examples may help to set
the scene.
Suppose y, x, x0, x1, x2, … are numeric variables, X is a matrix and
A, B, C, … are factors. Each of the following formulae specifies the
statistical model described beneath it.
y ~ x
y ~ 1 + x
Both imply the same simple linear regression model of y on
x. The first has an implicit intercept term, and the second an
explicit one.
y ~ 0 + x
y ~ -1 + x
y ~ x - 1
Simple linear regression of y on x through the origin
(that is, without an intercept term).
log(y) ~ x1 + x2
Multiple regression of the transformed variable, log(y), on x1 and x2
(with an implicit intercept term).
y ~ poly(x,2)
y ~ 1 + x + I(x^2)
Polynomial regression of y on x of degree 2. The first
form uses orthogonal polynomials, and the second uses explicit powers,
as basis.
y ~ X + poly(x,2)
Multiple regression of y with a model matrix consisting of the matrix
X as well as polynomial terms in x to degree 2.
y ~ A
Single classification analysis of variance model of y, with
classes determined by A.
y ~ A + x
Single classification analysis of covariance model of y, with
classes determined by A, and with covariate x.
y ~ A*B
y ~ A + B + A:B
y ~ B %in% A
y ~ A/B
Two factor non-additive model of y on A and B. The
first two specify the same crossed classification and the second two
specify the same nested classification. In abstract terms all four
specify the same model subspace.
y ~ (A + B + C)^2
y ~ A*B*C - A:B:C
Three factor experiment but with a model containing main effects and two
factor interactions only. Both formulae specify the same model.
y ~ A * x
y ~ A/x
y ~ A/(1 + x) - 1
Separate simple linear regression models of y on x within
the levels of A, with different codings. The last form produces
explicit estimates of as many different intercepts and slopes as there
are levels in A.
y ~ A*B + Error(C)
An experiment with two treatment factors, A and B, and
error strata determined by factor C. For example a split plot
experiment, with whole plots (and hence also subplots), determined by
factor C.
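Several of the equivalences claimed above can be checked symbolically: terms() expands a formula without needing any data, so formulae that specify the same model produce the same term labels. This is a sketch; the names y, x, A, B and C are just symbols here.

```r
labels_of <- function(f) attr(terms(f), "term.labels")

labels_of(y ~ A*B)               # "A" "B" "A:B"
labels_of(y ~ A + B + A:B)       # the same three terms
labels_of(y ~ (A + B + C)^2)     # main effects and two-factor interactions
labels_of(y ~ A*B*C - A:B:C)     # the same six terms

# All three ways of removing the intercept set the "intercept" attribute to 0.
no_int <- sapply(list(y ~ 0 + x, y ~ -1 + x, y ~ x - 1),
                 function(f) attr(terms(f), "intercept"))
no_int
```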
The operator ~ is used to define a model formula in R. The form, for an
ordinary linear model, is
response ~ op_1 term_1 op_2 term_2 op_3 term_3 ...
where

response
is a vector or matrix (or expression evaluating to a vector or matrix)
defining the response variable(s).

op_i
is an operator, either + or -, implying the inclusion or exclusion of
a term in the model (the first is optional).

term_i
is either a vector or matrix expression, or 1; a factor; or a formula
expression consisting of factors, vectors or matrices connected by
formula operators.

In all cases each term defines a collection of columns either to be
added to or removed from the model matrix. A 1 stands for an intercept
column and is by default included in the model matrix unless explicitly
removed.
The formula operators are similar in effect to the Wilkinson and
Rogers notation used by such programs as Glim and Genstat. One
inevitable change is that the operator ‘.’ becomes ‘:’ since the period
is a valid name character in R.
The notation is summarized below (based on Chambers & Hastie, 1992,
p.29):
Y ~ M
Y is modeled as M.
M_1 + M_2
Include M_1 and M_2.
M_1 - M_2
Include M_1 leaving out terms of M_2.
M_1 : M_2
The tensor product of M_1 and M_2. If both terms are factors, this
gives the “subclasses” factor.
M_1 %in% M_2
Similar to M_1:M_2, but with a different coding.
M_1 * M_2
M_1 + M_2 + M_1:M_2.
M_1 / M_2
M_1 + M_2 %in% M_1.
M^n
All terms in M together with “interactions” up to order n.
I(M)
Insulate M. Inside M all operators have their normal
arithmetic meaning, and that term appears in the model matrix.
Note that inside the parentheses that usually enclose function arguments
all operators have their normal arithmetic meaning. The function I() is
an identity function used to allow terms in model formulae to be defined
using arithmetic operators.
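A small sketch of why I() matters, using made-up data. Outside I(), x^2 means x crossed with itself, which reduces to x, so the squared term silently disappears. Inside I(), ^ is ordinary arithmetic.

```r
# Made-up data for illustration only.
d <- data.frame(x = 1:5, y = c(2.1, 3.9, 9.2, 15.8, 25.1))

attr(terms(y ~ x + x^2), "term.labels")      # just "x"
attr(terms(y ~ x + I(x^2)), "term.labels")   # "x" and "I(x^2)"

# The insulated term adds a genuine squared column to the model matrix.
mm <- model.matrix(y ~ x + I(x^2), data = d)
mm[, "I(x^2)"]                                # 1, 4, 9, 16, 25
```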
Note particularly that the model formulae specify the columns
of the model matrix, the specification of the parameters being
implicit. This is not the case in other contexts, for example in
specifying nonlinear models.
We need at least some idea how the model formulae specify the columns of
the model matrix. This is easy if we have continuous variables, as each
provides one column of the model matrix (and the intercept will provide
a column of ones if included in the model).
What about a k-level factor A? The answer differs for unordered and
ordered factors. For unordered factors k - 1 columns are generated for
the indicators of the second, …, k-th levels of the factor. (Thus the
implicit parameterization is to contrast the response at each level
with that at the first.) For ordered factors the k - 1 columns are the
orthogonal polynomials on 1, ..., k, omitting the constant term.
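The default codings can be inspected with model.matrix(). This is a sketch with a hypothetical three-level factor (k = 3), assuming R's default contrasts setting.

```r
# Hypothetical factor; levels given in the intended order.
f <- factor(c("lo", "mid", "hi", "mid"), levels = c("lo", "mid", "hi"))

# Unordered: intercept plus k - 1 = 2 indicator columns, "fmid" and "fhi";
# the first level, "lo", is the baseline absorbed into the intercept.
model.matrix(~ f)

# Ordered: the k - 1 columns are orthogonal polynomials in 1, 2, 3,
# named ".L" (linear) and ".Q" (quadratic); no constant term.
o <- factor(c("lo", "mid", "hi", "mid"), levels = c("lo", "mid", "hi"),
            ordered = TRUE)
colnames(model.matrix(~ o))   # "(Intercept)" "o.L" "o.Q"
```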
Although the answer is already complicated, it is not the whole story.
First, if the intercept is omitted in a model that contains a factor
term, the first such term is encoded into k columns giving the
indicators for all the levels. Second, the whole behavior can be
changed by the options setting for contrasts. The default setting in R
is
options(contrasts = c("contr.treatment", "contr.poly"))
The main reason for mentioning this is that R and S have
different defaults for unordered factors, S using Helmert
contrasts. So if you need to compare your results to those of a textbook
or paper which used S-PLUS, you will need to set
options(contrasts = c("contr.helmert", "contr.poly"))
This is a deliberate difference, as treatment contrasts (R’s default)
are thought easier for newcomers to interpret.
We have still not finished, as the contrast scheme to be used can be set
for each term in the model using the functions contrasts and C.
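A sketch of the three ways of choosing a contrast scheme: per factor with contrasts<-, per term with C(), and globally with options(). The factor g is made up; the expected codings follow from contr.helmert(3) and contr.sum(3).

```r
g <- factor(c("a", "b", "c", "a", "b", "c"))

# Per factor: attach Helmert contrasts to a copy of g.
g_helm <- g
contrasts(g_helm) <- contr.helmert(3)
model.matrix(~ g_helm)[, -1]           # columns coded (-1, 1, 0) and (-1, -1, 2)

# Per term, inside the formula: C() wraps the factor with a contrast function.
model.matrix(~ C(g, contr.sum))[, -1]  # columns coded (1, 0, -1) and (0, 1, -1)

# Globally, S-PLUS style; restore the previous setting afterwards.
old <- options(contrasts = c("contr.helmert", "contr.poly"))
mm_global <- model.matrix(~ g)
options(old)
```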
We have not yet considered interaction terms: these generate the
products of the columns introduced for their component terms.
Although the details are complicated, model formulae in R will
normally generate the models that an expert statistician would expect,
provided that marginality is preserved.
Fitting, for example, a model
with an interaction but not the corresponding main effects will in
general lead to surprising results, and is for experts only.
The basic function for fitting ordinary multiple models is lm(), and a
streamlined version of the call is as follows:
> fitted.model <- lm(formula, data = data.frame)
For example
> fm2 <- lm(y ~ x1 + x2, data = production)
would fit a multiple regression model of y on x1 and
x2 (with implicit intercept term).
The important (but technically optional) parameter data = production
specifies that any variables needed to construct the model should come
first from the production data frame. This is the case regardless of
whether data frame production has been attached on the search path or
not.
The value of lm() is a fitted model object; technically a list of
results of class "lm". Information about the fitted model can then be
displayed, extracted, plotted and so on by using generic functions that
orient themselves to objects of class "lm". These include
add1 deviance formula predict step alias drop1 kappa print summary anova effects labels proj vcov coef family plot residuals
A brief description of the most commonly used ones is given below.
anova(object_1, object_2)
Compare a submodel with an outer model and produce an analysis of
variance table.
coef(object)
Extract the regression coefficient (matrix).
Long form: coefficients(object).
deviance(object)
Residual sum of squares, weighted if appropriate.
formula(object)
Extract the model formula.
plot(object)
Produce four plots, showing residuals, fitted values and some
diagnostics.
predict(object, newdata=data.frame)
The data frame supplied must have variables specified with the same
labels as the original. The value is a vector or matrix of predicted
values corresponding to the determining variable values in
data.frame.
print(object)
Print a concise version of the object. Most often used implicitly.
residuals(object)
Extract the (matrix of) residuals, weighted as appropriate.
Short form: resid(object).
step(object)
Select a suitable model by adding or dropping terms and preserving
hierarchies. The model with the smallest value of AIC (Akaike’s An
Information Criterion) discovered in the stepwise search is returned.
summary(object)
Print a comprehensive summary of the results of the regression analysis.
vcov(object)
Returns the variance-covariance matrix of the main parameters of a
fitted model object.
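A sketch of the extractor functions above on the built-in mtcars data, which stands in for a user's own data frame. For an "lm" fit, deviance() is the residual sum of squares, so it should agree with the hand computation.

```r
fm <- lm(mpg ~ wt + hp, data = mtcars)

coef(fm)               # intercept and two slopes
formula(fm)            # mpg ~ wt + hp
deviance(fm)           # residual sum of squares
sum(resid(fm)^2)       # the same quantity, computed by hand
sqrt(diag(vcov(fm)))   # standard errors, as reported by summary(fm)

# predict() needs a data frame whose columns carry the same names as the
# original determining variables.
new <- data.frame(wt = c(2.5, 3.5), hp = c(100, 150))
predict(fm, newdata = new)
```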
The model fitting function aov(formula, data=data.frame) operates at
the simplest level in a very similar way to the function lm(), and most
of the generic functions listed in the table in the section “Generic
functions for extracting model information” apply.
In addition, aov() allows an analysis of models with multiple error
strata such as split plot experiments, or balanced incomplete block
designs with recovery of inter-block information. The model formula
response ~ mean.formula + Error(strata.formula)
specifies a multi-stratum experiment with error strata defined by the
strata.formula. In the simplest case, strata.formula is simply a
factor, in which case it defines a two-strata experiment, namely between
and within the levels of the factor.
For example, with all determining variables being factors, a model
formula such as that in:
> fm <- aov(yield ~ v + n*p*k + Error(farms/blocks), data=farm.data)
would typically be used to describe an experiment with mean model
v + n*p*k and three error strata, namely “between farms”, “within
farms, between blocks” and “within blocks”.
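A runnable multi-stratum example, using the npk data set that ships with R in place of the hypothetical farm.data: a 2^3 N/P/K factorial on peas laid out in six blocks of four plots, with block as the error stratum. In this design the N:P:K interaction is confounded with blocks, so it is estimated in the block stratum.

```r
# npk is in the standard datasets package.
npk_aov <- aov(yield ~ N*P*K + Error(block), data = npk)

class(npk_aov)      # "aovlist" "listof": one fit per error stratum
summary(npk_aov)    # separate ANOVA tables for the block and within-block strata
```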
Note also that the analysis of variance table (or tables) is for a
sequence of fitted models. The sums of squares shown are the decrease
in the residual sums of squares resulting from an inclusion of that
term in the model at that place in the sequence. Hence only for
orthogonal experiments will the order of inclusion be inconsequential.
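A sketch of this order dependence, with mtcars as assumed example data: wt and hp are correlated, so the design is not orthogonal, and the sequential sum of squares credited to wt depends on whether it enters before or after hp. The final residual sum of squares does not.

```r
a1 <- anova(lm(mpg ~ wt + hp, data = mtcars))   # wt entered first
a2 <- anova(lm(mpg ~ hp + wt, data = mtcars))   # wt entered after hp

# The reduction in RSS credited to wt differs with its position ...
c(wt_first = a1["wt", "Sum Sq"], wt_second = a2["wt", "Sum Sq"])

# ... but the residual sum of squares of the full model is the same.
c(a1["Residuals", "Sum Sq"], a2["Residuals", "Sum Sq"])
```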
For multistratum experiments the procedure is first to project the
response onto the error strata, again in sequence, and to fit the mean
model to each projection. For further details, see Chambers & Hastie
(1992).
A more flexible alternative to the default full ANOVA table is to
compare two or more models directly using the anova() function.
> anova(fitted.model.1, fitted.model.2, ...)
The display is then an ANOVA table showing the differences between the
fitted models when fitted in sequence. The fitted models being compared
would usually be a hierarchical sequence, of course.
This does not give different information from the default, but rather
makes it easier to comprehend and control.
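A sketch of comparing a hierarchical sequence of fits with anova(), again using mtcars as stand-in data. Each row after the first tests the term added relative to the previous model.

```r
fm1 <- lm(mpg ~ wt, data = mtcars)
fm2 <- lm(mpg ~ wt + hp, data = mtcars)
fm3 <- lm(mpg ~ wt + hp + qsec, data = mtcars)

cmp <- anova(fm1, fm2, fm3)   # one row per model; F tests between neighbours
cmp
```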
The update() function is largely a convenience function that allows a
model to be fitted that differs from one previously fitted usually by
just a few additional or removed terms. Its form is
> new.model <- update(old.model, new.formula)
In the new.formula the special name consisting of a period, ‘.’, only,
can be used to stand for “the corresponding part of the old model
formula”. For example,
> fm05 <- lm(y ~ x1 + x2 + x3 + x4 + x5, data = production)
> fm6  <- update(fm05, . ~ . + x6)
> smf6 <- update(fm6, sqrt(.) ~ .)
would fit a five variate multiple regression with variables (presumably)
from the data frame production, fit an additional model including a
sixth regressor variable, and fit a variant on the model where the
response had a square root transform applied.
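A runnable version of the update() pattern, with the built-in mtcars data standing in for the hypothetical production data frame; wt, hp and qsec play the role of the regressors (an assumption for illustration). Note that data = mtcars is given only in the first call.

```r
fm05 <- lm(mpg ~ wt + hp, data = mtcars)
fm6  <- update(fm05, . ~ . + qsec)   # add a regressor; the data argument is reused
smf6 <- update(fm6, sqrt(.) ~ .)     # same right-hand side, square-rooted response

formula(fm6)    # mpg ~ wt + hp + qsec
formula(smf6)   # sqrt(mpg) ~ wt + hp + qsec
```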
Note especially that if the data= argument is specified on the original
call to the model fitting function, this information is passed on
through the fitted model object to update() and its allies.
The name ‘.’ can also be used in other contexts, but with slightly
different meaning. For example
> fmfull <- lm(y ~ . , data = production)
would fit a model with response y and, as regressor variables, all
other variables in the data frame production.
Other functions for exploring incremental sequences of models are
add1(), drop1() and step(). The names of these give a good clue to
their purpose, but for full details see the on-line help.
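A sketch of the ‘.’ shorthand together with drop1() and step(), using a four-column subset of mtcars as a hypothetical stand-in for production. The model chosen by step() depends on the data, so it is not asserted here.

```r
cars4  <- mtcars[, c("mpg", "wt", "hp", "qsec")]
fmfull <- lm(mpg ~ ., data = cars4)   # regressors: every column except mpg
names(coef(fmfull))                   # "(Intercept)" "wt" "hp" "qsec"

drop1(fmfull, test = "F")             # effect of dropping each term in turn
best <- step(fmfull, trace = 0)       # AIC-guided search; trace = 0 hides the log
formula(best)
```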
Generalized linear modeling is a development of linear models to
accommodate both non-normal response distributions and transformations
to linearity in a clean and straightforward way.
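A generalized linear model may be described in terms of the following
sequence of assumptions:

There is a response, y, of interest and stimulus variables
x_1, x_2, ..., whose values influence the distribution of the response.

The stimulus variables influence the distribution of y through a single
linear function only. This linear function is called the linear
predictor, and is usually written

eta = beta_1 x_1 + beta_2 x_2 + ... + beta_p x_p,

hence x_i has no influence on the distribution of y if and only if
beta_i is zero.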
The distribution of y is of the form

f_Y(y; mu, phi) = exp((A/phi) * (y lambda(mu) - gamma(lambda(mu))) + tau(y, phi))
where phi is a scale parameter (possibly known),
and is constant for all observations, A represents a prior
weight, assumed known but possibly varying with the observations, and
mu is the mean of y.
So it is assumed that the distribution of y is determined by its
mean and possibly a scale parameter as well.
The mean, mu, is a smooth invertible function of the linear predictor:

mu = m(eta), eta = m^{-1}(mu) = ell(mu)
and this inverse function, ell(), is called the link
function.
These assumptions are loose enough to encompass a wide class of models
useful in statistical practice, but tight enough to allow the
development of a unified methodology of estimation and inference, at
least approximately.
The reader is referred to any of the current
reference works on the subject for full details, such as McCullagh &
Nelder (1989) or Dobson (1990).
The class of generalized linear models handled by facilities supplied in
R includes gaussian, binomial, poisson,
inverse gaussian and gamma response distributions and also
quasi-likelihood models where the response distribution is not
explicitly specified. In the latter case the variance function
must be specified as a function of the mean, but in other cases this
function is implied by the response distribution.
Each response distribution admits a variety of link functions to connect
the mean with the linear predictor. Those automatically available are
shown in the following table:
Family name        Link functions
binomial           logit, probit, log, cloglog
gaussian           identity, log, inverse
Gamma              identity, inverse, log
inverse.gaussian   1/mu^2, identity, inverse, log
poisson            identity, log, sqrt
quasi              logit, probit, cloglog, identity, inverse, log, 1/mu^2, sqrt
The combination of a response distribution, a link function and various
other pieces of information that are needed to carry out the modeling
exercise is called the family of the generalized linear model.
The glm() function

Since the distribution of the response depends on the stimulus variables
through a single linear function only, the same mechanism as was used
for linear models can still be used to specify the linear part of a
generalized model. The family has to be specified in a different way.
The R function to fit a generalized linear model is glm(), which uses
the form
> fitted.model <- glm(formula, family=family.generator, data=data.frame)
The only new feature is the family.generator, which is the
instrument by which the family is described. It is the name of a
function that generates a list of functions and expressions that
together define and control the model and estimation process.
Although
this may seem a little complicated at first sight, its use is quite
simple.
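Family generators are ordinary R functions returning a list of components, so they can be inspected directly. A short sketch using components documented in R's family help page (family, link, linkfun, linkinv, variance):

```r
fam <- binomial(link = "probit")
fam$family                       # "binomial"
fam$link                         # "probit"

binomial()$linkfun(0.5)          # logit link: log(0.5 / 0.5) = 0
binomial()$linkinv(0)            # inverse logit: 0.5
poisson()$variance(3)            # Poisson variance function: Var(y) = mu
quasi(link = "inverse", variance = "constant")$link
```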
The names of the standard, supplied family generators are given under
“Family Name” in the table in Families. Where there is a choice
of links, the name of the link may also be supplied with the family
name, in parentheses as a parameter. In the case of the quasi
family, the variance function may also be specified in this way.
Some examples make the process clear.
The gaussian family

A call such as
> fm <- glm(y ~ x1 + x2, family = gaussian, data = sales)
achieves the same result as
> fm <- lm(y ~ x1+x2, data=sales)
but much less efficiently. Note that the gaussian family uses the
identity link by default; the log and inverse links listed in the table
above can be requested as a parameter, for example
family = gaussian(link = "log"). If a problem requires a gaussian family
with some other nonstandard link, this can usually be achieved through
the quasi family, as we shall see later.
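The equivalence can be checked on the built-in mtcars data (an assumption; sales is a hypothetical data frame):

```r
fm_glm <- glm(mpg ~ wt + hp, family = gaussian, data = mtcars)
fm_lm  <- lm(mpg ~ wt + hp, data = mtcars)

# Same coefficients: with the identity link, the iterative glm() fit
# reduces to ordinary least squares.
all.equal(coef(fm_glm), coef(fm_lm))
```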
The binomial family

Consider a small, artificial example, from Silvey (1970).
On the Aegean island of Kalythos the male inhabitants suffer from a
congenital eye disease, the effects of which become more marked with
increasing age. Samples of islander males of various ages were tested
for blindness and the results recorded. The data is shown below:
Age:        | 20 | 35 | 45 | 55 | 70 |
No. tested: | 50 | 50 | 50 | 50 | 50 |
No. blind:  |  6 | 17 | 26 | 37 | 44 |
The problem we consider is to fit both logistic and probit models to
this data, and to estimate for each model the LD50, that is the age at
which the chance of blindness for a male inhabitant is 50%.
If y is the number of blind at age x and n the
number tested, both models have the form
y ~ B(n, F(beta_0 + beta_1 x))
where for the probit case,
F(z) = Phi(z)
is the standard normal distribution function, and in the logit case
(the default),
F(z) = e^z/(1+e^z).
In both cases the LD50 is
LD50 = - beta_0/beta_1
that is, the point at which the argument of the distribution function is
zero.
The first step is to set the data up as a data frame
> kalythos <- data.frame(x = c(20,35,45,55,70), n = rep(50,5), y = c(6,17,26,37,44))
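To fit a binomial model using glm() there are three possibilities for
the response:

If the response is a vector it is assumed to hold binary data, and so
must be a 0/1 vector.

If the response is a two-column matrix it is assumed that the first
column holds the number of successes for the trial and the second holds
the number of failures.

If the response is a factor, its first level is taken as failure (0)
and all other levels as ‘success’ (1).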
Here we need the second of these conventions, so we add a matrix to our
data frame:
> kalythos$Ymat <- cbind(kalythos$y, kalythos$n - kalythos$y)
To fit the models we use
> fmp <- glm(Ymat ~ x, family = binomial(link=probit), data = kalythos)
> fml <- glm(Ymat ~ x, family = binomial, data = kalythos)
Since the logit link is the default the parameter may be omitted on the
second call. To see the results of each fit we could use
> summary(fmp)
> summary(fml)
Both models fit (all too) well. To find the LD50 estimate we can use a
simple function:
> ld50 <- function(b) -b[1]/b[2]
> ldp <- ld50(coef(fmp)); ldl <- ld50(coef(fml)); c(ldp, ldl)
The actual estimates from this data are 43.663 years and 43.601 years
respectively.
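The Kalythos steps above, collected into one runnable script. The last two lines also sketch an alternative response form that glm() accepts for binomial families (described in R's family help page): the observed proportions as the response, with the numbers tested supplied as weights. This should give the same coefficients as the two-column matrix form.

```r
kalythos <- data.frame(x = c(20, 35, 45, 55, 70), n = rep(50, 5),
                       y = c(6, 17, 26, 37, 44))
kalythos$Ymat <- cbind(kalythos$y, kalythos$n - kalythos$y)  # successes, failures

fmp <- glm(Ymat ~ x, family = binomial(link = probit), data = kalythos)
fml <- glm(Ymat ~ x, family = binomial, data = kalythos)    # logit is the default

# LD50 = -beta_0 / beta_1: the age at which the linear predictor is zero.
ld50 <- function(b) -b[1] / b[2]
ldp <- unname(ld50(coef(fmp)))
ldl <- unname(ld50(coef(fml)))
c(probit = ldp, logit = ldl)

# Alternative response form: proportions, with trial counts as weights.
fml2 <- glm(y / n ~ x, family = binomial, weights = n, data = kalythos)
all.equal(coef(fml), coef(fml2))
```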
With the Poisson family the default link is the log, and in practice
the major use of this family is to fit surrogate Poisson log-linear
models to frequency data, whose actual distribution is often
multinomial. This is a large and important subject we will not discuss
further here.
It even forms a major part of the use of non-gaussian
generalized models overall.
Occasionally genuinely Poisson data arises in practice and in the past
it was often analyzed as gaussian data after either a log or a
square-root transformation.
As a graceful alternative to the latter, a
Poisson generalized linear model may be fitted as in the following
example:
> fmod <- glm(y ~ A + B + x, family = poisson(link=sqrt), data = worm.counts)
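A runnable sketch using the built-in InsectSprays counts in place of the hypothetical worm.counts. With a single factor as the only predictor, each group has its own parameter, so the fitted means should equal the group means whatever the link; the check below relies on that property.

```r
# InsectSprays (package datasets): insect counts under six sprays, A to F.
fmod <- glm(count ~ spray, family = poisson(link = "sqrt"), data = InsectSprays)
fmod$converged

# Fitted means versus the raw group means.
all.equal(unname(fitted(fmod)),
          ave(InsectSprays$count, InsectSprays$spray), tolerance = 1e-5)
```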
For all families the variance of the response will depend on the mean
and will have the scale parameter as a multiplier. The form of
dependence of the variance on the mean is a characteristic of the
response distribution; for example for the Poisson distribution
Var(y) = mu.
For quasi-likelihood estimation and inference the precise response
distribution is not specified, but rather only a link function and the
form of the variance function as it depends on the mean.
Incidentally, since quasi-likelihood estimation uses formally identical
techniques to those for the gaussian distribution, this family also
provides a way of fitting gaussian models with non-standard link
functions or variance functions.
For example, consider fitting the non-linear regression
y = theta_1 z_1 / (z_2 - theta_2) + e
which may be written alternatively as
y = 1 / (beta_1 x_1 + beta_2 x_2) + e
where
x_1 = z_2/z_1, x_2 = -1/z_1, beta_1 = 1/theta_1, and beta_2 =
theta_2/theta_1.
Supposing a suitable data frame to be set up we could fit this
non-linear regression as
> nlfit <- glm(y ~ x1 + x2 - 1, family = quasi(link=inverse, variance=constant), data = biochem)
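A sketch of the transformed fit on simulated data, assuming made-up true values theta_1 = 2 and theta_2 = 0.5 and a small gaussian error; biochem here is a simulated stand-in for the hypothetical data frame above. The estimates are back-transformed with theta_1 = 1/beta_1 and theta_2 = beta_2/beta_1.

```r
set.seed(1)
n     <- 50
z1    <- runif(n, 1, 2)
z2    <- runif(n, 1, 3)
theta <- c(2, 0.5)                      # made-up true theta_1, theta_2

biochem <- data.frame(
  y  = theta[1] * z1 / (z2 - theta[2]) + rnorm(n, sd = 0.01),
  x1 = z2 / z1,                         # x_1 = z_2 / z_1
  x2 = -1 / z1                          # x_2 = -1 / z_1
)

nlfit <- glm(y ~ x1 + x2 - 1,
             family = quasi(link = "inverse", variance = "constant"),
             data = biochem)

b <- coef(nlfit)
theta_hat <- c(1 / b[["x1"]], b[["x2"]] / b[["x1"]])
theta_hat                               # should be close to c(2, 0.5)
```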
The reader is referred to the manual and the help document for further
information, as needed.
Certain forms of nonlinear model can be fitted by Generalized Linear
Models (glm()). But in the majority of cases we have to approach the
nonlinear curve fitting problem as one of nonlinear optimization. R’s
nonlinear optimization routines are optim(), nlm() and nlminb(). They
seek the parameter values that minimize some index of lack-of-fit, and
do this by trying out various parameter values iteratively. Unlike
linear regression, for example, there is no guarantee that the
procedure will converge on satisfactory estimates.
All the methods require initial guesses about what parameter values to
try, and convergence may depend critically upon the quality of the
starting values.
One way to fit a nonlinear model is by minimizing the sum of the squared
errors (SSE) or residuals. This method makes sense if the observed
errors could have plausibly arisen from a normal distribution.
Here is an example from Bates & Watts (1988), page 51. The data are:
> x <- c(0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56,
         1.10, 1.10)
> y <- c(76, 47, 97, 107, 123, 139, 159, 152, 191, 201, 207, 200)
The fit criterion to be minimized is:
> fn <- function(p) sum((y - (p[1] * x)/(p[2] + x))^2)
In order to do the fit we need initial estimates of the parameters. One
way to find sensible starting values is to plot the data, guess some
parameter values, and superimpose the model curve using those values.
> plot(x, y)
> xfit <- seq(.02, 1.1, .05)
> yfit <- 200 * xfit/(0.1 + xfit)
> lines(spline(xfit, yfit))
We could do better, but these starting values of 200 and 0.1 seem
adequate. Now do the fit:
> out <- nlm(fn, p = c(200, 0.1), hessian = TRUE)
After the fitting, out$minimum is the SSE, and out$estimate are the
least squares estimates of the parameters. To obtain the approximate
standard errors (SE) of the estimates we do:
> sqrt(diag(2*out$minimum/(length(y) - 2) * solve(out$hessian)))
The 2 which is subtracted in the line above represents the number of
parameters. A 95% confidence interval would be the parameter estimate
+/- 1.96 SE. We can superimpose the least squares fit on a new plot:
> plot(x, y)
> xfit <- seq(.02, 1.1, .05)
> yfit <- 212.68384222 * xfit/(0.06412146 + xfit)
> lines(spline(xfit, yfit))
The standard package stats provides much more extensive facilities
for fitting non-linear models by least squares. The model we have just
fitted is the Michaelis-Menten model, so we can use
> df <- data.frame(x=x, y=y)
> fit <- nls(y ~ SSmicmen(x, Vm, K), df)
> fit
Nonlinear regression model
  model: y ~ SSmicmen(x, Vm, K)
   data: df
          Vm            K
212.68370711   0.06412123
 residual sum-of-squares: 1195.449
> summary(fit)

Formula: y ~ SSmicmen(x, Vm, K)

Parameters:
    Estimate Std. Error t value Pr(>|t|)
Vm 2.127e+02  6.947e+00  30.615 3.24e-11
K  6.412e-02  8.281e-03   7.743 1.57e-05

Residual standard error: 10.93 on 10 degrees of freedom

Correlation of Parameter Estimates:
  Vm
K 0.7651
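The Bates & Watts example, collected into one runnable script so that the nlm() least-squares estimates and their Hessian-based standard errors can be compared side by side with the nls() results. The two sets of standard errors are computed differently (a finite-difference Hessian of the SSE versus the Gauss-Newton approximation), so they need not agree exactly.

```r
x <- c(0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56, 1.10, 1.10)
y <- c(76, 47, 97, 107, 123, 139, 159, 152, 191, 201, 207, 200)

# Least squares via nlm(): minimize the sum of squared errors.
fn  <- function(p) sum((y - (p[1] * x) / (p[2] + x))^2)
out <- nlm(fn, p = c(200, 0.1), hessian = TRUE)
se  <- sqrt(diag(2 * out$minimum / (length(y) - 2) * solve(out$hessian)))

# Same model via nls(); SSmicmen() is self-starting, so no start values.
fit <- nls(y ~ SSmicmen(x, Vm, K), data = data.frame(x = x, y = y))

rbind(nlm = out$estimate, nls = coef(fit))
rbind(nlm = se, nls = coef(summary(fit))[, "Std. Error"])
```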
Maximum likelihood is a method of nonlinear model fitting that applies
even if the errors are not normal. The method finds the parameter values
which maximize the log likelihood, or equivalently which minimize the
negative log-likelihood.
Here is an example from Dobson (1990), pp. 108–111. This example fits a
logistic model to dose-response data, which clearly could also be fit
by glm(). The data are:
> x <- c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839)
> y <- c( 6, 13, 18, 28, 52, 53, 61, 60)
> n <- c(59, 60, 62, 56, 63, 59, 62, 60)
The negative log-likelihood to minimize is:
> fn <- function(p) sum( - (y*(p[1]+p[2]*x) - n*log(1+exp(p[1]+p[2]*x)) + log(choose(n, y)) ))
We pick sensible starting values and do the fit:
> out <- nlm(fn, p = c(-50,20), hessian = TRUE)
After the fitting, out$minimum is the negative log-likelihood, and
out$estimate are the maximum likelihood estimates of the parameters. To
obtain the approximate SEs of the estimates we do:
> sqrt(diag(solve(out$hessian)))
A 95% confidence interval would be the parameter estimate +/-
1.96 SE.
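Since this model could also be fitted by glm(), a quick cross-check is sketched below. The binomial log-likelihood that glm() reports includes the same log(choose(n, y)) term as fn, so -logLik() at the glm() fit should match out$minimum, and the implied LD50 = -beta_0/beta_1 should agree. The tolerances are deliberately loose because nlm() stops at a numerical approximation.

```r
x <- c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839)
y <- c( 6, 13, 18, 28, 52, 53, 61, 60)
n <- c(59, 60, 62, 56, 63, 59, 62, 60)

# Negative binomial log-likelihood with a logistic link, minimized by nlm().
fn  <- function(p) sum( - (y*(p[1]+p[2]*x) - n*log(1+exp(p[1]+p[2]*x))
                           + log(choose(n, y)) ))
out <- nlm(fn, p = c(-50, 20), hessian = TRUE)

# The same model fitted by glm(), using the two-column response form.
fit <- glm(cbind(y, n - y) ~ x, family = binomial)

rbind(nlm = out$estimate, glm = coef(fit))
c(nlm = out$minimum, glm = -as.numeric(logLik(fit)))
```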
We conclude this chapter with just a brief mention of some of the other
facilities available in R for special regression and data analysis
problems.
Functions lme() and nlme() in the recommended package nlme fit linear and non-linear mixed-effects models, that is, linear and non-linear regressions in which some of the coefficients correspond to random effects. These functions make heavy use of formulae to specify the models.
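Here is a minimal sketch of a random-intercept model fitted with lme(), using the Orthodont data that ship with nlme. The choice of data, and of one random intercept per child, is ours for illustration.

```r
library(nlme)   # recommended package; Orthodont ships with it

# Orthodont: distance measurements on 27 children at ages 8, 10, 12 and 14
# Fixed effect of age plus a random intercept for each child (Subject)
fm <- lme(distance ~ age, data = Orthodont, random = ~ 1 | Subject)

fixef(fm)         # population-level intercept and slope
head(ranef(fm))   # estimated random intercepts, one row per child
```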
The loess() function fits a nonparametric regression by using a locally weighted regression. Such regressions are useful for highlighting a trend in messy data or for data reduction to give some insight into a large data set. Function loess is in the standard package stats, together with code for projection pursuit regression.
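Here is a short loess() sketch on the built-in cars data. It overlays the smoothed trend on the scatterplot. The dataset and the 100-point prediction grid are illustrative choices.

```r
# cars: speed and stopping distance of 50 cars (datasets package)
fit <- loess(dist ~ speed, data = cars)   # default span = 0.75, degree = 2

# Evaluate the smooth on a grid inside the observed speed range
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed), length.out = 100))
grid$dist <- predict(fit, newdata = grid)

plot(dist ~ speed, data = cars)
lines(dist ~ speed, data = grid, col = "blue")
```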
Function lqs in the recommended package MASS provides state-of-the-art algorithms for highly resistant fits. Less resistant but statistically more efficient methods are available in packages, for example function rlm in package MASS.
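The sketch below compares the approaches by fitting ordinary least squares, rlm() and lqs() to the built-in stackloss data, which is our choice of example. The call to set.seed() is there because lqs() uses random subsampling.

```r
library(MASS)   # recommended package providing rlm() and lqs()

# stackloss: 21 observations of plant operation (datasets package)
ols <- lm(stack.loss ~ ., data = stackloss)
rob <- rlm(stack.loss ~ ., data = stackloss)   # M-estimation, Huber psi by default

set.seed(1)                                    # lqs() uses random subsampling
res <- lqs(stack.loss ~ ., data = stackloss)   # least trimmed squares by default

round(cbind(OLS = coef(ols), rlm = coef(rob), lqs = coef(res)), 3)
```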
Functions avas and ace in package acepack and functions bruto and mars in package mda provide some examples of additive-model techniques in user-contributed packages to R. An extension is Generalized Additive Models, implemented in user-contributed packages gam and mgcv.
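To illustrate the GAM extension, this sketch uses gam() from mgcv to fit a smooth of speed to the cars data. The gam package offers a function with the same name but a different interface, so it matters which package is loaded.

```r
library(mgcv)   # recommended package; provides gam() and s()

# Smooth (penalized regression spline) function of speed for the cars data
fit <- gam(dist ~ s(speed), data = cars)
summary(fit)
plot(fit, residuals = TRUE)   # estimated smooth with confidence band and residuals
```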
Tree-based models are again specified in the ordinary linear model form. The model fitting function is tree(), but many other generic functions such as plot() and text() are well adapted to displaying the results of a tree-based model fit in a graphical way.
Tree models are available in R via the user-contributed
packages rpart and tree.
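The sketch below uses rpart() from the rpart package mentioned above, not tree(). The plot() and text() methods work in the way described for tree fits. The kyphosis data ship with rpart.

```r
library(rpart)   # recommended package; kyphosis data ship with it

# Classification tree: presence of kyphosis after surgery in 81 children
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

plot(fit, margin = 0.1)    # plot() method draws the tree
text(fit, use.n = TRUE)    # text() method labels splits and leaf counts
```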
Graphical facilities are an important and extremely versatile component
of the R environment. It is possible to use the facilities to
display a wide variety of statistical graphs and also to build entirely
new types of graph.
The graphics facilities can be used in both interactive and batch modes, but in most cases, interactive use is more productive. Interactive use is also easy because at startup time R initiates a graphics device driver which opens a special graphics window for the display of interactive graphics. Although this is done automatically, it may be useful to know that the command used is X11() under UNIX, windows() under Windows and quartz() under macOS. A new device can always be opened by dev.new().
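In batch mode no window is opened. Instead, a file-based device such as pdf() collects the output. Here is a minimal sketch that writes to a hypothetical output file in the session's temporary directory:

```r
out_file <- file.path(tempdir(), "cars.pdf")   # hypothetical output file

pdf(out_file, width = 6, height = 4)   # file-based device; size in inches
plot(cars)                             # plotting commands go to the file
invisible(dev.off())                   # close the device to finish the file
```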
Once the device driver is running, R plotting commands can be used to
produce a variety of graphical displays and to create entirely new kinds
of display.
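Plotting commands are divided into three basic groups:
High-level plotting functions create a new plot on the graphics device, possibly with axes, labels, titles and so on.
Low-level plotting functions add more information to an existing plot, such as extra points, lines and labels.
Interactive graphics functions allow you to interactively add information to, or extract information from, an existing plot, using a pointing device such as a mouse.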
In addition, R maintains a list of graphical parameters which
can be manipulated to customize your plots.
This manual only describes what are known as ‘base’ graphics. A
separate graphics sub-system in package grid coexists with base –
it is more powerful but harder to use. There is a recommended package
lattice which builds on grid and provides ways to produce
multi-panel plots akin to those in the Trellis system in S.
High-level plotting functions are designed to generate a complete plot
of the data passed as arguments to the function.
Where appropriate, axes, labels and titles are automatically generated (unless you request otherwise). High-level plotting commands always start a new plot, erasing the current plot if necessary.
The plot() function
One of the most frequently used plotting functions in R is the plot() function. This is a generic function: the type of plot produced is dependent on the type or class of the first argument.
plot(x, y)
plot(xy)
If x and y are vectors, plot(x, y) produces a scatterplot of y against x. The same effect can be produced by supplying one argument (second form) as either a list containing two elements x and y or a two-column matrix.
plot(x)
If x is a time series, this produces a time-series plot. If
x is a numeric vector, it produces a plot of the values in the
vector against their index in the vector. If x is a complex
vector, it produces a plot of imaginary versus real parts of the vector
elements.
plot(f)
plot(f, y)
f is a factor object, y is a numeric vector. The first form
generates a bar plot of f; the second form produces boxplots of
y for each level of f.
plot(df)
plot(~ expr)
plot(y ~ expr)
df is a data frame, y is any object, expr is a list of object names separated by ‘+’ (e.g., a + b + c). The first two forms produce distributional plots of the variables in a data frame (first form) or of a number of named objects (second form). The third form plots y against every object named in expr.
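A few lines like the following let you try the forms above side by side. The small vectors x, y, f and v are made up for illustration, and ldeaths and iris are datasets shipped with R.

```r
x <- c(1, 3, 4, 7, 9)
y <- c(2, 5, 3, 8, 6)

plot(x, y)                 # scatterplot of y against x
plot(list(x = x, y = y))   # same plot from a list with components x and y
plot(cbind(x, y))          # same plot from a two-column matrix

plot(ldeaths)              # a time series gives a time-series plot
plot(y)                    # a numeric vector is plotted against its index

f <- factor(c("a", "b", "a", "c", "b", "a"))
v <- c(2.1, 3.4, 1.9, 5.0, 3.1, 2.4)
plot(f)                    # bar plot of the counts in each level
plot(f, v)                 # boxplots of v for each level of f

plot(iris[, 1:3])          # data frame: pairwise plots of the variables
plot(Sepal.Length ~ Petal.Length + Petal.Width, data = iris)  # one plot per term
```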
R provides two very useful functions for representing multivariate data. If X is a numeric matrix or data frame, the command
> pairs(X)
produces a pairwise scatterplot matrix of the variables defined by the columns of X, that is, every column of X is plotted against every other column of X and the resulting n(n-1) plots (where n is the number of columns of X) are arranged in a matrix with plot scales constant over the rows and columns of the matrix.
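Here is a quick illustration with the four numeric columns of the built-in iris data. With n = 4 columns, the matrix holds 4 × 3 = 12 scatterplots. The second call also colours the points by species and uses the panel= argument with panel.smooth.

```r
X <- iris[, 1:4]   # four numeric columns (datasets package)
pairs(X)           # 4 x 4 layout: 12 off-diagonal scatterplots

# Colour points by species and add a smoother to each panel via panel=
pairs(X, col = as.integer(iris$Species), panel = panel.smooth)
```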
When three or four variables are involved a coplot may be more enlightening. If a and b are numeric vectors and c is a numeric vector or factor object (all of the same length), then the command
> coplot(a ~ b | c)
produces a number of scatterplots of a against b for given values of c. If c is a factor, this simply means that a is plotted against b for every level of c. When c is numeric, it is divided into a number of conditioning intervals and for each interval a is plotted against b for values of c within the interval. The number and position of intervals can be controlled with the given.values= argument to coplot(); the function co.intervals() is useful for selecting intervals. You can also use two given variables with a command like
> coplot(a ~ b | c + d)
which produces scatterplots of a against b for every joint conditioning interval of c and d.
The coplot() and pairs() functions both take an argument panel= which can be used to customize the type of plot which appears in each panel. The default is points() to produce a scatterplot, but by supplying some other low-level graphics function of two vectors x and y as the value of panel= you can produce any type of plot you wish. An example panel function useful for coplots is panel.smooth().
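Here is a sketch of coplot() with the built-in quakes data, plotting latitude against longitude conditioned on depth. It shows co.intervals(), given.values= and panel=panel.smooth together. The number of intervals and the overlap are arbitrary choices.

```r
# Scatterplots of lat against long for six (default) overlapping depth intervals
coplot(lat ~ long | depth, data = quakes)

# Choose four depth intervals explicitly, with 10% overlap
depth_int <- co.intervals(quakes$depth, number = 4, overlap = 0.1)
depth_int   # one row per interval: lower and upper limits
coplot(lat ~ long | depth, data = quakes, given.values = depth_int,
       panel = panel.smooth)   # add a smoother to every panel

# Two conditioning variables: depth and magnitude, three intervals each
coplot(lat ~ long | depth + mag, data = quakes, number = c(3, 3))
```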
Other high-level graphics functions produce different types of plots.
Some examples are:
qqnorm(x)
qqline(x)
qqplot(x, y)
Distribution-comparison plots. The first form plots the numeric vector x against the expected Normal order scores (a normal scores plot) and the second adds a straight line to such a plot by drawing a line through the distribution and data quartiles. The third form plots the quantiles of x against those of y to compare their respective distributions.
hist(x)
hist(x, nclass=n)
hist(x, breaks=b, …)
Produces a histogram of the numeric vector x. A sensible number of classes is usually chosen, but a recommendation can be given with the nclass= argument. Alternatively, the breakpoints can be specified exactly with the breaks= argument. If the probability=TRUE argument is given, the bars represent relative frequencies divided by bin width instead of counts.
dotchart(x, …)
Constructs a dot chart of the data in x. In a dot chart the y-axis gives a labelling of the data in x and the x-axis gives its value. For example, it allows easy visual selection of all data entries with values lying in specified ranges.
image(x, y, z, …)
contour(x, y, z, …)
persp(x, y, z, …)
Plots of three variables. The image plot draws a grid of rectangles using different colours to represent the value of z, the contour plot draws contour lines to represent the value of z, and the persp plot draws a 3D surface.
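The sketch below calls each of these functions once. Here x and y are simulated purely for illustration, while VADeaths and volcano are datasets shipped with R. With probability=TRUE the bar areas sum to 1, and the stored result h lets us check this.

```r
set.seed(42)
x <- rnorm(200)                # simulated data, for illustration only
y <- rexp(200)

qqnorm(x); qqline(x)           # normal scores plot plus quartile line
qqplot(x, y)                   # compare the two empirical distributions

hist(x)                        # default number of classes
hist(x, nclass = 20)           # suggest about 20 classes
b <- seq(floor(min(x)), ceiling(max(x)), by = 0.5)
h <- hist(x, breaks = b, probability = TRUE)   # exact breakpoints, density scale

dotchart(VADeaths[, "Rural Male"])   # labelled values on a dot chart

image(volcano)                 # coloured grid of heights
contour(volcano)               # contour lines of the same surface
persp(volcano, theta = 30, phi = 30, expand = 0.5)   # 3D perspective view
```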
There are a number of arguments which may be passed to high-level
graphics functions, as follows:
add=TRUE
Forces the function to act as a low-level graphics function,
superimposing the plot on the current plot (some functions only).
axes=FALSE
Suppresses generation of axes, which is useful for adding your own custom axes with the axis() function. The default, axes=TRUE, means include axes.
log="x"
log="y"
log="xy"
Causes the x, y or both axes to be logarithmic. This will
work for many, but not all, types of plot.
type=
The type= argument controls the type of plot produced, as follows:
type="p"
Plot individual points (the default)
type="l"
Plot lines
type="b"
Plot points connected by lines (both)
type="o"
Plot points overlaid by lines
type="h"
Plot vertical lines from points to the zero axis (high-density)
type="s"
type="S"
Step-function plots. In the first form, the top of the vertical defines
the point; in the second, the bottom.
type="n"
No plotting at all. However axes are still drawn (by default) and the
coordinate system is set up according to the data. Ideal for creating
plots with subsequent low-level graphics functions.
xlab=string
ylab=string
Axis labels for the x and y axes. Use these arguments to
change the default labels, usually the names of the objects used in the
call to the high-level plotting function.
main=string
Figure title, placed at the top of the plot in a large font.
sub=string
Sub-title, placed just below the x-axis in a smaller font.
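This is a compact way to see the type= options and the other arguments side by side. The data are made up. The sketch uses par(mfrow=), which is described later, only to put the six variants on one page.

```r
x <- 1:10
y <- x^2

op <- par(mfrow = c(2, 3))     # 2 x 3 grid so the variants sit side by side
for (tp in c("p", "l", "b", "o", "h", "s"))
  plot(x, y, type = tp, main = paste0('type = "', tp, '"'))
par(op)                        # back to one figure per page

# Logarithmic y axis, custom axis labels, title and sub-title
plot(x, y, log = "y", xlab = "x", ylab = "x squared",
     main = "Quadratic growth", sub = "log-scaled y axis")

# type = "n" and axes = FALSE: set up the coordinates only, then add layers
plot(x, y, type = "n", axes = FALSE)
points(x, y, pch = 19)
axis(1); axis(2); box()
```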
Sometimes the high-level plotting functions don’t produce exactly the
kind of plot you desire. In this case, low-level plotting commands can
be used to add extra information (such as points, lines or text) to the
current plot.
Some of the more useful low-level plotting functions are:
points(x, y)
lines(x, y)
Adds points or connected lines to the current plot. The type= argument of plot() can also be passed to these functions (and defaults to "p" for points() and "l" for lines()).
text(x, y, labels, …)
Add text to a plot at points given by x, y. Normally labels is an integer or character vector in which case labels[i] is plotted at point (x[i], y[i]). The default is 1:length(x).
Note: This function is often used in the sequence
> plot(x, y, type="n"); text(x, y, names)
The graphics parameter type="n" suppresses the points but sets up the axes, and the text() function supplies special characters, as specified by the character vector names for the points.
abline(a, b)
abline(h=y)
abline(v=x)
abline(lm.obj)
Adds a line of slope b and intercept a to the current plot. h=y may be used to specify y-coordinates for the heights of horizontal lines to go across a plot, and v=x similarly for the x-coordinates for vertical lines. Also lm.obj may be a list with a coefficients component of length 2 (such as the result of model-fitting functions), which are taken as an intercept and slope, in that order.
polygon(x, y, …)
Draws a polygon defined by the ordered vertices in (x, y) and (optionally) shades it in with hatch lines, or fills it if the graphics device allows the filling of figures.
legend(x, y, legend, …)
Adds a legend to the current plot at the specified position. Plotting characters, line styles, colors etc., are identified with the labels in the character vector legend. At least one other argument v (a vector the same length as legend) with the corresponding values of the plotting unit must also be given, as follows:
legend( , fill=v)
Colors for filled boxes
legend( , col=v)
Colors in which points or lines will be drawn
legend( , lty=v)
Line styles
legend( , lwd=v)
Line widths
legend( , pch=v)
Plotting characters (character vector)
title(main, sub)
Adds a title main to the top of the current plot in a large font and (optionally) a sub-title sub at the bottom in a smaller font.
axis(side, …)
Adds an axis to the current plot on the side given by the first argument (1 to 4, counting clockwise from the bottom). Other arguments control the positioning of the axis within or beside the plot, and tick positions and labels. Useful for adding custom axes after calling plot() with the axes=FALSE argument.
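The sketch below combines the low-level functions above on one plot. The data and labels are made up, and lm() supplies the object passed to abline().

```r
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)      # made-up data for illustration
fit <- lm(y ~ x)

plot(x, y, type = "n", axes = FALSE, xlab = "", ylab = "")   # empty frame only
polygon(c(1, 6, 6), c(2, 2, 12), col = "grey90", border = NA)  # shaded triangle
points(x, y, pch = 19)                       # the data
lines(x, fitted(fit), lty = 2)               # fitted values as a dashed line
abline(fit, col = "red")                     # same line from the lm coefficients
abline(h = mean(y), v = mean(x), col = "grey")   # horizontal and vertical lines
text(x, y, labels = letters[1:6], pos = 3)   # label each point just above it
axis(1); axis(2, las = 1); box()             # custom axes after axes = FALSE
title(main = "Low-level additions", sub = "made-up data")
legend("topleft", legend = c("data", "fitted", "abline"),
       pch = c(19, NA, NA), lty = c(NA, 2, 1), col = c("black", "black", "red"))
```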
Low-level plotting functions usually require some positioning
information (e.g., x and y coordinates) to determine where
to place the new plot elements. Coordinates are given in terms of
user coordinates which are defined by the previous high-level
graphics command and are chosen based on the supplied data.
Where x and y arguments are required, it is also sufficient to supply a single argument being a list with elements named x and y. Similarly a matrix with two columns is also valid input. In this way functions such as locator() (see below) may be used to specify positions on a plot interactively.
In some cases, it is useful to add mathematical symbols and formulae to a plot. This can be achieved in R by specifying an expression rather than a character string in any one of text, mtext, axis, or title. For example, the following code draws the formula for the Binomial probability function:
> text(x, y, expression(paste(bgroup("(", atop(n, x), ")"), p^x, q^{n-x})))
More information, including a full listing of the features available, can be obtained from within R using the commands:
> help(plotmath)
> example(plotmath)
> demo(plotmath)
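Expressions work in the same way for axis labels and titles. Here is a small sketch that labels a standard normal density. The formula is just an example of plotmath syntax.

```r
x <- seq(-3, 3, length.out = 101)
lab <- expression(phi(x) == frac(1, sqrt(2 * pi)) * e^{-x^2 / 2})

plot(x, dnorm(x), type = "l",
     xlab = expression(x),          # italic x
     ylab = expression(phi(x)),     # Greek letter phi
     main = lab)                    # fraction, square root and superscript
text(2, 0.3, expression(list(mu == 0, sigma == 1)))   # comma-separated Greek symbols
```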
It is possible to specify Hershey vector fonts for rendering text when using the text and contour functions. There are three reasons for using the Hershey fonts:
More information, including tables of Hershey characters, can be obtained from within R using the commands:
> help(Hershey)
> demo(Hershey)
> help(Japanese)
> demo(Japanese)
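Here is a minimal sketch of Hershey text drawn with the vfont= argument of text(), which takes a typeface and a font index. The typefaces chosen here are illustrative.

```r
plot(c(0, 10), c(0, 10), type = "n", axes = FALSE, xlab = "", ylab = "")

# vfont = c(typeface, fontindex) switches text() to a Hershey vector font
text(5, 8, "Hershey serif", vfont = c("serif", "plain"), cex = 2)
text(5, 5, "Hershey script", vfont = c("script", "plain"), cex = 2)
text(5, 2, "Rotated sans serif", vfont = c("sans serif", "bold"), srt = 20, cex = 1.5)
```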
R also provides functions which allow users to extract or add information to a plot using a mouse. The simplest of these is the locator() function:
locator(n, type)
Waits for the user to select locations on the current plot using the left mouse button. This continues until n (default 512) points have been selected, or another mouse button is pressed. The type argument allows for plotting at the selected points and has the same effect as for high-level graphics commands; the default is no plotting. locator() returns the locations of the points selected as a list with two components x and y.
locator() is usually called with no arguments. It is particularly useful for interactively selecting positions for graphic elements such as legends or labels when it is difficult to calculate in advance where the graphic should be placed.
For example, to place some
informative text near an outlying point, the command
> text(locator(1), "Outlier", adj=0)
may be useful. (locator() will be ignored if the current device, such as postscript, does not support interactive pointing.)
identify(x, y, labels)
Allows the user to highlight any of the points defined by x and y (using the left mouse button) by plotting the corresponding component of labels nearby (or the index number of the point if labels is absent). Returns the indices of the selected points when another button is pressed.
Sometimes we want to identify particular points on a plot, rather than their positions. For example, we may wish the user to select some observation of interest from a graphical display and then manipulate that observation in some way. Given a number of (x, y) coordinates in two numeric vectors x and y, we could use the identify() function as follows:
> plot(x, y)
> identify(x, y)
The identify() function performs no plotting itself, but simply allows the user to move the mouse pointer and click the left mouse button near a point. If there is a point near the mouse pointer it will be marked with its index number (that is, its position in the x/y vectors) plotted nearby. Alternatively, you could use some informative string (such as a case name) as a highlight by using the labels argument to identify(), or disable marking altogether with the plot = FALSE argument. When the process is terminated (see above), identify() returns the indices of the selected points; you can use these indices to extract the selected points from the original vectors x and y.
When creating graphics, particularly for presentation or publication purposes, R’s defaults do not always produce exactly that which is required. You can, however, customize almost every aspect of the display using graphics parameters. R maintains a list of a large number of graphics parameters which control things such as line style, colors, figure arrangement and text justification among many others. Every graphics parameter has a name (such as ‘col’, which controls colors) and a value (a color number, for example).
A separate list of graphics parameters is maintained for each active
device, and each device has a default set of parameters when
initialized.
Graphics parameters can be set in two ways: either
permanently, affecting all graphics functions which access the current
device; or temporarily, affecting only a single graphics function call.
The par() function
The par() function is used to access and modify the list of graphics parameters for the current graphics device.
par()
Without arguments, returns a list of all graphics parameters and their
values for the current device.
par(c("col", "lty"))
With a character vector argument, returns only the named graphics
parameters (again, as a list.)
par(col=4, lty=2)
With named arguments (or a single list argument), sets the values of
the named graphics parameters, and returns the original values of the
parameters as a list.
Setting graphics parameters with the par() function changes the value of the parameters permanently, in the sense that all future calls to graphics functions (on the current device) will be affected by the new value.
You can think of setting graphics parameters in this way
as setting “default” values for the parameters, which will be used by
all graphics functions unless an alternative value is given.
Note that calls to par() always affect the global values of graphics parameters, even when par() is called from within a function. This is often undesirable behavior: usually we want to set some graphics parameters, do some plotting, and then restore the original values so as not to affect the user’s R session. You can restore the initial values by saving the result of par() when making changes, and restoring the initial values when plotting is complete.
> oldpar <- par(col=4, lty=2)
... plotting commands ...
> par(oldpar)
To save and restore all settable[24] graphical parameters use
> oldpar <- par(no.readonly=TRUE)
... plotting commands ...
> par(oldpar)
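When you write your own plotting function, registering the restore with on.exit() means the user's settings come back even if plotting fails part-way. Here is a sketch with a hypothetical helper, dashed_blue_plot():

```r
# Hypothetical helper: change parameters only for the duration of one call
dashed_blue_plot <- function(x, y) {
  oldpar <- par(col = 4, lty = 2)   # set new values, keep the old ones
  on.exit(par(oldpar))              # restored even if plotting fails
  plot(x, y, type = "l")
}

dashed_blue_plot(1:10, (1:10)^2)
par("lty")   # back to the previous value ("solid" by default)
```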
Graphics parameters may also be passed to (almost) any graphics function as named arguments. This has the same effect as passing the arguments to the par() function, except that the changes only last for the duration of the function call. For example:
> plot(x, y, pch="+")
produces a scatterplot using a plus sign as the plotting character,
without changing the default plotting character for future plots.
Unfortunately, this is not implemented entirely consistently and it is sometimes necessary to set and reset graphics parameters using par().
The following sections detail many of the commonly-used graphical parameters. The R help documentation for the par() function provides a more concise summary; this is provided as a somewhat more detailed alternative.
Graphics parameters will be presented in the following form:
name=value
A description of the parameter’s effect. name is the name of the parameter, that is, the argument name to use in calls to par() or a graphics function. value is a typical value you might use when setting the parameter.
Note that axes is not a graphics parameter but an argument to a few plot methods: see xaxt and yaxt.
R plots are made up of points, lines, text and polygons (filled regions). Graphical parameters exist which control how these graphical elements are drawn, as follows:
pch="+"
Character to be used for plotting points. The default varies with graphics drivers, but it is usually a circle. Plotted points tend to appear slightly above or below the appropriate position unless you use "." as the plotting character, which produces centered points.
pch=4
When pch is given as an integer between 0 and 25 inclusive, a specialized plotting symbol is produced. To see what the symbols are, use the command
> legend(locator(1), as.character(0:25), pch = 0:25)
Those from 21 to 25 may appear to duplicate earlier symbols, but can be coloured in different ways: see the help on points and its examples.
In addition, pch can be a character or a number in the range 32:255 representing a character in the current font.
lty=2
Line types. Alternative line styles are not supported on all graphics
devices (and vary on those that do) but line type 1 is always a solid
line, line type 0 is always invisible, and line types 2 and onwards are
dotted or dashed lines, or some combination of both.
lwd=2
Line widths. Desired width of lines, in multiples of the “standard” line width. Affects axis lines as well as lines drawn with lines(), etc. Not all devices support this, and some have restrictions on the widths that can be used.
col=2
Colors to be used for points, lines, text, filled regions and images. A number from the current palette (see ?palette) or a named colour.
col.axis
col.lab
col.main
col.sub
The color to be used for axis annotation, x and y labels,
main and sub-titles, respectively.
font=2
An integer which specifies which font to use for text. If possible, device drivers arrange so that 1 corresponds to plain text, 2 to bold face, 3 to italic, 4 to bold italic and 5 to a symbol font (which includes Greek letters).
font.axis
font.lab
font.main
font.sub
The font to be used for axis annotation, x and y labels,
main and sub-titles, respectively.
adj=-0.1
Justification of text relative to the plotting position. 0 means left justify, 1 means right justify and 0.5 means to center horizontally about the plotting position. The actual value is the proportion of text that appears to the left of the plotting position, so a value of -0.1 leaves a gap of 10% of the text width between the text and the plotting position.
cex=1.5
Character expansion. The value is the desired size of text characters
(including plotting characters) relative to the default text size.
cex.axis
cex.lab
cex.main
cex.sub
The character expansion to be used for axis annotation, x and
y labels, main and sub-titles, respectively.
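The sketch below draws several of these parameters on one empty canvas: pch symbols 0 to 25 (with a fill colour for 21 to 25), line types and widths, and font faces combined with adj and cex. Positions and colours are arbitrary. Because it does not use locator(), it also runs non-interactively.

```r
plot(c(0, 27), c(0, 10), type = "n", axes = FALSE, xlab = "", ylab = "")

# Plotting symbols 0:25; for 21:25, col is the border and bg the fill
points(1:26, rep(9, 26), pch = 0:25, col = "blue", bg = "gold", cex = 1.5)
text(1:26, rep(8, 26), labels = 0:25, cex = 0.7)

# Line types 1 to 6 with increasing line width
for (i in 1:6)
  lines(c(1, 12), c(7 - i, 7 - i), lty = i, lwd = i / 2)

# Font faces, left justification (adj = 0) and character expansion
text(15, 6, "plain", font = 1, adj = 0)
text(15, 5, "bold", font = 2, adj = 0)
text(15, 4, "italic", font = 3, adj = 0)
text(15, 3, "bold italic", font = 4, adj = 0, cex = 1.2, col = 2)
```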
Many of R’s high-level plots have axes, and you can construct axes yourself with the low-level axis() graphics function. Axes have three main components: the axis line (line style controlled by the lty graphics parameter), the tick marks (which mark off unit divisions along the axis line) and the tick labels (which mark the units). These components can be customized with the following graphics parameters.
lab=c(5, 7, 12)
The first two numbers are the desired number of tick intervals on the x and y axes respectively. The third number is the desired length of axis labels, in characters (including the decimal point). Choosing a too-small value for this parameter may result in all tick labels being rounded to the same number!
las=1
Orientation of axis labels. 0 means always parallel to axis, 1 means always horizontal, and 2 means always perpendicular to the axis.
mgp=c(3, 1, 0)
Positions of axis components. The first component is the distance from
the axis label to the axis position, in text lines. The second
component is the distance to the tick labels, and the final component is
the distance from the axis position to the axis line (usually zero).
Positive numbers measure outside the plot region, negative numbers
inside.
tck=0.01
Length of tick marks, as a fraction of the size of the plotting region. When tck is small (less than 0.5) the tick marks on the x and y axes are forced to be the same size. A value of 1 gives grid lines. Negative values give tick marks outside the plotting region. Use tck=0.01 and mgp=c(1,-1.5,0) for internal tick marks.
xaxs="r"
yaxs="i"
Axis styles for the x and y axes, respectively. With styles "i" (internal) and "r" (the default) tick marks always fall within the range of the data, however style "r" leaves a small amount of space at the edges.
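Here is a sketch that applies several axis parameters to one plot, with illustrative values. Because they are passed as arguments rather than through par(), they affect only this call. An extra axis() call with tck = 1 draws grid lines.

```r
x <- seq(0, 10, by = 0.5)
plot(x, sin(x), type = "l",
     las = 1,               # horizontal tick labels
     mgp = c(2.5, 0.7, 0),  # bring axis title and tick labels closer to the axis
     tck = 0.02,            # short tick marks pointing into the plot region
     xaxs = "i")            # no extra space beyond the data range on x

# Horizontal grid lines: tick marks the full width of the plot region
axis(2, at = c(-1, 0, 1), tck = 1, lty = 3, col = "grey", labels = FALSE)
```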
A single plot in R is known as a figure and comprises a plot region surrounded by margins (possibly containing axis labels, titles, etc.) and (usually) bounded by the axes themselves.
A typical figure is
Graphics parameters controlling figure layout include:
mai=c(1, 0.5, 0.5, 0)
Widths of the bottom, left, top and right margins, respectively,
measured in inches.
mar=c(4, 2, 2, 1)
Similar to mai, except the measurement unit is text lines.
mar and mai are equivalent in the sense that setting one changes the value of the other. The default values chosen for this parameter are often too large; the right-hand margin is rarely needed, and neither is the top margin if no title is being used.
The bottom and left margins must be large enough to accommodate the axis and tick labels. Furthermore, the default is chosen without regard to the size of the device surface: for example, using the postscript() driver with the height=4 argument will result in a plot which is about 50% margin unless mar or mai are set explicitly. When multiple figures are in use (see below) the margins are reduced, however this may not be enough when many figures share the same page.
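The sketch below shows the equivalence of mar and mai. It sets one, reads the other back, and checks that every side uses the same inches-per-line factor. The device size (6 by 4 inches) and the margin values are arbitrary choices for the illustration.

```r
# Draw to a temporary PDF file so this also runs without a screen
pdf(tempfile(fileext = ".pdf"), width = 6, height = 4)

# Set the margins in text lines, then read them back in inches
par(mar = c(4, 2, 2, 1))
mai_now <- par("mai")
inches_per_line <- mai_now / par("mar")   # the same factor for all four sides

# Setting mai (inches) updates mar (lines) with the same conversion
par(mai = c(0.5, 0.5, 0.2, 0.1))
mar_now <- par("mar")

invisible(dev.off())
inches_per_line
mar_now
```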
R allows you to create an n by m array of figures on a
single page. Each figure has its own margins, and the array of figures
is optionally surrounded by an outer margin, as shown in the
following figure.
The graphical parameters relating to multiple figures are as follows:
mfcol=c(3, 2)
mfrow=c(2, 4)
Set the size of a multiple figure array. The first value is the number
of rows; the second is the number of columns. The only difference
between these two parameters is that setting mfcol causes figures to be
filled by column; mfrow fills by rows.
The layout in the Figure could have been created by setting
mfrow=c(3,2); the figure shows the page after four plots have been
drawn.
Setting either of these can reduce the base size of symbols and text
(controlled by par("cex") and the pointsize of the device). In a layout
with exactly two rows and columns the base size is reduced by a factor
of 0.83: if there are three or more of either rows or columns, the
reduction factor is 0.66.
mfg=c(2, 2, 3, 2)
Position of the current figure in a multiple figure environment. The first
two numbers are the row and column of the current figure; the last two
are the number of rows and columns in the multiple figure array. Set
this parameter to jump between figures in the array.
You can even set the last two numbers to values other than the true
ones, to get unequally-sized figures on the same page.
fig=c(4, 9, 1, 4)/10
Position of the current figure on the page. Values are the positions of
the left, right, bottom and top edges respectively, as a fraction of
the page measured from the bottom left corner. The example value would
be for a figure in the bottom right of the page.
Set this parameter for arbitrary positioning of figures within a page.
If you want to add a figure to a current page, use new=TRUE as well
(unlike S).
oma=c(2, 0, 3, 0)
omi=c(0, 0, 0.8, 0)
Size of outer margins. Like mar and mai, the first measures in text
lines and the second in inches, starting with the bottom margin and
working clockwise.
Outer margins are particularly useful for page-wise titles, etc. Text
can be added to the outer margins with the mtext() function with
argument outer=TRUE. There are no outer margins by default, however, so
you must create them explicitly using oma or omi.
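The multiple-figure parameters above can be combined in one short script. This is a sketch, not a recommended layout. It writes to a temporary PDF so it runs without a screen, and it records the values R reports so that you can compare them with the descriptions above. The panel contents and titles are placeholders.

```r
# Draw to a temporary PDF file so this also runs without a screen
pdf(tempfile(fileext = ".pdf"))

# mfrow: a 3 x 2 array filled by rows; three rows => base cex 0.66
par(mfrow = c(3, 2))
cex_3x2 <- par("cex")
for (i in 1:4) plot(1:10, main = paste("plot", i))
mfg_after_four <- par("mfg")     # row 2, column 2 of a 3 x 2 array

# mfg: jump back to the top-left figure and draw there (over plot 1)
par(mfg = c(1, 1))
plot(10:1, type = "l", main = "redrawn")

# New page: a 2 x 2 array (base cex 0.83) with 3 lines of outer margin on top
par(mfrow = c(2, 2), oma = c(0, 0, 3, 0))
cex_2x2 <- par("cex")
for (i in 1:4) plot(rnorm(20), main = paste("panel", i))
mtext("One title for the whole page", side = 3, line = 1, outer = TRUE)

# fig: a full-page plot plus an inset in the bottom-right corner
par(mfrow = c(1, 1), oma = c(0, 0, 0, 0))
plot(1:10, main = "main plot")
par(fig = c(4, 9, 1, 4) / 10, new = TRUE, mar = c(2, 2, 1, 1))
plot(10:1, type = "l")
fig_inset <- par("fig")

invisible(dev.off())
```

Setting mfrow again starts a new page, which is why the 2 x 2 panels do not overwrite the 3 x 2 array.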
More complicated arrangements of multiple figures can be produced by the
split.screen() and layout() functions, as well as by the grid and
lattice packages.
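As one illustration of layout(), this sketch makes a figure that spans a whole row above two smaller ones. The matrix numbers the figures in drawing order, and layout() returns how many figures it set up. The row heights are arbitrary choices.

```r
# Draw to a temporary PDF file so this also runs without a screen
pdf(tempfile(fileext = ".pdf"))

# Figure 1 spans the whole top row; figures 2 and 3 share the bottom row,
# which is given half the height of the top row
m <- matrix(c(1, 1,
              2, 3), nrow = 2, byrow = TRUE)
n <- layout(m, heights = c(2, 1))

for (i in seq_len(n)) plot(1:10, main = paste("figure", i))
invisible(dev.off())
n   # 3 figures
```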
R can generate graphics (of varying levels of quality) on almost any
type of display or printing device. Before this can begin, however,
R needs to be informed what type of device it is dealing with. This
is done by starting a device driver. The purpose of a device
driver is to convert graphical instructions from R (“draw a line,”
for example) into a form that the particular device can understand.
Device drivers are started by calling a device driver function. There
is one such function for every device driver: type help(Devices) for a
list of them all. For example, issuing the command
> postscript()
causes all future graphics output to be sent to the printer in
PostScript format. Some commonly-used device drivers are:
X11()
For use with the X11 window system on Unix-alikes
windows()
For use on Windows
quartz()
For use on macOS
postscript()
For printing on PostScript printers, or creating PostScript graphics
files.
pdf()
Produces a PDF file, which can also be included into PDF files.
png()
Produces a bitmap PNG file. (Not always available: see its help page.)
jpeg()
Produces a bitmap JPEG file, best used for image plots.
(Not always available: see its help page.)
When you have finished with a device, be sure to terminate the device
driver by issuing the command
> dev.off()
This ensures that the device finishes cleanly; for example in the case
of hardcopy devices this ensures that every page is completed and has
been sent to the printer. (This will happen automatically at the normal
end of a session.)
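The sketch below shows the open, plot, close cycle with a file device. It sends a plot of the built-in cars dataset to a temporary PDF and confirms that the file is complete once dev.off() has been called. capabilities() reports whether the optional bitmap devices were built into your copy of R.

```r
out <- tempfile(fileext = ".pdf")
pdf(out, width = 5, height = 4)   # open the device: output now goes to 'out'
plot(cars)
invisible(dev.off())              # close it so the file is finished

file.exists(out)
file.size(out) > 0
rawToChar(readBin(out, "raw", 4L))   # a PDF file starts with "%PDF"

# png() and jpeg() depend on how R was built: check before relying on them
capabilities(c("png", "jpeg"))
```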
By passing the file argument to the postscript() device driver
function, you may store the graphics in PostScript format in a file of
your choice. The plot will be in landscape orientation unless the
horizontal=FALSE argument is given, and you can control the size of the
graphic with the width and height arguments (the plot will be scaled as
appropriate to fit these dimensions). For example, the command
> postscript("file.ps", horizontal=FALSE, height=5, pointsize=10)
will produce a file containing PostScript code for a figure five inches
high, perhaps for inclusion in a document. It is important to note that
if the file named in the command already exists, it will be overwritten.
This is the case even if the file was only created earlier in the same
R session.
Many usages of PostScript output will be to incorporate the figure in
another document. This works best when encapsulated PostScript is
produced: R always produces conformant output, but only marks the output
as such when the onefile=FALSE argument is supplied. This unusual
notation stems from S-compatibility: it really means that the output
will be a single page (which is part of the EPSF specification). Thus
to produce a plot for inclusion use something like
> postscript("plot1.eps", horizontal=FALSE, onefile=FALSE, height=8, width=6, pointsize=10)
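You can see the "marking" described above in the first line of the file. This sketch writes the same plot (the built-in pressure dataset) twice to temporary files, once with onefile=FALSE and once with the default, and compares the headers. grDevices also has setEPS(), which sets EPS-friendly defaults in one call; it is not used here.

```r
# Encapsulated version: onefile = FALSE marks the output as EPSF
eps <- tempfile(fileext = ".eps")
postscript(eps, horizontal = FALSE, onefile = FALSE,
           height = 8, width = 6, pointsize = 10)
plot(pressure)
invisible(dev.off())

# Ordinary multi-page-capable PostScript for comparison
ps <- tempfile(fileext = ".ps")
postscript(ps, horizontal = FALSE, height = 8, width = 6, pointsize = 10)
plot(pressure)
invisible(dev.off())

eps_header <- readLines(eps, n = 1)
ps_header  <- readLines(ps, n = 1)
eps_header   # expected to mention EPSF
ps_header    # a plain PostScript header
```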
In advanced use of R it is often useful to have several graphics
devices in use at the same time. Of course only one graphics device can
accept graphics commands at any one time, and this is known as the
current device. When multiple devices are open, they form a
numbered sequence with names giving the kind of device at any position.
The main commands used for operating with multiple devices, and their
meanings are as follows:
X11() [UNIX]
windows(), win.printer(), win.metafile() [Windows]
quartz() [macOS]
postscript(), pdf(), png(), jpeg(), tiff(), bitmap(), …
Each new call to a device driver function opens a new graphics device,
thus extending by one the device list. This device becomes the current
device, to which graphics output will be sent.
dev.list()
Returns the numbers and names of all open devices. Device number 1 is
always the null device, which does not accept graphics commands at all;
it is not included in the list returned.
dev.next()
dev.prev()
Returns the number and name of the graphics device next to, or previous
to the current device, respectively.
dev.set(which=k)
Can be used to change the current graphics device to the one at position
k of the device list. Returns the number and label of the device.
dev.off(k)
Terminate the graphics device at point k of the device list. For some
devices, such as postscript devices, this will either print the file
immediately or correctly complete the file for later printing,
depending on how the device was initiated.
dev.copy(device, …, which=k)
dev.print(device, …, which=k)
Make a copy of the device k. Here device is a device function, such as
postscript, with extra arguments, if needed, specified by ‘…’.
dev.print is similar, but the copied device is immediately closed, so
that end actions, such as printing hardcopies, are immediately
performed.
graphics.off()
Terminate all graphics devices on the list, except the null device.
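The device-management functions above can be exercised with two throw-away PDF devices. This is a sketch under one assumption: file devices such as pdf() do not record a display list by default, so dev.control() switches recording on before dev.copy() is used.

```r
graphics.off()                        # start with no open devices

pdf(tempfile(fileext = ".pdf"))       # device A becomes current
first <- dev.cur()
pdf(tempfile(fileext = ".pdf"))       # device B is now current
second <- dev.cur()
dev.list()                            # both open devices; the null device is not listed

dev.set(first)                        # make device A current again
dev.control(displaylist = "enable")   # record A's drawing so it can be copied
plot(1:10, main = "drawn on device A")
current_after_set <- dev.cur()

# Copy A's plot into a new PDF device, then close the copy to finish the file
copy_file <- tempfile(fileext = ".pdf")
dev.copy(pdf, file = copy_file)
invisible(dev.off())

graphics.off()                        # close every remaining device
dev.list()                            # NULL: only the null device is left
```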
R does not have builtin capabilities for dynamic or interactive graphics,
e.g. rotating point clouds or “brushing” (interactively highlighting)
points. However, extensive dynamic graphics facilities are available in
the system GGobi by Swayne, Cook and Buja available from
and these can be accessed from R via the package rggobi, described at
http://ggobi.org/rggobi.html.
Also, package rgl provides ways to interact with 3D plots, for example
of surfaces.
All R functions and datasets are stored in packages. Only
when a package is loaded are its contents available. This is done both
for efficiency (the full list would take more memory and would take
longer to search than a subset), and to aid package developers, who are
protected from name clashes with other code.
The process of developing
packages is described in
Creating R packages in Writing R Extensions.
Here, we will describe them from a user’s point of view.
To see which packages are installed at your site, issue the command
> library()
with no arguments. To load a particular package (e.g., the boot
package containing functions from Davison & Hinkley (1997)), use a
command like
> library(boot)
Users connected to the Internet can use the install.packages() and
update.packages() functions (available through the Packages menu in the
Windows and macOS GUIs, see Installing packages in R Installation and
Administration) to install and update packages.
To see which packages are currently loaded, use
> search()
to display the search list. Some packages may be loaded but not
available on the search list (see Namespaces): these will be
included in the list given by
> loadedNamespaces()
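The difference between "loaded" and "on the search list" can be seen with a namespace that a plain session does not normally attach. This sketch uses grid, one of the standard packages, and assumes that nothing in your session has attached it already.

```r
"package:base" %in% search()         # TRUE: base is always on the search list

# Load the grid namespace without attaching it to the search list
requireNamespace("grid", quietly = TRUE)
"grid" %in% loadedNamespaces()       # TRUE: the namespace is loaded ...
"package:grid" %in% search()         # ... but it is normally not attached
```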
To see a list of all available help topics in an installed package,
use
> help.start()
to start the HTML help system, and then navigate to the package listing
in the Reference section.
The standard (or base) packages are considered part of the R
source code. They contain the basic functions that allow R to work,
and the datasets and standard statistical and graphical functions that
are described in this manual. They should be automatically available in
any R installation.
For a complete list, see
R packages in R FAQ.
There are thousands of contributed packages for R, written by many
different authors. Some of these packages implement specialized
statistical methods, others give access to data or hardware, and others
are designed to complement textbooks. Some (the recommended
packages) are distributed with every binary distribution of R. Most
are available for download from CRAN
(https://CRAN.R-project.org/ and its mirrors) and other
repositories such as Bioconductor (https://www.bioconductor.org/).
The R FAQ
contains a list of CRAN packages current at the time of release, but the
collection of available packages changes very frequently.
Packages have namespaces, which do three things: they allow the
package writer to hide functions and data that are meant only for
internal use, they prevent functions from breaking when a user (or other
package writer) picks a name that clashes with one in the package, and
they provide a way to refer to an object within a particular package.
For example, t() is the transpose function in R, but users might define
their own function named t. Namespaces prevent the user’s definition
from taking precedence, and breaking every function that tries to
transpose a matrix.
There are two operators that work with namespaces. The double-colon
operator :: selects definitions from a particular namespace. In the
example above, the transpose function will always be available as
base::t, because it is defined in the base package. Only functions that
are exported from the package can be retrieved in this way.
The triple-colon operator ::: may be seen in a few places in R code: it
acts like the double-colon operator but also allows access to hidden
objects. Users are more likely to use the getAnywhere() function, which
searches multiple packages.
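Here is a small sketch of the t() example. A user-defined t masks the transpose at top level, base::t still reaches the definition in the base namespace, and getAnywhere() reports both places where an object called t exists. The replacement function is deliberately useless and is removed at the end.

```r
m <- matrix(1:6, nrow = 2)

t <- function(x) "not a transpose"   # a user-defined t() that masks base::t
masked <- t(m)                        # the global definition is found first
transposed <- base::t(m)              # '::' picks the definition in base

# getAnywhere() lists every place an object called "t" can be found
where_t <- getAnywhere("t")$where
where_t

rm(t)                                 # drop the masking definition again
```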
Packages are often inter-dependent, and loading one may cause others to
be automatically loaded. The colon operators described above will also
cause automatic loading of the associated package.
When packages with
namespaces are loaded automatically they are not added to the search
list.
R has quite extensive facilities to access the OS under which it is
running: this allows it to be used as a scripting language and that
ability is much used by R itself, for example to install packages.
Because R’s own scripts need to work across all platforms,
considerable effort has gone into making the scripting facilities as
platform-independent as is feasible.
There are many functions to manipulate files and directories. Here are
pointers to some of the more commonly used ones.
To create an (empty) file or directory, use file.create or dir.create.
(These are the analogues of the POSIX utilities touch and mkdir.) For
temporary files and directories in the R session directory see
tempfile.
Files can be removed by either file.remove or unlink: the latter can
remove directory trees.
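The sketch below works through the create and remove functions. Everything happens under a fresh path from tempfile(), so nothing outside the session's temporary directory is touched.

```r
base_dir <- tempfile("demo_")          # a new path inside the session's tempdir()
sub_dir  <- file.path(base_dir, "sub", "deeper")
dir.create(sub_dir, recursive = TRUE)   # like 'mkdir -p'

f <- file.path(sub_dir, "empty.txt")
created <- file.create(f)               # like 'touch': creates an empty file
size_after_create <- file.size(f)       # 0 bytes

removed <- file.remove(f)               # deletes a single file
unlink(base_dir, recursive = TRUE)      # deletes the whole directory tree
dir.exists(base_dir)                    # FALSE
```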
For directory listings use list.files (also available as dir) or
list.dirs. These can select files using a regular expression: to
select by wildcards use Sys.glob.
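To show the two selection styles side by side, the sketch below creates three empty files in a temporary directory. It then picks out the .csv files once with a regular expression and once with a shell-style wildcard. The file names are invented for the example.

```r
d <- tempfile("listing_")
dir.create(d)
invisible(file.create(file.path(d, c("a.csv", "b.csv", "notes.txt"))))

# list.files(): 'pattern' is a regular expression; names are relative to d
by_regex <- list.files(d, pattern = "\\.csv$")
# Sys.glob(): shell-style wildcards; returns the full matching paths
by_glob <- basename(Sys.glob(file.path(d, "*.csv")))

by_regex
by_glob
unlink(d, recursive = TRUE)
```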
Many types of information on a filepath (including, for example, whether
it is a file or a directory) can be found by file.info.
There are several ways to find out if a file ‘exists’ (a file can exist
on the filesystem and not be visible to the current user). There are
functions file.exists, file.access and file_test with various versions
of this test: file_test is a version of the POSIX test command for those
familiar with shell scripting.
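The existence and type tests can be compared on a file and a directory. In this sketch the file is a temporary one written for the purpose. The exact size reported by file.info() depends on the platform's line ending.

```r
f <- tempfile(fileext = ".txt")
writeLines("hello", f)
d <- tempdir()

exists_f    <- file.exists(f)            # TRUE
is_file     <- file_test("-f", f)        # TRUE: a regular file (like 'test -f')
is_dir      <- file_test("-d", d)        # TRUE: a directory (like 'test -d')
dir_as_file <- file_test("-f", d)        # FALSE
readable    <- file.access(f, mode = 4) == 0   # 0 means the access test succeeded
info <- file.info(f)
info$isdir                               # FALSE
info$size                                # "hello" plus a line ending

unlink(f)
file.exists(f)                           # FALSE after removal
```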
Function file.copy is the R analogue of the POSIX command cp.
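One difference from a plain cp is that file.copy() does not overwrite an existing destination unless you ask it to. This sketch shows that behaviour with two temporary files.

```r
src <- tempfile(fileext = ".txt")
writeLines(c("line 1", "line 2"), src)
dest <- tempfile(fileext = ".txt")

first_copy  <- file.copy(src, dest)                    # TRUE: like 'cp src dest'
second_copy <- file.copy(src, dest)                    # FALSE: dest exists
forced_copy <- file.copy(src, dest, overwrite = TRUE)  # TRUE
same_content <- identical(readLines(src), readLines(dest))
same_content
```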
Choosing files can be done interactively by file.choose: the Windows
port has the more versatile functions choose.files and choose.dir and
there are similar functions in the tcltk package: tk_choose.files and
tk_choose.dir.
Functions file.show and file.edit will display and edit one or more
files in a way appropriate to the R port, using the facilities of a
console (such as RGui on Windows or R.app on macOS) if one is in use.
There is some support for links in the filesystem: see functions
file.link and Sys.readlink.
With a few exceptions, R relies on the underlying OS functions to
manipulate filepaths. Some aspects of this are allowed to depend on the
OS, and do, even down to the version of the OS.
There are POSIX
standards for how OSes should interpret filepaths and many R users
assume POSIX compliance: but Windows does not claim to be compliant and
other OSes may be less than completely compliant.
The following are some issues which have been encountered with filepaths.
Functions basename and dirname select parts of a file path: the
recommended way to assemble a file path from components is file.path.
Function path.expand does ‘tilde expansion’, substituting values for
home directories (the current user’s, and perhaps those of other users).
On filesystems with links, a single file can be referred to by many
filepaths. Function normalizePath will find a canonical filepath.
Windows has the concepts of short (‘8.3’) and long file names:
normalizePath will return an absolute path using long file names and
shortPathName will return a version using short names. The latter does
not contain spaces and uses backslash as the separator, so is sometimes
useful for exporting names from R.
File permissions are a related topic. R has support for the
POSIX concepts of read/write/execute permission for owner/group/all but
this may be only partially supported on the filesystem, so for example
on Windows only read-only files (for the account running the R
session) are recognized.
Access Control Lists (ACLs) are employed on several filesystems, but do
not have an agreed standard and R has no facilities to control them. Use
Sys.chmod to change permissions.
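The filepath helpers and Sys.chmod() can be tried together. This is a sketch that assumes a Unix-alike, because the octal modes printed by file.mode() are only partly meaningful on Windows. The path components are made up, and use_umask = FALSE makes the requested mode apply exactly.

```r
p <- file.path("data", "raw", "survey.csv")   # components joined with "/"
basename(p)                                    # "survey.csv"
dirname(p)                                     # "data/raw"
path.expand("~/notes.txt")                     # "~" replaced by the home directory

# normalizePath() resolves "." and ".." (and links) to a canonical absolute path
td <- tempfile("paths_")
dir.create(file.path(td, "sub"), recursive = TRUE)
same_dir <- normalizePath(file.path(td, "sub", "..")) == normalizePath(td)

# Permissions: make a file read-only, then writable by its owner again
f <- file.path(td, "perm.txt")
invisible(file.create(f))
Sys.chmod(f, mode = "0444", use_umask = FALSE)
mode_ro <- format(file.mode(f))                # "444"
Sys.chmod(f, mode = "0644", use_umask = FALSE)
mode_rw <- format(file.mode(f))                # "644"

unlink(td, recursive = TRUE)
```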
Functions system and system2 are used to invoke a system command and
optionally collect its output. system2 is a little more general but its
main advantage is that it is easier to write cross-platform code using
it.
system behaves differently on Windows from other OSes (because the API C
call of that name does). Elsewhere it invokes a shell to run the
command: the Windows port of R has a function shell to do that.
To find out if the OS includes a command, use Sys.which, which attempts
to do this in a cross-platform way (unfortunately it is not a standard
OS service).
Function shQuote will quote filepaths as needed for commands in the
current OS.
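Sys.which, shQuote and system2 can work together, as this sketch shows. It assumes a Unix-alike with an echo executable on the PATH and skips the call if Sys.which cannot find one. The quoted argument contains a space, which is the situation where quoting matters.

```r
# Sys.which() returns "" if the command cannot be found on the PATH
echo_path <- Sys.which("echo")

# shQuote() quotes an argument for a POSIX shell (type = "sh")
arg <- shQuote("hello world", type = "sh")
arg   # "'hello world'"

if (nzchar(echo_path)) {
  # stdout = TRUE captures the command's output as a character vector
  out <- system2("echo", arg, stdout = TRUE)
  print(out)   # "hello world"
}
```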
Recent versions of R have extensive facilities to read and write
compressed files, often transparently. Reading of files in R is to a
very large extent done by connections, and the file function, which is
used to open a connection to a file (or a URL), is able to identify the
compression used from the ‘magic’ header of the file.
The type of compression which has been supported for longest is gzip
compression, and that remains a good general compromise. Files
compressed by the earlier Unix compress utility can also be read, but
these are becoming rare. Two other forms of compression, those of the
bzip2 and xz utilities, are also available. These generally achieve
higher rates of compression (depending on the file, much higher) at the
expense of slower decompression and much slower compression.
There is some confusion between xz and lzma compression (see
https://en.wikipedia.org/wiki/Xz and
https://en.wikipedia.org/wiki/LZMA): R can read files compressed by
most versions of either.
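You can check the transparent reading described above by writing the same text with two compressors and reading both back with plain readLines() on the file name. That call goes through file(), which detects the compression from the header. The repeated line of text is arbitrary filler chosen to compress well.

```r
txt <- rep("R can read and write compressed files.", 200)   # 7800 bytes as plain text

gz <- tempfile(fileext = ".gz")
xz <- tempfile(fileext = ".xz")
writeLines(txt, gzfile(gz))   # gzip compression
writeLines(txt, xzfile(xz))   # xz compression

# No special reader is needed: the compression is recognised automatically
identical(readLines(gz), txt)
identical(readLines(xz), txt)

file.size(gz)   # far smaller than 7800 bytes
file.size(xz)
```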
File archives are single files which contain a collection of files, the
most common ones being ‘tarballs’ and zip files as used to distribute
R packages. R can list and unpack both (see functions untar and unzip)
and create both (for zip with the help of an external program).
The following session is intended to introduce to you some features of
the R environment by using them. Many features of the system will be
unfamiliar and puzzling at first, but this puzzlement will soon
disappear.
Start R appropriately for your platform (see Invoking R).
The R program begins, with a banner.
(Within R code, the prompt on the left hand side will not be shown to
avoid confusion.)
help.start()
Start the HTML interface to on-line help (using a web browser
available at your machine). You should briefly explore the features of
this facility with the mouse.
Iconify the help window and move on to the next part.
x <- rnorm(50)
y <- rnorm(x)
Generate two pseudo-random normal vectors of x- and
y-coordinates.
plot(x, y)
Plot the points in the plane. A graphics window will appear automatically.
ls()
See which R objects are now in the R workspace.
rm(x, y)
Remove objects no longer needed. (Clean up).
x <- 1:20
Make x = (1, 2, ..., 20).
w <- 1 + sqrt(x)/2
A ‘weight’ vector of standard deviations.
dummy <- data.frame(x=x, y= x + rnorm(x)*w)
dummy
Make a data frame of two columns, x and y, and look
at it.
fm <- lm(y ~ x, data=dummy)
summary(fm)
Fit a simple linear regression and look at the analysis. With y to the
left of the tilde, we are modelling y dependent on x.
fm1 <- lm(y ~ x, data=dummy, weights=1/w^2)
summary(fm1)
Since we know the standard deviations, we can do a weighted regression.
attach(dummy)
Make the columns in the data frame visible as variables.
lrf <- lowess(x, y)
Make a nonparametric local regression function.
plot(x, y)
Standard point plot.
lines(x, lrf$y)
Add in the local regression.
abline(0, 1, lty=3)
The true regression line: (intercept 0, slope 1).
abline(coef(fm))
Unweighted regression line.
abline(coef(fm1), col = "red")
Weighted regression line.
detach()
Remove data frame from the search path.
plot(fitted(fm), resid(fm),
xlab="Fitted values",
ylab="Residuals",
main="Residuals vs Fitted")
A standard regression diagnostic plot to check for heteroscedasticity.
Can you see it?
qqnorm(resid(fm), main="Residuals Rankit Plot")
A normal scores plot to check for skewness, kurtosis and outliers. (Not
very useful here.)
rm(fm, fm1, lrf, x, dummy)
Clean up again.
The next section will look at data from the classical experiment of
Michelson to measure the speed of light. This dataset is available in
the morley object, but we will read it to illustrate the read.table
function.
filepath <- system.file("data", "morley.tab" , package="datasets")
filepath
Get the path to the data file.
file.show(filepath)
Optional. Look at the file.
mm <- read.table(filepath)
mm
Read in the Michelson data as a data frame, and look at it. There are
five experiments (column Expt) and each has 20 runs (column Run), and
column Speed is the recorded speed of light, suitably coded.
mm$Expt <- factor(mm$Expt)
mm$Run <- factor(mm$Run)
Change Expt and Run into factors.
attach(mm)
Make the data frame visible at position 2 (the default).
plot(Expt, Speed, main="Speed of Light Data", xlab="Experiment No.")
Compare the five experiments with simple boxplots.
fm <- aov(Speed ~ Run + Expt, data=mm)
summary(fm)
Analyze as a randomized block, with ‘runs’ and ‘experiments’ as factors.
fm0 <- update(fm, . ~ . - Run)
anova(fm0, fm)
Fit the sub-model omitting ‘runs’, and compare using a formal analysis
of variance.
detach()
rm(fm, fm0)
Clean up before moving on.
We now look at some more graphical features: contour and image plots.
x <- seq(-pi, pi, len=50)
y <- x
x is a vector of 50 equally spaced values in the interval [-pi, pi].
y is the same.
f <- outer(x, y, function(x, y) cos(y)/(1 + x^2))
f is a square matrix, with rows and columns indexed by x
and y respectively, of values of the function
cos(y)/(1 + x^2).
oldpar <- par(no.readonly = TRUE)
par(pty="s")
Save the plotting parameters and set the plotting region to “square”.
contour(x, y, f)
contour(x, y, f, nlevels=15, add=TRUE)
Make a contour map of f; add in more lines for more detail.
fa <- (f-t(f))/2
fa is the “asymmetric part” of f. (t() is transpose.)
contour(x, y, fa, nlevels=15)
Make a contour plot, …
par(oldpar)
… and restore the old graphics parameters.
image(x, y, f)
image(x, y, fa)
Make some high density image plots, (of which you can get
hardcopies if you wish), …
objects(); rm(x, y, f, fa)
… and clean up before moving on.
R can do complex arithmetic, also.
th <- seq(-pi, pi, len=100)
z <- exp(1i*th)
1i is used for the complex number i.
par(pty="s")
plot(z, type="l")
Plotting complex arguments means plotting imaginary versus real parts.
This should be a circle.
w <- rnorm(100) + rnorm(100)*1i
Suppose we want to sample points within the unit circle. One method
would be to take complex numbers with standard normal real and imaginary
parts …
w <- ifelse(Mod(w) > 1, 1/w, w)
… and to map any outside the circle onto their reciprocal.
plot(w, xlim=c(-1,1), ylim=c(-1,1), pch="+",xlab="x", ylab="y")
lines(z)
All points are inside the unit circle, but the distribution is not
uniform.
w <- sqrt(runif(100))*exp(2*pi*runif(100)*1i)
plot(w, xlim=c(-1,1), ylim=c(-1,1), pch="+", xlab="x", ylab="y")
lines(z)
The second method uses the uniform distribution. The points should now
look more evenly spaced over the disc.
rm(th, w, z)
Clean up again.
q()
Quit the R program. You will be asked if you want to save the R
workspace, and for an exploratory session like this, you probably do not
want to save it.
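You can check numerically the session's claim that the first way of sampling the unit disc is not uniform while the second is, using many more points than the session does. If points are uniform on the disc, the share within radius 0.2 is the area ratio 0.2^2 = 0.04. For the first method, |w|^2 is exponential with mean 2, so the share comes out near 1 - exp(-0.02), roughly 0.02. The seed and sample size are arbitrary choices for this sketch.

```r
set.seed(1)   # make the run reproducible
n <- 1e5

# Method 1 from the session: normal real and imaginary parts, with points
# outside the unit circle mapped to their reciprocals
w1 <- rnorm(n) + rnorm(n)*1i
w1 <- ifelse(Mod(w1) > 1, 1/w1, w1)

# Method 2 from the session: radius sqrt(U) and a uniform angle
w2 <- sqrt(runif(n))*exp(2*pi*runif(n)*1i)

# Share of points within radius 0.2 of the centre
mean(Mod(w1) < 0.2)   # roughly 0.02: too few points near the centre
mean(Mod(w2) < 0.2)   # roughly 0.04, as for a uniform distribution
```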
Users of R on Windows or macOS should read the OS-specific section
first, but command-line use is also supported.
When working at a command line on UNIX or Windows, the command ‘R’
can be used both for starting the main R program in the form
R [options] [<infile] [>outfile],
or, via the R CMD interface, as a wrapper to various R tools (e.g., for
processing files in R documentation format or manipulating add-on
packages) which are not intended to be called “directly”.
At the Windows command-line, Rterm.exe is preferred to R.
You need to ensure that either the environment variable TMPDIR is unset
or it points to a valid place to create temporary files and
directories.
Most options control what happens at the beginning and at the end of an
R session. The startup mechanism is as follows (see also the on-line
help for topic ‘Startup’ for more information, and the section below
for some Windows-specific details).
R_ENVIRON
; if this is unset, R_HOME/etc/Renviron.site
is used (if it exists). The user file is the one pointed to by the
environment variable R_ENVIRON_USER
if this is set; otherwise,
files .Renviron in the current or in the user’s home directory
(in that order) are searched for. These files should contain lines of
the form ‘name=value’. (See help("Startup")
for
a precise description.) Variables you might want to set include
R_PAPERSIZE
(the default paper size), R_PRINTCMD
(the
default print command) and R_LIBS
(specifies the list of R
library trees searched for add-on packages).
Then R searches for the site-wide startup profile unless the
command-line option --no-site-file was given. The name of this file is
taken from the value of the R_PROFILE environment variable. If that
variable is unset, the default R_HOME/etc/Rprofile.site is used if this
exists.
Then, unless --no-init-file was given, R searches for a user profile
and sources it. The name of this file is taken from the environment
variable R_PROFILE_USER; if unset, a file called .Rprofile in the
current directory or in the user’s home directory (in that order) is
searched for.

Finally, if a function .First() exists, it is executed. This function
(as well as .Last(), which is executed at the end of the R session) can
be defined in the appropriate startup profiles, or reside in .RData.
In addition, there are options for controlling the memory available to
the R process (see the on-line help for topic ‘Memory’ for more
information). Users will not normally need to use these unless they
are trying to limit the amount of memory used by R.
R accepts the following command-line options.
--help
-h
Print short help message to standard output and exit successfully.
--version
Print version information to standard output and exit successfully.
--encoding=enc
Specify the encoding to be assumed for input from the console or
stdin. This needs to be an encoding known to iconv: see its help
page. (--encoding enc is also accepted.) The input is re-encoded to
the locale R is running in and needs to be representable in the
latter’s encoding (so e.g. you cannot re-encode Greek text in a French
locale unless that locale uses the UTF-8 encoding).
RHOME
Print the path to the R “home directory” to standard output and exit
successfully. Apart from the front-end shell script and the man page,
R installation puts everything (executables, packages, etc.) into this
directory.
--save
--no-save
Control whether data sets should be saved or not at the end of the R
session. If neither is given in an interactive session, the user is
asked for the desired behavior when ending the session with q(); in
non-interactive use one of these must be specified or implied by some
other option (see below).
--no-environ
Do not read any user file to set environment variables.
--no-site-file
Do not read the site-wide profile at startup.
--no-init-file
Do not read the user’s profile at startup.
--restore
--no-restore
--no-restore-data
Control whether saved images (file .RData in the directory where R was
started) should be restored at startup or not. The default is to
restore. (--no-restore implies all the specific --no-restore-*
options.)
--no-restore-history
Control whether the history file (normally file .Rhistory in the
directory where R was started, but can be set by the environment
variable R_HISTFILE) should be restored at startup or not. The default
is to restore.
--no-Rconsole
(Windows only) Prevent loading the Rconsole file at startup.
--vanilla
Combine --no-save, --no-environ, --no-site-file, --no-init-file and
--no-restore. Under Windows, this also includes --no-Rconsole.
-f file
--file=file
(not Rgui.exe) Take input from file: ‘-’ means stdin. Implies
--no-save unless --save has been set. On a Unix-alike, shell
metacharacters should be avoided in file (but spaces are allowed).
-e expression
(not Rgui.exe) Use expression as an input line. One or more -e options
can be used, but not together with -f or --file. Implies --no-save
unless --save has been set. (There is a limit of 10,000 bytes on the
total length of expressions used in this way. Expressions containing
spaces or shell metacharacters will need to be quoted.)
--no-readline
(UNIX only) Turn off command-line editing via readline. This is useful
when running R from within Emacs using the ESS (“Emacs Speaks
Statistics”) package. See The command-line editor for more
information. Command-line editing is enabled for default interactive
use (see --interactive). This option also affects tilde-expansion: see
the help for path.expand.
--min-vsize=N
--min-nsize=N
For expert use only: set the initial trigger sizes for garbage
collection of vector heap (in bytes) and cons cells (number)
respectively. Suffix ‘M’ specifies megabytes or millions of cells
respectively. The defaults are 6Mb and 350k respectively and can also
be set by environment variables R_VSIZE and R_NSIZE.
--max-ppsize=N
Specify the maximum size of the pointer protection stack as N
locations. This defaults to 10000, but can be increased to allow large
and complicated calculations to be done. Currently the maximum value
accepted is 100000.
--quiet
--silent
-q
Do not print out the initial copyright and welcome messages.
--no-echo
Make R run as quietly as possible. This option is intended to support
programs which use R to compute results for them. It implies --quiet
and --no-save.
--interactive
(UNIX only) Assert that R really is being run interactively even if
input has been redirected: use this if input is from a FIFO or pipe and
fed from an interactive program. (The default is to deduce that R is
being run interactively if and only if stdin is connected to a terminal
or pty.) Using -e, -f or --file asserts non-interactive use even if
--interactive is given.

Note that this does not turn on command-line editing.
--ess
(Windows only) Set Rterm up for use by R-inferior-mode in ESS,
including asserting interactive use (without the command-line editor)
and no buffering of stdout.
--verbose
Print more information about progress, and in particular set R’s
option verbose to TRUE. R code uses this option to control the
printing of diagnostic messages.
-d name
--debugger=name
(UNIX only) Run R through debugger name. For most debuggers (the
exceptions are valgrind and recent versions of gdb), further
command-line options are disregarded, and should instead be given when
starting the R executable from inside the debugger.
-g type
--gui=type
(UNIX only) Use type as graphical user interface (note that this also
includes interactive graphics). Currently, possible values for type
are ‘X11’ (the default) and, provided that ‘Tcl/Tk’ support is
available, ‘Tk’. (For back-compatibility, ‘x11’ and ‘tk’ are
accepted.)
--arch=name
(UNIX only) Run the specified sub-architecture.
--args
This flag does nothing except cause the rest of the command line to be
skipped: this can be useful to retrieve values from it with
commandArgs(TRUE).
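To illustrate --args, the hypothetical helper below (parse_kv_args is not part of R) collects trailing ‘name=value’ arguments from commandArgs(trailingOnly = TRUE) into a named character vector. It is a sketch that silently ignores arguments without an ‘=’.

```r
# Hypothetical helper, not part of base R
parse_kv_args <- function(args = commandArgs(trailingOnly = TRUE)) {
  kv <- args[grepl("=", args, fixed = TRUE)]  # keep only name=value arguments
  values <- sub("^[^=]*=", "", kv)            # text after the first '='
  names(values) <- sub("=.*$", "", kv)        # text before the first '='
  values
}

# As if R had been started with:
#   R --no-echo -f script.R --args n=10 mode=fast verbose
opts <- parse_kv_args(c("n=10", "mode=fast", "verbose"))
opts
as.integer(opts[["n"]])
```

Values always arrive as character strings, so convert them explicitly before use.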
Note that input and output can be redirected in the usual way (using
‘<’ and ‘>’), but the line length limit of 4095 bytes still applies.
Warning and error messages are sent to the error channel (stderr).
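Several of the options above have counterparts that you can query from inside R. The sketch below uses only standard base-R functions and changes nothing:

- R.home() gives the directory printed by RHOME.
- interactive() reflects what was decided about interactive use.
- getOption("verbose") is the option set by --verbose.
- gc() reports the two heaps whose triggers --min-vsize and --min-nsize set.
- iconv() shows why input must be representable in the target encoding.

```r
R.home()                  # the directory that `R RHOME` prints
interactive()             # FALSE under Rscript, -e, -f or redirected input
getOption("verbose")      # TRUE when R was started with --verbose
usage <- gc()             # rows "Ncells" (cons cells) and "Vcells" (vector heap)
usage[, "gc trigger"]     # current collection triggers for each heap

# Latin-1 has 'é' but no Greek letters: unrepresentable input gives NA
iconv("caf\u00e9", from = "UTF-8", to = "latin1")
iconv("\u03b1", from = "UTF-8", to = "latin1")
```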
The command R CMD allows the invocation of various tools which are
useful in conjunction with R, but not intended to be called “directly”.
The general form is

R CMD command args

where command is the name of the tool and args the arguments passed on
to it.
Currently, the following tools are available.
BATCH
Run R in batch mode. Runs R --restore --save with possibly further
options (see ?BATCH).
COMPILE
(UNIX only) Compile C, C++, Fortran … files for use with R.
SHLIB
Build shared library for dynamic loading.
INSTALL
Install add-on packages.
REMOVE
Remove add-on packages.
build
Build (that is, package) add-on packages.
check
Check add-on packages.
LINK
(UNIX only) Front-end for creating executable programs.
Rprof
Post-process R profiling files.
Rdconv
Rd2txt
Convert Rd format to various other formats, including HTML, LaTeX,
plain text, and extracting the examples. Rd2txt can be used as
shorthand for Rdconv -t txt.
Rd2pdf
Convert Rd format to PDF.
Stangle
Extract S/R code from Sweave or other vignette documentation.
Sweave
Process Sweave or other vignette documentation.
Rdiff
Diff R output ignoring headers etc.
config
Obtain configuration information.
javareconf
(Unix only) Update the Java configuration variables.
rtags
(Unix only) Create Emacs-style tag files from C, R, and Rd files.
open
(Windows only) Open a file via Windows’ file associations.
texify
(Windows only) Process (La)TeX files with R’s style files.
Use

R CMD command --help

to obtain usage information for each of the tools accessible via the
R CMD interface.
In addition, you can use options --arch=, --no-environ, --no-init-file,
--no-site-file and --vanilla between R and CMD: these affect any R
processes run by the tools. (Here --vanilla is equivalent to
--no-environ --no-site-file --no-init-file.) However, note that R CMD
does not of itself use any R startup files (in particular, neither user
nor site Renviron files), and all of the R processes run by these tools
(except BATCH) use --no-restore. Most use --vanilla and so invoke no R
startup files: the current exceptions are INSTALL, REMOVE, Sweave and
SHLIB (which uses --no-site-file --no-init-file).
You can also use

R CMD cmd args

for any other executable cmd on the path or given by an absolute
filepath: this is useful to have the same environment as R or the
specific commands run under, for example to run ldd or pdflatex. Under
Windows cmd can be an executable or a batch file, or if it has
extension .sh or .pl the appropriate interpreter (if available) is
called to run it.
There are two ways to run R under Windows. Within a terminal window
(e.g. cmd.exe or a more capable shell), the methods described in the
previous section may be used, invoking R by R.exe or more directly by
Rterm.exe. For interactive use, there is a console-based GUI
(Rgui.exe).
The startup procedure under Windows is very similar to that under
UNIX, but references to the ‘home directory’ need to be clarified, as
this is not always defined on Windows. If the environment variable
R_USER is defined, that gives the home directory. Next, if the
environment variable HOME is defined, that gives the home directory.
After those two user-controllable settings, R tries to find system
defined home directories. It first tries to use the Windows ‘personal’
directory (typically My Documents in recent versions of Windows). If
that fails, and environment variables HOMEDRIVE and HOMEPATH are
defined (and they normally are) these define the home directory.
Failing all those, the home directory is taken to be the starting
directory.
You need to ensure that the environment variables TMPDIR, TMP and
TEMP are either all unset or that one of them points to a valid place
to create temporary files and directories.
Environment variables can be supplied as ‘name=value’
pairs on the command line.
If there is an argument ending .RData (in upper or lower case) it is
interpreted as the path to the workspace to be restored: it implies
--restore and sets the working directory to the parent of the named
file. (This mechanism is used for drag-and-drop and file association
with RGui.exe, but also works for Rterm.exe. If the named file does not
exist it sets the working directory if the parent directory exists.)
The following additional command-line options are available when
invoking RGui.exe.
--mdi
--sdi
--no-mdi
Control whether Rgui will operate as an MDI program (with multiple
child windows within one main window) or an SDI application (with
multiple top-level windows for the console, graphics and pager). The
command-line setting overrides the setting in the user’s Rconsole file.
--debug
Enable the “Break to debugger” menu item in Rgui, and trigger a break
to the debugger during command line processing.
Under Windows with R CMD you may also specify your own .bat, .exe, .sh
or .pl file. It will be run under the appropriate interpreter (Perl for
.pl) with several environment variables set appropriately, including
R_HOME, R_OSTYPE, PATH, BSTINPUTS and TEXINPUTS. For example, if you
already have latex.exe on your path, then

R CMD latex.exe mydoc

will run LaTeX on mydoc.tex, with the path to R’s share/texmf macros
appended to TEXINPUTS. (Unfortunately, this does not help with the
MiKTeX build of LaTeX, but

R CMD texify mydoc

will work in that case.)
There are two ways to run R under macOS. Within a Terminal.app window
by invoking R, the methods described in the first subsection apply.
There is also a console-based GUI (R.app) that by default is installed
in the Applications folder on your system. It is a standard
double-clickable macOS application.
The startup procedure under macOS is very similar to that under UNIX,
but R.app does not make use of command-line arguments.
The ‘home
directory’ is the one inside the R.framework, but the startup and
current working directory are set as the user’s home directory unless a
different startup directory is given in the Preferences window
accessible from within the GUI.
If you just want to run a file foo.R of R commands, the recommended way
is to use R CMD BATCH foo.R. If you want to run this in the background
or as a batch job use OS-specific facilities to do so: for example in
most shells on Unix-alike OSes R CMD BATCH foo.R & runs a background
job.
You can pass parameters to scripts via additional arguments on the
command line: for example (where the exact quoting needed will depend
on the shell in use)

R CMD BATCH "--args arg1 arg2" foo.R &

will pass arguments to a script which can be retrieved as a character
vector by

args <- commandArgs(TRUE)
This is made simpler by the alternative front-end Rscript, which can be
invoked by

Rscript foo.R arg1 arg2
and this can also be used to write executable script files like (at
least on Unix-alikes, and in some Windows shells)

#! /path/to/Rscript
args <- commandArgs(TRUE)
...
q(status=<exit status code>)

If this is entered into a text file runfoo and this is made executable
(by chmod 755 runfoo), it can be invoked for different arguments by

runfoo arg1 arg2
For further options see help("Rscript"). This writes R output to stdout
and stderr, and this can be redirected in the usual way for the shell
running the command.
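To see argument passing and the exit status end to end, the following sketch runs Rscript from within R using system2(). It assumes a Unix-alike layout in which the front-end is R.home("bin")/Rscript (on Windows it is Rscript.exe). The script contents are made up for illustration.

```r
# Write a small script that prints its trailing arguments in reverse order
script <- tempfile(fileext = ".R")
writeLines(c("args <- commandArgs(trailingOnly = TRUE)",
             "writeLines(rev(args))"), script)

# The Rscript front-end of the running installation (Unix-alike layout)
rscript <- file.path(R.home("bin"), "Rscript")

# stdout = TRUE captures the script's standard output as a character vector
out <- system2(rscript, c(shQuote(script), "arg1", "arg2"), stdout = TRUE)
out

# Without capturing output, system2() returns the exit status, here the
# value passed to q(status = ) by an expression given with -e
status <- system2(rscript, c("-e", shQuote("q(status = 3)")))
status
```

system2() quotes the command itself, so only the arguments need shQuote().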
If you do not wish to hardcode the path to Rscript but have it in your
path (which is normally the case for an installed R except on Windows,
but e.g. macOS users may need to add /usr/local/bin to their path), use

#! /usr/bin/env Rscript
...
At least in Bourne and bash shells, the #! mechanism does not allow
extra arguments like #! /usr/bin/env Rscript --vanilla.
One thing to consider is what stdin() refers to. It is commonplace to
write R scripts with segments like

chem <- scan(n=24)
2.90 3.10 3.40 3.40 3.70 3.70 2.80 2.50 2.40 2.40 2.70 2.20
5.28 3.37 3.03 3.03 28.95 3.77 3.40 2.20 3.50 3.60 3.70 3.70

and stdin() refers to the script file to allow such traditional usage.
If you want to refer to the process’s stdin, use "stdin" as a file
connection, e.g. scan("stdin", ...).
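In an interactive session there is no script file for stdin() to refer to. For experimenting there, you can read the same inline data from a character string with scan(text = ...). This is a sketch of a convenient alternative, not a replacement for the idiom above.

```r
# The 24 values from the example above, supplied as a string
chem <- scan(text = "2.90 3.10 3.40 3.40 3.70 3.70 2.80 2.50 2.40 2.40 2.70 2.20
                     5.28 3.37 3.03 3.03 28.95 3.77 3.40 2.20 3.50 3.60 3.70 3.70",
             quiet = TRUE)
length(chem)   # 24
```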
Another way to write executable script files (suggested by François
Pinard) is to use a here document like

#!/bin/sh
[environment variables can be set here]
R --no-echo [other options] <<EOF

   R program goes here...

EOF

but here stdin() refers to the program source and "stdin" will not be
usable.
Short scripts can be passed to Rscript on the command line via the -e
flag. (Empty scripts are not accepted.)
Note that on a Unix-alike the input filename (such as foo.R)
should not contain spaces nor shell metacharacters.
When the GNU readline library is available at the time R is configured
for compilation under UNIX, an inbuilt command line editor allowing
recall, editing and re-submission of prior commands is used. Note that
other versions of readline exist and may be used by the inbuilt command
line editor: this is most common on macOS. You can find out which
version (if any) is available by running extSoftVersion() in an R
session.
It can be disabled (useful for usage with ESS 25) using the startup option
--no-readline.
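Two base-R calls help check what a given build and session provide. extSoftVersion() includes an entry for the readline library. capabilities("cledit") reports whether command-line editing is active right now; it is FALSE in non-interactive sessions such as Rscript.

```r
extSoftVersion()["readline"]   # version of the readline library in use, if any
capabilities("cledit")          # is command-line editing active in this session?
```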
Windows versions of R have somewhat simpler command-line editing: see
‘Console’ under the ‘Help’ menu of the GUI, and the file README.Rterm
for command-line editing under Rterm.exe.
When using R with GNU26 readline capabilities, the functions described
below are available, as well as others (probably) documented in
man readline or info readline on your system.
Many of these use either Control or Meta characters. Control
characters, such as Control-m, are obtained by holding the
CTRL down while you press the m key, and are written as
C-m below. Meta characters, such as Meta-b, are typed by
holding down META27 and pressing b, and written as M-b
in the following. If your terminal does not have a META key
enabled, you can still type Meta characters using two-character
sequences starting with ESC. Thus, to enter M-b, you could
type ESCb. The ESC character sequences are also
allowed on terminals with real Meta keys. Note that case is significant
for Meta characters.
Some but not all versions28 of readline
will recognize resizing of the terminal window so this is best avoided.
The R program keeps a history of the command lines you type,
including the erroneous lines, and commands in your history may be
recalled, changed if necessary, and re-submitted as new commands.
In
Emacs-style command-line editing any straight typing you do while in
this editing phase causes the characters to be inserted in the command
you are editing, displacing any characters to the right of the cursor.
In vi mode character insertion mode is started by M-i or
M-a, characters are typed and insertion mode is finished by typing
a further ESC. (The default is Emacs-style, and only that is
described here: for vi mode see the readline
documentation.)
Pressing the RET command at any time causes the command to be
re-submitted.
Other editing actions are summarized in the following table.
C-p
Go to the previous command (backwards in the history).
C-n
Go to the next command (forwards in the history).
C-r text
Find the last command with the text string in it. This can be
cancelled by C-g (and on some versions of R by C-c).

On most terminals, you can also use the up and down arrow keys instead
of C-p and C-n, respectively.
C-a
Go to the beginning of the command.
C-e
Go to the end of the line.
M-b
Go back one word.
M-f
Go forward one word.
C-b
Go back one character.
C-f
Go forward one character.

On most terminals, you can also use the left and right arrow keys
instead of C-b and C-f, respectively.
text
Insert text at the cursor.
C-f text
Append text after the cursor.
DEL
Delete the previous character (left of the cursor).
C-d
Delete the character under the cursor.
M-d
Delete the rest of the word under the cursor, and “save” it.
C-k
Delete from cursor to end of command, and “save” it.
C-y
Insert (yank) the last “saved” text here.
C-t
Transpose the character under the cursor with the next.
M-l
Change the rest of the word to lower case.
M-u
Change the rest of the word to upper case.
RET
Re-submit the command to R.
The final RET terminates the command line editing sequence.
The readline key bindings can be customized in the usual way via a
~/.inputrc file. These customizations can be conditioned on application
R, that is by including a section like

$if R
  "\C-xd": "q('no')\n"
$endif

which makes the key sequence C-x d type q('no') and submit it, quitting
R without saving.
D. M. Bates and D. G. Watts (1988),
Nonlinear Regression Analysis and Its Applications.
John Wiley & Sons, New York.
Richard A. Becker, John M. Chambers and Allan R. Wilks (1988),
The New S Language.
Chapman & Hall, New York.
This book is often called the “Blue Book”.
John M. Chambers and Trevor J. Hastie eds. (1992),
Statistical Models in S.
Chapman & Hall, New York.
This is also called the “White Book”.
John M. Chambers (1998),
Programming with Data.
Springer, New York.
This is also called the “Green Book”.
A. C. Davison and D. V. Hinkley (1997),
Bootstrap Methods and Their Applications.
Cambridge University Press.
Annette J. Dobson (1990),
An Introduction to Generalized Linear Models.
Chapman and Hall, London.
Peter McCullagh and John A. Nelder (1989),
Generalized Linear Models.
Second edition. Chapman and Hall, London.
John A. Rice (1995),
Mathematical Statistics and Data Analysis.
Second edition. Duxbury Press, Belmont, CA.
S. D. Silvey (1970),
Statistical Inference.
Penguin, London.
ACM Software Systems award, 1998:
https://awards.acm.org/award_winners/chambers_6640862.cfm.
For portable R code (including that to
be used in R packages) only A–Z, a–z, and 0–9 should be used.
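You can write a rough check for such portable names with a regular expression. The helper below is hypothetical (is_portable_name is not part of R). It uses perl = TRUE so that [A-Za-z] means ASCII letters only, and it does not reject reserved words such as if.

```r
# Hypothetical helper: TRUE for names built only from ASCII letters, digits,
# '.' and '_' that also start like a syntactic name (a letter, or a '.'
# not followed by a digit). Reserved words are not checked.
is_portable_name <- function(x) {
  grepl("^([A-Za-z]|[.]($|[A-Za-z._]))[A-Za-z0-9._]*$", x, perl = TRUE)
}

is_portable_name(c("x1", ".hidden", "caf\u00e9", "2x", ".2x", "my_var"))
# TRUE TRUE FALSE FALSE FALSE TRUE
```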
not inside strings,
nor within the argument list of a function definition
some of the
consoles will not allow you to enter more, and amongst those which do
some will silently discard the excess and some will use it as the start
of the next line.
of unlimited length.
The leading “dot” in
this file name makes it invisible in normal file listings in
UNIX, and in default GUI file listings on macOS and Windows.
With other than vector types of argument, such as list mode
arguments, the action of c() is rather different. See Concatenating
lists.
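For instance, compare c() on plain vectors with c() when one argument is a list:

```r
v <- c(1, 2, 3)              # vector arguments: an atomic vector
l <- c(list(a = 1), b = 2)   # a list argument: the result is a list
is.list(v); is.list(l)       # FALSE TRUE
names(l)                     # "a" "b"
```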
Actually, it is still available as .Last.value before any other
statements are executed.
paste(..., collapse=ss) joins the arguments into a single character
string putting ss in between, e.g., ss <- "|". There are more tools
for character manipulation; see the help for sub and substring.
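Here are a few examples of these functions (the input strings are arbitrary):

```r
paste(c("a", "b", "c"), collapse = "|")     # "a|b|c"
paste("x", 1:3, sep = "_", collapse = "|")  # "x_1|x_2|x_3"
sub("b", "B", "abcb")                       # "aBcb": first match only
substring("statistics", 1, 4)               # "stat"
```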
numeric mode is
actually an amalgam of two distinct modes, namely integer and
double precision, as explained in the manual.
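mode() and typeof() show the distinction, as in this minimal sketch:

```r
i <- 2L                # integer literal
d <- 2                 # double literal
mode(i); mode(d)       # both "numeric"
typeof(i); typeof(d)   # "integer" and "double"
```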
Note however that length(object) does not always contain intrinsically
useful information, e.g., when object is a function.
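For example, with an arbitrary two-argument function, length() is uninformative while formals() says something about the arguments:

```r
f <- function(x, y) x + y
length(f)            # 1, whatever the function does
length(formals(f))   # 2: the number of formal arguments
```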
In general, coercion
from numeric to character and back again will not be exactly reversible,
because of roundoff errors in the character representation.
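Here is a small demonstration with 1/3 (any value with a long binary expansion would do). The default conversion keeps 15 significant digits, which loses information. Formatting with 17 significant digits via sprintf("%.17g", ...) is normally enough to recover the same double.

```r
x <- 1/3
as.character(x)                                 # 15 significant digits
identical(as.numeric(as.character(x)), x)       # FALSE: the round trip loses digits
identical(as.numeric(sprintf("%.17g", x)), x)   # TRUE
```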
A different style using ‘formal’ or ‘S4’ classes is provided in package
methods.
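This is a minimal sketch of the formal (S4) style, using a made-up class Point and a made-up generic norm2:

```r
library(methods)   # attached by default in most sessions; explicit here

# A made-up class with two numeric slots
setClass("Point", slots = c(x = "numeric", y = "numeric"))
p <- new("Point", x = 3, y = 4)
p@x                # slots are accessed with @

# A made-up generic function and a method for the new class
setGeneric("norm2", function(obj) standardGeneric("norm2"))
setMethod("norm2", "Point", function(obj) sqrt(obj@x^2 + obj@y^2))
norm2(p)           # 5
```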
Readers should note
that there are eight states and territories in Australia, namely the
Australian Capital Territory, New South Wales, the Northern Territory,
Queensland, South Australia, Tasmania, Victoria and Western Australia.
Note that tapply() also works in this case when its second argument
is not a factor, e.g., ‘tapply(incomes, state)’, and this is true for
quite a few other functions, since arguments are coerced to factors
when necessary (using as.factor()).
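With made-up data, a plain character vector serves as the grouping argument and is converted to a factor behind the scenes:

```r
incomes <- c(60, 49, 40, 61)
state   <- c("tas", "sa", "tas", "sa")   # a character vector, not a factor
means   <- tapply(incomes, state, mean)  # groups in level order: "sa", "tas"
means                                    # sa 55, tas 50
```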
Note that x %*% x is ambiguous, as it could mean either x’x or x x’,
where x is the column form. In such cases the smaller matrix seems
implicitly to be the interpretation adopted, so the scalar x’x is in
this case the result. The matrix x x’ may be calculated either by
cbind(x) %*% x or x %*% rbind(x) since the result of rbind() or
cbind() is always a matrix. However, the best way to compute x’x or
x x’ is crossprod(x) or x %o% x respectively.
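Here is a quick check of these statements with a small numeric vector:

```r
x <- c(1, 2, 3)
x %*% x                          # 1 x 1 matrix holding x'x = 14
crossprod(x)                     # the same value, computed directly
dim(x %o% x)                     # 3 3: the outer product x x'
all(cbind(x) %*% x == x %o% x)   # TRUE
```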
Even better would be to form a matrix square
root B with A = BB’ and find the squared length
of the solution of By = x , perhaps using the Cholesky or
eigendecomposition of A.
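The sketch below computes x’A⁻¹x three ways for a small, made-up positive definite matrix A:

- forming the inverse explicitly;
- calling solve(A, x);
- using the Cholesky factor.

chol() returns an upper triangular R with A = R’R, so B = t(R) satisfies A = BB’. forwardsolve() then solves the lower triangular system By = x.

```r
A <- matrix(c(4, 2, 2, 3), 2, 2)   # symmetric positive definite
x <- c(1, 2)

direct    <- drop(t(x) %*% solve(A) %*% x)   # explicit inverse
via_solve <- sum(x * solve(A, x))            # avoids forming the inverse

R <- chol(A)                 # upper triangular, A = R'R, so B = t(R)
y <- forwardsolve(t(R), x)   # solve B y = x
via_chol <- sum(y^2)         # squared length of y

c(direct, via_solve, via_chol)   # all 1.375
```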
See the on-line help for autoload for the meaning of the second term.
Under UNIX, the utilities sed or awk can be used.
to be discussed later, or use xyplot from package lattice.
See also the methods described in Statistical models in R.
In some sense this
mimics the behavior in S-PLUS since in S-PLUS this operator always
creates or assigns to a global variable.
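In R, by contrast, the superassignment operator <<- searches the enclosing environments for an existing binding and assigns there. Only if none is found does it assign in the global environment. Here is a minimal sketch (make_counter is a made-up name):

```r
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # finds 'count' in the enclosing function's frame
    count
  }
}

counter <- make_counter()
counter()   # 1
counter()   # 2
exists("count", envir = globalenv(), inherits = FALSE)   # FALSE
```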
So it is hidden under
UNIX.
Some graphics
parameters such as the size of the current device are for information
only.
The
‘Emacs Speaks Statistics’ package; see the URL
https://ESS.R-project.org/
It is possible to build R using an
emulation of GNU readline, such as one based on NetBSD’s
editline (also known as libedit), in which case only a
subset of the capabilities may be provided.
On a PC keyboard this is usually the
Alt key, occasionally the ‘Windows’ key. On a Mac keyboard normally no
meta key is available.
In particular, not versions 6.3 or
later: this is worked around as from R 3.4.0.