scanf()
2017-06-08
This document is for you if you have started to learn programming in C. Chances are you are following a course, and the method you were taught for reading input is the scanf() function.
What's wrong with scanf()?
Nothing. And, chances are, everything for your use case. This document attempts to make you understand why. So here's the very first rule about scanf():
Rule 0: Don't use scanf(). (Unless you know exactly what you are doing.)
But before presenting some alternatives for common use cases, let's elaborate a bit on the "knowing what you are doing" part.
Here is a classic example of scanf() use (and misuse) in a beginner's program:
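The listing for this example did not survive in the text, so what follows is only a sketch of what such a beginner's program typically looks like. The function name ask_number_unchecked() and the FILE * parameter are assumptions made here so the code can be tried on prepared input; scanf(...) behaves like fscanf(stdin, ...).

```c
#include <stdio.h>

/* Hypothetical reconstruction of the beginner pattern: the result of
 * fscanf() is ignored, so `a` stays uninitialized whenever the input
 * is not a number. This is the misuse discussed here; do not copy it. */
int ask_number_unchecked(FILE *in)
{
    int a;

    printf("enter a number: ");
    fscanf(in, "%d", &a);            /* bug: return value not checked */
    printf("You entered %d.\n", a);  /* undefined if nothing was converted */
    return a;
}
```

Calling ask_number_unchecked(stdin) from main() gives the interactive version.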
As you probably know, %d is the conversion for an integer, so this program works as expected:
$ ./example1
enter a number: 42
You entered 42.
Or does it?
$ ./example1
enter a number: abcdefgh
You entered 38.
Oops. Where does the value 38 come from?
The answer is: this could be any value, or the program could just crash. A crashing program in just two lines of code is quite easy to create in C. scanf() is asked to convert a number, and the input doesn't contain any numbers, so scanf() converts nothing. As a consequence, the variable a is never written to, and using the value of an uninitialized variable in C is undefined behavior.
Undefined behavior in C
C is a very low-level language and one consequence of that is the following: nothing will ever stop you from doing something completely wrong.
Many languages, especially those for a managed environment like Java or C#, actually stop you when you do things that are not allowed, say, access an array element that does not exist. C doesn't. As long as your program is syntactically correct, the compiler won't complain. If you do something forbidden in your program, C just calls the behavior of your program undefined. This formally allows anything to happen when running the program. Often, the result will be a crash or just output of "garbage" values, as seen above. But if you're really unlucky, your program will seem to work just fine until it gets some slightly different input, and by that time, you will have a really hard time spotting where exactly your program is undefined. Therefore, avoid undefined behavior by all means!
On a side note, undefined behavior can also cause security holes. This has happened a lot in practice.
Now that we know the program is broken, let's fix it. Because scanf() returns how many items were converted successfully, the next obvious idea is just to retry the "number input" in case the user entered something else:
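The original listing is missing here; the retry idea could be sketched roughly as below. The function name naive_retry() and the max_tries limit are assumptions added so that the sketch terminates on prepared input; the loop described in the text has no such limit.

```c
#include <stdio.h>

/* Naive retry loop: keep asking until fscanf() converts one integer.
 * max_tries is an assumption of this sketch, not part of the original
 * idea. Returns the number of failed attempts. */
int naive_retry(FILE *in, int max_tries, int *result)
{
    int failures = 0;

    while (failures < max_tries) {
        printf("enter a number: ");
        if (fscanf(in, "%d", result) == 1) {
            break;          /* one item converted: done */
        }
        ++failures;         /* nothing converted, ask again */
    }
    return failures;
}
```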
Let's test:
$ ./example2
enter a number: abc
enter a number: enter a number: enter a number: enter a number: enter a
number: enter a number: enter a number: enter a number: enter a number:
enter a number: enter a number: enter a number: enter a number: enter a
number: enter a number: enter a number: enter a number: enter a number:
enter a number: enter a number: enter a number: enter a number: enter a
number: enter a number: enter a number: enter a number: enter a number:
enter a number: enter a number: enter a number: enter a number: enter a
number: enter a number: enter a number: enter a number: enter a number:
enter a number: enter a number: enter a number: enter a number: enter a
number: enter a number: enter a number: enter a number: enter a number:
enter a number: enter a number: enter a number: enter a number: enter a
number: enter a number: enter a number: ^C
Stooooop! OK, we managed to interrupt this madness with Ctrl+C, but why did that happen?
Here's a rule:
Rule 1: scanf() is not for reading input, it's for parsing input.
The first argument to scanf() is a format string, describing what scanf() should parse. The important thing is: scanf() never reads anything it cannot parse. In our example, we tell scanf() to parse a number, using the %d conversion. Sure enough, abc is not a number, and as a consequence, abc is not even read. The next call to scanf() will again find our unread input and, again, can't parse it.
Chances are you will find some examples saying "let's just flush the input before the next call to scanf()":
fflush(stdin); // <- never do that!
Forget about this idea immediately, please.
You'd expect this to clear all unread input, and indeed, some systems will do just that. But according to the C standard, flushing an input stream is undefined behavior, and this should now ring a bell. And yes, there are a lot of systems that won't clear the input when you attempt to flush stdin.
So, the only way to clear unread input is by reading it. Of course, we can make scanf() read it, using a format string that parses any string. Sounds easy.
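For reference, one way to clear the rest of a line by reading it might look like the sketch below. The helper name discard_line() is hypothetical, and the %*[^\n] conversion it relies on is a scanf() feature that needs some care to use correctly.

```c
#include <stdio.h>

/* Discard the rest of the current input line, including its newline.
 * %*[^\n] reads and drops everything that is not a newline; it stops
 * without consuming anything if the line is already empty. */
void discard_line(FILE *in)
{
    if (fscanf(in, "%*[^\n]") == EOF) {
        return;             /* no more input at all */
    }
    getc(in);               /* the next character, if any, is the newline */
}
```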
Let's consider another classic example of a beginner's program, trying to read a string from the user:
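The listing itself is missing from the text; a sketch of the pattern being discussed could look like this. The function name greet_unbounded() and the FILE * parameter are assumptions for illustration, while the 12-byte buffer matches the size mentioned in the explanation that follows.

```c
#include <stdio.h>

/* Hypothetical reconstruction: %s without a field width writes as many
 * characters as the user types, but name only has room for 12 bytes
 * (11 characters plus the terminating 0 byte). Do not copy this. */
void greet_unbounded(FILE *in, char name[12])
{
    printf("What's your name? ");
    fscanf(in, "%s", name);         /* bug: no limit on the length */
    printf("Hello %s!\n", name);
}
```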
As %s is for strings, this should work with any input:
$ ./example3
What's your name? Paul
Hello Paul!
$ ./example3
What's your name? Christopher-Joseph-Montgomery
Segmentation fault
$
Well, now we have a buffer overflow. You might get Segmentation fault on a Linux system, any other kind of crash, or maybe even a "correctly" working program, because, once again, the program has undefined behavior.
The problem here is: %s matches any string, of any length, and scanf() has no idea when to stop reading. It reads as long as it can parse the input according to the format string, so it writes a lot more data to our name variable than the 12 characters we declared for it.
Buffer overflows in C
A buffer overflow is a specific kind of undefined behavior resulting from a program that tries to write more data to an (array) variable than this variable can hold. Although this is undefined, in practice it will result in overwriting some other data (that happens to be placed after the overflowed buffer in memory) and this can easily crash the program.
One particularly dangerous result of a buffer overflow is overwriting the return address of a function. The return address is used when a function exits, to jump back to the calling function. Being able to overwrite this address ultimately means that a person with enough knowledge about the system can cause the running program to execute any other code supplied as input. This problem has led to many security vulnerabilities; imagine you could make, for example, a web server written in C execute your own code by submitting a specially tailored request...
So, here's the next rule:
Rule 2: scanf() can be dangerous when used carelessly. Always use field widths with conversions that parse to a string (like %s).
The field width is a number preceding the conversion specifier. It causes scanf() to consider at most that many characters from the input when parsing for this conversion. Let's demonstrate it in a fixed program:
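The fixed listing is not preserved in the text; a minimal sketch of the field-width fix, assuming a 40-byte buffer as discussed here, might be written as follows. The function name greet_bounded() is hypothetical, and the check of the return value is an addition of this sketch.

```c
#include <stdio.h>

/* Read one word into name, which must have room for 40 bytes.
 * The field width 39 leaves one byte for the terminating 0 byte.
 * Returns the fscanf() result (1 if a word was stored). */
int greet_bounded(FILE *in, char name[40])
{
    int rc;

    printf("What's your name? ");
    rc = fscanf(in, "%39s", name);
    if (rc == 1) {
        printf("Hello %s!\n", name);
    }
    return rc;
}
```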
We also increased the buffer size, because there might be really long names.
There's an important thing to notice: Although our name has room for 40 characters, we instruct scanf() not to read more than 39. This is because a string in C always needs a 0 byte appended to mark the end. When scanf() is finished parsing into a string, it appends this byte automatically, and there must be space left for it.
So, this program is now safe from buffer overflows. Let's try something different:
$ ./example4
What's your name? Martin Brown
Hello Martin!
Well, that's ... outspoken. What happens here? Reading some scanf() manual, we would find that %s parses a word, not a string. For example, I found the following wording:
White-space in C means one of space, tab (\t) or newline (\n), plus the less common vertical tab (\v), form feed (\f) and carriage return (\r).
Rule 3: Although scanf() format strings can look quite similar to printf() format strings, they often have slightly different semantics. (Make sure to read the fine manual.)
The general problem with parsing "a string" from an input stream is: Where does this string end? With %s, the answer is at the next white-space. If you want something different, you can use %[:
%[a-z]: parse as long as the input characters are in the range a - z.
%[ny]: parse as long as the input characters are y or n.
%[^.]: The ^ negates the list, so this means parse as long as there is no . in the input.
We could change the program so that anything up to a newline is parsed into our string:
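The original listing is missing; the core of such a change could be sketched as below, with the hypothetical helper read_until_newline() standing in for the scanf() call. A program that ignores the return value and prints name anyway would match the behavior discussed next.

```c
#include <stdio.h>

/* Parse everything up to (not including) the newline into name.
 * Returns the fscanf() result: 1 on success, 0 if the line is empty
 * (name is then left untouched), EOF at end of input. */
int read_until_newline(FILE *in, char name[40])
{
    return fscanf(in, "%39[^\n]", name);
}
```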
It might get a bit frustrating, but this is again a program with possible undefined behavior. See what happens when we just press Enter:
$ ./example5
What's your name?
Hello ÿ¦e!
Here's another sentence from a scanf() manual, from the section describing the [ conversion:
With many conversions, scanf() automatically skips whitespace characters in the input, but with some, it doesn't. Here, the newline from just pressing Enter isn't skipped, and it doesn't match our conversion, which explicitly excludes newlines. The result is: scanf() doesn't parse anything, and our name remains uninitialized.
One way around this is to tell scanf() to skip whitespace: If the format string contains any whitespace, it matches any number of whitespace characters in the input, including no whitespace at all. Let's use this to skip whitespace the user might enter before entering their name:
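Since the listing did not survive, here is a minimal sketch of the idea: a single space in front of the conversion makes scanf() skip leading whitespace, newlines included. The helper name read_name_skip_ws() is an assumption.

```c
#include <stdio.h>

/* The leading space in the format skips any amount of whitespace
 * (including newlines) before the name is parsed.
 * Returns 1 on success, or EOF if the input ends before any
 * non-whitespace character shows up (name is then untouched). */
int read_name_skip_ws(FILE *in, char name[40])
{
    return fscanf(in, " %39[^\n]", name);
}
```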
Yes, this program works and doesn't have any undefined behavior*), but I guess you don't like very much that nothing at all happens when you just press Enter, because scanf() skips it and continues to wait for input that can be matched.
*) Actually, this isn't even entirely true. This program still has undefined behavior for empty input. You could force this on a Linux console by hitting Ctrl+D, for example. So, it's again an example of code you should not write.
There are several functions in C for reading input. Let's have a look at one that's probably most useful to you: fgets().
fgets() does a simple thing: it reads up to a given maximum number of characters, but stops at a newline, which is read as well. In other words: It reads a line of input.
This is the function signature:
char *fgets(char *str, int n, FILE *stream);
There are two very nice things about this function for what we want to do:
The size parameter n already accounts for the terminating 0 byte, so we can just pass the size of our variable.
The return value is either a pointer to str or NULL if, for any reason, nothing was read.
So let's rewrite this program again:
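The rewritten listing is missing from the text; a sketch of the fgets() version, under the assumption of the same 40-byte buffer, could look like this. The function name read_line_raw() is hypothetical.

```c
#include <stdio.h>

/* Read one line (at most 39 characters) into name using fgets().
 * Returns 1 on success, 0 if nothing could be read. The newline,
 * if it fit into the buffer, is kept in name. */
int read_line_raw(FILE *in, char name[40])
{
    if (!fgets(name, 40, in)) {
        return 0;           /* end of input or read error */
    }
    return 1;
}
```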
I assure you this is safe, but it has a little flaw:
$ ./example7
What's your name? Bob
Hello Bob
!
Of course, this is because fgets() also reads the newline character itself. But the fix is simple as well: We use strcspn() to get the index of the newline character if there is one and overwrite it with 0. strcspn() is declared in string.h, so we need a new #include:
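As the listing is not preserved, here is a minimal sketch of the newline fix. The helper name read_line() and its size parameter are assumptions; the strcspn() call is the technique named in the text.

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf (size bytes) and strip the trailing newline.
 * strcspn() returns the index of the first '\n', or the length of the
 * string if there is none, so the assignment is always in bounds.
 * Returns 1 on success, 0 if nothing could be read. */
int read_line(FILE *in, char *buf, int size)
{
    if (!fgets(buf, size, in)) {
        return 0;
    }
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

When there is no newline, strcspn() points at the existing terminator, so overwriting it with 0 changes nothing.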
Let's test it:
$ ./example8
What's your name? Bob Belcher
Hello Bob Belcher!
How do I read numbers without scanf()?
There are many functions for converting a string to a number in C. A function that's used quite often is atoi(); the name stands for ASCII to integer. It returns 0 if it can't convert the string. Let's try to rewrite the broken example 2 using fgets() and atoi(). atoi() is declared in stdlib.h.
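The listing is missing here; the combination of fgets() and atoi() could be sketched as follows. The function name read_number_atoi(), the buffer size and the "return 0 at end of input" convention are assumptions of this sketch.

```c
#include <stdio.h>
#include <stdlib.h>

/* Keep reading lines until atoi() yields a non-zero value.
 * Returns 0 if the input ends first (an assumption of this sketch). */
int read_number_atoi(FILE *in)
{
    char buf[1024];
    int a = 0;

    while (a == 0) {
        printf("enter a number: ");
        if (!fgets(buf, sizeof buf, in)) {
            return 0;       /* end of input or read error */
        }
        a = atoi(buf);      /* 0 means "not a number" ... or a real 0 */
    }
    return a;
}
```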
So, trying this out:
$ ./example9
enter a number: foo
enter a number: bar
enter a number: 15x
You entered 15.
Well, not bad so far. But what if we want to allow an actual 0 to be entered? We can't tell whether atoi() returns 0 because it cannot convert anything or because there was an actual 0 in the string. Also, ignoring the extra x when we input 15x may not be what we want.
atoi() is good enough in many cases, but if you want better error checking, there's an alternative: strtol():
long int strtol(const char *nptr, char **endptr, int base);
This looks complicated. But it isn't:
nptr is the string to parse, and endptr receives a pointer to the first character that was not converted.
base is the number base to use, typically 10, but you could give 16 here for parsing hexadecimal or 2 for parsing binary.
strtol() even sets errno, so you can check whether a number was too small or too big for conversion.
Now let's use this instead of atoi() (note it returns a long int), making use of every possibility to detect errors:
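The full listing is not preserved, so the following is only a compact sketch of the checks that strtol() makes possible. The function name read_long(), the buffer size and the return convention are assumptions; the error message is taken from the sample session.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Ask until a whole line parses as a decimal long.
 * Returns 1 and stores the number in *out, or 0 if the input ends. */
int read_long(FILE *in, long *out)
{
    char buf[1024];

    for (;;) {
        char *endptr;
        long value;

        printf("enter a number: ");
        if (!fgets(buf, sizeof buf, in)) {
            return 0;                   /* end of input or read error */
        }
        errno = 0;                      /* reset before the call */
        value = strtol(buf, &endptr, 10);
        if (errno == ERANGE) {
            printf("Sorry, this number is too small or too large.\n");
        } else if (endptr == buf) {
            /* no characters were converted at all: ask again */
        } else if (*endptr != '\0' && *endptr != '\n') {
            /* trailing characters such as the x in 15x: ask again */
        } else {
            *out = value;
            return 1;
        }
    }
}
```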
And again, let's try:
$ ./example10
enter a number: 565672475687456576574
Sorry, this number is too small or too large.
enter a number: ggggg
enter a number: 15x
enter a number: 0
You entered 0.
This looks really good, doesn't it? If you want to know more, I suggest you read up on similar functions like atof(), strtoll(), strtod() etc.
Can I do this with scanf() as well?
Yes, you can. Here's a last rule:
Rule 4: scanf() is a very powerful function. (And with great power comes great responsibility ...)
A lot of parsing work can be done with scanf() in a very concise way, which can be very nice, but it also has many pitfalls and there are tasks (such as reading a line of input) that are much simpler to accomplish with a simpler function. Make sure you understand the rules presented here, and if in doubt, read the scanf() manual carefully.
That being said, here's an example of how to read a number with retries using scanf():
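The listing is missing from the text; one way to write such a retry loop, sketched under the assumption that unparsable input is dropped up to the end of its line, is shown below. The function name read_int_scanf() and the return convention are hypothetical.

```c
#include <stdio.h>

/* Ask until fscanf() converts an integer.
 * Returns 1 and stores the number in *out, or 0 if the input ends. */
int read_int_scanf(FILE *in, int *out)
{
    for (;;) {
        int rc;

        printf("enter a number: ");
        rc = fscanf(in, "%d", out);
        if (rc == 1) {
            return 1;       /* one item converted */
        }
        if (rc == EOF) {
            return 0;       /* no more input */
        }
        /* rc == 0: the offending input is still unread, so read and
         * drop the rest of the line before trying again */
        if (fscanf(in, "%*[^\n]") == EOF) {
            return 0;
        }
    }
}
```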
It's not as nice as the version using strtol() above, because there is no way to tell scanf() not to skip whitespace for %d (so if you just hit Enter, it will still wait for your input), but it works and it's a really short program.
For the sake of completeness, if you really really want to get a line of input using scanf(), of course this can be done safely as well:
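With the listing lost, here is a sketch of how a line could be read safely with scanf(), assuming the 40-byte buffer used throughout. The function name read_line_scanf() is hypothetical, and treating an empty line as an empty string is a choice made for this sketch.

```c
#include <stdio.h>

/* Read up to 39 characters of one line into name.
 * Returns 1 on success (an empty line gives an empty string),
 * 0 if the input has ended. */
int read_line_scanf(FILE *in, char name[40])
{
    int rc = fscanf(in, "%39[^\n]", name);
    int c;

    if (rc == EOF) {
        return 0;               /* no more input */
    }
    if (rc == 0) {
        name[0] = '\0';         /* empty line: nothing was stored */
    }
    c = getc(in);               /* consume the newline if it comes next */
    if (c != '\n' && c != EOF) {
        ungetc(c, in);          /* line was longer than 39 characters */
    }
    return 1;
}
```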
Note that this final example of course leaves input unread, even from the same line, if there were more than 39 characters before the newline. If this is a concern, you'd have to find another way, or just use fgets(), which makes the check easier because it gives you the newline if there was one.