...one of the most highly regarded and expertly designed C++ library projects
in the world.
-- Herb Sutter and Andrei Alexandrescu, C++ Coding Standards
First, let's cover some terminology that we'll be using throughout the docs:
A semantic action is an arbitrary bit of logic associated with a parser; it
is executed only when the parser matches.
Simpler parsers can be combined to form more complex parsers. Given some
combining operation C, and parsers P0, P1, ... PN, C(P0, P1, ... PN) creates
a new parser Q. This creates a parse tree: Q is the parent of each of P0,
P1, ... PN, and each of them is a child of Q. The parsers are applied in the
top-down fashion implied by this topology. When you use Q to parse a string,
it will use P0, P1, etc. to do the actual work. If P3 is being used to parse
the input, that means that Q is as well, since the way Q parses is by
dispatching to its children to do some or all of the work. At any point in
the parse, there will be exactly one parser without children that is being
used to parse the input; all other parsers being used are its ancestors in
the parse tree.
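To make the parent/child relationship concrete, here is a toy model in plain standard C++. It is not how Boost.Parser is implemented. It only assumes that a parser can be modeled as a function that takes an input position and returns the position just past its match. The names toy_parser, lit, seq, and matches are hypothetical. The combined parser does no matching of its own; it dispatches to its children, and at any moment only one leaf parser is examining the input.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string_view>

// Toy model (not Boost.Parser's implementation): a parser takes the input and
// a start position, and returns the position just past what it matched, or
// std::nullopt on failure.
using toy_parser =
    std::function<std::optional<std::size_t>(std::string_view, std::size_t)>;

// A leaf parser with no children: matches exactly one given character.
toy_parser lit(char c) {
    return [c](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (pos < in.size() && in[pos] == c)
            return pos + 1;
        return std::nullopt;
    };
}

// A combining operation C(P0, P1): the new parser does no matching itself.
// It dispatches to P0 first, then to P1 starting where P0 left off.
toy_parser seq(toy_parser p0, toy_parser p1) {
    return [p0, p1](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        auto const mid = p0(in, pos);
        if (!mid)
            return std::nullopt;
        return p1(in, *mid);
    };
}

// Whole-input helper: succeeds only if the parser consumes all of the input.
bool matches(toy_parser const & p, std::string_view in) {
    auto const end = p(in, 0);
    return end && *end == in.size();
}
```

For example, seq(lit('a'), seq(lit('b'), lit('c'))) builds a three-level tree. The top-level parser has two children, and only the leaves ever compare characters.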
A subparser is a parser that is the child of another
parser.
The top-level parser is the root of the tree of parsers.
The current parser or bottommost parser
is the parser with no children that is currently being used to parse the
input.
A rule is a kind of parser that makes building large,
complex parsers easier. A subrule is a rule that is
the child of some other rule. The current rule or bottommost
rule is the one rule currently being used to parse the input that
has no subrules. Note that while there is always exactly one current parser,
there may or may not be a current rule — rules are one kind of parser,
and you may or may not be using one at a given point in the parse.
The top-level parse is the parse operation being performed
by the top-level parser. This term is necessary because, though most parse
failures are local to a particular parser, some parse failures cause the
call to parse()
to indicate failure of the
entire parse. For these cases, we say that such a local failure "causes
the top-level parse to fail".
Throughout the Boost.Parser documentation, I will refer to "the call to
parse()". Read this as "the call to any one of the functions described in
The parse() API". That includes prefix_parse(), callback_parse(), and
callback_prefix_parse().
There are some special kinds of parsers that come up often in this documentation.
One is a sequence parser; you will see it created using operator>>, as in
p1 >> p2 >> p3. A sequence parser tries to match all of its subparsers to
the input, one at a time, in order. It matches the input iff all its
subparsers do.
Another is an alternative parser; you will see it created using operator|,
as in p1 | p2 | p3. An alternative parser tries its subparsers against the
input one at a time, in order, and stops at the first one that matches; the
remaining subparsers are not tried. It matches the input iff one of its
subparsers does.
Finally, there is a permutation parser; it is created using operator||, as
in p1 || p2 || p3. A permutation parser tries to match all of its subparsers
to the input, in any order. So the parser p1 || p2 || p3 is equivalent to:

(p1 >> p2 >> p3) | (p1 >> p3 >> p2) | (p2 >> p1 >> p3) |
(p2 >> p3 >> p1) | (p3 >> p1 >> p2) | (p3 >> p2 >> p1)

Hopefully its terseness is self-explanatory. It matches the input iff all
of its subparsers do, regardless of the order they match in.
Boost.Parser parsers each have an attribute associated with them, or
explicitly have no attribute. An attribute is a value that the parser
generates when it matches the input. For instance, the parser double_
generates a double when it matches the input. ATTR() is a notional macro
that expands to the attribute type of the parser passed to it; ATTR(double_)
is double. This is similar to the attribute type trait.
Next, we'll look at some simple programs that parse using Boost.Parser. We'll
start small and build up from there.
This is just about the most minimal example of using Boost.Parser that one
could write. We take a string from the command line, or "World"
if none is given, and then we parse it:
#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>

namespace bp = boost::parser;

int main(int argc, char const * argv[])
{
    std::string input = "World";
    if (1 < argc)
        input = argv[1];

    std::string result;
    bp::parse(input, *bp::char_, result);
    std::cout << "Hello, " << result << "!\n";
}
The expression *bp::char_ is a parser-expression. It uses one of the many
parsers that Boost.Parser provides: char_. Like all Boost.Parser parsers, it
has certain operations defined on it. In this case, *bp::char_ is using an
overloaded operator* as the C++ version of a Kleene star operator. Since C++
has no postfix unary * operator, we have to use the one we have, so it is
used as a prefix.
So, *bp::char_
means "any number of characters". In other words, it really cannot
fail. Even an empty string will match it.
The parse operation is performed by calling the parse()
function, passing the parser as one of the arguments:
bp::parse(input, *bp::char_, result);
The arguments here are: input, the range to parse; *bp::char_, the parser
used to do the parse; and result, an out-parameter into which to put the
result of the parse. Don't get too caught up on this method of getting the
parse result out of parse(); there are multiple ways of doing so, and we'll
cover all of them in subsequent sections.
Also, just ignore for now the fact that Boost.Parser somehow figured out
that the result type of the *bp::char_ parser is a std::string. There are
clear rules for this that we'll cover later.
The effect of this call to parse() is not very interesting. The parser we
gave it can never fail, and we're placing the output in the same type as
the input, so it just copies the contents of input to result.
Let's look at a slightly more complicated example, even if it is still
trivial. Instead of taking any old chars we're given, let's require some
structure. Let's parse one or more doubles, separated by commas.
The Boost.Parser parser for double is double_. So, to parse a single
double, we'd just use that. If we wanted to parse two doubles in a row, we'd
use:
boost::parser::double_ >> boost::parser::double_
operator>>
in this expression is the sequence-operator; read it as "followed by".
If we combine the sequence-operator with Kleene
star, we can get the parser we want by writing:
boost::parser::double_ >> *(',' >> boost::parser::double_)
This is a parser that matches at least one double (because of the first
double_ in the expression above), followed by zero or more instances of
a-comma-followed-by-a-double. Notice that we can use ',' directly. Though it
is not a parser, operator>> and the other operators defined on Boost.Parser
parsers have overloads that accept character/parser pairs of arguments;
these operator overloads will create the right parser to recognize ','.
#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>

namespace bp = boost::parser;

int main()
{
    std::cout << "Enter a list of doubles, separated by commas. No pressure. ";
    std::string input;
    std::getline(std::cin, input);

    auto const result = bp::parse(input, bp::double_ >> *(',' >> bp::double_));

    if (result) {
        std::cout << "Great! It looks like you entered:\n";
        for (double x : *result) {
            std::cout << x << "\n";
        }
    } else {
        std::cout << "Good job! Please proceed to the recovery annex for cake.\n";
    }
}
The first example filled in an out-parameter to deliver the result of the
parse. This call to parse() returns a result instead. As you can see, the
result is contextually convertible to bool, and *result is some sort of
range. In fact, the return type of this call to parse() is
std::optional<std::vector<double>>. Naturally, if the parse fails,
std::nullopt is returned. We'll look at how Boost.Parser maps the type of
the parser to the return type, or the filled in out-parameter's type, a bit
later.
Note: There's a type trait that can tell you the attribute type for a
parser, …
If I run it in a shell, this is the result:
$ example/trivial
Enter a list of doubles, separated by commas. No pressure. 5.6,8.9
Great! It looks like you entered:
5.6
8.9
$ example/trivial
Enter a list of doubles, separated by commas. No pressure. 5.6, 8.9
Good job! Please proceed to the recovery annex for cake.
It does not recognize "5.6, 8.9". This is because it expects a comma
followed immediately by a double, but I inserted a space after the comma.
The same failure to parse would occur if I put a space before the comma, or
before or after the list of doubles.
One more thing: there is a much better way to write the parser above. Instead
of repeating the double_
subparser, we could have written this:
bp::double_ % ','
That's semantically identical to bp::double_ >> *(',' >> bp::double_). This
pattern (some bit of input repeated one or more times, with a separator
between each instance) comes up so often that there's an operator
specifically for that, operator%. We'll be using that operator from now on.
Let's modify the trivial parser we just saw to ignore any spaces it might
find among the doubles and commas. To skip whitespace wherever we find it,
we can pass a skip parser to our call to parse() (we don't need to touch the
parser passed to parse()). Here, we use ws, which matches any Unicode
whitespace character.
#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>

namespace bp = boost::parser;

int main()
{
    std::cout << "Enter a list of doubles, separated by commas. No pressure. ";
    std::string input;
    std::getline(std::cin, input);

    auto const result = bp::parse(input, bp::double_ % ',', bp::ws);

    if (result) {
        std::cout << "Great! It looks like you entered:\n";
        for (double x : *result) {
            std::cout << x << "\n";
        }
    } else {
        std::cout << "Good job! Please proceed to the recovery annex for cake.\n";
    }
}
The skip parser, or skipper, is run between the subparsers within the
parser passed to parse(). In this case, the skipper is run before the first
double is parsed, before any subsequent comma or double is parsed, and at
the end. So, the strings "3.6,5.9" and " 3.6 , \t 5.9 " are parsed the same
by this program.
Skipping is an important concept in Boost.Parser. You can skip anything,
not just whitespace; there are lots of other things you might want to skip.
The skipper you pass to parse()
can be an arbitrary parser. For example, if you write a parser for a scripting
language, you can write a skipper to skip whitespace, inline comments, and
end-of-line comments.
We'll be using skip parsers almost exclusively in the rest of the documentation.
The ability to ignore the parts of your input that you don't care about is
so convenient that parsing without skipping is a rarity in practice.
Like all parsing systems (lex & yacc, Boost.Spirit, etc.), Boost.Parser has
a mechanism for associating semantic actions with different parts of the
parse. Here is nearly the same program as we saw in the previous example,
except that it is implemented in terms of a semantic action that appends
each parsed double to a result, instead of automatically building and
returning the result. To do this, we replace the double_ from the previous
example with double_[action]; action is our semantic action:
#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>
#include <vector>

namespace bp = boost::parser;

int main()
{
    std::cout << "Enter a list of doubles, separated by commas. ";
    std::string input;
    std::getline(std::cin, input);

    std::vector<double> result;
    // Runs once for each double_ that matches.
    auto const action = [&result](auto & ctx) {
        std::cout << "Got one!\n";
        result.push_back(_attr(ctx));
    };
    auto const action_parser = bp::double_[action];
    auto const success = bp::parse(input, action_parser % ',', bp::ws);

    if (success) {
        std::cout << "You entered:\n";
        for (double x : result) {
            std::cout << x << "\n";
        }
    } else {
        std::cout << "Parse failure.\n";
    }
}
Run in a shell, it looks like this:
$ example/semantic_actions
Enter a list of doubles, separated by commas. 4,3
Got one!
Got one!
You entered:
4
3
In Boost.Parser, semantic actions are implemented in terms of invocable
objects that take a single parameter: a reference to a parse-context
object. The parse-context object represents the current state of the parse.
In the example we used this lambda as our invocable:
auto const action = [&result](auto & ctx) {
    std::cout << "Got one!\n";
    result.push_back(_attr(ctx));
};
In this lambda, we both print a message to std::cout and record a parsed
result. It could do both, either, or neither of these things if you like.
The way we get the parsed double in the lambda is by asking the parse
context for it. _attr(ctx) is how you ask the parse context for the
attribute produced by the parser to which the semantic action is attached.
There are lots of functions like _attr() that can be used to access the
state in the parse context. We'll cover more of them later on. The Parse
Context section defines what exactly the parse context is and how it works.
Note that you can't write an unadorned lambda directly as a semantic
action. Otherwise, the compiler will see two '[' characters and think it's
about to parse a C++ attribute (like [[nodiscard]]). Parentheses fix this:
p[([](auto & ctx){/*...*/})]
Before you do this, note that the lambdas that you write as semantic
actions are almost always generic (having an auto & ctx parameter), and so
are very frequently re-usable. Most semantic action lambdas you write should
be written out-of-line, and given a good name. Even when they are not
reused, named lambdas keep your parsers smaller and easier to read.
Important: Attaching a semantic action to a parser removes its attribute.
That is, …
There are some other forms for semantic actions when they are used inside
of rules. See More About Rules for details.
So far we've seen examples that parse some text and generate associated attributes.
Sometimes, you want to find some subrange of the input that contains what
you're looking for, and you don't want to generate attributes at all.
There are two directives that affect the attribute type of any parser,
raw[] and string_view[]. (We'll get to directives in more detail in the
Directives section later. For now, you just need to know that a directive
wraps a parser, and changes some aspect of how it functions.)
raw[p] changes the attribute of the parser p that it wraps to be a subrange
whose begin() and end() return the bounds of the sequence being parsed that
p matches.
namespace bp = boost::parser;
auto int_parser = bp::int_ % ',';            // ATTR(int_parser) is std::vector<int>
auto subrange_parser = bp::raw[int_parser];  // ATTR(subrange_parser) is a subrange

// Parse using int_parser, generating integers.
auto ints = bp::parse("1, 2, 3, 4", int_parser, bp::ws);
assert(ints);
assert(*ints == std::vector<int>({1, 2, 3, 4}));

// Parse again using int_parser, but this time generating only the
// subrange matched by int_parser.  (prefix_parse() allows matches that
// don't consume the entire input.)
auto const str = std::string("1, 2, 3, 4, a, b, c");
auto first = str.begin();
auto range = bp::prefix_parse(first, str.end(), subrange_parser, bp::ws);
assert(range);
assert(range->begin() == str.begin());
assert(range->end() == str.begin() + 10);

static_assert(std::is_same_v<
              decltype(range),
              std::optional<bp::subrange<std::string::const_iterator>>>);
Note that the subrange has the iterator type std::string::const_iterator,
because that's the iterator type passed to prefix_parse(). If we had passed
char const * iterators to prefix_parse(), that would have been the iterator
type. The only exception to this comes from Unicode-aware parsing (see
Unicode Support). In some of those cases, the iterator being used in the
parse is not the one you passed. For instance, if you call prefix_parse()
with char8_t * iterators, it will create a UTF-8 to UTF-32 transcoding view,
and parse the iterators of that view. In such a case, you'll get a subrange
whose iterator type is a transcoding iterator. When that happens, you can
get the underlying iterator (the one you passed to prefix_parse()) by
calling the .base() member function on each transcoding iterator in the
returned subrange.
auto const u8str = std::u8string(u8"1, 2, 3, 4, a, b, c");
auto u8first = u8str.begin();
auto u8range = bp::prefix_parse(u8first, u8str.end(), subrange_parser, bp::ws);
assert(u8range);
assert(u8range->begin().base() == u8str.begin());
assert(u8range->end().base() == u8str.begin() + 10);
string_view[] has very similar semantics to raw[], except that it produces
a std::basic_string_view<CharT> (where CharT is the element type of the
underlying range being parsed) instead of a subrange. For this to work, the
underlying range must be contiguous. Contiguity of iterators is not
detectable before C++20, so this directive is only available in C++20 and
later.
namespace bp = boost::parser;
auto int_parser = bp::int_ % ',';              // ATTR(int_parser) is std::vector<int>
auto sv_parser = bp::string_view[int_parser];  // ATTR(sv_parser) is a string_view

auto const str = std::string("1, 2, 3, 4, a, b, c");
auto first = str.begin();
auto sv1 = bp::prefix_parse(first, str.end(), sv_parser, bp::ws);
assert(sv1);
assert(*sv1 == str.substr(0, 10));

static_assert(std::is_same_v<decltype(sv1), std::optional<std::string_view>>);
Since string_view[] produces string_views, it cannot return transcoding
iterators as described above for raw[]. If you parse a sequence of CharT
with string_view[], you get exactly a std::basic_string_view<CharT>. If the
parse is using transcoding in the Unicode-aware path, string_view[] will
decompose the transcoding iterator as necessary. If you pass a transcoding
view to parse() or transcoding iterators to prefix_parse(), string_view[]
will still see through the transcoding iterators without issue, and give
you a string_view of part of the underlying range.
auto sv2 = bp::parse("1, 2, 3, 4" | bp::as_utf32, sv_parser, bp::ws);
assert(sv2);
assert(*sv2 == "1, 2, 3, 4");

static_assert(std::is_same_v<decltype(sv2), std::optional<std::string_view>>);
Now would be a good time to describe the parse context in some detail. Any
semantic action that you write will need to use state in the parse context,
so you need to know what's available.
The parse context is an object that stores the current state of the parse:
the current- and end-iterators, the error handler, etc. Data may seem to be
"added" to or "removed" from it at different times during the parse. For
instance, when a parser p with a semantic action a succeeds, the context
adds the attribute that p produces to the parse context, then calls a,
passing it the context.
Though the context object appears to have things added to or removed from
it, it does not. In reality, there is no one context object. Contexts are
formed at various times during the parse, usually when starting a subparser.
Each context is formed by taking the previous context and adding or changing
members as needed to form a new context object. When the function containing
the new context object returns, its context object (if any) is destructed.
This is efficient to do, because the parse context has only about a dozen
data members, and each data member is less than or equal to the size of a
pointer. Copying the entire context when mutating the context is therefore
fast. The context does no memory allocation.
Tip: All these functions that take the parse context as their first
parameter will be found by Argument-Dependent Lookup. You will probably
never need to qualify them with …
By convention, the names of all Boost.Parser functions that take a parse
context, and are therefore intended for use inside semantic actions, contain
a leading underscore.
_pass() returns a reference to a bool indicating the success or failure of
the current parse. This can be used to force the current parse to pass or
fail:
[](auto & ctx) {
    // If the attribute fails to meet this predicate, fail the parse.
    if (!necessary_condition(_attr(ctx)))
        _pass(ctx) = false;
}
Note that for a semantic action to be executed, its associated parser must
already have succeeded. So unless you previously wrote _pass(ctx) = false
within your action, _pass(ctx) = true does nothing; it's redundant.
_begin() and _end() return the beginning and end of the range that you
passed to parse(), respectively. _where() returns a subrange indicating the
bounds of the input matched by the current parse. _where() can be useful if
you just want to parse some text and return a result consisting of where
certain elements are located, without producing any other attributes.
_where() can also be essential in tracking where things are located, to
provide good diagnostics at a later point in the parse. Think mismatched
tags in XML; if you parse a close-tag at the end of an element, and it does
not match the open-tag, you want to produce an error message that mentions
or shows both tags. Stashing _where(ctx).begin() somewhere that is
available to the close-tag parser will enable that. See Error Handling and
Debugging for an example of this.
_error_handler() returns a reference to the error handler associated with
the parser passed to parse(). Using _error_handler(), you can generate
errors and warnings from within your semantic actions. See Error Handling
and Debugging for concrete examples.
_attr()
returns a reference to the
value of the current parser's attribute. It is available only when the current
parser's parse is successful. If the parser has no semantic action, no attribute
gets added to the parse context. It can be used to read and write the current
parser's attribute:
[](auto & ctx) { _attr(ctx) = 3; }
If the current parser has no attribute, a none
is returned.
_val()
returns a reference to the
value of the attribute of the current rule being used to parse (if any),
and is available even before the rule's parse is successful. It can be used
to set the current rule's attribute, even from a parser that is a subparser
inside the rule. Let's say we're writing a parser with a semantic action
that is within a rule. If we want to set the current rule's value to some
function of the subparser's attribute, we would write this semantic action:
[](auto & ctx) { _val(ctx) = some_function(_attr(ctx)); }
If there is no current rule, or the current rule has no attribute, a none
is returned.
You need to use _val() in cases where the default attribute for a rule's
parser is not directly compatible with the attribute type of the rule. In
these cases, you'll need to write some code like the example above to
compute the rule's attribute from the rule's parser's generated attribute.
For more info on rules, see the next page, and More About Rules.
_globals()
returns a reference to a
user-supplied object that contains whatever data you want to use during the
parse. The "globals" for a parse is an object — typically
a struct — that you give to the top-level parser. Then you can use
_globals()
to access it at any time
during the parse. We'll see how globals get associated with the top-level
parser in The parse()
API later. As an example, say that you have an early part of the parse
that needs to record some black-listed values, and that later parts of the
parse might need to parse values, failing the parse if they see the black-listed
values. In the early part of the parse, you could write something like this.
[](auto & ctx) {
    // black_list is a std::unordered_set.
    _globals(ctx).black_list.insert(_attr(ctx));
}
Later in the parse, you could then use black_list
to check values as they are parsed.
[](auto & ctx) { if (_globals(ctx).black_list.contains(_attr(ctx))) _pass(ctx) = false; }
_locals() returns a reference to one or more values that are local to the
current rule being parsed, if any. If there are two or more local values,
_locals() returns a reference to a boost::parser::tuple. Rules with locals
are something we haven't gotten to yet (see More About Rules), but for now
all you need to know is that you can provide a template parameter
(LocalState) to rule, and the rule will default construct an object of that
type for use within the rule. You access it via _locals():
[](auto & ctx) {
    auto & local = _locals(ctx);
    // Use local here.  If 'local' is a hana::tuple, access its members like this:
    using namespace hana::literals;
    auto & first_element = local[0_c];
    auto & second_element = local[1_c];
}
If there is no current rule, or the current rule has no locals, a none
is returned.
如果没有当前规则,或者当前规则没有本地变量,则返回一个 none
。
_params(), like _locals(), applies to the current rule being used to parse, if any (see More About Rules). It returns a reference to a single value if the current rule has only one parameter. If the current rule has multiple parameters, it returns a boost::parser::tuple of those values. If there is no current rule, or the current rule has no parameters, a none is returned.
Unlike with _locals(), you do not provide a template parameter to rule. Instead you call the rule's with() member function (again, see More About Rules).
_no_case() returns true if the current parse context is inside one or more (possibly nested) no_case[] directives. I don't have a use case for this. However, if I didn't expose it, it would be the only thing in the context that you could not examine from inside a semantic action. It was easy to add, so I did.
This example is very similar to the others we've seen so far. This one is different only because it uses a rule. As an analogy, think of a parser like char_ or double_ as an individual line of code, and think of a rule as a function. Like a function, a rule has its own name, and can even be forward declared. Here is how we define a rule, which is analogous to forward declaring a function:
bp::rule<struct doubles, std::vector<double>> doubles = "doubles";
This declares the rule itself. The rule is a parser, and we can immediately use it in other parsers. That definition is pretty dense; take note of these things:
- The rule's tag type is struct doubles. Here we've declared the tag type and used it all in one go; you can also use a previously declared tag type.
- The rule object itself is named doubles.
- We gave doubles the diagnostic text "doubles" so that Boost.Parser knows how to refer to it when producing a trace of the parser during debugging.
Ok, so if doubles is a parser, what does it do? We define the rule's behavior by defining a separate parser that by now should look pretty familiar:
auto const doubles_def = bp::double_ % ',';
This is analogous to writing a definition for a forward-declared function. Note that we used the name doubles_def.
Right now, the doubles rule parser and the doubles_def non-rule parser have no connection to each other. That's intentional: we want to be able to define them separately. To connect them, we declare functions with an interface that Boost.Parser understands, and use the tag type struct doubles to connect them together. We use a macro for that:
BOOST_PARSER_DEFINE_RULES(doubles);
This macro expands to the code necessary to make the rule doubles and its parser doubles_def work together. The _def suffix is a naming convention that this macro relies on to work. When doubles is used as a parser, the tag type lets it call the overloads generated for it.
BOOST_PARSER_DEFINE_RULES expands to two overloads of a function called parse_rule(). In the case above, each overload takes a struct doubles * parameter and parses using doubles_def. This pointer to the tag type distinguishes them from the overloads of parse_rule() generated for other rules. You will never need to call any overload of parse_rule() yourself. It is used internally by rule_parser, the parser that implements rules.
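Tag dispatch is the whole trick, and it can be shown in miniature without Boost.Parser. In this stdlib-only sketch, the hypothetical tag types ints_tag and word_tag select between two overloads of a toy parse_rule(). The selection happens through a null tag pointer, much as the tag_type * parameter distinguishes the overloads generated by BOOST_PARSER_DEFINE_RULES. The real overloads take many more parameters (iterators, context, skipper, flags), so treat this as the shape only.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical tag types, one per "rule". They only need to be distinct.
struct ints_tag {};
struct word_tag {};

// Overload selected by ints_tag: parses comma-separated unsigned integers.
inline std::vector<int> parse_rule(ints_tag *, std::string_view & in)
{
    std::vector<int> out;
    while (!in.empty() && std::isdigit(static_cast<unsigned char>(in.front()))) {
        int value = 0;
        while (!in.empty() && std::isdigit(static_cast<unsigned char>(in.front()))) {
            value = value * 10 + (in.front() - '0');
            in.remove_prefix(1);
        }
        out.push_back(value);
        if (in.empty() || in.front() != ',')
            break;
        in.remove_prefix(1); // consume the ',' separator
    }
    return out;
}

// Overload selected by word_tag: parses a run of letters.
inline std::string parse_rule(word_tag *, std::string_view & in)
{
    std::string out;
    while (!in.empty() && std::isalpha(static_cast<unsigned char>(in.front()))) {
        out += in.front();
        in.remove_prefix(1);
    }
    return out;
}

// A toy rule holds no parsing logic itself. It passes a null Tag pointer,
// so overload resolution picks the parse_rule() written for that tag.
template <typename Tag>
struct toy_rule {
    auto operator()(std::string_view & in) const
    {
        return parse_rule(static_cast<Tag *>(nullptr), in);
    }
};
```

For example, toy_rule<ints_tag>{} applied to "1,20,3" yields {1, 20, 3}. The tag pointer is never dereferenced; it exists only to steer overload resolution.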
Here is the definition of the macro that is expanded for each rule:
#define BOOST_PARSER_DEFINE_IMPL(_, rule_name_) \
    template< \
        typename Iter, \
        typename Sentinel, \
        typename Context, \
        typename SkipParser> \
    decltype(rule_name_)::parser_type::attr_type parse_rule( \
        decltype(rule_name_)::parser_type::tag_type *, \
        Iter & first, \
        Sentinel last, \
        Context const & context, \
        SkipParser const & skip, \
        boost::parser::detail::flags flags, \
        bool & success, \
        bool & dont_assign) \
    { \
        auto const & parser = BOOST_PARSER_PP_CAT(rule_name_, _def); \
        using attr_t = \
            decltype(parser(first, last, context, skip, flags, success)); \
        using attr_type = decltype(rule_name_)::parser_type::attr_type; \
        if constexpr (boost::parser::detail::is_nope_v<attr_t>) { \
            dont_assign = true; \
            parser(first, last, context, skip, flags, success); \
            return {}; \
        } else if constexpr (std::is_same_v<attr_type, attr_t>) { \
            return parser(first, last, context, skip, flags, success); \
        } else if constexpr (std::is_constructible_v<attr_type, attr_t>) { \
            return attr_type( \
                parser(first, last, context, skip, flags, success)); \
        } else { \
            attr_type attr{}; \
            parser(first, last, context, skip, flags, success, attr); \
            return attr; \
        } \
    } \
 \
    template< \
        typename Iter, \
        typename Sentinel, \
        typename Context, \
        typename SkipParser, \
        typename Attribute> \
    void parse_rule( \
        decltype(rule_name_)::parser_type::tag_type *, \
        Iter & first, \
        Sentinel last, \
        Context const & context, \
        SkipParser const & skip, \
        boost::parser::detail::flags flags, \
        bool & success, \
        bool & dont_assign, \
        Attribute & retval) \
    { \
        auto const & parser = BOOST_PARSER_PP_CAT(rule_name_, _def); \
        using attr_t = \
            decltype(parser(first, last, context, skip, flags, success)); \
        if constexpr (boost::parser::detail::is_nope_v<attr_t>) { \
            parser(first, last, context, skip, flags, success); \
        } else { \
            parser(first, last, context, skip, flags, success, retval); \
        } \
    }
Now that we have the doubles parser, we can use it like we might any other parser:
auto const result = bp::parse(input, doubles, bp::ws);
The full program:
#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>
#include <vector>

namespace bp = boost::parser;

bp::rule<struct doubles, std::vector<double>> doubles = "doubles";
auto const doubles_def = bp::double_ % ',';
BOOST_PARSER_DEFINE_RULES(doubles);

int main()
{
    std::cout << "Please enter a list of doubles, separated by commas. ";
    std::string input;
    std::getline(std::cin, input);

    auto const result = bp::parse(input, doubles, bp::ws);
    if (result) {
        std::cout << "You entered:\n";
        for (double x : *result) {
            std::cout << x << "\n";
        }
    } else {
        std::cout << "Parse failure.\n";
    }
}
All this is intended to introduce the notion of rules. It may still be a bit unclear why you would want to use rules. The use cases for rules, and lots of detail about them, are in a later section, More About Rules.
So far, we've seen only simple parsers that parse the same value repeatedly (with or without commas and spaces). It's also very common to parse a few values in a specific sequence. Let's say you want to parse an employee record. Here's a parser you might write:
namespace bp = boost::parser;
// quoted_string is a parser for a double-quoted string,
// such as the one defined in the full program below.
auto employee_parser = bp::lit("employee")
    >> '{' >> bp::int_ >> ','
    >> quoted_string >> ','
    >> quoted_string >> ','
    >> bp::double_ >> '}';
The attribute type for employee_parser is boost::parser::tuple<int, std::string, std::string, double>. The good part is that you get all the parsed data for the record without writing any semantic actions. The bad part is that you now have to get the individual elements out by their indices, using get(). It would be much nicer to parse into the final data structure that your program is going to use. This is often some struct or class. Boost.Parser supports parsing into arbitrary aggregate structs, and into non-aggregates that are constructible from the tuple at hand.
Say we have a struct whose data members have the same types as those listed in the boost::parser::tuple attribute type of employee_parser. It would be nice to parse directly into that struct, instead of parsing into a tuple and constructing the struct later. Fortunately, this just works in Boost.Parser. Here is an example of parsing straight into a compatible aggregate type.
#include <boost/parser/parser.hpp>
#include <iostream>
#include <string>

struct employee
{
    int age;
    std::string surname;
    std::string forename;
    double salary;
};

namespace bp = boost::parser;

int main()
{
    std::cout << "Enter employee record. ";
    std::string input;
    std::getline(std::cin, input);

    auto quoted_string = bp::lexeme['"' >> +(bp::char_ - '"') >> '"'];
    auto employee_p = bp::lit("employee")
        >> '{' >> bp::int_ >> ','
        >> quoted_string >> ','
        >> quoted_string >> ','
        >> bp::double_ >> '}';

    employee record;
    auto const result = bp::parse(input, employee_p, bp::ws, record);
    if (result) {
        std::cout << "You entered:\nage: " << record.age
                  << "\nsurname: " << record.surname
                  << "\nforename: " << record.forename
                  << "\nsalary : " << record.salary << "\n";
    } else {
        std::cout << "Parse failure.\n";
    }
}
Unfortunately, this is taking advantage of the loose attribute assignment logic; the employee_parser parser still has a boost::parser::tuple attribute. See The parse() API for a description of attribute out-param compatibility.
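The tuple/aggregate correspondence is purely positional: the i-th tuple element goes with the i-th data member. This stdlib-only sketch shows that correspondence with std::tuple and std::apply. The helpers to_employee() and to_tuple() are hypothetical. The sketch illustrates the idea only; it does not show how Boost.Parser implements the conversion.

```cpp
#include <cassert>
#include <string>
#include <tuple>

struct employee {
    int age;
    std::string surname;
    std::string forename;
    double salary;
};

using employee_tuple = std::tuple<int, std::string, std::string, double>;

// tuple -> aggregate: the i-th tuple element initializes the i-th member.
inline employee to_employee(employee_tuple const & t)
{
    return std::apply(
        [](auto const &... elems) { return employee{elems...}; }, t);
}

// aggregate -> tuple: the members, in declaration order, become the elements.
inline employee_tuple to_tuple(employee const & e)
{
    return {e.age, e.surname, e.forename, e.salary};
}
```

Because the mapping relies on order alone, swapping surname and forename in the struct would still compile but would silently swap the parsed values.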
For this reason, it's even more common to want to make a rule that returns a specific type like employee. Just by giving the rule a struct type, we make sure that this parser always generates an employee struct as its attribute, no matter where it is in the parse. Suppose we made a simple parser P that uses the employee_p rule, like bp::int_ >> employee_p. P's attribute type would then be boost::parser::tuple<int, employee>.
#include <boost/parser/parser.hpp>
#include <iostream>
#include <string>

struct employee
{
    int age;
    std::string surname;
    std::string forename;
    double salary;
};

namespace bp = boost::parser;

bp::rule<struct quoted_string, std::string> quoted_string = "quoted name";
bp::rule<struct employee_p, employee> employee_p = "employee";

auto quoted_string_def = bp::lexeme['"' >> +(bp::char_ - '"') >> '"'];
auto employee_p_def = bp::lit("employee")
    >> '{' >> bp::int_ >> ','
    >> quoted_string >> ','
    >> quoted_string >> ','
    >> bp::double_ >> '}';

BOOST_PARSER_DEFINE_RULES(quoted_string, employee_p);

int main()
{
    std::cout << "Enter employee record. ";
    std::string input;
    std::getline(std::cin, input);

    static_assert(std::is_aggregate_v<std::decay_t<employee &>>);

    auto const result = bp::parse(input, employee_p, bp::ws);
    if (result) {
        std::cout << "You entered:\nage: " << result->age
                  << "\nsurname: " << result->surname
                  << "\nforename: " << result->forename
                  << "\nsalary : " << result->salary << "\n";
    } else {
        std::cout << "Parse failure.\n";
    }
}
You can pass a struct as an out-param to parse() when the parser's attribute type is a tuple. In the same way, you can pass a tuple as an out-param to parse() when the parser's attribute type is a struct:
// Using the employee_p rule from above, with attribute type employee...
boost::parser::tuple<int, std::string, std::string, double> tup;
auto const result = bp::parse(input, employee_p, bp::ws, tup); // Ok!
class types as attributes
Many times you don't have an aggregate struct that you want to produce from
your parse. It would be even nicer than the aggregate code above if Boost.Parser
could detect that the members of a tuple that is produced as an attribute
are usable as the arguments to some type's constructor. So, Boost.Parser
does that.
#include <boost/parser/parser.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace bp = boost::parser;

int main()
{
    std::cout << "Enter a string followed by two unsigned integers. ";
    std::string input;
    std::getline(std::cin, input);
    constexpr auto string_uint_uint =
        bp::lexeme[+(bp::char_ - ' ')] >> bp::uint_ >> bp::uint_;
    std::string string_from_parse;
    if (parse(input, string_uint_uint, bp::ws, string_from_parse))
        std::cout << "That yields this string: " << string_from_parse << "\n";
    else
        std::cout << "Parse failure.\n";

    std::cout << "Enter an unsigned integer followed by a string. ";
    std::getline(std::cin, input);
    std::cout << input << "\n";
    constexpr auto uint_string = bp::uint_ >> +bp::char_;
    std::vector<std::string> vector_from_parse;
    if (parse(input, uint_string, bp::ws, vector_from_parse)) {
        std::cout << "That yields this vector of strings:\n";
        for (auto && str : vector_from_parse) {
            std::cout << " '" << str << "'\n";
        }
    } else {
        std::cout << "Parse failure.\n";
    }
}
Let's look at the first parse.
constexpr auto string_uint_uint =
    bp::lexeme[+(bp::char_ - ' ')] >> bp::uint_ >> bp::uint_;
std::string string_from_parse;
if (parse(input, string_uint_uint, bp::ws, string_from_parse))
    std::cout << "That yields this string: " << string_from_parse << "\n";
else
    std::cout << "Parse failure.\n";
Here, we use the parser string_uint_uint, which produces a boost::parser::tuple<std::string, unsigned int, unsigned int> attribute. When we try to parse that into an out-param std::string attribute, it just works. This is because std::string has a constructor that takes a std::string, an offset, and a length. Here's the other parse:
constexpr auto uint_string = bp::uint_ >> +bp::char_;
std::vector<std::string> vector_from_parse;
if (parse(input, uint_string, bp::ws, vector_from_parse)) {
    std::cout << "That yields this vector of strings:\n";
    for (auto && str : vector_from_parse) {
        std::cout << " '" << str << "'\n";
    }
} else {
    std::cout << "Parse failure.\n";
}
Now we have the parser uint_string, which produces a boost::parser::tuple<unsigned int, std::string> attribute. The characters matched by +bp::char_ at the end combine into a std::string. Those two values can be used to construct a std::vector<std::string> via its (count, value) constructor, which yields count copies of the string.
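Both constructor matches can be checked with the standard library alone. In this sketch, std::make_from_tuple stands in for the tuple-to-class conversion, since it calls T(std::get<I>(t)...) with the tuple's elements in order. The helper names string_from() and vector_from() are hypothetical. The sketch does not claim that Boost.Parser uses exactly this mechanism.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// string_uint_uint's attribute shape: (string, offset, length).
inline std::string string_from(std::tuple<std::string, unsigned int, unsigned int> const & t)
{
    // Calls std::string(const std::string &, size_type pos, size_type count).
    return std::make_from_tuple<std::string>(t);
}

// uint_string's attribute shape: (count, value).
inline std::vector<std::string> vector_from(std::tuple<unsigned int, std::string> const & t)
{
    // Calls std::vector<std::string>(size_type count, const std::string & value).
    return std::make_from_tuple<std::vector<std::string>>(t);
}
```

For example, ("abcdef", 1, 3) becomes the substring "bcd", and (3, "hi") becomes a vector holding three copies of "hi".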
Just as aggregates can be used in place of tuples, non-aggregate class types can be substituted for tuples in most places. That includes using a non-aggregate class type as the attribute type of a rule.
However, the reverse does not hold. Compatible tuples can be substituted for aggregates, but you can't substitute a tuple for some class type T just because the tuple could have been used to construct T. Think of trying to invert the substitution in the second parse above. Converting a std::vector<std::string> into a boost::parser::tuple<unsigned int, std::string> makes no sense.
Frequently, you need to parse something that might have one of several forms. operator| is overloaded to form alternative parsers. For example:
namespace bp = boost::parser; auto const parser_1 = bp::int_ | bp::eps;
parser_1 matches an integer, or if that fails, it matches epsilon, the empty string. This is equivalent to writing:
namespace bp = boost::parser; auto const parser_2 = -bp::int_;
However, neither parser_1 nor parser_2 is equivalent to writing this:
namespace bp = boost::parser; auto const parser_3 = bp::eps | bp::int_; // Does not do what you think.
The reason is that alternative parsers try each of their subparsers, one at a time, and stop on the first one that matches. Epsilon matches anything, since it is zero length and consumes no input. It even matches the end of input. This means that parser_3 is equivalent to eps by itself.
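The ordering pitfall is easy to reproduce. In this stdlib-only toy, all names are hypothetical and none of it is Boost.Parser code. A parser reports whether it matched and how many characters it consumed. alternative() returns the result of the first subparser that matches, as described above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string_view>
#include <vector>

// Toy model: a parser reports whether it matched and how many chars it consumed.
struct match {
    bool ok;
    std::size_t consumed;
};
using toy_parser = std::function<match(std::string_view)>;

// Like int_ (unsigned only, for brevity): one or more decimal digits.
inline match toy_int(std::string_view in)
{
    std::size_t n = 0;
    while (n < in.size() && std::isdigit(static_cast<unsigned char>(in[n])))
        ++n;
    return {n > 0, n};
}

// Like eps: always matches and consumes nothing, even at end of input.
inline match toy_eps(std::string_view) { return {true, 0}; }

// Like operator|: try each subparser in order; stop at the first that matches.
inline match alternative(std::vector<toy_parser> const & alts, std::string_view in)
{
    for (auto const & p : alts) {
        match m = p(in);
        if (m.ok)
            return m;
    }
    return {false, 0};
}
```

With {toy_int, toy_eps}, the input "42" is consumed entirely. With {toy_eps, toy_int}, toy_int is never even tried, so nothing is consumed.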
It is very common to need to parse quoted strings. Quoted strings are slightly tricky, though, when using a skipper (and you should be using a skipper 99% of the time). You don't want to allow arbitrary whitespace in the middle of your strings, and you also don't want to remove all whitespace from your strings. Both of these things will happen with the typical skipper, ws.
So, here is how most people would write a quoted string parser:
namespace bp = boost::parser; const auto string = bp::lexeme['"' >> *(bp::char_ - '"') > '"'];
Some things to note:
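- lexeme[] disables skipping in the parser, and it must be written around the quotes, not around the operator* expression. Otherwise the skipper would still skip any whitespace right after the opening quote.
- The closing quote follows operator> rather than operator>>. This makes it an expectation: once the opening quote has matched, a missing closing quote is a hard parse error, not an ordinary failure.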
This is a very common pattern. I have written a quoted string parser like
this dozens of times. The parser above is the quick-and-dirty version. A
more robust version would be able to handle escaped quotes within the string,
and then would immediately also need to support escaped escape characters.
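A hand-written, stdlib-only version of the "more robust" parser described above might look like this. The function name and its exact behavior are assumptions for illustration:
- it accepts \" as a literal quote and \\ as a literal backslash;
- it rejects any other escape;
- it leaves the input untouched on failure.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Parses a quoted string from the front of `in`, supporting the escapes
// \<quote> and \\. On success, advances `in` past the closing quote and
// returns the unescaped contents. On failure, leaves `in` unchanged and
// returns std::nullopt.
inline std::optional<std::string> parse_quoted(std::string_view & in, char quote = '"')
{
    if (in.empty() || in.front() != quote)
        return std::nullopt;
    std::string out;
    for (std::size_t i = 1; i < in.size(); ++i) {
        char const c = in[i];
        if (c == '\\') {
            // Only the quote character and the backslash itself may be escaped.
            if (i + 1 < in.size() && (in[i + 1] == quote || in[i + 1] == '\\')) {
                out += in[++i];
                continue;
            }
            return std::nullopt; // unsupported or dangling escape
        }
        if (c == quote) {
            in.remove_prefix(i + 1); // consume through the closing quote
            return out;
        }
        out += c;
    }
    return std::nullopt; // no closing quote
}
```

Supporting escaped quotes forces support for escaped backslashes. Without it, a string that should end in a backslash could not be written.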
Boost.Parser provides quoted_string to use in place of this very common pattern. It supports escaping both the quote character and the escape character itself, using backslash as the escape character.
namespace bp = boost::parser;

auto result1 = bp::parse("\"some text\"", bp::quoted_string, bp::ws);
assert(result1);
std::cout << *result1 << "\n"; // Prints: some text

auto result2 = bp::parse("\"some \\\"text\\\"\"", bp::quoted_string, bp::ws);
assert(result2);
std::cout << *result2 << "\n"; // Prints: some "text"
As common as this use case is, there are very similar use cases that it does not cover. So, quoted_string has some options. If you call it with a single character, it returns a quoted_string that uses that single character as the quote-character.
auto result3 = bp::parse("!some text!", bp::quoted_string('!'), bp::ws);
assert(result3);
std::cout << *result3 << "\n"; // Prints: some text
You can also supply a range of characters. One of the characters from the range must quote both ends of the string; mismatches are not allowed. Think of how Python allows you to quote a string with either '"' or '\'', but the same character must be used on both sides.
auto result4 = bp::parse("'some text'", bp::quoted_string("'\""), bp::ws);
assert(result4);
std::cout << *result4 << "\n"; // Prints: some text
Another common thing to do in a quoted string parser is to recognize escape sequences. Some escape sequences are simple and need no real parsing, like the simple escape sequences from C++. For those, you can also provide a symbols object. The template parameter T of symbols<T> must be char or char32_t. You don't need to include the escaped backslash or the escaped quote character, since those always work.
// The C++ simple escapes.
bp::symbols<char> const escapes = {
    {"'", '\''},
    {"?", '\?'},
    {"a", '\a'},
    {"b", '\b'},
    {"f", '\f'},
    {"n", '\n'},
    {"r", '\r'},
    {"t", '\t'},
    {"v", '\v'}};
// The input contains the two-character escape sequence \r, which the
// escapes table turns into a carriage return.
auto result5 = bp::parse("\"some text\\r\"", bp::quoted_string('"', escapes), bp::ws);
assert(result5);
std::cout << *result5 << "\n"; // Prints (with a CRLF newline): some text
Now that you've seen some examples, let's see how parsing works in a bit
more detail. Consider this example.
namespace bp = boost::parser;
auto int_pair = bp::int_ >> bp::int_;         // Attribute: tuple<int, int>
auto int_pairs_plus = +int_pair >> bp::int_;  // Attribute: tuple<std::vector<tuple<int, int>>, int>
int_pairs_plus must match a pair of ints (using int_pair) one or more times, and then must match an additional int. In other words, it matches any odd number (greater than 1) of ints in the input. Let's look at how this parse proceeds.
auto result = bp::parse("1 2 3", int_pairs_plus, bp::ws);
At the beginning of the parse, the top level parser uses its first subparser (if any) to start parsing. So, int_pairs_plus, being a sequence parser, would pass control to its first parser +int_pair. Then +int_pair would use int_pair to do its parsing, which would in turn use bp::int_. This creates a stack of parsers, each one using a particular subparser.
Step 1) The input is "1 2 3", and the stack of active parsers is int_pairs_plus -> +int_pair -> int_pair -> bp::int_. (Read "->" as "uses".) This parses "1", and the whitespace after it is skipped by bp::ws. Control passes to the second bp::int_ parser in int_pair.
Step 2) The input is "2 3" and the stack of parsers looks the same, except the active parser is the second bp::int_ from int_pair. This parser consumes "2" and then bp::ws skips the subsequent space. Since we've finished with int_pair's match, its boost::parser::tuple<int, int> attribute is complete. Its parent is +int_pair, so this tuple attribute is pushed onto the back of +int_pair's attribute, which is a std::vector<boost::parser::tuple<int, int>>. Control passes up to the parent of int_pair, +int_pair. Since +int_pair is a one-or-more parser, it starts a new iteration; control passes to int_pair again.
Step 3) The input is "3" and the stack of parsers looks the same, except the active parser is the first bp::int_ from int_pair again, and we're in the second iteration of +int_pair. This parser consumes "3". Since this is the end of the input, the second bp::int_ of int_pair does not match. This partial match of "3" should not count, since it was not part of a full match. So, int_pair indicates its failure, and +int_pair stops iterating. Since it did match once, +int_pair does not fail. It is a one-or-more parser, so a failure of its subparser after the first success does not cause it to fail. Control passes to the next parser in sequence within int_pairs_plus.
Step 4) The input is "3" again, and the stack of parsers is int_pairs_plus -> bp::int_. This parses the "3", and the parse reaches the end of input. Control passes to int_pairs_plus, which has just successfully matched all the parsers in its sequence. It then produces its attribute, a boost::parser::tuple<std::vector<boost::parser::tuple<int, int>>, int>, which gets returned from bp::parse().
Something to take note of between Steps #3 and #4: at the beginning of #4, the input position had returned to where it was at the beginning of #3. Here the backtracking happened because a repeated subparser (int_pair) failed partway through. The same kind of backtracking happens in alternative parsers when an alternative fails. The next page has more details on the semantics of backtracking.
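The walk-through can be mirrored in a stdlib-only sketch. To keep it short, the input is pre-split into a hypothetical token list of ints, so there is no skipper. Each toy parser takes the position by reference and restores it when it fails. That restore is the backtracking seen between Steps #3 and #4. None of these names come from Boost.Parser.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

using tokens = std::vector<int>;

// Like bp::int_: matches one token.
inline std::optional<int> int_p(tokens const & in, std::size_t & pos)
{
    if (pos == in.size())
        return std::nullopt;
    return in[pos++];
}

// Like int_pair = int_ >> int_. All-or-nothing: on failure, rewind `pos`.
inline std::optional<std::pair<int, int>> int_pair_p(tokens const & in, std::size_t & pos)
{
    std::size_t const start = pos;
    auto a = int_p(in, pos);
    auto b = a ? int_p(in, pos) : std::nullopt;
    if (!b) {
        pos = start; // give back a partially matched first int (Step 3)
        return std::nullopt;
    }
    return std::pair{*a, *b};
}

struct pairs_plus_attr {
    std::vector<std::pair<int, int>> pairs;
    int last;
};

// Like int_pairs_plus = +int_pair >> int_.
inline std::optional<pairs_plus_attr> int_pairs_plus_p(tokens const & in, std::size_t & pos)
{
    std::size_t const start = pos;
    pairs_plus_attr attr{};
    // +int_pair: keep iterating until int_pair fails (Steps 1 to 3).
    while (auto p = int_pair_p(in, pos))
        attr.pairs.push_back(*p);
    // One-or-more: zero pairs is a failure; otherwise match the final int (Step 4).
    auto last = attr.pairs.empty() ? std::nullopt : int_p(in, pos);
    if (!last) {
        pos = start;
        return std::nullopt;
    }
    attr.last = *last;
    return attr;
}
```

With the tokens {1, 2, 3}, this yields one pair (1, 2) and a final 3. An even count such as {1, 2, 3, 4} fails and leaves the position at 0.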
So far, parsers have been presented as somewhat abstract entities. You may
be wanting more detail. A Boost.Parser parser P
is an invocable object with a pair of call operator overloads. The two functions
are very similar, and in many parsers one is implemented in terms of the
other. The first function does the parsing and returns the default attribute
for the parser. The second function does exactly the same parsing, but takes
an out-param into which it writes the attribute for the parser. The out-param
does not need to be the same type as the default attribute, but they need
to be compatible.
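Here is a stdlib-only sketch of that shape. The hypothetical toy_uint_parser below has two call operators, and the attribute-returning one is implemented in terms of the out-param one. The BOOST_PARSER_DEFINE_IMPL macro shown earlier suggests that the real call operators also take a context, a skipper and flags. This sketch keeps only the iterators, the success flag and the attribute.

```cpp
#include <cassert>
#include <cctype>
#include <string_view>

struct toy_uint_parser {
    // Out-param overload: parses digits at `first`, writes the value into
    // `attr`, and reports success. On failure, `first` is left unchanged.
    template <typename Attr>
    void operator()(char const *& first, char const * last, bool & success, Attr & attr) const
    {
        char const * it = first;
        unsigned int value = 0;
        while (it != last && std::isdigit(static_cast<unsigned char>(*it)))
            value = value * 10 + static_cast<unsigned int>(*it++ - '0');
        success = it != first;
        if (success) {
            attr = value; // any type assignable from unsigned int is compatible
            first = it;
        }
    }

    // Default-attribute overload: the same parse, returning unsigned int.
    unsigned int operator()(char const *& first, char const * last, bool & success) const
    {
        unsigned int attr = 0;
        (*this)(first, last, success, attr);
        return attr;
    }
};
```

Passing a long long as the out-param works here because unsigned int is assignable to it. That is the "compatible, not identical" idea in its simplest form.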
Compatibility means that the default attribute is assignable to the out-param in some fashion. This usually means direct assignment, but it may also mean a tuple -> aggregate or aggregate -> tuple conversion. For sequence types, compatibility means that the sequence type has insert or push_back with the usual semantics. This means that the parser +boost::parser::int_ can fill a std::set<int> just as well as a std::vector<int>.
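The "has insert or push_back" rule can be sketched with the standard library. The hypothetical append_to() below prefers push_back() and falls back to insert(). It detects push_back() with a C++17 std::void_t trait. Boost.Parser's own detection logic may differ; the sketch only shows why both std::vector<int> and std::set<int> can receive the ints.

```cpp
#include <cassert>
#include <set>
#include <type_traits>
#include <utility>
#include <vector>

// Detects whether `c.push_back(v)` is well-formed.
template <typename C, typename V, typename = void>
struct has_push_back : std::false_type {};
template <typename C, typename V>
struct has_push_back<C, V, std::void_t<decltype(std::declval<C &>().push_back(std::declval<V>()))>>
    : std::true_type {};

// Appends one parsed element using push_back() if available, else insert().
template <typename Container, typename Value>
void append_to(Container & c, Value && v)
{
    if constexpr (has_push_back<Container, Value>::value)
        c.push_back(std::forward<Value>(v));
    else
        c.insert(std::forward<Value>(v));
}

// Simulates what +int_ does with its matches: append each one in turn.
template <typename Container>
Container fill_from(std::vector<int> const & parsed)
{
    Container c;
    for (int x : parsed)
        append_to(c, x);
    return c;
}
```

"The usual semantics" matters. The vector keeps duplicates in parse order, while the set orders and deduplicates, exactly as insert() on a std::set always does.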
Some parsers also have additional state that is required to perform a match. For instance, char_ parsers can be parameterized with a single code point to match; the exact value of that code point is stored in the parser object.
No parser has direct support for all the operations defined on parsers (operator|, operator>>, etc.). Instead, there is a template called parser_interface that supports all of these operations. parser_interface wraps each parser, storing it as a data member and adapting it for general use. You should only ever see parser_interface in the debugger, or possibly in some of the reference documentation. You should never have to write it in your own code.
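The wrapping idea can be sketched in a few lines. The hypothetical toy_interface below stores a raw parser as its only data member. It provides operator| for any two wrapped parsers, so the raw parser types (char_matcher, or_matcher) need no operators of their own. This is a deliberately simplified illustration, not Boost.Parser's parser_interface.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Raw parser: matches one specific character. It defines no operators.
struct char_matcher {
    char c;
    std::optional<std::size_t> parse(std::string_view in) const
    {
        if (!in.empty() && in.front() == c)
            return 1; // consumed one char
        return std::nullopt;
    }
};

// Raw parser: tries `left`, then `right`.
template <typename L, typename R>
struct or_matcher {
    L left;
    R right;
    std::optional<std::size_t> parse(std::string_view in) const
    {
        if (auto n = left.parse(in))
            return n;
        return right.parse(in);
    }
};

// The wrapper that gives every raw parser the combining operators.
template <typename P>
struct toy_interface {
    P parser;
    std::optional<std::size_t> parse(std::string_view in) const { return parser.parse(in); }
};

// Combining two wrapped parsers unwraps them, builds a raw or_matcher,
// and wraps the result again.
template <typename L, typename R>
toy_interface<or_matcher<L, R>> operator|(toy_interface<L> const & l, toy_interface<R> const & r)
{
    return {{l.parser, r.parser}};
}

// Hypothetical factory, loosely analogous to writing char_('a').
inline toy_interface<char_matcher> ch(char c) { return {{c}}; }
```

For example, ch('a') | ch('b') | ch('c') matches "b" but not "d". Each | produces another toy_interface, which is why the operators chain.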
As described in the previous page, backtracking occurs when the parse attempts to match the current parser P, matches part of the input, but fails to match all of P. The part of the input consumed during the parse of P is essentially "given back".
This is necessary because P may consist of subparsers, and each subparser that succeeds will try to consume input, produce attributes, etc. When a later subparser fails, the parse of P fails. The input must then be rewound to where it was when P started its parse, not to where the latest matching subparser stopped.
Alternative parsers will often evaluate multiple subparsers one at a time,
advancing and then restoring the input position, until one of the subparsers
succeeds. Consider this example.
namespace bp = boost::parser;
// other_parser stands for some previously defined parser.
auto const parser = bp::repeat(53)[other_parser] | bp::repeat(10)[other_parser];
Evaluating parser means trying to match other_parser 53 times, and if that fails, trying to match other_parser 10 times. Say you parse input that matches other_parser 11 times. parser will match it. Along the way, other_parser will successfully match 21 times: 11 times during the failed first alternative, then 10 times during the second.
The attributes of repeat(53)[other_parser] and repeat(10)[other_parser] are each std::vector<ATTR(other_parser)>; let's say that ATTR(other_parser) is int. The attribute of parser as a whole is the same, std::vector<int>. Since other_parser is busy producing ints (21 of them, to be exact), you may be wondering what happens to the ones produced while evaluating repeat(53)[other_parser], when it fails to find all 53 inputs. At that point, its std::vector<int> will contain 11 ints.
When a repeat-parser fails, and attributes are being generated, it clears its container. This applies to parsers such as the ones above, and also to all the other repeat parsers, including ones made using operator+ or operator*.
So, at the end of a successful parse by parser, the std::vector<int> attribute of parser would contain 10 ints, since the right side of the alternative only consumes 10 repetitions.
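The bookkeeping above can be checked with a stdlib-only simulation. The hypothetical repeat_n() calls a counting stand-in for other_parser up to n times. When it comes up short, it clears its container and rewinds the input. alt_53_or_10() plays the role of parser. The input is assumed to contain exactly 11 matchable items.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for other_parser: matches one item from `in`, producing an int.
struct counting_parser {
    int successes = 0; // how many ints this parser has produced overall
    int attempts = 0;  // how many times it has been evaluated
    bool operator()(std::vector<int> const & in, std::size_t & pos, int & out)
    {
        ++attempts;
        if (pos == in.size())
            return false;
        out = in[pos++];
        ++successes;
        return true;
    }
};

// Like repeat(n)[p]: exactly n matches, or failure.
inline bool repeat_n(std::size_t n, counting_parser & p,
                     std::vector<int> const & in, std::size_t & pos,
                     std::vector<int> & attr)
{
    std::size_t const start = pos;
    for (std::size_t i = 0; i < n; ++i) {
        int x = 0;
        if (!p(in, pos, x)) {
            attr.clear(); // a failed repeat clears its container...
            pos = start;  // ...and gives back the input it consumed
            return false;
        }
        attr.push_back(x);
    }
    return true;
}

// Like repeat(53)[p] | repeat(10)[p].
inline bool alt_53_or_10(counting_parser & p, std::vector<int> const & in,
                         std::size_t & pos, std::vector<int> & attr)
{
    return repeat_n(53, p, in, pos, attr) || repeat_n(10, p, in, pos, attr);
}
```

The counters make one distinction explicit. other_parser produces 21 ints, but it is evaluated 22 times, because the first alternative's 12th attempt fails.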
Ok, so if parsers all try their best to match the input, and are all-or-nothing, doesn't that leave room for all kinds of bad input to be ignored? Consider the top-level parser from the Parsing JSON example.
auto const value_p_def = number | bp::bool_ | null | string | array_p | object_p;
What happens if I use this to parse "\""? The parse tries number, which fails. It then tries bp::bool_, which also fails. Then null fails too. Finally, it starts parsing string. Good news: the first character is the open-quote of a JSON string. Unfortunately, that's also the end of the input, so string must fail too. However, we probably don't want to just give up on parsing string now and try array_p, right? If the user wrote an open-quote with no matching close-quote, that's not the prefix of some later alternative of value_p_def; it's ill-formed JSON. Here's the parser for the string rule:
auto const string_def = bp::lexeme['"' >> *(string_char - '"') > '"'];
Notice that operator> is used on the right instead of operator>>. This indicates the same sequence operation as operator>>, except that it also represents an expectation. If the parse before the operator> succeeds, whatever comes after it must also succeed. Otherwise, the top-level parse fails, and a diagnostic is emitted. It will say something like "Expected '"' here.", quoting the line, with a caret pointing to the place in the input where it expected the right-side match.
Choosing to use > versus >> is how you indicate to Boost.Parser that parse failure is or is not a hard error, respectively.
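The following toy, written without Boost.Parser, contrasts the two behaviors on the input "\"". All names in it are hypothetical: expectation_failure, parse_string, parse_array (a stand-in for array_p that accepts only "[]"), and parse_value. With hard == false, the failed string parse backtracks, and the alternative goes on to try the array parser. With hard == true, the missing close-quote stops the whole parse and reports the position where the quote was expected. Throwing an exception is only this sketch's way of modelling a hard error; this section does not specify how Boost.Parser reports one internally.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

// Thrown when the sketch's analogue of an expectation point (operator>) fails.
struct expectation_failure : std::runtime_error {
    std::size_t where;
    expectation_failure(std::string const& what, std::size_t w)
        : std::runtime_error(what), where(w) {}
};

// hard == false models '"' >> *(char_ - '"') >> '"'
// hard == true  models '"' >> *(char_ - '"') >  '"'
std::optional<std::string> parse_string(std::string const& in, std::size_t& pos, bool hard) {
    std::size_t const start = pos;
    if (pos >= in.size() || in[pos] != '"')
        return std::nullopt;  // the first element failed: always a soft failure
    ++pos;
    std::string body;
    while (pos < in.size() && in[pos] != '"')
        body += in[pos++];
    if (pos < in.size()) {    // found the closing quote
        ++pos;
        return body;
    }
    if (hard)
        throw expectation_failure("Expected '\"' here.", pos);
    pos = start;              // soft failure: backtrack
    return std::nullopt;
}

// Hypothetical stand-in for array_p: matches "[]" only.
bool parse_array(std::string const& in, std::size_t& pos) {
    if (in.compare(pos, 2, "[]") == 0) {
        pos += 2;
        return true;
    }
    return false;
}

// value := string | array. Reports which alternative matched, or "none".
std::string parse_value(std::string const& in, bool hard) {
    std::size_t pos = 0;
    if (parse_string(in, pos, hard))
        return "string";
    if (parse_array(in, pos))
        return "array";
    return "none";
}
```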
When writing a parser, it often comes up that there is a set of strings that,
when parsed, are associated with a set of values one-to-one. It is tedious
to write parsers that recognize all the possible input strings when you have
to associate each one with an attribute via a semantic action. Instead, we
can use a symbol table.
Say we want to parse Roman numerals, one of the most common work-related parsing problems. We want to recognize numbers that start with any number of "M"s, representing thousands, followed by the hundreds, the tens, and the ones. Any of these may be absent from the input, but not all. Here are three Boost.Parser symbol tables that we can use to recognize ones, tens, and hundreds values, respectively:
```cpp
bp::symbols<int> const ones = {
    {"I", 1}, {"II", 2}, {"III", 3}, {"IV", 4}, {"V", 5},
    {"VI", 6}, {"VII", 7}, {"VIII", 8}, {"IX", 9}};
bp::symbols<int> const tens = {
    {"X", 10}, {"XX", 20}, {"XXX", 30}, {"XL", 40}, {"L", 50},
    {"LX", 60}, {"LXX", 70}, {"LXXX", 80}, {"XC", 90}};
bp::symbols<int> const hundreds = {
    {"C", 100}, {"CC", 200}, {"CCC", 300}, {"CD", 400}, {"D", 500},
    {"DC", 600}, {"DCC", 700}, {"DCCC", 800}, {"CM", 900}};
```
A symbols maps strings of char to their associated attributes. The type of the attribute must be specified as a template parameter to symbols; in this case, it is int.
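To show what "maps strings to attributes" means during a parse, here is a deliberately simplified, standard-library-only model of a symbol-table lookup. toy_symbols is hypothetical and is not bp::symbols. The sketch assumes longest-match semantics. That choice is made because entries such as "I", "II", and "III" are prefixes of one another, so a shortest match would leave input behind. A production version would more likely use a trie than a linear scan.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical, much simplified stand-in for bp::symbols<int>: a flat list
// of (string, attribute) pairs searched linearly.
struct toy_symbols {
    std::vector<std::pair<std::string, int>> entries;

    // Returns the attribute of the longest entry that is a prefix of `in`,
    // together with the number of chars it matched, or nullopt if none match.
    std::optional<std::pair<int, std::size_t>> match(std::string_view in) const {
        std::optional<std::pair<int, std::size_t>> best;
        for (auto const& [key, value] : entries) {
            bool const is_prefix = in.substr(0, key.size()) == key;
            if (is_prefix && (!best || key.size() > best->second))
                best = std::make_pair(value, key.size());
        }
        return best;
    }
};
```

For example, looking up "VIII" in a table filled like ones yields 8 with 4 characters consumed, and "IVX" yields 4 with 2 consumed.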
Any "M"s we encounter should add 1000 to the result, and all other
values come from the symbol tables. Here are the semantic actions we'll need
to do that:
```cpp
int result = 0;
auto const add_1000 = [&result](auto & ctx) { result += 1000; };
auto const add = [&result](auto & ctx) { result += _attr(ctx); };
```
add_1000 just adds 1000 to result. add adds whatever attribute is produced by its parser to result.
Now we just need to put the pieces together to make a parser:
using namespace bp::literals; auto const parser = *'M'_l[add_1000] >> -hundreds[add] >> -tens[add] >> -ones[add];
We've got a few new bits in play here, so let's break it down. 'M'_l is a literal parser. That is, it is a parser that parses a literal char, code point, or string. In this case, a char 'M' is being parsed. The _l bit at the end is a UDL suffix that you can put after any char, char32_t, or char const * to form a literal parser. You can also make a literal parser by writing lit(), passing an argument of one of the previously mentioned types.
Why do we need any of this, considering that we just used a literal ',' in our previous example? The reason is that 'M' is not used in an expression with another Boost.Parser parser. It is used within *'M'_l[add_1000]. If we'd written *'M'[add_1000], clearly that would be ill-formed; char has no operator*, nor an operator[], associated with it.
Tip: Any time you want to use a …
On to the next bit: -hundreds[add]. By now, the use of the index operator should be pretty familiar; it associates the semantic action add with the parser hundreds. The operator- at the beginning is new. It means that the parser it is applied to is optional. You can read it as "zero or one". So, if hundreds is not successfully parsed after *'M'_l[add_1000], nothing happens, because hundreds is allowed to be missing; it's optional. If hundreds is parsed successfully, say by matching "CC", the resulting attribute, 200, is added to result inside add.
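Here is the whole pipeline, *'M'_l[add_1000] >> -hundreds[add] >> -tens[add] >> -ones[add], re-expressed as a plain C++ sketch for readers who want to trace it by hand. roman and optional_symbol are hypothetical helpers, not Boost.Parser code. The sketch makes three assumptions: symbol lookup uses the longest match; the whole input must be consumed (how this sketch treats a top-level parse); and a result of 0 counts as failure, mirroring the result != 0 check in this example's full program. Note how each optional step succeeds without consuming anything when its table has no match.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>
#include <vector>

using table = std::vector<std::pair<std::string_view, int>>;

// Sketch of -t[add]: match the longest entry of t at pos. If one matches,
// consume it and pass its value to the action; otherwise succeed anyway
// without consuming anything ("zero or one").
template <typename Action>
void optional_symbol(table const& t, std::string_view in, std::size_t& pos,
                     Action add) {
    std::size_t best_len = 0;
    int best_value = 0;
    for (auto const& [key, value] : t) {
        if (in.substr(pos, key.size()) == key && key.size() > best_len) {
            best_len = key.size();
            best_value = value;
        }
    }
    if (best_len != 0) {
        pos += best_len;
        add(best_value);
    }
}

// Sketch of *'M'_l[add_1000] >> -hundreds[add] >> -tens[add] >> -ones[add].
std::optional<int> roman(std::string_view in) {
    static table const ones = {{"I", 1}, {"II", 2}, {"III", 3}, {"IV", 4}, {"V", 5},
                               {"VI", 6}, {"VII", 7}, {"VIII", 8}, {"IX", 9}};
    static table const tens = {{"X", 10}, {"XX", 20}, {"XXX", 30}, {"XL", 40}, {"L", 50},
                               {"LX", 60}, {"LXX", 70}, {"LXXX", 80}, {"XC", 90}};
    static table const hundreds = {{"C", 100}, {"CC", 200}, {"CCC", 300}, {"CD", 400},
                                   {"D", 500}, {"DC", 600}, {"DCC", 700}, {"DCCC", 800},
                                   {"CM", 900}};

    int result = 0;
    auto const add = [&result](int value) { result += value; };  // plays the role of add

    std::size_t pos = 0;
    while (pos < in.size() && in[pos] == 'M') {  // *'M'_l[add_1000]
        result += 1000;
        ++pos;
    }
    optional_symbol(hundreds, in, pos, add);     // -hundreds[add]
    optional_symbol(tens, in, pos, add);         // -tens[add]
    optional_symbol(ones, in, pos, add);         // -ones[add]

    if (pos != in.size() || result == 0)
        return std::nullopt;
    return result;
}
```

For instance, "MCMXCIX" goes through all four steps: 1000 + 900 + 90 + 9 = 1999. "IIII" is rejected, because ones consumes "III" and one "I" is left over.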
Here is the full listing of the program. Notice that it would have been inappropriate
to use a whitespace skipper here, since the entire parse is a single number,
so it was removed.
```cpp
#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>

namespace bp = boost::parser;

int main()
{
    std::cout << "Enter a number using Roman numerals. ";
    std::string input;
    std::getline(std::cin, input);

    bp::symbols<int> const ones = {
        {"I", 1}, {"II", 2}, {"III", 3}, {"IV", 4}, {"V", 5},
        {"VI", 6}, {"VII", 7}, {"VIII", 8}, {"IX", 9}};
    bp::symbols<int> const tens = {
        {"X", 10}, {"XX", 20}, {"XXX", 30}, {"XL", 40}, {"L", 50},
        {"LX", 60}, {"LXX", 70}, {"LXXX", 80}, {"XC", 90}};
    bp::symbols<int> const hundreds = {
        {"C", 100}, {"CC", 200}, {"CCC", 300}, {"CD", 400}, {"D", 500},
        {"DC", 600}, {"DCC", 700}, {"DCCC", 800}, {"CM", 900}};

    int result = 0;
    auto const add_1000 = [&result](auto & ctx) { result += 1000; };
    auto const add = [&result](auto & ctx) { result += _attr(ctx); };

    using namespace bp::literals;
    auto const parser =
        *'M'_l[add_1000] >> -hundreds[add] >> -tens[add] >> -ones[add];

    if (bp::parse(input, parser) && result != 0)
        std::cout << "That's " << result << " in Arabic numerals.\n";
    else
        std::cout << "That's not a Roman number.\n";
}
```
Just like with a rule, you can give a symbols a bit of diagnostic text that will be used in error messages generated by Boost.Parser when the parse fails at an expectation point, as described in Error Handling and Debugging. See the symbols constructors for details.
The previous example showed how to use a symbol table as a fixed lookup table.
What if we want to add things to the table during the parse? We can do that,
but we need to do so within a semantic action. First, here is our symbol
table, already with a single value in it:
bp::symbols<int> const symbols = {{"c", 8}}; assert(parse("c", symbols));
No surprise that it works to use the symbol table as a parser to parse the
one string in the symbol table. Now, here's our parser:
auto const parser = (bp::char_ >> bp::int_)[add_symbol] >> symbols;
Here, we've attached the semantic action not to a simple parser like double_, but to the sequence parser (bp::char_ >> bp::int_). This sequence parser contains two parsers, each with its own attribute, so it produces two attributes as a tuple.
```cpp
auto const add_symbol = [&symbols](auto & ctx) {
    using namespace bp::literals;
    // symbols::insert() requires a string, not a single character.
    char chars[2] = {_attr(ctx)[0_c], 0};
    symbols.insert(ctx, chars, _attr(ctx)[1_c]);
};
```
Inside the semantic action, we can get the elements of the attribute tuple using UDLs provided by Boost.Hana, and boost::hana::tuple::operator[](). The first attribute, from the char_, is _attr(ctx)[0_c], and the second, from the int_, is _attr(ctx)[1_c]. (If boost::parser::tuple aliases to std::tuple, you'd use std::get or boost::parser::get instead.) To add the symbol to the symbol table, we call insert().
auto const parser = (bp::char_ >> bp::int_)[add_symbol] >> symbols;
During the parse, ("X", 9) is parsed and added to the symbol table. Then, the second 'X' is recognized by the symbol table parser. However:
assert(!parse("X", symbols));
If we parse again, we find that "X" did not stay in the symbol table. The fact that symbols was declared const might have given you a hint that this would happen.
The full program:
```cpp
#include <boost/parser/parser.hpp>

#include <cassert>
#include <iostream>
#include <string>

namespace bp = boost::parser;

int main()
{
    bp::symbols<int> const symbols = {{"c", 8}};
    assert(parse("c", symbols));

    auto const add_symbol = [&symbols](auto & ctx) {
        using namespace bp::literals;
        // symbols::insert() requires a string, not a single character.
        char chars[2] = {_attr(ctx)[0_c], 0};
        symbols.insert(ctx, chars, _attr(ctx)[1_c]);
    };
    auto const parser = (bp::char_ >> bp::int_)[add_symbol] >> symbols;

    auto const result = parse("X 9 X", parser, bp::ws);
    assert(result && *result == 9);
    (void)result;

    assert(!parse("X", symbols));
}
```
It is possible to add symbols to a symbols permanently. To do so, you have to use a mutable symbols object s, and add the symbols by calling s.insert_for_next_parse(), instead of s.insert(). These two operations are orthogonal, so if you want to both add a symbol to the table for the current top-level parse, and leave it in the table for subsequent top-level parses, you need to call both functions.
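The orthogonality of the two operations is easiest to see in a model. two_layer_symbols below is hypothetical and is not Boost.Parser code; this section does not describe how Boost.Parser stores per-parse symbols, so the sketch models only the observable behavior. Each call to parse() works on a scratch copy of the persistent table. Writing to that copy is the analogue of insert(ctx, ...). Calling insert_for_next_parse() changes only the persistent table, so the current parse does not see the new entry.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical model of the two insertion paths described above.
class two_layer_symbols {
public:
    using table = std::map<std::string, int>;

    // Analogue of insert_for_next_parse(): affects later parses only.
    void insert_for_next_parse(std::string key, int value) {
        persistent_[std::move(key)] = value;
    }

    // Analogue of one top-level parse. `body` receives the per-parse table;
    // writing to it is the analogue of insert(ctx, ...), and those writes are
    // discarded when parse() returns.
    template <typename Body>
    auto parse(Body body) const {
        table per_parse = persistent_;
        return body(per_parse);
    }

    // Looks a key up in the table that the next parse would start from.
    std::optional<int> find(std::string const& key) const {
        auto const it = persistent_.find(key);
        if (it == persistent_.end())
            return std::nullopt;
        return it->second;
    }

private:
    table persistent_;
};
```

If a symbol should be usable both right away and in later parses, the model needs both calls, just as the text says for insert() and insert_for_next_parse().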
It is also possible to erase a single entry from the symbol table, or to
clear the symbol table entirely. Just as with insertion, there are versions
of erase and clear for the current parse, and another that applies only to
subsequent parses. The full set of operations can be found in the symbols
API docs.
Note: There are two versions of each of the symbols *_for_next_parse() functions: one that takes a context, and one that does not. The one with the context is meant to be used within a semantic action. The one without the context is for use outside of any parse.
Boost.Parser comes with all the parsers most parsing tasks will ever need. Each one is a constexpr object, or a constexpr function. Some of the non-functions are also callable, such as char_, which may be used directly, or with arguments, as in char_('a', 'z'). Any parser that can be called, whether a function or callable object, will be called a callable parser from now on. Note that there are no nullary callable parsers; they each take one or more arguments.
Each callable parser takes one or more parse arguments.
A parse argument may be a value or an invocable object that accepts a reference
to the parse context. The reference parameter may be mutable or constant.
For example:
struct get_attribute { template<typename Context> auto operator()(Context & ctx) { return _attr(ctx); } };
This can also be a lambda. For example:
[](auto const & ctx) { return _attr(ctx); }
The operation that produces a value from a parse argument, which may be a value or a callable taking a parse context argument, is referred to as resolving the parse argument. If a parse argument arg can be called with the current context, then the resolved value of arg is arg(ctx); otherwise, the resolved value is just arg.
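The resolution rule can be written down in a few lines of standard C++17, which may help when reading the RESOLVE() notation later on. This is a sketch under stated assumptions, not Boost.Parser's code. fake_context is a hypothetical stand-in for the real parse context, and resolve is a hypothetical function playing the role of the notional RESOLVE() macro.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a parse context; the real context type is
// internal to Boost.Parser.
struct fake_context {
    char last_char;
};

// Sketch of resolving a parse argument: if arg can be called with the
// context, the resolved value is arg(ctx); otherwise it is arg itself.
template <typename Arg, typename Context>
auto resolve(Arg&& arg, Context& ctx) {
    if constexpr (std::is_invocable_v<Arg&, Context&>)
        return arg(ctx);
    else
        return std::forward<Arg>(arg);
}
```

A plain value such as 'z' resolves to itself. A lambda such as [](auto& c) { return c.last_char; } resolves to whatever it returns for the current context. The check is done at compile time with if constexpr, so the branch that does not apply is never instantiated.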
Some callable parsers take a parse predicate. A parse predicate is not quite the same as a parse argument, because it must be a callable object, and cannot be a value. A parse predicate's return type must be contextually convertible to bool. For example:
struct equals_three { template<typename Context> bool operator()(Context const & ctx) { return _attr(ctx) == 3; } };
This may of course be a lambda:
[](auto & ctx) { return _attr(ctx) == 3; }
The notional macro RESOLVE() expands to the result of resolving a parse argument or parse predicate. You'll see it used in the rest of the documentation.
An example of how parse arguments are used:
```cpp
namespace bp = boost::parser;

// This parser matches one code point that is at least 'a', and at most
// the value of last_char, which comes from the globals.
auto last_char = [](auto & ctx) { return _globals(ctx).last_char; };
auto subparser = bp::char_('a', last_char);
```
Don't worry for now about what the globals are; the take-away is that you can make any argument you pass to a parser depend on the current state of the parse, by using the parse context:
```cpp
namespace bp = boost::parser;

// This parser parses two code points. For the parse to succeed, the
// second one must be >= 'a' and <= the first one.
auto set_last_char = [](auto & ctx) { _globals(ctx).last_char = _attr(ctx); };
auto parser = bp::char_[set_last_char] >> subparser;
```
Each callable parser returns a new parser, parameterized using the arguments
given in the invocation.
This table lists all the Boost.Parser parsers. For the callable parsers, a separate entry exists for each possible arity of arguments. For a parser p, if there is no entry for p without arguments, p is a function, and cannot itself be used as a parser; it must be called. In the table below:
- RESOLVE() is a notional macro that expands to the resolution of a parse argument or the evaluation of a parse predicate (see The Parsers And Their Uses);
- "RESOLVE(pred) == true" is a shorthand notation for "RESOLVE(pred) is contextually convertible to bool and true"; likewise for false;
- c is a character of type char, char8_t, or char32_t;
- str is a string literal of type char const[], char8_t const [], or char32_t const [];
- pred is a parse predicate;
- arg0, arg1, arg2, ... are parse arguments;
- a is a semantic action;
- r is an object whose type models parsable_range;
- p, p1, p2, ... are parsers; and
- escapes is a symbols<T> object, where T is char or char32_t.

Note: The definition of parsable_range is:

```cpp
template<typename T>
concept parsable_range = std::ranges::forward_range<T> &&
    code_unit<std::ranges::range_value_t<T>>;
```

Note: Some of the parsers in this table consume no input. All parsers consume the input they match unless otherwise stated in the table below.
Table 26.6. Parsers and Their Semantics
Parser |
Semantics |
Attribute Type |
Notes |
---|---|---|---|
Matches epsilon, the empty string. Always
matches, and consumes no input.
|
None. |
Matching |
|
|
Fails to match the input if |
None. |
|
Matches a single whitespace code point (see note), according to
the Unicode White_Space property.
|
None. |
For more info, see the Unicode
properties. |
|
Matches a single newline (see note), following the "hard"
line breaks in the Unicode line breaking algorithm.
|
None. |
For more info, see the Unicode
Line Breaking Algorithm. |
|
Matches only at the end of input, and consumes no input.
|
None. |
||
|
Always matches, and consumes no input. Generates the attribute
|
|
An important use case for |
Matches any single code point.
|
The code point type in Unicode parsing, or |
||
|
Matches exactly the code point |
The code point type in Unicode parsing, or |
|
|
Matches the next code point |
The code point type in Unicode parsing, or |
|
|
Matches the next code point |
The code point type in Unicode parsing, or |
|
Matches a single code point.
|
|
Similar to |
|
Matches a single code point.
|
|
Similar to |
|
The code point type in Unicode parsing, or |
|||
Matches a single control-character code point.
|
The code point type in Unicode parsing, or |
||
Matches a single decimal digit code point.
|
The code point type in Unicode parsing, or |
||
Matches a single punctuation code point.
|
The code point type in Unicode parsing, or |
||
Matches a single hexadecimal digit code point.
|
The code point type in Unicode parsing, or |
||
Matches a single lower-case code point.
|
The code point type in Unicode parsing, or |
||
Matches a single upper-case code point.
|
The code point type in Unicode parsing, or |
||
|
Matches exactly the given code point |
None. |
|
|
Matches exactly the given code point |
None. |
This is a UDL
that represents |
|
Matches exactly the given string |
None. |
|
|
Matches exactly the given string |
None. |
This is a UDL
that represents |
|
Matches exactly |
|
|
|
Matches exactly |
|
This is a UDL
that represents |
Matches |
|
||
Matches a binary unsigned integral value.
|
|
For example, |
|
|
Matches exactly the binary unsigned integral value |
|
|
Matches an octal unsigned integral value.
|
|
For example, |
|
|
Matches exactly the octal unsigned integral value |
|
|
Matches a hexadecimal unsigned integral value.
|
|
For example, |
|
|
Matches exactly the hexadecimal unsigned integral value |
|
|
Matches an unsigned integral value.
|
|
||
|
Matches exactly the unsigned integral value |
|
|
Matches an unsigned integral value.
|
|
||
|
Matches exactly the unsigned integral value |
|
|
Matches an unsigned integral value.
|
|
||
|
Matches exactly the unsigned integral value |
|
|
Matches an unsigned integral value.
|
|
||
|
Matches exactly the unsigned integral value |
|
|
Matches a signed integral value.
|
|
||
|
Matches exactly the signed integral value |
|
|
Matches a signed integral value.
|
|
||
|
Matches exactly the signed integral value |
|
|
Matches a signed integral value.
|
|
||
|
Matches exactly the signed integral value |
|
|
Matches a signed integral value.
|
|
||
|
Matches exactly the signed integral value |
|
|
Matches a floating-point number. |
|
||
Matches a floating-point number. |
|
||
|
Matches iff |
|
The special value |
|
Matches iff |
|
The special value |
|
|
It is an error to write |
|
|
Equivalent to |
|
It is an error to write |
|
|
Unlike the other entries in this table, |
|
Matches |
|
The result does not include the quotes. A quote within the string
can be written by escaping it with a backslash. A backslash within
the string can be written by writing two consecutive backslashes.
Any other use of a backslash will fail the parse. Skipping is disabled
while parsing the entire string, as if using |
|
Matches |
|
The result does not include the |
|
Matches some character |
|
The result does not include the |
|
|
Matches |
|
The result does not include the |
|
Matches some character |
|
The result does not include the |
Important: All the character parsers, like …
Note: A slightly more complete description of the attributes generated by these parsers is in a subsequent section. The attributes are repeated here so you can see all the properties of the parsers in one place.
If you have an integral type IntType that is not covered by any of the Boost.Parser parsers, you can use a more verbose declaration to declare a parser for IntType. If IntType were unsigned, you would use uint_parser. If it were signed, you would use int_parser. For example:
constexpr parser_interface<int_parser<IntType>> int_type_parser;
uint_parser and int_parser accept three more non-type template parameters after the type parameter. They are Radix, MinDigits, and MaxDigits. Radix defaults to 10, MinDigits to 1, and MaxDigits to -1, which is a sentinel value meaning that there is no maximum number of digits.
So, if you wanted to parse exactly eight hexadecimal digits in a row in order to recognize Unicode character literals like C++ has (e.g. \Udeadbeef), you could use this parser for the digits at the end:
constexpr parser_interface<uint_parser<unsigned int, 16, 8, 8>> hex_int;
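To make the roles of Radix, MinDigits, and MaxDigits concrete, here is a hand-written sketch of the digit rules they describe. parse_uint is hypothetical and is not how uint_parser is implemented. It handles radix 2 through 16 only, treats MaxDigits == -1 as "no maximum", and omits overflow checking to stay short. With <16, 8, 8>, it accepts exactly eight hex digits, as in the \Udeadbeef example. It rejects fewer than eight digits, and it stops after the eighth digit even if more follow.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hand-written sketch of the digit rules for uint_parser<T, Radix, MinDigits,
// MaxDigits>. MaxDigits == -1 means "no maximum". No overflow checking.
template <unsigned Radix = 10, int MinDigits = 1, int MaxDigits = -1>
std::optional<unsigned long long> parse_uint(std::string_view in, std::size_t& pos) {
    static_assert(Radix >= 2 && Radix <= 16, "this sketch handles radix 2 through 16");
    auto const digit_value = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    };

    std::size_t i = pos;
    unsigned long long value = 0;
    int digits = 0;
    while (i < in.size() && (MaxDigits == -1 || digits < MaxDigits)) {
        int const d = digit_value(in[i]);
        if (d < 0 || d >= static_cast<int>(Radix))
            break;
        value = value * Radix + static_cast<unsigned long long>(d);
        ++digits;
        ++i;
    }
    if (digits < MinDigits)
        return std::nullopt;  // too few digits: no match, pos is left alone
    pos = i;                  // consume exactly the digits that were used
    return value;
}
```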
A directive is an element of your parser that doesn't have any meaning by itself. Some are second-order parsers that need a first-order parser to do the actual parsing. Others influence the parse in some way. You can often spot a directive lexically by its use of []; directives always use []. Non-directives might, but only when attaching a semantic action.
The directives that are second-order parsers are technically directives, but since they are also used to create parsers, it is more useful just to focus on that. The directives repeat() and if_() were already described in the section on parsers; we won't say much about them here.
Sequence, alternative, and permutation parsers do not nest in most cases. (Let's consider just sequence parsers to keep things simple, but most of this logic applies to alternative parsers as well.) a >> b >> c is the same as (a >> b) >> c and a >> (b >> c), and they are each represented by a single seq_parser with three subparsers, a, b, and c. However, if something prevents two seq_parsers from interacting directly, they will nest. For instance, lexeme[a >> b] >> c is a seq_parser containing two parsers, lexeme[a >> b] and c. This is because lexeme[] takes its given parser and wraps it in a lexeme_parser. This in turn turns off the sequence-parser combining logic, since neither operand of the second operator>> in lexeme[a >> b] >> c is a seq_parser. Sequence parsers have several rules that govern what the overall attribute type of the parser is, based on the positions and attributes of its subparsers (see Attribute Generation). Therefore, it's important to know which directives create a new parser (and what kind), and which ones do not; this is indicated for each directive below.
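The flattening rule can be modelled at the type level. The toy below uses only the standard library and C++17. None of its types are Boost.Parser's: seq, lexeme_wrap, a_t, b_t, c_t, and lexeme are all hypothetical. The toy operator>> splices the children of any seq operand into the result. A lexeme_wrap is treated as an opaque single child, so wrapping a sequence in it produces the nesting described above.

```cpp
#include <cassert>
#include <type_traits>

namespace toy {

template <typename... Ps> struct seq {};
template <typename P> struct lexeme_wrap {};
struct a_t {};
struct b_t {};
struct c_t {};

template <typename T> struct is_parser : std::false_type {};
template <> struct is_parser<a_t> : std::true_type {};
template <> struct is_parser<b_t> : std::true_type {};
template <> struct is_parser<c_t> : std::true_type {};
template <typename... Ps> struct is_parser<seq<Ps...>> : std::true_type {};
template <typename P> struct is_parser<lexeme_wrap<P>> : std::true_type {};

// View any parser as a list of children: a seq contributes its children,
// anything else (including a lexeme_wrap) contributes itself.
template <typename T> struct as_seq { using type = seq<T>; };
template <typename... Ps> struct as_seq<seq<Ps...>> { using type = seq<Ps...>; };

template <typename L, typename R> struct concat;
template <typename... Ls, typename... Rs>
struct concat<seq<Ls...>, seq<Rs...>> { using type = seq<Ls..., Rs...>; };

template <typename L, typename R,
          typename = std::enable_if_t<is_parser<L>::value && is_parser<R>::value>>
constexpr auto operator>>(L, R) {
    return typename concat<typename as_seq<L>::type, typename as_seq<R>::type>::type{};
}

template <typename P>
constexpr lexeme_wrap<P> lexeme(P) { return {}; }

}  // namespace toy

// a >> b >> c and a >> (b >> c) both flatten to one seq with three children.
static_assert(std::is_same_v<decltype(toy::a_t{} >> toy::b_t{} >> toy::c_t{}),
                             toy::seq<toy::a_t, toy::b_t, toy::c_t>>);
static_assert(std::is_same_v<decltype(toy::a_t{} >> (toy::b_t{} >> toy::c_t{})),
                             toy::seq<toy::a_t, toy::b_t, toy::c_t>>);
// lexeme(a >> b) >> c nests: two children, the first being the wrapper.
static_assert(std::is_same_v<decltype(toy::lexeme(toy::a_t{} >> toy::b_t{}) >> toy::c_t{}),
                             toy::seq<toy::lexeme_wrap<toy::seq<toy::a_t, toy::b_t>>, toy::c_t>>);
```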
repeat(): See The Parsers And Their Uses. Creates a repeat_parser.
if_(): See The Parsers And Their Uses. Creates a seq_parser.
omit[p] disables attribute generation for the parser p. Not only does omit[p] have no attribute, but any attribute generation work that normally happens within p is skipped.
This directive can be useful in cases like this: say you have some fairly complicated parser p that generates a large and expensive-to-construct attribute. Now say that you want to write a function that just counts how many times p can match a string (where the matches are non-overlapping). Instead of using p directly, and building all those attributes, or rewriting p without the attribute generation, use omit[].
Creates an omit_parser.
raw[p] changes the attribute from ATTR(p) to a view that delimits the subrange of the input that was matched by p. The type of the view is subrange<I>, where I is the type of the iterator used within the parse. Note that this may not be the same as the iterator type passed to parse(). For instance, when parsing UTF-8, the iterator passed to parse() may be char8_t const *, but within the parse it will be a UTF-8 to UTF-32 transcoding (converting) iterator. Just like omit[], raw[] causes all attribute-generation work within p to be skipped.
Similar to the re-use scenario for omit[] above, raw[] could be used to find the locations of all non-overlapping matches of p in a string.
Creates a raw_parser.
string_view[p] is very similar to raw[p], except that it changes the attribute of p to std::basic_string_view<C>, where C is the character type of the underlying range being parsed. string_view[] requires that the underlying range being parsed is contiguous. Since this can only be detected in C++20 and later, string_view[] is not available in C++17 mode.
Similar to the re-use scenario for omit[] above, string_view[] could be used to find the locations of all non-overlapping matches of p in a string. Whether raw[] or string_view[] is more natural to use to report the locations depends on your use case, but they are essentially the same.
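The "locations of all non-overlapping matches" idea can be sketched without Boost.Parser. Each match is reported as a std::string_view into the original input, which is roughly what string_view[] gives you as an attribute. find_all and digits are hypothetical helpers, not library functions; digits stands in for p and matches a run of decimal digits. A match's offset is recovered with data() minus the input's data().

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Reports every non-overlapping match of `match` in `input`, scanning left to
// right. `match` returns how many chars it matched at the front of its
// argument (0 for no match).
template <typename Matcher>
std::vector<std::string_view> find_all(std::string_view input, Matcher match) {
    std::vector<std::string_view> found;
    std::size_t pos = 0;
    while (pos < input.size()) {
        std::size_t const len = match(input.substr(pos));
        if (len > 0) {
            found.push_back(input.substr(pos, len));
            pos += len;  // resume after the match, so matches never overlap
        } else {
            ++pos;       // no match here; try the next position
        }
    }
    return found;
}

// Hypothetical stand-in for p: one or more decimal digits.
std::size_t digits(std::string_view s) {
    std::size_t n = 0;
    while (n < s.size() && s[n] >= '0' && s[n] <= '9')
        ++n;
    return n;
}
```

For the input "a12b345c6", find_all returns views of "12", "345", and "6", and the second view starts at offset 4.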
Creates a string_view_parser.
no_case[p] enables case-insensitive parsing within the parse of p. This applies to the text parsed by char_(), string(), and bool_ parsers. The number parsers are already case-insensitive. The case-insensitivity is achieved by doing Unicode case folding on the text being parsed and the values in the parser being matched (see note below if you want to know more about Unicode case folding). In the non-Unicode code path, a full Unicode case folding is not done; instead, only the transformations of values less than 0x100 are done. Examples:
```cpp
#include <boost/parser/parser.hpp>
#include <boost/parser/transcode_view.hpp> // For as_utfN.

#include <cassert>

namespace bp = boost::parser;

int main()
{
    auto const street_parser = bp::string(u8"Tobias Straße");
    assert(!bp::parse("Tobias Strasse" | bp::as_utf32, street_parser));             // No match.
    assert(bp::parse("Tobias Strasse" | bp::as_utf32, bp::no_case[street_parser])); // Match!

    auto const alpha_parser = bp::no_case[bp::char_('a', 'z')];
    assert(bp::parse("a" | bp::as_utf32, bp::no_case[alpha_parser])); // Match!
    assert(bp::parse("B" | bp::as_utf32, bp::no_case[alpha_parser])); // Match!
}
```
Everything pretty much does what you'd naively expect inside no_case[], except that the two-character range version of char_ has a limitation. It only compares a code point from the input to its two arguments (e.g. 'a' and 'z' in the example above). It does not do anything special for multi-code point case folding expansions. For instance, char_(U'ß', U'ß') matches the input U"s", which makes sense, since U'ß' expands to U"ss". However, that same parser does not match the input U"ß"! In short, stick to pairs of code points that have single-code point case folding expansions. If you need to support the multi-expanding code points, use the other overload, like: char_(U"abcd/*...*/ß").
Note: Unicode case folding is an operation that makes text uniformly one case, and if you do it to two bits of text …
Creates a no_case_parser.
lexeme[p] disables use of the skipper, if a skipper is being used, within the parse of p. This is useful, for instance, if you want to enable skipping in most parts of your parser, but disable it only in one section where it doesn't belong. If you are skipping whitespace in most of your parser, but want to parse strings that may contain spaces, you should use lexeme[]:
namespace bp = boost::parser; auto const string_parser = bp::lexeme['"' >> *(bp::char_ - '"') >> '"'];
Without lexeme[], our string parser would correctly match "foo bar", but the generated attribute would be "foobar".
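Why the space disappears without lexeme[] becomes clear in a small standard-library-only model. quoted is a hypothetical function, not Boost.Parser code. It models '"' >> *(char_ - '"') >> '"' run with a whitespace skipper that is applied before every element when skip is true. Passing skip = false is what lexeme[] amounts to in this sketch. The skipper uses ASCII std::isspace, which is an assumption; the real ws skipper is Unicode-aware.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Toy version of '"' >> *(char_ - '"') >> '"' with an optional skipper.
// When skip is true, whitespace is skipped before every element, so spaces
// inside the quotes never reach the attribute.
std::optional<std::string> quoted(std::string_view in, bool skip) {
    std::size_t pos = 0;
    auto const skip_ws = [&] {
        while (skip && pos < in.size() &&
               std::isspace(static_cast<unsigned char>(in[pos])))
            ++pos;
    };

    skip_ws();
    if (pos >= in.size() || in[pos] != '"')
        return std::nullopt;
    ++pos;

    std::string attr;
    for (;;) {
        skip_ws();
        if (pos >= in.size())
            return std::nullopt;  // no closing quote
        if (in[pos] == '"') {
            ++pos;
            break;
        }
        attr += in[pos++];
    }
    return attr;
}
```

quoted("\"foo bar\"", true) yields "foobar", while quoted("\"foo bar\"", false), the lexeme[] case, yields "foo bar".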
Creates a lexeme_parser.
skip[] is like the inverse of lexeme[]. It enables skipping in the parse, even if it was not enabled before. For example, within a call to parse() that uses a skipper, let's say we have these parsers in use:
namespace bp = boost::parser; auto const one_or_more = +bp::char_; auto const skip_or_skip_not_there_is_no_try = bp::lexeme[bp::skip[one_or_more] >> one_or_more];
The use of lexeme[] disables skipping, but then the use of skip[] turns it back on. The net result is that the first occurrence of one_or_more will use the skipper passed to parse(); the second will not.
skip[] has another use. You can parameterize skip with a different parser to change the skipper just within the scope of the directive. Let's say we passed ws to parse(), and we're using these parsers somewhere within that parse() call:
namespace bp = boost::parser; auto const zero_or_more = *bp::char_; auto const skip_both_ways = zero_or_more >> bp::skip(bp::blank)[zero_or_more];
The first occurrence of zero_or_more will use the skipper passed to parse(), which is ws; the second will use blank as its skipper.
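The practical difference between the two skippers can be pictured with a small plain-C++ sketch, not Boost.Parser code. The helpers is_ws, is_blank, and collect_unskipped are hypothetical and ASCII-only; they assume, as the names suggest, that ws skips all whitespace including newlines while blank skips only spaces and tabs. collect_unskipped mimics *char_ under a skipper by keeping every character the skipper does not consume.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical ASCII-only skipper predicates modeled on ws and blank.
bool is_ws(char c) { return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\v' || c == '\f'; }
bool is_blank(char c) { return c == ' ' || c == '\t'; }

// Mimics *char_ under a skipper: characters the skipper consumes never
// reach the attribute; everything else is appended to it.
std::string collect_unskipped(std::string_view in, bool (*skipper)(char)) {
    std::string attr;
    for (char c : in)
        if (!skipper(c)) attr += c;
    return attr;
}
```

On the input "a b\tc\n", the ws-like skipper leaves "abc", while the blank-like skipper leaves "abc\n", because a newline is not blank.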
Creates a skip_parser.
merge[], separate[], and transform(f)[]
These directives influence the generation of attributes. See the Attribute Generation section for more details on them.
merge[] and separate[] create a copy of the given seq_parser.
transform(f)[] creates a transform_parser.
Certain overloaded operators are defined for all parsers in Boost.Parser. We've already seen some of them used in this tutorial, especially operator>>, operator|, and operator||, which are used to form sequence parsers, alternative parsers, and permutation parsers, respectively.
Here are all the operators overloaded for parsers. In the tables below, c is a character of type char or char32_t; a is a semantic action; r is an object whose type models parsable_range (see Concepts); and p, p1, p2, ... are parsers.

Note: Some of the expressions in this table consume no input. All parsers consume the input they match unless otherwise stated in the table below.
Table 26.7. Combining Operations and Their Semantics
[Table columns: Expression | Semantics | Attribute Type | Notes. Rows not preserved.]
Important: All the character parsers, like …
There are a couple of special rules not captured in the table above:
First, the zero-or-more and one-or-more repetitions (operator*() and operator+(), respectively) may collapse when combined. For any parser p, +(+p) collapses to +p; **p, *+p, and +*p each collapse to just *p.
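One way to see why these collapses are safe is to note that a nested repetition requires at least one match only when both operators do; otherwise it can match zero times, which is exactly *p. The following sketch encodes that observation; the enum rep and the function collapse are hypothetical names, not part of Boost.Parser.

```cpp
// Hypothetical model of the two repetition operators.
enum class rep { zero_or_more /* *p */, one_or_more /* +p */ };

// Result of outer(inner(p)): +(+p) still needs at least one match;
// every other combination can match zero times, i.e. it behaves like *p.
constexpr rep collapse(rep outer, rep inner) {
    return (outer == rep::one_or_more && inner == rep::one_or_more) ? rep::one_or_more
                                                                    : rep::zero_or_more;
}

static_assert(collapse(rep::one_or_more, rep::one_or_more) == rep::one_or_more);    // +(+p) -> +p
static_assert(collapse(rep::zero_or_more, rep::zero_or_more) == rep::zero_or_more); // **p  -> *p
static_assert(collapse(rep::zero_or_more, rep::one_or_more) == rep::zero_or_more);  // *+p  -> *p
static_assert(collapse(rep::one_or_more, rep::zero_or_more) == rep::zero_or_more);  // +*p  -> *p
```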
Second, using eps in an alternative parser as any alternative except the last one is a common source of errors; Boost.Parser disallows it. This is because, for any parser p, eps | p is equivalent to eps, since eps always matches. This is not true for eps parameterized with a condition. For any condition cond, eps(cond) is allowed to appear anywhere within an alternative parser.
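The reason eps | p is equivalent to eps is ordered choice: alternatives are tried left to right and the first match wins. This plain-C++ sketch, which is not Boost.Parser code, models that with hypothetical function-pointer "parsers": always plays the role of eps, digit matches one ASCII digit, and alternative returns the index of the winning alternative.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical miniature parsers: each tries to match at the front of `in`
// and consumes input only when it succeeds.
using parser_fn = bool (*)(std::string_view& in);

// Like eps: always succeeds and consumes nothing.
bool always(std::string_view&) { return true; }

// Matches a single ASCII digit.
bool digit(std::string_view& in) {
    if (in.empty() || in.front() < '0' || in.front() > '9') return false;
    in.remove_prefix(1);
    return true;
}

// Ordered choice in the spirit of p1 | p2 | ...: the first alternative that
// matches wins. Returns its index, or -1 if none matched.
int alternative(const std::vector<parser_fn>& alts, std::string_view& in) {
    for (std::size_t i = 0; i < alts.size(); ++i)
        if (alts[i](in)) return static_cast<int>(i);
    return -1;
}
```

With {always, digit}, the first alternative wins on every input, so digit is never tried and "7" is left unconsumed; with {digit, always}, the digit is consumed and always only acts as a fallback.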
Note: When looking at Boost.Parser parsers in a debugger, or when looking at their reference documentation, you may see reference to the template …
So far, we've seen several different types of attributes that come from different parsers: int for int_, boost::parser::tuple<char, int> for boost::parser::char_ >> boost::parser::int_, etc. Let's get into how this works with more rigor.
Note: Some parsers have no attribute at all. In the tables below, the type of the attribute is listed as "None." There is a non- …

Warning: Boost.Parser assumes that all attributes are semi-regular (see …
You can use attribute (and the associated alias, attribute_t) to determine the attribute a parser would have if it were passed to parse(). Since at least one parser (char_) has a polymorphic attribute type, attribute also takes the type of the range being parsed. If a parser produces no attribute, attribute will produce none, not void.
If you want to feed an iterator/sentinel pair to attribute, create a range from it like so:
constexpr auto parser = /* ... */;
auto first = /* ... */;
auto const last = /* ... */;
namespace bp = boost::parser;
// You can of course use std::ranges::subrange directly in C++20 and later.
using attr_type = bp::attribute_t<decltype(BOOST_PARSER_SUBRANGE(first, last)), decltype(parser)>;
No parser has a single, fixed attribute type, since any parser can be placed within omit[], which makes its attribute type none. Therefore, attribute cannot tell you what attribute your parser will produce under all circumstances; it only tells you what it would produce if it were passed to parse().
This table summarizes the attributes generated for all Boost.Parser parsers. In the table below, RESOLVE() is a notional macro that expands to the resolution of a parse argument or the evaluation of a parse predicate (see The Parsers And Their Uses); and x and y represent arbitrary objects.

Table 26.8. Parsers and Their Attributes
[Table columns: Parser | Attribute Type | Notes. Rows not preserved.]
char_ is a bit odd, since its attribute type is polymorphic. When you use char_ to parse text in the non-Unicode code path (i.e. a string of char), the attribute is char. When you use the exact same char_ to parse in the Unicode-aware code path, all matching is code point based, and so the attribute type is the type used to represent code points, char32_t. All parsing of UTF-8 falls under this case.
Here, we're parsing plain chars, which means the parsing takes the non-Unicode code path, so the attribute of char_ is char:
auto result = parse("some text", boost::parser::char_); static_assert(std::is_same_v<decltype(result), std::optional<char>>);
When you parse UTF-8, the matching is done on a code point basis, so the attribute type is char32_t:
auto result = parse("some text" | boost::parser::as_utf8, boost::parser::char_); static_assert(std::is_same_v<decltype(result), std::optional<char32_t>>);
The good news is that usually you don't parse characters individually. When you parse with char_, you usually parse a repetition of them, which will produce a std::string, regardless of whether you're in Unicode parsing mode or not. If you do need to parse individual characters, and want to lock down their attribute type, you can use cp and/or cu to enforce a non-polymorphic attribute type.
Combining operations of course affect the generation of attributes. In the tables below, m and n are parse arguments that resolve to integral values; pred is a parse predicate; arg0, arg1, arg2, ... are parse arguments; a is a semantic action; and p, p1, p2, ... are parsers that generate attributes.

Table 26.9. Combining Operations and Their Attributes
[Table columns: Parser | Attribute Type. Rows not preserved.]
Important: All the character parsers, like …

Important: In case you did not notice it above, adding a semantic action to a parser erases the parser's attribute. The attribute is still available inside the semantic action as …
A relatively small number of rules define how the attributes of sequence parsers and alternative parsers are generated. (Don't worry, there are examples below.)
The attribute generation behavior of sequence parsers is conceptually pretty
simple:
If the final result is boost::parser::tuple<T> (even if T is a type that means "no attribute"), the attribute becomes T.
More formally, the attribute generation algorithm works like this. For a sequence parser p, let the list of attribute types for the subparsers of p be a0, a1, a2, ..., an.
We get the attribute of p by evaluating a compile-time left fold operation, left-fold({a1, a2, ..., an}, tuple<a0>, OP). OP is the combining operation that takes the current attribute type (initially boost::parser::tuple<a0>) and the next attribute type, and returns the new current attribute type. The current attribute type at the end of the fold operation is the attribute type for p.
OP attempts to apply a series of rules, one at a time. The rules are noted as X >> Y -> Z, where X is the type of the current attribute, Y is the type of the next attribute, and Z is the new current attribute type. In these rules, C<T> is a container of T; none is a special type that indicates that there is no attribute; T is a type; CHAR is a character type, either char or char32_t; and Ts... is a parameter pack of one or more types. Note that T may be the special type none. The current attribute is always a tuple (call it Tup), so the "current attribute X" refers to the last element of Tup, not Tup itself, except for those rules that explicitly mention boost::parser::tuple<> as part of X's type.
none >> T -> T
CHAR >> CHAR -> std::string
T >> none -> T
C<T> >> T -> C<T>
T >> C<T> -> C<T>
C<T> >> optional<T> -> C<T>
optional<T> >> C<T> -> C<T>
boost::parser::tuple<none> >> T -> boost::parser::tuple<T>
boost::parser::tuple<Ts...> >> T -> boost::parser::tuple<Ts..., T>
The rules that combine containers with (possibly optional) adjacent values (e.g. C<T> >> optional<T> -> C<T>) have a special case for strings. If C<T> is exactly std::string, and T is either char or char32_t, the combination yields a std::string.
Again, if the final result is that the attribute is boost::parser::tuple<T>, the attribute becomes T.
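The fold and its rules can be modeled at compile time in a few dozen lines. This is a simplified sketch under stated assumptions, not Boost.Parser's implementation: the names none, merge_one, step, fold, finish, and seq_attr_t are hypothetical; std::tuple stands in for boost::parser::tuple; std::vector is the only C<T> modeled, plus std::string for the character special case; and merge[] and separate[] are not modeled. The fold state keeps the finished tuple elements apart from the current last element, so each rule only has to look at X and Y. Requires C++17.

```cpp
#include <optional>
#include <string>
#include <tuple>
#include <type_traits>
#include <vector>

struct none {};      // hypothetical stand-in for "no attribute"
struct no_merge {};  // marker: X and Y do not combine
template <class T> struct tag { using type = T; };

template <class T>
inline constexpr bool is_char_v = std::is_same_v<T, char> || std::is_same_v<T, char32_t>;

// CHAR or optional<CHAR>: values that may merge into a std::string.
template <class T> struct char_like : std::bool_constant<is_char_v<T>> {};
template <class T> struct char_like<std::optional<T>> : std::bool_constant<is_char_v<T>> {};

// True if C is std::vector<T> and V is T or std::optional<T>.
template <class C, class V> struct vec_absorbs : std::false_type {};
template <class T> struct vec_absorbs<std::vector<T>, T> : std::true_type {};
template <class T> struct vec_absorbs<std::vector<T>, std::optional<T>> : std::true_type {};

// One application of OP to the current last element X and the next attribute Y,
// trying the rules in the order listed above.
template <class X, class Y>
constexpr auto merge_one() {
    if constexpr (std::is_same_v<X, none>) return tag<Y>{};                       // none >> T -> T
    else if constexpr (is_char_v<X> && is_char_v<Y>) return tag<std::string>{};  // CHAR >> CHAR
    else if constexpr (std::is_same_v<Y, none>) return tag<X>{};                 // T >> none -> T
    else if constexpr (std::is_same_v<X, std::string> && char_like<Y>::value) return tag<X>{};
    else if constexpr (std::is_same_v<Y, std::string> && char_like<X>::value) return tag<Y>{};
    else if constexpr (vec_absorbs<X, Y>::value) return tag<X>{};  // C<T> >> T, C<T> >> optional<T>
    else if constexpr (vec_absorbs<Y, X>::value) return tag<Y>{};  // T >> C<T>, optional<T> >> C<T>
    else return tag<no_merge>{};
}

// Fold state: completed tuple elements (Done) plus the current last element.
template <class Done, class Last> struct state {};

template <class S, class Y> struct step;
template <class... Ds, class Last, class Y>
struct step<state<std::tuple<Ds...>, Last>, Y> {
    using merged = typename decltype(merge_one<Last, Y>())::type;
    using type = std::conditional_t<std::is_same_v<merged, no_merge>,
                                    state<std::tuple<Ds..., Last>, Y>,  // start a new element
                                    state<std::tuple<Ds...>, merged>>;  // combine into the last one
};

template <class S, class... Ys> struct fold { using type = S; };
template <class S, class Y, class... Ys>
struct fold<S, Y, Ys...> : fold<typename step<S, Y>::type, Ys...> {};

// tuple<T> becomes T; longer tuples stay tuples.
template <class S> struct finish;
template <class Last> struct finish<state<std::tuple<>, Last>> { using type = Last; };
template <class D, class... Ds, class Last>
struct finish<state<std::tuple<D, Ds...>, Last>> { using type = std::tuple<D, Ds..., Last>; };

template <class A0, class... As>
using seq_attr_t = typename finish<typename fold<state<std::tuple<>, A0>, As...>::type>::type;
```

For example, the attribute list of an employee_parser-like sequence (none, none, int, none, std::string, none, std::string, none, double, none) folds to std::tuple<int, std::string, std::string, double>, and a std::string followed by a char32_t folds to std::string.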
Note: What constitutes a container in the rules above is determined by the container concept:
template<typename T> concept container = std::ranges::common_range<T> && requires(T t) { { t.insert(t.begin(), *t.begin()) } -> std::same_as<std::ranges::iterator_t<T>>; };
The rules for alternative parsers are much simpler. For an alternative parser p, let the list of attribute types for the subparsers of p be a0, a1, a2, ..., an. The attribute of p is std::variant<a0, a1, a2, ..., an>, with the following steps applied:
all none attributes are left out, and if any are, the attribute is wrapped in a std::optional, like std::optional<std::variant</*...*/>>;
duplicate std::variant template parameters <T1, T2, ... Tn> are removed, so that every type that appears does so exactly once;
if the attribute is std::variant<T> or std::optional<std::variant<T>>, the attribute instead becomes T or std::optional<T>, respectively; and
if the attribute is std::variant<> or std::optional<std::variant<>>, the result becomes none instead.
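The four steps can be modeled as a compile-time computation over the list a0, ..., an. This is a sketch, not Boost.Parser's implementation; the names none, type_list, add_unique, collect, to_variant, and alt_attr_t are hypothetical. Requires C++17.

```cpp
#include <optional>
#include <type_traits>
#include <variant>

struct none {};  // hypothetical stand-in for "no attribute"
template <class... Ts> struct type_list {};

// Append T to the list unless it is already present (deduplication).
template <class L, class T> struct add_unique;
template <class... Ts, class T>
struct add_unique<type_list<Ts...>, T> {
    using type = std::conditional_t<(std::is_same_v<T, Ts> || ...),
                                    type_list<Ts...>, type_list<Ts..., T>>;
};

// Walk a0, a1, ..., an: drop none (remembering that one was seen) and deduplicate.
template <class L, bool SawNone, class... As>
struct collect {
    using list = L;
    static constexpr bool saw_none = SawNone;
};
template <class L, bool SawNone, class A, class... As>
struct collect<L, SawNone, A, As...>
    : collect<std::conditional_t<std::is_same_v<A, none>, L, typename add_unique<L, A>::type>,
              (SawNone || std::is_same_v<A, none>), As...> {};

// variant<> -> none, variant<T> -> T, otherwise a real std::variant.
template <class L> struct to_variant;
template <> struct to_variant<type_list<>> { using type = none; };
template <class T> struct to_variant<type_list<T>> { using type = T; };
template <class T0, class T1, class... Ts>
struct to_variant<type_list<T0, T1, Ts...>> { using type = std::variant<T0, T1, Ts...>; };

template <class... As>
struct alt_attr {
    using c = collect<type_list<>, false, As...>;
    using inner = typename to_variant<typename c::list>::type;
    // Wrap in std::optional if some alternative had no attribute, unless nothing is left.
    using type = std::conditional_t<(c::saw_none && !std::is_same_v<inner, none>),
                                    std::optional<inner>, inner>;
};
template <class... As> using alt_attr_t = typename alt_attr<As...>::type;
```

For instance, alternatives with attributes int, double, int, and none give std::optional<std::variant<int, double>>.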
The rule for forming containers from non-containers is simple. You get a vector from any of the repeating parsers, like +p, *p, repeat(3)[p], etc. The value type of the vector is ATTR(p).
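The repetition rule, together with the earlier remark that repeating char_ produces a std::string in either parsing mode, can be written as a one-line type alias. The alias repeat_attr_t is hypothetical and models only parsers that have an attribute; it is a sketch of the rule as stated, not Boost.Parser's code.

```cpp
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical model of the attribute of *p, +p, or repeat(n)[p], given T = ATTR(p):
// repeated characters (char or char32_t) become std::string; anything else
// becomes std::vector<T>.
template <class T>
using repeat_attr_t = std::conditional_t<(std::is_same_v<T, char> || std::is_same_v<T, char32_t>),
                                         std::string, std::vector<T>>;

static_assert(std::is_same_v<repeat_attr_t<int>, std::vector<int>>);
static_assert(std::is_same_v<repeat_attr_t<char32_t>, std::string>);
```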
Another rule for sequence containers is that a value x and a container c containing elements of x's type will form a single container. However, x's type must be exactly the same as the elements in c. There is an exception to this in the special case for strings and characters noted above. For instance, consider the attribute of char_ >> string("str"). In the non-Unicode code path, char_'s attribute type is guaranteed to be char, so ATTR(char_ >> string("str")) is std::string. If you are parsing UTF-8 in the Unicode code path, char_'s attribute type is char32_t, and the special rule makes it also produce a std::string. Without that special rule, ATTR(char_ >> string("str")) would be boost::parser::tuple<char32_t, std::string>.
Again, there are no other special rules for combining values and containers. Every combination either results from an exact match or falls into the string+character special case.
std::string assignment
std::string can be assigned from a char. This is dumb, but we're stuck with it. When you write a parser with a char attribute and try to parse it into a std::string, you've almost certainly made a mistake. More importantly, if you write this:
namespace bp = boost::parser; std::string result; auto b = bp::parse("3", bp::int_, bp::ws, result);
... you are even more likely to have made a mistake. Though this should work, because the assignment in std::string s; s = 3; is well-formed, Boost.Parser forbids it. If you write parsing code like the snippet above, you will get a static assertion. If you really do want to assign a float or whatever to a std::string, do it in a semantic action.
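The trap is easy to reproduce in plain C++, with no Boost.Parser involved. In this minimal sketch (the function name assign_three is hypothetical), the assignment compiles because std::string has an operator= taking a char and the int converts implicitly, so the result is a one-character string holding the code unit 3, not the digit '3'.

```cpp
#include <cassert>
#include <string>

// Compiles: 3 converts implicitly to char, selecting std::string::operator=(char).
// The result is a single control character, not the text "3".
std::string assign_three() {
    std::string s;
    s = 3;
    return s;
}
```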
In the table: a is a semantic action; and p, p1, p2, ... are parsers that generate attributes. Note that only >> is used here; > has the exact same attribute generation rules.
Table 26.10. Sequence and Alternative Combining Operations and Their Attributes
[Table columns: Expression | Attribute Type. Rows not preserved.]
As we saw in the previous section, Parsing into structs and classes, if you parse two strings in a row, you get two separate strings in the resulting attribute. The parser from that example was this:
namespace bp = boost::parser; auto employee_parser = bp::lit("employee") >> '{' >> bp::int_ >> ',' >> quoted_string >> ',' >> quoted_string >> ',' >> bp::double_ >> '}';
employee_parser's attribute is boost::parser::tuple<int, std::string, std::string, double>. The two quoted_string parsers produce std::string attributes, and those attributes are not combined. That is the default behavior, and it is just what we want for this case; we don't want the first and last name fields to be jammed together such that we can't tell where one name ends and the other begins. What if we were parsing some string that consisted of a prefix and a suffix, and the prefix and suffix were defined separately for reuse elsewhere?
namespace bp = boost::parser; auto prefix = /* ... */; auto suffix = /* ... */; auto special_string = prefix >> suffix; // Continue to use prefix and suffix to make other parsers....
In this case, we might want to use these separate parsers, but want special_string to produce a single std::string for its attribute. merge[] exists for this purpose.
namespace bp = boost::parser; auto prefix = /* ... */; auto suffix = /* ... */; auto special_string = bp::merge[prefix >> suffix];
merge[] only applies to sequence parsers (like p1 >> p2), and forces all subparsers in the sequence parser to use the same variable for their attribute.
Another directive, separate[], also applies only to sequence parsers, but does the opposite of merge[]. It forces all the attributes produced by the subparsers of the sequence parser to stay separate, even if they would have combined. For instance, consider this parser:
namespace bp = boost::parser; auto string_and_char = +bp::char_('a') >> ' ' >> bp::cp;
string_and_char matches one or more 'a's, followed by a space and then one more code point. As written above, string_and_char produces a std::string, and the final character is appended to the string, after all the 'a's. However, if you want to store the final character as a separate value, use separate[]:
namespace bp = boost::parser; auto string_and_char = bp::separate[+bp::char_('a') >> ' ' >> bp::cp];
With this change, string_and_char produces the attribute boost::parser::tuple<std::string, char32_t>.
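The two attribute shapes can be compared with a hand-written stand-in for string_and_char. This is a plain-C++ sketch, not Boost.Parser: the functions parse_separate and parse_default are hypothetical, std::tuple replaces boost::parser::tuple, and only ASCII input is handled. parse_separate keeps the pieces apart, as separate[] does; parse_default appends the final character to the string, as the unmodified parser does.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <tuple>

// Hypothetical stand-in for +char_('a') >> ' ' >> cp (ASCII input only).
// Returns the pieces each subparser produces, the shape separate[] preserves.
std::optional<std::tuple<std::string, char32_t>> parse_separate(std::string_view in) {
    std::size_t i = 0;
    std::string as;
    while (i < in.size() && in[i] == 'a') as += in[i++];  // +char_('a')
    if (as.empty() || i >= in.size() || in[i] != ' ') return std::nullopt;
    ++i;                                                  // ' ' contributes no attribute
    if (i >= in.size()) return std::nullopt;
    char32_t c = static_cast<unsigned char>(in[i]);       // cp
    return std::make_tuple(as, c);
}

// Default behavior: the trailing character is appended to the string.
std::optional<std::string> parse_default(std::string_view in) {
    auto parts = parse_separate(in);
    if (!parts) return std::nullopt;
    std::string s = std::get<0>(*parts);
    s += static_cast<char>(std::get<1>(*parts));          // ASCII-only simplification
    return s;
}
```

On "aaa b", parse_default yields "aaab", while parse_separate yields the pair ("aaa", U'b').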