...one of the most highly regarded and expertly designed C++ library projects
in the world.
- Herb Sutter and Andrei Alexandrescu, C++ Coding Standards
First, let's cover some terminology that we'll be using throughout the docs:
A semantic action is an arbitrary bit of logic associated
with a parser that is executed only when the parser matches.
Simpler parsers can be combined to form more complex parsers. Given some
combining operation C and parsers P0, P1, ... PN, C(P0, P1, ... PN) creates
a new parser Q. This creates a parse tree: Q is the parent of P0, P1, ... PN,
and each of them is a child of Q. The parsers are applied in the top-down
fashion implied by this topology. When you use Q to parse a string, it will
use P0, P1, etc. to do the actual work. If P3 is being used to parse the
input, that means that Q is as well, since the way Q parses is by dispatching
to its children to do some or all of the work. At any point in the parse,
there will be exactly one parser without children that is being used to
parse the input; all other parsers being used are its ancestors in the parse
tree.
A subparser is a parser that is the child of another
parser.
The top-level parser is the root of the tree of parsers.
The current parser or bottommost parser
is the parser with no children that is currently being used to parse the
input.
A rule is a kind of parser that makes building large,
complex parsers easier. A subrule is a rule that is
the child of some other rule. The current rule or bottommost
rule is the one rule currently being used to parse the input that
has no subrules. Note that while there is always exactly one current parser,
there may or may not be a current rule — rules are one kind of parser,
and you may or may not be using one at a given point in the parse.
The top-level parse is the parse operation being performed
by the top-level parser. This term is necessary because, though most parse
failures are local to a particular parser, some parse failures cause the
call to parse() to indicate failure of the entire parse. For these cases, we
say that such a local failure "causes the top-level parse to fail".
Throughout the Boost.Parser documentation, I will refer to "the call to
parse()". Read this as "the call to any one of the functions described in
The parse() API". That includes prefix_parse(), callback_parse(), and
callback_prefix_parse().
There are some special kinds of parsers that come up often in this documentation.
One is a sequence parser; you will see it created using operator>>, as in
p1 >> p2 >> p3. A sequence parser tries to match all of its subparsers to
the input, one at a time, in order. It matches the input iff all its
subparsers do.
Another is an alternative parser; you will see it created using operator|,
as in p1 | p2 | p3. An alternative parser tries its subparsers against the
input one at a time, in order, and stops at the first one that matches. It
matches the input iff one of its subparsers does.
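To make the sequence and alternative semantics concrete, here is a small toy
model in plain C++17. It is not how Boost.Parser is implemented, and its names
(Parser, ch, seq, alt, matches_all) are hypothetical, not Boost.Parser APIs. A
"parser" here is just a function that advances a std::string_view past
whatever it matched and reports whether it matched.

```cpp
#include <cassert>
#include <functional>
#include <string_view>
#include <vector>

// Toy model, not Boost.Parser: a parser tries to match a prefix of `in`,
// advances `in` past the match on success, and returns whether it matched.
using Parser = std::function<bool(std::string_view &)>;

// Matches exactly the character `c`.
Parser ch(char c)
{
    return [c](std::string_view & in) {
        if (in.empty() || in.front() != c)
            return false;
        in.remove_prefix(1);
        return true;
    };
}

// Sequence (like p1 >> p2 >> p3): every subparser must match, in order.
// If any of them fails, the input is restored and the sequence fails.
Parser seq(std::vector<Parser> ps)
{
    return [ps](std::string_view & in) {
        std::string_view const saved = in;
        for (auto const & p : ps) {
            if (!p(in)) {
                in = saved;
                return false;
            }
        }
        return true;
    };
}

// Alternative (like p1 | p2 | p3): try each subparser in order on the
// same input, and stop at the first one that matches.
Parser alt(std::vector<Parser> ps)
{
    return [ps](std::string_view & in) {
        for (auto const & p : ps) {
            std::string_view attempt = in;
            if (p(attempt)) {
                in = attempt;
                return true;
            }
        }
        return false;
    };
}

// True if `p` matches the whole of `s`.
bool matches_all(Parser const & p, std::string_view s)
{
    return p(s) && s.empty();
}
```

Restoring the input when a sequence fails is a choice made by this sketch; it
is what lets alt() retry the next alternative from the same starting position.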
Finally, there is a permutation parser; it is created using operator||, as
in p1 || p2 || p3. A permutation parser tries to match all of its subparsers
to the input, in any order. So the parser p1 || p2 || p3 is equivalent to
(p1 >> p2 >> p3) | (p1 >> p3 >> p2) | (p2 >> p1 >> p3) |
(p2 >> p3 >> p1) | (p3 >> p1 >> p2) | (p3 >> p2 >> p1).
Hopefully the value of the terser form is self-explanatory. It matches the
input iff all of its subparsers do, regardless of the order they match in.
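The expansion above can be modeled directly. The following sketch (plain
C++17, not Boost.Parser's implementation; Parser, ch, perm and matches_all
are hypothetical names) tries the orderings of its subparsers in the same
lexicographic order as the expansion, using std::next_permutation over the
subparser indices, and succeeds with the first ordering that matches as a
sequence. Enumerating up to N! orderings is only meant to show the
semantics, not an efficient strategy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <string_view>
#include <vector>

// Toy model, not Boost.Parser: a parser advances `in` on success and
// returns whether it matched.
using Parser = std::function<bool(std::string_view &)>;

// Matches exactly the character `c`.
Parser ch(char c)
{
    return [c](std::string_view & in) {
        if (in.empty() || in.front() != c)
            return false;
        in.remove_prefix(1);
        return true;
    };
}

// Permutation (like p1 || p2 || p3): succeed if some ordering of the
// subparsers matches as a sequence. Orderings are tried in lexicographic
// order of the indices, i.e. (p1 >> p2 >> p3) | (p1 >> p3 >> p2) | ...
Parser perm(std::vector<Parser> ps)
{
    return [ps](std::string_view & in) {
        std::vector<std::size_t> order(ps.size());
        std::iota(order.begin(), order.end(), std::size_t{0});
        do {
            std::string_view attempt = in;
            bool all_matched = true;
            for (std::size_t i : order) {
                if (!ps[i](attempt)) {
                    all_matched = false;
                    break;
                }
            }
            if (all_matched) {
                in = attempt;
                return true;
            }
        } while (std::next_permutation(order.begin(), order.end()));
        return false;
    };
}

// True if `p` matches the whole of `s`.
bool matches_all(Parser const & p, std::string_view s)
{
    return p(s) && s.empty();
}
```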
Boost.Parser parsers each have an attribute associated with them, or
explicitly have no attribute. An attribute is a value that the parser
generates when it matches the input. For instance, the parser double_
generates a double when it matches the input. ATTR() is a notional macro
that expands to the attribute type of the parser passed to it;
ATTR(double_) is double. This is similar to the attribute type trait.
Next, we'll look at some simple programs that parse using Boost.Parser. We'll
start small and build up from there.
This is just about the most minimal example of using Boost.Parser that one
could write. We take a string from the command line, or "World"
if none is given, and then we parse it:
#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>

namespace bp = boost::parser;

int main(int argc, char const * argv[])
{
    std::string input = "World";
    if (1 < argc)
        input = argv[1];

    std::string result;
    bp::parse(input, *bp::char_, result);
    std::cout << "Hello, " << result << "!\n";
}
The expression *bp::char_ is a parser-expression. It uses one of the many
parsers that Boost.Parser provides: char_. Like all Boost.Parser parsers, it
has certain operations defined on it. In this case, *bp::char_ is using an
overloaded operator* as the C++ version of a Kleene star operator. Since C++
has no postfix unary * operator, we have to use the one we have, so it is
used as a prefix.
So, *bp::char_ means "any number of characters". In other words, it really
cannot fail. Even an empty string will match it.
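As a rough model of what *bp::char_ does, here is a toy Kleene star in plain
C++17 (not Boost.Parser's implementation; star, any_char and digit are
hypothetical names). It applies a subparser as many times as it matches,
possibly zero times, and collects the characters into a std::string, much as
Boost.Parser gives *bp::char_ a std::string result. It never reports failure.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <string_view>

// Toy model, not Boost.Parser: a parser producing a char attribute returns
// it on success (advancing `in`) and std::nullopt on failure.
using CharParser = std::function<std::optional<char>(std::string_view &)>;

// Like char_: matches any single character.
std::optional<char> any_char(std::string_view & in)
{
    if (in.empty())
        return std::nullopt;
    char const c = in.front();
    in.remove_prefix(1);
    return c;
}

// Matches a single ASCII digit.
std::optional<char> digit(std::string_view & in)
{
    if (in.empty() || in.front() < '0' || '9' < in.front())
        return std::nullopt;
    return any_char(in);
}

// Kleene star: apply `p` zero or more times and collect its attributes.
// It cannot fail; at worst it returns an empty string and consumes nothing.
// (This sketch assumes `p` consumes input whenever it succeeds; otherwise
// the loop would never end.)
std::string star(CharParser const & p, std::string_view & in)
{
    std::string result;
    while (std::optional<char> c = p(in))
        result.push_back(*c);
    return result;
}
```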
The parse operation is performed by calling the parse()
function, passing the parser as one of the arguments:
bp::parse(input, *bp::char_, result);
The arguments here are: input, the range to parse; *bp::char_, the parser
used to do the parse; and result, an out-parameter into which to put the
result of the parse. Don't get too caught up on this method of getting the
parse result out of parse(); there are multiple ways of doing so, and we'll
cover all of them in subsequent sections.
Also, just ignore for now the fact that Boost.Parser somehow figured out
that the result type of the *bp::char_ parser is a std::string. There are
clear rules for this that we'll cover later.
The effect of this call to parse() is not very interesting: since the parser
we gave it cannot ever fail, and because we're placing the output in the
same type as the input, it just copies the contents of input to result.
Let's look at a slightly more complicated example, even if it is still
trivial. Instead of taking any old chars we're given, let's require some
structure. Let's parse one or more doubles, separated by commas.
The Boost.Parser parser for double is double_. So, to parse a single double,
we'd just use that. If we wanted to parse two doubles in a row, we'd use:
boost::parser::double_ >> boost::parser::double_
operator>> in this expression is the sequence-operator; read it as
"followed by". If we combine the sequence-operator with Kleene star, we can
get the parser we want by writing:
boost::parser::double_ >> *(',' >> boost::parser::double_)
This is a parser that matches at least one double (because of the first
double_ in the expression above), followed by zero or more instances of
a-comma-followed-by-a-double. Notice that we can use ',' directly. Though it
is not a parser, operator>> and the other operators defined on Boost.Parser
parsers have overloads that accept character/parser pairs of arguments;
these operator overloads will create the right parser to recognize ','.
#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>

namespace bp = boost::parser;

int main()
{
    std::cout << "Enter a list of doubles, separated by commas. No pressure. ";
    std::string input;
    std::getline(std::cin, input);

    auto const result = bp::parse(input, bp::double_ >> *(',' >> bp::double_));

    if (result) {
        std::cout << "Great! It looks like you entered:\n";
        for (double x : *result) {
            std::cout << x << "\n";
        }
    } else {
        std::cout << "Good job! Please proceed to the recovery annex for cake.\n";
    }
}
The first example filled in an out-parameter to deliver the result of the
parse. This call to parse() returns a result instead. As you can see, the
result is contextually convertible to bool, and *result is some sort of
range. In fact, the return type of this call to parse() is
std::optional<std::vector<double>>. Naturally, if the parse fails,
std::nullopt is returned. We'll look at how Boost.Parser maps the type of
the parser to the return type, or the filled-in out-parameter's type, a bit
later.
Note: There's a type trait that can tell you the attribute type for a
parser, ...
If I run it in a shell, this is the result:
$ example/trivial
Enter a list of doubles, separated by commas. No pressure. 5.6,8.9
Great! It looks like you entered:
5.6
8.9
$ example/trivial
Enter a list of doubles, separated by commas. No pressure. 5.6, 8.9
Good job! Please proceed to the recovery annex for cake.
It does not recognize "5.6, 8.9". This is because it expects a comma
followed immediately by a double, but I inserted a space after the comma.
The same failure to parse would occur if I put a space before the comma, or
before or after the list of doubles.
One more thing: there is a much better way to write the parser above. Instead
of repeating the double_
subparser, we could have written this:
bp::double_ % ','
That's semantically identical to bp::double_ >> *(',' >> bp::double_). This
pattern (some bit of input repeated one or more times, with a separator
between each instance) comes up so often that there's an operator
specifically for that, operator%. We'll be using that operator from now on.
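The following toy function (plain C++17, not Boost.Parser; parse_double and
double_list are hypothetical names) spells out what double_ % ',' means,
i.e. double_ >> *(',' >> double_): one double, then as many
comma-and-double pairs as match. Like parse() (and unlike prefix_parse()),
it fails unless the entire input is consumed. Because it does no skipping,
it reproduces the "5.6, 8.9" failure described above. std::strtod stands in
for double_, with an explicit check because strtod would otherwise skip
leading whitespace on its own.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Stand-in for double_: parse a double at the very start of `in`.
// std::strtod skips leading whitespace by itself, so reject it here;
// this toy parser must not skip anything.
std::optional<double> parse_double(std::string_view & in)
{
    if (in.empty() || std::isspace(static_cast<unsigned char>(in.front())))
        return std::nullopt;
    std::string const buf(in);  // strtod needs a NUL-terminated string
    char * end = nullptr;
    double const x = std::strtod(buf.c_str(), &end);
    if (end == buf.c_str())
        return std::nullopt;
    in.remove_prefix(static_cast<std::size_t>(end - buf.c_str()));
    return x;
}

// Toy equivalent of parse(input, double_ % sep), i.e.
// double_ >> *(sep >> double_), where the whole input must be consumed.
std::optional<std::vector<double>> double_list(std::string_view in, char sep)
{
    std::optional<double> const first = parse_double(in);
    if (!first)
        return std::nullopt;
    std::vector<double> result{*first};
    while (!in.empty() && in.front() == sep) {
        std::string_view attempt = in.substr(1);
        std::optional<double> const next = parse_double(attempt);
        if (!next)
            break;  // `sep >> double_` did not match; the star just stops
        result.push_back(*next);
        in = attempt;
    }
    if (!in.empty())
        return std::nullopt;  // leftover input: the parse fails
    return result;
}
```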
Let's modify the trivial parser we just saw to ignore any spaces it might
find among the doubles and commas. To skip whitespace wherever we find it,
we can pass a skip parser to our call to parse() (we don't need to touch the
parser passed to parse()). Here, we use ws, which matches any Unicode
whitespace character.
#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>

namespace bp = boost::parser;

int main()
{
    std::cout << "Enter a list of doubles, separated by commas. No pressure. ";
    std::string input;
    std::getline(std::cin, input);

    auto const result = bp::parse(input, bp::double_ % ',', bp::ws);

    if (result) {
        std::cout << "Great! It looks like you entered:\n";
        for (double x : *result) {
            std::cout << x << "\n";
        }
    } else {
        std::cout << "Good job! Please proceed to the recovery annex for cake.\n";
    }
}
The skip parser, or skipper, is run between the subparsers within the parser
passed to parse(). In this case, the skipper is run before the first double
is parsed, before any subsequent comma or double is parsed, and at the end.
So, the strings "3.6,5.9" and " 3.6 , \t 5.9 " are parsed the same by this
program.
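A skipper can be modeled the same way. In this toy version (plain C++17,
not Boost.Parser's implementation; skip_ws, parse_double and
double_list_skipping are hypothetical names), the skipper runs before the
first double, before every later comma or double, and once more at the end,
which is why both spellings from the paragraph above give the same result.
Unlike ws, this skip_ws handles only ASCII whitespace.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Toy skipper: consume a run of ASCII whitespace (ws also accepts other
// Unicode whitespace; this sketch does not).
void skip_ws(std::string_view & in)
{
    while (!in.empty() && std::isspace(static_cast<unsigned char>(in.front())))
        in.remove_prefix(1);
}

// Stand-in for double_; it does no skipping of its own.
std::optional<double> parse_double(std::string_view & in)
{
    if (in.empty() || std::isspace(static_cast<unsigned char>(in.front())))
        return std::nullopt;
    std::string const buf(in);  // strtod needs a NUL-terminated string
    char * end = nullptr;
    double const x = std::strtod(buf.c_str(), &end);
    if (end == buf.c_str())
        return std::nullopt;
    in.remove_prefix(static_cast<std::size_t>(end - buf.c_str()));
    return x;
}

// Toy equivalent of parse(input, double_ % ',', ws).
std::optional<std::vector<double>> double_list_skipping(std::string_view in)
{
    skip_ws(in);  // before the first double
    std::optional<double> const first = parse_double(in);
    if (!first)
        return std::nullopt;
    std::vector<double> result{*first};
    for (;;) {
        std::string_view attempt = in;
        skip_ws(attempt);  // before the comma
        if (attempt.empty() || attempt.front() != ',')
            break;
        attempt.remove_prefix(1);
        skip_ws(attempt);  // before the next double
        std::optional<double> const next = parse_double(attempt);
        if (!next)
            break;
        result.push_back(*next);
        in = attempt;
    }
    skip_ws(in);  // at the end
    if (!in.empty())
        return std::nullopt;
    return result;
}
```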
Skipping is an important concept in Boost.Parser. You can skip anything,
not just whitespace; there are lots of other things you might want to skip.
The skipper you pass to parse()
can be an arbitrary parser. For example, if you write a parser for a scripting
language, you can write a skipper to skip whitespace, inline comments, and
end-of-line comments.
We'll be using skip parsers almost exclusively in the rest of the documentation.
The ability to ignore the parts of your input that you don't care about is
so convenient that parsing without skipping is a rarity in practice.
Like all parsing systems (lex & yacc, Boost.Spirit, etc.), Boost.Parser has
a mechanism for associating semantic actions with different parts of the
parse. Here is nearly the same program as we saw in the previous example,
except that it is implemented in terms of a semantic action that appends
each parsed double to a result, instead of automatically building and
returning the result. To do this, we replace the double_ from the previous
example with double_[action]; action is our semantic action:
#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>
#include <vector>

namespace bp = boost::parser;

int main()
{
    std::cout << "Enter a list of doubles, separated by commas. ";
    std::string input;
    std::getline(std::cin, input);

    std::vector<double> result;
    // Runs once per successfully parsed double.
    auto const action = [&result](auto & ctx) {
        std::cout << "Got one!\n";
        result.push_back(_attr(ctx));
    };
    auto const action_parser = bp::double_[action];
    auto const success = bp::parse(input, action_parser % ',', bp::ws);

    if (success) {
        std::cout << "You entered:\n";
        for (double x : result) {
            std::cout << x << "\n";
        }
    } else {
        std::cout << "Parse failure.\n";
    }
}
Run in a shell, it looks like this:
$ example/semantic_actions
Enter a list of doubles, separated by commas. 4,3
Got one!
Got one!
You entered:
4
3
In Boost.Parser, semantic actions are implemented in terms of invocable
objects that take a single parameter: a reference to a parse-context object.
The parse-context object represents the current state of the parse. In the
example, we used this lambda as our invocable:
auto const action = [&result](auto & ctx) {
    std::cout << "Got one!\n";
    result.push_back(_attr(ctx));
};
We're both printing a message to std::cout and recording a parsed result in
the lambda. It could do both, either, or neither of these things if you
like. The way we get the parsed double in the lambda is by asking the parse
context for it. _attr(ctx) is how you ask the parse context for the
attribute produced by the parser to which the semantic action is attached.
There are lots of functions like _attr() that can be used to access the
state in the parse context. We'll cover more of them later on. The Parse
Context section describes exactly what the parse context is and how it
works.
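Here is a toy model of attaching a semantic action, in plain C++17 (not
Boost.Parser's implementation; ActionContext, with_action and toy_int are
hypothetical names). The action receives a context object from which it
reads the attribute, standing in for _attr(ctx), and it runs only after the
underlying parser has matched. The combined parser reports only success or
failure, which mirrors the fact that a parser with a semantic action has no
attribute of its own. In real Boost.Parser code the action would be a
generic lambda taking auto & ctx and calling _attr(ctx), as shown above.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <optional>
#include <string_view>
#include <vector>

// Toy model, not Boost.Parser. A parser with an int attribute returns the
// attribute on success (advancing `in`) and std::nullopt on failure.
using IntParser = std::function<std::optional<int>(std::string_view &)>;

// A toy parse context; `attr` plays the role of _attr(ctx).
struct ActionContext {
    int & attr;
};

// Stand-in for int_: one or more ASCII digits.
std::optional<int> toy_int(std::string_view & in)
{
    if (in.empty() || !std::isdigit(static_cast<unsigned char>(in.front())))
        return std::nullopt;
    int value = 0;
    while (!in.empty() && std::isdigit(static_cast<unsigned char>(in.front()))) {
        value = value * 10 + (in.front() - '0');
        in.remove_prefix(1);
    }
    return value;
}

// Toy p[action]: run `p`; only if it matches, call `action` with a context
// holding the attribute. The result has no attribute, only success/failure.
std::function<bool(std::string_view &)>
with_action(IntParser p, std::function<void(ActionContext &)> action)
{
    return [p, action](std::string_view & in) {
        std::optional<int> attr = p(in);
        if (!attr)
            return false;  // the action never runs when `p` fails
        ActionContext ctx{*attr};
        action(ctx);
        return true;
    };
}
```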
Note that you can't write an unadorned lambda directly as a semantic action.
Otherwise, the compiler will see two '[' characters in a row and think it's
about to parse a C++ attribute (such as [[nodiscard]]). Parentheses fix this:
p[([](auto & ctx){/*...*/})]
Before you do this, note that the lambdas that you write as semantic actions
are almost always generic (having an auto & ctx parameter), and so are very
frequently re-usable. Most semantic action lambdas you write should be
written out-of-line, and given a good name. Even when they are not reused,
named lambdas keep your parsers smaller and easier to read.
Important: Attaching a semantic action to a parser removes its attribute.
That is, ...
Semantic actions can take some other forms when they are used inside of
rules; see More About Rules for details.
So far we've seen examples that parse some text and generate associated attributes.
Sometimes, you want to find some subrange of the input that contains what
you're looking for, and you don't want to generate attributes at all.
There are two directives that affect the attribute type of any parser,
raw[] and string_view[]. (We'll get to directives in more detail in the
Directives section later. For now, you just need to know that a directive
wraps a parser, and changes some aspect of how it functions.)
raw[p] changes the attribute of the parser p to be a subrange whose begin()
and end() return the bounds of the part of the sequence being parsed that p
matches.
namespace bp = boost::parser;
auto int_parser = bp::int_ % ',';            // ATTR(int_parser) is std::vector<int>
auto subrange_parser = bp::raw[int_parser];  // ATTR(subrange_parser) is a subrange

// Parse using int_parser, generating integers.
auto ints = bp::parse("1, 2, 3, 4", int_parser, bp::ws);
assert(ints);
assert(*ints == std::vector<int>({1, 2, 3, 4}));

// Parse again using int_parser, but this time generating only the
// subrange matched by int_parser.  (prefix_parse() allows matches that
// don't consume the entire input.)
auto const str = std::string("1, 2, 3, 4, a, b, c");
auto first = str.begin();
auto range = bp::prefix_parse(first, str.end(), subrange_parser, bp::ws);
assert(range);
assert(range->begin() == str.begin());
assert(range->end() == str.begin() + 10);

static_assert(std::is_same_v<
              decltype(range),
              std::optional<bp::subrange<std::string::const_iterator>>>);
Note that the subrange has the iterator type std::string::const_iterator,
because that's the iterator type passed to prefix_parse(). If we had passed
char const * iterators to prefix_parse(), that would have been the iterator
type. The only exception to this comes from Unicode-aware parsing (see
Unicode Support). In some of those cases, the iterator being used in the
parse is not the one you passed. For instance, if you call prefix_parse()
with char8_t * iterators, it will create a UTF-8 to UTF-32 transcoding view,
and parse the iterators of that view. In such a case, you'll get a subrange
whose iterator type is a transcoding iterator. When that happens, you can
get the underlying iterator (the one you passed to prefix_parse()) by
calling the .base() member function on each transcoding iterator in the
returned subrange.
auto const u8str = std::u8string(u8"1, 2, 3, 4, a, b, c");
auto u8first = u8str.begin();
auto u8range = bp::prefix_parse(u8first, u8str.end(), subrange_parser, bp::ws);
assert(u8range);
assert(u8range->begin().base() == u8str.begin());
assert(u8range->end().base() == u8str.begin() + 10);
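The idea behind raw[] can also be sketched without Boost.Parser. In this toy
version (plain C++17; raw, toy_int and int_list are hypothetical names, and
there is no skipper), raw() runs a parser, throws away its attribute, and
returns the slice of the original input that the parser consumed, here as a
std::string_view rather than a subrange. Like prefix_parse(), it leaves
unconsumed input behind instead of failing.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string_view>
#include <vector>

// Stand-in for int_: one or more ASCII digits.
std::optional<int> toy_int(std::string_view & in)
{
    if (in.empty() || !std::isdigit(static_cast<unsigned char>(in.front())))
        return std::nullopt;
    int value = 0;
    while (!in.empty() && std::isdigit(static_cast<unsigned char>(in.front()))) {
        value = value * 10 + (in.front() - '0');
        in.remove_prefix(1);
    }
    return value;
}

// Stand-in for int_ % ',' without a skipper. Like prefix_parse(), it stops
// where the list ends instead of requiring all input to be consumed.
std::optional<std::vector<int>> int_list(std::string_view & in)
{
    std::optional<int> const first = toy_int(in);
    if (!first)
        return std::nullopt;
    std::vector<int> result{*first};
    while (!in.empty() && in.front() == ',') {
        std::string_view attempt = in.substr(1);
        std::optional<int> const next = toy_int(attempt);
        if (!next)
            break;  // ',' not followed by an int: leave the ',' unconsumed
        result.push_back(*next);
        in = attempt;
    }
    return result;
}

// Toy raw[p]: run p, discard its attribute, and return the slice of the
// original input that p matched. On failure, nothing is consumed.
template <typename Parser>
std::optional<std::string_view> raw(Parser p, std::string_view & in)
{
    std::string_view const start = in;
    if (!p(in)) {
        in = start;
        return std::nullopt;
    }
    return start.substr(0, start.size() - in.size());
}
```

Because the result is a view into the caller's buffer, its data() pointer is
the start of the original input, much as raw[]'s subrange begins at
str.begin() in the example above.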
string_view[] has very similar semantics to raw[], except that it produces a
std::basic_string_view<CharT> (where CharT is the type of the underlying
range being parsed) instead of a subrange. For this to work, the underlying
range must be contiguous. Contiguity of iterators is not detectable before
C++20, so this directive is only available in C++20 and later.
namespace bp = boost::parser;
auto int_parser = bp::int_ % ',';              // ATTR(int_parser) is std::vector<int>
auto sv_parser = bp::string_view[int_parser];  // ATTR(sv_parser) is a string_view

auto const str = std::string("1, 2, 3, 4, a, b, c");
auto first = str.begin();
auto sv1 = bp::prefix_parse(first, str.end(), sv_parser, bp::ws);
assert(sv1);
assert(*sv1 == str.substr(0, 10));

static_assert(std::is_same_v<decltype(sv1), std::optional<std::string_view>>);
Since string_view[] produces string_views, it cannot return transcoding
iterators as described above for raw[]. If you parse a sequence of CharT
with string_view[], you get exactly a std::basic_string_view<CharT>. If the
parse is using transcoding in the Unicode-aware path, string_view[] will
decompose the transcoding iterator as necessary. If you pass a transcoding
view to parse() or transcoding iterators to prefix_parse(), string_view[]
will still see through the transcoding iterators without issue, and give you
a string_view of part of the underlying range.
auto sv2 = bp::parse("1, 2, 3, 4" | bp::as_utf32, sv_parser, bp::ws);
assert(sv2);
assert(*sv2 == "1, 2, 3, 4");
static_assert(std::is_same_v<decltype(sv2), std::optional<std::string_view>>);
Now would be a good time to describe the parse context in some detail. Any
semantic action that you write will need to use state in the parse context,
so you need to know what's available.
The parse context is an object that stores the current state of the parse:
the current- and end-iterators, the error handler, etc. Data may seem to be
"added" to or "removed" from it at different times during the parse. For
instance, when a parser p with a semantic action a succeeds, the context
adds the attribute that p produces to the parse context, then calls a,
passing it the context.
Though the context object appears to have things added to or removed from
it, it does not. In reality, there is no one context object. Contexts are
formed at various times during the parse, usually when starting a subparser.
Each context is formed by taking the previous context and adding or changing
members as needed to form a new context object. When the function containing
the new context object returns, its context object (if any) is destructed.
This is efficient to do, because the parse context has only about a dozen
data members, and each data member is less than or equal to the size of a
pointer. Copying the entire context when mutating the context is therefore
fast. The context does no memory allocation.
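To illustrate the copy-and-modify idea, here is a hypothetical, heavily
simplified context type in plain C++17; its members are assumptions chosen
for the sketch, not Boost.Parser's real context. Forming a child context
copies a few pointer-sized members and changes one, the parent is left
untouched, and nothing is allocated.

```cpp
#include <cassert>

// Hypothetical, heavily simplified parse context: a few pointer-sized
// members, so copying it is cheap and never allocates.
struct ToyContext {
    char const * first = nullptr;  // current position
    char const * last = nullptr;   // end of input
    bool * pass = nullptr;         // success flag, shared by copies
    int const * attr = nullptr;    // attribute visible to a semantic action
};

// Form a child context by copying the parent and changing one member.
// `parent` is taken by value, so the caller's context is not modified.
ToyContext with_attr(ToyContext parent, int const & attr)
{
    parent.attr = &attr;
    return parent;
}
```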
Tip: All these functions that take the parse context as their first
parameter will be found by Argument-Dependent Lookup. You will probably
never need to qualify them with ...
By convention, the names of all Boost.Parser functions that take a parse
context, and are therefore intended for use inside semantic actions, contain
a leading underscore.
_pass() returns a reference to a bool indicating the success or failure of
the current parse. This can be used to force the current parse to pass or
fail:
[](auto & ctx) {
    // If the attribute fails to meet this predicate, fail the parse.
    if (!necessary_condition(_attr(ctx)))
        _pass(ctx) = false;
}
Note that for a semantic action to be executed, its associated parser must
already have succeeded. So unless you previously wrote _pass(ctx) = false
within your action, _pass(ctx) = true does nothing; it's redundant.
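The following toy model (plain C++17, not Boost.Parser's implementation;
ActionContext, parse_with_action, toy_int and even_only are hypothetical
names, and even_only plays the role of necessary_condition) shows why
_pass(ctx) = true is redundant: the success flag handed to the action starts
out true, because the action only runs after its parser has matched. Setting
it to false makes the whole p[action] fail, and the input is restored.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <optional>
#include <string_view>

// Toy context: `attr` stands in for _attr(ctx), `pass` for _pass(ctx).
struct ActionContext {
    int & attr;
    bool & pass;
};

// Stand-in for int_: one or more ASCII digits.
std::optional<int> toy_int(std::string_view & in)
{
    if (in.empty() || !std::isdigit(static_cast<unsigned char>(in.front())))
        return std::nullopt;
    int value = 0;
    while (!in.empty() && std::isdigit(static_cast<unsigned char>(in.front()))) {
        value = value * 10 + (in.front() - '0');
        in.remove_prefix(1);
    }
    return value;
}

// Toy p[action] with a veto: `p` must match first; then the action may set
// ctx.pass = false, which turns the match into a failure.
bool parse_with_action(
    std::function<std::optional<int>(std::string_view &)> const & p,
    std::function<void(ActionContext &)> const & action,
    std::string_view & in)
{
    std::string_view const saved = in;
    std::optional<int> attr = p(in);
    if (!attr) {
        in = saved;
        return false;
    }
    bool pass = true;  // the parser already succeeded
    ActionContext ctx{*attr, pass};
    action(ctx);
    if (!pass) {
        in = saved;  // a vetoed match consumes nothing
        return false;
    }
    return true;
}

// Hypothetical necessary condition: only even numbers are acceptable.
void even_only(ActionContext & ctx)
{
    if (ctx.attr % 2 != 0)
        ctx.pass = false;
}
```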
_begin() and _end() return the beginning and end of the range that you
passed to parse(), respectively. _where() returns a subrange indicating the
bounds of the input matched by the current parse. _where() can be useful if
you just want to parse some text and return a result consisting of where
certain elements are located, without producing any other attributes.
_where() can also be essential in tracking where things are located, to
provide good diagnostics at a later point in the parse. Think mismatched
tags in XML; if you parse a close-tag at the end of an element, and it does
not match the open-tag, you want to produce an error message that mentions
or shows both tags. Stashing _where(ctx).begin() somewhere that is available
to the close-tag parser will enable that. See Error Handling and Debugging
for an example of this.
_error_handler() returns a reference to the error handler associated with
the parser passed to parse(). Using _error_handler(), you can generate
errors and warnings from within your semantic actions. See Error Handling
and Debugging for concrete examples.
_attr() returns a reference to the value of the current parser's attribute.
It is available only when the current parser's parse is successful. If the
parser has no semantic action, no attribute gets added to the parse context.
It can be used to read and write the current parser's attribute:
[](auto & ctx) { _attr(ctx) = 3; }
If the current parser has no attribute, a none is returned.
_val() returns a reference to the value of the attribute of the current rule
being used to parse (if any), and is available even before the rule's parse
is successful. It can be used to set the current rule's attribute, even from
a parser that is a subparser inside the rule. Let's say we're writing a
parser with a semantic action that is within a rule. If we want to set the
current rule's value to some function of the subparser's attribute, we would
write this semantic action:
[](auto & ctx) { _val(ctx) = some_function(_attr(ctx)); }
If there is no current rule, or the current rule has no attribute, a none is
returned.
You need to use _val() in cases where the default attribute for a rule's
parser is not directly compatible with the attribute type of the rule. In
these cases, you'll need to write some code like the example above to
compute the rule's attribute from the rule's parser's generated attribute.
For more info on rules, see the next page, and More About Rules.
_globals() returns a reference to a user-supplied object that contains
whatever data you want to use during the parse. The "globals" for a parse is
an object (typically a struct) that you give to the top-level parser. Then
you can use _globals() to access it at any time during the parse. We'll see
how globals get associated with the top-level parser in The parse() API
later. As an example, say that you have an early part of the parse that
needs to record some black-listed values, and that later parts of the parse
might need to parse values, failing the parse if they see the black-listed
values. In the early part of the parse, you could write something like this:
[](auto & ctx) {
    // black_list is a std::unordered_set.
    _globals(ctx).black_list.insert(_attr(ctx));
}
Later in the parse, you could then use black_list to check values as they
are parsed.
[](auto & ctx) {
    // Fail the parse if this value was black-listed earlier.
    if (_globals(ctx).black_list.contains(_attr(ctx)))
        _pass(ctx) = false;
}
_locals() returns a reference to one or more values that are local to the
current rule being parsed, if any. If there are two or more local values,
_locals() returns a reference to a boost::parser::tuple. Rules with locals
are something we haven't gotten to yet (see More About Rules), but for now
all you need to know is that you can provide a template parameter
(LocalState) to rule, and the rule will default construct an object of that
type for use within the rule. You access it via _locals():
[](auto & ctx) {
    auto & local = _locals(ctx);
    // Use local here.  If 'local' is a hana::tuple, access its members like this:
    using namespace hana::literals;
    auto & first_element = local[0_c];
    auto & second_element = local[1_c];
}
If there is no current rule, or the current rule has no locals, a none is returned.
_params(), like _locals(), applies to the current rule being used to parse, if any (see More About Rules). It returns a reference to a single value if the current rule has only one parameter, or to a boost::parser::tuple of values if the current rule has multiple parameters. If there is no current rule, or the current rule has no parameters, a none is returned.
Unlike with _locals(), you do not provide a template parameter to rule. Instead, you call the rule's with() member function (again, see More About Rules).
_no_case() returns true if the current parse context is inside one or more (possibly nested) no_case[] directives. I don't have a use case for this. However, if I didn't expose it, it would be the only thing in the context that you could not examine from inside a semantic action. It was easy to add, so I did.
This example is very similar to the others we've seen so far. The only difference is that it uses a rule. As an analogy, think of a parser like char_ or double_ as an individual line of code, and a rule as a function. Like a function, a rule has its own name and can even be forward declared. Here is how we declare a rule, which is analogous to forward declaring a function:
bp::rule<struct doubles, std::vector<double>> doubles = "doubles";
This declares the rule itself. The rule is a parser, and we can immediately use it in other parsers. That declaration is pretty dense; take note of these things:
The tag type is struct doubles. Here we've declared the tag type and used it all in one go; you can also use a previously declared tag type.
The rule object itself is named doubles.
We've given doubles the diagnostic text "doubles" so that Boost.Parser knows how to refer to it when producing a trace of the parser during debugging.
Ok, so if doubles is a parser, what does it do? We define the rule's behavior by defining a separate parser that by now should look pretty familiar:
auto const doubles_def = bp::double_ % ',';
This is analogous to writing a definition for a forward-declared function. Note that we used the name doubles_def. Right now, the doubles rule parser and the doubles_def non-rule parser have no connection to each other. That's intentional; we want to be able to define them separately. To connect them, we declare functions with an interface that Boost.Parser understands, and use the tag type struct doubles to tie them together. We use a macro for that:
BOOST_PARSER_DEFINE_RULES(doubles);
This macro expands to the code necessary to make the rule doubles and its parser doubles_def work together. The macro relies on the _def suffix as a naming convention. The tag type allows the rule parser, doubles, to call one of these overloads when used as a parser.
BOOST_PARSER_DEFINE_RULES expands to two overloads of a function called parse_rule(). In the case above, each overload takes a struct doubles * (pointer-to-tag) parameter, which distinguishes it from the parse_rule() overloads for other rules, and parses using doubles_def. You will never need to call any overload of parse_rule() yourself. It is used internally by rule_parser, the parser that implements rules.
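The tag-dispatch idea behind parse_rule() can be shown in isolation with plain C++. In this sketch, doubles and names are hypothetical tag types, and toy_rule is a hypothetical stand-in for rule. The two parse_rule() overloads stand in for what a define-rules style macro would generate; none of this is the library's actual code.

```cpp
#include <cassert>
#include <string>

// Hypothetical tag types: declared but never defined. They exist only so that
// their pointer types can select an overload.
struct doubles;
struct names;

// Hypothetical stand-in for a rule: it carries nothing but its tag type.
template <typename Tag>
struct toy_rule {};

// One overload per tag, like the ones a define-rules macro would generate.
std::string parse_rule(doubles *) { return "parse with doubles_def"; }
std::string parse_rule(names *) { return "parse with names_def"; }

// Generic rule code never names an overload; the tag chooses it at compile time.
template <typename Tag>
std::string run(toy_rule<Tag>) {
    return parse_rule(static_cast<Tag *>(nullptr));
}
```

Because the tag is only used as a pointer type, it never needs a definition. This is why struct doubles can be declared inline in the rule's template argument list.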
Here is the definition of the macro that is expanded for each rule:
#define BOOST_PARSER_DEFINE_IMPL(_, rule_name_) \
    template< \
        typename Iter, \
        typename Sentinel, \
        typename Context, \
        typename SkipParser> \
    decltype(rule_name_)::parser_type::attr_type parse_rule( \
        decltype(rule_name_)::parser_type::tag_type *, \
        Iter & first, \
        Sentinel last, \
        Context const & context, \
        SkipParser const & skip, \
        boost::parser::detail::flags flags, \
        bool & success, \
        bool & dont_assign) \
    { \
        auto const & parser = BOOST_PARSER_PP_CAT(rule_name_, _def); \
        using attr_t = \
            decltype(parser(first, last, context, skip, flags, success)); \
        using attr_type = decltype(rule_name_)::parser_type::attr_type; \
        if constexpr (boost::parser::detail::is_nope_v<attr_t>) { \
            dont_assign = true; \
            parser(first, last, context, skip, flags, success); \
            return {}; \
        } else if constexpr (std::is_same_v<attr_type, attr_t>) { \
            return parser(first, last, context, skip, flags, success); \
        } else if constexpr (std::is_constructible_v<attr_type, attr_t>) { \
            return attr_type( \
                parser(first, last, context, skip, flags, success)); \
        } else { \
            attr_type attr{}; \
            parser(first, last, context, skip, flags, success, attr); \
            return attr; \
        } \
    } \
 \
    template< \
        typename Iter, \
        typename Sentinel, \
        typename Context, \
        typename SkipParser, \
        typename Attribute> \
    void parse_rule( \
        decltype(rule_name_)::parser_type::tag_type *, \
        Iter & first, \
        Sentinel last, \
        Context const & context, \
        SkipParser const & skip, \
        boost::parser::detail::flags flags, \
        bool & success, \
        bool & dont_assign, \
        Attribute & retval) \
    { \
        auto const & parser = BOOST_PARSER_PP_CAT(rule_name_, _def); \
        using attr_t = \
            decltype(parser(first, last, context, skip, flags, success)); \
        if constexpr (boost::parser::detail::is_nope_v<attr_t>) { \
            parser(first, last, context, skip, flags, success); \
        } else { \
            parser(first, last, context, skip, flags, success, retval); \
        } \
    }
Now that we have the doubles parser, we can use it like we might any other parser:
auto const result = bp::parse(input, doubles, bp::ws);
The full program:
#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>
#include <vector>

namespace bp = boost::parser;

bp::rule<struct doubles, std::vector<double>> doubles = "doubles";
auto const doubles_def = bp::double_ % ',';
BOOST_PARSER_DEFINE_RULES(doubles);

int main()
{
    std::cout << "Please enter a list of doubles, separated by commas. ";
    std::string input;
    std::getline(std::cin, input);

    auto const result = bp::parse(input, doubles, bp::ws);

    if (result) {
        std::cout << "You entered:\n";
        for (double x : *result) {
            std::cout << x << "\n";
        }
    } else {
        std::cout << "Parse failure.\n";
    }
}
All this is intended to introduce the notion of rules. It may still be a bit unclear why you would want to use rules. The use cases for rules, and lots of detail about them, are in a later section, More About Rules.
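The rule/function analogy from this section can also be made literal. The sketch below uses only the standard library. It forward declares doubles, uses it in another function before its definition appears, and defines it separately, much as the doubles rule is declared, used, and later given doubles_def. The hand-written comma-separated parsing is an assumption for illustration; it is not how double_ % ',' is implemented.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>
#include <vector>

// "Rule": a forward declaration that other code can already call.
std::optional<std::vector<double>> doubles(std::string const & input);

// Another "parser" that uses the rule before its definition is visible.
std::optional<std::size_t> count_doubles(std::string const & input) {
    auto const result = doubles(input);
    if (!result) return std::nullopt;
    return result->size();
}

// "doubles_def": the definition, supplied separately (a comma-separated list).
std::optional<std::vector<double>> doubles(std::string const & input) {
    std::vector<double> values;
    char const * p = input.c_str();
    while (true) {
        char * end = nullptr;
        double const d = std::strtod(p, &end);  // skips leading whitespace
        if (end == p) return std::nullopt;       // a number was required here
        values.push_back(d);
        p = end;
        while (std::isspace(static_cast<unsigned char>(*p))) ++p;
        if (*p == '\0') return values;           // whole input consumed
        if (*p != ',') return std::nullopt;      // anything else is a failure
        ++p;
    }
}
```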
So far, we've seen only simple parsers that parse the same value repeatedly (with or without commas and spaces). It's also very common to parse a few values in a specific sequence. Let's say you want to parse an employee record. Here's a parser you might write:
namespace bp = boost::parser;
auto employee_parser = bp::lit("employee")
    >> '{' >> bp::int_ >> ','
    >> quoted_string >> ','
    >> quoted_string >> ','
    >> bp::double_ >> '}';
The attribute type for employee_parser is boost::parser::tuple<int, std::string, std::string, double>. That's great, in that you got all the parsed data for the record without having to write any semantic actions. It's not so great that you now have to get all the individual elements out by their indices, using get(). It would be much nicer to parse into the final data structure that your program is going to use. This is often some struct or class. Boost.Parser supports parsing into arbitrary aggregate structs, and into non-aggregates that are constructible from the tuple at hand.
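To see the difference between index-based access and a named struct with plain standard-library types, consider the sketch below. It is only an analogy. std::tuple stands in for boost::parser::tuple, and the hypothetical to_employee() helper stands in for the conversion Boost.Parser performs for you.

```cpp
#include <cassert>
#include <string>
#include <tuple>

struct employee {
    int age;
    std::string surname;
    std::string forename;
    double salary;
};

// Index-based access: correct, but the meaning of each index is implicit.
double salary_of(std::tuple<int, std::string, std::string, double> const & t) {
    return std::get<3>(t);
}

// Convert a compatible tuple to the aggregate by expanding its elements into
// brace-initialization, in member order.
employee to_employee(std::tuple<int, std::string, std::string, double> const & t) {
    return std::apply(
        [](auto const &... members) { return employee{members...}; }, t);
}
```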
Suppose we have a struct whose data members have the same types listed in the boost::parser::tuple attribute type for employee_parser. It would be nice to parse directly into it, instead of parsing into a tuple and then constructing our struct later. Fortunately, this just works in Boost.Parser. Here is an example of parsing straight into a compatible aggregate type.
#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>

struct employee
{
    int age;
    std::string surname;
    std::string forename;
    double salary;
};

namespace bp = boost::parser;

int main()
{
    std::cout << "Enter employee record. ";
    std::string input;
    std::getline(std::cin, input);

    auto quoted_string = bp::lexeme['"' >> +(bp::char_ - '"') >> '"'];
    auto employee_p = bp::lit("employee")
        >> '{' >> bp::int_ >> ','
        >> quoted_string >> ','
        >> quoted_string >> ','
        >> bp::double_ >> '}';

    employee record;
    auto const result = bp::parse(input, employee_p, bp::ws, record);

    if (result) {
        std::cout << "You entered:\nage: " << record.age
                  << "\nsurname: " << record.surname
                  << "\nforename: " << record.forename
                  << "\nsalary : " << record.salary << "\n";
    } else {
        std::cout << "Parse failure.\n";
    }
}
Unfortunately, this is taking advantage of the loose attribute assignment logic; the employee_parser parser still has a boost::parser::tuple attribute. See The parse() API for a description of attribute out-param compatibility.
For this reason, it's even more common to want to make a rule that returns a specific type like employee. Just by giving the rule a struct type, we make sure that this parser always generates an employee struct as its attribute, no matter where it is in the parse. If we made a simple parser P that uses the employee_p rule, like bp::int_ >> employee_p, P's attribute type would be boost::parser::tuple<int, employee>.
#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>
#include <type_traits>

struct employee
{
    int age;
    std::string surname;
    std::string forename;
    double salary;
};

namespace bp = boost::parser;

bp::rule<struct quoted_string, std::string> quoted_string = "quoted name";
bp::rule<struct employee_p, employee> employee_p = "employee";

auto quoted_string_def = bp::lexeme['"' >> +(bp::char_ - '"') >> '"'];
auto employee_p_def = bp::lit("employee")
    >> '{' >> bp::int_ >> ','
    >> quoted_string >> ','
    >> quoted_string >> ','
    >> bp::double_ >> '}';

BOOST_PARSER_DEFINE_RULES(quoted_string, employee_p);

int main()
{
    std::cout << "Enter employee record. ";
    std::string input;
    std::getline(std::cin, input);

    static_assert(std::is_aggregate_v<std::decay_t<employee &>>);

    auto const result = bp::parse(input, employee_p, bp::ws);

    if (result) {
        std::cout << "You entered:\nage: " << result->age
                  << "\nsurname: " << result->surname
                  << "\nforename: " << result->forename
                  << "\nsalary : " << result->salary << "\n";
    } else {
        std::cout << "Parse failure.\n";
    }
}
Just as you can pass a struct as an out-param to parse() when the parser's attribute type is a tuple, you can also pass a tuple as an out-param to parse() when the parser's attribute type is a struct:
// Using the employee_p rule from above, with attribute type employee...
boost::parser::tuple<int, std::string, std::string, double> tup;
auto const result = bp::parse(input, employee_p, bp::ws, tup); // Ok!
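The reverse direction, from an aggregate to a tuple of the same member types, can be sketched with structured bindings. This is a standard-library illustration with a hypothetical to_tuple() helper. It does not describe Boost.Parser's internals.

```cpp
#include <cassert>
#include <string>
#include <tuple>

struct employee {
    int age;
    std::string surname;
    std::string forename;
    double salary;
};

// Aggregate -> tuple: structured bindings unpack the members in declaration order.
std::tuple<int, std::string, std::string, double> to_tuple(employee const & e) {
    auto const & [age, surname, forename, salary] = e;
    return {age, surname, forename, salary};
}
```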
Important: This automatic use of
class types as attributes
Many times you don't have an aggregate struct that you want to produce from
your parse. It would be even nicer than the aggregate code above if Boost.Parser
could detect that the members of a tuple that is produced as an attribute
are usable as the arguments to some type's constructor. So, Boost.Parser
does that.
#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>
#include <vector>

namespace bp = boost::parser;

int main()
{
    std::cout << "Enter a string followed by two unsigned integers. ";
    std::string input;
    std::getline(std::cin, input);
    constexpr auto string_uint_uint =
        bp::lexeme[+(bp::char_ - ' ')] >> bp::uint_ >> bp::uint_;
    std::string string_from_parse;
    if (parse(input, string_uint_uint, bp::ws, string_from_parse))
        std::cout << "That yields this string: " << string_from_parse << "\n";
    else
        std::cout << "Parse failure.\n";

    std::cout << "Enter an unsigned integer followed by a string. ";
    std::getline(std::cin, input);
    std::cout << input << "\n";
    constexpr auto uint_string = bp::uint_ >> +bp::char_;
    std::vector<std::string> vector_from_parse;
    if (parse(input, uint_string, bp::ws, vector_from_parse)) {
        std::cout << "That yields this vector of strings:\n";
        for (auto && str : vector_from_parse) {
            std::cout << " '" << str << "'\n";
        }
    } else {
        std::cout << "Parse failure.\n";
    }
}
Let's look at the first parse.
constexpr auto string_uint_uint =
    bp::lexeme[+(bp::char_ - ' ')] >> bp::uint_ >> bp::uint_;
std::string string_from_parse;
if (parse(input, string_uint_uint, bp::ws, string_from_parse))
    std::cout << "That yields this string: " << string_from_parse << "\n";
else
    std::cout << "Parse failure.\n";
Here, we use the parser string_uint_uint, which produces a boost::parser::tuple<std::string, unsigned int, unsigned int> attribute. When we try to parse that into an out-param std::string attribute, it just works. This is because std::string has a constructor that takes a std::string, an offset, and a length. Here's the other parse:
constexpr auto uint_string = bp::uint_ >> +bp::char_;
std::vector<std::string> vector_from_parse;
if (parse(input, uint_string, bp::ws, vector_from_parse)) {
    std::cout << "That yields this vector of strings:\n";
    for (auto && str : vector_from_parse) {
        std::cout << " '" << str << "'\n";
    }
} else {
    std::cout << "Parse failure.\n";
}
Now we have the parser uint_string, which produces a boost::parser::tuple<unsigned int, std::string> attribute; the characters matched by +bp::char_ at the end combine into a std::string. Those two values can be used to construct a std::vector<std::string>, via its (count, value) constructor.
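Both conversions correspond to ordinary constructor calls, which std::make_from_tuple (C++17) can reproduce with standard types. This is an analogy for what happens to the parsed tuples above. It makes no claim about how Boost.Parser implements the conversion. string_from and vector_from are hypothetical helper names.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// tuple<string, unsigned, unsigned> -> std::string(str, pos, count): a substring.
std::string string_from(std::tuple<std::string, unsigned, unsigned> const & t) {
    return std::make_from_tuple<std::string>(t);
}

// tuple<unsigned, string> -> std::vector<std::string>(count, value): count copies.
std::vector<std::string> vector_from(std::tuple<unsigned, std::string> const & t) {
    return std::make_from_tuple<std::vector<std::string>>(t);
}
```

For example, ("hello", 1, 3) yields "ell", and (2, "ab") yields a vector holding "ab" twice.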
Just as aggregates can be used in place of tuples, non-aggregate class types can be substituted for tuples in most places. That includes using a non-aggregate class type as the attribute type of a rule.
However, while compatible tuples can be substituted for aggregates, you can't substitute a tuple for some class type T just because the tuple could have been used to construct T. Think of trying to invert the substitution in the second parse above. Converting a std::vector<std::string> into a boost::parser::tuple<unsigned int, std::string> makes no sense.
Frequently, you need to parse something that might have one of several forms. operator| is overloaded to form alternative parsers. For example:
namespace bp = boost::parser;
auto const parser_1 = bp::int_ | bp::eps;
parser_1 matches an integer, or if that fails, it matches epsilon, the empty string. This is equivalent to writing:
namespace bp = boost::parser;
auto const parser_2 = -bp::int_;
However, neither parser_1 nor parser_2 is equivalent to writing this:
namespace bp = boost::parser;
auto const parser_3 = bp::eps | bp::int_; // Does not do what you think.
The reason is that alternative parsers try each of their subparsers, one at a time, and stop on the first one that matches. Epsilon matches anything, since it is zero length and consumes no input. It even matches the end of input. This means that parser_3 is equivalent to eps by itself.
Note: For this reason, writing
Warning: This kind of error is very common when
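The ordering rule can be checked with a tiny standard-library model of alternatives. toy_int, toy_eps and alternative are hypothetical stand-ins for int_, eps and operator|. toy_int only accepts runs of unsigned digits, and each toy parser returns the new input position on success.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <optional>
#include <string_view>
#include <vector>

// A toy parser takes the input and a start position and returns the position
// just past its match, or nullopt if it does not match.
using toy_parser =
    std::function<std::optional<std::size_t>(std::string_view, std::size_t)>;

// Stand-in for int_: one or more decimal digits.
std::optional<std::size_t> toy_int(std::string_view in, std::size_t pos) {
    std::size_t p = pos;
    while (p < in.size() && std::isdigit(static_cast<unsigned char>(in[p])))
        ++p;
    if (p == pos) return std::nullopt;
    return p;
}

// Stand-in for eps: always matches, consuming nothing.
std::optional<std::size_t> toy_eps(std::string_view, std::size_t pos) {
    return pos;
}

// Stand-in for operator|: try each alternative in order, stop at the first match.
std::optional<std::size_t> alternative(std::vector<toy_parser> const & alts,
                                       std::string_view in, std::size_t pos) {
    for (auto const & p : alts)
        if (auto r = p(in, pos)) return r;
    return std::nullopt;
}
```

With {toy_int, toy_eps} the input "42" is consumed up to position 2. With {toy_eps, toy_int}, toy_eps succeeds at position 0 and toy_int is never tried.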
It is very common to need to parse quoted strings. Quoted strings are slightly tricky, though, when using a skipper (and you should be using a skipper 99% of the time). You don't want to allow arbitrary whitespace in the middle of your strings, and you also don't want to remove all whitespace from your strings. Both of these things will happen with the typical skipper, ws.
So, here is how most people would write a quoted string parser:
namespace bp = boost::parser;
const auto string = bp::lexeme['"' >> *(bp::char_ - '"') > '"'];
Some things to note:
lexeme[] disables skipping in the parser, and it must be written around the quotes, not around the operator* expression; and
This is a very common pattern. I have written a quoted string parser like
this dozens of times. The parser above is the quick-and-dirty version. A
more robust version would be able to handle escaped quotes within the string,
and then would immediately also need to support escaped escape characters.
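As a rough sketch of what such a more robust version has to do, here is a hand-written scanner using only the standard library. parse_quoted is a hypothetical helper, not Boost.Parser's quoted_string. It assumes the string starts at the first character, and it treats backslash as the escape character for the quote and for itself. It also ignores anything after the closing quote and rejects every other escape.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Parse a '"'-quoted string beginning at in[0]. A backslash may escape the
// quote character or another backslash. Returns the unescaped contents, or
// nullopt for a missing opening quote, a missing closing quote, or any other escape.
std::optional<std::string> parse_quoted(std::string_view in) {
    if (in.empty() || in[0] != '"') return std::nullopt;
    std::string out;
    for (std::size_t i = 1; i < in.size(); ++i) {
        char const c = in[i];
        if (c == '"') return out;  // closing quote; the rest of `in` is ignored
        if (c != '\\') {
            out += c;
            continue;
        }
        if (i + 1 == in.size()) return std::nullopt;  // dangling backslash
        char const next = in[++i];
        if (next != '"' && next != '\\') return std::nullopt;  // unsupported escape
        out += next;
    }
    return std::nullopt;  // no closing quote
}
```

Supporting escaped quotes forces support for escaped backslashes too. Otherwise there would be no way to write a string whose last character is a backslash.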
Boost.Parser provides quoted_string to use in place of this very common pattern. It supports escaping of the quote character and of the escape character itself, using backslash as the escape character.
namespace bp = boost::parser;

auto result1 = bp::parse("\"some text\"", bp::quoted_string, bp::ws);
assert(result1);
std::cout << *result1 << "\n"; // Prints: some text

auto result2 =
    bp::parse("\"some \\\"text\\\"\"", bp::quoted_string, bp::ws);
assert(result2);
std::cout << *result2 << "\n"; // Prints: some "text"
As common as this use case is, there are very similar use cases that it does not cover. So, quoted_string has some options. If you call it with a single character, it returns a quoted_string that uses that single character as the quote character.
auto result3 = bp::parse("!some text!", bp::quoted_string('!'), bp::ws);
assert(result3);
std::cout << *result3 << "\n"; // Prints: some text
You can also supply a range of characters. One of the characters from the range must quote both ends of the string; mismatches are not allowed. Think of how Python allows you to quote a string with either '"' or '\'', but the same character must be used on both sides.
auto result4 = bp::parse("'some text'", bp::quoted_string("'\""), bp::ws);
assert(result4);
std::cout << *result4 << "\n"; // Prints: some text
Another common thing to do in a quoted string parser is to recognize escape sequences. If you have simple escape sequences that do not require any real parsing, like the simple escape sequences from C++, you can provide a symbols object as well. The template parameter T to symbols<T> must be char or char32_t. You don't need to include the escaped backslash or the escaped quote character, since those always work.
// the c++ simple escapes
bp::symbols<char> const escapes = {
    {"'", '\''},
    {"?", '\?'},
    {"a", '\a'},
    {"b", '\b'},
    {"f", '\f'},
    {"n", '\n'},
    {"r", '\r'},
    {"t", '\t'},
    {"v", '\v'}};
// The input contains a backslash followed by 'r', which escapes translates to '\r'.
auto result5 =
    bp::parse("\"some text\\r\"", bp::quoted_string('"', escapes), bp::ws);
assert(result5);
std::cout << *result5 << "\n"; // Prints (with a CRLF newline): some text
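The sketch below models the behavior described in this section with the standard library. It accepts any quote character from a set, requires the same character to close the string, and translates simple escapes through a lookup table. parse_quoted_with is a hypothetical helper. std::map<char, char> is a simplified stand-in for the symbols object, whose keys are strings. Treat this as an illustration of the described behavior, not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <string_view>

// Any character in `quotes` may open the string and the same character must
// close it. After a backslash, the escaped backslash and the escaped quote
// always work; other characters are translated through `escapes`.
std::optional<std::string> parse_quoted_with(std::string_view in,
                                             std::string_view quotes,
                                             std::map<char, char> const & escapes) {
    if (in.empty() || quotes.find(in[0]) == std::string_view::npos)
        return std::nullopt;
    char const quote = in[0];
    std::string out;
    for (std::size_t i = 1; i < in.size(); ++i) {
        char const c = in[i];
        if (c == quote) return out;
        if (c != '\\') {
            out += c;
            continue;
        }
        if (i + 1 == in.size()) return std::nullopt;  // dangling backslash
        char const next = in[++i];
        if (next == '\\' || next == quote) {
            out += next;
            continue;
        }
        auto const it = escapes.find(next);
        if (it == escapes.end()) return std::nullopt;  // unknown escape sequence
        out += it->second;
    }
    return std::nullopt;  // no matching closing quote
}
```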
Now that you've seen some examples, let's see how parsing works in a bit
more detail. Consider this example.
namespace bp = boost::parser;
auto int_pair = bp::int_ >> bp::int_;        // Attribute: tuple<int, int>
auto int_pairs_plus = +int_pair >> bp::int_; // Attribute: tuple<std::vector<tuple<int, int>>, int>
int_pairs_plus must match a pair of ints (using int_pair) one or more times, and then must match an additional int. In other words, it matches any odd number (greater than 1) of ints in the input. Let's look at how this parse proceeds.
auto result = bp::parse("1 2 3", int_pairs_plus, bp::ws);
At the beginning of the parse, the top-level parser uses its first subparser (if any) to start parsing. So, int_pairs_plus, being a sequence parser, would pass control to its first parser, +int_pair. Then +int_pair would use int_pair to do its parsing, which would in turn use bp::int_. This creates a stack of parsers, each one using a particular subparser.
Step 1) The input is "1 2 3", and the stack of active parsers is int_pairs_plus -> +int_pair -> int_pair -> bp::int_. (Read "->" as "uses".) This parses "1", and bp::ws skips the whitespace that follows. Control passes to the second bp::int_ parser in int_pair.
Step 2) The input is "2 3" and the stack of parsers looks the same, except the active parser is the second bp::int_ from int_pair. This parser consumes "2" and then bp::ws skips the subsequent space. Since we've finished with int_pair's match, its boost::parser::tuple<int, int> attribute is complete. Its parent is +int_pair, so this tuple attribute is pushed onto the back of +int_pair's attribute, which is a std::vector<boost::parser::tuple<int, int>>. Control passes up to the parent of int_pair, +int_pair. Since +int_pair is a one-or-more parser, it starts a new iteration; control passes to int_pair again.
Step 3) The input is "3" and the stack of parsers looks the same, except the active parser is the first bp::int_ from int_pair again, and we're in the second iteration of +int_pair. This parser consumes "3". Since this is the end of the input, the second bp::int_ of int_pair does not match. This partial match of "3" should not count, since it was not part of a full match. So, int_pair indicates its failure, and +int_pair stops iterating. Since it did match once, +int_pair does not fail: it is a one-or-more parser, so failure of its subparser after the first success does not cause it to fail. Control passes to the next parser in sequence within int_pairs_plus.
Step 4) The input is "3" again, and the stack of parsers is int_pairs_plus -> bp::int_. This parses the "3", and the parse reaches the end of input. Control passes to int_pairs_plus, which has just successfully matched all the parsers in its sequence. It then produces its attribute, a boost::parser::tuple<std::vector<boost::parser::tuple<int, int>>, int>, which gets returned from bp::parse().
Something to take note of between Steps #3 and #4: at the beginning of #4, the input position had returned to where it was at the beginning of #3. This kind of backtracking happens whenever a parser matches part of the input and then fails, as int_pair did here. It is also what happens in alternative parsers when an alternative fails. The next page has more details on the semantics of backtracking.
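The four steps, including the rewind at the end of Step 3, can be traced in a small standard-library model. The model works on pre-tokenized integers rather than characters and uses hypothetical names (parse_int, parse_int_pair, parse_int_pairs_plus). It mirrors the steps above and is not Boost.Parser's implementation. It also assumes the whole input must be consumed.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

struct int_pairs_plus_attr {
    std::vector<std::pair<int, int>> pairs;  // attribute of +int_pair
    int last = 0;                            // attribute of the final int_
};

// int_: match one token.
std::optional<int> parse_int(std::vector<int> const & in, std::size_t & pos) {
    if (pos == in.size()) return std::nullopt;
    return in[pos++];
}

// int_pair: both ints must match; a partial match gives its input back.
std::optional<std::pair<int, int>> parse_int_pair(std::vector<int> const & in,
                                                  std::size_t & pos) {
    std::size_t const start = pos;
    if (auto const a = parse_int(in, pos)) {
        if (auto const b = parse_int(in, pos))
            return std::pair{*a, *b};
    }
    pos = start;  // rewind, as in Step 3
    return std::nullopt;
}

// +int_pair >> int_, requiring the whole input to be consumed.
std::optional<int_pairs_plus_attr> parse_int_pairs_plus(std::vector<int> const & in) {
    std::size_t pos = 0;
    int_pairs_plus_attr attr;
    while (auto p = parse_int_pair(in, pos))
        attr.pairs.push_back(*p);
    if (attr.pairs.empty()) return std::nullopt;  // '+' needs at least one pair
    auto const last = parse_int(in, pos);
    if (!last || pos != in.size()) return std::nullopt;
    attr.last = *last;
    return attr;
}
```

Inputs with an even number of ints fail: +int_pair greedily takes every pair, leaving nothing for the final int_.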
So far, parsers have been presented as somewhat abstract entities. You may want more detail. A Boost.Parser parser P is an invocable object with a pair of call operator overloads. The two functions are very similar, and in many parsers one is implemented in terms of the other. The first function does the parsing and returns the default attribute for the parser. The second function does exactly the same parsing, but takes an out-param into which it writes the attribute for the parser. The out-param does not need to be the same type as the default attribute, but they need to be compatible.
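A toy version of that shape, written with only the standard library, looks like this. toy_uint_parser is hypothetical and much simpler than a real Boost.Parser parser: it has no context, skipper or flags, and no overflow check. It only illustrates the pair of call operators, one returning the default attribute and one writing into a compatible out-param.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string_view>

struct toy_uint_parser {
    using iter = std::string_view::const_iterator;

    // Overload 1: parse, then return the default attribute (unsigned).
    std::optional<unsigned> operator()(iter & first, iter last) const {
        unsigned value = 0;
        if (!(*this)(first, last, value)) return std::nullopt;
        return value;
    }

    // Overload 2: the same parse, writing into any out-param assignable
    // from unsigned. On failure `first` is left unchanged.
    template <typename Out>
    bool operator()(iter & first, iter last, Out & out) const {
        iter it = first;
        unsigned value = 0;
        while (it != last && std::isdigit(static_cast<unsigned char>(*it)))
            value = value * 10 + static_cast<unsigned>(*it++ - '0');
        if (it == first) return false;
        first = it;
        out = value;
        return true;
    }
};
```

Implementing the first overload in terms of the second keeps the parsing logic in one place, which matches the pattern described above.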
Compatibility means that the default attribute is assignable to the out-param in some fashion. This usually means direct assignment, but it may also mean a tuple -> aggregate or aggregate -> tuple conversion. For sequence types, compatibility means that the sequence type has insert or push_back with the usual semantics. This means that the parser +boost::parser::int_ can fill a std::set<int> just as well as a std::vector<int>.
Some parsers also have additional state that is required to perform a match. For instance, char_ parsers can be parameterized with a single code point to match; the exact value of that code point is stored in the parser object.
No parser has direct support for all the operations defined on parsers (operator|, operator>>, etc.). Instead, there is a template called parser_interface that supports all of these operations. parser_interface wraps each parser, storing it as a data member and adapting it for general use. You should only ever see parser_interface in the debugger, or possibly in some of the reference documentation. You should never have to write it in your own code.
As described on the previous page, backtracking occurs when the parse attempts to match the current parser P, matches part of the input, but fails to match all of P. The part of the input consumed during the parse of P is essentially "given back".
This is necessary because P may consist of subparsers, and each subparser that succeeds will try to consume input, produce attributes, etc. When a later subparser fails, the parse of P fails. The input must then be rewound to where it was when P started its parse, not to where the latest matching subparser stopped.
Alternative parsers will often evaluate multiple subparsers one at a time,
advancing and then restoring the input position, until one of the subparsers
succeeds. Consider this example.
namespace bp = boost::parser;
auto const parser =
    bp::repeat(53)[other_parser] | bp::repeat(10)[other_parser];
Evaluating parser means trying to match other_parser 53 times, and if that fails, trying to match other_parser 10 times. Say you parse input that matches other_parser 11 times. parser will match it, using the second alternative. Along the way, other_parser will successfully match 21 times during the parse: 11 times in the first alternative, which then fails, and 10 times in the second.
The attributes of repeat(53)[other_parser] and repeat(10)[other_parser] are each std::vector<ATTR(other_parser)>; let's say that ATTR(other_parser) is int. The attribute of parser as a whole is the same, std::vector<int>. Since other_parser is busy producing ints (21 of them, to be exact), you may be wondering what happens to the ones produced during the evaluation of repeat(53)[other_parser] when it fails to find all 53 inputs. Its std::vector<int> will contain 11 ints at that point.
When a repeat parser fails while attributes are being generated, it clears its container. This applies to parsers such as the ones above, but also to all the other repeat parsers, including ones made using operator+ or operator*.
So, at the end of a successful parse of 10 inputs by parser (since the right side of the alternative only consumes 10 repetitions), the std::vector<int> attribute of parser would contain 10 ints.
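The rewind-and-clear behavior can be modeled with the standard library to check the counts above. other_parser_t, repeat_n and parse_alternative are hypothetical stand-ins for other_parser, repeat(n)[...] and the alternative. other_parser_t matches one integer token per call and counts its successful matches. For simplicity, the model assumes the container starts empty.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for other_parser: matches one int token and counts successes.
struct other_parser_t {
    int matches = 0;
    bool operator()(std::vector<int> const & in, std::size_t & pos,
                    std::vector<int> & out) {
        if (pos == in.size()) return false;
        out.push_back(in[pos++]);
        ++matches;
        return true;
    }
};

// Stand-in for repeat(n)[p]: exactly n matches. On failure it rewinds the
// input position and clears the container.
bool repeat_n(std::size_t n, other_parser_t & p, std::vector<int> const & in,
              std::size_t & pos, std::vector<int> & out) {
    std::size_t const start = pos;
    for (std::size_t i = 0; i < n; ++i) {
        if (!p(in, pos, out)) {
            pos = start;
            out.clear();
            return false;
        }
    }
    return true;
}

// Stand-in for repeat(53)[other_parser] | repeat(10)[other_parser].
bool parse_alternative(other_parser_t & p, std::vector<int> const & in,
                       std::vector<int> & out) {
    std::size_t pos = 0;
    return repeat_n(53, p, in, pos, out) || repeat_n(10, p, in, pos, out);
}
```

With 11 input tokens, the first alternative matches 11 times, then fails and clears its vector. The second alternative then matches 10 times, so out ends up holding 10 elements and matches is 21.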
Note: Users of Boost.Spirit may be familiar with the
Ok, so if parsers all try their best to match the input, and are all-or-nothing, doesn't that leave room for all kinds of bad input to be ignored? Consider the top-level parser from the Parsing JSON example.
auto const value_p_def = number | bp::bool_ | null | string | array_p | object_p;
What happens if I use this to parse "\""? The parse tries number, which fails. It then tries bp::bool_, which also fails. Then null fails too. Finally, it starts parsing string. Good news: the first character is the open-quote of a JSON string. Unfortunately, that's also the end of the input, so string must fail too. However, we probably don't want to just give up on parsing string now and try array_p, right? If the user wrote an open-quote with no matching close-quote, that's not the prefix of some later alternative of value_p_def; it's ill-formed JSON. Here's the parser for the string rule:
auto const string_def = bp::lexeme['"' >> *(string_char - '"') > '"'];
Notice that operator> is used on the right instead of operator>>. This
indicates the same sequence operation as operator>>, except that it also
represents an expectation. If the parse before the operator> succeeds,
whatever comes after it must also succeed. Otherwise, the top-level parse
fails, and a diagnostic is emitted. It will say something like "Expected
'"' here.", quoting the line, with a caret pointing to the place in the
input where it expected the right-side match.
Choosing to use > versus >> is how you indicate to Boost.Parser that parse
failure is or is not a hard error, respectively.
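The difference between >> and > can be modeled outside Boost.Parser with a small hand-written function. In this plain C++17 sketch (quoted() and expectation_failure are hypothetical names), a missing open-quote is an ordinary failed match that lets later alternatives be tried, while a missing close-quote after the open-quote is either a soft failure (like >>) or a hard error (like >). The exception only stands in for the hard error; how Boost.Parser actually reports it is covered in its error-handling documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical error type standing in for a hard parse failure.
struct expectation_failure : std::runtime_error {
    explicit expectation_failure(std::size_t w)
        : std::runtime_error("Expected '\"' here."), where(w) {}
    std::size_t where;  // offset at which the close-quote was expected
};

// Models '"' >> *(char - '"') followed by either >> '"' or > '"'.
std::optional<std::string> quoted(std::string_view in, bool expect_close)
{
    if (in.empty() || in[0] != '"')
        return std::nullopt;             // ordinary failure: other alternatives may match
    std::size_t pos = 1;
    std::string body;
    while (pos < in.size() && in[pos] != '"')
        body += in[pos++];
    if (pos < in.size())
        return body;                     // found the close-quote
    if (expect_close)
        throw expectation_failure(pos);  // like '>': a hard error
    return std::nullopt;                 // like '>>': just a failed match
}
```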
When writing a parser, it often comes up that there is a set of strings that,
when parsed, are associated with a set of values one-to-one. It is tedious
to write parsers that recognize all the possible input strings when you have
to associate each one with an attribute via a semantic action. Instead, we
can use a symbol table.
Say we want to parse Roman numerals, one of the most common work-related
parsing problems. We want to recognize numbers that start with any number
of "M"s, representing thousands, followed by the hundreds, the
tens, and the ones. Any of these may be absent from the input, but not all.
Here are three Boost.Parser symbol tables that we can use to recognize ones,
tens, and hundreds values, respectively:
bp::symbols<int> const ones = {
    {"I", 1}, {"II", 2}, {"III", 3}, {"IV", 4}, {"V", 5},
    {"VI", 6}, {"VII", 7}, {"VIII", 8}, {"IX", 9}};
bp::symbols<int> const tens = {
    {"X", 10}, {"XX", 20}, {"XXX", 30}, {"XL", 40}, {"L", 50},
    {"LX", 60}, {"LXX", 70}, {"LXXX", 80}, {"XC", 90}};
bp::symbols<int> const hundreds = {
    {"C", 100}, {"CC", 200}, {"CCC", 300}, {"CD", 400}, {"D", 500},
    {"DC", 600}, {"DCC", 700}, {"DCCC", 800}, {"CM", 900}};
A symbols object maps strings of char to their associated attributes. The
type of the attribute must be specified as a template parameter to symbols;
in this case, int.
Any "M"s we encounter should add 1000 to the result, and all other
values come from the symbol tables. Here are the semantic actions we'll need
to do that:
int result = 0;
auto const add_1000 = [&result](auto & ctx) { result += 1000; };
auto const add = [&result](auto & ctx) { result += _attr(ctx); };
add_1000 just adds 1000 to result. add adds whatever attribute is produced
by its parser to result.
Now we just need to put the pieces together to make a parser:
using namespace bp::literals; auto const parser = *'M'_l[add_1000] >> -hundreds[add] >> -tens[add] >> -ones[add];
We've got a few new bits in play here, so let's break it down. 'M'_l is a
literal parser. That is, it is a parser that parses a literal char, code
point, or string. In this case, a char 'M' is being parsed. The _l bit at
the end is a UDL suffix that you can put after any char, char32_t, or
char const * to form a literal parser. You can also make a literal parser
by writing lit(), passing an argument of one of the previously mentioned
types.
Why do we need any of this, considering that we just used a literal ',' in
our previous example? The reason is that 'M' is not used in an expression
with another Boost.Parser parser; it is used within *'M'_l[add_1000]. If
we'd written *'M'[add_1000], clearly that would be ill-formed; char has no
operator* or operator[] associated with it.
Tip: Any time you want to use a
On to the next bit: -hundreds[add]. By now, the use of the index operator
should be pretty familiar; it associates the semantic action add with the
parser hundreds. The operator- at the beginning is new. It means that the
parser it is applied to is optional. You can read it as "zero or one". So,
if hundreds is not successfully parsed after *'M'_l[add_1000], nothing
happens, because hundreds is allowed to be missing; it's optional. If
hundreds is parsed successfully, say by matching "CC", the resulting
attribute, 200, is added to result inside add.
Here is the full listing of the program. Notice that it would have been inappropriate
to use a whitespace skipper here, since the entire parse is a single number,
so it was removed.
#include <boost/parser/parser.hpp>

#include <iostream>
#include <string>

namespace bp = boost::parser;

int main()
{
    std::cout << "Enter a number using Roman numerals. ";
    std::string input;
    std::getline(std::cin, input);

    bp::symbols<int> const ones = {
        {"I", 1}, {"II", 2}, {"III", 3}, {"IV", 4}, {"V", 5},
        {"VI", 6}, {"VII", 7}, {"VIII", 8}, {"IX", 9}};
    bp::symbols<int> const tens = {
        {"X", 10}, {"XX", 20}, {"XXX", 30}, {"XL", 40}, {"L", 50},
        {"LX", 60}, {"LXX", 70}, {"LXXX", 80}, {"XC", 90}};
    bp::symbols<int> const hundreds = {
        {"C", 100}, {"CC", 200}, {"CCC", 300}, {"CD", 400}, {"D", 500},
        {"DC", 600}, {"DCC", 700}, {"DCCC", 800}, {"CM", 900}};

    int result = 0;
    auto const add_1000 = [&result](auto & ctx) { result += 1000; };
    auto const add = [&result](auto & ctx) { result += _attr(ctx); };

    using namespace bp::literals;
    auto const parser =
        *'M'_l[add_1000] >> -hundreds[add] >> -tens[add] >> -ones[add];

    if (bp::parse(input, parser) && result != 0)
        std::cout << "That's " << result << " in Arabic numerals.\n";
    else
        std::cout << "That's not a Roman number.\n";
}
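The same parser can be modeled in plain C++17 to show what the symbol tables are doing. This sketch is not Boost.Parser's implementation: it uses a std::map with a transparent comparator, and it assumes the lookup picks the longest key that is a prefix of the remaining input, which is what lets "VIII" match as 8 in one step. parse_roman() and longest_match() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <map>
#include <optional>
#include <string>
#include <string_view>

// std::less<> makes find() accept a std::string_view directly.
using table = std::map<std::string, int, std::less<>>;

// Looks up the longest key that is a prefix of in[pos..]; advances pos on success.
std::optional<int> longest_match(table const & t, std::string_view in, std::size_t & pos)
{
    for (std::size_t len = in.size() - pos; len > 0; --len) {
        auto const it = t.find(in.substr(pos, len));
        if (it != t.end()) {
            pos += len;
            return it->second;
        }
    }
    return std::nullopt;
}

// Mirrors *'M'_l[add_1000] >> -hundreds[add] >> -tens[add] >> -ones[add].
std::optional<int> parse_roman(std::string_view in)
{
    static table const ones = {{"I", 1}, {"II", 2}, {"III", 3}, {"IV", 4}, {"V", 5},
                               {"VI", 6}, {"VII", 7}, {"VIII", 8}, {"IX", 9}};
    static table const tens = {{"X", 10}, {"XX", 20}, {"XXX", 30}, {"XL", 40}, {"L", 50},
                               {"LX", 60}, {"LXX", 70}, {"LXXX", 80}, {"XC", 90}};
    static table const hundreds = {{"C", 100}, {"CC", 200}, {"CCC", 300}, {"CD", 400},
                                   {"D", 500}, {"DC", 600}, {"DCC", 700}, {"DCCC", 800},
                                   {"CM", 900}};

    std::size_t pos = 0;
    int result = 0;
    while (pos < in.size() && in[pos] == 'M') {  // each 'M' adds 1000
        result += 1000;
        ++pos;
    }
    for (table const * t : {&hundreds, &tens, &ones})  // each table is optional
        if (auto const v = longest_match(*t, in, pos))
            result += *v;

    // Require all input to be consumed (assumed to match bp::parse()) and
    // reject 0, as the listing does.
    if (pos != in.size() || result == 0)
        return std::nullopt;
    return result;
}
```

As with the Boost.Parser version, input such as "IIII" leaves characters unconsumed and is rejected.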
Just like with a rule, you can give a symbols object a bit of diagnostic
text that will be used in error messages generated by Boost.Parser when the
parse fails at an expectation point, as described in Error Handling and
Debugging. See the symbols constructors for details.
The previous example showed how to use a symbol table as a fixed lookup table.
What if we want to add things to the table during the parse? We can do that,
but we need to do so within a semantic action. First, here is our symbol
table, already with a single value in it:
bp::symbols<int> const symbols = {{"c", 8}}; assert(parse("c", symbols));
No surprise that it works to use the symbol table as a parser to parse the
one string in the symbol table. Now, here's our parser:
auto const parser = (bp::char_ >> bp::int_)[add_symbol] >> symbols;
Here, we've attached the semantic action not to a simple parser like
double_, but to the sequence parser (bp::char_ >> bp::int_). This sequence
parser contains two parsers, each with its own attribute, so it produces two
attributes as a tuple.
auto const add_symbol = [&symbols](auto & ctx) {
    using namespace bp::literals;
    // symbols::insert() requires a string, not a single character.
    char chars[2] = {_attr(ctx)[0_c], 0};
    symbols.insert(ctx, chars, _attr(ctx)[1_c]);
};
Inside the semantic action, we can get the elements of the attribute tuple
using the UDLs provided by Boost.Hana and boost::hana::tuple::operator[]().
The first attribute, from the char_, is _attr(ctx)[0_c], and the second,
from the int_, is _attr(ctx)[1_c] (if boost::parser::tuple aliases to
std::tuple, you'd use std::get or boost::parser::get instead). To add the
symbol to the symbol table, we call insert().
auto const result = parse("X 9 X", parser, bp::ws);
assert(result && *result == 9);
During the parse, ("X", 9) is parsed and added to the symbol table. Then,
the second 'X' is recognized by the symbol table parser. However:
assert(!parse("X", symbols));
If we parse again, we find that "X" did not stay in the symbol table. The
fact that symbols was declared const might have given you a hint that this
would happen.
The full program:
#include <boost/parser/parser.hpp>

#include <cassert>
#include <iostream>
#include <string>

namespace bp = boost::parser;

int main()
{
    bp::symbols<int> const symbols = {{"c", 8}};
    assert(parse("c", symbols));

    auto const add_symbol = [&symbols](auto & ctx) {
        using namespace bp::literals;
        // symbols::insert() requires a string, not a single character.
        char chars[2] = {_attr(ctx)[0_c], 0};
        symbols.insert(ctx, chars, _attr(ctx)[1_c]);
    };
    auto const parser = (bp::char_ >> bp::int_)[add_symbol] >> symbols;

    auto const result = parse("X 9 X", parser, bp::ws);
    assert(result && *result == 9);
    (void)result;

    assert(!parse("X", symbols));
}
It is possible to add symbols to a symbols table permanently. To do so,
you have to use a mutable symbols object s, and add the symbols by calling
s.insert_for_next_parse() instead of s.insert(). These two operations are
orthogonal, so if you want to both add a symbol to the table for the
current top-level parse and leave it in the table for subsequent top-level
parses, you need to call both functions.
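One way to picture the two kinds of insertion is a table that keeps persistent entries plus a per-parse copy made at the start of each top-level parse. The class below is a plain C++17 model built on that assumption, with hypothetical member names; it is not how Boost.Parser stores its symbols, but it reproduces the visibility rules described above.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

class symbol_table_model {
public:
    // Assumed to happen at the start of every top-level parse: the
    // per-parse table starts out as a copy of the persistent one.
    void begin_parse() { current_ = persistent_; }

    // Models insert(ctx, ...): visible for the rest of the current parse only.
    void insert(std::string const & key, int value) { current_[key] = value; }

    // Models insert_for_next_parse(...): visible starting with the next parse.
    void insert_for_next_parse(std::string const & key, int value)
    {
        persistent_[key] = value;
    }

    std::optional<int> find(std::string const & key) const
    {
        auto const it = current_.find(key);
        if (it == current_.end())
            return std::nullopt;
        return it->second;
    }

private:
    std::map<std::string, int> persistent_;  // survives across top-level parses
    std::map<std::string, int> current_;     // replaced when the next parse begins
};
```

Because the two member functions touch different maps, a symbol that should be usable both now and later needs both calls, matching the orthogonality described above.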
It is also possible to erase a single entry from the symbol table, or to
clear the symbol table entirely. Just as with insertion, there are versions
of erase and clear that apply to the current parse, and others that apply
only to subsequent parses. The full set of operations can be found in the
symbols API docs.
Note: There are two versions of each of the symbols *_for_next_parse()
functions: one that takes a context, and one that does not. The one with
the context is meant to be used within a semantic action. The one without
the context is for use outside of any parse.
Boost.Parser comes with all the parsers most parsing tasks will ever need.
Each one is a constexpr object or a constexpr function. Some of the
non-functions are also callable, such as char_, which may be used directly,
or with arguments, as in char_('a', 'z'). Any parser that can be called,
whether a function or callable object, will be called a callable parser
from now on. Note that there are no nullary callable parsers; they each
take one or more arguments.
Each callable parser takes one or more parse arguments.
A parse argument may be a value or an invocable object that accepts a reference
to the parse context. The reference parameter may be mutable or constant.
For example:
struct get_attribute { template<typename Context> auto operator()(Context & ctx) { return _attr(ctx); } };
This can also be a lambda. For example:
[](auto const & ctx) { return _attr(ctx); }
The operation that produces a value from a parse argument, which may be a
value or a callable taking a parse context argument, is referred to as
resolving the parse argument. If a parse argument arg can be called with
the current context, then the resolved value of arg is arg(ctx); otherwise,
the resolved value is just arg.
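The resolution rule maps naturally onto if constexpr and std::is_invocable_v. This is a minimal C++17 sketch that assumes a toy context type; fake_context and resolve() are hypothetical, and RESOLVE() in Boost.Parser is only a notional macro, so this models the rule rather than showing library code.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a parse context; real contexts are much richer.
struct fake_context {
    int attr;
};

// Models "resolving a parse argument": call it with the context if that is
// well-formed, otherwise use the argument's value as-is.
template<typename Arg, typename Context>
auto resolve(Arg && arg, Context & ctx)
{
    if constexpr (std::is_invocable_v<Arg &, Context &>)
        return arg(ctx);  // callable: the resolved value is arg(ctx)
    else
        return arg;       // plain value: the resolved value is arg itself
}
```

The same mechanism covers a parse predicate: the callable's result is then used in a context that converts it to bool.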
Some callable parsers take a parse predicate. A parse predicate is not
quite the same as a parse argument, because it must be a callable object,
and cannot be a value. A parse predicate's return type must be contextually
convertible to bool. For example:
struct equals_three { template<typename Context> bool operator()(Context const & ctx) { return _attr(ctx) == 3; } };
This may of course be a lambda:
[](auto & ctx) { return _attr(ctx) == 3; }
The notional macro RESOLVE() expands to the result of resolving a parse
argument or parse predicate. You'll see it used in the rest of the
documentation.
An example of how parse arguments are used:
namespace bp = boost::parser;

// This parser matches one code point that is at least 'a', and at most
// the value of last_char, which comes from the globals.
auto last_char = [](auto & ctx) { return _globals(ctx).last_char; };
auto subparser = bp::char_('a', last_char);
Don't worry about what the globals are for now; the takeaway is that you
can make any argument you pass to a parser depend on the current state of
the parse, by using the parse context:
namespace bp = boost::parser;

// This parser parses two code points. For the parse to succeed, the
// second one must be >= 'a' and <= the first one.
auto set_last_char = [](auto & ctx) { _globals(ctx).last_char = _attr(ctx); };
auto parser = bp::char_[set_last_char] >> subparser;
Each callable parser returns a new parser, parameterized using the arguments
given in the invocation.
This table lists all the Boost.Parser parsers. For the callable parsers, a
separate entry exists for each possible arity of arguments. For a parser p,
if there is no entry for p without arguments, p is a function, and cannot
itself be used as a parser; it must be called. In the table below:
RESOLVE() is a notional macro that expands to the resolution of a parse argument or the evaluation of a parse predicate (see The Parsers And Their Uses);
"RESOLVE(pred) == true" is a shorthand notation for "RESOLVE(pred) is contextually convertible to bool and true"; likewise for false;
c is a character of type char, char8_t, or char32_t;
str is a string literal of type char const[], char8_t const[], or char32_t const[];
pred is a parse predicate;
arg0, arg1, arg2, ... are parse arguments;
a is a semantic action;
r is an object whose type models parsable_range;
p, p1, p2, ... are parsers; and
escapes is a symbols<T> object, where T is char or char32_t.
Note: The definition of parsable_range is:
template<typename T> concept parsable_range = std::ranges::forward_range<T> && code_unit<std::ranges::range_value_t<T>>;
Note: Some of the parsers in this table consume no input. All parsers
consume the input they match unless otherwise stated in the table below.
Table 26.6. Parsers and Their Semantics
Parser |
Semantics |
Attribute Type |
Notes |
---|---|---|---|
Matches epsilon, the empty string. Always
matches, and consumes no input.
|
None. |
Matching |
|
|
Fails to match the input if |
None. |
|
Matches a single whitespace code point (see note), according to
the Unicode White_Space property.
|
None. |
For more info, see the Unicode
properties. |
|
Matches a single newline (see note), following the "hard"
line breaks in the Unicode line breaking algorithm.
|
None. |
For more info, see the Unicode
Line Breaking Algorithm. |
|
Matches only at the end of input, and consumes no input.
|
None. |
||
|
Always matches, and consumes no input. Generates the attribute
|
|
An important use case for |
Matches any single code point.
|
The code point type in Unicode parsing, or |
||
|
Matches exactly the code point |
The code point type in Unicode parsing, or |
|
|
Matches the next code point |
The code point type in Unicode parsing, or |
|
|
Matches the next code point |
The code point type in Unicode parsing, or |
|
Matches a single code point.
|
|
Similar to |
|
Matches a single code point.
|
|
Similar to |
|
The code point type in Unicode parsing, or |
|||
Matches a single control-character code point.
|
The code point type in Unicode parsing, or |
||
Matches a single decimal digit code point.
|
The code point type in Unicode parsing, or |
||
Matches a single punctuation code point.
|
The code point type in Unicode parsing, or |
||
Matches a single hexadecimal digit code point.
|
The code point type in Unicode parsing, or |
||
Matches a single lower-case code point.
|
The code point type in Unicode parsing, or |
||
Matches a single upper-case code point.
|
The code point type in Unicode parsing, or |
||
|
Matches exactly the given code point |
None. |
|
|
Matches exactly the given code point |
None. |
This is a UDL
that represents |
|
Matches exactly the given string |
None. |
|
|
Matches exactly the given string |
None. |
This is a UDL
that represents |
|
Matches exactly |
|
|
|
Matches exactly |
|
This is a UDL
that represents |
Matches |
|
||
Matches a binary unsigned integral value.
|
|
For example, |
|
|
Matches exactly the binary unsigned integral value |
|
|
Matches an octal unsigned integral value.
|
|
For example, |
|
|
Matches exactly the octal unsigned integral value |
|
|
Matches a hexadecimal unsigned integral value.
|
|
For example, |
|
|
Matches exactly the hexadecimal unsigned integral value |
|
|
Matches an unsigned integral value.
|
|
||
|
Matches exactly the unsigned integral value |
|
|
Matches an unsigned integral value.
|
|
||
|
Matches exactly the unsigned integral value |
|
|
Matches an unsigned integral value.
|
|
||
|
Matches exactly the unsigned integral value |
|
|
Matches an unsigned integral value.
|
|
||
|
Matches exactly the unsigned integral value |
|
|
Matches a signed integral value.
|
|
||
|
Matches exactly the signed integral value |
|
|
Matches a signed integral value.
|
|
||
|
Matches exactly the signed integral value |
|
|
Matches a signed integral value.
|
|
||
|
Matches exactly the signed integral value |
|
|
Matches a signed integral value.
|
|
||
|
Matches exactly the signed integral value |
|
|
Matches a floating-point number. |
|
||
Matches a floating-point number. |
|
||
|
Matches iff |
|
The special value |
|
Matches iff |
|
The special value |
|
|
It is an error to write |
|
|
Equivalent to |
|
It is an error to write |
|
|
Unlike the other entries in this table, |
|
Matches |
|
The result does not include the quotes. A quote within the string
can be written by escaping it with a backslash. A backslash within
the string can be written by writing two consecutive backslashes.
Any other use of a backslash will fail the parse. Skipping is disabled
while parsing the entire string, as if using |
|
Matches |
|
The result does not include the |
|
Matches some character |
|
The result does not include the |
|
|
Matches |
|
The result does not include the |
|
Matches some character |
|
The result does not include the |
Important: All the character parsers, like
Note: A slightly more complete description of the attributes generated by
these parsers is in a subsequent section. The attributes are repeated here
so you can see all the properties of the parsers in one place.
If you have an integral type IntType that is not covered by any of the
Boost.Parser parsers, you can use a more verbose declaration to declare a
parser for IntType. If IntType were unsigned, you would use uint_parser. If
it were signed, you would use int_parser. For example:
constexpr parser_interface<int_parser<IntType>> hex_int;
uint_parser and int_parser accept three more non-type template parameters
after the type parameter: Radix, MinDigits, and MaxDigits. Radix defaults
to 10, MinDigits to 1, and MaxDigits to -1, which is a sentinel value
meaning that there is no maximum number of digits.
So, if you wanted to parse exactly eight hexadecimal digits in a row in
order to recognize Unicode character literals like C++ has (e.g.
\Udeadbeef), you could use this parser for the digits at the end:
constexpr parser_interface<uint_parser<unsigned int, 16, 8, 8>> hex_int;
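To see what Radix, MinDigits, and MaxDigits control, here is a standard-library-only C++17 model of the digit scanning that uint_parser<unsigned int, 16, 8, 8> performs on the \Udeadbeef example. parse_uint() is a hypothetical name; it supports radixes up to 16 only, does not check for overflow, and its handling of digits beyond MaxDigits (it stops and leaves them unconsumed) is an assumption rather than documented Boost.Parser behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Models the digit rules of uint_parser<unsigned int, Radix, MinDigits, MaxDigits>.
// MaxDigits == -1 means "no maximum". On failure, pos is left unchanged.
template<int Radix, int MinDigits, int MaxDigits>
std::optional<unsigned int> parse_uint(std::string_view in, std::size_t & pos)
{
    static_assert(2 <= Radix && Radix <= 16, "this sketch handles radix 2 to 16");
    auto digit_value = [](char c) -> int {
        if ('0' <= c && c <= '9') return c - '0';
        if ('a' <= c && c <= 'f') return c - 'a' + 10;
        if ('A' <= c && c <= 'F') return c - 'A' + 10;
        return -1;
    };

    std::size_t p = pos;
    unsigned int value = 0;
    int count = 0;
    while (p < in.size() && (MaxDigits == -1 || count < MaxDigits)) {
        int const d = digit_value(in[p]);
        if (d < 0 || d >= Radix)
            break;  // not a digit in this radix
        value = value * Radix + static_cast<unsigned int>(d);
        ++p;
        ++count;
    }
    if (count < MinDigits)
        return std::nullopt;  // too few digits: fail without consuming input
    pos = p;
    return value;
}
```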
A directive is an element of your parser that doesn't have any meaning by
itself. Some are second-order parsers that need a first-order parser to do
the actual parsing. Others influence the parse in some way. You can often
spot a directive lexically by its use of []; directives always use [].
Non-directives might, but only when attaching a semantic action.
The directives that are second-order parsers are technically directives,
but since they are also used to create parsers, it is more useful just to
focus on that. The directives repeat() and if_() were already described in
the section on parsers; we won't say much about them here.
Sequence, alternative, and permutation parsers do not nest in most cases.
(Let's consider just sequence parsers to keep things simple, but most of
this logic applies to alternative parsers as well.) a >> b >> c is the same
as (a >> b) >> c and a >> (b >> c), and they are each represented by a
single seq_parser with three subparsers, a, b, and c. However, if something
prevents two seq_parsers from interacting directly, they will nest. For
instance, lexeme[a >> b] >> c is a seq_parser containing two parsers,
lexeme[a >> b] and c. This is because lexeme[] takes its given parser and
wraps it in a lexeme_parser. This in turn turns off the sequence parser
combining logic, since neither side of the second operator>> in
lexeme[a >> b] >> c is a seq_parser. Sequence parsers have several rules
that govern what the overall attribute type of the parser is, based on the
positions and attributes of its subparsers (see Attribute Generation).
Therefore, it's important to know which directives create a new parser (and
what kind), and which ones do not; this is indicated for each directive
below.
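The flattening rule can be sketched with a toy expression template. In the C++17 code below, everything lives in a hypothetical namespace toy: operator>> splices the subparsers of any seq operand into one flat seq, while a wrapper such as toy::lexeme hides its inner seq, so make_lexeme(a >> b) >> c ends up with two subparsers. This models the rule described above; it is not Boost.Parser's seq_parser.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

namespace toy {

struct atom { char c; };                          // a leaf parser, like a literal
template<typename P> struct lexeme { P inner; };  // a wrapper, like lexeme_parser
template<typename... Ps> struct seq { std::tuple<Ps...> subparsers; };

template<typename T> struct is_seq : std::false_type {};
template<typename... Ps> struct is_seq<seq<Ps...>> : std::true_type {};

// A seq contributes all of its subparsers; anything else (including a
// wrapper around a seq) contributes just itself.
template<typename P>
auto as_tuple(P const & p)
{
    if constexpr (is_seq<P>::value)
        return p.subparsers;
    else
        return std::tuple<P>(p);
}

// Every >> yields one flat seq built from both operands' contributions.
template<typename L, typename R>
auto operator>>(L const & l, R const & r)
{
    return std::apply(
        [](auto const &... ps) {
            return seq<std::decay_t<decltype(ps)>...>{std::make_tuple(ps...)};
        },
        std::tuple_cat(as_tuple(l), as_tuple(r)));
}

template<typename P>
lexeme<P> make_lexeme(P const & p) { return lexeme<P>{p}; }

} // namespace toy
```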
See The Parsers And Their Uses. Creates a repeat_parser.
See The Parsers And Their Uses. Creates a seq_parser.
omit[p] disables attribute generation for the parser p. Not only does
omit[p] have no attribute, but any attribute generation work that normally
happens within p is skipped.
This directive can be useful in cases like this: say you have some fairly
complicated parser p that generates a large and expensive-to-construct
attribute. Now say that you want to write a function that just counts how
many times p can match a string (where the matches are non-overlapping).
Instead of using p directly and building all those attributes, or rewriting
p without the attribute generation, use omit[].
Creates an omit_parser.
raw[p] changes the attribute from ATTR(p) to a view that delimits the
subrange of the input that was matched by p. The type of the view is
subrange<I>, where I is the type of the iterator used within the parse.
Note that this may not be the same as the iterator type passed to parse().
For instance, when parsing UTF-8, the iterator passed to parse() may be
char8_t const *, but within the parse it will be a UTF-8 to UTF-32
transcoding (converting) iterator. Just like omit[], raw[] causes all
attribute-generation work within p to be skipped.
Similar to the re-use scenario for omit[] above, raw[] could be used to find
the locations of all non-overlapping matches of p in a string.
Creates a raw_parser.
string_view[p] is very similar to raw[p], except that it changes the
attribute of p to std::basic_string_view<C>, where C is the character type
of the underlying range being parsed. string_view[] requires that the
underlying range being parsed is contiguous. Since this can only be
detected in C++20 and later, string_view[] is not available in C++17 mode.
Similar to the re-use scenario for omit[] above, string_view[] could be
used to find the locations of all non-overlapping matches of p in a string.
Whether raw[] or string_view[] is more natural to use to report the
locations depends on your use case, but they are essentially the same.
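The match-locating use case can be modeled without Boost.Parser. In this C++17 sketch, the hypothetical match_digits() plays the role of a parser that only reports how much input it matched (as with omit[], it builds no attribute), and the hypothetical find_matches() collects each non-overlapping match as a std::string_view into the original input, similar in spirit to the string_view[] attribute. The number of matches, the omit[] counting use case, is simply the size of the result.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical match-only "parser": matches one or more decimal digits and
// returns how many characters it consumed (0 means no match).
std::size_t match_digits(std::string_view in)
{
    std::size_t n = 0;
    while (n < in.size() && in[n] >= '0' && in[n] <= '9')
        ++n;
    return n;
}

// Reports every non-overlapping match as a view into the original input.
std::vector<std::string_view> find_matches(std::string_view in)
{
    std::vector<std::string_view> matches;
    std::size_t pos = 0;
    while (pos < in.size()) {
        std::size_t const len = match_digits(in.substr(pos));
        if (len == 0) {
            ++pos;  // no match here; try the next position
        } else {
            matches.push_back(in.substr(pos, len));
            pos += len;  // resume after the match, so matches never overlap
        }
    }
    return matches;
}
```

Because each view points into the caller's input, its offset can be recovered with pointer arithmetic on data(), which is what reporting locations requires.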
Creates a string_view_parser.
no_case[p] enables case-insensitive parsing within the parse of p. This
applies to the text parsed by char_(), string(), and bool_ parsers. The
number parsers are already case-insensitive. The case-insensitivity is
achieved by doing Unicode case folding on the text being parsed and the
values in the parser being matched (see the note below if you want to know
more about Unicode case folding). In the non-Unicode code path, full
Unicode case folding is not done; instead, only the transformations of
values less than 0x100 are done. Examples:
#include <boost/parser/parser.hpp>
#include <boost/parser/transcode_view.hpp> // For as_utfN.

#include <cassert>

namespace bp = boost::parser;

auto const street_parser = bp::string(u8"Tobias Straße");
assert(!bp::parse("Tobias Strasse" | bp::as_utf32, street_parser));             // No match.
assert(bp::parse("Tobias Strasse" | bp::as_utf32, bp::no_case[street_parser])); // Match!

auto const alpha_parser = bp::no_case[bp::char_('a', 'z')];
assert(bp::parse("a" | bp::as_utf32, bp::no_case[alpha_parser])); // Match!
assert(bp::parse("B" | bp::as_utf32, bp::no_case[alpha_parser])); // Match!
Everything pretty much does what you'd naively expect inside no_case[],
except that the two-character range version of char_ has a limitation. It
only compares a code point from the input to its two arguments (e.g. 'a'
and 'z' in the example above). It does not do anything special for
multi-code point case folding expansions. For instance, char_(U'ß', U'ß')
matches the input U"s", which makes sense, since U'ß' expands to U"ss".
However, that same parser does not match the input U"ß"! In short, stick to
pairs of code points that have single-code point case folding expansions.
If you need to support the multi-expanding code points, use the other
overload, like: char_(U"abcd/*...*/ß").
Note: Unicode case folding is an operation that makes text uniformly one case,
and if you do it to two bits of text
Creates a no_case_parser.
lexeme[p] disables use of the skipper, if a skipper is being used, within
the parse of p. This is useful, for instance, if you want to enable skipping
in most parts of your parser, but disable it only in one section where it
doesn't belong. If you are skipping whitespace in most of your parser, but
want to parse strings that may contain spaces, you should use lexeme[]:
namespace bp = boost::parser; auto const string_parser = bp::lexeme['"' >> *(bp::char_ - '"') >> '"'];
Without lexeme[], our string parser would correctly match "foo bar", but
the generated attribute would be "foobar".
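The effect of a skipper, and of turning it off with lexeme[], can be modeled in plain C++17 by skipping characters before each element of the quoted-string sequence is matched. skipper, is_ws(), and quoted_string() are hypothetical names, and the assumption that skipping happens before every element is a simplification of Boost.Parser's behavior, not a description of its internals.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// A skipper is modeled as a predicate; nullptr means "no skipping".
using skipper = bool (*)(char);

bool is_ws(char c) { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; }

// Models '"' >> *(char_ - '"') >> '"' with an optional skipper applied
// before each element is matched.
std::optional<std::string> quoted_string(std::string_view in, skipper skip)
{
    std::size_t pos = 0;
    auto skip_ahead = [&] {
        if (skip)
            while (pos < in.size() && skip(in[pos]))
                ++pos;
    };

    skip_ahead();
    if (pos == in.size() || in[pos] != '"')
        return std::nullopt;
    ++pos;

    std::string attr;
    for (;;) {
        skip_ahead();
        if (pos == in.size())
            return std::nullopt;  // no closing quote
        if (in[pos] == '"')
            return attr;          // closing quote ends the string
        attr += in[pos++];
    }
}
```

Passing nullptr corresponds to the lexeme[] case, and passing a different predicate corresponds to changing the skipper with skip(...)[].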
Creates a lexeme_parser.
skip[] is like the inverse of lexeme[]. It enables skipping in the parse,
even if it was not enabled before. For example, within a call to parse()
that uses a skipper, let's say we have these parsers in use:
namespace bp = boost::parser;
auto const one_or_more = +bp::char_;
auto const skip_or_skip_not_there_is_no_try =
    bp::lexeme[bp::skip[one_or_more] >> one_or_more];
The use of lexeme[] disables skipping, but then the use of skip[] turns it back on. The net result is that the first occurrence of one_or_more will use the skipper passed to parse(); the second will not.
skip[] has another use. You can parameterize skip with a different parser to change the skipper just within the scope of the directive. Let's say we passed ws to parse(), and we're using these parsers somewhere within that parse() call:
namespace bp = boost::parser;
auto const zero_or_more = *bp::char_;
auto const skip_both_ways = zero_or_more >> bp::skip(bp::blank)[zero_or_more];
The first occurrence of zero_or_more will use the skipper passed to parse(), which is ws; the second will use blank as its skipper.
Creates a skip_parser.
transform(f)[]
These directives influence the generation of attributes. See the Attribute Generation section for more details on them.
merge[] and separate[] create a copy of the given seq_parser.
transform(f)[] creates a transform_parser.
Certain overloaded operators are defined for all parsers in Boost.Parser. We've already seen some of them used in this tutorial, especially operator>>, operator|, and operator||, which are used to form sequence parsers, alternative parsers, and permutation parsers, respectively.
Here are all the operators overloaded for parsers. In the tables below:

- c is a character of type char or char32_t;
- a is a semantic action;
- r is an object whose type models parsable_range (see Concepts); and
- p, p1, p2, ... are parsers.

Note: Some of the expressions in this table consume no input. All parsers consume the input they match unless otherwise stated in the table below.
[Table 26.7. Combining Operations and Their Semantics. Columns: Expression, Semantics, Attribute Type, Notes. Table contents not recoverable.]
Important: All the character parsers, like
There are a couple of special rules not captured in the table above:
First, the zero-or-more and one-or-more repetitions (operator*() and operator+(), respectively) may collapse when combined. For any parser p, +(+p) collapses to +p; **p, *+p, and +*p each collapse to just *p.
Second, using eps in an alternative parser as any alternative except the last one is a common source of errors; Boost.Parser disallows it. This is because, for any parser p, eps | p is equivalent to eps, since eps always matches. This is not true for eps parameterized with a condition. For any condition cond, eps(cond) is allowed to appear anywhere within an alternative parser.
Note: When looking at Boost.Parser parsers in a debugger, or when looking at their reference documentation, you may see reference to the template
So far, we've seen several different types of attributes that come from different parsers: int for int_, boost::parser::tuple<char, int> for boost::parser::char_ >> boost::parser::int_, etc. Let's get into how this works with more rigor.
Note: Some parsers have no attribute at all. In the tables below, the type of the attribute is listed as "None." There is a non-

Warning: Boost.Parser assumes that all attributes are semi-regular (see
You can use attribute (and the associated alias, attribute_t) to determine the attribute a parser would have if it were passed to parse(). Since at least one parser (char_) has a polymorphic attribute type, attribute also takes the type of the range being parsed. If a parser produces no attribute, attribute will produce none, not void.
If you want to feed an iterator/sentinel pair to attribute, create a range from it like so:
constexpr auto parser = /* ... */;
auto first = /* ... */;
auto const last = /* ... */;

namespace bp = boost::parser;
// You can of course use std::ranges::subrange directly in C++20 and later.
using attr_type = bp::attribute_t<decltype(BOOST_PARSER_SUBRANGE(first, last)), decltype(parser)>;
No parser has a single attribute type that holds in every context, since any parser can be placed within omit[], which makes its attribute type none. Therefore, attribute cannot tell you what attribute your parser will produce under all circumstances; it only tells you what it would produce if it were passed to parse().
This table summarizes the attributes generated for all Boost.Parser parsers. In the table below:

- RESOLVE() is a notional macro that expands to the resolution of a parse argument or the evaluation of a parse predicate (see The Parsers And Their Uses); and
- x and y represent arbitrary objects.

[Table 26.8. Parsers and Their Attributes. Columns: Parser, Attribute Type, Notes. Table contents not recoverable.]
char_ is a bit odd, since its attribute type is polymorphic. When you use char_ to parse text in the non-Unicode code path (i.e. a string of char), the attribute is char. When you use the exact same char_ to parse in the Unicode-aware code path, all matching is code point based, and so the attribute type is the type used to represent code points, char32_t. All parsing of UTF-8 falls under this case.
Here, we're parsing plain chars, so the parsing is in the non-Unicode code path, and the attribute of char_ is char:
auto result = parse("some text", boost::parser::char_);
static_assert(std::is_same_v<decltype(result), std::optional<char>>);
When you parse UTF-8, the matching is done on a code point basis, so the attribute type is char32_t:
auto result = parse("some text" | boost::parser::as_utf8, boost::parser::char_);
static_assert(std::is_same_v<decltype(result), std::optional<char32_t>>);
The good news is that usually you don't parse characters individually. When you parse with char_, you usually parse repetitions of them, which will produce a std::string, regardless of whether you're in Unicode parsing mode or not. If you do need to parse individual characters, and want to lock down their attribute type, you can use cp and/or cu to enforce a non-polymorphic attribute type.
Combining operations of course affect the generation of attributes. In the tables below:

- m and n are parse arguments that resolve to integral values;
- pred is a parse predicate;
- arg0, arg1, arg2, ... are parse arguments;
- a is a semantic action; and
- p, p1, p2, ... are parsers that generate attributes.

[Table 26.9. Combining Operations and Their Attributes. Columns: Parser, Attribute Type. Table contents not recoverable.]
Important: All the character parsers, like

Important: In case you did not notice it above, adding a semantic action to a parser erases the parser's attribute. The attribute is still available inside the semantic action as
There is a relatively small set of rules that define how the attributes of sequence parsers and alternative parsers are generated. (Don't worry, there are examples below.)
The attribute generation behavior of sequence parsers is conceptually pretty simple:

boost::parser::tuple<T> (even if T is a type that means "no attribute"), the attribute becomes T.
More formally, the attribute generation algorithm works like this. For a sequence parser p, let the list of attribute types for the subparsers of p be a0, a1, a2, ..., an.
We get the attribute of p by evaluating a compile-time left fold operation, left-fold({a1, a2, ..., an}, tuple<a0>, OP). OP is the combining operation that takes the current attribute type (initially boost::parser::tuple<a0>) and the next attribute type, and returns the new current attribute type. The current attribute type at the end of the fold operation is the attribute type for p.
OP attempts to apply a series of rules, one at a time. The rules are noted as X >> Y -> Z, where X is the type of the current attribute, Y is the type of the next attribute, and Z is the new current attribute type. In these rules, C<T> is a container of T; none is a special type that indicates that there is no attribute; T is a type; CHAR is a character type, either char or char32_t; and Ts... is a parameter pack of one or more types. Note that T may be the special type none. The current attribute is always a tuple (call it Tup), so the "current attribute X" refers to the last element of Tup, not Tup itself, except for those rules that explicitly mention boost::parser::tuple<> as part of X's type.
- none >> T -> T
- CHAR >> CHAR -> std::string
- T >> none -> T
- C<T> >> T -> C<T>
- T >> C<T> -> C<T>
- C<T> >> optional<T> -> C<T>
- optional<T> >> C<T> -> C<T>
- boost::parser::tuple<none> >> T -> boost::parser::tuple<T>
- boost::parser::tuple<Ts...> >> T -> boost::parser::tuple<Ts..., T>
The rules that combine containers with (possibly optional) adjacent values (e.g. C<T> >> optional<T> -> C<T>) have a special case for strings. If C<T> is exactly std::string, and T is either char or char32_t, the combination yields a std::string.
Again, if the final result is that the attribute is boost::parser::tuple<T>, the attribute becomes T.
Note: What constitutes a container in the rules above is determined by the container concept:

template<typename T>
concept container = std::ranges::common_range<T> && requires(T t) {
    { t.insert(t.begin(), *t.begin()) } -> std::same_as<std::ranges::iterator_t<T>>;
};
The rules for alternative parsers are much simpler. For an alternative parser p, let the list of attribute types for the subparsers of p be a0, a1, a2, ..., an. The attribute of p is std::variant<a0, a1, a2, ..., an>, with the following steps applied:
- All none attributes are left out, and if any are, the attribute is wrapped in a std::optional, like std::optional<std::variant</*...*/>>;
- Duplicates among the std::variant template parameters <T1, T2, ... Tn> are removed; every type that appears does so exactly once;
- If the attribute is std::variant<T> or std::optional<std::variant<T>>, the attribute becomes instead T or std::optional<T>, respectively; and
- If the attribute is std::variant<> or std::optional<std::variant<>>, the result becomes none instead.
The rule for forming containers from non-containers is simple. You get a vector from any of the repeating parsers, like +p, *p, repeat(3)[p], etc. The value type of the vector is ATTR(p).
Another rule for sequence containers is that a value x and a container c containing elements of x's type will form a single container. However, x's type must be exactly the same as the elements in c. There is an exception to this in the special case for strings and characters noted above. For instance, consider the attribute of char_ >> string("str"). In the non-Unicode code path, char_'s attribute type is guaranteed to be char, so ATTR(char_ >> string("str")) is std::string. If you are parsing UTF-8 in the Unicode code path, char_'s attribute type is char32_t, and the special rule makes it also produce a std::string. Otherwise, ATTR(char_ >> string("str")) would be boost::parser::tuple<char32_t, std::string>.
Again, there are no other special rules for combining values and containers. Every combination results from an exact match, or falls into the string+character special case.
std::string assignment
std::string can be assigned from a char. This is dumb. But, we're stuck with it. When you write a parser with a char attribute, and you try to parse it into a std::string, you've almost certainly made a mistake. More importantly, if you write this:
namespace bp = boost::parser;
std::string result;
auto b = bp::parse("3", bp::int_, bp::ws, result);
... you are even more likely to have made a mistake. Though this should work, because the assignment in std::string s; s = 3; is well-formed, Boost.Parser forbids it. If you write parsing code like the snippet above, you will get a static assertion. If you really do want to assign a float or whatever to a std::string, do it in a semantic action.
In the table: a is a semantic action; and p, p1, p2, ... are parsers that generate attributes. Note that only >> is used here; > has the exact same attribute generation rules.
[Table 26.10. Sequence and Alternative Combining Operations and Their Attributes. Columns: Expression, Attribute Type. Table contents not recoverable.]
As we saw in the previous Parsing into structs and classes section, if you parse two strings in a row, you get two separate strings in the resulting attribute. The parser from that example was this:
namespace bp = boost::parser;
auto employee_parser = bp::lit("employee")
    >> '{'
    >> bp::int_ >> ','
    >> quoted_string >> ','
    >> quoted_string >> ','
    >> bp::double_
    >> '}';
employee_parser's attribute is boost::parser::tuple<int, std::string, std::string, double>. The two quoted_string parsers produce std::string attributes, and those attributes are not combined. That is the default behavior, and it is just what we want for this case; we don't want the first and last name fields to be jammed together such that we can't tell where one name ends and the other begins. What if we were parsing some string that consisted of a prefix and a suffix, and the prefix and suffix were defined separately for reuse elsewhere?
namespace bp = boost::parser;
auto prefix = /* ... */;
auto suffix = /* ... */;
auto special_string = prefix >> suffix;
// Continue to use prefix and suffix to make other parsers....
In this case, we might want to use these separate parsers, but want special_string to produce a single std::string for its attribute. merge[] exists for this purpose.
namespace bp = boost::parser;
auto prefix = /* ... */;
auto suffix = /* ... */;
auto special_string = bp::merge[prefix >> suffix];
merge[] only applies to sequence parsers (like p1 >> p2), and forces all subparsers in the sequence parser to use the same variable for their attribute.
Another directive, separate[], also applies only to sequence parsers, but does the opposite of merge[]. It forces all the attributes produced by the subparsers of the sequence parser to stay separate, even if they would have combined. For instance, consider this parser:
namespace bp = boost::parser;
auto string_and_char = +bp::char_('a') >> ' ' >> bp::cp;
string_and_char matches one or more 'a's, followed by a space and then one more code point. As written above, string_and_char produces a std::string, and the final code point is appended to the string, after all the 'a's. However, if you wanted to store the final character as a separate value, you would use separate[].
namespace bp = boost::parser;
auto string_and_char = bp::separate[+bp::char_('a') >> ' ' >> bp::cp];
With this change, string_and_char produces the attribute boost::parser::tuple<std::string, char32_t>.
As mentioned previously, merge[] applies only to sequence parsers. All subparsers must have the same attribute, or produce no attribute at all. At least one subparser must produce an attribute. When you use merge[], you create a combining group. Every parser in a combining group uses the same variable for its attribute. No parser in a combining group interacts with the attributes of any parsers outside of its combining group. Combining groups are disjoint; merge[/*...*/] >> merge[/*...*/] will produce a tuple of two attributes, not one.
separate[] also applies only to sequence parsers. When you use separate[], you disable interaction of all the subparsers' attributes with adjacent attributes, whether they are inside or outside the separate[] directive; you force each subparser to have a separate attribute.
The rules for merge[] and separate[] overrule the steps of the algorithm described above for combining the attributes of a sequence parser. Consider an example.
namespace bp = boost::parser;
constexpr auto parser =
    bp::char_ >>
    bp::merge[(bp::string("abc") >> bp::char_ >> bp::char_) >> bp::string("ghi")];
You might think that ATTR(parser) would be bp::tuple<char, std::string>. It is not. The parser above does not even compile. Since we created a merge group above, we disabled the default behavior in which the char_ parsers would have collapsed into the string parser that preceded them. Since they are all treated as separate entities, and since they have different attribute types, the use of merge[] is an error.
Many directives create a new parser out of the parser they are given. merge[] and separate[] do not. Since they operate only on sequence parsers, all they do is create a copy of the sequence parser they are given. The seq_parser template has a template parameter CombiningGroups, and all merge[] and separate[] do is take a given seq_parser and create a copy of it with a different CombiningGroups template parameter. This means that merge[] and separate[] can be ignored in operator>> expressions much like parentheses are. Consider an example.
namespace bp = boost::parser;
constexpr auto parser1 = bp::separate[bp::int_ >> bp::int_] >> bp::int_;
constexpr auto parser2 = bp::lexeme[bp::int_ >> ' ' >> bp::int_] >> bp::int_;
Note that separate[] is a no-op here; it's only being used this way for this example. These parsers have different attribute types. ATTR(parser1) is boost::parser::tuple<int, int, int>. ATTR(parser2) is boost::parser::tuple<boost::parser::tuple<int, int>, int>. This is because bp::lexeme[] wraps its given parser in a new parser. separate[] (like merge[]) does not. That's why, even though parser1 and parser2 look so structurally similar, they have different attributes.
transform(f)[]
transform(f)[] is a directive that transforms the attribute of a parser using the given function f. For example:
auto str_sum = [&](std::string const & s) {
    int retval = 0;
    for (auto ch : s) {
        retval += ch - '0';
    }
    return retval;
};

namespace bp = boost::parser;
constexpr auto parser = +bp::char_;
std::string str = "012345";

auto result = bp::parse(str, bp::transform(str_sum)[parser]);
assert(result);
assert(*result == 15);
static_assert(std::is_same_v<decltype(result), std::optional<int>>);
Here, we have a function str_sum that we use for f. It assumes each character in the given std::string s is a digit, and returns the sum of all the digits in s. Our parser parser would normally return a std::string. However, since str_sum returns a different type, int, that is the attribute type of the full parser, bp::transform(str_sum)[parser], as you can see from the static_assert.
As is the case with attributes all throughout Boost.Parser, the attribute passed to f will be moved. You can take it by const &, &&, or by value.
No distinction is made between parsers with and without an attribute, because there is a Regular special no-attribute type that is generated by parsers with no attribute. You may therefore write something like transform(f)[eps], and Boost.Parser will happily call f with this special no-attribute type.
omit[p] disables attribute generation for the parser p. raw[p] changes the attribute from ATTR(p) to a view that indicates the subrange of the input that was matched by p. string_view[p] is just like raw[p], except that it produces std::basic_string_views. See Directives for details.
There are multiple top-level parse functions. They have some things in common:

- They each return a value contextually convertible to bool.
- They can each parse ranges of char, wchar_t, char8_t, char16_t, or char32_t.
- The overloads with prefix_ in their name take an iterator/sentinel pair. For example prefix_parse(first, last, p, ws) parses the range [first, last), advancing first as it goes. If the parse succeeds, the entire input may or may not have been matched. The value of first will indicate the last location within the input that p matched. The whole input was matched if and only if first == last after the call to prefix_parse().
- When you call any of the range overloads of parse(), for example parse(r, p, ws), parse() only indicates success if all of r was matched by p.
There are eight overloads of parse() and prefix_parse() combined, because there are three either/or options in how you call them.
You can call prefix_parse() with an iterator and sentinel that delimit a range of character values. For example:
namespace bp = boost::parser;
auto const p = /* some parser ... */;

char const * str_1 = /* ... */;
// Using null_sentinel, str_1 can point to three billion characters, and
// we can call prefix_parse() without having to find the end of the string first.
auto result_1 = bp::prefix_parse(str_1, bp::null_sentinel, p, bp::ws);

char str_2[] = /* ... */;
// prefix_parse() advances its first argument, so pass an lvalue iterator.
auto first_2 = std::begin(str_2);
auto result_2 = bp::prefix_parse(first_2, std::end(str_2), p, bp::ws);
The iterator/sentinel overloads can parse successfully without matching the entire input. You can tell if the entire input was matched by checking if first == last is true after prefix_parse() returns.
By contrast, you call parse() with a range of character values. When the range is a reference to an array of characters, any terminating 0 is ignored; this allows calls like parse("str", p) to work naturally.
namespace bp = boost::parser;
auto const p = /* some parser ... */;

std::u8string str_1 = u8"str";
auto result_1 = bp::parse(str_1, p, bp::ws);

// The null terminator is ignored. This call parses s-t-r, not s-t-r-0.
auto result_2 = bp::parse(U"str", p, bp::ws);

char const * str_3 = "str";
auto result_3 = bp::parse(bp::null_term(str_3) | bp::as_utf16, p, bp::ws);
Since there is no way to indicate that p matches the input, but only a prefix of the input was matched, the range (non-iterator/sentinel) overloads of parse() indicate failure if the entire input is not matched.
namespace bp = boost::parser;
auto const p = '"' >> *(bp::char_ - '"') >> '"';
char const * str = "\"two words\"" ;

std::string result_1;
bool const success = bp::parse(str, p, result_1);   // success is true; result_1 is "two words"
auto result_2 = bp::parse(str, p);                  // !!result_2 is true; *result_2 is "two words"
When you call parse() with an attribute out-parameter and parser p, the expected type is something like ATTR(p). It doesn't have to be exactly that; I'll explain in a bit. The return type is bool.
When you call parse() without an attribute out-parameter and parser p, the return type is std::optional<ATTR(p)>. Note that when ATTR(p) is itself an optional, the return type is std::optional<std::optional<...>>. Each of those optionals tells you something different. The outer one tells you whether the parse succeeded. If so, the parser was successful, but it still generates an attribute that is an optional; that's the inner one.
namespace bp = boost::parser;
auto const p = '"' >> *(bp::char_ - '"') >> '"';
char const * str = "\"two words\"" ;

auto result_1 = bp::parse(str, p);          // !!result_1 is true; *result_1 is "two words"
auto result_2 = bp::parse(str, p, bp::ws);  // !!result_2 is true; *result_2 is "twowords"
For any call to parse() that takes an attribute out-parameter, like parse("str", p, bp::ws, out), the call is well-formed for a number of possible types of out; decltype(out) does not need to be exactly ATTR(p).
For instance, this is well-formed code that does not abort (remember that the attribute type of string() is std::string):
namespace bp = boost::parser;
auto const p = bp::string("foo");

std::vector<char> result;
bool const success = bp::parse("foo", p, result);
assert(success && result == std::vector<char>({'f', 'o', 'o'}));
Even though p generates a std::string attribute, when it actually takes the data it generates and writes it into an attribute, it only assumes that the attribute is a container (see Concepts), not that it is some particular container type. It will happily insert() into a std::string or a std::vector<char> all the same. std::string and std::vector<char> are both containers of char, but it will also insert into a container with a different element type. p just needs to be able to insert the elements it produces into the attribute-container. As long as an implicit conversion allows that to work, everything is fine:
namespace bp = boost::parser;
auto const p = bp::string("foo");

std::deque<int> result;
bool const success = bp::parse("foo", p, result);
assert(success && result == std::deque<int>({'f', 'o', 'o'}));
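The container behavior described above can be modeled in a few lines of standard C++. This is a sketch of the idea only, assuming the parser simply inserts each produced element at the end of the out-parameter; insert_all() is a hypothetical name, not Boost.Parser's implementation.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical model: treat the out-parameter as "some container" and insert
// each produced element at its end. Any container works, as long as each
// element implicitly converts to the container's value_type.
template <typename Container, typename Produced>
void insert_all(Container & out, Produced const & produced)
{
    for (auto const & element : produced)
        out.insert(out.end(), element); // char -> int conversion happens here for std::deque<int>
}
```

The same three chars land in a std::string, a std::vector<char>, or a std::deque<int> without any change to insert_all() itself.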
This works, too, even though it requires inserting elements from a generated sequence of char32_t into a container of char (remember that the attribute type of +cp is std::vector<char32_t>):
namespace bp = boost::parser;
auto const p = +bp::cp;

std::string result;
bool const success = bp::parse("foo", p, result);
assert(success && result == "foo");
This next example works as well, even though the change to a container is
not at the top level. It is an element of the result tuple:
namespace bp = boost::parser;
auto const p = +(bp::cp - ' ') >> ' ' >> bp::string("foo");

using attr_type = decltype(bp::parse(u8"", p));
static_assert(std::is_same_v<
              attr_type,
              std::optional<bp::tuple<std::string, std::string>>>);

using namespace bp::literals;

{
    // This is similar to attr_type, with the first std::string changed to a std::vector<int>.
    bp::tuple<std::vector<int>, std::string> result;
    bool const success = bp::parse(u8"rôle foo" | bp::as_utf8, p, result);
    assert(success);
    assert(bp::get(result, 0_c) == std::vector<int>({'r', U'ô', 'l', 'e'}));
    assert(bp::get(result, 1_c) == "foo");
}
{
    // This time, we have a std::vector<char> instead of a std::vector<int>.
    bp::tuple<std::vector<char>, std::string> result;
    bool const success = bp::parse(u8"rôle foo" | bp::as_utf8, p, result);
    assert(success);
    // The 4 code points "rôle" get transcoded to 5 UTF-8 code units to fit in the std::vector<char>.
    assert(bp::get(result, 0_c) == std::vector<char>({'r', (char)0xc3, (char)0xb4, 'l', 'e'}));
    assert(bp::get(result, 1_c) == "foo");
}
As indicated in the inline comments, there are a couple of things to take
away from this example:
- If you change an attribute out-param (such as std::string to std::vector<int>, or std::vector<char32_t> to std::deque<int>), the call to parse() will often still be well-formed.
- When changing out a container type, if the original container's element type is char32_t (or wchar_t for non-MSVC builds), and the new container's element type is char or char8_t, Boost.Parser assumes that this is a UTF-32-to-UTF-8 conversion, and silently transcodes the data when inserting into the new container.
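For reference, here is a hand-written sketch of the kind of UTF-32-to-UTF-8 transcoding the second point describes. Boost.Parser does this for you; append_utf8() and to_utf8() are hypothetical helpers, and they skip validation (surrogates and out-of-range values are not checked).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: appends the UTF-8 encoding of one code point.
// No validation is performed.
void append_utf8(std::string & out, char32_t cp)
{
    if (cp < 0x80) {                       // 1 code unit
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {               // 2 code units
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {             // 3 code units
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {                               // 4 code units
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Transcodes a whole UTF-32 sequence, as when a std::vector<char32_t>
// attribute is written into a char container.
std::string to_utf8(std::vector<char32_t> const & code_points)
{
    std::string result;
    for (char32_t cp : code_points)
        append_utf8(result, cp);
    return result;
}
```

Applied to the code points of "rôle", this yields the same 5 code units (0xC3 0xB4 for ô) that the earlier std::vector<char> example expects.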
Let's look at a case where another simple-seeming type replacement does
not work. First, the case that works:
namespace bp = boost::parser;
auto parser = -(bp::char_ % ',');
std::vector<int> result;
auto b = bp::parse("a, b", parser, bp::ws, result);
ATTR(parser) is std::optional<std::string>. Even though we pass a std::vector<int>, everything is fine. However, if we modify this case only slightly, so that the std::optional<std::string> is nested within the attribute, the code becomes ill-formed.
struct S
{
    std::vector<int> chars;
    int i;
};
namespace bp = boost::parser;
auto parser = -(bp::char_ % ',') >> bp::int_;
S result;
auto b = bp::parse("a, b 42", parser, bp::ws, result);
If we change chars to a std::vector<char>, the code is still ill-formed. Same if we change chars to a std::string. We must actually use std::optional<std::string> exactly to make the code well-formed again.
The reason the same looseness from the top-level parser does not apply to a nested parser is that, at some point in the code, the parser -(bp::char_ % ',') would try to assign a std::optional<std::string> (its element of the attribute the whole parser normally generates) to chars. If there's no implicit conversion there, the code is ill-formed.
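The failing assignment can be checked directly with standard type traits. This sketch assumes, as described above, that the nested case reduces to assigning a std::optional<std::string> to the chars member; the constant names are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <type_traits>
#include <vector>

// What -(bp::char_ % ',') produces, per the text above.
using produced_t = std::optional<std::string>;

// Would `chars = produced;` compile for each candidate type of chars?
constexpr bool into_vector_int  = std::is_assignable_v<std::vector<int> &, produced_t>;
constexpr bool into_vector_char = std::is_assignable_v<std::vector<char> &, produced_t>;
constexpr bool into_string      = std::is_assignable_v<std::string &, produced_t>;
constexpr bool into_exact_type  = std::is_assignable_v<std::optional<std::string> &, produced_t>;
```

Only the exact type is assignable, which matches the observation that nothing short of std::optional<std::string> makes the nested example well-formed.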
The take-away for this last example is that the ability to arbitrarily swap
out data types within the type of the attribute you pass to parse()
is very flexible, but is
also limited to structurally simple cases. When we discuss rules
in the next section,
we'll see how this flexibility in the types of attributes can help when writing
complicated parsers.
Those were examples of swapping out one container type for another. They make good examples because that is more likely to be surprising, and so it's getting lots of coverage here. You can also do much simpler things, like parsing with a uint_ and writing its attribute into a double. In general, you can swap any type T out of the attribute, as long as the swap would not result in some ill-formed assignment within the parse.
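A minimal sketch of that rule, assuming the final step of a parse is a plain assignment into the out-parameter. write_attribute() is a hypothetical stand-in, not a Boost.Parser function; an unsigned int plays the role of uint_'s attribute.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the last step of a parse: storing the produced
// attribute into the caller's out-parameter. Any Out assignable from the
// attribute is accepted; anything else is rejected at compile time.
template <typename Out, typename Attr>
void write_attribute(Out & out, Attr const & attr)
{
    static_assert(std::is_assignable_v<Out &, Attr const &>,
                  "this swap would need an ill-formed assignment");
    out = attr;
}
```

Writing an unsigned int into a double or a long long compiles and converts as ordinary assignment would.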
Here is another example that also produces surprising results, for a different
reason.
namespace bp = boost::parser;
constexpr auto parser = bp::char_('a') >> bp::char_('b') >> bp::char_('c') |
                        bp::char_('x') >> bp::char_('y') >> bp::char_('z');

std::string str = "abc";
bp::tuple<char, char, char> chars;
bool b = bp::parse(str, parser, chars);
assert(b);
assert(chars == bp::tuple('c', '\0', '\0'));
This looks wrong, but is expected behavior. At every stage of the parse that produces an attribute, Boost.Parser tries to assign that attribute to some part of the out-param attribute provided to parse(), if there is one. Note that ATTR(parser) is std::string, because each sequence parser is three char_ parsers in a row, which forms a std::string; there are two such alternatives, so the overall attribute is also std::string.
During the parse, when the first parser bp::char_('a') matches the input, it produces the attribute 'a' and needs to assign it to its destination. Some logic inside the sequence parser indicates that this 'a' contributes to the value in the 0th position in the result tuple, if the result is being written into a tuple. Here, we passed a bp::tuple<char, char, char>, so it writes 'a' into the first element. Each subsequent char_ parser does the same thing, and writes over the first element. If we had passed a std::string as the out-param instead, the logic would have seen that the out-param attribute is a string, and would have appended 'a' to it. Then each subsequent parser would have appended to the string.
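The overwrite-versus-append behavior can be modeled without Boost.Parser. In this hypothetical sketch, std::tuple stands in for bp::tuple, and write_slot0() stands in for the sequence parser's logic that routes each char_ attribute to slot 0 of the out-parameter.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <tuple>

// For a string out-param, writing to slot 0 means appending.
inline void write_slot0(std::string & out, char c) { out.push_back(c); }

// For a tuple out-param, writing to slot 0 means assigning element 0,
// so each later char overwrites the previous one.
template <typename... Ts>
void write_slot0(std::tuple<Ts...> & out, char c) { std::get<0>(out) = c; }

// Models the three char_ parsers of the 'a' >> 'b' >> 'c' alternative.
template <typename Out>
void run_abc_sequence(Out & out)
{
    for (char c : {'a', 'b', 'c'})
        write_slot0(out, c);
}
```

With a std::string the result is "abc"; with a value-initialized three-element tuple it is ('c', '\0', '\0'), matching the surprising result above.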
Boost.Parser never looks at the arity of the tuple passed to parse()
to see if there are too
many or too few elements in it, compared to the expected attribute for the
parser. In this case, there are two extra elements that are never touched.
If there had been too few elements in the tuple, you would have seen a compilation
error. The reason that Boost.Parser never does this kind of type-checking
up front is that the loose assignment logic is spread out among the individual
parsers; the top-level parse can determine what the expected attribute is,
but not whether a passed attribute of another type is a suitable stand-in.
Compatibility of variant attribute out-parameters
The use of a variant in an out-param is compatible if the default attribute can be assigned to the variant. No other work is done to make the assignment compatible. For instance, this will work as you'd expect:
namespace bp = boost::parser;
std::variant<int, double> v;
auto b = bp::parse("42", bp::int_, v);
assert(b);
assert(v.index() == 0);
assert(std::get<0>(v) == 42);
Again, this works because v = 42 is well-formed. However, other kinds of substitutions will not work. In particular, the boost::parser::tuple to aggregate or aggregate to boost::parser::tuple transformations will not work. Here's an example.
struct key_value
{
    int key;
    double value;
};

namespace bp = boost::parser;
std::variant<key_value, double> kv_or_d;
key_value kv;
bp::parse("42 13.0", bp::int_ >> bp::double_, bp::ws, kv);      // Ok.
bp::parse("42 13.0", bp::int_ >> bp::double_, bp::ws, kv_or_d); // Error: ill-formed!
In this case, it would be easy for Boost.Parser to look at the alternative
types covered by the variant, and do a conversion. However, there are many
cases in which there is no obviously correct variant alternative type, or
in which the user might expect one variant alternative type and get another.
Consider a couple of cases.
struct i_d { int i; double d; };
struct d_i { double d; int i; };
using v1 = std::variant<i_d, d_i>;

struct i_s { int i; short s; };
struct d_d { double d1; double d2; };
using v2 = std::variant<i_s, d_d>;

using tup_t = boost::parser::tuple<short, short>;
If we have a parser that produces a tup_t, and we have a v1 attribute out-param, the correct variant alternative type clearly does not exist: this case is ambiguous, and anyone can see that neither variant alternative is a better match. If we were assigning a tup_t to v2, it's even worse. The same ambiguity exists, but to the user, i_s is clearly "closer" than d_d.
So, Boost.Parser only does assignment. If some parser P
generates a default attribute that is not assignable to a variant alternative
that you want to assign it to, you can just create a rule
that creates either an
exact variant alternative type, or the variant itself, and use P
as your rule's parser.
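Since only assignment is attempted, whether a variant out-parameter works can be predicted with std::is_assignable_v. This standard-library sketch uses std::tuple as a stand-in for the tuple-like attribute of bp::int_ >> bp::double_; it models only the assignment rule stated above, not Boost.Parser internals.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <variant>

struct key_value
{
    int key;
    double value;
};

// v = 42 is well-formed, so an int attribute works with this variant.
constexpr bool int_to_variant =
    std::is_assignable_v<std::variant<int, double> &, int>;

// No alternative of this variant is assignable from a tuple, and no
// tuple-to-aggregate conversion is attempted, so this is not assignable.
constexpr bool tuple_to_variant =
    std::is_assignable_v<std::variant<key_value, double> &, std::tuple<int, double>>;
```

A rule whose declared attribute is key_value (or the variant itself) would sidestep this, as the preceding paragraph suggests.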
A call to parse() either considers the entire input to be in a UTF format (UTF-8, UTF-16, or UTF-32), or it considers the entire input to be in some unknown encoding. Here is how it deduces which case the call falls under:
- If the input is a sequence of char8_t, or if the input is a boost::parser::utf8_view, the input is UTF-8.
- If the input is a sequence of char, the input is in an unknown encoding.
Tip: If you want to parse in ASCII-only mode, or in some other non-Unicode encoding, use only sequences of char.
Tip: If you want to ensure all input is parsed as Unicode, pass the input range ...
Note: Since passing ...
The trace_mode parameter to parse()
Debugging parsers is notoriously difficult once they reach a certain size.
To get a verbose trace of your parse, pass boost::parser::trace::on
as the final parameter to parse()
. It will show you the current
parser being matched, the next few characters to be parsed, and any attributes
generated. See the Error
Handling and Debugging section of the tutorial for details.
Each call to parse() can optionally have a globals object associated with it. To use a particular globals object with your parser, you call with_globals() to create a new parser with the globals object in it:
struct globals_t
{
    int foo;
    std::string bar;
};
auto const parser = /* ... */;
globals_t globals{42, "yay"};
auto result = boost::parser::parse("str", boost::parser::with_globals(parser, globals));
Every semantic action within that call to parse() can access the same globals_t object using _globals(ctx).
The default error handler is great for most needs, but if you want to change it, you can do so by creating a new parser with a call to with_error_handler():
auto const parser = /* ... */;
my_error_handler error_handler;
auto result = boost::parser::parse("str", boost::parser::with_error_handler(parser, error_handler));
Tip: If your parsing environment does not allow you to report errors to a terminal, you may want to use ...
Important: Globals and the error handler are ignored, if present, on any parser except the top-level parser.
In the earlier page about rules (Rule Parsers), I described rules as being analogous to functions. rules are, at base, organizational. Here are the common use cases for rules. Use a rule if you want to:
Let's look at the use cases in detail.
We saw in the previous section how parse()
is flexible in what types it will accept as attribute out-parameters. Here's
another example.
namespace bp = boost::parser;
// `result` is the attribute out-parameter; its possible types are discussed below.
bool const success = bp::parse(input, bp::int_ % ',', result);
result can be one of many different types. It could be std::vector<int>. It could be std::set<long long>. It could be a lot of things. Often, this is a very useful property; if you had to rewrite all of your parser logic because you changed the desired container in some part of your attribute from a std::vector to a std::deque, that would be annoying. However, that flexibility comes at the cost of type checking. If you want to write a parser that always produces exactly a std::vector<unsigned int> and no other type, you also probably want a compilation error if you accidentally pass that parser a std::set<unsigned int> attribute instead. There is no way with a plain parser to enforce that its attribute type may only ever be a single, fixed type.
Fortunately, rules
allow you to write a parser that has a fixed attribute type. Every rule has
a specific attribute type, provided as a template parameter. If one is not
specified, the rule has no attribute. The fact that the attribute is a specific
type allows you to remove attribute flexibility. For instance, say we have
a rule defined like this:
bp::rule<struct doubles, std::vector<double>> doubles = "doubles";
auto const doubles_def = bp::double_ % ',';
BOOST_PARSER_DEFINE_RULES(doubles);
You can then use it in a call to parse(), and parse() will return a std::optional<std::vector<double>>:
auto const result = bp::parse(input, doubles, bp::ws);
If you call parse() with an attribute out-parameter, it must be exactly std::vector<double>:
std::vector<double> vec_result;
bp::parse(input, doubles, bp::ws, vec_result);   // Ok.
std::deque<double> deque_result;
bp::parse(input, doubles, bp::ws, deque_result); // Ill-formed!
If we wanted to use a std::deque<double>
as the attribute type of our rule:
// Attribute changed to std::deque<double>.
bp::rule<struct doubles, std::deque<double>> doubles = "doubles";
auto const doubles_def = bp::double_ % ',';
BOOST_PARSER_DEFINE_RULES(doubles);

int main()
{
    std::deque<double> deque_result;
    bp::parse(input, doubles, bp::ws, deque_result); // Ok.
}
The take-away here is that the attribute flexibility is still available, but only within the rule: the parser bp::double_ % ',' can parse into a std::vector<double> or a std::deque<double>, but the rule doubles must parse into only the exact attribute it was declared to generate.
The reason for this is that, inside the rule parsing implementation, there is code something like this:
using attr_t = ATTR(doubles_def);
attr_t attr;
parse(first, last, parser, attr);
attribute_out_param = std::move(attr);
Where attribute_out_param is the attribute out-parameter we pass to parse(). If that final move assignment is ill-formed, the call to parse() is too.
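The last line of that snippet can be modeled on its own. In this sketch, finish_rule() and rule_can_write_to are hypothetical names: the rule's declared attribute is built first, then move-assigned to the caller's out-parameter, so the call compiles only when that assignment does.

```cpp
#include <cassert>
#include <deque>
#include <type_traits>
#include <utility>
#include <vector>

// True when the rule's declared attribute can be move-assigned to Out.
template <typename RuleAttr, typename Out>
constexpr bool rule_can_write_to = std::is_assignable_v<Out &, RuleAttr &&>;

// Hypothetical model of the final step: attribute_out_param = std::move(attr);
template <typename RuleAttr, typename Out>
void finish_rule(RuleAttr attr, Out & attribute_out_param)
{
    static_assert(rule_can_write_to<RuleAttr, Out>,
                  "rule attribute is not assignable to the out-parameter");
    attribute_out_param = std::move(attr);
}
```

A std::vector<double> rule attribute can be written to a std::vector<double> out-parameter, but not to a std::deque<double>, which mirrors the "Ill-formed!" line above.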
You can also use rules to exploit attribute flexibility. Even though a rule
reduces the flexibility of attributes it can generate, the fact that it is
so easy to write a new rule means that we can use rules themselves to get
the attribute flexibility we want across our code:
namespace bp = boost::parser;

// We only need to write the definition once...
auto const generic_doubles_def = bp::double_ % ',';

bp::rule<struct vec_doubles, std::vector<double>> vec_doubles = "vec_doubles";
auto const & vec_doubles_def = generic_doubles_def; // ... and re-use it,
BOOST_PARSER_DEFINE_RULES(vec_doubles);

// Attribute changed to std::deque<double>.
bp::rule<struct deque_doubles, std::deque<double>> deque_doubles = "deque_doubles";
auto const & deque_doubles_def = generic_doubles_def; // ... and re-use it again.
BOOST_PARSER_DEFINE_RULES(deque_doubles);
Now we have one of each, and we did not have to copy any parsing logic that
would have to be maintained in two places.
Sometimes, you need to create a rule to enforce a certain attribute type,
but the rule's attribute is not constructible from its parser's attribute.
When that happens, you'll need to write a semantic action.
struct type_t
{
    type_t() = default;
    explicit type_t(double x) : x_(x) {}
    // etc.

    double x_;
};

namespace bp = boost::parser;

auto doubles_to_type = [](auto & ctx) {
    using namespace bp::literals;
    _val(ctx) = type_t(_attr(ctx)[0_c] * _attr(ctx)[1_c]);
};

bp::rule<struct type_tag, type_t> type = "type";
auto const type_def = (bp::double_ >> bp::double_)[doubles_to_type];
BOOST_PARSER_DEFINE_RULES(type);
For a rule R and its parser P, we do not need to write such a semantic action if:
- ATTR(R) is an aggregate, and ATTR(P) is a compatible tuple;
- ATTR(R) is a tuple, and ATTR(P) is a compatible aggregate;
- ATTR(R) is a non-aggregate class type C, and ATTR(P) is a tuple whose elements can be used to construct C; or
- ATTR(R) and ATTR(P) are compatible types.
The notion of "compatible" is defined in The parse() API.
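Two of those conversions have close standard-library analogues. The sketch below is an illustration under assumptions (std::tuple stands in for the parser's tuple attribute; point and key_value are hypothetical types), not how Boost.Parser performs the conversion: std::make_from_tuple builds a non-aggregate from a tuple's elements, and an aggregate can be initialized member-by-member from a compatible tuple.

```cpp
#include <cassert>
#include <tuple>

// Non-aggregate: it has a user-declared constructor.
class point
{
public:
    point(int x, int y) : x_(x), y_(y) {}
    int x() const { return x_; }
    int y() const { return y_; }

private:
    int x_;
    int y_;
};

// Aggregate: public members, no constructors.
struct key_value
{
    int key;
    double value;
};

// Third bullet: construct a non-aggregate C from the tuple's elements, in order.
point to_point(std::tuple<int, int> const & t)
{
    return std::make_from_tuple<point>(t);
}

// First bullet: initialize an aggregate member-by-member from a compatible tuple.
key_value to_key_value(std::tuple<int, double> const & t)
{
    return key_value{std::get<0>(t), std::get<1>(t)};
}
```

When none of the listed cases applies, as with type_t built from the product of two doubles, a semantic action like doubles_to_type is still needed.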
Each rule
has associated diagnostic text that Boost.Parser can use for failures of
that rule. This is useful when the parse reaches a parse failure at an expectation
point (see Expectation
points). Let's say you have the following code defined somewhere.
namespace bp = boost::parser;
bp::rule<struct value_tag> value =
    "an integer, or a list of integers in braces";

auto const ints = '{' > (value % ',') > '}';
auto const value_def = bp::int_ | ints;

BOOST_PARSER_DEFINE_RULES(value);
Notice the two expectation points: one before (value % ','), and one before the final '}'. Later, you call parse on some input:
bp::parse("{ 4, 5 a", value, bp::ws);
This runs afoul of the second expectation point, and produces output like this:
1:7: error: Expected '}' here:
{ 4, 5 a
       ^
That's a pretty good error message. Here's what it looks like if we violate
the earlier expectation:
bp::parse("{ }", value, bp::ws);
1:2: error: Expected an integer, or a list of integers in braces % ',' here:
{ }
  ^
Not nearly as nice. The problem is that the expectation is on (value % ','). So, even though we gave value reasonable diagnostic text, we put the text on the wrong thing. We can introduce a new rule to put the diagnostic text in the right place.
namespace bp = boost::parser;
bp::rule<struct value_tag> value =
    "an integer, or a list of integers in braces";
bp::rule<struct comma_values_tag> comma_values =
    "a comma-delimited list of integers";

auto const ints = '{' > comma_values > '}';
auto const value_def = bp::int_ | ints;
auto const comma_values_def = (value % ',');

BOOST_PARSER_DEFINE_RULES(value, comma_values);
Now when we call bp::parse("{ }", value, bp::ws) we get a much better message:
1:2: error: Expected a comma-delimited list of integers here:
{ }
  ^
The rule value might be useful elsewhere in our code, perhaps in another parser. Its diagnostic text is appropriate for those other potential uses.
It's pretty common to see grammars that include recursive rules. Consider
this EBNF rule for balanced parentheses:
<parens> ::= "" | ( "(" <parens> ")" )
We can try to write this using Boost.Parser like this:
namespace bp = boost::parser; auto const parens = '(' >> parens >> ')' | bp::eps;
We had to put the bp::eps second, because Boost.Parser's parsing algorithm is greedy. Otherwise, it's just a straight transliteration. Unfortunately, it does not work. The code is ill-formed because you can't define a variable in terms of itself. Well you can, but nothing good comes of it. If we instead make the parser in terms of a forward-declared rule, it works.
namespace bp = boost::parser;
bp::rule<struct parens_tag> parens = "matched parentheses";
auto const parens_def = '(' >> parens > ')' | bp::eps;
BOOST_PARSER_DEFINE_RULES(parens);
Later, if we use it to parse, it does what we want.
assert(bp::parse("(((())))", parens, bp::ws));
When it fails, it even produces nice diagnostics.
bp::parse("(((()))", parens, bp::ws);
1:7: error: Expected ')' here (end of input):
(((()))
       ^
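For comparison, here is the same grammar as a hand-written recursive-descent function in standard C++. It is a sketch, not Boost.Parser code: it does not skip whitespace the way bp::ws does, it signals a failed expectation by returning false instead of reporting a diagnostic, and matches_parens() requires the whole input to be consumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// <parens> ::= "" | ( "(" <parens> ")" )
// Tries the '(' alternative first and falls back to the empty match,
// mirroring the order of '(' >> parens > ')' | bp::eps.
bool parse_parens(std::string_view in, std::size_t & pos)
{
    if (pos < in.size() && in[pos] == '(') {
        ++pos;
        if (!parse_parens(in, pos))
            return false;
        if (pos >= in.size() || in[pos] != ')')
            return false; // plays the role of the expectation point > ')'
        ++pos;
        return true;
    }
    return true; // the empty alternative always matches
}

bool matches_parens(std::string_view in)
{
    std::size_t pos = 0;
    return parse_parens(in, pos) && pos == in.size();
}
```

The recursion works here because parse_parens() is a function that refers to itself by name, which is exactly what the forward-declared rule gives the Boost.Parser version.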
Recursive rules work differently from other parsers in one way: when re-entering the rule recursively, only the attribute variable (_attr(ctx) in your semantic actions) is unique to that instance of the rule. All the other state of the uppermost instance of that rule is shared. This includes the value of the rule (_val(ctx)), and the locals and parameters to the rule. In other words, _val(ctx) returns a reference to the same object in every instance of a recursive rule. This is because each instance of the rule needs a place to put the attribute it generates from its parse. However, we only want a single return value for the uppermost rule; if each instance had a separate value in _val(ctx), then it would be impossible to build up the result of a recursive rule step by step during the evaluation of the recursive instantiations.
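A plain C++ analogy may help. In this sketch, every recursive call receives a reference to one shared result object, the way every recursive instance of a rule sees the same _val(ctx). The function count_depths() and its grammar are hypothetical; each call appends its nesting depth to the shared vector instead of keeping a private copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// `val` plays the role of _val(ctx): one object shared by all recursive calls.
void count_depths(std::string_view in, std::size_t & pos, int depth,
                  std::vector<int> & val)
{
    if (pos < in.size() && in[pos] == '(') {
        ++pos;
        val.push_back(depth);                  // contribute to the shared result
        count_depths(in, pos, depth + 1, val); // same object passed down
        if (pos < in.size() && in[pos] == ')')
            ++pos;
    }
}
```

After parsing "((()))", the single shared vector holds {0, 1, 2}: one entry from each level of the recursion, built up step by step.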
Also, consider this rule:
namespace bp = boost::parser;
bp::rule<struct ints_tag, std::vector<int>> ints = "ints";
auto const ints_def = bp::int_ >> ints | bp::eps;
What is the default attribute type for ints_def? It sure looks like std::optional<std::vector<int>>. Inside the evaluation of ints, Boost.Parser must evaluate ints_def, and then produce a std::vector<int> (the return type of ints) from it. How? How do you turn a std::optional<std::vector<int>> into a std::vector<int>? To a human, it seems obvious, but the metaprogramming that properly handles this simple example and the general case is certainly beyond me.
Boost.Parser has a specific semantic for what constitutes a recursive rule. Each rule has a tag type associated with it, and if Boost.Parser enters a rule with a certain tag Tag, and the currently-evaluating rule (if there is one) also has the tag Tag, then the rule instance being entered is considered to be a recursion. No other situations are considered recursion. In particular, if you have rules Ra and Rb, and Ra uses Rb, which in turn uses Ra, the second use of Ra is not considered recursion. Ra and Rb are of course mutually recursive, but neither is considered a "recursive rule" for purposes of getting a unique value, locals, and parameters.
One of the advantages of using rules is that you can declare all your rules
up front and then use them immediately afterward. This lets you make rules
that use each other without introducing cycles:
namespace bp = boost::parser;
using namespace bp::literals; // for the '{'_l and '['_l literals below

// Assume we have some polymorphic type that can be an object/dictionary,
// array, string, or int, called `value_type`.

bp::rule<class string, std::string> const string = "string";
bp::rule<class object_element, bp::tuple<std::string, value_type>> const object_element = "object-element";
bp::rule<class object, value_type> const object = "object";
bp::rule<class array, value_type> const array = "array";
bp::rule<class value_tag, value_type> const value = "value";

auto const string_def = bp::lexeme['"' >> *(bp::char_ - '"') > '"'];
auto const object_element_def = string > ':' > value;
auto const object_def = '{'_l >> -(object_element % ',') > '}';
auto const array_def = '['_l >> -(value % ',') > ']';
auto const value_def = bp::int_ | bp::bool_ | string | array | object;

BOOST_PARSER_DEFINE_RULES(string, object_element, object, array, value);
Here we have a parser for a JavaScript-value-like type value_type. value_type may be an array, which itself may contain other arrays, objects, strings, etc. Since we need to be able to parse objects within arrays and vice versa, we need each of those two parsers to be able to refer to each other.
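The same declare-first, define-later pattern is what makes mutual recursion work in ordinary C++ too. This sketch uses hypothetical functions and a much smaller grammar than the one above (single digits and arrays only); it forward-declares parse_value() and parse_array() so that each can call the other.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Grammar: value ::= digit | array
//          array ::= '[' (value (',' value)*)? ']'
// Declare both up front, as the rules are declared up front above.
bool parse_value(std::string_view in, std::size_t & pos);
bool parse_array(std::string_view in, std::size_t & pos);

bool parse_value(std::string_view in, std::size_t & pos)
{
    if (pos < in.size() && in[pos] >= '0' && in[pos] <= '9') {
        ++pos;
        return true;
    }
    return parse_array(in, pos); // value refers to array...
}

bool parse_array(std::string_view in, std::size_t & pos)
{
    if (pos >= in.size() || in[pos] != '[')
        return false;
    ++pos;
    if (pos < in.size() && in[pos] == ']') {
        ++pos;
        return true; // empty array
    }
    while (true) {
        if (!parse_value(in, pos)) // ...and array refers back to value
            return false;
        if (pos < in.size() && in[pos] == ',') {
            ++pos;
            continue;
        }
        break;
    }
    if (pos >= in.size() || in[pos] != ']')
        return false;
    ++pos;
    return true;
}

// Requires the whole input to be a single value.
bool is_value(std::string_view in)
{
    std::size_t pos = 0;
    return parse_value(in, pos) && pos == in.size();
}
```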
Only rules
can be callback parsers, so if you want to get attributes supplied to you
via callbacks instead of somewhere in the middle of a giant attribute that
represents the whole parse result, you need to use rules
. See Parsing
JSON With Callbacks for an extended example of callback parsing.
Inside all of a rule's semantic actions, the expression _val(ctx)
is a reference to the attribute that the rule generates. This can be useful
when you want subparsers to build up the attribute in a specific way:
namespace bp = boost::parser;
using namespace bp::literals;

bp::rule<class ints, std::vector<int>> const ints = "ints";

auto twenty_zeros = [](auto & ctx) { _val(ctx).resize(20, 0); };
auto push_back = [](auto & ctx) { _val(ctx).push_back(_attr(ctx)); };

auto const ints_def = "20-zeros"_l[twenty_zeros] | +bp::int_[push_back];

BOOST_PARSER_DEFINE_RULES(ints);
Tip: That's just an example. It's almost always better to do things without using semantic actions. We could have instead written ...
The rule template takes another template parameter we have not discussed yet. You can pass a third parameter LocalState to rule, which will be default constructed by the rule, and made available within semantic actions used in the rule as _locals(ctx). This gives your rule some local state, if it needs it. The type of LocalState can be anything regular. It could be a single value, a struct containing multiple values, or a tuple, among others.
struct foo_locals
{
    char first_value = 0;
};

namespace bp = boost::parser;

bp::rule<class foo, int, foo_locals> const foo = "foo";

auto record_first = [](auto & ctx) { _locals(ctx).first_value = _attr(ctx); };
auto check_against_first = [](auto & ctx) {
    char const first = _locals(ctx).first_value;
    char const attr = _attr(ctx);
    if (attr == first)
        _pass(ctx) = false;
    _val(ctx) = (int(first) << 8) | int(attr);
};

auto const foo_def = bp::cu[record_first] >> bp::cu[check_against_first];
BOOST_PARSER_DEFINE_RULES(foo);
foo
matches the input if
it can match two elements of the input in a row, but only if they are not
the same value. Without locals, it's a lot harder to write parsers that have
to track state as they parse.
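Here is a hand-written equivalent of foo in standard C++, as a sketch. The local variable first plays the role of _locals(ctx).first_value; parse_foo() is a hypothetical name, and unlike a rule used inside a larger parser, it requires the input to be exactly two chars.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

std::optional<int> parse_foo(std::string_view in)
{
    if (in.size() != 2)
        return std::nullopt;
    char const first = in[0];             // like record_first
    char const attr = in[1];
    if (attr == first)                    // like setting _pass(ctx) = false
        return std::nullopt;
    return (int(first) << 8) | int(attr); // the value written to _val(ctx)
}
```

Here the state lives in an ordinary local variable because the whole match happens in one function; in a rule, the match is split across two semantic actions, which is why _locals(ctx) is needed to carry first_value between them.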
Sometimes, it is convenient to parameterize parsers. Consider these parsing
rules from the YAML 1.2
spec:
[80]  s-separate(n,BLOCK-OUT) ::= s-separate-lines(n)
      s-separate(n,BLOCK-IN)  ::= s-separate-lines(n)
      s-separate(n,FLOW-OUT)  ::= s-separate-lines(n)
      s-separate(n,FLOW-IN)   ::= s-separate-lines(n)
      s-separate(n,BLOCK-KEY) ::= s-separate-in-line
      s-separate(n,FLOW-KEY)  ::= s-separate-in-line

[136] in-flow(n,FLOW-OUT)  ::= ns-s-flow-seq-entries(n,FLOW-IN)
      in-flow(n,FLOW-IN)   ::= ns-s-flow-seq-entries(n,FLOW-IN)
      in-flow(n,BLOCK-KEY) ::= ns-s-flow-seq-entries(n,FLOW-KEY)
      in-flow(n,FLOW-KEY)  ::= ns-s-flow-seq-entries(n,FLOW-KEY)

[137] c-flow-sequence(n,c) ::= "[" s-separate(n,c)? in-flow(c)? "]"
YAML [137] says that the parsing should proceed into two YAML subrules, both of which have these n and c parameters. It is certainly possible to transliterate these YAML parsing rules to something that uses unparameterized Boost.Parser rules, but it is quite painful to do so. It is better to use a parameterized rule.
You give parameters to a rule by calling its with() member. The values you pass to with() are used to create a boost::parser::tuple that is available in semantic actions attached to the rule, using _params(ctx).
Passing parameters to rules like this allows you to easily write parsers that change the way they parse depending on contextual data that they have already parsed.
Here is an implementation of YAML [137]. It also implements the two YAML rules used directly by [137], rules [136] and [80]. The rules that those rules use are also represented below, but are implemented using only eps, so that I don't have to repeat too much of the (very large) YAML spec.
```cpp
#include <boost/parser/parser.hpp>

namespace bp = boost::parser;

// A type to represent the YAML parse context.
enum class context { block_in, block_out, block_key, flow_in, flow_out, flow_key };

// A YAML value; no need to fill it in for this example.
struct value
{
    // ...
};

// YAML [66], just stubbed in here.
auto const s_separate_in_line = bp::eps;

// YAML [137].
bp::rule<struct c_flow_seq_tag, value> c_flow_sequence = "c-flow-sequence";
// YAML [80].
bp::rule<struct s_separate_tag> s_separate = "s-separate";
// YAML [136].
bp::rule<struct in_flow_tag, value> in_flow = "in-flow";
// YAML [138]; just eps below.
bp::rule<struct ns_s_flow_seq_entries_tag, value> ns_s_flow_seq_entries =
    "ns-s-flow-seq-entries";
// YAML [81]; just eps below.
bp::rule<struct s_separate_lines_tag> s_separate_lines = "s-separate-lines";

// Parser for YAML [137].
auto const c_flow_sequence_def = '[' >> -s_separate.with(bp::_p<0>, bp::_p<1>) >>
                                 -in_flow.with(bp::_p<0>, bp::_p<1>) >> ']';
// Parser for YAML [80].
auto const s_separate_def = bp::switch_(bp::_p<1>)
    (context::block_out, s_separate_lines.with(bp::_p<0>))
    (context::block_in, s_separate_lines.with(bp::_p<0>))
    (context::flow_out, s_separate_lines.with(bp::_p<0>))
    (context::flow_in, s_separate_lines.with(bp::_p<0>))
    (context::block_key, s_separate_in_line)
    (context::flow_key, s_separate_in_line);
// Parser for YAML [136].
auto const in_flow_def = bp::switch_(bp::_p<1>)
    (context::flow_out, ns_s_flow_seq_entries.with(bp::_p<0>, context::flow_in))
    (context::flow_in, ns_s_flow_seq_entries.with(bp::_p<0>, context::flow_in))
    (context::block_key, ns_s_flow_seq_entries.with(bp::_p<0>, context::flow_key))
    (context::flow_key, ns_s_flow_seq_entries.with(bp::_p<0>, context::flow_key));

auto const ns_s_flow_seq_entries_def = bp::eps;
auto const s_separate_lines_def = bp::eps;

BOOST_PARSER_DEFINE_RULES(
    c_flow_sequence, s_separate, in_flow, ns_s_flow_seq_entries, s_separate_lines);
```
YAML [137] (c_flow_sequence) parses a list. The list may be empty, and must be surrounded by brackets, as you see here. But, depending on the current YAML context (the c parameter to [137]), we may require certain spacing to be matched by s-separate, and how the subparser in-flow behaves also depends on the current context.
In s_separate above, we parse differently based on the value of c. This is done by using the value of the second parameter to s_separate in a switch-parser. The second parameter is looked up by using _p (here, bp::_p<1>) as a parse argument.
in_flow does something similar. Note that in_flow calls its subrule by passing its first parameter, but uses a fixed value for the second. s_separate only passes its n parameter conditionally. The point is that a rule can be used with and without .with(), and that you can pass constants or parse arguments to .with().
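The context-dependent choices that s_separate and in_flow make with switch_ can be restated as ordinary functions over the same enum. The sketch below is plain C++17, not Boost.Parser code. It restates YAML [80] and [136] as transcribed above. The function names are hypothetical. std::nullopt stands for a switch_ with no matching case, which I assume makes that alternative fail.

```cpp
#include <optional>

// Same enumerators as the context enum in the example above.
enum class context { block_in, block_out, block_key, flow_in, flow_out, flow_key };

// YAML [136] as transcribed above: the context that in-flow hands to
// ns-s-flow-seq-entries. std::nullopt means no alternative applies.
std::optional<context> in_flow_child_context(context c)
{
    switch (c) {
    case context::flow_out:
    case context::flow_in:
        return context::flow_in;
    case context::block_key:
    case context::flow_key:
        return context::flow_key;
    default:
        return std::nullopt;
    }
}

// YAML [80] as transcribed above: true if s-separate defers to
// s-separate-lines(n), false if it uses s-separate-in-line (ignoring n).
bool s_separate_uses_lines(context c)
{
    return c != context::block_key && c != context::flow_key;
}
```

In the Boost.Parser version, n is just forwarded with bp::_p<0>, and only the context selects the alternative, exactly as these functions only inspect c.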
With those rules defined, we could write a unit test for YAML [137] like
this:
```cpp
auto const test_parser = c_flow_sequence.with(4, context::block_out);
auto result = bp::parse("[]", test_parser);
assert(result);
```
You could extend this with tests for different values of n and c. In real tests, you would of course also parse actual contents inside the "[]", once the other rules, such as [138], were implemented.
Getting at one of a rule's arguments and passing it as an argument to another parser can be very verbose. _p is a variable template that allows you to refer to the nth argument to the current rule, so that you can, in turn, pass it to one of the rule's subparsers. Using this, foo_def above can be rewritten as:
```cpp
auto const foo_def = bp::repeat(bp::_p<0>)[' '_l];
```
Using _p can save you from having to write a bunch of lambdas that each get an argument out of the parse context using _params(ctx)[0_c] or similar.
Note that _p is a parse argument (see The Parsers And Their Uses), meaning that it is an invocable that takes the context as its only parameter. If you want to use it inside a semantic action, you have to call it.
Semantic actions in this tutorial are usually of the signature void (auto & ctx). That is, they take a context by reference, and return nothing. If they were to return something, that something would just get dropped on the floor.
It is a pretty common pattern to create a rule in order to get a certain kind of value out of a parser, when you don't normally get it automatically. If I want to parse an int, int_ does that, and the thing that I parsed is also the desired attribute. If I parse an int followed by a double, I get a boost::parser::tuple containing one of each. But what if I don't want those two values, but some function of those two values? I would probably write something like this.
```cpp
struct obj_t { /* ... */ };
obj_t to_obj(int i, double d) { /* ... */ }

namespace bp = boost::parser;
bp::rule<struct obj_tag, obj_t> obj = "obj";

auto make_obj = [](auto & ctx) {
    using namespace boost::hana::literals; // provides the _c literal
    _val(ctx) = to_obj(_attr(ctx)[0_c], _attr(ctx)[1_c]);
};
constexpr auto obj_def = (bp::int_ >> bp::double_)[make_obj];
```
That's fine, if a little verbose. However, you can also do this instead:
```cpp
namespace bp = boost::parser;
bp::rule<struct obj_tag, obj_t> obj = "obj";

auto make_obj = [](auto & ctx) {
    using namespace boost::hana::literals; // provides the _c literal
    return to_obj(_attr(ctx)[0_c], _attr(ctx)[1_c]);
};
constexpr auto obj_def = (bp::int_ >> bp::double_)[make_obj];
```
Above, we return the value from a semantic action, and the returned value gets assigned to _val(ctx).
Finally, you can provide a function that takes the individual elements of the attribute (if it's a tuple), and returns the value to assign to _val(ctx):
```cpp
namespace bp = boost::parser;
bp::rule<struct obj_tag, obj_t> obj = "obj";
constexpr auto obj_def = (bp::int_ >> bp::double_)[to_obj];
```
More formally, within a rule, the use of a semantic action is determined as follows. Assume we have a function APPLY that calls a function with the elements of a tuple, like std::apply. For some context ctx, semantic action action, and attribute attr, action is used like this:
- _val(ctx) = APPLY(action, std::move(attr)), if that is well-formed and attr is a tuple of size 2 or larger;
- otherwise, _val(ctx) = action(ctx), if that is well-formed;
- otherwise, action(ctx).
The first case does not pass the context to the action at all. The last case
is the normal use of semantic actions outside of rules.
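The three-way dispatch above can be modeled with if constexpr. The C++17 sketch below is not Boost.Parser's implementation. toy_context, run_action and the two traits are hypothetical names. It also only recognizes std::tuple attributes, whereas Boost.Parser uses its own boost::parser::tuple.

```cpp
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a rule's parse context: attr plays the role of
// _attr(ctx) and val the role of _val(ctx).
template <typename Attr, typename Val>
struct toy_context
{
    Attr attr;
    Val val{};
};

// Case 1 test: Attr is a std::tuple of size >= 2 and F accepts its elements.
template <typename F, typename Attr>
struct applies_to_elements : std::false_type {};
template <typename F, typename... Ts>
struct applies_to_elements<F, std::tuple<Ts...>>
    : std::conjunction<std::bool_constant<(sizeof...(Ts) >= 2)>,
                       std::is_invocable<F &, Ts &&...>> {};

// Case 2 test: "ctx.val = action(ctx)" is well-formed.
template <typename F, typename Ctx, typename = void>
struct result_assignable : std::false_type {};
template <typename F, typename Ctx>
struct result_assignable<
    F, Ctx,
    std::void_t<decltype(std::declval<Ctx &>().val =
                             std::declval<F &>()(std::declval<Ctx &>()))>>
    : std::true_type {};

template <typename F, typename Ctx>
void run_action(F action, Ctx & ctx)
{
    if constexpr (applies_to_elements<F, decltype(ctx.attr)>::value) {
        // Case 1: the context is not passed to the action at all.
        ctx.val = std::apply(action, std::move(ctx.attr));
    } else if constexpr (result_assignable<F, Ctx>::value) {
        // Case 2: the action's return value becomes the rule's value.
        ctx.val = action(ctx);
    } else {
        // Case 3: an ordinary semantic action; nothing is assigned.
        action(ctx);
    }
}
```

For example, run_action([](int i, double d) { return i * d; }, c) takes the first branch when c.attr is a std::tuple<int, double>, without ever seeing c itself.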
Unless otherwise noted, all the algorithms and views are constrained very
much like the way the parse()
overloads are. The kinds of ranges, parsers, etc., that they accept are the
same.
As shown in The parse() API, the two patterns of parsing in Boost.Parser are whole-parse and prefix-parse. When you want to find something in the middle of the range being parsed, there's no parse API for that. You can of course make a simple parser that skips everything before what you're looking for.
```cpp
namespace bp = boost::parser;
constexpr auto parser = /* ... */;
constexpr auto middle_parser = bp::omit[*(bp::char_ - parser)] >> parser;
```
middle_parser will skip over everything, one char_ at a time, as long as the next char_ is not the beginning of a successful match of parser. After this, control passes to parser itself. OK, so that's not too hard to write. If you need to parse something from the middle in order to generate attributes, this is what you should use.
However, it often turns out you only need to find some subrange in the parsed range. In these cases, it would be nice to turn this into a proper algorithm in the pattern of the ones in std::ranges, since that's more idiomatic. boost::parser::search() is that algorithm. It has very similar semantics to std::ranges::search, except that it searches not for a match to an exact subrange, but for a match with the given parser. Like std::ranges::search(), it returns a subrange (boost::parser::subrange in C++17, std::ranges::subrange in C++20 and later).
```cpp
namespace bp = boost::parser;
auto result = bp::search("aaXYZq", bp::lit("XYZ"), bp::ws);
assert(!result.empty());
assert(std::string_view(result.begin(), result.end() - result.begin()) == "XYZ");
```
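To show the shape of the algorithm, here is a standard-library-only model of search over a std::string_view. It is not Boost.Parser. The "parser" is replaced by a hypothetical matcher function that returns the length of a match at a given position. There is also no skipper and no Unicode handling.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical stand-in for a parser: the length of a match starting at
// pos, or std::nullopt if there is none.
using matcher = std::optional<std::size_t> (*)(std::string_view, std::size_t);

std::optional<std::size_t> match_xyz(std::string_view s, std::size_t pos)
{
    if (s.substr(pos, 3) == "XYZ")
        return 3;
    return std::nullopt;
}

// Returns the first match as a view into s, or an empty view at the end of
// s when nothing matches (like an empty subrange from std::ranges::search).
std::string_view toy_search(std::string_view s, matcher m)
{
    for (std::size_t pos = 0; pos <= s.size(); ++pos) {
        if (auto len = m(s, pos))
            return s.substr(pos, *len);
    }
    return s.substr(s.size());
}
```

The result points into the caller's own storage, which mirrors how search hands back a subrange of the input rather than a copy.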
Since boost::parser::search() returns a subrange, whatever parser you give it produces no attribute. I wrote bp::lit("XYZ") above; if I had written bp::string("XYZ") instead, the result (and lack of std::string construction) would not change.
As you can see above, one aspect of boost::parser::search() differs intentionally from the conventions of the std::ranges algorithms: it accepts C-style strings, treating them as if they were proper ranges.
Also, boost::parser::search() knows how to accommodate your iterator type. You can pass the C-style string "aaXYZq" as in the example above, or "aaXYZq" | bp::as_utf32, or "aaXYZq" | bp::as_utf8, or even "aaXYZq" | bp::as_utf16, and it will return a subrange whose iterators are the type that you passed as input, even though internally the iterator type might be something different (a UTF-8 -> UTF-32 transcoding iterator in Unicode parsing, as with all the | bp::as_utfN examples above). As long as you pass a range to be parsed whose value type is char, char8_t, or char32_t, or that is adapted using some combination of as_utfN adaptors, this accommodation will operate correctly.
boost::parser::search()
has multiple overloads.
You can pass a range or an iterator/sentinel pair, and you can pass a skip
parser or not. That's four overloads. Also, all four overloads take an optional
boost::parser::trace
parameter at the end. This is really handy for investigating why you're not
finding something in the input that you expected to.
boost::parser::search_all creates boost::parser::search_all_view objects. boost::parser::search_all_view is a std::views-style view. It produces a range of subranges. Each subrange it produces is the next match of the given parser in the parsed range.
```cpp
namespace bp = boost::parser;
auto r = "XYZaaXYZbaabaXYZXYZ" | bp::search_all(bp::lit("XYZ"));
int count = 0;
// Prints XYZ XYZ XYZ XYZ.
for (auto subrange : r) {
    std::cout << std::string_view(subrange.begin(), subrange.end() - subrange.begin()) << " ";
    ++count;
}
std::cout << "\n";
assert(count == 4);
```
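A standard-library-only model of the same idea collects every non-overlapping match while scanning left to right. This is a sketch under simplifying assumptions, not Boost.Parser. match_xyz and toy_search_all are hypothetical. The sketch builds a std::vector eagerly, whereas search_all_view produces matches lazily. It also skips zero-length matches so that the scan always makes progress.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical stand-in for bp::lit("XYZ").
std::optional<std::size_t> match_xyz(std::string_view s, std::size_t pos)
{
    if (s.substr(pos, 3) == "XYZ")
        return 3;
    return std::nullopt;
}

// Collects each match as a view into s; scanning resumes after a match.
template <typename Matcher>
std::vector<std::string_view> toy_search_all(std::string_view s, Matcher m)
{
    std::vector<std::string_view> matches;
    std::size_t pos = 0;
    while (pos < s.size()) {
        auto len = m(s, pos);
        if (len && *len > 0) {
            matches.push_back(s.substr(pos, *len));
            pos += *len;
        } else {
            ++pos;
        }
    }
    return matches;
}
```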
All the details called out in the subsection on boost::parser::search() above apply to boost::parser::search_all: its parser produces no attributes; it accepts C-style strings as if they were ranges; and it knows how to get from the internally-used iterator type back to the given iterator type, in typical cases.
boost::parser::search_all
can be called with, and boost::parser::search_all_view
can be constructed
with, a skip parser or not, and you can always pass boost::parser::trace
at the end of any of their
overloads.
boost::parser::split creates boost::parser::split_view objects. boost::parser::split_view is a std::views-style view. It produces a range of subranges of the parsed range, split on matches of the given parser. You can think of boost::parser::split_view as being the complement of boost::parser::search_all_view, in that boost::parser::split_view produces the subranges between the subranges produced by boost::parser::search_all_view. boost::parser::split_view has very similar semantics to std::ranges::split_view. Just like std::ranges::split_view, boost::parser::split_view will produce empty ranges between the beginning/end of the parsed range and an adjacent match, or between adjacent matches.
```cpp
namespace bp = boost::parser;
auto r = "XYZaaXYZbaabaXYZXYZ" | bp::split(bp::lit("XYZ"));
int count = 0;
// Prints '' 'aa' 'baaba' '' ''.
for (auto subrange : r) {
    std::cout << "'" << std::string_view(subrange.begin(), subrange.end() - subrange.begin()) << "' ";
    ++count;
}
std::cout << "\n";
assert(count == 5);
```
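The complement relationship with search_all is easy to see in a standard-library-only model. Everything between matches is emitted, including the empty pieces at the ends and between adjacent matches. toy_split and match_xyz are hypothetical names, and the sketch builds a std::vector eagerly instead of a lazy view.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical stand-in for bp::lit("XYZ").
std::optional<std::size_t> match_xyz(std::string_view s, std::size_t pos)
{
    if (s.substr(pos, 3) == "XYZ")
        return 3;
    return std::nullopt;
}

// Emits the text between matches, in order; pieces may be empty.
template <typename Matcher>
std::vector<std::string_view> toy_split(std::string_view s, Matcher m)
{
    std::vector<std::string_view> pieces;
    std::size_t start = 0, pos = 0;
    while (pos < s.size()) {
        auto len = m(s, pos);
        if (len && *len > 0) {
            pieces.push_back(s.substr(start, pos - start)); // text before this match
            pos += *len;
            start = pos;
        } else {
            ++pos;
        }
    }
    pieces.push_back(s.substr(start)); // text after the last match
    return pieces;
}
```

On the input above this yields the same five pieces as the Boost.Parser example: "", "aa", "baaba", "", "".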
All the details called out in the subsection on boost::parser::search() above apply to boost::parser::split: its parser produces no attributes; it accepts C-style strings as if they were ranges; and it knows how to get from the internally-used iterator type back to the given iterator type, in typical cases.
boost::parser::split
can be called with, and boost::parser::split_view
can be constructed
with, a skip parser or not, and you can always pass boost::parser::trace
at the end of any of their
overloads.
boost::parser::replace creates boost::parser::replace_view objects. boost::parser::replace_view is a std::views-style view. It produces a range of subranges from the parsed range r and the given replacement range replacement. Wherever in the parsed range a match to the given parser parser is found, replacement is the subrange produced. Each subrange of r that does not match parser is produced as a subrange as well. The subranges are produced in the order in which they occur in r. Unlike boost::parser::split_view, boost::parser::replace_view does not produce empty subranges, unless replacement is empty.
```cpp
namespace bp = boost::parser;
auto card_number = bp::int_ >> bp::repeat(3)['-' >> bp::int_];
auto rng = "My credit card number is 1234-5678-9012-3456." |
           bp::replace(card_number, "XXXX-XXXX-XXXX-XXXX");
int count = 0;
// Prints My credit card number is XXXX-XXXX-XXXX-XXXX.
for (auto subrange : rng) {
    std::cout << std::string_view(subrange.begin(), subrange.end() - subrange.begin());
    ++count;
}
std::cout << "\n";
assert(count == 3);
```
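The production rule for the subranges can be sketched with the standard library alone. This is a model, not Boost.Parser. match_card is a hypothetical hand-written matcher that only roughly corresponds to int_ >> repeat(3)['-' >> int_]: it accepts unsigned digit runs and does no overflow checks. toy_replace and join_pieces are also hypothetical, and join_pieces plays the part of piping the real view to std::views::join.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Four runs of digits separated by '-'; returns the matched length.
std::optional<std::size_t> match_card(std::string_view s, std::size_t pos)
{
    std::size_t i = pos;
    for (int group = 0; group < 4; ++group) {
        if (group > 0) {
            if (i >= s.size() || s[i] != '-')
                return std::nullopt;
            ++i;
        }
        std::size_t const digits_begin = i;
        while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
            ++i;
        if (i == digits_begin)
            return std::nullopt;
    }
    return i - pos;
}

// Emits unmatched text and the replacement, in input order. Unmatched pieces
// are never empty; the replacement is emitted even when it is empty.
template <typename Matcher>
std::vector<std::string_view> toy_replace(std::string_view s, Matcher m,
                                          std::string_view replacement)
{
    std::vector<std::string_view> out;
    std::size_t start = 0, pos = 0;
    while (pos < s.size()) {
        auto len = m(s, pos);
        if (len && *len > 0) {
            if (pos > start)
                out.push_back(s.substr(start, pos - start));
            out.push_back(replacement);
            pos += *len;
            start = pos;
        } else {
            ++pos;
        }
    }
    if (start < s.size())
        out.push_back(s.substr(start));
    return out;
}

// Flattens the pieces back into one string.
std::string join_pieces(std::vector<std::string_view> const & pieces)
{
    std::string result;
    for (std::string_view piece : pieces)
        result += piece;
    return result;
}
```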
If the iterator types Ir and Ireplacement for the r and replacement ranges passed are identical (as in the example above), the iterator type for the subranges produced is Ir. If they are different, an implementation-defined type is used for the iterator. This type is the moral equivalent of a std::variant<Ir, Ireplacement>. This works as long as Ir and Ireplacement are compatible. To be compatible, they must have common reference, value, and rvalue reference types, as determined by std::common_type_t. One advantage to this scheme is that the range of subranges represented by boost::parser::replace_view is easily joined back into a single range.
```cpp
namespace bp = boost::parser;
auto card_number = bp::int_ >> bp::repeat(3)['-' >> bp::int_];
auto rng = "My credit card number is 1234-5678-9012-3456." |
           bp::replace(card_number, "XXXX-XXXX-XXXX-XXXX") | std::views::join;
std::string replace_result;
for (auto ch : rng) {
    replace_result.push_back(ch);
}
assert(replace_result == "My credit card number is XXXX-XXXX-XXXX-XXXX.");
```
Note that we could not have written std::string replace_result(rng.begin(), rng.end()). This is ill-formed because the std::string range constructor takes two iterators of the same type, but decltype(rng.end()) is a sentinel type different from decltype(rng.begin()).
Though the ranges r and replacement can both be C-style strings, boost::parser::replace_view must know the end of replacement before it does any work. This is because the subranges produced are all common ranges, and so if replacement is not, a common range must be formed from it. If you expect to pass very long C-style strings to boost::parser::replace and not pay to see the end until the range is used, don't.
ReplacementV is constrained almost exactly the same as V. V must model parsable_range and std::ranges::viewable_range. ReplacementV is the same, except that it can also be a std::ranges::input_range, whereas V must be a std::ranges::forward_range.
You may wonder what happens when you pass a UTF-N range for r, and a UTF-M range for replacement. What happens in this case is silent transcoding of replacement from UTF-M to UTF-N by the boost::parser::replace range adaptor. This doesn't require memory allocation; boost::parser::replace just slaps | boost::parser::as_utfN onto replacement. However, since Boost.Parser treats char ranges as unknown encoding, boost::parser::replace will not transcode from char ranges. So calls like this won't work:
```cpp
namespace bp = boost::parser;
char const str[] = "some text";
char const replacement_str[] = "some text";
constexpr auto parser = /* ... */;
auto r = str | bp::replace(parser, replacement_str | bp::as_utf8);
// Error: ill-formed! Can't mix plain-char inputs and UTF replacements.
```
This does not work, even though char and UTF-8 are the same size. If r and replacement are both ranges of char, everything will work of course. It's just mixing char and UTF-encoded ranges that does not work.
All the details called out in the subsection on boost::parser::search() above apply to boost::parser::replace: its parser produces no attributes; it accepts C-style strings for the r and replacement parameters as if they were ranges; and it knows how to get from the internally-used iterator type back to the given iterator type, in typical cases.
boost::parser::replace
can be called with, and boost::parser::replace_view
can be constructed
with, a skip parser or not, and you can always pass boost::parser::trace
at the end of any of their
overloads.
boost::parser::transform_replace creates boost::parser::transform_replace_view objects. boost::parser::transform_replace_view is a std::views-style view. It produces a range of subranges from the parsed range r and the given invocable f. Wherever in the parsed range a match to the given parser parser is found, let parser's attribute be attr; f(std::move(attr)) is the subrange produced. Each subrange of r that does not match parser is produced as a subrange as well. The subranges are produced in the order in which they occur in r. Unlike boost::parser::split_view, boost::parser::transform_replace_view does not produce empty subranges, unless f(std::move(attr)) is empty. Here is an example.
```cpp
namespace bp = boost::parser;
auto string_sum = [](std::vector<int> const & ints) {
    return std::to_string(std::accumulate(ints.begin(), ints.end(), 0));
};
auto rng = "There are groups of [1, 2, 3, 4, 5] in the set." |
           bp::transform_replace('[' >> bp::int_ % ',' >> ']', bp::ws, string_sum);
int count = 0;
// Prints "There are groups of 15 in the set."
for (auto subrange : rng) {
    for (auto ch : subrange) {
        std::cout << ch;
    }
    ++count;
}
std::cout << "\n";
assert(count == 3);
```
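The difference from replace is that each match's attribute is moved into f, and f's result takes the place of the match. Here is a standard-library-only model of that flow. match_int_list is a hypothetical hand-written stand-in for '[' >> bp::int_ % ',' >> ']' with bp::ws. It accepts only non-negative integers and treats ' ' as the only whitespace. toy_transform_replace is likewise hypothetical, and it stores owning std::string pieces because f returns by value.

```cpp
#include <cctype>
#include <cstddef>
#include <numeric>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

struct int_list_match
{
    std::size_t length;    // chars consumed from the input
    std::vector<int> ints; // the "attribute"
};

std::optional<int_list_match> match_int_list(std::string_view s, std::size_t pos)
{
    std::size_t i = pos;
    auto skip_spaces = [&] { while (i < s.size() && s[i] == ' ') ++i; };
    if (i >= s.size() || s[i] != '[')
        return std::nullopt;
    ++i;
    std::vector<int> ints;
    while (true) {
        skip_spaces();
        std::size_t const digits_begin = i;
        int value = 0;
        while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
            value = value * 10 + (s[i++] - '0');
        if (i == digits_begin)
            return std::nullopt;
        ints.push_back(value);
        skip_spaces();
        if (i < s.size() && s[i] == ',') {
            ++i;
            continue;
        }
        break;
    }
    if (i >= s.size() || s[i] != ']')
        return std::nullopt;
    return int_list_match{i + 1 - pos, std::move(ints)};
}

// Moves each match's attribute into f and emits f's result in its place.
template <typename F>
std::vector<std::string> toy_transform_replace(std::string_view s, F f)
{
    std::vector<std::string> out;
    std::size_t start = 0, pos = 0;
    while (pos < s.size()) {
        if (auto m = match_int_list(s, pos)) {
            if (pos > start)
                out.emplace_back(s.substr(start, pos - start));
            out.push_back(f(std::move(m->ints)));
            pos += m->length;
            start = pos;
        } else {
            ++pos;
        }
    }
    if (start < s.size())
        out.emplace_back(s.substr(start));
    return out;
}
```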
Let the type decltype(f(std::move(attr))) be Replacement. Replacement must be a range, and must be compatible with r. See the description of boost::parser::replace_view's iterator compatibility requirements in the section above for details.
As with boost::parser::replace, boost::parser::transform_replace can be flattened from a view of subranges into a view of elements by piping it to std::views::join. See the section on boost::parser::replace above for an example.
Just like boost::parser::replace and boost::parser::replace_view, boost::parser::transform_replace and boost::parser::transform_replace_view do silent transcoding of the result to the appropriate UTF, if applicable. If both r and f(std::move(attr)) are ranges of char, or are both the same UTF, no transcoding occurs. If one of r and f(std::move(attr)) is a range of char and the other is some UTF, the program is ill-formed.
boost::parser::transform_replace_view will move each attribute into f; f may move from the argument or copy it as desired. f may return an lvalue reference. If it does so, the address of the reference will be taken and stored within boost::parser::transform_replace_view. Otherwise, the value returned by f is moved into boost::parser::transform_replace_view. In either case, the value type of boost::parser::transform_replace_view is always a subrange.
boost::parser::transform_replace
can be called with, and boost::parser::transform_replace_view
can
be constructed with, a skip parser or not, and you can always pass boost::parser::trace
at the end of any of their overloads.
Boost.Parser was designed from the start to be Unicode-friendly. There are numerous references to the "Unicode code path" and the "non-Unicode code path" in the Boost.Parser documentation. Though there are in fact two code paths for Unicode and non-Unicode parsing, the code is not very different in the two code paths, as they are written generically. The only difference is that the Unicode code path parses the input as a range of code points, and the non-Unicode path does not. In effect, this means that, in the Unicode code path, when you call parse(r, p) for some input range r and some parser p, the parse happens as if you called parse(r | boost::parser::as_utf32, p) instead. (Of course, it does not matter if r is a proper range, or an iterator/sentinel pair; those both work fine with boost::parser::as_utf32.)
Matching "characters" within Boost.Parser's parsers is assumed
to be a code point match. In the Unicode path there is a code point from
the input that is matched to each char_
parser. In the non-Unicode
path, the encoding is unknown, and so each element of the input is considered
to be a whole "character" in the input encoding, analogous to a
code point. From this point on, I will therefore refer to a single element
of the input exclusively as a code point.
So, let's say we write this parser:
```cpp
constexpr auto char8_parser = boost::parser::char_('\xcc');
```
For any char_ parser that should match a value or values, the type of the value to match is retained. So char8_parser contains a char that it will use for matching. If we had written:
```cpp
constexpr auto char32_parser = boost::parser::char_(U'\xcc');
```
char32_parser would instead contain a char32_t that it would use for matching.
So, at any point during the parse, if char8_parser were being used to match a code point next_cp from the input, we would see the moral equivalent of next_cp == '\xcc', and if char32_parser were being used to match next_cp, we'd see the equivalent of next_cp == U'\xcc'. The take-away here is that you can write char_ parsers that match specific values, without worrying if the input is Unicode or not because, under the covers, what takes place is a simple comparison of two integral values.
Note: Boost.Parser actually promotes any two values to a common type using …
Since matches are always done at a code point level (remember, a "code point" in the non-Unicode path is assumed to be a single char), you get different results trying to match UTF-8 input in the Unicode and non-Unicode code paths:
```cpp
namespace bp = boost::parser;
{
    std::string str = (char const *)u8"\xcc\x80"; // encodes the code point U+0300
    auto first = str.begin();
    // Since we've done nothing to indicate that we want to do Unicode
    // parsing, and we've passed a range of char to parse(), this will do
    // non-Unicode parsing.
    std::string chars;
    assert(bp::parse(first, str.end(), *bp::char_('\xcc'), chars));
    // Finds one match of the *char* 0xcc, because the value in the parser
    // (0xcc) was matched against the two code points in the input (0xcc and
    // 0x80), and the first one was a match.
    assert(chars == "\xcc");
}
{
    std::u8string str = u8"\xcc\x80"; // encodes the code point U+0300
    auto first = str.begin();
    // Since the input is a range of char8_t, this will do Unicode
    // parsing. The same thing would have happened if we passed
    // str | boost::parser::as_utf32 or even str | boost::parser::as_utf8.
    std::string chars;
    assert(bp::parse(first, str.end(), *bp::char_('\xcc'), chars));
    // Finds zero matches of the *code point* 0xcc, because the value in
    // the parser (0xcc) was matched against the single code point in the
    // input, 0x0300.
    assert(chars == "");
}
```
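The same contrast can be reproduced without Boost.Parser. One function compares raw chars, as in the non-Unicode path. The other decodes UTF-8 into code points first, as in the Unicode path. Both functions are hypothetical. The decoder assumes well-formed UTF-8 and does no validation.

```cpp
#include <cstddef>
#include <string_view>

// Non-Unicode-path analogue: every char is its own "code point".
std::size_t count_char_matches(std::string_view bytes, char target)
{
    std::size_t n = 0;
    for (char c : bytes)
        if (c == target)
            ++n;
    return n;
}

// Unicode-path analogue: decode UTF-8 into code points, then compare.
std::size_t count_code_point_matches(std::string_view utf8, char32_t target)
{
    std::size_t n = 0;
    std::size_t i = 0;
    while (i < utf8.size()) {
        auto const lead = static_cast<unsigned char>(utf8[i]);
        std::size_t const len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        // Payload bits of the lead byte, then six bits per continuation byte.
        char32_t cp = len == 1 ? lead : len == 2 ? (lead & 0x1F) : len == 3 ? (lead & 0x0F) : (lead & 0x07);
        for (std::size_t k = 1; k < len && i + k < utf8.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        if (cp == target)
            ++n;
        i += len;
    }
    return n;
}
```

With the bytes 0xCC 0x80, the first function finds the char 0xCC once. The second sees only U+0300, so it finds U+00CC zero times.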
Additionally, it is expected that most programs will use UTF-8 for the encoding
of Unicode strings. Boost.Parser is written with this typical case in mind.
This means that if you are parsing 32-bit code points (as you always are
in the Unicode path), and you want to catch the result in a container C
of char
or char8_t
values, Boost.Parser
will silently transcode from UTF-32 to UTF-8 and write the attribute into
C
. This means that std::string
,
std::u8string
, etc. are fine to use as attribute
out-parameters for *char_
, and the result
will be UTF-8.
Note: UTF-16 strings as attributes are not supported directly. If you want to
use UTF-16 strings as attributes, you may need to do so by transcoding
a UTF-8 or UTF-32 attribute to UTF-16 within a semantic action. You can
do this by using …
The treatment of strings as UTF-8 is nearly ubiquitous within Boost.Parser.
For instance, though the entire interface of symbols
uses std::string
or std::string_view
, UTF-32 comparisons are used
internally.
I mentioned above that the use of boost::parser::utf*_view
as the range to parse opts you in
to Unicode parsing. Here's a bit more about these views and how best to use
them.
If you want to do Unicode parsing, you're always going to be comparing code
points at each step of the parse. As such, any parse input is implicitly
converted to UTF-32, if needed. This is what all the parse API functions
do internally.
However, there are times when you have parse input that is a sequence of
UTF-8-encoded char
s, and you
want to do Unicode-aware parsing. As mentioned previously, Boost.Parser has
a special case for char
inputs,
and it will not assume that char
sequences are UTF-8. If you want to tell
the parse API to do Unicode processing on them anyway, you can use the as_utf32
range adapter. (Note that you
can use any of the as_utf*
adaptors and the semantics will not differ
from the semantics below.)
namespace bp = boost::parser;
auto const p = '"' >> *(bp::char_ - '"' - 0xb6) >> '"';
char const * str = "\"two wörds\""; // ö is two code units, 0xc3 0xb6

auto result_1 = bp::parse(str, p);                // Treat each char as a code point (typically ASCII).
assert(!result_1);
auto result_2 = bp::parse(str | bp::as_utf32, p); // Unicode-aware parsing on code points.
assert(result_2);
The first call to parse()
treats each char
as a code point,
and since "ö"
is the
pair of code units 0xc3
0xb6
, the parse matches the second code unit
against the - 0xb6
part of the parser above, causing the parse to fail. This happens because
each code unit/char
in str
is treated as an independent code point.
The second call to parse()
succeeds because, when the parse gets to the code point for 'ö'
, it is 0xf6
(U+00F6), which does not match the -
0xb6
part of the parser.
The other adaptors as_utf8
and as_utf16
are also provided
for completeness, if you want to use them. They each can transcode any sequence
of character types.
Important: The …
One thing that Boost.Parser does not handle for you is normalization; Boost.Parser
is completely normalization-agnostic. Since all the parsers do their matching
using equality comparisons of code points, you should make sure that your
parsed range and your parsers all use the same normalization form.
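For instance, "é" can be written as one precomposed code point or as "e" followed by a combining accent. A plain code-point equality check, which is all a normalization-agnostic matcher performs, treats these as different. The names below are illustrative only; normalizing both the input and the literals in your parsers to one form (for example NFC, using a Unicode library of your choice) before parsing avoids the mismatch.

```cpp
#include <cassert>
#include <string>

// "é" in two normalization forms.
std::u32string const nfc_e_acute = U"\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
std::u32string const nfd_e_acute = U"e\u0301"; // 'e' + COMBINING ACUTE ACCENT

// Code-point equality: the only comparison a normalization-agnostic
// matcher does.
bool same_code_points(std::u32string const & a, std::u32string const & b)
{
    return a == b;
}
```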
In most parsing cases, being able to generate an attribute that represents
the result of the parse, or being able to parse into such an attribute, is
sufficient. Sometimes, it is not. If you need to parse a very large chunk
of text, the generated attribute may be too large to fit in memory. In other
cases, you may want to generate attributes sometimes, and not others. callback_rules
exist for these kinds of uses. A callback_rule
is just like
a rule, except that it allows the rule's attribute to be returned to the
caller via a callback, as long as the parse is started with a call to callback_parse()
instead of parse()
. Within a call to parse()
, a callback_rule
is identical
to a regular rule
.
For a rule with no attribute, the signature of a callback function is void (tag)
, where tag
is the tag-type used when declaring the rule. For a rule with an attribute
attr
, the signature is void (tag, attr)
. For instance, with this rule:
boost::parser::callback_rule<struct foo_tag> foo = "foo";
this would be an appropriate callback function:
void foo_callback(foo_tag) { std::cout << "Parsed a 'foo'!\n"; }
For this rule:
boost::parser::callback_rule<struct bar_tag, std::string> bar = "bar";
this would be an appropriate callback function:
void bar_callback(bar_tag, std::string const & s) { std::cout << "Parsed a 'bar' containing " << s << "!\n"; }
Important: In the case of …
You opt into callback parsing by parsing with a call to callback_parse()
instead of parse()
. If you use callback_rules
with parse()
, they're just regular rules
.
This allows you to choose whether to do "normal" attribute-generating/attribute-assigning
parsing with parse()
, or callback parsing with
callback_parse()
, without rewriting much
parsing code, if any.
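One way to picture callback dispatch is as overload resolution on the tag type. The standard-library-only sketch below assumes the callbacks are supplied as a single object with one overload per tag. report_match is a hypothetical stand-in for what happens when a callback rule matches during a callback parse; it is not a Boost.Parser function. See the Parsing JSON With Callbacks example for the real interface.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Tag types, mirroring the foo_tag and bar_tag rules above.
struct foo_tag {};
struct bar_tag {};

// One callbacks object; overload resolution on the tag picks the handler.
struct callbacks
{
    std::vector<std::string> * events;

    // For a rule with no attribute: void (tag)
    void operator()(foo_tag) const { events->push_back("foo"); }

    // For a rule with an attribute: void (tag, attr)
    void operator()(bar_tag, std::string const & s) const
    {
        events->push_back("bar:" + s);
    }
};

// Hypothetical stand-in: the tag, plus the attribute if there is one, is
// handed to the callbacks instead of being stored in a result.
template<typename Callbacks, typename Tag, typename... Attr>
void report_match(Callbacks const & cb, Tag tag, Attr const &... attr)
{
    cb(tag, attr...);
}
```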
The only reason all rules
are not callback_rules
is that you may want to have some rules
use callbacks within
a parse, and have some that do not. For instance, if you want to report the
attribute of callback_rule
r1
via callback, r1
's
implementation may use some rule r2
to generate some or all of its attribute.
See Parsing
JSON With Callbacks for an extended example of callback parsing.
Boost.Parser has good error reporting built into it. Consider what happens
when we fail to parse at an expectation point (created using operator>
).
If I feed the parser from the Parsing
JSON With Callbacks example a file called sample.json containing this
input (note the unmatched '['
):
{
    "key": "value",
    "foo": [, "bar": []
}
This is the error message that is printed to the terminal:
sample.json:3:12: error: Expected ']' here:
    "foo": [, "bar": []
            ^
That message is formatted like the diagnostics produced by Clang and GCC.
It quotes the line on which the failure occurred, and even puts a caret under
the exact position at which the parse failed. This error message is suitable
for many kinds of end-users, and interoperates well with anything that supports
Clang and/or GCC diagnostics.
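A rough sketch of how such a diagnostic can be assembled from the input and a failure offset follows. It is not Boost.Parser's formatting code; format_diagnostic is a hypothetical name. It assumes 1-based line numbers and 0-based column offsets in code units, a convention chosen because it reproduces the 3:12 position in the sample.json message above when that file is indented with four spaces.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical Clang/GCC-style formatter: "file:line:col: error: message",
// then the offending line, then a caret under the failure position.
std::string format_diagnostic(
    std::string_view filename, std::string_view input, std::size_t pos,
    std::string_view message)
{
    pos = std::min(pos, input.size());
    std::size_t line = 1;       // 1-based line number
    std::size_t line_start = 0; // offset of the first char of that line
    for (std::size_t i = 0; i < pos; ++i) {
        if (input[i] == '\n') {
            ++line;
            line_start = i + 1;
        }
    }
    std::size_t line_end = input.find('\n', line_start);
    if (line_end == std::string_view::npos)
        line_end = input.size();
    std::size_t const column = pos - line_start; // 0-based column

    std::string out;
    out += filename;
    out += ':' + std::to_string(line) + ':' + std::to_string(column);
    out += ": error: ";
    out += message;
    out += '\n';
    out += input.substr(line_start, line_end - line_start); // quoted line
    out += '\n';
    out += std::string(column, ' ') + "^\n"; // caret under the failure
    return out;
}
```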
Most of Boost.Parser's error handlers format their diagnostics this way,
though you are not bound by that. You can make an error handler type that
does whatever you want, as long as it meets the error handler interface.
The Boost.Parser error handlers are:
default_error_handler
:
Produces formatted diagnostics like the one above, and prints them to
std::cerr
. default_error_handler
has
no associated file name, and both errors and diagnostics are printed
to std::cerr
. This handler is constexpr
-friendly.
stream_error_handler
:
Produces formatted diagnostics. One or two streams may be used. If two
are used, errors go to one stream and warnings go to the other. A file
name can be associated with the parse; if it is, that file name will
appear in all diagnostics.
callback_error_handler
:
Produces formatted diagnostics. Calls a callback with the diagnostic
message to report the diagnostic, rather than streaming out the diagnostic.
A file name can be associated with the parse; if it is, that file name
will appear in all diagnostics. This handler is useful for recording
the diagnostics in memory.
rethrow_error_handler
:
Does nothing but re-throw any exception that it is asked to handle. Its
diagnose()
member functions are no-ops.
vs_output_error_handler
:
Directs all errors and warnings to the debugging output panel inside
Visual Studio. Available on Windows only. Probably does nothing useful
or desirable when executed outside of Visual Studio.
You can set the error handler to any of these, or one of your own, using
with_error_handler()
(see The
parse()
API). If you do not set one, default_error_handler
will
be used.
Boost.Parser only generates error messages like the ones on this page at
failed expectation points, like a > b
, where you have successfully
parsed a
, but then cannot successfully parse b
.
This may seem limited to you. It's actually the best that we can do.
In order for error handling to happen other than at expectation points, we
have to know that there is no further processing that might take place. This
is necessary because Boost.Parser has P1 | P2 | ... | Pn
parsers
("or_parser
s"). If any one of these parsers Pi
fails to match, it is not allowed to fail the parse — the next one
(Pi+1
) might match. If we get to the end of the alternatives
of the or_parser and Pn
fails, we still cannot fail the top-level
parse, because the or_parser
might be a subparser within a parent
or_parser
.
Ok, so what might we do? Perhaps we could at least indicate when we ran into
end-of-input. But we cannot, for exactly the same reason already stated.
For any parser P
, reaching end-of-input is a failure for P
,
but not necessarily for the whole parse.
Perhaps we could record the farthest point ever reached during the parse,
and report that at the top level, if the top level parser fails. That would
be little help without knowing which parser was active when we reached that
point. This would require some sort of repeated memory allocation, since
in Boost.Parser the progress point of the parser is stored exclusively on
the stack — by the time we fail the top-level parse, all those far-reaching
stack frames are long gone. Not the best.
Worse still, knowing how far you got in the parse and which parser was active
is not very useful. Consider this.
namespace bp = boost::parser; auto a_b = bp::char_('a') >> bp::char_('b'); auto c_b = bp::char_('c') >> bp::char_('b'); auto result = bp::parse("acb", a_b | c_b);
If we reported the farthest-reaching parser and its position, it would be
the a_b
parser, at position "cb"
in the
input. Is this really enlightening? Was the error in the input putting the
'a'
at the beginning or putting the 'c'
in the
middle? If you point the user at a_b
as the parser that failed,
and never mention c_b
, you are potentially just steering them
in the wrong direction.
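To see concretely why the farthest point misleads, here is a hand-written, standard-library-only model of a_b | c_b. It is not Boost.Parser code; attempt, match_pair and parse_a_b_or_c_b are hypothetical names. Each alternative reports how far it got before failing, and the or-like loop tries them in order while keeping the maximum.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Result of one alternative: did it match, and how far did it get?
struct attempt
{
    bool matched;
    std::size_t reached; // offset of the first char it could not match
};

// Matches the two chars `first` then `second` at the start of `in`.
attempt match_pair(std::string_view in, char first, char second)
{
    if (in.empty() || in[0] != first)
        return {false, 0};
    if (in.size() < 2 || in[1] != second)
        return {false, 1};
    return {true, 2};
}

// Tries each alternative in turn, like an or_parser, and records the
// farthest point any alternative reached.
attempt parse_a_b_or_c_b(std::string_view in, std::size_t & farthest)
{
    char const alternatives[][2] = {{'a', 'b'}, {'c', 'b'}};
    farthest = 0;
    for (auto const & alt : alternatives) {
        attempt const a = match_pair(in, alt[0], alt[1]);
        if (a.reached > farthest)
            farthest = a.reached;
        if (a.matched)
            return a; // first matching alternative wins
    }
    return {false, farthest};
}
```

On "acb" the farthest point is offset 1, reached by the a_b alternative, so a report based on it would say nothing about c_b.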
All error messages must come from failed expectation points. Consider parsing
JSON. If you open a list with '['
, you know that you're parsing
a list, and if the list is ill-formed, you'll get an error message saying
so. If you open an object with '{'
, the same thing is possible
— when missing the matching '}'
, you can tell the user,
"That's not an object", and this is useful feedback. The same thing
with a partially parsed number, etc. If the JSON parser does not build in
expectations like matched braces and brackets, how can Boost.Parser know
that a missing '}'
is really a problem, and that no later parser
will match the input even without the '}'
?
Important: The bottom line is that you should build expectation points into your parsers
using …
You can get access to the error handler within any semantic action by calling
_error_handler(ctx)
(see The
Parse Context). Any error handler must have the following member functions:
template<typename Context, typename Iter> void diagnose( diagnostic_kind kind, std::string_view message, Context const & context, Iter it) const;
template<typename Context> void diagnose( diagnostic_kind kind, std::string_view message, Context const & context) const;
If you call the second one, the one without the iterator parameter, it will
call the first with _where(context).begin()
as the iterator parameter. The one without the iterator is the one you will
use most often. The one with the explicit iterator parameter can be useful
in situations where you have messages that are related to each other, associated
with multiple locations. For instance, if you are parsing XML, you may want
to report that a close-tag does not match its associated open-tag by showing
the line where the open-tag was found. That may of course not be located
anywhere near _where(ctx).begin()
. (A description of _globals()
is below.)
[](auto & ctx) {
    // Assume we have a std::vector of open tags, and another
    // std::vector of iterators to where the open tags were parsed, in our
    // globals.
    if (_attr(ctx) != _globals(ctx).open_tags.back()) {
        std::string open_tag_msg =
            "Previous open-tag \"" + _globals(ctx).open_tags.back() + "\" here:";
        _error_handler(ctx).diagnose(
            boost::parser::diagnostic_kind::error,
            open_tag_msg,
            ctx,
            _globals(ctx).open_tag_positions.back());
        std::string close_tag_msg =
            "does not match close-tag \"" + _attr(ctx) + "\" here:";
        _error_handler(ctx).diagnose(
            boost::parser::diagnostic_kind::error, close_tag_msg, ctx);

        // Explicitly fail the parse. Diagnostics do not affect parse success.
        _pass(ctx) = false;
    }
}
There are also some convenience functions that make the above code a little
less verbose, _report_error()
and _report_warning()
:
[](auto & ctx) {
    // Assume we have a std::vector of open tags, and another
    // std::vector of iterators to where the open tags were parsed, in our
    // globals.
    if (_attr(ctx) != _globals(ctx).open_tags.back()) {
        std::string open_tag_msg =
            "Previous open-tag \"" + _globals(ctx).open_tags.back() + "\" here:";
        _report_error(ctx, open_tag_msg, _globals(ctx).open_tag_positions.back());
        std::string close_tag_msg =
            "does not match close-tag \"" + _attr(ctx) + "\" here:";
        _report_error(ctx, close_tag_msg);

        // Explicitly fail the parse. Diagnostics do not affect parse success.
        _pass(ctx) = false;
    }
}
You should use these less verbose functions almost all the time. The only
time you would want to use _error_handler()
directly is when you are using a custom error handler, and you want access
to some part of its interface besides diagnose()
.
Though there is support for reporting warnings using the functions above,
none of the error handlers supplied by Boost.Parser will ever report a warning.
Warnings are strictly for user code.
For more information on the rest of the error handling and diagnostic API,
see the header reference pages for error_handling_fwd.hpp
and error_handling.hpp
.
Creating your own error handler is pretty easy; you just need to implement
three member functions. Say you want an error handler that writes diagnostics
to a file. Here's how you might do that.
struct logging_error_handler
{
    logging_error_handler() {}
    logging_error_handler(std::string_view filename) :
        filename_(filename), ofs_(filename_)
    {
        if (!ofs_)
            throw std::runtime_error("Could not open file.");
    }

    // This is the function called by Boost.Parser after a parser fails the
    // parse at an expectation point and throws a parse_error. It is expected
    // to create a diagnostic message, and put it where it needs to go. In
    // this case, we're writing it to a log file. This function returns a
    // bp::error_handler_result, which is an enum with two enumerators -- fail
    // and rethrow. Returning fail fails the top-level parse; returning
    // rethrow just re-throws the parse_error exception that got us here in
    // the first place.
    template<typename Iter, typename Sentinel>
    bp::error_handler_result
    operator()(Iter first, Sentinel last, bp::parse_error<Iter> const & e) const
    {
        bp::write_formatted_expectation_failure_error_message(
            ofs_, filename_, first, last, e);
        return bp::error_handler_result::fail;
    }

    // This function is for users to call within a semantic action to produce
    // a diagnostic.
    template<typename Context, typename Iter>
    void diagnose(
        bp::diagnostic_kind kind,
        std::string_view message,
        Context const & context,
        Iter it) const
    {
        bp::write_formatted_message(
            ofs_,
            filename_,
            bp::_begin(context),
            it,
            bp::_end(context),
            message);
    }

    // This is just like the other overload of diagnose(), except that it
    // determines the Iter parameter for the other overload by calling
    // _where(ctx).
    template<typename Context>
    void diagnose(
        bp::diagnostic_kind kind,
        std::string_view message,
        Context const & context) const
    {
        diagnose(kind, message, context, bp::_where(context).begin());
    }

    std::string filename_;
    mutable std::ofstream ofs_;
};
That's it. You just need to do the important work of the error handler in
its call operator, and then implement the two overloads of diagnose()
that it must provide for use inside semantic actions. The default implementation
of these is even available as the free function write_formatted_message()
,
so you can just call that, as you see above. Here's how you might use it.
int main()
{
    std::cout << "Enter a list of integers, separated by commas. ";
    std::string input;
    std::getline(std::cin, input);

    constexpr auto parser = bp::int_ >> *(',' > bp::int_);
    logging_error_handler error_handler("parse.log");
    auto const result =
        bp::parse(input, bp::with_error_handler(parser, error_handler));

    if (result) {
        std::cout << "It looks like you entered:\n";
        for (int x : *result) {
            std::cout << x << "\n";
        }
    }
}
We just define a logging_error_handler
, and pass it by reference
to with_error_handler()
, which decorates the top-level
parser with the error handler. We could not
have written bp::with_error_handler(parser, logging_error_handler("parse.log"))
,
because with_error_handler()
does not accept rvalues. This is because the error handler eventually goes
into the parse context. The parse context only stores pointers and iterators,
keeping it cheap to copy.
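The lifetime reasoning can be sketched without Boost.Parser. In this hypothetical decorator (with_handler, with_handler_t, demo_parser and demo_error_handler are illustrative names, not the library's), the handler is taken by non-const lvalue reference, so a temporary cannot bind to it, and only a pointer is stored, so copies of the decorated parser stay cheap.

```cpp
#include <cassert>

// Illustrative stand-ins; not Boost.Parser types.
struct demo_parser {};
struct demo_error_handler { int id = 7; };

// The decorated parser holds the handler by pointer, so copying it is cheap,
// but the handler must outlive every copy.
template<typename Parser, typename Handler>
struct with_handler_t
{
    Parser parser;
    Handler * handler;
};

// Handler & cannot bind to a temporary, so a call such as
// with_handler(demo_parser{}, demo_error_handler{}) does not compile. That
// is what prevents storing a pointer that would dangle.
template<typename Parser, typename Handler>
with_handler_t<Parser, Handler> with_handler(Parser parser, Handler & handler)
{
    return {parser, &handler};
}
```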
If we run the example and give it the input "1,"
,
this shows up in the log file:
parse.log:1:2: error: Expected int_ here (end of input):
1,
  ^
Sometimes, during the writing of a parser, you make a simple mistake that
is diagnosed horrifyingly, due to the high number of template instantiations
between the line you just wrote and the point of use (usually, the call to
parse()
). By "sometimes",
I mean "almost always and many, many times". Boost.Parser has a
workaround for situations like this. The workaround is to make the ill-formed
code well-formed in as many circumstances as possible, and then do a runtime
assert instead.
Usually, C++ programmers try to catch mistakes as early as possible. That
usually means making as much bad code ill-formed as possible.
Counter-intuitively, this does not work well in parser combinator situations.
For an example of just how dramatically different these two debugging scenarios
can be with Boost.Parser, please see the very long discussion in the none
is weird section of Rationale.
If you are morally opposed to this approach, or just hate fun, good news:
you can turn off the use of this technique entirely by defining BOOST_PARSER_NO_RUNTIME_ASSERTIONS
.
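Here is a minimal sketch of the technique under stated assumptions. assign_attribute is a hypothetical helper, and DEMO_NO_RUNTIME_ASSERTIONS is a made-up macro playing the role of BOOST_PARSER_NO_RUNTIME_ASSERTIONS; this is not Boost.Parser's code. With the technique on, a type mismatch still compiles and fails only if that branch actually runs; defining the macro turns the same mismatch back into a compile-time error.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical helper: writes `in` into the attribute `out`.
template<typename Out, typename In>
bool assign_attribute(Out & out, In const & in)
{
    if constexpr (std::is_assignable_v<Out &, In const &>) {
        out = in;
        return true;
    } else {
#if defined(DEMO_NO_RUNTIME_ASSERTIONS)
        // Technique off: a mismatch is an ordinary (possibly long) compile
        // error, reported from deep inside the template instantiations.
        static_assert(std::is_assignable_v<Out &, In const &>,
                      "attribute type mismatch");
#else
        // Technique on: the mismatch compiles, and fails loudly only if this
        // branch actually executes.
        assert(!"attribute type mismatch");
#endif
        return false;
    }
}
```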
Debugging parsers is hard. Any parser above a certain complexity level is
nearly impossible to debug simply by looking at the parser's code. Stepping
through the parse in a debugger is even worse. To provide a reasonable chance
of debugging your parsers, Boost.Parser has a trace mode that you can turn
on simply by providing an extra parameter to parse()
or callback_parse()
:
boost::parser::parse(input, parser, boost::parser::trace::on);
Every overload of parse()
and callback_parse()
takes this final parameter,
which is defaulted to boost::parser::trace::off
.
If we trace a substantial parser, we will see a lot
of output. Each code point of the input must be considered, one at a time,
to see if a certain rule matches. As an example, let's trace a parse using
the JSON parser from Parsing
JSON. The input is "null"
. null
is one of the types that a JavaScript value can have; the top-level parser
in the JSON parser example is:
auto const value_p_def = number | bp::bool_ | null | string | array_p | object_p;
So, a JSON value can be a number, or a Boolean, a null
, etc.
During the parse, each alternative will be tried in turn, until one is matched.
I picked null
because it is relatively close to the beginning
of the value_p_def
alternative parser. Even so, the output is
pretty huge. Let's break it down as we go:
[begin value; input="null"]
Each parser is traced as [begin foo; ...]
, then the parsing
operations themselves, and then [end foo; ...]
. The name of
a rule is used as its name in the begin
and end
parts of the trace. Non-rules have a name that is similar to the way the
parser looked when you wrote it. Most lines will have the next few code points
of the input quoted, as we have here (input="null"
).
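The begin/end pairing can be produced with a scope guard that prints when it is constructed and again when it is destroyed. This is a hypothetical sketch that only imitates the format shown above. trace_scope and run_trace_demo are illustrative names, the two-space indentation per nesting level is an assumption, and quoting at most 8 code units of the remaining input is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>
#include <utility>

// Prints "[begin name; input=...]" on entry and "[end name; input=...]" on
// exit. `input` is watched by reference, so the end line shows what is left.
struct trace_scope
{
    std::ostream & os;
    std::string name;
    std::string_view const & input;
    int depth;

    trace_scope(std::ostream & os_, std::string name_,
                std::string_view const & input_, int depth_)
        : os(os_), name(std::move(name_)), input(input_), depth(depth_)
    {
        print("begin");
    }
    ~trace_scope() { print("end"); }

    void print(char const * what) const
    {
        os << std::string(static_cast<std::size_t>(depth) * 2, ' ') << '['
           << what << ' ' << name << "; input=\"" << input.substr(0, 8)
           << "\"]\n";
    }
};

// Simulates a rule named value whose body is the literal "null".
std::string run_trace_demo()
{
    std::ostringstream out;
    std::string_view input = "null";
    {
        trace_scope value(out, "value", input, 0);
        {
            trace_scope literal(out, "\"null\"", input, 1);
            input.remove_prefix(4); // pretend the literal consumed "null"
        }
    }
    return out.str();
}
```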
[begin number | bool_ | null | string | ...; input="null"]
This shows the beginning of the parser inside
the rule value
— the parser that actually does all the
work. In the example code, this parser is called value_p_def
.
Since it isn't a rule, we have no name for it, so we show its implementation
in terms of subparsers. Since it is a bit long, we don't print the entire
thing. That's why that ellipsis is there.
[begin number; input="null"]
[begin raw[lexeme[ >> ...]][<<action>>]; input="null"]
Now we're starting to see the real work being done. number
is
a somewhat complicated parser that does not match "null"
,
so there's a lot to wade through when following the trace of its attempt
to do so. One thing to note is that, since we cannot print a name for an
action, we just print "<<action>>"
. Something
similar happens when we come to an attribute that we cannot print, because
it has no stream insertion operation. In that case, "<<unprintable-value>>"
is printed.
[begin raw[lexeme[ >> ...]]; input="null"]
[begin lexeme[-char_('-') >> char_('1', '9') >> ... | ... >> ...]; input="null"]
[begin -char_('-') >> char_('1', '9') >> *digit | char_('0') >> -(char_('.') >> ...) >> -( >> ...); input="null"]
[begin -char_('-'); input="null"]
[begin char_('-'); input="null"]
no match
[end char_('-'); input="null"]
matched "" attribute: <<empty>>
[end -char_('-'); input="null"]
[begin char_('1', '9') >> *digit | char_('0'); input="null"]
[begin char_('1', '9') >> *digit; input="null"]
[begin char_('1', '9'); input="null"]
no match
[end char_('1', '9'); input="null"]
no match
[end char_('1', '9') >> *digit; input="null"]
[begin char_('0'); input="null"]
no match
[end char_('0'); input="null"]
no match
[end char_('1', '9') >> *digit | char_('0'); input="null"]
no match
[end -char_('-') >> char_('1', '9') >> *digit | char_('0') >> -(char_('.') >> ...) >> -( >> ...); input="null"]
no match
[end lexeme[-char_('-') >> char_('1', '9') >> ... | ... >> ...]; input="null"]
no match
[end raw[lexeme[ >> ...]]; input="null"]
no match
[end raw[lexeme[ >> ...]][<<action>>]; input="null"]
no match
[end number; input="null"]
[begin bool_; input="null"]
no match
[end bool_; input="null"]
number
and boost::parser::bool_
did not match,
but null
will:
[begin null; input="null"]
[begin "null" >> attr(null); input="null"]
[begin "null"; input="null"]
[begin string("null"); input="null"]
matched "null" attribute:
[end string("null"); input=""]
matched "null" attribute: null
Finally, this parser actually matched, and the match generated the attribute
null
, which is a special value of the type json::value
.
Since we were matching a string literal "null"
, there was no attribute until we reached the attr(null)
parser.
[end "null"; input=""]
[begin attr(null); input=""]
matched "" attribute: null
[end attr(null); input=""]
matched "null" attribute: null
[end "null" >> attr(null); input=""]
matched "null" attribute: null
[end null; input=""]
matched "null" attribute: null
[end number | bool_ | null | string | ...; input=""]
matched "null" attribute: null
[end value; input=""]
-------------------- parse succeeded --------------------
At the very end of the parse, the trace code prints out whether the top-level
parse succeeded or failed.
Some things to be aware of when looking at Boost.Parser trace output:
p[a]
forms an action_parser
containing the parser p
and semantic action a
.
This is essentially an implementation detail, but unfortunately the trace
output does not hide this from you.
For a parser p
, the trace-name may be intentionally different
from the actual structure of p
. For example, in the trace
above, you see a parser called simply "null"
.
This parser is actually boost::parser::omit[boost::parser::string("null")]
,
but what you typically write is just "null"
, so
that's the name used. There are two special cases like this: the one
described here for omit[string]
, and another for omit[char_]
.
if_(pred)[p]
is described as "Equivalent
to eps(pred)
>> p
". In a trace, you will not see if_
;
you will see eps
and p
instead.
Boost.Parser seldom allocates memory. The exceptions to this are:
symbols
allocates memory for the symbol/attribute pairs it contains. If symbols
are added during the parse, allocations must also occur then. The data
structure used by symbols
is also a trie,
which is a node-based tree. So, lots of allocations are likely if you
use symbols
.
If you turn on tracing by passing boost::parser::trace::on
to a top-level
parsing function, the names of parsers are allocated.
When a failed expectation is encountered (using operator>
),
the name of the failed parser is placed into a std::string
,
which will usually cause an allocation.
string()
's attribute is a std::string
, the use of which implies allocation.
You can avoid this allocation by explicitly using a different string
type for the attribute that does not allocate.
std::string
- The attribute of repeat(p) in all its forms, including operator*, operator+, and operator%, is std::vector<ATTR(p)>, the use of which implies allocation. You can avoid this allocation by explicitly using a different sequence container for the attribute that does not allocate. boost::container::static_vector or C++26's std::inplace_vector may be useful as such replacements.
With the exception of allocating the name of the parser that was expected in a failed expectation situation, Boost.Parser does not allocate unless you tell it to, by using symbols, using a particular error_handler, turning on trace, or parsing into attributes that allocate.
If you want to parse ASCII, using the Unicode parsing API will not actually cost you anything. Your input will be parsed, char by char, and compared to values that are Unicode code points (which are char32_t values). One caveat is that there may be an extra branch on each char, if the input is UTF-8. If your performance requirements can tolerate this, your life will be much easier if you just start with Unicode and stick with it.
Starting with Unicode support and UTF-8 input will allow you to properly
handle unexpected input, like non-ASCII languages (that's most of them),
with no additional effort on your part.
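To make the "extra branch" concrete, here is a simplified, self-contained UTF-8 decoder. It is a sketch for illustration only, not Boost.Parser's transcoding code: the function name next_code_point is hypothetical, and overlong-encoding and surrogate checks are omitted. ASCII bytes are handled by the first if, so pure-ASCII input pays one comparison per char before the result is compared as a char32_t code point.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: decodes one code point starting at s[i] and advances i.
// Precondition: i < s.size(). Malformed sequences yield U+FFFD.
// Simplified: overlong encodings and surrogates are not rejected.
char32_t next_code_point(std::string_view s, std::size_t & i)
{
    auto const lead = static_cast<unsigned char>(s[i]);
    if (lead < 0x80) {              // ASCII fast path: the one extra branch
        ++i;
        return lead;
    }
    std::size_t len = 0;
    char32_t cp = 0;
    if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
    else { ++i; return U'\uFFFD'; } // stray continuation or invalid lead byte
    if (s.size() - i < len) {       // sequence truncated by end of input
        i = s.size();
        return U'\uFFFD';
    }
    for (std::size_t k = 1; k < len; ++k) {
        auto const b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) {   // expected a continuation byte
            i += k;
            return U'\uFFFD';
        }
        cp = (cp << 6) | (b & 0x3F);
    }
    i += len;
    return cp;
}
```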
Treat rules as the unit of work in your parser. Write a rule, test its corners,
and then use it to build larger rules or parsers. This allows you to get
better coverage with less work, since exercising all the code paths of your
rules, one by one, keeps the combinatorial number of paths through your code
manageable.
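There are multiple ways to get attributes out of a parser. You can:
- use whatever attribute the parser generates;
- provide an attribute out-argument to parse() for the parser to fill in;
- use semantic actions to assign attributes from the parser to variables outside the parser.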
All of these are fairly similar in how much effort they require, except for
the semantic action method. For the semantic action approach, you need to
have values to fill in from your parser, and keep them in scope for the duration
of the parse.
It is much more straightforward, and leads to more reusable parsers, to have the parsers produce the attributes of the parse directly as a result of the parse.
This does not mean that you should never use semantic actions. They are sometimes
necessary. However, you should default to using the other non-semantic action
methods, and only use semantic actions with a good reason.
A typical error message produced by Boost.Parser will say something like,
"Expected FOO here", where FOO is some rule or parser. Give your
rules names that will read well in error messages like this. For instance,
the JSON examples have these rules:
```cpp
bp::rule<class escape_seq, uint32_t> const escape_seq =
    "\\uXXXX hexadecimal escape sequence";
bp::rule<class escape_double_seq, uint32_t, double_escape_locals> const
    escape_double_seq = "\\uXXXX hexadecimal escape sequence";
bp::rule<class single_escaped_char, uint32_t> const single_escaped_char =
    "'\"', '\\', '/', 'b', 'f', 'n', 'r', or 't'";
```
Some things to note:
- escape_seq and escape_double_seq have the same name-string. To an end-user who is trying to figure out why their input failed to parse, it doesn't matter which kind of result a parser rule generates. They just want to know how to fix their input. For either rule, the fix is the same: put a hexadecimal escape sequence there.
- single_escaped_char has a terrible-looking name. However, it's not really used as a name anywhere per se. In error messages, it works nicely, though. The error will be "Expected '"', '\', '/', 'b', 'f', 'n', 'r', or 't' here", which is pretty helpful.
Most of these errors are found at parser construction time, so no actual
parsing is even necessary. For instance, a test case might look like this:
```cpp
TEST(my_parser_tests, my_rule_test)
{
    my_rule r;
}
```
You should probably never need to write your own low-level parser. You have primitives like char_ from which to build up the parsers that you need. It is unlikely that you're going to need to do things on a lower level than a single character.
However. Some people are obsessed with writing everything for themselves.
We call them C++ programmers. This section is for them. However, this section
is not an in-depth tutorial. It is a basic orientation to get you familiar
enough with all the moving parts of writing a parser that you can then learn
by reading the Boost.Parser code.
Each parser must provide two overloads of a function call(). One overload parses, producing an attribute (which may be the special no-attribute type detail::nope). The other one parses, filling in a given attribute. The type of the given attribute is a template parameter, so it can take any type that you can form a reference to.
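The overload pair can be illustrated with a self-contained toy. toy_digit_parser is hypothetical, and its call() overloads drop the context, skip parser, and flags parameters that real Boost.Parser parsers take; the sketch only shows the shape: the attribute-generating overload delegates to the out-param overload, whose attribute type is a template parameter, and first is advanced only when the parse succeeds.

```cpp
#include <cassert>
#include <string>

// Hypothetical toy parser matching one decimal digit; not a Boost.Parser type.
struct toy_digit_parser
{
    // Attribute-generating overload: returns the attribute by value and
    // delegates the actual parsing to the out-param overload.
    template<typename Iter, typename Sentinel>
    char call(Iter & first, Sentinel last, bool & success) const
    {
        char retval = 0;
        call(first, last, success, retval);
        return retval;
    }

    // Out-param overload: Attribute is a template parameter, so any type
    // assignable from a char (char, int, std::string, ...) can be filled in.
    template<typename Iter, typename Sentinel, typename Attribute>
    void call(Iter & first, Sentinel last, bool & success, Attribute & retval) const
    {
        if (first == last || *first < '0' || '9' < *first) {
            success = false; // fail without advancing first
            return;
        }
        retval = *first;
        ++first;             // advance only on success
        success = true;
    }
};
```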
Let's take a look at a Boost.Parser parser, opt_parser. This is the parser produced by use of operator-. First, here is the beginning of its definition.
```cpp
template<typename Parser>
struct opt_parser
{
```
The end of its definition is:
```cpp
    Parser parser_;
};
```
As you can see, opt_parser's only data member is the parser it adapts, parser_. Here is its attribute-generating overload of call().
```cpp
template<
    typename Iter,
    typename Sentinel,
    typename Context,
    typename SkipParser>
auto call(
    Iter & first,
    Sentinel last,
    Context const & context,
    SkipParser const & skip,
    detail::flags flags,
    bool & success) const
{
    using attr_t = decltype(parser_.call(
        first, last, context, skip, flags, success));
    detail::optional_of<attr_t> retval;
    call(first, last, context, skip, flags, success, retval);
    return retval;
}
```
First, let's look at the template and function parameters.
- Iter & first is the iterator. It is taken as an out-param. It is the responsibility of call() to advance first if and only if the parse succeeds.
- Sentinel last is the sentinel. If the parse has not yet succeeded within call(), and first == last is true, call() must fail (by setting bool & success to false).
- Context const & context is the parse context. It will be some specialization of detail::parse_context. The context is used in any call to a subparser's call(), and in some cases a new context should be created, and the new context passed to a subparser instead; more on that below.
- SkipParser const & skip is the current skip parser. skip should be used at the beginning of the parse, and in between any two uses of any subparser(s).
- detail::flags flags is a collection of flags indicating various things about the current state of the parse. flags is concerned with whether to produce attributes at all; whether to apply the skip parser skip; whether to produce a verbose trace (as when boost::parser::trace::on is passed at the top level); and whether we are currently inside the utility function detail::apply_parser.
- bool & success is the final function parameter. It should be set to true if the parse succeeds, and false otherwise.
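As a rough model of the flags parameter described in the list above, its four concerns map naturally onto a bitmask. This is a hypothetical sketch: toy_flags, has, and without are not Boost.Parser names, and the real detail::flags may be represented differently.

```cpp
#include <cassert>

// Hypothetical bitmask covering the four concerns listed above.
enum class toy_flags : unsigned {
    none            = 0,
    gen_attrs       = 1u << 0, // produce attributes at all?
    use_skip        = 1u << 1, // apply the skip parser?
    trace           = 1u << 2, // produce a verbose trace?
    in_apply_parser = 1u << 3, // inside an apply_parser-style helper?
};

constexpr toy_flags operator|(toy_flags a, toy_flags b)
{
    return static_cast<toy_flags>(
        static_cast<unsigned>(a) | static_cast<unsigned>(b));
}

// True if every bit of `bit` is set in `f`.
constexpr bool has(toy_flags f, toy_flags bit)
{
    return (static_cast<unsigned>(f) & static_cast<unsigned>(bit)) ==
           static_cast<unsigned>(bit);
}

// Returns a copy of `f` with `bit` cleared; `f` itself is passed by value.
constexpr toy_flags without(toy_flags f, toy_flags bit)
{
    return static_cast<toy_flags>(
        static_cast<unsigned>(f) & ~static_cast<unsigned>(bit));
}
```

In this model, a directive that suppresses attributes could hand its subparser without(flags, toy_flags::gen_attrs), leaving the other concerns untouched.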
Now the body of the function. Notice that it just dispatches to the other call() overload. This is really common, since both overloads need to do the same parsing; only the attribute may differ. The first line of the body defines attr_t, the default attribute type of our wrapped parser parser_. It does this by getting the decltype() of a use of parser_.call(). (This is the logic represented by ATTR() in the rest of the documentation.) Since opt_parser represents an optional value, the natural type for its attribute is std::optional<ATTR(parser)>. However, this does not work for all cases. In particular, it does not work for the "no-attribute" type detail::nope, nor for std::optional<T>, since ATTR(--p) is just ATTR(-p). So, the second line uses an alias that takes care of those details, detail::optional_of<>. The third line just calls the other overload of call(), passing retval as the out-param. Finally, retval is returned on the last line.
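The type-level logic described for detail::optional_of<> can be sketched with a few specializations. This is a hypothetical re-creation for illustration: optional_of_impl, optional_of, and nope here are stand-ins, and Boost.Parser's actual alias may be defined differently.

```cpp
#include <cassert>
#include <optional>
#include <type_traits>

// Stand-in for detail::nope, the "no attribute" type.
struct nope {};

// General case: wrap the attribute in std::optional.
template<typename T>
struct optional_of_impl { using type = std::optional<T>; };

// No attribute in, no attribute out.
template<>
struct optional_of_impl<nope> { using type = nope; };

// Don't nest optionals: ATTR(--p) is the same as ATTR(-p).
template<typename T>
struct optional_of_impl<std::optional<T>> { using type = std::optional<T>; };

template<typename T>
using optional_of = typename optional_of_impl<T>::type;

static_assert(std::is_same_v<optional_of<int>, std::optional<int>>);
static_assert(std::is_same_v<optional_of<nope>, nope>);
static_assert(std::is_same_v<optional_of<std::optional<int>>, std::optional<int>>);
```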
Now, on to the other overload.
```cpp
template<
    typename Iter,
    typename Sentinel,
    typename Context,
    typename SkipParser,
    typename Attribute>
void call(
    Iter & first,
    Sentinel last,
    Context const & context,
    SkipParser const & skip,
    detail::flags flags,
    bool & success,
    Attribute & retval) const
{
    [[maybe_unused]] auto _ = detail::scoped_trace(
        *this, first, last, context, flags, retval);

    detail::skip(first, last, skip, flags);

    if (!detail::gen_attrs(flags)) {
        parser_.call(first, last, context, skip, flags, success);
        success = true;
        return;
    }

    parser_.call(first, last, context, skip, flags, success, retval);
    success = true;
}
```
The template and function parameters here are identical to the ones from the other overload, except that we have Attribute & retval, our out-param.
Let's look at the implementation a bit at a time.
```cpp
[[maybe_unused]] auto _ = detail::scoped_trace(
    *this, first, last, context, flags, retval);
```
This defines an RAII trace object that will produce the verbose trace requested by the user if they passed boost::parser::trace::on to the top-level parse. It only has effect if detail::enable_trace(flags) is true. If trace is enabled, it will show the state of the parse at the point at which it is defined, and then again when it goes out of scope.
Important: For the tracing code to work, you must define an overload of
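The RAII idea behind scoped_trace can be shown with a self-contained toy. toy_scoped_trace and toy_parse_a are hypothetical; the real object is given the iterators, context, flags, and attribute so it can show the state of the parse, while this sketch prints only a name on entry and whether the parse matched on exit.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical RAII trace object: prints on construction and again on
// destruction, but only when tracing is enabled.
class toy_scoped_trace
{
public:
    toy_scoped_trace(std::ostream & os, std::string name, bool enabled,
                     bool const & success)
        : os_(os), name_(std::move(name)), enabled_(enabled), success_(success)
    {
        if (enabled_)
            os_ << "[begin " << name_ << "]\n";
    }
    ~toy_scoped_trace()
    {
        if (enabled_)
            os_ << "[end " << name_ << (success_ ? " matched" : " failed") << "]\n";
    }
    toy_scoped_trace(toy_scoped_trace const &) = delete;
    toy_scoped_trace & operator=(toy_scoped_trace const &) = delete;

private:
    std::ostream & os_;
    std::string name_;
    bool enabled_;
    bool const & success_; // read at scope exit, after the parse has run
};

// Toy parse of a single 'a', using the trace object the way call() does above.
// `success` is declared first, so it outlives the trace object `_`.
inline bool toy_parse_a(char const *& p, bool trace, std::ostream & os)
{
    bool success = false;
    [[maybe_unused]] toy_scoped_trace _(os, "'a'", trace, success);
    if (*p == 'a') {
        ++p;
        success = true;
    }
    return success;
}
```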
```cpp
detail::skip(first, last, skip, flags);
```
This one is pretty simple; it just applies the skip parser. opt_parser only has one subparser, but if it had more than one, or if it had one that it applied more than once, it would need to repeat this line using skip between every pair of uses of any subparser.
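For a parser with two subparsers, that rule means applying the skip step before the first subparser and again between the two. A minimal sketch under stated assumptions: whitespace skipping stands in for an arbitrary skip parser, and toy_skip_ws, toy_lit, and toy_seq_ab are hypothetical helpers, not Boost.Parser functions.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Stand-in skip parser: consumes any run of whitespace.
template<typename Iter>
void toy_skip_ws(Iter & first, Iter last)
{
    while (first != last && std::isspace(static_cast<unsigned char>(*first)))
        ++first;
}

// Matches one expected character; advances only on success.
template<typename Iter>
bool toy_lit(Iter & first, Iter last, char c)
{
    if (first == last || *first != c)
        return false;
    ++first;
    return true;
}

// Parses 'a' then 'b', skipping at the start and between the two subparser
// uses. Work happens on a copy, so first is updated only if both match.
template<typename Iter>
bool toy_seq_ab(Iter & first, Iter last)
{
    Iter it = first;
    toy_skip_ws(it, last);     // skip at the beginning of the parse
    if (!toy_lit(it, last, 'a'))
        return false;
    toy_skip_ws(it, last);     // skip between the two subparser uses
    if (!toy_lit(it, last, 'b'))
        return false;
    first = it;                // commit only on success
    return true;
}
```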
```cpp
if (!detail::gen_attrs(flags)) {
    parser_.call(first, last, context, skip, flags, success);
    success = true;
    return;
}
```
This path accounts for the case where we don't want to generate attributes at all, perhaps because this parser sits inside an omit[] directive.
```cpp
parser_.call(first, last, context, skip, flags, success, retval);
success = true;
```
This is the other, typical, path. Here, we do want to generate attributes, and so we do the same call to parser_.call(), except that we also pass retval.
Note that we set success to true after the call to parser_.call() in both code paths. Since opt_parser is zero-or-one, if the subparser fails, opt_parser still succeeds.
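Putting the two code paths together, here is a self-contained toy version of the opt_parser pattern. It is a sketch, not Boost.Parser code: toy_digit and toy_opt are hypothetical, a plain bool gen_attrs stands in for detail::gen_attrs(flags), and the context, skip, and trace parts are left out.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical inner parser: matches one decimal digit.
struct toy_digit
{
    // No-attribute form: only matches, advancing first on success.
    template<typename Iter>
    void call(Iter & first, Iter last, bool & success) const
    {
        success = first != last && '0' <= *first && *first <= '9';
        if (success)
            ++first;
    }
    // Out-param form: matches and writes the digit's numeric value.
    template<typename Iter, typename Attribute>
    void call(Iter & first, Iter last, bool & success, Attribute & retval) const
    {
        if (first != last && '0' <= *first && *first <= '9') {
            retval = *first - '0';
            ++first;
            success = true;
        } else {
            success = false; // retval and first are left untouched
        }
    }
};

// Hypothetical zero-or-one wrapper mirroring the two paths shown above.
template<typename Parser>
struct toy_opt
{
    template<typename Iter, typename Attribute>
    void call(Iter & first, Iter last, bool gen_attrs, bool & success,
              Attribute & retval) const
    {
        if (!gen_attrs) {
            parser_.call(first, last, success);
            success = true; // a failed subparse still counts as a match
            return;
        }
        parser_.call(first, last, success, retval);
        success = true;
    }
    Parser parser_;
};
```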
Sometimes, you need to change something about the parse context before calling a subparser. For instance, rule_parser sets up the value, locals, etc., that are available for that rule. action_parser adds the generated attribute to the context (available as _attr(ctx)). Contexts are immutable in Boost.Parser. To "modify" one for a subparser, you create a new one with the appropriate call to detail::make_context().
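The "create a new context instead of mutating one" approach can be sketched as follows. toy_context, no_attr, with_attr, and read_attr are hypothetical stand-ins for detail::parse_context, detail::make_context(), and _attr(ctx), whose real definitions are not shown here; in this sketch the new context even has a different type from the old one.

```cpp
#include <cassert>

struct no_attr {};

// Hypothetical immutable context: it only holds pointers to outside state.
template<typename Attr>
struct toy_context
{
    int const * globals; // state carried along unchanged to subparsers
    Attr const * attr;   // what a semantic action would read
};

// Builds a new context that adds an attribute; `ctx` is never modified.
template<typename OldAttr, typename NewAttr>
toy_context<NewAttr> with_attr(toy_context<OldAttr> const & ctx, NewAttr const & a)
{
    return toy_context<NewAttr>{ctx.globals, &a};
}

// A toy "semantic action" reading the attribute from the context it is given.
template<typename Attr>
Attr read_attr(toy_context<Attr> const & ctx)
{
    return *ctx.attr;
}
```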
detail::apply_parser()
Sometimes a parser needs to operate on an out-param that is not exactly the same as its default attribute, but that is compatible in some way. To do this, it's often useful for the parser to call itself, but with slightly different parameters. detail::apply_parser() helps with this. See the out-param overload of repeat_parser::call() for an example. Note that since this creates a new scope for the ersatz parser, the scoped_trace object needs to know whether we're inside detail::apply_parser or not.
That's a lot, I know. Again, this section is not meant to be an in-depth tutorial. You know enough now that the parsers in parser.hpp are at least readable.