Parser generator

Invocation

Tool parser_gen transform the syntax definition as standard input into a a C++ generated code onto the standard output.

parser_gen <syntax_def.csf >syntax.hh

Using GNU Automake

TODO.

Grammar

Lexical part

Space, horizontal tabulation, new line, carriage return characters are ignored. Also double slash ("//") ignores the end of the line.

SpecialChar ::= "\\" | "n" | "r" | "t" | "\"" | "-" | "]"

Char ::= [^\]\\\-] | "\\" SpecialChar
Range ::= Char "-" Char
CharClass ::= "[" (Range | Char)+ "]"
            | "[" "^"  (Range | Char)+ "]"

StringChar ::= [^\"\\] | "\\" SpecialChar
String ::= "\"" StringChar* "\""

Symbol ::= ([A-Z] | [a-z] | "*" | "?" | "+" | "-" | "_")+

AttrRule ::= "$" CxxCode "$"

Syntactic part

A syntax definition consists on a list of rules. Rules are defined as such:

Rule ::=
  ("<" AttrChainElt ">")? Symbol "::=" ListOfChains Filtering?

ListOfChains ::= {(AttrChainElt* AttrRule?) "|"}+

AttrChainElt ::= AttrRule? ChainElt

ChainElt ::= Symbol | String | CharClass

Layout

Lexical restrictions

Filtering ::= "-/-" LookAhead+

User code

The attribute rules (AttrRule) on the left of a parsing element (symbol, string, or character class) are prior to the descent onto parsing this element.

It must define next_frame of type convertible to type aurelia::llstack::frame.

The default copies the current frame:

$ frame next_frame(f); $

The frame should never be modified in case of left recursion, otherwise it breaks the parsing algorithm and does not halt.

The last attribute rule must define ret. If there is no return value, it must define it as type aurelia::llstack::void_return.

void_return ret;

If the code decides to reject the current parsing, then the code must return.

Frames

Aurelia provides template aurelia::llstack::ret_frame to stack easily data to pass through rules. The data must be hash-able and movable or copyable.

Local variables available

  • f is the current frame.
  • s is the current stream.

Types

Type stream must provide operators "++" and "*". Aurelia provides aurelia::buf_stream_iterator.

Writing a parser

// We first check that the stream has the right first character.
if (s.eof()) {
  if (!is_empty_first(S_0_0))
    throw error();
}
else if (!is_first(s, S_0_0)) {
  throw error();
}

// We then define the boundary nodes of the graph.
//   end is the end iterator of the stream.
node u1 = node::boundary(start_label(), end);
node u0 = node::boundary(end_label(), end);
u1.parents().insert(u0);

// We define a queue of parsing contexts.
r_t R;

// We then add the start symbol in the queue.
S start;
start(R, empty_frame(), s, u1);

// We then loop requesting descriptors from the queue, then calling
// them.
try {
  while (true) {
    descriptor d = R.pop();
    d.L.call(R, d.f, d.s, d.n);
  }
} catch (r_t::eof) {
}