4.6 The Lexical Analyzer Function yylex

The lexical analyzer function, yylex, recognizes tokens from the input stream and returns them to the parser. Bison does not create this function automatically; you must write it so that yyparse can call it. The function is sometimes referred to as a lexical scanner.

In simple programs, yylex is often defined at the end of the Bison grammar file. If yylex is defined in a separate source file, you need to arrange for the token-type macro definitions to be available there. To do this, use the ‘-d’ option when you run Bison, so that it will write these macro definitions into the separate parser header file, ‘name.tab.h’, which you can include in the other source files that need it. See section Invoking Bison.


4.6.1 Calling Convention for yylex

The value that yylex returns must be the positive numeric code for the type of token it has just found; a zero or negative value signifies end-of-input.

When a token is referred to in the grammar rules by a name, that name in the parser implementation file becomes a C macro whose definition is the proper numeric code for that token type. So yylex can use the name to indicate that type. See section Symbols, Terminal and Nonterminal.

When a token is referred to in the grammar rules by a character literal, the numeric code for that character is also the code for the token type. So yylex can simply return that character code, possibly converted to unsigned char to avoid sign-extension. The null character must not be used this way, because its code is zero and that signifies end-of-input.

Here is an example showing these things:

 
int
yylex (void)
{
  …
  if (c == EOF)    /* Detect end-of-input.  */
    return 0;
  …
  if (c == '+' || c == '-')
    return c;      /* Assume token type for '+' is '+'.  */
  …
  return INT;      /* Return the type of the token.  */
  …
}
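The outline above can be filled in as a small, complete scanner. The following is only a sketch under stated assumptions: the `input` pointer is a hypothetical input source (a string scanned in place, whose end plays the role of EOF), and the code 258 for INT stands in for the macro that Bison would normally generate.

```c
#include <ctype.h>

/* Stand-in for the macro Bison would generate for a token named INT.
   The value 258 is an assumption made for this sketch.  */
enum { INT = 258 };

/* Hypothetical input source: a NUL-terminated string scanned in place.  */
static const char *input = "";

int
yylex (void)
{
  /* Skip blanks between tokens.  */
  while (*input == ' ' || *input == '\t')
    input++;

  /* The end of the string plays the role of EOF: report end-of-input.  */
  if (*input == '\0')
    return 0;

  /* A run of digits forms one INT token.  */
  if (isdigit ((unsigned char) *input))
    {
      while (isdigit ((unsigned char) *input))
        input++;
      return INT;
    }

  /* Any other character is its own token type; the conversion to
     unsigned char avoids sign-extension.  */
  return (unsigned char) *input++;
}
```

With `input` pointing at "12 + 3", successive calls return INT, '+', INT and then 0.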

This interface has been designed so that the output from the lex utility can be used without change as the definition of yylex.

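If the grammar uses literal string tokens, there are two ways that yylex can determine the token type codes for them: it can use symbolic token names that the grammar defines as aliases for the literal string tokens, or it can look up the multicharacter token in the yytname table, which Bison generates only when the grammar uses the %token-table declaration.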
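One of those approaches, searching the yytname table, can be sketched as follows. The table and YYNTOKENS below are stand-ins written by hand for this example (in a real parser Bison generates them, and only when %token-table is declared), and `find_string_token` is a hypothetical helper name. The sketch assumes the token text contains no characters, such as '"', that would need escaping in the table, and it stops at finding the table index; turning that index into the value yylex returns is left out.

```c
#include <string.h>

/* Hand-written stand-ins for what Bison generates with %token-table.
   The contents are invented for this sketch.  */
static const char *const yytname[] = {
  "$end", "error", "$undefined", "INT", "\"<=\"", "\">=\"", "'+'", 0
};
#define YYNTOKENS 7

/* Return the index in yytname of the literal string token spelled TEXT,
   or -1 if there is none.  A string token is recorded as a double-quote,
   the token's characters, and another double-quote.  */
static int
find_string_token (const char *text)
{
  size_t len = strlen (text);
  int i;

  for (i = 0; i < YYNTOKENS; i++)
    if (yytname[i] != 0
        && yytname[i][0] == '"'
        && strncmp (yytname[i] + 1, text, len) == 0
        && yytname[i][len + 1] == '"'
        && yytname[i][len + 2] == '\0')
      return i;
  return -1;
}
```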


4.6.2 Semantic Values of Tokens

In an ordinary (nonreentrant) parser, the semantic value of the token must be stored into the global variable yylval. When you are using just one data type for semantic values, yylval has that type. Thus, if the type is int (the default), you might write this in yylex:

 
  …
  yylval = value;  /* Put value onto Bison stack.  */
  return INT;      /* Return the type of the token.  */
  …

When you are using multiple data types, yylval’s type is a union made from the %union declaration (see section The Union Declaration). So when you store a token’s value, you must use the proper member of the union. If the %union declaration looks like this:

 
%union {
  int intval;
  double val;
  symrec *tptr;
}

then the code in yylex might look like this:

 
  …
  yylval.intval = value; /* Put value onto Bison stack.  */
  return INT;            /* Return the type of the token.  */
  …
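To make the choice of union member concrete, here is a sketch of a scanner that stores integers in intval and floating-point numbers in val. Several things are assumptions made for illustration: the token name NUM, the numeric codes 258 and 259, the `input` pointer, and the hand-written YYSTYPE and yylval, which a generated parser would normally provide from the %union declaration.

```c
#include <ctype.h>
#include <stdlib.h>

/* Stand-ins for the macros Bison would generate; the token name NUM
   and both codes are assumptions.  */
enum { INT = 258, NUM = 259 };

/* Stand-in for the type Bison derives from the %union shown above.
   symrec is left opaque because this sketch never uses tptr.  */
typedef struct symrec symrec;
typedef union
{
  int intval;
  double val;
  symrec *tptr;
} YYSTYPE;

YYSTYPE yylval;

/* Hypothetical input source: a NUL-terminated string.  */
static const char *input = "";

int
yylex (void)
{
  while (*input == ' ')
    input++;
  if (*input == '\0')
    return 0;

  if (isdigit ((unsigned char) *input))
    {
      const char *p = input;
      char *end;

      /* Look past the digits: a following '.' means a floating-point
         number, otherwise the token is an integer.  */
      while (isdigit ((unsigned char) *p))
        p++;
      if (*p == '.')
        {
          yylval.val = strtod (input, &end);   /* Use the double member.  */
          input = end;
          return NUM;
        }
      yylval.intval = (int) strtol (input, &end, 10);  /* Use the int member.  */
      input = end;
      return INT;
    }

  return (unsigned char) *input++;
}
```

The member written by yylex has to be the one the grammar associates with the returned token type; writing val and returning INT would leave the parser reading the wrong member.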

4.6.3 Textual Locations of Tokens

If you are using the ‘@n’ feature (see section Tracking Locations) in actions to keep track of the textual locations of tokens and groupings, then you must provide this information in yylex. The function yyparse expects to find the textual location of the token just parsed in the global variable yylloc, so yylex must store the proper data in that variable.

By default, the value of yylloc is a structure and you need only initialize the members that are going to be used by the actions. The four members are called first_line, first_column, last_line and last_column. Note that the use of this feature makes the parser noticeably slower.

The data type of yylloc has the name YYLTYPE.
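As an illustration, a scanner might maintain all four members as it consumes characters. This is a sketch only: the YYLTYPE structure and yylloc below are hand-written stand-ins for what the generated parser provides, the `input` pointer and the code 258 for INT are assumptions, and the convention that last_column names the column just past the token is a choice made for this example.

```c
#include <ctype.h>

enum { INT = 258 };   /* Stand-in for the Bison-generated macro.  */

/* Stand-in for the default location type, with the four members
   named in the text.  */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

YYLTYPE yylloc = { 1, 1, 1, 1 };

/* Hypothetical input source: a NUL-terminated string.  */
static const char *input = "";

int
yylex (void)
{
  /* Skip white space, advancing the current position.  A tab counts
     as one column in this sketch.  */
  for (;;)
    {
      if (*input == '\n')
        {
          yylloc.last_line++;
          yylloc.last_column = 1;
        }
      else if (*input == ' ' || *input == '\t')
        yylloc.last_column++;
      else
        break;
      input++;
    }

  /* The new token starts at the current position.  */
  yylloc.first_line = yylloc.last_line;
  yylloc.first_column = yylloc.last_column;

  if (*input == '\0')
    return 0;

  if (isdigit ((unsigned char) *input))
    {
      while (isdigit ((unsigned char) *input))
        {
          input++;
          yylloc.last_column++;
        }
      return INT;
    }

  yylloc.last_column++;
  return (unsigned char) *input++;
}
```

For the input "12 +", the first call leaves first_column at 1 and last_column at 3, and the second call reports the '+' as starting at column 4.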


4.6.4 Calling Conventions for Pure Parsers

When you use the Bison declaration %define api.pure full to request a pure, reentrant parser, the global communication variables yylval and yylloc cannot be used. (See section A Pure (Reentrant) Parser.) In such parsers the two global variables are replaced by pointers passed as arguments to yylex. You must declare them as shown here, and pass the information back by storing it through those pointers.

 
int
yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
{
  …
  *lvalp = value;  /* Put value onto Bison stack.  */
  return INT;      /* Return the type of the token.  */
  …
}
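The point of this convention is that the caller owns the storage and yylex only writes through the pointers it receives. The sketch below shows both sides under several assumptions: YYSTYPE is plain int, YYLTYPE is a hand-written stand-in for the default location type, the code 258 for INT is invented, and `sum_int_tokens` is a hypothetical driver that merely imitates the way a parser would call yylex. The static `input` pointer is kept only for brevity; a scanner meant to be fully reentrant would receive its input through an extra argument instead.

```c
#include <ctype.h>

enum { INT = 258 };   /* Stand-in for the Bison-generated macro.  */

typedef int YYSTYPE;  /* Single semantic type, as in the example above.  */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

/* Hypothetical input source, shared here only to keep the sketch short.  */
static const char *input = "";

int
yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
{
  while (*input == ' ')
    {
      input++;
      llocp->last_column++;
    }
  llocp->first_line = llocp->last_line;
  llocp->first_column = llocp->last_column;

  if (*input == '\0')
    return 0;

  if (isdigit ((unsigned char) *input))
    {
      int value = 0;
      while (isdigit ((unsigned char) *input))
        {
          value = value * 10 + (*input++ - '0');
          llocp->last_column++;
        }
      *lvalp = value;          /* Stored in the caller's variable.  */
      return INT;
    }

  llocp->last_column++;
  return (unsigned char) *input++;
}

/* Hypothetical driver: calls yylex the way a pure parser would, with
   pointers to its own local variables, and adds up the INT values.  */
int
sum_int_tokens (const char *text)
{
  YYSTYPE lval;
  YYLTYPE lloc = { 1, 1, 1, 1 };
  int tok, sum = 0;

  input = text;
  while ((tok = yylex (&lval, &lloc)) > 0)
    if (tok == INT)
      sum += lval;
  return sum;
}
```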

If the grammar file does not use the ‘@’ constructs to refer to textual locations, then the type YYLTYPE will not be defined. In this case, omit the second argument; yylex will be called with only one argument.

If you wish to pass additional arguments to yylex, use %lex-param just like %parse-param (see section The Parser Function yyparse). To pass additional arguments to both yylex and yyparse, use %param.

Directive: %lex-param {argument-declaration} …

Specify that each argument-declaration is an additional yylex argument declaration. You may pass one or more such declarations, which is equivalent to repeating %lex-param.

Directive: %param {argument-declaration} …

Specify that each argument-declaration is an additional argument declaration for both yylex and yyparse. This is equivalent to ‘%lex-param {argument-declaration} … %parse-param {argument-declaration} …’. You may pass one or more declarations, which is equivalent to repeating %param.

For instance:

 
%lex-param   {scanner_mode *mode}
%parse-param {parser_mode *mode}
%param       {environment_type *env}

results in the following signatures:

 
int yylex   (scanner_mode *mode, environment_type *env);
int yyparse (parser_mode *mode, environment_type *env);

If ‘%define api.pure full’ is added:

 
int yylex   (YYSTYPE *lvalp, scanner_mode *mode, environment_type *env);
int yyparse (parser_mode *mode, environment_type *env);

and finally, if both ‘%define api.pure full’ and %locations are used:

 
int yylex   (YYSTYPE *lvalp, YYLTYPE *llocp,
             scanner_mode *mode, environment_type *env);
int yyparse (parser_mode *mode, environment_type *env);
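To show what the extra arguments can be used for, here is a sketch of a yylex matching the second signature above (‘%define api.pure full’ without %locations). The manual names the types scanner_mode and environment_type but not their contents, so the definitions below are purely hypothetical: the mode selects the numeric base, and the environment carries the input position so that the scanner keeps no global state. The helper `digit_value` and the code 258 for INT are likewise assumptions.

```c
#include <ctype.h>

enum { INT = 258 };   /* Stand-in for the Bison-generated macro.  */
typedef int YYSTYPE;

/* Hypothetical contents for the two types named in the directives.  */
typedef struct { int base; } scanner_mode;              /* 10 or 16.  */
typedef struct { const char *cursor; } environment_type;

/* Return the value of digit C in BASE, or -1 if C is not such a digit.
   Assumes the letters 'a' to 'f' are contiguous, as in ASCII.  */
static int
digit_value (int c, int base)
{
  int v;
  if (isdigit (c))
    v = c - '0';
  else if (isxdigit (c))
    v = tolower (c) - 'a' + 10;
  else
    return -1;
  return v < base ? v : -1;
}

int
yylex (YYSTYPE *lvalp, scanner_mode *mode, environment_type *env)
{
  int d;

  while (*env->cursor == ' ')
    env->cursor++;
  if (*env->cursor == '\0')
    return 0;

  if (digit_value ((unsigned char) *env->cursor, mode->base) >= 0)
    {
      int value = 0;
      while ((d = digit_value ((unsigned char) *env->cursor,
                               mode->base)) >= 0)
        {
          value = value * mode->base + d;
          env->cursor++;
        }
      *lvalp = value;
      return INT;
    }

  return (unsigned char) *env->cursor++;
}
```

Because all scanner state lives in the arguments, two scans with different modes and inputs can be interleaved without interfering with each other.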

This document was generated by Rick Perry on December 29, 2013 using texi2html 1.82.