Bison 3.0.2: 4.6 The Lexical Analyzer Function yylex

4.6 The Lexical Analyzer Function `yylex`

The lexical analyzer function, yylex, recognizes tokens from the input stream and returns them to the parser. Bison does not create this function automatically; you must write it so that yyparse can call it. The function is sometimes referred to as a lexical scanner.

In simple programs, yylex is often defined at the end of the Bison grammar file. If yylex is defined in a separate source file, you need to arrange for the token-type macro definitions to be available there. To do this, use the ‘-d’ option when you run Bison, so that it will write these macro definitions into the separate parser header file, ‘name.tab.h’, which you can include in the other source files that need it. See section Invoking Bison.

4.6.1 Calling Convention for `yylex`		How `yyparse` calls `yylex`.
4.6.2 Semantic Values of Tokens		How `yylex` must return the semantic value of the token it has read.
4.6.3 Textual Locations of Tokens		How `yylex` must return the text location (line number, etc.) of the token, if the actions want that.
4.6.4 Calling Conventions for Pure Parsers		How the calling convention differs in a pure parser (see section A Pure (Reentrant) Parser).


4.6.1 Calling Convention for `yylex`

The value that yylex returns must be the positive numeric code for the type of token it has just found; a zero or negative value signifies end-of-input.

When a token is referred to in the grammar rules by a name, that name in the parser implementation file becomes a C macro whose definition is the proper numeric code for that token type. So yylex can use the name to indicate that type. See section Symbols, Terminal and Nonterminal.

When a token is referred to in the grammar rules by a character literal, the numeric code for that character is also the code for the token type. So yylex can simply return that character code, possibly converted to unsigned char to avoid sign-extension. The null character must not be used this way, because its code is zero and that signifies end-of-input.

Here is an example showing these things:

int
yylex (void)
{
  …
  if (c == EOF)    /* Detect end-of-input.  */
    return 0;
  …
  if (c == '+' || c == '-')
    return c;      /* Assume token type for '+' is '+'.  */
  …
  return INT;      /* Return the type of the token.  */
  …
}

This interface has been designed so that the output from the lex utility can be used without change as the definition of yylex.
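As a fuller illustration of the calling convention, here is a minimal hand-written sketch that can be compiled on its own. It makes several assumptions that are not part of Bison: the token code INT is given a made-up value (in a real project it comes from the parser implementation file or from ‘name.tab.h’), and the input is read from a string through the hypothetical helper set_input rather than from a file.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Assumption: stand-in for the macro Bison would generate for INT.  */
enum { INT = 258 };

/* Hypothetical input source: a null-terminated string.  */
static const char *input_cursor = "";

void
set_input (const char *text)
{
  assert (text != NULL);
  input_cursor = text;
}

int
yylex (void)
{
  /* Skip blanks between tokens.  */
  while (*input_cursor == ' ' || *input_cursor == '\t')
    input_cursor++;

  /* Detect end-of-input.  */
  if (*input_cursor == '\0')
    return 0;

  /* A run of digits is one INT token.  */
  if (isdigit ((unsigned char) *input_cursor))
    {
      while (isdigit ((unsigned char) *input_cursor))
        input_cursor++;
      return INT;
    }

  /* Any other character is its own token type; the conversion
     to unsigned char avoids sign-extension.  */
  return (unsigned char) *input_cursor++;
}
```

Once the input is exhausted, every further call keeps returning zero, so the parser sees end-of-input no matter how often it asks.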

If the grammar uses literal string tokens, there are two ways that yylex can determine the token type codes for them:

- If the grammar defines symbolic token names as aliases for the literal string tokens, yylex can use these symbolic names like all others. In this case, the use of the literal string tokens in the grammar file has no effect on yylex.

- yylex can find the multicharacter token in the yytname table. The index of the token in the table is the token type’s code. The name of a multicharacter token is recorded in yytname with a double-quote, the token’s characters, and another double-quote. The token’s characters are escaped as necessary to be suitable as input to Bison.

Here’s code for looking up a multicharacter token in yytname, assuming that the characters of the token are stored in token_buffer, and assuming that the token does not contain any characters like ‘"’ that require escaping.

for (i = 0; i < YYNTOKENS; i++)
  {
    if (yytname[i] != 0
        && yytname[i][0] == '"'
        && ! strncmp (yytname[i] + 1, token_buffer,
                      strlen (token_buffer))
        && yytname[i][strlen (token_buffer) + 1] == '"'
        && yytname[i][strlen (token_buffer) + 2] == 0)
      break;
  }

The yytname table is generated only if you use the %token-table declaration. See section Bison Declaration Summary.
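The lookup loop can be wrapped in a function so that a failed search is easy to detect. The sketch below is only an illustration: the yytname table and the YYNTOKENS value are small hand-written stand-ins (a real table is emitted by Bison and is larger), and find_string_token is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>

/* Assumption: a hand-written stand-in for the table Bison emits
   when %token-table is used.  */
#define YYNTOKENS 6
static const char *const yytname[] =
{
  "$end", "error", "$undefined", "INT", "\"<=\"", "\"==\"", 0
};

/* Return the index in yytname of the literal string token whose
   characters are in token_buffer, or -1 if there is no such entry.
   The token must not contain characters that need escaping.  */
int
find_string_token (const char *token_buffer)
{
  size_t len;
  int i;

  assert (token_buffer != NULL);
  len = strlen (token_buffer);
  for (i = 0; i < YYNTOKENS; i++)
    {
      if (yytname[i] != 0
          && yytname[i][0] == '"'
          && ! strncmp (yytname[i] + 1, token_buffer, len)
          && yytname[i][len + 1] == '"'
          && yytname[i][len + 2] == 0)
        return i;
    }
  return -1;
}
```

Computing the length once avoids calling strlen repeatedly inside the loop, and the -1 result separates "not found" from any valid index.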


4.6.2 Semantic Values of Tokens

In an ordinary (nonreentrant) parser, the semantic value of the token must be stored into the global variable yylval. When you are using just one data type for semantic values, yylval has that type. Thus, if the type is int (the default), you might write this in yylex:

  …
  yylval = value;  /* Put value onto Bison stack.  */
  return INT;      /* Return the type of the token.  */
  …

When you are using multiple data types, yylval’s type is a union made from the %union declaration (see section The Union Declaration). So when you store a token’s value, you must use the proper member of the union. If the %union declaration looks like this:

%union {
  int intval;
  double val;
  symrec *tptr;
}

then the code in yylex might look like this:

  …
  yylval.intval = value; /* Put value onto Bison stack.  */
  return INT;            /* Return the type of the token.  */
  …
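To make the choice of union member concrete, the following sketch stores either an integer or a floating-point value depending on the text of the token. It rests on assumptions: YYSTYPE, yylval and the token codes are declared by hand here (Bison normally generates them), FLOAT is a hypothetical second token type, and token_from_text is a hypothetical helper that yylex might call once it has isolated a lexeme.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: stand-ins for the declarations Bison would generate.  */
typedef struct symrec symrec;
typedef union
{
  int intval;
  double val;
  symrec *tptr;
} YYSTYPE;
enum { INT = 258, FLOAT = 259 };
YYSTYPE yylval;

/* Classify TEXT, store its value in the matching member of yylval,
   and return the token type; return 0 if TEXT is not a number.  */
int
token_from_text (const char *text)
{
  char *end;
  long ivalue;
  double dvalue;

  assert (text != NULL);

  ivalue = strtol (text, &end, 10);
  if (end != text && *end == '\0')
    {
      yylval.intval = (int) ivalue;  /* Integer: use intval.  */
      return INT;
    }

  dvalue = strtod (text, &end);
  if (end != text && *end == '\0')
    {
      yylval.val = dvalue;           /* Floating point: use val.  */
      return FLOAT;
    }

  return 0;
}
```

The member that is written must be the one the grammar associates with the returned token type; storing into val and returning INT would make the parser read the wrong member.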


4.6.3 Textual Locations of Tokens

If you are using the ‘@n’ feature (see section Tracking Locations) in actions to keep track of the textual locations of tokens and groupings, then you must provide this information in yylex. The function yyparse expects to find the textual location of the token just read in the global variable yylloc, so yylex must store the proper data in that variable.

By default, the value of yylloc is a structure and you need only initialize the members that are going to be used by the actions. The four members are called first_line, first_column, last_line and last_column. Note that the use of this feature makes the parser noticeably slower.

The data type of yylloc has the name YYLTYPE.
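One possible way for yylex to keep yylloc up to date is sketched below. The structure is declared by hand to mirror the four default members (Bison normally supplies the definition), update_location is a hypothetical helper, and the convention that last_column points one past the end of the token is an assumption of this sketch rather than a requirement.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: mirrors the default members of YYLTYPE.  */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;
YYLTYPE yylloc = { 1, 1, 1, 1 };

/* Advance yylloc over the LENGTH characters of TEXT that make up
   the token just matched.  */
void
update_location (const char *text, size_t length)
{
  size_t i;

  assert (text != NULL);

  /* The new token starts where the previous one ended.  */
  yylloc.first_line = yylloc.last_line;
  yylloc.first_column = yylloc.last_column;

  for (i = 0; i < length; i++)
    {
      if (text[i] == '\n')
        {
          yylloc.last_line++;
          yylloc.last_column = 1;
        }
      else
        yylloc.last_column++;
    }
}
```

A lexer would call such a helper for skipped white space as well as for tokens, so that the columns stay in step with the input.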


4.6.4 Calling Conventions for Pure Parsers

When you use the Bison declaration %define api.pure full to request a pure, reentrant parser, the global communication variables yylval and yylloc cannot be used. (See section A Pure (Reentrant) Parser.) In such parsers the two global variables are replaced by pointers passed as arguments to yylex. You must declare them as shown here, and pass the information back by storing it through those pointers.

int
yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
{
  …
  *lvalp = value;  /* Put value onto Bison stack.  */
  return INT;      /* Return the type of the token.  */
  …
}

If the grammar file does not use the ‘@’ constructs to refer to textual locations, then the type YYLTYPE will not be defined. In this case, omit the second argument; yylex will be called with only one argument.

If you wish to pass additional arguments to yylex, use %lex-param just like %parse-param (see section The Parser Function yyparse). To pass additional arguments to both yylex and yyparse, use %param.

Directive: %lex-param {argument-declaration} …: Specify that argument-declaration are additional yylex argument declarations. You may pass one or more such declarations, which is equivalent to repeating %lex-param.

Directive: %param {argument-declaration} …: Specify that argument-declaration are additional yylex/yyparse argument declarations. This is equivalent to ‘%lex-param {argument-declaration} … %parse-param {argument-declaration} …’. You may pass one or more declarations, which is equivalent to repeating %param.

For instance:

%lex-param   {scanner_mode *mode}
%parse-param {parser_mode *mode}
%param       {environment_type *env}

results in the following signatures:

int yylex   (scanner_mode *mode, environment_type *env);
int yyparse (parser_mode *mode, environment_type *env);

If ‘%define api.pure full’ is added:

int yylex   (YYSTYPE *lvalp, scanner_mode *mode, environment_type *env);
int yyparse (parser_mode *mode, environment_type *env);

and finally, if both ‘%define api.pure full’ and %locations are used:

int yylex   (YYSTYPE *lvalp, YYLTYPE *llocp,
             scanner_mode *mode, environment_type *env);
int yyparse (parser_mode *mode, environment_type *env);
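To show how such a signature might be filled in, here is a minimal sketch of a pure yylex that takes the value pointer, the location pointer, and one extra argument. Everything outside the signature is an assumption: YYSTYPE, YYLTYPE and INT are declared by hand instead of being generated by Bison, and the members of scanner_mode are hypothetical, chosen so that all scanner state lives in the argument rather than in global variables.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Assumption: stand-ins for the definitions Bison would generate.  */
typedef int YYSTYPE;
typedef struct
{
  int first_line, first_column, last_line, last_column;
} YYLTYPE;
enum { INT = 258 };

/* Hypothetical contents of the %lex-param argument.  */
typedef struct
{
  const char *cursor;  /* Next unread character.  */
  int line;            /* Current line, starting at 1.  */
  int column;          /* Current column, starting at 1.  */
} scanner_mode;

int
yylex (YYSTYPE *lvalp, YYLTYPE *llocp, scanner_mode *mode)
{
  assert (lvalp != NULL && llocp != NULL && mode != NULL);

  /* Skip blanks.  */
  while (*mode->cursor == ' ')
    {
      mode->cursor++;
      mode->column++;
    }

  llocp->first_line = llocp->last_line = mode->line;
  llocp->first_column = mode->column;

  if (*mode->cursor == '\0')      /* End-of-input.  */
    {
      llocp->last_column = mode->column;
      return 0;
    }

  if (isdigit ((unsigned char) *mode->cursor))
    {
      int value = 0;
      while (isdigit ((unsigned char) *mode->cursor))
        {
          value = value * 10 + (*mode->cursor++ - '0');
          mode->column++;
        }
      *lvalp = value;             /* Pass the value back.  */
      llocp->last_column = mode->column;
      return INT;
    }

  mode->column++;
  llocp->last_column = mode->column;
  return (unsigned char) *mode->cursor++;
}
```

Because no global variable is read or written, two parsers can scan different inputs at the same time, each with its own scanner_mode object.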


This document was generated by Rick Perry on December 29, 2013 using texi2html 1.82.
