19. Reentrant C Scanners

flex has the ability to generate a reentrant C scanner. This is accomplished by specifying %option reentrant (‘-R’). The generated scanner is both portable and safe to use in one or more separate threads of control. The most common use for reentrant scanners is from within multi-threaded applications. Any thread may create and execute a reentrant flex scanner without the need for synchronization with other threads.


19.1 Uses for Reentrant Scanners

However, there are other uses for a reentrant scanner. For example, you could scan two or more files simultaneously to implement a diff at the token level (i.e., instead of at the character level):

 
    /* Example of maintaining more than one active scanner. */

    int tok1, tok2;   /* declared outside the loop so the condition can see them */

    do {
        tok1 = yylex( scanner_1 );
        tok2 = yylex( scanner_2 );

        if( tok1 != tok2 )
            printf("Files are different.\n");

    } while ( tok1 && tok2 );
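
The same idea can be tried without generating a scanner at all. The following is only a sketch: mini_scanner, mini_init, mini_lex and same_tokens are hypothetical names, and the hand-written whitespace tokenizer is a stand-in for a flex-generated yylex. It shows why keeping all state in a per-scanner object lets two scans advance in lockstep.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for a reentrant scanner: all state lives in the
   struct, so any number of instances can be active at the same time. */
struct mini_scanner {
    const char *pos;    /* next unread character */
    const char *text;   /* start of the current token */
    size_t leng;        /* length of the current token */
};

void mini_init(struct mini_scanner *s, const char *input)
{
    s->pos = input;
    s->text = input;
    s->leng = 0;
}

/* Returns 1 after reading a whitespace-delimited token, 0 at end of input. */
int mini_lex(struct mini_scanner *s)
{
    while (*s->pos != '\0' && isspace((unsigned char)*s->pos))
        s->pos++;
    if (*s->pos == '\0')
        return 0;
    s->text = s->pos;
    while (*s->pos != '\0' && !isspace((unsigned char)*s->pos))
        s->pos++;
    s->leng = (size_t)(s->pos - s->text);
    return 1;
}

/* Returns 1 if both inputs yield the same token sequence, otherwise 0. */
int same_tokens(const char *a, const char *b)
{
    struct mini_scanner s1, s2;
    int t1, t2;

    mini_init(&s1, a);
    mini_init(&s2, b);
    do {
        t1 = mini_lex(&s1);
        t2 = mini_lex(&s2);
        if (t1 != t2)
            return 0;   /* one input ran out of tokens first */
        if (t1 && (s1.leng != s2.leng ||
                   memcmp(s1.text, s2.text, s1.leng) != 0))
            return 0;   /* token text differs */
    } while (t1 && t2);
    return 1;
}
```

Because neither scanner touches global state, same_tokens could itself be called from several threads at once, each call with its own pair of scanners.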

Another use for a reentrant scanner is recursion. (Note that a recursive scanner can also be created using a non-reentrant scanner and buffer states. See section Multiple Input Buffers.)

The following crude scanner supports the ‘eval’ command by invoking another instance of itself.

 
    /* Example of recursive invocation. */

    %option reentrant

    %%
    "eval(".+")"  {
                      yyscan_t scanner;
                      YY_BUFFER_STATE buf;

                      yylex_init( &scanner );
                      yytext[yyleng-1] = ' ';

                      buf = yy_scan_string( yytext + 5, scanner );
                      yylex( scanner );

                      yy_delete_buffer(buf,scanner);
                      yylex_destroy( scanner );
                 }
    ...
    %%

19.2 An Overview of the Reentrant API

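The API for reentrant scanners is different from the API for non-reentrant scanners. In brief: %option reentrant must be specified; all functions take one additional argument, yyscanner; all global variables are replaced by macro equivalents; yylex_init and yylex_destroy must be called before and after yylex, respectively; accessor methods (get/set functions) provide access to common flex variables; and user-specific data can be stored in yyextra.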


19.3 Reentrant Example

First, an example of a reentrant scanner:

 
    /* This scanner prints "//" comments. */

    %option reentrant stack noyywrap
    %x COMMENT

    %%

    "//"                 yy_push_state( COMMENT, yyscanner);
    .|\n

    <COMMENT>\n          yy_pop_state( yyscanner );
    <COMMENT>[^\n]+      fprintf( yyout, "%s\n", yytext);

    %%

    int main ( int argc, char * argv[] )
    {
        yyscan_t scanner;

        yylex_init ( &scanner );
        yylex ( scanner );
        yylex_destroy ( scanner );
        return 0;
    }

19.4 The Reentrant API in Detail

Here are the things you need to do or know to use the reentrant C API of flex.


19.4.1 Declaring a Scanner As Reentrant

%option reentrant (‘--reentrant’) must be specified.

Notice that %option reentrant is specified in the above example (see section Reentrant Example). Had this option not been specified, flex would have happily generated a non-reentrant scanner without complaining. You may explicitly specify %option noreentrant if you do not want a reentrant scanner, although it is not necessary. The default is to generate a non-reentrant scanner.


19.4.2 The Extra Argument

All functions take one additional argument: yyscanner.

Notice that the calls to yy_push_state and yy_pop_state both have an argument, yyscanner, that is not present in a non-reentrant scanner. Here are the declarations of yy_push_state and yy_pop_state in the reentrant scanner:

 
    static void yy_push_state  ( int new_state , yyscan_t yyscanner ) ;
    static void yy_pop_state  ( yyscan_t yyscanner  ) ;

Notice that the argument yyscanner appears in the declaration of both functions. In fact, all flex functions in a reentrant scanner have this additional argument. It is always the last argument in the argument list, it is always of type yyscan_t (which is typedef’d to void *) and it is always named yyscanner. As you may have guessed, yyscanner is a pointer to an opaque data structure encapsulating the current state of the scanner. For a list of function declarations, see Functions and Macros Available in Reentrant C Scanners. Note that preprocessor macros, such as BEGIN, ECHO, and REJECT, do not take this additional argument.
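
To make the convention concrete, here is a minimal sketch of the opaque-handle pattern in plain C. Everything in it is hypothetical (the demo_ names, the struct layout, the fixed stack depth); it is not flex's actual internal structure, only an illustration of a void * handle passed as the last argument of every function.

```c
#include <stdlib.h>

typedef void *demo_scan_t;   /* opaque handle, in the spirit of yyscan_t */

/* Private state; callers only ever hold a demo_scan_t. */
struct demo_guts {
    int stack[16];     /* saved start states */
    int depth;         /* number of saved states */
    int start_state;   /* current start state */
};

/* Fills in *out with a fresh scanner; returns 0 on success, 1 on failure. */
int demo_init(demo_scan_t *out)
{
    struct demo_guts *g = calloc(1, sizeof *g);
    if (g == NULL)
        return 1;
    *out = g;
    return 0;
}

void demo_destroy(demo_scan_t demoscanner)
{
    free(demoscanner);
}

/* The handle is always the last argument. Returns 1 if the stack is full. */
int demo_push_state(int new_state, demo_scan_t demoscanner)
{
    struct demo_guts *g = demoscanner;
    if (g->depth == 16)
        return 1;
    g->stack[g->depth++] = g->start_state;
    g->start_state = new_state;
    return 0;
}

void demo_pop_state(demo_scan_t demoscanner)
{
    struct demo_guts *g = demoscanner;
    if (g->depth > 0)
        g->start_state = g->stack[--g->depth];
}

int demo_current_state(demo_scan_t demoscanner)
{
    return ((struct demo_guts *)demoscanner)->start_state;
}
```

Two handles created with demo_init keep separate state stacks, so pushing a state on one has no effect on the other.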


19.4.3 Global Variables Replaced By Macros

All global variables in traditional flex have been replaced by macro equivalents.

Note that in the above example, yyout and yytext are not plain variables. These are macros that will expand to their equivalent lvalue. All of the familiar flex globals have been replaced by their macro equivalents. In particular, yytext, yyleng, yylineno, yyin, yyout, yyextra, yylval, and yylloc are macros. You may safely use these macros in actions as if they were plain variables. We only tell you this so you don’t expect to link to these variables externally. Currently, each macro expands to a member of an internal struct, e.g.,

 
#define yytext (((struct yyguts_t*)yyscanner)->yytext_r)

One important thing to remember about yytext and friends is that yytext is not a global variable in a reentrant scanner, so you cannot access it directly from outside an action or from other functions. You must use an accessor method, e.g., yyget_text, to accomplish this (see below).
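
The following sketch shows the mechanism in isolation. The toy_ names and the two-member struct are assumptions made for illustration and do not reflect the real yyguts_t layout. The point is that such a macro only compiles where a variable with the expected name (here toyscanner) is in scope, which is why code elsewhere needs an accessor function.

```c
#include <stddef.h>

typedef void *toy_scan_t;

/* Hypothetical internal state, standing in for flex's private struct. */
struct toy_guts {
    char *text_r;
    int leng_r;
};

/* These macros are usable only where a variable named toyscanner exists. */
#define toytext (((struct toy_guts *)toyscanner)->text_r)
#define toyleng (((struct toy_guts *)toyscanner)->leng_r)

/* Plays the role of an action: toyscanner is in scope, so the macros
   behave like ordinary variables and can even be assigned to. */
void toy_action(char *match, int len, toy_scan_t toyscanner)
{
    toytext = match;
    toyleng = len;
}

/* Accessors for code that holds the handle under some other name. */
char *toy_get_text(toy_scan_t scanner)
{
    return ((struct toy_guts *)scanner)->text_r;
}

int toy_get_leng(toy_scan_t scanner)
{
    return ((struct toy_guts *)scanner)->leng_r;
}
```

Writing toytext inside toy_get_text would fail to compile, because that function's parameter is named scanner rather than toyscanner.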


19.4.4 Init and Destroy Functions

yylex_init and yylex_destroy must be called before and after yylex, respectively.

 
    int yylex_init ( yyscan_t * ptr_yy_globals ) ;
    int yylex_init_extra ( YY_EXTRA_TYPE user_defined, yyscan_t * ptr_yy_globals ) ;
    int yylex ( yyscan_t yyscanner ) ;
    int yylex_destroy ( yyscan_t yyscanner ) ;

The function yylex_init must be called before calling any other function. The argument to yylex_init is the address of an uninitialized pointer to be filled in by yylex_init, overwriting any previous contents. The function yylex_init_extra may be used instead, taking as its first argument a variable of type YY_EXTRA_TYPE. See the section on yyextra, below, for more details.

The value stored in ptr_yy_globals should thereafter be passed to yylex and yylex_destroy. Flex does not save the argument passed to yylex_init, so it is safe to pass the address of a local pointer to yylex_init so long as it remains in scope for the duration of all calls to the scanner, up to and including the call to yylex_destroy.

The function yylex should be familiar to you by now. The reentrant version takes one argument, which is the value returned (via an argument) by yylex_init. Otherwise, it behaves the same as the non-reentrant version of yylex.

Both yylex_init and yylex_init_extra return 0 (zero) on success, or non-zero on failure, in which case errno is set to ENOMEM (memory allocation error) or EINVAL (invalid argument).

The function yylex_destroy should be called to free resources used by the scanner. After yylex_destroy is called, the contents of yyscanner should not be used. Of course, there is no need to destroy a scanner if you plan to reuse it. A flex scanner (both reentrant and non-reentrant) may be restarted by calling yyrestart.

Below is an example of a program that creates a scanner, uses it, then destroys it when done:

 
    int main ()
    {
        yyscan_t scanner;
        int tok;

        yylex_init(&scanner);

        while ((tok=yylex(scanner)) > 0)
            printf("tok=%d  yytext=%s\n", tok, yyget_text(scanner));

        yylex_destroy(scanner);
        return 0;
    }
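
The program above ignores the return value of yylex_init. One way to check it is sketched below. The helper init_scanner_checked and the two stub functions are hypothetical and not part of flex; the sketch relies only on yyscan_t being a typedef for void *, so that yylex_init has the function-pointer type shown. The stubs exist so the helper can be exercised without a generated scanner.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Function type matching yylex_init, given that yyscan_t is void *. */
typedef int (*scanner_init_fn)(void **ptr_yy_globals);

/* Hypothetical helper: calls init and, on failure, logs the errno text.
   Returns 0 on success, the saved errno on failure, or -1 if the init
   function failed without setting errno. */
int init_scanner_checked(scanner_init_fn init, void **scanner, FILE *log)
{
    errno = 0;
    if (init(scanner) != 0) {
        int err = errno;   /* save it before fprintf can change it */
        fprintf(log, "scanner init failed: %s\n", strerror(err));
        return err != 0 ? err : -1;
    }
    return 0;
}

/* Stand-ins for yylex_init, used only to try the helper out. */
int stub_init_ok(void **p)
{
    *p = NULL;
    return 0;
}

int stub_init_fail(void **p)
{
    (void)p;
    errno = ENOMEM;
    return 1;
}
```

In a file that includes the generated scanner's declarations, the call would look like init_scanner_checked(yylex_init, &scanner, stderr), with scanner declared as yyscan_t.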

19.4.5 Accessing Variables with Reentrant Scanners

Accessor methods (get/set functions) provide access to common flex variables.

Many scanners that you build will be part of a larger project. Portions of your project will need access to flex values, such as yytext. In a non-reentrant scanner, these values are global, so there is no problem accessing them. However, in a reentrant scanner, there are no global flex values. You can not access them directly. Instead, you must access flex values using accessor methods (get/set functions). Each accessor method is named yyget_NAME or yyset_NAME, where NAME is the name of the flex variable you want. For example:

 
    /* Set the last character of yytext to NUL ('\0'). */
    void chop ( yyscan_t scanner )
    {
        int len = yyget_leng( scanner );
        yyget_text( scanner )[len - 1] = '\0';
    }

The above code may be called from within an action like this:

 
    %%
    .+\n    { chop( yyscanner );}

You may find that %option header-file is particularly useful for generating prototypes of all the accessor functions. See option-header.


19.4.6 Extra Data

User-specific data can be stored in yyextra.

In a reentrant scanner, it is unwise to use global variables to communicate with or maintain state between different pieces of your program. However, you may need access to external data or invoke external functions from within the scanner actions. Likewise, you may need to pass information to your scanner (e.g., open file descriptors, or database connections). In a non-reentrant scanner, the only way to do this would be through the use of global variables. Flex allows you to store arbitrary, “extra” data in a scanner. This data is accessible through the accessor methods yyget_extra and yyset_extra from outside the scanner, and through the shortcut macro yyextra from within the scanner itself. They are defined as follows:

 
    #define YY_EXTRA_TYPE  void*
    YY_EXTRA_TYPE  yyget_extra ( yyscan_t scanner );
    void           yyset_extra ( YY_EXTRA_TYPE arbitrary_data , yyscan_t scanner);

In addition, an extra form of yylex_init is provided, yylex_init_extra. This function is provided so that the yyextra value can be accessed from within the very first yyalloc, used to allocate the scanner itself.

By default, YY_EXTRA_TYPE is defined as type void *. You may redefine this type using %option extra-type="your_type" in the scanner:

 
    /* An example of overriding YY_EXTRA_TYPE. */
    %{
    #include <sys/stat.h>
    #include <unistd.h>
    %}
    %option reentrant
    %option extra-type="struct stat *"
    %%

    __filesize__     printf( "%ld", (long) yyextra->st_size  );
    __lastmod__      printf( "%ld", (long) yyextra->st_mtime );
    %%
    void scan_file( char* filename )
    {
        yyscan_t scanner;
        struct stat buf;
        FILE *in;

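        in = fopen( filename, "r" );
        if( in == NULL )
            return;
        if( stat( filename, &buf ) != 0 ) {
            fclose( in );
            return;
        }

        /* yyextra has type struct stat *, so pass the address of buf. */
        yylex_init_extra( &buf, &scanner );
        yyset_in( in, scanner );
        yylex( scanner );
        yylex_destroy( scanner );

        fclose( in );
    }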

19.4.7 About yyscan_t

yyscan_t is defined as:

 
     typedef void* yyscan_t;

It is initialized by yylex_init() to point to an internal structure. You should never access this value directly. In particular, you should never attempt to free it (use yylex_destroy() instead).


19.5 Functions and Macros Available in Reentrant C Scanners

The following functions are available in a reentrant scanner:

 
    char *yyget_text ( yyscan_t scanner );
    int yyget_leng ( yyscan_t scanner );
    FILE *yyget_in ( yyscan_t scanner );
    FILE *yyget_out ( yyscan_t scanner );
    int yyget_lineno ( yyscan_t scanner );
    YY_EXTRA_TYPE yyget_extra ( yyscan_t scanner );
    int  yyget_debug ( yyscan_t scanner );

    void yyset_debug ( int flag, yyscan_t scanner );
    void yyset_in  ( FILE * in_str , yyscan_t scanner );
    void yyset_out  ( FILE * out_str , yyscan_t scanner );
    void yyset_lineno ( int line_number , yyscan_t scanner );
    void yyset_extra ( YY_EXTRA_TYPE user_defined , yyscan_t scanner );

There are no “set” functions for yytext and yyleng. This is intentional.

The following macro shortcuts are available in actions in a reentrant scanner:

 
    yytext
    yyleng
    yyin
    yyout
    yylineno
    yyextra
    yy_flex_debug

In a reentrant C scanner, support for yylineno is always present (i.e., you may access yylineno), but the value is never modified by flex unless %option yylineno is enabled. This is to allow the user to maintain the line count independently of flex.

The following functions and macros are made available when %option bison-bridge (‘--bison-bridge’) is specified:

 
    YYSTYPE * yyget_lval ( yyscan_t scanner );
    void yyset_lval ( YYSTYPE * yylvalp , yyscan_t scanner );
    yylval

The following functions and macros are made available when %option bison-locations (‘--bison-locations’) is specified:

 
    YYLTYPE *yyget_lloc ( yyscan_t scanner );
    void yyset_lloc ( YYLTYPE * yyllocp , yyscan_t scanner );
    yylloc

Support for yylval assumes that YYSTYPE is a valid type. Support for yylloc assumes that YYLTYPE is a valid type. Typically, these types are generated by bison, and are included in section 1 of the flex input.


This document was generated by Rick Perry on January 7, 2013 using texi2html 1.82.