A flex scanner has the ability to save the DFA tables to a file, and load them at runtime when needed. The motivation for this feature is to reduce the runtime memory footprint. Traditionally, these tables have been compiled into the scanner as C arrays, and are sometimes quite large. Since the tables are compiled into the scanner, the memory used by the tables can never be freed. This is a waste of memory, especially if an application uses several scanners, but none of them at the same time.
The serialization feature allows the tables to be loaded at runtime, before scanning begins. The tables may be discarded when scanning is finished.
22.1 Creating Serialized Tables
22.2 Loading and Unloading Serialized Tables
22.3 Tables File Format
You may create a scanner with serialized tables by specifying:
    %option tables-file=FILE
or
    --tables-file=FILE
These options instruct flex to save the DFA tables to the file FILE. The tables will not be embedded in the generated scanner, so the scanner cannot function on its own: it depends on the serialized tables. You must load the tables from this file at runtime before you can scan anything.
If you do not specify a filename to --tables-file, the tables will be saved to ‘lex.yy.tables’, where ‘yy’ is the appropriate prefix.
If your project uses several different scanners, you can concatenate the serialized tables into one file, and flex will find the correct set of tables, using the scanner prefix as part of the lookup key. An example follows:
    $ flex --tables-file --prefix=cpp cpp.l
    $ flex --tables-file --prefix=c c.l
    $ cat lex.cpp.tables lex.c.tables > all.tables
The above example creates two scanners, ‘cpp’ and ‘c’. Since we did not specify a filename, the tables were serialized to ‘lex.cpp.tables’ and ‘lex.c.tables’, respectively. Then we concatenated the two files into ‘all.tables’, which we will distribute with our project. At runtime, we will open the file and tell flex to load the tables from it. Flex will find the correct tables automatically (see the next section).
If you’ve built your scanner with %option tables-file, then you must load the scanner tables at runtime. This can be accomplished with the following function:
Function: int yytables_fload (FILE* fp [, yyscan_t scanner])
Locates scanner tables in the stream pointed to by fp and loads them. Memory for the tables is allocated via yyalloc. You must call this function before the first call to yylex. The argument scanner only appears in the reentrant scanner.
This function returns ‘0’ (zero) on success, or non-zero on error.
The loaded tables are not automatically destroyed (unloaded) when you call yylex_destroy. The reason is that you may create several scanners of the same type (in a reentrant scanner), each of which needs access to these tables. To avoid a memory leak, you must call the following function:
Function: int yytables_destroy ([yyscan_t scanner])
Unloads the scanner tables. The tables must be loaded again before you can scan any more data. The argument scanner only appears in the reentrant scanner. This function returns ‘0’ (zero) on success, or non-zero on error.
The functions yytables_fload and yytables_destroy are not thread-safe. In a threaded program, you must ensure that each of these functions is called exactly once (for each scanner type), and that yytables_fload is called before any thread calls yylex.
After the tables are loaded, they are never written to, so no thread protection is required until you destroy them.
This section defines the file format of serialized flex tables.
The tables format allows for one or more sets of tables to be specified, where each set corresponds to a given scanner. Scanners are indexed by name, as described below. The file format is as follows:
TABLE SET 1
           +-------------------------------+
   Header  | uint32          th_magic;     |
           | uint32          th_hsize;     |
           | uint32          th_ssize;     |
           | uint16          th_flags;     |
           | char            th_version[]; |
           | char            th_name[];    |
           | uint8           th_pad64[];   |
           +-------------------------------+
   Table 1 | uint16          td_id;        |
           | uint16          td_flags;     |
           | uint32          td_hilen;     |
           | uint32          td_lolen;     |
           | void            td_data[];    |
           | uint8           td_pad64[];   |
           +-------------------------------+
   Table 2 |                               |
           .                               .
           .                               .
           .                               .
   Table n |                               |
           +-------------------------------+
TABLE SET 2
           .
           .
           .
TABLE SET N
The above diagram shows that a complete set of tables consists of a header followed by multiple individual tables. Furthermore, multiple complete sets may be present in the same file, each set with its own header and tables. The sets are contiguous in the file. The only way to know whether another set follows is to check the next four bytes for the magic number (or check for EOF). The header and table sections are padded to 64-bit boundaries. All integer values are in network byte order.
Below we describe each field in detail. This format does not specify how the scanner will expand the given data; for example, data may be serialized as int8 but expanded to an int32 array at runtime. This reduces the size of the serialized data where possible.
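To make the set layout concrete, here is a minimal C sketch that counts the table sets in a buffer holding the contents of a tables file. It relies only on the rules stated above: every set begins with th_magic, th_ssize gives the length of the whole set, and integers are in network byte order. The names count_table_sets and get_be32 are hypothetical, not part of flex, and the sketch assumes the file has already been read into memory; it validates only what it needs to step from one set to the next.

```c
#include <stddef.h>
#include <stdint.h>

#define TH_MAGIC 0xF13C57B1u

/* Decode a 32-bit integer stored in network (big-endian) byte order. */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Count the contiguous table sets in buf[0..len).
 * Returns the number of sets, or -1 if the data is malformed. */
static int count_table_sets(const unsigned char *buf, size_t len)
{
    size_t off = 0;
    int nsets = 0;

    while (off < len) {                /* reaching EOF ends the walk */
        if (len - off < 12)            /* th_magic, th_hsize, th_ssize */
            return -1;
        if (get_be32(buf + off) != TH_MAGIC)
            return -1;                 /* no magic number: not a set */

        uint32_t ssize = get_be32(buf + off + 8);   /* th_ssize */
        if (ssize < 12 || ssize > len - off)
            return -1;                 /* set would overrun the buffer */

        off += ssize;                  /* the next set starts right after */
        nsets++;
    }
    return nsets;
}
```

Because th_ssize covers the header, every table, and all padding, a reader never has to parse the tables of a set it does not want in order to reach the next one.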
Fields of a table header:
th_magic
Magic number, always 0xF13C57B1.
th_hsize
Size of this entire header, in bytes, including all fields plus any padding.
th_ssize
Size of this entire set, in bytes, including the header, all tables, plus any padding.
th_flags
Bit flags for this table set. Currently unused.
th_version[]
Flex version as a NUL-terminated string, e.g., ‘2.5.13a’. This is the version of flex that was used to create the serialized tables.
th_name[]
Contains the name of this table set. The default is ‘yytables’, and it is prefixed accordingly, e.g., ‘footables’. Must be NUL-terminated.
th_pad64[]
Zero or more NUL bytes, padding the entire header to the next 64-bit boundary as calculated from the beginning of the header.
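The header fields are enough to implement the name-based lookup described in "Creating Serialized Tables", where several sets are concatenated into one file. The C sketch below searches such a buffer for the set whose th_name matches, for example ‘cpptables’. It is a hypothetical illustration of the idea, not flex's own loader: find_table_set and header_name are invented names, and the sketch assumes the whole file is already in memory. It also relies on the smallest legal header being 16 bytes (14 bytes of fixed fields plus two one-byte empty strings, already on a 64-bit boundary).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode a 32-bit integer stored in network (big-endian) byte order. */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Return th_name inside the header at hdr, or NULL if either string is
 * not NUL-terminated within the hsize bytes of the header. */
static const char *header_name(const unsigned char *hdr, uint32_t hsize)
{
    const char *p = (const char *)hdr + 14;       /* th_version[] */
    const char *end = (const char *)hdr + hsize;  /* end of th_pad64[] */
    const char *nul = memchr(p, '\0', (size_t)(end - p));

    if (nul == NULL)
        return NULL;
    p = nul + 1;                                  /* th_name[] */
    if (p >= end || memchr(p, '\0', (size_t)(end - p)) == NULL)
        return NULL;
    return p;
}

/* Search concatenated table sets for the one whose th_name equals name.
 * Returns the byte offset of that set's header, or -1 if it is absent
 * or the data is malformed. */
static long find_table_set(const unsigned char *buf, size_t len,
                           const char *name)
{
    size_t off = 0;

    while (off < len) {
        const unsigned char *hdr = buf + off;
        if (len - off < 16 || get_be32(hdr) != 0xF13C57B1u)
            return -1;

        uint32_t hsize = get_be32(hdr + 4);       /* th_hsize */
        uint32_t ssize = get_be32(hdr + 8);       /* th_ssize */
        if (hsize < 16 || hsize % 8 != 0 ||       /* padded to 64 bits */
            hsize > ssize || ssize > len - off)
            return -1;

        const char *th_name = header_name(hdr, hsize);
        if (th_name == NULL)
            return -1;
        if (strcmp(th_name, name) == 0)
            return (long)off;
        off += ssize;                             /* skip to the next set */
    }
    return -1;
}
```

A loader could then hand the tables that follow the matching header (starting th_hsize bytes past the returned offset) to its table-parsing code.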
Fields of a table:
td_id
Specifies the table identifier. Possible values are:
YYTD_ID_ACCEPT (0x01)
yy_accept
YYTD_ID_BASE (0x02)
yy_base
YYTD_ID_CHK (0x03)
yy_chk
YYTD_ID_DEF (0x04)
yy_def
YYTD_ID_EC (0x05)
yy_ec
YYTD_ID_META (0x06)
yy_meta
YYTD_ID_NUL_TRANS (0x07)
yy_NUL_trans
YYTD_ID_NXT (0x08)
yy_nxt. This array may be two-dimensional. See the td_hilen field below.
YYTD_ID_RULE_CAN_MATCH_EOL (0x09)
yy_rule_can_match_eol
YYTD_ID_START_STATE_LIST (0x0A)
yy_start_state_list. This array is handled specially because it is an array of pointers to structs. See the td_flags field below.
YYTD_ID_TRANSITION (0x0B)
yy_transition. This array is handled specially because it is an array of structs. See the td_lolen field below.
YYTD_ID_ACCLIST (0x0C)
yy_acclist
td_flags
Bit flags describing how to interpret the data in td_data.
The data arrays are one-dimensional by default, but may be two-dimensional as specified in the td_hilen field.
YYTD_DATA8 (0x01)
The data is serialized as an array of type int8.
YYTD_DATA16 (0x02)
The data is serialized as an array of type int16.
YYTD_DATA32 (0x04)
The data is serialized as an array of type int32.
YYTD_PTRANS (0x08)
The data is a list of indexes of entries in the expanded yy_transition array. Each index should be expanded to a pointer to the corresponding entry in the yy_transition array. We count on the fact that the yy_transition array has already been seen.
YYTD_STRUCT (0x10)
The data is a list of yy_trans_info structs, each of which consists of two integers. There is no padding between struct elements or between structs. The type of each member is determined by the YYTD_DATA* bits.
td_hilen
If td_hilen is non-zero, then the data is a two-dimensional array. Otherwise, the data is a one-dimensional array. td_hilen contains the number of elements in the higher dimension, and td_lolen contains the number of elements in the lowest dimension.
Conceptually, td_data is either sometype td_data[td_lolen], or sometype td_data[td_hilen][td_lolen], where sometype is specified by the td_flags field. It is possible for both td_lolen and td_hilen to be zero, in which case td_data is a zero-length array, and no data is loaded, i.e., this table is simply skipped. Flex does not currently generate tables of zero length.
td_lolen
Specifies the number of elements in the lowest dimension array. If this is a one-dimensional array, then it is simply the number of elements in this array. The element size is determined by the td_flags field.
td_data[]
The table data. This array may be a one- or two-dimensional array, of type int8, int16, int32, struct yy_trans_info, or struct yy_trans_info*, depending upon the values in the td_flags, td_hilen, and td_lolen fields.
td_pad64[]
Zero or more NUL bytes, padding the entire table to the next 64-bit boundary as calculated from the beginning of this table.
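The per-table rules above (12 bytes of fixed fields, an integer width taken from td_flags, an optional second dimension, and padding to a 64-bit boundary) can be combined into a few helpers. The C sketch below is illustrative only: every name in it is hypothetical, and it rests on three labeled assumptions: that td_lolen counts yy_trans_info structs (two integers each) when YYTD_STRUCT is set, that a table with neither YYTD_DATA8 nor YYTD_DATA16 holds 32-bit integers, and that narrower values are sign-extended when widened to int32. Expanding YYTD_PTRANS indexes into pointers is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* td_flags bits, as listed above. */
#define YYTD_DATA8   0x01
#define YYTD_DATA16  0x02
#define YYTD_DATA32  0x04
#define YYTD_STRUCT  0x10

/* Bytes per serialized integer. Assumption: with neither YYTD_DATA8 nor
 * YYTD_DATA16 set, the data is treated as 32-bit. */
static size_t td_int_size(uint16_t flags)
{
    if (flags & YYTD_DATA8)
        return 1;
    if (flags & YYTD_DATA16)
        return 2;
    return 4;
}

/* Integers stored in td_data: td_lolen, times td_hilen for a 2-D table,
 * times two when each element is a yy_trans_info struct (assumption:
 * td_lolen then counts structs, not integers). */
static uint64_t td_num_ints(uint16_t flags, uint32_t hilen, uint32_t lolen)
{
    uint64_t n = hilen != 0 ? (uint64_t)hilen * lolen : lolen;
    return (flags & YYTD_STRUCT) ? 2 * n : n;
}

/* Round n up to the next 64-bit (8-byte) boundary. */
static uint64_t pad64(uint64_t n)
{
    return (n + 7) & ~(uint64_t)7;
}

/* Bytes occupied by a whole serialized table, from td_id through td_pad64[]:
 * 12 bytes of fixed fields (td_id, td_flags, td_hilen, td_lolen), the data,
 * then padding measured from the start of the table. */
static uint64_t td_table_size(uint16_t flags, uint32_t hilen, uint32_t lolen)
{
    return pad64(12 + td_num_ints(flags, hilen, lolen) * td_int_size(flags));
}

/* Read the k-th big-endian integer from data (the bytes of td_data) and
 * widen it to int32_t, sign-extending 8- and 16-bit values. */
static int32_t td_read_int(const unsigned char *data, uint64_t k,
                           uint16_t flags)
{
    const unsigned char *p = data + k * td_int_size(flags);

    switch (td_int_size(flags)) {
    case 1:
        return (int8_t)p[0];
    case 2:
        return (int16_t)(uint16_t)((p[0] << 8) | p[1]);
    default:
        return (int32_t)(((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                         ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
    }
}

/* Element [i][j] of a plain (non-struct) 2-D table, using C's row-major
 * layout of sometype td_data[td_hilen][td_lolen]. */
static int32_t td_get2d(const unsigned char *data, uint16_t flags,
                        uint32_t lolen, uint32_t i, uint32_t j)
{
    return td_read_int(data, (uint64_t)i * lolen + j, flags);
}
```

With td_table_size, a loader could skip a table it does not recognize by advancing that many bytes from the table's td_id field; a zero-length table still occupies 16 bytes (12 bytes of fields plus 4 bytes of padding).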
This document was generated by Rick Perry on January 7, 2013 using texi2html 1.82.