
21. Memory Management

This chapter describes how flex handles dynamic memory, and how you can override the default behavior.



21.1 The Default Memory Management

Flex allocates dynamic memory during initialization, and occasionally from within a call to yylex(). Initialization takes place during the first call to yylex(). Thereafter, flex may reallocate memory if it needs to enlarge a buffer. As of version 2.5.9, flex will clean up all memory when you call yylex_destroy(). See faq-memory-leak.

Flex allocates dynamic memory for the five purposes listed below (2):

16kB for the input buffer.

Flex allocates memory for the character buffer used to perform pattern matching. Flex must read ahead from the input stream and store it in a large character buffer. This buffer is typically the largest chunk of dynamic memory flex consumes. This buffer will grow if necessary, doubling the size each time. Flex frees this memory when you call yylex_destroy(). The default size of this buffer (16384 bytes) is almost always too large. The ideal size for this buffer is the length of the longest token expected, in bytes, plus a little more. Flex will allocate a few extra bytes for housekeeping. Currently, to override the size of the input buffer you must #define YY_BUF_SIZE to whatever number of bytes you want. We don’t plan to change this in the near future, but we reserve the right to do so if we ever add a more robust memory management API.
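To get a feel for the doubling behavior described above, the following C sketch computes the size a buffer would reach if it starts at a given size and doubles until a token of a given length fits. It is an illustration only, not flex's actual growth code, and the function name grown_size is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper, not part of flex: returns the capacity reached when
   a buffer of `initial` bytes is doubled until it can hold `needed` bytes.
   Returns 0 if `initial` is 0 or if doubling would overflow size_t. */
size_t grown_size (size_t initial, size_t needed)
{
    size_t size = initial;

    if (size == 0)
        return 0;
    while (size < needed) {
        if (size > (size_t) -1 / 2)
            return 0;           /* doubling would overflow */
        size *= 2;
    }
    return size;
}
```

Under this model, a 40000-byte token needs two doublings from the default 16384 bytes and eight doublings from a 256-byte buffer; both end at 65536 bytes. A smaller YY_BUF_SIZE therefore saves memory for typical input at the cost of more reallocations on unusually long tokens.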

64kB for the REJECT state. This will only be allocated if you use REJECT.

The size is large enough to hold the same number of states as characters in the input buffer. If you override the size of the input buffer (via YY_BUF_SIZE), then you automatically override the size of this buffer as well.
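The default figures are consistent with one fixed-size state entry per buffered character. As a rough check, the sketch below multiplies the buffer size by an assumed entry size. The 4-byte entry is an assumption that depends on the platform, not a documented guarantee, and reject_state_bytes is a hypothetical name.

```c
#include <stddef.h>

/* Rough estimate only: bytes needed to hold one state entry per character
   of the input buffer. `state_bytes` is an assumed per-entry size. */
size_t reject_state_bytes (size_t buf_size, size_t state_bytes)
{
    return buf_size * state_bytes;
}
```

With the default 16384-byte input buffer and 4-byte entries this gives 65536 bytes, which matches the 64kB quoted above. With YY_BUF_SIZE defined as 1024, the same estimate gives 4096 bytes.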

100 bytes for the start condition stack.

Flex allocates memory for the start condition stack. This is the stack used for pushing start states, i.e., with yy_push_state(). It will grow if necessary. Since the states are simply integers, this stack doesn’t consume much memory. This stack is not present if %option stack is not specified. You will rarely need to tune this buffer. The ideal size for this stack is the maximum depth expected. The memory for this stack is automatically destroyed when you call yylex_destroy(). See option-stack.

40 bytes for each YY_BUFFER_STATE.

Flex allocates memory for each YY_BUFFER_STATE. The buffer state itself is about 40 bytes, plus an additional large character buffer (described above). A buffer state is created during initialization, and with each call to yy_create_buffer(). You can’t tune the size of the buffer state, but you can tune the character buffer as described above. Any buffer state that you explicitly create by calling yy_create_buffer() is NOT destroyed automatically. You must call yy_delete_buffer() to free the memory. The exception to this rule is that flex will delete the current buffer automatically when you call yylex_destroy(). If you delete the current buffer, be sure to set it to NULL. That way, flex will not try to delete the buffer a second time, possibly crashing your program. At the time of this writing, flex does not provide a growable stack for the buffer states. You have to manage that yourself. See section Multiple Input Buffers.
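Since the stack of buffer states is left to you, here is a minimal sketch of one way to keep it. It is not taken from flex. The type buffer_handle is a stand-in for YY_BUFFER_STATE so that the code compiles without a scanner, and all bufstack names are hypothetical.

```c
#include <stdlib.h>

/* Stand-in for YY_BUFFER_STATE so the sketch compiles without a scanner. */
typedef void *buffer_handle;

struct bufstack {
    buffer_handle *items;   /* saved buffer handles, oldest first */
    size_t         depth;   /* number of handles currently saved */
    size_t         cap;     /* allocated capacity of `items` */
};

/* Saves `b` on the stack, doubling the storage when it is full.
   Returns 0 on success, -1 if memory could not be allocated. */
int bufstack_push (struct bufstack *s, buffer_handle b)
{
    if (s->depth == s->cap) {
        size_t ncap = s->cap ? s->cap * 2 : 4;
        buffer_handle *p = realloc (s->items, ncap * sizeof *p);
        if (p == NULL)
            return -1;
        s->items = p;
        s->cap = ncap;
    }
    s->items[s->depth++] = b;
    return 0;
}

/* Returns the most recently saved handle, or NULL if the stack is empty. */
buffer_handle bufstack_pop (struct bufstack *s)
{
    return s->depth ? s->items[--s->depth] : NULL;
}

/* Releases the stack's own storage; it does not delete the buffers. */
void bufstack_free (struct bufstack *s)
{
    free (s->items);
    s->items = NULL;
    s->depth = s->cap = 0;
}
```

In a scanner, you would start from a zero-initialized struct bufstack, push YY_CURRENT_BUFFER before switching to a buffer made with yy_create_buffer(), and on end of file call yy_delete_buffer() on the finished buffer before switching back to the popped one.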

84 bytes for the reentrant scanner guts.

Flex allocates about 84 bytes for the reentrant scanner structure when you call yylex_init(). It is destroyed when you call yylex_destroy().



21.2 Overriding The Default Memory Management

Flex calls the functions yyalloc, yyrealloc, and yyfree when it needs to allocate or free memory. By default, these functions are wrappers around the standard C functions, malloc, realloc, and free, respectively. You can override the default implementations by telling flex that you will provide your own implementations.

To override the default implementations, you must do two things:

  1. Suppress the default implementations by specifying one or more of the following options:
       %option noyyalloc
       %option noyyrealloc
       %option noyyfree
  2. Provide your own implementation of the following functions: (3)
     
    // For a non-reentrant scanner
    void * yyalloc (size_t bytes);
    void * yyrealloc (void * ptr, size_t bytes);
    void   yyfree (void * ptr);
    
    // For a reentrant scanner
    void * yyalloc (size_t bytes, void * yyscanner);
    void * yyrealloc (void * ptr, size_t bytes, void * yyscanner);
    void   yyfree (void * ptr, void * yyscanner);
    

In the following example, we will override all three memory routines. We assume that there is a custom allocator with garbage collection. In order to make this example interesting, we will use a reentrant scanner, passing a pointer to the custom allocator through yyextra.

 
%{
#include "some_allocator.h"

/* Initialize the allocator. */
#define YY_EXTRA_TYPE  struct allocator*
#define YY_USER_INIT  yyextra = allocator_create();
%}

/* Suppress the default implementations. */
%option noyyalloc noyyrealloc noyyfree
%option reentrant

%%
.|\n   ;
%%

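/* Provide our own implementations. The yyextra macro is meant for use
   inside yylex() and its actions, so these functions fetch the allocator
   with yyget_extra() instead.
   Caution: yylex_init() may call yyalloc() with a NULL yyscanner before
   any extra data exists; a real implementation must handle that case. */
void * yyalloc (size_t bytes, void * yyscanner) {
    return allocator_alloc (yyget_extra (yyscanner), bytes);
}

void * yyrealloc (void * ptr, size_t bytes, void * yyscanner) {
    /* Pass the old block along so its contents can be preserved. */
    return allocator_realloc (yyget_extra (yyscanner), ptr, bytes);
}

void yyfree (void * ptr, void * yyscanner) {
    /* Do nothing -- we leave it to the garbage collector. */
}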
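For comparison, here is a smaller sketch using the non-reentrant signatures. It wraps malloc, realloc, and free and counts the blocks that are still live, which can help when hunting for leaks. The counter and the function live_block_count are hypothetical additions for illustration. In a real scanner you would also specify %option noyyalloc noyyrealloc noyyfree so that flex does not emit its own versions.

```c
#include <stdlib.h>

/* Number of blocks handed out and not yet freed. */
static long live_blocks = 0;

void *yyalloc (size_t bytes)
{
    void *p = malloc (bytes);

    if (p != NULL)
        live_blocks++;
    return p;
}

void *yyrealloc (void *ptr, size_t bytes)
{
    void *p = realloc (ptr, bytes);

    /* realloc(NULL, n) behaves like malloc(n), so a successful call with
       a NULL input creates a new live block. Otherwise the count is
       unchanged: the block is resized or moved, not duplicated. */
    if (p != NULL && ptr == NULL)
        live_blocks++;
    return p;
}

void yyfree (void *ptr)
{
    if (ptr != NULL)
        live_blocks--;
    free (ptr);
}

/* Hypothetical accessor: reports how many blocks are still allocated. */
long live_block_count (void)
{
    return live_blocks;
}
```

Checking live_block_count() after yylex_destroy() gives a quick indication of whether every block obtained through these routines was returned.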



21.3 A Note About yytext And Memory

When flex finds a match, yytext points to the first character of the match in the input buffer. The string itself is part of the input buffer, and is NOT allocated separately. The value of yytext will be overwritten the next time yylex() is called. In short, the value of yytext is only valid from within the matched rule’s action.

Often, you want the value of yytext to persist for later processing, e.g., by a parser with non-zero lookahead. In order to preserve yytext, you will have to copy it with strdup() or a similar function. This introduces a complication, because your parser is now responsible for freeing the copy of yytext. If you use a yacc or bison parser (commonly used with flex), you will discover that the error recovery mechanisms can cause memory to be leaked.

To prevent memory leaks from strdup’d yytext, you will have to track the memory somehow. Our experience has shown that a garbage collection mechanism or a pooled memory mechanism will save you a lot of grief when writing parsers.
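As one possible shape for a pooled memory mechanism, the sketch below copies token text into a pool that is released in a single pass once parsing is over. It is a minimal illustration under stated assumptions, not code from flex or bison, and the names token_pool, pool_copy, and pool_release are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* One pooled copy; the text is stored directly after the link pointer. */
struct pool_node {
    struct pool_node *next;
    char              text[];   /* C99 flexible array member */
};

struct token_pool {
    struct pool_node *head;     /* most recent copy first */
};

/* Copies `len` bytes of `text` into the pool and returns a NUL-terminated
   copy owned by the pool, or NULL if memory could not be allocated. */
char *pool_copy (struct token_pool *pool, const char *text, size_t len)
{
    struct pool_node *n = malloc (sizeof *n + len + 1);

    if (n == NULL)
        return NULL;
    memcpy (n->text, text, len);
    n->text[len] = '\0';
    n->next = pool->head;
    pool->head = n;
    return n->text;
}

/* Frees every copy made from this pool in one pass. */
void pool_release (struct token_pool *pool)
{
    while (pool->head != NULL) {
        struct pool_node *next = pool->head->next;

        free (pool->head);
        pool->head = next;
    }
}
```

An action could then store pool_copy (&pool, yytext, yyleng) in the semantic value instead of calling strdup(), where pool is a struct token_pool you have made visible to the scanner. Calling pool_release() after the parse, whether it succeeded or failed, frees all copies, including those abandoned during error recovery.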



This document was generated by Rick Perry on January 7, 2013 using texi2html 1.82.