<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet title="XSL formatting" type="text/xsl" href="http://www.usualcoding.eu/feed/rss2/xslt" ?><rss version="2.0"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:wfw="http://wellformedweb.org/CommentAPI/"
  xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
  <title>UsualCoding.eu - parser</title>
  <link>http://www.usualcoding.eu/</link>
  <description>Use(less|ful) stuff</description>
  <language>en</language>
  <pubDate>Wed, 21 Jul 2010 22:55:36 +0200</pubDate>
  <copyright></copyright>
  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
  <generator>Dotclear</generator>
  
    
  <item>
    <title>Building a reentrant parser in C with Flex/Bison</title>
    <link>http://www.usualcoding.eu/post/2007/09/03/Building-a-reentrant-parser-in-C-with-Flex/Bison</link>
    <guid isPermaLink="false">urn:md5:60be796c09ff71e29f779f36a12675eb</guid>
    <pubDate>Mon, 03 Sep 2007 20:51:00 +0000</pubDate>
    <dc:creator>Seb</dc:creator>
        <category>Code</category>
        <category>bison</category><category>C</category><category>code</category><category>flex</category><category>lex</category><category>parser</category><category>reentrant</category><category>yacc</category>    
    <description>    &lt;p&gt;It is not easy for beginners to find all the right information on how to create a reentrant parser with tools like flex and bison. Most examples out there are out of date or not very clear.&lt;br /&gt;
Once I found out, I wrote a small example of how to do it. The objective is not yet another flex/bison tutorial, but to focus solely on the reentrant code aspects.&lt;br /&gt;
It is a simple caculator that evaluates arithmetic expressions and it is separated in three small files:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;calc.h (common header file for lexer and parser)&lt;/li&gt;
&lt;li&gt;lexer.l (obviously the lexer)&lt;/li&gt;
&lt;li&gt;parser.y (the parser and the main function)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;First, lets see the header:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;    typedef struct      parse_parm_s&lt;br /&gt;    {&lt;br /&gt;        void            *yyscanner;&lt;br /&gt;        char            *buf;&lt;br /&gt;        int             pos;&lt;br /&gt;        int             length;&lt;br /&gt;        double          result;&lt;br /&gt;    }                   parse_parm;&lt;br /&gt;&lt;br /&gt;    void    parse(char *buf, double *result);&lt;br /&gt;&lt;br /&gt;    #define YYSTYPE         double&lt;br /&gt;    #define YY_EXTRA_TYPE   parse_parm *&lt;br /&gt;&lt;br /&gt;    int     yylex(YYSTYPE *, void *);&lt;br /&gt;    int     yylex_init(void **);&lt;br /&gt;    int     yylex_destroy(void *);&lt;br /&gt;    void    yyset_extra(YY_EXTRA_TYPE, void *);&lt;br /&gt;    int     yyparse(parse_parm *, void *);&lt;br /&gt;    void    yyerror();&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Lets focus on the &lt;code&gt;parse_parm&lt;/code&gt; structure. What we need is a way to share data between the lexer and the parser without the use of global variables. That is where this structure comes in. In this one we have:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;yyscanner&lt;/code&gt; which is an uninitialized pointer that will be used by the lexer for internal suff (instead of globals)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;buf&lt;/code&gt; is the buffer that contains the data to be parsed&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pos&lt;/code&gt; is the current position in the buffer&lt;/li&gt;
&lt;li&gt;&lt;code&gt;length&lt;/code&gt; is the size of the buffer&lt;/li&gt;
&lt;li&gt;&lt;code&gt;result&lt;/code&gt; will be the result of the evaluated expression&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;parse&lt;/code&gt; is our main parsing function. We will see it in the parser.&lt;br /&gt;
&lt;code&gt;YYSTYPE&lt;/code&gt; is the type of &lt;code&gt;yylval&lt;/code&gt; but you should know that otherwise go back to basics.&lt;br /&gt;
&lt;code&gt;YY_EXTRA_TYPE&lt;/code&gt; is the type of the extra data which will be given to the lexer thanks to the &lt;code&gt;yyset_extra&lt;/code&gt; function. In this case it is our structure.&lt;/p&gt;
&lt;p&gt;The prototypes are forward declaration of lexer/parser functions so the compiler shuts up about warnings. I think I could generate a .h with all that but I was lazy.&lt;/p&gt;
&lt;p&gt;Here comes the lexer:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;    %{&lt;br /&gt;    #include &amp;lt;stdlib.h&amp;gt;&lt;br /&gt;    #include &amp;lt;string.h&amp;gt;&lt;br /&gt;&lt;br /&gt;    #include &quot;calc.h&quot;&lt;br /&gt;    #include &quot;parser.tab.h&quot;&lt;br /&gt;&lt;br /&gt;    #define PARM    yyget_extra(yyscanner)&lt;br /&gt;&lt;br /&gt;    #define YY_INPUT(buffer, res, max_size)             \&lt;br /&gt;    do {                                                \&lt;br /&gt;        if (PARM-&amp;gt;pos &amp;gt;= PARM-&amp;gt;length)                  \&lt;br /&gt;            res = YY_NULL;                              \&lt;br /&gt;        else                                            \&lt;br /&gt;        {                                               \&lt;br /&gt;            res = PARM-&amp;gt;length - PARM-&amp;gt;pos;             \&lt;br /&gt;            res &amp;gt; (int)max_size ? res = max_size : 0;   \&lt;br /&gt;            memcpy(buffer, PARM-&amp;gt;buf + PARM-&amp;gt;pos, res); \&lt;br /&gt;            PARM-&amp;gt;pos += res;                           \&lt;br /&gt;        }                                               \&lt;br /&gt;    } while (0)&lt;br /&gt;&lt;br /&gt;    %}&lt;br /&gt;&lt;br /&gt;    %option reentrant bison-bridge&lt;br /&gt;    %option noyywrap&lt;br /&gt;    %option nounput&lt;br /&gt;&lt;br /&gt;    %%&lt;br /&gt;&lt;br /&gt;    [+\-*/()] { return (*yytext); }&lt;br /&gt;&lt;br /&gt;    [0-9]+ { *yylval = atoi(yytext); return (INT); }&lt;br /&gt;&lt;br /&gt;    ([0-9]*.[0-9]+) { *yylval = atof(yytext); return (FLOAT); }&lt;br /&gt;&lt;br /&gt;    [ \t\r\n] ;&lt;br /&gt;&lt;br /&gt;    %%&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;parse_parm&lt;/code&gt; structure is extra data so we have to access it with &lt;code&gt;yyget_extra(yyscanner)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Since we read from a buffer and not stdin or a file, we have to to redefine the &lt;code&gt;YY_INPUT&lt;/code&gt; macro (see section 10 of the flex manual 'The generated scanner').&lt;/p&gt;
&lt;p&gt;We want the scanner to be reentrant, therefore generate no global variables. That is what the &lt;code&gt;reentrant&lt;/code&gt; option is for. &lt;code&gt;bison-bridge&lt;/code&gt; is used to create a bison compatible scanner and share &lt;code&gt;yylval&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Finally the parser:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;    %{&lt;br /&gt;    #include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;    #include &amp;lt;string.h&amp;gt;&lt;br /&gt;&lt;br /&gt;    #include &quot;calc.h&quot;&lt;br /&gt;&lt;br /&gt;    void    parse(char *buf, double *result)&lt;br /&gt;    {&lt;br /&gt;        parse_parm  pp;&lt;br /&gt;&lt;br /&gt;        pp.buf = buf;&lt;br /&gt;        pp.length = strlen(buf);&lt;br /&gt;        pp.pos = 0;&lt;br /&gt;        *result = 0;&lt;br /&gt;        yylex_init(&amp;amp;pp.yyscanner);&lt;br /&gt;        yyset_extra(&amp;amp;pp, pp.yyscanner);&lt;br /&gt;        yyparse(&amp;amp;pp, pp.yyscanner);&lt;br /&gt;        *result = pp.result;&lt;br /&gt;        yylex_destroy(pp.yyscanner);&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    %}&lt;br /&gt;&lt;br /&gt;    %pure_parser&lt;br /&gt;    %parse-param {parse_parm *parm}&lt;br /&gt;    %parse-param {void *scanner}&lt;br /&gt;    %lex-param {yyscan_t *scanner}&lt;br /&gt;&lt;br /&gt;    %token INT FLOAT&lt;br /&gt;&lt;br /&gt;    %left '-' '+'&lt;br /&gt;    %left '*' '/'&lt;br /&gt;    %left NEG POS&lt;br /&gt;&lt;br /&gt;    %%&lt;br /&gt;&lt;br /&gt;    expr: math { parm-&amp;gt;result = $1; }&lt;br /&gt;        ;&lt;br /&gt;&lt;br /&gt;    math: math '+' math { $$ = $1 + $3; }&lt;br /&gt;        | math '-' math { $$ = $1 - $3; }&lt;br /&gt;        | math '*' math { $$ = $1 * $3; }&lt;br /&gt;        | math '/' math { $$ = $1 / $3; }&lt;br /&gt;        | '-' math %prec NEG { $$ = -$2; }&lt;br /&gt;        | '+' math %prec POS { $$ = $2; }&lt;br /&gt;        | '(' math ')' { $$ = $2; }&lt;br /&gt;        | INT { $$ = $1; }&lt;br /&gt;        | FLOAT { $$ = $1; }&lt;br /&gt;        ;&lt;br /&gt;&lt;br /&gt;    %%&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;yylex_init&lt;/code&gt; and &lt;code&gt;yylex_destroy&lt;/code&gt; have to be called to initialize the lexer and free its ressources after the parsing.
&lt;code&gt;yyset_extra&lt;/code&gt; is used to pass the &lt;code&gt;parse_parm&lt;/code&gt; structure to the lexer.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;pure_parser&lt;/code&gt; option tells bison to use no global variables and create a reentrant parser.&lt;br /&gt;
&lt;code&gt;yyparse&lt;/code&gt; gets two new parameters with the &lt;code&gt;parse-param&lt;/code&gt;. The &lt;code&gt;parse_parm&lt;/code&gt; structure and the pointer for the lexer.&lt;br /&gt;
&lt;code&gt;yylex&lt;/code&gt; gets a new parameter thanks to &lt;code&gt;lex-param&lt;/code&gt; so the parser can pass him the pointer.&lt;/p&gt;
&lt;p&gt;Now the parse function can be called any time to evaluate an expression and get the result. Even in multiple threads.&lt;/p&gt;
&lt;p&gt;Here is a link to the sources in this tutorial: &lt;a href=&quot;http://www.sig11.org/~seb/calc.tar.gz&quot;&gt;calc.tar.gz&lt;/a&gt;. You can compile it with:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;    $ bison -d parser.y&lt;br /&gt;    $ flex lexer.l&lt;br /&gt;    $ gcc lex.yy.c parser.tab.c -ly -ll&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Other links:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://flex.sourceforge.net/&quot;&gt;Flex&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://flex.sourceforge.net/manual/&quot;&gt;Flex Manual&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.gnu.org/software/bison/&quot;&gt;Bison&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;http://www.gnu.org/software/bison/manual/index.html&quot;&gt;Bison Manual&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description>
    
          <enclosure url="http://www.usualcoding.eu/public/calc/calc.tar.gz"
      length="1532" type="application/x-gzip" />
    
    
          <comments>http://www.usualcoding.eu/post/2007/09/03/Building-a-reentrant-parser-in-C-with-Flex/Bison#comment-form</comments>
      <wfw:comment>http://www.usualcoding.eu/post/2007/09/03/Building-a-reentrant-parser-in-C-with-Flex/Bison#comment-form</wfw:comment>
      <wfw:commentRss>http://www.usualcoding.eu/feed/rss2/comments/1</wfw:commentRss>
      </item>
    
</channel>
</rss>