
lex

Lexical analyzer generator (POSIX)

Syntax:

lex [-Icdflnoitv78] [-s skeleton] [file...]

Options:

-7
-8
Generate 7- or 8-bit scanners. The 7-bit scanner is the standard for most lex systems. This implementation defaults to 8-bit scanners, but if you intend to use the generated scanner on other systems, you should use -7 to ensure maximum portability.
-c
Generate a C language scanner (default). This option has no real effect since lex generates only C programs, but is required by POSIX to emphasize that the scanner is generated in C.
-d
Generate code for debugging the scanner. The generated code displays the state the scanner is in, the characters it currently expects, and so on.
-f
Generate larger, faster, full-table scanners. Since tables aren't compressed, you shouldn't use this option for large scanners.
-i
Generate a case-insensitive scanner.
-I
Generate an interactive scanner. An interactive scanner must be able to stop looking ahead as soon as a rule has been matched. If your program is interactive, you should use this option, even though it disables some of the optimizations lex performs.
-l
("el") Disable generation of "line directives" in the scanner. This option may be required to step through the scanner using a source-level debugger.
-n
Don't generate scanner statistics (default).
-o
Minimize the size of the scanner tables, possibly at the cost of execution time.
-s skeleton
Use the specified skeleton file (default is /usr/lib/lex/skeleton).
-t
Write the scanner to standard output rather than to the lex.yy.c file.
-v
Print a summary of the scanner statistics.
file...
The lex source files to process. If no file is specified, lex reads standard input.

Description:

The lex utility, which is a modified version of Vern Paxson's flex 2.3.7, generates a C program from lex source code. The generated program is suitable for processing character input and is designed to interface with the yacc utility.

The lex utility requires the file /usr/lib/lex/skeleton. You may create a link to lex called flex, which supports a greater set of options for controlling the generated scanner. If you create this link, you must also copy or link the file /usr/lib/lex/skeleton to /usr/local/lib/flex.skel.

The lex specification contains a set of regular expressions, each paired with an action. Whenever the scanner recognizes one of the regular expressions in the input stream, it performs the corresponding action.

The following "publics" are defined in the scanner:

Variable        Description
char *yytext;   When a token is recognized in the input stream, it is placed into this string, with a terminating \0. In some traditional implementations, this is char yytext[].
int yyleng;     The length of the string in yytext.
FILE *yyout;    The file written to by the default action and by ECHO actions.
FILE *yyin;     The file associated with the current input.
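
Because the scanner reuses yytext for every token, its contents should be treated as valid only until the next token is matched. An action that needs the text later can copy it first. The following is a minimal sketch of such a helper; the name save_token is hypothetical and is not part of lex.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of lex): return a heap-allocated,
 * NUL-terminated copy of the first leng bytes of text, or NULL if
 * leng is negative or memory is exhausted. The caller frees the copy.
 */
char *save_token(const char *text, int leng)
{
    char *copy;

    if (leng < 0)
        return NULL;
    copy = malloc((size_t)leng + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, (size_t)leng);
    copy[leng] = '\0';
    return copy;
}
```

Inside an action, a call would look like save_token(yytext, yyleng). Passing yyleng explicitly avoids rescanning the token to find its length.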

The following "externals" are required by the scanner:

Function            Description
int input()         Return the next character from the input, or return zero to signify that there is no more input.
int output(int c)   Write the character c to yyout.
int unput(int c)    Return the character c to the input stream so that it is read by a subsequent call to input().
int yywrap()        Called when the scanner reaches end of file. If there are further files to be processed, yywrap() should point yyin at the next file and return 0; otherwise it should return 1.

These routines are designed to simplify the most common uses of lex scanners. You can override any of them by defining your own version in your program.
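
As a rough illustration of the input() and unput() contract described above, the following sketch implements the same behavior over an in-memory string. The names set_source, str_input, str_unput, and PUSHBACK_MAX are hypothetical and are not part of lex; how a replacement is actually hooked into a given scanner depends on the skeleton in use.

```c
#include <stddef.h>

#define PUSHBACK_MAX 16               /* hypothetical pushback limit */

static const char *src = "";          /* hypothetical in-memory input */
static int pushback[PUSHBACK_MAX];    /* characters returned by str_unput() */
static size_t npushed = 0;

/* Start reading from the string s and discard any pushed-back characters. */
void set_source(const char *s)
{
    src = s;
    npushed = 0;
}

/* Like input(): return the next character, or 0 when no input remains. */
int str_input(void)
{
    if (npushed > 0)
        return pushback[--npushed];   /* most recently pushed back first */
    if (*src == '\0')
        return 0;
    return (unsigned char)*src++;
}

/* Like unput(): make c the next character that str_input() returns.
 * Returns c, or -1 if the pushback buffer is full. */
int str_unput(int c)
{
    if (npushed == PUSHBACK_MAX)
        return -1;
    pushback[npushed++] = c;
    return c;
}
```

Pushed-back characters come out in last-in, first-out order, so a character handed to str_unput() is the very next one read.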

Variations of the externals are defined in the lex library.
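
A yywrap() that steps through several input files, as described above, might look like the following sketch. The variable next_file and the function set_input_files are hypothetical bookkeeping, not part of lex. In a real scanner yyin is defined by the generated lex.yy.c; it is defined here only so that the sketch compiles on its own.

```c
#include <stdio.h>

FILE *yyin;                 /* stand-in; normally defined in lex.yy.c */

static char **next_file;    /* hypothetical: NULL-terminated list of names */

/* Hypothetical setup call, for example from main() with argv + 1. */
void set_input_files(char **names)
{
    next_file = names;
}

/* Return 0 after switching yyin to the next readable file,
 * or 1 when no files remain. */
int yywrap(void)
{
    if (yyin != NULL && yyin != stdin) {
        fclose(yyin);
        yyin = NULL;
    }
    while (next_file != NULL && *next_file != NULL) {
        const char *name = *next_file++;

        yyin = fopen(name, "r");
        if (yyin != NULL)
            return 0;       /* more input: scanning continues */
        perror(name);       /* report the failure, try the next name */
    }
    return 1;               /* no more input */
}
```

Skipping unreadable files rather than stopping is a design choice made for this sketch; a program could just as reasonably return 1 on the first failure.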

You can use the following lexical analyzer in conjunction with the parser in yacc to implement a simple calculator. You'll find the source (calc.l) for this analyzer in /usr/demo/src.

%{
/*
 * Sample lexical analyzer for a simple calculator.
 * 
 */
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <math.h>
#define YYSTYPE double
#include "y.tab.h"

extern int lineno;        /* current line number  */
extern YYSTYPE yylval;    /* value of numeric token */

%}

digit [0-9]
space [ \t]

%%
{space}         { ; }    /* spaces are ignored */

{digit}+\.?|{digit}*\.{digit}+ { 
                   yylval = strtod(yytext,0); 
                   return NUMBER; }

\*\*            { return '^';  }
last            { return LAST; }
cos             { return COS;  }
exp             { return EXP;  }
sin             { return SIN;  }
sqrt            { return SQRT; }
tan             { return TAN;  }
pi              { yylval = atan(1.0)*4; 
                  return NUMBER; }
e               { yylval = exp(1.0);    
                  return NUMBER; }
\n              { lineno++; return '\n'; }

.               { return yytext[0]; }

%%

Examples:

The following lex program removes trailing spaces from each line, replaces every other run of one or more spaces with a single tab, and converts lowercase letters to uppercase:

%{
#include <stdio.h>
#include <ctype.h>
%}
%%
[a-z]   putchar(toupper(yytext[0]));
[ ]+$   ;
[ ]+    putchar('\t');
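
To make the effect of the three rules concrete, here is a hand-written C sketch that applies the same transformation to a string. It is only an approximation for illustration, not what lex generates, and the function name fold_line is hypothetical. Characters that match no rule are copied unchanged, mirroring the scanner's default action.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: apply the three rules to the NUL-terminated
 * string in, writing the result to out, which must have room for
 * strlen(in) + 1 bytes. Returns out.
 */
char *fold_line(const char *in, char *out)
{
    size_t o = 0;

    while (*in != '\0') {
        if (*in == ' ') {
            while (*in == ' ')          /* consume the whole run of spaces */
                in++;
            if (*in != '\n')            /* run not at end of line: one tab */
                out[o++] = '\t';
            /* a run just before a newline is dropped */
        } else if (*in >= 'a' && *in <= 'z') {
            out[o++] = (char)toupper((unsigned char)*in);
            in++;
        } else {
            out[o++] = *in++;           /* default action: copy unchanged */
        }
    }
    out[o] = '\0';
    return out;
}
```

Note that a run of spaces at the very end of the input, with no newline after it, becomes a tab in this sketch, because the end-of-line rule only applies before a newline.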

Files:

lex.yy.c
By default lex will create this file and put the generated scanner source code in it.
/usr/lib/lex/skeleton
Default skeleton file (-s to change).

Contributing author:

Vern Paxson, The University of California at Berkeley

Caveats:

Some code will expect yytext to be defined as

    char yytext[];

instead of

    char *yytext;
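
One place where the difference shows up is the sizeof operator, which yields the buffer size for an array but only the size of a pointer for a pointer. The sketch below uses stand-in variables (array_text, pointer_text, and TEXT_MAX are hypothetical, not the scanner's real names) to show the effect. An extern declaration of yytext in another source file must also match the form the scanner actually uses.

```c
#include <stddef.h>

#define TEXT_MAX 8192                       /* hypothetical buffer size */

static char  array_text[TEXT_MAX];          /* stand-in for char yytext[] */
static char *pointer_text = array_text;     /* stand-in for char *yytext  */

/* With the array form, sizeof reports the whole buffer. */
size_t array_capacity(void)
{
    return sizeof array_text;
}

/* With the pointer form, sizeof reports only the pointer itself. */
size_t pointer_capacity(void)
{
    return sizeof pointer_text;
}
```

Code that sizes copies with sizeof yytext therefore behaves differently under the two declarations; using yyleng avoids the problem.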

See also:

yacc

A.V. Aho, R. Sethi and J.D. Ullman, Compilers: Principles, Techniques and Tools, Addison-Wesley, 1986.

M.E. Lesk and E. Schmidt, "Lex -- A Lexical Analyzer Generator," Bell Laboratories Computing Science Technical Report # 39, October 1975.

S.C. Johnson, "Yacc -- Yet Another Compiler-Compiler," Bell Laboratories Computing Science Technical Report # 32, July 1978.

J. P. Bennett, Introduction to Compiling Techniques -- A First Course using ANSI C, LEX, and YACC, McGraw-Hill, 1990.

