
GREP			PB-Lib C/C++ Library Programmer's Manual		GREP


NAME
	grep_compile - compile a regular expression
	grep_match   - match a line against a compiled regular expression
	grep         - match a line against a regular expression
	glob_compile - compile a wildcard filename pattern for globbing
	glob_match   - match a filename against a compiled pattern
	glob         - match a filename against a wildcard pattern

SYNOPSIS
	#include <grep.h>

	extern int grepError;

	int   grep_compile(const char *pattern);
	char *grep_match(const char *line);
	char *grep(const char *pattern, const char *line);
	int   glob_compile(const char *pattern);
	int   glob_match(const char *s);
	int   glob(const char *pattern, const char *s);

DESCRIPTION
	The grep* and glob* functions provide regular expression parsing and
	matching. The main difference between the two sets is what they are
	designed for. The grep* functions are general-purpose and follow the
	*IX regular expression syntax. The glob* functions are designed for
	matching filenames against wildcards. They are also modelled after
	the *IX environment and as such are a lot more versatile than the
	limited wildcard capabilities of MS-DOS.

	The grep* regular expression compiler recognizes the following tokens:

	c		- normal character, matches this character (case-sensitive)
	\       - backslash, escapes any character following it. For example,
			  '\$' matches '$', '\\' matches '\' and '\i' matches 'i'.
	^		- circumflex, at start of expression matches beginning of line.
	$		- dollar sign, at end of expression matches the end of line.
	.		- period, matches any one character (there has to be one).
	*		- star, matches zero or more occurrences of preceding expression.
	+		- plus, matches one or more occurrences of preceding expression.
			  The plus and the star must follow a valid expression.
	[]		- square brackets, matches any of the characters in the set of
			  characters included between them. If the first character is
			  the circumflex, '^', then matches any character that is not
			  in the set between the brackets. You can specify ranges of
			  characters, like this [a-zA-Z] for the whole alphabet, or
			  [0-9aeyuio] for all digits and the vowels. If the '-' is the
			  first or last character in the set, it loses its special
			  meaning. For a range to be valid, its first character must
			  come before its last character in character order.
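
	The token rules above can be illustrated with a small stand-alone
	matcher. This is only a sketch, not PB-Lib's implementation: it
	interprets the pattern directly instead of compiling it, it leaves
	out the [] sets, and the names grep_sketch(), match_here(),
	match_repeat(), atom_len() and atom_matches() are hypothetical. Like
	grep(), it returns a pointer to the first character of the match;
	returning NULL when nothing matches is an assumption of the sketch.

```c
#include <stddef.h>

/* Hypothetical helpers; this is NOT the PB-Lib implementation. */
static int match_here(const char *re, const char *text);

/* Length of the pattern atom at re: 2 for an escape "\c", otherwise 1. */
static size_t atom_len(const char *re)
{
    return (re[0] == '\\' && re[1] != '\0') ? 2 : 1;
}

/* Does the single atom at re match the character c? */
static int atom_matches(const char *re, char c)
{
    if (c == '\0')
        return 0;               /* there has to be a character */
    if (re[0] == '\\' && re[1] != '\0')
        return re[1] == c;      /* escaped character is taken literally */
    if (re[0] == '.')
        return 1;               /* period matches any one character */
    return re[0] == c;          /* normal character, case-sensitive */
}

/* Match "atom*" (min == 0) or "atom+" (min == 1), followed by rest. */
static int match_repeat(const char *atom, size_t min,
                        const char *rest, const char *text)
{
    size_t n = 0;

    for (;;) {
        if (n >= min && match_here(rest, text))
            return 1;
        if (!atom_matches(atom, *text))
            return 0;
        text++;
        n++;
    }
}

/* Match the pattern re at exactly the position text. */
static int match_here(const char *re, const char *text)
{
    size_t len;

    if (re[0] == '\0')
        return 1;
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0';   /* dollar at end of expression */
    len = atom_len(re);
    if (re[len] == '*')
        return match_repeat(re, 0, re + len + 1, text);
    if (re[len] == '+')
        return match_repeat(re, 1, re + len + 1, text);
    if (atom_matches(re, *text))
        return match_here(re + len, text + 1);
    return 0;
}

/* Return a pointer to the start of the leftmost match, or NULL. */
const char *grep_sketch(const char *re, const char *text)
{
    if (re[0] == '^')           /* circumflex at start of expression */
        return match_here(re + 1, text) ? text : NULL;
    do {
        if (match_here(re, text))
            return text;
    } while (*text++ != '\0');
    return NULL;
}
```

	With this sketch, grep_sketch("b+c", "abbbcd") points at the first
	'b', while grep_sketch("^ab", "cab") finds nothing because the
	circumflex anchors the match to the beginning of the line.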

	The glob* compiler recognizes the following tokens:

	c		- normal character, matches the same character (case-sensitive)
	?		- matches any one character (there has to be one)
	*		- matches zero or more characters (longest match)
	[]		- same as brackets for grep* (see above)

	Note that even though they were designed for filename matching, the
	glob* functions do not validate filenames in any way; they simply
	match the string against the pattern. The only differences from the
	grep* functions are the tokens recognized and the way the '*' is
	processed.
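
	The glob* tokens can be illustrated the same way. The following is
	a sketch under stated assumptions, not the library's code: it
	matches the uncompiled pattern directly, the names glob_sketch()
	and set_matches() are hypothetical, a set ends at the first ']',
	and an unterminated set is simply treated as a failed match.
	Following the convention of glob(), it returns 0 on match and -1
	otherwise.

```c
#include <stddef.h>

/* Test character c against the set starting at p (p[0] == '[').
 * On a closed set, stores the position after ']' in *end and returns
 * 1 if c is accepted or 0 if it is not. Returns -1 if ']' is missing. */
static int set_matches(const char *p, char c, const char **end)
{
    int negate = 0, found = 0;
    const char *q = p + 1;
    unsigned char uc = (unsigned char)c;

    if (*q == '^') {            /* leading circumflex inverts the set */
        negate = 1;
        q++;
    }
    while (*q != '\0' && *q != ']') {
        if (q[1] == '-' && q[2] != '\0' && q[2] != ']') {
            /* range lo-hi; a '-' that is first or last never gets here */
            if ((unsigned char)q[0] <= uc && uc <= (unsigned char)q[2])
                found = 1;
            q += 3;
        } else {
            if (*q == c)
                found = 1;
            q++;
        }
    }
    if (*q != ']')
        return -1;
    *end = q + 1;
    return negate ? !found : found;
}

/* Return 0 if s matches the wildcard pattern pat, -1 otherwise. */
int glob_sketch(const char *pat, const char *s)
{
    while (*pat != '\0') {
        if (*pat == '*') {
            /* longest match first: start at the end of s and back up */
            const char *t = s;

            while (*t != '\0')
                t++;
            for (;;) {
                if (glob_sketch(pat + 1, t) == 0)
                    return 0;
                if (t == s)
                    return -1;
                t--;
            }
        }
        if (*s == '\0')
            return -1;          /* '?', sets and literals need a character */
        if (*pat == '[') {
            const char *end;

            if (set_matches(pat, *s, &end) != 1)
                return -1;
            pat = end;
        } else if (*pat == '?' || *pat == *s) {
            pat++;
        } else {
            return -1;
        }
        s++;
    }
    return *s == '\0' ? 0 : -1;
}
```

	For instance, glob_sketch("tes*e", "tes.e") returns 0 because the
	'*' is not limited to a name or an extension, and
	glob_sketch("[^0-9]*", "7up") returns -1 because the first
	character is a digit.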

	Each regular expression must be compiled before it can be used. The
	compiled form is stored in a buffer that is currently 512 bytes long,
	so the longest compilable pattern depends on the complexity of the
	expression. This should be adequate for most needs.

	The compiling scheme is simple. Each normal character is stored
	verbatim and the special sequences are encoded. Sets are encoded
	with start and end characters. A range has only a start character,
	followed by the symbols for the start and end of the range. Stars
	are stored differently in the two modes: in grep* mode as an escape
	symbol followed by the pattern they affect, terminated with the
	End-of-Pattern symbol, and in glob* mode simply as a special symbol.
	Pluses are stored like the stars (in grep* mode only). Each compiled
	pattern is terminated by the End-of-Pattern symbol, followed by the
	NUL string terminator.
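
	As a rough illustration of the scheme, the sketch below encodes a
	glob* pattern made only of normal characters, '?' and '*'. It is
	not the library's encoder: the symbol values SYM_EOP, SYM_ANY and
	SYM_STAR, the function name glob_compile_sketch() and the
	caller-supplied output buffer are all assumptions (the library uses
	its own private symbols and a static 512-byte buffer). Sets are
	left out, and normal characters that happen to equal a symbol value
	are not handled.

```c
#include <stddef.h>

/* Hypothetical symbol values; the real ones are private to the library. */
enum { SYM_EOP = 0x01, SYM_ANY = 0x02, SYM_STAR = 0x03 };

/* Encode pattern into out (size bytes). Normal characters are stored
 * verbatim, '?' and '*' become single symbols, and the result ends with
 * End-of-Pattern followed by NUL.
 * Returns 0 on success, -1 if the compiled form does not fit. */
int glob_compile_sketch(const char *pattern, char *out, size_t size)
{
    size_t n = 0;

    if (size < 2)
        return -1;              /* no room for End-of-Pattern and NUL */
    for (; *pattern != '\0'; pattern++) {
        /* always keep two bytes free for End-of-Pattern and NUL */
        if (n + 2 >= size)
            return -1;
        switch (*pattern) {
        case '?':
            out[n++] = SYM_ANY;
            break;
        case '*':
            out[n++] = SYM_STAR;    /* glob* mode: one special symbol */
            break;
        default:
            out[n++] = *pattern;    /* normal character, verbatim */
            break;
        }
    }
    out[n++] = SYM_EOP;
    out[n] = '\0';
    return 0;
}
```

	Compiling "a?*" into a 512-byte buffer with this sketch yields the
	five bytes 'a', SYM_ANY, SYM_STAR, SYM_EOP and NUL.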

	The external variable grepError will hold the error that caused the
	last requested operation to fail (applies only to compiling). The
	values that it can take are defined in the header file.

	The compiling functions return 0 on success and -1 on error. The
	grep* matching functions, grep_match() and grep(), return a pointer
	to the first character of the match in the string, or NULL on error.
	The glob* matching functions, glob_match() and glob(), return 0 on
	match and -1 on error.

	The globbing functions behave a lot like filename expansion in the
	UNIX shells. For example, extensions have no real meaning and the
	'*' matches everything (it is not limited to the name or the
	extension), so 'tes*e' will match 'tese', 'teste', 'tes.e',
	'test_this_here', etc. Sets are also allowed.

	If you need to test a lot of input against the same pattern, use
	the *_compile() functions followed by calls to *_match(). If, on
	the other hand, the pattern changes from call to call, use the
	grep() and glob() functions, which take both the pattern and the
	input string. They compile the expression internally, so you do
	not need to. Because they compile the expression every time they
	are called, they tend to run a lot slower when the same pattern is
	reused.
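
	The compile-once, match-many pattern might look like the sketch
	below. The function count_matching_lines() and the HAVE_PBLIB macro
	are hypothetical. So that the fragment builds without the library,
	it supplies stand-ins for grep_compile() and grep_match() that do a
	plain substring search; define HAVE_PBLIB to use the real functions
	from <grep.h> instead.

```c
#include <stddef.h>
#include <string.h>

#ifdef HAVE_PBLIB               /* hypothetical build switch */
#include <grep.h>
#else
/* Stand-ins so the sketch builds without PB-Lib. They only do a plain
 * substring search and are NOT the library's implementation. */
static const char *standin_pattern;

static int grep_compile(const char *pattern)
{
    standin_pattern = pattern;
    return pattern != NULL ? 0 : -1;
}

static char *grep_match(const char *line)
{
    return (char *)strstr(line, standin_pattern);
}
#endif

/* Compile the pattern once, then test every line against it.
 * Returns the number of matching lines, or -1 if compiling failed
 * (with the real library, grepError then holds the reason). */
int count_matching_lines(const char *pattern,
                         const char *const *lines, size_t n)
{
    int hits = 0;
    size_t i;

    if (grep_compile(pattern) != 0)
        return -1;
    for (i = 0; i < n; i++)
        if (grep_match(lines[i]) != NULL)
            hits++;
    return hits;
}
```

	Calling grep(pattern, lines[i]) inside the loop instead would give
	the same count but would recompile the pattern for every line.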

	Note that since the grep* and glob* functions use the same static
	buffer for the compiled expression, each call to a compiling
	function will overwrite it.

EXAMPLE
	The example GLOBTEST.BAT file tests the globbing functions. You
	will need to compile GREPTEST.C for it to work. Running it lets
	you see why some matches fail and why others succeed.

LIMITS
	Currently, the grep* mode does not recognize parentheses, so you
	cannot group expressions. Also, none of the extended regular
	expression syntax is recognized, only the basic syntax.

