Pattern Class Reference

Public Member Functions

 ~Pattern ()
std::string replace (const std::string &str, const std::string &replaceWith)
std::vector< std::string > split (const std::string &str, const bool keepEmptys=0, const unsigned long limit=0)
std::vector< std::string > findAll (const std::string &str)
bool matches (const std::string &str)
unsigned long getFlags () const
std::string getPattern () const
Matcher * createMatcher (const std::string &str)

Static Public Member Functions

static Pattern * compile (const std::string &pattern, const unsigned long mode=0)
static Pattern * compileAndKeep (const std::string &pattern, const unsigned long mode=0)
static std::string replace (const std::string &pattern, const std::string &str, const std::string &replaceWith, const unsigned long mode=0)
static std::vector< std::string > split (const std::string &pattern, const std::string &str, const bool keepEmptys=0, const unsigned long limit=0, const unsigned long mode=0)
static std::vector< std::string > findAll (const std::string &pattern, const std::string &str, const unsigned long mode=0)
static bool matches (const std::string &pattern, const std::string &str, const unsigned long mode=0)
static bool registerPattern (const std::string &name, const std::string &pattern, const unsigned long mode=0)
static void unregisterPatterns ()
static void clearPatternCache ()
static std::pair< std::string, int > findNthMatch (const std::string &pattern, const std::string &str, const int matchNum, const unsigned long mode=0)

Static Public Attributes

static const unsigned long CASE_INSENSITIVE = 0x01
 We should match regardless of case.
static const unsigned long LITERAL = 0x02
 We are implicitly quoted.
static const unsigned long DOT_MATCHES_ALL = 0x04
 We should treat a . as matching every character.
static const unsigned long MULTILINE_MATCHING = 0x08
static const unsigned long UNIX_LINE_MODE = 0x10
static const int MIN_QMATCH = 0x00000000
 The absolute minimum number of matches a quantifier can match (0).
static const int MAX_QMATCH = 0x7FFFFFFF
 The absolute maximum number of matches a quantifier can match (0x7FFFFFFF).

Protected Member Functions

void raiseError ()
NFANode * registerNode (NFANode *node)
std::string classUnion (std::string s1, std::string s2) const
std::string classIntersect (std::string s1, std::string s2) const
std::string classNegate (std::string s1) const
std::string classCreateRange (char low, char hi) const
int getInt (int start, int end)
bool quantifyCurly (int &sNum, int &eNum)
NFANode * quantifyGroup (NFANode *start, NFANode *stop, const int gn)
NFANode * quantify (NFANode *newNode)
std::string parseClass ()
std::string parsePosix ()
std::string parseOctal ()
std::string parseHex ()
NFANode * parseBackref ()
std::string parseEscape (bool &inv, bool &quo)
NFANode * parseRegisteredPattern (NFANode **end)
NFANode * parseBehind (const bool pos, NFANode **end)
NFANode * parseQuote ()
NFANode * parse (const bool inParen=0, const bool inOr=0, NFANode **end=NULL)

Protected Attributes

std::map< NFANode *, bool > nodes
Matcher * matcher
NFANode * head
std::string pattern
bool error
int curInd
int groupCount
int nonCapGroupCount
unsigned long flags

Static Protected Attributes

static std::map< std::string, Pattern * > compiledPatterns
static std::map< std::string, std::pair< std::string, unsigned long > > registeredPatterns

Friends

class Matcher
class NFANode
class NFAQuantifierNode

Detailed Description

This Pattern class is very similar in functionality to Java's java.util.regex.Pattern class. The Pattern class represents an immutable regular expression object. Rather than having a single object act as both the regular expression and the matching engine, the two are split apart: the Matcher class represents the matching object.

The Pattern class works primarily off of "compiled" patterns. A typical instantiation compiles the regular expression with Pattern::compile and then obtains a Matcher for a given string from createMatcher.

Author:
Jeffery Stuart
Since:
March 2003, Stable Since November 2004
Version:
0.02a

A class used to represent "PERL 5"-ish regular expressions.


Constructor & Destructor Documentation

Pattern::~Pattern (  ) 

Deletes all NFA nodes allocated during compilation


Member Function Documentation

void Pattern::raiseError (  )  [protected]

Raises an error during compilation. Compilation will cease at that point and compile will return NULL.

NFANode * Pattern::registerNode ( NFANode *  node  )  [protected]

Convenience function for registering a node in nodes.

Parameters:
node The node to register
Returns:
The registered node

std::string Pattern::classUnion ( std::string  s1,
std::string  s2 
) const [protected]

Calculates the union of two strings. This function will first sort the strings and then use a simple selection algorithm to find the union.

Parameters:
s1 The first "class" to union
s2 The second "class" to union
Returns:
A new string containing all unique characters. Each character must have appeared in one or both of s1 and s2.

std::string Pattern::classIntersect ( std::string  s1,
std::string  s2 
) const [protected]

Calculates the intersection of two strings. This function will first sort the strings and then use a simple selection algorithm to find the intersection.

Parameters:
s1 The first "class" to intersect
s2 The second "class" to intersect
Returns:
A new string containing all unique characters. Each character must have appeared in both s1 and s2.
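
The sort-then-merge approach described for classUnion and classIntersect can be sketched with the standard library. This is only an illustration under stated assumptions: the free functions classUnionSketch and classIntersectSketch are hypothetical names, and the std::set_union / std::set_intersection calls stand in for whatever selection loop the class actually uses.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Hypothetical analogue of Pattern::classUnion: sort both inputs, merge them,
// then drop duplicates so every character appears exactly once.
std::string classUnionSketch(std::string s1, std::string s2) {
    std::sort(s1.begin(), s1.end());
    std::sort(s2.begin(), s2.end());
    std::string out;
    std::set_union(s1.begin(), s1.end(), s2.begin(), s2.end(),
                   std::back_inserter(out));
    out.erase(std::unique(out.begin(), out.end()), out.end());
    return out;
}

// Hypothetical analogue of Pattern::classIntersect: same idea, but only
// characters present in both sorted inputs are kept.
std::string classIntersectSketch(std::string s1, std::string s2) {
    std::sort(s1.begin(), s1.end());
    std::sort(s2.begin(), s2.end());
    std::string out;
    std::set_intersection(s1.begin(), s1.end(), s2.begin(), s2.end(),
                          std::back_inserter(out));
    out.erase(std::unique(out.begin(), out.end()), out.end());
    return out;
}
```

Both parameters are taken by value, as in the documented signatures, so the callers' strings are left unsorted.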

std::string Pattern::classNegate ( std::string  s1  )  const [protected]

Calculates the negation of a string. The negation is the set of all characters not contained in s1.

Parameters:
s1 The "class" to be negated.
Returns:
A new string containing all unique characters. Each character must not have appeared in s1.

std::string Pattern::classCreateRange ( char  low,
char  hi 
) const [protected]

Creates a new "class" representing the range from low thru hi. This function will wrap if low > hi. This is a feature, not a bug. Sometimes it is useful to be able to write a single wrapping range instead of two separate ranges.

Parameters:
low The beginning character
hi The ending character
Returns:
A new string containing all the characters from low thru hi.
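
The wrapping behavior can be illustrated with a small standalone function. This is a sketch, not the class's code: createRangeSketch is a hypothetical name, and it assumes the wrap happens over the full 8-bit unsigned char range (255 back to 0), which this reference does not state.

```cpp
#include <string>

// Hypothetical analogue of Pattern::classCreateRange.
// Assumption: characters are walked as unsigned 8-bit values, so a range
// with low > hi runs up to 0xFF, wraps to 0x00, and continues to hi.
std::string createRangeSketch(char low, char hi) {
    std::string out;
    unsigned char c = static_cast<unsigned char>(low);
    const unsigned char last = static_cast<unsigned char>(hi);
    for (;;) {
        out.push_back(static_cast<char>(c));
        if (c == last) break;
        ++c;  // unsigned arithmetic wraps from 255 to 0
    }
    return out;
}
```

Under that assumption, createRangeSketch('a', 'e') yields the five characters a through e, while a call with low > hi yields the tail of the character set followed by its head.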

int Pattern::getInt ( int  start,
int  end 
) [protected]

Extracts a decimal number from the substring of member-variable pattern starting at start and ending at end.

Parameters:
start The starting index in pattern
end The last index in pattern
Returns:
The decimal number in pattern
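
A minimal sketch of this kind of extraction follows. The function name getIntSketch is hypothetical, the pattern is passed explicitly because the real method reads the member variable, and the sketch assumes that every character in [start, end] is a decimal digit and that both indices are in range.

```cpp
#include <string>

// Hypothetical analogue of Pattern::getInt.
// Assumption: pattern[start..end] (inclusive) contains only digits.
int getIntSketch(const std::string& pattern, int start, int end) {
    int value = 0;
    for (int i = start; i <= end; ++i) {
        value = value * 10 + (pattern[i] - '0');  // shift left one decimal place, add digit
    }
    return value;
}
```

For the pattern text a{12,345}, indices 2 through 3 cover the digits 12 and indices 5 through 7 cover 345.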

bool Pattern::quantifyCurly ( int &  sNum,
int &  eNum 
) [protected]

Parses a {n,m} string out of the member-variable pattern and stores the result in sNum and eNum.

Parameters:
sNum Output parameter. The minimum number of matches required by the curly quantifier is stored here.
eNum Output parameter. The maximum number of matches allowed by the curly quantifier is stored here.
Returns:
Success/Failure. Fails when the curly does not have the proper syntax
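
The following sketch shows one way such a parse could work. It is not the class's implementation: parseCurlySketch and readNumber are hypothetical names, the pattern and position are passed explicitly, and the accepted forms ({n}, {n,}, {,m} and {n,m}, with missing bounds defaulting to MIN_QMATCH and MAX_QMATCH) are an assumption about what counts as proper syntax.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Bounds copied from Pattern::MIN_QMATCH and Pattern::MAX_QMATCH.
const int MIN_QMATCH = 0x00000000;
const int MAX_QMATCH = 0x7FFFFFFF;

// Hypothetical helper: reads a run of digits at s[i], advancing i.
// Returns false if no digits were found or the value exceeds MAX_QMATCH.
static bool readNumber(const std::string& s, std::size_t& i, int& out) {
    const std::size_t begin = i;
    long long v = 0;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) {
        v = v * 10 + (s[i] - '0');
        if (v > MAX_QMATCH) return false;
        ++i;
    }
    if (i == begin) return false;
    out = static_cast<int>(v);
    return true;
}

// Hypothetical analogue of Pattern::quantifyCurly. pos must point at '{'.
// On success sNum and eNum receive the bounds; on failure they are untouched.
bool parseCurlySketch(const std::string& pattern, std::size_t pos,
                      int& sNum, int& eNum) {
    std::size_t i = pos;
    if (i >= pattern.size() || pattern[i] != '{') return false;
    ++i;
    int lo = MIN_QMATCH;
    int hi = MAX_QMATCH;
    const bool hasLo = readNumber(pattern, i, lo);
    if (i < pattern.size() && pattern[i] == '}') {
        if (!hasLo) return false;  // "{}" carries no bound at all
        hi = lo;                   // "{n}" means exactly n
    } else if (i < pattern.size() && pattern[i] == ',') {
        ++i;
        const bool hasHi = readNumber(pattern, i, hi);
        if (i >= pattern.size() || pattern[i] != '}') return false;
        if (!hasLo && !hasHi) return false;  // "{,}" carries no bound
    } else {
        return false;
    }
    if (lo > hi) return false;
    sNum = lo;
    eNum = hi;
    return true;
}
```

Writing the outputs only on success keeps the caller's values intact when the syntax is rejected.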

NFANode * Pattern::quantifyGroup ( NFANode *  start,
NFANode *  stop,
const int  gn 
) [protected]

Tries to quantify the currently parsed group. If the group being parsed is indeed quantified in the member-variable pattern, then the NFA is modified accordingly.

Parameters:
start The starting node of the current group being parsed
stop The ending node of the current group being parsed
gn The group number of the current group being parsed
Returns:
The node representing the starting node of the group. If the group becomes quantified, then this node is not necessarily a GroupHead node.

NFANode * Pattern::quantify ( NFANode *  newNode  )  [protected]

Tries to quantify the last parsed expression. If the character was indeed quantified, then the NFA is modified accordingly.

Parameters:
newNode The recently created expression node
Returns:
The node representing the last parsed expression. If the expression was quantified, return value != newNode

std::string Pattern::parseClass (  )  [protected]

Parses the current class being examined in pattern.

Returns:
A string of unique characters contained in the current class being parsed

std::string Pattern::parsePosix (  )  [protected]

Parses the current POSIX class being examined in pattern.

Returns:
A string of unique characters representing the POSIX class being parsed

std::string Pattern::parseOctal (  )  [protected]

Returns a string containing the octal character being parsed.

Returns:
The string containing the octal value being parsed

std::string Pattern::parseHex (  )  [protected]

Returns a string containing the hex character being parsed.

Returns:
The string containing the hex value being parsed

NFANode * Pattern::parseBackref (  )  [protected]

Returns a new node representing the back reference being parsed

Returns:
The new node representing the back reference being parsed

std::string Pattern::parseEscape ( bool &  inv,
bool &  quo 
) [protected]

Parses the escape sequence currently being examined. Determines if the escape sequence is a class, a single character, or the beginning of a quotation sequence.

Parameters:
inv Output parameter. Whether or not to invert the returned class
quo Output parameter. Whether or not this sequence starts a quotation.
Returns:
The characters represented by the class

NFANode * Pattern::parseRegisteredPattern ( NFANode **  end  )  [protected]

Parses a supposed registered pattern currently under compilation. If the sequence of characters does point to a registered pattern, then the registered pattern is appended to *end. The registered pattern is parsed with the current compilation flags.

Parameters:
end The ending node of the thus-far compiled pattern
Returns:
The new end node of the current pattern

NFANode * Pattern::parseBehind ( const bool  pos,
NFANode **  end 
) [protected]

Parses a lookbehind expression. Appends the necessary nodes to *end.

Parameters:
pos Positive or negative look behind
end The ending node of the current pattern
Returns:
The new end node of the current pattern

NFANode * Pattern::parseQuote (  )  [protected]

Parses the current expression and tacks on nodes until the end of the quotation sequence is found.

Returns:
The end of the current pattern

NFANode * Pattern::parse ( const bool  inParen = 0,
const bool  inOr = 0,
NFANode **  end = NULL 
) [protected]

Parses pattern. This function is called recursively when an or (|) or a group is encountered.

Parameters:
inParen Are we currently parsing inside a group
inOr Are we currently parsing one side of an or (|)
end The end of the current expression
Returns:
The starting node of the NFA constructed from this parse

Pattern * Pattern::compile ( const std::string &  pattern,
const unsigned long  mode = 0 
) [static]

Call this function to compile a regular expression into a Pattern object. Special values can be assigned to mode when certain non-standard behaviors are expected from the Pattern object.

Parameters:
pattern The regular expression to compile
mode A bitwise or of flags signalling what special behaviors are wanted from this Pattern object
Returns:
If successful, compile returns a Pattern pointer. Upon failure, compile returns NULL
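
Because mode is a bitwise OR of the flag constants, several behaviors can be requested at once. The sketch below uses the flag values listed under Static Public Attributes; the standalone constants and the helpers hasFlag and exampleMode are illustrative only and are not part of the class.

```cpp
// Values copied from the Pattern flag constants in this reference.
const unsigned long CASE_INSENSITIVE   = 0x01;
const unsigned long LITERAL            = 0x02;
const unsigned long DOT_MATCHES_ALL    = 0x04;
const unsigned long MULTILINE_MATCHING = 0x08;
const unsigned long UNIX_LINE_MODE     = 0x10;

// Hypothetical helper: true when the given flag bit is set in mode.
inline bool hasFlag(unsigned long mode, unsigned long flag) {
    return (mode & flag) != 0;
}

// Hypothetical helper: a mode requesting case-insensitive, multiline matching.
inline unsigned long exampleMode() {
    return CASE_INSENSITIVE | MULTILINE_MATCHING;
}
```

With the real class, the combined value would be passed as the second argument, for example Pattern::compile("^foo$", Pattern::CASE_INSENSITIVE | Pattern::MULTILINE_MATCHING).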

Pattern * Pattern::compileAndKeep ( const std::string &  pattern,
const unsigned long  mode = 0 
) [static]

Don't use this function. This function compiles a pattern and caches the result. It will eventually be used as an optimization for callers who invoke the static methods with the same pattern over and over instead of first compiling the pattern and then using the compiled instance for matching.

Parameters:
pattern The regular expression to compile
mode A bitwise or of flags signalling what special behaviors are wanted from this Pattern object
Returns:
If successful, compileAndKeep returns a Pattern pointer. Upon failure, compileAndKeep returns NULL.

std::string Pattern::replace ( const std::string &  pattern,
const std::string &  str,
const std::string &  replaceWith,
const unsigned long  mode = 0 
) [static]

Searches through str and replaces all substrings matched by pattern with replaceWith. replaceWith may contain backreferences (e.g. \1) to capture groups. A typical invocation looks like:

Pattern::replace("(a+)b(c+)", "abcccbbabcbabc", "\\2b\\1");

which would replace abcccbbabcbabc with cccbabbcbabcba.

Parameters:
pattern The regular expression
str The string in which to perform replacements
replaceWith The replacement text
mode The special mode requested of the Pattern during the replacement process
Returns:
The text with the replacement string substituted where necessary
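
To make the backreference substitution concrete, here is a standalone sketch that reproduces the example above. It does not use the Pattern class: replaceSketch is a hypothetical name, std::regex (ECMAScript syntax) stands in for the matching engine, and the sketch assumes single-digit backreferences of the form \N with out-of-range group numbers expanding to nothing.

```cpp
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical stand-in for the static Pattern::replace, built on std::regex.
std::string replaceSketch(const std::string& pattern,
                          const std::string& str,
                          const std::string& replaceWith) {
    const std::regex re(pattern);
    std::string out;
    std::string::const_iterator last = str.begin();
    for (std::sregex_iterator it(str.begin(), str.end(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);  // copy the text between matches unchanged
        for (std::size_t i = 0; i < replaceWith.size(); ++i) {
            const char c = replaceWith[i];
            if (c == '\\' && i + 1 < replaceWith.size() &&
                std::isdigit(static_cast<unsigned char>(replaceWith[i + 1]))) {
                const std::size_t group = static_cast<std::size_t>(replaceWith[++i] - '0');
                if (group < m.size()) out += m[group].str();  // expand \N
            } else {
                out.push_back(c);  // literal replacement character
            }
        }
        last = m[0].second;
    }
    out.append(last, str.end());  // copy the tail after the final match
    return out;
}
```

Calling replaceSketch("(a+)b(c+)", "abcccbbabcbabc", "\\2b\\1") swaps the two captured groups around each b in every match, giving cccbabbcbabcba as in the example.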

std::vector< std::string > Pattern::split ( const std::string &  pattern,
const std::string &  str,
const bool  keepEmptys = 0,
const unsigned long  limit = 0,
const unsigned long  mode = 0 
) [static]

Splits the specified string over occurrences of the specified pattern. Empty strings can be optionally ignored. The number of strings returned is configurable. A typical invocation looks like:

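// strSize and fileName are assumed to be defined by the caller
std::string str(strSize, '\0');
FILE * fp = fopen(fileName, "r");
if (fp != NULL) {
  // read the whole file into the pre-sized string
  fread(&str[0], 1, strSize, fp);
  fclose(fp);
}

// split on runs of line terminators, keeping empty strings
std::vector<std::string> lines = Pattern::split("[\r\n]+", str, true);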

Parameters:
pattern The regular expression
str The string to split
keepEmptys Whether or not to keep empty strings
limit The maximum number of splits to make
mode The special mode requested of the Pattern during the split process
Returns:
All substrings of str split across pattern.

std::vector< std::string > Pattern::findAll ( const std::string &  pattern,
const std::string &  str,
const unsigned long  mode = 0 
) [static]

Finds all the instances of the specified pattern within the string. You should be careful to only pass patterns with a minimum length of one. For example, the pattern a* can be matched by an empty string, so instead you should pass a+ since at least one character must be matched. A typical invocation of findAll looks like:

std::vector<std::string> numbers = Pattern::findAll("\\d+", string);

Parameters:
pattern The pattern for which to search
str The string to search
mode The special mode requested of the Pattern during the find process
Returns:
All instances of pattern in str

bool Pattern::matches ( const std::string &  pattern,
const std::string &  str,
const unsigned long  mode = 0 
) [static]

Determines if an entire string matches the specified pattern

Parameters:
pattern The pattern to match against
str The string to match
mode The special mode requested of the Pattern during the matching process
Returns:
True if str is recognized by pattern

bool Pattern::registerPattern ( const std::string &  name,
const std::string &  pattern,
const unsigned long  mode = 0 
) [static]

Registers a pattern under a specific name for use in later compilations. A typical invocation and later use looks like:

Pattern::registerPattern("ip", "(?:\\d{1,3}\\.){3}\\d{1,3}");
Pattern * p1 = Pattern::compile("{ip}:\\d+");
Pattern * p2 = Pattern::compile("Connection from ({ip}) on port \\d+");

Multiple calls to registerPattern with the same name will result in the pattern getting overwritten.

Parameters:
name The name to give to the pattern
pattern The pattern to register
mode Any special flags to use when compiling pattern
Returns:
Success/Failure. Fails only if pattern has invalid syntax

void Pattern::unregisterPatterns (  )  [static]

Clears the pattern registry

void Pattern::clearPatternCache (  )  [static]

Don't use

std::pair< std::string, int > Pattern::findNthMatch ( const std::string &  pattern,
const std::string &  str,
const int  matchNum,
const unsigned long  mode = 0 
) [static]

Searches through a string for the nth match of the given pattern in the string. Match indices start at zero, not one. A typical invocation looks like this:

std::pair<std::string, int> match = Pattern::findNthMatch("\\d{1,3}", "192.168.1.101:22", 1);
printf("%s %i\n", match.first.c_str(), match.second);

Output: 168 4

Parameters:
pattern The pattern for which to search
str The string to search
matchNum Which match to find
mode Any special flags to use during the matching process
Returns:
A string and an integer. The string is the string matched. The integer is the starting location of the matched string in str. You can check for success/failure by making sure that the integer returned is greater than or equal to zero.
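
The zero-based counting and the success check can be illustrated without the library. In this sketch findNthMatchSketch is a hypothetical name, std::regex (ECMAScript syntax) stands in for the matching engine, and -1 is assumed as the failure position; the reference only says that a successful result has a position greater than or equal to zero.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical stand-in for Pattern::findNthMatch, built on std::regex.
// Assumption: failure is reported as an empty string and position -1.
std::pair<std::string, int> findNthMatchSketch(const std::string& pattern,
                                               const std::string& str,
                                               int matchNum) {
    const std::regex re(pattern);
    int index = 0;  // match indices start at zero
    for (std::sregex_iterator it(str.begin(), str.end(), re), end;
         it != end; ++it, ++index) {
        if (index == matchNum) {
            return std::make_pair(it->str(), static_cast<int>(it->position()));
        }
    }
    return std::make_pair(std::string(), -1);
}
```

A caller would test the second member of the returned pair before using the first, for example `if (match.second >= 0) { ... }`.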

unsigned long Pattern::getFlags (  )  const

Returns the flags used during compilation of this pattern

Returns:
The flags used during compilation of this pattern

std::string Pattern::getPattern (  )  const

Returns the regular expression this pattern represents

Returns:
The regular expression this pattern represents

Matcher * Pattern::createMatcher ( const std::string &  str  ) 

Creates a matcher object using the specified string and this pattern.

Parameters:
str The string to match against
Returns:
A new matcher object using this pattern and the specified string


Member Data Documentation

std::map< std::string, Pattern * > Pattern::compiledPatterns [static, protected]

This currently is not used, so don't try to do anything with it. Holds all the compiled patterns for quick access.

std::map< std::string, std::pair< std::string, unsigned long > > Pattern::registeredPatterns [static, protected]

Holds all of the registered patterns as strings. Due to certain problems with compilation of patterns, especially with capturing groups, this seemed to be the best way to do it.

std::map<NFANode*, bool> Pattern::nodes [protected]

Holds all the NFA nodes used. This makes deletion of a pattern, as well as clean-up from an unsuccessful compile much easier and faster.

Matcher* Pattern::matcher [protected]

Used when methods like split are called. The Matcher class uses a lot of dynamic memory, so keeping an instance around speeds up certain operations.

NFANode* Pattern::head [protected]

The front node of the NFA.

std::string Pattern::pattern [protected]

The actual regular expression we represent.

bool Pattern::error [protected]

Flag used during compilation. Once the pattern is successfully compiled, error is no longer used.

int Pattern::curInd [protected]

Used during compilation to keep track of the current index into pattern. Once the pattern is successfully compiled, curInd is no longer used.

int Pattern::groupCount [protected]

The number of capture groups this contains.

int Pattern::nonCapGroupCount [protected]

The number of non-capture groups this contains.

unsigned long Pattern::flags [protected]

The flags specified when this was compiled.

const unsigned long Pattern::MULTILINE_MATCHING = 0x08 [static]

^ and $ should anchor to the beginning and ending of lines, not all input

const unsigned long Pattern::UNIX_LINE_MODE = 0x10 [static]

When enabled, only instances of \n are recognized as line terminators


Generated on Fri Apr 27 13:12:36 2007 for Highlight Code Converter by  doxygen 1.5.2