Tanl Linguistic Pipeline

Tanl::Text::HtmlTokenizer Class Reference

Similar to StringTokenizer, except that it skips HTML tags.

#include <HtmlTokenizer.h>

Inheritance: Tanl::Text::HtmlTokenizer derives from Tanl::Text::StringTokenizer.


Public Member Functions

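 HtmlTokenizer (istream &is, char const *delim=delimitersNL)
 Tokenize text read from stream is into words delimited by the characters in delim.
 HtmlTokenizer (char const *s, char const *end=0, char const *delim=delimitersNL)
 Tokenize the text between s and end into words delimited by the characters in delim.
char const * next ()
 Return the next token.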
char const * hasNext ()
 Tell whether there is a next token.

Static Public Attributes

static char const delimitersNL [] = " \t\n\r"
 Default delimiters: space, tab, newline, and carriage return.
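This reference does not say how the delim string is scanned. A minimal sketch, assuming each character of the string is an independent delimiter (as the value of delimitersNL suggests), could test membership with std::strchr. The names kDefaultDelims and isDelimiter are hypothetical and not part of Tanl.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical copy of the documented default, delimitersNL: " \t\n\r".
static char const kDefaultDelims[] = " \t\n\r";

// Sketch of a delimiter test. std::strchr also "finds" the terminating '\0'
// of delim, so the NUL character is excluded explicitly.
bool isDelimiter(char c, char const* delim = kDefaultDelims) {
    assert(delim != nullptr);
    return c != '\0' && std::strchr(delim, c) != nullptr;
}
```

The explicit '\0' check matters: without it, every string's terminator would count as a delimiter.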

Detailed Description

Similar to StringTokenizer, except that it skips HTML tags.
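To illustrate what skipping HTML tags can mean for a tokenizer, here is a minimal, self-contained sketch. It is not the Tanl implementation: the class TagSkippingTokenizer and the helper tokens are hypothetical, and the sketch assumes that everything from '<' to the next '>' is a tag, that a tag also ends the current token, and that end == nullptr means the input is NUL-terminated.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical sketch, NOT the Tanl code: splits the text in [s, end) on the
// characters of delim and drops everything between '<' and '>'.
class TagSkippingTokenizer {
public:
    // Assumption: end == nullptr means "s is NUL-terminated".
    TagSkippingTokenizer(char const* s, char const* end = nullptr,
                         char const* delim = " \t\n\r")
        : cur_(s), end_(end ? end : s + std::strlen(s)), delim_(delim) {
        assert(s != nullptr && delim != nullptr);
    }

    // Returns the next token, or nullptr once the text is exhausted.
    // The returned pointer is valid only until the next call.
    char const* next() {
        token_.clear();
        while (cur_ < end_) {
            char c = *cur_;
            if (c == '<') {
                // Skip the tag through its closing '>'; an unterminated
                // tag swallows the rest of the text.
                while (cur_ < end_ && *cur_ != '>') ++cur_;
                if (cur_ < end_) ++cur_;
                if (!token_.empty()) break;  // assumption: a tag ends a token
            } else if (c != '\0' && std::strchr(delim_, c) != nullptr) {
                ++cur_;
                if (!token_.empty()) break;  // a delimiter ends the token
            } else {
                token_ += c;  // ordinary character: extend the token
                ++cur_;
            }
        }
        return token_.empty() ? nullptr : token_.c_str();
    }

private:
    char const* cur_;    // current read position
    char const* end_;    // one past the last character to read
    char const* delim_;  // set of delimiter characters
    std::string token_;  // owns the characters of the current token
};

// Collects all tokens with the "loop until next() returns null" idiom.
std::vector<std::string> tokens(char const* html) {
    std::vector<std::string> out;
    TagSkippingTokenizer tok(html);
    while (char const* t = tok.next()) out.push_back(t);
    return out;
}
```

For the input "<p>Hello, <b>HTML</b> world</p>\n" this sketch yields the three tokens "Hello,", "HTML" and "world". Keeping the current token in a std::string member is one way to hand out a char const* without modifying the caller's text.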


Constructor & Destructor Documentation

Tanl::Text::HtmlTokenizer::HtmlTokenizer ( istream &  is,
char const *  delim = delimitersNL 
) [inline]

Tokenize into words delimited by the characters in delim, reading the text from stream is.

Parameters:
is input stream
delim string of delimiting characters

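Because next() returns a char const*, a stream-based tokenizer has to keep the text it reads somewhere. One plausible approach, assumed here and possibly different from what the Tanl source does, is to read the whole stream into an owned buffer first. The sketch also swaps in a different, two-pass technique: blank out every tag, then split on the delimiters. The names readAll and tokenizeHtmlStream are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>   // std::istringstream, handy for in-memory input
#include <string>
#include <vector>

// Hypothetical helper: copy the entire stream into an owned buffer.
std::string readAll(std::istream& is) {
    return std::string(std::istreambuf_iterator<char>(is),
                       std::istreambuf_iterator<char>());
}

// Hypothetical two-pass tokenizer; not the Tanl implementation.
std::vector<std::string> tokenizeHtmlStream(std::istream& is,
                                            char const* delim = " \t\n\r") {
    assert(delim != nullptr && *delim != '\0');
    std::string text = readAll(is);
    using size_type = std::string::size_type;

    // Pass 1: replace each <...> tag (or an unterminated trailing '<...')
    // with one delimiter character, so a tag also separates tokens.
    for (size_type lt = text.find('<'); lt != std::string::npos;
         lt = text.find('<', lt + 1)) {
        size_type gt = text.find('>', lt);
        size_type stop = (gt == std::string::npos) ? text.size() : gt + 1;
        text.replace(lt, stop - lt, 1, delim[0]);
    }

    // Pass 2: split on any character of delim, skipping empty tokens.
    std::vector<std::string> out;
    size_type pos = text.find_first_not_of(delim);
    while (pos != std::string::npos) {
        size_type stop = text.find_first_of(delim, pos);
        size_type len = (stop == std::string::npos) ? std::string::npos
                                                    : stop - pos;
        out.push_back(text.substr(pos, len));
        pos = text.find_first_not_of(delim, stop);
    }
    return out;
}
```

Reading the stream eagerly trades memory for simplicity; an incremental reader would need to handle tags that straddle buffer boundaries.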
Tanl::Text::HtmlTokenizer::HtmlTokenizer ( char const *  s,
char const *  end = 0,
char const *  delim = delimitersNL 
) [inline]

Tokenize the text between s and end into words delimited by the characters in delim.

Parameters:
s string beginning
end string end
delim string of delimiting characters

Member Function Documentation

char const * Tanl::Text::HtmlTokenizer::next (  ) 
Returns:
next token.

Reimplemented from Tanl::Text::StringTokenizer.


Copyright © 2005-2011 G. Attardi. Generated on 4 Mar 2011 by doxygen 1.6.1.