Tokenization and sentence splitting
Each token is a distinct word.
One root per sentence.
Titles are sentences.
Sentences may be broken at ";" and ":" if the resulting sentences are meaningful. Exceptions are:
lists of short items following a ":"
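The splitting rule above can be sketched roughly as follows. This is only an illustration, not a tool described on this page: the function names, the three-word threshold for a "short" item, and the comma/"and"/"or" list heuristic are all assumptions. Whether the resulting sentences are meaningful still has to be judged by the annotator, so the function only proposes candidate splits.

```python
import re

# Hypothetical threshold: an item with at most this many words counts as "short".
MAX_SHORT_ITEM_WORDS = 3


def is_short_list(text):
    """Heuristic guess: two or more items separated by commas, 'and' or 'or',
    each at most MAX_SHORT_ITEM_WORDS words long."""
    items = [item.strip() for item in re.split(r",|\band\b|\bor\b", text)]
    items = [item for item in items if item]
    return len(items) >= 2 and all(
        len(item.split()) <= MAX_SHORT_ITEM_WORDS for item in items
    )


def split_candidates(sentence):
    """Propose splits at ';' and ':' when followed by whitespace.

    A ':' that introduces a list of short items is left alone.
    The annotator still decides whether each piece is meaningful.
    """
    parts = []
    start = 0
    for match in re.finditer(r"[;:](?=\s)", sentence):
        rest = sentence[match.end():]
        if match.group() == ":" and is_short_list(rest):
            # Exception: keep a list of short items attached to its lead-in.
            continue
        parts.append(sentence[start:match.end()].strip())
        start = match.end()
    tail = sentence[start:].strip()
    if tail:
        parts.append(tail)
    return parts


if __name__ == "__main__":
    print(split_candidates("He left early; she stayed until the end."))
    print(split_candidates("We bought three things: apples, pears and plums."))
```

The lookahead for whitespace keeps strings such as "13:34" from being treated as split points; that choice is part of the sketch, not of the guideline itself.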
Last modified on 25 July 2013, at 13:34.