The scanners package contains classes responsible for the tertiary identification of tags. The lower-level classes in the {@link org.htmlparser.lexer.Lexer lexer} package convert byte streams to characters and characters to nodes (via the {@link org.htmlparser.NodeFactory NodeFactory}). For tags, the scanners in this package can then either complete the tag or override it and return an augmented tag. The existing {@link org.htmlparser.scanners.CompositeTagScanner composite tag scanner}, for example, gathers the children of composite tags and so identifies the nested structure of HTML documents. The {@link org.htmlparser.scanners.ScriptScanner script scanner} overrides the nodes returned by the lexer and creates a tag containing a single string that is the script code.
You might need to create a scanner (a class that implements the {@link org.htmlparser.scanners.Scanner Scanner} interface) if the text you are trying to parse doesn't look like HTML, as is the case for the script scanner, or if the normal processing of tags by nesting their structure is inadequate.
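The idea of a scanner that takes over from the lexer when the text doesn't look like HTML can be sketched in a few lines. The sketch below is self-contained and deliberately does not use the real library types: SketchTag, SketchLexer, SketchScanner and RawTextScanner are hypothetical stand-ins invented for illustration, and the scan method shown here is an assumed simplification, not the signature of the actual Scanner interface. It only shows the general technique of collecting everything up to the matching end tag as a single string, in the spirit of what is described above for the script scanner.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; none of these types belong to org.htmlparser. */
final class ScannerSketch {

    /** Hypothetical stand-in for a tag node produced by the lexer. */
    static final class SketchTag {
        final String name;
        final List<String> children = new ArrayList<>();

        SketchTag(String name) {
            this.name = name;
        }
    }

    /** Hypothetical stand-in for the lexer: a cursor over the raw page text. */
    static final class SketchLexer {
        private final String page;
        private int position;

        SketchLexer(String page, int position) {
            this.page = page;
            this.position = position;
        }

        String page() { return page; }
        int position() { return position; }
        void setPosition(int position) { this.position = position; }
    }

    /** Assumed contract: complete the given tag, or return an augmented one. */
    interface SketchScanner {
        SketchTag scan(SketchTag tag, SketchLexer lexer);
    }

    /** Treats all text up to the matching end tag as one opaque string child. */
    static final class RawTextScanner implements SketchScanner {
        @Override
        public SketchTag scan(SketchTag tag, SketchLexer lexer) {
            String page = lexer.page();
            String endTag = "</" + tag.name + ">";
            int end = indexOfIgnoreCase(page, endTag, lexer.position());
            if (end < 0) {
                end = page.length(); // unterminated: take the rest of the page
            }
            tag.children.add(page.substring(lexer.position(), end));
            // Move the cursor past the end tag so normal lexing can resume there.
            lexer.setPosition(Math.min(page.length(), end + endTag.length()));
            return tag;
        }

        /** Case-insensitive indexOf; returns -1 when the needle is absent. */
        private static int indexOfIgnoreCase(String text, String needle, int from) {
            for (int i = Math.max(from, 0); i + needle.length() <= text.length(); i++) {
                if (text.regionMatches(true, i, needle, 0, needle.length())) {
                    return i;
                }
            }
            return -1;
        }
    }
}
```

In this sketch the caller is assumed to have already consumed the opening tag, so the cursor sits on the first character of the raw content. A scanner written against the real interface would work with the library's own lexer and tag classes instead of these placeholders.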