com.theloutons.search.specialreaders
Class HTMLReader

java.lang.Object
  extended by com.theloutons.search.specialreaders.HTMLReader
All Implemented Interfaces:
DocumentAnalyze

public class HTMLReader
extends java.lang.Object
implements DocumentAnalyze

Author:
Tom Louton

Constructor Summary
HTMLReader()
           
 
Method Summary
 org.apache.lucene.document.Document getDocument()
          Returns the document created by the extraction.
 void setFile(java.io.File f, java.io.PrintWriter log)
          Sets the file and performs the extraction.
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

HTMLReader

public HTMLReader()
Method Detail

getDocument

public org.apache.lucene.document.Document getDocument()
Description copied from interface: DocumentAnalyze
Returns the document created by the extraction.

Specified by:
getDocument in interface DocumentAnalyze
Returns:
the Lucene document containing the text extracted from the file f passed to setFile (below).
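This page does not say how the text is pulled out of the HTML. As a rough illustration only, here is a naive regex-based extractor; it is not HTMLReader's actual method, and a real implementation would more likely use a proper HTML parser. The class HtmlTextSketch and its method extractText are hypothetical, and the Lucene Document is left out so the sketch runs with the JDK alone (in the real class the extracted text would go into a Document field).

```java
import java.util.regex.Pattern;

// Hypothetical sketch: naive HTML-to-text extraction, not the HTMLReader implementation.
class HtmlTextSketch {
    // Drops <script> and <style> elements together with their contents.
    private static final Pattern SCRIPT_STYLE =
        Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    // Matches HTML comments and any remaining tags.
    private static final Pattern TAG = Pattern.compile("(?s)<!--.*?-->|<[^>]+>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    public static String extractText(String html) {
        String text = SCRIPT_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        // Decode a few common entities; &amp; goes last so "&amp;lt;" is decoded only once.
        text = text.replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&nbsp;", " ")
                   .replace("&amp;", "&");
        // Collapse runs of whitespace left behind by the removed tags.
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Hi</title><style>p{color:red}</style></head>"
                    + "<body><p>Fish &amp; chips</p><!-- note --></body></html>";
        System.out.println(extractText(html)); // prints "Hi Fish & chips"
    }
}
```

Regular expressions cannot handle every real-world page (unclosed tags, a stray ">" inside attribute values), which is why a parser is the safer choice in practice.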

setFile

public void setFile(java.io.File f,
                    java.io.PrintWriter log)
Description copied from interface: DocumentAnalyze
Sets the file and performs the extraction. Alternatively, an implementation could defer the extraction to getDocument.
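The alternative of deferring the work to getDocument could look roughly like this. It is a sketch under stated assumptions: LazyReaderSketch is a hypothetical name, a Map&lt;String, String&gt; stands in for org.apache.lucene.document.Document so the code needs only the JDK, and the tag stripping is deliberately crude. setFile only records its arguments; the first getDocument call reads and converts the file, and later calls return the cached result.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of lazy extraction; not the real HTMLReader.
class LazyReaderSketch {
    private File file;
    private PrintWriter log;
    private Map<String, String> doc;   // stand-in for the Lucene Document
    private boolean extracted;

    // Records the inputs only; nothing is read yet.
    public void setFile(File f, PrintWriter log) {
        this.file = f;
        this.log = log;
        this.doc = null;
        this.extracted = false;
    }

    // Extracts on the first call, then returns the cached result (possibly null).
    public Map<String, String> getDocument() {
        if (!extracted) {
            extracted = true;
            doc = extract();
        }
        return doc;
    }

    private Map<String, String> extract() {
        try {
            String html = new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
            Map<String, String> d = new HashMap<>();
            d.put("path", file.getPath());
            // Crude tag stripping, for illustration only.
            d.put("contents", html.replaceAll("<[^>]+>", " ").replaceAll("\\s+", " ").trim());
            return d;
        } catch (IOException e) {
            // Failure is logged once; the null result is cached like a success.
            log.println("Could not read " + file + ": " + e);
            return null;
        }
    }

    public static void main(String[] args) {
        StringWriter sw = new StringWriter();
        LazyReaderSketch r = new LazyReaderSketch();
        r.setFile(new File("/nonexistent-sketch-dir/missing.html"), new PrintWriter(sw, true));
        System.out.println("log empty after setFile: " + sw.toString().isEmpty());
        System.out.println("document: " + r.getDocument());
        System.out.print(sw);
    }
}
```

The trade-off is that setFile becomes cheap and never fails, while any I/O error surfaces only when getDocument is first called.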

Specified by:
setFile in interface DocumentAnalyze
Parameters:
f - the file from which the tokens are to be extracted.
log - a log writer. Wherever the implementation sets doc = null and returns, I suggest writing the reason to the log.
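A minimal sketch of that logging convention, assuming the extraction happens eagerly in setFile. GuardedReaderSketch and its particular checks are hypothetical, and a Map&lt;String, String&gt; again stands in for the Lucene Document. Each early exit writes the reason to log, sets doc to null, and returns, so a caller that gets null from getDocument can look in the log to find out why.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "log the reason, set doc = null, return" convention.
class GuardedReaderSketch {
    private Map<String, String> doc;   // stand-in for the Lucene Document

    public void setFile(File f, PrintWriter log) {
        if (f == null || !f.isFile()) {
            log.println("Skipping " + f + ": not a regular file");
            doc = null;
            return;
        }
        if (!f.canRead()) {
            log.println("Skipping " + f + ": file is not readable");
            doc = null;
            return;
        }
        String html;
        try {
            html = new String(Files.readAllBytes(f.toPath()), StandardCharsets.UTF_8);
        } catch (IOException e) {
            log.println("Skipping " + f + ": " + e);
            doc = null;
            return;
        }
        doc = new HashMap<>();
        doc.put("path", f.getPath());
        // Crude tag stripping, for illustration only.
        doc.put("contents", html.replaceAll("<[^>]+>", " ").replaceAll("\\s+", " ").trim());
    }

    public Map<String, String> getDocument() {
        return doc;
    }

    public static void main(String[] args) {
        StringWriter sw = new StringWriter();
        GuardedReaderSketch r = new GuardedReaderSketch();
        r.setFile(new File("/nonexistent-sketch-dir/page.html"), new PrintWriter(sw, true));
        System.out.println("document: " + r.getDocument());
        System.out.print(sw);
    }
}
```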

toString

public java.lang.String toString()