com.theloutons.search.specialreaders
Interface DocumentAnalyze
- All Known Implementing Classes:
- CSVReader, DOCReader, HTMLReader, PDFReader, PPTReader, RTFReader, TXTReader, XLSReader, XMLReader
- public interface DocumentAnalyze
- Author:
- Tom Louton
Defines how support for a new file format is added: each format
supplies a reader class that implements this interface.
Method Summary
org.apache.lucene.document.Document | getDocument()
    Just returns the document that was created.
void | setFile(java.io.File f, java.io.PrintWriter log)
    Sets the file and performs the extraction.
getDocument
public org.apache.lucene.document.Document getDocument()
- Just returns the document that was created.
- Returns:
- the Lucene document containing the text extracted from
the file f (see setFile below).
setFile
public void setFile(java.io.File f,
java.io.PrintWriter log)
- Sets the file and performs the extraction. An implementation
could instead perform the extraction in getDocument.
- Parameters:
f
- the file from which the tokens are to be extracted.
log
- a log file. It is suggested that wherever an implementation does
doc=null;return, it first writes the reason to the log.
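
The contract described above can be illustrated with a minimal sketch of a plain-text reader. This is not the library's own TXTReader. To compile without Lucene on the classpath it makes two assumptions: a hypothetical stand-in class Doc replaces org.apache.lucene.document.Document, and a nested interface Analyzer mirrors DocumentAnalyze with that stand-in return type. The field names "path" and "contents" and the UTF-8 encoding are also assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.LinkedHashMap;
import java.util.Map;

class DocumentAnalyzeSketch {

    // Hypothetical stand-in for org.apache.lucene.document.Document.
    static class Doc {
        private final Map<String, String> fields = new LinkedHashMap<>();
        void add(String name, String value) { fields.put(name, value); }
        String get(String name) { return fields.get(name); }
    }

    // Mirror of DocumentAnalyze, using the stand-in return type (assumption).
    interface Analyzer {
        Doc getDocument();
        void setFile(File f, PrintWriter log);
    }

    // Hypothetical reader: extraction happens in setFile, as the docs describe.
    static class PlainTextReader implements Analyzer {
        private Doc doc;

        @Override
        public void setFile(File f, PrintWriter log) {
            doc = null; // discard any result from a previous file
            if (f == null || !f.isFile()) {
                // Follow the suggestion: log the reason before giving up.
                log.println("Skipped " + f + ": not a regular file");
                return;
            }
            try {
                byte[] bytes = Files.readAllBytes(f.toPath());
                Doc d = new Doc();
                d.add("path", f.getPath());
                d.add("contents", new String(bytes, StandardCharsets.UTF_8));
                doc = d;
            } catch (IOException e) {
                log.println("Skipped " + f + ": " + e.getMessage());
            }
        }

        @Override
        public Doc getDocument() {
            return doc; // null if the last setFile call failed
        }
    }

    public static void main(String[] args) throws IOException {
        File sample = File.createTempFile("sample", ".txt");
        sample.deleteOnExit();
        Files.write(sample.toPath(), "hello lucene".getBytes(StandardCharsets.UTF_8));

        StringWriter logText = new StringWriter();
        PrintWriter log = new PrintWriter(logText, true);

        Analyzer reader = new PlainTextReader();
        reader.setFile(sample, log);
        System.out.println(reader.getDocument().get("contents"));

        // A missing file leaves the document null and a reason in the log.
        reader.setFile(new File(sample.getPath() + ".missing"), log);
        System.out.println(reader.getDocument() == null);
        System.out.print(logText);
    }
}
```

A caller is expected to check getDocument for null after each setFile call, since a failed extraction leaves no document and only a line in the log. In a real implementation the stand-in Doc would be replaced by the Lucene Document class, whose field-construction API differs between Lucene versions.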