Class EventBasedExcelExtractor

All Implemented Interfaces:
Closeable, AutoCloseable, ExcelExtractor

public class EventBasedExcelExtractor extends POIOLE2TextExtractor implements ExcelExtractor
A text extractor for Excel files that is based on the HSSF EventUserModel API. It typically uses less memory than ExcelExtractor, but may not provide the same richness of formatting. It returns the textual content of the file, suitable for indexing by something like Lucene, but not really intended for display to the user.

To turn an Excel file into CSV or a similar format, see the XLS2CSVmra example.
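A minimal usage sketch follows. It assumes Apache POI is on the classpath and that the classes live in the packages imported below. The class name EventExtractorSketch and the helper extractText are hypothetical names introduced only for illustration. The sketch calls only the constructor and the supported setters listed on this page, plus the POIFSFileSystem(InputStream) constructor from POI.

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hssf.extractor.EventBasedExcelExtractor;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class EventExtractorSketch {

    /**
     * Hypothetical helper: reads a binary .xls stream and returns its text.
     * Only the options this extractor documents as supported are set.
     */
    static String extractText(InputStream xls, boolean includeSheetNames) throws IOException {
        // Both resources are closed when the block exits, even if extraction fails.
        try (POIFSFileSystem fs = new POIFSFileSystem(xls);
             EventBasedExcelExtractor extractor = new EventBasedExcelExtractor(fs)) {
            // Sheet names are included by default; set explicitly for clarity.
            extractor.setIncludeSheetNames(includeSheetNames);
            // false (the default) returns formula results rather than the formulas themselves.
            extractor.setFormulasNotResults(false);
            return extractor.getText();
        }
    }
}
```

The returned string could then be handed to an indexer. The sketch deliberately avoids the comment, header/footer, and metadata methods, which this page describes as unsupported.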

  • Constructor Details

    • EventBasedExcelExtractor

      public EventBasedExcelExtractor(DirectoryNode dir)
    • EventBasedExcelExtractor

      public EventBasedExcelExtractor(POIFSFileSystem fs)
  • Method Details

    • getDocSummaryInformation

      public DocumentSummaryInformation getDocSummaryInformation()
      Would return the document summary information metadata for the document, but this extractor does not support metadata extraction.
      Overrides:
      getDocSummaryInformation in class POIOLE2TextExtractor
      Returns:
      The Document Summary Information or null if it could not be read for this document.
    • getSummaryInformation

      public SummaryInformation getSummaryInformation()
      Would return the summary information metadata for the document, but this extractor does not support metadata extraction.
      Overrides:
      getSummaryInformation in class POIOLE2TextExtractor
      Returns:
      The Summary information for the document or null if it could not be read for this document.
    • setIncludeCellComments

      public void setIncludeCellComments(boolean includeComments)
      Would control the inclusion of cell comments from the document, but this extractor does not support comment extraction.
      Specified by:
      setIncludeCellComments in interface ExcelExtractor
      Parameters:
      includeComments - true if cell comments should be included
    • setIncludeHeadersFooters

      public void setIncludeHeadersFooters(boolean includeHeadersFooters)
      Would control the inclusion of headers and footers from the document, but this extractor does not support header and footer extraction.
      Specified by:
      setIncludeHeadersFooters in interface ExcelExtractor
      Parameters:
      includeHeadersFooters - true if headers and footers should be included
    • setIncludeSheetNames

      public void setIncludeSheetNames(boolean includeSheetNames)
      Sets whether sheet names are included in the extracted text. Default is true.
      Specified by:
      setIncludeSheetNames in interface ExcelExtractor
      Parameters:
      includeSheetNames - true if the sheet names should be included
    • setFormulasNotResults

      public void setFormulasNotResults(boolean formulasNotResults)
      Sets whether the formula itself is returned instead of the result it produces. Default is false.
      Specified by:
      setFormulasNotResults in interface ExcelExtractor
      Parameters:
      formulasNotResults - true if the formula itself is returned
    • getText

      public String getText()
      Retrieves the text contents of the file.
      Specified by:
      getText in interface ExcelExtractor
      Specified by:
      getText in class POITextExtractor
      Returns:
      All the text from the document