Class XSSFEventBasedExcelExtractor

All Implemented Interfaces:
Closeable, AutoCloseable, ExcelExtractor
Direct Known Subclasses:
XSSFBEventBasedExcelExtractor

public class XSSFEventBasedExcelExtractor extends POIXMLTextExtractor implements ExcelExtractor
Implementation of a text extractor from OOXML Excel files that uses SAX event based parsing.
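For readers unfamiliar with the term, the following is a minimal sketch of SAX event based parsing that uses only the JDK's built-in parser. It is not POI's implementation: the class name `SaxCellTextSketch`, the method `extractValues`, and the simplified worksheet XML (no namespaces, no shared strings) are assumptions made for illustration. It shows the idea of reacting to start, character, and end events as the XML streams past, rather than loading the whole sheet into memory.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Apache POI.
public class SaxCellTextSketch {

    /** Collects the text of every <v> element: tab between cells, newline after each row. */
    public static String extractValues(String sheetXml) throws Exception {
        StringBuilder out = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            private boolean inValue;        // true while inside a <v> element
            private boolean firstCellInRow; // true until the first <v> of a row is seen

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("row".equals(qName)) {
                    firstCellInRow = true;
                } else if ("v".equals(qName)) {
                    if (!firstCellInRow) {
                        out.append('\t');
                    }
                    firstCellInRow = false;
                    inValue = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Only text inside <v> is cell content; everything else is ignored.
                if (inValue) {
                    out.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("v".equals(qName)) {
                    inValue = false;
                } else if ("row".equals(qName)) {
                    out.append('\n');
                }
            }
        };

        // The default factory is not namespace-aware, so element names arrive in qName.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(sheetXml.getBytes(StandardCharsets.UTF_8)), handler);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Simplified stand-in for a worksheet part; real sheets carry namespaces and cell types.
        String xml = "<worksheet><sheetData>"
                + "<row><c><v>1</v></c><c><v>2</v></c></row>"
                + "<row><c><v>3</v></c></row>"
                + "</sheetData></worksheet>";
        System.out.print(extractValues(xml));
    }
}
```

Because the handler keeps only a couple of flags and the output buffer, memory use stays roughly constant regardless of sheet size, which is the usual reason to prefer an event based extractor for large workbooks.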
  • Field Details

    • container

      protected OPCPackage container
    • properties

      protected POIXMLProperties properties
    • locale

      protected Locale locale
    • includeTextBoxes

      protected boolean includeTextBoxes
    • includeSheetNames

      protected boolean includeSheetNames
    • includeCellComments

      protected boolean includeCellComments
    • includeHeadersFooters

      protected boolean includeHeadersFooters
    • formulasNotResults

      protected boolean formulasNotResults
    • concatenatePhoneticRuns

      protected boolean concatenatePhoneticRuns
  • Constructor Details

  • Method Details

    • main

      public static void main(String[] args) throws Exception
      Throws:
      Exception
    • setIncludeSheetNames

      public void setIncludeSheetNames(boolean includeSheetNames)
      Should sheet names be included? Default is true.
      Specified by:
      setIncludeSheetNames in interface ExcelExtractor
      Parameters:
      includeSheetNames - true if the sheet names should be included
    • getIncludeSheetNames

      public boolean getIncludeSheetNames()
      Returns:
      whether to include sheet names
      Since:
      3.16-beta3
    • setFormulasNotResults

      public void setFormulasNotResults(boolean formulasNotResults)
      Should the formula itself be returned, rather than the result it produces? Default is false.
      Specified by:
      setFormulasNotResults in interface ExcelExtractor
      Parameters:
      formulasNotResults - true if the formula itself is returned
    • getFormulasNotResults

      public boolean getFormulasNotResults()
      Returns:
      whether to include formulas but not results
      Since:
      3.16-beta3
    • setIncludeHeadersFooters

      public void setIncludeHeadersFooters(boolean includeHeadersFooters)
      Should headers and footers be included? Default is true.
      Specified by:
      setIncludeHeadersFooters in interface ExcelExtractor
      Parameters:
      includeHeadersFooters - true if headers and footers should be included
    • getIncludeHeadersFooters

      public boolean getIncludeHeadersFooters()
      Returns:
      whether or not to include headers and footers
      Since:
      3.16-beta3
    • setIncludeTextBoxes

      public void setIncludeTextBoxes(boolean includeTextBoxes)
      Should text from text boxes be included? Default is true.
    • getIncludeTextBoxes

      public boolean getIncludeTextBoxes()
      Returns:
      whether or not to extract textboxes
      Since:
      3.16-beta3
    • setIncludeCellComments

      public void setIncludeCellComments(boolean includeCellComments)
      Should cell comments be included? Default is false.
      Specified by:
      setIncludeCellComments in interface ExcelExtractor
      Parameters:
      includeCellComments - true if cell comments should be included
    • getIncludeCellComments

      public boolean getIncludeCellComments()
      Returns:
      whether cell comments should be included
      Since:
      3.16-beta3
    • setConcatenatePhoneticRuns

      public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
      Should text from <rPh> (phonetic run) elements in the SharedStringsTable be concatenated? Default is true.
      Parameters:
      concatenatePhoneticRuns - true if runs should be concatenated, false otherwise
    • setLocale

      public void setLocale(Locale locale)
    • getLocale

      public Locale getLocale()
      Returns:
      locale
      Since:
      3.16-beta3
    • getPackage

      public OPCPackage getPackage()
      Returns the opened OPCPackage container.
      Overrides:
      getPackage in class POIXMLTextExtractor
      Returns:
      the opened OPCPackage
    • getCoreProperties

      public POIXMLProperties.CoreProperties getCoreProperties()
      Returns the core document properties
      Overrides:
      getCoreProperties in class POIXMLTextExtractor
      Returns:
      the core document properties
    • getExtendedProperties

      public POIXMLProperties.ExtendedProperties getExtendedProperties()
      Returns the extended document properties
      Overrides:
      getExtendedProperties in class POIXMLTextExtractor
      Returns:
      the extended document properties
    • getCustomProperties

      public POIXMLProperties.CustomProperties getCustomProperties()
      Returns the custom document properties
      Overrides:
      getCustomProperties in class POIXMLTextExtractor
      Returns:
      the custom document properties
    • processSheet

      public void processSheet(XSSFSheetXMLHandler.SheetContentsHandler sheetContentsExtractor, Styles styles, Comments comments, SharedStrings strings, InputStream sheetInputStream) throws IOException, SAXException
      Processes the given sheet
      Throws:
      IOException
      SAXException
    • createSharedStringsTable

      protected SharedStrings createSharedStringsTable(XSSFReader xssfReader, OPCPackage container) throws IOException, SAXException
      Throws:
      IOException
      SAXException
    • getText

      public String getText()
      Processes the file and returns the text
      Specified by:
      getText in interface ExcelExtractor
      Specified by:
      getText in class POITextExtractor
      Returns:
      All the text from the document
    • close

      public void close() throws IOException
      Description copied from class: POITextExtractor
      Frees the resources held by the Extractor once it is no longer needed. This may include closing open file handles and freeing memory. The Extractor cannot be used after close has been called.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Overrides:
      close in class POIXMLTextExtractor
      Throws:
      IOException
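
The setters, getText and close documented above might be combined as in the following sketch. It assumes Apache POI (poi-ooxml) is on the classpath and that the class offers a constructor taking an OPCPackage, which the empty Constructor Details section above does not list. The class name `EventExtractorUsage` and the helper `extractText` are hypothetical names used only for this example.

```java
import java.io.File;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.extractor.XSSFEventBasedExcelExtractor;

// Hypothetical usage class; requires the poi-ooxml dependency.
public class EventExtractorUsage {

    /** Opens the workbook read-only, configures the extractor, and returns its text. */
    public static String extractText(File xlsx) throws Exception {
        OPCPackage pkg = OPCPackage.open(xlsx, PackageAccess.READ);
        // Assumption: constructor taking an OPCPackage (not listed in the page above).
        // try-with-resources calls close(), after which the extractor cannot be used.
        try (XSSFEventBasedExcelExtractor extractor = new XSSFEventBasedExcelExtractor(pkg)) {
            extractor.setIncludeSheetNames(true);      // documented default is true
            extractor.setFormulasNotResults(false);    // return results, not formula text
            extractor.setIncludeCellComments(true);    // documented default is false
            extractor.setIncludeHeadersFooters(false); // documented default is true
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: EventExtractorUsage <file.xlsx>");
            return;
        }
        System.out.println(extractText(new File(args[0])));
    }
}
```

Options are set before getText is called because getText is the call that processes the file.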