Package org.apache.poi.extractor
Class OLE2ExtractorFactory
java.lang.Object
org.apache.poi.extractor.OLE2ExtractorFactory
Figures out the correct POIOLE2TextExtractor for your supplied document, and returns it.
Note 1 - will fail for many file formats if the POI Scratchpad jar is not present on the runtime classpath.
Note 2 - for text extractor creation across all formats, use ExtractorFactory, contained within the OOXML jar.
Note 3 - for most cases you would be better off switching to Apache Tika rather than using this class.
-
Method Summary
Modifier and Type, Method, Description
static <T extends POITextExtractor> T createExtractor(InputStream input)
static POITextExtractor createExtractor(DirectoryNode poifsDir)
    Create the Extractor, if possible.
static <T extends POITextExtractor> T createExtractor
static Boolean getAllThreadsPreferEventExtractors()
    Should all threads prefer event based over usermodel based extractors? (usermodel extractors tend to be more accurate, but use more memory) Default is to use the thread level setting, which defaults to false.
static POITextExtractor[] getEmbededDocsTextExtractors(POIOLE2TextExtractor ext)
    Returns an array of text extractors, one for each of the embedded documents in the file (if there are any).
static boolean getPreferEventExtractor()
    Should this thread use event based extractors if available? Checks the all-threads setting first, then the thread-specific one.
static boolean getThreadPrefersEventExtractors()
    Should this thread prefer event based over usermodel based extractors? (usermodel extractors tend to be more accurate, but use more memory) Default is false.
static void setAllThreadsPreferEventExtractors(Boolean preferEventExtractors)
    Should all threads prefer event based over usermodel based extractors? If set, will take preference over the Thread level setting.
static void setThreadPrefersEventExtractors(boolean preferEventExtractors)
    Should this thread prefer event based over usermodel based extractors? Will only be used if the All Threads setting is null.
-
Method Details
-
getThreadPrefersEventExtractors
public static boolean getThreadPrefersEventExtractors()
Should this thread prefer event based over usermodel based extractors? (usermodel extractors tend to be more accurate, but use more memory) Default is false.
- Returns:
- true if event extractors should be preferred in the current thread, false otherwise.
-
getAllThreadsPreferEventExtractors
public static Boolean getAllThreadsPreferEventExtractors()
Should all threads prefer event based over usermodel based extractors? (usermodel extractors tend to be more accurate, but use more memory) Default is to use the thread level setting, which defaults to false.
- Returns:
- true if event extractors should be preferred in all threads, false otherwise.
-
setThreadPrefersEventExtractors
public static void setThreadPrefersEventExtractors(boolean preferEventExtractors)
Should this thread prefer event based over usermodel based extractors? Will only be used if the All Threads setting is null.
- Parameters:
preferEventExtractors - If this thread should prefer event based extractors.
-
setAllThreadsPreferEventExtractors
public static void setAllThreadsPreferEventExtractors(Boolean preferEventExtractors)
Should all threads prefer event based over usermodel based extractors? If set, will take preference over the Thread level setting.
- Parameters:
preferEventExtractors - If all threads should prefer event based extractors.
-
getPreferEventExtractor
public static boolean getPreferEventExtractor()
Should this thread use event based extractors if available? Checks the all-threads setting first, then the thread-specific one.
- Returns:
- If the current thread should use event based extractors.
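The lookup order described above (all-threads setting first, thread-level setting as the fallback) can be modelled in a few lines. This is a stand-alone sketch of the documented behaviour, not the POI source: the class EventExtractorPreference and its methods setAllThreads, setThread and effective are hypothetical names chosen for illustration, and the use of a ThreadLocal for the per-thread value is an assumption.

```java
/**
 * Stand-alone model of the documented preference lookup.
 * All names here are hypothetical; this is not the POI implementation.
 */
final class EventExtractorPreference {
    // All-threads setting: null means "not set, defer to the thread-level value".
    private static volatile Boolean allThreads = null;

    // Thread-level setting, defaulting to false as the documentation states.
    private static final ThreadLocal<Boolean> perThread =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    static void setAllThreads(Boolean prefer) {
        allThreads = prefer;
    }

    static void setThread(boolean prefer) {
        perThread.set(prefer);
    }

    /** Checks the all-threads setting first, then the thread-specific one. */
    static boolean effective() {
        Boolean global = allThreads;
        return (global != null) ? global : perThread.get();
    }

    public static void main(String[] args) {
        setThread(true);
        System.out.println(effective()); // thread-level value applies while all-threads is null

        setAllThreads(Boolean.FALSE);
        System.out.println(effective()); // all-threads value takes preference

        setAllThreads(null);
        System.out.println(effective()); // back to the thread-level value
    }
}
```

Using a nullable Boolean for the all-threads value is what makes the "not set" state expressible, which matches the Boolean parameter of setAllThreadsPreferEventExtractors versus the primitive boolean of setThreadPrefersEventExtractors.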
-
createExtractor
- Throws:
IOException
-
createExtractor
- Throws:
IOException
-
createExtractor
Create the Extractor, if possible. Generally needs the Scratchpad jar. Note that this won't check for embedded OOXML resources either; use ExtractorFactory for that.
- Parameters:
poifsDir - The DirectoryNode pointing to a document.
- Returns:
- The resulting POITextExtractor; an exception is thrown if no TextExtractor can be created for some reason.
- Throws:
IOException - If converting the DirectoryNode into a HSSFWorkbook fails
OldFileFormatException - If the DirectoryNode points to a format of an unsupported version of Excel.
IllegalArgumentException - If creating the Extractor fails
-
getEmbededDocsTextExtractors
public static POITextExtractor[] getEmbededDocsTextExtractors(POIOLE2TextExtractor ext) throws IOException
Returns an array of text extractors, one for each of the embedded documents in the file (if there are any). If there are no embedded documents, you'll get back an empty array. Otherwise, you'll get one open POITextExtractor for each embedded file.
- Parameters:
ext - The extractor to look at for embedded documents
- Returns:
- An array of resulting extractors. Empty if no embedded documents are found.
- Throws:
IOException - If converting the DirectoryNode into a HSSFWorkbook fails
OldFileFormatException - If the DirectoryNode points to a format of an unsupported version of Excel.
IllegalArgumentException - If creating the Extractor fails
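Because each returned extractor is open, the caller is presumably responsible for closing all of them once the text has been read. The following is a minimal sketch of one way to do that, assuming the extractors implement java.io.Closeable (as POITextExtractor does in the POI releases that ship this class). The helper closeAll and the stand-in FakeExtractor are hypothetical and exist only so the example runs without POI on the classpath.

```java
import java.io.Closeable;
import java.io.IOException;

final class CloseAllSketch {
    /**
     * Closes every non-null resource in the array. All resources are attempted;
     * the first IOException is rethrown afterwards with later ones suppressed.
     */
    static void closeAll(Closeable[] resources) throws IOException {
        IOException first = null;
        for (Closeable resource : resources) {
            if (resource == null) {
                continue;
            }
            try {
                resource.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }

    /** Hypothetical stand-in for an open extractor; records whether close() ran. */
    static final class FakeExtractor implements Closeable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    public static void main(String[] args) throws IOException {
        // In real use this array would come from getEmbededDocsTextExtractors(ext).
        FakeExtractor[] embedded = { new FakeExtractor(), new FakeExtractor() };
        closeAll(embedded);
        for (FakeExtractor e : embedded) {
            System.out.println(e.closed);
        }
    }
}
```

An empty array, which is what the method returns when there are no embedded documents, passes through closeAll without any special handling.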
-