Package org.apache.any23.extractor
Class SingleDocumentExtraction
- java.lang.Object
-
- org.apache.any23.extractor.SingleDocumentExtraction
-
public class SingleDocumentExtraction extends Object
This class acts as a facade where all extractors (for a given MIMEType) can be called on a single document. Extractors are automatically filtered by MIMEType.
-
-
Constructor Summary
Constructors
- SingleDocumentExtraction(org.apache.any23.configuration.Configuration configuration, org.apache.any23.source.DocumentSource in, org.apache.any23.extractor.ExtractorFactory<?> factory, org.apache.any23.writer.TripleHandler output)
  Builds an extractor from the given document source, extractors factory and output triple handler.
- SingleDocumentExtraction(org.apache.any23.configuration.Configuration configuration, org.apache.any23.source.DocumentSource in, org.apache.any23.extractor.ExtractorGroup extractors, org.apache.any23.writer.TripleHandler output)
  Builds an extractor from the given document source, list of extractors and output triple handler.
- SingleDocumentExtraction(org.apache.any23.source.DocumentSource in, org.apache.any23.extractor.ExtractorFactory<?> factory, org.apache.any23.writer.TripleHandler output)
  Builds an extractor from the given document source, extractors factory and output triple handler, using the DefaultConfiguration.
-
Method Summary
Modifier and Type, Method, Description
- String getDetectedMIMEType()
  Returns the detected MIME type for the given DocumentSource.
- List<org.apache.any23.extractor.Extractor> getMatchingExtractors()
- String getParserEncoding()
- boolean hasMatchingExtractors()
  Checks whether the content of the given DocumentSource activates at least one extractor.
- SingleDocumentExtractionReport run()
  Triggers the execution of all the Extractor instances registered to this class using the default extraction parameters.
- SingleDocumentExtractionReport run(org.apache.any23.extractor.ExtractionParameters extractionParameters)
  Triggers the execution of all the Extractor instances registered to this class using the specified extraction parameters.
- void setLocalCopyFactory(LocalCopyFactory copyFactory)
  Sets the internal factory for generating the document local copy; if null, the MemCopyFactory will be used.
- void setMIMETypeDetector(org.apache.any23.mime.MIMETypeDetector detector)
  Sets the internal MIME type detector; if null, MIME type detection will be skipped and all extractors will be activated.
- void setParserEncoding(String encoding)
  Sets the document parser encoding.
-
-
-
Constructor Detail
-
SingleDocumentExtraction
public SingleDocumentExtraction(org.apache.any23.configuration.Configuration configuration, org.apache.any23.source.DocumentSource in, org.apache.any23.extractor.ExtractorGroup extractors, org.apache.any23.writer.TripleHandler output)
Builds an extractor from the given document source, list of extractors and output triple handler.
- Parameters:
configuration - configuration applied during extraction.
in - input document source.
extractors - list of extractors to be applied.
output - output triple handler.
-
SingleDocumentExtraction
public SingleDocumentExtraction(org.apache.any23.configuration.Configuration configuration, org.apache.any23.source.DocumentSource in, org.apache.any23.extractor.ExtractorFactory<?> factory, org.apache.any23.writer.TripleHandler output)
Builds an extractor from the given document source, extractors factory and output triple handler.
- Parameters:
configuration - configuration applied during extraction.
in - input document source.
factory - the extractors factory.
output - output triple handler.
-
SingleDocumentExtraction
public SingleDocumentExtraction(org.apache.any23.source.DocumentSource in, org.apache.any23.extractor.ExtractorFactory<?> factory, org.apache.any23.writer.TripleHandler output)
Builds an extractor from the given document source, extractors factory and output triple handler, using the DefaultConfiguration.
- Parameters:
in - input document source.
factory - the extractors factory.
output - output triple handler.
-
-
Method Detail
-
setLocalCopyFactory
public void setLocalCopyFactory(LocalCopyFactory copyFactory)
Sets the internal factory for generating the document local copy; if null, the MemCopyFactory will be used.
- Parameters:
copyFactory - local copy factory.
- See Also:
DocumentSource
-
setMIMETypeDetector
public void setMIMETypeDetector(org.apache.any23.mime.MIMETypeDetector detector)
Sets the internal MIME type detector; if null, MIME type detection will be skipped and all extractors will be activated.
- Parameters:
detector - detector instance.
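The filtering contract described here (extractors are selected by detected MIME type, and a null detector activates all of them) can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: SketchExtractor, toyDetector, sampleExtractors and matching are hypothetical stand-ins invented for illustration and are not part of the Any23 API, and the real class may match MIME types in a more elaborate way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class MimeFilterSketch {

    /** Hypothetical stand-in for an extractor: a label plus the MIME types it accepts. */
    static final class SketchExtractor {
        final String name;
        final List<String> supportedMimeTypes;

        SketchExtractor(String name, String... supportedMimeTypes) {
            this.name = name;
            this.supportedMimeTypes = Arrays.asList(supportedMimeTypes);
        }
    }

    /** Hypothetical toy detector: treats content starting with '<' as HTML. */
    static String toyDetector(String content) {
        return content.trim().startsWith("<") ? "text/html" : "text/plain";
    }

    /** Three illustrative extractors, only one of which accepts text/html. */
    static List<SketchExtractor> sampleExtractors() {
        List<SketchExtractor> all = new ArrayList<>();
        all.add(new SketchExtractor("html-sketch", "text/html", "application/xhtml+xml"));
        all.add(new SketchExtractor("csv-sketch", "text/csv"));
        all.add(new SketchExtractor("turtle-sketch", "text/turtle"));
        return all;
    }

    /**
     * Returns the extractors activated for a document.
     * A null detector skips detection and activates every extractor.
     */
    static List<SketchExtractor> matching(List<SketchExtractor> all,
                                          Function<String, String> detector,
                                          String documentContent) {
        if (detector == null) {
            return new ArrayList<>(all);
        }
        String detected = detector.apply(documentContent);
        List<SketchExtractor> result = new ArrayList<>();
        for (SketchExtractor extractor : all) {
            if (extractor.supportedMimeTypes.contains(detected)) {
                result.add(extractor);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<SketchExtractor> all = sampleExtractors();
        String html = "<html><body>hello</body></html>";
        System.out.println("with detector: "
                + matching(all, MimeFilterSketch::toyDetector, html).size());
        System.out.println("null detector: " + matching(all, null, html).size());
    }
}
```

In this model, a check in the spirit of hasMatchingExtractors() corresponds to testing whether the returned list is non-empty.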
-
run
public SingleDocumentExtractionReport run(org.apache.any23.extractor.ExtractionParameters extractionParameters) throws org.apache.any23.extractor.ExtractionException, IOException
Triggers the execution of all the Extractor instances registered to this class using the specified extraction parameters.
- Parameters:
extractionParameters - the parameters applied to the run execution.
- Returns:
- the report generated by the extraction.
- Throws:
org.apache.any23.extractor.ExtractionException - if an error occurred during the data extraction.
IOException - if an error occurred during the data access.
-
run
public SingleDocumentExtractionReport run() throws IOException, org.apache.any23.extractor.ExtractionException
Triggers the execution of all the Extractor instances registered to this class using the default extraction parameters.
- Returns:
- the extraction report.
- Throws:
IOException - if there is an error reading input from the document source.
org.apache.any23.extractor.ExtractionException - if there is an error during extraction.
-
getDetectedMIMEType
public String getDetectedMIMEType() throws IOException
Returns the detected MIME type for the given DocumentSource.
- Returns:
- string containing the detected mimetype.
- Throws:
IOException- if an error occurred while accessing the data.
-
hasMatchingExtractors
public boolean hasMatchingExtractors() throws IOException
Checks whether the content of the given DocumentSource activates at least one extractor.
- Returns:
true if at least one extractor is activated, false otherwise.
- Throws:
IOException- if there is an error locating matching extractors
-
getMatchingExtractors
public List<org.apache.any23.extractor.Extractor> getMatchingExtractors()
- Returns:
- the list of all the activated extractors for the given DocumentSource.
-
getParserEncoding
public String getParserEncoding()
- Returns:
- the configured parsing encoding.
-
setParserEncoding
public void setParserEncoding(String encoding)
Sets the document parser encoding.- Parameters:
encoding- parser encoding.
-
-