public interface ScoringFilter extends Configurable, FieldPluggable
| Modifier and Type | Field and Description |
|---|---|
| `static String` | `X_POINT_ID` The name of the extension point. |
| Modifier and Type | Method and Description |
|---|---|
| `void` | `distributeScoreToOutlinks(String fromUrl, WebPage page, Collection<ScoreDatum> scoreData, int allCount)` Distribute score value from the current page to all its outlinked pages. |
| `float` | `generatorSortValue(String url, WebPage page, float initSort)` This method prepares a sort value for the purpose of sorting and selecting top N scoring pages during fetchlist generation. |
| `float` | `indexerScore(String url, NutchDocument doc, WebPage page, float initScore)` This method calculates a Lucene document boost. |
| `void` | `initialScore(String url, WebPage page)` Set an initial score for newly discovered pages. |
| `void` | `injectedScore(String url, WebPage page)` Set an initial score for newly injected pages. |
| `void` | `updateScore(String url, WebPage page, List<ScoreDatum> inlinkedScoreData)` This method calculates a new score during table update, based on the values contributed by inlinked pages. |
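
The summary above describes score flowing from a page to its outlinks and later being collected from a page's inlinks. The following is a minimal sketch of that flow under stated assumptions: `Page` and `Datum` are hypothetical simplified stand-ins (not Nutch's `WebPage` or `ScoreDatum`), and the even split and plain summation are illustrative choices, not the behaviour of any particular Nutch scoring plugin.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Illustrative stand-ins only; these are NOT Nutch's WebPage or ScoreDatum classes. */
class ScoreFlowSketch {

    /** Hypothetical simplified page row that holds just a score. */
    static class Page {
        float score;
        Page(float score) { this.score = score; }
    }

    /** Hypothetical score contribution travelling along one link. */
    static class Datum {
        final String targetUrl;
        float score;
        Datum(String targetUrl) { this.targetUrl = targetUrl; }
    }

    /** Plays the role of distributeScoreToOutlinks: split the page score evenly (an assumption). */
    static void distributeScoreToOutlinks(String fromUrl, Page page,
                                          Collection<Datum> scoreData, int allCount) {
        if (allCount <= 0) {
            return; // no outlinks collected, nothing to distribute
        }
        float share = page.score / allCount;
        for (Datum d : scoreData) {
            d.score = share;
        }
    }

    /** Plays the role of updateScore: add the contributions of all inlinks (an assumption). */
    static void updateScore(String url, Page page, List<Datum> inlinkedScoreData) {
        float sum = 0f;
        for (Datum d : inlinkedScoreData) {
            sum += d.score;
        }
        page.score += sum;
    }

    public static void main(String[] args) {
        Page source = new Page(1.0f);
        List<Datum> outlinks = new ArrayList<>();
        outlinks.add(new Datum("http://example.com/a"));
        outlinks.add(new Datum("http://example.com/b"));
        distributeScoreToOutlinks("http://example.com/", source, outlinks, outlinks.size());

        // Later, the target page receives the datum addressed to it.
        Page target = new Page(0.0f);
        updateScore("http://example.com/a", target, outlinks.subList(0, 1));
        System.out.println(target.score); // prints 0.5
    }
}
```

A real implementation would work on the actual `WebPage` and `ScoreDatum` types and declare `throws ScoringFilterException` as the interface requires.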
Methods inherited from Configurable: getConf, setConf
Methods inherited from FieldPluggable: getFields

static final String X_POINT_ID

void injectedScore(String url, WebPage page) throws ScoringFilterException
Parameters:
url - URL of the page
page - new page. Filters will modify it in place.
Throws:
ScoringFilterException

void initialScore(String url, WebPage page) throws ScoringFilterException
Parameters:
url - URL of the page
page -
Throws:
ScoringFilterException

float generatorSortValue(String url, WebPage page, float initSort) throws ScoringFilterException
Parameters:
url - URL of the page
page - page row. Modifications will be persisted.
initSort - initial sort value, or the value returned by the previous filter in the chain
Throws:
ScoringFilterException

void distributeScoreToOutlinks(String fromUrl, WebPage page, Collection<ScoreDatum> scoreData, int allCount) throws ScoringFilterException
Parameters:
fromUrl - URL of the source page
page - page row
scoreData - a collection of ScoreDatum objects, one for every outlink. These will be passed to updateScore(String, WebPage, List) for every outlinked URL.
allCount - number of all outlinks collected from the source page
Throws:
ScoringFilterException

void updateScore(String url, WebPage page, List<ScoreDatum> inlinkedScoreData) throws ScoringFilterException
Parameters:
url - URL of the page
page -
inlinkedScoreData - list of ScoreDatum objects for all inlinks pointing to this URL
Throws:
ScoringFilterException

float indexerScore(String url, NutchDocument doc, WebPage page, float initScore) throws ScoringFilterException
Parameters:
url - URL of the page
doc - document. NOTE: this already contains all information collected by indexing filters. Implementations may modify this instance in order to store or remove information.
page - page row
initScore - initial boost value for the Lucene document
Throws:
ScoringFilterException

Copyright © 2015 The Apache Software Foundation