public class IndexUtil extends Object
| Constructor and Description |
|---|
| IndexUtil(Configuration conf) |
| Modifier and Type | Method and Description |
|---|---|
| NutchDocument | index(String key, WebPage page) Index a WebPage. The following fields are added: id, the default uniqueKey for the NutchDocument; digest, which identifies pages (like a unique ID) and is used to remove duplicates during the dedup procedure. |
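To illustrate how a digest field can drive duplicate removal, here is a minimal, self-contained sketch. It does not use Nutch classes: `DigestDedupSketch`, `md5Hex`, and `dedup` are hypothetical names, the reversed-URL keys are made-up examples, and hashing the raw page text with MD5 is an assumption chosen for brevity rather than a description of how Nutch computes its signatures.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

public class DigestDedupSketch {

    /** Hex-encoded MD5 of the page text; stands in for the digest field (assumption). */
    public static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] hash = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Maps each distinct digest to the key of the first page that produced it.
     * Later pages with an identical digest are treated as duplicates and dropped.
     */
    public static Map<String, String> dedup(Map<String, String> contentByKey) {
        Map<String, String> keyByDigest = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : contentByKey.entrySet()) {
            keyByDigest.putIfAbsent(md5Hex(e.getValue()), e.getKey());
        }
        return keyByDigest;
    }

    public static void main(String[] args) {
        // Hypothetical reversed-URL keys mapped to page text.
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("com.example.www:http/a", "same body");
        pages.put("com.example.www:http/b", "same body");
        pages.put("com.example.www:http/c", "different body");

        // Pages /a and /b share a digest, so only /a and /c survive.
        System.out.println(dedup(pages).values());
    }
}
```

Keeping the first page per digest is only one possible policy; a real dedup job might prefer the highest-scoring or most recently fetched page instead.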
public IndexUtil(Configuration conf)
public NutchDocument index(String key, WebPage page)
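Index a WebPage. The following fields are added:
id: default uniqueKey for the NutchDocument.
digest: identifies pages (like a unique ID) and is used to remove duplicates during the dedup procedure. It is calculated using MD5Signature or TextProfileSignature.
Parameters:
key - The key of the page (reversed url).
page - The WebPage.

Copyright © 2015 The Apache Software Foundation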