public class FetcherJob extends NutchTool implements Tool
| Modifier and Type | Class and Description |
|---|---|
| static class | FetcherJob.FetcherMapper: Mapper class for Fetcher. |
| Modifier and Type | Field and Description |
|---|---|
| static org.slf4j.Logger | LOG |
| static String | PARSE_KEY |
| static int | PERM_REFRESH_TIME |
| static String | PROTOCOL_REDIR |
| static org.apache.avro.util.Utf8 | REDIRECT_DISCOVERED |
| static String | RESUME_KEY |
| static String | THREADS_KEY |

Fields inherited from NutchTool: currentJob, currentJobNum, numJobs, results, status

| Constructor and Description |
|---|
| FetcherJob() |
| FetcherJob(Configuration conf) |

| Modifier and Type | Method and Description |
|---|---|
| int | fetch(String batchId, int threads, boolean shouldResume, int numTasks): Run fetcher. |
| Collection<WebPage.Field> | getFields(Job job) |
| static void | main(String[] args) |
| Map<String,Object> | run(Map<String,Object> args): Runs the tool, using a map of arguments. |
| int | run(String[] args) |
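
Methods inherited from NutchTool: getProgress, getStatus, killJob, stopJob

Methods inherited from Configured: getConf, setConf

Methods inherited from Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from Configurable: getConf, setConf

public static final String PROTOCOL_REDIR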
public static final int PERM_REFRESH_TIME
public static final org.apache.avro.util.Utf8 REDIRECT_DISCOVERED
public static final String RESUME_KEY
public static final String PARSE_KEY
public static final String THREADS_KEY
public static final org.slf4j.Logger LOG
public FetcherJob()
public FetcherJob(Configuration conf)
public Collection<WebPage.Field> getFields(Job job)
public Map<String,Object> run(Map<String,Object> args) throws Exception
Specified by: run in class NutchTool

public int fetch(String batchId, int threads, boolean shouldResume, int numTasks) throws Exception

Parameters:
batchId - batchId (obtained from Generator) or null to fetch all generated fetchlists
threads - number of threads per map task
shouldResume -
numTasks - number of fetching tasks (reducers). If set to < 1 then use the default, which is mapred.map.tasks.

Throws:
Exception

Copyright © 2015 The Apache Software Foundation
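
The `numTasks` rule documented for `fetch` (a value below 1 falls back to the `mapred.map.tasks` default) can be illustrated with a minimal, self-contained sketch. This is not FetcherJob's actual code: `NumTasksSketch`, `resolveNumTasks`, and the `fallback` argument are hypothetical names introduced here, and a plain `Map` stands in for a Hadoop `Configuration`.

```java
import java.util.HashMap;
import java.util.Map;

public class NumTasksSketch {

    // Property name taken from the numTasks parameter description.
    public static final String MAP_TASKS_PROPERTY = "mapred.map.tasks";

    /**
     * Hypothetical helper (not part of FetcherJob): returns the requested
     * task count, or the configured default when the request is below 1.
     * The fallback used when the property is unset is an assumption of
     * this sketch.
     */
    public static int resolveNumTasks(int numTasks, Map<String, String> conf, int fallback) {
        if (numTasks >= 1) {
            return numTasks; // an explicit positive request wins
        }
        String configured = conf.get(MAP_TASKS_PROPERTY);
        return configured == null ? fallback : Integer.parseInt(configured.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(MAP_TASKS_PROPERTY, "4");

        System.out.println(resolveNumTasks(8, conf, 1));             // prints 8
        System.out.println(resolveNumTasks(0, conf, 1));             // prints 4
        System.out.println(resolveNumTasks(-1, new HashMap<>(), 1)); // prints 1
    }
}
```

Passing 0 or a negative value is therefore the way to ask for the configured default rather than a fixed number of tasks.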