public class Ftp extends Object implements Protocol
Protocol plugin for the ftp: scheme. It creates an FtpResponse object and gets the content of the url from it.
Configurable parameters are ftp.username, ftp.password,
ftp.content.limit, ftp.timeout, ftp.server.timeout,
ftp.keep.connection and ftp.follow.talk.
For details see the "FTP properties" section in nutch-default.xml.

| Modifier and Type | Field and Description |
|---|---|
| static org.slf4j.Logger | LOG |

Fields inherited from interface Protocol: CHECK_BLOCKING, CHECK_ROBOTS, X_POINT_ID

| Constructor and Description |
|---|
| Ftp() |

| Modifier and Type | Method and Description |
|---|---|
| protected void | finalize() |
| Configuration | getConf() - Get the Configuration object |
| Collection<WebPage.Field> | getFields() |
| ProtocolOutput | getProtocolOutput(String url, WebPage page) - Creates a FtpResponse object corresponding to the url and returns a ProtocolOutput object as per the content received |
| crawlercommons.robots.BaseRobotRules | getRobotRules(String url, WebPage page) - Get the robots rules for a given url |
| static void | main(String[] args) - For debugging. |
| void | setConf(Configuration conf) - Set the Configuration object |
| void | setFollowTalk(boolean followTalk) - Set followTalk |
| void | setKeepConnection(boolean keepConnection) - Set keepConnection |
| void | setMaxContentLength(int length) - Set the point at which content is truncated. |
| void | setTimeout(int to) - Set the timeout. |
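
The ftp.* parameters and the setters listed above can be pictured with a small, self-contained sketch. It is not the Nutch implementation: `FtpSettingsSketch` is a hypothetical stand-in, `java.util.Properties` replaces the `Configuration` object, and the fallback values, the units, and the treatment of a negative content limit as "no truncation" are assumptions made for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Properties;

/**
 * Hypothetical stand-in used only for illustration. It is not the Nutch Ftp
 * class and it does not open any network connection.
 */
final class FtpSettingsSketch {
    final String username;
    final String password;
    final int timeout;          // ftp.timeout (unit assumed: milliseconds)
    final int serverTimeout;    // ftp.server.timeout (unit assumed: milliseconds)
    final int maxContentLength; // ftp.content.limit (unit assumed: bytes)
    final boolean keepConnection;
    final boolean followTalk;

    FtpSettingsSketch(Properties conf) {
        // The fallback values are placeholders chosen for this sketch,
        // not values taken from nutch-default.xml.
        username = conf.getProperty("ftp.username", "anonymous");
        password = conf.getProperty("ftp.password", "anonymous@example.com");
        timeout = Integer.parseInt(conf.getProperty("ftp.timeout", "10000"));
        serverTimeout = Integer.parseInt(conf.getProperty("ftp.server.timeout", "60000"));
        maxContentLength = Integer.parseInt(conf.getProperty("ftp.content.limit", "65536"));
        keepConnection = Boolean.parseBoolean(conf.getProperty("ftp.keep.connection", "false"));
        followTalk = Boolean.parseBoolean(conf.getProperty("ftp.follow.talk", "false"));
    }

    /**
     * Cuts content at maxContentLength. A negative limit is treated here as
     * "no truncation", which is an assumption of this sketch.
     */
    byte[] truncate(byte[] content) {
        if (maxContentLength < 0 || content.length <= maxContentLength) {
            return content;
        }
        return Arrays.copyOf(content, maxContentLength);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("ftp.content.limit", "4");
        conf.setProperty("ftp.keep.connection", "true");

        FtpSettingsSketch settings = new FtpSettingsSketch(conf);
        byte[] body = "hello world".getBytes(StandardCharsets.US_ASCII);
        // Prints the length of the truncated content.
        System.out.println(settings.truncate(body).length);
    }
}
```

In a real crawl the same values would come from the Configuration object passed to setConf(Configuration conf), or be set directly through setTimeout, setMaxContentLength, setFollowTalk and setKeepConnection.
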
public void setTimeout(int to)
Set the timeout.

public void setMaxContentLength(int length)
Set the point at which content is truncated.

public void setFollowTalk(boolean followTalk)
Set followTalk

public void setKeepConnection(boolean keepConnection)
Set keepConnection

public ProtocolOutput getProtocolOutput(String url, WebPage page)
Creates a FtpResponse object corresponding to the url and returns a ProtocolOutput object as per the content received
Specified by: getProtocolOutput in interface Protocol
Parameters:
url - Text containing the ftp url
page - The WebPage object corresponding to the url
Returns: ProtocolOutput object for the url

public void setConf(Configuration conf)
Set the Configuration object
Specified by: setConf in interface Configurable

public Configuration getConf()
Get the Configuration object
Specified by: getConf in interface Configurable

public Collection<WebPage.Field> getFields()
Specified by: getFields in interface FieldPluggable

public crawlercommons.robots.BaseRobotRules getRobotRules(String url, WebPage page)
Get the robots rules for a given url
Specified by: getRobotRules in interface Protocol
Parameters:
url - url to check

Copyright © 2015 The Apache Software Foundation