The stem of the names of the output files is normally derived from a component of the url. If the url contains a path name, the stem is the last component of that path, less any dot-separated suffix and prefix. For example, given
http://www.vitanuova.com/inferno/old.index.html
the stem would be index. If there is no path name, but the url contains a domain name, the stem is the penultimate component of the domain name (e.g., excluding a trailing .com and an initial www). For example, given
www.innerhost.vitanuova.com
the stem would be vitanuova. If all else fails, webgrab uses the stem webgrab.
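The stem rules above can be sketched in Python. This is only an illustration of the rules as described here, not webgrab's own code: the function name derive_stem is hypothetical, and the sketch assumes that "less any dot-separated suffix and prefix" means taking the piece immediately before the final suffix, and that a host name with fewer than two components falls through to the default.

```python
from urllib.parse import urlsplit


def derive_stem(url: str) -> str:
    """Derive an output-file stem from a url (hypothetical helper)."""
    # urlsplit only recognises the host when a scheme or "//" is present,
    # so a bare name such as "www.example.com" is given a "//" prefix.
    if "//" not in url:
        url = "//" + url
    parts = urlsplit(url)

    # Path rule: take the last non-empty path component, then drop the
    # trailing suffix and anything in front of the piece before it.
    components = [c for c in parts.path.split("/") if c]
    if components:
        pieces = [p for p in components[-1].split(".") if p]
        if len(pieces) >= 2:
            return pieces[-2]
        if pieces:
            return pieces[0]

    # Domain rule: the penultimate component of the domain name.
    labels = [l for l in (parts.hostname or "").split(".") if l]
    if len(labels) >= 2:
        return labels[-2]

    # If all else fails, fall back to the fixed stem.
    return "webgrab"
```

Under these assumptions, the two examples in the text give index and vitanuova respectively, and an empty url gives webgrab.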
Given a stem, the initial page is stored in stem.suffix, where suffix is the suffix (e.g., .html) of the name of the original page. Subordinate pages are saved in a similar way in files named stem_1.suffix1, stem_2.suffix2, ... .
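The naming scheme can be illustrated with a short Python sketch. The function output_names is hypothetical, and the sketch assumes that every page has a suffix, that subordinate pages are numbered from 1 in the order they are fetched, and that a leading dot on a suffix is not doubled when the name is assembled.

```python
def output_names(stem: str, suffixes: list[str]) -> list[str]:
    """Return output file names for a page and its subordinate pages.

    suffixes[0] belongs to the initial page; the remaining entries belong
    to the subordinate pages, in the order assumed for numbering.
    """
    names = []
    for i, suffix in enumerate(suffixes):
        suffix = suffix.lstrip(".")            # accept "html" or ".html"
        base = stem if i == 0 else f"{stem}_{i}"
        names.append(f"{base}.{suffix}")
    return names
```

For example, output_names("index", [".html", ".gif", ".css"]) yields index.html, index_1.gif and index_2.css.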
The options are:
-r do not fetch subcomponents (just the `raw' source of url itself)
-v print a progress report
-vv print a chatty progress report
-o stem use the stem as given
-p body use HTTP POST instead of GET, posting body as the data
Webgrab reads the configuration file /services/webget/config (if it exists), to look for the address of an optional HTTP proxy (in the httpproxy entry), and a list of domains for which a proxy should not be used (in the noproxy or noproxydoms entry). If symbolic network and service names might be involved, the connection server lib/cs must already be running.
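The proxy decision described above might look like the following Python sketch. It does not parse /services/webget/config, whose syntax is not described here; it takes the httpproxy value and the noproxy domain list as already-extracted arguments. The function name use_proxy and the matching rule (a host matches a listed domain if it equals it or ends with it as a dot-separated suffix) are assumptions, not documented webgrab behaviour.

```python
from typing import Optional


def use_proxy(host: str, httpproxy: Optional[str], noproxy: list[str]) -> bool:
    """Decide whether a request to host should go through the proxy."""
    # No proxy configured: always connect directly.
    if not httpproxy:
        return False

    host = host.lower().rstrip(".")
    for dom in noproxy:
        dom = dom.lower().strip(".")
        # Assumed rule: exact match, or host is a subdomain of dom.
        if dom and (host == dom or host.endswith("." + dom)):
            return False
    return True
```

With a proxy configured and vitanuova.com in the no-proxy list, www.vitanuova.com would be fetched directly under this rule, while any other host would go through the proxy.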
It cannot do `secure' transfers (https).
Its HTML parsing is naive, but on the other hand, it is less likely to trip over HTML novelties.