InfociousBot is the crawler (or spider) of the Infocious Web Search Engine, and we
use it to download and process documents published on the Web.
Our crawler's mission is to find every available page on the Web and fetch it for
our users.
InfociousBot operates in two different modes:
- Surface Web crawling: In this mode InfociousBot follows links on the Web
in order to discover and download Web pages. Once a page is downloaded, it is
analyzed for outgoing links, and InfociousBot follows these links to find more
pages and repeat the process.
- Hidden (or Deep) Web crawling: Besides the pages that can be found
by simply following links on the Web, there are also pages that cannot be reached
through links. These pages are accessible only after submitting a query to a Web
search interface (such as the USPTO database). In this case, our crawler issues
queries to the Web sites in order to retrieve pages from within their databases
and index them.
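The link-following loop described for Surface Web crawling can be sketched as follows. This is only an illustration and not InfociousBot's actual code: the names LinkExtractor, crawl and PAGES are hypothetical, and a small in-memory dictionary stands in for the Web so that the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags found in one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def crawl(seed, fetch, max_pages=100):
    """Breadth-first link following; fetch(url) returns HTML or None."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be downloaded
        order.append(url)
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical in-memory "Web" used instead of real HTTP requests.
PAGES = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a>',
    "http://example.com/b": '<a href="/">home</a> <a href="/missing">x</a>',
}
```

Calling crawl("http://example.com/", PAGES.get) visits each reachable page once, in the order in which its link was first discovered. A real crawler would also consult robots.txt and pause between requests before each fetch.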
If your site has recently been visited by InfociousBot, it is because we
consider the content that you provide interesting for our users and we
wish to index it.
During our downloads we do our best to be courteous to the sites that we
crawl, and we adhere to the rules that you specify in your robots.txt files.
In addition, InfociousBot respects the NOINDEX, NOARCHIVE and NOFOLLOW
directives of the "ROBOTS" meta-tag that you can add to individual Web pages.
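As an illustration of how a crawler might read these directives, here is a minimal sketch that extracts them from a page's "ROBOTS" meta-tag. The names RobotsMetaParser, robots_directives and PAGE are hypothetical and chosen for this example only; it does not describe how InfociousBot itself is implemented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="ROBOTS" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        # The meta name is matched case-insensitively.
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                token = token.strip().upper()
                if token:
                    self.directives.add(token)


def robots_directives(html):
    """Return the set of ROBOTS meta directives found in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Example page asking crawlers not to index it and not to follow its links.
PAGE = (
    '<html><head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW"></head>'
    "<body>Hello</body></html>"
)
```

For the example page, robots_directives(PAGE) contains NOINDEX and NOFOLLOW, so a crawler honoring the tag would neither index the page nor follow its links.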
At present, we download one page every 2 to 10 seconds from a given Web site.
We periodically re-download a site to capture important changes on the Web.
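A per-site delay of this kind could be enforced along the following lines. This is a rough sketch under stated assumptions: the class PolitenessScheduler is hypothetical, and picking the delay uniformly at random between 2 and 10 seconds is our assumption for the example, not a description of how InfociousBot actually chooses it.

```python
import random
from urllib.parse import urlsplit

MIN_DELAY, MAX_DELAY = 2.0, 10.0  # seconds between pages from one site


class PolitenessScheduler:
    """Tracks, per host, the earliest time the next page may be fetched."""

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.next_allowed = {}

    def reserve(self, url, now):
        """Return the time at which url may be fetched and book that slot."""
        host = urlsplit(url).netloc
        start = max(now, self.next_allowed.get(host, now))
        # Assumption: the gap is drawn uniformly from the 2 to 10 second range.
        self.next_allowed[host] = start + self.rng.uniform(MIN_DELAY, MAX_DELAY)
        return start
```

Two consecutive calls to reserve for URLs on the same host return times at least 2 seconds apart, while a URL on a different host can be scheduled immediately.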
If you do not want InfociousBot (or any other crawler) to download
a particular portion of your site, you can specify this by writing a simple
robots.txt file (see the robot exclusion protocol:
http://www.robotstxt.org/wc/exclusion.html) and including
it in your Web site. Our robot's User-agent string is InfociousBot.
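For example, the following sketch embeds a small robots.txt that keeps InfociousBot out of one directory and checks it with the robots.txt parser from Python's standard library. The paths /private/ and /tmp/, the example.com URLs and the helper name allowed are placeholders for illustration; substitute the portions of your own site.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: InfociousBot may not fetch /private/,
# and all other crawlers may not fetch /tmp/.
ROBOTS_TXT = """\
User-agent: InfociousBot
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, allowed("InfociousBot", "http://example.com/private/page.html") is False, while the same call for http://example.com/index.html is True. The text of ROBOTS_TXT is what you would save as robots.txt at the root of your site.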
InfociousBot respects robots.txt and the NOINDEX, NOARCHIVE and NOFOLLOW directives
in both modes of operation.
If you have any questions or concerns regarding our crawler, you can contact us at:
crawler@infocious.com