Home     |     About Us     |     Products     |     Technology     |     Contact Us
 
 
InfociousBot Crawler

InfociousBot is the crawler (or spider) of the Infocious Web Search Engine, and we use it to download and process documents published on the Web. Our crawler's mission is to find every available page on the Web and fetch it for our users.

InfociousBot operates in two different modes:

  1. Surface Web crawling: In this mode, InfociousBot follows links on the Web to discover and download Web pages. Once a page is downloaded, it is analyzed for outgoing links, and InfociousBot follows those links to find more pages and repeat the process.
  2. Hidden (or Deep) Web crawling: Besides the pages that can be found by simply following links, there are pages that no link reaches. These pages become accessible only after a query is submitted to a Web search interface (such as the USPTO database). In this mode, our crawler issues queries to such Web sites in order to retrieve pages from within their databases and index them.
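
The link-following loop of the first mode can be illustrated with a minimal Python sketch. It is not InfociousBot's actual code: the names LinkExtractor, crawl and SITE are hypothetical, and the three example.com pages are held in memory so the sketch runs without touching the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from seed; fetch(url) returns HTML text or None."""
    frontier = deque([seed])   # pages discovered but not yet downloaded
    seen = {seed}              # every URL ever queued, to avoid repeats
    downloaded = []
    while frontier and len(downloaded) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        downloaded.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return downloaded


# Hypothetical three-page site (assumption: stands in for real HTTP fetches).
SITE = {
    "http://example.com/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://example.com/a.html": '<a href="/b.html">B</a>',
    "http://example.com/b.html": '<a href="/">home</a>',
}

order = crawl("http://example.com/", SITE.get)
```

Here order lists each page once, in the order it was discovered. A real crawler would replace SITE.get with an HTTP download and would apply the robots.txt and rate-limit rules described on this page before each request.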

If InfociousBot has recently visited your site, it is because we consider the content that you provide interesting for our users and we wish to index it.

During our downloads we do our best to be courteous to the sites that we crawl and we adhere to the rules that you specify in your robots.txt files.

In addition, InfociousBot respects the NOINDEX, NOARCHIVE and NOFOLLOW directives of the "ROBOTS" meta-tag that you can add to individual Web pages.
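
As an illustration of how a crawler might read these directives, the following Python sketch extracts them from a page's "ROBOTS" meta-tag. The class RobotsMetaParser, the function robots_directives and the sample page are hypothetical names chosen for this example, not part of InfociousBot.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        # The tag name is matched case-insensitively ("ROBOTS" or "robots").
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html):
    """Return the lower-cased robots directives found in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that asks not to be indexed and not to have links followed.
page = ('<html><head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW">'
        "</head><body>Hello</body></html>")

directives = robots_directives(page)
may_index = "noindex" not in directives      # False for this page
may_follow = "nofollow" not in directives    # False for this page
may_archive = "noarchive" not in directives  # True for this page
```

A page without a "ROBOTS" meta-tag yields an empty set, so all three checks come out True.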

At present, we download at most one page every 2 to 10 seconds from any given Web site. We also re-download sites periodically to capture important changes on the Web.
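
A per-site delay of this kind can be sketched in Python as follows. This is only an illustration: the class PolitenessTimer and the choice of a uniformly random delay are assumptions, and only the 2 to 10 second range comes from the text above.

```python
import random
import time

MIN_DELAY = 2.0   # seconds, lower bound stated above
MAX_DELAY = 10.0  # seconds, upper bound stated above


class PolitenessTimer:
    """Tracks, per host, the earliest time the next request may be sent."""

    def __init__(self, clock=time.monotonic, pick_delay=random.uniform):
        self._clock = clock            # injectable so the timer can be tested
        self._pick_delay = pick_delay  # assumption: delay drawn uniformly
        self._next_allowed = {}

    def wait_time(self, host):
        """Seconds to wait before host may be contacted again (0.0 if none)."""
        if host not in self._next_allowed:
            return 0.0
        return max(0.0, self._next_allowed[host] - self._clock())

    def record_fetch(self, host):
        """Call right after a download to schedule the next allowed request."""
        delay = self._pick_delay(MIN_DELAY, MAX_DELAY)
        self._next_allowed[host] = self._clock() + delay
```

A download loop would call time.sleep(timer.wait_time(host)) before each request and timer.record_fetch(host) after it. Because the delay is tracked per host, pages from different sites can still be fetched back to back.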

If you do not want InfociousBot (or any other crawler) to download a particular portion of your site, you can say so in a simple robots.txt file placed on your Web site; see the robot exclusion protocol at http://www.robotstxt.org/wc/exclusion.html. Our robot's User-agent string is InfociousBot.
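
As an example, the Python sketch below embeds a small robots.txt file and checks it with the standard library's urllib.robotparser module. The paths /private/ and /tmp/, the host www.example.com and the agent name SomeOtherBot are made up for illustration; only the User-agent string InfociousBot comes from this page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keeps InfociousBot out of /private/
# and every other crawler out of /tmp/.
ROBOTS_TXT = """\
User-agent: InfociousBot
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Rules that apply to InfociousBot.
blocked = parser.can_fetch(
    "InfociousBot", "http://www.example.com/private/report.html")  # False
allowed = parser.can_fetch(
    "InfociousBot", "http://www.example.com/index.html")           # True

# Rules that apply to any other crawler (the "*" record).
other_blocked = parser.can_fetch(
    "SomeOtherBot", "http://www.example.com/tmp/cache.html")       # False
```

Note that a crawler obeys only the most specific record that names it, so in this example InfociousBot is not bound by the /tmp/ rule. To keep it out of both directories, list both Disallow lines under the InfociousBot record.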

InfociousBot respects robots.txt and the NOINDEX, NOARCHIVE and NOFOLLOW directives in both modes of operation.

If you have any questions or concerns regarding our crawler you can contact us at: crawler@infocious.com

 
Copyright (c) 2006 Infocious, Inc. All rights reserved.