Preventing Crawling And Spam


All the leading search engines use crawlers to find pages for their algorithmic search results. Pages that are linked from other pages already in a search engine’s index do not need to be submitted, because they are found automatically. Some search engines, such as Yahoo!, operate a paid submission service that guarantees crawling for either a set fee or a cost per click. Such programs usually guarantee inclusion in the database but do not guarantee a specific ranking within the search results. Yahoo!’s program has been criticized by advertisers and competitors. Two major directories, the Yahoo Directory and the Open Directory Project, both require manual submission and human editorial review.

Google offers Google Webmaster Tools, through which an XML Sitemap feed can be created and submitted for free to help ensure that all pages are found, especially pages that are not discoverable by automatically following links. Search engine crawlers take many other factors into consideration when crawling a site, and not every page is indexed by the search engines. The distance of a page from the root directory of a site may also be a factor in whether or not it gets crawled. To keep undesirable content out of the search indexes, webmasters can instruct spiders not to crawl certain files or directories through the standard robots.txt file in the root directory of the domain. Additionally, a page can be explicitly excluded from a search engine’s database by using a meta tag specific to robots. When a search engine visits a site, the robots.txt file located in the root directory is the first file crawled. The file is then parsed, and it instructs the robot as to which pages are not to be crawled. Because a search engine crawler may keep a cached copy of this file, it may on occasion crawl pages a webmaster does not wish crawled. Pages typically prevented from being crawled include login-specific pages such as shopping carts and user-specific content such as search results from internal searches.
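
As a rough illustration of how a well-behaved crawler might apply these rules, the sketch below parses a robots.txt file with Python’s standard urllib.robotparser module and checks a few URLs against it. The robots.txt content, the example.com URLs, and the helper names build_parser and is_crawlable are assumptions made up for this example, not rules taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example shop. The paths are illustrative
# assumptions: they block a shopping cart and internal search results.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /search
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser: RobotFileParser, url: str, user_agent: str = "*") -> bool:
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
for url in (
    "https://example.com/products/shoes",
    "https://example.com/cart/checkout",
    "https://example.com/search?q=shoes",
):
    status = "allowed" if is_crawlable(parser, url) else "blocked"
    print(f"{url}: {status}")
```

With these assumed rules, the product page is allowed while the cart and internal search URLs are blocked. Keep in mind that robots.txt only asks crawlers not to fetch a page; excluding an already discoverable page from the index is the job of the robots meta tag mentioned above.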

In March 2007, Google warned webmasters that they should prevent indexing of internal search results because those pages are considered search spam.

Spam control is another important part of an SEO service. Spammers build backlinks from anywhere they can find them, including some of the web’s worst neighborhoods. The spam can come from sites about guns, casinos, link directories, and many other sites that are irrelevant to yours. It is a highly prevalent problem, and most of the time the spammers disguise themselves as legitimate users. One of the most common forms of comment and pingback spam right now is the relatively subtle, ambiguous kind: short phrases or questions that are not obviously spam, at least at face value. The more sophisticated spammers have progressed from old standbys like “nice post” and “great blog” to more cunning things like questions (“Where can I download your theme?”) and appeals to your helpful nature (“I’m having trouble subscribing to your RSS feed”). It is likewise essential for webmasters to prevent indexing of internal search results, since those pages are considered search spam.