Tuesday, September 23, 2008

Robots exclusion standard

In the old days, search engines used to crawl every page and file on a site to build their search databases. But nowadays, since everyone puts personal information online, the Robots Exclusion Standard was created to help protect privacy.

But this system has some weaknesses. The standard is purely advisory: a robot has to choose to honor it, and nothing stops a badly behaved one from ignoring it. If someone uses a program to crawl all the pages, files and links on a web site, it may eventually find some personal information. The simplest example is an offline website downloader, like SurfOffline, which lets you download a whole website, including the links from each individual page. Sometimes it can get you hidden videos, documents or photos if the server does not have a tight permission system.

Here are two simple examples of the Robots Exclusion Standard:

# Allow all robots to access all files...
User-agent: *
Disallow:

# Keep all robots out of all files...
User-agent: *
Disallow: /
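The two examples above can be tested without a web server at all. Python's standard library ships a `urllib.robotparser` module that reads robots.txt rules and answers whether a given user agent may fetch a given URL. The sketch below is just an illustration of that module; the crawler name and URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The "keep all robots out" example from above, as a list of lines
deny_all = [
    "User-agent: *",
    "Disallow: /",
]

# The "allow all robots" example: an empty Disallow means nothing is blocked
allow_all = [
    "User-agent: *",
    "Disallow:",
]

rp = RobotFileParser()
rp.parse(deny_all)
# A well-behaved crawler checks can_fetch() before requesting a page
print(rp.can_fetch("MyCrawler", "http://example.com/private.html"))  # False

rp2 = RobotFileParser()
rp2.parse(allow_all)
print(rp2.can_fetch("MyCrawler", "http://example.com/private.html"))  # True
```

Note that this check happens inside the crawler itself, which is exactly the weakness described earlier: a program that never calls `can_fetch()` simply downloads everything anyway.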


The Web Robots Pages - A Standard for Robot Exclusion

Wikipedia.org - Robots exclusion standard

SurfOffline 2.0 - Offline browser

Offline Downloader

Help For Web Beginners Home - Robot Exclusion Standard
