Tuesday, September 23, 2008

Robots exclusion standard

In the early days of the web, search engines would crawl every page and file on a site to build their search databases. But nowadays, since so many people put personal information online, the Robots Exclusion Standard was created to help protect privacy by telling crawlers which parts of a site to skip.

But this system has a weakness: compliance is voluntary. If someone uses a program that crawls every page, file, and link on a web site while ignoring robots.txt, it may eventually find personal information. The simplest example is an offline website downloader, like SurfOffline, which lets you download a whole website by following the links on each page. It may even turn up hidden videos, documents, or photos if the server does not have a tight permission system.
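
To see why compliance is voluntary, here is a minimal sketch of the check a well-behaved crawler performs before fetching a page, using Python's standard urllib.robotparser module (the example.com URLs are just placeholders). A downloader that ignores robots.txt simply skips this step.

# Sketch: how a polite crawler consults robots.txt before fetching.
# The example.com URLs below are placeholders, not a real target site.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("http://example.com/robots.txt")
rp.read()  # download and parse the site's robots.txt

url = "http://example.com/private/photos.html"
if rp.can_fetch("*", url):
    print("Allowed to fetch:", url)
else:
    print("robots.txt disallows:", url)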

Here are two simple examples of the Robots Exclusion Standard:

# Allow all robots access to all files...
User-agent: *
Disallow:

# Keep all robots out of all files...
User-agent: *
Disallow: /
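
More selective rules are also possible. For example, this entry (with a hypothetical /private/ directory) keeps all robots out of one directory while leaving the rest of the site crawlable:

# Keep all robots out of the /private/ directory only
User-agent: *
Disallow: /private/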

References

The Web Robots Pages - A Standard for Robot Exclusion
http://www.robotstxt.org

Wikipedia.org - Robots exclusion standard
http://en.wikipedia.org/wiki/Robots.txt

SurfOffline 2.0 - Offline browser
http://www.surfoffline.com/

Offline Downloader
http://www.offlinedownloader.com/

Help For Web Beginners Home - Robot Exclusion Standard
http://www.helpforwebbeginners.com/webmasters/robot-exclusion-standard.html
