Robots.txt – Very important in your proxy.
A good robots.txt file matters not only for you, but also for the sites that your users surf. To explain why, I will first explain what it is.
Robots.txt – a plain-text file that tells some or all robots not to crawl some or all of the files or directories on a website. This file must be placed in your website’s root directory. Following it is voluntary, but the major search engine spiders honor it.
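To make the definition concrete, here is a minimal sketch of how a well-behaved robot could consult the rules before fetching a URL. It uses Python's standard `urllib.robotparser` module. The rule shown is the Glype rule listed later in this post; the helper name `is_allowed` and the `ExampleBot` user agent are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules as they would appear in robots.txt (the Glype rule from this post).
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /browse.php",
]


def is_allowed(url, user_agent="ExampleBot"):
    """Return True if a polite robot may fetch the URL under ROBOTS_LINES.

    'ExampleBot' is a hypothetical crawler name; it falls under the
    'User-agent: *' rule because no more specific entry exists.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_LINES)  # parse the rules directly, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A proxified page: the rules say it should not be crawled.
    print(is_allowed("http://yourproxydomain.com/browse.php?u=abc"))
    # The proxy's front page: not covered by any Disallow line.
    print(is_allowed("http://yourproxydomain.com/index.php"))
```

Note that Python's parser only does simple prefix matching, so this sketch is not suitable for testing rules that contain a `*` wildcard inside the path.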
So now you know what it is. The reason you need a good robots.txt for your proxy is to keep your proxified pages from being indexed. How can they get indexed when the link doesn’t appear anywhere? Easily: your users can post your proxified links on forums, blogs or their own websites. Spiders will come across these links and start browsing the web through your proxy, and each page they reach can then get indexed.
The pages will then get cached, and both your bandwidth and the bandwidth of the site being surfed will be used up. On top of this, your site could get reported for spam or duplicate content. That could lead to your proxy getting banned from AdSense or delisted from search engines – not good at all.
So to stop this from happening, make a new file called “robots.txt” in the top directory of your proxy, so that it is reachable at, for example, http://yourproxydomain.com/robots.txt. Then open up the file and add the appropriate text.
For PHProxies
User-agent: *
Disallow: /index.php?q*
For Glype Proxies
User-agent: *
Disallow: /browse.php
For CGI Proxies
User-agent: *
Disallow: /nph-proxy.pl/
OR
User-agent: *
Disallow: /nph-proxy.cgi/
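If you run several proxies, the steps above can be scripted. The following is a rough sketch, not a finished tool: it writes a robots.txt into a given web root using the rules listed above. The dictionary keys (`phproxy`, `glype`, `cgiproxy`) and the function names are labels invented for this example, and for CGI proxies it simply writes both alternative Disallow lines into one block, which assumes you do not mind covering both script names.

```python
from pathlib import Path

# Disallow rules copied from this post, keyed by proxy script.
# The keys are labels chosen for this sketch, not official names.
DISALLOW_RULES = {
    "phproxy": ["/index.php?q*"],
    "glype": ["/browse.php"],
    # The post gives these as alternatives; listing both is harmless.
    "cgiproxy": ["/nph-proxy.pl/", "/nph-proxy.cgi/"],
}


def build_robots_txt(proxy_type):
    """Return the robots.txt text for one of the proxy types above."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in DISALLOW_RULES[proxy_type]]
    return "\n".join(lines) + "\n"


def write_robots_txt(web_root, proxy_type):
    """Write robots.txt into web_root (the top directory of the proxy)."""
    target = Path(web_root) / "robots.txt"
    target.write_text(build_robots_txt(proxy_type), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Writes ./robots.txt for a Glype proxy; adjust the path to your web root.
    print(write_robots_txt(".", "glype"))
```

After uploading, open http://yourproxydomain.com/robots.txt in a browser to confirm the file is actually served from the root of the domain.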
Hopefully this will help your proxies.
Filed under: For Proxy Owners
One Response to “Robots.txt – Very important in your proxy.”
Kelly Brown Says:
June 12th, 2009 at 7:15 pm
Great post! I’ll subscribe right now with my feedreader software!








