Controlling a Subdomain with Robots.txt
07-10-10
I ran across a new “your computer can’t do that” lie today.
Suppose you have a site, example.com, and a subdomain, dev.example.com. The dev site is just a clone of the main one, where you can make changes and test things before deploying to your live site. You might want to tell all robots to leave your dev site alone, and for good reason: the dev site could trigger Google’s duplicate content filters, or Google might show dev.example.com pages in its search results, so visitors end up seeing things they shouldn’t.
Typically, robots.txt is used to tell robots which areas of the site they are allowed to visit. The difficulty with robots.txt is that its rules match only on the URL path, not on the domain. Controlling this on a domain (or subdomain) basis requires an extra robots.txt file and a small rewrite rule in your .htaccess file. Here’s a simple example:
dev.robots.txt
# control robots for the dev.* subdomain
user-agent: *
disallow: /
robots.txt
user-agent: *
disallow: /db/
.htaccess
# Uncomment the following line if rewrites are not already enabled
# RewriteEngine on
# Use a special robots.txt file for the dev subdomain
RewriteCond %{HTTP_HOST} ^dev\.example\.com$
RewriteRule ^robots\.txt$ dev.robots.txt [L]
If you copy this code and use it on your site, make sure to change the domain in the .htaccess sample (change it from example.com to your domain).
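Before relying on the two files, it can help to confirm what each one actually tells a crawler. Here is a minimal sketch using Python’s standard urllib.robotparser module. The file contents are copied from the example above; the helper name allowed and the test URLs are my own illustrative assumptions, not part of the setup.

```python
from urllib.robotparser import RobotFileParser

# Contents of the two files from the example above.
DEV_ROBOTS = """\
# control robots for the dev.* subdomain
user-agent: *
disallow: /
"""

MAIN_ROBOTS = """\
user-agent: *
disallow: /db/
"""


def allowed(robots_text, url, agent="*"):
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    `allowed` is a hypothetical helper name used only for this sketch.
    """
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Everything on the dev subdomain should be off limits.
    print(allowed(DEV_ROBOTS, "http://dev.example.com/index.html"))
    # On the live site, only /db/ should be blocked.
    print(allowed(MAIN_ROBOTS, "http://example.com/index.html"))
    print(allowed(MAIN_ROBOTS, "http://example.com/db/dump.sql"))
```

This only checks the rules themselves. To confirm the rewrite is working, request /robots.txt on both hostnames and compare what comes back.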

Hey, I currently have a site where I use virtual subdomains.
A lot of links that I restrict on the real domain show up on the subdomains, which I then have to remove. Would using the code above “create a virtual robots.txt file” for all my subdomains?
Yes, this is the basic idea of the whole thing. A request for robots.txt is mapped to a different file behind the scenes, depending on the hostname, which gives you subdomain-level control.
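To picture the mapping, here is a rough Python model of the decision the rewrite makes. It is only a sketch of the logic, not how Apache evaluates the rule internally; the function name robots_file_for is hypothetical, and the pattern escapes the dots so they match literally.

```python
import re

# Host pattern modelled on the RewriteCond in the post, with the dots
# escaped so that "." matches only a literal dot.
DEV_HOST = re.compile(r"^dev\.example\.com$")


def robots_file_for(host):
    """Return the file that would be served for /robots.txt on `host`.

    Hypothetical helper: it models the single dev rule from the post.
    """
    if DEV_HOST.search(host):
        return "dev.robots.txt"
    return "robots.txt"


if __name__ == "__main__":
    for host in ("dev.example.com", "example.com", "www.example.com"):
        print(host, "->", robots_file_for(host))
```

Note that the rule as written covers one hostname. With many virtual subdomains you would need either one condition per subdomain or a broader host pattern, and any host that matches nothing falls through to the regular robots.txt.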