Updating my answer:
There is one more way to block indexing: directly through the HTTP response header, using X-Robots-Tag. Instead of a meta tag or robots.txt, you can also return an X-Robots-Tag: noindex header in response to a page request.
Here is an example of an HTTP response whose X-Robots-Tag headers give per-crawler instructions (googlebot should not follow the page's links; otherbot should neither index the page nor follow its links):
HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(…)
X-Robots-Tag: googlebot: nofollow
X-Robots-Tag: otherbot: noindex, nofollow
(…)
Here is more information about X-Robots-Tag: https://developers.google.com/search/reference/robots_meta_tag
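A minimal sketch of sending the header from application code, assuming a Python WSGI application (the app function and the page body are hypothetical; on Apache or nginx the same header is usually set in the server configuration instead). Because the header travels with the response, it also works for non-HTML files such as PDFs, where a <meta> tag cannot be used.

```python
def app(environ, start_response):
    """Hypothetical WSGI app that asks crawlers not to index its responses."""
    body = b"<html><body>Private page</body></html>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Generic directive for every crawler that honours X-Robots-Tag.
        # Bot-specific values such as "otherbot: noindex, nofollow" could be
        # sent as additional ("X-Robots-Tag", ...) tuples in this list.
        ("X-Robots-Tag", "noindex"),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the app can be passed to wsgiref.simple_server.make_server("localhost", 8000, app) and the response headers inspected with curl -I.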
Another tip about <meta> tags: you can use them to block specific bots, for example:
<meta name="googlebot" content="NOINDEX, NOFOLLOW">
<meta name="MSNBot" content="NOINDEX, NOFOLLOW">
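To illustrate how bot-specific tags are meant to be read, here is a small sketch that collects the directives a given bot would see. It assumes a crawler honours both the generic robots tag and the tag carrying its own name, and merges them; RobotsMetaParser and directives_for are hypothetical names, and real crawlers document their own matching rules.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect robots directives from <meta> tags, keyed by lowercased name."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if not name:
            return  # e.g. <meta charset="..."> has no name
        content = attrs.get("content") or ""
        tokens = {t.strip().lower() for t in content.split(",") if t.strip()}
        self.directives.setdefault(name, set()).update(tokens)


def directives_for(html, bot):
    """Directives for `bot`: the generic "robots" tag plus the bot's own tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    generic = parser.directives.get("robots", set())
    specific = parser.directives.get(bot.lower(), set())
    return generic | specific
```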
Blocking via robots.txt on the site you do not want indexed:
User-agent: *
Disallow: /
In User-agent: *, the * means that this section applies to all robots, and Disallow: / tells the robot not to visit any page on the site.
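Python's standard urllib.robotparser can be used to check what these two lines mean in practice. This is a sketch: is_allowed is a hypothetical helper, and the result only reflects how this parser reads the file, not how any particular search engine does.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)
```

With ROBOTS_TXT, every URL is refused for every user agent; a narrower rule such as Disallow: /private/ would leave the rest of the site fetchable.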
Another thing: NOINDEX, NOFOLLOW belongs in a <meta> tag, not in robots.txt! The correct form is:
<html>
<head>
<title>...</title>
<meta name="robots" content="NOINDEX, NOFOLLOW">
</head>
<body>
...
</body>
</html>
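As a quick check of why NOINDEX and NOFOLLOW do nothing in robots.txt, the sketch below feeds such a file to Python's urllib.robotparser. The parser skips fields it does not recognise, so no rule is created and fetching stays allowed. This only demonstrates one parser's behaviour; the MISPLACED file is a made-up example.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that wrongly tries to use page-level meta directives.
MISPLACED = """\
User-agent: *
Noindex: /
Nofollow: /
"""

parser = RobotFileParser()
parser.parse(MISPLACED.splitlines())

# Unknown fields are ignored, so no Disallow rule exists and fetching is allowed.
allowed = parser.can_fetch("AnyBot", "https://example.com/page.html")
print(allowed)  # True
```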
Either way, both the <meta> tag and robots.txt can be ignored by some search engines, especially by malicious robots. And nofollow only affects the links on the page where it appears: if some other page that does not use nofollow links to your site, a bot can still find your site through that link, regardless of the robots directives on your own pages.
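The discovery problem can be sketched with a toy crawler over an in-memory set of pages. Everything here is hypothetical (the crawl function, PageParser, and the example URLs), and it models only link discovery: a robots nofollow meta tag stops the crawler from following links out of that page, but the page itself is still reached through a link elsewhere.

```python
from collections import deque
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collect <a href> targets and whether a robots meta tag says nofollow."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.nofollow = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = (attrs.get("content") or "").lower()
            if "nofollow" in {t.strip() for t in content.split(",")}:
                self.nofollow = True
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def crawl(pages, start):
    """Breadth-first crawl over `pages` (a dict of url -> html).

    Links on a page are followed only if that page has no robots nofollow.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        parser = PageParser()
        parser.feed(pages.get(url, ""))
        if parser.nofollow:
            continue  # nofollow only stops links *from this page*
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen
```

Starting from another site that links to yours, the crawler still reaches your home page even though that page carries nofollow; it just does not go further from there.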