Question 1

What is a robots.txt file?

Accepted Answer

A robots.txt file is a plain-text file placed at the root of your website (e.g. https://example.com/robots.txt) that tells web crawlers which paths on your site they may or may not crawl. It follows the Robots Exclusion Protocol (REP), standardized as RFC 9309. Well-behaved crawlers fetch this file before requesting any other URL on the host, and its rules apply only to the protocol, host, and port that serve it.
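
As a rough illustration of how a crawler might consult these rules, here is a minimal sketch using Python's standard-library urllib.robotparser. The rules, the ExampleBot user agent, and the is_allowed helper are made up for the example; a real crawler would download the file from the site instead of parsing a literal string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url, user_agent="ExampleBot"):
    """Return True if the parsed rules let user_agent fetch url."""
    return parser.can_fetch(user_agent, url)


print(is_allowed("https://example.com/index.html"))
print(is_allowed("https://example.com/private/report.html"))
```

Note that urllib.robotparser matches rules by plain path prefix, so it is a convenient starting point rather than a full model of how every search engine interprets the file.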

Question 2

Does robots.txt prevent pages from being indexed?

Accepted Answer

No. Disallowing a URL in robots.txt stops compliant crawlers from fetching the page, but Google can still index the bare URL (without its content) if other pages link to it. To prevent indexing, use a noindex robots meta tag or an X-Robots-Tag HTTP header on the page itself, and make sure the page is not disallowed in robots.txt: a crawler has to fetch the page to see the noindex signal. For truly sensitive content, require authentication; robots.txt is a voluntary convention, not a security measure.
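
A crawler can only read a noindex signal on a page it is allowed to fetch, so it can help to check that a page carrying noindex is not also disallowed. The sketch below does that with Python's urllib.robotparser. The helper name crawler_can_see_noindex, the sample rules, and the ExampleBot user agent are assumptions for illustration; the two constants show the standard forms of the meta tag and the header.

```python
from urllib.robotparser import RobotFileParser

# The two usual ways to send the signal (on the page, not in robots.txt).
NOINDEX_META = '<meta name="robots" content="noindex">'
NOINDEX_HEADER = ("X-Robots-Tag", "noindex")


def crawler_can_see_noindex(robots_txt, url, user_agent="ExampleBot"):
    """Return True if robots_txt lets user_agent fetch url.

    If this returns False, a compliant crawler never downloads the page,
    so any noindex tag or header on it goes unseen.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rule sets for illustration.
blocking_rules = "User-agent: *\nDisallow: /drafts/\n"
open_rules = "User-agent: *\nDisallow:\n"

url = "https://example.com/drafts/post.html"
print(crawler_can_see_noindex(blocking_rules, url))
print(crawler_can_see_noindex(open_rules, url))
```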

Question 3

What is the Crawl-delay directive?

Accepted Answer

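Crawl-delay asks a crawler to wait N seconds between requests to reduce server load. It is not part of the REP standard (RFC 9309), so support varies. Google ignores Crawl-delay in robots.txt; Googlebot adjusts its crawl rate automatically based on how your server responds, and the crawl-rate setting that Search Console once offered has been retired. Some other crawlers, such as Bingbot, honor the directive; check each crawler's documentation before relying on it.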
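
For a crawler that chooses to honor the directive, reading it could look like the sketch below, which uses RobotFileParser.crawl_delay from Python's standard library. The rules, the ExampleBot and OtherBot names, the delay_for helper, and the 1-second fallback are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: only ExampleBot is given an explicit delay.
ROBOTS_TXT = """\
User-agent: ExampleBot
Crawl-delay: 10
Disallow: /search

User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def delay_for(user_agent, fallback=1.0):
    """Return the seconds to wait between requests for user_agent.

    Falls back to an arbitrary default when no Crawl-delay applies.
    """
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else fallback


print(delay_for("ExampleBot"))
print(delay_for("OtherBot"))
```

A crawler loop would then sleep for delay_for(...) seconds between requests to the same host.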

Question 4

Can I use wildcards in robots.txt?

Accepted Answer

Yes. Google and most major crawlers support two wildcards: * matches zero or more characters (e.g. Disallow: /wp-admin/* blocks all URLs starting with /wp-admin/, although the trailing * is redundant because rules already match by prefix), and $ anchors the pattern to the end of the URL (e.g. Disallow: /*.pdf$ blocks URLs that end in .pdf, but not /file.pdf?v=2). Other crawlers may not support wildcards; check their documentation.
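
To make the matching rules concrete, here is a minimal sketch that translates a robots.txt path pattern into a regular expression. Python's built-in urllib.robotparser only does prefix matching, so this uses the re module instead. The function names rule_to_regex and rule_matches are hypothetical, and the sketch covers only single-pattern matching, not how a crawler chooses between competing Allow and Disallow rules.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' becomes '.*', a trailing '$' anchors the end of the URL, and
    everything else is matched literally as a prefix.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces, then rejoin them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(rule, path):
    """Return True if path (including any query string) matches rule."""
    return rule_to_regex(rule).match(path) is not None


print(rule_matches("/wp-admin/*", "/wp-admin/options.php"))
print(rule_matches("/wp-admin/", "/wp-admin/options.php"))
print(rule_matches("/*.pdf$", "/docs/guide.pdf"))
print(rule_matches("/*.pdf$", "/docs/guide.pdf?download=1"))
```

The first two calls give the same result, which shows why the trailing * adds nothing to a prefix rule.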

Robots.txt Generator

Robots.txt directives explained

User-agent

Allow

Disallow

Crawl-delay

Sitemap

Wildcards

⚠️ robots.txt is not a security measure

Frequently asked questions
