robots.txt
The robots.txt file is a plain text file placed at the root of your website (e.g. example.co.uk/robots.txt) that tells search engine crawlers which pages or sections they are allowed to crawl. Most well-behaved bots request it before fetching anything else on your site.
robots.txt Syntax
A basic robots.txt file looks like this:
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/
Sitemap: https://example.co.uk/sitemap.xml
In this example:
- "User-agent: *" applies the rules to all crawlers
- "Allow: /" permits crawling of the entire site
- "Disallow: /admin/" blocks the /admin/ directory from being crawled
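When a path matches both an Allow and a Disallow line, crawlers that follow RFC 9309 (Google among them) apply the longest matching rule, and Allow wins a tie. The sketch below illustrates that selection for the example file. The names RULES and is_allowed are made up for this illustration, and wildcard characters (* and $) are deliberately left out.

```python
# Sketch of longest-match rule selection for the example file above.
# Assumption: plain prefix rules only; wildcards (* and $) are not handled.

RULES = [
    ("allow", "/"),
    ("disallow", "/admin/"),
    ("disallow", "/private/"),
]


def is_allowed(path, rules=RULES):
    """Return True if `path` may be crawled under the given prefix rules."""
    best_kind, best_len = "allow", -1  # no matching rule means allowed
    for kind, prefix in rules:
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_prefers_allow = len(prefix) == best_len and kind == "allow"
        if longer or tie_prefers_allow:
            best_kind, best_len = kind, len(prefix)
    return best_kind == "allow"


print(is_allowed("/blog/post-1"))  # True
print(is_allowed("/admin/users"))  # False
```

This is why "Allow: /" and "Disallow: /admin/" can sit in the same group: the longer /admin/ rule takes precedence for anything inside that directory.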
Important Distinction: Crawling vs Indexing
Blocking a URL in robots.txt prevents crawlers from visiting it, but does not prevent it from being indexed. If another site links to a disallowed URL, Google can still add that URL to its index; it just cannot read the content. To prevent indexing, use a noindex meta tag instead, and leave the page crawlable: a crawler that is blocked by robots.txt never fetches the page, so it never sees the tag.
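A noindex meta tag normally takes the form <meta name="robots" content="noindex"> inside the page's head. As a rough way to confirm a page actually carries it, the sketch below scans an HTML string using Python's standard html.parser module. The class and function names, and the sample PAGE string, are invented for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page used only to demonstrate the check.
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(PAGE))  # True
```

The check only looks at the HTML it is given; it does not cover a noindex sent as an HTTP response header.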
Common Mistakes
- Blocking CSS and JavaScript: this prevents Google from rendering your pages and can severely harm rankings
- Accidental full-site block: "Disallow: /" under "User-agent: *" blocks all crawlers from the entire site
- Missing sitemap reference: always include "Sitemap: https://yourdomain/sitemap.xml"
- Blocking paginated URLs unnecessarily: pagination is important for e-commerce and blog crawlability
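Some of these mistakes can be caught with a simple script before the file goes live. The following is a minimal sketch, not a full validator: lint_robots is a hypothetical helper that flags a full-site block for all crawlers, Disallow rules ending in .css or .js, and a missing Sitemap line. It does not attempt to judge pagination rules.

```python
def lint_robots(text):
    """Return a list of warnings for a few common robots.txt mistakes."""
    warnings = []
    agents = []        # user agents of the group currently being read
    in_rules = False   # True once the current group has a rule line
    has_sitemap = False

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow":
                if value == "/" and "*" in agents:
                    warnings.append("full-site block for all crawlers")
                if value.lower().rstrip("$").endswith((".css", ".js")):
                    warnings.append("blocks CSS or JavaScript: " + value)
        elif field == "sitemap":
            has_sitemap = True

    if not has_sitemap:
        warnings.append("no Sitemap line")
    return warnings


# Example input that triggers all three warnings.
BAD = "User-agent: *\nDisallow: /\nDisallow: /*.css\n"
for warning in lint_robots(BAD):
    print(warning)
```

A file that passes this check can still be wrong for your site, so treat an empty result as a starting point for a manual review.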
How to Fix a Missing robots.txt
Create a plain text file named robots.txt and upload it to the root of your website (the same directory as your homepage). The absolute minimum content for most sites is:
User-agent: *
Allow: /
Sitemap: https://yourdomain.co.uk/sitemap.xml
Check that it is working by visiting yourdomain.co.uk/robots.txt directly in your browser; the file's contents should appear as plain text.
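If you would rather generate the file than type it by hand, a short script can produce the same minimal content. This is only a sketch: minimal_robots_txt and write_robots_txt are hypothetical helpers, and web_root stands for whatever local directory your server publishes as the site root.

```python
from pathlib import Path


def minimal_robots_txt(sitemap_url):
    """Build the minimal allow-everything file with a Sitemap line."""
    lines = ["User-agent: *", "Allow: /", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


def write_robots_txt(web_root, sitemap_url):
    """Write robots.txt into `web_root` (assumed to be the served site root)."""
    target = Path(web_root) / "robots.txt"
    target.write_text(minimal_robots_txt(sitemap_url), encoding="utf-8")
    return target


print(minimal_robots_txt("https://yourdomain.co.uk/sitemap.xml"), end="")
```

Calling write_robots_txt with your web root and sitemap URL saves the file; you still need to upload or deploy it so that it is reachable at /robots.txt.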