Technical SEO

robots.txt

The robots.txt file is a plain text file placed at the root of your website (e.g. example.co.uk/robots.txt) that tells search engine crawlers which pages or sections they are allowed to crawl. Well-behaved bots request it before fetching anything else on your site. Compliance is voluntary, so it is a set of instructions for crawlers, not an access control mechanism.

robots.txt Syntax

A basic robots.txt file looks like this:

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/

Sitemap: https://example.co.uk/sitemap.xml

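User-agent: * - applies the rules that follow to all crawlers
Allow: / - permits crawling of the entire site
Disallow: /admin/ - blocks the /admin/ directory from being crawled
Disallow: /private/ - blocks the /private/ directory from being crawled
Sitemap: https://example.co.uk/sitemap.xml - tells crawlers where to find your XML sitemap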
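
The way these directives combine can be sketched in Python. This is a simplified illustration, not a full parser: it reads only the User-agent: * group, assumes one User-agent line per group, ignores wildcards (* and $ inside paths), and assumes the longest-match precedence described in RFC 9309 and used by Google, where the most specific matching rule wins and Allow wins a tie. The function names parse_star_group and is_allowed are hypothetical.

```python
ROBOTS_TXT = """\
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/

Sitemap: https://example.co.uk/sitemap.xml
"""


def parse_star_group(robots_txt):
    """Collect (directive, path) rules from the 'User-agent: *' group only."""
    rules = []
    in_star_group = False
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            in_star_group = value == "*"
        elif in_star_group and field in ("allow", "disallow") and value:
            rules.append((field, value))
    return rules


def is_allowed(robots_txt, path):
    """Return True if `path` may be crawled under longest-match precedence."""
    best_len, allowed = -1, True  # no matching rule means crawling is allowed
    for directive, rule_path in parse_star_group(robots_txt):
        if not path.startswith(rule_path):
            continue
        longer = len(rule_path) > best_len
        tie_won_by_allow = len(rule_path) == best_len and directive == "allow"
        if longer or tie_won_by_allow:
            best_len, allowed = len(rule_path), directive == "allow"
    return allowed


print(is_allowed(ROBOTS_TXT, "/blog/post"))    # True
print(is_allowed(ROBOTS_TXT, "/admin/users"))  # False
```

Not every parser resolves conflicts this way. Python's standard urllib.robotparser, for instance, applies the first matching rule in file order, so with Allow: / listed first it would report /admin/ as allowed. If a tool gives a surprising answer, its precedence rule is worth checking.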

Important Distinction: Crawling vs Indexing

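Blocking a URL in robots.txt prevents compliant crawlers from visiting it, but does not prevent it from being indexed. If another site links to a disallowed URL, Google can still add that URL to its index; it just cannot read the content. To prevent indexing, use a noindex meta tag (or an X-Robots-Tag HTTP header) instead, and make sure the page is not blocked in robots.txt: a crawler has to be able to fetch the page to see the noindex instruction.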
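
As a rough illustration of what a crawler looks for, the Python sketch below checks a page for a noindex instruction in either place. It assumes you already have the page's HTML and response headers as a string and a dict; it only looks at the generic robots meta tag, not crawler-specific ones. The names RobotsMetaParser and has_noindex are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html, headers=None):
    """Return True if the meta tag or the X-Robots-Tag header says noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)

    header_value = ""
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_tokens = {token.strip().lower() for token in header_value.split(",")}

    return "noindex" in parser.directives or "noindex" in header_tokens


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))  # True
```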

Common Mistakes

Blocking CSS and JavaScript - Google needs these files to render your pages; blocking them can stop it from seeing a page the way users do and can harm rankings
Accidental full-site block - Disallow: / under User-agent: * tells all compliant crawlers to stay away from the entire site
Missing sitemap reference - include a Sitemap: line with the full URL, e.g. Sitemap: https://yourdomain/sitemap.xml
Blocking paginated URLs unnecessarily - paginated pages are often the main crawl path to deeper products and posts on e-commerce sites and blogs
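
Some of these mistakes can be caught with a simple automated check. The Python sketch below is a heuristic, not a complete validator: it looks at every Disallow line regardless of which User-agent group it belongs to, and it guesses at blocked CSS or JavaScript from the path text alone. The function name audit_robots_txt and the warning messages are hypothetical.

```python
import re

# Matches paths ending in .css/.js or containing a /css/ or /js/ directory.
ASSET_PATTERN = re.compile(r"\.(css|js)\b|/(css|js)/", re.IGNORECASE)


def audit_robots_txt(robots_txt):
    """Return a list of warnings for common robots.txt mistakes."""
    fields = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" in line:
            field, value = line.split(":", 1)
            fields.append((field.strip().lower(), value.strip()))

    warnings = []
    disallows = [value for field, value in fields if field == "disallow"]

    if "/" in disallows:
        warnings.append("Full-site block: 'Disallow: /' found")
    for value in disallows:
        if ASSET_PATTERN.search(value):
            warnings.append(f"Possible CSS/JavaScript block: 'Disallow: {value}'")
    if not any(field == "sitemap" for field, _ in fields):
        warnings.append("No Sitemap line found")

    return warnings


risky = "User-agent: *\nDisallow: /\nDisallow: /assets/js/\n"
for warning in audit_robots_txt(risky):
    print(warning)
```

A Disallow: / rule is sometimes intentional, for example on a staging site, so the output is best treated as a list of things to review.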

How to Fix a Missing robots.txt

Create a plain text file named robots.txt and upload it to the root of your website (the same directory as your homepage). The absolute minimum content for most sites is:

User-agent: *
Allow: /

Sitemap: https://yourdomain.co.uk/sitemap.xml

Check that it is working by visiting yourdomain.co.uk/robots.txt directly in your browser. You should see the contents of the file displayed as plain text.
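
The same check can be scripted. The Python sketch below builds the robots.txt URL for a site and fetches it using only the standard library. It assumes the site URL includes a scheme such as https://, and the function names robots_url and fetch_robots are hypothetical.

```python
from urllib.error import HTTPError, URLError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(site_url):
    """Build the root-level robots.txt URL for any page URL on the site."""
    parts = urlsplit(site_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots(site_url, timeout=10):
    """Return (status_code, body); status is None if the site is unreachable."""
    url = robots_url(site_url)
    try:
        with urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except HTTPError as error:  # e.g. 404 when the file is missing
        return error.code, ""
    except URLError:  # DNS failure, refused connection, and similar
        return None, ""


print(robots_url("https://yourdomain.co.uk/shop/page?id=1"))
# https://yourdomain.co.uk/robots.txt

# Example call (makes a real network request, so it is left commented out):
# status, body = fetch_robots("https://yourdomain.co.uk/")
```

A status of 200 with your rules in the body suggests the file is in the right place, while a 404 suggests it is missing or was uploaded to the wrong directory.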
