What robots.txt does

Robots.txt is a plain-text file placed at your domain root (yourdomain.com/robots.txt). It tells well-behaved crawlers, such as Googlebot, Bingbot, and other search engine bots, which URLs they may crawl. It is a set of instructions, not a security measure: bots that choose to ignore it can still access those URLs.

How to check your robots.txt

The fastest check: type your domain followed by /robots.txt in your browser:

https://yourdomain.com/robots.txt

If you get a 404, your site has no robots.txt, which is fine: without one, crawlers treat the whole site as open to crawling.
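How a crawler reacts depends on the HTTP status it gets back for /robots.txt. The sketch below encodes the handling described in RFC 9309 (the Robots Exclusion Protocol standard). The function name and the returned labels are invented for illustration, and individual crawlers may differ in the details.

```python
def robots_status_policy(status: int) -> str:
    """Map the HTTP status of /robots.txt to what a standards-following crawler does."""
    if 200 <= status < 300:
        return "parse-rules"      # file exists: read and obey its rules
    if 300 <= status < 400:
        return "follow-redirect"  # fetch the redirect target instead
    if 400 <= status < 500:
        return "allow-all"        # no usable file (e.g. 404): nothing is restricted
    if 500 <= status < 600:
        return "disallow-all"     # server error: assume everything is off limits for now
    return "unknown"


for code in (200, 404, 503):
    print(code, robots_status_policy(code))
```

The practical takeaway is that a missing file is harmless, while a robots.txt URL that returns server errors can pause crawling of the whole site.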

Use the free robots.txt generator and checker to generate a properly formatted file or validate an existing one.

Reading robots.txt syntax

User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
Sitemap: https://yourdomain.com/sitemap.xml

Line by line:

User-agent: * — applies to all crawlers. Use a specific bot name (Googlebot, Bingbot) to target one crawler only.
Disallow: /admin/ — block access to the /admin/ directory and all URLs under it
Disallow: /private/ — block /private/ and everything under it
Allow: /public/ — explicitly allow /public/ (useful to override a broader Disallow)
Sitemap: — tells crawlers where your sitemap is (recommended on all robots.txt files)
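To see how these rules apply to concrete URLs, you can feed them to Python's standard-library parser. This is a minimal sketch: the test paths are made up, and `site_maps()` needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
Sitemap: https://yourdomain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())  # parse from a string, no network request

# Hypothetical paths, chosen to hit each rule plus one unmatched URL.
for path in ("/admin/settings", "/private/report.pdf", "/public/index.html", "/blog/"):
    allowed = parser.can_fetch("*", "https://yourdomain.com" + path)
    print(f"{path}: {'allowed' if allowed else 'blocked'}")

print(parser.site_maps())  # list of Sitemap URLs found in the file
```

One caveat: Python's parser applies the first rule that matches in file order, while Google applies the most specific (longest) matching rule. For this file both approaches give the same answers, but they can disagree when Allow and Disallow rules overlap.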

The most dangerous robots.txt mistake

The single worst robots.txt line is:

Disallow: /

Placed under User-agent: *, this blocks every compliant crawler from every URL on the site. It is a reasonable setting for a development or staging site and a catastrophic one in production. Developers sometimes push a development robots.txt to production by mistake. If your site suddenly drops from search results, check your robots.txt first.
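One way to catch this before it ships is a small pre-deploy check. The sketch below is an assumption about how such a guard could look, not a standard tool: `homepage_blocked` is a hypothetical helper that parses the file with Python's standard library and asks whether the homepage may be fetched. Because rules are prefix matches, the only Disallow path that matches the homepage path / is / itself, so a blocked homepage signals a blanket block.

```python
from urllib.robotparser import RobotFileParser


def homepage_blocked(robots_txt: str, user_agent: str = "*") -> bool:
    """Return True if the rules stop user_agent from fetching the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # example.com is a placeholder host; only the path "/" matters here.
    return not parser.can_fetch(user_agent, "https://example.com/")


dev_rules = "User-agent: *\nDisallow: /\n"
prod_rules = "User-agent: *\nDisallow: /admin/\n"

print(homepage_blocked(dev_rules))   # True: blanket block
print(homepage_blocked(prod_rules))  # False: only /admin/ is blocked
```

Running a check like this in a deployment pipeline and failing the build when it returns True for the production file is one possible safeguard.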

What to block and what to allow

Block from crawlers:

/admin/ — admin dashboards, CMS backends
/api/ — API endpoints that return JSON (no SEO value)
/checkout/, /cart/ — e-commerce flows (no ranking value)
/?s= — WordPress search result pages (duplicate content)
/login, /register — authentication pages
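A block list like the one above can be turned into a file with a few lines of code. This is a minimal sketch of a generator, with a hypothetical `build_robots_txt` helper; it assumes one rule group and one sitemap, and uses Python 3.9+ type hints.

```python
def build_robots_txt(disallow: list[str], sitemap_url: str, user_agent: str = "*") -> str:
    """Build a single-group robots.txt with one Disallow line per path."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallow]
    lines += ["", f"Sitemap: {sitemap_url}"]  # blank line separates the group from the sitemap
    return "\n".join(lines) + "\n"


print(build_robots_txt(
    ["/admin/", "/api/", "/checkout/", "/cart/", "/?s=", "/login", "/register"],
    "https://yourdomain.com/sitemap.xml",
))
```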

Never block from crawlers:

Your CSS and JavaScript files — Google needs them to render pages correctly
Images used on pages you want indexed — Google Images is a traffic source
Your sitemap URL — it should be accessible to all crawlers
Any page you want to rank in search results
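A quick way to audit the "never block" list is to test a handful of must-crawl URLs against your rules. The sketch below assumes a hypothetical site whose stylesheets and scripts live under /assets/ and whose robots.txt blocks that directory by mistake; `blocked_urls` is an illustrative helper, not part of any library.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs from `urls` that user_agent is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules and paths for illustration only.
rules = "User-agent: *\nDisallow: /admin/\nDisallow: /assets/\n"
must_crawl = [
    "https://yourdomain.com/assets/site.css",
    "https://yourdomain.com/assets/app.js",
    "https://yourdomain.com/blog/first-post",
]

# Prints the two /assets/ URLs, which the rules above block.
print(blocked_urls(rules, must_crawl))
```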

Robots.txt for AI crawlers

In 2026, AI companies send their own crawlers to collect training data and power AI search features. Well-behaved ones follow robots.txt rules addressed to their user-agent names:

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

To let AI crawlers in, leave those rules out (crawling is allowed by default) or use Allow: / in place of Disallow: /. Being included in AI training data and AI search results (ChatGPT Browse, Perplexity) is increasingly a traffic source.
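Rules addressed to one user agent leave every other crawler unaffected, which you can confirm with Python's standard-library parser. This sketch only shows how the rules are interpreted; whether a given bot actually obeys them is up to its operator. The /blog/ URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

AI_RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /
"""

parser = RobotFileParser()
parser.parse(AI_RULES.splitlines())

# Googlebot has no matching group and no "*" fallback, so it stays allowed.
for agent in ("GPTBot", "ChatGPT-User", "Googlebot"):
    allowed = parser.can_fetch(agent, "https://yourdomain.com/blog/")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```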

Summary

Check your robots.txt at yourdomain.com/robots.txt. Generate or validate one with the free robots.txt tool. Never ship Disallow: / to production. Always include your sitemap URL. Block admin panels, API endpoints, and checkout flows, but not your CSS, JS, or content pages.

How to Check and Generate a Robots.txt File (And What the Rules Mean)
