What robots.txt does
Robots.txt is a text file placed at your domain root (yourdomain.com/robots.txt). It tells well-behaved crawlers (Googlebot, Bingbot, and other search engine bots) which URLs they are allowed to crawl. It is a set of instructions, not a security measure: bots that choose to ignore it can still access those URLs.
How to check your robots.txt
The fastest check: type your domain followed by /robots.txt in your browser:
https://yourdomain.com/robots.txt
If you get a 404, your site has no robots.txt, which is fine. Without a robots.txt, all crawlers have full access to everything.
Use the free robots.txt generator and checker to generate a properly formatted file or validate an existing one.
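The browser check above can also be scripted. The following is a minimal Python sketch using only the standard library; the function names describe_status and check_robots are made up for this illustration, and the wording of the verdicts is an assumption, not output from any tool.

```python
import urllib.error
import urllib.request


def describe_status(status: int) -> str:
    """Turn the HTTP status of /robots.txt into a short verdict."""
    if status == 200:
        return "robots.txt found"
    if status == 404:
        return "no robots.txt: crawlers have full access"
    return f"unexpected status {status}: check manually"


def check_robots(domain: str) -> str:
    """Request https://<domain>/robots.txt and describe the result."""
    url = f"https://{domain}/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return describe_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses, including 404.
        return describe_status(err.code)


if __name__ == "__main__":
    # Replace yourdomain.com with the site you want to check.
    print(check_robots("yourdomain.com"))
```

The status handling is kept separate from the network call so it can be reused or tested without touching a live site.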
Reading robots.txt syntax
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
Sitemap: https://yourdomain.com/sitemap.xml
Line by line:
- User-agent: * applies to all crawlers. Use a specific bot name (Googlebot, Bingbot) to target one crawler only.
- Disallow: /admin/ blocks access to the /admin/ directory and all URLs under it.
- Disallow: /private/ blocks /private/ and everything under it.
- Allow: /public/ explicitly allows /public/ (useful to override a broader Disallow).
- Sitemap: tells crawlers where your sitemap is (recommended on all robots.txt files).
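To see how these rules play out for concrete URLs, the example file can be fed to Python's standard urllib.robotparser module. This is a sketch, not a reference implementation: the test paths (/admin/settings, /public/page, /blog/post) are invented for illustration, and Python's parser applies the first matching rule, whereas Google uses the most specific match. For this simple file both approaches give the same answers.

```python
from urllib.robotparser import RobotFileParser

# The example file from above, parsed from a string instead of fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
Sitemap: https://yourdomain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may crawl `path` on yourdomain.com."""
    return parser.can_fetch(agent, f"https://yourdomain.com{path}")


print(allowed("/admin/settings"))  # False: under Disallow: /admin/
print(allowed("/public/page"))     # True: matched by Allow: /public/
print(allowed("/blog/post"))       # True: no rule matches, so crawling is allowed
print(parser.site_maps())          # ['https://yourdomain.com/sitemap.xml'] (Python 3.8+)
```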
The most dangerous robots.txt mistake
The single worst robots.txt line is:
Disallow: /
Placed under User-agent: *, this blocks all crawlers from all URLs on the site. It is the correct setting during development and a catastrophic setting in production. Developers sometimes push a development robots.txt to production by mistake. If your site suddenly drops from search results, check your robots.txt first.
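One way to guard against shipping a development robots.txt is a small pre-deploy check. The sketch below uses Python's standard urllib.robotparser; the function name blocks_entire_site and the idea of running it in a deployment pipeline are assumptions for illustration, not part of any particular tool.

```python
from urllib.robotparser import RobotFileParser


def blocks_entire_site(robots_txt: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` is not allowed to crawl the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # If "/" itself is off limits, a blanket Disallow: / is in effect.
    return not parser.can_fetch(agent, "/")


development = "User-agent: *\nDisallow: /\n"
production = "User-agent: *\nDisallow: /admin/\n"

print(blocks_entire_site(development))  # True
print(blocks_entire_site(production))   # False
```

A deployment script could read the robots.txt that is about to go live and abort when this function returns True.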
What to block and what to allow
Block from crawlers:
- /admin/: admin dashboards, CMS backends
- /api/: API endpoints that return JSON (no SEO value)
- /checkout/, /cart/: e-commerce flows (no ranking value)
- /?s=: WordPress search result pages (duplicate content)
- /login, /register: authentication pages
Never block from crawlers:
- Your CSS and JavaScript files — Google needs them to render pages correctly
- Images used on pages you want indexed — Google Images is a traffic source
- Your sitemap URL — it should be accessible to all crawlers
- Any page you want to rank in search results
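The two lists above can be turned into an automated audit. This is a rough sketch using Python's standard urllib.robotparser; the audit function and every path in MUST_BLOCK and MUST_ALLOW are hypothetical examples, so substitute the real paths from your own site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical paths: adjust both lists to match your own site.
MUST_BLOCK = ["/admin/", "/api/orders", "/checkout/", "/cart/"]
MUST_ALLOW = ["/assets/site.css", "/assets/app.js", "/images/hero.jpg", "/sitemap.xml"]


def audit(robots_txt: str, agent: str = "Googlebot") -> list[str]:
    """Return a list of problems; an empty list means the file passes."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for path in MUST_BLOCK:
        if parser.can_fetch(agent, path):
            problems.append(f"should be blocked: {path}")
    for path in MUST_ALLOW:
        if not parser.can_fetch(agent, path):
            problems.append(f"should be crawlable: {path}")
    return problems


good = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /checkout/
Disallow: /cart/
"""

# Same rules plus a Disallow that hides CSS and JavaScript from crawlers.
bad = good + "Disallow: /assets/\n"

print(audit(good))  # []
print(audit(bad))   # ['should be crawlable: /assets/site.css', 'should be crawlable: /assets/app.js']
```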
Robots.txt for AI crawlers
In 2026, AI companies send their own crawlers to collect training data and power AI search features. The well-behaved ones follow robots.txt rules that name their user agent:
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
Use Allow: / instead if you want AI crawlers to access your content. Being included in AI training data and AI search results (ChatGPT Browse, Perplexity) is increasingly a traffic source.
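Rules addressed to one user agent do not affect the others, which can be confirmed with Python's standard urllib.robotparser. In this sketch the /blog/ URL is an invented example, and the result only reflects how a compliant parser reads the file, not what any given crawler actually does.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://yourdomain.com/blog/"  # hypothetical page
for agent in ("GPTBot", "ChatGPT-User", "Googlebot"):
    print(agent, parser.can_fetch(agent, url))
# GPTBot False
# ChatGPT-User False
# Googlebot True  (no group names Googlebot and there is no * group)
```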
Summary
Check your robots.txt at yourdomain.com/robots.txt. Generate or validate one with the free robots.txt tool. Never block / in production. Always include your sitemap URL. Block admin panels, API endpoints, and checkout flows — not your CSS, JS, or content pages.