Robots.txt Management
Create and manage virtual robots.txt rules without a physical file
What is Robots.txt?
The robots.txt file is a text file placed in your website's root directory that tells search engine crawlers which pages or sections of your site they should or shouldn't crawl. It's the first file search engines check before crawling your site.
ProRank SEO's virtual robots.txt editor allows you to manage these rules through WordPress without creating or editing a physical file on your server.
Virtual Robots.txt Editor
Key Features
- No Physical File Required: Rules are generated dynamically by WordPress
- Automatic Conflict Detection: Detects existing physical robots.txt files to prevent conflicts
- Sitemap Auto-Addition: Automatically adds your sitemap URL when sitemaps are enabled
- AI Bot Blocking: One-click blocking of 50+ AI/ML training bots
Configuration Steps
- Navigate to Settings: Go to ProRank SEO → Technical SEO → Robots & Indexing → Robots.txt tab
- Check for Conflicts: If a warning appears about a physical robots.txt file, remove it from your server root
- Enable Virtual Editor: Toggle "Enable Virtual Robots.txt Editor" to ON
- Add Custom Rules: Enter your robots.txt rules in the text area
- Configure AI Blocking: Toggle "Block AI/ML Training Bots" if desired
- Save Settings: Click "Save Settings" to apply your configuration
Default Rules
ProRank SEO suggests these default rules as a starting point:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
# WordPress directories
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
# Protect sensitive files
Disallow: /wp-config.php
Disallow: /wp-login.php
Disallow: /wp-signup.php
# Block duplicate content
Disallow: /*?replytocom=*
Disallow: /*/feed/
Disallow: /*/trackback/
Disallow: /xmlrpc.php
# Sitemap location (added automatically)
Sitemap: https://yoursite.com/sitemap_index.xml
Crawl Budget Optimization
Save Crawl Budget with Smart Rules
ProRank SEO can automatically add rules to preserve your crawl budget for important pages:
E-commerce Sites
# Block action URLs to save crawl budget
User-agent: *
Disallow: /*?add-to-cart=*
Disallow: /*?remove_item=*
Disallow: /cart/*
Disallow: /checkout/*
Disallow: /my-account/*
Disallow: /*?orderby=*
Disallow: /*?filter*=*
Disallow: /*?min_price=*
Disallow: /*?max_price=*
Filtered URLs
# Block filtered URLs to prevent infinite crawl space
User-agent: *
Disallow: /*?*filter*
Disallow: /*?*sort*
Disallow: /*?*page=*&*
Disallow: /*?*color=*
Disallow: /*?*size=*
Disallow: /*?*brand=*
Internal Search
# Block internal search results
User-agent: *
Disallow: /?s=*
Disallow: /search/*
Disallow: /*?*search*
AI Bot Blocking Rules
When you enable "Block AI/ML Training Bots", ProRank SEO automatically adds Disallow rules for 50+ known AI bots. Here's a sample of what gets added:
# Block AI/ML Training Bots
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: Gemini-Bot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: MidJourney-Bot
Disallow: /
# ... plus 40+ more bots
The complete list includes bots from OpenAI, Google AI, Anthropic, Meta, Microsoft, Amazon, Apple, and various image generation and research systems. This list is updated regularly to include new AI crawlers.
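You can confirm that rules like these behave as intended with Python's standard-library `urllib.robotparser`. The snippet below is a sketch using two of the bot names from the sample above and a placeholder site URL, not the plugin's full generated output:

```python
from urllib.robotparser import RobotFileParser

# Sample of the generated AI-bot rules (abbreviated for illustration)
AI_RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(AI_RULES.splitlines())

# AI bots are blocked everywhere...
print(rp.can_fetch("GPTBot", "https://yoursite.com/any-page/"))     # False
# ...while search-engine bots, having no matching entry, remain allowed
print(rp.can_fetch("Googlebot", "https://yoursite.com/any-page/"))  # True
```

Because there is no `User-agent: *` entry in this block, crawlers not named in it fall through to the default "allowed" behavior.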
Common Patterns
Allow Specific Bots Only
# Allow only major search engines
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
User-agent: Slurp
Allow: /
User-agent: DuckDuckBot
Allow: /
# Block everyone else
User-agent: *
Disallow: /
Protect Specific Directories
User-agent: *
# Protect private directories
Disallow: /private/
Disallow: /members-only/
Disallow: /downloads/
# But allow specific files
Allow: /downloads/public-guide.pdf
Crawl Delay (Not Google)
# Note: Googlebot doesn't support crawl-delay
# Use for other bots only
User-agent: Bingbot
Crawl-delay: 1
User-agent: Slurp
Crawl-delay: 1
Testing Your Robots.txt
How to Test
- View Your Robots.txt: Visit https://yoursite.com/robots.txt in your browser
- Google Search Console: Use the robots.txt Tester tool in Search Console to validate rules
- Check Specific URLs: Test if specific URLs are blocked or allowed for Googlebot
- Verify Sitemap Addition: Confirm your sitemap URL appears at the bottom of robots.txt
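The URL and sitemap checks above can also be scripted locally with Python's standard-library `urllib.robotparser`. This is a sketch with a cut-down rule set and placeholder URLs; paste in your own generated robots.txt. Note that `robotparser` implements the original robots.txt convention, not Google's wildcard and longest-match extensions, so always verify wildcard rules in Search Console as well:

```python
from urllib.robotparser import RobotFileParser

# Replace with your site's actual generated robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/

Sitemap: https://yoursite.com/sitemap_index.xml
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Check whether specific URLs are blocked or allowed
print(rp.can_fetch("Googlebot", "https://yoursite.com/wp-admin/options.php"))  # False
print(rp.can_fetch("Googlebot", "https://yoursite.com/blog/my-post/"))         # True

# Verify the sitemap line was added (Python 3.8+)
print(rp.site_maps())  # ['https://yoursite.com/sitemap_index.xml']
```

To test the live file instead of a pasted string, use `rp.set_url("https://yoursite.com/robots.txt")` followed by `rp.read()`.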
Important Warnings
Physical File Override: If a physical robots.txt file exists in your site root, it will override the virtual editor. Remove the physical file to use ProRank's virtual editor.
Be Careful with Disallow: Using Disallow: / for User-agent: * will block all search engines from your entire site. Always test changes carefully.
Crawl-Delay Note: Google doesn't support the crawl-delay directive. Use Google Search Console's crawl rate settings instead for Googlebot.
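For the bots that do honor crawl-delay, the directive can be read programmatically; Python's `urllib.robotparser` exposes it via `crawl_delay()` (Python 3.6+). A minimal sketch using the Bingbot rule from the pattern above:

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: Bingbot
Crawl-delay: 1
"""

rp = RobotFileParser()
rp.parse(RULES.splitlines())

print(rp.crawl_delay("Bingbot"))    # 1
print(rp.crawl_delay("Googlebot"))  # None: no entry applies (and Googlebot ignores the directive anyway)
```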
Best Practices
Do's
- ✓ Test changes in Search Console first
- ✓ Use specific paths rather than wildcards when possible
- ✓ Include your sitemap URL
- ✓ Block duplicate content and parameters
- ✓ Regularly review and update rules
- ✓ Keep rules simple and readable
Don'ts
- ✗ Don't block CSS/JS files needed for rendering
- ✗ Don't use robots.txt for security (it's public)
- ✗ Don't block your entire site accidentally
- ✗ Don't rely on crawl-delay for Google
- ✗ Don't forget to test after changes
- ✗ Don't list sensitive URLs in robots.txt
Robots.txt vs Meta Robots
Understanding the Difference
Robots.txt
- Controls crawling (which pages bots can access)
- Site-wide or directory-level control
- Prevents bots from accessing content
- Public file anyone can view
- Processed before the page is accessed
Meta Robots Tags
- Controls indexing (what appears in search results)
- Page-level control
- Allows crawling but prevents indexing
- Hidden in the page HTML
- Processed after the page is accessed
Use robots.txt to save crawl budget by blocking unimportant pages. Use meta robots tags to control what appears in search results.
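The crawl-versus-index distinction can be seen in code: a robots.txt verdict is available before any page is fetched, while a meta robots directive only becomes visible after the HTML is downloaded and parsed. The sketch below uses Python's standard library with made-up sample rules and markup:

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# --- robots.txt: the decision happens BEFORE the page is fetched ---
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())
# Blocked by robots.txt: the crawler never requests this URL at all
print(rp.can_fetch("*", "https://example.com/private/page"))  # False

# --- meta robots: only visible AFTER the page has been crawled ---
HTML_PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

class MetaRobotsParser(HTMLParser):
    """Collects the directives from a page's <meta name="robots"> tag."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.directives = [d.strip() for d in a.get("content", "").split(",")]

p = MetaRobotsParser()
p.feed(HTML_PAGE)
# The page was crawled, but noindex keeps it out of search results
print("noindex" in p.directives)  # True
```

This is also why a page blocked in robots.txt cannot reliably be removed from the index with noindex: the crawler never fetches the page, so it never sees the tag.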