ProRank SEO

Robots.txt Management

Create and manage virtual robots.txt rules without a physical file

What is Robots.txt?

The robots.txt file is a text file placed in your website's root directory that tells search engine crawlers which pages or sections of your site they should or shouldn't crawl. It's the first file search engines check before crawling your site.

ProRank SEO's virtual robots.txt editor allows you to manage these rules through WordPress without creating or editing a physical file on your server.

Virtual Robots.txt Editor

Key Features

  • No Physical File Required: Rules are generated dynamically by WordPress
  • Automatic Conflict Detection: Detects existing physical robots.txt files to prevent conflicts
  • Sitemap Auto-Addition: Automatically adds your sitemap URL when sitemaps are enabled
  • AI Bot Blocking: One-click blocking of 50+ AI/ML training bots

Configuration Steps

  1. Navigate to Settings: Go to ProRank SEO → Technical SEO → Robots & Indexing → Robots.txt tab
  2. Check for Conflicts: If a warning appears about a physical robots.txt file, remove it from your server root
  3. Enable Virtual Editor: Toggle "Enable Virtual Robots.txt Editor" to ON
  4. Add Custom Rules: Enter your robots.txt rules in the text area
  5. Configure AI Blocking: Toggle "Block AI/ML Training Bots" if desired
  6. Save Settings: Click "Save Settings" to apply your configuration

Default Rules

ProRank SEO suggests these default rules as a starting point:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

# WordPress directories
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/

# Protect sensitive files
Disallow: /wp-config.php
Disallow: /wp-login.php
Disallow: /wp-signup.php

# Block duplicate content
Disallow: /*?replytocom=*
Disallow: /*/feed/
Disallow: /*/trackback/
Disallow: /xmlrpc.php

# Sitemap location (added automatically)
Sitemap: https://yoursite.com/sitemap_index.xml
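If you want to sanity-check rules like these before saving, Python's standard `urllib.robotparser` module can parse a rules string directly. Two caveats for this sketch: Python's parser applies rules in file order (unlike Google's longest-match precedence), so the Allow line is listed first here, and it does not understand `*` wildcards, so only literal paths are tested. `example.com` and `TestBot` are placeholders.

```python
import urllib.robotparser

# A literal-path subset of the default rules above. The Allow line comes
# first because urllib.robotparser uses first-match order.
rules = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-login.php
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("TestBot", "https://example.com/wp-admin/"))                # False
print(rp.can_fetch("TestBot", "https://example.com/wp-admin/admin-ajax.php"))  # True
print(rp.can_fetch("TestBot", "https://example.com/blog/hello-world/"))        # True
```

For wildcard patterns like `/*?replytocom=*`, rely on a tester that implements Google's matching rules rather than this module.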

Crawl Budget Optimization

Save Crawl Budget with Smart Rules

ProRank SEO can automatically add rules to preserve your crawl budget for important pages:

E-commerce Sites

# Block action URLs to save crawl budget
User-agent: *
Disallow: /*?add-to-cart=*
Disallow: /*?remove_item=*
Disallow: /cart/*
Disallow: /checkout/*
Disallow: /my-account/*
Disallow: /*?orderby=*
Disallow: /*?filter*=*
Disallow: /*?min_price=*
Disallow: /*?max_price=*

Filtered URLs

# Block filtered URLs to prevent infinite crawl space
User-agent: *
Disallow: /*?*filter*
Disallow: /*?*sort*
Disallow: /*?*page=*&*
Disallow: /*?*color=*
Disallow: /*?*size=*
Disallow: /*?*brand=*

Internal Search

# Block internal search results
User-agent: *
Disallow: /?s=*
Disallow: /search/*
Disallow: /*?*search*

AI Bot Blocking Rules

When you enable "Block AI/ML Training Bots", ProRank SEO automatically adds Disallow rules for 50+ known AI bots. Here's a sample of what gets added:

# Block AI/ML Training Bots
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: Gemini-Bot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: MidJourney-Bot
Disallow: /

# ... plus 40+ more bots

The complete list includes bots from OpenAI, Google AI, Anthropic, Meta, Microsoft, Amazon, Apple, and various image generation and research systems. This list is updated regularly to include new AI crawlers.
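You can confirm the effect of per-bot groups like these with Python's standard `urllib.robotparser`: a listed agent is refused everywhere, while crawlers with no matching group (and no `User-agent: *` group) keep full access. This is a sketch against a hand-written two-bot excerpt, not the plugin's full list.

```python
import urllib.robotparser

# Excerpt of the AI-bot rules above. There is no "User-agent: *" group,
# so unlisted crawlers are unaffected.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/any-page/"))     # False
print(rp.can_fetch("Googlebot", "https://example.com/any-page/"))  # True
```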

Common Patterns

Allow Specific Bots Only

# Allow only major search engines
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Slurp
Allow: /

User-agent: DuckDuckBot
Allow: /

# Block everyone else
User-agent: *
Disallow: /
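A quick way to verify that this allowlist pattern behaves as intended, again using Python's standard `urllib.robotparser` (`example.com` and `RandomScraper` are placeholders):

```python
import urllib.robotparser

# Condensed version of the allowlist pattern above: named bots get
# "Allow: /", everyone else falls through to the catch-all Disallow.
rules = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/post/"))      # True
print(rp.can_fetch("RandomScraper", "https://example.com/post/"))  # False
```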

Protect Specific Directories

User-agent: *
# Protect private directories
Disallow: /private/
Disallow: /members-only/
Disallow: /downloads/

# But allow specific files
Allow: /downloads/public-guide.pdf

Crawl Delay (Not Google)

# Note: Googlebot doesn't support crawl-delay
# Use for other bots only
User-agent: Bingbot
Crawl-delay: 1

User-agent: Slurp
Crawl-delay: 1

Testing Your Robots.txt

How to Test

  1. View Your Robots.txt: Visit https://yoursite.com/robots.txt in your browser
  2. Google Search Console: Use the robots.txt report in Search Console to validate your rules (it replaced the older robots.txt Tester tool)
  3. Check Specific URLs: Test whether specific URLs are blocked or allowed for Googlebot
  4. Verify Sitemap Addition: Confirm your sitemap URL appears at the bottom of robots.txt

Important Warnings

Physical File Override: If a physical robots.txt file exists in your site root, it will override the virtual editor. Remove the physical file to use ProRank's virtual editor.

Be Careful with Disallow: Using Disallow: / for User-agent: * will block all search engines from your entire site. Always test changes carefully.

Crawl-Delay Note: Google doesn't support the crawl-delay directive; Googlebot determines its crawl rate automatically, so crawl-delay lines only affect other bots.

Best Practices

Do's

  • ✓ Test changes in Search Console first
  • ✓ Use specific paths rather than wildcards when possible
  • ✓ Include your sitemap URL
  • ✓ Block duplicate content and parameters
  • ✓ Regularly review and update rules
  • ✓ Keep rules simple and readable

Don'ts

  • ✗ Don't block CSS/JS files needed for rendering
  • ✗ Don't use robots.txt for security (it's public)
  • ✗ Don't block your entire site accidentally
  • ✗ Don't rely on crawl-delay for Google
  • ✗ Don't forget to test after changes
  • ✗ Don't list sensitive URLs in robots.txt

Robots.txt vs Meta Robots

Understanding the Difference

Robots.txt

  • Controls crawling (which pages bots can access)
  • Site-wide or directory-level control
  • Prevents bots from accessing content
  • Public file anyone can view
  • Processed before the page is accessed

Meta Robots Tags

  • Controls indexing (what appears in search results)
  • Page-level control
  • Allows crawling but prevents indexing
  • Hidden in the page HTML
  • Processed after the page is accessed

Use robots.txt to save crawl budget by blocking unimportant pages. Use meta robots tags to control what appears in search results.
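As a concrete illustration of the page-level side, a generic meta robots tag (not plugin output) placed in a page's head keeps that page out of search results while still letting crawlers follow its links:

```html
<!-- Page-level control: allow crawling, prevent indexing -->
<meta name="robots" content="noindex, follow">
```

Note that the two mechanisms interact: a page blocked in robots.txt can never deliver this tag, because the crawler never fetches the HTML. To reliably noindex a page, it must remain crawlable.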