ProRank SEO

Robots.txt Management

Create and manage virtual robots.txt rules without a physical file

What is Robots.txt?

The robots.txt file is a text file placed in your website's root directory that tells search engine crawlers which pages or sections of your site they should or shouldn't crawl. It's the first file search engines check before crawling your site.

ProRank SEO's virtual robots.txt editor allows you to manage these rules through WordPress without creating or editing a physical file on your server.

Virtual Robots.txt Editor

Key Features

  • No Physical File Required: Rules are generated dynamically by WordPress
  • Automatic Conflict Detection: Detects existing physical robots.txt files to prevent conflicts
  • Sitemap Auto-Addition: Automatically adds your sitemap URL when sitemaps are enabled
  • AI Bot Blocking: One-click blocking of 50+ AI/ML training bots

Configuration Steps

  1. Navigate to Settings: Go to ProRank SEO → Technical SEO → Robots & Indexing → Robots.txt tab
  2. Check for Conflicts: If a warning appears about a physical robots.txt file, remove it from your server root
  3. Enable Virtual Editor: Toggle "Enable Virtual Robots.txt Editor" to ON
  4. Add Custom Rules: Enter your robots.txt rules in the text area
  5. Configure AI Blocking: Toggle "Block AI/ML Training Bots" if desired
  6. Save Settings: Click "Save Settings" to apply your configuration

Default Rules

ProRank SEO suggests these default rules as a starting point:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

# WordPress directories
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/

# Protect sensitive files
Disallow: /wp-config.php
Disallow: /wp-login.php
Disallow: /wp-signup.php

# Block duplicate content
Disallow: /*?replytocom=*
Disallow: /*/feed/
Disallow: /*/trackback/
Disallow: /xmlrpc.php

# Sitemap location (added automatically)
Sitemap: https://yoursite.com/sitemap_index.xml
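If you want to sanity-check rules like these before saving, Python's standard `urllib.robotparser` module can parse a rules string directly. Two caveats for this sketch: Python's parser applies rules in file order (unlike Google's longest-match precedence), so the Allow line is listed first here, and it does not understand `*` wildcards, so only literal paths are tested. `example.com` and `TestBot` are placeholders.

```python
import urllib.robotparser

# A literal-path subset of the default rules above. The Allow line comes
# first because urllib.robotparser uses first-match order.
rules = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-login.php
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("TestBot", "https://example.com/wp-admin/"))                # False
print(rp.can_fetch("TestBot", "https://example.com/wp-admin/admin-ajax.php"))  # True
print(rp.can_fetch("TestBot", "https://example.com/blog/hello-world/"))        # True
```

For wildcard patterns like `/*?replytocom=*`, rely on a tester that implements Google's matching rules rather than this module.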

Crawl Budget Optimization

Save Crawl Budget with Smart Rules

ProRank SEO can automatically add rules to preserve your crawl budget for important pages:

E-commerce Sites

# Block action URLs to save crawl budget
User-agent: *
Disallow: /*?add-to-cart=*
Disallow: /*?remove_item=*
Disallow: /cart/*
Disallow: /checkout/*
Disallow: /my-account/*
Disallow: /*?orderby=*
Disallow: /*?filter*=*
Disallow: /*?min_price=*
Disallow: /*?max_price=*

Filtered URLs

# Block filtered URLs to prevent infinite crawl space
User-agent: *
Disallow: /*?*filter*
Disallow: /*?*sort*
Disallow: /*?*page=*&*
Disallow: /*?*color=*
Disallow: /*?*size=*
Disallow: /*?*brand=*

Internal Search

# Block internal search results
User-agent: *
Disallow: /?s=*
Disallow: /search/*
Disallow: /*?*search*

AI Bot Blocking Rules

When you enable "Block AI/ML Training Bots", ProRank SEO automatically adds Disallow rules for 50+ known AI bots. Here's a sample of what gets added:

# Block AI/ML Training Bots
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: Gemini-Bot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: MidJourney-Bot
Disallow: /

# ... plus 40+ more bots

The complete list includes bots from OpenAI, Google AI, Anthropic, Meta, Microsoft, Amazon, Apple, and various image generation and research systems. This list is updated regularly to include new AI crawlers.
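You can confirm the effect of per-bot groups like these with Python's standard `urllib.robotparser`: a listed agent is refused everywhere, while crawlers with no matching group (and no `User-agent: *` group) keep full access. This is a sketch against a hand-written two-bot excerpt, not the plugin's full list.

```python
import urllib.robotparser

# Excerpt of the AI-bot rules above. There is no "User-agent: *" group,
# so unlisted crawlers are unaffected.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("GPTBot", "https://example.com/any-page/"))     # False
print(rp.can_fetch("Googlebot", "https://example.com/any-page/"))  # True
```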

Common Patterns

Allow Specific Bots Only

# Allow only major search engines
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Slurp
Allow: /

User-agent: DuckDuckBot
Allow: /

# Block everyone else
User-agent: *
Disallow: /
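A quick way to verify that this allowlist pattern behaves as intended, again using Python's standard `urllib.robotparser` (`example.com` and `RandomScraper` are placeholders):

```python
import urllib.robotparser

# Condensed version of the allowlist pattern above: named bots get
# "Allow: /", everyone else falls through to the catch-all Disallow.
rules = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/post/"))      # True
print(rp.can_fetch("RandomScraper", "https://example.com/post/"))  # False
```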

Protect Specific Directories

User-agent: *
# Protect private directories
Disallow: /private/
Disallow: /members-only/
Disallow: /downloads/

# But allow specific files
Allow: /downloads/public-guide.pdf

Crawl Delay (Not Google)

# Note: Googlebot doesn't support crawl-delay
# Use for other bots only
User-agent: Bingbot
Crawl-delay: 1

User-agent: Slurp
Crawl-delay: 1

Testing Your Robots.txt

How to Test

  1. View Your Robots.txt: Visit https://yoursite.com/robots.txt in your browser
  2. Google Search Console: Use the robots.txt report in Search Console to validate your rules (it replaced the older robots.txt Tester tool)
  3. Check Specific URLs: Test whether specific URLs are blocked or allowed for Googlebot
  4. Verify Sitemap Addition: Confirm your sitemap URL appears at the bottom of robots.txt

Important Warnings

Physical File Override: If a physical robots.txt file exists in your site root, it will override the virtual editor. Remove the physical file to use ProRank's virtual editor.

Be Careful with Disallow: Using Disallow: / for User-agent: * will block all search engines from your entire site. Always test changes carefully.

Crawl-Delay Note: Google doesn't support the crawl-delay directive; Googlebot determines its crawl rate automatically, so crawl-delay lines only affect other bots.

Best Practices

Do's

  • ✓ Test changes in Search Console first
  • ✓ Use specific paths rather than wildcards when possible
  • ✓ Include your sitemap URL
  • ✓ Block duplicate content and parameters
  • ✓ Regularly review and update rules
  • ✓ Keep rules simple and readable

Don'ts

  • ✗ Don't block CSS/JS files needed for rendering
  • ✗ Don't use robots.txt for security (it's public)
  • ✗ Don't block your entire site accidentally
  • ✗ Don't rely on crawl-delay for Google
  • ✗ Don't forget to test after changes
  • ✗ Don't list sensitive URLs in robots.txt

Robots.txt vs Meta Robots

Understanding the Difference

Robots.txt

  • Controls crawling (which pages bots can access)
  • Site-wide or directory-level control
  • Prevents bots from accessing content
  • Public file anyone can view
  • Processed before the page is accessed

Meta Robots Tags

  • Controls indexing (what appears in search results)
  • Page-level control
  • Allows crawling but prevents indexing
  • Hidden in the page HTML
  • Processed after the page is accessed

Use robots.txt to save crawl budget by blocking unimportant pages. Use meta robots tags to control what appears in search results.
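As a concrete illustration of the page-level side, a generic meta robots tag (not plugin output) placed in a page's head keeps that page out of search results while still letting crawlers follow its links:

```html
<!-- Page-level control: allow crawling, prevent indexing -->
<meta name="robots" content="noindex, follow">
```

Note that the two mechanisms interact: a page blocked in robots.txt can never deliver this tag, because the crawler never fetches the HTML. To reliably noindex a page, it must remain crawlable.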