Analyze Content Publishing Velocity With This Python Script

Transform XML sitemaps into competitive intelligence by tracking how fast your competitors publish new content.

What Is Content Publishing Velocity and Why It Matters

Content publishing velocity, the rate at which a website publishes new content and updates existing material, has become an increasingly important metric for understanding competitive positioning in search. While traditional SEO tools focus on rankings and backlinks, they often miss a fundamental question: how quickly are your competitors (and you) actually publishing content?

This guide provides a practical Python script that extracts lastmod timestamps from XML sitemaps and turns them into publishing velocity data you can act on. By understanding these patterns, you can make data-driven decisions about your content calendar, resource allocation, and competitive positioning.

Publishing velocity matters because search engines reward websites that demonstrate consistent, valuable content production. A website that publishes consistently signals active maintenance and relevance to search algorithms, which can improve crawl frequency and build topical authority over time. Our SEO experts regularly analyze publishing patterns as part of comprehensive technical audits.

Beyond your own publishing habits, analyzing competitor publishing velocity provides strategic insights into content investment levels, seasonal patterns, and strategic priorities. Understanding how fast competitors publish helps you position your own content strategy appropriately within your market.

Python Setup and Required Libraries

Before diving into the script, you need to configure your Python environment with the necessary libraries. The analysis relies on a few key packages that handle HTTP requests, XML parsing, data manipulation, and visualization.

Core Dependencies

requests - Handles HTTP requests to fetch sitemap data from web servers. This library provides simple, elegant access to web resources, essential for retrieving sitemaps programmatically.

xml.etree.ElementTree - Parses XML content returned from sitemap requests. This built-in Python module provides efficient XML parsing without additional dependencies, making it ideal for sitemap analysis.

pandas - Transforms parsed XML data into structured dataframes for analysis. Pandas enables powerful data manipulation including filtering, grouping, aggregation, and statistical calculations.

matplotlib - Creates visualizations of publishing velocity patterns. Charts and graphs make trends immediately apparent, supporting both analysis and reporting.

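collections.Counter - A class from the standard-library collections module for counting publishing events by time period, enabling frequency analysis. The core script imports it, although the per-period counting itself is done with pandas.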

Installation Command

pip install requests pandas matplotlib

Import Statements

import requests
import xml.etree.ElementTree as ET
from collections import Counter
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
import json

For those without local Python environments, Google Colab provides a browser-based solution with pre-installed libraries. This cloud-based approach eliminates installation complexity and enables quick experimentation.

The Python ecosystem integrates seamlessly with modern web development workflows, allowing you to incorporate velocity analysis into broader content management systems.

The Publishing Velocity Analysis Script

This Python script fetches sitemap data, extracts lastmod timestamps, calculates publishing frequency, and generates visualizations to help you understand content production patterns.

Core Script Implementation

import requests
import xml.etree.ElementTree as ET
from collections import Counter
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
import json

def fetch_sitemap(url):
    """Fetch an XML sitemap and return a list of (url, lastmod) tuples."""
    namespaces = {
        'sm': 'http://www.sitemaps.org/schemas/sitemap/0.9',
    }
    entries = []

    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        root = ET.fromstring(response.content)
    except requests.exceptions.RequestException as e:
        print(f"Error fetching {url}: {e}")
        return []
    except ET.ParseError as e:
        # The server answered, but the body was not well-formed XML
        print(f"Error parsing {url}: {e}")
        return []

    # Sitemap index: recurse into each nested sitemap
    if root.tag.endswith('sitemapindex'):
        for sitemap in root.findall('sm:sitemap', namespaces):
            loc = sitemap.find('sm:loc', namespaces)
            if loc is not None and loc.text:
                entries.extend(fetch_sitemap(loc.text.strip()))

    # URL sitemap: collect each URL with its lastmod date, if present
    elif root.tag.endswith('urlset'):
        for url_elem in root.findall('sm:url', namespaces):
            loc = url_elem.find('sm:loc', namespaces)
            lastmod = url_elem.find('sm:lastmod', namespaces)

            if loc is not None and loc.text:
                lastmod_date = None
                if lastmod is not None and lastmod.text:
                    try:
                        lastmod_date = datetime.fromisoformat(
                            lastmod.text.strip().replace('Z', '+00:00'))
                    except ValueError:
                        pass  # Unparseable lastmod: keep the URL without a date
                # Use loc.text directly so the `url` argument is not overwritten
                entries.append((loc.text.strip(), lastmod_date))

    return entries

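def analyze_velocity(entries, time_period='monthly'):
    """Count sitemap entries per period, based on their lastmod dates."""
    dated_entries = [(url, date) for url, date in entries if date is not None]

    if not dated_entries:
        # Return an empty Series so callers can still use .sum(), .mean(), etc.
        return pd.Series(dtype='int64')

    urls, dates = zip(*dated_entries)
    # Sitemaps can mix naive and timezone-aware timestamps. Normalize to UTC,
    # then drop the timezone so the values convert cleanly to periods.
    normalized = pd.to_datetime(list(dates), utc=True).tz_localize(None)
    df = pd.DataFrame({'url': urls, 'date': normalized})

    if time_period == 'monthly':
        df['period'] = df['date'].dt.to_period('M')
    elif time_period == 'weekly':
        df['period'] = df['date'].dt.to_period('W')
    else:
        # Any other value falls back to quarterly
        df['period'] = df['date'].dt.to_period('Q')

    velocity = df.groupby('period').size()
    return velocity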

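def generate_report(domain, velocity_data, time_period='monthly'):
    """Summarize per-period counts into headline publishing metrics."""
    if len(velocity_data) == 0:
        # No dated entries, for example a sitemap without lastmod values
        return {
            'domain': domain,
            'time_period': time_period,
            'total_publications': 0,
            'periods_analyzed': 0,
            'average_per_period': None,
            'median_per_period': None,
            'std_deviation': None,
        }

    # Cast NumPy scalars to built-in types so the report works with json.dumps()
    report = {
        'domain': domain,
        'time_period': time_period,
        'total_publications': int(velocity_data.sum()),
        'periods_analyzed': len(velocity_data),
        'average_per_period': float(velocity_data.mean()),
        'median_per_period': float(velocity_data.median()),
        'std_deviation': float(velocity_data.std()),  # NaN with only one period
    }
    return report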

def visualize_velocity(domain, velocity_data):
    """Create visualization of publishing velocity over time."""
    plt.figure(figsize=(12, 6))
    # Period labels such as '2024-01' become the x-axis categories
    dates = [str(period) for period in velocity_data.index]
    values = velocity_data.values

    plt.plot(dates, values, marker='o', linewidth=2, markersize=4)
    plt.fill_between(dates, values, alpha=0.3)

    plt.title(f'Content Publishing Velocity: {domain}')
    plt.xlabel('Time Period')
    # lastmod reflects updates as well as new pages, so label the axis accordingly
    plt.ylabel('URLs Published or Updated')
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
    plt.show()

def compare_competitors(competitor_domains, time_period='monthly'):
    """Compare publishing velocity across multiple competitors."""
    comparison_data = {}

    for domain in competitor_domains:
        # Assumes the sitemap lives at /sitemap.xml; adjust if a site uses another path
        sitemap_url = f"https://{domain}/sitemap.xml"
        entries = fetch_sitemap(sitemap_url)
        velocity = analyze_velocity(entries, time_period)
        report = generate_report(domain, velocity, time_period)
        comparison_data[domain] = report

    return comparison_data

Script Walkthrough

The script operates through several interconnected stages:

fetch_sitemap() - Handles both sitemap types, detecting whether the root element is a sitemap index or a direct URL list. When it encounters a sitemap index, it recursively fetches the nested sitemaps for comprehensive coverage. The function reads loc and lastmod elements in the standard sitemaps.org namespace; elements from image or video extension namespaces are ignored.

analyze_velocity() - Processes collected entries to calculate publishing frequency. It filters for entries with valid lastmod dates, groups publications by the specified time period, and returns frequency counts per period.

generate_report() - Transforms raw velocity data into structured metrics including total publications, periods analyzed, average and median per period, and standard deviation.

visualize_velocity() - Creates visual representations using matplotlib, making trends immediately apparent for analysis and reporting.

compare_competitors() - Enables benchmarking across multiple domains, valuable for competitive analysis.
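To make the first two stages concrete without touching the network, here is a minimal sketch that parses an inline sitemap and tallies entries by month. The sample XML and the helper names parse_urlset and count_by_month are illustrative assumptions, not part of the script above; the counting uses collections.Counter rather than pandas to keep the example dependency-free.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from datetime import datetime

# Hypothetical sitemap, inlined so the example runs offline
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/a</loc><lastmod>2024-01-05</lastmod></url>
  <url><loc>https://example.com/blog/b</loc><lastmod>2024-01-20</lastmod></url>
  <url><loc>https://example.com/blog/c</loc><lastmod>2024-02-03T09:30:00Z</lastmod></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

NS = {'sm': 'http://www.sitemaps.org/schemas/sitemap/0.9'}


def parse_urlset(xml_text):
    """Return (url, lastmod) tuples; lastmod is None when missing or unparseable."""
    root = ET.fromstring(xml_text)
    entries = []
    for url_elem in root.findall('sm:url', NS):
        loc = url_elem.findtext('sm:loc', default='', namespaces=NS).strip()
        raw = url_elem.findtext('sm:lastmod', default='', namespaces=NS).strip()
        lastmod = None
        if raw:
            try:
                lastmod = datetime.fromisoformat(raw.replace('Z', '+00:00'))
            except ValueError:
                pass  # Leave the date empty rather than guessing
        if loc:
            entries.append((loc, lastmod))
    return entries


def count_by_month(entries):
    """Tally dated entries into 'YYYY-MM' buckets, sorted chronologically."""
    counts = Counter(d.strftime('%Y-%m') for _, d in entries if d is not None)
    return dict(sorted(counts.items()))


entries = parse_urlset(SAMPLE_SITEMAP)
monthly = count_by_month(entries)
print(monthly)  # {'2024-01': 2, '2024-02': 1}
```

Note that a month with no entries simply does not appear in the result. The same is true of a pandas groupby, so averages are computed over active periods only unless the gaps are filled with zeros.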

Interpreting Velocity Results for Strategic Decisions

Raw velocity data becomes valuable when translated into actionable strategic insights.

Understanding Publishing Patterns

Publishing patterns reveal strategic priorities and operational capabilities:

Consistency Score - Calculate the coefficient of variation to measure publishing consistency. Lower scores indicate reliable, predictable publishing cadences. Very high scores suggest episodic content creation rather than sustained investment.

Peak Periods - Identify months or quarters with unusually high publication counts. These often correspond to product launches, industry events, or seasonal opportunities. Understanding competitor peak periods helps anticipate competitive intensity.

Lag Periods - Similarly, identify periods of minimal publishing. These may indicate operational constraints or strategic shifts. Lag periods represent opportunities to capture attention when competitors are quiet.
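The three measures above can be sketched in a few lines. This is one possible approach, assuming monthly counts are already available as a dict; the sample figures, the function names, and the cutoff of one standard deviation are all illustrative choices rather than established thresholds.

```python
from statistics import mean, stdev

# Hypothetical monthly counts (made-up values for illustration)
monthly_counts = {
    '2024-01': 12, '2024-02': 10, '2024-03': 11,
    '2024-04': 30, '2024-05': 9, '2024-06': 2,
}


def consistency_score(counts):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    values = list(counts.values())
    if len(values) < 2 or mean(values) == 0:
        return float('nan')
    return stdev(values) / mean(values)


def flag_periods(counts, threshold=1.0):
    """Return (peaks, lags): periods more than `threshold` standard deviations from the mean."""
    values = list(counts.values())
    if len(values) < 2:
        return [], []
    avg, sd = mean(values), stdev(values)
    if sd == 0:
        return [], []  # Perfectly flat cadence: nothing stands out
    peaks = [p for p, v in counts.items() if (v - avg) / sd > threshold]
    lags = [p for p, v in counts.items() if (avg - v) / sd > threshold]
    return peaks, lags


print(round(consistency_score(monthly_counts), 2))  # 0.76
print(flag_periods(monthly_counts))  # (['2024-04'], ['2024-06'])
```

The sample standard deviation is used here so the figure lines up with the pandas default in generate_report(). What counts as a "low" or "high" score depends on the market, so compare competitors against each other rather than against a fixed cutoff.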

Benchmarking Against Competitors

When comparing your publishing velocity against competitors:

Absolute Comparison - Compare total publications and averages directly, but contextualize raw numbers within your market context.

Relative Intensity - Calculate your publishing rate relative to domain authority or organic traffic. A smaller site publishing at high intensity may be aggressively investing in content.

Quality-Adjusted Assessment - Consider how publishing velocity correlates with ranking success. High-velocity sites that don't rank well may indicate potential quality or relevance issues.
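As a rough illustration of relative intensity, the sketch below normalizes monthly output by organic traffic. The function name, the per-10,000-visits scale, and the site figures are assumptions made up for this example; traffic numbers would come from whatever analytics or estimation tool you already use.

```python
def relative_intensity(posts_per_month, monthly_organic_visits, per=10_000):
    """Posts published per `per` organic visits; higher means more output relative to audience size."""
    if monthly_organic_visits <= 0:
        raise ValueError("monthly_organic_visits must be positive")
    return posts_per_month * per / monthly_organic_visits


# Hypothetical sites: (posts per month, monthly organic visits)
sites = {
    'large-site.example': (40, 500_000),
    'small-site.example': (20, 50_000),
}

intensity = {
    domain: relative_intensity(posts, visits)
    for domain, (posts, visits) in sites.items()
}
print(intensity)  # {'large-site.example': 0.8, 'small-site.example': 4.0}
```

In this made-up case the smaller site publishes half as many posts in absolute terms but five times as many relative to its traffic.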

Setting Publishing Targets

Based on velocity analysis, establish data-driven publishing targets:

  • Match or slightly exceed primary competitors' baseline publishing to maintain competitive visibility.
  • Increase publishing velocity if market growth trends indicate growing competitive intensity.
  • Balance velocity targets with quality standards appropriate for your content strategy.

Advanced Velocity Analysis Techniques

Content Type Segmentation

Analyze publishing velocity by inferred content type to reveal strategic priorities:

def analyze_by_content_type(entries):
    type_patterns = {
        'blog': ['/blog/', '/posts/', '/articles/'],
        'product': ['/product/', '/shop/', '/store/'],
        'video': ['/video/', 'youtube.com'],
    }

    type_counts = {category: 0 for category in type_patterns}

    for url, date in entries:
        for content_type, patterns in type_patterns.items():
            if any(pattern in url.lower() for pattern in patterns):
                type_counts[content_type] += 1
                break

    return type_counts

This segmentation reveals strategic priorities. A competitor heavily investing in video content might indicate emerging format preferences in your market.

Publishing Day Analysis

Identify which days of the week show highest publication activity to understand editorial workflow patterns:

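def analyze_publishing_days(entries):
    """Count dated entries per weekday, ordered Monday through Sunday."""
    dates = [date for url, date in entries if date is not None]

    if not dates:
        return {}

    days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

    # utc=True normalizes a mix of naive and timezone-aware timestamps;
    # weekdays are therefore reported in UTC
    day_names = pd.to_datetime(dates, utc=True).day_name()

    # Reindex so every weekday appears in calendar order, with 0 for quiet days
    day_counts = pd.Series(day_names).value_counts().reindex(days, fill_value=0)
    return day_counts.to_dict()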

Day-of-week patterns reveal editorial workflow insights. Weekday publishing suggests professional editorial processes, while weekend publishing may indicate automated or contributor-driven content.

Velocity Trend Extrapolation

Project future publishing based on observed trends using linear regression on historical velocity data.
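A minimal sketch of that projection, using an ordinary least-squares line fitted over the period index. The function names and the sample history are assumptions for illustration; a straight line is a crude model of publishing behavior, so treat the output as a rough direction rather than a forecast.

```python
def linear_trend(counts):
    """Fit y = slope * x + intercept by least squares, with x = 0, 1, 2, ..."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two periods to fit a trend")
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(counts) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, counts))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def project(counts, periods_ahead=3):
    """Extend the fitted line past the last observed period, floored at zero."""
    slope, intercept = linear_trend(counts)
    n = len(counts)
    return [max(0.0, slope * (n + i) + intercept) for i in range(periods_ahead)]


history = [10, 12, 14, 16, 18]  # Hypothetical monthly counts, oldest first
print(linear_trend(history))  # (2.0, 10.0)
print(project(history))       # [20.0, 22.0, 24.0]
```

The slope also serves as a simple trend indicator: positive means output is growing period over period, negative means it is shrinking.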

Automating Ongoing Monitoring

Set up automated velocity monitoring to track changes over time:

  • Schedule daily or weekly script execution
  • Implement alerts for significant velocity changes
  • Export results to Google Sheets or other reporting systems
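The alerting step could look something like the sketch below, which compares the latest period against the average of the periods just before it. The function names, the three-period window, and the 50% threshold are assumptions chosen for illustration; tune them to how noisy your competitors' publishing is.

```python
def velocity_change(counts, window=3):
    """Fractional change of the latest period versus the mean of the previous `window` periods."""
    if len(counts) < window + 1:
        return None  # Not enough history yet
    baseline = sum(counts[-(window + 1):-1]) / window
    if baseline == 0:
        return None  # Avoid dividing by zero when the site was silent
    return (counts[-1] - baseline) / baseline


def check_alert(domain, counts, threshold=0.5, window=3):
    """Return an alert message when the change exceeds `threshold`, else None."""
    change = velocity_change(counts, window)
    if change is None or abs(change) < threshold:
        return None
    direction = 'up' if change > 0 else 'down'
    return (f"{domain}: publishing {direction} {abs(change):.0%} "
            f"versus the previous {window}-period average")


# Hypothetical per-period counts, oldest first
print(check_alert('competitor.example', [10, 10, 10, 20]))
# competitor.example: publishing up 100% versus the previous 3-period average
print(check_alert('competitor.example', [10, 10, 10, 11]))  # None
```

The returned message can be handed to whatever notification channel you already use, such as email or a chat webhook.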

For organizations seeking to automate competitive intelligence at scale, our AI automation services can help integrate these scripts into broader marketing technology stacks.

Scheduled Execution Example

import time

import schedule  # Third-party package: pip install schedule

def scheduled_analysis():
    domains = ['yourdomain.com', 'competitor1.com', 'competitor2.com']
    for domain in domains:
        sitemap = f"https://{domain}/sitemap.xml"
        entries = fetch_sitemap(sitemap)
        velocity = analyze_velocity(entries)
        report = generate_report(domain, velocity)
        # Store or export results

schedule.every().day.at("08:00").do(scheduled_analysis)

# schedule only registers jobs; this loop is what actually runs them
while True:
    schedule.run_pending()
    time.sleep(60)

Common Pitfalls and Best Practices

Data Quality Considerations

Incomplete Sitemaps - Some sites don't maintain comprehensive sitemaps. Pages may exist without sitemap inclusion, creating undercounting. Use velocity analysis as one signal among many rather than absolute truth.

Stale Timestamps - Sitemaps aren't always updated when content changes. A page might have been published months ago but only recently added to the sitemap, creating inaccurate velocity readings.

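Historical Gaps - A sitemap lists each URL once, with only its most recent lastmod value, and does not record original publication dates. An old page that was edited last week counts as last week's activity, while long-standing pages that never change may not register in recent periods at all.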
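Before trusting a velocity chart, it can help to check how usable the timestamps are. The sketch below is one way to do that, assuming entries in the (url, lastmod) shape the script produces; the function name and sample data are illustrative. It reports what share of URLs carry a lastmod at all and how concentrated the dates are on a single day.

```python
from collections import Counter
from datetime import datetime


def lastmod_quality(entries):
    """Summarize lastmod coverage and how concentrated the dates are."""
    total = len(entries)
    dated = [d for _, d in entries if d is not None]
    coverage = len(dated) / total if total else 0.0

    if dated:
        # Most common calendar day and the share of dated URLs that fall on it
        top_day, count = Counter(d.date() for d in dated).most_common(1)[0]
        top_day_share = count / len(dated)
    else:
        top_day, top_day_share = None, 0.0

    return {
        'coverage': coverage,
        'top_day': top_day,
        'top_day_share': top_day_share,
    }


# Hypothetical entries: three URLs share one date, one has no lastmod
sample = [
    ('https://example.com/a', datetime(2024, 3, 1)),
    ('https://example.com/b', datetime(2024, 3, 1)),
    ('https://example.com/c', datetime(2024, 3, 1)),
    ('https://example.com/d', datetime(2024, 1, 15)),
    ('https://example.com/e', None),
]
quality = lastmod_quality(sample)
print(quality['coverage'], quality['top_day_share'])  # 0.8 0.75
```

Low coverage means the velocity figures describe only part of the site. A very high top-day share may indicate that timestamps were rewritten in bulk, for instance by a site-wide rebuild, rather than reflecting real editorial activity.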

Interpretation Guidelines

Volume ≠ Quality - High publishing velocity doesn't guarantee SEO success. Analyze correlation between velocity and ranking improvements for your specific situation.

Context Matters - Publishing velocity must be interpreted within context. A site publishing 50 short blog posts monthly differs fundamentally from one publishing 5 long-form guides. Content depth matters alongside frequency.

Competitive Parity - Match publishing intensity to competitive requirements without over-investing beyond market norms.

Technical Best Practices

  • Respect robots.txt when fetching sitemaps
  • Implement rate limiting when analyzing multiple competitors
  • Cache results to avoid redundant fetches
  • Implement comprehensive error handling for network issues
  • Schedule daily or weekly execution for ongoing monitoring rather than continuous fetches
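The first three practices can be sketched with the standard library alone. In this example the robots.txt content is inlined so it runs offline, and the user-agent string, the fetch_politely helper, and the one-second delay are assumptions for illustration. A robots.txt file can also declare where the sitemap lives, which is more reliable than guessing /sitemap.xml.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice, fetch https://<domain>/robots.txt
SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap_index.xml
"""

USER_AGENT = 'velocity-bot'  # Illustrative user-agent name


def load_robots(robots_text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def fetch_politely(urls, fetch, robots, delay_seconds=1.0):
    """Fetch each allowed URL once, pausing between requests.

    `fetch` is any callable that takes a URL, such as fetch_sitemap.
    """
    cache = {}
    for url in urls:
        if url in cache or not robots.can_fetch(USER_AGENT, url):
            continue  # Already fetched, or disallowed by robots.txt
        if cache:
            time.sleep(delay_seconds)  # Rate limit between real requests
        cache[url] = fetch(url)
    return cache


robots = load_robots(SAMPLE_ROBOTS)
print(robots.site_maps())  # ['https://example.com/sitemap_index.xml']

urls = [
    'https://example.com/sitemap_index.xml',
    'https://example.com/private/sitemap.xml',  # Disallowed above
    'https://example.com/sitemap_index.xml',    # Duplicate, served from cache
]
# A stand-in fetch function so the example needs no network access
results = fetch_politely(urls, fetch=lambda u: f"fetched {u}", robots=robots, delay_seconds=0)
print(list(results))  # ['https://example.com/sitemap_index.xml']
```

The cache here lives only for one run. For scheduled monitoring, persisting results to disk between runs avoids refetching sitemaps that have not changed.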

Practical Applications and Next Steps

Publishing velocity analysis serves multiple strategic purposes across content operations.

Content Calendar Optimization

Use velocity insights to plan your editorial calendar strategically. Analyze when competitors publish heavily to anticipate keyword competition, and identify quieter periods for easier ranking opportunities.

Resource Planning

Velocity trends inform resource allocation decisions. Growing velocity requirements suggest hiring or tool investment. Declining velocity might indicate capacity for reallocation to other priorities.

Competitive Positioning

Understanding your position relative to competitors enables strategic differentiation. If you can't match high-volume competitors on frequency, consider competing on depth, uniqueness, or specific topical niches where you can establish authority.

Investment Justification

Data-driven velocity analysis provides concrete evidence for content investment requests. Demonstrating competitive publishing gaps with quantified data strengthens budget proposals and helps secure resources for your content marketing initiatives.


Frequently Asked Questions

What is content publishing velocity?

Content publishing velocity is the rate at which a website produces new content and updates existing material. It's measured by analyzing how frequently new pages appear or existing pages are modified, typically tracked through XML sitemap lastmod timestamps.

Why does publishing velocity matter for SEO?

Publishing velocity matters because search engines reward websites that demonstrate consistent, valuable content production. Regular publishing signals active maintenance, improves crawl frequency, builds topical authority over time, and creates more opportunities to rank for relevant keywords.

How accurate is sitemap-based velocity analysis?

Sitemap analysis provides useful estimates but has limitations. Not all pages are included in sitemaps, and lastmod timestamps aren't always accurate or present. Use velocity data as one signal among many rather than absolute truth.

What Python libraries do I need for this analysis?

The core script requires requests for HTTP fetching, xml.etree.ElementTree for XML parsing, pandas for data manipulation, and matplotlib for visualization. xml.etree.ElementTree ships with Python; the other three can be installed with: pip install requests pandas matplotlib

How often should I run velocity analysis?

For ongoing monitoring, weekly or monthly analysis provides sufficient freshness. Sitemaps don't change that frequently, and daily fetches may be unnecessary. Schedule automated runs based on your competitive monitoring needs.