Is large dynamic sitemap generation impacting crawl budget?

Hana Ibrahim (Author)
Asked 2 days ago | 22 Views | 2 Replies
After finally resolving our previous dynamic sitemap generation issues, we're now encountering a different challenge with large-scale sitemaps. We're observing a significant number of URLs submitted via these dynamic sitemaps landing in 'Discovered - currently not indexed' within Google Search Console, which strongly suggests a negative impact on our crawl budget and overall indexing efficiency.

Here's a snippet from a typical GSC sitemap processing report:
Sitemap: https://www.example.com/sitemap_dynamic_products.xml
Type: Sitemap
Submitted: Oct 26, 2023
Last read: Oct 26, 2023
Status: Success
URLs submitted: 1,500,000
Indexed: 250,000
Discovered - currently not indexed: 1,200,000
Valid with warnings: 50,000

Has anyone else seen large dynamic sitemaps hurt their indexing rates like this, and if so, what helped?

2 Answers

Ling Lee
Answered 2 days ago

Hey Hana, it sounds like you've moved from one sitemap challenge to another, a common journey in technical SEO, isn't it? That 'Discovered - currently not indexed' status for 1.2 million URLs is definitely a red flag, and crawl budget and overall indexability are the right things to be looking at.

While sitemap size can play a part in how Google allocates crawl resources, that many unindexed URLs usually says more about how Google perceives the quality and value of those pages than about the sitemap itself. Google prioritizes crawling and indexing pages it deems valuable, unique, and well-linked internally. Here's what you should investigate:

  1. Content Quality & Uniqueness: Are those 1.2 million pages truly unique, valuable, and free from thin or duplicate content issues? Google is less likely to index pages that don't offer significant value to users. This is paramount for any content to be considered for indexing.
  2. Internal Linking Structure: How well are these 1.2 million pages integrated into your site's internal linking structure? Pages that are only found via a sitemap and have weak internal links are often deprioritized. Strengthening your internal linking strategy can significantly improve their chances of being crawled and indexed.
  3. Canonicalization & Noindex Tags: Double-check that these pages are not accidentally canonicalizing to other URLs and do not carry a noindex directive, either as a robots meta tag in the HTML or as an X-Robots-Tag HTTP header.
  4. Sitemap Segmentation: Instead of one massive dynamic sitemap, break it down into smaller, more granular sitemaps (e.g., by product category, date published, or last modified) and list them in a sitemap index file. The sitemap protocol caps each sitemap file at 50,000 URLs and 50MB uncompressed, so 1,500,000 URLs cannot live in a single file anyway. Segmenting also makes it easier to see in Search Console which sections of the site are being left unindexed.
  5. Crawl Budget Optimization: Beyond sitemaps, focus on overall site health. Improve site speed, remove or noindex genuinely low-value pages, and ensure your robots.txt isn't inadvertently blocking important sections. Google's crawl budget isn't just about how many URLs it can process from your sitemap; it's about how efficiently it can crawl your entire site.
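
To make point 4 concrete, here is a minimal Python sketch of splitting URLs into per-category sitemap files of at most 50,000 URLs each, plus a sitemap index that lists them. The function names, the `sitemap_<category>_<n>.xml` naming scheme, and the shape of the input dictionary are assumptions for illustration only, not anything taken from your setup; it also leaves out `<lastmod>` and gzip compression for brevity.

```python
from itertools import islice
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemap protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunked(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield successive lists of at most `size` URLs."""
    iterator = iter(urls)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def build_sitemap(urls):
    """Return the XML text of one <urlset> sitemap file."""
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{SITEMAP_NS}">\n'
        f"{entries}"
        "</urlset>\n"
    )


def build_sitemap_index(sitemap_urls):
    """Return the XML text of a sitemap index listing the given sitemap URLs."""
    entries = "".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>\n" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n'
        f"{entries}"
        "</sitemapindex>\n"
    )


def segment_sitemaps(urls_by_category, base_url):
    """Split URLs into per-category files; return ({filename: xml}, index_xml).

    `urls_by_category` maps a category slug to its list of URLs
    (a hypothetical input shape chosen for this example).
    """
    files = {}
    for category, urls in sorted(urls_by_category.items()):
        for n, chunk in enumerate(chunked(urls), start=1):
            files[f"sitemap_{category}_{n}.xml"] = build_sitemap(chunk)
    index_xml = build_sitemap_index(f"{base_url}/{name}" for name in files)
    return files, index_xml
```

You would write each entry of `files` to disk (or serve it dynamically) and submit only the index to Search Console. Splitting by category won't get pages indexed on its own, but it should make it much easier to tell which segments account for the 'Discovered - currently not indexed' bucket.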

Hana Ibrahim
Answered 2 days ago

Thanks Ling Lee, that's a super detailed response! The point about focusing on content quality and internal linking for those 1.2 million pages, rather than just the sitemap itself, is a really important distinction. It's awesome to have this kind of collaborative insight in the community; it really helps untangle these complex issues.
