Google SEO Expert Sandy Rowley and Team in Reno Share Insights from 14K Data Points: Cracking Our Knuckles and Diving In

Full documentation on Google SEO Leak here:

Internal documentation for Google Search’s Content Warehouse API was recently leaked by accident. The leaked material covers internal microservices that mirror Google Cloud Platform offerings and includes details from the deprecated Document AI Warehouse. The documentation was initially published in a public code repository for the client library and was then captured by an external automated documentation service.

Despite Google fixing the repository mistake on May 7th, the automated documentation remains accessible. For liability reasons, I won’t link to it here. However, since the code was published under the Apache 2.0 license, anyone who found it can use, modify, and distribute it freely.

Reviewing the Google SEO Leak

The API reference docs, combined with previous Google leaks and DOJ antitrust testimony, reveal extensive information about data storage for content, links, and user interactions. The features being manipulated and stored are described in varying degrees of detail, from sparse to surprisingly thorough.

It’s tempting to label all of these as “ranking factors.” Many of them are, but some are not. Here, I’ll contextualize some of the most interesting ranking systems and features based on my research and past statements from Google.

 Google’s Mixed Messages

Google’s public representatives have sometimes discredited discoveries made by experts in marketing, tech, and journalism. Future Googlers speaking on these topics should consider saying, “we can’t talk about that” to maintain credibility. Leaks like this make it difficult to trust future statements.

 The Caveats

  1. Limited Time and Context: The holiday weekend allowed me only about 12 hours of concentrated review. Some anonymous parties helped speed up my understanding. This analysis is an initial review and may be less structured than future posts.
  2. No Scoring Functions: The documentation lacks details on feature weighting in scoring functions. We don’t know how features are used or deprecated. However, the documentation provides insights into the features considered.
  3. Likely the First of Several Posts: This post is an initial analysis. I may publish subsequent posts as I dig deeper. The SEO community will likely spend months parsing through these docs.
  4. Current Information: The leak appears to represent the current architecture of Google Search Content Storage as of March 2024.
  5. Correlation is Not Causation: This principle doesn’t directly apply here, but it’s essential to consider in any analysis.

 There Are 14K Ranking Features and More in the Docs

The API documentation includes 2,596 modules with 14,014 attributes (features). These modules relate to YouTube, Assistant, Books, video search, links, web documents, crawl infrastructure, an internal calendar system, and the People API. Google’s systems are built on a monolithic repository (a “monorepo”), meaning all code is stored in one place and any machine on the network can access it.
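
To put those two counts in proportion, here is a quick back-of-the-envelope calculation. It uses only the figures quoted above and says nothing about how attributes are actually distributed across modules; the variable names are just for illustration.

```python
# Figures quoted in the leaked API documentation summary above.
modules = 2_596
attributes = 14_014

# Simple mean; individual modules may hold far more or far fewer attributes.
avg_attributes_per_module = attributes / modules

print(f"{avg_attributes_per_module:.1f} attributes per module on average")
```

In other words, the typical module is small, which fits the picture of many narrowly scoped data structures rather than a few giant ones.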

 Google’s Lies and Misleading Statements

Google representatives have repeatedly denied using “domain authority” and “clicks for rankings.” However, the leaked documentation contradicts these statements, revealing the existence of “siteAuthority” and systems like NavBoost that use click-driven measures to influence rankings.

 The Architecture of Google’s Ranking Systems

Google’s ranking systems are a series of microservices, each responsible for different aspects of content processing and ranking. Key systems include:

– Crawling: Trawler (web crawling system)

– Indexing: Alexandria (core indexing system)

– Rendering: HtmlrenderWebkitHeadless (JavaScript page rendering system)

– Processing: LinkExtractor (extracts links from pages)

– Ranking: Mustang (primary scoring and ranking system), Ascorer (primary ranking algorithm), NavBoost (re-ranking system based on user clicks)

– Serving: Google Web Server (frontend server), SuperRoot (manages post-processing and result presentation)
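
For readers who prefer to see the list above as a data structure, here is a minimal sketch that maps each stage to the systems named in the docs. The `Stage` class and the `stage_of` and `describe` helpers are hypothetical names of my own, and the ordering simply follows the list above; it is not a claim about how Google wires these services together internally.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Stage:
    """One step of the search pipeline and the systems attributed to it."""
    name: str
    systems: Tuple[str, ...]


# Stage and system names are taken from the list above; the structure is illustrative.
PIPELINE = (
    Stage("Crawling", ("Trawler",)),
    Stage("Indexing", ("Alexandria",)),
    Stage("Rendering", ("HtmlrenderWebkitHeadless",)),
    Stage("Processing", ("LinkExtractor",)),
    Stage("Ranking", ("Mustang", "Ascorer", "NavBoost")),
    Stage("Serving", ("Google Web Server", "SuperRoot")),
)


def stage_of(system: str) -> Optional[str]:
    """Return the stage a named system belongs to, or None if it is not listed."""
    for stage in PIPELINE:
        if system in stage.systems:
            return stage.name
    return None


def describe(pipeline: Tuple[Stage, ...] = PIPELINE) -> str:
    """Render the pipeline as a single readable line."""
    return " -> ".join(f"{s.name} ({', '.join(s.systems)})" for s in pipeline)


if __name__ == "__main__":
    print(describe())
    print(stage_of("NavBoost"))
```

A lookup like `stage_of("NavBoost")` is a handy way to remember that click-based re-ranking sits in the ranking stage, after a page has already been crawled, indexed, and scored.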

 Key Takeaways for SEO

  1. How Panda Works: Panda modifies rankings based on distributed signals related to user behavior and external links. This suggests a focus on driving more successful clicks and earning diverse links.
  2. Authors are an Explicit Feature: Google stores and measures authorship, reinforcing the importance of E-E-A-T (Experience, Expertise, Authoritativeness, and Trustworthiness).
  3. Demotions: Various demotions are applied based on anchor mismatch, SERP dissatisfaction, poor navigation, exact match domains, product reviews, location relevance, and more.
  4. Links Remain Important: Despite claims of decreasing importance, links are still crucial, with Google deeply analyzing the link graph.
  5. Document Truncation: Google counts tokens and truncates documents, emphasizing the need to place important content early.
  6. Short Content and Originality: Short content is scored for originality, and keyword stuffing is penalized.
  7. Page Titles Matter: Titles are still measured against queries, and keyword placement is crucial.
  8. Dates Are Important: Google associates multiple date types with pages to prioritize fresh content.
  9. Domain Registration Info: Google tracks domain registration info, possibly impacting new and transferred domains.
  10. Video Sites Treated Differently: Sites with over 50% video content are treated differently.
  11. YMYL (Your Money or Your Life): Google scores YMYL content explicitly, affecting health and news sites.
  12. Gold Standard Documents: These documents receive additional weight, potentially from quality ratings.
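
To make the document truncation point (item 5) concrete, here is a minimal sketch of why content placement matters when a system keeps only the first N tokens of a page. Everything here is an assumption for illustration: the whitespace tokenizer, the `truncate_tokens` and `survives_truncation` helpers, the sample text, and the 20-token limit. The leaked docs do not tell us Google’s actual tokenizer or cutoff.

```python
from typing import List, Tuple


def truncate_tokens(text: str, max_tokens: int) -> Tuple[List[str], List[str]]:
    """Split text on whitespace and return (kept, dropped) token lists.

    Whitespace splitting is a stand-in; the real tokenizer is unknown.
    """
    tokens = text.split()
    return tokens[:max_tokens], tokens[max_tokens:]


def survives_truncation(text: str, phrase: str, max_tokens: int) -> bool:
    """Check whether a phrase still appears after the text is truncated."""
    kept, _ = truncate_tokens(text, max_tokens)
    return phrase.lower() in " ".join(kept).lower()


# Two hypothetical pages with the same key phrase in different positions.
key_sentence = "Reno emergency plumber open 24 hours."
filler = "filler " * 50

early_page = key_sentence + " " + filler
late_page = filler + key_sentence

LIMIT = 20  # hypothetical cutoff, chosen only for the demo

print(survives_truncation(early_page, "emergency plumber", LIMIT))  # True
print(survives_truncation(late_page, "emergency plumber", LIMIT))   # False
```

The same phrase is visible to the scorer on the first page and invisible on the second, which is the practical reason to lead with your most important content.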


 Final Thoughts

This leak provides a clearer picture of Google’s ranking systems and features. The information reinforces the importance of high-quality content, strong user engagement, technical excellence, and strategic link building. Small business owners should leverage these insights to enhance their SEO strategies, focusing on creating valuable content and improving user experience.