What We Check & How We Score

Every score SEOgent produces is grounded in publicly documented standards maintained by W3C, Google, and the broader web performance community. This document explains what we check, how we score it, and where each standard comes from — so you can verify our findings and act on them with confidence.

SEO Analysis

SEOgent's SEO checks are based on Google's official documentation for crawling, indexing, and on-page optimization. Where Google has not defined a specific rule, we follow the most widely adopted industry practices supported by large-scale SERP studies.

Title Tags

Title tags are one of the few on-page elements Google uses directly as a ranking signal. They also determine the clickable headline shown in search results.

What we check

Presence — every indexable page should have a title tag
Length — optimal range is 30–60 characters
Uniqueness — duplicate titles across pages are detected at the site level and flagged

Scoring thresholds

Status	Criteria	Impact
Good	30–60 characters, unique across site	High — direct ranking signal
Warning	Below 30 or above 60 characters	Medium — may be truncated or too vague
Critical	Missing title tag	High — Google will likely generate its own title link
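
The thresholds in this table can be sketched as a small Python function. This is an illustration rather than SEOgent's actual implementation: the names classify_title and seen_titles are hypothetical, and treating a duplicate title as a warning is an assumption, since the table does not assign duplicates a status.

```python
def classify_title(title, seen_titles=frozenset()):
    """Return 'critical', 'warning', or 'good' for a page title.

    seen_titles is a hypothetical set of titles found on other pages.
    """
    text = (title or "").strip()
    if not text:
        return "critical"   # missing title tag
    if not 30 <= len(text) <= 60:
        return "warning"    # outside the 30-60 character range
    if text in seen_titles:
        return "warning"    # assumption: duplicates are reported as warnings
    return "good"
```

For example, classify_title("Home") returns "warning" because the title is shorter than 30 characters.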

Google: Control your title links in search results

Meta Descriptions

Meta descriptions are not a direct ranking factor, but they strongly influence click-through rate. Industry studies estimate that Google rewrites them in roughly 60–70% of results, most often when the description does not match the searcher's query.

What we check

Presence — every indexable page should have a meta description
Length — 70–160 characters; descriptions below 70 or above 160 are flagged
Uniqueness — duplicate descriptions across pages are detected at the site level and flagged

Google SEO Starter Guide — Meta descriptions

Heading Structure (H1–H6)

Google uses heading structure to understand content hierarchy and topic relevance. A clear, logical heading structure also improves accessibility and readability.

What we check

H1 presence — every page should have an H1 heading
Single H1 — each page should have exactly one H1 tag
Logical hierarchy — headings should follow a sequential order (H1 → H2 → H3) with no skipped levels
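
As a rough sketch of how these three heading checks might work, the following uses Python's standard html.parser module. The names HeadingCollector and check_headings, and the issue messages, are hypothetical and not taken from SEOgent.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Records the level (1-6) of every heading in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def check_headings(html):
    """Return a list of heading issues; an empty list means all checks pass."""
    collector = HeadingCollector()
    collector.feed(html)
    levels = collector.levels
    issues = []
    h1_count = levels.count(1)
    if h1_count == 0:
        issues.append("missing H1")
    elif h1_count > 1:
        issues.append("multiple H1")
    # Compare each heading with the one before it to find skipped levels.
    for previous, level in zip(levels, levels[1:]):
        if level > previous + 1:
            issues.append(f"skipped level: H{previous} -> H{level}")
    return issues
```

Moving back up the hierarchy (for example H3 followed by H2) is not treated as a skip in this sketch; only jumps downward by more than one level are reported.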

Google: Organize your site hierarchy with headings

Canonical Tags

Canonical tags tell Google which URL is the preferred version of a page when duplicate or near-duplicate content exists across multiple URLs.

What we check

Canonical presence — indexable pages should declare a canonical URL

Google: Specify a canonical URL

robots.txt & Indexability

robots.txt controls crawler access, while meta robots tags control indexing. Misconfiguration here can cause pages to disappear from search results entirely.

What we check (site-level)

robots.txt presence — file should be accessible at the site root
No blanket block — robots.txt should not block all crawlers with "User-agent: * / Disallow: /"
Search engines allowed — Googlebot and Bingbot are not blocked
Reasonable crawl delay — crawl delay should not exceed 10 seconds
Sitemap directive — robots.txt should reference the sitemap URL
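
The site-level checks above can be approximated with Python's standard urllib.robotparser module. This is a sketch under assumptions: check_robots and the result keys are hypothetical names, the robots.txt body is passed in as text (fetching it is out of scope), and the crawl delay is read from the wildcard group only.

```python
from urllib.robotparser import RobotFileParser


def check_robots(robots_txt, site_url="https://example.com/"):
    """Evaluate a robots.txt body against the site-level checks."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay("*")  # None when no Crawl-delay is declared
    return {
        "blanket_block": not parser.can_fetch("*", site_url),
        "googlebot_allowed": parser.can_fetch("Googlebot", site_url),
        "bingbot_allowed": parser.can_fetch("Bingbot", site_url),
        "crawl_delay_ok": delay is None or delay <= 10,
        "has_sitemap": bool(parser.site_maps()),  # site_maps() needs Python 3.8+
    }
```

A file containing only "User-agent: *" and "Disallow: /" yields blanket_block set to True and both search engine checks set to False.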

What we check (page-level)

Noindex detection — pages with a noindex meta robots directive are flagged
Indexability status — each crawled URL is classified as indexable or noindex
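
A minimal sketch of noindex detection, using Python's standard html.parser. The names RobotsMetaFinder and indexability are hypothetical. The sketch also treats the value "none" as noindex, because Google's robots meta tag specification defines it as equivalent to "noindex, nofollow"; whether SEOgent does the same is an assumption.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def indexability(html):
    """Classify a page as 'noindex' or 'indexable' from its meta robots tag."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    if finder.directives & {"noindex", "none"}:
        return "noindex"
    return "indexable"
```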

Google: robots.txt introduction and guide

Google: robots meta tags specification

Structured Data (JSON-LD)

Structured data (schema.org markup) helps Google understand your content and makes pages eligible for rich results — review stars, FAQ dropdowns, product panels, and more.

What we check

JSON-LD presence — we detect JSON-LD script blocks on each page
JSON-LD validity — each block is validated for correct JSON syntax
@type property — JSON-LD blocks should include a @type
Rich-result eligible types — we check whether the schema types used are eligible for Google rich results (e.g., Article, Organization, Product, FAQPage, HowTo, BreadcrumbList, and others)
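
The JSON-LD checks could look roughly like the sketch below. It is illustrative only: the class and function names are hypothetical, and RICH_RESULT_TYPES contains just the types named in the list above, not a complete set.

```python
import json
from html.parser import HTMLParser

# Assumption: only the types named in the text; the real list is longer.
RICH_RESULT_TYPES = {"Article", "Organization", "Product",
                     "FAQPage", "HowTo", "BreadcrumbList"}


class JsonLdCollector(HTMLParser):
    """Collects the raw text of each application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def check_json_ld(html):
    """Return one result dict per JSON-LD block found in the page."""
    collector = JsonLdCollector()
    collector.feed(html)
    results = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            results.append({"valid": False, "type": None, "rich_result": False})
            continue
        schema_type = data.get("@type") if isinstance(data, dict) else None
        eligible = isinstance(schema_type, str) and schema_type in RICH_RESULT_TYPES
        results.append({"valid": True, "type": schema_type, "rich_result": eligible})
    return results
```

This sketch only inspects the top-level @type as a string; blocks that use @graph or an array of types would need extra handling.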

Google: Structured data general guidelines

Schema.org — full vocabulary reference

Sitemaps

XML sitemaps help Google discover and prioritize pages on your site, especially for sites with large or frequently updated content.

What we check

Sitemap presence — a valid sitemap should be accessible at /sitemap.xml, /sitemap_index.xml, or at a URL declared in robots.txt

Google: Build and submit a sitemap

Internal Linking & Crawlability

Every page on your site should be reachable via links. Broken links and redirect chains reduce crawl efficiency and can harm rankings.

What we check

Internal links — every page should link to other pages on the site
Dead links — internal and external links returning 4xx/5xx responses are flagged (when link checking is enabled)
Broken images — images with invalid or empty src attributes are flagged
Redirect chains — pages reached through 2+ redirects are flagged; 4+ redirects are marked critical
Anchor tag integrity — anchor tags without href attributes are flagged
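
The redirect chain rule can be written as a short function. This is a sketch: classify_redirect_chain is a hypothetical name, and mapping "flagged" to the warning severity is an assumption.

```python
def classify_redirect_chain(redirect_count):
    """Map the number of redirects followed to reach a page to a severity."""
    if redirect_count >= 4:
        return "critical"
    if redirect_count >= 2:
        return "warning"   # assumption: "flagged" means warning severity
    return "passed"
```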

Google: How Google Search crawls the web

Technical SEO & Page Configuration

SEOgent checks several foundational technical requirements that affect how search engines and browsers process your pages.

What we check

HTTPS — pages should be served over a secure connection
HTTP status code — pages should return a 2xx success status code
SEO-friendly URLs — URLs should use hyphens (not underscores), stay under 75 characters, and avoid excessive query parameters
Viewport meta tag — pages should declare a viewport for mobile rendering
HTML5 DOCTYPE — pages should include a proper HTML5 doctype declaration
Character encoding — pages should declare character encoding via meta charset or Content-Type
Language attribute — the HTML element should include a lang attribute
Hreflang tags — multilingual sites should include hreflang tags for international targeting
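
The HTTPS and URL checks can be sketched with Python's standard urllib.parse. Several details are assumptions because the text does not define them: the 75-character limit is applied to the whole URL, "excessive" is taken to mean more than two query parameters, and the names url_issues and MAX_QUERY_PARAMS are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit

MAX_QUERY_PARAMS = 2  # assumption: the text does not define "excessive"


def url_issues(url):
    """Return a list of URL problems; an empty list means the URL passes."""
    parts = urlsplit(url)
    issues = []
    if parts.scheme != "https":
        issues.append("not HTTPS")
    if "_" in parts.path:
        issues.append("underscore in path")
    if len(url) >= 75:
        issues.append("75 characters or longer")
    params = parse_qsl(parts.query, keep_blank_values=True)
    if len(params) > MAX_QUERY_PARAMS:
        issues.append("too many query parameters")
    return issues
```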

Content Quality

What we check

Content length — pages with fewer than 300 words are flagged as thin content
Image alt text — all meaningful images should have descriptive alt attributes
Image lazy loading — pages with multiple images should use loading="lazy" for below-the-fold images
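
A rough sketch of the word-count and alt-text checks follows. The names ContentStats and content_issues are hypothetical. Two simplifications are assumptions: words are counted by splitting visible text on whitespace, and an empty alt attribute is reported as missing even though it can be valid for purely decorative images.

```python
from html.parser import HTMLParser


class ContentStats(HTMLParser):
    """Counts visible words and images that lack alt text."""

    def __init__(self):
        super().__init__()
        self.words = 0
        self.images_missing_alt = 0
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img" and not (dict(attrs).get("alt") or "").strip():
            self.images_missing_alt += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words += len(data.split())


def content_issues(html, min_words=300):
    """Return content quality issues for a page."""
    stats = ContentStats()
    stats.feed(html)
    issues = []
    if stats.words < min_words:
        issues.append("thin content")
    if stats.images_missing_alt:
        issues.append(f"images missing alt text: {stats.images_missing_alt}")
    return issues
```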

Social Sharing

What we check

Open Graph tags — pages should include og:title, og:description, and og:image for social media sharing
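
A minimal sketch of the Open Graph check, again using Python's html.parser. The names OpenGraphFinder and missing_open_graph are hypothetical; a tag with an empty content attribute is counted as missing, which is an assumption.

```python
from html.parser import HTMLParser

REQUIRED_OG = ("og:title", "og:description", "og:image")


class OpenGraphFinder(HTMLParser):
    """Collects the property names of non-empty <meta property=...> tags."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and attr.get("property") and attr.get("content"):
            self.found.add(attr["property"].lower())


def missing_open_graph(html):
    """Return the required Open Graph properties the page does not declare."""
    finder = OpenGraphFinder()
    finder.feed(html)
    return [name for name in REQUIRED_OG if name not in finder.found]
```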

Duplicate Content Detection (site-level)

After all pages are analyzed, SEOgent aggregates results to detect duplicates across the entire site:

Duplicate titles — titles appearing on two or more pages
Duplicate meta descriptions — descriptions appearing on two or more pages
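
The aggregation step can be sketched as grouping page URLs by the value of a field. This is an illustration: find_duplicates is a hypothetical name, and the page records are assumed to be dictionaries with a "url" key plus the field being compared. Matching here is exact after trimming whitespace.

```python
from collections import defaultdict


def find_duplicates(pages, field):
    """Return {value: [urls]} for values of `field` used on two or more pages."""
    urls_by_value = defaultdict(list)
    for page in pages:
        value = (page.get(field) or "").strip()
        if value:  # missing values are a separate check, not a duplicate
            urls_by_value[value].append(page["url"])
    return {value: urls for value, urls in urls_by_value.items() if len(urls) >= 2}
```

Calling find_duplicates(pages, "title") and find_duplicates(pages, "description") would cover both checks listed above.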

Performance Analysis

SEOgent's performance scores are based on Google Lighthouse lab data and the Core Web Vitals framework. These are the same metrics Google uses as a page experience ranking signal. Performance analysis runs when enabled for a scan.

About Lighthouse vs. Core Web Vitals

Lighthouse generates lab data: a simulated page load on an emulated mid-range mobile device (a Moto G4 in older Lighthouse versions) with a throttled connection. Core Web Vitals use real-world field data from Chrome users. Both matter: Lighthouse identifies what to fix, while Core Web Vitals reflect actual user experience. SEOgent uses Lighthouse lab data to give you actionable, per-page diagnostics.

Core Web Vitals & Performance Metrics

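Core Web Vitals are Google's three primary user experience metrics: LCP, CLS, and INP. They became a search ranking signal through the page experience update, which rolled out between June and August 2021. SEOgent collects the two lab-measurable vitals (LCP and CLS) plus the supporting Lighthouse metrics below for each analyzed page: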

Metric	Good	Needs Improvement	Poor	Lighthouse Weight
LCP — Largest Contentful Paint	< 2.5s	2.5s – 4.0s	> 4.0s	25%
CLS — Cumulative Layout Shift	< 0.1	0.1 – 0.25	> 0.25	25%
TBT — Total Blocking Time*	< 200ms	200ms – 600ms	> 600ms	30%
FCP — First Contentful Paint	< 1.8s	1.8s – 3.0s	> 3.0s	10%
Speed Index	< 3.4s	3.4s – 5.8s	> 5.8s	10%
TTFB — Time to First Byte	< 800ms	800ms – 1.8s	> 1.8s	—

* TBT is Lighthouse's lab proxy for INP (Interaction to Next Paint), which can only be measured with real-world field data. SEOgent also collects INP and FID values when available from the crawler; INP replaced FID as a Core Web Vital in March 2024.
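
The rating bands in the table can be sketched as a lookup plus a comparison. The names THRESHOLDS and rate_metric are hypothetical. Values exactly on a boundary are placed in "needs improvement", following the table's ranges as written; that reading is an assumption.

```python
# (upper bound for "good", upper bound for "needs improvement"), from the table.
THRESHOLDS = {
    "LCP": (2.5, 4.0),          # seconds
    "CLS": (0.1, 0.25),         # unitless
    "TBT": (200, 600),          # milliseconds
    "FCP": (1.8, 3.0),          # seconds
    "Speed Index": (3.4, 5.8),  # seconds
    "TTFB": (800, 1800),        # milliseconds
}


def rate_metric(name, value):
    """Return 'good', 'needs improvement', or 'poor' for a metric value."""
    good_below, poor_above = THRESHOLDS[name]
    if value < good_below:
        return "good"
    if value <= poor_above:
        return "needs improvement"
    return "poor"
```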

web.dev: Core Web Vitals — official documentation

Google: PageSpeed Insights documentation

Google: Core Web Vitals report in Search Console

Lighthouse Performance Score

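The overall Lighthouse Performance score (0–100) is a weighted average of the individual metric scores, using the Lighthouse weights listed in the table above; TTFB is reported but carries no weight. Scores of 90+ are considered good; 50–89 need improvement; below 50 is poor.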
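
As a sketch of the weighted average, the function below combines per-metric scores (each between 0 and 1, as Lighthouse reports them) into a 0–100 score and maps it to a band. The function names are hypothetical, and the weights are supplied by the caller. EXAMPLE_WEIGHTS holds the default weights of Lighthouse 10 and later; assuming that version is in use is an assumption, so substitute other weights if a different version applies.

```python
# Assumption: Lighthouse 10+ default weights; pass different weights if needed.
EXAMPLE_WEIGHTS = {"LCP": 25, "TBT": 30, "CLS": 25, "FCP": 10, "Speed Index": 10}


def performance_score(metric_scores, weights):
    """Weighted average of 0-1 metric scores, scaled to 0-100."""
    total = sum(weights[name] for name in metric_scores)
    weighted = sum(metric_scores[name] * weights[name] for name in metric_scores)
    return round(100 * weighted / total)


def performance_band(score):
    """Map a 0-100 performance score to the bands described above."""
    if score >= 90:
        return "good"
    if score >= 50:
        return "needs improvement"
    return "poor"
```

Note that Lighthouse derives each metric score from the raw metric value using its own scoring curves; that conversion is not shown here.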

Google Lighthouse: Performance audits reference

Accessibility Analysis

SEOgent's accessibility checks use axe-core, the industry-standard automated accessibility testing engine. axe-core tests against the Web Content Accessibility Guidelines (WCAG) 2.1 AA criteria. Accessibility analysis runs when enabled for a scan.

Why accessibility matters for SEO

Accessible sites are better understood by search engines. Alt text helps Google index images. Semantic HTML gives search engines clear content structure. Sites that meet WCAG AA also tend to score higher in overall page quality assessments.

What We Check

SEOgent runs the full axe-core rule set against each analyzed page. This includes checks across all four WCAG principles:

Perceivable — images have alt text, color contrast meets minimum ratios, content is not conveyed by color alone
Operable — all functionality is keyboard accessible, no keyboard traps, interactive targets meet minimum size requirements
Understandable — page language is declared, form inputs have labels, error messages are descriptive
Robust — semantic HTML is used correctly, ARIA attributes are valid, content works with assistive technologies

What we report

Accessibility score (0–100)
Violations grouped by impact level: critical, serious, moderate, and minor
For each violation: description, affected elements (with CSS selectors), WCAG criteria, and remediation guidance
WCAG 2.1 AA compliance verdict: pass or fail, with a list of failing criteria
Passing rules, so you can see what your site already does well

Important limitation

Automated tools can detect approximately 30–40% of WCAG issues. Color contrast, missing labels, and missing alt text are well-detected automatically. Keyboard traps, cognitive load, and complex interaction patterns require manual testing to verify fully. SEOgent's score reflects what can be reliably detected programmatically.

W3C: WCAG 2.1 official standard

axe-core: Rule descriptions

W3C WAI: WCAG 2 overview

WebAIM: WCAG 2 checklist (plain-language reference)

AEO, GEO & AI Readiness

SEOgent includes checks for how well your site communicates with AI systems — including LLM-based search (Google AI Overviews, Perplexity, ChatGPT), AI agents crawling your site programmatically, and generative engine optimization (GEO) signals.

About the standards in this section

Unlike SEO (Google Search Central) and accessibility (WCAG/W3C), Answer Engine Optimization has no formal standards body or ratified specification. The checks in this section are based on: observed citation patterns across major AI platforms; platform-specific guidance published by Google, OpenAI, Anthropic, and Perplexity; and Google's E-E-A-T framework as documented in the Search Quality Evaluator Guidelines. This is a fast-moving space. SEOgent's AEO checks reflect current best practices as of early 2026 and will be updated as platforms publish clearer guidance. Where an official source exists, it is linked. Where checks are based on observed behaviour and industry research, that is noted explicitly.

What is AEO / GEO?

Answer Engine Optimization (AEO) is the practice of structuring content so AI-powered platforms can find it, understand it, and cite it as a direct answer to user queries. Generative Engine Optimization (GEO) is the closely related practice of optimizing content specifically for Large Language Models and generative AI systems.

Where traditional SEO aims to rank in search results, AEO/GEO aims to become the source an AI cites. Both are complementary — a strong SEO foundation is a prerequisite for AEO visibility, since most AI systems source from well-indexed, authoritative content.

llms.txt

llms.txt is an emerging convention that gives AI systems a structured, human-readable summary of your site's content. It is comparable in spirit to robots.txt, but it is designed to guide LLMs to your most useful content rather than to control crawler access.

Standards status

llms.txt is a community proposal, not a ratified standard. It is not published by W3C, IETF, or any formal body. Adoption is growing among developer-focused sites and SaaS products. SEOgent checks for its presence as a forward-looking best practice, not a compliance requirement.

What we check

Presence of /llms.txt at the site root
Presence of /llms-full.txt for extended content (checked when llms.txt exists)

llmstxt.org — the llms.txt proposal (community)

AI Crawler Access (robots.txt)

Major AI platforms publish their crawler user-agent strings and state that they honour robots.txt directives. If your site blocks an AI crawler, that platform generally cannot fetch your content, so it is unlikely to appear in its generated answers.

What we check

GPTBot — OpenAI's web crawler, used to collect content that may be used for model training
ChatGPT-User — OpenAI's agent for user-initiated browsing in ChatGPT
ClaudeBot — Anthropic's web crawler
PerplexityBot — Perplexity AI's crawler
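Google-Extended — Google's robots.txt control token for AI training use (a product token rather than a separate crawler; crawling is still performed by Google's existing user agents)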
Explicit allow/disallow rules — whether each AI crawler is specifically permitted or blocked
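
A sketch of this check using Python's standard urllib.robotparser. The names AI_AGENTS and ai_crawler_access are hypothetical. Here "explicit" means the agent is named in a User-agent line; that interpretation is an assumption.

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def ai_crawler_access(robots_txt, url="https://example.com/"):
    """Report, per AI agent, whether it may fetch `url` and whether it is named."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Collect every agent name that appears in a User-agent line.
    named = set()
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "user-agent":
            named.add(value.strip().lower())
    return {
        agent: {
            "allowed": parser.can_fetch(agent, url),
            "explicit": agent.lower() in named,
        }
        for agent in AI_AGENTS
    }
```

An agent with "explicit" set to False is governed by the wildcard group, or allowed by default when no wildcard group exists.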

Official sources (platform-published):

OpenAI: GPTBot documentation

Anthropic: ClaudeBot and user agent documentation

Google Search Central: AI Mode and robots meta tags

Answer Optimization (page-level)

SEOgent checks whether content pages are structured in ways that AI systems can extract and cite as direct answers. These checks only apply to pages with 300+ words of content.

What we check

Answer blocks — presence of FAQ accordions (details/summary elements), question-phrased headings (H2–H6 starting with who, what, when, where, why, how, etc.), or definition lists (dl/dt elements)
FAQ or HowTo schema — presence of FAQPage, Question, Answer, HowTo, or HowToStep schema types in JSON-LD
Structured data tables — meaningful HTML tables with header cells (th) and at least 2 rows and 2 columns of data
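
The answer-block detection could be sketched as follows, using Python's html.parser. The names are hypothetical, QUESTION_WORDS covers only the six words spelled out above (the text says "etc."), and contractions such as "What's" are not matched in this simplified version.

```python
from html.parser import HTMLParser

QUESTION_WORDS = ("who", "what", "when", "where", "why", "how")  # subset only
SUBHEADINGS = ("h2", "h3", "h4", "h5", "h6")


class AnswerBlockFinder(HTMLParser):
    """Detects FAQ accordions, definition lists, and question-phrased headings."""

    def __init__(self):
        super().__init__()
        self.signals = set()
        self._heading = None  # accumulates text while inside an H2-H6

    def handle_starttag(self, tag, attrs):
        if tag in ("details", "summary"):
            self.signals.add("faq_accordion")
        elif tag in ("dl", "dt"):
            self.signals.add("definition_list")
        elif tag in SUBHEADINGS:
            self._heading = ""

    def handle_data(self, data):
        if self._heading is not None:
            self._heading += data

    def handle_endtag(self, tag):
        if tag in SUBHEADINGS and self._heading is not None:
            words = self._heading.strip().lower().split()
            if words and words[0] in QUESTION_WORDS:
                self.signals.add("question_heading")
            self._heading = None


def find_answer_blocks(html):
    """Return the sorted list of answer-block signals found in the page."""
    finder = AnswerBlockFinder()
    finder.feed(html)
    return sorted(finder.signals)
```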

Research basis:

Conductor: AEO/GEO Benchmarks Report 2025 (industry research)

Google: People-first content guidance

Scoring & Severity Ratings

Every issue SEOgent surfaces is classified by severity. Severity reflects the potential impact on search visibility, user experience, or crawlability — not just the presence of a technical violation.

Severity	Definition	Examples
Critical	Directly harms crawlability, indexability, or has significant accessibility impact	Missing title tags, pages blocked from indexing, server errors, broken images, 4+ redirect chains
Warning	Suboptimal configuration that reduces SEO effectiveness or usability	Title length outside range, thin content, missing canonical, no JSON-LD, missing Open Graph tags, low contrast ratios
Passed	Check meets best practice standards	Good title length, valid heading hierarchy, HTTPS enabled, all images have alt text

SEO Score Calculation

Each page receives an SEO score from 0–100 based on weighted checks. Each check carries a weight (1–10) reflecting its importance. The score is calculated as:

Score = (sum of passed check weights / total weight of all checks) × 100

Grade	Score
Excellent (A)	90–100
Good (B)	80–89
Needs Work (C)	70–79
Poor (D)	60–69
Failing (F)	Below 60
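
The formula and grade bands above can be sketched in a few lines. The names seo_score and grade are hypothetical, checks are assumed to be (weight, passed) pairs, and rounding to the nearest whole number is an assumption.

```python
def seo_score(checks):
    """Score a page from (weight, passed) pairs using the formula above."""
    total = sum(weight for weight, _ in checks)
    if total == 0:
        return 0  # no checks ran; avoid dividing by zero
    passed = sum(weight for weight, ok in checks if ok)
    return round(100 * passed / total)


def grade(score):
    """Map a 0-100 score to the letter grades in the table."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"
```

For example, a page that passes checks weighted 10, 8, and 2 but fails one weighted 5 scores 20 / 25 × 100 = 80, which is a B.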

Standards references are linked throughout this document. Standards evolve — this document is updated to reflect current guidelines.