How to Measure and Improve Core Web Vitals Beyond PageSpeed Insights

A structured website performance diagram showing PageSpeed Insights, Search Console, CrUX, and RUM feeding into Core Web Vitals diagnosis.

PageSpeed Insights is useful, but it is not a complete performance strategy. If you want to improve Core Web Vitals properly, you need more than a list of audits and a lab score. You need to understand how templates behave in the real world, how users actually experience the site, and where structural decisions are creating friction that a single test run will never fully explain.

The question behind Core Web Vitals is usually larger than it first appears

Most businesses arrive at Core Web Vitals through PageSpeed Insights. That makes sense. It is accessible, familiar, and often the first tool people use when a page feels slow or rankings start to wobble. The problem is not the tool itself. The problem is the assumption that its recommendations amount to a complete roadmap.

They do not.

PageSpeed Insights gives you two very different views of performance at once: field data from the Chrome User Experience Report and lab data from Lighthouse. It reports real-user data over a trailing 28-day period where enough data exists, and when page-level data is limited it may fall back to origin-level data instead. That is useful, but it also means many teams are blending simulated tests, historical user experience, and template-specific issues without fully realising they are looking at different layers of the same problem.

At DBETA, we see this misunderstanding regularly. A business runs one page through PSI, works through the obvious recommendations, and assumes the job is broadly done. Then the Search Console report still flags groups of URLs as poor, or performance remains unstable across the site, or the page's score improves while the user experience barely moves. In practice, that usually points to a deeper issue: the website is being tested as a page, but behaving as a system.

That distinction matters. Core Web Vitals are not really a story about isolated pages. They are a story about delivery, structure, rendering, dependency control, and what happens when those things meet real users on real devices. Google’s own guidance makes the same underlying point in a more measured way: Core Web Vitals are field metrics, and where you have both field and lab data, field data should be used to prioritise work because it reflects what users are actually experiencing.

Why PageSpeed Insights is useful, but not enough on its own

PageSpeed Insights is still worth using. It is one of the best starting points available. It can tell you whether a page has enough real-user data to show a field view, whether the LCP, INP, and CLS thresholds are passing, and which Lighthouse diagnostics look suspicious. Google’s published thresholds remain the practical benchmark most teams work to: LCP at 2.5 seconds or less, INP at 200 milliseconds or less, and CLS at 0.1 or less.

What PSI does not do well is tell you how a problem behaves across your wider architecture.

That is not a criticism. It is simply not what the tool is for. A single PSI report is excellent at surfacing symptoms on a given URL. It is much weaker at showing whether the issue belongs to one page, one template family, one device group, one geography, one third-party dependency, or one recurring structural decision spread across dozens of pages.

This is where businesses often take the wrong lesson. They see an image-related audit and conclude that images are the main problem. They see an unused JavaScript warning and conclude the fix is asset trimming. Sometimes that is true. Often it is only partially true. One of the patterns we see is that weak architectural decisions disguise themselves as small front-end issues. The site appears to need “optimisation”, but what it actually needs is cleaner resource discovery, tighter dependency control, a more disciplined rendering strategy, or better template governance.

That is the reason many speed programmes stall. Teams keep fixing what the tool can see most easily, while the more consequential problem sits underneath.

The first shift: stop treating Core Web Vitals as a score problem

A lot of performance work underdelivers because it starts from the wrong mental model. The goal becomes “getting a better PSI score” instead of improving how the website behaves.

That may sound like semantics, but it changes the whole workflow.

A website can score well in a controlled lab test and still frustrate users in the field. It can also score badly in a single run while remaining broadly healthy in real use. Google explicitly notes that lab data is useful for debugging in a controlled environment, but may not capture real-world bottlenecks. That is why PSI can show both lab and field data at the same time, and why the two can conflict without either one being “wrong”.

From our perspective, Core Web Vitals should be treated less like a badge and more like an operational signal. They tell you whether the website is loading cleanly, responding reliably, and staying visually stable under real conditions. Those are not vanity metrics. They shape trust. A page that shifts during checkout, stalls during interaction, or loads its main content too late creates friction that users often interpret as uncertainty, not merely slowness.

For a business, that has knock-on effects. Lower trust affects conversion. Structural inconsistency makes maintenance harder. Weak template behaviour spreads across sections. And because modern search increasingly rewards satisfying page experience alongside relevance, performance problems can become visibility problems as well. Google is careful not to present Core Web Vitals as a magic ranking lever, but it does state that they are used by its ranking systems and are part of the wider page experience picture.

The measurement stack we would use beyond PageSpeed Insights

If PSI is the first checkpoint, what comes next?

The better answer is not one extra tool. It is a measurement stack, where each source answers a different question.

Search Console shows where the problem repeats

Search Console’s Core Web Vitals report is useful because it steps back from the single-URL mindset. It shows performance grouped by status, metric, and URL groups made up of similar pages. Google is explicit about this: the report is based on actual user data, it groups similar web pages together, and it assumes that poor behaviour within a group is likely caused by the same underlying reasons, often tied to a common framework or template.

That matters more than most people realise.

If one article page looks poor in PSI, that may be a local issue. If Search Console shows a large group of similar article pages or product pages failing on LCP or INP, you are no longer dealing with a page problem. You are dealing with a template problem, a component problem, or a structural pattern that repeats.

This is one of the most useful shifts a business can make. Instead of asking, “What is wrong with this page?” you start asking, “What are these pages built from, and why is that system producing the same weakness repeatedly?”

CrUX helps validate whether the issue is real at page or origin level

The Chrome User Experience Report is the field dataset behind much of this ecosystem. It reflects how real-world Chrome users experience popular websites, and the CrUX API gives access to aggregated real-user data at both page and origin level.
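
As a rough illustration, that data can be pulled directly. The sketch below queries the CrUX API for an origin's mobile field data; the API key and origin are placeholders, and the field names follow the public records:queryRecord endpoint, so check the current API reference before building on them.

```js
// A sketch of querying the CrUX API for origin-level field data.
// YOUR_API_KEY and the origin are placeholders.
const res = await fetch(
  'https://chromeuxreport.googleapis.com/v1/records:queryRecord?key=YOUR_API_KEY',
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      origin: 'https://www.example.com', // swap for "url" to ask at page level
      formFactor: 'PHONE',               // run again with 'DESKTOP' to compare
      metrics: [
        'largest_contentful_paint',
        'interaction_to_next_paint',
        'cumulative_layout_shift',
      ],
    }),
  }
);

const { record } = await res.json();
// The p75 value is what Google assesses against the thresholds
console.log(record.metrics.largest_contentful_paint.percentiles.p75);
```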

That makes CrUX especially useful when you need to answer questions PSI does not answer clearly enough.

Is the issue isolated to one URL or affecting the whole origin? Has it been getting worse over time? Is mobile materially weaker than desktop? Is the URL lacking enough page-level data and being assessed more broadly at origin level? Those questions change how you prioritise work. A fix that makes sense for one commercially important template may not be the same fix you would choose for an origin-wide weakness.

This is also where judgment matters. Not every poor metric deserves the same urgency. If a minor content page underperforms in isolation, that may not be your first concern. If your primary lead-generation templates are consistently weak across a group, that is a governance issue as much as a performance one.

RUM tells you what your own users are actually experiencing

CrUX is valuable, but it is still Google’s aggregated dataset. It is not a complete substitute for your own real-user monitoring.

If you want to move beyond generic reporting, you need to measure your own users directly. Google’s guidance on field measurement recommends collecting Web Vitals through an in-house or third-party analytics setup, and its own web-vitals JavaScript library is specifically intended to help gather field data from your users, including INP where supported.
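
A minimal sketch of that instrumentation with the web-vitals library is below. The /rum endpoint and the data-template attribute are placeholders for your own setup; the point is that every metric arrives with enough context to segment it later.

```js
// A sketch of first-party Web Vitals collection using the web-vitals library.
// The /rum endpoint and the data-template attribute are placeholders.
import { onLCP, onINP, onCLS } from 'web-vitals';

function sendToAnalytics(metric) {
  const body = JSON.stringify({
    name: metric.name,       // 'LCP', 'INP' or 'CLS'
    value: metric.value,
    rating: metric.rating,   // 'good' | 'needs-improvement' | 'poor'
    page: location.pathname,
    template: document.body.dataset.template, // hypothetical template identifier
  });
  // sendBeacon is more likely than fetch to survive the page being closed
  navigator.sendBeacon('/rum', body);
}

onLCP(sendToAnalytics);
onINP(sendToAnalytics);
onCLS(sendToAnalytics);
```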

In practice, RUM is where performance work starts becoming commercially useful.

It allows you to segment by device class, template, geography, traffic source, or user journey. It lets you see whether logged-in users suffer more than anonymous users, whether one regional market is being hit harder, or whether a particular step in a form flow consistently produces interaction delays. PageSpeed Insights cannot tell you that. Search Console cannot tell you that. Your own instrumentation can.

For businesses that depend on leads, enquiries, checkouts, or repeat usage, this is where performance stops being abstract. You can start tying technical behaviour to operational outcomes.

Lighthouse and DevTools are for diagnosis, not the final verdict

Once field data tells you what needs attention, lab tools become much more valuable. Lighthouse, DevTools, traces, and network inspection help you understand why the problem is happening.

That is where lab data shines. It is repeatable enough to support debugging. It can show blocking scripts, late resource discovery, layout shifts, heavy third-party work, or long tasks on the main thread. But it works best when it is guided by field evidence, not used as a substitute for it. Google’s own guidance is blunt on this point: if you have both field and lab data, field data should guide prioritisation because it represents the real experience.
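
If lab runs are going to be repeatable rather than ad hoc, it also helps to script them. The sketch below drives Lighthouse programmatically against a URL that field data has already flagged; it assumes the lighthouse and chrome-launcher npm packages, and the URL is a placeholder.

```js
// A sketch of a repeatable lab run against a URL that field data has flagged.
// Assumes the 'lighthouse' and 'chrome-launcher' npm packages.
import lighthouse from 'lighthouse';
import * as chromeLauncher from 'chrome-launcher';

const chrome = await chromeLauncher.launch({ chromeFlags: ['--headless'] });
const result = await lighthouse('https://www.example.com/flagged-template/', {
  port: chrome.port,
  onlyCategories: ['performance'],
});

// The lhr object holds the scores and the per-audit diagnostic detail
console.log(result.lhr.categories.performance.score);
console.log(result.lhr.audits['largest-contentful-paint'].displayValue);

await chrome.kill();
```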

That distinction alone improves a lot of performance work.

How to improve Core Web Vitals once you have measured them properly

Once measurement is clearer, improvement becomes less random. You stop chasing audits and start resolving the causes that matter.

LCP is usually a delivery and discovery problem before it is an image problem

Many teams still treat Largest Contentful Paint as “compress the hero image and move on”. Sometimes that helps. Often it is too narrow.

Google’s current guidance stresses that the LCP resource should be discoverable in the initial HTML response so the browser’s preload scanner can find it as early as possible. In other words, if the browser cannot see the important content soon enough, it cannot prioritise it well.

In practice, that means looking at questions such as:

  • Is the main image buried behind JavaScript?
  • Is the page slow to deliver the document in the first place?
  • Is the LCP candidate loaded as a CSS background with no early hint?
  • Are too many competing resources arriving first?

This is why we often say that LCP is not just a media problem. It is a sequencing problem. It sits at the intersection of origin speed, template structure, and resource discovery. A business can spend time converting images to a new format and still leave the biggest win untouched if the page architecture is delaying discovery.
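
To make the sequencing point concrete, the sketch below shows the same hero image delivered two ways; the file names are placeholders.

```html
<!-- Hard for the preload scanner: the LCP image only exists once JavaScript runs -->
<div id="hero"></div>
<script>
  document.querySelector('#hero').innerHTML = '<img src="hero.jpg" alt="">';
</script>

<!-- Discoverable in the initial HTML and flagged as high priority -->
<img src="hero.jpg" alt="" width="1200" height="600" fetchpriority="high">
```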

INP is where JavaScript discipline becomes visible

Interaction to Next Paint measures overall responsiveness across the visit, not just the first interaction. The final value is based on the slowest interaction observed, ignoring outliers.

That makes INP a very different kind of problem from older “load-only” thinking.

Poor INP tends to come from long tasks, heavy script evaluation, layout thrashing, large DOM updates, and too much work landing on the main thread at the wrong time. web.dev’s current guidance points developers towards breaking up long tasks, reducing unnecessary JavaScript, avoiding large complex layouts, and in some cases moving work off the main thread with web workers.
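
As a simple illustration of “breaking up long tasks”, the sketch below yields back to the main thread between chunks of work so input events can be handled in between. It uses a setTimeout-based yield because it is broadly supported; scheduler.yield() is a newer option where available, and processItem and items are placeholders for the page's real work.

```js
// A sketch of splitting one long task into smaller chunks.
// processItem and items are placeholders for the page's real work.
function yieldToMain() {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

async function processInChunks(items) {
  let lastYield = performance.now();
  for (const item of items) {
    processItem(item);
    // Yield roughly every 50ms so interactions are not stuck behind this loop
    if (performance.now() - lastYield > 50) {
      await yieldToMain();
      lastYield = performance.now();
    }
  }
}
```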

From our experience, this is where websites that look modern on the surface often begin to show strain. A page builder, layered scripts, chat widgets, A/B testing tools, animation libraries, and analytics tags can all appear manageable in isolation. Collectively, they create hesitation. The interface loads, but it does not feel calm. Menus delay. Filters stick. Inputs feel slightly behind the user. That is not just a front-end annoyance. It erodes confidence in the whole system.

CLS is usually a sign that the page has not reserved certainty

Cumulative Layout Shift is often explained as a visual stability issue, but there is a deeper lesson in it. CLS problems usually appear where the page has failed to reserve space or failed to behave predictably.

Google’s optimisation guidance still points to the same fundamentals: reserve dimensions for images and dynamic content, reduce unexpected movement, and minimise font-driven shifts by using newer font fallback controls such as size-adjust.
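
In markup terms, “reserving certainty” mostly means declaring dimensions and space up front. The sketch below is illustrative: explicit width and height for media, a reserved slot for late-loading content, and a size-adjust fallback so the web-font swap barely moves the layout. The class names, font names, and the 105% figure are placeholders, not recommendations.

```html
<!-- Reserve space before the image arrives -->
<img src="product.jpg" alt="Product photo" width="800" height="600">

<style>
  /* Reserve space for a late-loading embed, ad slot or widget */
  .promo-slot { min-height: 250px; }

  /* Tune the fallback font so the swap to the web font barely shifts text */
  @font-face {
    font-family: 'BrandFallback';
    src: local('Arial');
    size-adjust: 105%;
  }
  body { font-family: 'Brand', 'BrandFallback', sans-serif; }
</style>
```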

That sounds straightforward, but the business lesson is more interesting. Stable websites signal control. Unstable websites signal improvisation. When interface elements move while a person is trying to act, the experience stops feeling dependable. That matters even more on commercial pages where the user is deciding whether to trust the business, complete a form, or continue the journey.

The resource-loading decisions that still matter

One term that still turns up in search queries and audit conversations is “priority hints”, and that is a useful clue in itself. The terminology has shifted, but the intent behind the question is sound.

The official term is now the Fetch Priority API. web.dev notes that this feature was originally called Priority Hints before being renamed after standardisation. The purpose is not to make everything load faster by force. It is to tell the browser which resources deserve relatively higher or lower priority when they are fetched.

This is where nuance matters.

preload and fetchpriority are not the same thing. Google’s guidance explains the difference clearly: preload helps the browser discover a resource early, while Fetch Priority influences how that resource should be prioritised when it is fetched. One is about discoverability. The other is about urgency. They can work together, but they solve different problems.
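
The distinction is easier to see in markup. In the sketch below, preload makes a CSS-background hero image discoverable before the stylesheet has been parsed, while fetchpriority adjusts urgency for resources the browser can already see; the file names are placeholders.

```html
<!-- Discoverability: announce a resource the parser cannot see yet -->
<link rel="preload" as="image" href="hero-background.jpg" fetchpriority="high">

<!-- Urgency: the image is already in the HTML, it just deserves priority -->
<img src="hero.jpg" alt="" fetchpriority="high">

<!-- And the reverse: visible early, but it should not compete for bandwidth -->
<img src="footer-banner.jpg" alt="" fetchpriority="low" loading="lazy">
```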

The same applies to preconnect and dns-prefetch. These are not decorative additions to the <head>. They are decisions about which origins deserve early connection work and which merely deserve a lighter hint. Used carefully, they can reduce avoidable delay. Used carelessly, they create competition, noise, and misplaced bandwidth. Google’s performance guidance still treats resource hints as tools for prioritisation, not as a licence to hint everything.
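
Applied sparingly, that usually means a full preconnect only for the one or two origins on the critical path and a lighter dns-prefetch for the rest, as in the sketch below; the origins are placeholders.

```html
<!-- Critical-path origin: worth the full connection setup (DNS, TCP, TLS) -->
<link rel="preconnect" href="https://cdn.example.com" crossorigin>

<!-- Secondary origins: a DNS lookup hint is enough -->
<link rel="dns-prefetch" href="https://widgets.example.net">
<link rel="dns-prefetch" href="https://analytics.example.org">
```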

In practice, good asset loading is rarely about adding more hints. It is about deciding what should be discovered early, what should be prioritised, and what should wait.

What businesses often get wrong when trying to improve Core Web Vitals

A few mistakes appear often enough that they are worth stating plainly.

The first is treating one URL as representative of the whole site. That leads to local fixes and false confidence.

The second is treating PageSpeed Insights recommendations as a to-do list rather than a starting point. That leads to surface optimisation without structural diagnosis.

The third is prioritising lab scores over field evidence. That leads to impressive screenshots and uneven real-world improvement.

The fourth is focusing only on front-end assets while ignoring delivery, dependency control, and template behaviour. That leads to effort without proportionate movement.

The fifth is assuming performance is a design issue rather than an infrastructure issue. From our experience, that is where the most expensive delays begin. Once a site grows, weak structure creates friction long before anything visibly “breaks”.

What good looks like

It does not start with frantic audit chasing. It starts with measurement clarity. You use PSI to spot symptoms, Search Console to find repeating template patterns, CrUX to validate field behaviour, and RUM to understand what your own users actually experience. Then you diagnose with lab tools, fix the causes that matter most, and validate again over time.

What emerges is not just a faster page. It is a more governable website.

That matters because performance is rarely isolated. Better resource discovery improves clarity. Cleaner JavaScript improves responsiveness and maintainability. Better template discipline improves scalability. More stable rendering improves trust. Technical quality compounds. So does technical mess.

At DBETA, we believe this is the more useful way to think about Core Web Vitals. Not as a set of scores to chase, and not as a checklist bolted onto the end of a build, but as evidence of whether the website is behaving like reliable infrastructure.

Final thought

PageSpeed Insights is valuable. But it is a lens, not a performance strategy.

If you want to improve Core Web Vitals properly, you need to move beyond the single-page audit mindset and start measuring how the system behaves: across templates, across real users, across time, and across the parts of the website that carry the most commercial weight.

That is where better decisions usually begin.

A website that loads cleanly, responds properly, and stays stable is not simply “better optimised”. It is easier to trust, easier to maintain, easier to scale, and easier for search and AI systems to interpret with confidence. In the long run, that is the point. Technical quality is not separate from business value. It is part of how that value is delivered.

FAQs

Q: Is PageSpeed Insights enough to improve Core Web Vitals?

A: No. PageSpeed Insights is a useful starting point, but it does not replace Search Console, CrUX, or first-party real user monitoring. Serious Core Web Vitals work needs both field data and diagnostic lab testing.

Q: What is the difference between lab data and field data?

A: Lab data is collected in a controlled test environment and is useful for debugging. Field data reflects how real users actually experience the page over time, which makes it far more useful for prioritising live performance work.

Q: What is the difference between CrUX and RUM?

A: CrUX is Google's aggregated dataset showing how real Chrome users experience popular websites. RUM measures your own users directly on your own site, which makes it better for journey-level diagnosis, segmentation, and commercial analysis.

Q: Are Priority Hints and fetchpriority the same thing?

A: Broadly, yes. The older term 'Priority Hints' is still widely used informally, but the current standardised feature is the Fetch Priority API, implemented through the fetchpriority attribute.

Bridge the gap between pages and systems.
