Beyond CWV: 11 More Performance Metrics to Monitor, Part 3 of 5

The previous article in this series looked at the Core Web Vitals (CWV) and their related metrics. This post explores 11 other metrics that may be important to measuring website performance. Although CWV are excellent default choices for beginner performance engineers, certain issues may require alternate solutions. This post offers some excellent options.

Synthetic tests vs. real user monitoring: What’s the difference?

Before discussing metrics, there's one more bit of theory to explain: the two ways of measuring performance. The First Input Delay (FID) entry in our previous article mentioned that it’s a user-driven metric, not a synthetic one. What does that mean?

There are two ways of measuring performance: synthetic tests and real user monitoring (RUM). The first approach measures website performance in a controlled, pre-defined environment, while the second adds a specialized script to the website that measures user experience in real time.

Both are useful, but it’s important to keep their advantages and drawbacks in mind. Speaking very generally, synthetic tests help during development, while RUM helps with monitoring in real time.

Synthetic measurements establish a benchmark during initial and ongoing development. If a new version of the site performs more poorly than the previous version on a synthetic measurement, it clearly has an issue that should be solved before moving to production.

RUM gathers data about the actual user experience by measuring it in real time. Not all users have the same experience: people in one region, or on a particular device or network, may have a significantly worse experience than others. Synthetic measurements can’t capture this variation because the artificial environment is always exactly the same.

One nonobvious drawback of RUM measurements is that they add an extra script to all pages to be downloaded and executed, which means performance can deteriorate slightly. Fortunately, RUM script authors are acutely aware of this problem and work hard to keep their scripts' footprints as small as possible.

An example of synthetic vs. RUM 

To explore the differences between synthetic and RUM measurements, let’s look at the First Contentful Paint (FCP) metric. Google suggests an FCP of 1.8 seconds or less. This is an ideal benchmark to keep in mind during development. Set up a synthetic FCP measurement and test all iterations of the site against a maximum FCP of 1.8 seconds.

Later candidate releases can be tested in the same way. Run the test again to make sure the site hasn't regressed — ideally, it will have an even lower FCP than the previous version.
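The workflow above can be sketched as a small check that runs after each synthetic test. This is a minimal illustration rather than any particular tool's API: the function name, the way results are reported, and the sample values in the usage note are assumptions. Only the 1.8-second budget comes from the text.

```python
FCP_BUDGET_SECONDS = 1.8  # Google's suggested FCP threshold, as cited above


def check_fcp(current_fcp, previous_fcp=None, budget=FCP_BUDGET_SECONDS):
    """Return a list of problems found in a synthetic FCP measurement.

    All values are in seconds. An empty list means the build passes.
    """
    problems = []
    if current_fcp > budget:
        problems.append(f"FCP {current_fcp:.2f}s exceeds budget {budget:.2f}s")
    if previous_fcp is not None and current_fcp > previous_fcp:
        problems.append(
            f"FCP regressed from {previous_fcp:.2f}s to {current_fcp:.2f}s"
        )
    return problems
```

For example, check_fcp(1.2, previous_fcp=1.4) returns an empty list, while check_fcp(2.0, previous_fcp=1.5) reports both a budget overrun and a regression. A real pipeline would probably allow a small tolerance before flagging a regression, since synthetic results still vary a little between runs.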

These measurements are always executed in the same synthetic browser environment. Although that environment mimics the real world, it can’t mimic real-world variation. It's possible to throttle the synthetic bandwidth to measure how the site performs under poor network conditions. However, in the end, web developers, not actual networks, define "poor network conditions." 

In some ways, this is good. The standardized test setup allows an objective comparison of several versions of the website. 

In other ways, this is bad. The synthetic tests don’t give an overview of how real users experience the site. That's where RUM comes in.

A RUM script may find that, despite the site staying well under 1.8 seconds FCP in the synthetic tests, actual values in the field are sometimes much higher — or lower.

Some users have devices with significantly less processing power than the synthetic test environment. Alternatively, their browser may hang for a few seconds for reasons unrelated to the site, such as a slow memory swap on their computer.

RUM scripts may also report significantly faster FCPs in the real world. Users' browsers may have a back/forward cache, where previously visited pages are still stored in the browser memory. When the user goes back, the previous page is restored very quickly, leading to faster FCP times.

RUM measurements quantify how many actual users have a better or worse experience than the synthetic benchmark and whether it's necessary to improve the site even further for those with a slow FCP. Still, it isn’t necessary to take the slowest FCP into account. RUM performance reports tend to focus on the 75th percentile: the value that 75% of measurements meet or beat. In other words, they disregard the slowest 25% of results. This ignores negative circumstances that may be temporary or caused by issues the site owner can't solve, such as a permanently slow network.
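To make the 75th percentile concrete, here is a minimal sketch that computes it from a handful of FCP samples. The sample values are invented, and the nearest-rank method used here is only one of several percentile conventions; RUM products may interpolate differently.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Invented FCP samples in seconds; one user had a very bad experience.
fcp_seconds = [0.9, 1.1, 1.2, 1.4, 1.5, 1.7, 2.6, 9.8]
p75 = percentile(fcp_seconds, 75)  # 1.7: the 2.6 s and 9.8 s outliers are ignored
```

With these samples the 75th percentile is 1.7 seconds, which stays under the 1.8-second target even though the slowest user waited almost 10 seconds.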

Before the first byte

Now it's time to study a few more web performance metrics and their uses. The first two measure aspects of network traffic, and are not related to front-end code. If one of these metrics becomes very large, the website may experience a networking or server-side issue.

Connection times

Connection times measure the time needed to complete the DNS lookup and/or set up the TCP connection (Table 1). A high connection time suggests connection issues. If these persist, it's useful to look into the cause. Is a particular DNS server misbehaving? Does the server have trouble with some requests?

Definition: The time from navigation start to establishing TCP or DNS connections
Useful for: Monitoring network and server problems

Table 1: A definition of a connection time and its main function

TTFB

The time to first byte (TTFB) metric measures the time that elapses between the navigation start and the arrival of the first byte of the response (Table 2). It measures combined back end and connection responsiveness. If the TTFB is excessively long, there’s a problem either on the server or with the connections — not with the front-end code.

Meaning: Time to first byte
Definition: The time between navigation start and the first byte of the new page arriving
Other name: Back-end time
Useful for: Monitoring network and server problems

Table 2: An explanation of TTFB, its alternative name, and its main function

To determine whether the problem lies on the network or on the server, compare connection times with TTFB. If the TTFB is much higher than the connection times, the problem likely occurs on the server; if it's only slightly higher, the problem likely occurs on the network.
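The comparison described above can be sketched in a few lines. The field names below are modeled on the browser's navigation timing entries (startTime, connectEnd, responseStart), the sample values are invented, and the 50% threshold that separates "much higher" from "slightly higher" is purely an assumption to be tuned per site.

```python
def connection_time(nav):
    """Time from navigation start until the connection is established (ms)."""
    return nav["connectEnd"] - nav["startTime"]


def time_to_first_byte(nav):
    """Time from navigation start until the first response byte arrives (ms)."""
    return nav["responseStart"] - nav["startTime"]


def likely_bottleneck(nav, server_share_threshold=0.5):
    """Guess whether a slow TTFB points at the server or the network.

    server_share_threshold is a hypothetical cutoff: if more than this
    fraction of the TTFB is spent after the connection was set up, the
    server is the more likely suspect.
    """
    ttfb = time_to_first_byte(nav)
    if ttfb <= 0:
        raise ValueError("TTFB must be positive")
    server_share = (ttfb - connection_time(nav)) / ttfb
    return "server" if server_share > server_share_threshold else "network"


slow_server = {"startTime": 0, "connectEnd": 120, "responseStart": 950}
slow_network = {"startTime": 0, "connectEnd": 600, "responseStart": 680}
```

Here likely_bottleneck(slow_server) returns "server" because most of the 950 ms passes after the connection is ready, while likely_bottleneck(slow_network) returns "network" because the connection itself consumes most of the 680 ms.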

TTFB is the technical starting point for all front-end metrics. Before TTFB, browsers don’t have any front-end code with which to work; the interactions and paints measured by the CWV cannot yet take place.

Metrics that measure interactivity

After the first byte comes FCP and Largest Contentful Paint (LCP). Once users see something on their screen, they naturally want to interact with it. The previous article explored the FID metric, which measures interactions, but there are a few more to keep in mind.

TTI

The time to interactive (TTI) metric measures the time from navigation start until the user can expect interactivity from the site; that is, when a click or other interaction would have visible results within a reasonable time frame (Table 3). Note that this is about user capability (potential clicks), not about user intent (actual clicks). Measuring intent is the job of the time to first interaction (TTFI) metric.

Meaning: Time to interactive
Definition: Time until a user interaction would show speedy results
Useful for: Benchmarking other metrics

Table 3: A definition of time to interactive and its primary use

Unfortunately, the exact definition of TTI is open to debate. mPulse's definition waits for FCP and then takes the start of the first period of at least one second during which the browser's main thread is free to handle user requests. Web.dev and WebPageTest mostly agree, but require a five-second period during which the main thread is available and there are at most two outstanding network requests.

The WebPageTest definition requires TTI to be a synthetic metric, because in a RUM environment it's not possible to determine the number of ongoing network requests. mPulse's definition is more RUM-centric, though it can also be used in a synthetic environment.
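The "first quiet window after FCP" idea shared by these definitions can be sketched as follows. This is a simplification under stated assumptions, not the algorithm of any of the tools mentioned: it treats every recorded busy interval as an interruption, it ignores the network-request condition entirely, and the function and parameter names are invented for illustration.

```python
def estimate_tti(fcp_ms, busy_intervals, quiet_window_ms=1000):
    """Estimate TTI as the start of the first quiet window after FCP.

    busy_intervals: (start_ms, end_ms) pairs during which the main
    thread is busy. The sketch assumes the main thread stays idle
    after the last recorded interval.
    """
    candidate = fcp_ms
    for start, end in sorted(busy_intervals):
        if end <= candidate:
            continue  # this task finished before the candidate moment
        if start - candidate >= quiet_window_ms:
            break  # a long enough quiet window starts at the candidate
        candidate = end  # window interrupted; retry after this task ends
    return candidate


busy = [(900, 1200), (1500, 1700), (3000, 3100)]
tti_one_second = estimate_tti(800, busy, quiet_window_ms=1000)   # 1700
tti_five_seconds = estimate_tti(800, busy, quiet_window_ms=5000)  # 3100
```

With the same trace, a one-second window yields a TTI of 1,700 ms, while a five-second window pushes it out to 3,100 ms. That difference illustrates why the choice of definition matters when comparing TTI values from different tools.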

TTI is rarely used as a stand-alone metric. However, the TTFI and total blocking time (TBT) metrics require a TTI measurement. Therefore, TTI is best seen as a metric that serves other metrics.

TTFI

Where TTI conveys user capability (the user could initiate an interaction), the time to first interaction metric (TTFI) conveys user intent (the user does initiate an interaction).

TTFI measures the time from the navigation start to the first time a user actually attempts to interact with the page (Table 4). It provides information about the time the website appears ready for use.

Meaning: Time to first interaction
Definition: The time when the user first tries to interact with the page
Useful for: Especially for usability research

Table 4: A definition of time to first interaction and its purpose 

Despite being valuable for usability research, TTFI may be less useful for mechanical performance. Improving TTFI is not necessarily a goal. In fact, if TTFI comes before TTI (in other words, if the page appears ready before it actually is), the page doesn't respond to the first interaction, and the user's impatience increases.

TTFI is a user-driven metric because it waits for the user to do something. That's why it's primarily useful in RUM measurements. It's possible to simulate a user's click in a synthetic environment, but the timing of that click would have to be picked by a developer. It would not mimic actual user behavior.

TTFI doesn't tell you how quickly the results of the interaction are shown on screen. That's the job of the Interaction to Next Paint (INP) metric.

A high TTFI may indicate that the site is slow to render on screen. Compare TTFI with FCP or LCP to determine if this is indeed the problem. If TTFI is much higher than LCP, site rendering is not the issue. Instead, users may have difficulty understanding how they should interact with the site.
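The two comparisons described in this section (TTFI against TTI, and TTFI against LCP) can be combined into a small helper. This is only a sketch: the function name, the wording of the notes, and especially the factor that decides when TTFI counts as "much higher" than LCP are assumptions.

```python
def interpret_ttfi(ttfi_ms, tti_ms, lcp_ms, much_higher_factor=2.0):
    """Return human-readable notes about a TTFI measurement.

    much_higher_factor is a hypothetical cutoff: TTFI counts as
    "much higher" than LCP when it exceeds LCP by this factor.
    """
    notes = []
    if ttfi_ms < tti_ms:
        notes.append("User interacted before the page was interactive")
    if ttfi_ms > much_higher_factor * lcp_ms:
        notes.append("Rendering is not the issue; check usability")
    return notes
```

For instance, interpret_ttfi(1500, 2500, 1200) flags only the early interaction, while interpret_ttfi(9000, 2500, 1200) flags only the usability concern.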

TBT

The total blocking time (TBT) metric measures how much time is spent running long tasks, such as large scripts (Table 5). As a future article will discuss, each running script occupies the main thread. If it runs for a long time, downloads and user interactions may feel sluggish.

Meaning: Total blocking time
Definition: A blocking time is the part of a main-thread task's duration that exceeds 50 ms. Total blocking time is the sum of these blocking times from FCP to TTI.
Other name: Long task time
Useful for: Benchmarking all JavaScript

Table 5: A definition of total blocking time, its alternative name, and its function

Blocking times of 50 milliseconds (ms) or longer may be perceptible to the user, while lower blocking times are not. Therefore, for the purposes of this metric, a long task is defined as a JavaScript task that takes more than 50 ms.

The metric covers the interval between FCP and TTI, and each long task that runs in that interval counts toward the TBT. However, the first 50 ms of each task is disregarded; only the remainder counts as blocking time. The TBT is the sum of all those individual blocking times. Thus, tasks of 100 ms, 70 ms, and 40 ms contribute 50 ms, 20 ms, and 0 ms to the TBT, which becomes 70 ms in total.

There’s a variant called long task time (LTT) that works almost exactly the same, except it doesn’t subtract 50 ms from each individual task. It gives the full duration of all long tasks. This may be a more useful metric when determining the total time spent on long tasks.
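The arithmetic for both metrics fits in a few lines. The sketch below assumes the task durations have already been restricted to the interval between FCP and TTI; the function names are illustrative.

```python
LONG_TASK_THRESHOLD_MS = 50  # tasks longer than this count as long tasks


def total_blocking_time(task_durations_ms):
    """Sum the part of each long task that exceeds the 50 ms threshold."""
    return sum(
        duration - LONG_TASK_THRESHOLD_MS
        for duration in task_durations_ms
        if duration > LONG_TASK_THRESHOLD_MS
    )


def long_task_time(task_durations_ms):
    """Sum the full duration of every long task."""
    return sum(
        duration
        for duration in task_durations_ms
        if duration > LONG_TASK_THRESHOLD_MS
    )


tasks_ms = [100, 70, 40]
tbt = total_blocking_time(tasks_ms)  # 50 + 20 + 0 = 70
ltt = long_task_time(tasks_ms)       # 100 + 70 = 170
```

For the example tasks from the text, TBT is 70 ms and LTT is 170 ms. The 40 ms task contributes to neither because it is not a long task.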

A high TBT or LTT indicates that one or more tasks are taking a lot of time. It’s worthwhile to find out which tasks are slow and whether they can be sped up or postponed entirely. The fewer long tasks there are, the sooner users can use the site.

General metrics

Finally, we’ll share a list of general metrics that are useful for setting benchmarks. If one of these metrics increases significantly, it may be a sign that something unusual is going on —  and that more research is warranted.

Page size

Page size is the total size of the page, including all assets, in bytes (Table 6). Sometimes, website owners are not in full control of all assets, for instance, when the site loads many third-party ad scripts. In that case, monitoring the page size is a way to keep track of the size of these third-party scripts.

Definition: Total amount of bytes loaded
Other name: Page weight
Useful for: Benchmarking and comparing pages

Table 6: A definition of page size, its alternative name, and its practical use case

A huge page size indicates that something unusual is going on, especially when a page is much larger than other pages on the same site or an earlier version of the same page. Sometimes the cause is clear: a page that shows a series of high-resolution images has a much larger page size than one that doesn’t. If the cause is unclear, it may be worthwhile to dive deeper. Are there many more third-party scripts or other assets on this page than on others? Why? Could some be removed?
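One way to start answering these questions is to split the page size into first-party and third-party bytes. The sketch below assumes a list of (URL, size) pairs is already available, for example from a synthetic test report. The URLs and sizes are invented, and treating every host other than the site's own as third-party is a deliberate simplification that would misclassify a site's own CDN or subdomains.

```python
from urllib.parse import urlsplit


def page_size_report(assets, first_party_host):
    """Summarize page size from (url, size_in_bytes) pairs."""
    total = 0
    third_party = 0
    for url, size in assets:
        total += size
        # Simplification: any other hostname counts as third-party.
        if urlsplit(url).hostname != first_party_host:
            third_party += size
    return {"total_bytes": total, "third_party_bytes": third_party}


assets = [
    ("https://example.com/index.html", 20_000),
    ("https://example.com/hero.jpg", 400_000),
    ("https://ads.example.net/ad.js", 180_000),
]
report = page_size_report(assets, "example.com")
```

In this invented example the page weighs 600,000 bytes, of which 180,000 bytes come from the third-party ad script.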

A small page size does not guarantee good performance. There could be other problems that need attention, such as slow networks or long JavaScript tasks.

PRT

Peak response time (PRT) equals the longest response time of a single asset on a page (Table 7). This is likely a hero image or JavaScript framework.

Meaning: Peak response time
Definition: Longest single-asset response time
Useful for: Benchmarking individual assets

Table 7: A definition of peak response time and a practical use case

If the site uses very large assets, a high PRT is inevitable. Like page size, PRT is most useful for creating benchmarks. If a later version of the site sees a sudden increase in PRT, it makes sense to dig deeper and identify the cause. Sometimes, a file becomes larger for no good reason, and it makes sense to reduce its size.
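Finding the PRT, and the asset responsible for it, is a simple maximum over per-asset response times. The asset names and timings below are made up for illustration.

```python
def peak_response_time(response_times_ms):
    """Return (asset, response_time_ms) for the slowest asset.

    response_times_ms maps an asset name to its response time in ms.
    """
    if not response_times_ms:
        raise ValueError("at least one asset is required")
    slowest = max(response_times_ms, key=response_times_ms.get)
    return slowest, response_times_ms[slowest]


previous = {"hero.jpg": 420, "framework.js": 310, "styles.css": 60}
current = {"hero.jpg": 430, "framework.js": 900, "styles.css": 65}

previous_peak = peak_response_time(previous)  # ("hero.jpg", 420)
current_peak = peak_response_time(current)    # ("framework.js", 900)
```

Comparing the two results shows not only that the PRT more than doubled between versions, but also that a different asset is now responsible, which narrows down where to look.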

Error rate

The error rate denotes how many requests return errors (Table 8). Ideally, this number is zero. If it starts to tick up, something is misfiring: find out which asset is causing the error and fix it. This is especially useful in larger organizations where multiple teams are responsible for different aspects of one page.

Definition: Percentage of requests that return an error
Useful for: Monitoring sites with a lot of assets from a lot of different teams

Table 8: A definition of error rate and its possible uses
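As a rough illustration, the error rate can be computed from the HTTP status codes of a page's requests. Counting every status of 400 or higher as an error is an assumption made for this sketch; a monitoring tool may also count timeouts or aborted requests.

```python
def error_rate(status_codes):
    """Percentage of requests whose HTTP status signals an error.

    Assumption: any status code of 400 or higher counts as an error.
    """
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if code >= 400)
    return 100 * errors / len(status_codes)


rate = error_rate([200, 200, 404, 500])  # 50.0
```

Grouping the failing requests by the team or domain that owns them would be a natural next step for the multi-team situation described above.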

Response time

The term “response time” is used in two distinct contexts: synthetic context and RUM context (Table 9).

Definition (synthetic): Time from first received byte to last received byte of a specific asset
Definition (RUM): Time from request start to response start
Useful for: Benchmarking network and server problems

Table 9: Two definitions of response time and a use case

In a synthetic context, response time measures the load time of a specific asset. It serves as a benchmark: If the synthetic response time of an asset is high relative to other assets or earlier versions of the same asset, something odd is taking place and more research is necessary.

In a RUM context, response time measures the time the server takes to respond; that is, the time between the start of a request for a new asset and the reception of that asset's first byte. A spike in RUM response time means either the network or the server has trouble fulfilling the request.
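The two definitions differ only in which timestamps they subtract. The sketch below uses field names modeled on the browser's resource timing entries (requestStart, responseStart, responseEnd); the sample values are invented.

```python
def synthetic_response_time(entry):
    """Synthetic definition: first received byte to last received byte (ms)."""
    return entry["responseEnd"] - entry["responseStart"]


def rum_response_time(entry):
    """RUM definition: request start to response start (ms)."""
    return entry["responseStart"] - entry["requestStart"]


asset = {"requestStart": 100, "responseStart": 260, "responseEnd": 540}
download_ms = synthetic_response_time(asset)  # 280
server_wait_ms = rum_response_time(asset)     # 160
```

For the same asset, the synthetic value (280 ms) describes how long the download took, while the RUM value (160 ms) describes how long the server took to start answering. The two numbers are not interchangeable.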

Load time

The load time is slightly different from the response time. It gives you the time from the navigation start to the moment the load event fires, which is when the main HTML and all assets have been loaded (Table 10).

Definition: Time from the navigation request to the firing of the load event
Other names: Page load, document complete time
Useful for: Measuring the total load time until the load event

Table 10: A definition of load time, its alternative terms, and a possible use case

Load time measurement starts at navigation start, while response time starts at TTFB. Therefore, load time will be a little longer than the response time.

Additionally, load time may obfuscate the real time it takes to render the site. Once the last asset has loaded, the load event fires and the load time is calculated. The execution of the onload event handler takes place after load time, though, so the page may not be ready for use at load time.
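The gap between load time and the end of the onload handlers can be made visible with two subtractions. The field names are modeled on the browser's navigation timing entries (startTime, loadEventStart, loadEventEnd), and the values are invented.

```python
def load_time(nav):
    """Time from navigation start until the load event fires (ms)."""
    return nav["loadEventStart"] - nav["startTime"]


def onload_handler_time(nav):
    """Time spent running load event handlers, which load time excludes (ms)."""
    return nav["loadEventEnd"] - nav["loadEventStart"]


nav = {"startTime": 0, "loadEventStart": 2400, "loadEventEnd": 3100}
reported_load_ms = load_time(nav)          # 2400
handler_ms = onload_handler_time(nav)      # 700
```

In this example the load time is reported as 2,400 ms, but the handlers keep the main thread busy for another 700 ms before the page is likely to respond.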

Again, this metric is most useful for benchmarking. If a page's load time goes up considerably relative to earlier versions or other pages on the same site, something unusual is happening and more research is necessary.

Full load time

Full load time resembles the load time, but does not stop at the load event (Table 11). Instead, it also measures the execution time of any scripts after the load event. It is recorded only when the browser's main thread is idle and can handle user interactions.

Definition: Total time needed to load everything, including any lazy-loaded assets that arrive after document complete
Other name: Fully loaded time
Useful for: Measuring the total load time experienced by the user

Table 11: An explanation of full load time, its alternative name, and its primary use

Once full load time comes around, the site will likely be ready for use — which is questionable at load time. Thus, the full load time is preferred over the load time.

Like load time, full load time is best used as a benchmark to compare this page with earlier versions or other pages.

Summary

In this post, we looked at the differences between synthetic and RUM measurements. Synthetic tests are primarily useful during site development, while RUM measurements tell site owners how users actually experience the site — and what problems it encounters in the real world. Each is useful in its own sphere, and web developers are encouraged to use both.

We also explored 11 non-CWV web performance metrics mostly concerning various aspects of asset loading or interactivity. Web developers are encouraged to pick one or two of them to track in addition to the CWV, since these metrics measure other parts of the website rendering process.

Stay tuned for part 4: browsers, programming mistakes, and more

In our next post in this series, we’ll discuss how browsers actually work, which common programming mistakes can affect site performance, and why some of these metrics were designed as they were.

Learn more

Once you’ve learned all about Google’s CWV and the 11 additional metrics, head over to Akamai’s TechDocs documentation site for even more web metrics that can help you develop fast, engaging websites.