Beware of PerformanceTiming.domInteractive
The web is more complex now than ever before. Web pages today contain hundreds of objects (images, stylesheets, JavaScript, and plenty of slow-loading third-party scripts) that all need to load before users can fully enjoy their web browsing experience. However, the more objects we add to a page, the slower the page gets, leading to user frustration and, even worse, page abandonment.
I am proud to say that our community is extremely motivated to improve web performance and provide browsing experiences that users enjoy. To do this, we need to rethink how we measure our performance goals to satisfy the growing speed expectations of our (impatient) users. We need to measure what matters the most for our users, i.e., those elements that directly influence their experiences.
For so long, we have relied on the onLoad event triggered by web browsers to measure how fast websites load and translate that into users’ experiences with the web. Unfortunately, onLoad is an event that users can barely notice as it is triggered after all resources on the web page have been loaded, including the resources that are invisible because they are not above-the-fold. Therefore, the time the browser takes to trigger the onLoad event should not be used to represent the website speed because it does not truly represent the user’s perspective.
Other browser-exposed events, such as the time to first paint (TTFP) and the time to DOM interactive (TTDI), are better surrogates for measuring website speed from the user’s perspective. The TTFP indicates the time from when the user enters a URL in the address bar until the browser paints the first pixels on the screen. In other words, it is the time users spend staring at a blank screen before anything shows up. TTFP is a great metric because it helps developers optimize their pages to reduce how long users wait before they start seeing things. But TTFP alone is not sufficient to help us improve the user experience, because it only indicates when the first pixels were painted, not when the page was ready for users to interact with, which is likely why they opened the page in the first place.
The TTDI event, also known as PerformanceTiming.domInteractive or just domInteractive, measures the time from when the user enters a URL in the address bar until the page is ready for the user to interact. To be clear, for the domInteractive event to be triggered, the page itself does not have to be loaded completely. Users could still click links, press some buttons, and enter text in textboxes before the page is loaded completely.
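As a minimal sketch of how this value is usually read, the helper below subtracts navigationStart from domInteractive. The TimingLike interface and the function name are hypothetical names introduced for illustration; in a browser, the object passed in would be performance.timing. The check for 0 assumes the attribute has not been populated yet when the function is called early in the page load.

```typescript
// Hypothetical structural type: the two PerformanceTiming fields used here.
interface TimingLike {
  navigationStart: number; // epoch milliseconds when navigation began
  domInteractive: number;  // epoch milliseconds, assumed 0 until it is set
}

// Returns the reported time to DOM interactive in milliseconds,
// or null when domInteractive has not been recorded yet.
function timeToDomInteractive(timing: TimingLike): number | null {
  if (timing.domInteractive === 0) {
    return null;
  }
  return timing.domInteractive - timing.navigationStart;
}

// In a browser (not executed here):
//   const ttdi = timeToDomInteractive(performance.timing);
```

Taking the timing object as a parameter keeps the arithmetic separate from the browser global, so the same helper can be applied to timing data collected from real users.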
This appears to be a great way of measuring website speed from the user’s perspective. Following Google and Netflix’s approach, I ran some experiments to evaluate how accurately this metric quantifies TTDI for different web pages. For my experiments, I used Google Chrome Version 61.0.3163.100 (Official Build), Firefox Quantum 57.0b13, Opera 48.0.2685.52, and Safari Version 10.0.3 (12602.4.8) to load four web pages specifically designed to reflect various loading behaviors of external stylesheets, JavaScript, and web fonts.
Overall, I found that pages with external (that is, not inlined in the HTML) CSS, JavaScript, or fonts could lead to inaccurate estimations of domInteractive. In fact, the position of the JavaScript in the HTML also affects the domInteractive reported by web browsers. Since almost all web pages today contain external CSS, JavaScript, and font files, I prepared four working examples to test and demonstrate how inaccuracies in measuring domInteractive could impact performance evaluations of real websites.
Test 1: Parser blocking JS in the HEAD tag
Explanation: For this page, the JavaScript is configured to load in about 8 seconds (+/- the time to resolve the hostname and perform TCP/SSL handshakes) and all the elements of interactivity are in the body below the JavaScript. When the browser starts to parse the HTML and hits this script tag, it halts the parsing. (Chrome has a preload scanner that continues to scan the HTML, but the browser will not execute anything further until the script is loaded, compiled, and executed completely.) Since parsing is halted and all the interactive elements are below the script, the browser will not render anything on the screen. After the script loads, the browser builds the DOM, renders the page, and triggers the domInteractive event. In this case, the domInteractive reported by the browser is accurate.
Test 2: Parser blocking JS at the bottom of the BODY tag
Explanation: Again, the JavaScript is configured to load in about 8 seconds, but this time all the elements of interactivity are in the body above the JavaScript. When the browser starts to parse the HTML and hits the elements of interactivity, it renders them right away, so they are ready for interaction. Next, when the browser reaches the script tag, it halts parsing, waits until the script is loaded, and only then fires the domInteractive event. In this case, the domInteractive reported by the browser is much higher than the actual time it took for the page to become interactive, even though the interactive elements do not depend on the JavaScript.
Test 3: Render blocking stylesheet in the HEAD tag
Explanation: The stylesheet on this page is configured to load in about 8 seconds. Because stylesheets are not parser-blocking, the stylesheet does not block the parser regardless of its placement in the HTML. However, when the browser parses the HTML and hits the link tag that loads the stylesheet, it blocks rendering of the page until the stylesheet is completely loaded. As a result, the elements of interactivity are not visible. Meanwhile, since the browser has finished parsing the HTML, it triggers the domInteractive event, and the reported value is much lower than the actual time it took for the page to become interactive, or for that matter visible.
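The first three tests can be summarized with a toy model. The sketch below is not how any browser is implemented; it simply encodes the account given above under loud assumptions: parsing itself takes no time, a parser-blocking script stalls the parser for its full load time, and a stylesheet blocks rendering of everything parsed after it without stalling the parser. All type and function names are hypothetical.

```typescript
// Hypothetical page description: items in document order.
type PageItem =
  | { kind: "script"; loadMs: number }      // external, parser-blocking script
  | { kind: "stylesheet"; loadMs: number }  // render-blocking, not parser-blocking
  | { kind: "interactive" };                // a link, button, text box, ...

interface Estimate {
  reportedDomInteractiveMs: number; // when the parser reaches the end of the HTML
  firstUsableMs: number | null;     // when the first interactive element is usable
}

function estimate(items: PageItem[]): Estimate {
  let parserClock = 0;       // assumption: parsing costs nothing by itself
  let renderUnblockedAt = 0; // time the last stylesheet seen so far finishes
  let firstUsableMs: number | null = null;

  for (const item of items) {
    if (item.kind === "script") {
      // The parser waits for the script before moving on.
      parserClock += item.loadMs;
    } else if (item.kind === "stylesheet") {
      // The parser moves on, but rendering waits for the stylesheet.
      renderUnblockedAt = Math.max(renderUnblockedAt, parserClock + item.loadMs);
    } else if (firstUsableMs === null) {
      // Usable once it is both parsed and allowed to render.
      firstUsableMs = Math.max(parserClock, renderUnblockedAt);
    }
  }
  return { reportedDomInteractiveMs: parserClock, firstUsableMs };
}

// The three pages described above, with the 8-second delay used in the tests.
const test1 = estimate([{ kind: "script", loadMs: 8000 }, { kind: "interactive" }]);
const test2 = estimate([{ kind: "interactive" }, { kind: "script", loadMs: 8000 }]);
const test3 = estimate([{ kind: "stylesheet", loadMs: 8000 }, { kind: "interactive" }]);
// test1: reported 8000, usable 8000 (accurate)
// test2: reported 8000, usable 0    (overestimate)
// test3: reported 0,    usable 8000 (underestimate)
```

The model deliberately ignores interactions between resources, such as scripts waiting on pending stylesheets, so it should be read as a summary of the three cases rather than a predictor for real pages.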
Test 4: (Partial) Render blocking fonts
Explanation: The font on this page is configured to load in about 8 seconds. Like stylesheets, fonts are not parser-blocking. But depending on which browser you use, text rendering could happen at different times, and so could the interactivity of the elements to which the font is applied. For example, Chrome, Firefox, and Opera fall back to a default font if the requested font is not loaded within 3 seconds. Therefore, for these three browsers, the TTDI is at least 3 seconds (counting text rendered in the fallback font as ready). But the reported domInteractive value is much lower because the HTML parsing finished much sooner than the font timeout.
The fourth browser, Safari, has no timeout when loading fonts; it will wait as long as the font needs to load before rendering elements to which the font is applied. Therefore, the interactive elements will become visible after 8 seconds, but the reported domInteractive will be much lower.
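The font behavior described for the two groups of browsers comes down to a simple rule, sketched below. The function name and the use of null to mean "no timeout" are assumptions made for illustration; the 3-second and 8-second figures are the ones from this test.

```typescript
// When does text styled with a web font become visible?
// fallbackTimeoutMs === null models a browser that waits indefinitely.
function textVisibleAtMs(fontLoadMs: number, fallbackTimeoutMs: number | null): number {
  if (fallbackTimeoutMs === null) {
    return fontLoadMs; // wait for the real font, however long it takes
  }
  // Show the real font if it arrives first, otherwise the fallback at the timeout.
  return Math.min(fontLoadMs, fallbackTimeoutMs);
}

const withTimeout = textVisibleAtMs(8000, 3000); // 3000: fallback font shown
const noTimeout = textVisibleAtMs(8000, null);   // 8000: waits for the font
```

In both cases the result is far later than the moment the parser finishes, which is the gap this test exposes.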
Conclusions
In summary, domInteractive seems to measure the time from when the user enters a URL until the page is ready for the user to interact, but because of the way browsers calculate it, it may not measure TTDI in reality. I’d therefore suggest that developers beware of these inaccuracies when using domInteractive as a measure of website speed from the user’s perspective. One piece of advice for web developers is to load JS, CSS, and fonts asynchronously or after page load whenever possible. This approach will help browsers report accurate estimations of domInteractive and provide a more realistic picture of website speed from a user’s perspective. Additionally, I’d recommend that developers leverage the User Timing API to accurately measure user experiences specific to their uniquely designed web pages.
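As one possible way to apply the User Timing recommendation, the sketch below records a mark at the moment a page's key control becomes usable and measures it against navigationStart. The UserTimingLike interface, the function name, and the label are hypothetical; the interface is assumed to be satisfied by the browser's global performance object, which is passed in as a parameter so the logic can be exercised outside a browser.

```typescript
// Hypothetical structural type covering the User Timing calls used below.
interface UserTimingLike {
  mark(name: string): unknown;
  measure(name: string, startMark?: string, endMark?: string): unknown;
  getEntriesByName(name: string): ArrayLike<{ duration: number }>;
}

// Call this when the element that matters to users is ready, for example
// right after its event handlers are attached.
// Returns milliseconds since navigation start, or null if nothing was recorded.
function recordUsable(perf: UserTimingLike, label: string): number | null {
  const measureName = label + ":since-navigation";
  perf.mark(label);
  // "navigationStart" names the PerformanceTiming attribute used as the start.
  perf.measure(measureName, "navigationStart", label);
  const entries = perf.getEntriesByName(measureName);
  return entries.length > 0 ? entries[entries.length - 1].duration : null;
}

// In a browser (not executed here):
//   const ms = recordUsable(performance, "search-box-usable");
```

Where the mark is placed is a per-page decision; the point is that the developer, not the parser, decides what "interactive" means for that page.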
Acknowledgments
Thanks to the members of Akamai’s Foundry team, Martin Flack, Stephen Ludin, Moritz Steiner, and Mike Bishop, for providing feedback on an early version of this article. Thanks to Michael Bettendorf for copy editing this article.