Preparing for Y2038 (Already?!)
It somehow doesn't seem that long ago, but it's been 19 years since I spent the Y2K New Year's Eve in the Akamai Network Operations Command Center, waiting to respond to anything that might go awry as the clock struck midnight in key time zones, such as Greenwich Mean Time and Eastern Standard Time. As of January 9, 2019, we were roughly halfway from Y2K to Y2038, the next large time-epoch rollover event. On January 19, 2038, Unix time will exceed the largest value a signed 32-bit integer "time_t" can hold (2^31 − 1, or 2,147,483,647); that is, it will be roughly 2.1 billion seconds since the epoch of 00:00:00 UTC on January 1, 1970. We have somewhat more time to deal with the systems that will break 19 years from now. However, as we get closer, there will be increasing impacts on software working with future dates.
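To make the rollover concrete, here is a minimal C sketch (assuming a platform where time_t is 64 bits wide, so post-2038 dates are representable) that prints the last second a signed 32-bit counter can hold and what that counter wraps to one second later:

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

int main(void) {
    /* The largest value a signed 32-bit time_t can hold: 2^31 - 1. */
    int64_t max32 = INT32_MAX; /* 2147483647 */

    /* With a 64-bit time_t this prints Tue Jan 19 03:14:07 2038 (UTC). */
    time_t t = (time_t)max32;
    printf("last 32-bit second: %s", asctime(gmtime(&t)));

    /* One second later, a 32-bit signed counter wraps (on typical
     * two's-complement hardware) to -2^31, which gmtime interprets
     * as Fri Dec 13 20:45:52 1901 (UTC). */
    int32_t wrapped = (int32_t)(max32 + 1);
    t = (time_t)wrapped;
    printf("after wraparound:   %s", asctime(gmtime(&t)));
    return 0;
}
```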
Shortly after Y2K we made jokes about "Next up, Y2038!", but back then the year 2038 felt like an eternity into the future and Y2038 issues seemed likely to be someone else's problem. Now that we're more than halfway there, and we have already reached the point where Y2038 issues can cause software failures, we should start planning for Y2038. For example, software may already be having problems working with 20-year certificate lifetimes or 20-year mortgages, and the frequency of these issues will only increase as we get closer to Y2038. At Akamai, we are already running strategically targeted internal planning and testing for Y2038, and it seems likely that the scope of this work will continue to grow over the next 19 years as this important effort increases in urgency.
Very little went wrong on January 1, 2000, for us (short of some JavaScript on an Akamai marketing site displaying "19100" as the current date!), but many people don't realize that the limited global impact that evening was due to two factors: (1) the amount of advance preparation that was done, and (2) the fact that many "Y2K problems" actually occurred years in advance rather than during the rollover itself. Leap seconds are in some ways scarier than date-format issues in that they are harder to test for and much less of their impact happens in advance. There is a risk that the limited impact of Y2K may cause organizations and technologists to underprepare for Y2038. It is also harder to explain Y2038 than Y2K to laypeople, which may make it harder to prioritize and focus work in advance. The large number of embedded Internet of Things (IoT) devices becoming ubiquitous in our environment also makes the likely risk and potential impact considerably higher for Y2038 than it was for Y2K.
Many years ago, I heard a (perhaps apocryphal) anecdote about an early Y2K production impact. A warehouse had two automated jobs: one that looked for pallets of expired goods and sent them for disposal, and a second that looked for low inventory and ordered more of a product. Canned tomatoes were the first product whose expiration dates crossed into the year 2000, and a Y2K bug (indicating that the tomatoes had expired in 1900!) directed a forklift operator to dispose of the tomatoes as they arrived. The system then identified the low inventory of canned tomatoes and ordered more. A few weeks later they would arrive, and a few days after that the forklift operator would be dispatched to dispose of them. This cycle continued until the forklift operator finally noticed something awry and escalated the problem. It is likely that some of the first Y2038 issues will be quite similar in nature.
Our initial experience with Y2038 planning (besides seeing shock from people upon hearing concerns raised about an issue that is still 19 years away) is that an incremental and focused approach is needed at this stage. We will certainly need much more involved and comprehensive programs some number of years down the road. In this initial phase, some areas to focus on include: (1) software dealing with future times and dates, (2) on-the-wire message and file formats, and (3) devices with long deployment lifetimes, and their dependencies. Of course, as we get closer, it will become critical to start focusing on broader sets of systems, including ensuring that they can handle crossing the Y2038 transition safely.
Software dealing with future dates
Perhaps the most important area to focus on initially is software that deals with dates in the future, such as for handling X.509 certificates or for financial planning. There are many date-and-time representations, not all of which have a Y2038 problem. For example, software dealing with times in past decades (well before 1970) often picked different date-and-time representations. However, in testing the cases of X.509 certificates (such as those used for HTTPS) and certificate authorities (CAs), we have found numerous software issues in the lab that start to arise with certificates and CAs expiring past Y2038.
OpenSSL, in particular, has multiple time formats: ASN1_UTCTIME can only represent dates through 2049 (a distinct issue to plan for after Y2038), so use the ASN1_TIME functions, which provide compatibility with all ranges of time. Converting times from a library such as OpenSSL to a 32-bit time_t is an additional area likely to have problems.
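As an illustration, the sketch below (assuming OpenSSL 1.1.0 or later, which provides X509_get0_notAfter) checks whether a certificate has expired by comparing ASN1_TIME values directly, rather than round-tripping through a possibly 32-bit time_t:

```c
#include <openssl/x509.h>

/* Return 1 if the certificate's notAfter is still in the future,
 * 0 if it has expired (or on error). The comparison stays in
 * ASN1_TIME, so it works for dates past Y2038 even when the
 * platform's time_t is 32 bits. */
int cert_not_expired(const X509 *cert) {
    const ASN1_TIME *not_after = X509_get0_notAfter(cert);
    /* X509_cmp_time() returns 1 if not_after is later than the
     * comparison time, -1 if earlier, and 0 on error; passing
     * NULL compares against the current time. */
    return X509_cmp_time(not_after, NULL) == 1;
}
```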
In many of these cases, it has been possible to resolve the issues simply by porting and compiling legacy software for 64-bit architectures (e.g., moving from a 32-bit time_t to a 64-bit time_t). In other cases, more extensive changes have been needed, especially when times get cast into integers for math, when message wire formats get involved, or when values are stored in databases. In testing and fixing support for 20-year CAs, we also found that downstream dependencies come into play. For example, if a date 20 years in the future gets fed into a logging or monitoring system, and those in turn feed into alerting systems, reporting databases, or provisioning systems, then all of those may also need fixes.
On-the-wire message and file formats
As mentioned above, when 32-bit timestamps are put into messages, databases, or file formats, the impact can extend well beyond a specific system. These are also systems with external dependencies, where more advance planning is often needed because interactions cross system boundaries. For these collections of interoperating systems, changes may need to be released in a specific order, and backward compatibility often comes into play. Furthermore, if formally or informally standardized protocols use 32-bit epoch timestamp values in their messages, any migration or fix may be predicated on first fixing the standard. As such, these become important to worry about early, since a fix can involve a dependency chain such as the following (see the sketch below):
1. Update protocols/standards to support post-Y2038 timestamps.
2. Implement support for the updated standard in software libraries.
3. Get new versions of the libraries incorporated into software packages.
4. Get the software packages included in new shipping products.
If each of these takes a few years and the shipping product has a long lifetime, then the long lead times here may already be a problem.
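As a concrete (and hypothetical) illustration of why the chain starts with the standard, consider a wire format that carries a 32-bit epoch timestamp. Widening the field changes the message layout, so both ends must agree on a new version before either can ship it; the struct names and version field below are invented for illustration:

```c
#include <stdint.h>

/* Hypothetical v1 message header: its 32-bit timestamp cannot
 * represent times after 03:14:07 UTC on January 19, 2038. */
struct msg_v1 {
    uint16_t version;   /* = 1 */
    int32_t  timestamp; /* seconds since the Unix epoch */
};

/* A v2 layout widens the field to 64 bits. Senders and receivers
 * must negotiate (or detect via the version field) which layout
 * is in use -- hence the standard must change first. */
struct msg_v2 {
    uint16_t version;   /* = 2 */
    int64_t  timestamp; /* seconds since the Unix epoch */
};

/* Serialize the 64-bit timestamp as 8 big-endian bytes so the
 * wire layout does not depend on host byte order or padding. */
void put_timestamp_be64(uint8_t out[8], int64_t ts) {
    uint64_t u = (uint64_t)ts;
    for (int i = 7; i >= 0; i--) {
        out[i] = (uint8_t)(u & 0xff);
        u >>= 8;
    }
}
```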
Devices with long deployment lifetimes, and their dependencies
We also need to start focusing on devices with long deployment lifetimes. As just mentioned, following through on the external dependencies these devices have is also important. Embedded devices shipping on 32-bit hardware may not have an easy fix of "just recompile with a 64-bit time_t" available via a software update. Even where such a fix is possible, it could have unacceptable performance impact.
Connected automobiles and other IoT devices are likely an area of specific concern, but there are surely many other similar types of devices and applications. For example, given current trends, it is likely that more than 10% of cars sold today will still be operating in Y2038; with increases in vehicle age and the number of vehicles on the road, this percentage may be even higher. If it takes a few years for shipping automobiles to become generally Y2038-compliant, then with the current (and increasing) distribution of motor vehicle ages, we may end up with a significant fraction of automobiles at risk of serious issues 19 years from now. The same pattern likely exists in other industries as well, such as consumer electronics (e.g., home gaming consoles and smart televisions), where devices may be shipping with 20-year CA certificates pre-installed.
Devices with long deployment lifetimes may require more comprehensive testing of both the device and its dependencies, such as testing that the operating system and software continue to work properly before, during, and after the Y2038 transition point.
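As a starting point, a regression test along these lines (a minimal sketch; the probe values and assertions are illustrative) can verify that a platform's basic time handling survives the transition:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Spot-check time handling just before, at, and after the Y2038
 * boundary (2^31 seconds after the Unix epoch). */
int main(void) {
    const int64_t boundary = (int64_t)INT32_MAX + 1; /* 2^31 */
    const int64_t probes[] = { boundary - 1, boundary,
                               boundary + 1, boundary + 86400 };
    for (size_t i = 0; i < sizeof probes / sizeof probes[0]; i++) {
        time_t t = (time_t)probes[i];
        /* On a 32-bit time_t this cast truncates and the check
         * fails -- which is exactly the bug being probed for. */
        assert((int64_t)t == probes[i]);
        struct tm *tm = gmtime(&t);
        assert(tm != NULL && tm->tm_year + 1900 >= 2038);
        printf("ok: %lld -> %04d-%02d-%02d\n", (long long)probes[i],
               tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday);
    }
    return 0;
}
```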
Happy New Year!
The Y2038 issue is in a similar category to IPv6: it is a multi-decade rollout that in the general case is Important but not yet Urgent (per the Eisenhower Matrix). From this perspective, now is as good a time as any to start planning, triaging, and testing before it becomes Urgent (or too late). Focus first on software dealing with future dates, on-the-wire message and file formats, and devices with long deployment lifetimes. Next, use the experience from this initial focus to build a more comprehensive program over the coming years. Regardless, set a minimum bar: start making sure that the new software, systems, protocols, and products you are building and deploying don't introduce Y2038 issues, and be sure to flag any 32-bit seconds-since-the-Unix-epoch timestamps you see in new designs.
Erik Nygren is a fellow and chief architect in Akamai's Platform Engineering organization. He'll be celebrating his 20th anniversary at Akamai this June.
Thank you to multiple folks at Akamai for their contributions to this article.
While precautions have been taken in the preparation of this document, Akamai Technologies, Inc. assumes no responsibility for errors, omissions, or for damages resulting from the use of the information herein. The information herein is subject to change without notice. Akamai and the Akamai wave logo are registered trademarks or service marks in the United States (Reg. U.S. Pat. & Tm. Off). Published January 10, 2019.