Why the Future isn’t Coming as Fast

Mark Lewis
Jan 15, 2022 · 10 min read

One of my favorite future technologies of the last decade has been autonomous vehicles. I think they have the potential to create a lot of cool changes in society. Many of those changes will even be positive. For that reason, I'm very sad that they are taking longer than expected to arrive on our streets. The NY Times recently ran an article about this.

The race for autonomous cars really took off with the Google self-driving car project in 2009. Significant work had been done before that, for things like the DARPA challenges, but Google's effort seemed to bring the technology much closer to reality. In 2010, most people involved seemed to predict that by 2020 we'd have autonomous cars on roads all over the US. That NY Times article points to the main reason this hasn't happened: autonomous cars are robots, and robotics is harder than just software. There is definitely some truth to this. Robotics deals with atoms, not just bits, and advances are harder in that realm. However, things have changed in the world of bits too.

For many decades, we were used to computers constantly getting faster. We often refer to this as Moore's Law, though for much of that time Dennard scaling played a significant role. We got used to living in a world where clock speed (and general processing speed) would double every 1.5 to 2 years. Up until the mid-2000s, your programs just got faster with no effort on your part. Then the "free lunch" ended and clock speeds topped out at a few GHz. However, transistor counts were still growing exponentially, so we started to get multi-core chips. Speed continued to rise, but only for applications that could make use of parallelism.
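To put numbers on that doubling rate: compounding every 1.5 to 2 years works out to somewhere between roughly 30x and 100x over a single decade. Here is a tiny sketch of that arithmetic (the object name and numbers are just my illustration):

```scala
// Rough illustration of what a 1.5-to-2-year doubling period compounds to.
object DoublingGrowth extends App {
  def speedup(years: Double, doublingPeriod: Double): Double =
    math.pow(2.0, years / doublingPeriod)

  println(f"Doubling every 2.0 years, over 10 years: ${speedup(10, 2.0)}%.0fx") // ~32x
  println(f"Doubling every 1.5 years, over 10 years: ${speedup(10, 1.5)}%.0fx") // ~102x
}
```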

Death of Moore’s Law

The end of Moore's Law has been looming on the horizon for quite some time. We've known for a while that sometime in the mid-2020s transistor features would shrink to nearly the size of a single atom, and it just wouldn't be possible to push them further. I remember holding out hope that materials like graphene or carbon nanotubes would arrive as viable replacements for silicon and continue the scaling in computing power. That hasn't happened yet.

Chipmakers are inventive. They have managed to do various things to keep making computers faster, but at great expense to them and to software developers. New chip fabs have become astronomically expensive and programming for GPUs and other special-purpose processors is much harder than programming for a single-core CPU. “Moore’s Law”, in the sense of computers getting exponentially faster, isn’t dead yet, but it is dying, and it has been for a while. It wasn’t that long ago that I remember needing to upgrade computers every three years because a three-year-old computer just seemed slow. I don’t feel that way anymore. Most companies have increased the length of their upgrade cycles. One might have argued that this was because their applications didn’t need more speed, but I think it is really because the computers just aren’t getting faster at the same rate.

The first indication I saw of this was from Top500.org, which maintains a list of the 500 fastest supercomputers in the world. Twice each year they release an updated list, and it comes with a poster. This poster includes a speed plot with three curves: one for the fastest computer in the world, one for the 500th fastest, and one for the sum of all 500. The plots are semi-log, and up until June 2021 they also included exponential fits to the growth. In November 2021, they stopped adding those fits, because the fits aren't valid anymore. If you get the June 2021 poster (I'm not reproducing it here because they require a sign-on and it is their IP), you will see that the actual speed of these computers has been falling below the fit for a number of years now. At least in the supercomputer realm, the exponential scaling that was so well maintained for the first 20 years of the Top500 list is now failing.
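For anyone who wants to poke at this kind of data themselves, the fit in question is nothing exotic: an exponential trend on a semi-log plot is just a straight line, so you fit log(performance) against the year with ordinary least squares. Here is a rough sketch; the (year, TFlops) points are only illustrative stand-ins, since the real lists live at Top500.org:

```scala
// A minimal sketch (my code, not Top500's) of fitting an exponential trend:
// fit log(performance) linearly against year, i.e. performance ≈ exp(a*year + b).
object ExponentialTrend extends App {
  // Illustrative (year, performance-in-TFlops) pairs, not the official data.
  val data = Seq((1995.0, 0.17), (2000.0, 4.9), (2005.0, 280.0), (2010.0, 2570.0))

  val n     = data.size.toDouble
  val xs    = data.map(_._1)
  val ys    = data.map(p => math.log(p._2)) // work in log space
  val xMean = xs.sum / n
  val yMean = ys.sum / n
  val slope = xs.zip(ys).map { case (x, y) => (x - xMean) * (y - yMean) }.sum /
              xs.map(x => (x - xMean) * (x - xMean)).sum
  val intercept = yMean - slope * xMean

  def predicted(year: Double): Double = math.exp(slope * year + intercept)
  println(f"Implied doubling time: ${math.log(2) / slope}%.2f years")
  println(f"Trend extrapolated to 2020: ${predicted(2020)}%.0f TFlops")
}
```

If the machines on the list keep landing below that extrapolation year after year, the fit has stopped describing reality, which is exactly what the recent posters show.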

Only Supercomputers?

Of course, that is for supercomputers. They are big and expensive. Perhaps the more mundane devices we use on a regular basis are faring better. To test this hypothesis, I turned to the CPU benchmarks at Spec.org. They have CPU benchmarks going back to 1995, and many computer manufacturers submit results to them to demonstrate the performance of their hardware.

There have been four different CPU benchmark suites used over this time: CPU95, CPU2000, CPU2006, and CPU2017. To look at the performance of regular CPUs over the full time span of these benchmarks, I stitched them all together and made the following plot. The plot also includes a line for the exponential fit of all the data through 2010.

There are some interesting details to how I made this plot, but I'm leaving those to the end for the interested reader. What really matters, and what this plot makes really obvious, is that around 2010–2012 the rate of growth in CPU speed falls away from the earlier exponential trend. Indeed, we are getting close to the point where our CPUs are 100x slower than we would have predicted using the trend that held through 2010.

Predictions Gone Awry

So imagine you are someone back in 2010 making predictions about future tech. (I happened to be one of those people.) You are well aware that Moore's Law is going to die around 2025, but you probably expect progress to continue as normal until at least 2020. As it happens, though, just as you are making those predictions in 2010, the rate of growth in computing power is beginning to level off. As a result, you would have expected that today our computers would be 10–100x faster than they actually are.

It is hard to overstate how much more you can do with a computer that is 100x faster than the one you have. This is especially true of real-time systems, where how quickly the machine responds to a stimulus is very important. Think of those self-driving vehicles. They need to respond in a fraction of a second to what is happening around them. Because of this gap between where we thought we would be today and where we actually are, a decision we expected to take 100 ms now takes 1–10 s on the devices available to us (that 100 ms multiplied by the 10–100x shortfall).

This inevitably applies to more than just autonomous cars. How many of the tech predictions from around 2010 have fallen flat simply because the rate of growth in computing power hasn’t kept up with the historical trend?

Does Computing Power Matter?

Of course, the argument can be made that what we lack isn't computing power, but the right logic to produce intelligent behavior. That might be true in terms of running models, but running models is fast compared to training them. If the people working on this had access to machines that could train models 100x faster, they could test out hypotheses much more quickly. In my own work as a software developer, I find it is not uncommon to be blocked by long-running processes. Everyone should be familiar with the XKCD comic on compiling. For those who use dynamic languages without compilers, "compiling" can be replaced with "running tests". Either way, developing software at scale often includes steps that block the developer simply because our hardware can't run them quickly.

I’ve also seen people argue that Moore’s Law isn’t really dead. This is generally based on performance benchmarks for things like GPUs, TPUs, or other special-purpose processors. It is true that my plot above is specific to CPUs and it doesn’t apply to these other types of chips. However, I would argue that there are two problems with this. First, CPUs still matter. Second, the slowdown in transistor shrink applies to those other processors as well.

Why do I think that CPUs still matter? For one thing, they are still where most software runs. In particular, they are still the processor that most development work happens on. The compilers and unit tests I mentioned above run on CPUs; they don't do much on GPUs and TPUs. Part of this reflects a broader issue: only some tasks benefit from special-purpose processors, and compilers and other developer tools typically aren't in that group.

In addition, anyone who has tried writing code for GPUs or other special-purpose processors knows that they are generally much harder to develop for. So if the only way your application gains speed is to move it to a GPU, you can expect the productivity of your developers to drop. There are projects underway that might help with this somewhat, but I suspect that some of it is inherent to those platforms. GPUs are fast on tasks that can be broken up so that the same logic runs over a large number of pieces of data, and getting your software into that structure is often hard. The fact that tools like CUDA and OpenCL tend to mirror C/C++ also causes friction now, but I am not willing to say that will always be the case.

Implications for the Future

So what does this mean for the future? Should we expect more predictions of what we can do with technology to fall flat? A lot depends on what happens with that plot I put above. It looks to me like it is continuing to flatten out. Unless there is a major breakthrough in materials or 3D fabrication techniques, I fear that the curve is going to continue to flatten and the only speed improvements of note will be in special-purpose hardware. If that happens, a lot of the developments that people have predicted for the future will fail to materialize, at least until such a breakthrough occurs.

One potentially interesting side effect of this, though, is that we might actually see more effort put into languages and tools that improve performance without killing developer productivity. When exponential performance gains came for free, it was easy for people to switch to languages and tools that ran slower. This started with the move to Java and C#, but really came into full swing with the rise of scripting languages like Python and JavaScript. Java got a really bad reputation for being slow early on, but that reputation is largely undeserved today. The current JVM does a great job of optimizing, and on most workloads it is only ~2x slower than hand-tuned C/C++. In contrast, Python is generally 10x slower than Java, but people use it for a lot of things because it runs fast enough. I feel compelled to point out that in a world where most things run in the cloud on "rented" machines, longer execution times mean paying more money, even when they seem "fast enough".

The question is: if hardware doesn't get faster, will developers start paying more attention to the performance of their languages and tools? Will we see more effort put into tools that let special-purpose processors be programmed well from languages that provide higher productivity?

It is possible that we are seeing some of the first steps in this direction. Rust, in particular, stands out to me as a language that maintains programmer productivity, through a type system that provides strong safety guarantees, while delivering performance that matches far more error-prone C and C++. I can also imagine more work going into systems that allow declarative, functional code to be compiled into performant GPU code. I would love to have my map and filter calls in Scala compiled to efficient GPU code on large datasets. Such advances might pair well with other changes in chip technology that have been talked about for years but have never completely materialized, like putting memory and different types of processing units closer together in a fabric of processors instead of the more traditional von Neumann architecture.
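To make the map-and-filter wish concrete, here is the sort of plain Scala pipeline I have in mind; everything in it (the Particle class, the numbers) is just an illustration. Today this runs on the CPU, but its shape, the same independent logic applied to every element, is exactly what a GPU backend could exploit:

```scala
// A declarative collection pipeline of the kind I would love to see
// compiled to efficient GPU code. As written, it runs on the CPU.
object MapFilterPipeline extends App {
  final case class Particle(x: Double, y: Double, vx: Double, vy: Double)

  // An illustrative large dataset.
  val particles = Vector.tabulate(1000000) { i =>
    Particle(i * 1e-3, i * 2e-3, math.sin(i * 0.1), math.cos(i * 0.1))
  }

  val dt = 0.01
  val stepped = particles
    .map(p => p.copy(x = p.x + p.vx * dt, y = p.y + p.vy * dt)) // same update for every element
    .filter(p => p.x * p.x + p.y * p.y < 1.0e6)                 // independent per-element test

  println(s"Particles remaining: ${stepped.size}")
}
```

Whether the lowering happens through a library, a compiler plugin, or something else, the point is that the declarative structure already exposes the parallelism.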

Personally, I’d love to see a move back to more performant, compiled languages. I truly look forward to some of the changes we might see in how we write software with increased pressure on not wasting so many clock cycles.

SPEC Methodology (Appendix)

On the topic of the chart above, there are a few things that should be noted. First, I'm using the FP Rate results, which measure floating-point throughput when multiple copies of the benchmarks run in parallel. Many of these results come from systems that not only have many cores, but also many chips. To normalize for this, I divided by the number of chips, as what I'm interested in is really performance per chip, not total system performance. Those multi-chip systems are generally well out of the price range of most consumers anyway.

Second, every time the SPEC benchmarks have been updated, the baseline score has been adjusted. In order to get a single, fairly smooth curve, I had to adjust for this. I normalized all four data sets to the CPU2000 values by scaling the series before and after it so that the average of the last 100 data points of one series matches the average of the first 100 data points of the next. While this isn't the most rigorous approach I could have taken, I think it works well for my purposes here. Looking at the plot, one is unlikely to notice that there are actually four different series present.
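For completeness, here is a minimal sketch of that stitching step, with made-up values standing in for the real SPEC scores (the per-chip division happens before this point):

```scala
// A minimal sketch of stitching two adjacent benchmark series together.
// Each series is assumed to be sorted by date and already divided by chip count.
object StitchSeries extends App {
  // Made-up per-chip scores for two adjacent benchmark generations.
  val olderSeries = Vector.tabulate(300)(i => 10.0 + 0.05 * i)
  val newerSeries = Vector.tabulate(300)(i => 2.0 + 0.01 * i)

  // Scale the newer series so the average of its first `window` points matches
  // the average of the last `window` points of the older series.
  def stitch(older: Vector[Double], newer: Vector[Double], window: Int = 100): Vector[Double] = {
    val oldTail = older.takeRight(window)
    val newHead = newer.take(window)
    val factor  = (oldTail.sum / oldTail.size) / (newHead.sum / newHead.size)
    older ++ newer.map(_ * factor)
  }

  println(s"Stitched series length: ${stitch(olderSeries, newerSeries).size}") // 600
}
```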


Mark Lewis

Computer Science Professor, Planetary Rings Simulator, Scala Zealot