Forecasting s-curves is hard

S-curves (or sigmoid functions) are commonly used to model the evolution of social or biological systems over time [1]. These functions start with exponential growth, then increase roughly linearly, and finally level off (so they end up looking like a wonky s). Many things that we think of as exponential will actually follow an s-curve (otherwise the system would grow to infinity). One famous example is the adoption of a new technology. The graph below shows the percentage of US adults who own a smartphone over time, with a best-fit s-curve superimposed on top. In this case the exponential growth occurs because of the way publicity and supply are rolled out. However, there are only a limited number of potential consumers (some of whom will never get a smartphone), and so the growth gradually slows to zero.

US smartphone ownership [2]

Another example, and the reason that these curves have been back in the news, is the spread of disease. In this case the exponential growth occurs while the virus is new, when most people encountering it will not have developed immunity. The level-off occurs when the virus stops encountering people without immunity (either due to ‘herd immunity’ or isolation of those infected). The graph below shows the number of deaths in China from the SARS outbreak in 2003, again with a best-fit s-curve.

Deaths due to SARS in China [3]

S-curves have only three parameters, and so it is perhaps impressive that they fit such a variety of systems so well. Broadly, the three parameters describe the initial growth rate, the level-off rate, and the value at which the curve levels off. Therefore, if you can estimate these three numbers, then you have the trend curve. Many of us will have learnt in school that if there are three parameters to be found, you need three data points to define the function. This would suggest that you could perfectly predict the level-off point based on only three observations (spoiler: you can’t).
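
The post doesn’t spell out the exact parameterisation it uses, but a common three-parameter form is the logistic function. A minimal sketch in Python (the names ‘a’, ‘b’ and ‘c’ are my own labels, not the post’s):

```python
import numpy as np

def s_curve(t, a, b, c):
    """Three-parameter logistic: 'a' sets the growth rate,
    'b' the midpoint, and 'c' the value it levels off at."""
    return c / (1.0 + np.exp(-a * (t - b)))
```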

In reality, while we can say that the overall trend of the data is likely to fit some s-curve, the individual points will not all lie along it. This can be seen in both of the previous examples. This discrepancy is often described as ‘modelling error’, which comprises both errors in the measurement of the data and the fact that the s-curve model is fundamentally wrong. To quote George Box: “all models are wrong, but some are useful”.

Intuitively, it makes sense that it should not be possible to forecast the curve from the early data; to assume otherwise means believing that we can’t affect the outcome. However, in my experience “intuition” and “mathematics” can often be hard to reconcile. Therefore, I decided to investigate how much the “best fit s-curve” changes as more data becomes available. Below is an s-curve that I chose at random. The points shown are “noisy observations” – which is the maths-y way of saying ‘points from the curve with a random amount of error applied’.
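
As a sketch of that setup (every number here is illustrative – these are not the parameters behind the post’s figure):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(100.0)

# An arbitrary 'true' s-curve: growth rate 0.1, midpoint 50, level-off 100.
true = 100.0 / (1.0 + np.exp(-0.1 * (t - 50.0)))

# 'Noisy observations': the curve plus zero-mean Gaussian error.
observations = true + rng.normal(0.0, 3.0, t.shape)
```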

In this case, the s-curve model is a perfect fit – I have literally generated the data from an s-curve. This means that if there were zero error then we would only need three points to find the curve. All this is to say that this example is idealised – in reality there is unlikely to be a curve that fits the data so well. Below is an animation showing the best-fit s-curve (found using a least squares optimisation) as more data becomes available.
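
The optimiser used for the animation isn’t shared (see the edit at the end of the post), so purely as a stand-in, here is how such a least squares fit could be sketched with scipy’s curve_fit, refitting on the first n points to mimic one animation frame per n:

```python
import numpy as np
from scipy.optimize import curve_fit

def s_curve(t, a, b, c):
    return c / (1.0 + np.exp(-a * (t - b)))

def fit_first_n(t, y, n):
    """Best-fit s-curve using only the first n observations."""
    p0 = [0.05, t[:n].mean(), y[:n].max()]  # rough starting guess
    popt, _ = curve_fit(s_curve, t[:n], y[:n], p0=p0, maxfev=10000)
    return popt

# One animation 'frame' per n, using t and observations from the snippet above:
# for n in range(5, len(t) + 1):
#     a, b, c = fit_first_n(t, observations, n)
```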

It may not be surprising that in the exponential growth phase the estimate is very bad, but even in the linear phase (when 40+ points are available) the correct curve has not been found. In fact, it is only once the data starts to level off that the correct s-curve is found. This is especially unhelpful when you consider that it can be quite hard to tell which part of the curve you are on; hindsight is 20/20.

This is not to say that it is impossible to model or predict s-curves, only that contextual information about the system you are modelling is likely to be required. For biological systems, are there physical parameters which govern the initial growth rate? For technological changes, can the final level-off be reasonably estimated? This information is application specific. In other words, data enthusiasts (such as myself) should leave the modelling to the professionals.

Edit: 20/04/20
I’ve had several requests to share the code used to generate the animation. The optimisation I used is part of another project which I can’t share, but I have uploaded a script which should reproduce the animation here.

References
[1] Nieto et al., “Performance analysis of technology using the S curve model: the case of digital signal processing technologies”, 1998.
[2] Comscore whitepaper, “The 2016 U.S. Mobile App Report”, September 13, 2016.
[3] World Health Organisation, https://www.who.int/csr/sars/country/en/

28 thoughts on “Forecasting s-curves is hard”

    1. Here is a Bayesian analysis, bolstering Constance’s claim:

      Video: https://www.speicherleck.de/iblech/stuff/forecasting-s-curves-is-hard.mp4
      Code: https://github.com/iblech/mathezirkel-kurs/blob/master/herrscher-des-zufalls/s-curve.pl

      Starting with a uniform prior, we perform successive Bayesian updates. In addition to the point estimate obtained by computing the expectation value of the posterior distribution (shown in pink), the distribution itself is rendered (as the white-to-red boxes in the background). To better highlight the variance, s-curves randomly drawn from the posterior distribution are also shown (thin lines).
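
      (The linked script is Perl; purely as an illustration of the idea described above – not the commenter’s implementation – the successive updates on a parameter grid might look like this in Python, with the grids and noise scale chosen arbitrarily:)

      ```python
      import numpy as np

      # Synthetic data (illustrative parameters only).
      rng = np.random.default_rng(0)
      t = np.arange(100.0)
      y = 100.0 / (1.0 + np.exp(-0.1 * (t - 50.0))) + rng.normal(0.0, 3.0, 100)

      # Uniform prior over a coarse grid of the three parameters.
      A, B, C = np.meshgrid(np.linspace(0.02, 0.3, 25),
                            np.linspace(0.0, 100.0, 25),
                            np.linspace(50.0, 150.0, 25), indexing="ij")
      log_post = np.zeros(A.shape)
      sigma = 3.0  # assumed observation noise scale

      # Successive Bayesian updates: one Gaussian likelihood term per observation.
      for t_i, y_i in zip(t, y):
          pred = C / (1.0 + np.exp(-A * (t_i - B)))
          log_post += -0.5 * ((y_i - pred) / sigma) ** 2

      post = np.exp(log_post - log_post.max())
      post /= post.sum()

      # Point estimate: the expectation value of the posterior distribution.
      a_hat, b_hat, c_hat = (post * A).sum(), (post * B).sum(), (post * C).sum()
      ```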


  1. By the way, three known parameter points (sets of coordinates) define a curve if and only if the curve is already known to be a circular arc. An s-curve is right out. Broadly, the number of sampled data points (also sets of coordinates, with noise) required to estimate the parameter values varies with both the number of parameters and the statistical variation of the sampling noise. See ‘combinatorial explosion’ and ‘statistical design of experiments’.


  2. Great post, loved the estimate timelapse visual.

    I’ve been investigating this as well; I’ve created a website to visualize coronavirus growth rates in the United States over time.

    https://flattenthevir.us/

    I want to incorporate s-curve predictions into the site but I don’t know how to calculate these sigmoid parameters and I am having trouble finding resources online. Do you have a few links you can shoot my way to help me figure out how to predict growth/decay rates & the upper bound?


  3. That animation is great, it really highlights the issue with this sort of fit. I had an adviser who described this sort of extrapolation as trying to touch the ceiling with a piece of cooked spaghetti.


  4. Interesting. However, may I ask how you obtained the ‘c’ values of the sigmoid, i.e. the maximum values? They seem to be hard-coded in the opt_param {}?


    1. I obtained them via a grid search for the parameters that minimise the square error on the observed data. Nothing fancy, but computation wasn’t an issue with only three parameters. It’s not included in the code because it relies on code from another project which I’m not able to share, so I hard-coded the ‘optimal’ values for the sake of sharing the animation.
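
      A rough sketch of that kind of grid search (these grid ranges are invented – the ones actually used aren’t shared):

      ```python
      import numpy as np

      def grid_search(t_obs, y_obs):
          """Exhaustively search a parameter grid for the (a, b, c)
          minimising the square error on the observed data."""
          best, best_err = None, np.inf
          for a in np.linspace(0.02, 0.3, 40):
              for b in np.linspace(0.0, 100.0, 40):
                  for c in np.linspace(50.0, 150.0, 40):
                      pred = c / (1.0 + np.exp(-a * (t_obs - b)))
                      err = np.sum((y_obs - pred) ** 2)
                      if err < best_err:
                          best, best_err = (a, b, c), err
          return best
      ```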


  5. You may have generated your data incorrectly. There are points to the right that are lower than points to the left. S-curve data is cumulative; a point can never be lower than a point to its left.


    1. I added zero-mean Gaussian noise to the curve, so yes, some points are lower than the previous one. Not all s-curves are cumulative (e.g. sales during a technology transition); however, for the COVID example perhaps it would have made more sense to apply the noise to the differences between points (e.g. new cases rather than the total). My instinct is that this would make the prediction worse, but it would definitely be interesting to try.
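
      A quick sketch of that alternative (the noise scale is arbitrary); clipping the noisy increments at zero would also keep the cumulative series non-decreasing, addressing the point above:

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      t = np.arange(100.0)
      true = 100.0 / (1.0 + np.exp(-0.1 * (t - 50.0)))

      # Noise applied to the increments ('new cases') rather than the total.
      increments = np.diff(true, prepend=0.0)
      noisy_new = np.clip(increments + rng.normal(0.0, 0.5, t.shape), 0.0, None)
      noisy_total = np.cumsum(noisy_new)  # a non-decreasing cumulative series
      ```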


  6. Thanks for sharing this. I found it in the O’Reilly Data Newsletter. I wonder how the estimates would look if you took two actions:

    1. It seems to me that the random noise is of constant magnitude. How would it look if the noise were proportional to the actual value of the s-curve function?

    2. One considers s-shaped functions when there is an upper bound. In the case of an epidemic, an upper bound is e.g. the total population, though this may still be bigger than the level-off value. How does it look if the algorithm accounts for an upper bound? If the fitted level-off exceeds the bound, the algorithm could simply fix the level-off value to the upper bound and fit the remaining two parameters. (A sketch of both ideas follows below.)
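
    Both changes are easy to mock up. A minimal sketch, assuming an invented upper bound and noise level (none of these numbers come from the post):

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(0)
    t = np.arange(100.0)
    true = 100.0 / (1.0 + np.exp(-0.1 * (t - 50.0)))

    # 1. Noise proportional to the signal rather than of constant magnitude.
    y_prop = true * (1.0 + rng.normal(0.0, 0.05, t.shape))

    # 2. Fix the level-off at a known upper bound and fit only 'a' and 'b'.
    UPPER = 120.0  # e.g. total population; assumed known

    def s_curve_capped(t, a, b):
        return UPPER / (1.0 + np.exp(-a * (t - b)))

    popt, _ = curve_fit(s_curve_capped, t[:40], y_prop[:40],
                        p0=[0.05, 50.0], maxfev=10000)
    ```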

