Junk (filter) science

This is a mirror of a PLOS blogpost. Formatting is usually nicer there.

This is part 4 of a series of introductory posts about the principles of climate modelling. Others in the series: 1 | 2 | 3

In the previous post I said there will always be limits to our scientific understanding and computing power, which means that “all models are wrong.” But it’s not as pessimistic as this quote from George Box seems, because there’s a second half: “… but some are useful.” A model doesn’t have to be perfect to be useful. The hard part is assessing whether a model is a good tool for the job. So the question for this post is:

How do we assess the usefulness of a climate model?

I’ll begin with another question: what does a spam (junk email) filter have in common with state-of-the-art predictions of climate change?

[Image: Wall of SPAM. Modified from a photo by freezelight]

The answer is they both improve with “Bayesian learning”. Here is a photo of the grave of the Reverend Thomas Bayes, which I took after a meeting at the Royal Statistical Society (gratuitous plug of our related new book, “Risk and Uncertainty Assessment for Natural Hazards”):

[Image: Bayes' grave]

Bayesian learning starts with a first guess of a probability. A junk email filter has a first guess of the probability of whether an email is spam or not, based on keywords I won’t repeat here. Then you make some observations, by clicking “Junk” or “Not Junk” for different emails. The filter combines the observations with the first guess to make a better prediction. Over time, a spam filter gets better at predicting the probability that an email is spam: it learns.

The filter combines the first guess and observations using a simple mathematical equation called Bayes’ theorem. This describes how you calculate a “conditional probability”, a probability of one thing given something else. Here this is the probability that a new email is spam, given your observations of previous emails. The initial guess is called the “prior” (first) probability, and the new guess after comparing with observations is called the “posterior” (afterwards) probability.
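
To make the equation concrete, here is a minimal sketch of one Bayesian update for a spam filter. The function name and all the numbers are made up for illustration; a real filter combines many keywords and is considerably more involved.

```python
def bayes_update(prior, p_obs_if_spam, p_obs_if_not_spam):
    """Return P(spam | observation) using Bayes' theorem."""
    # Total probability of seeing the observation at all.
    evidence = p_obs_if_spam * prior + p_obs_if_not_spam * (1.0 - prior)
    return p_obs_if_spam * prior / evidence


# Hypothetical numbers, chosen only for illustration.
prior = 0.5                # first guess: half of all email is spam
p_word_if_spam = 0.8       # the keyword appears in 80% of spam
p_word_if_not_spam = 0.1   # ...and in 10% of genuine email

# Posterior after seeing the keyword once.
posterior = bayes_update(prior, p_word_if_spam, p_word_if_not_spam)

# "Learning": the posterior becomes the prior for the next observation.
posterior_again = bayes_update(posterior, p_word_if_spam, p_word_if_not_spam)

print(f"after one keyword:  {posterior:.3f}")
print(f"after two keywords: {posterior_again:.3f}")
```

With these invented numbers the first guess of 0.5 rises to about 0.89 after one keyword, and higher again after a second.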

The same equation is used in many state-of-the-art climate predictions. We use a climate model to make a first guess at the probability of future temperature changes. One of the most common approaches for this is to make predictions using many different plausible values of the model parameters (control dials): each “version” of the model gives a slightly different prediction, which we count up to make a probability distribution. Ideally we would compare this initial guess with observations, but unfortunately these aren’t available without (a) waiting a long time, or (b) inventing a time machine. Instead, we also use the climate model to “predict” something we already know, to make a first guess at the probability of something in the past, such as temperature changes from the year 1850 to the present. All the predictions of the future have a twin “prediction of the past”.
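
The "count up" step can be sketched in a few lines. Everything here is hypothetical: `toy_future_warming` is a one-line stand-in for a climate model, and the range of dial settings is invented. The point is only to show how many model versions turn into a probability distribution.

```python
from collections import Counter


def toy_future_warming(dial):
    """Hypothetical stand-in for one climate model run (degrees C)."""
    return 0.8 * dial + 0.5


# Plausible settings for a single "control dial" (made-up range).
dial_settings = [1.0 + 0.25 * i for i in range(17)]   # 1.0, 1.25, ..., 5.0

# One prediction per model version.
predictions = [toy_future_warming(d) for d in dial_settings]

# Count how many versions land in each one-degree bin...
counts = Counter(int(p) for p in predictions)

# ...and turn the counts into probabilities: the first guess (prior).
prior = {f"{lo}-{lo + 1} C": n / len(predictions)
         for lo, n in sorted(counts.items())}

for temperature_bin, probability in prior.items():
    print(f"{temperature_bin}: {probability:.2f}")
```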

We take observations of past temperature changes – weather records – and combine them with the first guess from the climate model using Bayes’ theorem. The way this works is that we test which versions of the model from the first guess (prior probability) of the past are most like the observations: which are the most useful. We then apply those “lessons” by giving these the most prominence, the greatest weight, in our new prediction (posterior probability) of the future. This doesn’t guarantee our prediction will be correct, but it does mean it will be better because it uses evidence we have about the past.
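
Here is a minimal sketch of that weighting step, under loud assumptions: the toy model, the "observed" warming, its error, and the Gaussian form of the comparison with observations are all invented for illustration, and are not the method of any particular study.

```python
import math


def toy_model(dial):
    """Hypothetical model version: (past warming, future warming) in degrees C."""
    return 0.3 * dial, 1.0 * dial


def weighted_mean_and_spread(values, weights):
    """Weighted mean and standard deviation of a list of values."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(variance)


dial_settings = [1.0 + 0.1 * i for i in range(41)]    # made-up range 1.0 ... 5.0
runs = [toy_model(d) for d in dial_settings]
futures = [future for _, future in runs]

observed_past, obs_error = 0.9, 0.15                  # made-up "weather record"

# First guess: every model version counts equally.
prior_weights = [1.0] * len(runs)

# How well does each version's "prediction of the past" match the observations?
likelihood = [math.exp(-0.5 * ((past - observed_past) / obs_error) ** 2)
              for past, _ in runs]

# Bayes' theorem: posterior weight is proportional to prior weight x likelihood.
posterior_weights = [p * l for p, l in zip(prior_weights, likelihood)]

prior_mean, prior_spread = weighted_mean_and_spread(futures, prior_weights)
post_mean, post_spread = weighted_mean_and_spread(futures, posterior_weights)

print(f"prior:     {prior_mean:.2f} +/- {prior_spread:.2f} C")
print(f"posterior: {post_mean:.2f} +/- {post_spread:.2f} C")
```

In this toy setup the central estimate barely moves, but the spread shrinks, because versions that reproduce the past poorly get very little weight.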

Here’s a graph of two predictions of the probability of a future temperature change (for our purposes it doesn’t matter what) from the UK Climate Projections:

The red curve (prior) is the first guess, made by trying different parameter values in a climate model. The predicted most probable value is a warming of about three degrees Celsius. After including evidence from observations with Bayes’ theorem, the prediction is updated to give the dark blue curve (posterior). In this example the most probable temperature change is the same, but the narrower shape reflects a higher predicted probability for that value.

Probability in this Bayesian approach means “belief” about the most probable thing to happen. That sounds strange, because we think of science as objective. One way to think about it is the probability of something happening in the future versus the probability of something that happened in the past. In the coin flipping test, three heads came up out of four. That’s the past probability, the frequency of how often it happened. What about the next coin toss? Based on the available evidence – if you don’t think the coin is biased, and you don’t think I’m trying to bias the toss – you might predict that the probability of another head is 50%. That’s your belief about what is most probable, given the available evidence.
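
The coin example can be written down as a small calculation. This sketch assumes a Beta prior on the chance of heads, which is a standard textbook choice rather than anything from this post; the "strength" of each prior is invented to contrast someone who trusts the coin with someone who has no opinion.

```python
def updated_belief(prior_heads, prior_tails, heads, tails):
    """Mean of a Beta(prior_heads, prior_tails) prior after observing some tosses."""
    return (prior_heads + heads) / (prior_heads + prior_tails + heads + tails)


heads, tails = 3, 1

# Past frequency: how often heads actually came up.
frequency = heads / (heads + tails)

# Belief about the next toss if you strongly trust the coin is fair.
trusting = updated_belief(50, 50, heads, tails)

# Belief about the next toss if you start with no opinion at all.
open_minded = updated_belief(1, 1, heads, tails)

print(f"frequency so far:      {frequency:.2f}")
print(f"belief (trusts coin):  {trusting:.2f}")
print(f"belief (no opinion):   {open_minded:.2f}")
```

Four tosses hardly shift a strong belief in a fair coin, while someone with no prior opinion moves much closer to the observed frequency.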

My use of the word belief might trigger accusations that climate predictions are a matter of faith. But Bayes’ theorem and the interpretation of “probability” as “belief” are not only used in many other areas of science, they are thought by some to describe the entire scientific method. Scientists make a first guess about an uncertain world, collect evidence, and combine these together to update their understanding and predictions. There’s even evidence to suggest that human brains are Bayesian: that we use Bayesian learning when we process information and respond to it.

The next post will be the last in the introductory series on big questions in climate modelling: how can we predict our future?



  1. JT

    Speaking of prior probabilities: there is a no-feedbacks estimate of climate sensitivity, calculated from first principles, of about 1.2 C per doubling of CO2. I have seen much argument about uniform prior probability estimates of climate sensitivity which cover a range of 2-10 C. Why would the first-principles, no-feedbacks estimate of 1.2 C with an error estimate of, say, +/- 0.5 C not be the obvious Bayesian prior?

    • Arthur Dent

      Possibly because it doesn’t give the “right” answer. Bayesian statistics operate on the underlying assumption that you are interested in understanding the situation, not attempting to justify a preconceived idea. The fact that you need to state a “prior” can sometimes become an opportunity for self-fulfilling prophecies. This is not necessarily deliberate but can arise out of confirmation bias.

      • Tamsin Edwards

        A very late reply to your comments. Which study has a prior that only covers 2-10 degC? It’s true 0-10 degC has been used a lot, but I don’t recall any with zero probability for 0-2 degC.

        In the ideal situation, you would set a wide prior then the observations would “decide” what the true value was. But we usually find the observations don’t have as strong an effect as we would like, so it’s true we have to be careful what we choose for a prior. Some climate scientists do think 10 degC is too high: James Annan is one.

        It’s worth pointing out here that there are two types of prior on climate sensitivity. The first you could consider “direct”, where someone considers the possible range/distribution of values for CS. This is often a uniform (i.e. flat) prior across a particular range, as you say. Nic Lewis has recently looked at an alternative, “objective priors”. But this is only possible for climate models in which you can set the value of climate sensitivity. These tend to be the simple to medium complexity models (e.g. “EMICs”, Earth system Models of Intermediate Complexity).

        The second kind is “indirect”, and emerges from prior distributions on other parameters. For example, in a complex global climate model (GCM) – such as HadCM3, used by the UK Climate Projections and climateprediction.net – there is often no climate sensitivity parameter. Priors are set for parameters that relate to clouds, the land surface, and so on. In the case of UKCP09 the priors for continuous parameters were triangular (maximum probability at the tuned value, falling to zero at the min+max values thought plausible by the model developers). There were also discrete and switch parameters, where probabilities were assigned to the 2 or more possible values.

        As I understand it, it’s much harder to define objective priors for many parameters at once. So Nic Lewis’ approach is good for simpler models where you can choose the climate sensitivity of the model, but is harder or impossible to use when the climate sensitivity is an emergent property of the model.

  2. MarkB

    “So the question for this post is:

    How do we assess the usefulness of a climate model?”

    Excuse me, but I don’t see how you’ve answered your own question. You gave a nice little intro to Bayesian modelling, but I see nothing referencing ‘usefulness.’ Presumably, the usefulness of a model is entirely based on its distilling of reality. Not past reality, which can be tuned to, but future reality.

    Said another way, if I granted you the ability to hindcast the 20th century without error, I would still be skeptical of your ability to forecast the 21st until I see that you’ve done it. Planetary climate functions on a geological time scale, of which 100 years is not the blink of an eye. I see no reason why the recent increase in planetary temps should not be an emergent property of changes that happen over a thousand year scale. Which renders any model-based predictions of future climate hypotheses, and hypotheses only. Untested, unproven, and nothing more than a best guess. And best guesses, while certainly part of the scientific process, are not ‘science.’

    I fully appreciate climate models as heuristic devices. But heuristic devices do not ‘settle’ science. Thus, I can only see real-world multi-decade scale forecasting as hubris. If GCMs were presented as ‘our best, educated guess,’ I would have no problem with them.

    • Tamsin Edwards

      Another late reply. MarkB, I agree that I haven’t answered “How do we assess usefulness?”. I was hoping no-one would point that out 😉

      I aimed to give an introduction to assessing model success for the past: but of course this is necessary, not sufficient, for model success in the future. To me usefulness conflates two things: not only success (small bias) but also uncertainty (small variance). If we had infinite spread in our future projections – the Earth might do anything – we would be confident of success but not very useful. So usefulness is not only about whether our uncertainty estimates incorporate everything, but also whether they are small enough for us to distinguish between different actions (mitigation, geoengineering etc). There’s an example of this in the following post I just put up (“Possible Futures”).

      You said “I would still be skeptical of your ability to forecast the 21st [century] until I see that you’ve done it”. I think any rational climate scientist would say the same. We know there may well be deep uncertainties about the response of the climate. You may know that I also profoundly disagree with anyone that says any part of science is “settled” (see my first ever post). Every day we might discover something new that changes our understanding.

      But there are two points to note. First, we are making statements about our best, educated guess. If we don’t make that clear to the public and policymakers, if we present our work as “actual simulations of the future” rather than “best estimates based on all the available evidence and our current understanding” then we are failing. I believe that the public and in particular policymakers do understand this, but we should be vigilant about repeating the message.

      Second, deep uncertainty can go both ways. There may be aspects of climate change we are underestimating, and others we are overestimating. We are trying to make risk assessments, just as people do for volcanic eruptions or heart attacks. The actual hazard probability may be less or more than we estimate. The risk managers – policymakers, businesses, and public – need to decide what to them is an acceptable level of risk, given the current uncertainty about that risk.

  3. Rob Burton

    The formatting is much better here than at the other website.

    “This doesn’t guarantee our prediction will be correct, but it does mean it will be better because it uses evidence we have about the past.”

    I don’t see how that follows naturally. The way it is written it sounds like you could apply the same method to the stock market of the past to predict the future market, for example.

    • Tamsin Edwards

      Hi Rob. Personally I like this template too, and I haven’t got the hang of image sizes at PLOS. I just meant that I wrote it there and cut-and-pasted it here. I’ve decided to stop mirroring though, sorry!

      I meant that sentence in the sense of “using all the information we have is better than only using part of it”.

      But anyway I am going to partially agree with you, and partially disagree.

      I disagree because the stock market is different. It depends on the vagaries of human decisions, which do not behave according to universal physical laws. We are trying to predict physical quantities that are universal, so the past is a guide to the future.

      But on the other hand I agree because:
      (a) we are not only predicting physical quantities through laws (such as conservation laws) that are universal, we are also predicting more complex beasts (literally) such as the biosphere;
      (b) we are representing parts of the whole thing with these aggregated parameterisations, and we can’t guarantee those parameterisations work for the future because we haven’t got complete observations of the same situation occurring in the past.

  4. geronimo

    Tamsin, I know from personal experience that when you’re too near the wall you can’t see the cracks in the edifice. We will never be able to foretell the future using models; if we ever do, the whole edifice of human civilisation will collapse on itself. Without the use of models I have personally made several forecasts as to the future. The one I’m proudest of is that I forecast “Coronation Street” wouldn’t go beyond its six-week pilot.

    I’m enjoying the blog by the way, and hope you’re all right in the new home.

  5. matthu

    Your explanation about Bayesian models moving the model from the red (more widely dispersed) curve to the tighter blue curve makes sense. But it is not what has been happening in practice, is it?

    Why is it that in climate science we have every possible outcome under the sun being consistent with climate science, even if they hadn’t been considered previously, and all backed up with peer-reviewed research (winters will get warmer/colder, snow will get more abundant/disappear completely, summers will get drier/wetter)?

    What has been happening is that the curve has been moving from the blue curve to the red curve.

  6. Evil Denier

    Given the ridiculously low ab initio R² values for Carbon Dioxide, shouldn’t modellers (that means you, Tamsin) try models that don’t force (!) Carbon Dioxide to be a major contributor to a rise in temperature? I’ve seen much better correlations to sunspots, the sum/integral or lagged values thereof, 60-year sine waves &c.
    Yes, it’s all a question of Seq, isn’t it? Try a model that has an effective Seq < 2.0 (I suppose ≤ 1.0 is too much to ask?). Ignore the establishment, for once.

    • Evil Denier

      Oh, Tamsin m’dear, remember (when it comes to Seq) the Millikan Effect. You wouldn’t want to be so remembered …….. Would you?

    • Evil Denier

      “What if climate change appears to be just mainly a multi-decadal natural fluctuation? They’ll kill us probably.” © Phil Jones (eMail).
      Lüdecke et al. (2013) Multi-periodic climate dynamics: spectral analysis of long-term instrumental and proxy temperature records, Clim. Past, 9, p 447-452. doi:10.5194/cp-9-447-2013.
      From the paper (oh, it’s peer reviewed – the ‘Team’ would probably denigrate the journal/the editor: so much for Science!):
      “For clarity we note that the reconstruction is not to be confused with a parameter fit. All Fourier components are fixed by the Fourier transform in amplitude and phase, so that the reconstruction involves no free (fitted) parameters.”
      John von Neumann: “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.”
      How many do GCMs (AOGCMs – whatever) have?
      Can give you a dozen papers. Better fit – smaller # of parameters.

    • Evil Denier

      A comment at WUWT: (not by me!)
      DirkH says:
      April 30, 2013 at 4:49 pm
      “Well basically every person who wants to become a climate modeler these days will probably get a nice warm place to go to during office hours, will be paid pretty well by a Green government; and otherwise achieve exactly nothing in his life. From time to time he will have the opportunity to pontificate in state-owned media about the terrible future that awaits us all. If that’s your cup of tea, go for it. Maybe you can also land a nice marketing deal with Panasonic.”
      ‘Nuff sed. (forgive the [apparent] sexism – DirkH obviously hasn’t been house-trained)

  7. AngusPangus

    I’m sorry, Tamsin, but I must object to your statement:

    “This doesn’t guarantee our prediction will be correct, but it does mean it will be better because it uses evidence we have about the past”

    It does not seem to me that one follows from the other at all. You contend that by tuning the model to match past climate, it follows naturally that the model is “better” because, Hey Presto!, with tuning your model can now produce a squiggle which roughly matches past climate.

    On the contrary, I would contend that by tuning your model to fit the past, your model will be (a) better, or (b) worse, or (c) neither better nor worse, at predicting the future, and you have no way of knowing which case applies.

    Look at the models in the IPCC reports. All of them do a reasonable job of hindcasting, no? (this must be the case or else they would not have made it into the IPCC reports). However, each of them produces a different amount of warming from a doubling of CO2 – it’s a long time since I’ve looked at this, but let’s say from 1.5C up to maybe 8C. Now let’s apply your assertion to the models that produce the least warming and the most warming, respectively.

    According to you, because these two models have “learned from the past”, as it were, they are BOTH giving us a “better” prediction of the future. And yet it is obvious that one or both MUST be hopelessly wrong, and basing a policy response on the wrong one would produce disastrous consequences if the other one (or another one) turned out to be correct.

    How do we know which one, the low one or the high one is more likely to be the correct one? We don’t know, and the fact that both have been trained by hindcasting TELLS US NOTHING USEFUL ABOUT THEIR ABILITY TO PREDICT THE FUTURE.

    You cannot simply mash all of the models together to produce an average of the models and pretend that that tells you something. It doesn’t, necessarily.

    This is not to say that I think that modelling per se is useless; I don’t think that at all. What I am challenging is the notion that you can say that you have got a “better” model simply because you’ve managed to replicate the past. That cannot be right. Your model needs to demonstrate a track record of making sound predictions of things that have not yet happened before you can say that it is a better, good, or useful model (even then, as economic models have demonstrated, your model may still turn out to be dangerously wrong). This is something that climate models have, so far, singularly failed to do.

    Anyway, thanks for the effort that you put into your blog, which is interesting to read.