Limitless possibilities

Mark Maslin and Patrick Austin at University College London have just had a comment published in Nature called “Climate models at their limit?”. This builds on the emerging evidence that the latest, greatest climate predictions, which will be summarised in the next assessment report of the Intergovernmental Panel on Climate Change (IPCC AR5, 2013), are not going to tell us anything too different from the last report (AR4, 2007) and in fact may have larger uncertainty ranges.

I’d like to discuss some of the climate modelling issues they cover. I agree with much of what they say, but not all…

1. Models are always wrong

Why do models have a limited capability to predict the future? First of all, they are not reality….models cannot capture all the factors involved in a natural system, and those that they do capture are often incompletely understood.

A beginning after my own heart! This is the most important starting point for discussing uncertainty about the future.

Climate modellers, like any other modellers, are usually well aware of the limits of their simulators*. The George Box quote after which this blog is named is frequently quoted in climate talks and lectures. But sometimes simulators are implicitly treated as if they were reality: this happens when a climate modeller has made no attempt to quantify how wrong the simulator is, or does not know how to, or does not have the computing power to try out different possibilities, and throws their hands up in the air. Or perhaps their scientific interest is really in testing how the simulator behaves, not in making predictions.

For whatever reason, this important distinction might be temporarily set aside. The danger of this is memorably described by Jonty Rougier and Michel Crucifix**:

One hears “assuming that the simulator is correct” quite frequently in verbal presentations, or perceives the presenter sliding into this mindset. This is so obviously a fallacy that he might as well have said “assuming that the currency of the US is the jam doughnut.”

Models are always wrong, but what is more important is to know how wrong they are: to have a good estimate of the uncertainty about the prediction. Mark and Patrick explain that our uncertainties are so large because climate prediction is a chain of very many links. The results of global simulators are fed into regional simulators (for example, covering only Europe), and the results of these are fed into another set of simulators to predict the impacts of climate change on sea level, or crops, or humans. At each stage in the chain the range of possibilities branches out like a tree: there are many global and regional climate simulators, and several different simulators of impacts, and each simulator may be used to make multiple predictions if they have parameters (which can be thought of as “control dials”) for which the best settings are not known. And all of this is repeated for several different “possible futures” of greenhouse gas emissions, in the hope of distinguishing the effect of different actions.
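To get a feel for how quickly this tree branches, here is a small sketch in Python. Every simulator name and every count in it is invented for illustration; none of them comes from the Nature comment or from a real project.

```python
from itertools import product

# Hypothetical links in the prediction chain. All names and counts are
# invented purely to illustrate the branching.
emission_scenarios = ["low", "medium", "high"]
global_simulators = ["global_A", "global_B", "global_C", "global_D"]
parameter_settings = range(5)  # settings of the "control dials" per global simulator
regional_simulators = ["regional_A", "regional_B"]
impact_simulators = ["crops_A", "crops_B", "crops_C"]

# One end-to-end prediction per combination of choices along the chain.
chains = list(product(emission_scenarios, global_simulators,
                      parameter_settings, regional_simulators,
                      impact_simulators))

n_chains = len(chains)
print(n_chains)  # 3 * 4 * 5 * 2 * 3 = 360
```

Even with these modest made-up numbers, the chain produces hundreds of distinct predictions, which is why the spread at the end of the chain can be so wide.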

2. Models are improving

“The climate models…being used in the IPCC’s fifth assessment make fewer assumptions than those from the last assessment…. Many of them contain interactive carbon cycles, better representations of aerosols and atmospheric chemistry and a small improvement in spatial resolution.”

Computers are getting faster. Climate scientists are getting a better understanding of the different physical, chemical and biological processes that govern our climate and the impacts of climate change, like the carbon cycle or the response of ice in Greenland and Antarctica to changes in the atmosphere and oceans. So there has been a fairly steady increase in resolution***, in how many processes are included, and in how well those processes are represented. In many ways this is closing the gap between simulators and reality. This is illustrated well in weather forecasting: if only they had a resolution of 1km instead of 12km, the UK Met Office might have predicted the Boscastle flood in 2004 (page 2 of this presentation).

But the other side of the coin is, of course, the “unknown unknowns” that become “known unknowns”. The things we hadn’t thought of. New understanding that leads to an increase in uncertainty because the earlier estimates were too small.

Climate simulators are slow: it can take a day to simulate two or three model years, and several months for long simulations. So modellers and their funders must decide where to spend their money: high resolution, more processes, or more replications (such as different parameter settings). Many of those of us who spend our working hours, and other hours, thinking about uncertainty strongly believe the climate modelling community must not put resolution and processes (to improve the simulator) above generating multiple predictions (to improve our estimates of how wrong the simulator is). Jonty and Michel again make this case**:
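The trade-off can be made concrete with some back-of-the-envelope arithmetic. The sketch below uses the speed quoted above (two to three model years per day of computing). The computing budget is hypothetical, and the "doubling resolution costs about eight times as much" figure is a commonly used rule of thumb that is assumed here, not something taken from the article.

```python
def wall_clock_days(model_years, model_years_per_day):
    """Days of computing needed to simulate `model_years` of climate."""
    return model_years / model_years_per_day

def affordable_runs(budget_days, model_years, model_years_per_day):
    """Whole simulations that fit in a fixed budget of computing days."""
    return int(budget_days // wall_clock_days(model_years, model_years_per_day))

# From the text: roughly two to three model years per day of computing.
days_slow = wall_clock_days(100, 2.0)  # a 100-year run at 2 model years per day
days_fast = wall_clock_days(100, 2.5)  # the same run at 2.5 model years per day

# Assumed rule of thumb (not from the text): doubling horizontal resolution
# costs about 8x, from 4x the grid cells and a timestep half as long.
budget = 2000  # hypothetical budget, in computing days
runs_now = affordable_runs(budget, 100, 2.0)
runs_double_res = affordable_runs(budget, 100, 2.0 / 8)

print(days_slow, days_fast, runs_now, runs_double_res)
```

Under these assumptions the same budget buys either a few dozen runs at today's resolution or a handful at double the resolution, which is the choice the paragraph above describes.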

Imagine being summoned back in the year 2020, to re-assess your uncertainties in the light of eight years of climate science progress. Would you be saying to yourself, “Yes, what I really need is an ad hoc ensemble of about 30 high-resolution simulator runs, slightly higher than today’s resolution.” Let’s hope so, because right now, that’s what you are going to get.

But we think you’d be saying, “What I need is a designed ensemble, constructed to explore the range of possible climate outcomes, through systematically varying those features of the climate simulator that are currently ill-constrained, such as the simulator parameters, and by trying out alternative modules with qualitatively different characteristics.”

Higher resolution and better processes might close the gap between the simulator and reality, but if it means you can only afford the computing power to run one simulation then you are blind as to how small or large that gap may be. Two examples of projects that do place great importance on multiple replications and uncertainty are the UK Climate Projections and

3. Models agree with each other

None of this means that climate models are useless….Their vision of the future has in some ways been incredibly stable. For example, the predicted rise in global temperature for a doubling of CO2 in the atmosphere hasn’t changed much in more than 20 years.

This is the part of the modelling section I disagree with. Mark and Patrick argue that consistency in predictions through the history of climate science (such as the estimates of climate sensitivity in the figure below) is an argument for greater confidence in the models. Of course inconsistency would be a pointer to potential problems. If changing the resolution or adding processes to a GCM wildly changed the results in unexpected ways, we might worry about whether they were reliable.

But consistency is only necessary, not sufficient, to give us confidence. Does agreement imply precision? I think instinctively most of us would say no. The majority of my friends might have thought the Manic Street Preachers were a good band, but it doesn’t mean they were right.

In my work with Jonty and Mat Collins, we try to quantify how similar a collection of simulators is to reality. This is represented by a number we call ‘kappa’, which we estimate by comparing simulations of past climate to reconstructions based on proxies like pollen. If kappa equals one, then reality is essentially indistinguishable from the simulators. If kappa is greater than one, then it means the simulators are more like each other than they are like reality. And our estimates of kappa so far? Are all greater than one. Sometimes substantially.

The authors do make a related point earlier in the article:

Paul Valdes of Bristol University, UK, argues that climate models are too stable, built to ‘not fail’ rather than to simulate abrupt climate change.

Many of the palaeoclimate studies by BRIDGE (one of my research groups) and others show that simulators do not respond much to change when compared with reconstructions of the past. They are sluggish, and stable, and not moved easily from the present day climate. This could mean that they are underestimating future climate change.

In any case, either sense of the word ‘stability’ – whether consistency of model predictions or the degree to which a simulator reacts to being prodded – is not a good indicator of model reliability.

Apart from all this, the climate sensitivity estimates (as shown in their Figure) mostly have large ranges so I would argue in that case that consistency did not mean much…

Figure 1 from Maslin and Austin (2012), Nature.

Warning: here be opinions

Despite the uncertainty, the weight of scientific evidence is enough to tell us what we need to know. We need governments to go ahead and act…We do not need to demand impossible levels of certainty from models to work towards a better, safer future.

This being a science and not a policy blog, I’m not keen to discuss this last part of the article and would prefer your comments below not to be dominated by this either. I would only like to point out, to those that have not heard of them, the existence (or concept) of “no-regrets” and “low-regrets” options. Chapter 6 of the IPCC Special Report on ‘Managing the Risks of Extreme Events and Disasters to Advance Climate Change Adaptation (SREX)’ describes them:

Options that are known as ‘no regrets’ and ‘low regrets’ provide benefits under any range of climate change scenarios…and are recommended when uncertainties over future climate change directions and impacts are high.

Many of these low-regrets strategies produce co-benefits; help address other development goals, such as improvements in livelihoods, human well-being, and biodiversity conservation; and help minimize the scope for maladaptation.

No-one could argue against the aim of a better, safer future. Only (and endlessly) about the way we get there. Again I ask, please try to stay on-topic and discuss science below the line.

Update 14/6/12: The book editors are happy for Jonty to make their draft chapter public:


*I try to use ‘simulator’, because it is a more specific word than ‘model’. I will also refer to climate simulators by their most commonly-used name: GCMs, for General Circulation Models.

**“Uncertainty in climate science and climate policy”, chapter contributed to “Conceptual Issues in Climate Modeling”, Chicago University Press, E. Winsberg and L. Lloyd eds, forthcoming 2013. See link above.

***Just like the number of pixels of a digital camera, the resolution of a simulator is how much detail it can ‘see’. In the climate simulator I use, HadCM3, the pixels are about 300km across, so the UK is made of just a few. In weather simulators, the pixels are approaching 1km in size.



A Sensitive Subject

Grab yourself a cup of tea. This is a long one.

Yesterday was historic for me. It was the first time I presented a result about ‘climate sensitivity’ (more on this later). This is how it felt to get that result two weeks ago:

In June 2006, we were just a bunch of bright-eyed, bushy-tailed researchers, eager to make a difference in the brave new world of “using palaeodata to reduce uncertainties in climate prediction”. Little did we know the road would be so treacherous, so windy, and so, so long….

Many years later, when we were all old and grey, we finally reached the first of our goals: a preliminary result. On Thursday 12th April 2012, Dr Jonathan C. Rougier produced a plot named, simply, ‘sensitivity.pdf’…

I was presenting these results in a session at the big (11 000 participants) annual European Geosciences Union conference in Vienna. The first speaker was James Hansen, who is rather big in climate circles, and the second David Stainforth, who is first author of the first big result (they dish out climate models for people to run in the background on their computers). I was third, slightly rattled from only finishing my slides just before the session and running through them only once.

If any of you were at the session, I’d prefer you not to talk about our final result, at least for now…it is so preliminary, and I’d prefer this to be a ‘black box’ discussion of our work without prejudice or assumptions from our preliminary numbers.

A little history…

Our project (my first climate job since leaving particle physics) had the rather lovely name of PalaeoQUMP * and the aim of reducing uncertainty about climate sensitivity. By ‘reducing uncertainty’ I mean making the error bars smaller, pinning down the range in which we think the number lies. Climate sensitivity is the global warming you would get if you doubled the concentrations of carbon dioxide in the atmosphere. The earth is slow at reacting to change, so you have to wait until the temperature has stopped changing. Svante Arrhenius (Swedish scientist, 1859-1927) had a go at this in 1896. He did “tedious calculations” by hand and came up with 5.5 degC. He added that this was probably too high, and in 1901 revised it to 4 degC.

The idea was to reproduce the method of the original lovely-named project QUMP, the internal name given to the Met Office Hadley Centre research into Quantifying Uncertainty in Model Predictions. They compared a large group of climate model simulations with observations of recent climate, to see which were the most realistic and therefore which were more likely to be reliable for predicting the future. QUMP was the foundation for the UK Climate Projections, which provide “information designed to help those needing to plan how they will adapt to a changing climate”. We planned to repeat their work, but looking much further back in time – using what knowledge we have of the climate 6000 years ago (the ‘Mid-Holocene’) and 21 000 years ago (the height of the last ice age, or ‘Last Glacial Maximum’), instead of the recent past.

Fairly early into this project I wrote – with Michel Crucifix and Sandy Harrison – a review paper about people’s efforts to estimate climate sensitivity, which I’ve just put on because I support open science.

PalaeoQUMP ended in 2010 without us publishing any scientific results, for a variety of reasons: ambitious aims, loss of collaborators from the project, and my own personal reasons. Two of the original members – Jonty Rougier (statistician) and Mat Collins (climate modeller, formerly at the Met Office Hadley Centre) – and I continued to work with our climate simulations when we found time. We got distracted along the way from the original goal of climate sensitivity by interesting questions about how best to learn about past climates, but pootled along happily.

But late last year a group of scientists led by Andreas Schmittner published a result that was very similar to our original plan: comparing a large number of climate model simulations to information about the Last Glacial Maximum to try and reduce the uncertainty in climate sensitivity. Their result certainly had a small uncertainty, and it was also much lower than most people had found previously: a 90% probability of being in the range 1.4 to 2.8 degC. This sent a mini-ripple around people interested in climate sensitivity, palaeoclimate and future predictions. The authors were quite critical of their own work, making the possible weak points clear. One of the main weaknesses was that their method needed a very large number of simulations, so they had to use a climate model with a very simple representation of the atmosphere (because it is faster to run). They invited others to repeat their method and test it.

So we took up the gauntlet…

We have a group, an ensemble, of 17 versions of a climate model. The model is called HadCM3, which is a fairly old (but therefore quite fast and well-understood) version of the Hadley Centre climate model. It has a much better representation of the atmosphere than the one used by Andreas Schmittner. In this case “better” is not too controversial: we have atmospheric circulation, they don’t.

We created the different model versions by changing the values of the ‘input parameters’. These are control dials that change the way the model behaves. Unfortunately we don’t know the correct setting for these dials, for lots of reasons: we don’t have the right observations to test them with, or a setting that gives good simulations of temperature might be a bad setting for rainfall. So these are uncertain parameters and we use lots of different settings to create a group of model versions which are all plausible. This group is known as a perturbed parameter ensemble.
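As a toy illustration of what building such an ensemble involves, the sketch below draws 17 sets of dial settings at random. The dial names, their ranges and the use of simple uniform random sampling are all assumptions made for this example; they are not HadCM3's real parameters or the design actually used.

```python
import random

# Hypothetical control dials with plausible ranges. The names and numbers
# are invented and are not HadCM3's real parameters.
PARAMETER_RANGES = {
    "cloud_dial": (0.5, 2.0),
    "ice_dial": (0.1, 0.9),
    "mixing_dial": (1.0, 10.0),
}

def make_ensemble(n_members, ranges, seed=0):
    """Draw one setting of every dial for each ensemble member."""
    rng = random.Random(seed)  # fixed seed so the design is reproducible
    return [
        {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        for _ in range(n_members)
    ]

ensemble = make_ensemble(17, PARAMETER_RANGES)
print(len(ensemble))  # one dictionary of dial settings per model version
```

Each dictionary would then be handed to the simulator as one model version; the expensive part is running the simulator, not choosing the settings.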

We use the ensemble to simulate the Last Glacial Maximum (LGM), the preindustrial period (as a reference), and a climate the same as the preindustrial but with double the CO2 concentrations (to calculate climate sensitivity). We can then compare the LGM simulations to reconstructions of the LGM climate. These reconstructions are based on fossilised plants and animals: by looking at the kinds of species that were fossilised (e.g. something that likes cold climates) and where they lived (e.g. further south than they live today), it is possible to get a surprisingly consistent picture of climates of the past. Reconstructing past climates is difficult, and it’s even harder to estimate the uncertainty, the error bars. I won’t discuss these difficulties in this particular post, and generalised attacks on you know who will not be tolerated in the comments! We used reconstructions of air temperature based on pollen ** and reconstructions of sea surface temperatures based on numerous bugs and things. Andreas Schmittner and his group used the same.

We’re using a shiny new statistical method from Jonty Rougier and two collaborators, which has not yet been published (still in review) but is available online if you want to deluge yourself with charmingly written but quite tricky statistics. It’s a general and simple (compared with previous approaches) way to deal with the very title of this blog: the wrongness of models. The description below is full of ‘saying that’, ‘judge’, ‘reckon’ and so on. Statistics, and science, are full of ‘judgements’: yes, subjectivity. We have to simplify, approximate, and guess-to-the-best-of-our-abilities-and-knowledge. A lot of the statements below are not “This Is The Truth” but more “This Is What We Have Decided To Do To Get An Answer And In Future Work These Decisions May Change”. Please bear this in mind!

Think of an ensemble of climate simulations of temperature. These might be from one model with lots of different values for the control parameters, or they might be completely different models from different research institutes. Most of them look vaguely similar to each other. One is a bit of an oddity. Two look nearly identical. Here is a slightly abstract picture of this:

The crosses in the picture are mostly the same sort of distance from the centre spot, but in different places. One is quite a lot further out. Two are practically on top of each other.

How should we combine all these together to estimate the real temperature? A simple average of everything? Do we give the odd-one-out a smaller contribution? Do we give the near-identical ones smaller contributions too? What if a different model is an oddity for rainfall? Even if we come up with different contributions, different weightings, for each model, the real problem is often relating these back to the original “design” of the ensemble. If our model only has one uncertain parameter, it’s easy. We can steadily increase that control dial for each of the different simulations. Then we compare all the simulations to the real world, find the “best” setting for that parameter, and use this for predicting future climate. This is easy because we know the relationship between each version of the model: each one has a slightly higher setting of the parameter. But if we have a lot of uncertain parameters, it is much harder to find the best settings for all of them at once. It is even worse if we have an ensemble of models from different research institutes, which each have a lot of different uncertain parameters and it is impossible to work out a relationship between all the models.
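The easy one-parameter case can be sketched in a few lines. The `toy_simulator` function, its linear relationship and the observed value are all invented stand-ins; a real simulator would take days per run rather than microseconds.

```python
def toy_simulator(dial):
    """Hypothetical stand-in for a climate simulator with one control dial."""
    return 10.0 + 2.0 * dial  # simulated temperature in degC (invented relationship)

observed_temperature = 14.3  # invented "real world" value

# Steadily increase the dial, one simulation per setting.
dial_settings = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
simulations = {dial: toy_simulator(dial) for dial in dial_settings}

# The "best" setting is the one whose simulation is closest to the observation.
best_dial = min(simulations,
                key=lambda d: abs(simulations[d] - observed_temperature))
print(best_dial, simulations[best_dial])
```

With many dials the same brute-force search needs a simulation for every combination of settings, which is where the difficulty described above comes from.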

These problems have given statisticians headaches for several years. We like statisticians, so we want to give them a nice cup of tea and an easier life.

Jonty and Michael and Leanna’s method tries to make life easier, and begins by asking the question the other way round. Can we throw out some of the models so that the ones that are left are all similar to each other? Then we can stop worrying about how to give them different contributions: we can stop using the individual crosses and just use the average of the rest (the centre spot).

We also don’t need to know the relationship between different models. Instead of using observations of the real world to pick out the “best” model, we will take the average of all of them and let the observations “drag” this average towards reality (I will explain this part later).

How do you decide which models to throw out? This is basically a judgement call. One way is to look at the difference between a model and the average of the others. If any are very far away from the average, chuck them. Another is to squint and look at the simulations and see if any look very different from the others. Yes, really! The point is that it is easier to do this, to justify the decisions, and to use the average, than to decide what contribution to give each model.
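The first of these checks, comparing each model with the average of the others, might look something like the sketch below. The temperatures and the threshold are invented, and the threshold in particular is exactly the kind of judgement call described above. The sketch does not deal with near-identical models, which would need a separate check.

```python
from statistics import mean

def distance_from_others(values, index):
    """Distance between one member and the average of all the other members."""
    others = values[:index] + values[index + 1:]
    return abs(values[index] - mean(others))

def screen_ensemble(values, threshold):
    """Keep members that lie within `threshold` of the average of the others."""
    return [v for i, v in enumerate(values)
            if distance_from_others(values, i) <= threshold]

# Invented LGM temperature changes (degC) from six model versions;
# the last one is the oddity.
lgm_cooling = [-4.1, -4.6, -3.8, -4.4, -4.0, -8.5]
kept = screen_ensemble(lgm_cooling, threshold=2.0)  # the threshold is a judgement call
print(kept)
```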

The next part of their cunning is reckoning that all the models are equally good – or equally bad, depending on the emptiness or fullness of your glass – at simulating reality. In other words, the models are closer to the ensemble average than reality is. We can add a red star for “reality” outside the cluster of models:

(Notice I’ve now thrown away the outlier model and one of the two near-identical ones.) This is saying that models are probably more like each other than they are like the real world. I think most visitors to this blog would agree…

There is one more decision. The difficulty is not just in combining models but also interpreting the spread in results. Does the ensemble cover the whole range of uncertainty? We think it probably doesn’t, no matter how many models you have or how excellent and cunning your choices in varying the uncertain input parameters. We will say that it does have the same kind of “shape”: maybe the ensemble spread is bigger for Arctic temperatures than for tropical temperatures, so we’ll take that as useful information for the model uncertainty. But we think it should be scaled up, should be multiplied by a number. How much should we scale it up? More on this later…

All of this was just to turn the ensemble into a prediction of LGM temperatures (from the LGM ensemble) and climate sensitivity (from the doubled CO2 ensemble), with uncertainties for each. We will now compare and then combine the LGM temperatures with the reconstructions.

Here is the part where we inflate – actually the technical term, like a balloon – the ensemble spread to give us model uncertainty. How far? The short answer is: until the prediction agrees with the reconstruction. The long answer is a slightly bizarre analogy that comes to mind. Imagine you and a friend are standing about 10 feet apart. You want to hold hands, but you can’t reach. This is what happens if your uncertainties are too small. The prediction and the reconstruction just can’t hold hands; they can’t be friends. Now imagine that you so much want to hold their hand that your arms start growing…growing…growing… until you can reach their hand, perhaps even far enough for a cuddle. You are the model ensemble, and we have just inflated your arms / uncertainty. Your friend is the reconstruction. Your friend’s arms don’t change, because we (choose to) believe the estimates of uncertainty that the reconstruction people give us. But luckily we can inflate your arms, so that now you “agree” with each other. [ For those who want more detail, the hand-holding is a histogram of standardised predictive errors that looks sensible: centred at zero, with most of its mass between -3 and 3. ]
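A rough sketch of the arm-growing is given below. It assumes the standardised error at each site is the gap between reconstruction and ensemble average, divided by the combined (inflated model plus reconstruction) standard deviation, and it replaces "the histogram looks sensible" with a crude automatic rule: keep inflating until every error lies within 3. Both the rule and all the numbers are simplifications invented for illustration.

```python
from math import sqrt
from statistics import mean, variance

def standardised_errors(ensemble_by_site, recon, recon_sd, scale):
    """Standardised predictive error at each site for a given inflation."""
    errors = []
    for members, z, sd in zip(ensemble_by_site, recon, recon_sd):
        model_var = (scale ** 2) * variance(members)  # inflated ensemble spread
        errors.append((z - mean(members)) / sqrt(model_var + sd ** 2))
    return errors

def inflate_until_agreement(ensemble_by_site, recon, recon_sd,
                            limit=3.0, step=0.1, max_steps=1000):
    """Smallest scale (in steps up from 1.0) with every error within the limit."""
    for k in range(max_steps):
        scale = 1.0 + k * step
        errors = standardised_errors(ensemble_by_site, recon, recon_sd, scale)
        if max(abs(e) for e in errors) <= limit:
            return scale
    raise ValueError("no agreement reached; is the ensemble spread zero?")

# Invented LGM temperature changes (degC) at three sites, four models each.
ensemble_by_site = [[-4.0, -5.0, -4.5, -5.5],
                    [-2.0, -2.5, -1.5, -2.0],
                    [-8.0, -9.0, -7.0, -8.0]]
recon = [-7.5, -2.2, -6.0]   # invented reconstructions at the same sites
recon_sd = [0.5, 0.5, 1.0]   # reconstruction uncertainties, taken as given

scale = inflate_until_agreement(ensemble_by_site, recon, recon_sd)
print(scale)
```

Note that only the model term is multiplied by the scale; the reconstruction uncertainty is left alone, just as the friend's arms stay the same length.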

Now we combine the reconstructions with the ensemble “prediction” of the LGM. This gives the best-of-both-worlds. The reconstructions give us information from the real world (albeit more indirect than we would like). The model gives us the link between LGM temperatures and climate sensitivity. The model ensemble and reconstructions are combined in a “fair” way, by taking into account the uncertainties on each side. If the model ensemble has a small uncertainty and the reconstructions have a large uncertainty, then the combined result is closer to the model prediction, and vice versa. This is a weighted average of two things, which is easier than a weighted average of many things (the approach I described earlier). [ Those who want more details: this is essentially a Kalman Filter, but in this context it is known as Bayes Linear updating. ]
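For a single temperature, this "fair" combination reduces to the familiar scalar Kalman-style update sketched below. The function name and all the numbers are invented for illustration; the real calculation works on many sites at once.

```python
def combine(model_mean, model_var, recon_mean, recon_var):
    """Weighted average of model and reconstruction, weighted by their uncertainties."""
    # The gain says how far to move from the model towards the reconstruction:
    # near 0 if the model is the more certain, near 1 if the reconstruction is.
    gain = model_var / (model_var + recon_var)
    combined_mean = model_mean + gain * (recon_mean - model_mean)
    combined_var = (1.0 - gain) * model_var
    return combined_mean, combined_var

# Confident model, vague reconstruction: the result stays near the model.
near_model, var_a = combine(-5.0, 0.25, -3.0, 4.0)

# Vague model, confident reconstruction: the result moves to the reconstruction.
near_recon, var_b = combine(-5.0, 4.0, -3.0, 0.25)

print(near_model, near_recon)
```

In both cases the combined variance is smaller than either input variance, which is the sense in which the combination gives the best of both worlds.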

To recap:

Reconstructions – we use plant- and bug-based reconstructions of LGM temperatures.

Model prediction – after throwing out models that aren’t very similar, we take the average of the others as our “prediction” of Last Glacial Maximum (LGM) temperatures and climate sensitivity.

Model uncertainty – we multiply the spread of the ensemble by a scaling factor so that the LGM prediction agrees with the reconstructions.

LGM “prediction” – we combine the model prediction with the reconstructions. The combination is closer to whichever has the smaller uncertainty, model or reconstruction.

Now for climate sensitivity. The climate sensitivity gets “dragged” by the reconstructions in the same way as the LGM temperatures. (For this we have to assume that the model uncertainty is the same in the past as the future: this is not at all guaranteed, but inconveniently we don’t have any observations of the future to check). If the LGM “prediction” is generally colder than the LGM reconstructions, it gets dragged to a less-cold LGM and the climate sensitivity gets dragged to a less-warm temperature. And that’s…*jazz hands*….a joint Bayes Linear update of a HadCM3 perturbed parameter ensemble by two LGM proxy-based reconstructions under judgements of ensemble exchangeability and co-exchangeability of reality.
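A stripped-down version of this "dragging" is sketched below, with one LGM temperature and one climate sensitivity per model version. It assumes the standard Bayes linear adjustment formula, estimates the link between the two quantities from the ensemble covariance, and inflates the ensemble spread by a scale factor as described earlier. The data, the scale factor and the function names are all invented; this is not the calculation behind the actual result.

```python
from statistics import mean

def covariance(xs, ys):
    """Sample covariance of two equally long lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def drag_sensitivity(lgm, sens, recon, recon_var, scale=1.0):
    """Adjust the ensemble-mean sensitivity using an LGM reconstruction."""
    inflate = scale ** 2  # inflated ensemble spread stands in for model uncertainty
    var_lgm = inflate * covariance(lgm, lgm)
    var_sens = inflate * covariance(sens, sens)
    cov = inflate * covariance(lgm, sens)
    gain = cov / (var_lgm + recon_var)
    adjusted_mean = mean(sens) + gain * (recon - mean(lgm))
    adjusted_var = var_sens - gain * cov
    return adjusted_mean, adjusted_var

# Invented ensemble: versions with a colder LGM also have a higher sensitivity.
lgm_cooling = [-4.0, -4.5, -5.0, -5.5, -6.0]  # degC relative to preindustrial
sensitivity = [2.8, 3.1, 3.4, 3.9, 4.3]       # degC per doubling of CO2

prior = mean(sensitivity)
adjusted, adjusted_var = drag_sensitivity(lgm_cooling, sensitivity,
                                          recon=-4.2, recon_var=0.3, scale=1.5)
print(prior, adjusted, adjusted_var)
```

Here the invented reconstruction is less cold than the ensemble average, so the sensitivity is pulled downwards from the ensemble mean, matching the direction described in the text.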

I’m afraid the result itself is going to be a cliffhanger. As I said at the top, I want to talk about the method without being distracted by our preliminary result. But if you’ve got this far…thank you for persevering through my exploratory explanations of some state-of-the-art statistics in climate prediction.

Just as I post this, I am beginning my travels home from Vienna, so apologies for comments getting stuck in moderation while I am offline.

Update: I’ve fixed the link to the Rougier et al. manuscript.

Caveat 1. Please note that my descriptions may be a bit over-simplified or, to use the technical term, “hand-wavy”. Our method is slightly different from the statistics manuscript I linked to above, but near enough to be worth reading if you want the technical details. If anyone is keen to see my incomprehensible and stuffed-to-bursting slides, I’ve put them on my page. I’ve hidden the final result of climate sensitivity (and the discussion of it)…

Caveat 2. This work is VERY PRELIMINARY, so don’t tell anyone, ok? Also please be kind – I stayed up too late last night writing this, purely because I am all excited about it.

* Not listed on the PalaeoQUMP website is Ben Booth (who has commented here about his aerosol paper), an honorary member who helped me a lot with the climate modelling.

** N.B. if you want to use the pollen data, contact Pat “Bart” Bartlein for a new version because the old files have a few points with “screwed up missing data codes”, as he put it. These are obvious because the uncertainties are something like 600 degrees.***

*** No jokes about palaeoclimate reconstruction uncertainties please.