Offset Testing IS Hypothesis Testing (and why that matters!)
In a recent Clubhouse session on carbon offsets, an audience member asked the panel about the "single most important thing" that could help increase confidence in the whole idea of carbon offsets. My "single most important thing" would be for the offset world at large to recognize that testing carbon offsets for environmental integrity involves the same kind of hypothesis testing challenges we encounter in all kinds of situations, from pregnancy testing to guilt and innocence determinations in the judicial system. Because this point is so rarely recognized or discussed, I'll explore it further below.
You may have seen my pregnancy testing analogy reflected in this slide.
But in many respects, analogizing carbon offsets to guilt or innocence determinations in the judicial system is even better, because with pregnancy tests you'll eventually know whether the test was right or wrong, whereas with trial outcomes you often can never know for sure, just as in the case of carbon offsets. There is a twist to this that I'll explore at the end of this piece.
So when someone goes to trial, the hypothesis is that they're guilty of the crime. But there are actually three potential outcomes.
It may be that jurors arrive at the "right" verdict (the green boxes above). But they could also arrive at an incorrect finding of innocence, or an incorrect finding of guilt. Convicting an innocent person is an example of Type 1 error (a false positive, one of the red boxes above). Clearing a guilty person is an example of Type 2 error (a false negative, the other red box above). Note that carbon offset testing similarly has three potential outcomes: the "right" conclusion regarding environmental integrity, an incorrect affirmation of environmental integrity (Type 1, a false positive), and an incorrect denial of environmental integrity (Type 2, a false negative).
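To make that mapping concrete, here is a minimal sketch in Python (purely illustrative labels, no real registry data) of how the four possible combinations of ground truth and test verdict collapse into the vocabulary above: correct decisions, Type 1 errors, and Type 2 errors.

```python
def classify_outcome(truly_additional: bool, credited: bool) -> str:
    """Map ground truth vs. the offset test's verdict onto hypothesis-testing terms."""
    if credited and not truly_additional:
        return "Type 1 error (false positive): a non-additional ton gets credited"
    if not credited and truly_additional:
        return "Type 2 error (false negative): an additional ton gets denied"
    return "correct decision"

# Walk through all four truth/verdict combinations.
for truth in (True, False):
    for verdict in (True, False):
        print(f"truly additional={truth}, credited={verdict} -> "
              f"{classify_outcome(truth, verdict)}")
```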
You can see the false positives in this slide. In other words, the standard of judgement determines how many innocent people are found guilty. As we'll see, there is no way to simply move the standard of judgement all the way to the right because then you'd also be releasing all the guilty people!
The real question is how the judicial system balances false positives and false negatives. In the slide below you can see that it would certainly be possible to have roughly equal numbers of false positives and false negatives, depending on how you set standards of evidence, etc.
As a policy matter, however, we're more worried about convicting the innocent than letting the guilty go free. That's why we have a "beyond a reasonable doubt" standard for conviction, which biases the system towards more false negatives (guilty people walking free) than false positives (innocent people going to prison).
It's worth noting two things here:
- If we simply assumed (as many of us do) that we can reliably tell who is telling the truth and who is lying, and if we did not have a "beyond reasonable doubt" standard, FAR more innocent people would be convicted.
- It's impossible to solve for false positives and false negatives simultaneously: anything you do to reduce false convictions will increase the number of guilty people getting off, and vice versa. This is a CRITICAL INCONVENIENT TRUTH when it comes to hypothesis testing, and the sketch below illustrates it.
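Here is a toy simulation of that inconvenient truth (assumed distributions, not real case data): innocent and guilty defendants each produce an "apparent strength of evidence" score, the two distributions overlap, and wherever you place the conviction threshold, pushing down one kind of error pushes up the other.

```python
import random

random.seed(0)
# Hypothetical evidence-strength scores; the two distributions overlap.
innocent = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # truly innocent defendants
guilty   = [random.gauss(1.5, 1.0) for _ in range(10_000)]  # truly guilty defendants

for threshold in (0.5, 1.5, 2.5):  # a stricter standard = a higher bar to convict
    false_convictions = sum(score >= threshold for score in innocent)
    false_acquittals  = sum(score <  threshold for score in guilty)
    print(f"conviction threshold {threshold}: "
          f"{false_convictions} innocent convicted, "
          f"{false_acquittals} guilty acquitted")
```

Raising the threshold (a "beyond a reasonable doubt" standard) drives false convictions toward zero while the number of guilty people acquitted climbs, and vice versa.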
When it comes to carbon offsets, no matter what tests (standard of judgement) we put into place, there will ALWAYS be false positives (tons inappropriately allowed into the offset pool), false negatives (tons inappropriately denied entry into the offset pool), and "real offsets." And unfortunately, false positives will ALWAYS be inversely related to false negatives, as shown below. It's worth noting that because a lot of false negatives decreases offset supply and increases prices, there is always pressure to limit false negatives. What does that inevitably mean? Yup! More false positives (whether intentional or not!)
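A similar back-of-the-envelope sketch (assumed numbers, not a model of any real offset market) shows why that pressure matters: loosening an additionality screen does increase the approved volume, but most of the extra volume comes from the non-additional side of the candidate pool.

```python
from statistics import NormalDist

# Hypothetical "apparent additionality" scores; the distributions overlap, so
# any approval threshold produces both kinds of error.
additional     = NormalDist(mu=1.5, sigma=1.0)   # truly additional candidate tons
non_additional = NormalDist(mu=0.0, sigma=1.0)   # non-additional candidate tons
share_non_additional = 0.90                      # assumed share of candidate supply

for threshold in (2.5, 1.5, 0.5):                # strict -> loose screening
    fp_rate = 1 - non_additional.cdf(threshold)  # non-additional tons approved
    fn_rate = additional.cdf(threshold)          # additional tons rejected
    approved = (share_non_additional * fp_rate
                + (1 - share_non_additional) * (1 - fn_rate))
    fp_share = share_non_additional * fp_rate / approved
    print(f"threshold {threshold}: FP rate {fp_rate:.0%}, FN rate {fn_rate:.0%}, "
          f"approved {approved:.0%} of candidates, of which {fp_share:.0%} non-additional")
```

In this toy setup, each loosening of the screen expands approved supply, but the share of approved tons that are non-additional climbs from roughly a quarter to roughly three quarters.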
If we don't explicitly recognize the need to balance false positives against false negatives in offset testing (as we normally fail to do), we are likely to end up with FAR more false positives than if we had recognized that need. This reality is reinforced by the graphic below, a back-of-the-envelope calculation my team did years ago identifying some 2 billion tons of ongoing emissions reductions and carbon sequestration in the U.S. alone that would constitute false positives if approved as offsets.
Why would they be inappropriate? Remember the definition of carbon offset additionality (and note that the same definition applies to "sequestered or carbon capture tons").
The bottom line is that there are BILLIONS of tons of already happening "emissions reductions," and HUNDREDS OF BILLIONS of tons of already happening "carbon sequestration," that would constitute false positives in a carbon offset system (and thus not advance climate change mitigation objectives). Why hundreds of billions of tons? Because hundreds of billions of tons cycle between the atmosphere and the biosphere every year, going into trees, soils, the oceans, etc. It's called the natural carbon cycle, and without a ROBUST effort to distinguish between these "already happening" tons moving through the carbon cycle and the "additional" tons represented by offsets, any market will be absolutely swamped with "already happening" tons and deliver no climate change benefit.
The challenge is reflected here. If you ignore additionality, you'll end up with an almost entirely non-additional offset pool, given that non-additional tons are both the lowest risk and the lowest cost from the standpoint of market makers. They will be the first tons project developers seek out if they're able to get them into the market.
And even if you do try to account for additionality, but don't do it well enough, you'll still end up with an almost entirely non-additional market.
In this article I'm not referring to any particular offset or category of offsets. I'm not getting into the literature evaluating carbon offsets. I'm approaching the topic purely through the lens of statistical hypothesis testing, and the realities of today's emissions and natural carbon cycle. I'm simply pointing out that if we never even recognize that we're engaged in hypothesis testing when it comes to carbon offsets, we're likely to end up with a system with FAR less environmental integrity than if we had asked the right questions going in.
So what's the "single thing" that would make the most different to offset credibility? Recognizing that we're engaging in hypothesis testing that requires serious attention to the potential for false positives and false negatives, and how to balance them. It requires asking the right questions, and not just assuming we're doing the right thing. Otherwise we'll far more often fall prey to "willful blindness."
ADDENDUM - In the discussion above I kept things simple and intuitive (or at least that was my goal). My colleague Derik Broekhoff, however, pointed out that I had not adequately emphasized a critical feature of offset markets as compared to the judicial system. It's an important point, so I've added it here.
A big difference between the judicial system and carbon offsets is the potential for "adverse selection." In the judicial system one should be able to expect that the majority of people going to trial are in fact guilty (since we're talking offsets and not criminal justice reform, let's take that as a given for now). This is where Bayes' theorem comes in. Let's say that 5% of defendants who go to trial are innocent, and you have a fair system that gets it right 95% of the time. In that case, you'd expect fewer than 1% of convictions to involve innocent people.
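A minimal sketch of that arithmetic, using the illustrative numbers above:

```python
# Bayes' theorem with the judicial numbers above: 5% of defendants innocent,
# and a system that gets it right 95% of the time on either kind of case.
p_innocent = 0.05
accuracy   = 0.95   # P(convict | guilty) = P(acquit | innocent)

p_convicted = (1 - p_innocent) * accuracy + p_innocent * (1 - accuracy)
p_innocent_given_convicted = p_innocent * (1 - accuracy) / p_convicted
print(f"Share of convictions that are wrongful: {p_innocent_given_convicted:.2%}")  # ~0.28%
```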
When it comes to carbon offsets the situation is exactly the opposite. Partly because prices for carbon offsets are low, and partly because non-additional "false positive" tons are by far the cheapest and lowest-risk "offsets" they can bring to market, project developers (whether through willful blindness or bad intentions) have a big incentive to try to get false positive tons approved as offsets. And as we already saw above, there are MANY BILLIONS of such tons available.
It is therefore reasonable to expect that the majority of tons being proposed as offsets will be non-additional, making the job of any offset standard or test much more difficult. This is where Bayes' theorem again comes in. Instead of 5% of the people on trial being innocent, we might have 90% or more of the tons being proposed as offsets being non-additional. Even with a really solid offset test – say one with the same 95% accuracy rate we assumed for the judicial system – you can still expect roughly a third of your approved tons to be non-additional at 90%, and fully *half* at 95% (as opposed to fewer than 1% of convictions involving innocent people). And given that there is no reason to believe the accuracy rate of offset testing is anywhere near 95%, who knows how bad the resulting offset pool is from an environmental integrity perspective. 50% "real"? 30%? 10%? We just don't know, since we so rarely even recognize that we're engaged in hypothesis testing!
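The same Bayes arithmetic with the offset priors just described (the 95% test accuracy and the candidate-pool shares are assumptions for illustration, not measured values):

```python
accuracy = 0.95   # assumed: P(approve | additional) = P(reject | non-additional)

for p_non_additional in (0.90, 0.95):
    p_approved = ((1 - p_non_additional) * accuracy
                  + p_non_additional * (1 - accuracy))
    share_bad = p_non_additional * (1 - accuracy) / p_approved
    print(f"{p_non_additional:.0%} of candidate tons non-additional -> "
          f"{share_bad:.0%} of approved tons non-additional")
```

With these assumptions, roughly a third of approved tons are non-additional at a 90% prior, and fully half at a 95% prior.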
By the way, if you want to read the seminal paper on this topic, you can download it here.