Statistical analysis can be a valuable tool for investigating scientific truth, but it is easy to abuse, as Mark Twain pointed out over a century ago when he talked about "lies, damned lies, and statistics". Arik Levinson recently published a paper on energy codes that was widely publicized as a Freakonomics podcast. He erroneously concluded that there is no evidence that codes save energy. Sadly, Dr. Levinson gets it wrong on the broad issue of framing the question, which this paper addresses (and also on narrower statistical grounds, which are addressed here as well).
In broad summary, Levinson's paper sets up inappropriate expectations for savings from energy codes, leading the reader to believe that proponents of energy codes expected outcomes far more optimistic than they really did. When he fails to find these impossible results, he concludes that codes don't save energy.
The American Council for an Energy-Efficient Economy's executive director Steve Nadel also noticed this problem, and has written a brief explanation of why Dr. Levinson is in error on this issue. This paper explains the nature of the problem in more detail.
How to use statistics in a scientific paper
To make proper use of statistics, one first needs to have a well-thought-out, rigorous hypothesis that makes sense on scientific grounds and that one has some analytic reason to believe is true. One has to be objective, and state the questions in terms that everyone will agree to, even if they disagree on how they expect the analysis to come out.
Dr. Levinson fails to do this in his energy codes paper, setting up intentionally provocative straw man arguments and recasting objective hypotheses into value-laden ones. His formulation of the research question displays some fundamental ignorance of what the code writers were doing in California, and what they thought they would accomplish, leading him to try to refute arguments that no one was making.
For example, the very first words of the article are an alleged quote from an Energy Commission document (A completed and adopted study? A claim by one staff member?):
New building codes will reduce the energy "used in typical buildings by at least 80 percent".
Levinson either misunderstands this statement or is mistaking its context, or else is quoting something that is evidently wrong. In 1979, building codes covered heating, cooling, a small fraction of lighting, and water heating. These uses accounted for far less than 80 percent of total energy consumption; therefore the claim that a code could reduce total energy use by 80 percent is evidently wrong. Refuting it does not require a statistical study!
If we limit the discussion to electricity, the claim is even more ridiculous. This author co-wrote a study (Moving California Toward a Renewable Energy Future, published in 1980) that projected potential energy savings and in particular calculated code-induced savings separately for electricity and gas, for cooling, heating, lighting, hot water, and other uses, so he knows that it was not only possible but common practice to get it right. The right answer, by the way, is that regulated uses of electricity consumed 31% of California's residential electricity in 1978.
Thus it is reasonable to assume that this was obvious to the California Energy Commission (CEC) as well, and therefore to interpret the Commission claim as meaning that codes could save 80 percent of regulated energy, the dominant portion of which was gas.
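A back-of-envelope calculation makes the point concrete. Using only the figures above (regulated uses at 31 percent of residential electricity, and the claimed "at least 80 percent" savings applied to those uses alone), the ceiling on total savings is easy to compute; the script below is purely illustrative:

```python
# Illustrative arithmetic using figures quoted in the text:
# regulated uses were 31% of California residential electricity in 1978,
# and the disputed claim was "at least 80 percent" savings.
regulated_share = 0.31       # share of residential electricity covered by the code
savings_of_regulated = 0.80  # claimed savings within the regulated uses

# Even a perfect 80% cut to regulated uses caps total savings far below 80%.
max_total_savings = regulated_share * savings_of_regulated
print(f"{max_total_savings:.0%}")  # → 25%
```

In other words, even under the most generous reading, an 80 percent cut to regulated electricity could reduce total residential electricity use by roughly a quarter at most, nowhere near 80 percent of the total.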
The CEC statement, if correctly cited, is used by Levinson as a prediction of what codes will imminently do, as opposed to its real use as a statement of what codes COULD do. In fact, the standards that the Commission actually proposed at this time were expected to save far, far less than 80 percent of these uses; if memory serves (I was involved in the proceedings in question) about 15% of regulated energy. And the CEC was not fully successful in adopting or enforcing even these more modest goals.
And the NRDC report in question, which did predict a savings in regulated electricity of about 80%, also projected that the statewide savings from the code would be about 1.5 percent of total electricity use in 1995, or roughly 5 percent of the electricity use of new homes. This is nowhere near the 80 percent that Levinson seems to be trying to refute.
This error is important because the paper was evidently written in a format that listeners to the Freakonomics podcast would like. This format is based on the narrative: "You thought that the answer was A, but OMG, really the answer turns out to be B." Note the use of the word "really" in this script, since Levinson employs it in the title. If this is your narrative, it is critical to get A right.
Why? Because what if A is disinformation the author is planting in order to make the listener/reader think he is challenging conventional wisdom (a key part of the Freakonomics narrative)? If the reader is led to think that we all believed that answer A is true, when in fact no one really believed it (because we all thought it was B all along), then the structure of the paper would be different.
Another disturbing aspect of the paper is the quasi-normative statement "states should not be credited with saving the amount of electricity and carbon dioxide emissions promised when those codes were enacted". The issue of what states should or should not be credited with is a policy recommendation, not an analytic conclusion.
In addition, it is misleading and warps the analysis, because Levinson does not correctly evaluate the issue of what savings were "promised" or expected and whether measurements corroborate or refute the expectations.
Much of his report, especially the part on natural gas use, apparently assumes that codes were adopted in 1980 with expectations of large savings. In fact, the first energy code was adopted in California in 1975 and revised every three to five years thereafter, saving in the range of 10 to 15% of prior energy use each time. If you were looking for savings of about 80% of regulated energy (cumulatively from pre-1975), you could not find them until 2015 or later, and he doesn't have data that recent. It is important to note that the 10 to 15 percent refers to regulated total energy--the expected savings in electricity as metered are more likely 5 percent or less. This is a very small signal in the midst of lots of noise.
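The gradual tightening described above can be sketched numerically. Assuming (as the text describes) a code revision roughly every four years starting in 1975, each saving about 15 percent of regulated energy relative to the previous code, the cumulative savings only reach the 80 percent level around 2015:

```python
# Compounding of per-revision code savings (assumed: revisions every
# ~4 years starting in 1975, each saving 15% -- the upper end of the
# 10-15% range -- of regulated energy relative to the prior code).
remaining = 1.0  # regulated energy use relative to pre-1975 practice
year = 1975
per_revision_saving = 0.15

while 1.0 - remaining < 0.80:        # until cumulative savings reach 80%
    remaining *= 1.0 - per_revision_saving
    year += 4

print(year, round(1.0 - remaining, 2))  # → 2015 0.8
```

At the lower 10-percent-per-revision figure, ten revisions would leave cumulative savings near 65 percent, which is why savings of the claimed magnitude could not show up in data ending well before 2015.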
If you are trying to find a signal from statistics, you first have to look for it. If the equations do not look for a slowly increasing level of efficiency from codes--in other words, a very small signal trying to emerge from lots of noise, beginning in 1975--you are not likely to find it even when it is there.
Furthermore, the statement that states should not be crediting savings from codes is wrong because electricity and carbon savings are not calculated as "savings" from a base case. Forecasts and plans for electricity and carbon emissions levels are built from the ground up by projecting energy use AFTER EFFICIENCY PROGRAMS ARE IN PLACE. A hypothesis that might upset plans for carbon emissions and electricity efficiency would be "that metered energy use after the code is higher than predicted by the engineering models that the state uses". But Levinson does not frame, much less test, this hypothesis.
Evidence to date shows that it is unsupported: an analysis for California showed that energy use after codes (and also as influenced by other programs) was somewhat lower than minimum code compliance would suggest. And in several analyses of modeled energy use as performed in energy ratings, which are now used as a code compliance mechanism in some non-California jurisdictions, comparison to metered use showed that the models, which are similar to those of the Energy Commission, correctly predict metered use within a few percentage points.
Thus if savings attributable to codes are less than expected by engineering, it means that the business-as-usual case without codes (but WITH all the rest of California's and America's efficiency policies) would not have been as energy consumptive as predicted. That is, if the post-codes expectation is correct but the savings were putatively smaller than expected, then the base case without codes would have to be lower in energy use than expected.
If this were true, and Levinson has not demonstrated it, codes would still be well justified: they could then be seen as an insurance policy for achieving savings that might otherwise be attributed to other programs. This attribution is important here because the cost of the other programs would have been higher. If codes do not deliver as much as expected because other programs or market factors deliver them instead, then it is not carbon or electricity futures that are misunderstood, it is the cost of achieving the savings or the balance between different types of programs. And this answer is far away from any of the issues Levinson discusses.
Levinson notes that the analysis is complicated, and lists some of the complications at the beginning of the paper. But these complications are more profound than he seems to realize, and he ignores other ones that are likely to be even more important. This is the major flaw of the paper: oversimplification and reliance on economic theory as proven fact rather than a potential explanatory hypothesis. The most troublesome example of this is Levinson's recasting the hypothesis.
Recasting the hypothesis compromises the conclusions
The hypothesis that the paper's title appears to set out is: "post-codes houses use less regulated energy than houses at pre-code levels of efficiency". But this is not what the analysis addresses.
Levinson first re-states this as something he proposes to test: "If building codes save energy, otherwise similar homes built more recently under stricter standards should use less energy."
But this is not as easy to test as it might appear. What do the words "otherwise similar" mean? More recent homes are bigger and have different designs than older homes, and correcting for size is not simple, since if you do it statistically you are potentially studying the effect of other covariant factors rather than size. And you are certainly confounding the influence of code-based efficiency with the other variables. Also the occupants may be different--from different cultures or income levels or ages or family characteristics. This discussion only scratches the surface of what must be accounted for to characterize "otherwise similarity".
I used this quirky expression of "otherwise similarity" because it is not a well-defined concept, and trying to define it is problematic. What parameters should you hold constant? When is "otherwise similarity" not the same as identicality?
What Levinson's working hypothesis could have said is that "the exact same house built after code adoption, operated in the exact same way, should use less regulated energy (or electricity)". The only practical way to compare the exact same house is by holding everything else equal, which is prohibitively difficult to do from empirical data. This is where engineering simulation comes in--but Levinson dismisses this tool.
And the qualification "operated in the same way" may not be correct, as it ignores the potential that more efficient homes are operated in a different way as a consequence of the code, something that Levinson explicitly brings up.
But how would one test this hypothesis that codes may change behavior? Statistical analysis is a weak tool to use if one is trying to evaluate changes that are "as a consequence of" something. And trying to make conditions "otherwise similar" using statistical adjustments raises red flags that you may be normalizing for the very effect you are trying to measure, or otherwise risking a descent into circular logic.
So now we get to the third-level-of-abstraction hypothesis that the paper actually tries to test: "do later post-code vintages of houses use less electricity than earlier ones?"
This hypothesis is very different. First of all, it writes gas completely out of the equation. This is troubling because most of the energy use pre-code--especially the regulated energy use--was from gas. Secondly, it does not compare expectations from engineering analysis to metered data. The expectation it looks at is the fallacious one that post-code homes are similar to pre-code homes. If we looked analytically at what we would expect, it is not at all clear what to expect!
A priori--that is without looking at the data but considering secular trends in California housing markets--one might expect Levinson's hypothesis to be false even if housing did save 75 or 80 percent of heating and cooling energy, as many retrospective analyses by Energy Commission and utility staff have found (comparing the 2013 code, or earlier versions, with pre-1975 practice).
Why the difference? Because pre-1975 houses are smaller, are located in milder climates, and are lived in by different demographic groups. Specifically, most new houses built since 1975 were located in suburban greenfield developments, the sort of housing favored by larger and higher-income households who are more conservative in their voting patterns and likely less amenable to appeals to save energy. They may have been more prone to install large energy users such as spas or home theaters, or even lots of TVs, a product whose energy use rose rapidly after the introduction of high definition features and large flat screens, only to fall off rapidly after about 2007 due to incentives, standards, and labels. Housing size and the harshness of the climate in which the houses are built have grown steadily since 1975. (New housing has tended to be built on cheaper land farther from cities and from the coast, allowing larger houses and requiring more air conditioning. Temperatures 40 miles inland are often 40 degrees F warmer in summer than those on the coast.)
Levinson tries to address some of these variables--not all of them--but does so in purely statistical ways that risk confusing signal with normalization. How do we adjust for size? By comparing the effect of size on energy use for new homes? Then we are being self-referential. By using statistical analysis of existing homes? What if older homes do not resemble new homes in terms of energy consumption? (And the dependence of energy use on size may itself vary with the efficiency of the home.) If this is the case, size-dependence will be different for new homes than for existing ones. And other factors that may corrupt the analysis, such as differences in attitudes toward energy and differences in family characteristics, are not considered.
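The normalization hazard described above can be illustrated with a toy simulation (entirely hypothetical numbers, not Levinson's data): if the size-energy slope fitted on older homes is used to "adjust" newer homes whose true slope is gentler, both the naive and the adjusted vintage comparisons miss the true same-size effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical old (pre-code) homes: smaller; energy rises steeply with size.
size_old = rng.normal(1500, 200, n)
energy_old = 5.0 + 0.010 * size_old + rng.normal(0, 1, n)

# Hypothetical new (post-code) homes: bigger, but with better envelopes the
# code cuts baseline use and flattens the size-energy slope.
size_new = rng.normal(2200, 300, n)
energy_new = 3.0 + 0.004 * size_new + rng.normal(0, 1, n)

# "Adjust" new homes for size using the slope fitted on OLD homes only.
slope_old = np.polyfit(size_old, energy_old, 1)[0]
adj_new = energy_new - slope_old * (size_new - size_old.mean())

naive_diff = energy_new.mean() - energy_old.mean()  # vintage gap, unadjusted
adj_diff = adj_new.mean() - energy_old.mean()       # vintage gap, "size-adjusted"
true_same_size = (3.0 + 0.004 * 1500) - (5.0 + 0.010 * 1500)  # -11.0 units

print(round(naive_diff, 2), round(adj_diff, 2), true_same_size)
```

In this toy case the true effect for a same-size home is -11 units, but the naive comparison understates it and the old-home-slope adjustment overstates it: the adjustment itself injects the bias, precisely because size-dependence differs between vintages.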
Additionally, the energy use of older houses has declined compared to what had been expected, due to retrofit programs by utilities, appliance efficiency standards that affect all homes (and apparently affect older homes more than newer ones), utility incentives for nonregulated efficiency measures or for beyond-standards products, and due to tariff designs intended to promote conservation, not to mention energy education programs by DOE, EnergyStar, the local utility, Flex Your Power, etc.
So as a matter of experimental design, one would first have to estimate what one would EXPECT energy consumption as a function of vintage to be, before testing, much less making claims, about whether expectations are corroborated or overturned.
This is a systematic failure of the Levinson analysis. Good science practice is to compare what is predicted with what is measured. Instead Levinson just starts fitting whatever data sets he can find willy-nilly, hoping to find something that is (or isn't) statistically significant.
As Levinson's curve fits show, what one would NAIVELY have expected is not what is observed. The patterns found in Levinson's statistics are complex and not susceptible to simple explanations. This should cause the scientist to think more deeply about the problem and define analytically what would have been expected, since it is evident that what one intuitively expected--namely that houses don't change much with vintage other than because of energy codes--is not what is observed.
This exact problem--vintage trends reflecting exogenous influences--was seen in the benchmarking study of New York City office buildings. In that case as well, the newer buildings used more energy per square foot than the older ones, and one might be tempted to make the same argument about building codes. But it turned out that the energy intensity increase wasn't because they were less efficient. It was because their users were more demanding. The newer buildings were all Class A, and these buildings used some 25% more energy than Class B buildings, everything else being equal.
Correlation is not causality: newer houses using more energy than older vintages is an interesting observation that may or may not have anything to do with codes.
Cause-and-Effect or Attribution?
Levinson also employs a quirky interpretation of the words "really save" that changes the entire framing of the question. This subtle pivot inherently biases the results to show less savings than at least this reviewer would otherwise have expected.
The paper claims to address the question "How much energy do building energy codes really save?" but it pivots to try to answer another question instead. It subtly reframes the question to be "how much energy savings can be unambiguously attributed to energy codes?" This is like the difference between asking "how many basketball games did Coach Jackson win?" and "how many wins can be attributed to Coach Jackson's coaching?"
Energy policy is not a set of independently operating categories. Instead it is a seamless whole, in which energy codes, equipment standards, information programs, labeling programs, financial incentives, and tariff policy all try to produce synergistic results. If one tries to attribute savings to only one of them, the analysis will show (or in this case assume) that none of the others is effective, and thus dilute the "measured" effect of the one being analyzed. This is why the basketball team metaphor is apt: a win often cannot be unambiguously attributed to one team member but rather results from the combined efforts of the whole team.
As Levinson states: "If homeowners would eventually increase the energy efficiency of their buildings anyway, building codes might save homeowners that expense but should not be credited with reducing energy consumption relative to a world without the codes, at least not for long."
But this is only true if the two issues--improvement of existing homes and improvement of new homes--were not linked. They are, in fact, linked: through incentive programs for equipment and lighting, through education and tariffs, and through market transformation, in which products introduced for code compliance become available to remodels and retrofits as well.
And in addition, if it were true, it would imply that codes should be relied on just as they are now, because they require actions that would have been taken anyway, and therefore cost nothing compared to what would have happened anyway. Actually even this understates the advantage of codes, because doing it anyway LATER usually costs a lot more than doing it when the house is under construction. And codes impose the costs on the beneficiaries, whereas other programs have a shared cost.
So we see in Levinson's own words that the issue is not how much energy is saved after codes are adopted, but how much energy should be CREDITED to codes. Forecasts of low-carbon or low energy futures are not concerned with credit, they are only concerned with the bottom line.
You can only find what you are looking for
The paper is also compromised by Levinson's apparent confusion about the history of building codes. This causes the wrong hypotheses to be generated and tested.
As I mentioned, Levinson appears to think that codes made a giant leap forward in 1980, when in fact the progress was more gradual and began in 1975. This affects his hypotheses and how he analyzes them in two different ways, both of which lead to error. Consider the first problem, concerning gas.
Levinson says, "Homes built in the late 1970s, before California's building codes took effect [emphasis added], were already using 8 MBTUs less natural gas per year than homes built before 1940. Homes built most recently use another 8 MBTUs less than those built in the late 1970s. Newer homes do appear to use less natural gas than older homes, even after adjusting for home and occupant characteristics. But that pattern is apparent for homes built both before and after the establishment of California's building codes [emphasis added], and appears to have been unchanged by those codes."
Here Levinson is dismissing results that he doesn't like--that codes save lots of gas--on the grounds that there is a fixed date for the primary implementation of codes, namely 1980. If codes savings are gradual, and started earlier, then all of these savings can be credited to codes. Note my use here of Levinson's term "credited", meaning that even his inappropriate distinctions show that the code saved energy.
This analytic distinction is critical. If the first 8 MBTUs of savings are believed to have occurred prior to the adoption of energy codes, then an economist might assume that such savings are due to secular trends or price effects and would continue even without codes. But if they are measured as occurring AFTER the first codes, and the analyst knows that codes continually become more demanding, then the economist could attribute the entire pattern of savings to codes.
Gas is important to this analysis because heating is the largest energy use in the home, and is primarily gas. If gas is saved this is even more important because of the second point:
Codes drove new construction away from electricity towards gas.
Levinson has this wrong: "Figure 4 plots the proportion of homes with electric space heat or hot water in California, according to when the homes were built. Electrification increased until the late 1970s, when 15 percent of homes had electric heat, hot water, or both. After that, the trend reversed, so that very few homes built recently have electric hot water and almost none have electric heat. Why? In the 1950s a consortium of utilities and appliance manufacturers launched 'Live Better Electrically' campaigns, granting allowances to home builders to construct all-electric homes throughout the United States. But by the 1980s, the program had ended along with popularity of all-electric homes."
This narrative illustrates an error frequently found in economic fundamentalist literature: crediting price effects or other economic factors for things that were actually regulatory in nature, and then finding that you can't credit the regulations because the savings were all spoken for already.
What actually happened is that the 70s-era codes essentially banned electric resistance heating. The expansion of electric heating from the 1950s through the 70s was not due mainly to the "Live Better Electrically" campaigns; rather, it was a result of electric heat being cheaper for builders to install. Since it is less resource efficient and more expensive than gas, the code discouraged it, successfully. The data in this paragraph of the report confirm how successful this discouragement was.
But Levinson was not looking for this effect, so of course he didn't find it. Yet it would fundamentally alter his conclusions, even after controlling for all of the other errors.
This change also makes his findings about lower gas use all the more significant: gas was being called on by code to take on more of the load from electricity. Yet despite this code-induced shift, gas use went down over time.
Potential biases toward behavioral/price-induced effects
This problem (of structuring the statistics so that all changes can be assigned only to price effects) is rife throughout the efficiency literature that this author has reviewed. One notorious example claimed to show that appliance efficiency standards didn't save anything because price elasticity studies performed for the 1970s and 80s, before national appliance standards went into effect, showed that large savings were achieved without regulation during a time period in which energy prices went up. What the author failed to understand was that California had regulated the efficiency of several products during this time period, and the regulations were complied with throughout the country (due to economies of scale), so the attribution of efficiency gains to price was incorrect.
Levinson apparently tries to blame engineering models of energy outcomes, which he asserts are inaccurate in the direction of over-predicting savings and allegedly ignore economic effects--social scientists seem to love disproving engineering models, even when the evidence supports the models. But the CEC's models adjust for behavioral changes, and have done so since the 1970s. This sets up a false dichotomy between "projected" savings and "real" savings. This is Levinson's word ("real") in the title--not mine. And it is an important observation, since the word is gratuitous. Its use implies that statistical simulations of savings are somehow more "real" than engineering simulations of savings.
As I noted above, if one wants fully to correct for changes other than codes between pre-code houses and various generations of code-compliant homes, one has to compare apples with apples--holding everything constant beyond the efficiency of the house. This is virtually impossible to do with statistical regression alone, as Levinson seems unintentionally to have demonstrated, but is relatively easy and cheap with models. The only expensive thing you would have to do is measure indoor thermostats and appliance ownership rates to correct for possible behavioral changes as the efficiency of the house changes.
It is also an example of a distressing similarity the author has observed in other articles skeptical of efficiency: it assumes that all engineering models are identical. Thus, if you can prove that one of them is wrong, then one can believe they are ALL wrong. This appears to display an unprofessional lack of respect for the work of analysts in other disciplines.
If you believe that statistical simulations are more real than engineering simulations, you are confronted with a paradox: if statistical adjustment of sampled, reported metered energy use is more "real" than modeled energy use, is not direct statewide measurement of energy use more real than statistically modeled numbers? The paradox is: if measured statewide residential electricity use per capita flattened in 1975 in California and grew steadily in the rest of the country, and energy use increases with vintage of house, how could this have happened?
What is a testable hypothesis that can resolve the paradox? Levinson does not suggest one.
Summary and Conclusions
Others have performed the same type of analysis in a more rigorous fashion, and conclude that energy codes have statistically demonstrated energy savings. Notably, Levinson fails to evaluate this paper or the others it references.
Levinson's work tests a very different hypothesis than the one he claims to be testing. His results are portrayed as if they are surprising--as if they show that the most reasoned expectations of codes were wrong.
But that is not what the study demonstrates. The study ASSUMES out of whole cloth that one would expect a post-codes home to use less energy in the study years than a pre-codes home, without first establishing what would have been (or was) predicted before looking at metered data.
The assumption is a poor one, and the Levinson analysis shows that it is.
In fact, this could have been determined at the outset: contemporaneous analysis showed that reasoned expectations for code-based electricity savings were not large: a savings of 80 percent in regulated energy was expected to produce a reduction in total metered use of only about 5%.
Thus the Levinson study doesn't say anything about the statistically estimated savings, much less the "real" savings, from energy codes in California. All of the evidence to date is consistent with the prediction that energy codes save about as much energy as they were expected to save.