Measuring the Cloud: A Lifecycle View of Data Centers
Gabriella Waters, Civitaas Co-Founder
Measuring the environmental impact of data centers is usually framed as a story about impressive efficiency metrics and “100% renewable” marketing. But if we’re serious about responsible AI and digital infrastructure, we have to ask some more basic questions: what exactly are we measuring, who chose those yardsticks, and who is left out of the frame? In this post, I sketch a different way of instrumenting data centers. Like the AI systems they power, data centers are sociotechnical in nature, with a footprint that is experienced unequally across people, places, and time.
Why “Green Data Centers” is the Wrong Question
Most public conversations about data center sustainability start and stop at a narrow set of numbers: Power Usage Effectiveness (PUE), the share of “renewable” electricity contracts, and maybe a line about water‑saving cooling. Those metrics are not meaningless, but they are profoundly incomplete.
First, they collapse everything into a single, aggregate story about efficiency. A facility can proudly advertise world‑class PUE while still driving new fossil generation somewhere on the grid, pulling water from stressed basins, or locking in land‑use patterns that local communities never consented to. Second, the standard dashboards are almost entirely operator‑centric. They tell you how well the company is doing against its own goals, not how the surrounding community or ecosystem is experiencing the build‑out. In other words, we have plenty of numbers, but very few impact‑aware numbers (Eichler et al., 2018).
What we need is a way to see a data center as a bundle of infrastructures (energy, water, land, materials, labor, and networks) and to assign responsibility across that bundle, not just to a neat box labeled “IT load.” That, in turn, lets us refine the questions we need answered and set our measurement goals.
A Lifecycle, Not a Building
If you zoom out, “the data center” isn’t just a building full of racks. It has a lifecycle, and each stage shifts who carries the environmental and social cost. A simplified version of that lifecycle might look like Figure 1 below:
Figure 1: Each station is a lifecycle stage; each band is a layer of metrics. (Graphic generated by artificial intelligence)
Different actors make decisions at each of these stages (developers, utilities, local governments, cloud tenants, hardware vendors), and different communities feel the effects. The typical sustainability story largely flattens this lifecycle into a single stage of “operations,” where we measure energy in vs. useful energy out. That’s similar to evaluating a factory by how efficiently its machines run, without accounting for how those machines were made, where the inputs came from, or what happens when they’re scrapped.
For responsible AI, this matters because many of the most energy‑intensive workloads (large model training, high‑availability inference) are being piled onto infrastructure whose full lifecycle impacts we barely track.
What Conventional Metrics Miss
People who talk about “green” data centers usually reach for the same set of numbers, shown in Figure 2 and detailed below:
Figure 2: Conventional Metrics vs. Emission Breakdowns (Graphic generated by artificial intelligence)
PUE (Power Usage Effectiveness): how much total power the building uses compared to the power going directly into the servers. Lower is “better.”
WUE (Water Usage Effectiveness): how much water is used to support a given amount of computing. Again, lower is “better.”
Scope 1, 2, and 3 emissions:
Scope 1 = pollution released directly on site (for example, from backup generators).
Scope 2 = pollution from the electricity the data center buys.
Scope 3 = everything else up and down the supply chain (building materials, hardware manufacturing, shipping, and so on).
“Total CO₂” or “carbon neutral” claims: big summary numbers, often reduced or “balanced out” on paper by buying offsets or renewable energy credits rather than changing how the facility actually runs.
These numbers can be genuinely helpful for engineers and high‑level planning. The issue isn’t that they are fake; it’s that they can quietly bake in a few big assumptions:
They focus on efficiency, not overall size. You can keep adding more “efficient” data centers and still increase total pollution.
They usually ignore when and where the impacts happen. Using power at 3 p.m. during a heat wave on a coal‑heavy grid is not the same as using power at 3 a.m. when there’s extra wind or solar available.
They talk about “the data center” as if it’s just one box, instead of a whole network of buildings spread across different grids, water sources, and communities. The combined effect of that network is what really matters.
They rarely capture the localized impacts that nearby communities feel: extra traffic, constant noise, diesel fumes from generator tests, or the fact that land and water are being tied up for this use instead of others.
If we want numbers that help with real‑world decision‑making, not just with glossy sustainability reports, we have to design metrics that fill in these gaps on purpose, instead of treating them as acceptable blind spots.
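To make the efficiency‑versus‑size point concrete, here is a minimal Python sketch. It uses the common definitions of PUE (total facility energy divided by IT energy) and WUE (liters of water per kWh of IT energy). The two facilities and the flat grid emission factor are invented for illustration; they are not measurements from any real site.

```python
from dataclasses import dataclass


@dataclass
class FacilityYear:
    """One year of facility-level totals (all values below are hypothetical)."""
    it_energy_kwh: float      # energy delivered to IT equipment
    total_energy_kwh: float   # IT plus cooling, conversion losses, lighting
    water_liters: float       # water consumed on site


def pue(f: FacilityYear) -> float:
    """Power Usage Effectiveness: total facility energy / IT energy."""
    return f.total_energy_kwh / f.it_energy_kwh


def wue(f: FacilityYear) -> float:
    """Water Usage Effectiveness: liters of water per kWh of IT energy."""
    return f.water_liters / f.it_energy_kwh


def fleet_emissions_kg(fleet, grid_kg_per_kwh: float) -> float:
    """Total emissions of a fleet under one assumed grid emission factor."""
    return sum(f.total_energy_kwh for f in fleet) * grid_kg_per_kwh


# Assumed flat factor for illustration only; real grids vary by hour and place.
ASSUMED_GRID_KG_PER_KWH = 0.4

older = FacilityYear(it_energy_kwh=10_000_000, total_energy_kwh=16_000_000,
                     water_liters=18_000_000)
newer = FacilityYear(it_energy_kwh=40_000_000, total_energy_kwh=46_000_000,
                     water_liters=40_000_000)

for name, site in [("older", older), ("newer", newer)]:
    print(f"{name}: PUE={pue(site):.2f}, WUE={wue(site):.2f} L/kWh")

before = fleet_emissions_kg([older], ASSUMED_GRID_KG_PER_KWH)
after = fleet_emissions_kg([older, newer], ASSUMED_GRID_KG_PER_KWH)
print(f"fleet emissions: {before:,.0f} kg -> {after:,.0f} kg")
```

In this toy fleet the newer site has the better PUE and WUE, yet adding it roughly quadruples total emissions. The ratios improve while the footprint grows, which is exactly the blind spot described above.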
A Four‑Layer Model for Measuring Impact
One way to do this is to think in layers. Instead of chasing a single magic number, we can structure measurement around four distinct layers, each with its own questions and metrics:
Physical resource flows
Spatiotemporal patterns (where and when impacts land)
Social and ecological externalities
Governance and disclosure quality
This isn’t a new standard. It’s more like a way to organize what a responsible measurement system should capture. The layers are not interchangeable. Good governance cannot “cancel out” unreported physical footprints; elegant dashboards do not compensate for opaque siting decisions. The point is to make it harder to hide behind any one metric so that we can effectively capture real-world impacts.
Now, onto the four layers:
Layer 1 – Physical Footprint: Carbon, Water, Materials, Noise
Layer 1 is the most familiar: carbon, water, materials, and other direct physical flows. The twist is in how we slice them.
Instead of only reporting annual facility‑level emissions, we can ask:
How much of a data center’s emissions comes from operational factors (running the facility) vs. embodied factors (constructing it and manufacturing the hardware used inside it)?
Can we express emissions as kgCO₂e (kilograms of CO₂‑equivalent) per compute‑hour by workload class? An AI training run is not the same as file storage or email. If we care about AI specifically, we need to tease out its share of the footprint.
How do we track energy provenance? Rather than claiming “100% renewable” on a yearly ledger basis, we can report the actual grid mix and on‑site generation by hour.
For water, can we break down blue‑water (the visible fresh water we pull from rivers, lakes, reservoirs, and underground aquifers for human use) withdrawals per kWh and per rack, especially in water‑stressed regions?
On the materials side, metrics like “embodied carbon per installed rack,” refrigerant leak rates, and the share of hardware that is refurbished or recycled at end‑of‑life become important. For example, every server refresh has a material history, not just a performance bump. That needs to be visible to be measurable.
A justice‑aware twist: for each of these metrics, we can ask whether they are reported at a level granular enough for external parties (researchers, regulators, and communities) to do their own analyses, instead of relying on aggregated corporate summaries.
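As a rough illustration of the “kgCO₂e per compute‑hour by workload class” idea, the sketch below allocates a month of facility emissions to workload classes in proportion to metered energy, keeping the operational and embodied shares separate. Allocation by energy share is only one possible choice, and every number and class name here is an assumption made up for the example.

```python
def intensity_by_class(operational_kg, embodied_kg_amortized,
                       energy_kwh_by_class, compute_hours_by_class):
    """Allocate emissions to workload classes by their share of metered energy.

    operational_kg: emissions from running the facility this period
    embodied_kg_amortized: slice of construction and hardware emissions
        assigned to this period (the amortization schedule is an assumption)
    """
    total_energy = sum(energy_kwh_by_class.values())
    result = {}
    for cls, energy in energy_kwh_by_class.items():
        share = energy / total_energy
        op_kg = share * operational_kg
        emb_kg = share * embodied_kg_amortized
        result[cls] = {
            "operational_kg": op_kg,
            "embodied_kg": emb_kg,
            "kg_per_compute_hour": (op_kg + emb_kg) / compute_hours_by_class[cls],
        }
    return result


# Hypothetical monthly figures for one facility.
intensities = intensity_by_class(
    operational_kg=80_000,
    embodied_kg_amortized=20_000,
    energy_kwh_by_class={"llm_training": 150_000,
                         "inference": 40_000,
                         "cold_storage": 10_000},
    compute_hours_by_class={"llm_training": 50_000,
                            "inference": 40_000,
                            "cold_storage": 100_000},
)

for cls, row in intensities.items():
    print(f"{cls}: {row['kg_per_compute_hour']:.2f} kgCO2e per compute-hour")
```

Even with invented numbers, the point comes through: a single facility‑wide average would hide the gap between a training cluster and cold storage.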
Layer 2 – When and Where Impact Lands
Layer 2 is about context. The same physical flows mean different things depending on when and where they occur.
On the “when” axis, we can imagine:
Hourly emissions factors that show how carbon intensity of the grid changes over the day and seasons.
Peak‑coincidence scores that indicate how often the data center’s load aligns with periods of system stress (hot afternoons, cold snaps, low‑renewables periods).
On the “where” axis, we can:
Map energy use to specific grid regions, not just “in Country X.” This makes it clear when a facility that claims to be green is actually leaning on a relatively carbon‑heavy or fragile part of the grid.
Link water withdrawals to particular watersheds and their seasonal stress levels. Using water for cooling in a region already under severe drought conditions is different from using it in a place with abundant supply.
Taken together, these spatiotemporal metrics answer questions like: Is this data center effectively subsidized by a coal‑heavy part of the grid at peak times? Is it amplifying local water scarcity during the hottest months? Phrased differently: Layer 2 asks who else is sharing the infrastructure the data center depends on, and how its behavior changes their risk.
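Here is a small sketch of how the “when” axis changes the story. It compares a ledger‑style renewable share (totals over the whole period) with an hour‑by‑hour matched share, and computes a simple peak‑coincidence score defined here as the fraction of the facility’s energy used during hours flagged as grid stress. That definition, the function names, and the four‑hour toy series are all my own assumptions for illustration.

```python
def ledger_renewable_share(load_kwh, renewable_kwh):
    """Ledger style: total contracted renewables vs. total load, capped at 100%."""
    return min(1.0, sum(renewable_kwh) / sum(load_kwh))


def hourly_matched_share(load_kwh, renewable_kwh):
    """Hourly matching: renewables only count in the hour they were produced."""
    matched = sum(min(load, ren) for load, ren in zip(load_kwh, renewable_kwh))
    return matched / sum(load_kwh)


def peak_coincidence(load_kwh, stress_hours):
    """Fraction of facility energy consumed during flagged grid-stress hours."""
    stressed = sum(load for load, flag in zip(load_kwh, stress_hours) if flag)
    return stressed / sum(load_kwh)


# Toy series: four "hours" of flat load, renewables only available early on,
# and the grid under stress in the last two hours.
load = [100, 100, 100, 100]
renewables = [250, 150, 0, 0]
stress = [False, False, True, True]

ledger = ledger_renewable_share(load, renewables)
hourly = hourly_matched_share(load, renewables)
coincidence = peak_coincidence(load, stress)

print(f"ledger-based renewable share: {ledger:.0%}")
print(f"hourly-matched renewable share: {hourly:.0%}")
print(f"energy used during stress hours: {coincidence:.0%}")
```

On paper this facility is “100% renewable,” but only half of its consumption is matched in the hour it happens, and half of its energy is drawn while the grid is under stress.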
Layer 3 – Externalities and Community‑Level Indicators
Layer 3 moves from biophysical metrics to social experience. The goal here is to reveal the forms of burden and benefit that don’t show up in PUE. Some candidate indicators:
A community burden index that combines:
Traffic and heavy‑vehicle movements linked to construction and operations
Noise levels from cooling and backup generators
Local air quality impacts from diesel testing and emergency use
Shifts in local electricity prices or reliability
A community benefit profile keyed to concrete commitments:
Long‑term jobs (not just temporary construction spikes)
Tax contributions and how they are earmarked
Support for local education, training, and digital inclusion initiatives
These metrics should be co‑designed with the people who live near the facility, and ideally codified in community benefit agreements. They are not nice-to-haves; they are part of the measurement logic. Without them, we effectively say that environmental impact stops at the property line.
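One hedged way to picture a community burden index is as a weighted combination of indicators, each scaled between a pre‑facility baseline and a limit that residents consider unacceptable. The sketch below is only that: the indicator names, baselines, limits, and weights are placeholders, and in practice they would come out of the co‑design process rather than out of code. The function also reports the single worst indicator, so that a moderate average cannot hide one severe burden.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    name: str
    observed: float
    baseline: float  # level before the facility (placeholder value)
    limit: float     # level residents consider unacceptable (placeholder value)
    weight: float    # relative importance, to be set with the community


def score(ind: Indicator) -> float:
    """Scale an indicator to 0..1 between its baseline and its limit."""
    raw = (ind.observed - ind.baseline) / (ind.limit - ind.baseline)
    return max(0.0, min(1.0, raw))


def burden_index(indicators):
    """Weighted mean of indicator scores, plus the worst single indicator."""
    total_weight = sum(i.weight for i in indicators)
    weighted = sum(i.weight * score(i) for i in indicators) / total_weight
    worst = max(indicators, key=score)
    return {"index": weighted,
            "worst_indicator": worst.name,
            "worst_score": score(worst)}


# Hypothetical readings for one neighborhood.
readings = [
    Indicator("heavy_vehicle_trips_per_day", observed=60, baseline=20, limit=100, weight=1),
    Indicator("night_noise_dba", observed=52, baseline=40, limit=55, weight=2),
    Indicator("generator_hours_per_month", observed=6, baseline=0, limit=12, weight=1),
    Indicator("electricity_price_change_pct", observed=1, baseline=0, limit=10, weight=1),
]

summary = burden_index(readings)
print(f"burden index: {summary['index']:.2f}")
print(f"worst indicator: {summary['worst_indicator']} ({summary['worst_score']:.2f})")
```

The linear scaling is a simplification (noise in particular is not perceived linearly), so this should be read as a way to organize the conversation, not as a validated index.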
In a responsible AI context, this layer can also surface whose data and labor are flowing through the facility (content moderators, data labelers, local service staff) and whether they have any say in the decisions that affect their environment.
Layer 4 – Governance, Transparency, and Data Quality
Layer 4 is about the measurement system itself. It asks: how trustworthy, granular, and usable is the information we have?
We can define metrics not just for “impact” but for the quality of reporting and disclosure:
Granularity: Are emissions, water use, and other metrics reported per facility, per month, per hour, or only as annual global totals?
Verifiability: Is there third‑party auditing or assurance of the reported numbers, or are we relying on unaudited corporate claims?
Standardization: Are metrics aligned with existing methods (e.g., widely used carbon accounting approaches, recognized efficiency standards) so that different operators and regions can be compared?
Machine‑readability: Are data released in formats that regulators, communities, and researchers can actually work with, rather than buried in PDFs?
An operator’s “sustainability maturity” should be at least partly assessed on this layer. A company that reports fewer absolute emissions but offers no facility‑level, verifiable data is arguably less trustworthy than one that reports a higher footprint with full transparency.
For AI practitioners, this layer directly affects whether we can make informed choices about where and how to run workloads. If we want to schedule training jobs in low‑carbon hours or avoid regions with severe water stress, we need reliable, machine‑readable data from the infrastructure providers.
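To show why machine‑readable hourly data matters, here is a minimal sketch of carbon‑aware scheduling: given an hourly forecast of grid carbon intensity, it picks the start hour that minimizes average intensity over a job’s duration. The forecast values are invented, and the function assumes the job runs uninterrupted at roughly constant power.

```python
def best_start_hour(intensity_g_per_kwh, duration_hours):
    """Return (start_index, mean_intensity) of the lowest-carbon window.

    intensity_g_per_kwh: hourly grid carbon intensity forecast (gCO2e/kWh)
    duration_hours: length of the job in whole hours
    """
    n = len(intensity_g_per_kwh)
    if duration_hours <= 0 or duration_hours > n:
        raise ValueError("duration must be between 1 and the forecast length")

    # Sliding-window sum over the forecast.
    window = sum(intensity_g_per_kwh[:duration_hours])
    best_start, best_sum = 0, window
    for start in range(1, n - duration_hours + 1):
        window += intensity_g_per_kwh[start + duration_hours - 1]
        window -= intensity_g_per_kwh[start - 1]
        if window < best_sum:
            best_start, best_sum = start, window
    return best_start, best_sum / duration_hours


# Hypothetical 8-hour forecast, in gCO2e per kWh.
forecast = [400, 350, 200, 180, 220, 390, 500, 450]
start, mean_intensity = best_start_hour(forecast, duration_hours=3)
print(f"start at hour {start}, mean intensity {mean_intensity:.0f} gCO2e/kWh")
```

None of this is possible if the only published figure is an annual average buried in a PDF.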
Instrumenting Specific Operational Domains
Within this four‑layer model, we can zoom into specific domains and ask what measurement looks like.
Power and cooling
For power and cooling, an instrumentation strategy might include:
Meters at multiple levels: rack, row, room, and facility, to separate IT load from cooling and conversion losses.
Scenario experiments: measuring the difference between air and liquid cooling in both energy and water terms, in different climates.
Backup power profiling: tracking not just fuel consumption but frequency and duration of generator use, and linking that to local air quality indicators.
These metrics help separate genuine efficiency gains from shifts that simply move impacts elsewhere (for example, using less water but more refrigerants with high global warming potential).
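As one illustration of multi‑level metering, the sketch below reconciles a facility meter with rack‑level and cooling sub‑meters, and treats whatever remains as conversion losses and other overhead. The meter names and readings are hypothetical, and a real deployment would also need to handle calibration drift and missing intervals.

```python
def energy_breakdown(facility_kwh, rack_kwh, cooling_kwh):
    """Split a facility meter reading into IT, cooling, and residual losses.

    rack_kwh: mapping of rack identifier -> metered energy for the period
    """
    it_kwh = sum(rack_kwh.values())
    residual = facility_kwh - it_kwh - cooling_kwh
    if residual < 0:
        # Sub-meters should never add up to more than the facility meter.
        raise ValueError("sub-meters exceed facility meter; check calibration")
    return {
        "it_share": it_kwh / facility_kwh,
        "cooling_share": cooling_kwh / facility_kwh,
        "losses_and_other_share": residual / facility_kwh,
        "losses_and_other_kwh": residual,
    }


# Hypothetical readings for one day.
breakdown = energy_breakdown(
    facility_kwh=2000,
    rack_kwh={"row1-rack1": 500, "row1-rack2": 450, "row2-rack1": 300},
    cooling_kwh=500,
)

for key, value in breakdown.items():
    print(f"{key}: {value}")
```

Keeping the residual visible, rather than folding it into a single ratio, is what lets an outside reader see whether an efficiency gain is real or has simply moved to another line.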
Compute, storage, and network
On the compute side, we can connect infrastructure metrics to software behavior:
Workload‑level impact: estimating CO₂e per training run, per million inferences, per TB‑month of storage, or per GB of data transferred.
Class‑specific baselines: differentiating between classes like “LLM training,” “low‑latency inference,” “cold storage,” “backup,” or “batch analytics.”
This is where workload tags and APIs matter: cloud providers could expose interfaces that let users request impact estimates for their jobs and optimize scheduling accordingly. Without this link, all the careful facility‑level measurement stays trapped on the provider’s side of the wall.
Who Benefits from Better Measurement?
If we build and adopt metrics along these four layers, who actually gains?
Local communities gain leverage: Disaggregated public data lets them see how a facility is affecting noise, air, water, and grid behavior, and whether promised benefits are materializing. That, in turn, strengthens their position in negotiations over siting and expansion.
Regulators and policymakers get sharper tools: Instead of blunt caps or vague ESG reports, they can see facility‑level impacts, design location‑appropriate rules, monitor compliance in something close to real time, and adapt policies accordingly.
Investors and insurers can better understand physical and transition risks. A facility heavily exposed to drought, fragile grids, or social conflict is a different proposition than one that has invested in resilient infrastructure and robust community relationships.
Researchers and advocates gain data for independent analysis: They can, for example, correlate AI workload growth with changes in local grid emissions or water stress, or test whether community benefit agreements actually deliver.
Cloud customers and AI teams can make more informed choices about how and where to run workloads, especially if impact is exposed at the job or service level.
Notice who is often left out in current practice: everyday end‑users. Very few people have any visibility into the environmental profile of the apps and services they use, even though they are frequently the ones paying for “green” options. One radical idea to inform the broader community is to issue impact receipts: small, human‑readable summaries that describe the location‑aware carbon and water cost of certain services or workloads. That could make the environmental cost of AI much less abstract.
Figure 3: Example Impact Receipt - Imagine if cloud consoles and AI platforms issued ‘impact receipts’ alongside cost estimates.
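A receipt of this kind could, in principle, be generated from a handful of per‑job and per‑site numbers. The sketch below is a hypothetical illustration: the field names, the job, the region label, and all factors are assumptions, not any provider’s actual interface or data.

```python
from dataclasses import dataclass


@dataclass
class JobUsage:
    job_name: str
    workload_class: str
    region: str
    it_energy_kwh: float  # energy drawn by the job's hardware


@dataclass
class SiteFactors:
    pue: float                    # facility overhead multiplier
    grid_g_co2e_per_kwh: float    # average over the hours the job ran
    water_l_per_it_kwh: float     # site water use per kWh of IT energy
    watershed_stress: str         # e.g. "low", "medium", "high"


def impact_receipt(job: JobUsage, site: SiteFactors) -> str:
    """Format a short, human-readable estimate of a job's carbon and water cost."""
    facility_kwh = job.it_energy_kwh * site.pue
    carbon_kg = facility_kwh * site.grid_g_co2e_per_kwh / 1000
    water_l = job.it_energy_kwh * site.water_l_per_it_kwh
    lines = [
        "IMPACT RECEIPT (estimate)",
        f"Job:    {job.job_name} ({job.workload_class})",
        f"Region: {job.region}",
        f"Energy: {facility_kwh:.1f} kWh (incl. facility overhead)",
        f"Carbon: {carbon_kg:.1f} kg CO2e",
        f"Water:  {water_l:.0f} L (watershed stress: {site.watershed_stress})",
    ]
    return "\n".join(lines)


# Hypothetical job and site factors.
job = JobUsage("fine-tune-demo", "llm_training", "example-region-1", it_energy_kwh=500)
site = SiteFactors(pue=1.2, grid_g_co2e_per_kwh=450, water_l_per_it_kwh=1.8,
                   watershed_stress="high")

receipt = impact_receipt(job, site)
print(receipt)
```

The arithmetic is trivial; the hard part is getting providers to publish the site factors at a useful resolution.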
From Optimization to Constraint
A final shift concerns how we use metrics. So far, the implicit goal has often been optimization: squeeze out more efficiency, improve PUE, and reduce per‑kWh water use. But from a justice perspective, some dimensions should not be endlessly optimized; they should be treated as constraints, especially where further gains would deepen inequities or shift burdens onto already disadvantaged communities.
For example:
Set hard ceilings on water withdrawals in specific watersheds, regardless of the efficiency of cooling technology.
Establish non‑negotiable limits on hourly emissions in regions with strict climate targets, forcing rescheduling or relocation of certain workloads.
Recognize community veto points: conditions under which local opposition or clear evidence of harm pauses expansion, even if the numbers look “good” globally.
In other words, metrics should not only feed a corporate optimization loop; they should empower communities and regulators to say “no” or “not like this.” That’s the connective tissue between environmental measurement and the broader work of responsible AI: we’re not just counting harms, we’re building the conditions under which some harms are ruled out.
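The difference between optimizing a metric and enforcing a constraint can be sketched in a few lines. In the hypothetical check below, a plan is rejected if it breaches any hard limit, and its efficiency score plays no role in that decision. The limit values, field names, and the idea of encoding a community pause as a flag are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class MonthlyPlan:
    water_withdrawal_m3: float
    peak_hourly_emissions_kg: float
    wue_l_per_kwh: float  # efficiency figure, deliberately ignored by the check


@dataclass
class Limits:
    water_cap_m3: float              # hard ceiling for this watershed
    hourly_emissions_cap_kg: float   # hard ceiling for this region
    community_pause_active: bool     # set through a local veto process


def violations(plan: MonthlyPlan, limits: Limits):
    """List every hard limit the plan breaches; efficiency cannot offset any."""
    found = []
    if plan.water_withdrawal_m3 > limits.water_cap_m3:
        found.append("water withdrawal exceeds watershed cap")
    if plan.peak_hourly_emissions_kg > limits.hourly_emissions_cap_kg:
        found.append("hourly emissions exceed regional cap")
    if limits.community_pause_active:
        found.append("community pause in effect")
    return found


def is_permitted(plan: MonthlyPlan, limits: Limits) -> bool:
    return not violations(plan, limits)


limits = Limits(water_cap_m3=10_000, hourly_emissions_cap_kg=500,
                community_pause_active=False)

# Very efficient per kWh, but large enough to breach the watershed cap.
efficient_but_thirsty = MonthlyPlan(water_withdrawal_m3=12_000,
                                    peak_hourly_emissions_kg=300,
                                    wue_l_per_kwh=0.3)
# Less efficient per kWh, but within every limit.
modest = MonthlyPlan(water_withdrawal_m3=4_000,
                     peak_hourly_emissions_kg=200,
                     wue_l_per_kwh=1.5)

print(is_permitted(efficient_but_thirsty, limits), violations(efficient_but_thirsty, limits))
print(is_permitted(modest, limits), violations(modest, limits))
```

The design choice worth noticing is that `wue_l_per_kwh` never appears in `violations`: a better ratio does not buy the right to exceed the cap.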
Where This Goes Next
This post barely scratches the surface, but it suggests a direction away from a single efficiency‑centric story, toward a layered measurement model that ties data centers to their lifecycles, their contexts, and their communities. For researchers, there is an open agenda here: building open benchmarks that combine software‑level metrics with site‑level externalities, validating justice‑oriented indices with affected communities, and experimenting with impact receipts and real‑time public dashboards.
For practitioners and local governments, there is a practical opportunity: treat new data center projects as pilots for co‑designed sustainability measurement. Bake the four layers into procurement, permitting, and community benefit agreements. Require that metrics be public, machine‑readable, and auditable.
If we can measure data centers this way, we can stop pretending that “the cloud” is weightless and start aligning AI’s infrastructure with the communities and ecosystems that make it possible.
References
Eichler, H., Bloechl‐Daum, B., Broich, K., Kyrle, P. A., Oderkirk, J., Rasi, G., Ivo, R. S., Schuurman, A., Senderovitz, T., Slawomirski, L., Wenzl, M., & Paris, V. (2018). Data rich, information poor: Can we use electronic health records to create a learning healthcare system for pharmaceuticals? Clinical Pharmacology & Therapeutics, 105(4), 912–922. https://doi.org/10.1002/cpt.1226



