June 16, 2026

FinOps Audit Checklist: 20 Points to Find Cloud Waste

FinOps audit checklist with 20 actionable points across tagging, commitments, anomaly alerts, and AI/GPU billing. Score each, quantify the gap in USD/month.


Most cloud cost audits start the same way: someone opens the bill, gasps, and starts clicking through Cost Explorer looking for the scary number. That works for finding the one big thing. It does not work for finding the dozen small things that, added together, are usually bigger than the one big thing.

This is a FinOps audit checklist you can actually run. Twenty points, grouped into four categories, each one a self-contained check you score and price. By the end you will have a ranked list of waste with a dollar figure on every line, which is the difference between “we should tighten up our cloud costs” and “here is $340K a year we can stop spending, starting with these three items.”

This is the hands-on, find-the-waste-now companion to two other pieces. If you want to grade your program’s overall discipline, read the FinOps maturity assessment framework. If you are choosing what to run this on, see the FinOps tools comparison for 2026. This article is neither. It is the checklist you work through line by line to find cash that is leaking right now.

How to use this cloud cost audit checklist

Before you score a single item, get the scope right. A checklist run on a fuzzy scope produces fuzzy numbers nobody trusts.

Scope first. Pick a billing window of at least one full month so monthly commitments, savings plans, and lifecycle jobs all land inside it. List every account and subscription in play, not just the obvious production ones. The waste hides in the forgotten dev account, the acquired company’s leftover project, the sandbox someone spun up for a demo in 2024. Confirm which cloud providers you are auditing, because multi-cloud estates leak in the seams between them.

Score each item pass, partial, or fail. Binary scoring lies. “Do we have tagging?” is almost always a yes, and almost always useless. Use three states. Pass means it is genuinely handled. Partial means it exists but has holes. Fail means it is absent or broken. This three-state score across all 20 points becomes your FinOps Defect Score baseline, the number you re-run quarterly to prove the program is working.
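The checklist defines the three states but not how they roll up into a single number. As a minimal sketch, assuming a hypothetical weighting of pass = 0, partial = 1, fail = 2 (so lower is better and a perfect run scores 0), the Defect Score could be computed like this; `score_defects` and the weights are illustrative, not a standard.

```python
from collections import Counter

# Hypothetical weights: the checklist defines the three states, not these numbers.
WEIGHTS = {"pass": 0, "partial": 1, "fail": 2}


def score_defects(results: dict[int, str]) -> tuple[int, Counter]:
    """Return (defect score, count of each state) for results keyed by point number (1-20)."""
    unknown = set(results.values()) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown states: {sorted(unknown)}")
    score = sum(WEIGHTS[state] for state in results.values())
    return score, Counter(results.values())


if __name__ == "__main__":
    # Illustrative first run: everything passes except five points.
    results = {point: "pass" for point in range(1, 21)}
    results.update({1: "fail", 2: "partial", 13: "fail", 16: "fail", 20: "fail"})
    score, tally = score_defects(results)
    print(f"Defect Score: {score} of a possible {2 * len(results)}; {dict(tally)}")
```

Keeping the weights fixed matters more than their exact values: the quarterly trend is only meaningful if every run is scored the same way.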

Quantify every gap in USD per month. This is the step that turns a checklist into a roadmap. For every partial or fail, estimate what the gap costs each month. Untagged spend you cannot allocate, idle resources still billing, the delta between your current commitment coverage and the optimal one. Rough is fine; an order of magnitude beats no number. When you sort the gaps by dollars, the remediation order writes itself.
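A minimal sketch of that last step, assuming each partial or fail item has already been given a rough monthly estimate. The `Gap` record and the dollar figures are hypothetical placeholders, not benchmarks.

```python
from dataclasses import dataclass


@dataclass
class Gap:
    point: int            # checklist point number, 1-20
    description: str
    usd_per_month: float  # rough estimate; an order of magnitude is enough


def rank_gaps(gaps: list[Gap]) -> list[Gap]:
    """Return gaps sorted by estimated monthly cost, largest first."""
    return sorted(gaps, key=lambda gap: gap.usd_per_month, reverse=True)


if __name__ == "__main__":
    # Illustrative figures only.
    gaps = [
        Gap(9, "Unattached volumes and idle load balancers", 3_500),
        Gap(11, "Non-prod compute running 24/7", 14_000),
        Gap(6, "Steady-state compute on on-demand rates", 9_000),
    ]
    for gap in rank_gaps(gaps):
        print(f"#{gap.point:>2}  ${gap.usd_per_month:>9,.0f}/mo  {gap.description}")
    total = sum(gap.usd_per_month for gap in gaps)
    print(f"Total: ${total:,.0f}/mo, ${12 * total:,.0f}/yr")
```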

Know when to DIY and when an audit pays for itself. Run the first pass yourself; the checklist is designed for it. Bring in an external audit when the quantified gaps dwarf the cost of the engagement. The usual threshold is a 5x to 20x ROI, where a focused audit returns five to twenty times its fee in identified annual savings. The honest signal is in the scoring: if half your items come back partial or fail and the dollar column has commas in it, the work is bigger than a side project and the math favors getting help.
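As a back-of-the-envelope check, assuming your own first pass has produced an annual savings estimate, the 5x floor can be tested directly. The $40K fee is a made-up figure and `external_audit_pays` is a hypothetical helper.

```python
def audit_roi(identified_annual_savings: float, audit_fee: float) -> float:
    """Identified annual savings as a multiple of the audit fee."""
    if audit_fee <= 0:
        raise ValueError("audit_fee must be positive")
    return identified_annual_savings / audit_fee


def external_audit_pays(identified_annual_savings: float, audit_fee: float,
                        min_multiple: float = 5.0) -> bool:
    """True when savings clear the low end of the 5x-20x range."""
    return audit_roi(identified_annual_savings, audit_fee) >= min_multiple


if __name__ == "__main__":
    # Hypothetical: your DIY pass priced $340K/yr of gaps and a quote came in at $40K.
    print(audit_roi(340_000, 40_000))            # 8.5
    print(external_audit_pays(340_000, 40_000))  # True
```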

Here is the structure you are filling in:

| Category | Points | What it protects |
|---|---|---|
| Tagging & cost allocation | 1-5 | Whether you can see where money goes |
| Commitments & rightsizing | 6-11 | Whether you are paying the right rate for the right size |
| Anomaly detection & budget alerts | 12-15 | Whether waste gets caught before month-end |
| AI/GPU billing surfaces | 16-20 | The new spend native tools were not built to audit |

Now the 20 points.

Tagging & cost allocation (points 1-5)

You cannot fix what you cannot attribute. Allocation is the foundation, and it is where most audits find the work is larger than expected. The key idea running through this section: audit by dollars, not by resource count.

1. Spend-weighted tag coverage is above target. Measure the percentage of spend (in dollars) carrying a complete tag set, not the percentage of resources tagged. These diverge violently. A team can hit 95% resource coverage while 40% of spend sits untagged, because one big untagged GPU fleet or data-transfer line outweighs hundreds of tagged micro-resources. Target 90%+ of spend with full tags. If you have only ever measured resource-count coverage, treat this as a fail until proven otherwise.
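To see how far the two measures can diverge, here is a minimal sketch over a billing export flattened into line items. The required tag keys, the `LineItem` shape, and the example fleet are hypothetical; in the example, 95% of resources are tagged but only 60% of spend is.

```python
from dataclasses import dataclass, field

# Hypothetical required tag set; substitute your own taxonomy.
REQUIRED_TAGS = frozenset({"team", "product", "environment"})


@dataclass
class LineItem:
    resource_id: str
    monthly_cost: float
    tags: dict[str, str] = field(default_factory=dict)


def is_fully_tagged(item: LineItem) -> bool:
    # Empty tag values count as missing.
    present = {key for key, value in item.tags.items() if value}
    return REQUIRED_TAGS <= present


def tag_coverage(items: list[LineItem]) -> tuple[float, float]:
    """Return (resource-count coverage, spend-weighted coverage) as fractions."""
    total_cost = sum(item.monthly_cost for item in items)
    if not items or total_cost <= 0:
        raise ValueError("need at least one billed line item")
    tagged = [item for item in items if is_fully_tagged(item)]
    by_count = len(tagged) / len(items)
    by_spend = sum(item.monthly_cost for item in tagged) / total_cost
    return by_count, by_spend


if __name__ == "__main__":
    full = {"team": "data", "product": "search", "environment": "prod"}
    # 190 small tagged resources plus 10 untagged GPU nodes.
    items = [LineItem(f"r-{i}", 300.0, dict(full)) for i in range(190)]
    items += [LineItem(f"gpu-{i}", 3_800.0) for i in range(10)]
    by_count, by_spend = tag_coverage(items)
    print(f"resource coverage {by_count:.0%}, spend coverage {by_spend:.0%}")
```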

2. Mandatory tag policy is enforced at provisioning, not retroactively. Retroactive tagging is a treadmill: you tag, new resources arrive untagged, you tag again. Enforcement at creation (AWS SCPs / tag policies, Azure Policy, GCP org policies that block or auto-tag non-compliant resources) is the only thing that holds coverage steady. If your tagging is a monthly cleanup job rather than a guardrail, score it partial at best.

3. Shared-cost and untaggable-resource allocation is defined. Some spend cannot be tagged to a single owner: cross-AZ networking, data transfer, support charges, shared platform services, marketplace fees. You need an explicit, documented method for splitting these (even or usage-proportional) across teams. No method means these costs either land nowhere or land on whoever is loudest, and both are wrong.
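A usage-proportional split is easy to get subtly wrong: naive rounding leaves allocated totals a few cents off the invoice, which then breaks the reconciliation check in point 4. A minimal sketch, assuming costs are handled as integer cents and weights come from whatever usage metric you choose (equal weights give an even split). `allocate_cents` is a hypothetical helper that uses largest-remainder rounding so the parts always sum to the whole.

```python
import math


def allocate_cents(shared_cents: int, weights: dict[str, float]) -> dict[str, int]:
    """Split shared_cents across owners in proportion to weights, summing exactly."""
    if any(w < 0 for w in weights.values()) or sum(weights.values()) <= 0:
        raise ValueError("weights must be non-negative with a positive total")
    total_weight = sum(weights.values())
    exact = {owner: shared_cents * w / total_weight for owner, w in weights.items()}
    alloc = {owner: math.floor(share) for owner, share in exact.items()}
    # Hand the leftover cents to the owners with the largest fractional remainders.
    leftover = shared_cents - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda o: exact[o] - alloc[o], reverse=True)
    for owner in by_remainder[:leftover]:
        alloc[owner] += 1
    return alloc


if __name__ == "__main__":
    # $10,000.00 support charge split evenly across three teams.
    print(allocate_cents(1_000_000, {"payments": 1, "search": 1, "platform": 1}))
    # $2,500.00 of cross-AZ transfer split by each team's share of compute spend.
    print(allocate_cents(250_000, {"payments": 42_000, "search": 31_000, "platform": 7_000}))
```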

4. Cost allocation reconciles to the provider invoice to the cent. Run your allocation model’s total against the actual invoice. They should match to the cent. If your dashboards say $612K and the bill says $689K, every per-team number you publish is a guess, and finance will eventually catch it. Reconciliation is the proof your allocation is real.
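A minimal sketch of the check, assuming you can export per-owner allocated totals and read the invoice total. `Decimal` is used instead of floats so that "to the cent" means exactly that; the team names and split are hypothetical.

```python
from decimal import Decimal


def reconciliation_gap(allocated: dict[str, Decimal], invoice_total: Decimal) -> Decimal:
    """Invoice total minus everything the allocation model assigned; zero means it reconciles."""
    return invoice_total - sum(allocated.values(), Decimal("0"))


if __name__ == "__main__":
    # The example above: dashboards say $612K, the invoice says $689K.
    allocated = {"payments": Decimal("250000.00"), "search": Decimal("212000.00"),
                 "platform": Decimal("150000.00")}
    gap = reconciliation_gap(allocated, Decimal("689000.00"))
    print(f"Unallocated: ${gap:,.2f}")
```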

5. Showback or chargeback actually lands with owners. A perfect allocation that sits in a dashboard nobody opens changes no behavior. The check is whether per-team or per-product cost reaches the human who owns that budget, on a cadence, in a format they read. Showback (visibility) is the floor; chargeback (real budget impact) drives the strongest behavior change. If the numbers exist but never reach owners, this is a fail no matter how clean the data is.

For the deeper version of points 1 through 5, our cloud tagging strategy and cost allocation guide walks through building a taxonomy that survives contact with real engineering teams.

Commitments & rightsizing (points 6-11)

This is where the largest single dollar figures usually live. Two failure modes: paying on-demand rates for steady-state workloads (a commitments problem) and running resources bigger or longer than the work needs (a rightsizing problem).

6. Reserved Instance and Savings Plan coverage and utilization are above floor. Two separate numbers, both matter. Coverage is the share of eligible steady-state usage covered by commitments; aim high on genuinely stable workloads. Utilization is whether the commitments you bought are actually being used; this should sit near 100%. Low coverage means you are overpaying on-demand. Low utilization means you over-committed and are burning money on reservations nothing is consuming. Audit both.
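The two ratios can be sketched as below, assuming usage and commitments are expressed in the same unit (instance-hours here; for Savings Plans, dollars of hourly commitment work the same way) and simplifying covered usage to equal the commitment actually consumed. The figures are illustrative: 65% coverage signals on-demand overspend, while 92.9% utilization means about 7% of the commitment is paid for and unused.

```python
def commitment_coverage(covered_usage: float, eligible_usage: float) -> float:
    """Share of eligible steady-state usage billed at commitment rates."""
    if eligible_usage <= 0:
        raise ValueError("eligible_usage must be positive")
    return covered_usage / eligible_usage


def commitment_utilization(used_commitment: float, purchased_commitment: float) -> float:
    """Share of purchased commitment actually consumed."""
    if purchased_commitment <= 0:
        raise ValueError("purchased_commitment must be positive")
    return used_commitment / purchased_commitment


if __name__ == "__main__":
    # Illustrative month, in instance-hours.
    eligible, purchased, used = 10_000, 7_000, 6_500
    print(f"coverage    {commitment_coverage(used, eligible):.1%}")
    print(f"utilization {commitment_utilization(used, purchased):.1%}")
```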

7. No expired or under-utilized commitments are quietly burning money. Commitments expire and lapse back to on-demand silently. Reservations bought for a specific instance family drift away from your actual instance mix as workloads change, and even Convertible RIs only stay useful if someone actually exchanges them. Check for expired commitments that should have been renewed and active commitments running below ~80% utilization. Under-utilization is pure leak: money already spent, value not captured.
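A minimal sketch of the sweep, assuming you have exported your commitment inventory (including recently expired ones, which consoles may no longer list as active) into simple records. The `Commitment` fields and IDs are hypothetical, and the 80% floor mirrors the rule of thumb above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Commitment:
    commitment_id: str
    end_date: date
    utilization: float  # 0.0-1.0 over the audit window


def flag_commitments(commitments: list[Commitment], today: date,
                     floor: float = 0.80) -> list[tuple[str, str]]:
    """Return (id, finding) pairs for expired or under-utilized commitments."""
    findings = []
    for c in commitments:
        if c.end_date < today:
            findings.append((c.commitment_id, "expired: check what now runs on-demand, decide on renewal"))
        elif c.utilization < floor:
            findings.append((c.commitment_id, f"under-utilized at {c.utilization:.0%}"))
    return findings


if __name__ == "__main__":
    book = [
        Commitment("ri-a", date(2026, 3, 31), 0.97),
        Commitment("sp-b", date(2027, 1, 15), 0.62),
        Commitment("ri-c", date(2027, 6, 30), 0.91),
    ]
    for commitment_id, finding in flag_commitments(book, today=date(2026, 6, 16)):
        print(commitment_id, finding)
```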

8. Rightsizing recommendations are reviewed and actioned, not just generated. Every cloud generates rightsizing recommendations. Almost nobody acts on them. The check is not “do we have recommendations” (you do, automatically) but “is there a person and a process that reviews and implements them on a cadence.” Generated-and-ignored is a fail.

9. Idle and orphaned resources are swept on a schedule. The classic zombies: unattached EBS volumes and managed disks, idle load balancers with no healthy targets, stopped instances whose attached volumes are still billing, old snapshots, unused Elastic IPs, abandoned NAT gateways. Individually small, collectively a steady drip. The check is whether a scheduled sweep finds and kills these, not whether someone did it once last year.
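One slice of that sweep, sketched for AWS: list unattached EBS volumes (state `available`) with the EC2 `describe_volumes` call. It assumes a boto3 EC2 client with read-only permissions is passed in, only reports candidates, and leaves deletion (ideally after a final snapshot) to a reviewed step. Volumes are regional, so run it once per region; the other zombie types need similar per-service queries.

```python
def find_unattached_volumes(ec2_client) -> list[dict]:
    """Return unattached ('available') EBS volumes, following NextToken pagination.

    ec2_client is expected to behave like boto3.client("ec2").
    """
    volumes, token = [], None
    while True:
        kwargs = {"Filters": [{"Name": "status", "Values": ["available"]}]}
        if token:
            kwargs["NextToken"] = token
        resp = ec2_client.describe_volumes(**kwargs)
        volumes.extend(
            {"VolumeId": v["VolumeId"], "SizeGiB": v["Size"], "VolumeType": v["VolumeType"]}
            for v in resp.get("Volumes", [])
        )
        token = resp.get("NextToken")
        if not token:
            return volumes


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   for volume in find_unattached_volumes(boto3.client("ec2", region_name="us-east-1")):
#       print(volume)
```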

10. Storage tiering and lifecycle policies are in place. Hot storage for cold data is one of the most common silent overspends. S3 Intelligent-Tiering / lifecycle rules, Azure Blob tiers, GCS lifecycle policies moving aged data to infrequent-access and archive tiers. No lifecycle policy on a growing bucket means you pay premium rates forever on data nobody has read in months.
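For S3, a rule of this kind can be expressed as the `LifecycleConfiguration` payload that boto3's `put_bucket_lifecycle_configuration` accepts. The sketch assumes data is rarely read after 30 days and can be archived after 90; those ages, the rule ID, and the bucket name are hypothetical, and buckets with unpredictable access patterns may suit S3 Intelligent-Tiering better.

```python
def tiering_lifecycle(prefix: str = "", ia_after_days: int = 30,
                      archive_after_days: int = 90) -> dict:
    """Build an S3 lifecycle configuration that moves aging objects to cheaper tiers."""
    # S3 rejects STANDARD_IA transitions for objects younger than 30 days.
    if ia_after_days < 30:
        raise ValueError("STANDARD_IA transitions need Days >= 30")
    if archive_after_days <= ia_after_days:
        raise ValueError("archive transition must come after the IA transition")
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    # GLACIER is the API name for S3 Glacier Flexible Retrieval.
                    {"Days": archive_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=tiering_lifecycle("logs/"))
```

Note that this call replaces a bucket's entire existing lifecycle configuration, so merge any current rules into the payload before applying it.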

11. Dev/test environments are auto-stopped off-hours. Non-production environments rarely need to run nights and weekends. Scheduled stop/start (roughly 12 hours a day, 5 days a week instead of 24/7) cuts non-prod compute by around 65%. The check is whether off-hours scheduling exists and actually runs, not whether someone intends to set it up.
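The arithmetic behind that figure: a 12x5 schedule keeps instances running 60 of 168 weekly hours, about 36%, so roughly 64% of hourly-billed compute is avoided. A sketch, assuming compute is billed per running hour (attached storage keeps billing while instances are stopped); the $20,000 monthly figure is hypothetical.

```python
HOURS_PER_WEEK = 24 * 7  # 168


def offhours_savings_fraction(hours_per_day: float = 12, days_per_week: int = 5) -> float:
    """Fraction of always-on compute hours avoided by a weekly stop/start schedule."""
    running = hours_per_day * days_per_week
    if not 0 <= running <= HOURS_PER_WEEK:
        raise ValueError("schedule must fit inside one week")
    return 1 - running / HOURS_PER_WEEK


if __name__ == "__main__":
    fraction = offhours_savings_fraction()  # 60 of 168 hours running
    monthly_nonprod_compute = 20_000  # hypothetical USD/month at 24/7
    print(f"{fraction:.1%} saved, about ${monthly_nonprod_compute * fraction:,.0f}/month")
```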

| Common waste source | Typical monthly leak (mid-market) | Effort to fix |
|---|---|---|
| Under-covered steady-state compute | $$$ (high) | Medium |
| Under-utilized commitments | $$ (medium) | Low |
| Idle / orphaned resources | $$ (medium) | Low |
| No storage lifecycle policies | $$ (medium) | Low |
| Non-prod running 24/7 | $$$ (high) | Low |

The low-effort, high-leak rows (idle sweep, non-prod scheduling, lifecycle policies) are where you start. They are fast and they are nearly pure savings.

Anomaly detection & budget alerts (points 12-15)

Allocation and commitments are about steady-state efficiency. This section is about catching the surprise: the runaway job, the misconfigured autoscaler, the forgotten experiment that quietly costs $9K before anyone notices at month-end.

12. Budget alerts exist for every major cost center. Not one account-wide budget, but alerts mapped to the cost centers that match how you allocate (per team, per product, per environment). A single top-level budget tells you the building is on fire long after the room has burned. Coverage means every meaningful spend bucket has its own threshold.

13. Alerts are validated and proven to actually fire. This is the point most organizations fail without knowing it. Existing is not firing. Validate by forcing the condition: lower a threshold below current spend, or inject a synthetic spike in a test account, and confirm a human who can act actually receives it. The number of teams that discover their alerts route to a long-dead distribution list, or were silently disabled during a reorg, is genuinely surprising. An unvalidated alert is decorative.

14. Anomaly detection covers the whole estate, not just the top three services. It is easy to watch EC2, RDS, and S3 and feel covered. Anomalies love the long tail: a sudden data-transfer spike, a new service someone enabled, a region you do not usually use. The check is whether detection spans the full estate (and increasingly the AI/GPU surfaces below), not just the usual suspects.
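A minimal sketch of estate-wide detection, assuming a daily cost series per service taken from the full billing export rather than a hand-picked list. It flags a service when the latest day sits more than three standard deviations above its trailing mean, and flags any new service with spend but no baseline. The z-score rule, thresholds, and service names are illustrative; managed tools such as AWS Cost Anomaly Detection use their own models.

```python
from statistics import mean, stdev


def find_anomalies(daily_cost_by_service: dict[str, list[float]],
                   z_threshold: float = 3.0, min_history: int = 7) -> dict[str, str]:
    """Map each anomalous service to a short reason. Each series is oldest-first."""
    flagged = {}
    for service, series in daily_cost_by_service.items():
        if not series:
            continue
        history, latest = series[:-1], series[-1]
        if len(history) < min_history:
            # No baseline yet: a newly enabled service or region is itself worth a look.
            if latest > 0:
                flagged[service] = f"new spend ${latest:,.2f}/day, no baseline"
            continue
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            if latest > mu:
                flagged[service] = f"rose above a flat ${mu:,.2f}/day baseline"
            continue
        z = (latest - mu) / sigma
        if z > z_threshold:
            flagged[service] = f"z={z:.1f}: ${latest:,.2f} vs ${mu:,.2f}/day mean"
    return flagged


if __name__ == "__main__":
    costs = {
        "compute": [100, 102, 98, 101, 99, 100, 100, 103],
        "data-transfer": [20, 21, 19, 20, 20, 21, 19, 90],
        "gpu-inference": [0, 45],
    }
    for service, reason in find_anomalies(costs).items():
        print(service, reason)
```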

15. Alert routing reaches an owner who can act, with a response SLA. An alert that fires into a channel nobody owns is noise. Each alert needs a named owner who has the authority and access to act, plus a defined response SLA (for example: P1 cost anomaly acknowledged within 2 hours). Routing plus ownership plus SLA is what turns detection into prevented spend.
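Routing and SLA tracking reduce to two small checks, sketched here with a hypothetical owner map and SLA table: every alert must resolve to a named owner, and an alert counts as breached if it is not acknowledged inside its window. The 2-hour P1 window comes from the example above; the P2 window and the addresses are made up.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical routing table and SLA windows.
OWNERS = {"payments": "payments-oncall@example.com", "search": "search-lead@example.com"}
ACK_SLA = {"P1": timedelta(hours=2), "P2": timedelta(hours=24)}


def route(cost_center: str) -> str:
    """Return the accountable owner, or fail loudly: an ownerless alert is noise."""
    try:
        return OWNERS[cost_center]
    except KeyError:
        raise LookupError(f"no owner for cost center {cost_center!r}") from None


def sla_breached(severity: str, fired_at: datetime,
                 acknowledged_at: Optional[datetime], now: datetime) -> bool:
    """True if acknowledgement came, or is still missing, after the SLA deadline."""
    deadline = fired_at + ACK_SLA[severity]
    return (acknowledged_at or now) > deadline
```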

For a structured way to pressure-test points 12 through 15, our budget alert validation service is built specifically around proving alerts fire before you need them to.

AI/GPU billing surfaces (points 16-20)

Here is what most 2024-era checklists skip entirely. AI and GPU spend behaves nothing like traditional cloud cost, and the native cost tools were not designed to audit it. The State of FinOps 2026 put AI billing near the top of the agenda for exactly this reason: it is the fastest-growing, least-governed line on the bill.

16. GPU instances have idle detection and auto-termination. A GPU node costs many multiples of a CPU node per hour and sits idle constantly: notebooks left open, training runs that finished hours ago, dev clusters nobody scaled down. The check is whether idle GPUs are detected and auto-terminated, not whether someone remembers to shut them off. This is frequently the single largest waste line in an AI-heavy estate.
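A minimal sketch of the detection half, assuming per-minute GPU utilization samples per node are already being collected (for example from `nvidia-smi --query-gpu=utilization.gpu --format=csv` or a metrics exporter). A node becomes a termination candidate only when every sample in the last 30 minutes is below 5%; both thresholds are hypothetical, and the actual shutdown call depends on your scheduler or cloud API.

```python
def idle_gpu_nodes(samples: dict[str, list[float]], threshold_pct: float = 5.0,
                   window_minutes: int = 30) -> list[str]:
    """Return nodes whose GPU utilization stayed below threshold for the whole window.

    samples maps node name -> per-minute utilization percentages, oldest first.
    """
    idle = []
    for node, series in samples.items():
        recent = series[-window_minutes:]
        # Require a full window of data so a freshly started node is not flagged.
        if len(recent) == window_minutes and max(recent) < threshold_pct:
            idle.append(node)
    return idle


if __name__ == "__main__":
    samples = {
        "train-01": [0.0] * 45,              # finished run, still billing
        "notebook-07": [0.0] * 29 + [62.0],  # recently active
        "dev-03": [1.0] * 10,                # not enough history yet
    }
    print(idle_gpu_nodes(samples))
```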

17. LLM/API token spend is attributed to team, feature, and customer. Token spend on OpenAI, Anthropic, Bedrock, or self-hosted inference usually arrives as one undifferentiated number. You need attribution down to team, feature, and ideally customer, so you can see which product line or which customer is consuming the budget. Without it, AI cost is a black box you cannot optimize and cannot price into your own product.
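Attribution only works if each call is labelled at request time. As a minimal sketch, assuming every request logs team, feature, customer, model, and token counts, and using hypothetical model names and per-million-token prices (substitute your contracted rates), spend can be rolled up like this:

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical USD prices per million tokens: (input, output).
PRICES = {"model-large": (3.00, 15.00), "model-small": (0.25, 1.25)}


@dataclass
class UsageRecord:
    team: str
    feature: str
    customer: str
    model: str
    input_tokens: int
    output_tokens: int


def record_cost(r: UsageRecord) -> float:
    in_price, out_price = PRICES[r.model]
    return (r.input_tokens * in_price + r.output_tokens * out_price) / 1_000_000


def attribute(records: list[UsageRecord]) -> dict[tuple[str, str, str], float]:
    """Total USD per (team, feature, customer)."""
    totals: dict[tuple[str, str, str], float] = defaultdict(float)
    for r in records:
        totals[(r.team, r.feature, r.customer)] += record_cost(r)
    return dict(totals)


if __name__ == "__main__":
    log = [
        UsageRecord("support", "ticket-summary", "acme", "model-large", 1_000_000, 100_000),
        UsageRecord("support", "ticket-summary", "globex", "model-small", 2_000_000, 0),
        UsageRecord("support", "ticket-summary", "acme", "model-small", 400_000, 40_000),
    ]
    for key, usd in sorted(attribute(log).items(), key=lambda kv: -kv[1]):
        print(key, f"${usd:,.2f}")
```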

18. Model-tier selection is reviewed for overspend on over-powered models. Using a top-tier frontier model for a task a smaller, cheaper model handles fine is the AI-era equivalent of running an oversized instance. The check is whether someone reviews which model tier each workload uses against what the task actually needs. Routing simple classification to a flagship reasoning model can be a 10x to 50x cost multiplier for no quality gain.
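The multiplier is easy to estimate per workload. A sketch with hypothetical prices for a flagship and a small model: the same 1M monthly requests (500 input and 50 output tokens each) cost $11,250 on the flagship and $600 on the small model, an 18.75x multiple. Whether the small model is good enough is a quality question this arithmetic cannot answer; it only prices the decision.

```python
# Hypothetical USD prices per million tokens: (input, output).
TIER_PRICES = {"flagship": (15.00, 75.00), "small": (0.80, 4.00)}


def monthly_cost(model: str, requests: int, input_tokens: int, output_tokens: int) -> float:
    """USD per month for a workload with fixed per-request token counts."""
    in_price, out_price = TIER_PRICES[model]
    return requests * (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def tier_multiplier(current: str, candidate: str, **workload: int) -> float:
    """How many times more the current model costs than the candidate for the same workload."""
    return monthly_cost(current, **workload) / monthly_cost(candidate, **workload)


if __name__ == "__main__":
    workload = {"requests": 1_000_000, "input_tokens": 500, "output_tokens": 50}
    print(f"flagship ${monthly_cost('flagship', **workload):,.0f}/mo")
    print(f"small    ${monthly_cost('small', **workload):,.0f}/mo")
    print(f"multiplier {tier_multiplier('flagship', 'small', **workload):.2f}x")
```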

19. Prompt caching and context efficiency are enforced. Resending the same large system prompt or document context on every call, with no caching, multiplies token cost needlessly. Providers offer prompt caching that can cut input-token cost dramatically on repeated context. The check is whether caching and context trimming are enforced patterns, not optional ones developers remember sometimes.
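A rough model of the saving, assuming a provider that bills a cache write at a premium and cache reads at a steep discount. The multipliers (1.25x to write, 0.10x to read), the $3-per-million input price, and a cache that stays warm for the whole batch are all hypothetical; real caches expire, so check your provider's current pricing and TTLs. With an 8,000-token shared prefix over 10,000 calls, input cost drops from $255 to about $39.

```python
def input_cost_usd(calls: int, prefix_tokens: int, fresh_tokens: int,
                   price_per_mtok: float, cached: bool,
                   write_multiplier: float = 1.25, read_multiplier: float = 0.10) -> float:
    """Input-token cost for `calls` requests that share one repeated prompt prefix."""
    if calls < 1:
        raise ValueError("calls must be at least 1")
    if not cached:
        billed_tokens = calls * (prefix_tokens + fresh_tokens)
    else:
        # One cache write on the first call, discounted cache reads afterwards,
        # and full price for the per-call tokens that are never cached.
        billed_tokens = (prefix_tokens * write_multiplier
                         + (calls - 1) * prefix_tokens * read_multiplier
                         + calls * fresh_tokens)
    return billed_tokens * price_per_mtok / 1_000_000


if __name__ == "__main__":
    args = dict(calls=10_000, prefix_tokens=8_000, fresh_tokens=500, price_per_mtok=3.00)
    print(f"uncached ${input_cost_usd(**args, cached=False):,.2f}")
    print(f"cached   ${input_cost_usd(**args, cached=True):,.2f}")
```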

20. AI budget alerts trigger on token velocity, not month-end totals. Traditional budget alerts fire on accumulated spend, which for AI is far too late: a runaway agent loop or a bad batch job can burn a month’s token budget in an afternoon. AI alerts need to watch token velocity (tokens per minute or per hour against a baseline) so a spike trips the alarm in real time, not in the month-end review. If your AI spend is governed by the same month-end budget you use for EC2, score this a fail.
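A minimal sketch of a velocity check, assuming usage events arrive with a timestamp in seconds and a token count, and that a baseline tokens-per-minute rate has been derived from normal traffic. It keeps a rolling 60-second window and trips when the rate exceeds 5x baseline; `TokenVelocityMonitor`, the multiplier, and the window length are hypothetical choices.

```python
from collections import deque


class TokenVelocityMonitor:
    """Rolling-window token rate check against a baseline tokens-per-minute figure."""

    def __init__(self, baseline_tpm: float, multiplier: float = 5.0, window_s: float = 60.0):
        self.limit_tpm = baseline_tpm * multiplier
        self.window_s = window_s
        self.events: deque = deque()  # (timestamp, tokens), oldest first
        self.window_tokens = 0

    def record(self, ts: float, tokens: int) -> bool:
        """Add usage at time ts and return True if the current rate breaches the limit."""
        self.events.append((ts, tokens))
        self.window_tokens += tokens
        # Drop events that have aged out of the window.
        while self.events and self.events[0][0] <= ts - self.window_s:
            _, old_tokens = self.events.popleft()
            self.window_tokens -= old_tokens
        tokens_per_minute = self.window_tokens * 60.0 / self.window_s
        return tokens_per_minute > self.limit_tpm


if __name__ == "__main__":
    monitor = TokenVelocityMonitor(baseline_tpm=10_000)  # alert above 50,000 tokens/min
    normal = [monitor.record(t, 500) for t in range(60)]          # ~30,000 tokens/min
    runaway = [monitor.record(t, 5_000) for t in range(60, 70)]   # agent loop kicks in
    print(any(normal), runaway.index(True))
```

In this example the alarm trips on the fifth second of the runaway loop, long before a month-end total would move noticeably.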

Points 16 through 20 are the section we see fail most completely, because they are new and the tooling defaults do not cover them. Our AI and GPU cost governance work exists precisely for this gap.

From checklist to roadmap

Once all 20 are scored and priced, you have something better than a checklist: a ranked roadmap. Sort by dollars per month, cross-reference effort, and your first sprint is obvious. Typically it is the low-effort, high-leak items (idle GPU termination, non-prod scheduling, the idle-resource sweep, commitment utilization cleanup) that pay for the entire audit in the first month.

Then re-run the scores quarterly. The FinOps Defect Score you established on this first pass is the trend line that proves the program is working, the number you take to leadership instead of a vague “we’re getting better at cloud costs.”

A few patterns worth naming. Tagging almost always scores worse than teams expect, because they have been measuring resource-count coverage. Budget alerts almost always have at least one that does not fire. And the AI/GPU section is, for most organizations in 2026, close to greenfield. If your run looks like that, you are not behind. You are normal, and you now have the map.

Get the full checklist run for you

Prefer to have it done for you? Book a FinOps QA Assessment and we will run all 20 points across your estate and hand you a quantified remediation roadmap: every gap scored, every gap priced, ranked by ROI, with your FinOps Defect Score baseline set so you can track progress from day one.

The first pass is the cheapest money you will find this quarter. The waste is already there. The only question is whether you go looking for it before or after the next bill lands.

Frequently Asked Questions

What is on a FinOps audit checklist?

A complete FinOps audit checklist covers four categories: tagging and cost allocation (spend-weighted tag coverage, mandatory tag enforcement, shared-cost allocation, invoice reconciliation, showback/chargeback delivery), commitments and rightsizing (Reserved Instance and Savings Plan coverage and utilization, expired commitments, rightsizing follow-through, idle and orphaned resources, storage tiering, off-hours scheduling), anomaly detection and budget alerts (budget coverage, alert validation, full-estate anomaly detection, alert routing), and AI/GPU billing surfaces (GPU idle termination, token spend attribution, model-tier review, prompt caching, token-velocity alerts). Together these make up the 20-point checklist in this article.

How do you audit cloud costs step by step?

Define scope first: pick the billing window (a full month minimum), list every account and subscription, and confirm which cloud providers are in play. Then score each of the 20 checklist items pass, partial, or fail. For every gap, quantify the dollars per month it is costing you. Rank gaps by savings size against effort to fix. The output is not a to-do list; it is a prioritized remediation roadmap with a dollar figure attached to each line.

What is spend-weighted tag coverage?

Spend-weighted tag coverage measures the percentage of your cloud spend (in dollars) that carries a complete tag set, not the percentage of resources that are tagged. The distinction matters enormously: you can have 95% of resources tagged and still have 40% of spend untagged, because a handful of large untagged resources (a big RDS cluster, a GPU fleet, a data-transfer line) dwarf hundreds of tiny tagged ones. Always audit by dollars, never by resource count.

How do you validate budget alerts?

Do not assume an alert that exists will fire. Validate it by forcing the condition: temporarily lower a budget threshold below current spend, or inject a synthetic cost spike in a test account, and confirm the alert actually reaches a human who can act. Most organizations discover during this step that their alerts are misconfigured, routed to a dead inbox, or silently disabled. An unvalidated alert is the same as no alert.

What should an AWS cost audit cover?

An AWS cost audit covers the same four categories as any cloud audit, applied to AWS specifics: Cost Allocation Tags activated in the billing console, Reserved Instance and Savings Plan coverage and utilization in Cost Explorer, idle and orphaned resources (unattached EBS volumes, idle ELBs, stopped instances whose volumes still bill, old snapshots), S3 lifecycle and storage-class tiering, AWS Budgets coverage and Cost Anomaly Detection, and increasingly the GPU, SageMaker, and Bedrock surfaces that legacy checklists miss.

Should I do a cloud cost audit myself or hire someone?

Run the first pass yourself with this checklist. It is the fastest way to understand the scope of the problem and find the obvious wins. Bring in an external audit when the quantified gaps are large relative to the cost of the engagement, typically when a thorough audit returns 5x to 20x its cost in identified annual savings, or when the work (cross-account tag remediation, commitment portfolio modeling, AI billing attribution) exceeds the bandwidth your team can spare. The checklist itself usually tells you which case you are in.

Get Your FinOps Defect Score

Book a free 30-minute cloud cost review. We will identify your top three FinOps gaps and give you a preliminary Defect Score. No pitch, no obligation.
