Concerns, Caveats & What This Data Can't Tell Us

Good analysis requires intellectual honesty. This page documents the limitations of the data, the known gaps in our analysis, and the places where reasonable people might draw different conclusions. If you're using any findings from this dashboard to inform decisions, start here.

Provenance

Methodological Guardrails

Data Quality Notes


What This Data Doesn't Have

No dataset tells the whole story. Here's what's missing from this one:

No Patient Demographics

This dataset contains no information about age, race, gender, ethnicity, disability status, or any other demographic characteristic of beneficiaries. We cannot analyze disparities, identify underserved populations, or determine whether specific groups are being left behind. Given that Medicaid disproportionately serves communities of color, people with disabilities, and children, this is a significant blind spot.

Limited Geographic Resolution

The dataset includes provider NPIs but no location fields. We join against the CMS NPI Registry (NPPES) to map each provider to a state using its registered practice address. This enables state-level analysis (see the Geographic Analysis page), with two limitations: the registered address is not necessarily where care is delivered (for example, telehealth), and sub-state analysis is not possible without an additional ZIP-to-county mapping.
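The join described above can be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual query: the field names (`billing_npi`, `paid`), the sample NPIs, and the `UNKNOWN` bucket for providers missing from the registry extract are all assumptions made for the example.

```python
# Hypothetical claim rows; field names are illustrative, not the dashboard's schema.
claims = [
    {"billing_npi": "1000000001", "paid": 1200.0},
    {"billing_npi": "1000000002", "paid": 300.0},
    {"billing_npi": "1000000003", "paid": 50.0},  # NPI absent from the registry extract
]

# Registered practice state per NPI, as it might come from an NPPES extract (made-up values).
nppes_state = {"1000000001": "OH", "1000000002": "TX"}


def spending_by_state(claims, nppes_state):
    """Sum paid amounts by the provider's registered state.

    NPIs with no registry match are grouped under 'UNKNOWN' rather than dropped,
    so the state totals still add up to the overall total.
    """
    totals = {}
    for row in claims:
        state = nppes_state.get(row["billing_npi"], "UNKNOWN")
        totals[state] = totals.get(state, 0.0) + row["paid"]
    return totals


state_totals = spending_by_state(claims, nppes_state)
```

Keeping unmatched NPIs in an explicit bucket makes the size of the mapping gap visible. Note that the state assigned here is still the registered address, so it inherits the telehealth caveat.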

No Outcomes

We can see how much was spent and on what. We cannot see whether anyone got better. Claims data tells you that a prescription was filled — not whether the patient took the medication, whether it worked, or whether they're still alive. Spending is a poor proxy for health.

No Denial Data

This dataset shows claims that were paid. It does not show claims that were submitted and denied, and it does not show authorization requests at all: if a provider requests authorization for a treatment and is refused, that event is invisible here. Denial rates are one of the most important access metrics, and we can't see them.

No Managed Care Capitation Payments

Medicaid is increasingly delivered through managed care organizations (MCOs) that receive capitated (per-member-per-month) payments. This dataset captures fee-for-service claims and some MCO encounter data, but it does not capture the full picture of managed care spending. States with heavy MCO penetration may appear to have lower spending in this data than they actually have.


Concentration Risks

Market concentration matters for access. When a small number of providers control a large share of spending in a service category, the exit of even one provider can create a crisis.

View SQL (`concentration`)
SELECT * FROM medicaid.concentration_by_category

Citation: concentration (source medicaid.concentration_by_category).

The Top Provider Share column shows what percentage of total spending in each category flows through a single billing NPI. Even modest-looking percentages can represent billions of dollars and hundreds of thousands of beneficiaries. If that provider leaves the Medicaid program, those beneficiaries don't automatically find new care.
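As a rough illustration of how a top-provider share can be derived, the sketch below computes, for each category, the percentage of paid dollars going to the single largest billing NPI. The rows are made up, and the assumption that "Top Provider Share" means the largest NPI's share of category spending is ours. The definition used by `medicaid.concentration_by_category` is not shown on this page.

```python
from collections import defaultdict

# Hypothetical (category, billing_npi, paid) rows; values are invented for illustration.
rows = [
    ("Mental Health", "A", 600.0),
    ("Mental Health", "B", 300.0),
    ("Mental Health", "C", 100.0),
    ("Lab", "D", 50.0),
    ("Lab", "E", 50.0),
]


def top_provider_share(rows):
    """Return {category: percent of category spending paid to its largest billing NPI}."""
    by_category = defaultdict(lambda: defaultdict(float))
    for category, npi, paid in rows:
        by_category[category][npi] += paid

    shares = {}
    for category, per_npi in by_category.items():
        total = sum(per_npi.values())
        if total > 0:  # skip categories with no paid dollars to avoid dividing by zero
            shares[category] = 100.0 * max(per_npi.values()) / total
    return shares


shares = top_provider_share(rows)
```

In this toy data the largest Mental Health provider accounts for 60% of category spending and the largest Lab provider for 50%. A share on its own says nothing about dollar scale, so it is worth reading alongside total category spending.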


Known Limitations of Our Analysis

Beyond the data itself, our analytical choices introduce additional caveats:

HCPCS Category Mapping Is Approximate

We assign HCPCS procedure codes to categories (Mental Health, Surgery, Lab, etc.) using code range patterns. This is a reasonable heuristic but not a precise classification. Some codes fall at the boundary between categories, and the "Other" category is a catch-all for everything that doesn't match our patterns. Different analysts might draw these boundaries differently and get somewhat different results.
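To make the heuristic concrete, here is a minimal sketch of range-based categorization. The ranges below are illustrative assumptions only, not the dashboard's actual patterns, which are not listed on this page.

```python
# Illustrative numeric ranges only; the dashboard's real mapping may differ.
CATEGORY_RANGES = [
    ("Surgery", 10000, 69999),
    ("Lab", 80000, 89999),
    ("Mental Health", 90785, 90899),
]


def categorize(code: str) -> str:
    """Map a procedure code to a category by numeric range; anything unmatched is 'Other'."""
    if len(code) == 5 and code.isascii() and code.isdigit():
        number = int(code)
        for name, low, high in CATEGORY_RANGES:
            if low <= number <= high:
                return name
    # Alphanumeric codes and numeric codes outside every range land here.
    return "Other"


examples = {code: categorize(code) for code in ["90837", "80053", "99213", "H0031"]}
```

Under these assumed ranges, an alphanumeric code such as `H0031` falls through to "Other" regardless of what service it describes, which is the kind of boundary effect described above. Changing a single range endpoint would move codes between categories and shift the category totals.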

Beneficiary Counts Are Not Unique Individuals

When we say "beneficiaries" in this analysis, we mean the sum of TOTAL_UNIQUE_BENEFICIARIES across rows. A single person who sees three different providers in the same month appears three times — once in each provider's row. This means our beneficiary counts overstate the number of unique people. The overcount is especially significant in categories where patients see multiple providers (like mental health, where a patient might see a therapist, a psychiatrist, and a case manager).
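The overcount can be illustrated with a toy example. The beneficiary identifiers below are hypothetical and exist only to show the mechanism. The dataset itself exposes only per-row counts, which is why the true number of unique people cannot be recovered from it.

```python
# Hypothetical (provider, beneficiary_id) pairs for a single month.
visits = [
    ("therapist", "p1"),
    ("psychiatrist", "p1"),
    ("case_manager", "p1"),
    ("therapist", "p2"),
]

# Per-provider unique beneficiaries, analogous to TOTAL_UNIQUE_BENEFICIARIES on each row.
per_provider = {}
for provider, beneficiary in visits:
    per_provider.setdefault(provider, set()).add(beneficiary)

# What summing the row-level counts produces.
summed_count = sum(len(people) for people in per_provider.values())

# The number of distinct people, which needs beneficiary-level data we do not have.
distinct_count = len({beneficiary for _, beneficiary in visits})
```

Here the summed count is 4 while only 2 distinct people received care, because `p1` is counted once under each of three providers.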

Spending Means Paid Amounts, Not Charges or Costs

The "spending" figures throughout this dashboard represent amounts actually paid by Medicaid — not what providers charged (which is typically much higher) or what the care actually cost to deliver (which is different still). A provider might charge $200, receive $45 from Medicaid, and incur $60 in costs. We only see the $45.

Time Period Constraints

This data covers January 2018 through October 2024. We cannot observe trends from before 2018, and the most recent months are increasingly affected by claims lag (claims for recent services that had not yet been processed when the data was extracted). Year-over-year comparisons that include 2024 should be interpreted cautiously, because the 2024 data covers only 10 months.
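One way to handle the partial year is to compare 2024 against the same months of the prior year rather than against the full prior year. The sketch below uses invented monthly totals to show how much the two approaches can differ. It does not correct for claims lag, which would still depress the most recent months.

```python
# Hypothetical monthly paid totals keyed by (year, month); values are invented.
monthly_paid = {(2023, month): 100.0 for month in range(1, 13)}
monthly_paid.update({(2024, month): 110.0 for month in range(1, 11)})  # January-October only


def full_year_change(monthly_paid, year, prior_year):
    """Naive change: every available month of `year` versus every month of `prior_year`."""
    current = sum(v for (y, _), v in monthly_paid.items() if y == year)
    prior = sum(v for (y, _), v in monthly_paid.items() if y == prior_year)
    return (current - prior) / prior


def same_period_change(monthly_paid, year, prior_year):
    """Like-for-like change: only the months present in `year`, taken from both years."""
    months = sorted(m for (y, m) in monthly_paid if y == year)
    current = sum(monthly_paid[(year, m)] for m in months)
    prior = sum(monthly_paid[(prior_year, m)] for m in months)
    return (current - prior) / prior


naive = full_year_change(monthly_paid, 2024, 2023)
like_for_like = same_period_change(monthly_paid, 2024, 2023)
```

With these made-up numbers the naive comparison shows a decline of about 8%, while the like-for-like comparison shows a 10% increase, even though every observed month in 2024 is higher than the same month in 2023.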

COVID Distortions

The 2020–2021 period is heavily distorted by the pandemic. Utilization patterns during COVID were abnormal: deferred care, telehealth surges, the continuous enrollment provision, and emergency flexibilities all created artifacts in the data. Be cautious about drawing trend lines through this period.


A Note on Interpretation

Numbers don't speak for themselves. A rising spending trend could mean more people are getting needed care (good), or that prices are inflating without corresponding improvements (bad), or that sicker patients are entering the system (contextual). A declining trend could mean efficiency gains (good), or that people are losing access (bad), or that providers are leaving Medicaid (alarming).

Throughout this dashboard, we try to present the data clearly and note where multiple interpretations are plausible. We encourage readers to bring their own domain expertise and to treat these findings as starting points for investigation, not final answers.

Confidence Labels

We use lightweight confidence labels in narrative text:

  • Descriptive, High confidence: direct aggregate from claims data with limited interpretation
  • Descriptive, Medium confidence: direct aggregate with known attribution/mapping caveats
  • Inference, Medium confidence: interpretation of trends that could have multiple plausible causes
  • Inference, Low confidence: directional hypothesis needing external validation

Reproduce This Page

cd dashboard
export EVIDENCE_SOURCE__medicaid__token="<your_motherduck_token>"
export EVIDENCE_SOURCE__medicaid__database="medicaid"
npm run sources
npm run build
npm run preview
# then open http://localhost:3000/concerns