Public Data Intelligence

Medicaid Analysis Session:
Public Claims to Fraud, Geo, and Revenue Intelligence

Buffaly staged a real public Medicaid claims dataset, built its own fraud-risk analysis pipeline, ran the scoring and enrichment workflow autonomously, and only then turned the results into provider-level and state-level visual intelligence.

Raw Source 10.325 GB
Working Set 284,294 Remote-Care Rows
Programs RPM + CCM + RTM + APCM

Execution Milestones

HHS_Audit_Trace_v2.4

1. Buffaly obtained the source Medicaid provider-spending file from HHS / CMS and staged it locally

What happened

Buffaly staged a real HHS / CMS Medicaid provider-spending dataset locally. The inspected source was 10.325 GB and contained 11,354 data rows after the header.

Why it matters

This establishes that the project began with a real HHS / CMS Medicaid spending dataset downloaded to disk rather than with a synthetic demo file.

2. Buffaly generated shell-script tooling to preprocess the raw file

What happened

Buffaly generated dedicated extraction and counting scripts to prepare the raw claims data for analysis.

Why it matters

These scripts show the project started with Buffaly creating repeatable data-prep tooling before the fraud analysis pipeline was run.

3. Buffaly filtered the raw claims file down to the relevant remote-care CPT families

What happened

The CPT extraction step selected RPM, CCM, RTM, and APCM billing codes from the full Medicaid spending file and produced a focused remote-care analysis dataset.

Why it matters

This is the real segmentation phase of the project: reducing a very large public claims file to the remote-care billing programs relevant to fraud-risk and opportunity analysis.
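The actual tooling for this step was shell scripting; a minimal Python equivalent sketches the same streaming filter. The column name hcpcs_code and the specific code lists below are illustrative assumptions, not the pipeline's real configuration:

```python
from typing import Iterable, Iterator

# Illustrative remote-care code families (assumed subset, not the
# pipeline's actual code lists).
REMOTE_CARE_CODES = {
    "99453", "99454", "99457", "99458",   # RPM
    "99487", "99489", "99490", "99439",   # CCM
    "98975", "98976", "98977", "98980",   # RTM
}

def extract_remote_care(rows: Iterable[dict],
                        code_field: str = "hcpcs_code") -> Iterator[dict]:
    """Stream claim rows, keeping only remote-care billing codes.

    Streaming row by row means a 10+ GB source never has to fit in memory.
    """
    for row in rows:
        if row.get(code_field) in REMOTE_CARE_CODES:
            yield row
```

Fed a csv.DictReader over the raw file, this yields the focused remote-care dataset described above.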

4. Buffaly quantified the segmented dataset before deeper analysis

What happened

The extracted remote-care file is 13.2 MB and contains 284,294 rows. Program counts in that file are CCM 203,075, RPM 79,595, and RTM 1,624.

Why it matters

This is the checkpoint where Buffaly validated that the filtered claims population was large enough and well-structured enough to support downstream fraud modeling.
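The quantification checkpoint amounts to a per-program tally over the filtered rows. A sketch, with an assumed code-to-program mapping (illustrative subset only):

```python
from collections import Counter

# Assumed code-to-program mapping (illustrative subset).
PROGRAM_BY_CODE = {
    "99453": "RPM", "99454": "RPM", "99457": "RPM", "99458": "RPM",
    "99487": "CCM", "99489": "CCM", "99490": "CCM", "99439": "CCM",
    "98975": "RTM", "98976": "RTM", "98977": "RTM", "98980": "RTM",
}

def program_counts(rows, code_field: str = "hcpcs_code") -> Counter:
    """Tally filtered claim rows by remote-care program."""
    return Counter(
        PROGRAM_BY_CODE[row[code_field]]
        for row in rows
        if row.get(code_field) in PROGRAM_BY_CODE
    )
```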

5. Buffaly narrowed the remote-care claims to a 2024+ working subset

What happened

A second filtering step narrowed the remote-care claims to a 2024+ working subset used for the downstream fraud and market analysis.

Why it matters

That reduced the analysis to a current-timeframe slice suitable for fraud-risk exploration, market sizing, and demo-ready visualization.

6. Buffaly measured unique NPI coverage across programs

What happened

The NPI counting workflow showed unique billing-provider counts of 2,682 CCM, 1,223 RPM, and 59 RTM in the 2024+ subset.

Why it matters

This quantified the provider universe before enrichment and helped establish how broad each program's footprint was.
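Counting distinct providers reduces to set arithmetic over billing NPIs. A sketch assuming hypothetical npi and program field names:

```python
from collections import defaultdict

def unique_npis_per_program(rows,
                            npi_field: str = "npi",
                            program_field: str = "program") -> dict:
    """Count distinct billing providers (NPIs) per program."""
    npis = defaultdict(set)
    for row in rows:
        npis[row[program_field]].add(row[npi_field])
    return {program: len(providers) for program, providers in npis.items()}
```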

7. Buffaly used an existing RPM fraud-risk pipeline as the first analysis engine

What happened

Once the remote-care subset was prepared, Buffaly applied the existing RPM fraud-risk pipeline to identify anomalous provider behavior in RPM billing patterns.

Why it matters

This let the project stand on a proven fraud-analysis foundation rather than inventing the full approach from scratch.

8. Buffaly grounded that RPM fraud logic in OIG-style statistical signals

What happened

The RPM fraud pipeline was framed around statistical risk signals informed by the 2024 OIG report on RPM fraud risk, translating public fraud patterns into provider-level scoring logic.

Why it matters

This made the RPM stage feel defensible and research-backed instead of merely heuristic.
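The document does not show the pipeline's actual signals, but OIG-style outlier detection is commonly expressed as peer-comparison statistics. A minimal z-score sketch, purely illustrative:

```python
import statistics

def zscore_outliers(volume_by_npi: dict, threshold: float = 3.0) -> dict:
    """Flag providers whose billing volume is an extreme outlier relative
    to peers. One generic signal; real OIG-informed scoring would combine
    several such measures (e.g., component-mix and patient-share checks)."""
    values = list(volume_by_npi.values())
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return {}
    return {
        npi: (volume - mean) / sd
        for npi, volume in volume_by_npi.items()
        if (volume - mean) / sd >= threshold
    }
```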

9. Buffaly used an NPI lookup skill to enrich the RPM results

What happened

After fraud scoring, Buffaly enriched provider rows through NPI lookup so each suspicious or noteworthy record could be tied back to real provider identity data.

Why it matters

This is where abstract risk rows became intelligible provider entities that could be reviewed, grouped, and visualized.
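NPI lookups run against the public NPPES registry. A sketch of this enrichment step, assuming the registry's v2.1 JSON response shape (results, basic, addresses); treat the field names as assumptions to verify against the live API:

```python
import json
import urllib.request

NPPES_URL = "https://npiregistry.cms.hhs.gov/api/?version=2.1&number={npi}"

def fetch_npi_record(npi: str) -> dict:
    """Query the public NPPES registry for one NPI (network call)."""
    with urllib.request.urlopen(NPPES_URL.format(npi=npi)) as resp:
        return json.load(resp)

def summarize_record(payload: dict) -> dict:
    """Reduce an NPPES response to the identity fields enrichment needs.
    Assumes the v2.1 response layout; not a definitive schema."""
    result = payload["results"][0]
    basic = result.get("basic", {})
    address = (result.get("addresses") or [{}])[0]
    name = basic.get("organization_name") or " ".join(
        part for part in (basic.get("first_name"), basic.get("last_name")) if part
    )
    return {"name": name, "city": address.get("city"), "state": address.get("state")}
```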

10. Buffaly used a Google Maps skill to geolocate the enriched providers

What happened

A second enrichment phase geolocated providers using Google Maps and follow-up passes, attaching location coordinates to each enriched provider identity.

Why it matters

This turned the fraud analysis into a geography-aware story, which is critical for map-based demos and regional pattern analysis.
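Geolocation here is a straightforward geocoding pass. A sketch of the response-parsing side, assuming Google's Geocoding API JSON shape (status, results[0].geometry.location); the request itself needs an API key and a network call:

```python
def extract_latlng(geocode_response: dict):
    """Pull (lat, lng) from a Google Geocoding API JSON response,
    or None when the lookup did not succeed."""
    if geocode_response.get("status") != "OK" or not geocode_response.get("results"):
        return None
    location = geocode_response["results"][0]["geometry"]["location"]
    return (location["lat"], location["lng"])
```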

11. Buffaly moved the RPM analysis into visualization-ready outputs

What happened

With identity and geography attached, the RPM work could be rendered as provider maps, suspicious provider views, and reimbursement-oriented exploration artifacts rather than raw JSON alone.

Why it matters

This is the stage where the work became decision-ready: the system could show where providers were, how they clustered, and why they stood out.

12. Buffaly then generated a new CCM fraud-risk pipeline from the RPM foundation

What happened

After RPM, Buffaly created a CCM-specific fraud pipeline by extending the same general approach and deriving new CCM-oriented fraud signals from the earlier RPM analysis patterns.

Why it matters

This was the major expansion of the project: the system moved from a single-program fraud model into a multi-program fraud-analysis platform.

13. Buffaly carried the same enrichment and visualization pattern into CCM

What happened

The CCM pipeline followed the same high-level progression of provider scoring, NPI enrichment, geolocation, and visualization-ready output generation.

Why it matters

That consistency meant RPM and CCM could be compared side by side using the same conceptual demo flow.

14. Buffaly layered revenue-opportunity analysis on top of the fraud work

What happened

The project evolved beyond fraud-risk alone into reimbursement opportunity modeling, producing provider-level and summary outputs that highlighted missed or under-realized remote-care revenue opportunities.

Why it matters

This widened the story from pure compliance risk into business intelligence and growth opportunity.
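The document does not disclose the opportunity formula; a hedged sketch of the general shape such a model takes (the parameter names and the annualization are illustrative assumptions, not the project's actual model):

```python
def annual_opportunity(eligible_patients: int,
                       enrolled_patients: int,
                       monthly_reimbursement: float) -> float:
    """Rough unrealized-revenue estimate: the enrollment gap times the
    monthly program reimbursement, annualized. Illustrative only."""
    gap = max(eligible_patients - enrolled_patients, 0)
    return gap * monthly_reimbursement * 12
```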

15. Buffaly built state-level reimbursement visualizations across programs

What happened

State reimbursement viewer payloads were produced so the work could be explored not just provider by provider, but by market and by state-level reimbursement variation.

Why it matters

This gave the project a strategic planning layer on top of the provider-level fraud and opportunity views.
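State-level payloads reduce to grouping provider-level figures by state. A minimal sketch with assumed field names:

```python
from collections import defaultdict

def state_reimbursement_totals(rows,
                               state_field: str = "state",
                               paid_field: str = "paid") -> dict:
    """Sum reimbursement amounts by state for map-style visualization."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[state_field]] += float(row[paid_field])
    return dict(totals)
```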

16. Supporting analysis documents captured the reasoning and formulas

What happened

Companion analysis writeups captured the reasoning, formulas, and reimbursement logic behind the outputs.

Why it matters

The project was not just coded; it was explained, documented, and prepared for communication.

17. Public storytelling followed the technical build

What happened

The FairPath article on HHS Medicaid remote-care billing, together with the related fraud framing, indicates the analysis matured into a market-facing story after the data engineering and modeling work was complete.

Why it matters

This completed the chain from public data acquisition to analysis, enrichment, visualization, and public communication.

18. The end state was a Buffaly-orchestrated healthcare intelligence demo

What happened

By the end of the reconstructed flow, Buffaly could demonstrate source ingestion, claims filtering, CPT segmentation, fraud scoring, provider enrichment, geolocation, revenue modeling, and state-level visualization across RPM and CCM.

Why it matters

That end-to-end orchestration story is the actual product showcase: Buffaly turns public healthcare data into usable intelligence through a coordinated workflow.

What this project proves
Native tools and objects matter more than file size

The 10+ GB source is not the core story. Because Buffaly works through native tools and objects, it can manipulate large datasets through code rather than paying for the whole file token by token as text.

Buffaly built the fraud-risk pipeline before the visualization

The visualization was the end of the workflow, not the beginning. Buffaly first built and ran its own fraud-risk analysis pipeline, then enriched the provider identities and geography, and only then rendered the outputs into something visual.

Complex orchestration can unfold over multiple days

This work progressed from staging to filtering to scoring to enrichment to opportunity modeling over several days. That matters because useful real-world workflows are often too broad to finish in a single burst.

Novel workflows can still become useful outputs

Buffaly turned a public claims source into provider-level fraud intelligence, revenue opportunity views, and state-level strategy outputs. The value is in orchestrating a new workflow end to end until it becomes operationally useful.