August 26, 2025 · 3 min read

How I drove billing data integrity incidents to zero with a 3-layer self-auditing system

Data Engineering · Data Quality · Billing Systems · PySpark

When you work on a billing platform that processes millions of dollars a week, a single silent data error isn't just a bug. It's a number that finance reports, a customer who gets overcharged, or a reconciliation that takes a team three days to unwind. On AT&T's billing platform at Amdocs, my job as a Data Engineer was to make sure those errors never reached the people downstream. This is how I approached it.

The problem: data that changes hands so many times

A billing record on a system this size isn't produced in one place. It's assembled. It passes through a chain of stages (ingestion, enrichment, rating, aggregation, and reporting), each owned by a different team, each with its own assumptions about what "valid" looks like. Every step from one stage to the next is a cross-team handoff.

The failure mode is rarely a crash. It's drift. A row count that's off by a fraction of a percent. A currency rounding rule applied twice. A late-arriving file that quietly shifts a daily total. By the time finance notices, the error is days old and buried under everything that came after it.

The approach: make the pipeline audit itself

Instead of bolting on checks at the end, I built reconciliation into the pipeline as three layers, so the data effectively audits itself at every stage.

Layer 1 — Boundary reconciliation

At every handoff between teams, the pipeline compares what it received against what the upstream system claims it sent: record counts, control totals, and key checksums. If the two don't agree, the handoff is flagged before a single downstream job runs on bad input.
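
A minimal sketch of what a boundary check like this could look like, in plain Python rather than PySpark so it stays self-contained. The `Manifest` shape, the `record_id` and `amount` field names, and the SHA-256 key checksum are all assumptions made for illustration, not the production design.

```python
from dataclasses import dataclass
from decimal import Decimal
import hashlib


@dataclass(frozen=True)
class Manifest:
    """What the upstream system claims it sent (hypothetical shape)."""
    record_count: int
    control_total: Decimal
    key_checksum: str


def key_checksum(keys):
    """Order-independent checksum over record keys (keys are sorted first)."""
    digest = hashlib.sha256()
    for key in sorted(keys):
        digest.update(key.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def summarize(records):
    """Build a manifest from the records actually received."""
    return Manifest(
        record_count=len(records),
        control_total=sum((r["amount"] for r in records), Decimal("0")),
        key_checksum=key_checksum(r["record_id"] for r in records),
    )


def reconcile_boundary(claimed, records):
    """Return the names of mismatched fields; an empty list means a clean handoff."""
    received = summarize(records)
    return [
        field
        for field in ("record_count", "control_total", "key_checksum")
        if getattr(claimed, field) != getattr(received, field)
    ]
```

In a setup like this, a non-empty result would block the downstream job instead of merely logging a warning. Amounts are held as `Decimal` so that the control total comparison is exact rather than subject to floating-point error.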

Layer 2 — Invariant checks

Some things must always be true regardless of the data: totals must reconcile across aggregation levels, every billed amount must trace back to a rated event, and no record may appear in two mutually exclusive states. These invariants are encoded as automated checks that run continuously, not as a one-off QA pass.
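
The three invariants named above could be sketched roughly as follows. This is an illustrative plain-Python version under assumed field names (`item_id`, `account_id`, `event_id`, `amount`) and an assumed pair of exclusive states. Each function returns the violations it finds, so an empty result means the invariant holds.

```python
from collections import defaultdict
from decimal import Decimal


def check_totals_roll_up(line_items, account_totals):
    """Invariant: each account total equals the sum of its line items."""
    summed = defaultdict(lambda: Decimal("0"))
    for item in line_items:
        summed[item["account_id"]] += item["amount"]
    accounts = set(summed) | set(account_totals)
    return sorted(
        account
        for account in accounts
        if summed.get(account, Decimal("0")) != account_totals.get(account, Decimal("0"))
    )


def check_traceability(line_items, rated_event_ids):
    """Invariant: every billed line item traces back to a rated event."""
    return sorted(
        item["item_id"]
        for item in line_items
        if item["event_id"] not in rated_event_ids
    )


def check_exclusive_states(states_by_record, exclusive_pairs):
    """Invariant: no record holds two mutually exclusive states at once."""
    violations = []
    for record_id, states in sorted(states_by_record.items()):
        for first, second in exclusive_pairs:
            if first in states and second in states:
                violations.append((record_id, first, second))
    return violations
```

Returning the offending identifiers, rather than a bare pass/fail flag, is a deliberate choice in this sketch: it lets whoever picks up the alert start from the broken records instead of re-running the check by hand.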

Layer 3 — Financial reconciliation

The final layer reconciles the engineering view of the data against the financial view, the numbers finance will actually report. This is the layer that turns "the pipeline ran successfully" into "the pipeline ran correctly."
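
As a rough illustration of this layer, the sketch below compares per-period totals from the pipeline against the figures finance holds. The function name, the dictionary inputs keyed by date string, and the `tolerance` parameter are hypothetical and chosen only to show the idea.

```python
from decimal import Decimal


def reconcile_financials(pipeline_totals, ledger_totals, tolerance=Decimal("0.00")):
    """Compare per-period totals from the pipeline against finance's figures.

    Returns {period: pipeline_minus_ledger} for every period whose absolute
    difference exceeds the tolerance. A period missing on one side is
    treated as zero on that side, so it surfaces as a break.
    """
    breaks = {}
    for period in sorted(set(pipeline_totals) | set(ledger_totals)):
        difference = (
            pipeline_totals.get(period, Decimal("0"))
            - ledger_totals.get(period, Decimal("0"))
        )
        if abs(difference) > tolerance:
            breaks[period] = difference
    return breaks
```

The default tolerance of zero reflects the assumption that billing totals should match to the cent. A looser tolerance would be a business decision to make with finance, not an engineering default.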

The result

The system reconciles data across all cross-team handoffs and catches problems before they reach finance. In practice, that took data integrity incidents down to essentially none. That is the difference between firefighting after the fact and never lighting the fire in the first place.

What I'd tell another data engineer

  • Reconciliation is a feature, not a chore. If your checks live outside the pipeline, they'll always run too late. Build them in.
  • Counts and control totals catch more than schema validation. Most real-world data damage is silent and numeric, not structural.
  • Design for the handoff, not the happy path. On a system with multiple owners, the boundaries between teams are where trust breaks down, so that's where the verification has to live.

The unglamorous part of data engineering (the migrations, the pipelines, the checks) is the part that keeps the numbers honest. That's the work I care about most.


I'm Yash Agarwal, a Data Engineer II at Amdocs in Pune, India. I write about building reliable, large-scale data platforms. You can find more of my work on my portfolio or connect with me on LinkedIn.
