Blog
Notes on building reliable, large-scale data platforms: cloud migrations, pipelines, data quality, and the occasional war story.
You can't check nationality in a millisecond: the access control trap behind frontier AI's safety recalls
A frontier model shipped on a Friday and was pulled three days later. The headline was national security, but underneath it is an access-control engineering problem I keep seeing: a permission model more granular than the infrastructure can enforce.
Meta's real moat was never the benchmark. It's the data layer
Muse Spark took Meta from an 18 to a 52 on the intelligence index in nine months. But what matters isn't the benchmark score; it's the quiet move from open weights to a proprietary model sitting on three billion users. A data engineer's read on where the advantage actually lives.
The 'SaaS-apocalypse' wasn't a crash, it was a re-pricing of what software is worth
When new Claude workflow tools landed, markets erased roughly $285 billion from software stocks in a single session. No outage, no breach, just fear that AI agents are becoming full-stack replacements for entire workflows. A data engineer's take on why the future is fewer tools and more systems.
AI's centre of gravity is shifting: India and the rise of real-world AI adoption
The India AI Impact Summit made one thing clear: the next phase of AI won't be defined only by model breakthroughs but also by where and how those models are deployed, governed, and scaled. A data engineer's view on why real-world constraints matter more than demos.
Why LangGraph feels familiar to a data engineer: state, orchestration, and failure paths
I picked up LangGraph half out of curiosity and half because it became a project requirement. What struck me is how much it rhymes with data engineering: state over stateless calls, orchestration over linear chains, and designing for the failure paths.
Databricks' new IDE quietly changes how data engineers build pipelines
Databricks introduced a dedicated IDE for data engineering, and it isn't just another UI update. It's a shift from experiment-first notebooks to engineer-first workflows: structured pipeline authoring, real lineage, and git-native version control. Why that matters for anyone maintaining production pipelines.
Turning 25: notes on being a work in progress
When I was younger, 25 felt like peak adulthood: career sorted, life direction locked in. The reality is somewhere between 'I think I know what I'm doing' and complete confusion. A few honest reflections on growth, timelines, and trusting the process.
Databricks just made data governance feel like leverage, not compliance
Databricks' November release isn't just a feature drop; it reads like a direction shift for modern data platforms. Bringing external tables into Unity Catalog with lineage intact, cross-cloud sharing with SAP, attribute-based access control, and audit logs. Why this moves us from 'store and compute' to 'connect and understand.'
Snowflake × SAP and the rise of the 'AI-ready' data fabric
Snowflake and SAP announced a collaboration to build a shared, AI-ready business data fabric: zero-copy sharing, semantic enrichment, and unified governance across clouds. Having migrated AT&T's warehouses from Teradata to Snowflake, I can say the hardest problem was never performance. It was context.
Why great data engineers think like product managers
When I started out, I thought being a good data engineer was all about clean pipelines and perfect schemas. Over time I realised the best ones don't just move data; they move decisions. They think in outcomes, not objects.
Travel resets the cache
Just back from the Philippines: beaches, canyons, islands, and a lot of laughter. Somewhere between sunsets, I was reminded that the same principle applies to data systems and to people: clear the clutter, refresh your processes, and reconnect to what actually matters.
Your pipelines can run fast, but if your data isn't trusted, nothing else matters
When we migrated AT&T's enterprise data warehouse from Teradata to Snowflake, everything looked great: tables moved, pipelines ran, queries got faster. Then the silent failures started. The real lesson of that migration wasn't about speed: it was about data trust.
What I've learned in 2 years as a data engineer (at 24)
When I started my career, I was convinced that mastering tools was everything: Python, SQL, Snowflake, Teradata, cloud platforms. After migrating 1000+ tables and 45B+ records for AT&T and shipping a production GenAI pipeline, I learned the thing that actually future-proofs a career in data.
GenAI won't replace data engineers, it'll empower us
There's been so much talk about AI taking over jobs. Here's what I've actually seen in my work: on a recent GenAI-heavy project, the AI handled modular coding while our team focused on orchestrating and scaling the workflow. The future isn't AI vs humans: it's AI + humans.
How I drove billing data integrity incidents to ZERO with a 3-layer self-auditing system
A look at the self-auditing architecture I built on AT&T's billing platform at Amdocs, reconciling data across multiple cross-team handoffs and catching problems before they ever reach finance.