DATA ENGINEER
Available for opportunities

Dharmesh
Kashyap

Data Engineer · ETL Pipelines · Analytics Engineering · Data Automation

Data Engineer specialising in ETL pipelines, large-scale data processing, and automation systems. I turn raw, messy, multi-source data into structured assets that drive real decisions.

I work across the full data lifecycle: extraction, transformation, validation, and delivery. My work has spanned geospatial datasets at national scale, financial and regulatory pipelines, and executive dashboards that go straight into boardrooms. I've designed databases handling millions of records, built ETL systems from scratch, and managed complete project lifecycles including architecture, deployment, and stakeholder coordination.

My approach is automation-first. If a process can be systematised, it gets systematised: web scraping pipelines that survive anti-bot rewrites, PDF parsers for documents that were never meant to be parsed, API integrations that run without intervention. I'm strong in Python, SQL, and building scalable systems where reliability is non-negotiable.

I focus on one outcome: raw data in, decision-ready assets out. Everything in between (schema design, validation layers, data quality checks) is engineered to make that outcome repeatable.

Precision through automation. Reliability by design.

Numbers that shipped.

670,000+ village boundary records delivered as GeoJSON (GOV Org)
Government scheme PDFs parsed automatically into structured data (GOV Org)
Mutual fund instruments running on automated pipelines (GOV Org)
18L+ record database designed from the schema up (Financial Services Institution)
8-tab executive dashboard for senior stakeholders (Financial Services Institution)

Selected Work

// case_study
🗺
GOV Org Pan-India Spatial Pipeline
End-to-end pipeline delivering 670,000+ village boundary records as structured GeoJSON. Handled inconsistent source data, boundary mismatches, and validation across multiple government data formats.
Python · GeoJSON · PostgreSQL · Validation · ETL
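Validation at this scale comes down to cheap structural checks applied to every feature. A minimal sketch in plain Python, assuming GeoJSON Polygon features (the property name `village_code` is illustrative, not the pipeline's actual schema):

```python
def validate_village_feature(feature: dict) -> list[str]:
    """Structural checks for one GeoJSON Polygon feature.
    Returns a list of human-readable issues (empty = passed)."""
    issues = []
    # Every record must carry an identifier ('village_code' is a placeholder name).
    props = feature.get("properties") or {}
    if not props.get("village_code"):
        issues.append("missing village_code property")
    # Only Polygon geometries are expected in this sketch.
    geom = feature.get("geometry") or {}
    if geom.get("type") != "Polygon":
        issues.append(f"unexpected geometry type: {geom.get('type')}")
        return issues
    # Per RFC 7946, each linear ring needs >= 4 points and must close on itself.
    for ring in geom.get("coordinates", []):
        if len(ring) < 4:
            issues.append("ring has fewer than 4 points")
        elif ring[0] != ring[-1]:
            issues.append("ring is not closed (first point != last point)")
    return issues
```

Checks like these run in a single pass over the file, which matters when the input is hundreds of thousands of boundary records.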
// case_study
Lead Generation Automation Tool
AI-powered scraping and enrichment pipeline that eliminates manual prospecting. Entity resolution via Groq API, operator-facing Streamlit dashboard, one-click Excel export. Zero manual steps in the workflow.
Python · Groq API · Streamlit · Bright Data · XPath
// case_study
Data Processor & ETL Pipeline Tool
Flask-based web tool for structured data ingestion. Runs automated validation checks — column schema comparison, null detection, type integrity — before transforming raw uploads into clean CSV and pushing directly into PostgreSQL.
Python · Flask · PostgreSQL · ETL · Data Validation
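The three checks named above can be sketched with pandas roughly as follows (the expected schema and its column names are placeholders, not the tool's real configuration):

```python
import pandas as pd

# Illustrative expected schema: column name -> pandas dtype.
EXPECTED_SCHEMA = {"id": "int64", "amount": "float64", "category": "object"}

def validate_upload(df: pd.DataFrame) -> list[str]:
    """Run schema, null, and type checks; return a list of issues."""
    issues = []
    # 1. Column schema comparison: are any expected columns missing?
    missing = set(EXPECTED_SCHEMA) - set(df.columns)
    if missing:
        issues.append(f"missing columns: {sorted(missing)}")
    # 2. Null detection: count NaN/None per expected column present.
    null_counts = df[df.columns.intersection(EXPECTED_SCHEMA)].isna().sum()
    for col, n in null_counts.items():
        if n:
            issues.append(f"{col}: {n} null values")
    # 3. Type integrity: does each column's dtype match the contract?
    for col, expected in EXPECTED_SCHEMA.items():
        if col in df.columns and str(df[col].dtype) != expected:
            issues.append(f"{col}: expected {expected}, got {df[col].dtype}")
    return issues
```

A gate like this runs before any transformation, so a bad upload is rejected with a readable report instead of polluting the PostgreSQL target.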
// case_study
📈
Executive Analytics Dashboard — Financial Services
8-tab Tableau dashboard built on 18L+ records modelled into PostgreSQL. Covers KPI reporting, trend analysis, and deep drill-downs across expense categories. Designed for senior stakeholders — from high-level summaries to transaction-level detail.
Tableau · PostgreSQL · SQL · KPI Design · Data Modelling
// case_study
📊
COVID-19 Global Mortality Analysis
Processed 6.48M+ death records across multi-year global dataset. Temporal trend analysis, regional breakdowns, and anomaly detection visualized through a structured Tableau dashboard.
Tableau · Python · Pandas · SQL · Data Cleaning
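The temporal trend and anomaly-detection step can be sketched in pandas, assuming a daily `date`/`deaths` table and a simple z-score rule (the project's actual method isn't specified here):

```python
import pandas as pd

def monthly_anomalies(df: pd.DataFrame, z_thresh: float = 2.0) -> pd.DataFrame:
    """Aggregate daily death counts to monthly totals and flag months
    whose totals deviate more than z_thresh std devs from the mean."""
    monthly = (
        df.set_index("date")["deaths"]
          .resample("MS")          # month-start buckets
          .sum()
          .to_frame("deaths")
    )
    mean, std = monthly["deaths"].mean(), monthly["deaths"].std()
    monthly["anomaly"] = (monthly["deaths"] - mean).abs() > z_thresh * std
    return monthly
```

The same monthly frame then feeds the Tableau layer directly, so the dashboard and the anomaly flags stay in sync.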
// case_study
🏠
Airbnb Market Pricing Intelligence
Listing-level pricing pattern analysis across neighborhoods and property types. KPI overlays for occupancy signals and pricing efficiency. Built for actionable market intelligence, not just visualization.
Tableau · Python · EDA · Data Visualization · KPI Design

Technical Skills

Programming Languages
Python · SQL · Java · C++
Data Engineering
ETL Pipelines · Data Validation · Web Scraping · API Integration · Data Processing Automation
Databases
PostgreSQL · MySQL · Schema Design · Query Optimization
Analytics & Tooling
Tableau · Streamlit · Jupyter · Bright Data · KPI Reporting · EDA · Dashboard Design