DATA ENGINEER
Available for opportunities

Dharmesh
Kashyap

Data Engineer · ETL Pipelines · Analytics Engineering · Data Automation

Data Engineer specialising in ETL pipelines, large-scale data processing, and automation systems. I turn raw, messy, multi-source data into structured assets that drive real decisions.

I work across the full data lifecycle: extraction, transformation, validation, and delivery. My work has spanned geospatial datasets at national scale, financial and regulatory pipelines, and executive dashboards that go straight into boardrooms. I've designed databases handling millions of records, built ETL systems from scratch, and managed complete project lifecycles including architecture, deployment, and stakeholder coordination.

My approach is automation-first. If a process can be systematised, it gets systematised: web scraping pipelines that survive anti-bot rewrites, PDF parsers for documents that were never meant to be parsed, API integrations that run without intervention. I'm strong in Python, SQL, and building scalable systems where reliability is non-negotiable.

I focus on one outcome: raw data in, decision-ready assets out. Everything in between (schema design, validation layers, data quality checks) is engineered to make that outcome repeatable.

Precision through automation. Reliability by design.

Numbers that shipped.

670,000+ village boundary records delivered as GeoJSON (GOV Org)
Government scheme PDFs parsed automatically into structured data (GOV Org)
Mutual fund instruments running on automated pipelines (GOV Org)
18L+ record database designed from the schema up (Financial Services Institution)
8-tab executive dashboard for senior stakeholders (Financial Services Institution)

Selected Work

// case_study
🗺
GOV Org Pan-India Spatial Pipeline
End-to-end pipeline delivering 670,000+ village boundary records as structured GeoJSON. Handled inconsistent source data, boundary mismatches, and validation across multiple government data formats.
Python · GeoJSON · PostgreSQL · Validation · ETL
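Validation at this scale comes down to cheap structural checks applied to every feature. A minimal sketch in plain Python, assuming GeoJSON Polygon features (the property name `village_code` is illustrative, not the pipeline's actual schema):

```python
def validate_village_feature(feature: dict) -> list[str]:
    """Structural checks for one GeoJSON Polygon feature.
    Returns a list of human-readable issues (empty = passed)."""
    issues = []
    # Every record must carry an identifier ('village_code' is a placeholder name).
    props = feature.get("properties") or {}
    if not props.get("village_code"):
        issues.append("missing village_code property")
    # Only Polygon geometries are expected in this sketch.
    geom = feature.get("geometry") or {}
    if geom.get("type") != "Polygon":
        issues.append(f"unexpected geometry type: {geom.get('type')}")
        return issues
    # Per RFC 7946, each linear ring needs >= 4 points and must close on itself.
    for ring in geom.get("coordinates", []):
        if len(ring) < 4:
            issues.append("ring has fewer than 4 points")
        elif ring[0] != ring[-1]:
            issues.append("ring is not closed (first point != last point)")
    return issues
```

Checks like these run in a single pass over the file, which matters when the input is hundreds of thousands of boundary records.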
// case_study
Lead Generation Automation Tool
AI-powered scraping and enrichment pipeline that eliminates manual prospecting. Entity resolution via Groq API, operator-facing Streamlit dashboard, one-click Excel export. Zero manual steps in the workflow.
Python · Groq API · Streamlit · Bright Data · XPath
// case_study
Data Processor & ETL Pipeline Tool
Flask-based web tool for structured data ingestion. Runs automated validation checks — column schema comparison, null detection, type integrity — before transforming raw uploads into clean CSV and pushing directly into PostgreSQL.
Python · Flask · PostgreSQL · ETL · Data Validation
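The three checks named above can be sketched with pandas roughly as follows (the expected schema and its column names are placeholders, not the tool's real configuration):

```python
import pandas as pd

# Illustrative expected schema: column name -> pandas dtype.
EXPECTED_SCHEMA = {"id": "int64", "amount": "float64", "category": "object"}

def validate_upload(df: pd.DataFrame) -> list[str]:
    """Run schema, null, and type checks; return a list of issues."""
    issues = []
    # 1. Column schema comparison: are any expected columns missing?
    missing = set(EXPECTED_SCHEMA) - set(df.columns)
    if missing:
        issues.append(f"missing columns: {sorted(missing)}")
    # 2. Null detection: count NaN/None per expected column present.
    null_counts = df[df.columns.intersection(EXPECTED_SCHEMA)].isna().sum()
    for col, n in null_counts.items():
        if n:
            issues.append(f"{col}: {n} null values")
    # 3. Type integrity: does each column's dtype match the contract?
    for col, expected in EXPECTED_SCHEMA.items():
        if col in df.columns and str(df[col].dtype) != expected:
            issues.append(f"{col}: expected {expected}, got {df[col].dtype}")
    return issues
```

A gate like this runs before any transformation, so a bad upload is rejected with a readable report instead of polluting the PostgreSQL target.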
// case_study
📈
Executive Analytics Dashboard — Financial Services
8-tab Tableau dashboard built on 18L+ records modelled into PostgreSQL. Covers KPI reporting, trend analysis, and deep drill-downs across expense categories. Designed for senior stakeholders — from high-level summaries to transaction-level detail.
Tableau · PostgreSQL · SQL · KPI Design · Data Modelling
// case_study
📊
COVID-19 Global Mortality Analysis
Processed 6.48M+ death records across multi-year global dataset. Temporal trend analysis, regional breakdowns, and anomaly detection visualized through a structured Tableau dashboard.
Tableau · Python · Pandas · SQL · Data Cleaning
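The temporal trend and anomaly-detection step can be sketched in pandas, assuming a daily `date`/`deaths` table and a simple z-score rule (the project's actual method isn't specified here):

```python
import pandas as pd

def monthly_anomalies(df: pd.DataFrame, z_thresh: float = 2.0) -> pd.DataFrame:
    """Aggregate daily death counts to monthly totals and flag months
    whose totals deviate more than z_thresh std devs from the mean."""
    monthly = (
        df.set_index("date")["deaths"]
          .resample("MS")          # month-start buckets
          .sum()
          .to_frame("deaths")
    )
    mean, std = monthly["deaths"].mean(), monthly["deaths"].std()
    monthly["anomaly"] = (monthly["deaths"] - mean).abs() > z_thresh * std
    return monthly
```

The same monthly frame then feeds the Tableau layer directly, so the dashboard and the anomaly flags stay in sync.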
// case_study
🏠
Airbnb Market Pricing Intelligence
Listing-level pricing pattern analysis across neighborhoods and property types. KPI overlays for occupancy signals and pricing efficiency. Built for actionable market intelligence, not just visualization.
Tableau · Python · EDA · Data Visualization · KPI Design

Technical Skills

Programming Languages
Python · SQL · Java · C++
Data Engineering
ETL Pipelines · Data Validation · Web Scraping · API Integration · Data Processing Automation
Databases
PostgreSQL · MySQL · Schema Design · Query Optimization
Analytics & Tooling
Tableau · Streamlit · Jupyter · Bright Data · KPI Reporting · EDA · Dashboard Design