
Business Problem:
The reporting team relied on manual data extraction, transformation, and loading processes across multiple systems. These tasks were time-consuming, error-prone, and often caused delays in weekly and monthly reporting cycles.

Role:
System Engineer and Automation Lead responsible for designing, scripting, and deploying automated workflows that improved data reliability and reduced manual workload.

Data Sources:
• SQL Server (transactional data)
• CSV/Excel files from shared drives
• API endpoints for external data
• Azure Blob Storage

Tools & Technologies:
Python, Bash, Azure Functions, Azure Storage, Cron Jobs, SQL, Logging & Monitoring

Process:
1. Data Collection
   - Automated extraction from SQL Server using scheduled Python scripts.
   - Pulled external data via REST APIs with authentication tokens.
   - Ingested CSV/Excel files from shared drives into Azure Blob Storage.
2. Data Cleaning
   - Standardized inconsistent column names and data types.
   - Implemented validation rules for missing values and schema mismatches.
   - Logged anomalies for review by the analytics team.
3. Workflow Automation
   - Built a modular Python ETL pipeline with reusable functions.
   - Scheduled daily and hourly jobs using Cron and Azure Functions.
   - Added retry logic, error handling, and notification alerts.
4. Integration
   - Loaded cleaned data into SQL tables for Power BI consumption.
   - Synced processed files to Azure Storage for archival.
   - Integrated with existing reporting workflows without downtime.
5. Monitoring & Reliability
   - Implemented logging for each pipeline stage.
   - Added email/Teams alerts for failures or unusual patterns.
   - Created dashboards to track pipeline health and performance.

Key Insights:
• 70% of manual workload was caused by repetitive file handling and SQL exports.
• Inconsistent file formats were the main source of reporting delays.
• API data had the highest failure rate due to token expiration and rate limits.

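
Illustrative sketch (retry logic for API pulls):
The API failures noted under Key Insights (token expiration and rate limits) are the kind of problem the retry logic in the Workflow Automation step is meant to absorb. The following is a minimal sketch of one way to do that, not the project's actual code. The exception classes, the call_with_retry helper, and its parameters are hypothetical names chosen for illustration; a real client would map HTTP 401 and 429 responses onto them.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TokenExpiredError(Exception):
    """Hypothetical error: the API rejected an expired token (e.g. HTTP 401)."""


class RateLimitError(Exception):
    """Hypothetical error: the API throttled the request (e.g. HTTP 429)."""


def call_with_retry(
    request: Callable[[str], T],
    get_token: Callable[[], str],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(token), refreshing the token or backing off as needed."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    token = get_token()
    for attempt in range(1, max_attempts + 1):
        try:
            return request(token)
        except TokenExpiredError:
            if attempt == max_attempts:
                raise
            token = get_token()  # fetch a fresh token, then retry right away
        except RateLimitError:
            if attempt == max_attempts:
                raise
            # Exponential backoff: the wait doubles with each attempt number.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("retry loop exited without a result")


# Demo against a fake endpoint: the first call sees an expired token,
# the second is throttled, and the third succeeds.
outcomes = iter([TokenExpiredError(), RateLimitError(), {"rows": 3}])


def fake_request(token: str):
    result = next(outcomes)
    if isinstance(result, Exception):
        raise result
    return result


delays = []  # records requested waits instead of actually sleeping
payload = call_with_retry(fake_request, get_token=lambda: "demo-token", sleep=delays.append)
```

Passing the sleep function as a parameter keeps the helper testable without real waiting. In a scheduled job, a final failure after the last attempt would typically be logged and routed to the notification alerts.
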
Business Impact:
• Reduced manual data preparation time by 70%.
• Improved data freshness from weekly to daily.
• Eliminated recurring reporting errors caused by manual processes.
• Increased reliability of downstream dashboards and analytics.
• Freed up the analytics team to focus on insights instead of data wrangling.
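
Illustrative sketch (Data Cleaning step):
The cleaning rules described in the Process section (standardized column names, checks for missing values and type mismatches, logged anomalies) could look roughly like the following. This is a sketch under stated assumptions, using only the standard library. The schema, column names, logger name, and the standardize_name and clean_rows helpers are hypothetical and not taken from the project.

```python
import logging
import re

logger = logging.getLogger("etl.cleaning")  # hypothetical logger name

# Hypothetical schema: expected column name -> converter for its type.
EXPECTED_SCHEMA = {"order_id": int, "amount": float, "region": str}


def standardize_name(name: str) -> str:
    """Turn ' Order ID ' or 'order-id' into 'order_id'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def clean_rows(rows, schema=EXPECTED_SCHEMA):
    """Return (clean_rows, anomalies); anomalies are (row_index, reason) pairs."""
    clean, anomalies = [], []
    for index, raw in enumerate(rows):
        row = {standardize_name(key): value for key, value in raw.items()}
        # A column that is absent or empty counts as a missing value.
        missing = [col for col in schema if row.get(col) in (None, "")]
        if missing:
            anomalies.append((index, f"missing values: {missing}"))
            continue
        try:
            clean.append({col: cast(row[col]) for col, cast in schema.items()})
        except (TypeError, ValueError) as exc:
            anomalies.append((index, f"type mismatch: {exc}"))
    for index, reason in anomalies:
        logger.warning("row %d rejected: %s", index, reason)
    return clean, anomalies


# Demo rows with inconsistent headers, one empty value, and one bad type.
rows = [
    {"Order ID": "101", " Amount ": "19.90", "Region": "EMEA"},
    {"order-id": "102", "amount": "", "region": "APAC"},
    {"ORDER_ID": "abc", "Amount": "5", "Region": "AMER"},
]
clean, anomalies = clean_rows(rows)
```

Rejected rows are kept in the anomalies list rather than silently dropped, so they can be handed to the analytics team for review.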

