- AMAZON, SR. DATA CONSULTANT
  February 2024 - Present (1 year and 3 months), Poland
  1. Worked extensively with Python and SQL for data manipulation, analysis, and automation.
  2. Led multiple data migration projects across various tech stacks, ensuring smooth data transfer and integration.
  3. Developed and managed AWS infrastructure using CloudFormation, implementing Infrastructure as Code best practices.
  4. Gained a solid understanding of Apache Spark, including its architecture and distributed data processing capabilities.
  5. Deployed Apache Airflow on a scalable Kubernetes cluster (e.g., AWS EKS) with dynamic resource scaling.
  6. Used the official Airflow Helm chart to deploy the Scheduler, Webserver, Workers, and Flower.
  7. Built custom Docker images for Airflow workers and schedulers with the required dependencies and custom DAGs.
  8. Configured persistent volumes for logs, DAGs, and metadata using AWS S3 or EFS.
  9. Integrated monitoring tools such as Prometheus and Grafana, and centralized logging with the ELK stack.
  10. Built tests and documented data models using dbt's built-in testing framework, ensuring data quality and lineage visibility across the organization.
  11. Orchestrated data pipelines with Apache Airflow, creating and maintaining complex Directed Acyclic Graphs (DAGs) to automate ETL processes and ensure reliable data processing (a minimal DAG sketch follows this list).
  12. Integrated dbt with data warehouses such as Redshift to streamline data transformation and improve overall system performance.
  13. Implemented access controls and permissions to protect sensitive data and comply with regulatory requirements.
  14. Enhanced data processing capabilities, improving performance and reducing time-to-insight.
  15. Managed and maintained SQL Server, Oracle, MySQL, and PostgreSQL databases, ensuring high availability, performance, and reliability of database systems.
  16. Performed database installations, upgrades, and patching to maintain system security and compliance with organizational standards.
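  For illustration, a minimal sketch of the kind of Airflow DAG used to chain dbt runs and tests (items 11 and 12), assuming Airflow 2.x; the DAG id, schedule, and dbt project path are hypothetical placeholders, not the actual project code:

```python
# Minimal illustrative Airflow DAG; all names and paths are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_dbt_transform",  # hypothetical DAG name
    start_date=datetime(2024, 2, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    # Run dbt models against the warehouse (e.g., Redshift), then validate
    # them with dbt's built-in testing framework.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/airflow/dbt",  # assumed path
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --project-dir /opt/airflow/dbt",  # assumed path
    )
    dbt_run >> dbt_test
```

  In the EKS deployment described above, a DAG like this would ship inside the custom worker and scheduler images or on the shared DAGs volume.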
- LGIM, SR. DATA ENGINEER
  April 2022 - January 2024 (1 year and 10 months), United Kingdom
  1. Created scripts in Scala and Spark to read CSV, JSON, and Parquet files from Azure Data Lake Storage (a PySpark sketch follows this list).
  2. Loaded data into Azure Databricks for processing and transformed data stored in Azure Data Lake Storage.
  3. Utilized Azure Synapse Analytics for analytics and data warehousing.
  4. Extensive experience in Design Verification Testing (DVT), including planning, execution, and analysis of tests to ensure hardware and software product quality.
  5. Identified and resolved issues, optimized test processes, and delivered reliable, compliant products.
  6. Hands-on experience setting up workflows with Apache Airflow for managing and scheduling Spark jobs.
  7. Developed new Spark jobs and optimized existing jobs on Azure Databricks.
  8. Tuned Azure Databricks clusters according to job requirements.
  9. Developed routines and data extracts using Azure Data Factory.
  10. Designed and implemented ETL Mappings, Mapplets, Workflows, and Worklets using Azure Data Factory and Azure Integration Services.
  11. Created curated layers containing transformed data from different countries and sources using Azure Synapse Analytics.
  12. Designed highly analytical and process-oriented pipelines on Azure.
  13. Refactored existing code with new techniques and optimized cluster configurations on Azure.
  14. Simplified processes by removing unnecessary steps, ensuring a clean and efficient workflow.
  15. Implemented Azure Resource Manager (ARM) templates for deploying Azure infrastructure.
  16. Implemented data pipelines on Azure Kubernetes Service (AKS).
  17. Created audit tables for monitoring purposes using Azure Monitor and Azure Log Analytics.
  18. Main languages used: SQL, Scala, and Bash.
  19. Developed various ETL pipelines for streaming and batch processing jobs on Azure.
  20. Created Spark ETLs to ingest data from various operational systems, establishing a unified dimensional or star schema data model for analytics and reporting.
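  A minimal PySpark sketch of the ADLS reads and transformation described in items 1 and 2 (the original scripts were written in Scala); the storage account, container, paths, and join key are hypothetical placeholders, and credential configuration is omitted:

```python
# Illustrative sketch only; account, container, paths, and columns are assumed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adls-ingest").getOrCreate()

base = "abfss://raw@examplelake.dfs.core.windows.net"  # assumed ADLS Gen2 URI

# Read the three source formats from the lake.
trades = (
    spark.read.option("header", "true").option("inferSchema", "true")
    .csv(f"{base}/trades/*.csv")
)
events = spark.read.json(f"{base}/events/")
positions = spark.read.parquet(f"{base}/positions/")

# Example transformation before writing back to a curated layer.
curated = positions.join(trades, on="account_id", how="left")  # assumed key
curated.write.mode("overwrite").parquet(f"{base}/curated/positions_enriched/")
```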
- IBM BALTICS, SR. DATA ENGINEER
  November 2021 - March 2022 (4 months), Riga, Latvia
  1. Developed scripts in Python, Scala, and Spark to read CSV, JSON, and Parquet files from Azure Data Lake Storage.
  2. Loaded data into Azure Data Lake Storage and Microsoft Azure Synapse Analytics.
  3. Set up workflows using Azure Data Factory for managing and scheduling Spark jobs.
  4. Utilized Azure Event Hubs for ingesting high-volume streaming data from multiple sources (a sketch follows this list).
  5. Implemented event-driven architecture for efficient data ingestion.
  6. Integrated Apache Spark (PySpark, Scala) within Azure Databricks for distributed and scalable data processing.
  7. Leveraged Python for developing and optimizing data processing pipelines, resulting in a 20% increase in efficiency.
  8. Monitored and optimized Airflow tasks and dbt runs to reduce processing time and ensure system reliability, leveraging logging, alerting, and monitoring features.
  9. Implemented and maintained data solutions on Microsoft Azure, utilizing services such as Azure Synapse Analytics, Azure Data Factory, and Azure Blob Storage for seamless data integration and analysis.
  10. Collaborated with cross-functional teams to implement DevOps best practices, reducing deployment times by 30% and streamlining the development lifecycle.
  11. Implemented horizontal scaling to accommodate increasing data volumes.
  12. Developed and maintained data models for analytics and reporting using Azure Synapse Analytics, enabling fast and efficient querying of large datasets.
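  A minimal sketch of the Event Hubs streaming ingestion described in items 4 and 5, assuming the Event Hubs Kafka-compatible endpoint and Spark's built-in Kafka source (the spark-sql-kafka connector must be on the classpath); the namespace, event hub name, connection string, and output paths are placeholders:

```python
# Illustrative sketch only; namespace, hub, and paths are assumed placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("eventhubs-stream").getOrCreate()

namespace = "examplens"                       # assumed Event Hubs namespace
conn_str = "<EVENT_HUBS_CONNECTION_STRING>"   # intentionally left unfilled
jaas = (
    "org.apache.kafka.common.security.plain.PlainLoginModule required "
    f'username="$ConnectionString" password="{conn_str}";'
)

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", f"{namespace}.servicebus.windows.net:9093")
    .option("subscribe", "telemetry")          # assumed event hub name
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option("kafka.sasl.jaas.config", jaas)
    .load()
)

# Decode payloads and land them in the lake as Parquet with checkpointing.
query = (
    stream.select(col("value").cast("string").alias("payload"))
    .writeStream.format("parquet")
    .option("path", "abfss://raw@examplelake.dfs.core.windows.net/events/")
    .option("checkpointLocation",
            "abfss://raw@examplelake.dfs.core.windows.net/_chk/events/")
    .start()
)
```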
- BS SOFTWARE, International Islamic University, 2016
- HDP Certified Developer
- Neo4j Certified Professional
- Spark Level-1 (IBM)
- Spark Level-2 (IBM)