About Cato
- Python & PySpark: Extensive use of PySpark for building distributed data pipelines. Solid understanding of Spark’s lazy evaluation model and performance best practices (e.g., avoiding premature collect/write operations).
- Spark: Knowledgeable in Spark's core concepts and optimization strategies; recent experience is Python-based.
- Big Data & Data Lakes: Hands-on experience with HDFS, Hive (structured queries), and managing silver/golden layer transformations in large-scale systems.
- Cloud Platforms: Worked with Google Cloud (BigQuery, Cloud Storage) and Azure (Data Factory, Postgres) for data integration and reporting pipelines.
- Infrastructure & CI/CD: Practical knowledge of GitLab CI/CD, Docker, containerized PostgreSQL, and Django applications. Applied version control strategies, branching policies, and regression testing workflows.
- Monitoring & Debugging: Experienced in pipeline reliability, error detection, and data quality checks (e.g., malformed data parsing, separator mismatch, and alerting via logs).
- Security: Follows the principle of least privilege for role-based access management. Advocates for cost-aware, secure deployments.
- ETL/ELT Pipelines: Design, implement, and monitor data pipelines from ingestion to delivery, with quality and performance in mind.
- Reliable Systems: Strong focus on scalable architecture, fault isolation, and production-ready code.
- Clear Communication: Transparent updates and collaborative approach across stakeholders and teams.
French
Fluent
Spanish
Native or bilingual
English
Fluent
Experience
- Catobyte
  Freelance Data Engineer & NLP enthusiast
  Digital and IT
  March 2025 - Today (1 year and 3 months)
  I help teams build practical data workflows using Python, Spark, and cloud tools. I have hands-on experience with batch data pipelines, file format conversion (CSV/Excel to Parquet/ORC), and cloud services like AWS S3 and BigQuery. I've worked with both on-premise clusters and cloud environments to prepare data for analysis and reporting.
  My strength lies in simplifying complex problems and delivering clean, efficient solutions. I’m also diving into NLP and deep learning, currently exploring real-world applications using tools like HuggingFace and PyTorch.
  I'm looking for freelance projects, especially ones with an NLP or AI angle where I can contribute while continuing to learn.
- Sabbatical leave
  Independent projects
  Digital and IT
  January 2024 - March 2025 (1 year and 2 months)
  Design and Development of a Technology Blog: Created and managed a blog focused on data engineering, artificial intelligence, and software development. Wrote in-depth articles for a non-technical audience on advanced concepts in data engineering and AI.
  Python / HTML / CSS / AWS (Route 53 & S3)
- Corum l'Épargne
  Data Engineer
  January 2023 - January 2024 (1 year)
  Paris, France
  Developed and optimized Azure Data Factory scripts to create and maintain data flows. Worked with SQL and relational databases to maintain and develop the company's data warehouse, focusing on verifying and transforming financial data.
  Azure Data Factory / Python
Education
- Master
  Université Paris-Saclay Télécom ParisTech
  2017
- M1 Ingénierie Logicielle (Software Engineering)
  Université de Rennes 1
  2014