Langat Vincent

Senior Data Scientist | Machine Learning Engineer | AI Engineer
Nairobi, KE.

About

Data Scientist and Machine Learning Engineer with over 5 years of experience developing and deploying scalable ML models for capital markets, specializing in anomaly and fraud detection and credit scoring. Proven expertise in using Python, SQL, and data visualization tools to turn complex datasets into actionable insights. Adept at optimizing decision-making, minimizing risk, and driving business growth across fintech, healthcare, and agri-tech. Recognized for leading data science initiatives and for developing and deploying active-learning-based ML models.

Work

Prospect 33

Data Scientist (R&D)

Summary

Researched, developed, and deployed semi-supervised, active-learning-based anomaly detection models to detect and remediate anomalies in financial and operational datasets.

Highlights

Engineered and deployed an active-learning-based Isolation Forest model, integrating domain-specific client feedback to iteratively improve accuracy and reduce false positives.

Co-developed DIVA and LEAP, flagship fraud and anomaly detection platforms for capital markets, building scalable, adaptive models that significantly improved detection accuracy and reduced false positives.

Enhanced the LEAP application's backend and frontend capabilities through optimized API design and system architecture (Python, Django, Django REST Framework), improving platform performance, scalability, and developer efficiency.

Designed and implemented an explainability framework (SHAP, LIME, gradient-based methods) within the LEAP platform, providing clear, localized insights into anomaly detection decisions, boosting stakeholder trust and ensuring regulatory alignment.

Collaborated with stakeholders to define and prioritize product requirements, ensuring alignment with business objectives and maximizing product impact through targeted feature delivery.

Prospect 33

Junior Data Scientist (Global Data Lab)

Summary

Researched and developed advanced anomaly detection techniques for financial and credit data, integrating explainability frameworks to enhance model transparency and interpretability.

Twino (Moneza Kenya)

Data Scientist

Summary

Developed and managed loan portfolio credit scoring models for both payday loan (PDL) and buy-now-pay-later (BNPL) products: analyzed trends and risks, generated actionable reports, implemented risk rules, and collaborated with teams to ensure data accuracy and minimize fraud.

Highlights

Developed and deployed robust credit scoring (XGBoost) and fraud detection (CatBoost) models leveraging TransUnion and SEON data, significantly enhancing loan approval accuracy and strengthening fraud prevention measures.

Designed and implemented an automated loan default dashboard, providing daily, weekly, and monthly tracking of default rates, enabling proactive risk monitoring and informing continuous improvements in credit risk models.

Automated payment and debt collection reporting for back-office teams using PostgreSQL, streamlining operations and improving collection outcomes while reducing processing time.

Leveraged in-depth SQL querying (DBeaver, PostgreSQL) for data extraction and analysis, integrating insights into Preset for automated dashboards to enhance reporting efficiency and enable real-time, data-driven decision-making.

Led and mentored a team of junior analysts, fostering skill development and accelerating the successful delivery of data-driven solutions in business intelligence, fraud detection, and credit risk analysis.

Collaborated cross-functionally to align technical solutions with business objectives, integrating fraud detection, credit risk, and compliance strategies for cohesive, high-impact outcomes.

Mananasi Fibre Ltd

Data Analyst and Reporting Specialist

Summary

Led data analysis and reporting efforts for two climate-focused pilot projects funded by the UK's SMEP and DEFRA programs: (1) fibre production from pineapple waste and (2) assessing the viability of carbon credits through biochar production from pineapple biomass. Responsibilities spanned technical, environmental, social, and financial domains to support data-driven decision-making and donor compliance.

Highlights

Led data analysis and reporting efforts for two climate-focused pilot projects (fibre production from pineapple waste; carbon credits from biochar production), supporting data-driven decision-making and donor compliance.

Authored comprehensive technical documentation using statistical methods and data visualization, ensuring adherence to scientific reporting standards and data accuracy.

Compiled qualitative and quantitative data to effectively demonstrate project impact, developing compelling case studies and success stories highlighting socio-economic benefits.

Ensured timely and accurate grant reporting in compliance with SMEP/DEFRA requirements, coordinating data gathering with project stakeholders.

Implemented robust data validation processes, upholding data integrity and accuracy across all reporting deliverables.

Conducted energy consumption analysis to identify efficiency opportunities and support sustainability-focused decision-making.

Identified and implemented process improvements, automating and streamlining data workflows, and integrating best practices in data analysis and storytelling.

The Cyprus Institute

Graduate Research Assistant - Data Science & Air Quality

Summary

Developed machine learning algorithms to forecast air quality and improve the accuracy of data collected from low-cost air quality sensors as part of the Horizon Europe-funded EMME-CARE project. The work enhanced air quality monitoring capabilities in Nicosia by establishing a network of low-cost sensors.

Highlights

Designed and implemented a comprehensive network of low-cost air quality sensors across Nicosia, enabling spatial analysis and accurate source apportionment of air pollution to improve understanding of pollution dynamics.

Developed and implemented automated data pipelines for real-time data collection, cleaning, and analysis, ensuring continuous monitoring of air quality and enabling data availability for public health decision-making.

Developed custom tools and dashboards for advanced real-time air quality data visualization, providing stakeholders with dynamic monitoring and crucial insights for public health responses.

Developed and deployed advanced ML calibration techniques for low-cost air quality sensors, significantly improving measurement accuracy and reliability of pollutant data in urban environments.

Performed spatial and temporal analyses of air pollution in Nicosia, identifying key hotspots and patterns to inform environmental policy-making and urban planning in high-risk areas.

Established data quality indicators aligned with international air quality monitoring standards (US EPA, EU Directive 2008/50/EC), ensuring regulatory compliance and contributing to global best practices.

Finlays Kenya

Research and Data Analyst

Summary

Responsible for analyzing sales, customer feedback, and production data to identify trends and optimize marketing strategies, while developing AI models to improve tea production efficiency and forecast sales. This enhanced inventory management and reduced production costs. Delivered actionable insights and reports to stakeholders, driving informed decision-making and business growth.

Highlights

Analyzed sales, customer feedback, and production data to identify trends and optimize marketing strategies.

Developed AI models to improve tea production efficiency and forecast sales, enhancing inventory management and reducing production costs.

Delivered actionable insights and reports to stakeholders, driving informed decision-making and business growth.

Education

The Cyprus Institute

MPhil.

Environmental Sciences

Courses

Thesis: Practicalities in Machine Learning Calibration of Measurements from Low Cost Gas Sensors

University of the Western Cape | AIMS South Africa

MSc.

Mathematical Sciences (Data Science)

Courses

Thesis: Application of Machine Learning to High-Resolution Aerial Imagery for Vegetation Mapping

The University of Nairobi

BSc.

Mathematics (Statistics and Operations Research)

Grade: First Class Honours

Languages

English

Fluent

Certificates

Data Science Application in Financial Services

Issued By

Prospect 33 (Global Data Lab)

MLOps with Databricks

Issued By

LinkedIn Learning

Applied Data Science

Issued By

Prospect 33 (Global Data Lab)

Data Science Application in Anti-Financial Crime

Issued By

Prospect 33 (Global Data Lab)

Data & Databases

Issued By

Prospect 33 (Global Data Lab)

ETL & ELT

Issued By

Prospect 33 (Global Data Lab)

Skills

Programming Languages

Python (Django), SQL, R.

Data Analysis & Tools

Jupyter Notebook, VS Code, Matplotlib, Seaborn, Plotly, Bokeh, DBeaver, PostgreSQL, Tableau, Preset, SageMath, Gretl.

Machine Learning & Predictive Analytics

Random Forest, XGBoost, CatBoost, LightGBM, PyOD models, Keras, PyTorch, TensorFlow, Scikit-learn, ARIMA, LSTM, fbprophet, Time Series Forecasting.

Data Engineering & Cloud Computing

ETL Processes, Data Pipeline Automation, AWS (S3, EC2), Azure, GCP, Google BigQuery.

MLOps & Deployment

Project Management, Version Control (Git), CI/CD Workflows, Streamlit, Docker, Kubernetes, MLflow, Databricks.

Other Tools

LaTeX.

References

Mark Clement, Global Head of AI and R&D, Prospect 33

+44 791 717 2754; mark.clement@prospect33.com

Valtars Bondars, Head of Risk & Data Science, Twino (Moneza Kenya)

+371 20 290 947; valtars.bondars@twino.eu

Gilbert Langat, Senior Data Scientist, EZRA

+254 711 170 904; gilbemrlk@gmail.com