About
Highly accomplished Data Scientist and Machine Learning Engineer with over 5 years of progressive experience developing and deploying scalable ML models for capital markets and consumer lending, specializing in anomaly and fraud detection and credit scoring. Proven expertise in Python, SQL, and advanced data visualization tools for turning complex datasets into actionable insights. Adept at improving decision-making, reducing risk, and driving business growth across sectors including fintech, healthcare, and agri-tech. Recognized for leading data science initiatives, including the development and deployment of active-learning-based ML models.
Work
Prospect 33 | Data Scientist (R&D)
Summary
Responsible for researching, developing, and deploying semi-supervised, active-learning-based anomaly detection models to accurately detect and remediate anomalies in financial and operational datasets.
Highlights
Engineered and deployed an active-learning-based Isolation Forest model, integrating domain-specific client feedback to iteratively improve accuracy and reduce false positives.
Co-developed DIVA and LEAP, flagship fraud and anomaly detection platforms for capital markets, building scalable, adaptive models that significantly improved detection accuracy and reduced false positives.
Enhanced the LEAP platform's backend and frontend capabilities through optimized API design and system architecture (Python, Django, Django REST Framework), improving platform performance, scalability, and developer efficiency.
Designed and implemented an explainability framework (SHAP, LIME, gradient-based methods) within the LEAP platform, providing clear, instance-level explanations of anomaly detection decisions, boosting stakeholder trust and ensuring regulatory alignment.
Collaborated with stakeholders to define and prioritize product requirements, ensuring alignment with business objectives and maximizing product impact through targeted feature delivery.
Prospect 33 | Junior Data Scientist (Global Data Lab)
Summary
Researched and developed advanced anomaly detection techniques for financial and credit data, integrating explainability frameworks to enhance model transparency and interpretability.
Twino (Moneza Kenya) | Data Scientist
Summary
Responsible for developing and managing credit scoring models for the loan portfolio across payday loan (PDL) and buy now, pay later (BNPL) products. The role also covered analyzing trends and risks, generating actionable reports, implementing risk rules, and collaborating with teams to ensure data accuracy and minimize fraud.
Highlights
Developed and deployed robust credit scoring (XGBoost) and fraud detection (CatBoost) models leveraging TransUnion and SEON data, significantly enhancing loan approval accuracy and strengthening fraud prevention measures.
Designed and implemented an automated loan default dashboard, providing daily, weekly, and monthly tracking of default rates, enabling proactive risk monitoring and informing continuous improvements in credit risk models.
Automated payment and debt collection reporting for back-office teams using PostgreSQL, streamlining operations and improving collection outcomes while reducing processing time.
Wrote complex SQL queries (PostgreSQL, via DBeaver) for data extraction and analysis, feeding the results into automated Preset dashboards to improve reporting efficiency and enable real-time, data-driven decision-making.
Led and mentored a team of junior analysts, fostering skill development and accelerating the successful delivery of data-driven solutions in business intelligence, fraud detection, and credit risk analysis.
Collaborated cross-functionally to align technical solutions with business objectives, integrating fraud detection, credit risk, and compliance strategies for cohesive, high-impact outcomes.
Mananasi Fibre Ltd | Data Analyst and Reporting Specialist
Summary
Led data analysis and reporting for two climate-focused pilot projects funded by the UK's SMEP and DEFRA programs: (1) fibre production from pineapple waste and (2) assessing the viability of carbon credits through biochar production from pineapple biomass. Responsibilities spanned technical, environmental, social, and financial domains to support data-driven decision-making and donor compliance.
Highlights
Led data analysis and reporting efforts for two climate-focused pilot projects (fibre production from pineapple waste; carbon credits from biochar production), supporting data-driven decision-making and donor compliance.
Authored comprehensive technical documentation using statistical methods and data visualization, ensuring adherence to scientific reporting standards and data accuracy.
Compiled qualitative and quantitative data to effectively demonstrate project impact, developing compelling case studies and success stories highlighting socio-economic benefits.
Ensured timely and accurate grant reporting in compliance with SMEP/DEFRA requirements, coordinating data gathering with project stakeholders.
Implemented robust data validation processes, upholding data integrity and accuracy across all reporting deliverables.
Conducted energy consumption analysis to identify efficiency opportunities and support sustainability-focused decision-making.
Identified and implemented process improvements, automating and streamlining data workflows, and integrating best practices in data analysis and storytelling.
The Cyprus Institute | Graduate Research Assistant (Data Science & Air Quality)
Summary
Developed machine learning algorithms to forecast air quality and to improve the accuracy of measurements from low-cost air quality sensors. This work was part of the Horizon Europe-funded EMME-CARE project, which expanded air quality monitoring in Nicosia by establishing a network of low-cost sensors.
Highlights
Designed and implemented a comprehensive network of low-cost air quality sensors across Nicosia, enabling spatial analysis and accurate source apportionment of air pollution to improve understanding of pollution dynamics.
Developed and implemented automated data pipelines for real-time data collection, cleaning, and analysis, ensuring continuous air quality monitoring and making data available for public health decision-making.
Developed custom tools and dashboards for advanced real-time air quality data visualization, providing stakeholders with dynamic monitoring and crucial insights for public health responses.
Developed and deployed advanced ML calibration techniques for low-cost air quality sensors, significantly improving measurement accuracy and reliability of pollutant data in urban environments.
Performed spatial and temporal analyses of air pollution in Nicosia, identifying key hotspots and patterns to inform environmental policy-making and urban planning in high-risk areas.
Established data quality indicators aligned with international air quality monitoring standards (US EPA, EU Directive 2008/50/EC), ensuring regulatory compliance and contributing to global best practices.
Finlays Kenya | Research and Data Analyst
Summary
Responsible for analyzing sales, customer feedback, and production data to identify trends and optimize marketing strategies, while developing AI models to improve tea production efficiency and forecast sales. This enhanced inventory management and reduced production costs. Delivered actionable insights and reports to stakeholders, driving informed decision-making and business growth.
Highlights
Analyzed sales, customer feedback, and production data to identify trends and optimize marketing strategies.
Developed AI models to improve tea production efficiency and forecast sales, enhancing inventory management and reducing production costs.
Delivered actionable insights and reports to stakeholders, driving informed decision-making and business growth.
Education
The Cyprus Institute
MPhil.
Environmental Sciences
Thesis: Practicalities in Machine Learning Calibration of Measurements from Low Cost Gas Sensors
University of the Western Cape | AIMS South Africa
MSc.
Mathematical Sciences (Data Science)
Thesis: Application of Machine Learning to High-Resolution Aerial Imagery for Vegetation Mapping
The University of Nairobi
BSc.
Mathematics (Statistics and Operations Research)
Grade: First Class Honours
Languages
English
Fluent
Certificates
Data Science Application in Financial Services
Issued By
Prospect 33 (Global Data Lab)
MLOps with Databricks
Issued By
LinkedIn Learning
Applied Data Science
Issued By
Prospect 33 (Global Data Lab)
Data Science Application in Anti-Financial Crime
Issued By
Prospect 33 (Global Data Lab)
Data & Databases
Issued By
Prospect 33 (Global Data Lab)
ETL & ELT
Issued By
Prospect 33 (Global Data Lab)
Skills
Programming Languages
Python, SQL, R, Django (Python).
Machine Learning & Predictive Analytics
RF (Random Forest), XGBoost, CatBoost, LGBM (LightGBM), PyOD models, Keras, PyTorch, TensorFlow, scikit-learn, ARIMA, LSTM, fbprophet, Time Series Forecasting.
Data Analysis & Tools
Jupyter Notebook, VS Code, Matplotlib, Seaborn, Plotly, Bokeh, DBeaver, PostgreSQL, Tableau, Preset, SageMath, Gretl.
Data Engineering & Cloud Computing
ETL Processes, Data Pipeline Automation, AWS (S3, EC2), Azure, GCP, Google BigQuery.
MLOps & Deployment
Kubernetes, Docker, MLflow, Databricks, Streamlit, CI/CD Workflows, Version Control (Git), Project Management.
Other Tools
LaTeX.
References
Mark Clement, Global Head of AI and R&D, Prospect 33
+44 791 717 2754; mark.clement@prospect33.com
Valtars Bondars, Head of Risk & Data Science, Twino (Moneza Kenya)
+371 20 290 947; valtars.bondars@twino.eu
Gilbert Langat, Senior Data Scientist, EZRA
+254 711 170 904; gilbemrlk@gmail.com