Bonifasius Sinurat

Featured Projects

Customer Churn Prediction

Built an end-to-end churn prediction system for 4,656 e-commerce customers using class-weighted XGBoost, with the decision threshold tuned by cross-validation on the training set only. Achieved F2=0.9677, AUC-PR=0.9948, and Recall=0.9789 on a 1,126-customer holdout (TN=921, FP=15, FN=4, TP=186). Explained churn drivers with SHAP and deployed a Streamlit app alongside an ROI-based retention playbook (simulated net impact: $10,540).
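The reported holdout F2 and recall follow directly from the confusion matrix, and "train-only CV thresholding" can be read as picking the cutoff that maximizes F2 on out-of-fold predictions from the training data. The sketch below is an illustration under that assumption, not the project's code: fbeta_from_counts and pick_threshold are hypothetical helpers, and the out-of-fold scores would come from whatever model was cross-validated (here, XGBoost).

```python
from typing import Sequence


def fbeta_from_counts(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta = (1 + b^2) TP / ((1 + b^2) TP + b^2 FN + FP); beta=2 weights recall over precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def pick_threshold(y_true: Sequence[int], oof_scores: Sequence[float], beta: float = 2.0) -> float:
    """Return the score cutoff that maximizes F-beta on out-of-fold (training-only) predictions."""
    best_t, best_f = 0.5, -1.0
    for t in sorted(set(oof_scores)):  # every observed score is a candidate cutoff
        tp = fp = fn = 0
        for y, s in zip(y_true, oof_scores):
            pred = s >= t
            if pred and y == 1:
                tp += 1
            elif pred:
                fp += 1
            elif y == 1:
                fn += 1
        f = fbeta_from_counts(tp, fp, fn, beta)
        if f > best_f:
            best_t, best_f = t, f
    return best_t


# Re-derive the reported holdout metrics from its confusion matrix.
tp, fp, fn, tn = 186, 15, 4, 921
print(f"recall={tp / (tp + fn):.4f}  F2={fbeta_from_counts(tp, fp, fn):.4f}")
# recall=0.9789  F2=0.9677
```

Choosing the cutoff only from training folds and then applying it once to the holdout keeps holdout labels out of the threshold decision, so the holdout scores stay an honest estimate.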

NLP Sentiment Analysis - Amazon Reviews

Built a sentiment triage model that classifies Amazon reviews as negative (1–2 stars) or positive (4–5 stars) using TF-IDF features (unigrams and bigrams) with Multinomial Naive Bayes. Achieved 0.86 accuracy and a negative-class F1 (F1_neg) of 0.86, and tuned the decision threshold to balance operational trade-offs. At thr_pos=0.55, the model flagged ~1,100 reviews, for an estimated ~6 hours saved per 2,000 reviews (assuming 20 seconds per review), and surfaced key complaint themes (taste, packaging, delivery).
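The thr_pos cutoff and the time-saving estimate can be sketched in a few lines. This is an illustration, not the project's code: it assumes the classifier outputs a positive-class probability per review and that reviews at or above thr_pos are labeled positive. triage and hours_saved are hypothetical helpers and the probabilities are made up. It also assumes the ~6-hour figure equals the ~1,100 flagged reviews at 20 seconds each, which the text does not state outright.

```python
from typing import List, Sequence


def triage(p_pos: Sequence[float], thr_pos: float = 0.55) -> List[str]:
    """Label a review 'positive' when P(positive) >= thr_pos, otherwise 'negative'."""
    return ["positive" if p >= thr_pos else "negative" for p in p_pos]


def hours_saved(n_reviews: int, sec_per_review: float = 20.0) -> float:
    """Manual reading time, in hours, for n_reviews at sec_per_review seconds each."""
    return n_reviews * sec_per_review / 3600


# Hypothetical positive-class probabilities from the TF-IDF + Naive Bayes model
print(triage([0.91, 0.12, 0.55, 0.48]))  # ['positive', 'negative', 'positive', 'negative']
print(round(hours_saved(1100), 1))       # 6.1
```

Raising thr_pos labels fewer reviews positive, so more borderline reviews land on the negative side: negative-class recall tends to rise while the manual-review load grows.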

California Housing Price Prediction

Built a housing price regression model on 14,448 districts using CatBoost, achieving MAE $27,533, MAPE 14.97%, and RMSLE 0.2113, an MAE $3,223 lower than a Random Forest baseline. Translated predictions into negotiation bands and an ROI decision rule, and deployed a Streamlit app for what-if pricing scenarios.
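For reference, the three reported error metrics have standard definitions, sketched below in plain Python. The project's exact implementation is not shown, so treat this as an illustration; the house values in the example are hypothetical.

```python
import math
from typing import Sequence


def mae(y: Sequence[float], p: Sequence[float]) -> float:
    """Mean absolute error, in the target's units (here, dollars)."""
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)


def mape(y: Sequence[float], p: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; undefined if any true value is 0."""
    return 100 * sum(abs(a - b) / abs(a) for a, b in zip(y, p)) / len(y)


def rmsle(y: Sequence[float], p: Sequence[float]) -> float:
    """Root mean squared log error using log(1 + x); requires values > -1."""
    return math.sqrt(sum((math.log1p(b) - math.log1p(a)) ** 2 for a, b in zip(y, p)) / len(y))


# Hypothetical true and predicted median house values (USD)
y_true = [200_000, 300_000]
y_pred = [220_000, 270_000]
print(mae(y_true, y_pred), round(mape(y_true, y_pred), 2), round(rmsle(y_true, y_pred), 4))
# 25000.0 10.0 0.1005
```

MAE reads directly in dollars, while MAPE and RMSLE are scale-free, so a $20,000 miss counts more on a cheap district than on an expensive one.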

Customer Segmentation with LRFM Analysis

Segmented 5,869 customers from 541,910 transactions (UK retail, 2009–2011, £17.6M revenue) using LRFM (Length, Recency, Frequency, Monetary) features and K-Means (K=3; silhouette score ~0.41). Quantified revenue concentration: the top 1% of customers generated 32.03% of revenue, and the high-value cluster captured 85.96%, informing VIP retention and reactivation strategies.
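A minimal sketch of the feature step and the concentration figure, assuming one common LRFM definition: Length is the span between a customer's first and last purchase, Recency is days from the last purchase to a snapshot date, Frequency is distinct invoices, and Monetary is total spend. The project's exact definitions are not stated, cancellations are assumed to be filtered out beforehand, and lrfm and top_share are hypothetical helpers; the clustering itself would typically go to a library such as scikit-learn's KMeans.

```python
import math
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, date, float]  # (customer_id, invoice_id, invoice_date, line_amount)


def lrfm(rows: Iterable[Row], snapshot: date) -> Dict[str, Dict[str, float]]:
    """Per-customer Length, Recency, Frequency, Monetary features (days, days, invoices, currency)."""
    dates = defaultdict(list)
    invoices = defaultdict(set)
    spend = defaultdict(float)
    for cust, inv, day, amount in rows:
        dates[cust].append(day)
        invoices[cust].add(inv)
        spend[cust] += amount
    return {
        cust: {
            "L": (max(ds) - min(ds)).days,   # span between first and last purchase
            "R": (snapshot - max(ds)).days,  # days since last purchase
            "F": len(invoices[cust]),        # distinct invoices
            "M": spend[cust],                # total spend
        }
        for cust, ds in dates.items()
    }


def top_share(spend_by_customer: Dict[str, float], top_frac: float = 0.01) -> float:
    """Fraction of total revenue from the top `top_frac` of customers ranked by spend."""
    values = sorted(spend_by_customer.values(), reverse=True)
    k = max(1, math.ceil(len(values) * top_frac))
    return sum(values[:k]) / sum(values)


rows = [
    ("A", "i1", date(2011, 1, 5), 100.0),
    ("A", "i2", date(2011, 3, 1), 50.0),
    ("B", "i3", date(2011, 2, 10), 20.0),
]
feats = lrfm(rows, snapshot=date(2011, 12, 10))
print(feats["A"])  # {'L': 55, 'R': 284, 'F': 2, 'M': 150.0}
print(round(top_share({c: f["M"] for c, f in feats.items()}, top_frac=0.5), 4))  # 0.8824
```

With real data, top_frac=0.01 gives the "top 1%" share; the features are usually scaled (and often log-transformed) before K-Means, since raw Monetary values dwarf the day counts.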

TransJakarta Passenger Flow Analysis

Analyzed 37,900 TransJakarta trips (April 2023) to identify peak congestion windows and high-risk corridors. Found significantly longer travel times during 06:00–08:00 and 17:00–19:00 (Mann–Whitney U test, p < 0.05) and identified the longest routes, which averaged 81–84 minutes. Built a Tableau dashboard with route and time drilldowns to support fleet and headway optimization.
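The peak vs. off-peak comparison relies on the Mann–Whitney U test, which compares rank sums instead of means and therefore tolerates skewed travel-time distributions. The sketch below is a plain-Python illustration, not the project's code: mann_whitney_u is a hypothetical helper, the trip durations are made up, and the test is two-sided with a normal approximation. In practice scipy.stats.mannwhitneyu is the usual library route; its p-values can differ slightly because it may apply a continuity correction or an exact method.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Two-sided Mann-Whitney U test: returns (U_x, U_y, p).

    Ties get average ranks; the p-value uses the plain normal approximation
    (no continuity or tie correction), which is reasonable for large samples.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied block i..j
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    r_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = r_x - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided: 2 * (1 - Phi(|z|))
    return u_x, u_y, p


# Hypothetical trip durations in minutes
peak = [52, 61, 58, 70, 66]
off_peak = [41, 45, 50, 39, 48]
u_peak, u_off, p = mann_whitney_u(peak, off_peak)
print(u_peak, u_off, p < 0.05)  # 25.0 0.0 True
```

Five trips per group only illustrates the mechanics; with tens of thousands of trips the normal approximation is accurate, and a one-sided alternative would match a "longer at peak" hypothesis more directly.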

Student Management System

Developed a Python CLI system with validated CRUD operations, role-based access control, and data-quality safeguards (input validation, soft delete, error handling). Added basic analytics (graduation rate, subject difficulty) and indexed search to improve usability and reporting.
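The project's source is not reproduced here, so the following is only a sketch of how validated CRUD, role checks, soft delete, and an indexed search can fit together in one in-memory store. Student, StudentStore, the "admin" role, and all method names are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Student:
    student_id: str
    name: str
    deleted: bool = False  # soft-delete flag: record is kept but hidden from reads


class StudentStore:
    """In-memory CRUD store with role checks, input validation, soft delete, and a name index."""

    WRITE_ROLES = {"admin"}  # hypothetical role model: only admins may modify records

    def __init__(self) -> None:
        self._by_id: Dict[str, Student] = {}
        self._name_index: Dict[str, Set[str]] = {}  # lower-cased name -> student ids

    def _require_write(self, role: str) -> None:
        if role not in self.WRITE_ROLES:
            raise PermissionError(f"role {role!r} cannot modify records")

    @staticmethod
    def _clean_name(name: str) -> str:
        name = name.strip()
        if not name:
            raise ValueError("name must not be empty")
        return name

    def create(self, role: str, student_id: str, name: str) -> Student:
        self._require_write(role)
        if not student_id.isalnum():
            raise ValueError("student_id must be alphanumeric")
        if student_id in self._by_id:
            raise ValueError(f"duplicate student_id {student_id!r}")
        student = Student(student_id, self._clean_name(name))
        self._by_id[student_id] = student
        self._name_index.setdefault(student.name.lower(), set()).add(student_id)
        return student

    def get(self, student_id: str) -> Student:
        student = self._by_id.get(student_id)
        if student is None or student.deleted:
            raise KeyError(student_id)
        return student

    def update_name(self, role: str, student_id: str, new_name: str) -> Student:
        self._require_write(role)
        student = self.get(student_id)
        new_name = self._clean_name(new_name)
        self._name_index[student.name.lower()].discard(student_id)  # keep the index in sync
        student.name = new_name
        self._name_index.setdefault(new_name.lower(), set()).add(student_id)
        return student

    def soft_delete(self, role: str, student_id: str) -> None:
        self._require_write(role)
        self.get(student_id).deleted = True

    def find_by_name(self, name: str) -> List[Student]:
        """Index lookup by exact (case-insensitive) name; skips soft-deleted records."""
        ids = self._name_index.get(name.strip().lower(), set())
        active = [self._by_id[i] for i in ids if not self._by_id[i].deleted]
        return sorted(active, key=lambda s: s.student_id)


store = StudentStore()
store.create("admin", "S001", "Ana Putri")
store.soft_delete("admin", "S001")
print(store.find_by_name("ana putri"))  # []
```

Soft delete keeps the record for auditing and possible restore, while every read path filters on the flag, so deleted students disappear from search without breaking historical analytics.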

Technical Skills

Python

Advanced proficiency in Python for data science, with expertise in pandas, NumPy, and scikit-learn and in building scalable machine learning pipelines.