2026.5.31
Bangalore Traffic Demand Prediction
Project Whitepaper — Plain-Language Edition
| Version | 1.0 |
| Best public score | 91.38 (submission_ensemble.csv) |
| Metric | R² — reported as score = 100 × R² |
| Audience | Teammates, reviewers, and anyone new to ML |
Table of contents
- Executive summary
- The problem in everyday terms
- The data
- How success is measured
- Our mental model
- Feature engineering
- Models we used
- Validation strategy
- Experiment journey
- Final solution architecture
- Project file map
- Glossary
- Limitations and outlook
- Elevator pitch
1. Executive summary
This project predicts how busy each road cell will be at a given time — similar to forecasting ride or delivery demand across a city grid.
We are given two days of historical traffic data and must predict demand for the next day at times we have never seen labels for. The best honest solution combines:
- Smart data preparation — location, time, and “what happened recently”
- Two complementary machine-learning models blended together
- Simple rules where patterns are obvious (e.g. daytime often mirrors the previous day)
We reached 91.38 out of 100 on the public leaderboard. The scores of exactly 100 posted by many teams likely come from overfitting the public test set; the final evaluation is expected to use hidden data and a submitted Python script.
2. The problem in everyday terms
Imagine Flipkart needs to know:
“At 10:30 AM, in this small map square, how many deliveries or rides will we need?”
Each row in the dataset answers:
At this location (geohash), on this day, at this time: how high was demand?
- Demand is a number between 0 and 1 (normalized “busyness”, not a raw count).
- Location is a geohash — a short code for a map cell.
- Time is in 15-minute steps (0:0, 0:15, 0:30, …).
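Timestamp strings such as 0:0 or 2:15 are easier to work with once they are turned into a slot number. The sketch below is one way to do that; the function name `slot_index` and the 0 to 95 numbering are assumptions made for illustration, not names taken from the project code.

```python
def slot_index(timestamp: str) -> int:
    """Convert an 'H:M' string such as '2:15' into a 15-minute slot index (0-95).

    Hypothetical helper: assumes minutes are always 0, 15, 30 or 45.
    """
    hour_str, minute_str = timestamp.split(":")
    hour, minute = int(hour_str), int(minute_str)
    if not (0 <= hour < 24 and minute in (0, 15, 30, 45)):
        raise ValueError(f"unexpected timestamp: {timestamp!r}")
    return hour * 4 + minute // 15
```

With this numbering, 0:0 maps to slot 0, 2:15 to slot 9, and 13:45 to slot 55.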
The twist that makes this hard
| What we have in training | What we must predict in test |
|---|---|
| Day 48: full 24 hours (~69k rows) | |
| Day 49: only 00:00–02:00 (~7.8k rows) | Day 49: 02:15–13:45 (~42k rows) |
We must predict daytime on day 49 using mostly day 48 plus a tiny slice of day 49 night. We never see day-49 labels for 10 AM, 11 AM, etc. during training.
3. The data
3.1 Columns explained
| Column | Plain meaning |
|---|---|
| geohash | Which map cell (location) |
| day | 48 or 49 |
| timestamp | Time of day (15-min slots) |
| demand | Target: how busy it was (0–1). Missing in test. |
| RoadType | Highway, Street, Residential, etc. |
| NumberofLanes | Road width |
| LargeVehicles | Allowed or not |
| Landmarks | Landmark nearby (Yes/No) |
| Temperature | Recorded temperature |
| Weather | Sunny, Rainy, Foggy, Snowy |
3.2 Scale
| Split | Rows | Unique locations |
|---|---|---|
| Train | 77,299 | ~1,249 |
| Test | 41,778 | ~1,190 |
3.3 Data flow (high level)
4. How success is measured
Metric: R² (R-squared)
Think of it as: “How much of the true pattern did we capture?”
| Score | Meaning |
|---|---|
| 100 | Perfect predictions |
| 91.38 | Our best honest result — very strong |
| 60.94 | Poor: the result of a wrong rule (scaling every prediction up) |
R² punishes systematic mistakes heavily. If you predict everything 60% too high, the score collapses — even if the general shape looks right.
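To make the effect of a systematic bias concrete, here is a small sketch of the score computation using the standard R² definition (1 minus residual sum of squares over total sum of squares). The function name `r2_score_100` and the toy demand values are made up for illustration; they are not the competition data.

```python
def r2_score_100(y_true, y_pred):
    """Return 100 * R^2 for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 100.0 * (1.0 - ss_res / ss_tot)


# Toy demand values (made up for illustration).
y_true = [0.10, 0.20, 0.30, 0.40, 0.50]
score_exact = r2_score_100(y_true, y_true)
# Same shape, but every prediction is 60% too high.
score_inflated = r2_score_100(y_true, [1.6 * t for t in y_true])
```

On these toy numbers the inflated predictions keep the right shape yet score below zero, because the squared errors grow with the size of the true value.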
5. Our mental model
We treated this as three different sub-problems, not one uniform rule:
Why this split matters
Changing only how hour 2 is predicted (a plain day-48 copy instead of the hour-2 ML blend) moved the score from 90.96 → 79.55, a drop of about 11.4 points. Hour 2 is small in row count but huge in score impact because night demand is high and spiky.
6. Feature engineering
Feature engineering = creating new columns that help the model. We did not feed raw timestamps alone.
6.1 Feature categories
6.2 Lag features (memory of the past)
| Feature | Plain meaning |
|---|---|
| d48_same_slot | Demand at this cell at this time on day 48 |
| lag_1, lag_2, lag_3 | Demand 15, 30, 45 minutes earlier same day at same cell |
| roll_mean_3 | Average of last 3 slots |
| roll_std_3 | Variability of last 3 slots |
| d49_last_known | Last known day-49 demand for that cell (from night training data) |
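The lag features in the table can be sketched with pandas as follows. This is an illustrative version, not the contents of src/temporal_features.py: the `slot` column (0 to 95), the choice to build the rolling stats from the three previous slots only, and the masking of d48_same_slot on day-48 rows are all assumptions. It also assumes each cell has one row per slot with no gaps; with gaps, a row shift is no longer exactly 15 minutes.

```python
import pandas as pd


def add_lag_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add lag_1..lag_3, roll_mean_3, roll_std_3 and d48_same_slot.

    Assumed columns: geohash, day, slot (0-95), demand.
    """
    out = df.sort_values(["geohash", "day", "slot"]).copy()

    # Demand 1, 2, 3 rows earlier within the same cell and day.
    grouped = out.groupby(["geohash", "day"])["demand"]
    lag_cols = []
    for k in (1, 2, 3):
        out[f"lag_{k}"] = grouped.shift(k)
        lag_cols.append(f"lag_{k}")

    # Rolling stats over the previous slots only, so the current
    # row's own demand never enters its features.
    out["roll_mean_3"] = out[lag_cols].mean(axis=1)
    out["roll_std_3"] = out[lag_cols].std(axis=1)

    # Day-48 demand at the same cell and slot.
    d48 = out.loc[out["day"] == 48, ["geohash", "slot", "demand"]]
    d48 = d48.rename(columns={"demand": "d48_same_slot"})
    out = out.merge(d48, on=["geohash", "slot"], how="left")
    # On day-48 rows this would equal the target, so blank it out.
    out.loc[out["day"] == 48, "d48_same_slot"] = float("nan")
    return out


# Tiny made-up example: one cell, four day-48 slots, two day-49 slots.
demo = pd.DataFrame({
    "geohash": ["a"] * 6,
    "day": [48, 48, 48, 48, 49, 49],
    "slot": [0, 1, 2, 3, 0, 1],
    "demand": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})
features = add_lag_features(demo)
```

For test rows, where demand is missing, lags can only come from labelled slots or from earlier predictions, which is a design decision this sketch does not cover.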
6.3 Key discovery — 15-minute persistence
Demand values in consecutive 15-minute slots have a correlation of about 0.95. Test starts at 2:15; training labels for day 49 end at 2:00. So “use the 2:00 value” is a powerful rule for hour 2.
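The "use the 2:00 value" rule can be written as a small persistence baseline. The sketch below is a simplified stand-in, not the project's hour-2 model: the row layout `(geohash, day, slot, demand)`, both function names, and the `fallback` argument for cells with no day-49 night rows are assumptions.

```python
def last_known_by_cell(rows, day=49):
    """Map each geohash to its demand at the latest labelled slot of `day`.

    rows: iterable of (geohash, day, slot, demand) tuples (assumed layout).
    """
    latest = {}
    for geohash, row_day, slot, demand in rows:
        if row_day != day:
            continue
        if geohash not in latest or slot > latest[geohash][0]:
            latest[geohash] = (slot, demand)
    return {g: d for g, (_, d) in latest.items()}


def persistence_predict(cells, last_known, fallback=None):
    """Predict each cell's next-slot demand as its last known value."""
    return [last_known.get(g, fallback) for g in cells]


# Made-up rows: slot 8 is 2:00, the last labelled slot on day 49.
train_rows = [
    ("a", 48, 8, 0.20),
    ("a", 49, 7, 0.30),
    ("a", 49, 8, 0.40),
    ("b", 49, 8, 0.90),
]
known = last_known_by_cell(train_rows)
preds = persistence_predict(["a", "b", "c"], known, fallback=None)
```

Cell "c" has no night data here, so it falls back to `None`; a real pipeline would need a deliberate choice for such cells, for example the day-48 value at the same slot.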
6.4 Cyclical time encoding
Raw hour treats 23:45 and 0:00 as far apart. Sin/cos encoding tells the model they are neighbors:
hour_sin = sin(2π × hour / 24)
hour_cos = cos(2π × hour / 24)
Same idea for 15-minute slots across the day.
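A minimal sketch of the encoding, applied to 15-minute slots with a period of 96 slots per day. The helper names are hypothetical; the distance function is only there to show that 23:45 and 0:00 end up close together while 12:00 and 0:00 end up far apart.

```python
import math


def cyclical_encode(value, period):
    """Return (sin, cos) of `value` placed on a circle of length `period`."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encoded_distance(a, b, period):
    """Straight-line distance between two encoded time points."""
    sin_a, cos_a = cyclical_encode(a, period)
    sin_b, cos_b = cyclical_encode(b, period)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


near = encoded_distance(95, 0, 96)  # slot 95 (23:45) vs slot 0 (0:00)
far = encoded_distance(48, 0, 96)   # slot 48 (12:00) vs slot 0 (0:00)
```

Both the sine and the cosine are needed: with only one of them, two different times of day would share the same value.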
6.5 What did NOT help
| Idea | Result |
|---|---|
| Scale all predictions × 1.64 (night ratio) | Score 60.94 — destroyed accuracy |
| Temperature as main driver | Correlation ≈ 0.003 — essentially noise |
| 9-model mega-ensemble | Marginal gain over simpler 2-model blend |
7. Models we used
A model learns patterns from past data and predicts future values. We used gradient boosted decision trees — many small decision trees combined, ideal for spreadsheet-like data.
7.1 Model comparison (simple view)
Quadrant positions are illustrative scores relative to our experiments, not exact measurements.
7.2 CatBoost
| What | Tree model that handles categories (geohash, road type) natively |
| Why | Location and road type are categorical; CatBoost excels here |
| Output | submission_catboost.csv |
7.3 XGBoost + target encoding
| What | Tree model with numeric features including encoded location averages |
| Why | Makes different errors than CatBoost → good for blending |
| Target encoding | Replace geohash with “average demand for this area” (fit without leakage) |
| Output | submission_full.csv |
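One common way to fit target encoding without leakage is out-of-fold encoding: each training row's geohash average is computed only from rows in other folds, so a row never sees its own label. The sketch below illustrates that idea under stated assumptions; the function name, the random fold assignment, and the `smoothing` parameter are not taken from the project, which may use a different scheme (for example time-based folds).

```python
import numpy as np
import pandas as pd


def oof_target_encode(df, col="geohash", target="demand",
                      n_splits=5, smoothing=10.0, seed=0):
    """Out-of-fold mean encoding of `col`, smoothed toward the fold prior."""
    rng = np.random.default_rng(seed)
    fold = rng.integers(0, n_splits, size=len(df))
    encoded = pd.Series(np.nan, index=df.index, dtype=float)
    for k in range(n_splits):
        in_fold = fold == k
        rest = df.loc[~in_fold]
        prior = rest[target].mean()  # average over the other folds only
        stats = rest.groupby(col)[target].agg(["sum", "count"])
        smooth = (stats["sum"] + smoothing * prior) / (stats["count"] + smoothing)
        # Categories unseen in the other folds fall back to the prior.
        values = df.loc[in_fold, col].map(smooth).fillna(prior)
        encoded.loc[in_fold] = values.to_numpy()
    return encoded


# Made-up data: two cells, twenty rows.
toy = pd.DataFrame({
    "geohash": ["a", "b"] * 10,
    "demand": [0.1 * (i % 5) for i in range(20)],
})
toy_encoded = oof_target_encode(toy)
```

Test rows would then be encoded with averages computed from the full training set, since their labels are never available.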
7.4 Ensemble (our best public score)
final_prediction = 0.6 × XGBoost + 0.4 × CatBoost
Score: 91.38 — when one model is slightly wrong, the other often compensates.
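The blend itself is a weighted average of the two prediction lists. The sketch below is illustrative: the function name is hypothetical, and clipping the result to the 0–1 demand range is an assumption rather than a confirmed step of blend_ensemble.py.

```python
def blend(xgb_pred, cat_pred, w_xgb=0.6, w_cat=0.4, clip=True):
    """Weighted average of two equal-length prediction lists."""
    if len(xgb_pred) != len(cat_pred):
        raise ValueError("prediction lists must have the same length")
    if abs(w_xgb + w_cat - 1.0) > 1e-9:
        raise ValueError("weights should sum to 1")
    blended = [w_xgb * x + w_cat * c for x, c in zip(xgb_pred, cat_pred)]
    if clip:
        # Assumption: demand is normalized, so keep outputs inside [0, 1].
        blended = [min(1.0, max(0.0, b)) for b in blended]
    return blended


# Made-up predictions for two rows.
blended = blend([1.0, 0.0], [0.0, 1.0])
```

The two input lists must be in the same row order, which in practice means aligning both submission files on the same row identifier before averaging.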
7.5 Why not neural networks / RNNs?
| Reason | Explanation |
|---|---|
| Data shape | Tabular rows, not long sequences like video or text |
| Lags work | Hand-built 15-min memory captured most time signal |
| Data size | ~77k rows — trees + good features beat deep learning complexity |
| Interpretability | Easier to debug rules (hour 2 vs daytime) |
8. Validation strategy
Bad approach: Randomly shuffle all rows → 80% train / 20% test.
Problem: Model sees future and past mixed together. Local scores look too good and mislead you.
Good approach (what we built):
| Validation type | Used in | Trust level |
|---|---|---|
| Random KFold | Early train_final.py | ⚠️ Misleading for time data |
| Temporal night CV | validate.py, train_temporal.py | ✅ Matches real test structure |
| Public leaderboard | Submissions | ⚠️ Some teams may overfit public answers |
On honest temporal validation, our improved pipeline reached a score of about 87 (R² ≈ 0.87) on held-out night data, which is a realistic signal.
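The core of a time-based split is that every training row comes strictly before every validation row. The sketch below shows that idea only; it is not the logic of validate.py, and the row layout (dicts with `day` and `slot` keys), the function name, and the cutoff of day 49, slot 5 (1:15) are assumptions chosen for illustration.

```python
def temporal_split(rows, cutoff=(49, 5)):
    """Split rows into (train, valid) around a (day, slot) cutoff.

    Rows strictly before the cutoff go to train; the rest go to valid.
    """
    train = [r for r in rows if (r["day"], r["slot"]) < cutoff]
    valid = [r for r in rows if (r["day"], r["slot"]) >= cutoff]
    return train, valid


# Made-up rows: all of day 48 plus the labelled night slots of day 49.
rows = [{"day": 48, "slot": s} for s in range(96)]
rows += [{"day": 49, "slot": s} for s in range(9)]
train_rows, valid_rows = temporal_split(rows)
```

Any lag or rolling feature used on the validation rows should likewise be built only from data before the cutoff, otherwise the local score is inflated in the same way as with a random shuffle.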
9. Experiment journey
9.1 Score timeline
9.2 Full submission log
| File | Strategy | Score | Status |
|---|---|---|---|
| submission_ensemble.csv | 60% XGB + 40% CatBoost | 91.38 | ✅ Best |
| submission_hybrid.csv | Day-48 copy + hour-2 ML blend | 90.96 | ✅ |
| probe_C_global_ratio.csv | Scale all by night ratio | 60.94 | ✅ |
| probe_F_model_hard.csv | Night-calibrated model on hard rows | 80.09 | ✅ |
| probe_I_pure_copy.csv | Pure day-48 copy everywhere | 79.55 | ✅ |
9.3 Confirmed truths
- Daytime (hours 3–13) ≈ same demand level as day 48 at same location and time (~89% exact match).
- Hour 2 must use day-49 night data — not plain day-48 copy.
- Demand is smooth over 15 minutes — recent history beats yesterday’s shape at night.
- Public 100s likely overfit; final ranking may use hidden test data.
10. Final solution architecture
Recommended scripts to run
# Main pipeline (temporal CV + lags + CatBoost + XGB + ensemble)
python scripts/train/train_temporal.py
# CV only — check honest score without submitting
python scripts/train/train_temporal.py --cv-only
# Re-blend with proven 91.38 weights
python scripts/blend/blend_ensemble.py --w-xgb 0.6 --w-cat 0.4
11. Project file map
| File | Purpose |
|---|---|
| data/train.csv, test.csv | Raw data |
| src/features.py | Original feature pipeline |
| src/temporal_features.py | Lags, rolling stats, cyclical time |
| validate.py | Honest time-based validation |
| scripts/train/train_temporal.py | Recommended main training pipeline |
| train_hybrid.py | Formula + ML hybrid (90.96) |
| blend_ensemble.py | Blend two submission CSVs |
| EXPERIMENT_LOG.md | Detailed score comparison |
| notebooks/traffic_demand_analysis.ipynb | Optional EDA (not required for scoring) |
12. Glossary
| Term | Simple definition |
|---|---|
| Machine learning (ML) | Computer learns patterns from examples instead of hand-written rules |
| Model | The learned program that makes predictions |
| Feature | One input signal the model uses (e.g. “hour”, “lag_1”) |
| Training | Showing the model past data so it learns |
| Test / submission | Predictions for unseen rows, uploaded for scoring |
| R² | Goodness-of-fit metric, reported here as 100 × R²; 100 = perfect |
| Geohash | Short code for a map grid cell |
| Lag | Value from an earlier time step |
| Target encoding | Replace a category with its average demand (carefully, to avoid cheating) |
| Ensemble | Combining multiple models’ predictions |
| Overfitting | Memorizing public answers instead of learning real rules |
| Cross-validation (CV) | Testing on held-out data during development |
| CatBoost / XGBoost | Popular tree-based ML libraries for tabular data |
13. Limitations and outlook
- We cannot fully validate daytime day-49 locally — no labels for those hours in training.
- ~9 points below public “100” may be unbridgeable without overfitting public test.csv.
- Final ranking may favor models that generalize on hidden data over public leaderboard 100s.
- Temperature and weather appear to be noise in this synthetic dataset.
14. Elevator pitch
We predict traffic demand on a city grid by combining where (geohash), when (time and recent 15-minute history), and what happened yesterday at the same place and time. Daytime is mostly stable and matches day 48; the tricky 2 AM hour needs fresh day-49 night data. Two tree-based models (CatBoost and XGBoost) learn these patterns and are blended for a 91.38 score. We validate honestly by simulating the real test — train on the past, predict unseen future times — so the solution is built to survive a hidden final test, not just chase a public leaderboard.