GPU-Accelerated CNN Training
Designed and trained a convolutional neural network on CUDA with manual training loops, metric tracking, and Top-1/Top-5 evaluation to analyze model performance and convergence behavior.
Python · PyTorch · CUDA · Deep Learning
Overview
I built this project to gain hands-on experience with GPU-accelerated deep learning and make training behavior easier to inspect. The focus was on writing a reliable manual training loop, tracking core metrics, and evaluating Top-1 and Top-5 accuracy for clearer model diagnostics. I also verified CUDA 12.x setup end to end so experiments were reproducible across runs.
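The end-to-end CUDA verification mentioned above can be sketched as a quick device check run before any experiment. This is a minimal snippet, not the project's actual script; the CPU fallback keeps it usable on machines without a GPU:

```python
import torch

# Route to CUDA when a GPU is visible, otherwise fall back to CPU so the
# same script runs everywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"PyTorch {torch.__version__}, device: {device}")

if device.type == "cuda":
    # Confirm the CUDA runtime version (e.g. 12.x) and the detected GPU.
    print(f"CUDA runtime: {torch.version.cuda}")
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```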
Highlights
- Built CUDA-enabled preprocessing and DataLoader batching for consistent GPU throughput.
- Implemented manual train and eval loops with epoch-level loss and accuracy tracking.
- Added Top-1 and Top-5 evaluation to compare prediction quality beyond a single metric.
- Ran controlled optimization experiments to analyze convergence behavior.
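The Top-1/Top-5 evaluation from the highlights can be sketched as a small batch-level helper. This is an illustrative version, not the project's exact code; `topk_accuracy` and its signature are assumptions:

```python
import torch

def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, ks=(1, 5)):
    """Return a dict mapping k -> Top-k accuracy for one batch.

    A prediction counts as a Top-k hit when the true class appears
    among the k highest-scoring classes.
    """
    max_k = max(ks)
    # Indices of the max_k highest-scoring classes per sample, best first.
    _, pred = logits.topk(max_k, dim=1)        # shape (batch, max_k)
    correct = pred.eq(targets.view(-1, 1))     # bool, shape (batch, max_k)
    return {k: correct[:, :k].any(dim=1).float().mean().item() for k in ks}
```

Computing both metrics from one `topk` call keeps evaluation cheap, and reporting them side by side shows whether misses are near misses (Top-5 hits) or outright failures.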
Architecture
- Python pipeline loads data, applies transforms, and prepares batched tensors.
- PyTorch CNN model trains through explicit forward, backward, and optimizer steps.
- Device checks route execution to CUDA when available with CPU fallback support.
- Metric logging stores loss, Top-1, and Top-5 values per epoch for analysis.
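The explicit forward/backward/optimizer flow above can be sketched as a manual epoch function. The `SmallCNN` model and `train_one_epoch` helper here are stand-ins I invented for illustration, not the project's real architecture:

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Hypothetical minimal CNN standing in for the project's model."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def train_one_epoch(model, loader, optimizer, criterion, device):
    """One manual pass over the data, returning epoch loss and accuracy."""
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for inputs, targets in loader:
        # Move each batch to the selected device (CUDA or CPU).
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        logits = model(inputs)            # explicit forward pass
        loss = criterion(logits, targets)
        loss.backward()                   # explicit backward pass
        optimizer.step()                  # explicit optimizer step
        total_loss += loss.item() * inputs.size(0)
        correct += (logits.argmax(1) == targets).sum().item()
        seen += inputs.size(0)
    return total_loss / seen, correct / seen
```

Keeping the loop explicit like this is what makes per-step behavior inspectable: any step (gradient clipping, metric hooks, logging) can be inserted exactly where it acts.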
Key Learnings
- Manual loops expose failure points that high-level trainers can hide.
- Tracking Top-1 alongside Top-5 gives a clearer signal of progress on harder classes.
- Optimization changes are easier to trust when metric tracking is standardized.
- Run-to-run reproducibility depends on capturing config and seed values every time.
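The last point above, capturing config and seeds every run, can be sketched as a small setup helper. This is an assumed shape (`set_seed` is my name for it, not necessarily the project's):

```python
import json
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> dict:
    """Seed every RNG the pipeline touches and return a config dict to log."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds CPU and all CUDA devices
    # Trade some cuDNN speed for deterministic kernel selection.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    return {
        "seed": seed,
        "torch": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
    }

# Persisting the returned dict (e.g. json.dumps(...)) alongside each run's
# metrics is what makes results comparable across runs.
```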
Outcomes
- Compute stack: CUDA 12.x with PyTorch training pipeline
- Evaluation: Top-1 and Top-5 metrics tracked across epochs