GPU-Accelerated CNN Training
Designed and trained a convolutional neural network on CUDA with manual training loops, metric tracking, and Top-1/Top-5 evaluation to analyze model performance and convergence behavior.
Python · PyTorch · CUDA · Deep Learning
Overview
I built this project to gain hands-on experience with GPU-accelerated deep learning and make training behavior easier to inspect. The focus was on writing a reliable manual training loop, tracking core metrics, and evaluating Top-1 and Top-5 accuracy for clearer model diagnostics. I also verified CUDA 12.x setup end to end so experiments were reproducible across runs.
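The end-to-end CUDA verification mentioned above can be sketched as a quick device check run before any experiment. This is a minimal snippet, not the project's actual script; the CPU fallback keeps it usable on machines without a GPU:

```python
import torch

# Route to CUDA when a GPU is visible, otherwise fall back to CPU so the
# same script runs everywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"PyTorch {torch.__version__}, device: {device}")

if device.type == "cuda":
    # Confirm the CUDA runtime version (e.g. 12.x) and the detected GPU.
    print(f"CUDA runtime: {torch.version.cuda}")
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```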
Highlights
- Built CUDA-enabled preprocessing and DataLoader batching for consistent GPU throughput.
- Implemented manual train and eval loops with epoch-level loss and accuracy tracking.
- Added Top-1 and Top-5 evaluation to compare prediction quality beyond a single metric.
- Ran controlled optimization experiments to analyze convergence behavior.
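The Top-1/Top-5 evaluation from the highlights can be sketched as a small batch-level helper. This is an illustrative version, not the project's exact code; `topk_accuracy` and its signature are assumptions:

```python
import torch

def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, ks=(1, 5)):
    """Return a dict mapping k -> Top-k accuracy for one batch.

    A prediction counts as a Top-k hit when the true class appears
    among the k highest-scoring classes.
    """
    max_k = max(ks)
    # Indices of the max_k highest-scoring classes per sample, best first.
    _, pred = logits.topk(max_k, dim=1)        # shape (batch, max_k)
    correct = pred.eq(targets.view(-1, 1))     # bool, shape (batch, max_k)
    return {k: correct[:, :k].any(dim=1).float().mean().item() for k in ks}
```

Computing both metrics from one `topk` call keeps evaluation cheap, and reporting them side by side shows whether misses are near misses (Top-5 hits) or outright failures.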
Architecture
- Python pipeline loads data, applies transforms, and prepares batched tensors.
- PyTorch CNN model trains through explicit forward, backward, and optimizer steps.
- Device checks route execution to CUDA when available with CPU fallback support.
- Metric logging stores loss, Top-1, and Top-5 values per epoch for analysis.
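The explicit forward/backward/optimizer flow above can be sketched as a manual epoch function. The `SmallCNN` model and `train_one_epoch` helper here are stand-ins I invented for illustration, not the project's real architecture:

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Hypothetical minimal CNN standing in for the project's model."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def train_one_epoch(model, loader, optimizer, criterion, device):
    """One manual pass over the data, returning epoch loss and accuracy."""
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for inputs, targets in loader:
        # Move each batch to the selected device (CUDA or CPU).
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        logits = model(inputs)            # explicit forward pass
        loss = criterion(logits, targets)
        loss.backward()                   # explicit backward pass
        optimizer.step()                  # explicit optimizer step
        total_loss += loss.item() * inputs.size(0)
        correct += (logits.argmax(1) == targets).sum().item()
        seen += inputs.size(0)
    return total_loss / seen, correct / seen
```

Keeping the loop explicit like this is what makes per-step behavior inspectable: any step (gradient clipping, metric hooks, logging) can be inserted exactly where it acts.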
Key Learnings
- Manual loops expose failure points that high-level trainers can hide.
- Tracking Top-1 alongside Top-5 gives a clearer signal of progress on harder classes.
- Optimization changes are easier to trust when metric tracking is standardized.
- Run-to-run reproducibility depends on capturing config and seed values every time.
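The last point above, capturing config and seeds every run, can be sketched as a small setup helper. This is an assumed shape (`set_seed` is my name for it, not necessarily the project's):

```python
import json
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> dict:
    """Seed every RNG the pipeline touches and return a config dict to log."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds CPU and all CUDA devices
    # Trade some cuDNN speed for deterministic kernel selection.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    return {
        "seed": seed,
        "torch": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
    }

# Persisting the returned dict (e.g. json.dumps(...)) alongside each run's
# metrics is what makes results comparable across runs.
```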
Outcomes
- Compute stack: CUDA 12.x with PyTorch training pipeline
- Evaluation: Top-1 and Top-5 metrics tracked across epochs