GPU-Accelerated CNN Training

Designed and trained a CUDA-enabled convolutional neural network with manual training loops, metric tracking, and Top-1/Top-5 evaluation to analyze model performance and convergence behavior.

Python · PyTorch · CUDA · Deep Learning

Overview

I built this project to gain hands-on experience with GPU-accelerated deep learning and to make training behavior easier to inspect. The focus was on writing a reliable manual training loop, tracking core metrics, and evaluating Top-1 and Top-5 accuracy for clearer model diagnostics. I also verified the CUDA 12.x setup end to end so experiments stayed reproducible across runs.
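The device routing described above can be sketched as a small helper; the function name `select_device` is illustrative, not taken from the project code.

```python
import torch

# Route execution to CUDA when available, with CPU fallback.
# (Illustrative helper; the project's actual device logic may differ.)
def select_device() -> torch.device:
    if torch.cuda.is_available():
        # Report the detected GPU so runs are easy to audit later.
        print(f"Using GPU: {torch.cuda.get_device_name(0)}")
        return torch.device("cuda")
    print("CUDA not available, falling back to CPU")
    return torch.device("cpu")

device = select_device()
```

Doing the check once and passing `device` everywhere keeps the rest of the pipeline device-agnostic.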

Highlights

  • Built CUDA-enabled preprocessing and DataLoader batching for consistent GPU throughput.
  • Implemented manual train and eval loops with epoch-level loss and accuracy tracking.
  • Added Top-1 and Top-5 evaluation to compare prediction quality beyond a single metric.
  • Ran controlled optimization experiments to analyze convergence behavior.
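A minimal sketch of the manual train/eval loops described above. The tiny model, synthetic data, and hyperparameters here are placeholders standing in for the real CNN and image pipeline, not the project's actual configuration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder model; the real project uses a CNN.
model = nn.Sequential(nn.Flatten(), nn.Linear(8 * 8, 10)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Tiny synthetic dataset standing in for the real image data.
images = torch.randn(64, 1, 8, 8)
labels = torch.randint(0, 10, (64,))
loader = DataLoader(TensorDataset(images, labels), batch_size=16)

def train_one_epoch() -> float:
    model.train()
    running_loss = 0.0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()              # explicit forward/backward/step
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * x.size(0)
    return running_loss / len(loader.dataset)

def evaluate() -> float:
    model.eval()
    correct = 0
    with torch.no_grad():                  # no gradients needed for eval
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(dim=1) == y).sum().item()
    return correct / len(loader.dataset)

epoch_loss = train_one_epoch()
accuracy = evaluate()
```

Keeping the forward, backward, and optimizer steps explicit is what makes failure points visible compared to a high-level trainer.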

Architecture

  • Python pipeline loads data, applies transforms, and prepares batched tensors.
  • PyTorch CNN model trains through explicit forward, backward, and optimizer steps.
  • Device checks route execution to CUDA when available with CPU fallback support.
  • Metric logging stores loss, Top-1, and Top-5 values per epoch for analysis.
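The Top-1/Top-5 computation in the metric logging step can be sketched directly from raw logits; the function name `topk_accuracy` is an assumption for illustration.

```python
import torch

# Compute Top-k accuracies from logits in one pass.
# (Illustrative helper; not the project's exact implementation.)
def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, ks=(1, 5)):
    # logits: (batch, num_classes); targets: (batch,)
    max_k = max(ks)
    # Indices of the top-k scoring classes per sample, shape (batch, max_k).
    _, pred = logits.topk(max_k, dim=1)
    # Mark positions where a top-k prediction matches the target label.
    correct = pred.eq(targets.unsqueeze(1))
    # A sample counts for Top-k if the target appears anywhere in its top k.
    return {k: correct[:, :k].any(dim=1).float().mean().item() for k in ks}

logits = torch.tensor([[0.1, 0.9, 0.0, 0.0, 0.0, 0.0],
                       [0.8, 0.1, 0.0, 0.0, 0.0, 0.05]])
targets = torch.tensor([1, 1])
acc = topk_accuracy(logits, targets, ks=(1, 5))
# Sample 0's top-1 is class 1 (correct); sample 1's top-1 is class 0 (wrong),
# but class 1 still falls inside its top 5, so Top-5 catches it.
```

Logging both values per epoch shows whether a model that misses Top-1 is at least ranking the right class highly.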

Key Learnings

  • Manual loops expose failure points that high-level trainers can hide.
  • Tracking Top-1 alongside Top-5 gives a clearer signal of model progress on harder classes.
  • Optimization changes are easier to trust when metric tracking is standardized.
  • Run-to-run reproducibility depends on capturing config and seed values every time.
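The last point above can be sketched as a seed-and-config capture step; the `seed_everything` name and the config fields are illustrative, not the project's real values.

```python
import json
import random

import numpy as np
import torch

# Seed every RNG a typical PyTorch pipeline touches.
# (Illustrative; full determinism may also need cuDNN settings.)
def seed_everything(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)            # seeds CPU and CUDA RNGs
    torch.cuda.manual_seed_all(seed)   # explicit for multi-GPU setups

# Placeholder config; real runs would capture the full hyperparameter set.
config = {"seed": 42, "lr": 0.01, "batch_size": 16, "epochs": 10}
seed_everything(config["seed"])

# Persist the exact run configuration alongside the logged metrics.
run_record = json.dumps(config, sort_keys=True)
```

Writing the serialized config next to each run's metrics is what makes two runs directly comparable later.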

Outcomes

  • Compute stack: CUDA 12.x with PyTorch training pipeline
  • Evaluation: Top-1 and Top-5 metrics tracked across epochs