Projects - Syed Taha

Vectorized CNN Inference on RISC-V

We implemented a complete CNN in RISC-V assembly, using the Vector Extension (RVV) to make it feasible on edge hardware. Switching to im2col + GEMM and FP16 compression got us a 7.3x speedup in total inference time and a 12.2x reduction in convolution instructions. There's also a range-reduction algorithm for Softmax that cut iterations by 99.9%. A research paper is currently under review.
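To illustrate the im2col + GEMM idea mentioned above, here is a minimal sketch in plain Python rather than RISC-V assembly. It assumes stride 1, no padding, and a single input channel, and the function names are hypothetical, not taken from the project. The point is that once every patch is unrolled into a row, the whole convolution becomes one matrix multiply, which is the kind of regular loop that maps well onto vector hardware.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a 2D image into one row (stride 1, no padding)."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(kh) for dj in range(kw)])
    return rows


def matmul(a, b):
    """Plain row-by-column GEMM on lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def conv2d_im2col(image, kernels):
    """Apply several kernels to one image with a single matrix multiply.

    Like most CNN code, this computes cross-correlation (no kernel flip).
    Returns one row per output position and one column per kernel.
    """
    kh, kw = len(kernels[0]), len(kernels[0][0])
    patches = im2col(image, kh, kw)              # (out_h * out_w) x (kh * kw)
    weights = [[k[di][dj] for k in kernels]      # (kh * kw) x num_kernels
               for di in range(kh) for dj in range(kw)]
    return matmul(patches, weights)


def conv2d_direct(image, kernel):
    """Reference sliding-window version, used only to check the result."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        for j in range(len(image[0]) - kw + 1):
            out.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
    return out
```

The trade-off is memory: the patch matrix duplicates overlapping pixels, so im2col spends extra storage to get a simpler, more vectorizable inner loop.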

Othello match progression showing AI winning as black - board positions from left to right

Othello AI with MCTS and CNNs

We combined Monte Carlo Tree Search with a CNN for board evaluation and move prediction. Most of the work went into trimming the model down: we cut it from 85MB to 42.66MB (about 50% smaller) and made it 10x faster per move (1000ms → 100ms), while keeping a >90% win rate against the larger baselines.
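As a rough sketch of how a CNN can steer the tree search, the snippet below shows a PUCT-style selection step of the kind popularized by AlphaZero. This is an assumption about the general technique, not a description of the project's actual selection rule; the `Node` class, `select_child`, and the `c_puct` value are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float                 # move probability from the CNN's policy output
    visits: int = 0
    value_sum: float = 0.0       # summed evaluations, from the parent's point of view
    children: dict = field(default_factory=dict)   # move -> Node

    def q(self):
        """Mean value of this node, or 0.0 if it has never been visited."""
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node, c_puct=1.5):
    """Return the (move, child) pair that maximizes Q + U.

    Q rewards moves that have scored well so far; U rewards moves the network
    likes but the search has not explored much yet.
    """
    sqrt_total = math.sqrt(max(1, node.visits))

    def score(item):
        _, child = item
        u = c_puct * child.prior * sqrt_total / (1 + child.visits)
        return child.q() + u

    return max(node.children.items(), key=score)
```

Because each simulation calls the network at least once, the network's inference cost tends to dominate the time per move in a setup like this.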

LLM performance analysis showing throughput scaling, speedup factor, and time distribution across thread counts

LLM Inference in xv6

The question was simple enough: could you run a modern LLM inference engine on a minimal educational OS? Turns out you can, but it takes some work. We built a POSIX-compliant Shared Memory subsystem for the xv6 kernel to cache model weights (~100MB) without redundant transfers, then implemented custom threading primitives from scratch and parallelized the inference engine. End result: 16.2 tokens/sec, up from ~7.

Along the way we also wrote a cycle-accurate profiling library to track call hierarchies — which is how we found that matrix multiplication was eating 87–92% of inference time. The whole process is documented in a book covering the methodology and implementation.
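The profiling numbers above put a ceiling on what threading can deliver. The following back-of-the-envelope sketch uses Amdahl's law, under the simplifying assumption that only the matrix multiplication is parallelized and that it scales perfectly across threads. The thread counts are illustrative, not the ones used in the project.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only `parallel_fraction` of the runtime scales."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


# Matrix multiplication share of inference time, as reported by the profiler.
for fraction in (0.87, 0.92):
    for workers in (2, 4, 8):
        bound = amdahl_speedup(fraction, workers)
        print(f"matmul share {fraction:.0%}, {workers} threads: at most {bound:.2f}x")
```

For comparison, going from ~7 to 16.2 tokens/sec is roughly a 2.3x speedup, which sits below these idealized bounds for 4 or more threads, as one would expect once synchronization and memory traffic are taken into account.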

Kaggle: Safe Driver Prediction — 1st Place

I entered a Kaggle competition on imbalanced tabular data and ended up finishing 1st (Private Leaderboard AUROC: 0.64671). The approach was a stacking ensemble of XGBoost, LightGBM, and CatBoost with a logistic regression meta-learner, combined with a preprocessing pass to handle the extensive missing data and drop non-predictive features.
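Since the leaderboard metric is AUROC, it helps to see what the number measures: the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition in plain Python. It is a small illustration, not the competition's scoring code, and the quadratic loop is only practical for small inputs.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half a win)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Because the metric depends only on how examples are ranked, a constant prediction scores 0.5 no matter how imbalanced the classes are, which is why AUROC is a common choice for this kind of data.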

2D Physics & Orbital Mechanics Engine

I built a modular 2D physics engine from scratch in C++, covering rigid body dynamics, velocity-based movement, and AABB collision detection. I then extended it into an orbital mechanics simulation that handles gravitational N-body interactions using numerical integration, and added a real-time visualization layer with SFML so I could watch it run and catch bugs visually.
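To make the N-body step concrete, here is a minimal sketch of one common integrator, velocity Verlet, written in Python rather than the engine's C++. The choice of integrator, the simulation units (`G = 1.0`), the softening term, and all function names are assumptions for illustration, since the description above does not say which scheme the engine uses.

```python
import math

G = 1.0  # gravitational constant in simulation units (assumed)


def accelerations(pos, mass, softening=1e-3):
    """Pairwise gravitational acceleration on every body, O(n^2)."""
    acc = [[0.0, 0.0] for _ in pos]
    for i in range(len(pos)):
        for j in range(len(pos)):
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            # Softening avoids a division by zero when two bodies overlap.
            dist_sq = dx * dx + dy * dy + softening * softening
            factor = G * mass[j] / (dist_sq * math.sqrt(dist_sq))
            acc[i][0] += dx * factor
            acc[i][1] += dy * factor
    return acc


def step(pos, vel, mass, dt):
    """Advance all bodies by one velocity Verlet step, in place."""
    acc = accelerations(pos, mass)
    for i in range(len(pos)):
        vel[i][0] += 0.5 * dt * acc[i][0]   # half kick
        vel[i][1] += 0.5 * dt * acc[i][1]
        pos[i][0] += dt * vel[i][0]         # drift
        pos[i][1] += dt * vel[i][1]
    acc = accelerations(pos, mass)          # forces at the new positions
    for i in range(len(pos)):
        vel[i][0] += 0.5 * dt * acc[i][0]   # second half kick
        vel[i][1] += 0.5 * dt * acc[i][1]


def total_momentum(vel, mass):
    """Sum of m * v over all bodies; should stay constant in a closed system."""
    return [sum(m * v[k] for m, v in zip(mass, vel)) for k in range(2)]
```

Tracking conserved quantities such as total momentum gives a numerical check that complements watching the simulation on screen.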

