- LLM Tech Report
- A simple and effective positional embedding for transformer models
- From math to optimized code: implementing optimizers, with comparisons to PyTorch
- A dive into a minimal experiment demonstrating inference-time scaling
- A dive into cross-entropy loss and its optimization
- Revealing the connection between graph convolution and mixup
- A dive into attention and its gradient
- Implementing softmax using Triton