- coding 2025-01-21 » LLM Tech Report Notes (updated on 01/21/2025)
- paper 2024-12-30 » Reproduce the inference-time scaling experiment
2024-12-12 » Cross-entropy loss and its optimization [WIP]
2024-10-20 » Attention and its gradient
2024-10-19 » Softmax and its Triton implementation
2024-11-20 » Graph Convolution ≈ Mixup