- coding 2025-01-22 » Optimizers: math, implementations and efficiency
- paper 2024-12-30 » Reproduce the inference time scaling exp
- LLM 2025-01-24 » [Research Preview] Thinking Preference Optimization
2025-01-21 » LLM Tech Report Notes (updated on 01/22/2025)
2024-12-12 » Cross-entropy loss and its optimization [WIP]
2024-10-20 » Attention and its gradient
2024-10-19 » Softmax and its Triton implementation
2024-11-20 » Graph Convolution ≈ Mixup