2025
- 2025.06: LLM Tech Report
- 2025.03: RoPE: Rotary Position Embedding
- 2025.01: Optimizers: math, implementations and efficiency
2024
- 2024.12: Reproduce the inference-time scaling experiment
- 2024.12: Cross-entropy loss and its optimization [WIP]
- 2024.11: Graph Convolution ≈ Mixup
- 2024.10: Attention and its gradient
- 2024.10: Softmax and its Triton implementation
