2025

2025.06: LLM Tech Report
2025.03: RoPE: Rotary Position Embedding
2025.01: Optimizers: math, implementations and efficiency

2024

2024.12: Reproduce the inference-time scaling experiment
2024.12: Cross-entropy loss and its optimization [WIP]
2024.11: Graph Convolution ≈ Mixup
2024.10: Attention and its gradient
2024.10: Softmax and its Triton implementation