Xiaotian Han

coding

Optimizers: math, implementations and efficiency

LLM Tech Report Notes (updated on 01/22/2025)

Cross-entropy loss and its optimization [WIP]

Attention and its gradient

Softmax and its Triton implementation

paper

Reproduce the inference-time scaling experiment

Graph Convolution ≈ Mixup

LLM

[Research Preview] Speculative Thinking: Large Models Mentoring Small Models for Efficient Reasoning

[Research Preview] Thinking Preference Optimization