Xiaotian Han

About
Blog
  • LLM Tech Report

    Jun 7, 2025

    LLM Tech Report

    • rl
    • llms
  • RoPE: Rotary Position Embedding

    Mar 17, 2025

    A simple and effective positional embedding for transformer models

    • transformer
    • positional-embedding
    • paper
  • Optimizers: math, implementations and efficiency

    Jan 21, 2025

    From math to optimized code: implementing optimizers with PyTorch comparisons

    • coding
  • Reproduce the inference-time scaling experiment

    Dec 29, 2024

    Dive into a minimal experiment that demonstrates inference-time scaling.

    • paper
    • reproducibility
    • llm
    • scaling
  • Cross-entropy loss and its optimization [WIP]

    Dec 11, 2024

    Dive into cross-entropy loss and its optimization.

    • coding
    • llm
    • optimization
  • Graph Convolution ≈ Mixup

    Nov 19, 2024

    Revealing the connection between graph convolution and mixup

    • graph-neural-networks
    • mixup
    • paper
  • Attention and its gradient

    Oct 19, 2024

    Dive into attention and its gradient.

    • attention
    • llms
  • Softmax and its Triton implementation

    Oct 18, 2024

    Implementing softmax using Triton.

    • coding