Xiaotian Han

Blog
  • 2025.06: LLM Tech Report

    LLM Tech Report

    • rl
    • llms
  • 2025.03: RoPE: Rotational Position Embedding

    A simple and effective positional embedding for transformer models

    • transformer
    • positional-embedding
    • paper
  • 2025.01: Optimizers: math, implementations and efficiency

    From math to optimized code: implementing optimizers with PyTorch comparisons

    • coding
  • 2024.12: Reproduce the inference time scaling exp

    A dive into a minimal experiment that demonstrates inference-time scaling.

    • paper
    • reproducibility
    • llm
    • scaling
  • 2024.12: Cross-entropy loss and its optimization [WIP]

    A dive into cross-entropy loss and its optimization.

    • coding
    • llm
    • optimization
  • 2024.11: Graph Convolution ≈ Mixup

    Revealing the connection between graph convolution and mixup

    • graph-neural-networks
    • mixup
    • paper
  • 2024.10: Attention and its gradient

    A dive into attention and its gradient.

    • attention
    • llms
  • 2024.10: Softmax and its Triton implementation

    Implementing softmax using Triton

    • coding