- LLM Tech Report
- A simple and effective positional embedding for transformer models
- From math to optimized code: implementing optimizers, with comparisons to PyTorch
- A dive into a minimal experiment demonstrating inference-time scaling
- A dive into cross-entropy loss and its optimization
- Revealing the connection between graph convolution and mixup
- A dive into attention and its gradient
- Implementing softmax using Triton