/work4ai/Train your own R1 reasoning model with Unsloth

generated at 2/17/2025, 5:47:18 PM
Train your own R1 reasoning model with Unsloth
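https://unsloth.ai/blog/r1-reasoning (official blog)
GRPO is now possible on consumer GPUs…
https://note.com/npaka/n/nd99a395b404f (Japanese blog article)
https://x.com/danielhanchen/status/1887564724071768529 At last, a 14B-class model can be trained (QLoRA) with 16 GB of VRAM. Seriously?
Training also works on Google Colab
With 48 GB of VRAM, apparently even 70B-class models can be trained
The minimum requirement is 7 GB of VRAM; confirmed model sizes start at 1.5B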
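
For context on what GRPO (mentioned above) computes, here is a minimal sketch of its group-relative advantage step: several completions are sampled for the same prompt, each gets a reward, and the rewards are normalized within that group. This is plain Python for illustration only, not Unsloth's or any library's actual code. The function name, the `eps` value, and the choice of population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper: `eps` and the use of population std are assumptions
    made for this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Completions better than the group average get a positive advantage,
    # worse ones get a negative advantage.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, scored 1.0 (correct) or 0.0 (incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Because the baseline comes from the group itself, no separate value model has to be kept in memory, which is part of why GRPO suits small-VRAM setups.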

Q: How big a deal is this?
A: https://x.com/gclue_akira/status/1887760201669136825 Training that used to need 8x H100 can now be done on free Google Colab or a local RTX 4060 Ti 16GB (and in a reasonably practical amount of time)

#unsloth
#GRPO

Probably uses this:
Mistral-Small-3-Reasoner-s1