H3
>Attention is all you need... but how much of it do you need?
>Announcing H3 - a new generative language model that outperforms GPT-Neo-2.7B with only *2* attention layers! Accepted as a *spotlight* at #ICLR2023! 📣 w/ @tri_dao
>// Podcast #2: Hungry Hungry Hippos (H3) //
>Stanford researchers just released a new architecture that:
>- Beats Transformers at ~1B param scale
>- Admits *much* longer context than Transformers
>Is H3 the Transformer-killer? More below!
>Hungry Hungry Hippos, aka "H3", functions like a linear RNN, or a long convolution.
>The key idea: thanks to the fast Fourier transform, an H3 layer:
>- can be computed in n*log(n) time, with n the context length
>- unlike Transformers, which require n^2 time!
>
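To unpack the claims in the thread above: a state-space layer can be run either as a linear recurrence (an RNN-like scan) or, equivalently, as one long convolution whose kernel is derived from the recurrence, and that convolution can be evaluated with the FFT in O(n log n). Below is a minimal, purely illustrative sketch of the equivalence for a toy scalar SSM; the function names and parameter values are made up for the example, and this is not the released H3 code (the actual H3 layer stacks two SSMs with multiplicative, attention-like gating on top).

```python
import numpy as np

# Minimal sketch (illustrative, not the released H3 code): a scalar state-space
# model x[t] = A*x[t-1] + B*u[t], y[t] = C*x[t] can be unrolled either as a
# linear RNN (sequential, one step per token) or as one long convolution with
# kernel k[s] = C * A^s * B, which the FFT evaluates in O(n log n) total work,
# in parallel over the whole sequence.

def ssm_as_rnn(u, A, B, C):
    x, ys = 0.0, []
    for u_t in u:                          # sequential linear recurrence
        x = A * x + B * u_t
        ys.append(C * x)
    return np.array(ys)

def ssm_as_fft_conv(u, A, B, C):
    n = len(u)
    k = C * (A ** np.arange(n)) * B        # convolution kernel k[s] = C * A^s * B
    fft_size = 2 * n                       # zero-pad so the convolution is linear, not circular
    y = np.fft.irfft(np.fft.rfft(u, fft_size) * np.fft.rfft(k, fft_size), fft_size)
    return y[:n]

u = np.random.randn(512)                   # toy input sequence
A, B, C = 0.9, 0.5, 1.3                    # hypothetical scalar SSM parameters
assert np.allclose(ssm_as_rnn(u, A, B, C), ssm_as_fft_conv(u, A, B, C))
```

Both paths produce the same outputs; the FFT path is what makes the n*log(n) claim in the quote possible, while the recurrent path is what makes the layer behave "like a linear RNN".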
>The Transformer's compute cost was O(n^2); this brings it down to O(n log_2 n).
>In other words, for an input of 1,000 tokens a Transformer's compute grows to the order of a million, while H3 needs only on the order of ten thousand. That's a massive reduction. ChatGPT can only take 4,000 input tokens, but an H3-based model might be able to handle tens or even hundreds of thousands of tokens. — うみゆき@AI研究
>Just as the Transformer was a breakthrough technology, this too may become critically important foundational research.

>Or rather, it had better be.
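As a sanity check on the numbers in that quote, here is a back-of-the-envelope comparison (constant factors ignored) of the two scaling laws at a few context lengths, including the 4,000-token ChatGPT limit mentioned above:

```python
import math

# Rough scaling comparison: attention-style n^2 work vs. FFT-convolution n*log2(n) work.
# Pure arithmetic on operation counts, not a benchmark of real implementations.
for n in (1_000, 4_000, 100_000):
    quadratic = n ** 2
    nlogn = n * math.log2(n)
    print(f"n={n:>7,}: n^2 = {quadratic:.1e}, n*log2(n) = {nlogn:.1e}, ratio ~ {quadratic / nlogn:,.0f}x")
```

At 1,000 tokens the gap is roughly two orders of magnitude, which is exactly the "a million versus ten thousand" comparison in the quote, and it keeps widening as the context grows.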