Hacker News posts about Qwerky-72B
- Attention is NOT all you need: Qwerky-72B trained using only 8 AMD MI300X GPUs (substack.recursal.ai)
- Qwerky 72B – A 72B LLM without transformer attention (substack.recursal.ai)
- Qwerky: Attention is not what you need? RWKV mashed into QwQ models (substack.recursal.ai)
- Training large attention free models (substack.recursal.ai)