Hacker News posts about Qwerky-72B
- Attention is NOT all you need: Qwerky-72B trained using only 8 AMD MI300X GPUs (substack.recursal.ai)
- Qwerky 72B – A 72B LLM without transformer attention (substack.recursal.ai)
- Qwerky: Attention is not what you need? RWKV mashed into QwQ models (substack.recursal.ai)
- Training large attention free models (substack.recursal.ai)