Hacker News posts about MMLU
- Malus – Clean Room as a Service (malus.sh)
- "Malus": Is Copyleft Dead? (heathermeeker.com)
- The changing goalposts of AGI and timelines (mlumiste.com)
- I Saw Something New in San Francisco (www.nytimes.com)
- I Saw Something New in San Francisco (By Ezra Klein) (www.nytimes.com)
- Clean Room as a Service (malus.sh)
- Show HN: Shrouded, secure memory management in Rust (github.com)
- Payment Required and x402 and how to set it up (matija.eu)
- New medical LLM beats Med-PaLM-2, GPT-4 on MMLU benchmarks (huggingface.co)
- Show HN: Run MMLU benchmark on any LLM endpoint (mmlu.borgcloud.ai)
- Multilingual MMLU Dataset from OpenAI (OpenAI/Mmmlu) (huggingface.co)
- MMLU-Pro: Advanced edition of MMLU & new Leaderboard (huggingface.co)
- Multitask Language Understanding (MMLU) on Helm (crfm.stanford.edu)
- OLMo 1.7–7B: A 24 point improvement on MMLU (blog.allenai.org)
- Show HN: BenchFlow – run AI benchmarks as an API (github.com)
- Show HN: Open-source study to measure end user satisfaction levels with LLMs (open-llm-initiative.com)
- Show HN: I built the LLM Comparison Tool I wish existed (llm-stats.com)
- Show HN: Forecaster Arena – Testing LLMs on real events with prediction markets (forecasterarena.com)
- Show HN: LLM Benchmarking Suite (github.com)
- Show HN: AIBenchy – Independent AI Leaderboard (aibenchy.com)
- Show HN: MarginDash – See which AI customers are profitable (margindash.com)