Hacker News posts about MMLU
- Flux server and web-front end (github.com)
- Keeping Users Happy When LLMs Suck – Slides from Arize Observe (docs.google.com)
- New medical LLM beats Med-PaLM-2, GPT-4 on MMLU benchmarks (huggingface.co)
- RedPajama-Incite-7B-Instruct Outperforms LLaMA on MMLU (twitter.com)
- SmartGPT: Major Benchmark Broken – 89.0% on MMLU and Exam Errors [video] (www.youtube.com)
- Errors in the MMLU: The Deep Learning Benchmark Is Wrong Surprisingly Often (derenrich.medium.com)
- Multilingual MMLU Dataset from OpenAI (OpenAI/Mmmlu) (huggingface.co)
- Yi-34B, 76.3 on MMLU, Apache 2.0 (huggingface.co)
- MMLU Benchmark (Multi-Task Language Understanding) (paperswithcode.com)
- MMLU-Pro: Advanced edition of MMLU & new Leaderboard (huggingface.co)
- Gemini Benchmark – MMLU (compared with GPT-4-turbo, Mixtral) (hub.zenoml.com)
- Multitask Language Understanding (MMLU) on HELM (crfm.stanford.edu)
- OLMo 1.7–7B: A 24 point improvement on MMLU (blog.allenai.org)
- MMLU Benchmark Broken (www.youtube.com)
- Show HN: Open-source study to measure end user satisfaction levels with LLMs (open-llm-initiative.com)
- What's Going on with the Open LLM Leaderboard? (huggingface.co)
- Overview of All Major LLM Benchmarks (www.confident-ai.com)
- 789 KB Linux Without MMU on RISC-V (popovicu.com)
- Marshall McLuhan explains the future of ads and the internet in 1966 (paleofuture.com)
- MMU-less systems and FDPIC (maskray.me)
- My McLuhan Lecture on Enshittification (pluralistic.net)