Hacker News posts about MMLU
- The "agentic spectrum" is a category error (marklubin.me)
- Fastest small LLM at 1 KB context is the slowest at 1 MB (blog.0xmmo.co)
- The AI Splurge Is Costing Big Tech Its Workforce (www.wsj.com)
- RmlUi – HTML/CSS User Interface Library Evolved (github.com)
- New medical LLM beats Med-PaLM-2, GPT-4 on MMLU benchmarks (huggingface.co)
- Show HN: Run MMLU benchmark on any LLM endpoint (mmlu.borgcloud.ai)
- Multilingual MMLU Dataset from OpenAI (OpenAI/Mmmlu) (huggingface.co)
- Show HN: BenchFlow – run AI benchmarks as an API (github.com)
- Show HN: Open-source study to measure end user satisfaction levels with LLMs (open-llm-initiative.com)
- Show HN: I built the LLM Comparison Tool I wish existed (llm-stats.com)
- Show HN: Flint – A 30B model fine-tuned for less repetition (springboards.ai)
- Show HN: Forecaster Arena – Testing LLMs on real events with prediction markets (forecasterarena.com)
- Show HN: LLM Benchmarking Suite (github.com)
- Show HN: AIBenchy – Independent AI Leaderboard (aibenchy.com)
- Show HN: MarginDash – See which AI customers are profitable (margindash.com)