Hacker News posts about Benchmarks
- Meta got caught gaming AI benchmarks (www.theverge.com)
- Show HN: LocalScore – Local LLM Benchmark (www.localscore.ai)
- LLM Benchmark for 'Longform Creative Writing' (eqbench.com)
- Medical Benchmarks and the Myth of the Universal Patient (www.newyorker.com)
- DeepSeek-V3-0324 Crushes GPT-4.5 in Math and Code Benchmarks at 1/277 the Cost (api-docs.deepseek.com)
- LocalScore: A Local LLM Benchmark (www.localscore.ai)
- NPB-Rust: NAS Parallel Benchmarks in Rust (arxiv.org)
- BrowseComp: A Benchmark for Browsing Agents (openai.com)
- Meta's benchmarks for its new AI models are a bit misleading (techcrunch.com)
- LiveBench: A Challenging, Contamination-Free LLM Benchmark (livebench.ai)
- Meta cheats on Llama 4 benchmark (www.heise.de)
- RTX 5090 Mobile: First LLM Benchmarks Are In (www.hardware-corner.net)
- Benchmark Comparison of Rust Logging Libraries (github.com)
- InternVL3: Open-Source Model Outperforms GPT-4o in Multimodal Benchmarks (www.implicator.ai)