Hacker News posts about Benchmarks
- New benchmark shows top LLMs struggle in real mental health care (swordhealth.com)
- FrontierScience Benchmark by OpenAI (openai.com)
- OpenGameEval: Eval Framework to Benchmark Agentic AI Assistants (corp.roblox.com)
- Why Windows XP is the ultimate AI benchmark (cuabench.ai)
- SWE-Bench: The $500B Benchmark (marginlab.ai)
- MI300X vs. H100 vs. H200 Benchmark Part 1: Training (newsletter.semianalysis.com)
- How to Benchmark C++ Code (codspeed.io)
- GitHub Actions CPU performance benchmarks (runs-on.com)
- LLM Benchmark by Databricks – OfficeQA (www.databricks.com)
- China's first real gaming GPU is here, benchmarks are brutal (www.howtogeek.com)
- Updated LLM Benchmark (Gemini 3 Flash) (entropicthoughts.com)
- GPT 5.2 on the Counter-Strike Benchmark (www.instantdb.com)
- Databricks Introduces OfficeQA Benchmark for Agents (www.databricks.com)
- 1TB of Parquet Files. Single Node Benchmark. (DuckDB Style) (dataengineeringcentral.substack.com)
- Gemini 3 Pro vs. GPT-5.1 Codex-Max vs. Claude Opus 4.5: AI Coding Benchmark (www.hansreinl.de)