Hacker News posts about Ben

I wasted weeks hand optimizing assembly because I benchmarked on random data (www.vidarholen.net)

391 points by thunderbong 30 days ago | 187 comments
Big agriculture mislead the public about the benefits of biofuels (lithub.com)

253 points by littlexsparkee 23 days ago | 247 comments
Benchmark Framework Desktop Mainboard and 4-node cluster (github.com)

203 points by geerlingguy 12 days ago | 88 comments
Qodo CLI agent scores 71.2% on SWE-bench Verified (www.qodo.ai)

139 points by bobismyuncle 8 days ago | 54 comments
Show HN: Terminal-Bench-RL: Training long-horizon terminal agents with RL (github.com)

125 points by Danau5tin 22 days ago | 12 comments
Benchmarking GPT-5 on 400 real-world code reviews (www.qodo.ai)

72 points by marsh_mellow 12 days ago | 80 comments
Launch HN: Design Arena (YC S25) – Head-to-head AI benchmark for aesthetics

72 points by grace77 7 days ago | 24 comments
The benefits of trunk-based development (thinkinglabs.io)

51 points by gpi 28 days ago | 73 comments
Show HN: Evaluating LLMs on creative writing via reader usage, not benchmarks (www.narrator.sh)

36 points by Jetwu 5 days ago | 12 comments
GPT-5 doubles performance in offensive security benchmark (xbow.com)

26 points by summarity 3 days ago | 3 comments
We benchmarked Cyberpunk 2077 on Mac M1 to M4 – the numbers don't lie (www.tomsguide.com)

25 points by high_na_euv 30 days ago | 32 comments
Final Benchmarks of Clear Linux on Intel: ~48% Faster Than Ubuntu Out-of-the-Box (www.phoronix.com)

23 points by mfiguiere 25 days ago | 6 comments
Benchmarking MicroPython (blog.miguelgrinberg.com)

22 points by ibobev 19 days ago | 13 comments
Benchmarks in CI: Escaping the Cloud Chaos (codspeed.io)

21 points by adriencaccia 19 days ago | 6 comments
Qwen3 235B beats Claude on some code benchmarks (huggingface.co)

21 points by willahmad 29 days ago | 2 comments
VectorDB bench now support S3Vector (github.com)

19 points by redskyluan 27 days ago | 5 comments
Problems in LLM Benchmarking and Evaluation (www.xent.tech)

14 points by acegod 4 days ago | 4 comments
Show HN: Predict GPT-5 skills with a community AI benchmark

13 points by andrewxhill 17 days ago | 2 comments
Show HN: A benchmark + latency sim for LLM db queries: ClickHouse / Postgres (github.com)

12 points by oatsandsugar 14 days ago | 3 comments
'It's a Mess': A Brain-Bending Trip to Quantum Theory's 100th Birthday Party (www.quantamagazine.org)

11 points by nsoonhui 10 days ago | discuss
The Untold Revolution beneath iOS 26? WebGPU is shipping at last (brandlens.io)

11 points by edgeuser 21 days ago | discuss
Small Objects, Big Gains: Benchmarking Tigris Against AWS S3 and Cloudflare R2 (www.tigrisdata.com)

10 points by nethunters about 6 hours ago | 5 comments
AI Startup Caught Cheating on Benchmark Papers (twitter.com)

10 points by elasxies 7 days ago | 1 comment
Framework Desktop Hands On: First Impressions (Benchmarks, Gaming, AI Models) (boilingsteam.com)

10 points by ekianjo 12 days ago | 1 comment
The Brokk Power Ranking LLM Coding Benchmark (brokk.ai)

10 points by jbellis 7 days ago | discuss
TaxCalcBench: A benchmark for evaluating AI's ability to calculate tax returns (www.columntax.com)

10 points by sundaypancakes 27 days ago | discuss
Any Benefits of Buying Apple Products from Costco? (www.slashgear.com)

9 points by Bluestein 29 days ago | 7 comments
Benchmarking GPT-5 (www.coderabbit.ai)

9 points by aravindputrevu 11 days ago | 1 comment
Measuring Thinking Efficiency in Reasoning Models: The Missing Benchmark (nousresearch.com)

9 points by AntiRush 5 days ago | discuss
OpenAI's GPT-OSS models benchmarks worse than DeepSeek R1 and Qwen3 235B (xcancel.com)

8 points by pu_pe 13 days ago | 1 comment