Hacker News posts about HumanEval
- BigCodeBench: The Next Generation of HumanEval (github.com)
- HumanEval is saturated: new coding LLM benchmark released (bigcode-bench.github.io)
- Running HumanEval Safely with Riza (riza.io)
- Show HN: I built the LLM Comparison Tool I wish existed (llm-stats.com)
- Show HN: European Swallow AI – Sonnet-quality coding at $2.60/M tokens (www.europeanswallowai.com)
- Show HN: Fine-Tuning Index of Open-Source LLMs vs. OpenAI (predibase.com)
- Show HN: Atlas: Independent Evals and Benchmarking for Generative AI Models (app.layerlens.ai)
- Beat GPT-4o at Python with 100 dumb LLaMAs (modal.com)