Hacker News posts about HumanEval
- Beating GPT-4 on HumanEval with a fine-tuned CodeLlama-34B (www.phind.com)
- 50% on HumanEval with just 1.3B model (twitter.com)
- InstructCodeT5: 16B model beats every model in HumanEval (twitter.com)
- WizardCoder-34B-Python surpasses GPT-4 on HumanEval (twitter.com)
- Fine-tuned CodeLlama beats GPT-4 on HumanEval (huggingface.co)
- BigCodeBench: The Next Generation of HumanEval (github.com)
- Code Generation on HumanEval Leaderboard (paperswithcode.com)
- WizardCoder 34B surpasses GPT4 on HumanEval (twitter.com)
- HumanEval is saturated: new coding LLM benchmark released (bigcode-bench.github.io)
- Running HumanEval Safely with Riza (riza.io)
- Future of NLG evaluation: LLMs and high quality human eval? (ehudreiter.com)
- Show HN: Fine-Tuning Index of Open-Source LLMs vs. OpenAI (predibase.com)
- GPT4 Learning from Reflection (github.com)
- Beat GPT-4o at Python with 100 dumb LLaMAs (modal.com)