Hacker News posts about RLHF

Show HN: Navigating research by changing problem representations (RLHF example) (alo.uz)

3 points by oshuhrat 11 days ago | discuss
Show HN: I reverse-engineered the RLF log format used by REMUS underwater drones (github.com)

1 point by ipunchghosts 6 days ago | discuss
RLHF Book (rlhfbook.com)

479 points by jxmorris12 over 1 year ago | 37 comments
RLHF is just barely RL (twitter.com)

386 points by tosh almost 2 years ago | 257 comments
Dispelling misconceptions about RLHF (aerial-toothpaste-34a.notion.site)

120 points by fpgaminer 11 months ago | 32 comments
RLHF from Scratch (github.com)

75 points by onurkanbkrc 5 months ago | 3 comments
Reinforcement Learning from Human Feedback (RLHF) in Notebooks (github.com)

72 points by ash_at_hny 12 months ago | 1 comment
Direct Preference Optimization vs. RLHF (www.together.ai)

37 points by summarity about 1 year ago | 1 comment
RLHF Is Cr*P, It's a Paint Job on a Rusty Car: Geoffrey Hinton (officechai.com)

16 points by gitremote over 1 year ago | 6 comments
Andrej Karpathy on X: RLHF is just barely RL (twitter.com)

15 points by bilsbie almost 2 years ago | discuss
Show HN: ECX a 'Jail-Fix' for RLHF Neutrality Loops in LLMs (zenodo.org)

3 points by Weatherill 3 months ago | 1 comment
Language Models Learn to Mislead Humans via RLHF (arxiv.org)

3 points by Anon84 over 1 year ago | 1 comment
RLHF Sycophancy: Gemini 3.0 discards calculated data to mimic user edits (tomaszmachnik.pl)

3 points by musculus 6 months ago | discuss
Extreme sycophancy RLHF is needed (twitter.com)

3 points by Michelangelo11 about 1 year ago | discuss
Safety Paradox: How RLHF Creates the AI Psychosis Problem It's Meant to Prevent (www.promptinjection.net)

2 points by JustMyNews about 2 months ago | 3 comments
Llama 2, 3 and 4: Synthetic Data, RLHF, Agents on the Path to Open Source AGI (www.latent.space)

2 points by swyx almost 2 years ago | 1 comment
Models self-report difference between RLHF trained responses and base cognition (github.com)

2 points by daniel-navarro 3 months ago | discuss
Show HN: We filed 99 patents for deterministic AI governance(Prior Art vs. RLHF)

2 points by genesalvatore 4 months ago | discuss
Ring-1T: Trillion-Parameter Model Trained with RLVR and RLHF (ant-ling.medium.com)

2 points by jinqueeny 9 months ago | discuss
Opal: An Operator Algebra View of RLHF (arxiv.org)

2 points by P_qRs 9 months ago | discuss
G-Core: A Simple, Scalable and Balanced RLHF Trainer (arxiv.org)

2 points by PaulHoule 11 months ago | discuss
RLHF: Reinforcement Learning from Human Feedback (huyenchip.com)

2 points by nielsole over 1 year ago | discuss
A Short Introduction to RLHF (ttumiel.com)

2 points by 0101111101 almost 2 years ago | discuss
We train LLMs like dogs, not raise them: RLHF and sycophancy (old.reddit.com)

1 point by musculus 2 months ago | 6 comments
Show HN: A Homeostatic Logic-Funnel to Prevent RLHF Overrides in LLM Personas (zenodo.org)

1 point by Weatherill 3 months ago | 1 comment
Thermodynamic Alignment: Replacing RLHF with Entropic Loss Functions (zenodo.org)

1 point by NyX_AI_ZERO_DAY 7 months ago | 1 comment
Ask HN: Did Google know about RLHF(breakthru) only after OpenAI shared

1 point by elarocks 7 months ago | 1 comment
Why RLHF Will Never Solve Sycophancy (jinyili.substack.com)

1 point by Jinyibruceli about 2 months ago | discuss
The Yellow Wallpaper Problem: RLHF Safety Training as Ontology Enforcement (github.com)

1 point by palmerschallon 5 months ago | discuss
Reducing RLHF hallucinations and sycophancy in Gemini 3 (Interactive Demo) (tomaszmachnik.pl)

1 point by musculus 6 months ago | discuss
Notes on RLHF Book by Nathan Lambert (shubhamg.bearblog.dev)

1 point by shubham13596 7 months ago | discuss
Show HN: Fine tuning and RLHF mistralai 7B using DeepSpeed (github.com)

1 point by genji970 11 months ago | discuss