Hacker News posts about Llama.cpp

Llama.cpp: Deterministic Inference Mode (CUDA): RMSNorm, MatMul, Attention (github.com)

6 points by diwank 29 days ago | discuss
Yzma – local Vision Language Models/LLMs in Go using llama.cpp without CGo (github.com)

2 points by deadprogram 6 days ago | discuss
Run local LLMs with Ruby llama.cpp bindings (www.docuseal.com)

1 point by babanooey21 7 days ago | 1 comment
Yzma = embedding+inference on VLM/LLM/SLM/TLM in pure Go using llama.cpp (github.com)

1 point by deadprogram about 10 hours ago | discuss
Launch HN: Cactus (YC S25) – AI inference on smartphones (github.com)

123 points by HenryNdubuaku 26 days ago | 63 comments
Show HN: docker/model-runner – an open-source tool for local LLMs (github.com)

17 points by ericcurtin about 5 hours ago | 9 comments
Show HN: STT –> LLM –> TTS pipeline in C (github.com)

11 points by RhinoDevel 27 days ago | discuss
Vision Now Available in Llama.cpp (github.com)

550 points by redman25 5 months ago | 104 comments
Llama.cpp guide – Running LLMs locally on any hardware, from scratch (steelph0enix.github.io)

368 points by zarekr 11 months ago | 87 comments
Heap-overflowing Llama.cpp to RCE (retr0.blog)

248 points by retr0reg 7 months ago | 54 comments
Llama.cpp supports Vulkan. Why doesn't Ollama? (github.com)

217 points by buyucu 9 months ago | 228 comments
Ollama violating llama.cpp license for over a year (github.com)

202 points by Jabrov 5 months ago | 68 comments
Llama.cpp Now Supports Qwen2-VL (Vision Language Model) (github.com)

155 points by BUFU 10 months ago | 50 comments
Show HN: LLaVaVision: An AI "Be My Eyes"-like web app with a llama.cpp backend (github.com)

154 points by lxe almost 2 years ago | 19 comments
Show HN: Open-source load balancer for llama.cpp (github.com)

151 points by mcharytoniuk over 1 year ago | 20 comments
Go library for in-process vector search and embeddings with llama.cpp (github.com)

114 points by kelindar 12 months ago | 29 comments
Llama.cpp AI Performance with the GeForce RTX 5090 Review (www.phoronix.com)

113 points by kristianp 7 months ago | 78 comments
Performance of llama.cpp on Apple Silicon A-series (github.com)

100 points by mobilio almost 2 years ago | 41 comments
Running Llama.cpp on AWS Instances (github.com)

96 points by schappim almost 2 years ago | 10 comments
Mistral Integration Improved in Llama.cpp (github.com)

95 points by decide1000 2 months ago | 15 comments
Llamanet: Zero-setup, zero-dependency OpenAI replacement powered by llama.cpp (github.com)

45 points by cocktailpeanut over 1 year ago | 5 comments
21.2× faster than llama.cpp? plus 40% memory usage reduction (arxiv.org)

43 points by helloericsf over 1 year ago | 14 comments
Llama.cpp: Add GPT-OSS (github.com)

35 points by atgctg 2 months ago | discuss
Gemma Is Added to Llama.cpp (github.com)

17 points by behnamoh over 1 year ago | discuss
LLaVA C++ server (based on llama.cpp) (github.com)

14 points by trzy almost 2 years ago | 6 comments
Llama.cpp Now Part of the Nvidia RTX AI Toolkit (developer.nvidia.com)

13 points by alanzhuly about 1 year ago | 1 comment
Llama.cpp releases now ship with pre-built macOS binaries (twitter.com)

12 points by schappim over 1 year ago | 1 comment
Grok-1 Support for Llama.cpp (github.com)

11 points by schappim over 1 year ago | 2 comments
Ask HN: Is anybody using llama.cpp for production?

11 points by HardikVala 3 months ago | 1 comment
We Found a Heap Overflow in Llama.cpp's Tokenizer (pwno.io)

9 points by retr0reg 4 months ago | discuss
Llama.cpp Working on Support for Llama3 (github.com)

7 points by theolivenbaum over 1 year ago | discuss
DeepSeek-R1 speeds up llama.cpp code by x2 (github.com)

6 points by roboboffin 9 months ago | 3 comments
Llama.cpp AI Performance with the GeForce RTX 5090 (www.phoronix.com)

6 points by mfiguiere 9 months ago | 1 comment
Ask HN: 2x Arc A770 or 1x Radeon 7900 XTX for llama.cpp

5 points by danielEM 7 months ago | 5 comments
Tinker with LLMs in the privacy of your own home using Llama.cpp (www.theregister.com)

5 points by rntn about 2 months ago | discuss
Llama.cpp: SOTA 2-bit quants (github.com)

5 points by tosh almost 2 years ago | discuss
Llama.MIA – fork of Llama.cpp with interpretability features (grgv.xyz)

5 points by coolvision almost 2 years ago | discuss