Hacker News posts about Llama.cpp
- Llama.cpp supports Vulkan. why doesn't Ollama? (github.com)
- DeepSeek-R1 speeds up llama.cpp code by x2 (github.com)
- Llama.cpp AI Performance with the GeForce RTX 5090 (www.phoronix.com)
- Llama's Paradox – Exploiting Llama.cpp (retr0.blog)
- Llama.cpp PR with 99% of code written by DeepSeek-R1 (github.com)
- Stop Wasting Your Multi-GPU Setup with Llama.cpp (www.ahmadosman.com)
- Show HN: Bodhi App – Local LLM Inference (getbodhi.app)
- Ollama are 'try[ing to] achieve vendor lock-in' (github.com)
- Ggml: X2 speed for WASM by optimizing SIMD (github.com)
- Llama.cpp 30B runs with only 6GB of RAM now (github.com)
- Llama.cpp: Full CUDA GPU Acceleration (github.com)
- How Is LLaMa.cpp Possible? (finbarr.ca)
- Llama.cpp guide – Running LLMs locally on any hardware, from scratch (steelph0enix.github.io)
- Llama.cpp now has a web interface (github.com)
- Running LLaMA 7B on a 64GB M2 MacBook Pro with Llama.cpp (til.simonwillison.net)
- Show HN: Open-source load balancer for llama.cpp (github.com)
- Why MMAP in llama.cpp hides true memory usage (twitter.com)
- Performance of llama.cpp on Apple Silicon A-series (github.com)
- llama.cpp: Roadmap May 2023 (github.com)
- Running Llama.cpp on AWS Instances (github.com)
- Revert for jart’s llama.cpp MMAP miracles (github.com)
- Show HN: Llama.go – port of llama.cpp to pure Go (github.com)
- WIP Llama.cpp Vulkan Implementations (github.com)
- Show HN: Grammar Generator App for Llama.cpp (grammar.intrinsiclabs.ai)
- Gemma Is Added to Llama.cpp (github.com)
- LLaVA C++ server (based on llama.cpp) (github.com)
- Llama.cpp Now Part of the Nvidia RTX AI Toolkit (developer.nvidia.com)
- Grok-1 Support for Llama.cpp (github.com)
- Jlama (Java) outperforms llama.cpp in F32 Llama 7B Model (twitter.com)