Hacker News posts about vLLM
- Nano-vLLM: How a vLLM-style inference engine works (neutree.ai)
- New inference engine faster than vLLM, SGLang, TRT-LLM (layerscale.ai)
- Inference startup Inferact lands $150M to commercialize vLLM (techcrunch.com)
- Ollama vs. vLLM: When to Start Scaling Your Local AI Stack (www.sitepoint.com)
- Why vLLM Scales: Paging the KV-Cache for Faster LLM Inference (akrisanov.com)
- Inside vLLM: Anatomy of a High-Throughput LLM Inference System (www.aleksagordic.com)
- Using Nsight Compute to profile kernels in vLLM without creating repro scripts (blog.ncompass.tech)
- vLLM multi-turn conversations design (github.com)
- Show HN: Python SDK for RamaLama AI Containers (github.com)
- Show HN: VLM Inference Engine in Rust (mixpeek.com)
- Running local LLMs and VLMs on the Arduino UNO Q with yzma (projecthub.arduino.cc)