Hacker News posts about TensorRT-LLM
- TensorRT-LLM runtime now open-source (github.com)
- TensorRT LLM (github.com)
- 5x Faster Time to First Token with Nvidia TensorRT-LLM KV Cache Early Reuse (developer.nvidia.com)
- Qwen2-7B-Instruct with TensorRT-LLM: consistently high tokens/SEC (www.inferless.com)
- Automating Inference Optimizations with NVIDIA TensorRT LLM AutoDeploy (developer.nvidia.com)
- Tuning TensorRT-LLM for Optimal Serving (www.bentoml.com)
- Show HN: Small hardware box that runs local LLMs and exposes an OpenAI API (axis-one-psi.vercel.app)
- Show HN: Automatically Build Nvidia TRT-LLM Engines (www.baseten.co)