Hacker News posts about TensorRT-LLM
- 5x Faster Time to First Token with Nvidia TensorRT-LLM KV Cache Early Reuse (developer.nvidia.com)
- NVIDIA introduces TensorRT-LLM for accelerating LLM inference on H100/A100 GPUs (developer.nvidia.com)
- Deploy Gemma 7B with TensorRT-LLM and achieve > 500 tok/s (docs.mystic.ai)
- Optimizing Inference on LLMs with NVIDIA TensorRT-LLM (developer.nvidia.com)
- Faster Mixtral inference with TensorRT-LLM and quantization (www.baseten.co)
- Nvidia Releases TensorRT-LLM (github.com)
- Qwen2-7B-Instruct with TensorRT-LLM: consistently high tokens/sec (www.inferless.com)
- Tuning TensorRT-LLM for Optimal Serving (www.bentoml.com)
- Benchmarking Nvidia TensorRT-LLM (jan.ai)
- Turbocharging Meta Llama 3 Performance with Nvidia TensorRT-LLM and Triton (developer.nvidia.com)
- LLMs up to 4x Faster With Latest NVIDIA Drivers on Windows (blogs.nvidia.com)
- Nvidia Hopper Leaps Ahead in Generative AI at MLPerf (blogs.nvidia.com)
- Show HN: Automatically Build Nvidia TRT-LLM Engines (www.baseten.co)