Hacker News posts about FP16
- TTS engines: WebSocket vs. sync is 5.5x, INT8 slower than fp16 on M4 (ai.gopubby.com)
- Bolt Graphics Targets FP64 HPC Workloads with Zeus GPU (www.hpcwire.com)
- Everyone Wants Servers and Nobody Wants Servers (connectedplaces.online)
- ONNX Runtime and CoreML May Silently Convert Your Model to FP16 (ym2132.github.io)
- Running the Deepseek-R1 671B Model at FP16 Fidelity on AMD EPYC CPUs (www.servethehome.com)
- 90T/s on my iPhone llama3.2-1B-fp16 (www.reddit.com)
- PyTorch 2.6 Delivers FP16 Support for x86 CPUs, Better Intel GPU Experience (www.phoronix.com)
- Show HN: OpenGraviton – Run 500B+ parameter models on a consumer Mac Mini (opengraviton.github.io)
- Show HN: I made Qwen3.5-4B 13% smarter by compressing it to 4-bit (huggingface.co)