Hacker News posts about vLLM-MLX
- Show HN: Treni – single-binary GPU runtime for uncertainty-aware agents 5ms TTFT (treni-docs.pages.dev)
- vLLM-MLX – Run LLMs on Mac at 464 tok/s (github.com)
- Show HN: Python SDK for RamaLama AI Containers (github.com)
- Trying VLLM Ideas on Apple Silicon with MLX (WIP) (github.com)
- Run LLMs on macOS using LLM-mlx and Apple's MLX framework (simonwillison.net)
- The insecure evangelism of LLM maximalists (lewiscampbell.tech)
- Against LLM Maximalism (explosion.ai)
- Show HN: Built a free moderation API after failing to find one (the-profanity-api.com)