Vivek Kalyanarangan: Quantization and Fast Inference, Paperback
Quantization and Fast Inference
- A Practitioner's Guide to Efficient AI
- Publisher: Manning Publications, 12/2026
- Binding: Paperback
- Language: English
- ISBN-13: 9781633433915
- Extent: 350 pages
- Weight: 417 g
- Release date: 29 December 2026
- Note: This item is not in German.
Publisher's description
Get the eBook free when you register your print book at Manning.
Today's AI models demand a lot of memory, compute, and server horsepower, which quickly translates into cost. This book shows you how to optimize AI models without architectural redesigns or task-specific compression. It presents practical quantization techniques that systematically reduce numerical precision to achieve faster inference, lower memory usage, and cheaper deployment, all with minimal accuracy loss.
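As a minimal sketch of what "reducing numerical precision" means in practice, the code below assumes the widely used asymmetric (affine) 8-bit scheme: a float range is mapped onto integer codes 0 to 255 through a scale and a zero point, and values are recovered approximately by dequantizing. The function names are hypothetical and this is not presented as the book's own derivation. Storing each value as an 8-bit code takes 1 byte instead of the 4 bytes of a float32, which is where the memory saving comes from.

```python
def affine_params(x_min, x_max, num_bits=8):
    """Scale and zero point for asymmetric uniform quantization to unsigned ints (hypothetical helper)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Widen the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate case: every value is zero
    # The zero point is the integer code that represents 0.0.
    zero_point = max(qmin, min(qmax, int(round(qmin - x_min / scale))))
    return scale, zero_point, qmin, qmax


def quantize(values, scale, zero_point, qmin, qmax):
    """Map floats to integer codes: q = clamp(round(x / scale) + zero_point)."""
    # Note: Python's round() rounds halves to the nearest even integer.
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Approximate reconstruction: x_hat = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in codes]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
scale, zp, qmin, qmax = affine_params(min(weights), max(weights))
codes = quantize(weights, scale, zp, qmin, qmax)
restored = dequantize(codes, scale, zp)
print(codes)
print(restored)
```

With this scheme the reconstruction error for values inside the calibrated range stays within half a quantization step (scale / 2), which is the precision traded away for the smaller representation.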
From quantization fundamentals to runtime packaging, the book gives you a complete overview of the full quantization pipeline. It starts by deriving the quantization mapping from first principles, then builds your knowledge and skills through production-tested post-training quantization (PTQ) and quantization-aware training (QAT) workflows, ending with a fully compressed deployment. You'll learn to apply post-training quantization to production models, run quantization-aware training using fake quantization and straight-through estimators, and handle subtle issues such as activation outliers in LLMs, KV cache pressure, and sub-8-bit formats like NF4 and FP4.
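To illustrate the two QAT ingredients named above, here is a minimal scalar sketch, assuming a signed 8-bit range and one common "clipped" variant of the straight-through estimator (STE). Fake quantization rounds a value onto the quantization grid but keeps it as a float, so training sees the quantization error; because rounding has zero gradient almost everywhere, the STE passes the upstream gradient through unchanged, except for inputs that were clamped. The function names are hypothetical, not an API from the book or from any framework.

```python
def fake_quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Forward pass: quantize, clamp, then immediately dequantize back to a float."""
    q = max(qmin, min(qmax, round(x / scale) + zero_point))
    return (q - zero_point) * scale


def fake_quantize_grad(x, upstream_grad, scale, zero_point, qmin=-128, qmax=127):
    """Backward pass with a clipped straight-through estimator (one common variant).

    round() is treated as the identity, so the gradient passes straight through;
    inputs outside the representable range were clamped, so they get zero gradient.
    """
    lo = (qmin - zero_point) * scale
    hi = (qmax - zero_point) * scale
    return upstream_grad if lo <= x <= hi else 0.0


print(fake_quantize(0.123, 0.1, 0))             # snapped to the nearest grid point
print(fake_quantize_grad(0.123, 2.0, 0.1, 0))   # in range: gradient passes through
print(fake_quantize_grad(100.0, 2.0, 0.1, 0))   # clamped: gradient is blocked
```

In a QAT loop, the forward pass would route weights or activations through something like `fake_quantize`, while the backward pass would use the STE rule so the underlying float parameters keep receiving useful gradients.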
What's inside
• Applying post-training quantization to production models
• Deploying efficiently on CPUs, edge devices, and mobile
• Framework-agnostic techniques and real cross-framework parity testing
• Flowcharts and checklists for efficient decision-making
About the reader
For ML engineers and researchers experienced in Python.
About the author
Vivek Kalyanarangan is an AI/ML architect, researcher, and educator with over twelve years of experience designing and deploying large-scale machine learning systems.