GLOSSARY TERM
What is Quantization?
Reducing the precision of the numerical values representing model parameters.
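Quantization converts 32-bit floating-point weights to lower-precision formats such as 8-bit or 4-bit integers. Storing a weight in 8 bits instead of 32 cuts its size by roughly 4x, and storing it in 4 bits cuts it by roughly 8x. The smaller integer weights also speed up inference on edge hardware that provides specialized matrix math extensions.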
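To make the conversion concrete, here is a minimal sketch of one common scheme, symmetric per-tensor 8-bit quantization, in plain Python. The function names (quantize_int8, dequantize) and the sample weights are illustrative assumptions, not part of any particular library or product; production toolchains typically add per-channel scales, calibration data, and packed storage.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale (hypothetical helper)."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 if every weight is zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the signed 8-bit range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]


# Illustrative sample values, assumed for this sketch.
weights = [0.82, -1.27, 0.05, 0.4]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each restored weight differs from the original by at most half a step (scale / 2).
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored as one small integer, and a single floating-point scale is kept for the whole tensor. The rounding error is bounded by half a quantization step, which is the precision given up in exchange for the smaller size.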
Deploy to the Edge
Shrink massive models for low-latency deployment using M1 optimization.