GLOSSARY TERM

What is Quantization?

Reducing the precision of the numerical values representing model parameters.
Quantization converts 32-bit floating-point weights to lower precision formats like 8-bit or 4-bit integers. It drastically shrinks model size and speeds up inference on edge hardware, utilizing specialized matrix math extensions.

Deploy to the Edge

Shrink massive models for low-latency deployment utilizing M1 optimization.