GLOSSARY TERM

What is Model Pruning?

A technique to compress neural networks by removing redundant weights or neurons.

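Model pruning identifies and eliminates non-critical parameters, resulting in sparse network structures. It can reduce memory footprint and inference latency, often with little loss in predictive performance, although the actual gains depend on the sparsity level, on whether the model is fine-tuned after pruning, and on whether the runtime or hardware can exploit sparsity.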
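
As a rough illustration of how "non-critical" parameters might be chosen, the sketch below uses magnitude pruning, one common criterion in which the weights with the smallest absolute values are set to zero. The function name magnitude_prune, the sparsity parameter, and the sample weights are hypothetical and chosen only for this example; this entry does not specify which pruning criterion is intended.

```python
# Minimal sketch of unstructured magnitude pruning on a flat list of weights.
# `magnitude_prune` and `sparsity` are illustrative names, not a library API.
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction zeroed."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


# Made-up example weights; half of them are removed.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)  # [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
```

In practice the zeroed weights only save memory or time if they are stored in a sparse format or if whole neurons or channels are removed, and a pruned model is typically fine-tuned afterwards to recover accuracy.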

Optimize Inference

Compress models heavily for edge deployment using M1 optimization layers.
