GLOSSARY TERM

What is the Swish Activation Function?

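A smooth, non-monotonic activation function popularized by an automated search over candidate activation functions, in the style of neural architecture search.
Defined as f(x) = x · sigmoid(x), Swish often matches or outperforms ReLU, particularly in very deep networks. It is unbounded above, so it does not saturate for large positive inputs, and bounded below: it dips slightly negative for small negative inputs before returning toward zero, which is the source of its non-monotonicity. Unlike ReLU, it is smooth everywhere, so its gradient has no discontinuity at zero.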
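The definition can be sketched in a few lines of plain Python. This is a minimal illustration using only the standard library, not the implementation of any particular framework; the helper names `sigmoid`, `swish`, and `swish_grad` are chosen here for the example.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)


def swish(x: float) -> float:
    """Swish: x multiplied by the sigmoid of x."""
    return x * sigmoid(x)


def swish_grad(x: float) -> float:
    """Derivative of Swish: sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s + x * s * (1.0 - s)


if __name__ == "__main__":
    # Print a few sample points, including the negative region where
    # the function dips below zero and then rises back toward it.
    for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(f"x={x:+.1f}  swish={swish(x):+.4f}  grad={swish_grad(x):+.4f}")
```

The non-monotonic behaviour shows up on the negative side: `swish(-5.0)` is closer to zero than `swish(-1.0)`, even though -5 is the smaller input. For large positive inputs the output approaches x itself, which is what "unbounded above" means in practice.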

Advanced Non-Linearities

Deploy state-of-the-art activation algorithms native to M1 configurations.