GLOSSARY TERM
What is the Swish Activation Function?
A smooth, non-monotonic activation function discovered through automated search over candidate activation functions.
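Defined as f(x) = x · sigmoid(x), Swish tends to match or outperform ReLU, particularly in very deep networks. It is unbounded above, so it does not saturate for large positive inputs, and bounded below: it dips slightly below zero for small negative inputs before returning toward zero, which is what makes it non-monotonic. Unlike ReLU, it is differentiable everywhere, so its gradient changes smoothly rather than jumping at zero.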
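As a minimal sketch of the definition above, the following Python computes Swish and its derivative for a single scalar input. The function names (sigmoid, swish, swish_grad) are illustrative choices and do not come from any particular library; the derivative follows from the product rule applied to x · sigmoid(x).

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def swish(x: float) -> float:
    """Swish as defined in this entry: x multiplied by sigmoid(x)."""
    return x * sigmoid(x)


def swish_grad(x: float) -> float:
    """Derivative of x * sigmoid(x) via the product rule:
    sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s + x * s * (1.0 - s)


if __name__ == "__main__":
    # Print a few sample points on either side of zero.
    for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(f"x={x:+.1f}  swish={swish(x):+.4f}  grad={swish_grad(x):+.4f}")
```

Evaluating the function at a few negative inputs shows the non-monotonic dip: the value at x = -1 is lower than the value at both x = -5 and x = 0. For large positive inputs the output approaches x itself, which reflects the absence of an upper bound.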
Advanced Non-Linearities
Deploy state-of-the-art activation algorithms native to M1 configurations.