Kush Blogs

3. Neural Nets: Activation Functions

Activation Functions

  • Activation functions introduce non-linearity into a neural network.
  • This allows the network to model complex relationships in data.

Why are activation functions needed?

Figure (image source): linearly separable data on the left vs. data that is not linearly separable on the right.

  • The data on the left can be modelled using a linear function.

    • The data is linearly separable.
    • A linear activation (or no activation at all) is sufficient.
  • But the data on the right can't be modelled using a linear function.

    • A linear model will fail here because it can only create a straight-line decision boundary.
    • Therefore, a non-linearity is required to model the data.
  • Without activation functions, a deep network behaves like a simple linear model, limiting its capability; stacked linear layers collapse into a single linear layer, as the short sketch below shows.
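
A minimal sketch of this collapse (the layer sizes here are arbitrary, chosen just for illustration): two Linear layers with no activation in between compute exactly the same function as one Linear layer whose weights are the product of theirs.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Two Linear layers stacked with no activation in between.
stacked = nn.Sequential(nn.Linear(3, 4, bias=False), nn.Linear(4, 2, bias=False))

# Their composition is itself a single linear map with weights W2 @ W1.
combined = nn.Linear(3, 2, bias=False)
with torch.no_grad():
    combined.weight.copy_(stacked[1].weight @ stacked[0].weight)

x = torch.randn(5, 3)
print(torch.allclose(stacked(x), combined(x), atol=1e-6))  # True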

Types of Activation Functions

import torch
import torch.nn as nn

1. Linear Activation Function

$$ f(x) = ax + b $$
  • Output is proportional to input
  • Doesn't introduce non-linearity.
  • Rarely used.

PyTorch Implementation:

linear_activation = nn.Identity()
x = torch.tensor([1.0, 2.0, 3.0])
output = linear_activation(x)

2. Sigmoid Activation (σ)

$$ f(x) = \frac{1}{1 + e^{-x}} $$
  • Output in range (0,1).

  • Used in binary classification problems.

  • Can be interpreted as probabilities (used in logistic regression).

  • Drawback

    • Vanishing gradient problem
      • For large-magnitude inputs the sigmoid saturates and its gradient becomes very small (see the quick check after the code below).

PyTorch Implementation:

sigmoid = nn.Sigmoid()
x = torch.tensor([-1.0, 0.0, 1.0])
output = sigmoid(x)
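
A quick, illustrative check of the drawback above (reusing the imports from earlier): the sigmoid's gradient peaks at 0 and nearly vanishes for large-magnitude inputs.

x = torch.tensor([-10.0, 0.0, 10.0], requires_grad=True)
torch.sigmoid(x).sum().backward()
print(x.grad)  # roughly [4.5e-05, 0.25, 4.5e-05] -- near zero at the saturated ends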

3. Hyperbolic Tangent (Tanh)

$$ f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} $$
  • Output in range (-1, 1).

  • Centered around 0, which helps with faster convergence (see the quick check after the code below).

  • Useful for hidden layers in deep networks.

  • Drawback

    • Vanishing gradient problem (less severe than with sigmoid).
    • Computationally expensive (uses exponentials).

PyTorch Implementation:

tanh = nn.Tanh()
output = tanh(x)
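
A quick check of the zero-centering claim (inputs chosen symmetric around 0 for illustration): tanh outputs average to about 0, while sigmoid outputs for the same inputs are all positive and average to about 0.5.

x = torch.linspace(-3.0, 3.0, 7)
print(torch.tanh(x).mean())     # ~0, outputs are centered around zero
print(torch.sigmoid(x).mean())  # ~0.5, outputs are all positive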

4. Rectified Linear Unit (ReLU)

$$ f(x) = \max(0, x) $$
  • Most widely used activation function.

  • Output in range [0, ∞).

  • Mitigates the vanishing gradient problem (the gradient is exactly 1 for all positive inputs, so it doesn't shrink).

  • Efficient to compute (no exponentials).

  • Sparse activation (many neurons output 0).

  • Drawback

    • Dying ReLU problem: neurons stuck outputting zero receive no gradient and stop learning (see the sketch after the code below).
    • Not centered around zero.

PyTorch Implementation:

relu = nn.ReLU()  # nn.ReLU takes no input tensor at construction time
output = relu(x)
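
A small sketch of the dying ReLU intuition: negative inputs give zero output and zero gradient, so a neuron stuck in that regime receives no learning signal.

x = torch.tensor([-2.0, -0.5, 0.5, 2.0], requires_grad=True)
relu(x).sum().backward()
print(x.grad)  # tensor([0., 0., 1., 1.]) -- no gradient flows for negative inputs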

5. Leaky ReLU

$$ f(x) = \begin{cases} x, & \text{if } x \geq 0 \\ \alpha x, & \text{if } x < 0 \end{cases} $$
  • Output in range (-∞, ∞).

  • A modified ReLU that allows a small, non-zero slope for negative inputs.

  • Default α = 0.01.

  • Prevents the dying ReLU problem (see the check after the code below).

  • Drawback

    • The slope α is a fixed hyperparameter that may require tuning.

PyTorch Implementation:

leakyrelu = nn.LeakyReLU(negative_slope=0.01)
output = leakyrelu(x)
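
Running the same check as for ReLU shows a small non-zero gradient on the negative side, which is how Leaky ReLU avoids the dying-ReLU problem.

x = torch.tensor([-2.0, -0.5, 0.5, 2.0], requires_grad=True)
leakyrelu(x).sum().backward()
print(x.grad)  # tensor([0.0100, 0.0100, 1.0000, 1.0000])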

6. Parametric ReLU (PReLU)

$$ f(x) = \begin{cases} x, & \text{if } x \geq 0 \\ \alpha x, & \text{if } x < 0 \end{cases} $$
  • Output in range (-∞, ∞).

  • Unlike Leaky ReLU, α is learned during training (see the check after the code below).

  • Same equation as Leaky ReLU

  • Adaptive slope improves performance

  • Avoids dying ReLU issue.

  • Drawback

    • Extra parameter α increases computation.

PyTorch Implementation:

prelu = nn.PReLU()
output = prelu(x)
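
A quick check that α really is a learnable parameter; in PyTorch, nn.PReLU exposes it as .weight, initialized to 0.25 by default.

print(prelu.weight)  # Parameter containing: tensor([0.2500], requires_grad=True)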

7. Exponential Linear Unit (ELU)

$$ f(x) = \begin{cases} x, & \text{if } x \geq 0 \\ \alpha (e^x - 1), & \text{if } x < 0 \end{cases} $$
  • Output in range (-α, ∞).

  • Smooths out the output for negative values, saturating at -α (illustrated after the code below).

  • Avoids the dying ReLU problem.

  • Helps with vanishing gradients.

  • Drawback

    • More computationally expensive.

PyTorch Implementation:

elu = nn.ELU(alpha=1.0)
output = elu(x)
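
Illustrating the negative side: for very negative inputs ELU saturates toward -α (here -1.0) instead of growing without bound.

print(elu(torch.tensor([-10.0, -1.0, 0.0, 2.0])))
# ~tensor([-1.0000, -0.6321,  0.0000,  2.0000])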

8. Softmax

$$ \sigma (x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}} $$
  • Output in range (0, 1).

  • Used in the final layer of multi-class classification networks.

  • Outputs probabilities.

  • Converts logits (unnormalized scores) into probabilities summing to 1 (verified after the code below).

  • Drawback

    • Can be overconfident in predictions (sensitive to large values)

PyTorch Implementation:

softmax = nn.Softmax(dim=1)  # apply softmax along dim=1 (across each row)
output = softmax(torch.tensor([[1.0, 2.0, 3.0]]))
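
A quick check that the output is a valid probability distribution (non-negative values summing to 1):

print(output)             # ~tensor([[0.0900, 0.2447, 0.6652]])
print(output.sum(dim=1))  # ~tensor([1.])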

Summary

  • Activation functions introduce non-linearity to neural networks.
  • ReLU is widely used in deep learning due to its efficiency.
  • Tanh is preferred over Sigmoid due to its zero-centered outputs.
  • Softmax is essential for multi-class classification tasks.

Choosing the right activation function is crucial for model performance and convergence stability.
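
As a closing, illustrative sketch (layer sizes and data here are made up), a typical multi-class setup uses ReLU in the hidden layers and leaves the final layer as raw logits, since PyTorch's nn.CrossEntropyLoss applies log-softmax internally.

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),          # non-linearity in hidden layers
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 5),   # raw logits for 5 classes
)
loss_fn = nn.CrossEntropyLoss()     # applies log-softmax internally

x = torch.randn(8, 20)              # dummy batch of 8 samples
targets = torch.randint(0, 5, (8,))
print(loss_fn(model(x), targets).item())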

Figure: plot of some of the activation functions.