Artificial Neural Network Deep Dive

Activation Functions in NN

Learning Outcome

1. Neural Networks Without Activation Functions
2. Mathematical Intuition Behind Non-Linearity
3. Vanishing Gradient Problem
4. ReLU and the Deep Learning Revolution
5. Improving ReLU with Leaky ReLU

Recall

In the Perceptron model, a neuron calculates the net input using the formula:
Net Input (z) = (Weights × Inputs) + Bias

The neuron calculates a weighted sum of inputs and adds a bias to get the net input value.

After calculating the sum, the neuron must decide what to do with the value.

If the result is very large, the neuron should not pass that raw value directly to the next layer.

A filter decides whether and how strongly the neuron activates.
This filter is called the Activation Function.
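The recall above can be sketched in a few lines of Python (a minimal illustration; the function names are my own, and a simple step function stands in for the activation "filter"):

```python
def net_input(weights, inputs, bias):
    """Net input z = (weights x inputs) + bias, as in the Perceptron."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def step_activation(z):
    """A simple 'filter': the neuron fires (outputs 1) only when z > 0."""
    return 1 if z > 0 else 0

# One neuron with two inputs (illustrative values)
z = net_input([0.5, 0.25], [2.0, 4.0], bias=-1.0)  # 1.0 + 1.0 - 1.0 = 1.0
print(z, step_activation(z))  # 1.0 1
```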


Sigmoid (The S-Curve)

The Sigmoid function is one of the earliest activation functions used in neural networks.

Its job is to take any input value and convert it into a number between 0 and 1.

So no matter how big or small the input is, the output will always be:

0 < Output < 1

Because of this property, it is often interpreted as a probability.

So the model can answer Yes/No type questions.

These are called binary classification problems.

The Sigmoid function is called smooth because the curve changes gradually and continuously without any sudden jumps or sharp corners.

Example:

Input → Sigmoid Output
10 → 0.99995
100 → ≈ 0.9999
1000 → ≈ 0.9999

Even though the inputs are very different, the outputs become almost the same.

So the curve at the edges becomes flat.

When the curve is flat, the gradient (the learning signal) becomes almost zero.

During Backpropagation, these near-zero gradients shrink further at every layer, so the early layers stop learning. The deep layers effectively "starve".
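This saturation is easy to verify numerically. A small sketch using the standard sigmoid definition and its derivative s(z)(1 − s(z)):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_gradient(z):
    s = sigmoid(z)
    return s * (1.0 - s)   # derivative of the sigmoid

for z in (0, 10, 100):
    # At z = 0 the gradient is at its maximum (0.25);
    # at large z the output saturates and the gradient collapses toward 0.
    print(z, sigmoid(z), sigmoid_gradient(z))
```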

Tanh (The Zero-Centered S-Curve)

The Tanh (Hyperbolic Tangent) activation function is very similar to the Sigmoid function, but with one important improvement.

  • Sigmoid output range: 0 → 1
  • Tanh output range: −1 → 1

So instead of compressing values only into positive numbers, Tanh allows both negative and positive outputs.

Think of it like this:

Input Value → Tanh Output
Very large positive number → Close to +1
0 → 0
Very large negative number → Close to −1

So Tanh squishes any number into the range −1 to 1.

Why Did Researchers Create Tanh?

Sigmoid had a problem: outputs are always positive.

Example:

Input → Sigmoid Output
−5 → 0.006
0 → 0.5
5 → 0.99

Notice something: all the outputs are positive numbers.

During gradient descent optimization, this causes unbalanced weight updates, because every weight feeding a neuron receives a gradient of the same sign.

Researchers solved this by creating Tanh, which is zero-centered.
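A quick numerical comparison of the two curves (math.tanh is Python's standard-library hyperbolic tangent):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for z in (-5, 0, 5):
    # Sigmoid stays strictly positive; tanh is symmetric around 0.
    print(z, round(sigmoid(z), 3), round(math.tanh(z), 3))
```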

Earlier activation functions like Sigmoid and Tanh had a major problem called the Vanishing Gradient Problem.

Because of this issue, deep neural networks could not learn properly, especially when they had many layers.

For many years this slowed down progress in Artificial Intelligence, a period sometimes associated with the AI Winter.


Summary

1. Hidden Layers: Use ReLU because it is fast and reduces the vanishing gradient problem.
2. Alternative: Use Leaky ReLU if neurons stop learning (the Dying ReLU problem).
3. Binary Classification Output: Use Sigmoid (outputs a probability between 0 and 1).
4. Multi-Class Classification Output: Use Softmax (converts outputs into probabilities that sum to 1).
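The rules of thumb above correspond to these standard definitions (a minimal pure-Python sketch; the 0.01 slope in Leaky ReLU is a common default, not a fixed constant):

```python
import math

def relu(z):
    # Hidden layers: cheap, and the gradient is exactly 1 for z > 0
    return max(0.0, z)

def leaky_relu(z, alpha=0.01):
    # Small negative slope keeps "dying" neurons trainable
    return z if z > 0 else alpha * z

def sigmoid(z):
    # Binary classification output: a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    # Multi-class output: probabilities that sum to 1
    m = max(logits)                          # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-3.0), leaky_relu(-3.0))
print(softmax([2.0, 1.0, 0.1]))
```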

Quiz

Why was the Sigmoid activation function abandoned in deep neural network hidden layers?

A. It outputs only negative numbers

B. It is purely linear and cannot learn complex patterns

C. Large inputs flatten the curve, causing the Vanishing Gradient problem

D. It causes the Dying ReLU problem

Quiz-Answer

Why was the Sigmoid activation function abandoned in deep neural network hidden layers?

Answer: C. Large inputs flatten the curve, causing the Vanishing Gradient problem.


By Content ITV
