Discover the power of the Tanh activation function in neural networks. Learn how it enables AI to model complex data with zero-centered efficiency!
Tanh (Hyperbolic Tangent) is a widely used activation function in deep learning that introduces non-linearity into neural networks. Mathematically, it squashes any real-valued input into the range between -1 and 1. Its "S"-shaped curve is similar to the Sigmoid function but offers a distinct advantage: its output is zero-centered. By mapping negative inputs to strongly negative outputs and positive inputs to strongly positive outputs, Tanh helps a network model complex patterns that a purely linear model cannot capture, making it a foundational component in the history of artificial intelligence.
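In explicit form, the function is defined as

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)) = 2·σ(2x) - 1,

where σ denotes the Sigmoid function; the second form makes the close relationship between the two curves explicit.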
The primary role of Tanh is to determine the output of a neuron based on its weighted inputs. It transforms any real-valued input into a bounded output in the range [-1, 1]. Because this range is symmetric around zero, Tanh is said to be "zero-centered": the average of its outputs tends to sit near zero, unlike Sigmoid, which outputs values between 0 and 1 and therefore produces only positive activations.
Zero-centered activations are crucial for the efficiency of optimization algorithms such as stochastic gradient descent (SGD). During backpropagation, activations centered on zero let the gradients of a layer's weights take on both positive and negative signs; when all activations are positive (as with Sigmoid), those gradients are forced to share the same sign, producing the "zig-zagging" weight updates that can slow down model training. For a deeper dive into these dynamics, Stanford University's CS231n notes provide an excellent technical overview.
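As a quick illustration of this difference, the sketch below compares the mean output of Tanh and Sigmoid over the same randomly generated stand-in pre-activation values.

import torch
import torch.nn as nn

# Hypothetical pre-activation values, standing in for a hidden layer's outputs
pre_activations = torch.randn(10000)

tanh_out = nn.Tanh()(pre_activations)
sigmoid_out = nn.Sigmoid()(pre_activations)

# Tanh outputs are centered near zero; Sigmoid outputs cluster around 0.5
print(f"Tanh mean:    {tanh_out.mean():.4f}")     # close to 0.0
print(f"Sigmoid mean: {sigmoid_out.mean():.4f}")  # close to 0.5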
The Tanh function is readily available in modern frameworks. Below is a runnable example using PyTorch to demonstrate how inputs are mapped to the [-1, 1] range.
import torch
import torch.nn as nn
# Initialize the Tanh activation function
tanh = nn.Tanh()
# Create a sample tensor with negative, zero, and positive values
input_data = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
# Apply Tanh: Values are squashed between -1 and 1
output = tanh(input_data)
print(f"Output: {output}")
# Output: tensor([-0.9640, -0.4621, 0.0000, 0.4621, 0.9640])
Understanding when to use Tanh requires distinguishing it from other common activation functions covered in the glossary, most notably Sigmoid, which shares the S-shaped curve but is not zero-centered, and ReLU, which is unbounded above and does not saturate for positive inputs.
ReLU has largely displaced Tanh in deep feed-forward and convolutional networks because Tanh saturates: for inputs of large magnitude its gradient approaches zero, which can stall learning in very deep stacks (the vanishing gradient problem). Despite this, Tanh remains vital for specific architectures and tasks.
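The saturation effect is easy to observe with autograd. The sketch below, using an arbitrarily chosen input of 5.0, compares the gradient that flows back through Tanh versus ReLU at that point.

import torch

# Compare gradients at a large positive input
x_tanh = torch.tensor(5.0, requires_grad=True)
x_relu = torch.tensor(5.0, requires_grad=True)

torch.tanh(x_tanh).backward()
torch.relu(x_relu).backward()

print(f"Tanh gradient at x=5: {x_tanh.grad:.6f}")  # ~0.000182 (saturated)
print(f"ReLU gradient at x=5: {x_relu.grad:.6f}")  # 1.000000 (no saturation)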
Tanh has historically been the standard activation function for Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks. In Natural Language Processing (NLP) tasks like machine translation or text generation, Tanh regulates the flow of information through the network's memory cells, ensuring values do not explode as they propagate through time.
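Because a standard LSTM cell computes its hidden state as the output gate multiplied by the Tanh of the cell state, the hidden state can never leave the (-1, 1) interval. The minimal sketch below, using arbitrary made-up dimensions, verifies this with PyTorch's built-in nn.LSTM.

import torch
import torch.nn as nn

# A small LSTM with hypothetical dimensions
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

# Random stand-in sequence: batch of 4, 20 time steps, 8 features each
sequence = torch.randn(4, 20, 8)
output, (hidden, cell) = lstm(sequence)

# The Tanh inside the cell keeps every hidden state strictly within (-1, 1)
print(f"Hidden state range: [{output.min():.4f}, {output.max():.4f}]")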
In Generative Adversarial Networks (GANs), Tanh is frequently used in the final layer of the generator model. It scales the output pixel values of generated images to a normalized range of [-1, 1], which helps stabilize the adversarial training process against the discriminator. You can see this architecture in seminal works like the DCGAN paper.
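The sketch below shows how such a generator head might look; the layer sizes are hypothetical, but the final nn.Tanh is what scales the synthetic pixels into [-1, 1], matching real images normalized to the same range.

import torch
import torch.nn as nn

# Final block of a hypothetical DCGAN-style generator
generator_head = nn.Sequential(
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),
    nn.Tanh(),  # scales generated pixel values to [-1, 1]
)

# Random stand-in for feature maps produced by earlier generator layers
features = torch.randn(1, 64, 32, 32)
fake_image = generator_head(features)

print(fake_image.shape)  # torch.Size([1, 3, 64, 64])
print(fake_image.min().item(), fake_image.max().item())  # both inside [-1, 1]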
For simple sentiment analysis models, Tanh can serve as an output activation to map sentiment scores directly to a continuum, where -1 represents a highly negative sentiment, 0 is neutral, and +1 is highly positive. This intuitive mapping makes it easier to interpret model predictions on datasets such as those found on Kaggle.
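As a simple illustration, a small hypothetical regression head over pooled text features could use a final Tanh to emit one score on this continuum.

import torch
import torch.nn as nn

# Hypothetical sentiment head: pooled text features -> single score in [-1, 1]
sentiment_head = nn.Sequential(
    nn.Linear(128, 32),
    nn.ReLU(),
    nn.Linear(32, 1),
    nn.Tanh(),  # -1 = very negative, 0 = neutral, +1 = very positive
)

pooled_features = torch.randn(1, 128)  # stand-in for an encoded review
score = sentiment_head(pooled_features)
print(f"Sentiment score: {score.item():.3f}")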
While state-of-the-art computer vision models like YOLO11 have moved toward unbounded functions for feature extraction, Tanh remains a crucial tool in the deep learning engineer's toolkit, particularly for tasks requiring bounded, zero-centered outputs.