What is Manhattan Distance in Machine Learning?

Learn what Manhattan distance is, how it differs from Euclidean distance, and how it is used in various machine learning algorithms and applications.

In the vast realm of data, finding the right path is crucial. For data scientists, that path often involves calculating distances between points, a task where choosing the right metric can make all the difference. Enter the Manhattan distance, a versatile tool that stands out from its Euclidean and Minkowski cousins. Let’s delve into their differences and discover when Manhattan shines brightest.

Manhattan Distance: Imagine a bustling city grid. To get from point A to point B, you wouldn’t take a diagonal shortcut, but traverse each block, turning only at corners. That’s the essence of Manhattan distance! It calculates the sum of the absolute differences in coordinates between two points. Think of it as the total number of blocks you need to cross to reach your destination.

It is also known as city block distance or taxicab distance. It is defined as the sum of the absolute differences of their coordinates.

For example, the Manhattan distance between the points $(x_1, y_1)$ and $(x_2, y_2)$ is:

$$d_{manhattan} = |x_1 - x_2| + |y_1 - y_2|$$

Euclidean Distance: In contrast, the Euclidean distance craves the efficiency of a crow’s flight. It represents the shortest straight-line path between two points, mimicking the Pythagorean theorem in higher dimensions. It’s like teleporting across the grid, ignoring the city’s constraints.

It is defined as the square root of the sum of the squared differences of their coordinates. For example, the Euclidean distance between the points $(x_1, y_1)$ and $(x_2, y_2)$ in a two-dimensional space is:

$$d_{euclidean} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

Minkowski Distance: It is a generalization of both Manhattan distance and Euclidean distance. It is defined as the p-th root of the sum of the p-th powers of the absolute differences of the coordinates. For two points x and y in an n-dimensional space, it is:

$$d_{minkowski} = \left(\sum_{i=1}^{n} |x_i - y_i|^p\right)^{1/p}$$

When p = 1, Minkowski distance is equivalent to Manhattan distance. When p = 2, Minkowski distance is equivalent to Euclidean distance.
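
To make this equivalence concrete, here is a quick check using SciPy's distance functions (assuming SciPy is installed):

# Minkowski distance reduces to Manhattan (p=1) and Euclidean (p=2)
from scipy.spatial.distance import cityblock, euclidean, minkowski

a = [2, 3]
b = [5, 7]

print(minkowski(a, b, p=1), cityblock(a, b))   # 7.0 7
print(minkowski(a, b, p=2), euclidean(a, b))   # 5.0 5.0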

Choosing the Right Metric: So, when does Manhattan distance take the lead? Here are some scenarios:

  • High Dimensionality: In high-dimensional spaces, Euclidean distances tend to concentrate, meaning the nearest and farthest neighbors start to look equally far away. Manhattan distance tends to preserve more contrast between distances, and avoiding squares and square roots also makes it slightly cheaper to compute (see the small experiment after this list).
  • Sparse Data: If your data is sparse (mostly zeros, as in text or recommendation features), Euclidean distance can be dominated by a few large coordinate differences because it squares them. Manhattan distance weighs every difference linearly, making it more robust in such cases.
  • Ordinal or Count Data: For features that represent discrete steps, such as blocks traveled, item counts, or ratings, the straight-line geometry of Euclidean distance has little meaning. Manhattan distance, which simply accumulates absolute differences per dimension, fits such data naturally.
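
As a rough illustration of the high-dimensionality point, the following small experiment (a sketch assuming NumPy is available; exact numbers vary with the random seed) compares the relative contrast between the nearest and farthest neighbor under the two metrics:

# Relative contrast (farthest - nearest) / nearest from a query point.
# In high dimensions, L1 distances tend to keep more contrast than L2.
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(dists):
    return (dists.max() - dists.min()) / dists.min()

for dim in [2, 100, 1000]:
    X = rng.random((500, dim))   # 500 random points in the unit cube
    query = rng.random(dim)
    l1 = np.abs(X - query).sum(axis=1)
    l2 = np.sqrt(((X - query) ** 2).sum(axis=1))
    print(f"dim={dim}: L1 contrast={relative_contrast(l1):.3f}, "
          f"L2 contrast={relative_contrast(l2):.3f}")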

Beyond the Numbers: Choosing a distance metric isn’t just about formulas. It’s about understanding the underlying assumptions and aligning them with your problem. Manhattan distance, with its focus on discrete steps and individual differences, offers a unique perspective, often overlooked in the Euclidean rush.

In this article, we will explore the concept of Manhattan distance in more detail, and see how it is used in various machine learning algorithms and applications. We will also discuss the advantages and disadvantages of using Manhattan distance as a distance metric in machine learning.

Manhattan Distance Formula

The Manhattan distance between two points or vectors is the sum of the absolute differences of their coordinates. It is also known as the city block distance or the taxicab distance because it represents the distance that a car would have to travel in a grid-like city.

The formula for calculating the Manhattan distance between two points or vectors p and q with n dimensions is:

$$d_{manhattan}(p, q) = \sum_{i=1}^{n} |p_i - q_i|$$

where $p_i$ and $q_i$ are the coordinates in the $i$-th dimension.

For example, if we have two points in a two-dimensional space, such as p=(2,3) and q=(5,7), the Manhattan distance between them is:

$$d_{manhattan}(p, q) = |2 - 5| + |3 - 7| = 3 + 4 = 7$$

Geometrically, this corresponds to a path from p to q made up of horizontal and vertical segments only: 3 units along the x-axis and 4 units along the y-axis, for a total of 7 blocks.

If we have two vectors in a higher-dimensional space, such as p=(1,2,3,4) and q=(4,3,2,1), the Manhattan distance between them is:

$$d_{manhattan}(p, q) = |1 - 4| + |2 - 3| + |3 - 2| + |4 - 1| = 3 + 1 + 1 + 3 = 8$$

Here the path consists of one segment along each of the four dimensions, and the Manhattan distance is simply the sum of the lengths of those segments.

To calculate the Manhattan distance in Python, we can use the built-in abs function to get the absolute value of the difference between two coordinates, and the sum function to add up all the differences. Alternatively, we can use the cityblock function from the scipy.spatial.distance module, which is specifically designed for this purpose.

Here is an example of how to use both methods in Python:

# Define two vectors
p = [1, 2, 3, 4]
q = [4, 3, 2, 1]

# Method 1: Use abs and sum functions
def manhattan_distance(p, q):
  return sum(abs(p_i - q_i) for p_i, q_i in zip(p, q))

# Method 2: Use cityblock function from scipy
from scipy.spatial.distance import cityblock

# Print the Manhattan distance
print(manhattan_distance(p, q))
print(cityblock(p, q))

Output

8
8

As you can see, both methods produce the same result, which is 8.

Manhattan Distance in Machine Learning

Manhattan distance is used in different machine learning tasks such as classification, clustering, regression, and dimensionality reduction. It is often used as a distance metric, which is a function that quantifies how similar or dissimilar two objects are.

Distance metrics are important for machine learning algorithms that rely on the notion of proximity or similarity, such as:

Classification

The task of assigning a label to an object based on its features. For example, in k-nearest neighbors (KNN), a popular classification algorithm, the label of an object is determined by the majority vote of its k closest neighbors, where closeness is measured by a distance metric. Manhattan distance can be used as the distance metric for KNN, especially when the features are ordinal or integer-valued (purely categorical features must first be encoded numerically).

For example, the following code snippet shows how to use Manhattan distance as a distance metric for KNN in Python using the scikit-learn library:

# Import libraries
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load data
X_train = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] # Features of training objects
y_train = [0, 1, 1] # Labels of training objects
X_test = [[2, 4, 6], [5, 7, 9]] # Features of test objects
y_test = [0, 1] # Labels of test objects

# Create and fit KNN classifier with manhattan distance
knn = KNeighborsClassifier(n_neighbors=3, metric='manhattan')
knn.fit(X_train, y_train)

# Predict labels of test objects
y_pred = knn.predict(X_test)

# Evaluate accuracy of predictions
acc = accuracy_score(y_test, y_pred)
print("Accuracy:", acc)

Output

Accuracy: 0.5

Clustering

The task of grouping objects into clusters based on their features. For example, in k-means, a popular clustering algorithm, objects are assigned to the cluster whose center (or centroid) is closest to them, where closeness is measured by a distance metric. Standard k-means minimizes squared Euclidean distances; swapping in Manhattan distance yields the closely related k-medians algorithm, in which each center is updated to the coordinate-wise median of its cluster. This variant suits binary or discrete features and is more robust to outliers. A minimal sketch follows.
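
Since scikit-learn's KMeans does not accept a Manhattan metric, here is a minimal NumPy sketch of k-medians (the function name k_medians is our own, not a library API):

# Minimal k-medians sketch: k-means-style clustering with Manhattan
# distance; each center is the coordinate-wise median of its cluster.
import numpy as np

def k_medians(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to the center with the smallest L1 distance
        dists = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the coordinate-wise median of its cluster
        new_centers = np.array([
            np.median(X[labels == j], axis=0) if np.any(labels == j)
            else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], float)
centers, labels = k_medians(X, k=2)
print(centers)  # one center near x=1, the other near x=10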

Regression

The task of predicting a continuous value for an object based on its features. For example, in linear regression, a popular regression algorithm, the value of an object is modeled as a linear combination of its features, and the parameters of the model are estimated by minimizing the sum of the squared errors between the actual and predicted values.

Manhattan distance can be used as an alternative to the squared error, resulting in a variant of linear regression called least absolute deviations (LAD) or L1-norm regression. LAD is more robust to outliers than linear regression, as it does not penalize large errors as much.
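
As an illustration, here is a minimal sketch that fits an LAD line with a generic optimizer and compares it to ordinary least squares on data with one injected outlier (assuming NumPy and SciPy; the synthetic data is made up for the example):

# LAD (L1-norm) regression: minimize the sum of absolute residuals
# instead of squared residuals, making the fit robust to outliers.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=x.shape)
y[-1] += 30.0  # one large outlier

A = np.column_stack([x, np.ones_like(x)])  # columns: slope, intercept

def lad_loss(w):
    return np.abs(y - A @ w).sum()

w_lad = minimize(lad_loss, x0=np.zeros(2), method="Nelder-Mead").x
w_ols = np.linalg.lstsq(A, y, rcond=None)[0]

print("LAD slope, intercept:", w_lad)  # stays close to the true (2, 1)
print("OLS slope, intercept:", w_ols)  # pulled toward the outlier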

Dimensionality Reduction

The task of reducing the number of features of the objects while preserving their essential information. For example, in principal component analysis (PCA), a popular dimensionality reduction technique, the objects are projected onto a lower-dimensional space that captures the maximum variance of the data.

Manhattan distance can be used as a criterion for choosing the optimal projection, resulting in a variant of PCA called L1-norm PCA. L1-norm PCA is more robust to outliers and noise than PCA, as it does not assume that the data follows a normal distribution.
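
For a flavor of how this works, here is a minimal sketch of finding the first L1 principal direction with the simple fixed-point scheme of PCA-L1 (Kwak, 2008), which maximizes the L1 norm of the projections rather than the variance:

# First L1 principal direction: maximize sum_i |w . x_i| with ||w|| = 1
# via a sign-flipping fixed-point iteration (in the spirit of PCA-L1).
import numpy as np

def pca_l1_direction(X, n_iter=100, seed=0):
    X = X - X.mean(axis=0)  # center the data
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        signs = np.sign(X @ w)
        signs[signs == 0] = 1  # avoid degenerate zero signs
        w_new = X.T @ signs
        w_new /= np.linalg.norm(w_new)
        if np.allclose(w_new, w):
            break
        w = w_new
    return w

# Synthetic data stretched along the first axis
X = np.random.default_rng(1).normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])
print(pca_l1_direction(X))  # dominant direction close to the first axis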

Advantages of Manhattan Distance

  • Robustness: Less sensitive to outliers because each coordinate difference contributes linearly rather than quadratically, so a single extreme value cannot dominate the distance.
  • Computational Efficiency: No squaring or square roots, so each distance is slightly cheaper to compute, which adds up in high dimensions.
  • Interpretability: Easy to understand due to its intuitive “city block” analogy.

Disadvantages of Manhattan Distance

  • Axis Dependence: Unlike Euclidean distance, it is not rotation-invariant; distances change when the coordinate axes are rotated, so it can miss relationships that do not align with the axes.
  • Less Accurate in Some Cases: For data with strong correlations or clusters of varying densities, Euclidean distance might perform better.

Algorithmic Compatibility

  • KNN: Works well with Manhattan distance due to its focus on individual differences and simplicity.
  • k-medians: The Manhattan-based counterpart of k-means; its median-based centers make clustering fast and robust to outliers.
  • Linear Regression: The LAD (L1-norm) variant replaces squared errors with absolute errors, giving improved outlier resistance and often more robust models.
  • PCA: Standard PCA is based on the L2 norm, but the L1-norm PCA variant shares Manhattan distance's robustness to outliers and noise.

Choosing the Right Distance

The optimal distance metric depends on your specific data and task. Manhattan distance offers a valuable alternative to Euclidean distance, particularly in high dimensions, with outliers, or when interpretability is crucial. However, be mindful of its limitations and consider the data’s underlying structure to make an informed choice.

Remember, like any tool, Manhattan distance is most powerful when wielded with understanding and purpose. As you navigate the ever-evolving landscape of machine learning, let its unique perspective pave the way for insightful discoveries.

Conclusion

In this article, we have learned about the Manhattan distance, a way of measuring the distance between two points or vectors in a space. We have seen how it is different from other distance measures such as Euclidean distance and Minkowski distance, and when it is preferred over them.

We have also seen how it is used in different machine learning tasks such as classification, clustering, regression, and dimensionality reduction, along with the advantages and disadvantages of using it in these tasks. We have also worked through examples of computing and using Manhattan distance in Python with the SciPy and scikit-learn libraries.

Some key takeaways for the reader are:

  • Manhattan distance is defined as the sum of the absolute differences of the coordinates of two points or vectors.
  • Manhattan distance is suitable for discrete or grid-like spaces, sparse or high-dimensional data, and noisy or outlier-prone data.
  • Manhattan distance can be used as a distance metric or a criterion for various machine learning algorithms, such as KNN, k-means, LAD, and L1-norm PCA.
  • Manhattan distance can be more robust, simpler, and cheaper to compute than Euclidean distance, but it is not rotation-invariant and can be less accurate when straight-line geometry matters.

Frequently Asked Questions

What is Manhattan distance in machine learning?

Manhattan distance is a metric that measures the sum of the absolute differences between the coordinates of two points in a Cartesian space.

How is Manhattan distance different from Euclidean distance?

Euclidean distance is the shortest distance between two points, calculated using the Pythagorean theorem. Manhattan distance is the distance between two points along axes at right angles, calculated by adding the absolute differences of their coordinates.

When is Manhattan distance preferred over Euclidean distance?

Manhattan distance is preferred over Euclidean distance when the data is high-dimensional, sparse, or contains outliers. Manhattan distance is less sensitive to these factors and can better handle them compared to Euclidean distance.

What are some applications of Manhattan distance in machine learning?

Manhattan distance is used as a similarity measure between data points, especially in nearest-neighbor search and clustering algorithms. It is also used in robust regression, image processing, and pathfinding on grids (for example, as a heuristic in the A* search algorithm).
