Have you ever needed to find the closest point to your location or calculate distances between places on a map? Maybe you want to create boundaries around data points or find clusters in your dataset. These are spatial problems, and they pop up everywhere – from GPS apps to computer graphics to machine learning.
In this guide, I’ll show you how to use scipy.spatial to solve these problems with Python. You’ll learn how to work with convex hulls, KDTrees for fast nearest neighbor searches, and various distance calculations. By the end, you’ll have practical tools to handle spatial data like a pro.
How Do You Get Started With scipy.spatial in Python?
First, let me show you how to import and set up scipy.spatial. You’ll need NumPy too, since scipy.spatial functions accept array-like input and return NumPy arrays.
import numpy as np
from scipy.spatial import distance, ConvexHull, KDTree
import matplotlib.pyplot as plt
# Create some sample 2D points
points = np.array([
[1, 2], [3, 4], [5, 1], [2, 8], [7, 3],
[4, 6], [8, 2], [1, 7], [6, 5], [9, 1]
])
print(f"We have {len(points)} points to work with")
print(f"Points shape: {points.shape}")
This code creates a set of 2D points that we’ll use throughout our examples. The points represent coordinates like locations on a map or data points in a 2D space.
How Can You Calculate Distances Using scipy.spatial?
One of the most common spatial tasks is calculating distances between points. The scipy.spatial.distance module gives you many distance metrics to choose from.
# Calculate Euclidean distance between two points
point_a = np.array([1, 2])
point_b = np.array([4, 6])
euclidean_dist = distance.euclidean(point_a, point_b)
print(f"Euclidean distance: {euclidean_dist:.2f}")
# Manhattan distance (city block distance)
manhattan_dist = distance.cityblock(point_a, point_b)
print(f"Manhattan distance: {manhattan_dist:.2f}")
# Cosine distance (useful for high-dimensional data)
cosine_dist = distance.cosine(point_a, point_b)
print(f"Cosine distance: {cosine_dist:.4f}")
Euclidean distance is the straight-line distance you’d measure with a ruler. Manhattan distance is like walking through city blocks – you can only move horizontally and vertically. Cosine distance measures the angle between vectors, which is great for comparing patterns regardless of magnitude.
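To make those three definitions concrete, here is a small sketch that recomputes each metric by hand with NumPy and prints it next to the scipy.spatial result. The `manual_*` variable names are mine, chosen only for illustration.

```python
import numpy as np
from scipy.spatial import distance

point_a = np.array([1, 2])
point_b = np.array([4, 6])
diff = point_a - point_b

# Euclidean: square root of the summed squared differences
manual_euclidean = np.sqrt(np.sum(diff ** 2))

# Manhattan: sum of the absolute differences along each axis
manual_manhattan = np.sum(np.abs(diff))

# Cosine distance: 1 minus the cosine of the angle between the vectors
manual_cosine = 1 - np.dot(point_a, point_b) / (
    np.linalg.norm(point_a) * np.linalg.norm(point_b)
)

print(manual_euclidean, distance.euclidean(point_a, point_b))
print(manual_manhattan, distance.cityblock(point_a, point_b))
print(manual_cosine, distance.cosine(point_a, point_b))
```

Each pair of printed values should agree up to floating-point rounding. For these two points the differences are 3 and 4, so the Euclidean distance is 5 and the Manhattan distance is 7.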
# Calculate all pairwise distances between points
from scipy.spatial.distance import pdist, squareform
# Get condensed distance matrix
distances_condensed = pdist(points[:5], metric='euclidean')
print(f"Condensed distances: {distances_condensed}")
# Convert to full square matrix
distances_matrix = squareform(distances_condensed)
print(f"\nDistance matrix shape: {distances_matrix.shape}")
print(f"Distance from point 0 to point 3: {distances_matrix[0, 3]:.2f}")
This creates a matrix showing distances between every pair of points. The condensed version saves memory by only storing the unique distances (since distance from A to B equals distance from B to A).
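If you want to read a single distance straight out of the condensed vector without building the square matrix, the sketch below shows one way to do it. The helper `condensed_index` is a hypothetical name of my own, not part of SciPy. It uses the index formula given in the `pdist` documentation and assumes `i < j`.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

points = np.array([[1, 2], [3, 4], [5, 1], [2, 8], [7, 3]])
n = len(points)

condensed = pdist(points, metric="euclidean")
square = squareform(condensed)

def condensed_index(i, j, n):
    """Position of the (i, j) distance in the condensed vector, for i < j < n.

    Illustrative helper (not a SciPy function).
    """
    return n * i + j - ((i + 2) * (i + 1)) // 2

# n points produce n * (n - 1) / 2 unique pairwise distances
print(f"{n} points -> {len(condensed)} condensed entries")

k = condensed_index(0, 3, n)
print(f"condensed[{k}] = {condensed[k]:.2f}, square[0, 3] = {square[0, 3]:.2f}")
```

For large datasets this matters: the square matrix stores every distance twice plus a diagonal of zeros, while the condensed vector stores each pair once.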
How Do You Perform Fast Nearest Neighbor Search with KDTree?
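When you have lots of points and need to find the closest ones quickly, KDTree is your best friend. It builds a tree that recursively splits the space along the coordinate axes, so for low-dimensional data a typical query only visits a small fraction of the points.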
# Build a KDTree from our points
tree = KDTree(points)
# Find the nearest neighbor to a query point
query_point = np.array([5, 5])
distance_to_nearest, nearest_index = tree.query(query_point)
print(f"Query point: {query_point}")
print(f"Nearest neighbor: {points[nearest_index]}")
print(f"Distance: {distance_to_nearest:.2f}")
# Find the 3 nearest neighbors
distances, indices = tree.query(query_point, k=3)
print(f"\n3 nearest neighbors:")
for i, (dist, idx) in enumerate(zip(distances, indices)):
    print(f"{i+1}. Point {points[idx]} at distance {dist:.2f}")
KDTree shines when you have thousands or millions of points. Instead of checking every single point (which would be slow), it uses the tree structure to quickly eliminate most points from consideration.
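Here is a rough sketch of what “checking every single point” looks like next to a tree query. The brute-force version uses `cdist` to build the full query-to-data distance table and then takes the minimum of each row. The dataset sizes and the seed are arbitrary values I picked for the illustration.

```python
import numpy as np
from scipy.spatial import KDTree
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
data = rng.random((5000, 2))     # 5000 random 2D points
queries = rng.random((100, 2))   # 100 random query points

# Brute force: compute all 100 x 5000 distances, then take each row's minimum
all_distances = cdist(queries, data)
brute_distances = all_distances.min(axis=1)

# KDTree: build once, then answer all queries in a single call
tree = KDTree(data)
tree_distances, tree_indices = tree.query(queries)

print("Same nearest distances:", np.allclose(brute_distances, tree_distances))
```

Both approaches find the same nearest-neighbor distances. The difference is the work involved: the brute-force table grows with the number of queries times the number of data points, while the tree is built once and reused for every query.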
# Find all neighbors within a certain radius
radius = 3.0
neighbor_indices = tree.query_ball_point(query_point, radius)
print(f"\nPoints within radius {radius} of {query_point}:")
for idx in neighbor_indices:
    dist = np.linalg.norm(points[idx] - query_point)
    print(f"Point {points[idx]} at distance {dist:.2f}")
This finds all points within a circular area around your query point. It’s perfect for finding “nearby” items like restaurants within walking distance.
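As a sanity check, the radius search should return exactly the points you would get by computing every distance and keeping those at or below the radius. This short sketch compares the two using the same ten sample points. The `tree_result` and `mask_result` names are mine.

```python
import numpy as np
from scipy.spatial import KDTree

points = np.array([
    [1, 2], [3, 4], [5, 1], [2, 8], [7, 3],
    [4, 6], [8, 2], [1, 7], [6, 5], [9, 1]
])
query_point = np.array([5, 5])
radius = 3.0

# Radius search through the tree (sorted so the order is predictable)
tree = KDTree(points)
tree_result = sorted(tree.query_ball_point(query_point, radius))

# Same question answered by computing every distance explicitly
all_distances = np.linalg.norm(points - query_point, axis=1)
mask_result = np.flatnonzero(all_distances <= radius).tolist()

print(tree_result)
print(mask_result)
```

Both lists contain indices 1, 4, 5, and 8, which correspond to the points [3, 4], [7, 3], [4, 6], and [6, 5].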
How Do You Create Convex Hulls for Data Boundaries?
A convex hull is like wrapping a rubber band around your points – it gives you the smallest convex shape that contains all of them. This is useful for finding the outer edge of data clusters or creating boundaries around regions.
# Calculate the convex hull
hull = ConvexHull(points)
print(f"Number of points on the hull: {len(hull.vertices)}")
print(f"Hull vertices: {hull.vertices}")
print(f"Hull points: {points[hull.vertices]}")
# Calculate hull area and perimeter
print(f"Hull area: {hull.volume:.2f}") # In 2D, volume means area
print(f"Hull perimeter: {hull.area:.2f}") # In 2D, area means perimeter
The hull.vertices attribute gives you the indices of the points that form the boundary. For 2D input they are listed in counterclockwise order. These are the “corner” points of your data.
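Because the `volume` and `area` names are easy to mix up in 2D, I find it helpful to recompute both numbers from the hull vertices. The sketch below uses the shoelace formula for the polygon area and sums the edge lengths for the perimeter. It relies on the vertices being in counterclockwise order, which holds for 2D hulls. The `ring` and `next_ring` names are mine.

```python
import numpy as np
from scipy.spatial import ConvexHull

points = np.array([
    [1, 2], [3, 4], [5, 1], [2, 8], [7, 3],
    [4, 6], [8, 2], [1, 7], [6, 5], [9, 1]
])
hull = ConvexHull(points)

# Hull corners in order, and the same corners shifted by one position
ring = points[hull.vertices]
next_ring = np.roll(ring, -1, axis=0)

# Shoelace formula: half the absolute sum of the cross products of consecutive corners
shoelace_area = 0.5 * abs(
    np.sum(ring[:, 0] * next_ring[:, 1] - next_ring[:, 0] * ring[:, 1])
)

# Perimeter: total length of the edges between consecutive corners
edge_perimeter = np.sum(np.linalg.norm(next_ring - ring, axis=1))

print(f"Shoelace area: {shoelace_area:.2f} vs hull.volume: {hull.volume:.2f}")
print(f"Edge perimeter: {edge_perimeter:.2f} vs hull.area: {hull.area:.2f}")
```

The two values on each line should match, which confirms that `hull.volume` is the enclosed area and `hull.area` is the boundary length when the input is 2D.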
# Visualize the convex hull
plt.figure(figsize=(10, 6))
# Plot all points
plt.scatter(points[:, 0], points[:, 1], c='blue', s=100, alpha=0.7, label='Data points')
# Plot hull vertices
hull_points = points[hull.vertices]
plt.scatter(hull_points[:, 0], hull_points[:, 1], c='red', s=150, marker='^', label='Hull vertices')
# Draw hull boundary
for simplex in hull.simplices:
    plt.plot(points[simplex, 0], points[simplex, 1], 'r-', alpha=0.7)
# Close the hull by connecting last point to first
hull_path = np.append(hull.vertices, hull.vertices[0])
plt.plot(points[hull_path, 0], points[hull_path, 1], 'r-', linewidth=2, alpha=0.8)
plt.xlabel('X coordinate')
plt.ylabel('Y coordinate')
plt.title('Convex Hull Example')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
This creates a nice visualization showing your data points and the convex hull boundary. The red triangles mark the corner points that define the hull.
What Are Real-World Applications of scipy.spatial?
Let me show you a practical example that combines several spatial tools. Imagine you’re analyzing delivery routes and need to assign each customer to the nearest delivery center.
# Simulate customer locations and delivery centers
np.random.seed(42)
customers = np.random.rand(50, 2) * 10 # 50 customers in 10x10 area
delivery_centers = np.array([[2, 2], [8, 8], [2, 8], [8, 2]]) # 4 delivery centers
# Build KDTree for fast customer-to-center matching
center_tree = KDTree(delivery_centers)
# Assign each customer to nearest delivery center
customer_assignments = []
for customer in customers:
    distance_to_center, center_index = center_tree.query(customer)
    customer_assignments.append(center_index)
customer_assignments = np.array(customer_assignments)
# Calculate service areas using convex hulls
colors = ['red', 'blue', 'green', 'orange']
plt.figure(figsize=(12, 8))
for center_id in range(len(delivery_centers)):
    # Get customers assigned to this center
    assigned_customers = customers[customer_assignments == center_id]
    if len(assigned_customers) >= 3:  # Need at least 3 points for hull
        # Add the delivery center to the points for hull calculation
        hull_points = np.vstack([assigned_customers, delivery_centers[center_id:center_id+1]])
        hull = ConvexHull(hull_points)
        # Plot service area
        for simplex in hull.simplices:
            plt.plot(hull_points[simplex, 0], hull_points[simplex, 1],
                     color=colors[center_id], alpha=0.3)
    # Plot the customers assigned to this center
    plt.scatter(assigned_customers[:, 0], assigned_customers[:, 1],
                c=colors[center_id], alpha=0.6, s=30, label=f'Center {center_id+1} customers')
# Plot delivery centers
plt.scatter(delivery_centers[:, 0], delivery_centers[:, 1],
            c='black', s=200, marker='s', label='Delivery Centers', edgecolors='white', linewidth=2)
plt.xlabel('X coordinate (km)')
plt.ylabel('Y coordinate (km)')
plt.title('Delivery Service Areas with Convex Hull Boundaries')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
# Calculate some useful statistics
for center_id in range(len(delivery_centers)):
    assigned_customers = customers[customer_assignments == center_id]
    if len(assigned_customers) > 0:
        avg_distance = np.mean([distance.euclidean(customer, delivery_centers[center_id])
                                for customer in assigned_customers])
        print(f"Center {center_id+1}: {len(assigned_customers)} customers, avg distance {avg_distance:.2f} km")
This example shows how spatial tools work together in practice. We use KDTree for fast assignment, convex hulls to visualize service areas, and distance calculations to summarize how far each center’s customers have to travel.
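The assignment and the statistics can also be done without Python loops, since `KDTree.query` accepts a whole array of query points at once. The following is a sketch of that variant, not a replacement for the walkthrough above. The `customers_per_center` and `avg_distance_per_center` names are mine.

```python
import numpy as np
from scipy.spatial import KDTree
from scipy.spatial.distance import cdist

# Same simulated data as in the example above
np.random.seed(42)
customers = np.random.rand(50, 2) * 10
delivery_centers = np.array([[2, 2], [8, 8], [2, 8], [8, 2]])

# One call returns the nearest center and its distance for every customer
center_tree = KDTree(delivery_centers)
nearest_distances, customer_assignments = center_tree.query(customers)

# Count customers per center, then sum distances per center and divide
n_centers = len(delivery_centers)
customers_per_center = np.bincount(customer_assignments, minlength=n_centers)
distance_sums = np.bincount(
    customer_assignments, weights=nearest_distances, minlength=n_centers
)
# np.maximum guards against dividing by zero for a center with no customers
avg_distance_per_center = distance_sums / np.maximum(customers_per_center, 1)

for center_id in range(n_centers):
    print(f"Center {center_id+1}: {customers_per_center[center_id]} customers, "
          f"avg distance {avg_distance_per_center[center_id]:.2f} km")
```

With only four centers the speed difference is negligible, but the same pattern scales to many more customers and centers without changing the code.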
What Should You Learn Next About scipy.spatial?
Scipy.spatial gives you powerful tools to solve spatial problems efficiently. Here’s what you’ve learned:
- Distance calculations for measuring relationships between points
- KDTree for lightning-fast nearest neighbor searches in large datasets
- Convex hulls for finding boundaries and outlining regions
- Real-world applications that combine multiple spatial techniques
These tools will save you time whether you’re building recommendation systems, analyzing geographic data, or optimizing logistics. Start with simple distance calculations, then add KDTree when you need speed, and use convex hulls when you need boundaries. The combination is surprisingly powerful for most spatial computing needs.