I ran into this problem the hard way during a data pipeline project – I needed to duplicate a large NumPy array for processing, and my first attempt with the assignment operator left me with two variables pointing at the same memory. After that, I made sure I understood every way NumPy lets you copy arrays. This article covers all of them, with their tradeoffs.
NumPy is the foundational library for numerical computing in Python. Copying arrays shows up everywhere – from machine learning feature engineering to image processing pipelines. This article walks through each copy method, when to use which one, and how to avoid the most common pitfalls along the way.
TLDR
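- np.copy() creates a new array with its own memory, so changes to the copy never reach the original (for object-dtype arrays the copy is shallow and the contained Python objects are still shared)
- The assignment operator (=) copies nothing: it binds a second name to the same array object, so both variables share the same underlying data
- np.empty_like() allocates new, uninitialized memory with the same shape and dtype, and you fill it yourself – useful when you need to pre-allocate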
- Use a view when you want to work on a subset without copying, use copy when you need independent data
- np.shares_memory() checks whether two arrays point to the same memory block
What Is a NumPy Array?
A NumPy array is a grid of values, all of the same type, indexed by a tuple of integers. Unlike Python lists, arrays store their elements in a single typed memory block, which often makes numerical operations orders of magnitude faster. The library is the backbone of scientific Python: pandas and scikit-learn build directly on NumPy arrays, and PyTorch and TensorFlow convert to and from them. The Python NumPy module article covers the basics in more detail.
Creating an array is straightforward. The np.array() function takes a Python list and converts it into a typed array with an associated dtype.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print("Array:", arr)
print("Type:", type(arr))
print("Dtype:", arr.dtype)
Array: [1 2 3 4 5]
Type: <class 'numpy.ndarray'>
Dtype: int64
Copy vs View – The Fundamental Difference
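Before diving into specific functions, it helps to understand what NumPy actually does when you “copy” an array. A view is a new array object that looks at the same underlying memory block, so modifying it changes the original. A copy is a completely new array with its own memory. The assignment operator (=) creates neither: it binds a second name to the same array object, so there is still only one array. In practice it behaves like a view, because writes through either name hit the same data. This catches almost everyone at first.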
import numpy as np
original = np.array([10, 20, 30, 40, 50])
print("Original:", original)
# Assignment binds a second name to the SAME array - no new memory
copy_view = original
copy_view[0] = 999
print("After modifying copy_view:")
print("original:", original)
print("copy_view:", copy_view)
Original: [10 20 30 40 50]
After modifying copy_view:
original: [999  20  30  40  50]
copy_view: [999  20  30  40  50]
Both variables point to the same data. Change one, and the other changes too. For most production pipelines, this is not what you want.
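The difference between a second name, a view, and a copy can be checked directly. The sketch below is only an illustration, and the variable names are invented for it: `is` tests object identity, and the `.base` attribute of a view points back at the array that owns the memory.

```python
import numpy as np

original = np.array([10, 20, 30, 40, 50])

alias = original              # plain assignment: a second name, same object
view = original.view()        # new array object, same memory
independent = original.copy() # new array object, new memory

print(alias is original)                        # True
print(view is original)                         # False
print(view.base is original)                    # True
print(independent.base is None)                 # True
print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False
```

Writes through alias or view would show up in original; writes to independent would not.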
np.copy() – The True Copy
np.copy() creates a new array with its own memory block. Changes to the new array have no effect on the original. This is the method to reach for when you need true independence between two arrays.
import numpy as np
original = np.array([1.63, 7.92, 5.46, 66.8, 7.89,
3.33, 6.56, 50.60, 100.11])
print("Original array:")
print(original)
copied = np.copy(original)
copied[0] = 0.0
print("\nCopied array after modification:")
print(copied)
print("\nOriginal array (unchanged):")
print(original)
Original array:
[  1.63   7.92   5.46  66.8    7.89   3.33   6.56  50.6  100.11]
Copied array after modification:
[  0.     7.92   5.46  66.8    7.89   3.33   6.56  50.6  100.11]
Original array (unchanged):
[  1.63   7.92   5.46  66.8    7.89   3.33   6.56  50.6  100.11]
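The signature is numpy.copy(a, order='K', subok=False). The order parameter controls the memory layout of the result: 'C' uses row-major order, 'F' uses column-major (Fortran) order, 'A' uses 'F' if the input is Fortran-contiguous and 'C' otherwise, and 'K' matches the layout of the input as closely as possible. For most use cases, the default 'K' is fine. Note that the method form, arr.copy(), defaults to order='C' instead.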
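To see what the order argument does in practice, here is a small sketch that copies a column-major array in three ways and inspects the contiguity flags. The array and variable names are made up for the example.

```python
import numpy as np

c_ordered = np.arange(6).reshape(2, 3)     # row-major (C) layout by default
f_ordered = np.asfortranarray(c_ordered)   # same values, column-major layout

kept = np.copy(f_ordered)                  # order='K': layout of the input is kept
forced_c = np.copy(f_ordered, order='C')   # explicitly row-major
method_copy = f_ordered.copy()             # ndarray.copy() defaults to order='C'

print(kept.flags['F_CONTIGUOUS'])          # True
print(forced_c.flags['C_CONTIGUOUS'])      # True
print(method_copy.flags['C_CONTIGUOUS'])   # True
print(np.array_equal(kept, forced_c))      # True: same values, different layout
```

The values are identical in every case; only the order in which they sit in memory differs, which matters mainly when handing arrays to code that expects a specific layout.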
np.empty_like() – Pre-allocated Copy
np.empty_like() creates a new array with the same shape and dtype as the input, but does not initialize the values. You then copy data into it. For a one-off copy this has no real performance advantage over np.copy(), which does essentially the same allocate-then-fill work. The pattern pays off when you allocate a buffer once and refill it many times, for example inside a loop over large arrays in a memory-constrained environment, because each refill avoids a fresh allocation.
import numpy as np
source = np.array([34, 65, 11, 66, 80, 630, 50])
print("Source array:")
print(source)
# Pre-allocate a new array with same shape and dtype
destination = np.empty_like(source)
# Copy data into the pre-allocated array
destination[:] = source
print("Destination array:")
print(destination)
Source array:
[ 34  65  11  66  80 630  50]
Destination array:
[ 34  65  11  66  80 630  50]
One thing to watch: np.empty_like() does not guarantee that the allocated memory is zeroed out. The array contains whatever bits were already in that memory location. Always explicitly copy data into it before using the array values for anything.
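Pre-allocation is most useful when one buffer is reused across many iterations. The following is a minimal sketch under that assumption: the batches list is hypothetical stand-in data, and np.copyto() is used to overwrite the buffer in place.

```python
import numpy as np

# Hypothetical batches standing in for data that arrives inside a loop
batches = [np.arange(5) * k for k in range(1, 4)]

buffer = np.empty_like(batches[0])   # allocated once, contents undefined
totals = []
for batch in batches:
    np.copyto(buffer, batch)         # overwrite the same memory each time
    buffer *= 2                      # in-place work; the batch is untouched
    totals.append(int(buffer.sum()))

print(totals)        # [20, 40, 60]
print(batches[0])    # [0 1 2 3 4]
```

Whether this is measurably faster than calling np.copy() in each iteration depends on array size and allocator behavior, so it is worth timing on real data before committing to it.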
Slicing Creates Views
Basic slicing in NumPy returns a view, not a copy. The slice is a window into the same memory block as the original. This is fast and memory-efficient for read operations, but modifying a slice modifies the original array. The same does not apply to NumPy array indexing with boolean masks or integer arrays: this advanced indexing always returns a copy, so changes to the result leave the original untouched. Do not assume that every operation returning a subset behaves the same way.
import numpy as np
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Slicing creates a VIEW
subset = data[2:7]
print("Slice:", subset)
subset[0] = 100
print("Original after modifying slice:", data)
Slice: [3 4 5 6 7]
Original after modifying slice: [  1   2 100   4   5   6   7   8   9  10]
To get an independent copy of a slice, chain the copy call: data[2:7].copy().
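It is worth checking how indexing with a boolean mask or an integer array behaves, because it differs from basic slicing: reading through such an index produces a copy. The sketch below compares the three cases with np.shares_memory(); the variable names are chosen for illustration.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

sliced = data[2:7]          # basic slice: view
masked = data[data > 5]     # boolean mask: copy
picked = data[[0, 3, 4]]    # integer-array index: copy

print(np.shares_memory(data, sliced))   # True
print(np.shares_memory(data, masked))   # False
print(np.shares_memory(data, picked))   # False

masked[0] = -1
picked[0] = -1
print(data[0], data[5])                 # 1 6  (original untouched)

# Assigning THROUGH a mask is different: it writes into the original
data[data > 5] = 0
print(data)                             # [1 2 3 4 5 0 0 0 0 0]
```

The last two lines show the one place where masks do touch the original: when the indexing expression is on the left-hand side of an assignment.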
Copying 2D Arrays and Checking Memory Sharing
The copy methods above work identically for arrays of any dimension. 2D array copying comes up frequently in image processing, where you might extract a channel or a region and need to modify it independently. The guide on converting images to NumPy arrays walks through that pipeline in full. Creating train-test splits from one source array is another common use case, covered in the guide on splitting data into training and testing sets.
import numpy as np
matrix = np.array([[100, 55, 66, 44, 77],
[22, 88, 11, 33, 99]])
print("Matrix:")
print(matrix)
# Assignment binds a second name to the same array
view_copy = matrix
view_copy[0, 0] = 0
# True copy with np.copy()
true_copy = np.copy(matrix)
true_copy[0, 0] = -1
print("\nMatrix after both modifications:")
print("view_copy (shares memory):", view_copy[0, 0])
print("true_copy (independent):", true_copy[0, 0])
print("Original matrix:", matrix[0, 0])
Matrix:
[[100  55  66  44  77]
 [ 22  88  11  33  99]]
Matrix after both modifications:
view_copy (shares memory): 0
true_copy (independent): -1
Original matrix: 0
Notice how the original matrix was modified when we changed view_copy. The true_copy remained independent throughout.
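The region-extraction case mentioned for image processing can be sketched the same way. The 4x4 array below is a hypothetical stand-in for a single-channel image; a 2D slice is a view, and chaining .copy() makes it independent.

```python
import numpy as np

# Hypothetical 4x4 single-channel "image"
image = np.arange(16).reshape(4, 4)

region_view = image[1:3, 1:3]           # view into the image
region_copy = image[1:3, 1:3].copy()    # independent 2x2 block

region_copy[:] = 0
print(image[1, 1])    # 5  (the copy did not touch the image)

region_view[:] = 0
print(image[1, 1])    # 0  (the view wrote straight into the image)
```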
When debugging aliasing issues or verifying that a copy is truly independent, np.shares_memory() tells you whether two arrays reference any overlapping memory region.
import numpy as np
a = np.array([1, 2, 3])
b = a # second name for the same array - same memory
c = np.copy(a) # true copy - different memory
print("shares_memory(a, b):", np.shares_memory(a, b))
print("shares_memory(a, c):", np.shares_memory(a, c))
shares_memory(a, b): True
shares_memory(a, c): False
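Two views of the same array do not necessarily overlap. As a small illustration (variable names invented here), interleaved slices come from one buffer but have no element in common. np.shares_memory() performs the exact check, while np.may_share_memory() only compares address ranges and can therefore report True where nothing actually overlaps.

```python
import numpy as np

a = np.arange(6)
evens = a[::2]     # elements at positions 0, 2, 4
odds = a[1::2]     # elements at positions 1, 3, 5

print(np.shares_memory(a, evens))         # True
print(np.shares_memory(evens, odds))      # False: no element in common
print(np.may_share_memory(evens, odds))   # True: address ranges overlap
```

The exact check can be slow for complicated strides, which is why the cheaper bounds-based variant exists.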
FAQ
Which method should I use by default?
np.copy() is the default choice when you need an independent array. It is explicit, readable, and does exactly what the name says. The assignment operator (=) should only be used when you intentionally want variable aliasing – both names pointing at the same object. np.empty_like() is a niche tool for performance-sensitive code where you need to pre-allocate and fill, or when you are building array containers in library code.
Does np.copy() also copy the dtype and shape?
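Yes. np.copy() creates an array with the same shape and dtype as the original and, with the default order='K', a memory layout that matches it as closely as possible. The new array has its own data buffer, so it is independent of the original. The one exception is object-dtype arrays, where the copy is shallow and the contained Python objects are still shared.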
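For arrays with dtype=object, np.copy() duplicates only the references held in the array, not the Python objects they point to. The sketch below, with names chosen for illustration, shows the effect and uses copy.deepcopy() from the standard library when the elements themselves must be duplicated.

```python
import copy
import numpy as np

# Object array holding two Python lists
nested = np.empty(2, dtype=object)
nested[0] = [1, 2]
nested[1] = [3]

shallow = np.copy(nested)        # new array, same list objects inside
deep = copy.deepcopy(nested)     # new array, new list objects inside

shallow[0].append(99)            # mutates the list shared with nested

print(nested[0])    # [1, 2, 99]
print(deep[0])      # [1, 2]
```

For ordinary numeric dtypes this distinction does not arise, because the values are stored in the array buffer itself.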
What happens to the original array when I modify a view?
Both the view and the original share the same underlying memory block. Any modification to either the view or the original immediately reflects in both. This is the intended behavior for views – it is a performance feature, not a bug.
Is np.copy() slower than the assignment operator?
Yes, because np.copy() allocates new memory and copies every element. The assignment operator does not copy anything – it just creates a new reference to the same memory. For large arrays, np.copy() can be expensive in both time and memory. Only pay this cost when independence is genuinely needed.
How do I check if two arrays share memory?
Use np.shares_memory(a, b). It returns True if the two arrays reference any overlapping memory region, and False otherwise. This is useful for debugging subtle aliasing bugs in complex pipelines.
Does np.empty_like() zero out the new array?
No. np.empty_like() allocates uninitialized memory. The array contains whatever data was already at those memory addresses. Always explicitly copy data into it before reading values from it. Use np.zeros_like() or np.ones_like() if initialized arrays are needed.
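When the new array should start from known values, the initialized counterparts are a safer choice than np.empty_like(). A brief sketch follows; the template array is made up for the example, and np.full_like() is included as the variant that fills with an arbitrary constant.

```python
import numpy as np

template = np.array([[1.5, 2.5], [3.5, 4.5]])

zeros = np.zeros_like(template)      # same shape and dtype, all 0.0
ones = np.ones_like(template)        # same shape and dtype, all 1.0
sevens = np.full_like(template, 7)   # same shape and dtype, all 7.0

print(zeros.shape, zeros.dtype)      # (2, 2) float64
print(sevens)
```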
Copying arrays correctly is one of those core skills that pays dividends across every area of scientific Python. Once the view-vs-copy distinction is clear, the rest follows naturally.

