I have been working with text data for years, and I keep coming back to the same tool whenever I need to find or transform strings in Python: the re module. Once you understand it, you start seeing opportunities to use it everywhere – parsing logs, validating user input, cleaning up datasets.
Throughout this tutorial, I want to show you how to use regular expressions in Python to handle a specific problem: matching strings that satisfy certain conditions. I will walk through the key functions in the re module, and then we will build up to a real example that checks whether a string follows a pattern you define.
TLDR
- The re module in Python handles all regex operations – import it first
- Use re.search() to find a pattern anywhere in a string, re.match() to check only the start
- Use re.findall() to get all matches at once, re.split() to split on a pattern
- Conditional regex like “dashes only in the middle” maps to ^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$
- Always test your patterns with Python before putting them into production
What Is the re Module in Python?
The re module is part of Python’s standard library, so you do not need to install anything extra. It lets you define a pattern – a regular expression – and then use that pattern to search, match, split, or replace text in a string.
Think of regex as a very powerful version of the find-and-replace dialog in a text editor. Instead of searching for an exact word, you can search for a pattern. For example, you could match all email addresses in a document without knowing them in advance.
Installing and Importing re
Description: Import the regex module and verify it works.
import re
print("re module imported successfully")
Explain code: We import the re module with a standard import statement. The print call confirms the import worked.
Output:
re module imported successfully
Core Functions: search, match, findall, split
Before we get into conditional patterns, let me show you the four functions I use most often from the re module.
re.search() – Find a Pattern Anywhere
Description: Search for a word inside a longer string.
import re
text = "Python is a popular programming language used in AI and data science"
pattern = "programming"
match = re.search(pattern, text)
if match:
    print(f"Found '{match.group()}' at position {match.start()} to {match.end()}")
else:
    print("No match found")
Explain code: re.search() scans through the string and returns a match object the first time the pattern appears. If the pattern is not found, it returns None. We check for a valid match object before trying to access the matched text.
Output:
Found 'programming' at position 20 to 31
re.match() – Check Only the Beginning
Description: Check if a string starts with a specific pattern.
import re
langs = ["Python", "Java", "Pythonista", "Cython"]
for lang in langs:
    match = re.match(r"Python", lang)
    if match:
        print(f"'{lang}' starts with Python")
    else:
        print(f"'{lang}' does not start with Python")
Explain code: re.match() only looks at the beginning of the string. “Python” matches itself. “Pythonista” also starts with “Python” so it matches. “Java” and “Cython” do not start with “Python” so they return None.
Output:
'Python' starts with Python
'Java' does not start with Python
'Pythonista' starts with Python
'Cython' does not start with Python
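To see the difference between the two functions side by side, here is a minimal sketch that runs both on the same inputs. The sample names and the short pattern "Py" are assumptions chosen for illustration.

```python
import re

# Hypothetical sample names chosen to show the difference.
names = ["Python", "CPython", "Java"]

results = {}
for name in names:
    at_start = re.match(r"Py", name) is not None   # only checks position 0
    anywhere = re.search(r"Py", name) is not None  # checks every position
    results[name] = (at_start, anywhere)
    print(f"{name!r}: match={at_start}, search={anywhere}")
# 'Python': match=True, search=True
# 'CPython': match=False, search=True
# 'Java': match=False, search=False
```

"CPython" is the interesting case: the pattern is present, but not at the start, so only re.search() finds it.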
re.findall() – Get All Matches
Description: Find all occurrences of a pattern in a string.
import re
text = "Python is great. Python is readable. Python is popular."
matches = re.findall(r"Python", text)
print(f"Found {len(matches)} occurrences of 'Python':")
print(matches)
Explain code: re.findall() returns a list of all non-overlapping matches. It is useful when you want to count occurrences or collect all the matched pieces.
Output:
Found 3 occurrences of 'Python':
['Python', 'Python', 'Python']
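One behavior of re.findall() that tends to surprise people: when the pattern contains capturing groups, the list holds the groups instead of the whole match. The sketch below uses a made-up sample string to show both cases.

```python
import re

# Hypothetical sample text for illustration.
text = "Python 3.12, Java 21, Go 1.22"

# No capturing groups: each item is the whole match.
whole = re.findall(r"[A-Za-z]+ \d+", text)
print(whole)
# ['Python 3', 'Java 21', 'Go 1']

# Two capturing groups: each item is a (name, version) tuple.
pairs = re.findall(r"([A-Za-z]+) (\d+(?:\.\d+)?)", text)
print(pairs)
# [('Python', '3.12'), ('Java', '21'), ('Go', '1.22')]
```

The inner (?:...) is a non-capturing group, so it groups the optional decimal part without adding a third item to each tuple.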
re.split() – Split on a Pattern
Description: Split a string using a regex pattern as the delimiter.
import re
text = "Python,Java:C++;Ruby;Go"
# Split on commas, colons, or semicolons
parts = re.split(r"[,;:]", text)
print("Split result:", parts)
Explain code: re.split() uses the pattern as a delimiter and returns a list of the pieces in between. Here the character class [,;:] matches any one of those three punctuation marks. The + character is deliberately left out of the class so that C++ stays together as a single element.
Output:
Split result: ['Python', 'Java', 'C++', 'Ruby', 'Go']
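Two variations on re.split() are worth a quick sketch: the maxsplit argument limits how many splits happen, and wrapping the delimiter in a capturing group keeps the delimiters in the result. The input string is the same one used above.

```python
import re

text = "Python,Java:C++;Ruby;Go"

# maxsplit=2 stops after two splits; the remainder stays in one piece.
limited = re.split(r"[,;:]", text, maxsplit=2)
print(limited)
# ['Python', 'Java', 'C++;Ruby;Go']

# A capturing group keeps each delimiter as its own list item.
with_delims = re.split(r"([,;:])", text)
print(with_delims)
# ['Python', ',', 'Java', ':', 'C++', ';', 'Ruby', ';', 'Go']
```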
Building a Regex with Conditional Matching
Now let me show you the real problem this article is about. I want to write a regex that validates a string under a specific condition. Here is the rule:
The string must contain only letters, numbers, and dashes. But dashes cannot appear at the start or the end – they are only allowed in the middle.
So abc-12 is valid, but -abc and abc- are not valid.
Understanding the Pattern
The regex I need is:
^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$
Explain code: Breaking this down piece by piece – ^ anchors to the start of the string. [a-zA-Z0-9]+ matches one or more alphanumeric characters. Then (-[a-zA-Z0-9]+)* matches zero or more groups of a dash followed by more alphanumerics. Finally $ anchors to the end. This means the string must start and end with alphanumerics, and dashes can only appear between them.
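The same breakdown can be written directly into the pattern with the re.VERBOSE flag, which ignores whitespace and allows # comments inside the regex. This is a sketch of the identical pattern, just laid out differently; the variable name is my own.

```python
import re

# Same pattern as above, spread over several lines with comments.
commented_pattern = re.compile(
    r"""
    ^                   # start of the string
    [a-zA-Z0-9]+        # one or more alphanumerics
    (-[a-zA-Z0-9]+)*    # zero or more groups: a dash, then alphanumerics
    $                   # end of the string
    """,
    re.VERBOSE,
)

print(bool(commented_pattern.match("abc-12")))  # True
print(bool(commented_pattern.match("-abc")))    # False
```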
Testing the Pattern in Python
Description: Test valid and invalid strings against the conditional regex pattern.
import re
pattern = r"^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$"
test_strings = [
    "abc-12",      # Valid - dash in the middle
    "abc-12-def",  # Valid - multiple dashes, all in the middle
    "12345",       # Valid - no dashes at all
    "-abc",        # Invalid - dash at start
    "abc-",        # Invalid - dash at end
    "abc--def",    # Invalid - each dash must be followed by an alphanumeric
]
for s in test_strings:
    result = re.match(pattern, s)
    if result:
        print(f"'{s}' -> VALID")
    else:
        print(f"'{s}' -> INVALID")
Explain code: We loop through a set of test strings and apply the pattern to each one. re.match() returns a match object for valid strings and None for invalid ones. Notice that “abc--def” is rejected: every dash in the pattern must be followed by at least one alphanumeric character, so consecutive dashes fail even though they sit in the middle. The pattern is therefore slightly stricter than the rule as stated above.
Output:
'abc-12' -> VALID
'abc-12-def' -> VALID
'12345' -> VALID
'-abc' -> INVALID
'abc-' -> INVALID
'abc--def' -> INVALID
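One edge case is worth knowing when validating with re.match() and $: in Python, $ also matches just before a newline at the very end of the string. The sketch below shows the effect, along with two ways around it, re.fullmatch() and the \Z anchor.

```python
import re

pattern = r"^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$"

# $ matches before a final newline, so this input slips through.
loose = bool(re.match(pattern, "abc-12\n"))
print(loose)   # True

# fullmatch() requires the match to cover the whole string.
strict = bool(re.fullmatch(pattern, "abc-12\n"))
print(strict)  # False

# \Z matches only at the true end of the string.
strict_z = bool(re.match(r"^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*\Z", "abc-12\n"))
print(strict_z)  # False
```

If the input comes from a file or a form field where a trailing newline is possible, the stricter variants are the safer choice.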
Using re.sub() to Clean Up Strings
Sometimes you do not want to reject invalid strings – you want to fix them. I use re.sub() for this. It replaces the parts of a string that match a pattern with something else.
Description: Remove leading and trailing dashes from a string using re.sub.
import re
def clean_dashes(text):
    # Remove dashes from start or end of string
    cleaned = re.sub(r"^-+|-+$", "", text)
    return cleaned

test = ["-hello-", "--world--", "no-dashes", "---"]
for t in test:
    print(f"'{t}' -> '{clean_dashes(t)}'")
Explain code: The pattern ^-+|-+$ uses an alternation (the | symbol) to match either leading dashes at the start or trailing dashes at the end. re.sub() replaces those matched dashes with an empty string, effectively stripping them.
Output:
'-hello-' -> 'hello'
'--world--' -> 'world'
'no-dashes' -> 'no-dashes'
'---' -> ''
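Cleaning and validating can be combined. The sketch below strips the outer dashes and then checks what is left against the conditional pattern from earlier. The helper name normalize and the choice to return None for unusable input are my own assumptions, not part of the re module.

```python
import re

VALID = re.compile(r"[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*")

def normalize(text):
    """Hypothetical helper: strip outer dashes, then validate the rest."""
    cleaned = re.sub(r"^-+|-+$", "", text)
    # fullmatch() makes the ^ and $ anchors unnecessary here.
    return cleaned if VALID.fullmatch(cleaned) else None

print(normalize("-hello-"))  # hello
print(normalize("--a-b--"))  # a-b
print(normalize("---"))      # None (nothing left after cleaning)
print(normalize("-a--b-"))   # None (inner double dash is still invalid)
```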
Compiling Patterns for Reuse
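If you are applying the same pattern many times, compile it first. The re module caches recently used patterns, so the speedup is usually small, but a compiled pattern skips the cache lookup on every call and gives the pattern a single reusable name.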
Description: Compile a regex pattern and reuse it for multiple match operations.
import re
# Compile once
identifier_pattern = re.compile(r"^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$")
# Use the compiled pattern repeatedly
test_ids = ["user-123", "-admin", "root", "v1-0-0-beta"]
for uid in test_ids:
    if identifier_pattern.fullmatch(uid):
        print(f"'{uid}' is a valid identifier")
    else:
        print(f"'{uid}' is not a valid identifier")
Explain code: re.compile() creates a compiled pattern object. The method .fullmatch() is stricter than .match() – it requires the entire string to match the pattern, not just the beginning. This is perfect for validating identifiers.
Output:
'user-123' is a valid identifier
'-admin' is not a valid identifier
'root' is a valid identifier
'v1-0-0-beta' is a valid identifier
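The difference between .match() and .fullmatch() is easiest to see with a pattern that has no anchors. The sketch below uses a hypothetical lowercase-letters pattern purely for illustration.

```python
import re

# Hypothetical unanchored pattern, used only to show the difference.
letters = re.compile(r"[a-z]+")

prefix = letters.match("abc123")
print(prefix.group())                    # abc (only the start has to match)
print(letters.fullmatch("abc123"))       # None (the digits are left over)
print(letters.fullmatch("abc").group())  # abc
```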
FAQ
Q: What is the difference between re.search() and re.match()?
re.search() scans through the entire string and returns the first match found at any position. re.match() only checks whether the string begins with the pattern. For validating that a whole string follows a pattern, re.fullmatch() is the best choice because it requires the entire string to match.
Q: What do the + and * quantifiers mean in regex?
+ means “one or more” of the preceding element. * means “zero or more”. So [a-zA-Z0-9]+ requires at least one alphanumeric character, while (-[a-zA-Z0-9]+)* allows zero or more occurrences of a dash followed by alphanumerics.
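A quick way to check the two quantifiers is to run them against an empty string, a single character, and a longer run. This is a small sketch using the letter a as the repeated element.

```python
import re

outcomes = {}
for s in ["", "a", "aaa"]:
    plus = bool(re.fullmatch(r"a+", s))  # one or more
    star = bool(re.fullmatch(r"a*", s))  # zero or more
    outcomes[s] = (plus, star)
    print(f"{s!r}: a+ -> {plus}, a* -> {star}")
# '': a+ -> False, a* -> True
# 'a': a+ -> True, a* -> True
# 'aaa': a+ -> True, a* -> True
```

The empty string is the only input where the two disagree.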
Q: How do I validate an email address with regex in Python?
Email validation is more complex than most people expect because the official specification allows many edge cases. A simple practical pattern is r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$". For production systems, it is better to use a well-tested library rather than a hand-rolled regex.
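Here is a small sketch of that simple pattern in use, with fullmatch() standing in for the ^ and $ anchors. The sample addresses are made up, and the last one shows why a hand-rolled pattern is only a rough filter.

```python
import re

# The simple pattern from the answer above, without the anchors.
EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# Hypothetical sample inputs.
samples = [
    "alice@example.com",   # ordinary address
    "bob@localhost",       # no dot in the domain
    "no-at-sign.example",  # missing @
    "carol@example..com",  # doubled dot, yet the pattern accepts it
]
checks = {s: bool(EMAIL.fullmatch(s)) for s in samples}
for s, ok in checks.items():
    print(f"{s!r}: {ok}")
# 'alice@example.com': True
# 'bob@localhost': False
# 'no-at-sign.example': False
# 'carol@example..com': True
```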
Q: Why should I use re.compile() instead of passing the pattern string each time?
Compiling a pattern creates a reusable pattern object. The module-level functions such as re.search() already cache recently used patterns, so the gain is usually modest, but if you are using the same pattern in a loop or applying it many times, compiling it once avoids a cache lookup on every call and keeps the pattern defined in one place.
Q: Can regex handle nested conditional patterns like “dashes only between numbers”?
A condition like “dashes in the middle” needs only anchors and grouping. Conditions that depend on the neighbouring characters, like “only between numbers” or “only between letters”, require either multiple passes or lookaround assertions: lookahead (?=...) and (?!...), and lookbehind (?<=...) and (?<!...). Python’s re module supports these.
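As a rough sketch of the “dashes only between numbers” case, the pattern below allows alphanumerics anywhere but accepts a dash only when a digit sits directly on both sides. The pattern and variable name are my own illustration, one of several ways to express this rule.

```python
import re

# Alphanumerics anywhere; a dash only with a digit before and after it.
digits_around_dash = re.compile(r"(?:[a-zA-Z0-9]|(?<=\d)-(?=\d))+")

samples = ["12-34", "a1-2b", "ab-12", "1--2", "-12", "12-"]
verdicts = {s: bool(digits_around_dash.fullmatch(s)) for s in samples}
for s, ok in verdicts.items():
    print(f"{s!r}: {ok}")
# '12-34': True
# 'a1-2b': True
# 'ab-12': False
# '1--2': False
# '-12': False
# '12-': False
```

The lookbehind (?<=\d) and lookahead (?=\d) check the neighbours without consuming them, which is why the digits can still be matched by the alphanumeric branch.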

