I have been working with text data for years, and I keep coming back to the same tool whenever I need to find or transform strings in Python. That tool is the re module. Once you understand it, you start seeing opportunities to use it everywhere: parsing logs, validating user input, cleaning up datasets.

Throughout this tutorial, I want to show you how to use regular expressions in Python to handle a specific problem: matching strings that satisfy certain conditions. I will walk through the key functions in the re module, and then we will build up to a real example that checks whether a string follows a pattern you define.

TL;DR

  • The re module in Python handles all regex operations – import it first
  • Use re.search() to find a pattern anywhere in a string, re.match() to check only the start
  • Use re.findall() to get all matches at once, re.split() to split on a pattern
  • Conditional regex like “dashes only in the middle” maps to ^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$
  • Always test your patterns with Python before putting them into production

What Is the re Module in Python?

The re module is part of Python’s standard library, so you do not need to install anything extra. It lets you define a pattern – a regular expression – and then use that pattern to search, match, split, or replace text in a string.

Think of regex as a very powerful version of the find-and-replace dialog in a text editor. Instead of searching for an exact word, you can search for a pattern. For example, you could match all email addresses in a document without knowing them in advance.
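Here is a small sketch that makes the find-and-replace comparison concrete. The sample text and the deliberately simplified email pattern are illustrative assumptions, not a robust validator. str.find() can only locate a string you already know, while re.findall() locates anything shaped like the pattern.

```python
import re

text = "Contact alice@example.com or bob@test.org for access."

# Exact search: str.find only locates one fixed substring you already know
position = text.find("alice@example.com")
print(position)

# Pattern search: a simplified (assumed) email pattern finds addresses
# that were not known in advance
emails = re.findall(r"[\w.+-]+@[\w-]+\.[a-zA-Z]{2,}", text)
print(emails)
```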

Importing re

Description: Import the regex module and verify it works.


import re
print("re module imported successfully")

Explain code: We import the re module with a standard import statement. The print call confirms the import worked.

Output:


re module imported successfully

Core Functions: search, match, findall, split

Before we get into conditional patterns, let me show you the four functions I use most often from the re module.

re.search() – Find a Pattern Anywhere

Description: Search for a word inside a longer string.


import re

text = "Python is a popular programming language used in AI and data science"
pattern = "programming"

match = re.search(pattern, text)
if match:
    print(f"Found '{match.group()}' at position {match.start()} to {match.end()}")
else:
    print("No match found")

Explain code: re.search() scans the string from left to right and returns a match object for the first place the pattern appears. If the pattern is not found, it returns None, so we check for a match before calling .group(), .start(), and .end(). Note that .end() is the index just past the last matched character.

Output:


Found 'programming' at position 20 to 31
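The match object carries more than the matched text. This short sketch reuses the same sentence to show span(), which returns the start and end positions as a tuple, and what happens when nothing matches. The second search term is an arbitrary example.

```python
import re

text = "Python is a popular programming language used in AI and data science"

match = re.search(r"programming", text)
# span() returns (start, end); end is exclusive, so text[start:end] is the match
start, end = match.span()
print(match.span(), text[start:end])

# When nothing matches, re.search returns None instead of raising an error
missing = re.search(r"JavaScript", text)
print(missing is None)
```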

re.match() – Check Only the Beginning

Description: Check if a string starts with a specific pattern.


import re

langs = ["Python", "Java", "Pythonista", "Cython"]

for lang in langs:
    match = re.match(r"Python", lang)
    if match:
        print(f"'{lang}' starts with Python")
    else:
        print(f"'{lang}' does not start with Python")

Explain code: re.match() only looks at the beginning of the string. “Python” matches itself. “Pythonista” also starts with “Python” so it matches. “Java” and “Cython” do not start with “Python” so they return None.

Output:


'Python' starts with Python
'Java' does not start with Python
'Pythonista' starts with Python
'Cython' does not start with Python
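Note that re.match() anchors only the start of the string, not the end. That is why "Pythonista" passes. If you need the whole string to be exactly the pattern, re.fullmatch() is the stricter check. This sketch compares the two on the same list:

```python
import re

langs = ["Python", "Java", "Pythonista", "Cython"]

results = {}
for lang in langs:
    starts = re.match(r"Python", lang) is not None      # anchored at the start only
    exact = re.fullmatch(r"Python", lang) is not None   # must cover the whole string
    results[lang] = (starts, exact)
    print(f"{lang!r}: match={starts}, fullmatch={exact}")
```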

re.findall() – Get All Matches

Description: Find all occurrences of a pattern in a string.


import re

text = "Python is great. Python is readable. Python is popular."
matches = re.findall(r"Python", text)

print(f"Found {len(matches)} occurrences of 'Python':")
print(matches)

Explain code: re.findall() returns a list of all non-overlapping matches. It is useful when you want to count occurrences or collect all the matched pieces.

Output:


Found 3 occurrences of 'Python':
['Python', 'Python', 'Python']
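The return value of re.findall() changes shape when the pattern contains capturing groups. With one group you get only that group's text, and with several groups you get tuples. If you also need positions, re.finditer() yields full match objects. The sample string below is made up for illustration.

```python
import re

text = "Python 3.12, Java 21, Go 1.22"

# With one capturing group, findall returns only the group's text
versions = re.findall(r"[A-Za-z]+ ([\d.]+)", text)
print(versions)

# With several groups, findall returns a list of tuples
pairs = re.findall(r"([A-Za-z]+) ([\d.]+)", text)
print(pairs)

# finditer yields match objects, which also carry positions
positions = [(m.group(), m.start()) for m in re.finditer(r"Python|Java|Go", text)]
print(positions)
```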

re.split() – Split on a Pattern

Description: Split a string using a regex pattern as the delimiter.


import re

text = "Python,Java:C++;Ruby;Go"
# Split on commas, colons, or semicolons
parts = re.split(r"[,;:]", text)

print("Split result:", parts)

Explain code: re.split() uses the pattern as a delimiter and returns a list of the pieces in between. Here the character class [,;:] matches any one of those three punctuation marks. The + sign is deliberately left out of the class. Including it would split C++ apart and leave empty strings between the two plus signs.

Output:


Split result: ['Python', 'Java', 'C++', 'Ruby', 'Go']
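re.split() has two options that are easy to miss. Wrapping the delimiter in a capturing group keeps the delimiters in the result, and maxsplit limits how many splits are made. Here is a sketch using the same input:

```python
import re

text = "Python,Java:C++;Ruby;Go"

# A capturing group around the delimiter keeps the delimiters in the result
with_delims = re.split(r"([,;:])", text)
print(with_delims)

# maxsplit limits the number of splits; the remainder stays in the last piece
first_two = re.split(r"[,;:]", text, maxsplit=2)
print(first_two)
```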

Building a Regex with Conditional Matching

Now let me show you the real problem this article is about. I want to write a regex that validates a string under a specific condition. Here is the rule:

The string must contain only letters, numbers, and dashes. But dashes cannot appear at the start or the end – they are only allowed in the middle.

So abc-12 is valid, but -abc and abc- are not valid.

Understanding the Pattern

The regex I need is:


^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$

Explain code: Breaking this down piece by piece – ^ anchors to the start of the string. [a-zA-Z0-9]+ matches one or more alphanumeric characters. Then (-[a-zA-Z0-9]+)* matches zero or more groups of a dash followed by more alphanumerics. Finally $ anchors to the end. This means the string must start and end with alphanumerics, and dashes can only appear between them.
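If you want that breakdown to live next to the pattern itself, the re.VERBOSE flag lets you spread a pattern over several lines with comments. In verbose mode, whitespace outside character classes is ignored and # starts a comment. This sketch writes the same pattern that way:

```python
import re

# Same pattern as above, written with re.VERBOSE so each part carries a comment
identifier = re.compile(
    r"""
    ^                   # start of string
    [a-zA-Z0-9]+        # first run of letters/digits
    (-[a-zA-Z0-9]+)*    # zero or more: a dash followed by letters/digits
    $                   # end of string
    """,
    re.VERBOSE,
)

for s in ["abc-12", "-abc", "abc-"]:
    print(s, bool(identifier.match(s)))
```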

Testing the Pattern in Python

Description: Test valid and invalid strings against the conditional regex pattern.


import re

pattern = r"^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$"

test_strings = [
    "abc-12",        # Valid - dash in the middle
    "abc-12-def",    # Valid - multiple dashes, all in the middle
    "12345",         # Valid - no dashes at all
    "-abc",          # Invalid - dash at start
    "abc-",          # Invalid - dash at end
    "abc--def",      # Invalid - each dash must be followed by a letter or digit
]

for s in test_strings:
    result = re.match(pattern, s)
    if result:
        print(f"'{s}' -> VALID")
    else:
        print(f"'{s}' -> INVALID")

Explain code: We loop through a set of test strings and apply the pattern to each one. re.match() returns a match object for valid strings and None for invalid ones. Notice that “abc--def” is rejected. In the pattern, every dash must be followed by at least one letter or digit, so two dashes in a row can never match.

Output:


'abc-12' -> VALID
'abc-12-def' -> VALID
'12345' -> VALID
'-abc' -> INVALID
'abc-' -> INVALID
'abc--def' -> INVALID
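There is one caveat when you combine re.match() with ^...$. In Python, $ also matches just before a newline at the very end of the string, so a line read from a file without stripping can slip through. This sketch shows the gotcha and the re.fullmatch() alternative. The input with a trailing newline is an assumed example.

```python
import re

anchored = r"^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$"
bare = r"[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*"

s = "abc-12\n"  # e.g. a line read from a file without stripping

# $ also matches just before a trailing newline, so this input passes
print(bool(re.match(anchored, s)))    # True

# fullmatch requires the pattern to consume the entire string, newline included
print(bool(re.fullmatch(bare, s)))    # False
```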

Using re.sub() to Clean Up Strings

Sometimes you do not want to reject invalid strings – you want to fix them. I use re.sub() for this. It replaces the parts of a string that match a pattern with something else.

Description: Remove leading and trailing dashes from a string using re.sub.


import re

def clean_dashes(text):
    # Remove dashes from start or end of string
    cleaned = re.sub(r"^-+|-+$", "", text)
    return cleaned

test = ["-hello-", "--world--", "no-dashes", "---"]
for t in test:
    print(f"'{t}' -> '{clean_dashes(t)}'")

Explain code: The pattern ^-+|-+$ uses an alternation (the | symbol) to match either leading dashes at the start or trailing dashes at the end. re.sub() replaces those matched dashes with an empty string, effectively stripping them.

Output:


'-hello-' -> 'hello'
'--world--' -> 'world'
'no-dashes' -> 'no-dashes'
'---' -> ''
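Stripping the ends only fixes one kind of problem. Input like "a--b" still fails the validation pattern. The helper below is a hypothetical sketch, not a standard function. It goes one step further by turning every run of disallowed characters into a single dash and then stripping the ends. The sample inputs are illustrative.

```python
import re

VALID = re.compile(r"[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*")

def normalize_identifier(text):
    """Hypothetical helper: coerce text into the letters/digits/inner-dashes format."""
    # Replace every run of characters outside [a-zA-Z0-9] with a single dash
    dashed = re.sub(r"[^a-zA-Z0-9]+", "-", text)
    # Strip leading and trailing dashes, as clean_dashes does
    return re.sub(r"^-+|-+$", "", dashed)

for raw in ["--Hello, World!--", "a--b", "v1.0.0 beta", "---"]:
    result = normalize_identifier(raw)
    print(f"{raw!r} -> {result!r} valid={VALID.fullmatch(result) is not None}")
```

Input made only of punctuation normalizes to an empty string, which still fails validation. A caller would need to decide how to handle that case.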

Compiling Patterns for Reuse

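If you are applying the same pattern many times, compile it first. The re module already caches recently used patterns, so the speedup over passing the pattern string is usually small. Even so, a compiled object skips that cache lookup on every call and gives the pattern a readable name.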

Description: Compile a regex pattern and reuse it for multiple match operations.


import re

# Compile once
identifier_pattern = re.compile(r"^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$")

# Use the compiled pattern repeatedly
test_ids = ["user-123", "-admin", "root", "v1-0-0-beta"]

for uid in test_ids:
    if identifier_pattern.fullmatch(uid):
        print(f"'{uid}' is a valid identifier")
    else:
        print(f"'{uid}' is not a valid identifier")

Explain code: re.compile() creates a compiled pattern object. The method .fullmatch() is stricter than .match() – it requires the entire string to match the pattern, not just the beginning. This is perfect for validating identifiers.

Output:


'user-123' is a valid identifier
'-admin' is not a valid identifier
'root' is a valid identifier
'v1-0-0-beta' is a valid identifier
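Because .fullmatch() already requires the whole string to match, the ^ and $ anchors are redundant there and can be dropped. This sketch uses a compiled pattern without anchors to split a candidate list into valid and invalid entries. The extra "a--b" candidate is an added example.

```python
import re

identifier_pattern = re.compile(r"[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*")

candidates = ["user-123", "-admin", "root", "v1-0-0-beta", "a--b"]

# The compiled object exposes the same methods as the module-level functions
valid = [c for c in candidates if identifier_pattern.fullmatch(c)]
invalid = [c for c in candidates if not identifier_pattern.fullmatch(c)]

print(identifier_pattern.pattern)  # the original pattern string
print("valid:", valid)
print("invalid:", invalid)
```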

FAQ

Q: What is the difference between re.search() and re.match()?

re.search() scans through the entire string and returns the first match found at any position. re.match() only checks whether the string begins with the pattern. For validating that a whole string follows a pattern, re.fullmatch() is the best choice because it requires the entire string to match.
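A quick way to see the three behaviors side by side is to run them on a single string. In this sketch, "Cython" and the sub-patterns are arbitrary examples:

```python
import re

s = "Cython"

print(re.search(r"ython", s))     # finds "ython" starting at index 1
print(re.match(r"ython", s))      # None: the string does not start with "ython"
print(re.fullmatch(r"C\w+", s))   # matches because the pattern covers the whole string
```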

Q: What do the + and * quantifiers mean in regex?

+ means “one or more” of the preceding element. * means “zero or more”. So [a-zA-Z0-9]+ requires at least one alphanumeric character, while (-[a-zA-Z0-9]+)* allows zero or more occurrences of a dash followed by alphanumerics.
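The difference shows up most clearly on an empty string. This sketch uses the single character "a" as a stand-in for any element:

```python
import re

results = {}
for s in ["", "a", "aaa"]:
    plus = re.fullmatch(r"a+", s) is not None   # one or more
    star = re.fullmatch(r"a*", s) is not None   # zero or more
    results[s] = (plus, star)
    print(f"{s!r}: a+ -> {plus}, a* -> {star}")
```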

Q: How do I validate an email address with regex in Python?

Email validation is more complex than most people expect because the official specification allows many edge cases. A simple practical pattern is r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$". For production systems, it is better to use a well-tested library rather than a hand-rolled regex.
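Running that simple pattern against a few sample addresses shows both what it accepts and where it is looser or stricter than the specification. The addresses below are made-up examples.

```python
import re

EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

samples = [
    "user@example.com",                 # ordinary address
    "first.last+tag@mail.example.org",  # dots and plus sign in the local part
    "user@localhost",                   # rejected: no dot-separated top-level domain
    "a..b@example.com",                 # accepted, though consecutive dots are not
                                        # allowed in an unquoted local part
]

results = {addr: EMAIL.match(addr) is not None for addr in samples}
for addr, ok in results.items():
    print(addr, ok)
```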

Q: Why should I use re.compile() instead of passing the pattern string each time?

Compiling a pattern creates a reusable pattern object. The re module also caches the most recently used patterns, so passing the same pattern string repeatedly does not re-parse it on every call. In practice, re.compile() mostly saves the cache lookup in tight loops. It also makes code clearer when a named pattern is used in several places.

Q: Can regex handle nested conditional patterns like “dashes only between numbers”?

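A condition like "dashes only in the middle" is straightforward with a plain pattern. Stricter conditions like "only between numbers" or "only between letters" require either multiple passes or lookaround assertions: lookahead (?=...), negative lookahead (?!...), and their lookbehind counterparts (?<=...) and (?<!...). Python's re module supports all of these. The one restriction is that a lookbehind must match a fixed-width string.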
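As a sketch of the "dashes only between numbers" case, assume a hypothetical rule: letters, digits, and dashes, where every dash must have a digit on both sides. A lookbehind checks the character before each dash and a lookahead checks the character after it. Neither one consumes any input. The (?:...) is a non-capturing group.

```python
import re

# Hypothetical rule: every dash must sit between two digits.
# (?<=[0-9]) looks back one character; (?=[0-9]) looks ahead one character.
digit_dash = re.compile(r"[a-zA-Z0-9]+(?:(?<=[0-9])-(?=[0-9])[a-zA-Z0-9]+)*")

samples = ["12-34", "v1-0-0", "ab-12", "1-a", "abc"]
results = {s: digit_dash.fullmatch(s) is not None for s in samples}
for s, ok in results.items():
    print(s, ok)
```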
