The bug that took 6 hours to trace
I had a regex for validating phone numbers that I thought was correct. It had been in production for two months. Then a user with a phone number starting with +212 (Morocco) couldn't submit a form. The regex was /^\+?[1-9]\d{9,14}$/. It rejected +212XXXXXXXXX, a valid Moroccan number in international format, and it had been silently rejecting every Moroccan phone number since deployment.
The fix was one character. The real problem was that I had written the regex in isolation, without testing edge cases: I had tested it on US and UK numbers, and never on a Moroccan number with its country code.
Why regex is uniquely risky
Most code errors announce themselves. A null reference throws an exception. A misspelled function name fails at compile time or throws the first time the line runs. A typo in a variable name breaks the build.
Regex errors are silent. A pattern that is slightly wrong may match 99% of inputs correctly and fail only on edge cases that your test data doesn't include — international phone formats, special characters in email addresses, Unicode characters in usernames, file paths with spaces, URLs with query strings. These cases show up in production from real users, not in unit tests written by the developer who designed the pattern.
The five-step workflow
Step 1: Write the happy path cases first
Before writing the regex, write out 5–10 examples of strings that SHOULD match. For a phone number validator:
- +1 555 123 4567
- +44 20 7946 0958
- +212 600 123456
- 555-123-4567
- (555) 123-4567
This forces you to think about the scope of the problem before you start writing symbols. Many regex bugs come from starting with the syntax and forgetting a format variant.
Step 2: Write the rejection cases
Write 5–10 examples that should NOT match and explain why each one fails:
- not-a-phone: obviously invalid
- 12345: too short
- +0 555 123 4567: leading zero in country code
- 555 123 4567 ext 42: extension not in scope
- ++1 555 123 4567: double plus
Step 3: Write the edge cases that could go either way
The edge cases are where most bugs live. Document your decision for each one:
- +1(555)1234567: parentheses but no spaces or dashes. Should this match?
- +1 555 123 45 67: unusual grouping. Yes or no?
- 555.123.4567: period separators. In scope?
Deciding edge cases before writing the regex prevents you from accidentally writing a pattern that handles them one way when you intended the other.
Step 4: Test in the regex tester with all three categories
Open the regex tester and paste all three categories of test strings. Check:
- Every happy path string matches (green)
- Every rejection string does not match (no highlight)
- Edge cases behave according to your documented decision
If any happy path string fails, your regex is too strict. If any rejection string matches, your regex is too permissive.
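The same three lists can live in an automated test, so Step 4 runs on every change rather than once in the tester. The sketch below is one possible shape, not the validator from the story: isPhoneNumber and findFailures are hypothetical names, the validator assumes a simple rule (only digits, spaces, dashes, dots, and parentheses after an optional leading +, then 10 to 15 digits with no leading zero), and the edge-case decisions are examples of documented choices rather than recommendations.

```javascript
// Hypothetical validator: a character whitelist first, then a digit-count check.
// Assumption: 10-15 digits (15 is the E.164 maximum) and no leading zero.
const ALLOWED_CHARS = /^\+?[0-9 ().-]+$/;

function isPhoneNumber(input) {
  if (!ALLOWED_CHARS.test(input)) return false;
  const digits = input.replace(/\D/g, ""); // keep only 0-9
  return digits.length >= 10 && digits.length <= 15 && digits[0] !== "0";
}

// Steps 1-3 as data: each case records the expected result and, where useful, why.
const cases = [
  // Step 1: happy path
  { input: "+1 555 123 4567", expected: true },
  { input: "+44 20 7946 0958", expected: true },
  { input: "+212 600 123456", expected: true },
  { input: "555-123-4567", expected: true },
  { input: "(555) 123-4567", expected: true },
  // Step 2: rejections
  { input: "not-a-phone", expected: false, why: "obviously invalid" },
  { input: "12345", expected: false, why: "too short" },
  { input: "+0 555 123 4567", expected: false, why: "leading zero in country code" },
  { input: "555 123 4567 ext 42", expected: false, why: "extension not in scope" },
  { input: "++1 555 123 4567", expected: false, why: "double plus" },
  // Step 3: edge cases, each with an illustrative documented decision
  { input: "+1(555)1234567", expected: true, why: "decided: accept without spaces" },
  { input: "+1 555 123 45 67", expected: true, why: "decided: accept unusual grouping" },
  { input: "555.123.4567", expected: true, why: "decided: accept period separators" },
];

// Step 4: run every case and keep the ones that disagree with the decision.
function findFailures(validate, testCases) {
  return testCases.filter(({ input, expected }) => validate(input) !== expected);
}

const failures = findFailures(isPhoneNumber, cases);
console.log(failures.length === 0 ? "all cases pass" : failures);
```

The table is the valuable part; the validator can change freely as long as the table keeps passing. This particular rule is loose about punctuation (it accepts 1-2-3-4-5-6-7-8-9-0, for instance), which is exactly the kind of case worth adding to the table once you decide on it.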
Step 5: Test for catastrophic backtracking
Backtracking regex engines, including the one JavaScript uses, can become exponentially slow on certain patterns when given malicious or unexpected input. This is called ReDoS (Regular Expression Denial of Service). Patterns with nested quantifiers are particularly vulnerable: (a+)+, (a*)*, ([a-zA-Z]+)*.
Test your regex with a long string that almost matches but fails at the very end. For a pattern with a nested quantifier such as ^([a-zA-Z]+)*$, try 20 to 30 letters followed by a character the pattern rejects, such as a space. A single quantified character class, like the [a-zA-Z0-9._%+-]+ that matches the local part of the email pattern below, is not vulnerable on its own, so a near-miss string will not slow it down. If the browser pauses or the test takes more than 100ms, you have a backtracking problem.
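The Step 5 check can be scripted as well. This rough sketch (timeNearMiss is a hypothetical helper) times a pattern against near-miss inputs of increasing length using performance.now(), which is a global in browsers and recent Node.js versions. For the nested-quantifier pattern the time should grow exponentially, roughly doubling with each extra character; for the slug pattern used later in this article it should stay small even at 100,000 characters. Timings are noisy, so read the trend rather than any single number, and keep lengths small when probing a pattern you suspect.

```javascript
// Hypothetical helper: time a regex against inputs that almost match but
// fail on the final character. Timings are noisy; compare the trend.
function timeNearMiss(pattern, unit, badTail, lengths) {
  return lengths.map((n) => {
    const input = unit.repeat(n) + badTail;
    const start = performance.now();
    pattern.test(input);
    const ms = performance.now() - start;
    return { chars: input.length, ms: Number(ms.toFixed(2)) };
  });
}

// Nested quantifier: before giving up, the engine tries every way of
// splitting the letters into groups, so keep n small.
const nested = /^([a-z]+)*$/;
console.table(timeNearMiss(nested, "a", "!", [14, 16, 18, 20]));

// No nested quantifier: the slug pattern fails fast even on long input.
const slugPattern = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;
console.table(timeNearMiss(slugPattern, "a", "!", [1000, 10000, 100000]));
```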
The regex patterns I use most often (and their gotchas)
Email validation
/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/
Gotcha: This rejects technically valid email addresses with quotes or IP addresses in the domain ("user name"@example.com, user@[127.0.0.1]). For most web forms, these are acceptable false negatives. For systems that must accept any RFC 5321-compliant address, use a dedicated email parsing library instead.
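Running the pattern over a few sample addresses (chosen for illustration) shows both sides of the trade-off: the two valid-but-unusual forms above are rejected, while a..b@example.com, whose consecutive dots RFC 5321 does not allow in an unquoted local part, is accepted.

```javascript
const EMAIL = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Sample addresses, chosen for illustration.
const samples = [
  "user@example.com", // accept
  "first.last+tag@mail.example.co.uk", // accept
  "a..b@example.com", // accept, although consecutive dots are not valid here
  '"user name"@example.com', // reject: quoted local part
  "user@[127.0.0.1]", // reject: IP address literal as the domain
];

for (const address of samples) {
  console.log(EMAIL.test(address) ? "accept" : "reject", address);
}
```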
URL validation (permissive)
/^https?:\/\/[^\s/$.?#].[^\s]*$/
Gotcha: This accepts malformed URLs that browsers would reject. For form validation where you want to catch obvious mistakes, it's fine. For systems that will fetch the URL, use new URL(input) in JavaScript: it throws a TypeError on input it cannot parse, and it uses the WHATWG URL parser that browsers implement, so you are not maintaining the parsing rules yourself.
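A minimal sketch of the new URL(input) approach, assuming the caller only wants to fetch http and https URLs. The function name isFetchableUrl is hypothetical. One detail worth handling: new URL() also parses other schemes, such as javascript: and file:, so a check for fetchable URLs should look at the protocol as well as whether parsing succeeded.

```javascript
// Hypothetical helper: parse with the WHATWG URL parser (the global URL class
// in browsers and Node.js), then allow only http and https.
function isFetchableUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return false; // new URL() throws a TypeError on input it cannot parse
  }
  // URL also parses javascript:, file:, data: and other schemes.
  return url.protocol === "http:" || url.protocol === "https:";
}

console.log(isFetchableUrl("https://example.com/path?q=1")); // true
console.log(isFetchableUrl("javascript:alert(1)")); // false: not http(s)
console.log(isFetchableUrl("not a url")); // false: new URL() throws
```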
Slug validation (URL-safe strings)
/^[a-z0-9]+(?:-[a-z0-9]+)*$/
Gotcha: This rejects leading hyphens, trailing hyphens, and consecutive hyphens, which is intentional for URL slugs. If you need to allow consecutive hyphens (e.g., for CSS class names), use /^[a-z0-9-]+$/ instead, but be aware that it also accepts leading and trailing hyphens and strings made only of hyphens.
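A side-by-side run of the strict slug pattern and the looser alternative on a few sample strings (the inputs are illustrative; STRICT_SLUG and LOOSE_SLUG are just local names):

```javascript
// Compare the strict slug pattern with the looser character-class version.
const STRICT_SLUG = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;
const LOOSE_SLUG = /^[a-z0-9-]+$/;

const inputs = [
  "my-first-post", // strict: true,  loose: true
  "my--post", // strict: false, loose: true (consecutive hyphens)
  "-leading", // strict: false, loose: true
  "trailing-", // strict: false, loose: true
  "---", // strict: false, loose: true
  "My-Post", // strict: false, loose: false (uppercase)
];

for (const s of inputs) {
  console.log(s.padEnd(14), "strict:", STRICT_SLUG.test(s), "loose:", LOOSE_SLUG.test(s));
}
```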
IPv4 address
/^(25[0-5]|2[0-4]\d|1?\d{1,2})(\.(25[0-5]|2[0-4]\d|1?\d{1,2})){3}$/
Gotcha: Most simple IPv4 patterns accept invalid octets like 999. This pattern validates the numeric range (0–255) per octet. It does not validate that the IP is routable or that it's not a reserved range.
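A quick check of the range handling. It also surfaces one gotcha: the 1?\d{1,2} branch lets a single leading zero through (01, 09), and some parsers, including the classic inet_aton, read a leading zero as octal. The second pattern, IPV4_NO_LEADING_ZERO, is an alternative octet rule added here for comparison, not the pattern above: it splits the 0 to 199 range into 1\d\d and [1-9]?\d so that no octet may start with 0 unless it is 0 itself.

```javascript
// The pattern from the article.
const IPV4 = /^(25[0-5]|2[0-4]\d|1?\d{1,2})(\.(25[0-5]|2[0-4]\d|1?\d{1,2})){3}$/;

// Alternative octet rule (an assumption, not the article's pattern):
// 250-255, 200-249, 100-199, then 0-99 with no leading zero.
const OCTET = "(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";
const IPV4_NO_LEADING_ZERO = new RegExp(`^${OCTET}(?:\\.${OCTET}){3}$`);

const addresses = [
  "192.168.0.1", // both accept
  "255.255.255.255", // both accept
  "256.1.1.1", // both reject: octet out of range
  "1.2.3", // both reject: only three octets
  "01.02.03.04", // IPV4 accepts (one leading zero slips through); the alternative rejects
  "001.2.3.4", // both reject
];

for (const ip of addresses) {
  console.log(ip.padEnd(16), IPV4.test(ip), IPV4_NO_LEADING_ZERO.test(ip));
}
```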
Regex flags to know
JavaScript regex supports flags that fundamentally change behavior:
- g (global): find all matches, not just the first.
- i (case-insensitive): /abc/i matches ABC, abc, Abc.
- m (multiline): ^ and $ match the start and end of each line, not just of the full string.
- s (dotAll): . matches newlines (by default it doesn't).
- u (Unicode): enables proper handling of Unicode code points above U+FFFF.
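Each flag can be checked in a few lines in Node.js or a browser console. The last example adds a pitfall the list does not mention: with the g flag, test() is stateful through the regex's lastIndex property, so a shared global regex used for validation can return different answers for the same input.

```javascript
// g (global): return every match instead of only the first.
console.log("a1b2c3".match(/\d/g)); // [ '1', '2', '3' ]

// i (case-insensitive)
console.log(/abc/i.test("ABC")); // true

// s (dotAll): "." also matches a newline.
console.log(/a.b/.test("a\nb"), /a.b/s.test("a\nb")); // false true

// u (Unicode): "." matches a whole code point above U+FFFF (here an emoji,
// which is two UTF-16 code units) instead of half of a surrogate pair.
console.log(/^.$/.test("\u{1F600}"), /^.$/u.test("\u{1F600}")); // false true

// Pitfall: with g, test() starts searching at lastIndex and updates it,
// so reusing one global regex gives alternating answers for the same input.
const digitsOnly = /^\d+$/g;
console.log(digitsOnly.test("123"), digitsOnly.test("123")); // true false
```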
The m flag is a common source of bugs: a pattern with ^ and $ designed to validate a single-line string will accept multiline input as long as any one line matches, because with m set, ^ and $ match at line boundaries, not just at the start and end of the full string.
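A minimal reproduction of the m-flag bug, using a hypothetical username rule (lowercase letters, digits, underscore):

```javascript
// Hypothetical rule: a username is lowercase letters, digits, and underscores.
const singleLine = /^[a-z0-9_]+$/;
const multiLine = /^[a-z0-9_]+$/m; // same pattern, m flag added by mistake

const input = "alice\n<script>alert(1)</script>";

console.log(singleLine.test(input)); // false: $ matches only at the end of input
console.log(multiLine.test(input)); // true: "alice" alone satisfies ^...$ on line 1
```

In JavaScript, $ without the m flag matches only at the very end of the input, so removing m is enough here. Some other engines (Python's re, for example) also let $ match just before a trailing newline, so the same pattern can behave differently outside JavaScript.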
Related tools
- Regex Tester — test regex patterns against multiple strings with live highlighting and match details.
Written by Achraf A., founder of TheFreeAITools — built in Morocco. The phone number bug described above happened in November 2024; the fix took 30 seconds once traced.