REGEXP_INSTR: Powerful Patterns for SQL Queries


Introduction

We continue the series of articles on database features to often ignored with:

The Oracle REGEXP_INSTR function which is one of the most useful regular expression functions in SQL and PL/SQL. It helps you find the position of a pattern inside a string using regular expressions.

What is

Unlike the classic INSTR function, REGEXP_INSTR understands powerful pattern matching rules such as:
• Digits
• Email patterns
• Phone numbers
• Dates
• Multiple spaces
• Special characters
• Repeated words
• Word boundaries
• Case-insensitive matching

Typical use cases

It is extremely useful in:
• Data validation
• ETL pipelines
• Data cleansing
• Log analysis
• Audit systems
• Customer data processing
• Parsing semi-structured text

Syntax of REGEXP_INSTR

REGEXP_INSTR(
source_string,
pattern,
position,
occurrence,
return_option,
match_parameter,
subexpr
)

Parameters Explained

ParameterDescription
source_stringThe text to search
patternRegular expression pattern
positionStarting position (default = 1)
occurrenceWhich occurrence to find
return_option0 = start position, 1 = end position
match_parameterMatching behavior (i, c, n, m)
subexprReturns position of capturing group

Difference Between INSTR and REGEXP_INSTR

Using INSTR

Works only for exact text:

SELECT INSTR('abc123xyz', '123')
FROM dual;

Using REGEXP_INSTR

SELECT REGEXP_INSTR('abc123xyz', '[0-9]+')
FROM dual;

This works for any numeric sequence!

Basic Example

Find Position of First Digit

SELECT REGEXP_INSTR('OrderID: A12345', '[0-9]')
FROM dual;
Output:
11
Explanation:
The first digit 1 starts at position 11.

Real-Life Example — Validate Email Structure

Suppose a customer table contains email addresses.

Sample Data

CREATE TABLE customers (
customer_id NUMBER,
email VARCHAR2(100)
);
INSERT INTO customers VALUES (1, 'john.doe@gmail.com');
INSERT INTO customers VALUES (2, 'invalid-email');
INSERT INTO customers VALUES (3, 'sales@company.org');

Find Position of @ Symbol

SELECT
email,
REGEXP_INSTR(email, '@')
FROM customers;

Result:

EMAILPOSITION
john.doe@gmail.com9
invalid-email0
sales@company.org6

Real-Life Example: Detect Phone Numbers

Suppose logs contain customer notes.

CREATE TABLE support_notes (
note_text VARCHAR2(200)
);
Table SUPPORT_NOTES created.
INSERT INTO support_notes VALUES
('Customer called from 9876543210 regarding invoice');
INSERT INTO support_notes VALUES
('No contact number available');

Find 10-Digit Phone Number Position

SELECT
note_text,
REGEXP_INSTR(note_text, '[0-9]{10}')
FROM support_notes;

Real-Life Example: Extract Multiple Occurrences

Suppose a text contains several numbers.

SELECT REGEXP_INSTR(
'Invoice 100 Amount 250 Tax 30',
'[0-9]+',
1,
2
)
FROM dual;

Output:

20

Explanation:

Because Oracle returns the starting position of the second numeric match.

Understanding Occurrence Parameter

Syntax

REGEXP_INSTR(text, pattern, start, occurrence)

Example

SELECT
REGEXP_INSTR('cat dog fox dog lion', 'dog', 1, 2)
FROM dual;

Output:

13

Because the second dog begins at position 13.

Using return_option

This parameter decides whether Oracle returns:

ValueMeaning
0Start position
1Position after match ends

Example

SELECT
REGEXP_INSTR(
'ABC123XYZ',
'[0-9]+',
1,
1,
0
) AS start_pos,
REGEXP_INSTR(
'ABC123XYZ',
'[0-9]+',
1,
1,
1
) AS end_pos
FROM dual;

Result:

START_POSEND_POS
47

Case-Insensitive Search

Without Case-Insensitive Matching

SELECT REGEXP_INSTR('Oracle Database', 'database')
FROM dual;

Output:

0

With i

Match Parameter

SELECT REGEXP_INSTR(
'Oracle Database',
'database',
1,
1,
0,
'i'
)
FROM dual;

Output:

8

Real-Life Example: Detect Invalid Characters

Suppose usernames should contain only:
• Letters
• Numbers
• Underscores

Sample Data

CREATE TABLE app_users (
username VARCHAR2(50)
);
Table APP_USERS created.
INSERT INTO app_users VALUES ('john_123');
INSERT INTO app_users VALUES ('mike#admin');
INSERT INTO app_users VALUES ('sarah.smith');

Detect Invalid Characters

SELECT
username,
REGEXP_INSTR(username, '[^a-zA-Z0-9_]')
FROM app_users;
SELECT
username,
REGEXP_INSTR(username, '[^a-zA-Z0-9_]')
FROM app_users;

Result:

USERNAMEPOSITION
john_1230
mike#admin5
sarah.smith6

Find Consecutive Spaces

Very useful in data cleansing.

SELECT REGEXP_INSTR(
'Oracle Database',
'( ){2,}'
)
FROM dual;

Output:

7

Real-Life Example: Parse Log Files

Suppose application logs contain timestamps.

CREATE TABLE app_logs (
log_text VARCHAR2(500)
);
INSERT INTO app_logs VALUES
('ERROR at 2025-04-15 10:45:22 connection timeout');

Detect Timestamp Position

SELECT
REGEXP_INSTR(
log_text,
'[0-9]{4}-[0-9]{2}-[0-9]{2}'
)
FROM app_logs;

Using Capturing Groups (subexpr)

One of the most advanced features.

A capturing group is a part of the regular expression enclosed in parentheses (). Example:

([0-9]{3})-([0-9]{4})
SELECT REGEXP_INSTR(
'Phone: (123) 456-7890',
'(\([0-9]{3}\)) ([0-9]{3})-([0-9]{4})',
1,
1,
0,
NULL,
2
)
FROM dual;

output:

14

Explanation:
Returns position of second capturing group (456).

Match Parameters Explained

ParameterMeaning
iCase-insensitive
cCase-sensitive
nDot matches newline
mMulti-line matching

Real-Life Example: Find HTML Tags

Suppose HTML content is stored in database.

SELECT REGEXP_INSTR(
'<div><h1>Oracle</h1></div>',
'<[^>]+>'
)
FROM dual;

Output:

1

Performance Considerations

Regular expressions are powerful but expensive.

Faster

INSTR(column, 'ABC')

Slower

REGEXP_INSTR(column, '[A-Z]{3}')

Best Practices

Use INSTR when simple search is enough

Avoid regex for exact matches.

Use:
• INSTR → for simple exact searches
• REGEXP_INSTR → for intelligent pattern matching
That balance gives both:
• Better performance
• Cleaner SQL
• Easier maintenance

Avoid Leading Wildcards

Bad:

‘.*abc’

Better:

‘abc’

Pre-Filter Large Datasets

WHERE column_name LIKE '%@%'
--before

REGEXP_INSTR(column_name, '@[a-z]+')

Common Regular Expression Patterns

PatternMeaning
[0-9]Single digit
[0-9]+One or more digits
[A-Z]Uppercase letter
[a-z]Lowercase letter
.Any character
^Start of string
$End of string
\sWhitespace
\wWord character
[^abc]Not a, b, or c

Real Enterprise Use Cases

Banking Systems
• Detect malformed account numbers
• Validate SWIFT codes
• Parse payment messages

Telecom
• Extract phone numbers from logs
• Detect invalid customer IDs

Retail
• Clean product descriptions
• Validate coupon codes

Healthcare
• Detect patient IDs
• Parse diagnosis codes

Data Warehousing
• Standardize incoming text data
• Validate ETL input streams

Common Mistakes

Forgetting Escape Characters

Using Regexp for Everything

Avoid regex when simple SQL functions work.

Ignoring Case Sensitivity

Always specify i if needed.

Summary

REGEXP_INSTR is a highly powerful Oracle SQL function for locating complex text patterns.
It is especially useful for:
• Data validation
• Cleansing
• Parsing
• ETL
• Log processing
• Audit systems
The function becomes extremely valuable when working with:
• Emails
• Phone numbers
• Dates
• Structured text
• Semi-structured logs


Discover more from Radu Pârvu

Subscribe to get the latest posts sent to your email.

, , , , , ,

Leave a Reply

Discover more from Radu Pârvu

Subscribe now to keep reading and get access to the full archive.

Continue reading

Discover more from Radu Pârvu

Subscribe now to keep reading and get access to the full archive.

Continue reading