🎁 10 REGEX PATTERNS EVERY SCRAPER MUST KNOW 🚀 02 Feb, 2026

🟢 1️⃣ Extract Numbers (Basic but Essential)

\d+

✅ Matches

123
9999

🔥 Use Case

Prices
Counts
IDs

🧠 “Just need the number”
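
🐍 A quick Python sketch of this pattern. The sample string is made up for illustration. Note that `\d+` splits a comma-formatted number into separate pieces:

```python
import re

# Hypothetical sample text, not taken from any real page
text = "Order 123 has 9999 items, total 14,999"

# findall returns every run of consecutive digits as a string
numbers = re.findall(r"\d+", text)
print(numbers)  # ['123', '9999', '14', '999']
```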

🔵 2️⃣ Extract Decimal Numbers (Ratings, Scores)

\d+\.\d+

✅ Matches

4.5
3.9

🔥 Use Case

Product ratings
Scores
Measurements
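
🐍 A small sketch, assuming a made-up sample string. The pattern above requires a decimal point, so a whole-number rating like `5` is skipped. The second pattern (my variant, not part of the list) makes the fractional part optional:

```python
import re

# Hypothetical sample text
text = "Rated 4.5 by buyers, 3.9 by critics, 5 by the seller"

# Decimals only: the bare "5" has no decimal point and is not matched
ratings = [float(m) for m in re.findall(r"\d+\.\d+", text)]
print(ratings)  # [4.5, 3.9]

# Variant with an optional fractional part (an assumption, not from the list)
any_number = re.findall(r"\d+(?:\.\d+)?", text)
print(any_number)  # ['4.5', '3.9', '5']
```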

🟡 3️⃣ Extract Price (₹, $, €)

[₹$€]\s?\d+(,\d+)*

✅ Matches

₹14,999
$1,299
€999

🔥 Use Case

E-commerce scraping
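
🐍 One possible way to turn a matched price into a number. The capture groups and the `parse_price` helper are my additions for illustration. The pattern itself is the one above:

```python
import re

# Same pattern as above, with groups added around the symbol and the digits
PRICE_RE = re.compile(r"([₹$€])\s?(\d+(?:,\d+)*)")

def parse_price(text):
    """Return (currency_symbol, integer_amount), or None if no price is found."""
    m = PRICE_RE.search(text)
    if m is None:
        return None
    symbol, digits = m.groups()
    # Strip thousands separators before converting to int
    return symbol, int(digits.replace(",", ""))

print(parse_price("Now at ₹14,999 only"))  # ('₹', 14999)
print(parse_price("$ 1,299"))              # ('$', 1299)
print(parse_price("Out of stock"))         # None
```

Like the original pattern, this sketch ignores decimal parts such as `.99`.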

🟠 4️⃣ Extract Total Reviews ⭐

(\d+(,\d+)*)\s+Reviews

✅ Matches

1,234 Reviews
45 Reviews

🔥 Use Case

Flipkart / Amazon review counts

🧠 Group 1 = actual number
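
🐍 A minimal sketch of reading Group 1 as an integer. The `review_count` helper and the sample strings are hypothetical. The inner group here is non-capturing and allows any number of comma groups, so Indian-style counts like `1,23,456` also work:

```python
import re

REVIEWS_RE = re.compile(r"(\d+(?:,\d+)*)\s+Reviews")

def review_count(text):
    """Return the review count as an int, or None if the text has no match."""
    m = REVIEWS_RE.search(text)
    if m is None:
        return None
    # Group 1 holds the number, possibly with commas
    return int(m.group(1).replace(",", ""))

print(review_count("4.3 ★ 1,234 Reviews"))  # 1234
print(review_count("45 Reviews"))           # 45
print(review_count("1,23,456 Reviews"))     # 123456
print(review_count("No reviews yet"))       # None
```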

🔴 5️⃣ Extract Product ID from URL

/p/(itm[0-9A-Za-z]+)

✅ Matches

/p/itmABC123

🔥 Use Case

Unique product identification
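
🐍 A short sketch with a made-up URL (the domain and query string are assumptions). The character class stops at `?`, so query parameters are not pulled into the ID:

```python
import re

# Hypothetical product URL
url = "https://www.example.com/some-phone/p/itmABC123?pid=XYZ"

m = re.search(r"/p/(itm[0-9A-Za-z]+)", url)
product_id = m.group(1) if m else None
print(product_id)  # itmABC123
```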

🟣 6️⃣ Extract URLs

https?://[^\s]+

✅ Matches

https://example.com
http://site.in/page

🔥 Use Case

Crawling
Link extraction
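
🐍 A small sketch on made-up text. Because `[^\s]+` grabs everything up to the next whitespace, punctuation at the end of a sentence gets included. The `rstrip` cleanup is one simple workaround, not the only option:

```python
import re

# Hypothetical sample text
text = "Visit https://example.com and http://site.in/page. Or not."

urls = re.findall(r"https?://[^\s]+", text)
print(urls)  # ['https://example.com', 'http://site.in/page.']

# Trim trailing punctuation that is rarely part of a real URL
cleaned = [u.rstrip(".,;:!?)\"'") for u in urls]
print(cleaned)  # ['https://example.com', 'http://site.in/page']
```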

🟤 7️⃣ Extract Email IDs

[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

✅ Matches

support@flipkart.com
info@test.co.in

🔥 Use Case

Contact scraping
Lead generation
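
🐍 A sketch that collects unique emails from made-up text. Lowercasing and de-duplicating are my assumptions about what a scraper would usually want:

```python
import re

EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# Hypothetical sample text; "user@localhost" has no dot in the domain, so it is skipped
text = "Write to support@flipkart.com or INFO@test.co.in, not to user@localhost"

# Set comprehension removes duplicates; sorted gives a stable order
emails = sorted({e.lower() for e in EMAIL_RE.findall(text)})
print(emails)  # ['info@test.co.in', 'support@flipkart.com']
```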

⚫ 8️⃣ Extract Phone Numbers (India)

[6-9]\d{9}

✅ Matches

9876543210

🔥 Use Case

Contact pages
Customer support data
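
🐍 A sketch showing one pitfall on made-up text: the bare pattern also matches inside longer digit runs such as order numbers. The lookarounds in the stricter variant are my addition. They also reject a number glued to a country code (for example `919876543210`), so treat this as a trade-off:

```python
import re

# Hypothetical sample text with a 15-digit order number
text = "Call 9876543210. Order no. 123456789012345"

# Bare pattern: also picks up 10 digits from inside the order number
loose = re.findall(r"[6-9]\d{9}", text)
print(loose)  # ['9876543210', '6789012345']

# Stricter variant: no digit allowed directly before or after the match
strict = re.findall(r"(?<!\d)[6-9]\d{9}(?!\d)", text)
print(strict)  # ['9876543210']
```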

🔵 9️⃣ Remove HTML Tags (Cleaning Text)

<[^>]+>

✅ Removes

<b>Hello</b>

Result:

Hello

🔥 Use Case

Clean scraped content
NLP preprocessing
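
🐍 A minimal cleaning sketch on a made-up snippet. The regex only strips tags. Entities such as `&amp;` stay as they are, so this example also calls `html.unescape` from the standard library (my addition):

```python
import html
import re

# Hypothetical HTML snippet
raw = "<p><b>Hello</b> &amp; welcome</p>"

# Remove anything that looks like a tag
no_tags = re.sub(r"<[^>]+>", "", raw)
print(no_tags)  # Hello &amp; welcome

# Decode HTML entities and trim surrounding whitespace
clean = html.unescape(no_tags).strip()
print(clean)  # Hello & welcome
```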

🟢 🔟 Extract Text Between Two Words

start(.*?)end

✅ Matches

start THIS TEXT end

🔥 Use Case

Description extraction
Section scraping

🧠 .*? = lazy match (very important!)
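
🐍 A sketch of why the lazy match matters, using made-up strings. The greedy version runs to the last `end`. Also, `.` does not match newlines by default, so multi-line sections need `re.DOTALL`:

```python
import re

# Hypothetical text with two start/end sections
text = "start A end ... start B end"

lazy = re.findall(r"start(.*?)end", text)
print(lazy)  # [' A ', ' B ']

# Greedy: swallows everything up to the LAST "end"
greedy = re.findall(r"start(.*)end", text)
print(greedy)  # [' A end ... start B ']

# Multi-line content: no match without DOTALL
multi = "start\nline\nend"
print(re.search(r"start(.*?)end", multi))  # None
m = re.search(r"start(.*?)end", multi, re.DOTALL)
print(repr(m.group(1)))  # '\nline\n'
```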

🎯 MASTER CHEAT SHEET (Memorize at a Glance)

\d+           → numbers
\d+\.\d+      → decimals
₹\d+          → prices
Reviews       → review count
/p/(itm...)   → product ID
https?://     → URLs
@             → emails
[6-9]\d{9}    → Indian phones
<[^>]+>       → remove HTML
(.*?)         → extract between

🧠 PRO SCRAPER TIPS

✅ Always use raw strings in Python:

re.search(r"\d+", text)

✅ Test on regex101.com
❌ Don’t use regex where HTML parsing is enough
🔥 Combine CSS selectors + Regex for best results
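
🐍 On the raw-string tip: a small sketch of what goes wrong without the `r` prefix. In a normal Python string, `\b` is a backspace character, not a word boundary, so the pattern silently fails. The sample text is made up:

```python
import re

text = "a cat sat"

# Without r"": "\b" becomes a single backspace character
print(len("\b"), len(r"\b"))  # 1 2

# The pattern now looks for backspace + "cat" + backspace, which is not in the text
no_raw = re.search("\bcat\b", text)
print(no_raw)  # None

# With a raw string, \b reaches the regex engine as a word boundary
with_raw = re.search(r"\bcat\b", text)
print(with_raw.group())  # cat
```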