🎁 10 REGEX PATTERNS EVERY SCRAPER MUST KNOW 🚀
02 Feb, 2026

🟢 1️⃣ Extract Numbers (Basic but Essential)

\d+

✅ Matches

123
9999

🔥 Use Case

  • Prices

  • Counts

  • IDs

🧠 “I just need the number”
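
🐍 A minimal Python sketch of this pattern, assuming the scraped text is already in a string (the sample text is made up for illustration). It also shows one thing to watch for: \d+ splits a comma-formatted number into separate pieces.

```python
import re

# Hypothetical scraped text, for illustration only
text = "Item 123 has 9999 views"

numbers = re.findall(r"\d+", text)   # every run of digits, as strings
ints = [int(n) for n in numbers]     # convert when you need arithmetic

# Caveat: a comma breaks the digit run, so "14,999" comes back in two parts
split_price = re.findall(r"\d+", "14,999")

print(numbers)      # ['123', '9999']
print(ints)         # [123, 9999]
print(split_price)  # ['14', '999']
```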


🔵 2️⃣ Extract Decimal Numbers (Ratings, Scores)

\d+\.\d+

✅ Matches

4.5
3.9

🔥 Use Case

  • Product ratings

  • Scores

  • Measurements
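
🐍 A small sketch of how this behaves in Python. Note that \d+\.\d+ skips whole numbers like "5"; the second pattern, with an optional fractional part, is an assumed variant for pages that mix "4.5" and "5" (sample text is hypothetical).

```python
import re

# Hypothetical rating text
text = "Rated 4.5 out of 5, earlier 3.9"

# Pattern from above: needs digits on both sides of the dot
decimals_only = re.findall(r"\d+\.\d+", text)

# Assumed variant: the fractional part is optional, so integers match too
any_number = re.findall(r"\d+(?:\.\d+)?", text)

ratings = [float(x) for x in decimals_only]

print(decimals_only)  # ['4.5', '3.9']
print(any_number)     # ['4.5', '5', '3.9']
print(ratings)        # [4.5, 3.9]
```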


🟡 3️⃣ Extract Price (₹, $, €)

[₹$€]\s?\d+(,\d+)*

✅ Matches

₹14,999
$1,299
€999

🔥 Use Case

  • E-commerce scraping
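
🐍 One possible way to turn a matched price into a usable number. The helper name parse_price is hypothetical, and the inner group is written as non-capturing (?:...) so the whole match is easy to work with. This sketch ignores decimals such as $12.99, which the pattern above would cut at "$12".

```python
import re

# Same idea as the pattern above, with a non-capturing comma group
PRICE_RE = re.compile(r"[₹$€]\s?\d+(?:,\d+)*")

def parse_price(text):
    """Return (currency_symbol, amount) for the first price found, else None."""
    m = PRICE_RE.search(text)
    if m is None:
        return None
    matched = m.group(0)                 # e.g. "₹14,999"
    symbol = matched[0]                  # first character is the currency sign
    amount = int(re.sub(r"\D", "", matched))  # drop everything except digits
    return symbol, amount

print(parse_price("Deal price: ₹14,999 only"))  # ('₹', 14999)
print(parse_price("Now $ 1,299"))               # ('$', 1299)
print(parse_price("Out of stock"))              # None
```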


🟠 4️⃣ Extract Total Reviews ⭐

(\d+(,\d+)*)\s+Reviews

✅ Matches

1,234 Reviews
45 Reviews

🔥 Use Case

  • Flipkart / Amazon review counts

🧠 Group 1 = actual number
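
🐍 A sketch of reading Group 1 in Python. Assumptions here: the helper name review_count is made up, and the inner comma group is non-capturing and repeated with * so counts like 1,234,567 are captured whole.

```python
import re

# Group 1 holds the number; the comma groups inside it are non-capturing
REVIEWS_RE = re.compile(r"(\d+(?:,\d+)*)\s+Reviews")

def review_count(text):
    """Return the review count as an int, or None if no count is present."""
    m = REVIEWS_RE.search(text)
    if m is None:
        return None
    return int(m.group(1).replace(",", ""))  # "1,234" -> 1234

print(review_count("4.3 stars | 1,234 Reviews"))  # 1234
print(review_count("45 Reviews"))                 # 45
print(review_count("No feedback yet"))            # None
```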


🔴 5️⃣ Extract Product ID from URL

/p/(itm[0-9A-Za-z]+)

✅ Matches

/p/itmABC123

🔥 Use Case

  • Unique product identification
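
🐍 A quick sketch of pulling the ID out of a full URL. The URL below is a made-up example; only the /p/itm... part follows the pattern above.

```python
import re

# Hypothetical product URL, for illustration only
url = "https://www.example.com/some-product/p/itmABC123?pid=XYZ"

m = re.search(r"/p/(itm[0-9A-Za-z]+)", url)
product_id = m.group(1) if m else None   # group 1 = the ID without "/p/"

print(product_id)  # itmABC123
```

The character class stops at the first character that is not a letter or digit, so the query string after "?" is left out.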


🟣 6️⃣ Extract URLs

https?://[^\s]+

✅ Matches

https://example.com
http://site.in/page

🔥 Use Case

  • Crawling

  • Link extraction
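
🐍 A sketch of using this on running text. Because [^\s]+ grabs everything up to the next whitespace, punctuation at the end of a sentence comes along with the URL; stripping it afterwards is one simple workaround (the set of characters to strip is an assumption, adjust for your data).

```python
import re

# Hypothetical text with URLs inside a sentence
text = "Visit https://example.com, or http://site.in/page."

raw_urls = re.findall(r"https?://[^\s]+", text)

# Drop sentence punctuation that got glued to the end of each URL
clean_urls = [u.rstrip(".,;:!?)") for u in raw_urls]

print(raw_urls)    # ['https://example.com,', 'http://site.in/page.']
print(clean_urls)  # ['https://example.com', 'http://site.in/page']
```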


🟤 7️⃣ Extract Email IDs

[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

✅ Matches

support@flipkart.com
info@test.co.in

🔥 Use Case

  • Contact scraping

  • Lead generation
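
🐍 A sketch for collecting emails from a page. Lowercasing and removing duplicates are assumptions about what a lead list usually needs, not part of the pattern itself. The pattern is a practical filter, not a full email validator.

```python
import re

EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# Hypothetical page text with a repeated address
text = "Write to Support@Flipkart.com or info@test.co.in. Again: support@flipkart.com"

found = EMAIL_RE.findall(text)

# Lowercase, then de-duplicate while keeping first-seen order
unique = list(dict.fromkeys(e.lower() for e in found))

print(found)   # ['Support@Flipkart.com', 'info@test.co.in', 'support@flipkart.com']
print(unique)  # ['support@flipkart.com', 'info@test.co.in']
```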


⚫ 8️⃣ Extract Phone Numbers (India)

[6-9]\d{9}

✅ Matches

9876543210

🔥 Use Case

  • Contact pages

  • Customer support data
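
🐍 A caution worth sketching: on its own, [6-9]\d{9} also matches ten digits inside a longer number such as an order ID. One possible guard, shown here as an assumption rather than the only fix, is to require that no digit sits directly before or after the match.

```python
import re

# Hypothetical text: one real phone number and one long order ID
text = "Call 9876543210, order id 1298765432109"

# Pattern from above: also fires inside the 13-digit order ID
loose = re.findall(r"[6-9]\d{9}", text)

# Guarded variant: lookarounds reject matches touching other digits
strict = re.findall(r"(?<!\d)[6-9]\d{9}(?!\d)", text)

print(loose)   # ['9876543210', '9876543210']
print(strict)  # ['9876543210']
```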


🔵 9️⃣ Remove HTML Tags (Cleaning Text)

<[^>]+>

✅ Removes

<b>Hello</b>

Result:

Hello

🔥 Use Case

  • Clean scraped content

  • NLP preprocessing
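
🐍 A minimal cleaning sketch with re.sub. The helper name strip_tags is hypothetical, and the html.unescape step is an addition for entities like &amp;amp;. This is only for simple snippets; text inside script or style tags would survive, so use a real HTML parser for full pages.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")

def strip_tags(markup):
    """Remove tags, decode entities, and collapse extra whitespace."""
    text = TAG_RE.sub("", markup)      # 1) delete anything shaped like <...>
    text = html.unescape(text)         # 2) then decode &amp; and friends
    return re.sub(r"\s+", " ", text).strip()

print(strip_tags("<b>Hello</b>"))                    # Hello
print(strip_tags("<p>Fast &amp; <b>light</b></p>"))  # Fast & light
```

Decoding entities after removing tags keeps escaped text such as &amp;lt;b&amp;gt; from being mistaken for a real tag.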


🟢 🔟 Extract Text Between Two Words

start(.*?)end

✅ Matches

start THIS TEXT end

🔥 Use Case

  • Description extraction

  • Section scraping

🧠 .*? = lazy match: it stops at the first "end" instead of the last one (very important!)
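
🐍 A sketch comparing lazy and greedy matching on made-up text. It also shows re.DOTALL, which is needed when the text between the two words spans several lines, since "." does not match a newline by default.

```python
import re

# Hypothetical text with two start/end blocks
text = "start A end ... start B end"

lazy = re.findall(r"start(.*?)end", text)    # stops at the nearest "end"
greedy = re.findall(r"start(.*)end", text)   # runs to the last "end"

# Multi-line content: "." needs re.DOTALL to cross newlines
multiline = "start\nX\nend"
without_flag = re.findall(r"start(.*?)end", multiline)
with_flag = re.findall(r"start(.*?)end", multiline, flags=re.DOTALL)

print(lazy)          # [' A ', ' B ']
print(greedy)        # [' A end ... start B ']
print(without_flag)  # []
print(with_flag)     # ['\nX\n']
```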


🎯 MASTER CHEAT SHEET (One Look Memory)

\d+           → numbers
\d+\.\d+      → decimals
[₹$€]\d+      → prices
Reviews       → review count
/p/(itm...)   → product ID
https?://     → URLs
@             → emails
[6-9]\d{9}    → Indian phones
<[^>]+>       → remove HTML
(.*?)         → extract between

🧠 PRO SCRAPER TIPS 

✅ Always use raw strings in Python:

re.search(r"\d+", text)

✅ Test on regex101.com
❌ Don’t use regex where HTML parsing is enough
🔥 Combine CSS selectors + Regex for best results
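
🐍 Why the raw-string tip matters, as a small sketch: in a normal Python string "\b" is the backspace character, so the regex engine never sees a word boundary. The sample text is made up.

```python
import re

text = "a cat sat"

plain = "\bcat\b"    # normal string: "\b" becomes one backspace character
raw = r"\bcat\b"     # raw string: backslash and "b" reach the regex engine

plain_match = re.search(plain, text)   # looks for backspace + cat + backspace
raw_match = re.search(raw, text)       # looks for the whole word "cat"

print(len(plain), len(raw))   # 5 7
print(plain_match)            # None
print(raw_match.group())      # cat
```

Escapes like "\d" are not valid string escapes, so they often work without the r prefix, but recent Python versions warn about them. Using r"..." everywhere avoids both problems.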