\d+
123
9999
Prices
Counts
IDs
🧠 “We just need the number”
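A minimal sketch of pulling every integer out of a string with `\d+`. The sample text and variable names are made up for illustration.

```python
import re

text = "Order 123 shipped 9999 units"  # hypothetical sample text

# \d+ matches each run of one or more digits
numbers = re.findall(r"\d+", text)
print(numbers)  # ['123', '9999']

# findall returns strings; convert when arithmetic is needed
as_ints = [int(n) for n in numbers]
print(as_ints)  # [123, 9999]
```

Note that `\d+` splits "4.5" into "4" and "5", so it only suits values without a decimal point or separators.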
\d+\.\d+
4.5
3.9
Product ratings
Scores
Measurements
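A small sketch of reading decimal ratings as floats. The sample sentence is hypothetical, and the optional-fraction variant at the end is an addition for the case where whole numbers like "4" should match too.

```python
import re

text = "Rated 4.5 by buyers and 3.9 by critics"  # hypothetical sample text

# \d+\.\d+ needs digits on both sides of the dot
ratings = [float(m) for m in re.findall(r"\d+\.\d+", text)]
print(ratings)  # [4.5, 3.9]

# Variant (an assumption, not the pattern above): the fraction is optional
mixed = re.findall(r"\d+(?:\.\d+)?", "4 and 4.5")
print(mixed)  # ['4', '4.5']
```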
[₹$€]\s?\d+(,\d+)*
₹14,999
$1,299
€999
E-commerce scraping
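One catch when using this pattern in Python: `re.findall` returns the contents of a capturing group if the pattern has one, so `(,\d+)*` would give back only the last comma group. The sketch below switches to a non-capturing `(?:...)` group and turns each match into a symbol plus an integer. `PRICE` and `parse_price` are hypothetical names.

```python
import re

# (?:...) keeps findall returning the whole match instead of the group
PRICE = re.compile(r"[₹$€]\s?\d+(?:,\d+)*")

def parse_price(raw):
    """Split a matched price into (currency symbol, integer amount)."""
    symbol = raw[0]
    amount = int(raw[1:].strip().replace(",", ""))
    return symbol, amount

text = "Phone ₹14,999 | Laptop $1,299 | Cable €999"  # sample text
prices = [parse_price(m) for m in PRICE.findall(text)]
print(prices)  # [('₹', 14999), ('$', 1299), ('€', 999)]
```

Fractional amounts such as "$12.50" are outside this pattern; only the part before the dot would match.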
(\d+(?:,\d+)*)\s+Reviews
1,234 Reviews
45 Reviews
Flipkart / Amazon review counts
🧠 Group 1 = the actual number (strip the commas before converting to int)
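A sketch of reading group 1 and converting it to an integer. The comma part is written as `(?:,\d+)*` here so that counts with more than one comma (for example Indian-style 1,23,456) also match in full. `REVIEWS` and `review_count` are hypothetical names.

```python
import re

REVIEWS = re.compile(r"(\d+(?:,\d+)*)\s+Reviews")

def review_count(text):
    """Return the review count as an int, or None if no count is found."""
    m = REVIEWS.search(text)
    if m is None:
        return None
    # group(1) is the number with commas; drop them before int()
    return int(m.group(1).replace(",", ""))

print(review_count("1,234 Reviews"))  # 1234
print(review_count("45 Reviews"))     # 45
```

Returning `None` for a missing count keeps "no reviews shown" separate from a real count of 0.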
/p/(itm[0-9A-Za-z]+)
/p/itmABC123
Unique product identification
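A sketch of using the captured ID to de-duplicate product links. The URLs below are invented; only the `/p/itm...` part follows the pattern above.

```python
import re

PRODUCT_ID = re.compile(r"/p/(itm[0-9A-Za-z]+)")

links = [  # hypothetical links; two point at the same product
    "https://example.com/phone-x/p/itmABC123?pid=1",
    "https://example.com/phone-x/p/itmABC123?pid=2",
    "https://example.com/about",
]

ids = set()
for link in links:
    m = PRODUCT_ID.search(link)
    if m:  # links without /p/itm... are skipped
        ids.add(m.group(1))

print(ids)  # {'itmABC123'}
```

The character class stops at `?`, so query parameters never leak into the ID.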
https?://[^\s]+
https://example.com
http://site.in/page
Crawling
Link extraction
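`[^\s]+` runs until the next whitespace, so URLs pulled from running text often carry a trailing comma or full stop. A minimal sketch that trims it afterwards; the set of characters to strip is an assumption and may need adjusting per site.

```python
import re

text = "See https://example.com, then http://site.in/page."

raw = re.findall(r"https?://[^\s]+", text)
print(raw)  # ['https://example.com,', 'http://site.in/page.']

# Trim punctuation that belongs to the sentence, not the URL
urls = [u.rstrip(".,;:!?)\"'") for u in raw]
print(urls)  # ['https://example.com', 'http://site.in/page']
```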
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
support@flipkart.com
info@test.co.in
Contact scraping
Lead generation
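A short sketch of collecting email addresses from page text and de-duplicating them. `EMAIL` is a hypothetical name; lower-casing before de-duplication is a choice made for this example.

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

text = "Write to support@flipkart.com or info@test.co.in for help."
emails = EMAIL.findall(text)
print(emails)  # ['support@flipkart.com', 'info@test.co.in']

# Same address in different letter case counts once
unique = sorted({e.lower() for e in emails + ["Support@Flipkart.com"]})
print(unique)  # ['info@test.co.in', 'support@flipkart.com']
```

This pattern is a practical filter for scraping, not a full validator of email syntax.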
[6-9]\d{9}
9876543210
Contact pages
Customer support data
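On its own, `[6-9]\d{9}` also matches ten digits inside a longer number such as an order ID. The sketch below compares it with a stricter variant that rejects digits on either side. The lookarounds are an addition for this example, and `LOOSE` and `STRICT` are hypothetical names.

```python
import re

LOOSE = re.compile(r"[6-9]\d{9}")
# Assumption: a phone number must not be glued to other digits
STRICT = re.compile(r"(?<!\d)[6-9]\d{9}(?!\d)")

text = "Call 9876543210. Order ID 559876543210."  # hypothetical sample text

print(LOOSE.findall(text))   # ['9876543210', '9876543210']
print(STRICT.findall(text))  # ['9876543210']
```

Numbers written with a +91 prefix, spaces, or dashes need normalising first; neither pattern handles those.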
<[^>]+>
<b>Hello</b>
Result:
Hello
Clean scraped content
NLP preprocessing
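A sketch of a small cleanup helper built on `<[^>]+>`. Two steps go beyond the pattern itself and are assumptions of this example: tags are replaced with a space so neighbouring words do not fuse, and entities are decoded with the standard library's `html.unescape`. `strip_tags` is a hypothetical name.

```python
import html
import re

def strip_tags(markup):
    """Remove tags, decode entities, and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", markup)  # tag -> space
    text = html.unescape(text)              # &amp; -> &
    return re.sub(r"\s+", " ", text).strip()

print(strip_tags("<b>Hello</b>"))                            # Hello
print(strip_tags("<p>Tom &amp; Jerry</p><p>Rated 4.5</p>"))  # Tom & Jerry Rated 4.5
```

The trade-off of the tag-to-space choice is that inline markup in the middle of a word ("<b>Hel</b>lo") leaves a gap. Script and style contents are also left in place, which is one reason a real HTML parser is safer for full pages.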
start(.*?)end
start THIS TEXT end
Description extraction
Section scraping
🧠 .*? = lazy match: stops at the first "end", not the last (very important!)
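To see why the lazy form matters, this sketch runs greedy and lazy versions on text containing two start/end pairs. The last lines show `re.DOTALL`, which is needed when the section spans several lines because `.` skips newlines by default. The sample strings are made up.

```python
import re

text = "start A end ... start B end"

greedy = re.findall(r"start(.*)end", text)
print(greedy)  # [' A end ... start B ']

lazy = re.findall(r"start(.*?)end", text)
print(lazy)    # [' A ', ' B ']

# Multi-line sections: let . match newlines too
block = "start\nline one\nend"
multiline = re.findall(r"start(.*?)end", block, flags=re.DOTALL)
print(multiline)  # ['\nline one\n']
```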
\d+ → numbers
\d+\.\d+ → decimals
[₹$€]\s?\d+(,\d+)* → prices
Reviews → review count
/p/(itm...) → product ID
https?:// → URLs
@ → emails
[6-9]\d{9} → Indian phones
<[^>]+> → remove HTML
(.*?) → extract between
✅ Always use raw strings in Python:
re.search(r"\d+", text)
✅ Test on regex101.com
❌ Don’t use regex where HTML parsing is enough
🔥 Combine CSS selectors + Regex for best results
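On the raw-string tip above: it matters most for escapes that Python itself understands. A small sketch with `\b`, which is a backspace character in a normal string and a word boundary in a raw one.

```python
import re

text = "a cat sat"

# Normal string: Python turns "\b" into a backspace before re sees it
plain = re.search("\bcat\b", text)
print(plain)        # None

# Raw string: the backslash reaches the regex engine untouched
raw = re.search(r"\bcat\b", text)
print(raw.group())  # cat
```

Patterns like `"\d+"` still work without the `r` prefix, but recent Python versions warn about the invalid escape, so the prefix is worth using everywhere.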