01 Feb, 2026

.get_text() method — explained simply (Python / BeautifulSoup)

The .get_text() method in BeautifulSoup extracts only the readable text from an HTML element and everything nested inside it, dropping all HTML tags.


📌 Why .get_text() is used

When you scrape a webpage, you usually get HTML like this:

<p>Hello <b>World</b>!</p>

But you often only want the text, not the tags.

👉 .get_text() does exactly that.


🔹 Basic Syntax

tag.get_text()

🔹 Simple Example

from bs4 import BeautifulSoup

html = "<p>Hello <b>World</b>!</p>"
soup = BeautifulSoup(html, "html.parser")

text = soup.p.get_text()
print(text)

✅ Output

Hello World!
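
The same call collects text from every nested element, not just the direct children. The sketch below uses a made-up snippet (the markup and variable names are illustrative) to show that fragments from neighbouring tags are joined with nothing in between by default:

```python
from bs4 import BeautifulSoup

# Illustrative markup: a heading and a paragraph inside one <div>
html = "<div><h1>Title</h1><p>Body <a href='#'>link</a></p></div>"
soup = BeautifulSoup(html, "html.parser")

# get_text() walks all descendants and concatenates their text
all_text = soup.div.get_text()
print(all_text)  # TitleBody link
```

"Title" and "Body" run together because no separator is used by default.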

🔹 Using .get_text(strip=True)

Strips leading and trailing whitespace (spaces, newlines) from each piece of text and skips pieces that contain only whitespace.

text = soup.p.get_text(strip=True)

✅ Output

HelloWorld!

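⚠️ Notice that the space between "Hello" and "World" is gone: each text fragment is stripped on its own and the fragments are then joined with nothing in between. This is often not what you want.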
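
To see where the spaces go, it can help to look at the individual text fragments. This is a small sketch using the .strings and .stripped_strings generators that BeautifulSoup tags provide; the variable names are only for illustration:

```python
from bs4 import BeautifulSoup

html = "<p>Hello <b>World</b>!</p>"
soup = BeautifulSoup(html, "html.parser")

# The <p> element contains three separate text fragments
fragments = [str(s) for s in soup.p.strings]
print(fragments)  # ['Hello ', 'World', '!']

# strip=True trims each fragment individually before joining
stripped = [str(s) for s in soup.p.stripped_strings]
print(stripped)  # ['Hello', 'World', '!']

# Joining the stripped fragments with "" reproduces get_text(strip=True)
joined = "".join(stripped)
print(joined == soup.p.get_text(strip=True))  # True
```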


🔹 Adding a Separator (VERY IMPORTANT)

To keep spacing between the text of adjacent tags, pass a separator. It is inserted between every pair of text fragments:

text = soup.p.get_text(separator=" ", strip=True)
print(text)

✅ Output

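Hello World !

⚠️ The separator is also placed between "World" and "!", so punctuation that directly follows a tag gets a space in front of it.

👉 Still a sensible default while scraping, because words from neighbouring tags no longer run together.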
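
If a stray space before punctuation matters, one possible alternative (a plain-Python whitespace cleanup, not a BeautifulSoup feature) is to call get_text() with no arguments and collapse runs of whitespace afterwards. A minimal sketch:

```python
from bs4 import BeautifulSoup

html = "<p>Hello <b>World</b>!</p>"
soup = BeautifulSoup(html, "html.parser")

# Separator between every fragment: a space lands before "!"
with_separator = soup.p.get_text(separator=" ", strip=True)
print(with_separator)  # Hello World !

# Alternative: keep the original whitespace, then collapse runs of it
normalized = " ".join(soup.p.get_text().split())
print(normalized)  # Hello World!
```

The trade-off is that this approach only keeps spaces that already exist in the HTML, so text from adjacent tags with no whitespace between them will still run together.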


🔹 Real Web-Scraping Example (Flipkart / Amazon style)

# "review-text" is an example class name; find() returns None if nothing matches
review = soup.find("div", class_="review-text")
clean_review = review.get_text(separator=" ", strip=True) if review else ""

This:

  • Drops tags such as <br>, <span>, and nested <div>

  • Keeps readable sentences

  • Returns clean review text
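
Here is a self-contained sketch of the same idea applied to several reviews at once. The HTML is invented for the example and the "review-text" class is an assumption; real sites use their own markup and class names.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; real pages will look different
html = """
<div class="review-text">
  <span>Great phone.</span><br>
  Battery lasts <b>two days.</b>
</div>
<div class="review-text"><span>Fast delivery</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() returns every matching element (an empty list if none match)
reviews = [
    div.get_text(separator=" ", strip=True)
    for div in soup.find_all("div", class_="review-text")
]
print(reviews)
# ['Great phone. Battery lasts two days.', 'Fast delivery']
```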


🔹 .text vs .get_text()

Feature              .text    .get_text()
Extract text         Yes      Yes
Separator control    No       Yes
Strip spaces         No       Yes
Recommended          No       Yes

Example:

tag.text
tag.get_text(separator=" ", strip=True)

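👉 Prefer .get_text() whenever you need separator or strip. .text is a property that gives the same result as calling .get_text() with no arguments.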
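
A quick check of how the two relate, as a minimal sketch:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello <b>World</b>!</p>", "html.parser")
tag = soup.p

# .text gives the same string as get_text() with default arguments
print(tag.text == tag.get_text())  # True

# Only get_text() accepts options such as a separator
piped = tag.get_text(separator="|")
print(piped)  # Hello |World|!
```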


🔹 When .get_text() is NOT needed

If data is already plain text:

json_data["title"]

No HTML → no need for .get_text().
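
For instance, with a JSON response the value is already a plain string. The payload below is invented for illustration:

```python
import json

# Hypothetical API response body
payload = '{"title": "Wireless Mouse", "price": 499}'
json_data = json.loads(payload)

# Already plain text, so no parsing with BeautifulSoup is required
title = json_data["title"]
print(title)  # Wireless Mouse
```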


🧠 One-line memory trick

HTML in → Clean text out = .get_text()