- Tokenization
- Text classification
- Stemming and lemmatization
- Part-of-speech (POS) tagging
- Parsing and syntax tree generation
- Named entity recognition (NER)
- Language modeling
- Semantic reasoning
| Feature | Description |
|---|---|
| `nltk.tokenize` | Breaks text into words and sentences |
| `nltk.corpus` | Accesses large text collections such as Gutenberg and Reuters |
| `nltk.stem` | Includes stemmers such as Porter and Lancaster |
| `nltk.pos_tag()` | Tags parts of speech (e.g., noun, verb) |
| `nltk.ne_chunk()` | Identifies named entities |
| `nltk.FreqDist()` | Computes frequency distributions of words |
| `nltk.sentiment` | Built-in sentiment analysis tools |
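As a minimal illustration of the `nltk.FreqDist()` row above, the sketch below counts token frequencies. The sample sentence is made up, and splitting on whitespace keeps the example free of any corpus downloads:

```python
from nltk import FreqDist

# Hypothetical sample text; a plain whitespace split avoids the
# punkt download that word_tokenize would otherwise require.
tokens = "the cat sat on the mat and the cat slept".split()

freq = FreqDist(tokens)
print(freq["the"])          # count of a single token: 3
print(freq.most_common(2))  # [('the', 3), ('cat', 2)]
```

`FreqDist` behaves like a dictionary from token to count, so it plugs directly into the tokenized output of the pipelines shown elsewhere in this article.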
```python
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

# Download required data
nltk.download('punkt')
nltk.download('stopwords')

# Sample text
text = "Natural Language Processing is an exciting field of Artificial Intelligence."

# Tokenization
tokens = word_tokenize(text)

# Stopword removal
stop_words = set(stopwords.words('english'))
filtered = [word for word in tokens if word.lower() not in stop_words]

# Stemming
stemmer = PorterStemmer()
stemmed = [stemmer.stem(word) for word in filtered]

print("Original:", tokens)
print("Filtered:", filtered)
print("Stemmed:", stemmed)
```
- When you want control and transparency in NLP tasks
- For educational purposes
- For quick prototyping with classical NLP

For production-level or large-scale NLP, consider spaCy or Hugging Face Transformers, which are optimized for speed and scale.