Part-of-speech (POS) tags are useful in many Machine Learning, text analytics, and NLP tasks. This article shows how to do part-of-speech tagging with NLTK in Python. NLTK provides a good interface for POS tagging, so let’s see how it works –
Part of Speech Tagging using NLTK in Python –
Step 1 –
This is a prerequisite step. Here we install the NLTK module and download the tagger model it needs. Run the first line in a terminal and the other two in Python –
pip install nltk # install using the pip package manager
import nltk
nltk.download('averaged_perceptron_tagger')
The pip command installs the NLTK library, and nltk.download() fetches the pre-trained tagger model that pos_tag relies on.
Step 2 –
Now the real coding starts. Let’s import the pos_tag function –
from nltk import pos_tag
Step 3 –
Let’s take the string on which we want to perform POS tagging and convert it into tokens. A simple whitespace split is enough for this sentence. Let’s check out the code –
data = "Data Science Learner is an easy way to learn data science"
data_token = data.split()
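Splitting on whitespace works for the sentence above, but it leaves punctuation attached to words ("easy," stays a single token), which can mislead a tagger. As a rough illustration, here is a standard-library sketch that separates punctuation with a regular expression. The helper name simple_tokenize and the sample sentence are our own and not part of NLTK. NLTK also ships its own tokenizers such as word_tokenize, which need an additional data download.

```python
import re


def simple_tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    # \w+ matches a run of word characters; [^\w\s] matches one character
    # that is neither a word character nor whitespace (i.e. punctuation).
    return re.findall(r"\w+|[^\w\s]", text)


# Hypothetical sample sentence, chosen only to show the difference.
sample = "Data Science Learner is easy, fast, and free."

print(sample.split())           # punctuation stays glued to the words
print(simple_tokenize(sample))  # punctuation becomes separate tokens
```

This simple pattern also splits contractions such as "isn't" into three pieces, so treat it as a starting point and not a full tokenizer.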
Step 4 –
In this step we tag each token in the list with its part of speech. We already obtained the data_token list above by splitting the data string, so we simply pass it to pos_tag –
data_tokens_tag = pos_tag(data_token)
print(data_tokens_tag)
The output is a list of (token, tag) pairs, one for each token in the input string.
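Because each entry is a plain (token, tag) tuple, the result is easy to post-process with ordinary Python. The sketch below groups tokens by tag and picks out the nouns. The tagged list is hand-written for illustration in Penn Treebank style (NN and NNP for nouns, VBZ for a present-tense verb, JJ for an adjective, DT for a determiner). It is not actual tagger output, and tokens_by_tag is a hypothetical helper name.

```python
from collections import defaultdict

# Hand-written sample in the same shape that pos_tag returns.
# NOT real tagger output; used only to show the post-processing.
tagged = [
    ("Python", "NNP"),
    ("is", "VBZ"),
    ("a", "DT"),
    ("popular", "JJ"),
    ("language", "NN"),
]


def tokens_by_tag(pairs):
    """Group tokens under their tag, preserving the original order."""
    groups = defaultdict(list)
    for token, tag in pairs:
        groups[tag].append(token)
    return dict(groups)


groups = tokens_by_tag(tagged)

# Noun tags in this tagset all start with "NN" (NN, NNS, NNP, NNPS).
nouns = [token for token, tag in tagged if tag.startswith("NN")]

print(groups)
print(nouns)
```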
Note the download call in Step 1, nltk.download('averaged_perceptron_tagger'). Here we name exactly the package we need from the NLTK data collection. Many people download the complete NLTK data collection instead, which takes up disk space and download time unnecessarily.
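To avoid even the repeated download attempt, one option is to check whether the resource is already on disk first. The sketch below uses nltk.data.find, which raises LookupError when a resource is missing. The helper name ensure_resource is our own, and the resource path in the usage note assumes the usual layout where tagger models live under "taggers/".

```python
import nltk


def ensure_resource(resource_path, package_id):
    """Download package_id only if resource_path is not installed yet.

    Returns True if a download was attempted, False if the resource
    was already available locally.
    """
    try:
        # Looks through the local nltk_data directories only; no network.
        nltk.data.find(resource_path)
        return False
    except LookupError:
        nltk.download(package_id, quiet=True)
        return True
```

For the tagger used in this article, the call would be ensure_resource("taggers/averaged_perceptron_tagger", "averaged_perceptron_tagger"). The first run downloads the model and later runs skip the download.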
Now a few words on NLP libraries. NLTK is a good option for text processing, but there are others such as spaCy and gensim. We have a complete article on the best Python NLP libraries that you can check out.
Data Science Learner Team