How to use nltk sent_tokenize in Python

NLTK's sent_tokenize splits a text into a list of sentences. It segments the text using punctuation together with more complex logic, rather than punctuation alone. In this article, we will see how to use sent_tokenize with an example.

nltk sent_tokenize stepwise Implementation-

This section covers the steps required for tokenization.

Step 1:

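In this step, we will import the underlying package. sent_tokenize is part of nltk.tokenize. The download call also needs the top-level nltk module, so let’s import both.

import nltk
from nltk.tokenize import sent_tokenize
nltk.download('punkt')

‘punkt’ is not a separate package but the pretrained Punkt model data that sent_tokenize needs for sentence extraction. It only has to be downloaded once. Newer NLTK releases may ask for a resource named ‘punkt_tab’ instead; if sent_tokenize raises a LookupError, download the resource named in the error message.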
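Because the data only has to be fetched once, the download call can be guarded so that repeated runs skip it. The following is a minimal sketch, assuming the data lives under the resource path tokenizers/punkt. ensure_punkt is a hypothetical helper name used here for illustration; it is not part of NLTK.

```python
def ensure_punkt():
    """Download the Punkt data only if NLTK cannot already find it.

    Returns True if a download was triggered, False otherwise.
    (Hypothetical helper, not an NLTK function.)
    """
    # Imported inside the function so that defining this helper
    # does not by itself require NLTK to be installed.
    import nltk

    try:
        # nltk.data.find raises LookupError when the resource is missing.
        nltk.data.find("tokenizers/punkt")
        return False
    except LookupError:
        nltk.download("punkt")
        return True
```

Calling ensure_punkt() once before the first sent_tokenize call should be enough in most scripts.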

Step 2:

Secondly, let’s define the text (a complete string) that we need to tokenize.

input = "Well! Data Science Learner is a place for collaborative learning. What are your views?"

Step 3:

Next, we will tokenize the input text using sent_tokenize. Here is the code for this.

print(sent_tokenize(input))

It returns a list of sentences. We can iterate over the list and use each sentence as needed.
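Iterating over the result could look like the sketch below. The sentences list is written out by hand to mirror the three sentences this article reports for the example text; in real use you would pass the return value of sent_tokenize instead. number_sentences is a hypothetical helper name.

```python
def number_sentences(sentences):
    """Return one 'index: sentence' string per sentence, counting from 1."""
    return [f"{i}: {s}" for i, s in enumerate(sentences, start=1)]


# Hand-written stand-in for the list returned by sent_tokenize(input).
sentences = [
    "Well!",
    "Data Science Learner is a place for collaborative learning.",
    "What are your views?",
]

for line in number_sentences(sentences):
    print(line)
# 1: Well!
# 2: Data Science Learner is a place for collaborative learning.
# 3: What are your views?
```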

Complete Code –

Here, we merge the code from all three steps. Now let’s run it.

import nltk
from nltk.tokenize import sent_tokenize
nltk.download('punkt')
input = "Well! Data Science Learner is a place for collaborative learning. What are your views?"
print(sent_tokenize(input))

As the example above shows, the paragraph is split into three sentences. The first is “Well!”, which ends at the “!”. The second ends at the “.”, and the third ends at the “?”.

Note –

If NLTK is not installed on your system, install it with pip.

pip install nltk

On systems where pip still points to Python 2, use pip3 for Python 3.x.

pip3 install nltk

Conclusion-

In this article, we have seen how to tokenize text into sentences in Python using nltk sent_tokenize. There are many ways to split text into sentences. The easiest is to split on punctuation such as “.”, but sent_tokenize does this in a more advanced way. We have given a self-explanatory example; all you need to do is follow these steps. You may alter the example to fit your use case. If you still have any queries, please comment below in the comment box.
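To make the comparison with plain punctuation splitting concrete, here is a minimal sketch of the naive approach using only the standard re module. naive_split is a hypothetical name, and the second test string is an invented example. It shows where splitting on punctuation alone goes wrong, which is the kind of case sent_tokenize is designed to handle better.

```python
import re


def naive_split(text):
    """Split after '.', '!' or '?' whenever whitespace follows."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


print(naive_split("Well! Data Science Learner is a place for collaborative learning. What are your views?"))
# ['Well!', 'Data Science Learner is a place for collaborative learning.', 'What are your views?']

# Abbreviations break the naive rule: every "." followed by a space is a cut.
print(naive_split("Dr. Smith arrived at 5 p.m. today. He was late."))
# ['Dr.', 'Smith arrived at 5 p.m.', 'today.', 'He was late.']
```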

Thanks 

Data Science Learner Team

Meet Abhishek (Chief Editor), a data scientist with major expertise in NLP and text analytics. He has worked on various projects involving text data and has been able to achieve great results. He currently manages Datasciencelearner.com, where he and his team share knowledge and help others learn more about data science.
 