How to Use Custom Stopwords in Python NLP

There are many ways to remove stop words from text, but most of them rely on the stop word lists that are predefined in NLP libraries. If you are looking for how to use custom stopwords in Python NLP, this article walks through it step by step.

1. Create a custom stopword list –

It is a simple list of words (strings) that you want to treat as stopwords. Let’s understand with an example –

custom_stop_word_list = ['you know', 'i mean', 'yo', 'dude']
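Stopword matching is usually an exact string comparison, so it can help to normalize the custom list before using it. The following is a minimal sketch, assuming lowercase matching is what you want; `normalize_stopwords` is a hypothetical helper name used only for illustration, not part of NLTK or any other library.

```python
def normalize_stopwords(words):
    """Lowercase, strip whitespace, and drop duplicates while keeping order."""
    seen = set()
    result = []
    for word in words:
        cleaned = word.strip().lower()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


# Illustrative input with mixed case, stray spaces, and a duplicate entry
custom_stop_word_list = normalize_stopwords(['You know', 'i mean ', 'yo', 'dude', 'Yo'])
print(custom_stop_word_list)
```

Keeping the entries in one consistent form makes the later comparison against tokens more predictable.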

2. Extract the list of stop words from the NLTK corpora (optional) –

This step is optional: if you only want to use the custom list above, you can skip it. Usually, developers and data scientists merge the custom stopword list with a predefined stopword list. Many libraries provide such a list; here we use NLTK:

# Import NLTK and download the stopwords corpus
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords

# Load the English stopwords into a list
NLTK_stop_words_list = stopwords.words('english')
print(NLTK_stop_words_list)
print("Total number of stop words:", len(NLTK_stop_words_list))

Here we have generated the list which contains predefined stop words from the NLTK package.

 


3. Add the custom stop words to the NLTK library’s stop words (nltk add stopwords) –

Let’s see how to combine the custom stop word list with the NLTK library’s stop words –

final_stopword_list = custom_stop_word_list + NLTK_stop_words_list
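Plain list concatenation keeps any word that appears in both lists twice. If that matters, one option is to de-duplicate while merging. This is a small sketch under stated assumptions: `predefined_sample` is a short made-up stand-in for the much longer list returned by `stopwords.words('english')`, and `final_stopword_set` is an illustrative name for a set copy used for faster lookups.

```python
# Stand-in for stopwords.words('english'); the real NLTK list is much longer.
predefined_sample = ['i', 'me', 'you', 'the', 'in']
custom_stop_word_list = ['you know', 'i mean', 'yo', 'dude', 'the']

# dict.fromkeys keeps the first-seen order and drops duplicate entries.
final_stopword_list = list(dict.fromkeys(custom_stop_word_list + predefined_sample))

# A set gives constant-time membership checks when filtering many tokens.
final_stopword_set = set(final_stopword_list)

print(final_stopword_list)
print(len(final_stopword_list))
```

The list form is handy for printing and inspection, while the set form is usually the better choice inside a filtering loop.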

Here final_stopword_list contains stop words from both sources. The final_stopword_list can also be the custom list alone.

 

4. How to remove stop words with Python NLTK?

Here is how to remove stopwords. We will use the final stopword list built above, which includes the custom stopwords.

import nltk
from nltk.tokenize import word_tokenize

# Tokenizer data; newer NLTK releases may ask for 'punkt_tab' instead
nltk.download('punkt')

# Named `text` so the built-in input() is not shadowed
text = "you know Lets see the examples in DataScienceLearner"

tokens = word_tokenize(text)
# Keep only the tokens that are not in the stopword list
sentence_without_stopword = [k for k in tokens if k not in final_stopword_list]
print(tokens)
print(sentence_without_stopword)

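Once you run this code, it removes every token that matches an entry in either the custom or the predefined stopword list. Note that the filter compares one token at a time and is case-sensitive, so a multi-word entry such as 'you know' never matches a single token as a whole, and a capitalized word only matches if the list contains it in that form. That’s it.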
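Since the custom list contains multi-word entries such as 'you know', one way to handle them is to strip phrases from the raw text first and then filter single tokens, ignoring case. The sketch below is an illustration, not the NLTK way of doing it: `remove_stopwords` and `sample_stopwords` are hypothetical names, and plain whitespace splitting stands in for `word_tokenize` so the example has no external dependencies.

```python
import re


def remove_stopwords(text, stopwords):
    """Remove multi-word phrases first, then single-word stopwords, ignoring case."""
    # Longest phrases first, so a longer phrase is not broken by a shorter one
    phrases = sorted((s.lower() for s in stopwords if ' ' in s), key=len, reverse=True)
    words = {s.lower() for s in stopwords if ' ' not in s}

    for phrase in phrases:
        # \b anchors the match to word boundaries; \s+ tolerates extra spaces
        pattern = r'\b' + r'\s+'.join(re.escape(w) for w in phrase.split()) + r'\b'
        text = re.sub(pattern, ' ', text, flags=re.IGNORECASE)

    # Whitespace tokenization as a simple stand-in for word_tokenize
    return [tok for tok in text.split() if tok.lower() not in words]


# Small illustrative list; in practice this would be the merged stopword list
sample_stopwords = ['you know', 'i mean', 'yo', 'dude', 'the', 'in']
print(remove_stopwords("You know lets see the examples in DataScienceLearner", sample_stopwords))
```

Removing phrases before tokenizing is a design choice: once the text is split into single tokens, a phrase such as 'you know' can no longer be recognized by a simple membership check.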

Conclusion –

I hope this article has cleared your doubts about removing standard stopwords as well as custom, user-defined ones. If you have any thoughts or suggestions on this topic, please leave them in the comment box. Apart from this custom approach, you can also use a third-party NLP library such as spaCy to achieve the same result.

Thanks
Data Science Learner Team


Meet Abhishek (Chief Editor), a data scientist with major expertise in NLP and text analytics. He has worked on various projects involving text data and has achieved great results. He currently manages Datasciencelearner.com, where he and his team share knowledge and help others learn more about data science.
 