There are many ways to remove stop words from text, but the stop word lists that ship with NLP libraries are predefined. If you are looking for how to use custom stop words in Python NLP, this article answers that step by step. Let’s add custom stop words in Python –
1. Create a custom stop word list in Python NLP –
It is a simple list of words (strings) that you want to treat as stop words. Let’s understand with an example –
custom_stop_word_list=['you know', 'i mean','yo','dude']
2. Extract the list of stop words from the NLTK corpora (optional) –
This step is optional: if you only want to use the custom list above, you can skip it. Usually, developers and data scientists merge the custom stop word list with a predefined stop word list. Many libraries provide such a list. Here we get it from NLTK –
# Importing the NLTK stopwords package
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords

# Loading stop words into a list
NLTK_stop_words_list = stopwords.words('english')
print(NLTK_stop_words_list)
print("Total numbers of stop words are ")
print(len(NLTK_stop_words_list))
Here we have generated the list which contains predefined stop words from the NLTK package.
3. Add the custom stop word list to the NLTK library’s stop words (nltk add stopwords) –
Because both are plain Python lists, they can be merged by concatenation –
final_stopword_list = custom_stop_word_list + NLTK_stop_words_list
Here final_stopword_list contains stop words from both sources.
final_stopword_list can also consist of the custom list alone.
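Plain concatenation keeps any word that appears in both lists twice, and checking membership in a long list is slower than in a set. The following is a minimal sketch of one way to deduplicate the merged list. The short `nltk_stop_words_sample` list is a hypothetical stand-in for `stopwords.words('english')` so that the snippet runs without downloading anything, and the names `merged_stop_words` and `stop_word_set` are illustrative only.

```python
# Hypothetical stand-in for stopwords.words('english'); the real NLTK list is much longer.
nltk_stop_words_sample = ["i", "you", "the", "in", "yo"]
custom_stop_word_list = ["you know", "i mean", "yo", "dude"]

# dict.fromkeys keeps the first occurrence of each entry and preserves order,
# so "yo" (present in both lists here) appears only once.
merged_stop_words = list(dict.fromkeys(custom_stop_word_list + nltk_stop_words_sample))

# A set gives fast membership checks when filtering tokens later.
stop_word_set = set(merged_stop_words)

print(merged_stop_words)
print(len(merged_stop_words), "unique stop words")
```

With the real NLTK list, the same two lines would apply to custom_stop_word_list + NLTK_stop_words_list.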
4. How to remove stop words in Python with NLTK?
Here is how to remove stop words. We will use final_stopword_list, which contains the custom stop words merged with the NLTK list.
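import nltk
from nltk.tokenize import word_tokenize

# Download the tokenizer models (newer NLTK releases may ask for 'punkt_tab' instead)
nltk.download('punkt')

# Avoid naming the variable "input", which would shadow the Python built-in
input_text = "'you know Lets see the examples in DataScienceLearner"
tokens = word_tokenize(input_text)

# Keep only the tokens that are not in the stop word list
sentence_without_stopword = [k for k in tokens if k not in final_stopword_list]
print(tokens)
print(sentence_without_stopword)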
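Once you run this code, it removes every token that appears in the custom or the predefined stop word list. Note that the comparison is made one token at a time and is case-sensitive, so a multi-word entry such as 'you know' never matches a single token; it only takes effect if its individual words are also in the list. That’s it.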
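Multi-word entries such as 'you know' and 'i mean' need separate handling, because a token-by-token filter only ever sees one word at a time. Below is a minimal sketch of one possible approach: remove the phrases from the raw text first, then filter single words case-insensitively. It assumes a simple regex tokenizer (`re.findall`) as a stand-in for `word_tokenize` so it runs without NLTK data, and `remove_custom_stop_words` is a hypothetical helper name, not part of NLTK.

```python
import re

custom_stop_word_list = ["you know", "i mean", "yo", "dude"]

# Separate multi-word phrases from single-word entries.
phrases = [w for w in custom_stop_word_list if " " in w]
single_words = {w.lower() for w in custom_stop_word_list if " " not in w}

def remove_custom_stop_words(text):
    """Remove stop phrases from the raw text, then filter single stop words."""
    for phrase in phrases:
        # \b keeps matches on word boundaries; re.escape guards special characters.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, " ", text, flags=re.IGNORECASE)
    # Simple stand-in tokenizer: runs of word characters, punctuation dropped.
    tokens = re.findall(r"\w+", text)
    # Lowercase each token before the lookup so "Dude" and "dude" both match.
    return [t for t in tokens if t.lower() not in single_words]

result = remove_custom_stop_words("You know, dude, I mean the examples are simple")
print(result)
```

The same idea should carry over to the merged list: strip the phrases first, tokenize with word_tokenize, and then compare lowercased tokens against the single-word entries.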
I hope this article has cleared your doubts about removing standard stop words as well as custom, user-defined ones. If you have any thoughts or suggestions on this topic, please leave them in the comment box. Apart from this custom approach, you can also use a third-party NLP library such as spaCy to achieve the same result.
Data Science Learner Team