spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python. It is designed for production use and helps you build applications that process and understand large volumes of text. In this tutorial, I will walk you through implementing spaCy lemmatization in Python step by step.
What is spaCy Lemmatization?
Lemmatization is a common text pre-processing task in NLP that reduces a given word to its root word (lemma). For example, "cars" and "car's" are lemmatized to "car". In the same way, "are", "is", and "am" are lemmatized to "be".
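To make the idea concrete, here is a toy lookup-based lemmatizer in plain Python. This is only an illustration of the mapping, not how spaCy works internally (spaCy uses per-language rules, lookup tables, and part-of-speech information):

```python
# Toy lookup-based lemmatizer, for illustration only.
# The table below is a hypothetical hand-written mapping.
LEMMA_TABLE = {
    "cars": "car",
    "car's": "car",
    "are": "be",
    "is": "be",
    "am": "be",
}

def toy_lemmatize(word):
    # Fall back to the lowercased word itself when no lemma is known
    return LEMMA_TABLE.get(word.lower(), word.lower())

print(toy_lemmatize("Cars"))  # car
print(toy_lemmatize("is"))    # be
```

A real lemmatizer also has to handle words it has never seen, which is why a library-backed approach like spaCy's is preferable in practice.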
Steps to Implement Lemmatization
In this section, you will learn all the steps required to implement spaCy lemmatization. Make sure spaCy is installed on your system before following along, and work through every step for a thorough understanding.
Step 1: Import required package
The first step is to import the necessary libraries. This example uses only spaCy, so import it with a single import statement.
import spacy
Step 2: Load your language model
spaCy supports lemmatization for many languages; you can find the full list in the spaCy documentation. This example uses the English language model, so load it with the spacy.load() method. Make sure you have downloaded the model on your system first.
Download the English model
python -m spacy download en_core_web_sm
Load the English model
nlp = spacy.load("en_core_web_sm")
Step 3: Make a Sample Document
Before doing the spaCy lemmatization, let's first make an NLP document. To do so, pass your text to the nlp object. Add the line of code below.
doc = nlp("Welcome to the Data Science Learner! Here you will learn all things about data science, machine learning, artificial intelligence, and more.")
Step 4: Implement spaCy lemmatization on the document
Now the last step is to lemmatize the document you have created. To do so, loop over the tokens in the document and append each token's lemma to an empty list. Execute the complete code given below.
import spacy

# Load the English model
nlp = spacy.load("en_core_web_sm")

# Create the document
doc = nlp("Welcome to the Data Science Learner! Here you will learn all things about data science, machine learning, artificial intelligence, and more.")

# Collect the lemma of each token
empty_list = []
for token in doc:
    empty_list.append(token.lemma_)

# Join the lemmas back into a single string
final_string = ' '.join(empty_list)
print(final_string)
These are the steps for performing spaCy lemmatization on any document. I have used a small text here, but you can lemmatize large documents in the same way. I hope you found this tutorial helpful. If you have any queries, you can contact us for more help.
You may also read about spaCy tokenization.