How to Count Bigrams in NLTK

How to Count Bigrams in NLTK? Stepwise Solution


We can count bigrams in NLTK using nltk.FreqDist(). First we tokenize the raw text, then we convert the tokens into bigrams, and finally we pass those bigrams to nltk.FreqDist(). In this article, we will implement the solution step by step.

 

Count bigrams in nltk (Stepwise) –

This is a multi-step process. We will explain each step one by one.

Step 1: Importing the packages-

To count bigrams in NLTK, we need the nltk package and its Punkt tokenizer data.

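import nltk
# Download the Punkt tokenizer data that nltk.word_tokenize() relies on.
nltk.download('punkt')
# Newer NLTK releases may ask for 'punkt_tab' instead; if a LookupError is raised, download the resource it names.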

Step 2: Tokenize the input text-

In this step, we define the input text and then tokenize it.

text = "This is the best place to learn Data Science Learner"
tokens = nltk.word_tokenize(text)

The nltk.word_tokenize() function splits the text into a list of word tokens.

Step 3: Generate the Bigrams –

In this step, we generate the bigram pairs from the tokens. Here is the code for extracting bigrams from the tokens.

bigrams = nltk.bigrams(tokens)

The nltk.bigrams() function creates the bigrams from the tokens we produced in the previous step. Each bigram is a tuple of two adjacent tokens.
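To make the pairing concrete, here is a minimal pure-Python sketch of what bigram extraction produces. It does not call NLTK: zip() over the token list and the same list shifted by one stands in for nltk.bigrams(), and str.split() stands in for nltk.word_tokenize(). That second substitution is an assumption that only holds for simple text without punctuation, like this sentence.

```python
# Stand-in for nltk.word_tokenize(): plain whitespace splitting (assumption).
text = "This is the best place to learn Data Science Learner"
tokens = text.split()

# Stand-in for nltk.bigrams(): pair each token with the one that follows it.
pairs = list(zip(tokens, tokens[1:]))

for pair in pairs:
    print(pair)

# A sequence of n tokens yields n - 1 bigrams.
print(len(tokens), len(pairs))
```

The ten tokens give nine pairs, starting with ('This', 'is') and ending with ('Science', 'Learner').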

Step 4: Counting the Bigrams-

In the steps above, we extracted the bigrams from the text in the form of a generator. In this step, we pass that generator to nltk.FreqDist() to count how often each bigram occurs.

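# Count how many times each bigram occurs.
frequency = nltk.FreqDist(bigrams)
# Each key is a bigram tuple and each value is its count.
for key, value in frequency.items():
    print(key, value)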

Once we have the frequencies, we can iterate over the key-value pairs. Each key is a bigram tuple and each value is the number of times it occurs.
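One thing to keep in mind is that nltk.bigrams() returns a generator, so it can be consumed only once. The sketch below illustrates this with the standard library alone: adjacent_pairs() is a hypothetical helper that stands in for nltk.bigrams(), and collections.Counter stands in for nltk.FreqDist (which is a Counter subclass, so most_common() is available on both). The sample sentence is also an assumption, chosen so that one bigram repeats.

```python
from collections import Counter


def adjacent_pairs(tokens):
    """Hypothetical stand-in for nltk.bigrams(): lazily yield adjacent pairs."""
    for left, right in zip(tokens, tokens[1:]):
        yield (left, right)


tokens = "to be or not to be".split()
gen = adjacent_pairs(tokens)

first_count = Counter(gen)   # consumes every pair from the generator
second_count = Counter(gen)  # the generator is now exhausted

print(first_count.most_common(1))  # the most frequent bigram and its count
print(len(second_count))           # 0, because nothing was left to count
```

If the bigrams are needed more than once, wrapping the generator in list() before counting keeps them available.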

 

Complete Code –

Let’s combine the code pieces from each step and run the consolidated code.

import nltk
nltk.download('punkt')
text = "This is the best place to learn Data Science Learner"
tokens = nltk.word_tokenize(text)
bigrams = nltk.bigrams(tokens)
frequency = nltk.FreqDist(bigrams)
for key, value in frequency.items():
    print(key, value)

The code above prints the number of occurrences of each bigram in the corpus. We used a very small corpus here, in which every bigram occurs exactly once. You may replace it with a bigger one.
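With a bigger corpus, it can help to count sentence by sentence so that no bigram spans a sentence boundary. The following is only a sketch under stated assumptions: count_bigrams() is a hypothetical helper, the two-sentence corpus is made up for illustration, tokens are lowercased and split on whitespace instead of using nltk.word_tokenize(), and collections.Counter with zip() stands in for nltk.FreqDist with nltk.bigrams().

```python
from collections import Counter


def count_bigrams(sentences):
    """Hypothetical helper: accumulate bigram counts across many sentences."""
    counts = Counter()
    for sentence in sentences:
        # Assumption: lowercase and split on whitespace instead of word_tokenize().
        tokens = sentence.lower().split()
        # Pairs are built within one sentence only, never across sentences.
        counts.update(zip(tokens, tokens[1:]))
    return counts


corpus = [
    "Data Science is fun",
    "Learn data science here",
]
counts = count_bigrams(corpus)

print(counts[("data", "science")])  # appears in both sentences
print(counts[("fun", "learn")])     # crosses a sentence boundary, so it is not counted
```

Lowercasing merges "Data Science" and "data science" into a single bigram; skip that step if case matters for your corpus.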

Conclusion –

There are various ways to count bigrams in raw text, and we have implemented the simplest one here. Most of the steps are self-explanatory, but we have still tried to explain each of them. If you have any doubt related to this topic or article, please let us know in the comment box below.

 

Thanks 

Data Science Learner Team
