How to create word embedding using FastText

FastText is one of the most popular names in word embedding these days. In short, it was created by Facebook, yet it is open source. You can use FastText in many ways, such as text classification and text representation. In this article, we will explore only text representation. For any machine learning algorithm that works on text, you need to convert the text into numeric vectors. Let's understand how to create word embeddings using FastText.

Using FastText's Own Implementation –

Yes! FastText has its own implementation for word embedding. Here I am sharing the official link to FastText's own implementation of word embeddings.

FastText Word embedding

Here you can use a FastText pre-trained model, or you can train your own embedding model with the FastText algorithms. From an implementation perspective, I suggest you visit the official FastText tutorial on embeddings.

Gensim for FastText Implementation –

Gensim provides another way to apply the FastText algorithm and create word embeddings. Here is a simple code example –

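from gensim.models import FastText
from gensim.test.utils import common_texts  # small tokenized sample corpus

# Gensim 4.x parameter names; Gensim 3.x uses size= and sentences= instead.
model_FastText = FastText(vector_size=4, window=3, min_count=1)
model_FastText.build_vocab(corpus_iterable=common_texts)  # required before train()
model_FastText.train(corpus_iterable=common_texts, total_examples=len(common_texts), epochs=10)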

The example above works in four steps. Let's go through them one by one –

  1. Import the required modules.
  2. You need a corpus for training. Here the corpus must be a list of lists of tokens. Regular text consists of sentences, and you need to split each sentence into tokens, so each sentence becomes a list of tokens. Since any text contains multiple sentences, there is one such list per sentence, and the final data structure is a list of lists. The example uses common_texts, a small sample corpus that ships with Gensim.
  3. Create the FastText object with the required parameters. Here size (named vector_size in Gensim 4.0 and later) is the number of features, i.e. the embedding dimension. With a value of 4, each word is represented by a vector of 4 numbers. The window parameter is the maximum distance between the current word and a context word, and min_count=1 keeps every word, however rare. For more details, have a look at the parameters –
     [Figure: FastText parameters]
  4. The last line is the train command for FastText. It runs 10 epochs over the corpus. Gensim requires the vocabulary to be built with build_vocab before train is called. After training, the vector for a word is available through model_FastText.wv.
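
The list-of-lists structure from step 2 can be sketched in plain Python. This is only a minimal illustration: the function name tokenize_corpus and the sample text are hypothetical, and the regular expressions are a naive stand-in for a proper sentence splitter and tokenizer.

```python
import re


def tokenize_corpus(text):
    """Split raw text into sentences, then each sentence into lowercase tokens."""
    # Naive sentence split: break on whitespace that follows '.', '!' or '?'.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    corpus = []
    for sentence in sentences:
        # Treat runs of letters, digits and apostrophes as tokens.
        tokens = re.findall(r"[a-z0-9']+", sentence.lower())
        if tokens:
            corpus.append(tokens)
    return corpus


# Hypothetical sample text, used only for illustration.
raw_text = "FastText learns word vectors. It was created by Facebook! Is it open source?"
corpus = tokenize_corpus(raw_text)
print(corpus)
```

The result has one inner list per sentence, which is the same shape as common_texts, so it could be passed as the training corpus in the Gensim example.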

FastText Pre-Trained Model –

When you read this title, you must have a question: should you train embeddings on your own training data, or use a FastText pre-trained model? This is one of the big questions for every data scientist, and there is a very clear line between the two. Let's understand them one by one. FastText itself is not a model; it is an algorithm, or library, that we use to train word embeddings. Pre-trained FastText models, however, are ready-made solutions (models) trained on some large corpus. They are beneficial mainly for general-purpose data.

Training on a large corpus involves a heavy computation cost. Also, embedding models such as FastText need a large amount of data to learn good vectors.
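
To make the idea of a ready-made model concrete, here is a minimal sketch of reading pre-trained vectors. It assumes the plain-text .vec layout: a header line with the word count and the vector dimension, then one line per word followed by its values. The function name read_vec and the tiny in-memory sample, including its numbers, are invented for illustration and are not real pre-trained values.

```python
import io


def read_vec(handle, limit=None):
    """Read word vectors from a text stream in the .vec layout into a dict."""
    # Header line: "<number of words> <vector dimension>".
    count, dim = (int(x) for x in handle.readline().split())
    vectors = {}
    for line in handle:
        if limit is not None and len(vectors) >= limit:
            break
        if not line.strip():
            continue  # skip blank lines
        parts = line.rstrip().split(" ")
        word, values = parts[0], [float(v) for v in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"expected {dim} values for {word!r}, got {len(values)}")
        vectors[word] = values
    return vectors


# Invented sample standing in for a real .vec file (2 words, 3 dimensions).
sample = io.StringIO("2 3\nking 0.1 0.2 0.3\nqueen 0.2 0.1 0.4\n")
pretrained = read_vec(sample)
print(pretrained["king"])
```

With a real file, the same function would take an open file handle instead of the StringIO object. The limit argument is there because pre-trained files can hold a very large number of words.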

Conclusion –

Friends, how did you find this article on how to create word embeddings using FastText? Please write your views on this topic. Apart from this article, there are some other key terms you should understand when it comes to word embedding. Word embedding is a vast and hot research topic, so keep reading the latest related content. Please refer to the articles below for reference and basic understanding –

Word Embedding in Python : Different Approaches

Prediction Based Word Embedding Techniques

Which is the Best Word Embedding Technique with Domain Data ?