FastText is one of the popular names in word embedding these days. In short, it was created by Facebook, and it is open source. You can use FastText in many ways, such as text classification and text representation. In this article, we will only explore text representation. For any machine learning algorithm that works on text, you first need to convert the text into numeric vectors. Let's understand how to create word embeddings using FastText.
Using FastText's Own Implementation –
Yes! FastText has its own implementation for word embeddings. Here is the official link for FastText's own implementation of word embeddings.
There you can use a pre-trained FastText model, or you can train your own embedding model with the FastText algorithms. From an implementation perspective, I suggest you visit the official FastText tutorial on embeddings.
Gensim for FastText Implementation –
Gensim provides another way to apply the FastText algorithms and create word embeddings. Here is a simple code example –
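from gensim.models import FastText
from gensim.test.utils import common_texts  # small sample corpus shipped with gensim

# vector_size was named size before gensim 4.0
model_FastText = FastText(vector_size=4, window=3, min_count=1)
# the vocabulary must be built before train() is called
model_FastText.build_vocab(corpus_iterable=common_texts)
model_FastText.train(corpus_iterable=common_texts, total_examples=len(common_texts), epochs=10)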
The example above is short. Let's go through it step by step –
- Import the required modules.
- You need a corpus for training. Here the corpus must be a list of lists of tokens. Regular text consists of sentences, and each sentence has to be split into tokens, so every element of the outer list is the token list of one sentence. Since any text data contains multiple sentences, there is one inner list per sentence, and the final data structure is a list of lists.
- Create the FastText object with the required parameters. Here size (renamed vector_size in gensim 4.0 and later) is the number of features, or embedding dimensions. To make this concrete, a value of 4 means that each word is represented by a vector of 4 numbers (4 columns). For more details, please have a look here –
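- The last line calls train to run training on the corpus for the given number of epochs. Note that gensim needs the vocabulary to be built (with build_vocab) before train is called.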
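To make the "list of lists of tokens" structure concrete, here is a minimal sketch of turning raw text into such a corpus. The helper name to_corpus and the simple regex-based sentence and word splitting are assumptions made for illustration. They are not part of FastText or gensim, and a real project would probably use a proper tokenizer.

```python
import re


def to_corpus(text):
    """Split raw text into sentences, then each sentence into lowercase tokens.

    Hypothetical helper for illustration only: it splits sentences on
    whitespace that follows '.', '!' or '?', and keeps runs of letters,
    digits and apostrophes as tokens.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    corpus = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z0-9']+", sentence.lower())
        if tokens:  # skip sentences that produced no tokens
            corpus.append(tokens)
    return corpus


raw_text = "FastText learns word vectors. It works on tokenized sentences!"
corpus = to_corpus(raw_text)

# One inner list per sentence, one string per token.
print(corpus)
```

A corpus built this way has the same shape as the sample corpus used in the gensim example, so it could be passed to the model in its place.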
FastText Pre-Trained Model –
When you read this title, you probably have a question: should you train embeddings on your own data, or use a pre-trained FastText model? This is one of the big questions for every data scientist, but there is a clear line between the two. Let's understand them one by one. FastText itself is not a model. It is an algorithm, or library, that we use to train word embeddings. Pre-trained FastText models, on the other hand, are ready-made models that were already trained on some large corpus. They are mainly beneficial when your data is general-purpose text.
Training on large data involves a heavy computation cost. Also, embedding models such as FastText need a lot of data to train well.
Friends, how did you find this article on how to create word embeddings using FastText? Please share your views on this topic. Apart from this article, there are some other key terms you should understand when it comes to word embedding. Word embedding is a vast and active research topic, so keep reading the latest related content. Please refer to the article below for reference and basic understanding –