After the frequency-based word embedding techniques, a revolutionary concept arrived in 2013: Word2Vec (Tomas Mikolov and colleagues at Google). It really changed the existing NLP approach. Smarter chatbots and many other NLP applications became easier to build after its release. Unlike the frequency-based techniques, it is able to capture context while creating embeddings. Word2Vec is a prediction-based word embedding technique. It is also available as a pretrained model (covering both the algorithms and vectors already trained on its own data), trained on the Google News corpus. I put these interesting facts in the opening paragraph to spark your interest. This article covers prediction-based word embedding techniques.
It is built on two principles, Continuous Bag of Words (CBOW) and Skip-gram. Let's understand them first.
Under the Continuous Bag of Words (CBOW) approach, we try to predict the target word on the basis of its context. Here the context is nothing but the neighbouring words. We create a window of some fixed size; all the words to the left and right of the target word that fall within the window form the context. We train a shallow neural network (only one hidden layer).
Example – Let's understand with the sentence below:
Embedding is essential for all NLP stuffs.
If we take a window size of 2 and consider “essential” as the target word, these tokens will be the context: [“Embedding”, “is”, “for”, “all”].
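The windowing step above can be sketched in a few lines of plain Python. This is only an illustration of how one CBOW training example could be assembled; the function name `cbow_example` and the whitespace tokenization are assumptions made for this sketch, not part of Word2Vec itself.

```python
def cbow_example(tokens, target_index, window=2):
    """Return (context_words, target_word) for one position in a sentence."""
    # Words up to `window` positions to the left of the target.
    left = tokens[max(0, target_index - window):target_index]
    # Words up to `window` positions to the right of the target.
    right = tokens[target_index + 1:target_index + 1 + window]
    return left + right, tokens[target_index]


# Naive whitespace tokenization, good enough for this toy sentence.
sentence = "Embedding is essential for all NLP stuffs".split()
context, target = cbow_example(sentence, sentence.index("essential"), window=2)
print(context, "->", target)
# ['Embedding', 'is', 'for', 'all'] -> essential
```

Near the start or end of a sentence the window is simply cut short, so the first word “Embedding” gets only [“is”, “essential”] as its context.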
Skip-gram is just the opposite of Continuous Bag of Words (CBOW). Here we predict the context based on the given word, whereas in CBOW, as already explained, we predict the word based on its context.
Example – Let's understand with an example. For ease, let's take the same sentence:
Embedding is essential for all NLP stuffs.
If the window size is 2 and the given word is “essential”, there will be 4 pairs of target and context:
1. essential -> Embedding
2. essential -> is
3. essential -> for
4. essential -> all
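The pairs listed above can be generated for every word in the sentence with a small helper. This is a minimal sketch in plain Python; `skipgram_pairs` is a hypothetical name chosen for this example, and real implementations work on integer word ids rather than strings.

```python
def skipgram_pairs(tokens, window=2):
    """Return a list of (target, context) pairs for every position."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)                # left edge of the window
        hi = min(len(tokens), i + window + 1)  # right edge (exclusive)
        for j in range(lo, hi):
            if j != i:                         # skip the target itself
                pairs.append((target, tokens[j]))
    return pairs


sentence = "Embedding is essential for all NLP stuffs".split()
pairs = skipgram_pairs(sentence, window=2)
essential_pairs = [p for p in pairs if p[0] == "essential"]
for target, context in essential_pairs:
    print(target, "->", context)
# essential -> Embedding
# essential -> is
# essential -> for
# essential -> all
```

Each pair becomes one training example in which the network sees the target word as input and is asked to predict the context word.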
Which one should you choose, CBOW or Skip-gram? It is really a million-dollar question. Multiple factors could affect this decision, but we will cover the game-changing ones, I mean those with the highest impact. Let's understand:
1. If fast training is the priority, go with CBOW; in general, CBOW is faster to train than Skip-gram. Skip-gram is the better fit when the data size is not very big.
2. Skip-gram performs well on less frequent words. CBOW is good with highly frequent words.
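One likely contributor to the speed difference is the number of training examples each approach produces from the same text: CBOW makes one prediction per word position, while Skip-gram makes one prediction per (target, context) pair. The sketch below just counts both for a toy sentence. The function name `count_training_examples` is hypothetical, and the count is a simplification, since real implementations apply further tricks (such as subsampling very frequent words) that change the exact numbers.

```python
def count_training_examples(tokens, window=2):
    """Count training examples per approach for one tokenized sentence."""
    cbow = 0
    skipgram = 0
    for i in range(len(tokens)):
        # Number of neighbours that actually fall inside the window.
        n_context = min(len(tokens), i + window + 1) - max(0, i - window) - 1
        if n_context > 0:
            cbow += 1              # one (context -> target) example
            skipgram += n_context  # one example per context word
    return cbow, skipgram


sentence = "Embedding is essential for all NLP stuffs".split()
cbow_count, skipgram_count = count_training_examples(sentence, window=2)
print("CBOW examples:", cbow_count)
print("Skip-gram examples:", skipgram_count)
# CBOW examples: 7
# Skip-gram examples: 22
```

The same arithmetic hints at why Skip-gram tends to do better on rare words: every occurrence of a rare word yields several separate updates instead of a single one averaged over its context.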
Pretrained word embedding models are trained on a big corpus. Mostly they perform well on general data. Still, if you have domain-specific data, just train your own word embeddings with the same algorithms (Word2Vec, FastText and GloVe) on your own data.
Word embedding is really important when it comes to handling the context and co-occurrence of words. Context and co-occurrence are a general requirement for most NLP tasks. So are frequency-based models outdated? Please think and answer. No, they are also important in some scenarios. Prediction-based embedding generally solves more problems, but it may fail in some scenarios. We will discuss that in a different article.
Thanks
Data Science Learner Team