Prediction Based Word Embedding Techniques

After the frequency-based word embedding techniques, a revolutionary concept arrived in 2013: Word2Vec (Tomas Mikolov). This concept really changed the existing NLP approach. We could build smart chat bots after this algorithm was released, and even Google search became far more powerful after its invention. It was able to capture the context of a word while creating embeddings. Word2Vec is a kind of prediction-based word embedding technique. Actually, Word2Vec is a pre-trained prediction-based embedding model (it covers both the algorithms and training on its own data). It was trained on the Google News corpus. Well, I put these interesting facts in the opening paragraph to generate interest in you. This article will cover prediction-based word embedding techniques.

Prediction Based Word Embedding Techniques –

It is built on two principles. Let's understand them first –

1. Continuous Bag of Words (CBOW) –

Under this approach, we try to predict the target word on the basis of its context. Here the context is nothing but the neighbouring words. We create a window of some fixed size, and all the words within that window size to the left and right of the target word form the context. We then train a shallow neural network (only one hidden layer) on these (context, target) pairs.
Example – Let's understand with the sentence below –
Embedding is essential for all NLP tasks.
If we take the window size as 2 and consider “essential” as the target word, the context will be [“Embedding”, “is”, “for”, “all”] (see the sketch below).
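
Below is a minimal Python sketch of this idea, using just the toy sentence and window size from the example above (it only collects the (context, target) pairs; it is not the full CBOW training):

```python
# Minimal sketch: collect CBOW (context, target) pairs from the toy sentence above.
sentence = "Embedding is essential for all NLP tasks".split()
window_size = 2

cbow_pairs = []
for i, target in enumerate(sentence):
    # Neighbouring words within the window on both sides form the context.
    context = sentence[max(0, i - window_size):i] + sentence[i + 1:i + 1 + window_size]
    cbow_pairs.append((context, target))

# For the target "essential" the context is ['Embedding', 'is', 'for', 'all'].
print(cbow_pairs[2])
```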

2. Skip-gram –

This is just the opposite of Continuous Bag of Words (CBOW). Here we predict the context based on the given word, while in CBOW, as I have already explained, we predict the word based on the context.
Example – Let's understand with an example. For ease, let's take the same sentence –
Embedding is essential for all NLP tasks.
If the window size is 2 and “essential” is the target word, then there will be 4 (target, context) pairs (a code sketch follows the list):

1. essential -> Embedding

2. essential -> is

3. essential -> for

4. essential -> all
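
A minimal Python sketch of how these (target, context) pairs can be generated for every word, again using the toy sentence and window size 2 from above:

```python
# Minimal sketch: generate skip-gram (target, context) pairs from the toy sentence.
sentence = "Embedding is essential for all NLP tasks".split()
window_size = 2

skipgram_pairs = []
for i, target in enumerate(sentence):
    for j in range(max(0, i - window_size), min(len(sentence), i + window_size + 1)):
        if j != i:
            skipgram_pairs.append((target, sentence[j]))

# The 4 pairs for the target "essential":
print([pair for pair in skipgram_pairs if pair[0] == "essential"])
# [('essential', 'Embedding'), ('essential', 'is'), ('essential', 'for'), ('essential', 'all')]
```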

How to choose Skip-gram vs CBOW? –

It is really a million dollar question. There could be multiple scenarios that affect this decision, but we will cover the game-changer reasons, i.e. the ones with the highest impact. Let's understand –
1. In terms of training speed, CBOW is generally faster than Skip-gram, so prefer it when speed matters. Skip-gram's slower training is still acceptable when the data size is not very big.
2. Skip-gram performs well on less frequent (rare) words, while CBOW is good with highly frequent words.
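
In practice, libraries such as gensim let you switch between the two architectures with a single flag. Below is a minimal sketch, assuming gensim 4.x and a tiny in-memory corpus (the corpus is made up for the example):

```python
from gensim.models import Word2Vec

# Tiny illustrative corpus: a list of tokenized sentences (made up for this sketch).
corpus = [
    ["embedding", "is", "essential", "for", "all", "nlp", "tasks"],
    ["word2vec", "captures", "context", "while", "creating", "embeddings"],
]

# sg=1 selects Skip-gram, sg=0 (the default) selects CBOW.
skipgram_model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)
cbow_model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)

# Both models expose one vector per vocabulary word.
print(skipgram_model.wv["embedding"][:5])
print(cbow_model.wv["embedding"][:5])
```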

Word2Vec, FastText and GloVe –

These are pre-trained word embedding models built on big corpora. They mostly perform well on general data. Still, if you have domain-specific data, just go for training your own word embeddings with the same kinds of models (Word2Vec, FastText and GloVe) on your own data.
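
As a minimal sketch of both options with gensim (the pre-trained model name from the gensim-data catalogue and the tiny domain corpus are assumptions for illustration):

```python
import gensim.downloader as api
from gensim.models import FastText

# Option 1: load pre-trained general-purpose vectors
# (name assumed to be available in the gensim-data catalogue).
glove_vectors = api.load("glove-wiki-gigaword-50")
print(glove_vectors.most_similar("embedding", topn=3))

# Option 2: train your own FastText model on a (made up) domain-specific corpus.
domain_corpus = [
    ["patient", "was", "given", "metformin", "for", "diabetes"],
    ["the", "dosage", "of", "metformin", "was", "increased"],
]
domain_model = FastText(domain_corpus, vector_size=50, window=2, min_count=1)
print(domain_model.wv.most_similar("metformin", topn=2))
```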

Conclusion –

Word embedding is really important when it comes to handling the context and co-occurrence of words. Context and co-occurrence are general requirements for most NLP tasks. So are the frequency-based models outdated? Please think and answer. No, they are still important in some scenarios. Obviously, prediction-based embedding solves more problems in general, but it may fail in some scenarios. We will discuss that in a different article.

Thanks

Data Science Learner Team