Which Is the Best Word Embedding Technique with Domain Data?


Hi guys! I understand how hard it can be to choose a word embedding, especially when your data is specific to one domain. I have some interesting findings on word embedding techniques with domain data, and I am excited to share my experience across different kinds of data. So without any delay, let's start.

Word Embedding Technique with Domain Data –

Let's start with the general understanding of existing word embedding techniques. We tend to believe that predictive word embedding techniques like Word2Vec, FastText and GloVe are far better than frequency-based techniques like TF-IDF or Count Vectorizer. (Strictly speaking, GloVe is fit to co-occurrence counts rather than trained by prediction, but it is usually grouped with the other two.) That is often true, but not in every case. Let's go through three cases.
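As a reference point for the frequency-based side of this comparison, here is a minimal sketch of TF-IDF in plain Python. It assumes whitespace tokenization and the textbook weighting tf * log(N / df). The function name `tfidf_vectors` and the three toy finance sentences are made up for illustration, and libraries such as scikit-learn use a smoothed, normalized variant, so their numbers will differ.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using the textbook tf * log(N / df) weighting."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))

    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid dividing by zero on an empty document
        vectors.append([
            (counts[term] / length) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, vectors


# Hypothetical toy corpus, for illustration only.
docs = [
    "bond yield rises",
    "bond coupon fixed",
    "equity dividend rises",
]
vocab, vectors = tfidf_vectors(docs)

for term, weight in zip(vocab, vectors[0]):
    print(f"{term:10s} {weight:.3f}")
```

Every term in the corpus keeps its own dimension here, so a rare domain word is never dropped. In the first document it actually gets a higher weight than a word that appears in several documents.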

Case 1 : Domain Data and Low Data Volume –

In this case you should apply a frequency-based technique (TF-IDF or Count Vectorizer). If you look at the underlying implementation of Word2Vec and FastText, you will find the concept of down-sampling (also called subsampling): occurrences of frequent words are randomly dropped (not considered) during training, based on a threshold value and the word's frequency. This may lose some important domain words while training. In my experience, TF-IDF performs better than the others on small domain data.
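To see why down-sampling can hurt on a small corpus, here is a rough sketch of the discard probability from the original Word2Vec paper, P(discard) = 1 - sqrt(t / f), where f is the word's relative frequency and t is the threshold. The threshold of 1e-3, the function name `discard_probability` and the tiny finance text are assumptions for illustration. Real implementations use slightly different variants of this formula.

```python
import math
from collections import Counter


def discard_probability(word_count, total_tokens, threshold=1e-3):
    """Word2Vec-paper subsampling: 1 - sqrt(threshold / frequency), floored at 0."""
    frequency = word_count / total_tokens
    return max(0.0, 1.0 - math.sqrt(threshold / frequency))


# Hypothetical tiny domain corpus, for illustration only.
tokens = (
    "the bond yield rose as the bond market priced in a rate hike "
    "while the coupon on the bond stayed fixed"
).split()
counts = Counter(tokens)
total = len(tokens)

# Discard probability for every distinct word in the corpus.
probs = {word: discard_probability(count, total) for word, count in counts.items()}

for word, prob in sorted(probs.items(), key=lambda item: -item[1])[:5]:
    print(f"{word:8s} count={counts[word]} discard_prob={prob:.2f}")
```

In a small corpus every word has a high relative frequency, so a key term like "bond" ends up with a large discard probability. In a corpus with millions of tokens, the same word count would give a probability of zero.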

Case 2 : Domain Data and High Data Volume –

Suppose we have domain data from finance. There will be completely different key terms in your data, so a pretrained model may miss them, because such models are trained on general data sets such as news or Wikipedia text. In this case you should train your own embedding with Word2Vec or FastText on your data. In short, the algorithm is still predictive, but it learns from your in-house data.
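One way to check whether a pretrained model really misses your key terms is to measure how many of your tokens fall outside its vocabulary. The sketch below assumes you already have the pretrained vocabulary as a Python set. The function name `out_of_vocabulary_report`, the small vocabulary and the sentences are all hypothetical.

```python
from collections import Counter


def out_of_vocabulary_report(sentences, pretrained_vocab, top_n=5):
    """Return (share of tokens missing from the vocabulary, most frequent missing terms)."""
    counts = Counter(tok.lower() for sent in sentences for tok in sent.split())
    missing = {tok: c for tok, c in counts.items() if tok not in pretrained_vocab}
    total = sum(counts.values())
    oov_rate = sum(missing.values()) / total if total else 0.0
    # Most frequent missing terms first, ties broken alphabetically.
    top_missing = sorted(missing.items(), key=lambda item: (-item[1], item[0]))[:top_n]
    return oov_rate, top_missing


# Hypothetical stand-in for the vocabulary of a general-purpose pretrained model.
pretrained_vocab = {"the", "bank", "raised", "its", "rate", "on", "a", "loan", "and"}

# Hypothetical in-house finance sentences.
domain_sentences = [
    "the bank raised its repo rate",
    "the cds spread on a loan widened",
    "repo and cds desks hedged",
]

oov_rate, missing_terms = out_of_vocabulary_report(domain_sentences, pretrained_vocab)
print(f"OOV rate: {oov_rate:.0%}")
print("Most frequent missing terms:", missing_terms)
```

If the missing terms are exactly the ones your task depends on, that supports training Word2Vec or FastText on your own corpus, for example with a library such as gensim.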

Case 3 : General and Large Data –

Here all you need is a pretrained embedding model (GloVe, FastText etc.). It gives you good coverage of general vocabulary, which really helps in chatbot or general conversation implementations.
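Pretrained GloVe vectors are distributed as plain text files with one word per line followed by its numbers, so a minimal loader needs only the standard library. The sketch below uses a made-up three-word sample in place of a real file. The function names and the 3-dimensional vectors are assumptions for illustration. FastText `.vec` files begin with an extra header line, which this sketch does not skip.

```python
import io
import math


def load_text_vectors(handle):
    """Parse lines of the form 'word v1 v2 ... vn' into a dict of float lists."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up vectors standing in for a real pretrained file.
sample_file = io.StringIO(
    "hello 0.9 0.1 0.0\n"
    "hi 0.8 0.2 0.1\n"
    "invoice 0.0 0.1 0.9\n"
)
embeddings = load_text_vectors(sample_file)

greeting_similarity = cosine_similarity(embeddings["hello"], embeddings["hi"])
unrelated_similarity = cosine_similarity(embeddings["hello"], embeddings["invoice"])
print(f"hello vs hi:      {greeting_similarity:.2f}")
print(f"hello vs invoice: {unrelated_similarity:.2f}")
```

With a real file you would pass `open(path, encoding="utf-8")` instead of the `StringIO` sample, where `path` points to wherever you saved the download.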

Conclusion –

Choosing the best word embedding technique for domain data is crucial during model development. We have tried to approach this problem through three different scenarios, and I hope they help you choose the best way for your case. Obviously, I am not claiming that my findings and suggestions will work on all types of data. Data is king in data science. As a rough rule of thumb, I would say algorithms play only about a 30 percent role, while about 70 percent is all about the data. Data preprocessing is also a game changer, since it helps extract the meaningful information. So the findings in this post should help you in most cases, but there could be some exceptions as well.

I hope you have found this article useful and interesting. To get more posts like this on Data Science, NLP and Text Analytics, please subscribe to us.


Data Science Learner Team


Meet Abhishek (Chief Editor), a data scientist with major expertise in NLP and Text Analytics. He has worked on various projects involving text data and has been able to achieve great results. He currently manages Datasciencelearner.com, where he and his team share knowledge and help others learn more about data science.