Linux/Unix is one of the most popular platforms for development and analytics. I have seen many developers and data scientists struggle with basic Linux commands. The commands are actually easy, but out of a little laziness we never document them, even though exploring them takes about five minutes. In this article, "Top 10 Linux Commands for Data Scientists", I have shortlisted only the 10 most popular commands from a very long list. I always believe in small steps toward big success. I am a data scientist, and this was my biggest pain area, so I have documented the commands here. Bookmark this article if you tend to forget them.
Top 10 Linux Commands for Data Scientists:
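1. find –
This command searches for files in a directory, and it searches the subdirectories recursively. Note that -name takes a shell wildcard pattern, not a regular expression. Here is the syntax:
find [where_to_search] -name 'pattern' -type [file_type]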
➜ etc find . -name '*trans*' -type f
./filetransfer.txt
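To make the find example easier to try, here is a minimal sketch that builds a throwaway directory and runs the same search against it. The directory layout and file names (other than filetransfer.txt) are made up for illustration.

```shell
# Build a small throwaway directory tree (hypothetical names).
workdir=$(mktemp -d)
mkdir -p "$workdir/data/raw"
touch "$workdir/filetransfer.txt" "$workdir/data/raw/transactions.csv"
mkdir "$workdir/translations"   # a directory whose name also contains "trans"

# Regular files (-type f) whose name contains "trans", searched recursively from "."
found_files=$(cd "$workdir" && find . -name '*trans*' -type f | sort)
echo "$found_files"

# Same pattern, but directories only (-type d)
found_dirs=$(cd "$workdir" && find . -name '*trans*' -type d)
echo "$found_dirs"
```

The -type filter is what keeps the translations directory out of the first result, even though its name matches the pattern.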
2. grep –
Once you have found the file, you usually need to search for a pattern inside it. The grep command does this, and it has many options that make the search more effective. Let's go through them one by one:
Syntax: grep "WhatToSearch" filename
- You may use a regex in place of the string (WhatToSearch), and a wildcard pattern in place of the filename.
- By default the grep command is case sensitive. To make it case insensitive, use "grep -i". For example:
grep -i "whatToSearch" filename
See the grep manual (man grep) for more details.
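Here is a small sketch of the grep options on a sample file. The log file and its contents are invented for the example; -i, -n, -c and -E are standard grep options.

```shell
# Create a tiny sample log (hypothetical content).
workdir=$(mktemp -d)
printf 'Error: disk full\nall good\nerror: timeout\nWARNING: retry\n' > "$workdir/app.log"

# Default search is case sensitive: only the lowercase "error" line matches
sensitive=$(grep "error" "$workdir/app.log")

# -i ignores case, -n prefixes each match with its line number
insensitive=$(grep -i -n "error" "$workdir/app.log")

# -c prints only the number of matching lines
match_count=$(grep -i -c "error" "$workdir/app.log")

# -E enables extended regex: lines starting with an all-uppercase word and a colon
regex_hits=$(grep -E '^[[:upper:]]+:' "$workdir/app.log")

echo "$sensitive"
echo "$insensitive"
echo "$match_count"
echo "$regex_hits"
```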
3. Cut –
This is very useful for quick filtering, and it works best on column-oriented (delimited) data. Here is the syntax, followed by an example:
cut -d 'separator' -f column_no filename
cut -d ',' -f 5 filename.csv
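The sketch below runs cut on a made-up CSV file so you can see what each form returns. The file name and its rows are assumptions for illustration. Keep in mind that cut splits on every separator, so it is not suitable for CSV files that contain quoted commas.

```shell
# Sample CSV with five columns (hypothetical data).
workdir=$(mktemp -d)
printf 'id,name,city,age,salary\n1,Asha,Pune,31,5200\n2,Ben,Leeds,45,6100\n' > "$workdir/people.csv"

# Fifth comma-separated column only
salary_col=$(cut -d ',' -f 5 "$workdir/people.csv")

# Several columns at once: fields 2 and 5
name_salary=$(cut -d ',' -f 2,5 "$workdir/people.csv")

# A range of columns: fields 1 through 3
first_three=$(cut -d ',' -f 1-3 "$workdir/people.csv")

echo "$salary_col"
echo "$name_salary"
echo "$first_three"
```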
4. Wget Command –
In case you need to download something from a remote location, use this command. Here is the simple syntax:
~$ wget target_link
5. history –
We have all faced this situation: we worked out a command, but it has since disappeared from the screen, and when we need it again we have to hunt for it. The smart solution is the history command, which lists the commands you ran earlier:
~$ history
6. head –
Often we need to see the structure of a file. There is no need to open the whole file for that; just print its first few lines. This is usually needed to see the header of a CSV or similar tabular text file, because most analytics software requires the column names to be mapped to the file. Next time, use this command in that scenario. Here is the syntax for the head command:
~$ head -n 5 filename
Here the value of n is the number of lines to print from the top of the file.
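As a quick illustration of inspecting a header with head, here is a sketch on an invented CSV file. The file name and columns are assumptions; the tr step is just one simple way to list the column names one per line.

```shell
# Sample CSV (hypothetical data).
workdir=$(mktemp -d)
printf 'id,name,score\n1,a,10\n2,b,20\n3,c,30\n' > "$workdir/scores.csv"

# First line only: the header row
header=$(head -n 1 "$workdir/scores.csv")

# Header split into one column name per line
columns=$(head -n 1 "$workdir/scores.csv" | tr ',' '\n')

# Header plus the first two data rows
preview=$(head -n 3 "$workdir/scores.csv")

echo "$header"
echo "$columns"
echo "$preview"
```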
7. tail –
It is quite similar to the head command but works from the opposite end: it prints the last lines of the file. Here is the syntax:
tail -n 15 filename
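Here is a short sketch of tail on a made-up file. Besides printing the last lines, tail accepts a number with a plus sign (-n +2), which starts printing at that line; this is a handy way to skip a header row. The file name and data are assumptions.

```shell
# Sample CSV (hypothetical data).
workdir=$(mktemp -d)
printf 'id,name,score\n1,a,10\n2,b,20\n3,c,30\n' > "$workdir/scores.csv"

# Last two lines of the file
last_two=$(tail -n 2 "$workdir/scores.csv")

# Everything from line 2 onward, i.e. the data without the header
body=$(tail -n +2 "$workdir/scores.csv")

echo "$last_two"
echo "$body"
```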
8. awk –
awk is a complete topic in its own right, and covering it inline here would do it an injustice. I have included it only because I really want you to look it up. awk processes and filters text files, and it is especially good with column-based data. I recommend reading a detailed guide on awk.
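To give a first taste of awk, here is a minimal sketch on an invented CSV file. It only touches the basics: -F sets the field separator, $1, $2, $3 are the fields of the current line, NR is the current line number, and the END block runs after the last line. The file name and data are assumptions.

```shell
# Sample CSV (hypothetical data).
workdir=$(mktemp -d)
printf 'name,dept,salary\nAsha,eng,5200\nBen,ops,6100\nCara,eng,4800\n' > "$workdir/staff.csv"

# Filter rows: skip the header (NR > 1) and keep names with salary above 5000
high_paid=$(awk -F ',' 'NR > 1 && $3 > 5000 { print $1 }' "$workdir/staff.csv")

# Sum a column, skipping the header
total=$(awk -F ',' 'NR > 1 { sum += $3 } END { print sum }' "$workdir/staff.csv")

# Sum the same column for one department only
eng_total=$(awk -F ',' '$2 == "eng" { sum += $3 } END { print sum }' "$workdir/staff.csv")

echo "$high_paid"
echo "$total"
echo "$eng_total"
```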
9. wc –
This Linux shell command helps a data scientist find the number of lines and words in a file.
For example –
$ wc -l filename.txt
Here wc -l gives the number of lines in the file. To count the words in the file instead, use:
$ wc -w filename.txt
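The sketch below shows one way to capture the counts in shell variables, using a made-up text file. Reading the file through stdin (with <) makes wc print only the number, without the file name; the tr step removes the padding spaces that some wc implementations add.

```shell
# Sample text file (hypothetical content): 2 lines, 5 words.
workdir=$(mktemp -d)
printf 'one two three\nfour five\n' > "$workdir/notes.txt"

# Number of lines, as a bare number
line_count=$(wc -l < "$workdir/notes.txt" | tr -d ' ')

# Number of words, as a bare number
word_count=$(wc -w < "$workdir/notes.txt" | tr -d ' ')

echo "lines: $line_count"
echo "words: $word_count"
```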
10. cat –
It comes last in the list, but it is not the least. In fact, it is one of the most popular commands among us. We use the cat command to print the contents of a file. We can also merge (concatenate) two files into one with it. Here is the syntax for the cat command:
cat input1.csv input2.data > output.csv
This is one of the commands I need most as a data scientist, and I hope the same will be true for you. It covers almost 80 percent of my everyday Linux work.
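One thing to watch when concatenating CSV files with cat: if both files have a header row, the second header ends up in the middle of the output. Here is a sketch of the plain merge and one possible workaround that combines cat with tail. The file names and data are made up, and this is only one way to handle it.

```shell
# Two sample CSV files with the same header (hypothetical data).
workdir=$(mktemp -d)
printf 'id,score\n1,10\n2,20\n' > "$workdir/input1.csv"
printf 'id,score\n3,30\n' > "$workdir/input2.csv"

# Plain concatenation: the second header appears in the middle of the result
cat "$workdir/input1.csv" "$workdir/input2.csv" > "$workdir/naive.csv"

# Workaround: keep the first file whole, append the second without its header
cat "$workdir/input1.csv" > "$workdir/output.csv"
tail -n +2 "$workdir/input2.csv" >> "$workdir/output.csv"

naive_headers=$(grep -c '^id,score$' "$workdir/naive.csv")
merged_headers=$(grep -c '^id,score$' "$workdir/output.csv")
merged_lines=$(wc -l < "$workdir/output.csv" | tr -d ' ')

cat "$workdir/output.csv"
```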
Sometimes these small lessons help a lot. What usually happens when we decide to learn something? We invest time in finding the best tutorial around. We usually find a detailed one, but we do not start. Sometimes we start but stop early because it seems too big. This article is not a tutorial; it is really about the mindset of taking small steps. Let me know your views on this mindset. Does this article affect your performance in any way? Please let us know. And if you have any doubts about the commands mentioned above, please write back to us.
Data Science Learner Team