How to use Git version control for a Data Science Project?


If you are a data scientist or working on a data science project, you often have to write code from scratch, completely new code. Now suppose your system crashes. That single crash could wipe out all your hard work. Don't worry: the next five minutes with this article can save you from losing your code. In this article you will learn how to use Git version control for data science.

Before exploring Git and GitHub directly, I will explain the basics of version control software and version control repositories. If you already know them, you can jump straight to the next section.

Why use version control software –

Version control software has many use cases. Let's go through them one by one –

  1. It saves a snapshot of your project every time you commit. Each version holds a copy of the project as it was at that point, so after a crash you can recover your code.
  2. Suppose that while developing your data science project you come across a new approach and start working on it. After some time you realize that the new code is breaking the earlier work. With version control software you can roll your project back to an earlier version.
  3. Branching gives you flexibility in how development flows. An example helps here. Suppose you have a project with some set of features, and two clients approach you. The first client needs the project with an extra feature A. At the same time, the second client needs the project with feature B but not A. Branching solves this: you keep one branch per client, and each branch can be developed and delivered on its own, even by two different teams.
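The rollback and branching ideas in points 2 and 3 can be tried in Git Bash. This is a minimal sketch in a throwaway folder, not a recipe for your project: the file names (model.py, feature_a.py, feature_b.py), the branch names client-a and client-b and the demo identity are all hypothetical. It rolls back with git revert, which undoes a commit by adding a new commit, so the history of the failed experiment is kept.

```shell
# Throwaway folder so nothing real is touched
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q
# Demo-only identity so the commits work on a fresh machine
git config user.name "Demo User"
git config user.email "demo@example.com"

# Snapshot 1: the working baseline
echo "baseline model" > model.py
git add model.py
git commit -q -m "Baseline model"
base=$(git rev-parse --abbrev-ref HEAD)   # master or main, depending on your setup

# Snapshot 2: a new approach that breaks the baseline
echo "broken experiment" > model.py
git commit -q -a -m "Try new approach"

# Roll back: add a new commit that undoes the last one
git revert --no-edit HEAD
cat model.py                              # prints: baseline model

# Branching: one branch per client, both starting from the baseline
git checkout -q -b client-a "$base"
echo "feature A" > feature_a.py
git add feature_a.py
git commit -q -m "Add feature A"

git checkout -q -b client-b "$base"
echo "feature B" > feature_b.py
git add feature_b.py
git commit -q -m "Add feature B"

ls                                        # feature_b.py and model.py, no feature_a.py
```

On Git 2.23 or newer, git switch -c client-a does the same job as git checkout -b client-a.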

What is a version control repository –

A repository is the common place where your project and its history are stored. You can host it in the cloud and share it with your team members. There are two types of version control repository –

  1. Central version control repository.
  2. Local version control repository.
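To make the two types concrete, this sketch simulates both on one machine. It is an illustration under assumptions, not how GitHub is set up internally: a bare repository (history only, no working files, the kind a server keeps) plays the central repository, and two clones play the local repositories of two team members. The folder names central.git, alice and bob, the file data_cleaning.py and the identities are hypothetical.

```shell
# Throwaway workspace for the demo
work=$(mktemp -d)
cd "$work" || exit 1

# Central repository: a bare repository stores only history, like a server copy
git init -q --bare central.git

# Alice's local repository: a full clone with working files
git clone -q central.git alice
cd alice || exit 1
git config user.name "Alice"                  # demo-only identity
git config user.email "alice@example.com"
echo "shared cleaning step" > data_cleaning.py
git add data_cleaning.py
git commit -q -m "Add data cleaning script"
git push -q origin HEAD                       # publish to the central repository
cd ..

# Bob's local repository: a fresh clone already contains Alice's work
git clone -q central.git bob
cat bob/data_cleaning.py                      # prints: shared cleaning step
```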

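The names say most of it: the central repository is hosted on a server (such as GitHub) and shared by the whole team, while the local repository is the copy on your own machine, where you work and commit before sharing. Now let's discuss the market leaders in version control software.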

Market leaders in version control software –

  1. Git
  2. Apache Subversion
  3. CVS
  4. Mercurial

By now you have enough of the basics of version control. Let's get specific about Git today.

How to use Git version control –

Using Git is not a big deal; it is just a matter of a few commands. Before we go deeper, here is a question for you: "What is the difference between Git and GitHub?" Git is the version control software, and GitHub is a place (a cloud-based central repository) where you can store and share your code.

Essential steps for working with Git (Windows). Don't worry if you use Linux, macOS or another system: in step 4 you simply open a terminal in the folder instead of Git Bash, and from step 5 onward the Git commands are the same –

1. Create an account on GitHub (github.com).
[Image: How to use Git version control 1]

2. Download and install Git on your system (git-scm.com).

[Image: How to use Git version control 2]

3. Open the folder where you want to create the local repository.

4. Right-click inside the folder and select Git Bash Here.

[Image: How to use Git version control 3]

5. Use the command below to initialize a Git repository –

$ git init

6. Next, either create a new repository on GitHub or connect to an existing one. To connect your local repository to an existing remote repository, run the command –

$ git remote add origin "url_of_your_repository"

Let me tell you where to get this URL. Once you have logged in, open the repository and press the Clone or download button (labelled Code on newer versions of GitHub). Don't worry, the screenshot below shows where to find it.

[Image: How to use Git version control 5]
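Steps 2, 3, 5 and 6 can be sketched as one short Git Bash session. This is a minimal sketch, assuming Git is already installed: a temporary folder stands in for your project folder, and the name, email and repository URL are placeholders to replace with your own. The two git config lines are a one-time setup the steps above do not mention; Git records a name and email in every commit and asks you to set them when they are missing.

```shell
# Step 2 check: Git is installed and on the PATH
git --version

# One-time setup: every commit records an author name and email (placeholders)
git config --global user.name "Your Name"
git config --global user.email "you@example.com"

# Step 3 stand-in: a temporary folder plays the role of your project folder
project=$(mktemp -d)
cd "$project" || exit 1

# Step 5: create the hidden .git folder that will store the project history
git init

# Step 6: connect to the remote repository (placeholder URL). Use the SSH form
# git@github.com:... if you will push with an SSH key, or the HTTPS form
# https://github.com/... otherwise.
git remote add origin git@github.com:your-username/your-repo.git

# Confirm the URL stored for fetching and pushing
git remote -v
```

If the repository already exists on GitHub, git clone followed by its URL does the init, remote and download steps in one go. A mistyped URL can be corrected with git remote set-url origin followed by the right URL.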

I hope that is clear now. This is how you connect to a remote repository. Now take the scenario where you have pulled the code from the repository and added an extra file in your local repository. To commit that file and send it to the central remote repository, follow these steps.

Steps for committing a file in the local Git repository

1. First add (index) the file to the local repository with the command –

$ git add filename.extension 

2. Now commit the file with the command –

$ git commit -m "your commit message"
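Here are the two steps above in one runnable sketch. It assumes a fresh temporary repository with a demo-only identity; the file clean_data.py and its one-line content are hypothetical. git status --short shows the file as staged (A) before the commit, and git log --oneline lists the new snapshot afterwards.

```shell
# Fresh throwaway repository with a demo-only identity
project=$(mktemp -d)
cd "$project" || exit 1
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# A hypothetical new file in the project
echo "import pandas as pd" > clean_data.py

# Step 1: stage (index) the file
git add clean_data.py
git status --short                        # prints: A  clean_data.py

# Step 2: record the snapshot in the local repository
git commit -q -m "Add data cleaning script"
git log --oneline                         # one line per commit: short hash and message
```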

Note – if you need to add a whole folder or commit multiple files (or folders), you can use the Git commands below –

1. To add (stage) every new, modified and deleted file –

$ git add -A

2. To commit, in one step, every modified or deleted file that Git already tracks (new files still need git add first) –

$ git commit -a -m "your commit message"
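The difference between the two commands matters when you create new files, and a small experiment shows it. This is a minimal sketch in a temporary repository with a demo-only identity; train.py and notes.md are hypothetical files. git commit -a picks up the change to the tracked train.py but leaves the new notes.md behind, while git add -A stages everything.

```shell
# Fresh throwaway repository with a demo-only identity
project=$(mktemp -d)
cd "$project" || exit 1
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > train.py
git add train.py
git commit -q -m "First version"

echo "v2" > train.py          # modify a tracked file
echo "notes" > notes.md       # create a new, untracked file

# commit -a stages the tracked change to train.py but skips the new notes.md
git commit -q -a -m "Update training script"
git status --short            # prints: ?? notes.md

# add -A stages everything, including new files
git add -A
git commit -q -m "Add notes"
git status --short            # prints nothing: the working tree is clean
```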

Steps for pushing your code to the central repository –

Once you have committed your code in the local repository, you have to push it to sync with the remote repository. Here are the steps –

1. Add an SSH key to your GitHub account (needed when your remote URL uses the SSH form git@github.com:...). First run the command below –

$ ssh-keygen 

2. This command creates a key pair: a private key and a public key. While ssh-keygen runs, it shows the path where your key is saved; the public key is the file at the same path with .pub added. Use that path in the command below to view the complete public key –

$ cat path_where_your_key_is_saved.pub

3. This shows the complete public key. Now go to your GitHub profile, open Settings, select SSH and GPG keys, then press New SSH key. Paste your key there and save it. Never paste the private key (the file without .pub).


[Image: How to use Git version control 6]
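You can try the key steps above without touching your real keys. This sketch assumes OpenSSH's ssh-keygen, which ships with Git Bash; it writes a demo key pair into a temporary folder, the email label is a placeholder, and the empty passphrase (-N "") is for the demo only.

```shell
# Throwaway folder instead of your real ~/.ssh folder
keydir=$(mktemp -d)

# -t ed25519: key type; -C: a label (placeholder email);
# -f: where to save the key; -N "": no passphrase (demo only); -q: quiet
ssh-keygen -t ed25519 -C "you@example.com" -f "$keydir/id_ed25519" -N "" -q

# Two files: id_ed25519 is the private key (keep it secret),
# id_ed25519.pub is the public key
ls "$keydir"

# Print the public key: this single line is what you paste into GitHub
cat "$keydir/id_ed25519.pub"
```

For your real key, run ssh-keygen -t ed25519 -C "you@example.com", accept the default location (~/.ssh/id_ed25519) and choose a passphrase. After saving the public key on GitHub, ssh -T git@github.com checks that GitHub accepts it.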

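Now the last command pushes your code. Remember, if you have not created an extra branch, you are on the default branch, which is master in a default Git setup (newer setups, and repositories created on GitHub, may call it main instead). Replace branchname with that name –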

$ git push origin branchname 
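You can rehearse pushing without a GitHub account. In this minimal sketch a local bare repository stands in for GitHub, an assumption made only so the example runs offline; the file name and identity are placeholders. The script reads the branch name from Git instead of typing it, because it may be master or main, and -u (--set-upstream) remembers the remote branch so later pushes need only git push.

```shell
# Stand-in for GitHub: a bare repository in a temporary folder
remote=$(mktemp -d)/central.git
git init -q --bare "$remote"

# Your project: a fresh repository with a demo-only identity
project=$(mktemp -d)
cd "$project" || exit 1
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$remote"

echo "print('hello')" > main.py
git add main.py
git commit -q -m "First commit"

# Name of the current branch: master or main, depending on your setup
branch=$(git rev-parse --abbrev-ref HEAD)

# Push and remember origin/<branch> as the upstream for later pushes
git push -u origin "$branch"
```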

Git is the tool that records your commits and sends them from the local repository to a remote repository. That remote repository lives on a code hub such as GitHub; similar commercial services exist under other names, such as Bitbucket.

I will stop here before I say anything else. My question to you: how did you find this article? For any suggestions to improve it, feel free to write back to DataScienceLearner. Although it is written for data science, this article is a simple answer to how to use Git version control in general. If you are completely new to data science, start your journey with the article How to become a Data Scientist; it will help you find the right way to learn data science.

Stay connected with –

Data science learner Team 








Meet Abhishek (Chief Editor), a data scientist with major expertise in NLP and Text Analytics. He has worked on various projects involving text data and has been able to achieve great results. He currently manages, where he and his team share knowledge and help others learn more about data science.