 Gradient Descent Algorithm variations

Gradient descent is an optimization algorithm for minimizing a differentiable function. It iteratively adjusts the function's coefficients toward the values that minimize it. In machine learning and deep learning, training comes down to finding the weights of the neurons that minimize the cost function. The lower the cost function, the better the model fits the training data.

## The formula for Gradient Descent

Suppose we have a neural network with weights w, a learning rate η, and a loss function L. A gradient descent optimizer updates each weight as follows:

$$ w \leftarrow w - \eta \, \frac{\partial L}{\partial w} $$

In a single update step (one forward pass followed by backpropagation), the gradient can be computed from a single training observation or from several at a time.
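To make the update rule concrete, here is a minimal sketch that applies w ← w − η · dL/dw repeatedly to a toy one-parameter loss. The loss L(w) = (w − 3)², the starting point, and the function names are assumptions chosen for illustration; they are not part of any particular library.

```python
def loss(w):
    """Toy loss L(w) = (w - 3)^2, which is minimized at w = 3."""
    return (w - 3.0) ** 2


def grad(w):
    """Derivative of the toy loss: dL/dw = 2 * (w - 3)."""
    return 2.0 * (w - 3.0)


def gradient_descent(w0, eta, steps):
    """Apply the update w <- w - eta * dL/dw for a fixed number of steps."""
    w = w0
    for _ in range(steps):
        w = w - eta * grad(w)
    return w


w_final = gradient_descent(w0=0.0, eta=0.1, steps=100)
print(w_final, loss(w_final))  # w_final ends up very close to 3
```

With η = 0.1 the distance to the minimum shrinks by a factor of 0.8 on every step. A much larger η (above 1.0 for this particular loss) would make the updates overshoot and diverge.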

## Variations of Gradient Descent

There are several variations of the gradient descent algorithm. They differ in how many training observations are used for each weight update.

### 1. Batch gradient descent

In this variation of gradient descent, every weight update uses the loss over the complete training set, so one iteration is one full pass (epoch) over the data. This total loss is the cost function for the neural network. This is standard gradient descent. It has some advantages and disadvantages.
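As a rough sketch of batch gradient descent, the example below fits a straight line y = w·x + b by mean squared error, using the whole dataset for every update. The tiny dataset, the learning rate, and the function name `batch_gradient_descent` are assumptions made for illustration.

```python
# Toy dataset generated from y = 2x + 1 (hypothetical values for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def batch_gradient_descent(xs, ys, eta=0.05, epochs=2000):
    """Fit y = w*x + b by minimizing mean squared error over the full dataset."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Accumulate the gradient over every training observation.
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = (w * x + b) - y
            grad_w += 2.0 * error * x / n
            grad_b += 2.0 * error / n
        # Exactly one weight update per pass over the data.
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b


w_batch, b_batch = batch_gradient_descent(xs, ys)
print(w_batch, b_batch)  # approaches w = 2, b = 1
```

Note that the inner loop touches every observation before a single update is made, which is why the cost of one step grows with the size of the training set.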

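#### Advantages of batch gradient descent

1. It is computationally efficient per epoch because the whole training set is processed in one go, which suits vectorized computation. Only one weight update is needed per epoch.

2. The updates oscillate less, so convergence is smooth. For a convex cost function it converges to the global minimum, provided the learning rate is small enough.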

#### Disadvantages of batch gradient descent

1. It can get stuck in a local minimum. Because its steps contain no noise, once it settles into a local minimum it cannot get out of it.

2. Although it is computationally efficient, it is not fast. Each update must process the complete dataset, which also has to be loaded into memory at once, so it becomes slow and memory-hungry on large datasets.
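The local-minimum problem from the first disadvantage can be illustrated with a small sketch. The non-convex cost f(w) = w⁴ − 3w² + w used here is an assumption picked for illustration: it has a local minimum near w ≈ 1.13 and a lower minimum near w ≈ −1.30. Since the noise-free update is fully deterministic, the starting point alone decides where it ends up.

```python
def f(w):
    """Non-convex toy cost: local minimum near w = 1.13, lower minimum near w = -1.30."""
    return w ** 4 - 3.0 * w ** 2 + w


def f_grad(w):
    """Derivative of the toy cost: 4w^3 - 6w + 1."""
    return 4.0 * w ** 3 - 6.0 * w + 1.0


def descend(w, eta=0.01, steps=1000):
    """Plain, noise-free gradient descent from the starting point w."""
    for _ in range(steps):
        w -= eta * f_grad(w)
    return w


w_right = descend(2.0)   # starts to the right of the local minimum
w_left = descend(-2.0)   # starts to the left of the lower minimum

print(w_right, f(w_right))  # settles in the shallower local minimum
print(w_left, f(w_left))    # settles in the lower minimum
```

Starting from w = 2.0, the run stays in the shallower valley no matter how many more steps it is given, because nothing in the update can push it over the hump between the two minima.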

### 2. Stochastic gradient descent

This variation works the same way as batch gradient descent, with one difference: each update (iteration) uses a single training observation, so one epoch consists of as many updates as there are observations. Hence it is better in terms of the cost of each update and memory use, as described below.
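A minimal sketch of stochastic gradient descent, fitting a line y = w·x + b to a toy dataset, might look like the following. The dataset, learning rate, fixed random seed, and the name `stochastic_gradient_descent` are assumptions for illustration; shuffling the observations once per epoch is one common choice, not the only one.

```python
import random

# Toy dataset generated from y = 2x + 1 (hypothetical values for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def stochastic_gradient_descent(xs, ys, eta=0.02, epochs=2000, seed=0):
    """Fit y = w*x + b, updating the weights after every single observation."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the observations in a random order
        for i in indices:
            # Gradient of the squared error on this one observation only.
            error = (w * xs[i] + b) - ys[i]
            w -= eta * 2.0 * error * xs[i]
            b -= eta * 2.0 * error
    return w, b


w_sgd, b_sgd = stochastic_gradient_descent(xs, ys)
print(w_sgd, b_sgd)  # approaches w = 2, b = 1 on this noise-free data
```

Only one observation is needed to compute each update, so the data could just as well be read one record at a time instead of being held in memory all at once.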

#### Advantages of stochastic gradient descent

Each update is computationally very fast because it uses only one observation. It is also memory efficient, since only one observation from the complete dataset has to be considered at a time.

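#### Disadvantage of stochastic gradient descent

It does not converge in a straight path. Each update is based on a single observation, so the gradient is a noisy estimate of the true gradient, and the loss fluctuates on its way to the minimum instead of decreasing steadily.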
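The effect of this noise can be seen in a small sketch that tracks the full-dataset loss after every update. The setup is an assumption chosen to keep the arithmetic easy to follow: a constant model w is fitted to two observations, 0 and 2, so the best value is w = 1. For repeatability the single-observation updates cycle through the data in order, whereas real implementations usually pick observations at random.

```python
ys = [0.0, 2.0]  # two hypothetical observations; the best constant fit is w = 1


def full_loss(w):
    """Mean squared error of the constant model w over the whole dataset."""
    return sum((w - y) ** 2 for y in ys) / len(ys)


def batch_losses(w, eta, steps):
    """Loss after each update when the gradient uses every observation."""
    losses = [full_loss(w)]
    for _ in range(steps):
        grad = sum(2.0 * (w - y) for y in ys) / len(ys)
        w -= eta * grad
        losses.append(full_loss(w))
    return losses


def sgd_losses(w, eta, steps):
    """Loss after each update when the gradient uses one observation."""
    losses = [full_loss(w)]
    for step in range(steps):
        y = ys[step % len(ys)]  # one observation at a time, in a fixed cycle
        w -= eta * 2.0 * (w - y)
        losses.append(full_loss(w))
    return losses


batch = batch_losses(w=5.0, eta=0.1, steps=20)
sgd = sgd_losses(w=5.0, eta=0.1, steps=20)

# Count the updates after which the full-dataset loss went up.
batch_increases = sum(b > a for a, b in zip(batch, batch[1:]))
sgd_increases = sum(b > a for a, b in zip(sgd, sgd[1:]))
print(batch_increases, sgd_increases)
```

In this setup the batch loss falls on every step, while the single-observation updates sometimes move w away from 1 and push the loss back up. With a fixed learning rate they keep hovering around the minimum rather than settling on it.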

Thanks

Data Science Learner Team 