Logistic Regression with Stochastic Gradient Descent

Yash Gupta
3 min read · Nov 4, 2019


Machine Learning

The code for the whole program can be found at the end of the post.

To start with, logistic regression is a discriminative machine learning algorithm: rather than modeling how the data was generated, it directly models the probability that a data point belongs to a class.

This time I created some artificial data using Python’s random library. Each data point is an <x, y> pair with an associated class label. It looks something like this:

Data to be used
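The post doesn’t reproduce the generation code, but a minimal sketch using Python’s random module might look like this (the cluster centres, spread, and counts are illustrative assumptions, not the post’s exact values):

```python
import random

random.seed(0)  # make the run reproducible

# Two clusters of 2-D points: Class 1 centred near (2, 2), Class 0 near (-2, -2).
data = []
for _ in range(50):
    data.append(((random.gauss(2, 1), random.gauss(2, 1)), 1))    # Class 1
    data.append(((random.gauss(-2, 1), random.gauss(-2, 1)), 0))  # Class 0
```

Each entry is an ((x, y), label) pair, which is all the training loop below needs.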

For any given data point X, logistic regression gives P(C|X) as the sigmoid of a linear function of X:

P(C = 1 | X) = 1 / (1 + e^(−wᵀX))

The parameter ‘w’ is the weight vector. We will optimize ‘w’ so that when we calculate P(C = ‘Class1’ | x) for any given point x, we get a value close to either 0 or 1, and we can then classify the data point accordingly.
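In code, this probability is just the sigmoid of the dot product of ‘w’ and x; representing both as plain Python lists is an assumption for illustration:

```python
import math

def predict_prob(w, x):
    """P(C = 1 | x) = sigmoid(w . x); w and x are lists of equal length.

    A value near 1 suggests Class 1, near 0 suggests Class 0.
    (A bias term, if used, can be folded into w with a constant 1 in x.)
    """
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))
```

With w = [0, 0] every point gets probability 0.5, which is why the weights need optimizing.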

We will be using stochastic gradient descent for optimizing the weight vector.

Since we’re using stochastic gradient descent, the error is calculated for a single data point at a time. Using the standard cross-entropy loss, the error is:

E = −( Yactual · log(Ypredicted) + (1 − Yactual) · log(1 − Ypredicted) )

where Ypredicted is P(C|X) from logistic regression, and Yactual is 1 if the data point belongs to Class 1 and 0 if it belongs to Class 0.
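Assuming the error is the standard cross-entropy loss for logistic regression, a single-point error function can be sketched as:

```python
import math

def error(y_pred, y_actual):
    """Cross-entropy error for a single data point.

    y_pred is P(C = 1 | x) from the model; y_actual is the 0/1 label.
    The loss is small when y_pred agrees with y_actual and grows without
    bound as the prediction approaches the wrong extreme.
    """
    return -(y_actual * math.log(y_pred) + (1 - y_actual) * math.log(1 - y_pred))
```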

Next we calculate the gradient of the error function w.r.t. ‘w’, since ‘w’ is the parameter to be optimized. For the cross-entropy loss, differentiating gives:

∂E/∂w = (Ypredicted − Yactual) · X
Code for calculating gradient w.r.t ‘w’
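The original code image isn’t reproduced here; a sketch of the gradient computation, assuming the cross-entropy gradient (Ypredicted − Yactual) · X, could be:

```python
import math

def sigmoid(z):
    """Logistic function, maps any real z to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def gradient(w, x, y_actual):
    """Gradient of the single-point cross-entropy error w.r.t. w.

    For logistic regression this works out to (y_pred - y_actual) * x,
    computed component-wise; w and x are plain lists of equal length.
    """
    y_pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(y_pred - y_actual) * xi for xi in x]
```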

The following equation is used to update the weight vector:

w_new = w_old − η · (∂E/∂w)

where the learning rate η is predefined.

Now, as per this stochastic gradient scheme, we only update the weight vector when a point is misclassified. So after calculating the predicted value, we first check whether the point is misclassified; only then is the weight vector updated. You’ll get a better picture from the implementation below:

Code for Stochastic Gradient
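The original code image isn’t reproduced here; a self-contained sketch of the training loop described above could look like this (random point selection, the learning rate, and the 0.5 decision threshold are assumptions, since the post’s exact choices aren’t shown):

```python
import math
import random

def sigmoid(z):
    """Logistic function, maps any real z to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train(data, lr=0.01, iterations=1000):
    """Stochastic gradient descent over ((x1, x2, ...), label) pairs.

    As in the post, w is updated only when a point is misclassified.
    """
    w = [0.0] * len(data[0][0])
    for _ in range(iterations):
        x, y_actual = random.choice(data)           # pick one point at random
        y_pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        predicted_class = 1 if y_pred >= 0.5 else 0
        if predicted_class != y_actual:             # update only on misclassification
            w = [wi - lr * (y_pred - y_actual) * xi for wi, xi in zip(w, x)]
    return w
```

On linearly separable data like the clusters above, a few thousand iterations are typically enough for the decision line to separate the two classes.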

After running 1000 iterations, we get optimized values of ‘w’. The decision line and the data look like:

Decision Line

Now when the system gets a new data point, it calculates P(C|X) with the optimized value of ‘w’ and then classifies it accordingly.
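A classification sketch using the optimized ‘w’ (the 0.5 decision threshold is an assumption):

```python
import math

def sigmoid(z):
    """Logistic function, maps any real z to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(w, x):
    """Label a new point with the optimized weight vector.

    Points with P(C = 1 | x) >= 0.5 go to Class 1, the rest to Class 0.
    """
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x))) >= 0.5 else 0
```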

You can find the code for the whole program at:

Link to other related posts:




Written by Yash Gupta

Pursuing Masters in Computer Science
