How to Implement Simple Linear Regression in Python

Before implementing Simple Linear Regression in Python, you should understand the basics of linear regression. If you have not done so yet, please read this post first: What is Linear Regression?

I assume you now know the basics of linear regression. To implement Simple Linear Regression, follow the steps below.

Step 1: First we need a clean dataset. I use the dummy dataset SimpleLinearRegression.csv for this implementation. It has two variables, YearsExperience and Salary. YearsExperience is our independent variable and Salary is our dependent variable. You can also use this dataset for practice. Download it from this Link.

[Figure: SimpleLinearRegression.csv file preview]

Step 2: Import the pandas library. Use the read_csv() function to load the CSV file, then print the first few rows to check that it was loaded correctly.

import pandas as pd

df = pd.read_csv('SimpleLinearRegression.csv')

print(df.head())
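Beyond printing the first rows, it can help to confirm that both expected columns are present and that no values are missing. The snippet below is a minimal sketch of such a check. It assumes the two column names used in this post, and it builds a small in-memory DataFrame with made-up values so that it runs without the CSV file.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('SimpleLinearRegression.csv');
# the values below are made up for illustration only.
df = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2, 4.5, 5.9],
    "Salary": [39000.0, 43500.0, 54400.0, 61100.0, 81300.0],
})

expected_columns = ["YearsExperience", "Salary"]

# Columns we expect but did not find in the loaded data.
missing_columns = [c for c in expected_columns if c not in df.columns]

# Total number of missing (NaN) cells across the expected columns.
n_missing_values = int(df[expected_columns].isna().sum().sum())

print(df.shape)           # (rows, columns)
print(df.dtypes)          # both columns should be numeric
print(missing_columns)    # an empty list when both columns exist
print(n_missing_values)   # 0 when there are no missing values
```

If the list of missing columns is not empty, or the count of missing values is above zero, the dataset needs cleaning before the next steps.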

Step 3: Put the independent variable, YearsExperience, into X and the dependent variable, Salary, into y.

X = df[['YearsExperience']]

y = df[['Salary']]
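The double brackets matter here: df[['col']] returns a two-dimensional DataFrame, while df['col'] returns a one-dimensional Series. scikit-learn estimators expect X to be two-dimensional, and y may be either. The sketch below uses made-up values (not the real dataset) to show the shapes; the names y_frame and y_series are only for illustration.

```python
import pandas as pd

# Made-up values, for illustration only.
df = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2, 4.5, 5.9],
    "Salary": [39000.0, 43500.0, 54400.0, 61100.0, 81300.0],
})

X = df[["YearsExperience"]]   # DataFrame, 2-D: one row per sample, one column
y_frame = df[["Salary"]]      # DataFrame, 2-D (the form used in this post)
y_series = df["Salary"]       # Series, 1-D alternative

print(X.shape)          # (5, 1)
print(y_frame.shape)    # (5, 1)
print(y_series.shape)   # (5,)
```

With a two-dimensional y, as in this post, predict() also returns a two-dimensional array. With a one-dimensional y it returns a one-dimensional array.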

Step 4: To train and evaluate the model, we split X and y into X_train, X_test, y_train and y_test. For this we import the train_test_split function from sklearn.model_selection. Here test_size = 1/3 keeps one third of the rows for testing, and random_state = 42 makes the split reproducible.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=42)
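To see what the split does, the sketch below runs train_test_split on 30 made-up rows (an assumption; the real file may have a different number of rows) and checks the sizes of the two parts. It also repeats the split with the same random_state to show that the result is reproducible. The names X_train_again and X_test_again are only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# 30 made-up rows; the salary formula is arbitrary and only for illustration.
years = np.arange(1, 31) / 3.0
df = pd.DataFrame({
    "YearsExperience": years,
    "Salary": 25000.0 + 9000.0 * years,
})

X = df[["YearsExperience"]]
y = df[["Salary"]]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=42
)

# Same call, same seed: the rows should land in the same sets.
X_train_again, X_test_again, _, _ = train_test_split(
    X, y, test_size=1/3, random_state=42
)

print(len(X_train), len(X_test))                    # sizes of the two parts
print(X_test.index.equals(X_test_again.index))      # True when the split is reproducible
print(X_train.index.equals(y_train.index))          # X and y rows stay aligned
```

Changing or removing random_state gives a different random split on each run, so the metrics computed later will vary as well.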

Step 5: Now we are ready to apply Linear Regression. First import the LinearRegression class and create a model object. Then fit the model on X_train and y_train, and use the predict() function on X_test to get the predicted values, y_pred. The model is now ready for the remaining calculations.

from sklearn.linear_model import LinearRegression

regressor = LinearRegression()

regressor.fit(X_train, y_train)

y_pred = regressor.predict(X_test)
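After fitting, the model stores the learned line in its coef_ (slope) and intercept_ attributes. The sketch below is a minimal illustration on made-up data that lies exactly on the line Salary = 25000 + 9000 * YearsExperience, so the fitted values can be checked by eye. It uses a one-dimensional y (an assumption that differs from the post) so that coef_ has one entry and intercept_ is a scalar; with the two-dimensional y used above, both come back wrapped in an extra dimension.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up, perfectly linear data: Salary = 25000 + 9000 * YearsExperience.
years = np.linspace(1.0, 10.0, 10)
df = pd.DataFrame({
    "YearsExperience": years,
    "Salary": 25000.0 + 9000.0 * years,
})

X = df[["YearsExperience"]]
y = df["Salary"]  # 1-D target, so coef_ has shape (1,) and intercept_ is a scalar

regressor = LinearRegression()
regressor.fit(X, y)

slope = regressor.coef_[0]
intercept = regressor.intercept_
print(slope, intercept)  # close to 9000 and 25000 for this made-up data

# A prediction is just intercept + slope * x.
new_X = pd.DataFrame({"YearsExperience": [5.0]})
manual = intercept + slope * 5.0
predicted = regressor.predict(new_X)[0]
print(manual, predicted)  # the two values should agree
```

On real data the points do not lie exactly on a line, so the slope and intercept are the values that minimize the squared error on the training set.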

Step 6: To calculate MAE, MSE, RMSE and R-squared, we use the built-in functions mean_absolute_error(), mean_squared_error() and r2_score() from sklearn.metrics. RMSE is the square root of MSE.

from sklearn import metrics

mae = metrics.mean_absolute_error(y_test,y_pred)
mse = metrics.mean_squared_error(y_test,y_pred)
rmse = metrics.mean_squared_error(y_test,y_pred)**.5
r_squared = metrics.r2_score(y_test,y_pred)

print(mae)
print(mse)
print(rmse)
print(r_squared)
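To make clear what these four numbers mean, the sketch below computes them by hand with NumPy and compares the results with the scikit-learn functions. The true and predicted values are made up for illustration, and the names y_true and y_hat are not part of the code above.

```python
import numpy as np
from sklearn import metrics

# Made-up true and predicted salaries, for illustration only.
y_true = np.array([40000.0, 55000.0, 62000.0, 80000.0])
y_hat = np.array([42000.0, 53000.0, 65000.0, 78000.0])

errors = y_true - y_hat

mae = np.mean(np.abs(errors))    # mean absolute error
mse = np.mean(errors ** 2)       # mean squared error
rmse = np.sqrt(mse)              # root mean squared error

# R-squared: 1 minus (residual sum of squares / total sum of squares).
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Each pair should match.
print(mae, metrics.mean_absolute_error(y_true, y_hat))
print(mse, metrics.mean_squared_error(y_true, y_hat))
print(rmse, metrics.mean_squared_error(y_true, y_hat) ** 0.5)
print(r_squared, metrics.r2_score(y_true, y_hat))
```

MAE and RMSE are in the same unit as Salary, which makes them easier to interpret than MSE. An R-squared close to 1 means the line explains most of the variation in the test data.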

Bonus part: We can also plot the regression line. The code below plots the training data as red points and the fitted line in blue.
import matplotlib.pyplot as plt

plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.title('Salary vs. Years (For training set)')
plt.xlabel('Years of experience')
plt.ylabel('Salary')
plt.show()
[Figure: regression plot for the training dataset]

I also plotted the same graph for the testing dataset. The line is still the one fitted on the training data.

plt.scatter(X_test, y_test, color='red')
plt.plot(X_test, regressor.predict(X_test), color='blue')
plt.title('Salary vs. Years (For Test set)')
plt.xlabel('Years of experience')
plt.ylabel('Salary')
plt.show()
[Figure: regression plot for the testing dataset]
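The two plots can also be combined into one figure, which makes it easier to compare the training and test points against the same line. The sketch below is one possible way to do this, not the method used in this post. It uses made-up noisy data, a green color for the test points, and the non-interactive "Agg" backend so that it runs without a display; all of these are assumptions. It saves the image to an in-memory buffer instead of calling plt.show().

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up noisy data, for illustration only.
rng = np.random.default_rng(0)
years = np.linspace(1.0, 10.0, 30)
salary = 25000.0 + 9000.0 * years + rng.normal(0.0, 4000.0, size=30)
df = pd.DataFrame({"YearsExperience": years, "Salary": salary})

X = df[["YearsExperience"]]
y = df["Salary"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=42
)

regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Two end points are enough to draw a straight line.
x_line = pd.DataFrame({"YearsExperience": [years.min(), years.max()]})

fig, ax = plt.subplots()
ax.scatter(X_train["YearsExperience"], y_train, color="red", label="Training set")
ax.scatter(X_test["YearsExperience"], y_test, color="green", label="Test set")
ax.plot(x_line["YearsExperience"], regressor.predict(x_line),
        color="blue", label="Fitted line")
ax.set_title("Salary vs. Years (training and test sets)")
ax.set_xlabel("Years of experience")
ax.set_ylabel("Salary")
ax.legend()

# Write the figure to an in-memory PNG instead of opening a window.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

To save the figure to disk instead, pass a file name to fig.savefig().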

Full code:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
import matplotlib.pyplot as plt

df = pd.read_csv('SimpleLinearRegression.csv')
print(df.head())

X = df[['YearsExperience']]
y = df[['Salary']]

print(X.head())
print(y.head())

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=42)

regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

mae = metrics.mean_absolute_error(y_test,y_pred)
mse = metrics.mean_squared_error(y_test,y_pred)
rmse = metrics.mean_squared_error(y_test,y_pred)**.5
r_squared = metrics.r2_score(y_test,y_pred)

print(mae)
print(mse)
print(rmse)
print(r_squared)

plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.title('Salary vs. Years (For training set)')
plt.xlabel('Years of experience')
plt.ylabel('Salary')
plt.show()

plt.scatter(X_test, y_test, color='red')
plt.plot(X_test, regressor.predict(X_test), color='blue')
plt.title('Salary vs. Years (For Test set)')
plt.xlabel('Years of experience')
plt.ylabel('Salary')
plt.show()



If you want to learn how to write your own implementation of Simple Linear Regression, click here.

If you want to take the DataCamp course “Supervised Learning with scikit-learn”, click here.

Thank you for reading my blog. If you have any questions about this code, feel free to ask in the comments.
