## In this blog, we will build our own implementation of Simple Linear Regression from scratch.

First, we need to know the **Hypothesis Equation of Simple Linear Regression** before building our own implementation. **The Hypothesis Equation is: y = a + b * x**

where, **a** is the **intercept**,

**b** is the **slope** and

**x** is the **independent variable**.

So, to calculate **y**, we first have to calculate the **intercept** and the **slope**.

**Intercept formula:** **a = Ȳ – b * X̄**

where, **Ȳ** is the **y mean (dependent variable mean)**,

**b** is the **slope** and

**X̄** is the **X mean (independent variable mean)**.

and **Slope formula:** **b = COVARIANCE( X, y ) / VARIANCE( X )**
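As a quick hand-worked sketch of these two formulas (tiny made-up numbers, not the blog's dataset):

```python
# Made-up example: x = [1, 2, 3], y = [2, 4, 6]  (y is exactly 2 * x)
x = [1, 2, 3]
y = [2, 4, 6]

x_mean = sum(x) / len(x)   # X̄ = 2.0
y_mean = sum(y) / len(y)   # Ȳ = 4.0

# Sample covariance and variance (dividing by n - 1)
cov = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / (len(x) - 1)
var = sum((xi - x_mean) ** 2 for xi in x) / (len(x) - 1)

b = cov / var              # slope b = 2.0
a = y_mean - b * x_mean    # intercept a = 0.0
print(a, b)
```

Since y is exactly twice x here, the slope comes out as 2 and the intercept as 0, which matches the line y = 0 + 2 * x.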

Step 1: To implement our own linear regression code, we first need to write some helper functions of our own:

**LEN**, **SUM**, **MEAN**, **VARIANCE**, **COVARIANCE**, **Y_PREDICTION**, **ABSOLUTE**, **MEAN_ABSOLUTE_ERROR**, **MEAN_SQUARED_ERROR** and **R_SQUARED**

**CODE:**

1. **Len calculation function**

```
#1 LENGTH CALCULATION FUNCTION
def LEN(List):
    n = 0
    for i in List:
        n += 1
    return n
```

2. **Sum calculation function**

```
#2 SUM CALCULATION FUNCTION
def SUM(List):
    # If the list holds strings, concatenate them instead of adding
    if type(List[0]) == str:
        st = ''
        for i in List:
            st += i
        return st
    total = 0
    for i in List:
        total += i
    return total
```

3. **Mean calculation function**

```
#3 MEAN CALCULATION FUNCTION
def MEAN(List):
    return SUM(List) / LEN(List)
```
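A quick sanity check of the first three helpers on made-up numbers (numeric versions redefined here so the snippet runs on its own):

```python
# Self-contained numeric copies of the helpers above, for a quick check
def LEN(List):
    n = 0
    for i in List:
        n += 1
    return n

def SUM(List):
    total = 0
    for i in List:
        total += i
    return total

def MEAN(List):
    return SUM(List) / LEN(List)

print(LEN([2, 4, 6]))   # 3
print(SUM([2, 4, 6]))   # 12
print(MEAN([2, 4, 6]))  # 4.0
```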

4. **Variance calculation function**

```
#4 VARIANCE CALCULATION FUNCTION
def VARIANCE(List):
    return SUM([(x - MEAN(List)) ** 2 for x in List]) / (LEN(List) - 1)
```

5. **Covariance calculation function**

```
#5 COVARIANCE CALCULATION FUNCTION
def COVARIANCE(list1, list2):
    return SUM([(list1[i] - MEAN(list1)) * (list2[i] - MEAN(list2)) for i in range(0, LEN(list1))]) / (LEN(list1) - 1)
```
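Both formulas use the sample (n − 1) denominator, so we can cross-check them against Python's built-in `statistics.variance` (self-contained copies below, written with built-ins for brevity; the data is made up):

```python
import statistics

# Self-contained copies of the blog's sample variance and covariance formulas
def VARIANCE(List):
    m = sum(List) / len(List)
    return sum((x - m) ** 2 for x in List) / (len(List) - 1)

def COVARIANCE(list1, list2):
    m1 = sum(list1) / len(list1)
    m2 = sum(list2) / len(list2)
    return sum((list1[i] - m1) * (list2[i] - m2) for i in range(len(list1))) / (len(list1) - 1)

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]   # y = 2 * x, so COVARIANCE(x, y) should equal 2 * VARIANCE(x)
print(VARIANCE(x), statistics.variance(x))
print(COVARIANCE(x, y))
```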

6. **Y-Prediction calculation function**

```
#6 Y-PREDICTION CALCULATION FUNCTION
def Y_PREDICTION(a, b, x):
    return [a + b * x[i] for i in range(0, LEN(x))]
```

7. **Absolute calculation function**

```
#7 ABSOLUTE CALCULATION FUNCTION
def ABSOLUTE(value):
    if value < 0:
        value *= -1
    return value
```

8. **Mean-Absolute-Error calculation function**

```
#8 MEAN ABSOLUTE ERROR CALCULATION FUNCTION
def MEAN_ABSOLUTE_ERROR(y_pred, y_test):
    return SUM([ABSOLUTE(y_pred[i] - y_test[i]) for i in range(0, LEN(y_test))]) / LEN(y_test)

9. **Mean-Squared-Error calculation function**

```
#9 MEAN SQUARED ERROR CALCULATION FUNCTION
def MEAN_SQUARED_ERROR(y_pred, y_test):
    return SUM([(y_pred[i] - y_test[i]) ** 2 for i in range(0, LEN(y_test))]) / LEN(y_test)
```

10. **R-Squared calculation function**

```
#10 R SQUARED CALCULATION FUNCTION
def R_SQUARED(y_pred, y_test, yMean):
    return 1 - (SUM([(y_test[i] - y_pred[i]) ** 2 for i in range(0, LEN(y_test))]) / SUM([(y_test[i] - yMean) ** 2 for i in range(0, LEN(y_test))]))
```
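A quick sanity check of the three metrics on tiny made-up predictions (the metrics are rewritten here with built-ins so the snippet runs on its own):

```python
# Self-contained copies of the error metrics, checked on made-up numbers
def MEAN_ABSOLUTE_ERROR(y_pred, y_test):
    return sum(abs(y_pred[i] - y_test[i]) for i in range(len(y_test))) / len(y_test)

def MEAN_SQUARED_ERROR(y_pred, y_test):
    return sum((y_pred[i] - y_test[i]) ** 2 for i in range(len(y_test))) / len(y_test)

def R_SQUARED(y_pred, y_test, yMean):
    ss_res = sum((y_test[i] - y_pred[i]) ** 2 for i in range(len(y_test)))
    ss_tot = sum((y_test[i] - yMean) ** 2 for i in range(len(y_test)))
    return 1 - ss_res / ss_tot

y_test = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 8.0]            # errors of -1, 0, +1
yMean = sum(y_test) / len(y_test)   # 5.0

print(MEAN_ABSOLUTE_ERROR(y_pred, y_test))  # (1 + 0 + 1) / 3
print(MEAN_SQUARED_ERROR(y_pred, y_test))   # (1 + 0 + 1) / 3
print(R_SQUARED(y_pred, y_test, yMean))     # 1 - 2/8 = 0.75
```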

Step 2: Now we import our dataset, putting the independent variable into **X** and the dependent variable into **y_hat**. Get the dataset from this Link – Dataset.csv

```
import pandas as pd

df = pd.read_csv('SimpleLinearRegression.csv')
print(df.head())
X = df['YearsExperience']   # independent variable
y_hat = df['Salary']        # dependent variable (actual Salary values)
```

Step 3: We split our dataset into train and test sets, then reset their indices with `drop=True` so that the positional indexing used in our helper functions works correctly.

```
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y_hat, test_size=1/3, random_state=42)
X_train = X_train.reset_index(drop=True)
X_test = X_test.reset_index(drop=True)
y_train = y_train.reset_index(drop=True)
y_test = y_test.reset_index(drop=True)
```

Step 4: Lastly, we calculate the intercept **a** and the slope **b** using their formulas, and also compute the Y-Prediction, MAE, MSE and R-Squared values with our own functions.

```
b = COVARIANCE(X_train, y_train) / VARIANCE(X_train)   # slope
a = MEAN(y_train) - b * MEAN(X_train)                  # intercept
y_pred = Y_PREDICTION(a, b, X_test)
mae = MEAN_ABSOLUTE_ERROR(y_pred, y_test)
mse = MEAN_SQUARED_ERROR(y_pred, y_test)
r_squared = R_SQUARED(y_pred, y_test, MEAN(y_test))
print('Mean Absolute Error :', mae)
print('Mean Squared Error :', mse)
print('R-Squared Value :', r_squared)
```
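One useful sanity check on this kind of fit: for least-squares simple linear regression, the residuals on the training data always sum to (numerically) zero. A self-contained sketch with made-up numbers, not the blog's CSV:

```python
# Made-up training data (not the blog's dataset)
X_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [3.1, 4.9, 7.2, 8.8, 11.0]

xm = sum(X_train) / len(X_train)
ym = sum(y_train) / len(y_train)

# Slope and intercept via the same covariance/variance formulas
# (the n - 1 denominators cancel in the ratio)
b = sum((x - xm) * (y - ym) for x, y in zip(X_train, y_train)) \
    / sum((x - xm) ** 2 for x in X_train)
a = ym - b * xm

residuals = [y - (a + b * x) for x, y in zip(X_train, y_train)]
print(sum(residuals))   # ~0 up to floating-point error
```

If the residuals do not sum to roughly zero, something is wrong in the slope or intercept computation.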

Bonus: We can also plot our Linear Regression graph. The code below plots the graph for the training dataset.

```
# Salary vs. Years of experience (For Training Set)
import matplotlib.pyplot as plt
plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, Y_PREDICTION(a, b, X_train), color='blue')
plt.title('Salary vs. Years of experience (For Training Set)')
plt.xlabel('Years of experience')
plt.ylabel('Salary')
plt.show()
```

I also plotted the graph for the test dataset.

```
# Salary vs. Years of experience (For Test Set)
plt.scatter(X_test, y_test, color='red')
plt.plot(X_test, y_pred, color='blue')
plt.title('Salary vs. Years of experience (For Test Set)')
plt.xlabel('Years of experience')
plt.ylabel('Salary')
plt.show()
```


Thank you for reading my blog. If you have any questions about this code, feel free to ask in the comments.
