Goal: Find the best-fit line $y = mx + c$ for a set of data points $(x_i, y_i)$. “Best-fit” is defined as minimizing the sum of squared errors.
1. Error Function (Sum of Squared Errors):
We define the error, $E$, as the sum of the squared differences between the actual values $y_i$ and the predicted values ($mx_i + c$) from our line:
$E = \sum_{i=1}^{n} \left(y_i - (mx_i + c)\right)^2$
where $n$ is the number of data points.
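The error function can be sketched directly in code. This is a minimal illustration with made-up data points; the function name `sse` is just a label for the sum of squared errors defined above.

```python
# Sum of squared errors E for a candidate line y = m*x + c.
def sse(xs, ys, m, c):
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

# Illustrative (made-up) data lying exactly on y = 2x:
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
print(sse(xs, ys, 2.0, 0.0))  # perfect fit -> 0.0
print(sse(xs, ys, 1.0, 0.0))  # worse line -> larger error
```

A line that passes through every point gives $E = 0$; any other line gives a strictly positive error.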
2. Minimizing the Error:
To minimize $E$, we find where its partial derivatives with respect to $m$ (slope) and $c$ (y-intercept) are equal to zero. This is because at a minimum (or maximum) of a function, the slope of the function in any direction is zero.
3. Partial Derivatives:
First, calculate the partial derivatives (using the chain rule):
- Partial derivative with respect to $m$: $\frac{\partial E}{\partial m} = -2\sum_{i=1}^{n} x_i \left(y_i - (mx_i + c)\right)$
- Partial derivative with respect to $c$: $\frac{\partial E}{\partial c} = -2\sum_{i=1}^{n} \left(y_i - (mx_i + c)\right)$
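These analytic derivatives can be sanity-checked numerically. The sketch below compares them against central finite differences on made-up data; `sse` and `grads` are illustrative helper names, and `eps` is a small step chosen for the check.

```python
# Sum of squared errors and its analytic partial derivatives.
def sse(xs, ys, m, c):
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

def grads(xs, ys, m, c):
    dm = -2 * sum(x * (y - (m * x + c)) for x, y in zip(xs, ys))
    dc = -2 * sum(y - (m * x + c) for x, y in zip(xs, ys))
    return dm, dc

xs, ys = [1, 2, 3, 4], [2, 3, 5, 6]   # illustrative data
m, c, eps = 1.0, 0.5, 1e-6
dm, dc = grads(xs, ys, m, c)

# Central finite differences should agree with the analytic formulas:
dm_num = (sse(xs, ys, m + eps, c) - sse(xs, ys, m - eps, c)) / (2 * eps)
dc_num = (sse(xs, ys, m, c + eps) - sse(xs, ys, m, c - eps)) / (2 * eps)
print(dm, dm_num)
print(dc, dc_num)
```

Because $E$ is quadratic in $m$ and $c$, the central difference agrees with the analytic gradient up to floating-point error.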
4. Setting Derivatives to Zero:
To find the minimum, set both partial derivatives equal to zero:
$\sum_{i=1}^{n} x_i \left(y_i - (mx_i + c)\right) = 0$ and $\sum_{i=1}^{n} \left(y_i - (mx_i + c)\right) = 0$
5. Simplifying the Equations:
Expand and rearrange the equations:
- $\sum x_i y_i = m \sum x_i^2 + c \sum x_i$ (Equation 1)
- $\sum y_i = m \sum x_i + nc$ (Equation 2)
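Equations 1 and 2 form a 2×2 linear system in $m$ and $c$, which can be solved directly, e.g. by Cramer's rule. This is a minimal sketch with made-up data; the variable names (`Sx`, `Sxy`, etc.) are shorthand for the sums in the equations.

```python
# Solve the two normal equations (Equation 1 and Equation 2) for m and c.
xs = [1, 2, 3, 4, 5]      # illustrative data
ys = [2, 3, 5, 7, 8]
n = len(xs)
Sx, Sy = sum(xs), sum(ys)
Sxx = sum(x * x for x in xs)
Sxy = sum(x * y for x, y in zip(xs, ys))

# Equation 1:  Sxy = m*Sxx + c*Sx
# Equation 2:  Sy  = m*Sx  + c*n
# Cramer's rule on the 2x2 system:
det = Sxx * n - Sx * Sx
m = (Sxy * n - Sx * Sy) / det
c = (Sxx * Sy - Sx * Sxy) / det
print(m, c)  # slope and intercept of the least-squares line
```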
6. Solving for c:
From Equation 2, isolate $c$:
$c = \frac{\sum y_i - m \sum x_i}{n}$
Recognize that $\bar{x} = \frac{1}{n}\sum x_i$ (the mean of the $x$ values) and $\bar{y} = \frac{1}{n}\sum y_i$ (the mean of the $y$ values):
$c = \bar{y} - m\bar{x}$ (Equation 3)
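Equation 3 says the fitted line always passes through the point of means $(\bar{x}, \bar{y})$. A minimal sketch, with made-up data and a slope value taken as given for illustration:

```python
# Once m is known, Equation 3 gives the intercept from the two means.
xs = [1, 2, 3, 4, 5]        # illustrative data
ys = [2, 3, 5, 7, 8]
x_bar = sum(xs) / len(xs)   # mean of x
y_bar = sum(ys) / len(ys)   # mean of y
m = 1.6                     # slope, taken as given here for illustration
c = y_bar - m * x_bar       # Equation 3
print(c)                    # approximately 0.2
```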
7. Solving for m:
Substitute Equation 3 into Equation 1:
$\sum x_i y_i = m \sum x_i^2 + (\bar{y} - m\bar{x}) \sum x_i$
Since $\sum x_i = n\bar{x}$, solving for $m$ gives:
$m = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}$
Multiply numerator and denominator by $n$:
$m = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$
Divide numerator and denominator by $n^2$:
$m = \frac{\frac{1}{n}\sum x_i y_i - \bar{x}\bar{y}}{\frac{1}{n}\sum x_i^2 - \bar{x}^2} = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}$
This is equivalent to: $m = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}}$ or $m = r\,\frac{\sigma_y}{\sigma_x}$
where $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$, $S_{xx} = \sum (x_i - \bar{x})^2$, $r$ is the correlation coefficient between $x$ and $y$, and $\sigma_x$, $\sigma_y$ are their standard deviations.
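The deviation-from-the-mean form of the slope, together with Equation 3, gives the full closed-form fit. A minimal sketch with made-up data:

```python
# Closed-form least-squares fit: slope from deviations about the means,
# intercept from Equation 3.
xs = [1, 2, 3, 4, 5]        # illustrative data
ys = [2, 3, 5, 7, 8]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# m = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
den = sum((x - x_bar) ** 2 for x in xs)
m = num / den
c = y_bar - m * x_bar       # Equation 3
print(m, c)
```

The result agrees with solving the two normal equations directly, since both routes minimize the same error function.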
8. Regression of x on y:
By symmetry, the line that predicts $x$ from $y$ is $x = b_{xy}\,y + a$, with slope $b_{xy} = \frac{S_{xy}}{S_{yy}} = r\,\frac{\sigma_x}{\sigma_y}$ and intercept $a = \bar{x} - b_{xy}\bar{y}$,
where $S_{yy} = \sum (y_i - \bar{y})^2$.
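Swapping the roles of the variables only changes which deviations appear in the denominator. A minimal sketch of the regression of $x$ on $y$, with made-up data:

```python
# Regression of x on y: y is the independent variable, so the slope is
# b_xy = S_xy / S_yy and the line passes through (x_bar, y_bar).
xs = [1, 2, 3, 4, 5]        # illustrative data
ys = [2, 3, 5, 7, 8]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
Sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
Syy = sum((y - y_bar) ** 2 for y in ys)
b = Sxy / Syy               # slope of x on y
a = x_bar - b * y_bar       # intercept
print(b, a)
```

Note that $b_{xy}$ is generally not the reciprocal of the slope $m$ of $y$ on $x$; the two regression lines coincide only when the correlation is perfect ($r = \pm 1$).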
Key Takeaways (for the exam):
- Error Function: $E = \sum (y_i - (mx_i + c))^2$
- Derivatives: Set $\frac{\partial E}{\partial m} = 0$ and $\frac{\partial E}{\partial c} = 0$
- Slope (m): $m = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
- Intercept (c): $c = \bar{y} - m\bar{x}$
- In the regression of x on y, $x = b_{xy}\,y + a$: $y$ is the independent variable and $x$ is the dependent variable.