Goal: Find the best-fit line y = mx + c for a set of data points (x_1, y_1), \dots, (x_n, y_n). “Best-fit” is defined as minimizing the sum of squared errors.

1. Error Function (Sum of Squared Errors):

We define the error, E, as the sum of the squared differences between the actual values y_i and the predicted values \hat{y}_i = mx_i + c from our line:

E = \sum_{i=1}^{n} \left( y_i - (mx_i + c) \right)^2

where n is the number of data points.
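As a quick numerical illustration of the error function (a minimal sketch; the data points and trial lines below are made up):

```python
# Sum of squared errors for a candidate line y = m*x + c.
# The data points and the trial values of m and c are illustrative.

def sse(xs, ys, m, c):
    """E = sum over i of (y_i - (m*x_i + c))**2."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]            # these points lie exactly on y = 2x
print(sse(xs, ys, 2, 0))     # perfect fit -> 0
print(sse(xs, ys, 1, 0))     # residuals 1, 2, 3, 4 -> 1+4+9+16 = 30
```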

2. Minimizing the Error:

To minimize E, we find where its partial derivatives with respect to m (slope) and c (y-intercept) are equal to zero, because at a minimum (or maximum) of a function, the slope in every direction is zero.

3. Partial Derivatives:

First, calculate the partial derivatives:

  • Partial derivative with respect to m:

    \frac{\partial E}{\partial m} = -2 \sum_{i=1}^{n} x_i \left( y_i - (mx_i + c) \right)

  • Partial derivative with respect to c:

    \frac{\partial E}{\partial c} = -2 \sum_{i=1}^{n} \left( y_i - (mx_i + c) \right)
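The two analytic partials can be sanity-checked against central finite differences (a sketch; the data and the trial point (m, c) are made up):

```python
# Check dE/dm and dE/dc against central finite differences.
# Data points and the evaluation point (m, c) are illustrative.

def sse(xs, ys, m, c):
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

def dE_dm(xs, ys, m, c):
    # dE/dm = -2 * sum(x_i * (y_i - (m*x_i + c)))
    return -2 * sum(x * (y - (m * x + c)) for x, y in zip(xs, ys))

def dE_dc(xs, ys, m, c):
    # dE/dc = -2 * sum(y_i - (m*x_i + c))
    return -2 * sum((y - (m * x + c)) for x, y in zip(xs, ys))

xs, ys = [1.0, 2.0, 3.0], [2.0, 3.0, 5.0]
m, c, h = 0.5, 0.1, 1e-6
num_dm = (sse(xs, ys, m + h, c) - sse(xs, ys, m - h, c)) / (2 * h)
num_dc = (sse(xs, ys, m, c + h) - sse(xs, ys, m, c - h)) / (2 * h)
print(abs(num_dm - dE_dm(xs, ys, m, c)) < 1e-4)  # True
print(abs(num_dc - dE_dc(xs, ys, m, c)) < 1e-4)  # True
```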

4. Setting Derivatives to Zero:

To find the minimum, set both partial derivatives equal to zero:

\sum_{i=1}^{n} x_i \left( y_i - (mx_i + c) \right) = 0, \qquad \sum_{i=1}^{n} \left( y_i - (mx_i + c) \right) = 0

5. Simplifying the Equations:

Expand and rearrange the equations:

  • \sum x_i y_i = m \sum x_i^2 + c \sum x_i (Equation 1)
  • \sum y_i = m \sum x_i + nc (Equation 2)

6. Solving for c:

From Equation 2, isolate c:

c = \frac{\sum y_i - m \sum x_i}{n}

Recognize that \frac{1}{n}\sum y_i = \bar{y} (the mean of the y values) and \frac{1}{n}\sum x_i = \bar{x} (the mean of the x values):

c = \bar{y} - m\bar{x} (Equation 3)

7. Solving for m: Substitute Equation 3 into Equation 1:

\sum x_i y_i = m \sum x_i^2 + (\bar{y} - m\bar{x}) \sum x_i

Using \sum x_i = n\bar{x} and solving for m:

m = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}

Divide the numerator and denominator by n:

m = \frac{\frac{1}{n}\sum x_i y_i - \bar{x}\bar{y}}{\frac{1}{n}\sum x_i^2 - \bar{x}^2} = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)}

This is equivalent to:

m = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \quad \text{or} \quad m = b_{yx} = r \frac{\sigma_y}{\sigma_x}

where r is the correlation coefficient and \sigma_x, \sigma_y are the standard deviations of x and y.
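The closed-form solution above can be sketched directly (a minimal illustration; the data points are made up):

```python
# Closed-form least-squares fit: m = S_xy / S_xx, c = ybar - m*xbar.
# Data points are illustrative.

def fit_line(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    s_xx = sum((x - xbar) ** 2 for x in xs)
    m = s_xy / s_xx
    c = ybar - m * xbar          # Equation 3
    return m, c

xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]            # these points lie exactly on y = 2x + 1
m, c = fit_line(xs, ys)
print(m, c)                      # -> 2.0 1.0
```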

8. Regression of x on y:

By symmetry, regressing x on y (treating y as the independent variable) gives x - \bar{x} = b_{xy}(y - \bar{y}), where

b_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (y_i - \bar{y})^2} = r \frac{\sigma_x}{\sigma_y}
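A small sketch computing both regression coefficients; it also checks the standard identity b_yx · b_xy = r² (data points are made up):

```python
# Both regression coefficients from the same sums of products.
# Data points are illustrative.

def coeffs(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    s_xx = sum((x - xbar) ** 2 for x in xs)
    s_yy = sum((y - ybar) ** 2 for y in ys)
    b_yx = s_xy / s_xx           # slope of the regression of y on x
    b_xy = s_xy / s_yy           # slope of the regression of x on y
    r2 = s_xy ** 2 / (s_xx * s_yy)
    return b_yx, b_xy, r2

xs, ys = [1, 2, 3, 4], [2, 3, 5, 6]
b_yx, b_xy, r2 = coeffs(xs, ys)
print(abs(b_yx * b_xy - r2) < 1e-12)  # True: b_yx * b_xy = r^2
```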

Key Takeaways (for the exam):

  • Error Function: E = \sum (y_i - (mx_i + c))^2
  • Derivatives: Set \frac{\partial E}{\partial m} = 0 and \frac{\partial E}{\partial c} = 0
  • Slope (m): m = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
  • Intercept (c): c = \bar{y} - m\bar{x}
  • In the regression of x on y (x = b_{xy} y + c'), y is the independent variable and x is the dependent variable.
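The takeaways can be tied together with a quick check that the closed-form (m, c) really is the minimizer: perturbing either parameter only increases the error (a sketch with made-up data):

```python
# Verify that the closed-form solution minimizes the SSE:
# any small perturbation of m or c increases the error.
# Data points are illustrative.

def sse(xs, ys, m, c):
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

def fit(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    m = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    return m, ybar - m * xbar

xs, ys = [0, 1, 2, 3], [1, 3, 2, 5]
m, c = fit(xs, ys)
best = sse(xs, ys, m, c)
for dm, dc in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sse(xs, ys, m + dm, c + dc) > best
print("closed-form solution is the minimum")
```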