Display the regression line

If three or more points are given in the plane, there is usually no single line that passes through all of them. The aim of linear regression is to approximate such a point cloud as well as possible by a straight line.

The method of least squares is the usual approach: given the points (x₁,y₁),...,(x_n,y_n), a line y=ax+b is sought. First, for each point the vertical distance y_i-(ax_i+b) between the point and the line is calculated. This distance is then squared, giving the squared residual (y_i-(ax_i+b))². Finally, the squared residuals are added up to the residual sum, and a and b are chosen so that this sum is as small as possible. This minimization problem can be solved explicitly, which yields general formulas for a and b.
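The minimization can be sketched in a few lines of Python. The sketch uses the standard closed-form solution a = Σ(x_i-x̄)(y_i-ȳ) / Σ(x_i-x̄)² and b = ȳ-a·x̄, where x̄ and ȳ are the means of the x and y values. The function names `fit_line` and `residual_sum` and the sample points are assumptions made for illustration, not part of the interactive figure.

```python
def fit_line(points):
    """Return (a, b) of the least-squares line y = a*x + b."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sums of squared and mixed deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all x values are equal; no unique line exists")
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def residual_sum(points, a, b):
    """Sum of squared vertical distances between the points and y = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in points)


# Hypothetical sample points, chosen only for illustration.
points = [(0, 1), (1, 3), (2, 5), (3, 8)]
a, b = fit_line(points)
print(a, b, residual_sum(points, a, b))
```

Any other choice of a and b gives a larger residual sum for the same points, which is exactly the defining property of the regression line.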

The regression line can always be calculated as long as the x values are not all equal, even if a linear regression is not meaningful. In applications, unfortunately, the setting alone does not always reveal whether a linear relationship is a useful description. Measures of how well the regression line approximates the point cloud are therefore needed. The residual sum cannot be used directly as such a measure, because its value depends on the scale of the data: if, for example, other units are chosen, the residual sum changes. One remedy is the correlation coefficient: regardless of the sample size and the magnitude of the x and y values, it always yields a number between -1 and 1. For values close to 1 or -1, the points are well matched by a rising or falling straight line, respectively. For values near 0, a straight line is a rather poor model of the point cloud.
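The effect of a change of units can be checked with a small Python sketch. It computes the empirical correlation coefficient r = Σ(x_i-x̄)(y_i-ȳ) / √(Σ(x_i-x̄)² · Σ(y_i-ȳ)²) and the residual sum of the best line, once for some sample points and once with every y value multiplied by 100 (as if metres were converted to centimetres). The function names and the data are assumptions for illustration.

```python
import math


def deviation_sums(points):
    """Return (sxx, syy, sxy), the sums of squared and mixed deviations."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxx, syy, sxy


def correlation(points):
    """Empirical correlation coefficient of the point cloud."""
    sxx, syy, sxy = deviation_sums(points)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant x or y values")
    return sxy / math.sqrt(sxx * syy)


def best_residual_sum(points):
    """Residual sum of the least-squares line, which equals syy - sxy**2 / sxx."""
    sxx, syy, sxy = deviation_sums(points)
    if sxx == 0:
        raise ValueError("all x values are equal; no unique line exists")
    return syy - sxy ** 2 / sxx


# Hypothetical sample points; the second list uses y values in a unit 100 times smaller.
points = [(0, 1), (1, 3), (2, 5), (3, 8)]
rescaled = [(x, 100 * y) for x, y in points]

print(best_residual_sum(points), best_residual_sum(rescaled))
print(correlation(points), correlation(rescaled))
```

The residual sum grows by a factor of 100² = 10000 under this change of units, while the correlation coefficient stays the same.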

From a statistical point of view, linear regression is a simple form of a linear model: a random term that accounts for the measurement error is added to the straight line ax+b. It is very common to assume that the error is normally distributed with mean 0, i.e. that the observations are random variables Y₁,...,Y_n with Y_i=ax_i+b+Z_i, where the Z_i are normally distributed with expected value 0 and unknown variance σ². If a and b are to be estimated from a concrete sample, the resulting estimators are exactly those of the least squares method.
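The model can be tried out with a short simulation. The following Python sketch draws observations Y_i = a·x_i + b + Z_i with normally distributed errors and then recovers a and b with the least squares formulas. The true parameters, the sample size, the seed and the function names are assumptions chosen for illustration.

```python
import random


def fit_line(points):
    """Return the least-squares estimates (a, b) for the line y = a*x + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all x values are equal; no unique line exists")
    a = sxy / sxx
    return a, mean_y - a * mean_x


def simulate(a, b, sigma, n, seed=0):
    """Draw n observations Y_i = a*x_i + b + Z_i with Z_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    sample = []
    for i in range(n):
        x = 10 * i / (n - 1)  # fixed design points spread over [0, 10]
        z = rng.gauss(0, sigma)  # measurement error
        sample.append((x, a * x + b + z))
    return sample


# Hypothetical true parameters of the model.
true_a, true_b, sigma = 2.0, 1.0, 1.0
sample = simulate(true_a, true_b, sigma, n=2000)
est_a, est_b = fit_line(sample)
print(est_a, est_b)
```

With a sample of this size the estimates should land close to the true values 2.0 and 1.0. A smaller sample or a larger σ makes them scatter more widely.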

In this model, the square of the correlation coefficient, called the coefficient of determination, has an important interpretation. The y_i are not all the same for two reasons: different values x_i have been set, and the error term is random. The coefficient of determination gives the proportion of the variation of the y_i that can be explained by the x_i. For values close to 1, a very large part of the variation of the y_i can be explained by the choice of the x_i, i.e. by the linear regression. For values close to 0, the variation of the y_i is mainly due to the normally distributed error and can hardly be explained by the regression.
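This "explained part" can be made concrete with a Python sketch. It computes the coefficient of determination as 1 minus the ratio of the residual sum to the total variation Σ(y_i-ȳ)², and compares the result with the squared correlation coefficient. The function names and both point sets are assumptions for illustration.

```python
import math


def fit_line(points):
    """Return the least-squares estimates (a, b) for the line y = a*x + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all x values are equal; no unique line exists")
    a = sxy / sxx
    return a, mean_y - a * mean_x


def coefficient_of_determination(points):
    """Share of the variation of the y values explained by the regression line."""
    a, b = fit_line(points)
    mean_y = sum(y for _, y in points) / len(points)
    unexplained = sum((y - (a * x + b)) ** 2 for x, y in points)  # residual sum
    total = sum((y - mean_y) ** 2 for _, y in points)  # total variation of y
    if total == 0:
        raise ValueError("all y values are equal; nothing to explain")
    return 1 - unexplained / total


def correlation(points):
    """Empirical correlation coefficient of the point cloud."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: one slightly noisy point set and one lying exactly on y = 2x + 1.
noisy = [(0, 1), (1, 3), (2, 5), (3, 8)]
exact = [(0, 1), (1, 3), (2, 5), (3, 7)]

r2_noisy = coefficient_of_determination(noisy)
r2_exact = coefficient_of_determination(exact)
print(r2_noisy, correlation(noisy) ** 2)
print(r2_exact)
```

For the points that lie exactly on a line, the whole variation is explained and the value is 1. For the noisy points it is slightly smaller and agrees with the squared correlation coefficient.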

Function of the interactive figure

By clicking in the coordinate system, you can create a point cloud, for which the corresponding regression line is calculated and displayed.

For the current point cloud, the figure displays the current line, the residual sum of the line, and the empirical correlation coefficient.

