Linear regression

Linear regression is one of the best-known regression methods. Its main advantages are simplicity and speed. Its one disadvantage is that it is unsuitable for inherently nonlinear problems.

Contents

    Linear Regression in ALGLIB
    Algorithm Features and Improvements
    Manual entries

Linear Regression in ALGLIB

Operations with linear models are performed in two stages:

  • Build a linear model by calling one of the construction subroutines (which one depends on the problem being solved). The result is a LinearModel structure containing the fitted model.
  • Operate on the model: process data, copy or serialize the model, and so on.
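The two-stage pattern above can be sketched in a few lines of Python with NumPy. This is only an illustration of the workflow, not the ALGLIB API: the `LinearModel` dataclass and the `build_linear_model` and `process` functions are hypothetical names introduced here, and the fit is delegated to `numpy.linalg.lstsq`.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class LinearModel:
    """Hypothetical stand-in for a model structure: weights plus intercept."""
    weights: np.ndarray
    intercept: float


def build_linear_model(x: np.ndarray, y: np.ndarray) -> LinearModel:
    """Stage 1: fit y ~ x @ weights + intercept by least squares."""
    design = np.column_stack([x, np.ones(len(x))])  # append a constant column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return LinearModel(weights=coef[:-1], intercept=float(coef[-1]))


def process(model: LinearModel, x: np.ndarray) -> np.ndarray:
    """Stage 2: apply an already built model to new inputs."""
    return x @ model.weights + model.intercept


# Stage 1: build the model from noise-free data generated by y = 2*x0 - x1 + 3.
x_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y_train = 2.0 * x_train[:, 0] - x_train[:, 1] + 3.0
model = build_linear_model(x_train, y_train)

# Stage 2: use the model on a point that was not in the training set.
prediction = process(model, np.array([[3.0, 2.0]]))
```

Keeping the fitted model in its own structure means it can be copied, stored, or applied repeatedly without repeating the fit.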

Algorithm Features and Improvements

The linear regression algorithm included in the ALGLIB package is based on singular value decomposition (SVD). It improves on the classical approach in two ways:

  • First, before doing any other work, the ALGLIB package standardizes the variables. This improves the condition number and avoids some of the problems that arise when variables are poorly scaled.
  • Second, the ALGLIB package calculates a cross-validation estimate of the generalization error using a fast algorithm. This algorithm uses the Sherman-Morrison formula to update the SVD of the task matrix when one row is left out of the training set. The fast algorithm allows the generalization error to be calculated in O(N·M) time (where N is the size of the training set and M is the number of variables). For comparison, the time to solve a linear regression problem is O(N·M²), whereas straightforward "leave-one-out" cross-validation takes O(N²·M²) time.
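To make the two improvements concrete, here is a minimal Python/NumPy sketch. It standardizes the variables, fits the model through a thin SVD, and computes leave-one-out residuals with the standard identity e_i / (1 - h_ii), which follows from the Sherman-Morrison formula (h_ii is the squared norm of row i of the left singular vectors). The function names `fit_with_fast_loo` and `brute_force_loo` are hypothetical, and it is an assumption that ALGLIB's implementation takes exactly this form; the sketch only shows that the shortcut agrees with refitting N times.

```python
import numpy as np


def fit_with_fast_loo(x, y):
    """Return (fitted values, leave-one-out RMS error) for y ~ [x, 1]."""
    # Standardize each variable to zero mean and unit variance.
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    a = np.column_stack([z, np.ones(len(z))])
    # Thin SVD: a = u @ diag(s) @ vt.
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    coef = vt.T @ ((u.T @ y) / s)
    fitted = a @ coef
    # Leverage h_ii is the squared norm of row i of u.
    leverage = np.sum(u * u, axis=1)
    # Leave-one-out residuals without refitting (non-degenerate case only).
    loo_residuals = (y - fitted) / (1.0 - leverage)
    return fitted, float(np.sqrt(np.mean(loo_residuals ** 2)))


def brute_force_loo(x, y):
    """Reference implementation: refit the model N times, one row left out."""
    n = len(x)
    a = np.column_stack([x, np.ones(n)])
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(a[keep], y[keep], rcond=None)
        errors[i] = y[i] - a[i] @ coef
    return float(np.sqrt(np.mean(errors ** 2)))


rng = np.random.default_rng(0)
# Three deliberately badly scaled variables.
x = rng.normal(size=(40, 3)) * np.array([1.0, 100.0, 0.01])
y = x @ np.array([1.5, -0.02, 30.0]) + 4.0 + rng.normal(scale=0.1, size=40)

_, fast_error = fit_with_fast_loo(x, y)
slow_error = brute_force_loo(x, y)
```

The two estimates agree up to rounding because standardization combined with a constant term does not change the fitted values of a least-squares model, so the shortcut on standardized data matches refits on the raw data.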

Two features of the fast cross-validation estimate should be mentioned. First, the "fast" formula is applicable only to non-degenerate problems. When the task is degenerate, the dimension of the problem has to be reduced using the Principal Component Method. This is done automatically when necessary, but it roughly doubles the running time. Second, there is a subtler kind of degeneracy: the problem is non-degenerate as a system of linear equations, but the Sherman-Morrison formula is inapplicable to some vectors of the training set (for some vectors that look perfectly ordinary, it leads to division by zero). In theory, a training set may contain up to M+1 such "defective" elements (out of the available N), although in most cases there are none. These "defective" vectors are excluded from the cross-validation error estimate, which somewhat distorts the result. This is not a bug, and is worth knowing about in case somebody tests the fast cross-validation algorithm and happens to run into one of these vectors.
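A small Python/NumPy sketch shows how such a "defective" vector can arise. In the hypothetical data below, the second column is nonzero only in the last row, so that row alone determines one coefficient: the design matrix has full rank, yet the leverage of the last row is 1 and the leave-one-out identity e_i / (1 - h_ii) would divide by zero. The tolerance `tol` and the decision to skip such rows are assumptions made for this illustration, modeled on the behavior described above.

```python
import numpy as np

# Columns: a variable, an indicator that is nonzero only in the last row,
# and a constant term.
a = np.array([
    [1.0, 0.0, 1.0],
    [2.0, 0.0, 1.0],
    [3.0, 0.0, 1.0],
    [4.0, 0.0, 1.0],
    [5.0, 1.0, 1.0],
])
y = np.array([1.1, 1.9, 3.2, 3.9, 9.0])

u, s, vt = np.linalg.svd(a, full_matrices=False)
coef = vt.T @ ((u.T @ y) / s)
residuals = y - a @ coef

# Leverage of each training vector; a value of 1 makes 1 - h_ii vanish.
leverage = np.sum(u * u, axis=1)

tol = 1e-9  # assumed threshold for "1 - h_ii is numerically zero"
usable = (1.0 - leverage) > tol
defective = np.flatnonzero(~usable)

# The estimate is computed over the usable vectors only.
loo_residuals = residuals[usable] / (1.0 - leverage[usable])
loo_rms = float(np.sqrt(np.mean(loo_residuals ** 2)))
```

Here only the last row is flagged, and the error estimate is averaged over the remaining four rows, which is why it can differ from a full leave-one-out run.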

Manual entries

C++ linreg.h   
C# linreg.cs   
Delphi linreg.pas   
FreePascal linreg.pas   
VBA linreg.bas   

This article is intended for personal use only.

Sergey Bochkanov, Vladimir Bystritsky
Copyright © 1999-2010