Principal component analysis

Principal component analysis (PCA) is a dimension reduction method that transforms the data to a new orthogonal basis whose axes are oriented along the directions of maximum variance of the input data set. The variance is largest along the first axis of the new basis; the second axis maximizes the variance subject to being orthogonal to the first, and so forth, with the last axis having the least variance of all. This transformation allows the amount of information to be reduced by discarding the coordinates that correspond to the directions of minimum variance. If one of the basis vectors has to be discarded, it should preferably be the one along which the input data set varies least.
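
The transformation described above can be illustrated for the two-dimensional case, where the eigenvalues and axes of the covariance matrix have a closed form. This is only a minimal sketch in plain Python, not the ALGLIB implementation: the function names `pca_2d`, `reduce_to_1d` and `restore` and the sample points are invented for the illustration.

```python
import math

def pca_2d(points):
    """Return the mean and two (variance, unit axis) pairs, largest variance first."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    # Angle of the direction along which the variance is largest.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    axis1 = (math.cos(theta), math.sin(theta))
    axis2 = (-math.sin(theta), math.cos(theta))  # orthogonal to axis1
    return (mx, my), [(half_trace + radius, axis1), (half_trace - radius, axis2)]

def reduce_to_1d(points, mean, axis):
    """Keep only the coordinate of each centered point along `axis`."""
    return [(x - mean[0]) * axis[0] + (y - mean[1]) * axis[1] for x, y in points]

def restore(scores, mean, axis):
    """Rebuild approximate 2D points from the single kept coordinate."""
    return [(mean[0] + s * axis[0], mean[1] + s * axis[1]) for s in scores]

# Hypothetical sample: points scattered closely around the line y = x.
points = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
mean, components = pca_2d(points)
(var1, axis1), (var2, axis2) = components

# Discard the minimum-variance axis and measure what is lost.
scores = reduce_to_1d(points, mean, axis1)
restored = restore(scores, mean, axis1)
max_error = max(math.dist(p, r) for p, r in zip(points, restored))
print(var1, var2, max_error)
```

For this sample almost all of the variance falls on the first axis, so each point is restored from a single number with only a small error.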

PCA is based on the following assumptions:

  • The dimensionality of the data can be efficiently reduced by a linear transformation.
  • Most of the information is contained in the directions where the variance of the input data is maximum.

Evidently, these conditions are by no means always met. For example, if the points of an input set lie on the surface of a hypersphere, no linear transformation can reduce the dimension (a nonlinear transformation, however, can easily cope with this task). This disadvantage is shared by all linear algorithms, and it can be overcome by adding complementary dummy variables that are nonlinear functions of the input data set elements (the so-called "kernel trick").
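
The hypersphere example can be checked numerically in its simplest form, a circle in the plane. The sketch below is an assumption-laden illustration in plain Python, not a kernel PCA implementation: it uses eight evenly spaced points, the polar angle as the nonlinear coordinate, and one explicitly added variable x^2 + y^2 as the "complementary" nonlinear function.

```python
import math

def variance(values):
    """Sample variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def covariance(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

# Eight points evenly spaced on the unit circle.
n = 8
xs = [math.cos(2 * math.pi * k / n) for k in range(n)]
ys = [math.sin(2 * math.pi * k / n) for k in range(n)]

# Linear view: equal variances and zero covariance mean that every direction
# carries the same variance, so there is no low-variance axis to discard.
var_x, var_y, cov_xy = variance(xs), variance(ys), covariance(xs, ys)

# Nonlinear view: the polar angle alone restores every point.
angles = [math.atan2(y, x) for x, y in zip(xs, ys)]
restored = [(math.cos(t), math.sin(t)) for t in angles]
max_error = max(math.dist((x, y), r) for x, y, r in zip(xs, ys, restored))

# Added nonlinear variable: x^2 + y^2 is constant on the circle, so a linear
# method working on (x, y, x^2 + y^2) sees a direction with zero variance.
r2 = [x * x + y * y for x, y in zip(xs, ys)]
var_r2 = variance(r2)

print(var_x, var_y, cov_xy, max_error, var_r2)
```

The added variable turns the curved constraint into a flat one: in the extended space the data has a zero-variance direction that a linear method can detect.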

The second disadvantage of PCA is that the directions maximizing variance do not always maximize information. The page on the LDA subroutines gives an example of such a task, in which the maximum-variance variable carries almost no information, whilst the minimum-variance variable allows the classes to be separated completely. In this case, PCA will give preference to the first (less informative) variable. This drawback is closely connected to the fact that PCA does not perform linear separation of classes, linear regression or other similar operations; it merely allows the input vector to be restored as well as possible from partial information about it. All additional information pertaining to the vector (such as the class to which an image belongs) is ignored.
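
A small made-up data set shows the effect. It is not the example from the LDA page: the two classes and their coordinates below are assumptions chosen so that the class label depends only on the low-variance variable y.

```python
def variance(values):
    """Sample variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def covariance(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

# Hypothetical data: x is spread widely in both classes, y encodes the class.
xs = [-10.0, -5.0, 0.0, 5.0, 10.0]
class_a = [(x, 0.0) for x in xs]
class_b = [(x, 1.0) for x in xs]
data = class_a + class_b

all_x = [x for x, _ in data]
all_y = [y for _, y in data]
var_x, var_y, cov_xy = variance(all_x), variance(all_y), covariance(all_x, all_y)

# With zero covariance the principal axes coincide with x and y, and x has
# by far the larger variance, so PCA keeps x and discards y.
kept_a = sorted(x for x, _ in class_a)
kept_b = sorted(x for x, _ in class_b)
classes_overlap_on_x = kept_a == kept_b

# The discarded variable separates the classes with a single threshold.
separable_by_y = max(y for _, y in class_a) < 0.5 < min(y for _, y in class_b)

print(var_x, var_y, classes_overlap_on_x, separable_by_y)
```

After reduction to the maximum-variance coordinate the two classes become indistinguishable, although the discarded coordinate separated them completely.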

Manual entries

C++ pca.h   
C# pca.cs   
Delphi pca.pas   
FreePascal pca.pas   
VBA pca.bas   

This article is intended for personal use only.

Download ALGLIB

  • C#: C# source. alglib-2.4.0.csharp.zip
  • C++: C++ source. alglib-2.4.0.cpp.zip
  • C++, multiple precision arithmetic: C++ source. MPFR/GMP is used. GMP source is available from gmplib.org. MPFR source is available from www.mpfr.org. alglib-2.4.0.mpfr.zip
  • FreePascal: FreePascal source. alglib-2.4.0.freepascal.zip
  • Delphi: Delphi source. alglib-2.4.0.delphi.zip
  • Visual Basic: VBA source. alglib-2.4.0.vb6.zip

Sergey Bochkanov, Vladimir Bystritsky
Copyright © 1999-2010