Principal component analysis

Principal component analysis (PCA) is a dimension reduction method that transfers the data to a new orthogonal basis whose axes are oriented along the directions of maximum variance of the input data set. The variance is greatest along the first axis of the new basis; the second axis maximizes variance subject to being orthogonal to the first axis, and so forth, so that the last axis has the least variance of all. Such a transformation allows the amount of information to be reduced by discarding the coordinates that correspond to the directions of minimum variance. The reasoning is that if one of the basis vectors has to be discarded, it should preferably be the one along which the input data set varies least.
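The procedure described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the ALGLIB implementation: the function names (`center`, `covariance`, `top_eigenpair`, `pca`, `project`) and the sample data are hypothetical, and the eigenvectors are found by simple power iteration with deflation, which is adequate for small, well-separated variances but not a production-grade solver.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of rows that are already centered."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(mat, iters=200, seed=0):
    """Dominant eigenvalue and unit eigenvector of a symmetric PSD matrix,
    found by power iteration from a seeded random start (an assumption of
    this sketch; convergence is slow when the top eigenvalues are close)."""
    d = len(mat)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero matrix: no variance left to explain
            break
        v = [x / norm for x in w]
    lam = sum(v[i] * mat[i][j] * v[j] for i in range(d) for j in range(d))
    return lam, v


def pca(rows):
    """Return (variances, axes), ordered from largest to smallest variance."""
    cov = covariance(center(rows))
    d = len(cov)
    variances, axes = [], []
    for _ in range(d):
        lam, v = top_eigenpair(cov)
        variances.append(lam)
        axes.append(v)
        # Deflation: remove the direction just found, so the next pass
        # finds the largest-variance direction orthogonal to it.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return variances, axes


def project(rows, axes):
    """Coordinates of the centered rows along the given axes."""
    return [[sum(x * a for x, a in zip(r, axis)) for axis in axes]
            for r in center(rows)]


# Hypothetical data: large spread along (1, 1), small spread along (1, -1).
data = [[t + 0.1 * s, t - 0.1 * s] for t in (-2, -1, 0, 1, 2) for s in (-1, 1)]
variances, axes = pca(data)
reduced = project(data, axes[:1])  # keep only the maximum-variance coordinate
```

Here `axes[0]` points along the diagonal (1, 1) up to sign, and `reduced` holds one coordinate per point instead of two; the discarded coordinate is the one with the smallest variance.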

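It may be noted that PCA is based on the following assumptions:

the dimension of the data can be reduced efficiently by a linear transformation
most of the information is contained in the directions in which the variance of the input data is greatest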

Evidently, these conditions are by no means always met. For example, if the points of the input set lie on the surface of a hypersphere, no linear transformation can reduce the dimension (a nonlinear transformation, however, can easily cope with this task). This disadvantage is shared by all linear algorithms, and it can be overcome by adding auxiliary variables that are nonlinear functions of the elements of the input data set (the so-called "kernel trick").
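The hypersphere case can be illustrated in two dimensions with points on a unit circle. The sketch below is an assumption-laden toy example, not kernel PCA: it checks that every linear direction carries the same variance, and then uses a hand-picked nonlinear map (the polar angle) to show that a single number per point is enough to restore the data. The helper `variance_along` is a hypothetical name.

```python
import math

# Evenly spaced points on the unit circle (a 1-dimensional "hypersphere" in 2D).
n = 360
points = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
          for k in range(n)]


def variance_along(points, angle):
    """Variance of the points projected onto the unit vector at `angle`."""
    ux, uy = math.cos(angle), math.sin(angle)
    proj = [x * ux + y * uy for x, y in points]
    mean = sum(proj) / len(proj)
    return sum((p - mean) ** 2 for p in proj) / len(proj)


# Linear view: the variance is the same in every direction, so there is
# no low-variance axis that could be dropped.
variances = [variance_along(points, math.radians(a)) for a in (0, 30, 60, 90)]

# Nonlinear view: encode each point by its polar angle (one number),
# then decode it back to two coordinates.
codes = [math.atan2(y, x) for x, y in points]
decoded = [(math.cos(c), math.sin(c)) for c in codes]
max_error = max(math.hypot(x - dx, y - dy)
                for (x, y), (dx, dy) in zip(points, decoded))
```

All entries of `variances` agree up to rounding, while `max_error` is at the level of floating-point noise: the data are one-dimensional, but only a nonlinear coordinate reveals it.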

The second disadvantage of PCA is that the directions that maximize variance do not always maximize information. The page on the LDA subroutines gives an example of such a task, in which the maximum-variance variable carries almost no information, whilst the minimum-variance variable allows the classes to be separated completely. In this case, PCA will prefer the first (less informative) variable. This drawback is closely connected with the fact that PCA does not perform linear separation of classes, linear regression or other similar operations; it merely allows the input vector to be restored as well as possible from partial information about it. All additional information about the vector (such as the class to which an image belongs) is ignored.
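A toy version of such a task can be written down directly. The data below are invented for illustration (they are not the example from the LDA page): the first variable has a large spread that is identical for both classes, while the second has a small spread but differs by class. The helper names `principal_axes_2d` and `scores` are hypothetical, and the principal axis is obtained from the closed-form angle for a symmetric 2x2 covariance matrix.

```python
import math

# Hypothetical two-class data: x1 is wide but uninformative,
# x2 is narrow but separates the classes.
xs = (-10.0, -5.0, 0.0, 5.0, 10.0)
class_a = [(x1, 0.1) for x1 in xs]
class_b = [(x1, -0.1) for x1 in xs]
points = class_a + class_b


def principal_axes_2d(points):
    """Unit vectors of the largest- and smallest-variance directions in 2D."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Orientation of the major axis of a symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    major = (math.cos(theta), math.sin(theta))
    minor = (-math.sin(theta), math.cos(theta))
    return major, minor


def scores(points, axis):
    """Coordinates of the points along a unit axis."""
    return [p[0] * axis[0] + p[1] * axis[1] for p in points]


major, minor = principal_axes_2d(points)  # class labels are never used
major_a, major_b = scores(class_a, major), scores(class_b, major)
minor_a, minor_b = scores(class_a, minor), scores(class_b, minor)

# Along the maximum-variance axis the two classes overlap completely...
overlap_on_major = min(major_a) < max(major_b) and min(major_b) < max(major_a)
# ...while along the minimum-variance axis they are fully separated.
separated_on_minor = min(minor_a) > max(minor_b)
```

Keeping only the first principal component here would retain the variable on which the classes are indistinguishable and discard the one that separates them, because the class labels play no part in the computation.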

This article is licensed for personal use only.
