Principal Component Analysis (PCA) is a dimensionality reduction method that transforms data into a new orthogonal basis whose axes are oriented along the directions of maximum variance of the input data set. Variance is maximal along the first axis; the second axis maximizes variance subject to being orthogonal to the first, and so forth, so the last axis has the least variance of all possible directions. Such a transformation allows the data to be compressed by discarding the coordinates that correspond to the directions of minimum variance: if one of the basis vectors must be rejected, it should be the one along which the input data vary least.
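The transformation described above can be sketched via eigendecomposition of the covariance matrix. This is a minimal illustrative sketch in NumPy, not ALGLIB's implementation; the function name `pca_transform` and the synthetic data are chosen purely for the example.

```python
import numpy as np

def pca_transform(X, k):
    """Project rows of X onto the k directions of maximum variance.

    A minimal sketch using eigendecomposition of the covariance
    matrix; not ALGLIB's implementation.
    """
    Xc = X - X.mean(axis=0)            # center the data
    cov = np.cov(Xc, rowvar=False)     # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]     # reorder by descending variance
    basis = vecs[:, order[:k]]         # top-k principal directions
    return Xc @ basis                  # coordinates in the new basis

# Example: points nearly on a line in 2-D reduce well to 1 dimension.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2.0 * t + 0.01 * rng.normal(size=200)])
Z = pca_transform(X, 1)
```

Because the second coordinate is almost a linear function of the first, the single retained component captures nearly all of the variance of the original two coordinates.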

It may be noted that PCA is based on the following assumptions:

- that the dimensionality of the data can be efficiently reduced by a linear transformation;
- that most of the information is contained in the directions along which the input data variance is maximum.

Evidently, these conditions are by no means always met. For example, if the points of the input set lie on the surface of a hypersphere, no linear transformation can reduce the dimension (a nonlinear transformation, however, copes with this task easily). This disadvantage is shared by all linear algorithms, and it can be overcome by adding complementary dummy variables that are nonlinear functions of the input data elements (the so-called "kernel trick").
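The hypersphere example can be illustrated with an explicit nonlinear feature map (a simplified stand-in for the kernel trick; full kernel PCA avoids computing the features explicitly). For points on a circle, the dummy variable x² + y² is constant, so in the augmented space a linear method can discard that direction exactly:

```python
import numpy as np

# Points on the unit circle: no linear map to 1-D preserves their
# structure, but the nonlinear dummy feature x^2 + y^2 is constant there.
rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 2.0 * np.pi, size=300)
X = np.column_stack([np.cos(theta), np.sin(theta)])

# Augment the data with a quadratic feature (explicit feature map).
r2 = (X ** 2).sum(axis=1)
X_aug = np.column_stack([X, r2])

# In the augmented space the third coordinate has (numerically) zero
# variance, so a linear transformation can now reject it exactly.
```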

The second disadvantage of PCA is that the directions maximizing variance do not always maximize information. The page on the LDA subroutines gives an example of such a task, in which the maximum-variance variable carries almost no information, while the minimum-variance variable separates the classes completely. In this case PCA will prefer the first (less informative) variable. This drawback is closely connected to the fact that PCA does not perform linear class separation, linear regression or other similar operations; it merely allows the input vector to be restored as well as possible from partial information about it. Any additional information associated with the vector (such as the assignment of a sample to one of the classes) is ignored.
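The situation described above is easy to reproduce with synthetic data (this is an illustrative sketch with made-up distributions, not the example from the LDA page): one coordinate has large variance but identical distribution in both classes, while the other has tiny variance yet separates the classes perfectly. The first principal component aligns with the uninformative coordinate:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
labels = rng.integers(0, 2, size=n)

# Feature 1: high variance, same distribution for both classes.
f1 = rng.normal(0.0, 10.0, size=n)
# Feature 2: tiny variance, but its sign separates the classes.
f2 = np.where(labels == 1, 0.1, -0.1) + rng.normal(0.0, 0.01, size=n)
X = np.column_stack([f1, f2])

# First principal component = top eigenvector of the covariance matrix.
Xc = X - X.mean(axis=0)
vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1 = vecs[:, np.argmax(vals)]
# pc1 is (almost) aligned with feature 1, which carries no class signal,
# so projecting onto it discards exactly the class-separating direction.
```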


ALGLIB Project offers you two editions of ALGLIB:

**ALGLIB Free Edition**:

- delivered for free
- offers full set of numerical functionality
- extensive algorithmic optimizations
- no low-level optimizations
- non-commercial license

**ALGLIB Commercial Edition**:

- flexible pricing
- offers full set of numerical functionality
- extensive algorithmic optimizations
- high performance (SMP, SIMD)
- commercial license with support plan

Links to download sections for Free and Commercial editions can be found below:

C++ library:

- Delivered with sources.
- Monolithic design.
- Extreme portability.

C# library with native kernels:

- Delivered with sources.
- VB.NET and IronPython wrappers.
- Extreme portability.

Delphi wrapper around C core:

- Delivered as precompiled binary.
- Compatible with FreePascal.

CPython wrapper around C core:

- Delivered as precompiled binary.