Principal Component Analysis (PCA) is a dimension reduction method that transforms the data to a new orthogonal basis whose axes are oriented along the directions of maximum variance of the input data set. The variance is largest along the first axis of the new basis; the second axis maximizes variance subject to being orthogonal to the first, and so forth, so that the last axis has the smallest variance of all. This transformation allows the data to be compressed by discarding the coordinates that correspond to the directions of minimum variance. If one of the basis vectors has to be discarded, it should preferably be the one along which the input data varies least.
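The transformation described above can be illustrated with a minimal sketch. It is restricted to two-dimensional data so that the eigendecomposition of the covariance matrix has a closed form; the function names `pca_2d` and `project` and the toy data are assumptions made for illustration and are not part of the ALGLIB API.

```python
import math

def pca_2d(points):
    """Return (mean, [(variance, axis), ...]) sorted by decreasing variance."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the 2x2 sample covariance matrix [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    c = sum((y - my) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Closed-form eigendecomposition of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * b, a - c)  # angle of the max-variance axis
    half_gap = math.hypot((a - c) / 2, b)
    var1 = (a + c) / 2 + half_gap           # variance along the first axis
    var2 = (a + c) / 2 - half_gap           # variance along the second axis
    axis1 = (math.cos(theta), math.sin(theta))
    axis2 = (-math.sin(theta), math.cos(theta))
    return (mx, my), [(var1, axis1), (var2, axis2)]

def project(points, mean, axis):
    """Coordinates of the centered points along a unit-length axis."""
    return [(x - mean[0]) * axis[0] + (y - mean[1]) * axis[1]
            for x, y in points]

# Hypothetical toy data: points near the line y = 2x with small offsets.
points = [(t, 2.0 * t + 0.1 * (-1) ** k)
          for k, t in enumerate([-2.0, -1.0, 0.0, 1.0, 2.0])]

mean, components = pca_2d(points)
(var1, axis1), (var2, axis2) = components

# Reduce to one dimension: keep only the coordinate along the first axis.
scores = project(points, mean, axis1)

# Restore approximate points from the single retained coordinate.
restored = [(mean[0] + s * axis1[0], mean[1] + s * axis1[1]) for s in scores]
squared_error = sum((x - rx) ** 2 + (y - ry) ** 2
                    for (x, y), (rx, ry) in zip(points, restored))

print("variances:", var1, var2)
print("squared reconstruction error:", squared_error)
```

In this sketch the squared reconstruction error equals the variance along the discarded axis multiplied by `n - 1`, which is why discarding the minimum-variance direction loses the least. For data of higher dimension, a general symmetric eigensolver or a singular value decomposition would take the place of the closed-form step.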

PCA rests on the following assumptions:

- The dimensionality of the data can be efficiently reduced by a linear transformation.
- Most of the information is contained in the directions along which the variance of the input data is largest.

Evidently, these assumptions are by no means always met. For example, if the points of an input set lie on the surface of a hypersphere, no linear transformation can reduce the dimension (a nonlinear transformation, however, can easily cope with this task). This limitation is shared by all linear algorithms, and it can be overcome by adding auxiliary dummy variables that are nonlinear functions of the input data set elements (the so-called "kernel trick").
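The hypersphere example can be checked numerically in its simplest case, a circle in the plane. The sketch below assumes evenly spaced points on the unit circle; the helper `variance_along` is a name introduced here for illustration. It shows that the variance is the same along every direction, so a linear projection has no axis it can safely drop, whereas a nonlinear map to the polar angle describes each point with a single number.

```python
import math

# Hypothetical data: n evenly spaced points on the unit circle.
n = 360
points = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
          for k in range(n)]

def variance_along(points, angle):
    """Sample variance of the points projected onto a unit direction."""
    ux, uy = math.cos(angle), math.sin(angle)
    proj = [x * ux + y * uy for x, y in points]
    m = sum(proj) / len(proj)
    return sum((p - m) ** 2 for p in proj) / (len(proj) - 1)

# The variance is the same for every direction tried, so no direction
# stands out as the one to discard.
variances = [variance_along(points, a)
             for a in (0.0, math.pi / 6, math.pi / 4, math.pi / 2)]
spread = max(variances) - min(variances)

# A nonlinear map (the polar angle) encodes each point with one number,
# and the point is restored from that number alone.
angles = [math.atan2(y, x) for x, y in points]
restored = [(math.cos(t), math.sin(t)) for t in angles]
max_error = max(math.hypot(x - rx, y - ry)
                for (x, y), (rx, ry) in zip(points, restored))

print("spread of variances across directions:", spread)
print("largest reconstruction error from the angle:", max_error)
```

The polar angle works here only because the circle's center and radius are known in advance; it stands in for the general idea of a nonlinear transformation and is not a substitute for a kernel method.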

The second disadvantage of PCA is that the directions that maximize variance do not always maximize information. The page on the LDA subroutines gives an example of such a problem, in which the maximum-variance variable carries almost no information, whilst the minimum-variance variable allows the classes to be separated completely. In this case, PCA will give preference to the first (less informative) variable. This drawback follows from the fact that PCA does not perform linear separation of classes, linear regression or other similar operations; it merely allows the input vector to be restored as well as possible from partial information about it. All additional information pertaining to the vector (such as the class to which an image belongs) is ignored.
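A small sketch can make this drawback concrete. The data set below is a hypothetical toy example constructed for illustration (it is not the example from the LDA page): the variable `x0` has a large variance that is unrelated to the class label, while `x1` has a tiny variance but separates the two classes. The helper names `variance` and `best_threshold_accuracy` are likewise assumptions.

```python
# Hypothetical toy data: (x0, x1, class_label).
samples = [(x0, x1, label)
           for label, center in ((0, -0.1), (1, 0.1))
           for x0 in (-10.0, -5.0, 0.0, 5.0, 10.0)
           for x1 in (center - 0.02, center + 0.02)]

x0_values = [s[0] for s in samples]
x1_values = [s[1] for s in samples]
labels = [s[2] for s in samples]

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def covariance(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def best_threshold_accuracy(values, labels):
    """Best accuracy of a one-variable rule 'value > t' (either orientation)."""
    best = 0.0
    for t in sorted(set(values)):
        hits = sum((1 if v > t else 0) == l for v, l in zip(values, labels))
        best = max(best, hits / len(labels), 1 - hits / len(labels))
    return best

var_x0 = variance(x0_values)
var_x1 = variance(x1_values)
cov = covariance(x0_values, x1_values)

# The covariance is (numerically) zero, so the principal axes coincide with
# the coordinate axes and the first principal component is simply x0.
acc_x0 = best_threshold_accuracy(x0_values, labels)  # variable PCA would keep
acc_x1 = best_threshold_accuracy(x1_values, labels)  # variable PCA would drop

print("variances:", var_x0, var_x1, "covariance:", cov)
print("best threshold accuracy on x0:", acc_x0, "on x1:", acc_x1)
```

In this construction a threshold on the retained variable `x0` does no better than chance, while a threshold on the discarded variable `x1` classifies every sample correctly. The class labels never enter the variance computation, which is the sense in which PCA ignores them.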

*This article is intended for personal use only.*

ALGLIB® - numerical analysis library, 1999-2017.

ALGLIB is a registered trademark of the ALGLIB Project.

Policies for this site: privacy policy, trademark policy.
