Singular spectrum analysis (SSA, also known as Caterpillar-SSA) is a non-parametric time series analysis method. It can be used to filter out noise components or to predict future values.
ALGLIB package includes highly optimized SSA implementation available in several programming languages, including:
Our implementation of singular spectrum analysis includes following features (see below for more detailed discussion):
1 Getting started and examples
Getting started
Examples
2 Singular spectrum analysis (SSA) in ALGLIB
Creating SSA model object
Specifying dataset
Selecting/configuring SSA algorithm
Trend/noise separation
Forecasting
Other functions
Real-time and big-data features
3 Benchmarks
Backends: C++ (native) vs C# (managed)
Algorithms: direct solver vs real-time one
4 Downloads section
Singular spectrum analysis is provided by the ssa subpackage of ALGLIB package.
Everything starts with SSA model creation, which is performed by means of the ssacreate function.
This function initializes an instance of the ssamodel
class, which can be used to perform various kinds of activities.
We tried to make SSA models as easy as possible, so you can treat it as some kind of calculator. Just tell ALGLIB what data you have, and what kind of analysis you need: (1) create the model, (2) add the data to the model, (3) set analysis options, (4) perform an analysis.
Here analysis options are eigensolver being used, window width, components count and several other performance-related settings which are discussed below. And analysis can be basis extraction, trend/noise separation, forecasting (and extraction of linear recurrence relation coefficients).
We prepared several examples (in C++, C# and all other programming languages supported by ALGLIB) of SSA usage with ALGLIB:
Sections below discuss these examples in more details.
Singular spectrum analysis is provided by the ssa subpackage of ALGLIB package.
An SSA model is created with ssacreate function.
This function initializes an instance of the ssamodel
class,
which can be used to perform various kinds of activities.
SSA model works with single/multiple sequences of data. The most frequent case of time series analysis is a situation when you have "just one big sequence of values", i.e. all you data form single contiguous sequence. Such sequence can be specified with the ssaaddsequence function.
This function has "add" semantics, i.e. multiple calls to this function will result in multiple sequences being stored internally by the model object. Say, it is possible to store five sequences corresponding to five different periods (days/months/...). These sequences will be analyzed together to create a common basis, but solver will assume that they are independent, i.e. no overlap exists between different sequences.
Another situation is when you have a constant stream of arriving data and want to reuse results of the previous analysis as much as possible. Say, new values are appended to the dataset every second, and new basis constructed by SSA is almost same as one constructed one second earlier. In this case, you may use the ssaappendpointandupdate function, which adds one/several points to the end of the dataset and performs a quick update of the previously computed basis. There also exist similar function, ssaappendsequenceandupdate, which adds new independent sequence.
The most time-consuming part of SSA is an eigensolver which finds singular vectors. ALGLIB includes several singular spectrum analysis algorithms, which perform exactly same analysis, but use different approaches to compute singular vectors:
Component count K is selected when you select an SSA algorithm. Another important option is analysis window width, which is chosen with ssasetwindow function.
Most frequent use of singular spectrum analysis is to perform trend/noise separation. After you specified a dataset for analysis, an analysis algorithm, a window width and a components counts, you may perform analysis with one of the following functions: ssaanalyzesequence and ssaanalyzelast.
First one, ssaanalyzesequence, analyzes user-specified sequence (not necessarily one used for basis computation) and returns trend and noise components separated. Latter is a convenience wrapper which performs analysis of the last N ≥ WindowWidth values of the time series stored in the model.
SSA forecasting can be performed with one of the functions below:
You may want to extract some information internally stored in the model:
Real-time SSA algorithm can perform really fast updates of previously computed basis when new data arrive. Explicitly tell ALGLIB that new values are appended to the end of the dataset by calling ssaappendpointandupdate or ssaappendsequenceandupdate function, and it will be able to perform quick O(k·WindowWidth^{ 2}) basis update instead of performing O(SeriesLength·WindowWidth^{ 2}) calculation from scratch. Such algorithm greatly accelerates some "long-running" use cases.
From the other side, incremental updates are fast, but the initial calculation of the basis still has O(SeriesLength·WindowWidth^{ 2}) cost. Your program may perform well in the long run, but initial "power up" may be too slow. Thus, ALGLIB provides you with one more performance-related option: "delayed power-up", which is activated with the ssasetpoweruplength function. This function allows you to split costly initialization amongst T first iterations, with each of them having reduced O((SeriesLength/T)·WindowWidth^{ 2}) cost.
As result, first T attempts to perform an analysis will return somewhat inaccurate basis (1/T-th of the entire dataset will be used for the first analysis run, 2/T-th - for the second portion of incoming data, and so on), but eventually algorithm will "warm up" and start to return precise results.
ALGLIB package has several versions in various programming languages, relying on two primary backends: (a) native computational core, and (b) 100% managed computational core (.NET framework). In addition, commercial edition of ALGLIB for C# can access native computational core (with exactly the same C# API being provided).
Thus, leaving aside multithreading, we have three primary options to study:
Obviously, first implementation (pure C#) is the slowest one, whilst HPC implementation is the fastest. But what about exact numbers? Our first comparison involves single-threaded SSA of 7200 ticks long signal, with window width set to 256, and 10 principal components being extracted. Real-time version of SSA algorithm is used. This test was performed on 2.3GHz x64 Intel CPU, running Windows operating system.
Conclusion: native linear algebra wins! :)
Two most time-consuming phase of singular spectrum analysis is initial calculation of trajectory matrix product A=X'·X. Second, less costly phase, is eigenvalue decomposition of A. Furthermore, incremental updates to the model with ssaappendpointandupdate function add two more distinct phases: updating A and updating evd(A).
Different SSA solvers perform these phases in different manner:
On the chart below you may see running times for 4 subsequent basis calculations with ssagetbasis: initial basis calculation and three subsequent updates performed with same settings as ones used for our previous benchmark:
As you may see, "direct" solver has highest running times for both initial basis calculation and incremental updates. "Real-time" solver has comparable initial basis calculation time, but orders of magnitude faster incremental update. Finally, power-up feature allows us to spread computational load more evenly across iterations.
This article is licensed for personal use only.
ALGLIB Project offers you two editions of ALGLIB:
ALGLIB Free Edition:
delivered for free
offers full set of numerical functionality
extensive algorithmic optimizations
no low level optimizations
non-commercial license
ALGLIB Commercial Edition:
flexible pricing
offers full set of numerical functionality
extensive algorithmic optimizations
high performance (SMP, SIMD)
commercial license with support plan
Links to download sections for Free and Commercial editions can be found below: