Inverse distance weighting interpolation/fitting
Inverse distance weighting is a scattered data interpolation algorithm. There exists several variations of the algorithms, different both in conceptual and implementation aspects. ALGLIB package contains local version of inverse distance weighting algorithm, which generates C1-continuous interpolant and have O(N·logN) construction complexity and O(logN) interpolation complexity.
Inverse distance weighting: Shepard's method
Consider the simplest form of inverse distance weighting interpolation before proceeding to a more complex algorithm implemented in the ALGLIB. Let xi are points in D dimensional space, let yi are function values at xi . Then Shepard interpolant have following form:

Such algorithm have both benefits and drawbacks. Its benefits are:
- Extreme simplicity of implementation
- Lack of tunable parameters
- Ability to work in N-dimensional space
- Ability to interpolate scattered data and to work on ANY grid. Multicollinearity is not a problem, algorithm may work even when all points lie in a low-dimensional subspace. When properly implemented, it may even work with coinciding nodes!
However, it has a number of important drawbacks:
- Low performance on large datasets: interpolation complexity is O(N).
- Algorithm gives too much weight for distant nodes. Their total weight may be larger than weight of nearby nodes. It is more pronounced in high-dimensional spaces.
- Shepard method is a global method, which is a problem by itself. Interpolant is too sensitive to distant outliers.
- f(x) is flat at xi , i.e. it has zero derivative at the nodes.
For these reasons original Shepard's method is rarely used in practice.
Modified Shepard's method: scattered data interpolation
ALGLIB package contains more advanced version of inverse distance weighting interpolation algorithm - modified Shepard's method, first proposed by Robert J. Renka ('Multivariate Interpolation of Large Sets of Scattered Data', 1988). Modified interpolant have following form:

Modified Shepard's method differs from original algorithm in the following aspects:
- Only Nw points is used for interpolation - a set of Nw nearest neighbors of x, which is denoted by K.
- Weight functions Wi (x) has become more complex. Now they are equal to zero at the boundary of the sphere surrounding K.
- Instead of constant yi we've got nodal functions Qi (x). These functions may be quadratic, linear or constant (at your choice). Qi (x) is obtained as a result of weighted least squares fitting on a set of Nq nearest neighbors of xi with constraint Qi (xi )=yi .Formula (2) is used to calculate weights.
- kd-trees are used for efficient NN search with O(logN) complexity.
Main benefits of modified algorithm are:
- Ability to work in N-dimensional space
- Ability to interpolate scattered data and to work on ANY grid. Multicollinearity is not a problem, algorithm may work even when all points lie in a low-dimensional subspace. When properly implemented, it may even work with coinciding nodes!
- f(x) is C1-continuous (except for several special cases, see below)
- Good performance even for large datasets (achieved due to efficient NN search algorithm). Interpolant construction has O(N·logN) complexity, interpolation itself has O(logN) complexity.
- Absence of "flat spots" near interpolation nodes (when quadratic or linear functions are used)
- Locality of interpolation algorithm. f(x) depends only on nearest neighbors of x which significantly improves quality of interpolation.
- Easy generalization to regression task, i.e. ability to work with noisy data (see below).
- Only two tunable parameters: Nw è Nq (see below).
However, several drawbacks may be noted too:
- Modified Shepard's method is faster than original inverse distance weighting algorithm only for large datasets (hundreds of points) and only in low dimensional spaces (2-5 dimensions).
- Although algorithm may work on any grid, its performance degrades when points lie in a low-dimensional subspace. In such cases slow SVD-based least squares solver is used instead of faster QR-based one.
- In some rare cases f(x) may have discontinuities. Discontinuity will appear when more than Nw nearest neighbors of x are equidistant from x. In such cases k-NN search algorithm will choose neighbors at random, depending on roundoff errors or order at which they were extracted from search structure. However, f(x) will remain bounded, and increasing Nw will decrease probability of such situations. For instance, lower bound for Nw is set to prevent such errors issues on Cartesian grid.
However, these drawbacks did not outweigh the benefits of modified Shepard's algorithm, so we can recommend it as a standard tool for multidimensional interpolation (either scattered or regular).
Modified Shepard's method: scattered data fitting
Modified Shepard's method can be slightly modified to fit noisy data. The only modification is just calculation of Qi (x). First, constraint Qi (xi )=yi is removed; thus f(xi ) ≠ yi . Second, an equal weight is assigned to all points used to fit Qi (x).
Note #1
(1), (2) and (3) are left unchanged. All modifications are related to calculation of A, b and c from (3).
This algorithm should be applied only to noisy tasks. In absence of noise it have no benefits over interpolation algorithm.
Tuning: Nw and Nq
Nw and Nq parameters control number of nodes used during nodal functions construction (Nq parameter) and interpolation (Nw parameter):
- Nw parameter controls algorithm locality. Too large value will make interpolant too global (slow and unable to reproduce local changes in the function). From the other side, too small Nw will lead us to sharp and inaccurate interpolant (for example, Nw =1 is just a nearest neighbor interpolation). Good Nw is usually slightly larger than max(1.5·Nq ,2 D+1).
- Nq controls another aspect of locality: number of nodes used to build nodal functions Qi (x). Good nodal function must pass through (xi ,yi ) and it should approximately reproduce function in the neighborhood of xi . Too small Nq will make us unable to build good approximation. But too large Nq will lead to Qi (x) which reproduces global behavior of function, but is not very good where it is really needed: in the sphere of influence of xi . Good Nq is usually 1.5 times larger than number of free parameters of the nodal function: 1+D for a linear nodal function, (D+2)·(D+1)/2 for a quadratic nodal function.
Situation is somewhat different with fitting tasks. Meaning of Nw is still the same, but Nq now have two meanings: it controls both algorithm locality and noise suppression. Nq noisy points are combined to get noise-free picture, so the larger level of noise is, the larger Nq is needed.
Tuning: nodal functions
User can choose between four types of nodal functions:
- constant nodal function. This function is used in the original Shepard's method and it is left only for evaluation purposes.
- linear nodal function, whose coefficients are calculated using least squares fitting. This function provides better accuracy than constant function.
- quadratic nodal function, whose coefficients are calculated using least squares fitting. This function provides best quality as long as you have enough data to robustly calculate its coefficients. If your data are sparse, may be linear nodal function will perform better.
- "fast" linear nodal function, whose coefficients are calculated with fast algorithm (gradient interpolation) instead of least squares-based algorithm. This is a special nodal function type for problems where speed is more important than quality. Coefficients of such nodal function did't minimize some least squares-like error function, but usually they provide moderate precision. However, this kind of nodal functions is less robust and noise-tolerant than previous ones, so you should use it only when you can test its quality/stability.
As for fitting problems, user can choose between only two types of nodal functions: linear and quadratic. Quadratic nodal function is still best choice - as long as you have enough data to fight with the noise. If noise level is too high, or data are too sparse, linear nodal function may be better option.
Manual entries
This article is intended for personal use only.
Download ALGLIB