L-BFGS algorithm for multivariate optimization
|
The Newton method is a classical method. It is studied first in the "Optimization Methods" course. This old method isn't used, but it becomes a basis for a whole family of quasi-Newton methods. One of those methods is the L-BFGS algorithm.
Quasi-Newton methods
The classical Newton method uses the Hessian of a function. The step of the method is defined as a product of an inverse Hessian matrix and a function gradient. If the function is a positive definite quadratic form, we can reach the function minimum in one step. In case of an indefinite quadratic form (which has no minimum), we will reach the maximum or saddle point. In short, the method finds the stationary point of a quadratic form. In practice, we usually have functions which are not quadratic forms. If such a function is smooth, it is sufficiently good described by a quadratic form in the minimum neighbourhood. However, the Newton method can converge both to a minimum and a maximum (taking a step into the direction of a function increasing).
Quasi-Newton methods solve this problem as follows: they use a positive definite approximation instead of a Hessian. If Hessian is positive definite, we make the step using the Newton method. If Hessian is indefinite, we modify it to make it positive definite, and then perform a step using the Newton method. The step is always performed in the direction of the function decrement. In case of a positive definite Hessian, we use it to generate a quadratic surface approximation. This should make the convergence better. If Hessian is indefinite, we just move to where function decreases.
It was stated above that we perform a step using the Newton method. Actually, it is not exactly so - in that way we just define a direction in which the step will be performed. Some modifications of quasi-Newton methods perform a precise linear minimum search along the indicated line, but it is proved that it's enough to sufficiently decrease the function value, and not necessary to find a precise minimum value. The L-BFGS algorithm tries to perform a step using the Newton method. If it does not lead to a function value decreasing, it lessens the step length to find a lesser function value.
LBFGS Hessian update scheme
The Hessian of a function isn't always available, more often we can only calculate the function gradient. Therefore, the following operation is used: the Hessian of a function is generated on the basis of the N consequent gradient calculations, and the quasi-Newton step is performed. There is a special formulas which allows to iteratively get a Hessian approximation. On each step approximation, the matrix remains positive definite. The algorithm uses the BFGS update scheme. BFGS stands for Broyden-Fletcher-Goldfarb-Shanno (more precisely, this scheme generates not the Hessian, but its inverse matrix, so we don't have to waste time inverting a Hessian).
The L letter in the scheme name comes from the words "limited memory". In case of big dimensions, the amount of memory required to store a Hessian (N 2) is too big, along with the machine time required to process it. Therefore, instead of using N gradient values to generate a Hessian we can use a smaller number of values, which requires a memory capacity of order of N·M. In practice, M is usually chosen from 3 to 7, in difficult cases it is reasonable to increase this constant to 20. Of course, as a result we'll get not the Hessian but its approximation. On the one hand, the convergence slows down. On the other hand, the performance could even grow up. At first sight, this statement is paradoxical. But it contains no contradictions: the convergence is measured by a number of iterations, whereas the performance depends on the number of processor's time units spent to calculate the result.
As a matter of fact, initially this method was designed to optimize the functions of a number of arguments (hundreds and thousands), because in this case it is worth having an increasing iteration number due to the lower approximation precision because the overheads become much lower. But we can use these methods for small dimension problems too. The main advantage of the method is scalability, because it provides high performance when solving high dimensionality problems, and it allows to solve small dimension problems too.
Difference scheme and analytical gradient
Do not calculate the function gradient on the basis of a two-point difference formula because it is insufficiently precise. In a number of cases, the algorithm will not be able to work and will return an error message. Use at least a four-point formula or analytical form of the gradient.
L-BFGS algorithm with bound constraints
In a number of cases it is required to solve constrained problems. In general, constraints could be given both as equalities and inequalities. A special case is the so-called bound constraints: li ≤ xi ≤ ui (either all or some variables could be limited either from above or from below). The algorithm solving an unconstrained problem is described on this page. If it is required to solve a constrained problem, you can use L-BFGS-B algorithm. This variant of algorithm is specially designed to solve problems with bound constraints.
The description of the LBFGSMinimize subroutine
This subroutine uses the FuncGrad subroutine which calculates the value of the function F and gradient G in point X. It should be noted that FuncGrad subroutine doesn't need to waste time for memory allocation of array G, because the memory is allocated in calling the subroutine before the first call of FuncGrad. Setting a dimension of array G each time when calling a subroutine will excessively slow down an algorithm.
Report a bug
C#
C# 1.0 source.
lbfgs.csharp.zip - L-BFGS algorithm for multivariate optimization
C++
C++ source.
lbfgs.cpp.zip - L-BFGS algorithm for multivariate optimization
ablas.zip - optimized basic linear algebra subroutines with SSE2 support (for C++ sources only)
Delphi
Delphi source.
Can be compiled under FPC (in Delphi compatibility mode).
lbfgs.delphi.zip - L-BFGS algorithm for multivariate optimization
Visual Basic 6
Visual Basic 6 source.
lbfgs.vb6.zip - L-BFGS algorithm for multivariate optimization
Zonnon beta
Zonnon source.
Zonnon is an experimental language developed at ETH Zurich.
See www.zonnon.ethz.ch for more information.
lbfgs.zonnon.zip - L-BFGS algorithm for multivariate optimization
|