Nelder-Mead Simplex Optimization

The Nelder-Mead algorithm (also known as the “Simplex Algorithm” or even as the “Amoeba Algorithm”) is a method for the minimization of non-linear functions in several variables. In contrast to other non-linear minimization methods, it does not require gradient information. This makes it less efficient, but also less prone to divergence problems. In contrast to other methods, it is not necessary for the minimum to be bracketed by the initial guess: the algorithm performs a limited “global” search. (It may still converge to a local, rather than the global, minimum, of course.) Finally, the algorithm is fairly simple to implement as a stand-alone routine, which makes it a natural choice for multi-dimensional minimization when function evaluations are not prohibitively expensive.

Convention: The presentation in this post is phrased in terms of the minimization of a function, for greater clarity. Everything is equally applicable to maximization problems as well, with the objective function f being replaced by -f.
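
For example, to maximize an objective function f with the NelderMead() routine developed below, one would minimize its negation. A minimal sketch (assuming f, start, and limits have been set up as in the Usage section):

g := func( x []float64 ) float64 { return -f(x) } // negated objective
itr, fneg, loc := NelderMead( g, start, limits )
fmt.Println( itr, -fneg, loc )                    // maximum of f and its location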

How It Works

The algorithm maintains a “simplex” of guesses for the location of the minimum. A simplex is a convex shape that is spanned by d+1 points, where d is the dimension of the space. In other words, in 2 dimensions, a simplex is a triangle, in 3 dimensions a simplex is a (generally irregular) tetrahedron, and so on.
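
In the implementation below, such a simplex is represented simply as a slice of d+1 vertices, each vertex in turn being a slice of d coordinates. For instance, a triangle in 2 dimensions:

simplex := [][]float64{ {1, 1}, {2, 1}, {-1, 2} } // 3 vertices, 2 coordinates each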

At each step, the algorithm selects the vertex with the greatest function value and probes several alternative positions for that vertex. It then replaces the vertex with the best of these positions, and the process repeats until the minimum has been determined to sufficient accuracy.

The algorithm probes three candidate positions for the vertex in question. All three lie on the line that passes through this vertex and through the centroid of all the other vertices. (The “centroid” is the arithmetic mean of those vertices.) The first position is found by reflection of the initial vertex through the centroid. The second and third positions are found by extension and contraction along the same line. For instance, we may choose a point that is twice as far from the centroid as the reflected point (extension) or that is half-way between the centroid and the reflected point (contraction). The algorithm chooses the best among these three candidate points and replaces the initial vertex with it. The process is then repeated with whichever vertex now has the greatest function value, until the minimum has been determined.
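
The geometry of the three candidate points is easy to express in code. The following sketch is purely illustrative (the helper name candidates is made up here); the coefficients 1.0, 2.0, and 0.5 for reflection, extension, and contraction correspond to the fixed parameters used in the implementation below:

// candidates computes the reflected, extended, and contracted probe points
// for the worst vertex, given the centroid of the remaining vertices.
func candidates( worst, ctr []float64 ) ( ref, ext, con []float64 ) {
	ref = make( []float64, len(ctr) )
	ext = make( []float64, len(ctr) )
	con = make( []float64, len(ctr) )
	for i := range ctr {
		ref[i] = ctr[i] + 1.0*( ctr[i] - worst[i] ) // reflect through the centroid
		ext[i] = ctr[i] + 2.0*( ref[i] - ctr[i] )   // twice as far out as ref
		con[i] = ctr[i] + 0.5*( ref[i] - ctr[i] )   // half-way between ctr and ref
	}
	return ref, ext, con
}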

It is because both the reflection and the extension steps probe points outside of the original simplex that the algorithm can find solutions that are not bracketed by the original starting configuration. Instead, the simplex “crawls down” the function, extending and contracting like an amoeba (hence the name), until it reaches a (local) minimum.

The Implementation

The implementation follows the description on Wikipedia quite closely.

package main

import (
	"fmt"
	"sort"
)

// NelderMead finds the minimum of the function f for a given starting guess.
// - The first argument is the objective function f to minimize. The objective
//   function takes a slice of length d, and returns a scalar float64.
// - The second argument is a slice of (d+1) slices, each of length d,
//   representing the starting points of the iteration.
// - The last argument contains stopping conditions, as a slice of 3 values
//   of type float64. The first is the maximum number of iteration steps (as
//   a float, but interpreted as an int), the second is the maximum allowed
//   relative deviation of the objective function at any of the vertices
//   from its value at the centroid of the simplex (that is, f(p)/f(c)-1.0),
//   and the third is the maximum allowed square distance of any of the
//   vertices from the centroid.
// NelderMead returns the number of iteration steps taken (as an int), the
// value of the objective function at the centroid (as float64), and a slice
// of length d with the final centroid coordinates.
//
// NelderMead overwrites the values in its second argument (the starting
// guess)!
func NelderMead( f func( []float64 ) float64, p [][]float64,
	limits []float64 ) ( int, float64, []float64 ) {

	a, b, c, d := 1.0, 2.0, 0.5, 0.5 // reflection, extension, contraction, shrink
	maxitr, dflim, dplim := int(limits[0]), limits[1], limits[2]

	n := len(p) - 1 // dimension - must have n+1 starting pts!

	ctr := make( []float64, n ) // centroid
	ref := make( []float64, n ) // reflection
	ext := make( []float64, n ) // extension, expansion
	con := make( []float64, n ) // contraction

	for itr:=0; ; itr++ {
		// sort points p by value of f
		sort.Slice( p, func (i,j int) bool { return f(p[i]) < f(p[j]) } )

		// centroid of all points, excluding the largest-valued pt p[n]
		for i:=0; i<n; i++ { ctr[i] = 0.0 }
		for j:=0; j<n; j++ {
			for i:=0; i<n; i++ {
				ctr[i] += p[j][i]
			}
		}
		for i:=0; i<n; i++ { ctr[i] /= float64(n) }
		fc := f(ctr)
		
		// check termination
		df := 0.0 // MAX (!) deviation of f(p) from fc
		for j:=0; j<n+1; j++ {
			tmp := f(p[j])/fc - 1.0     // ( f(p[j]) - fc )/fc; assumes fc != 0
			if tmp < 0.0 { tmp = -tmp } // abs value
			if tmp > df { df = tmp }
		}

		dp := 0.0 // MAX squared distance of any p from ctr
		for j:=0; j<n+1; j++ {
			tmp := 0.0
			for i:=0; i<n; i++ {
				tmp += ( p[j][i] - ctr[i] )*( p[j][i] - ctr[i] )
			}
			if tmp > dp { dp = tmp } // absolute, NOT relative, bound
		}

		if itr >= maxitr || df < dflim || dp < dplim {
			return itr, fc, ctr // terminate
		}
		
		// reflection
		for i:=0; i<n; i++ {
			ref[i] = ctr[i] + a * ( ctr[i] - p[n][i] )
		}
		fr := f(ref)
		if f( p[0] ) <= fr && fr < f( p[n-1] ) {
			for i:=0; i<n; i++ {
				p[n][i] = ref[i]
			}
			continue
		}

		// extension
		if fr < f(p[0]) {
			for i:=0; i<n; i++ {
				ext[i] = ctr[i] + b * ( ref[i] - ctr[i] )
			}
			if f(ext) < fr {
				for i:=0; i<n; i++ {
					p[n][i] = ext[i]
				}
			} else {
				for i:=0; i<n; i++ {
					p[n][i] = ref[i]
				}
			}
			continue
		}

		// contraction
		if fr < f( p[n] ) {
			for i:=0; i<n; i++ {
				con[i] = ctr[i] + c * ( ref[i] - ctr[i] ) // outside
			}
			if f(con) < fr {
				for i:=0; i<n; i++ {
					p[n][i] = con[i]
				}
				continue // contraction succeeded
			}
		} else {
			for i:=0; i<n; i++ {
				con[i] = ctr[i] + c * ( p[n][i] - ctr[i] ) // inside
			}
			if f(con) < f(p[n]) {
				for i:=0; i<n; i++ {
					p[n][i] = con[i]
				}
				continue // contraction succeeded
			}
		}

		// shrink: contraction failed, so contract all points
		// towards the best one
		for j:=1; j<n+1; j++ {
			for i:=0; i<n; i++ {
				p[j][i] = p[0][i] + d * ( p[j][i] - p[0][i] )
			}
		}
	} // end for
}

The only somewhat delicate point concerns the question of how to terminate the iteration. A number of different stopping criteria can be imagined, but some of them can be fooled into stopping the iteration prematurely.

In my opinion, the most robust condition is based on the size of the simplex: an upper bound on the distances between the centroid and all the vertices fixes the location of the minimum. If only the minimum value of the function is needed, but not its location, then bounding the variation of the function value across all vertices may lead to faster termination, in particular for functions that are “flat” near their minimum.
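
For concreteness, here is how these two measures could be written as stand-alone helpers. This is a sketch only: the helper names simplexSize and fSpread are made up, and both simply mirror the dp and df computations inside NelderMead:

// simplexSize returns the maximum squared distance of any vertex in p
// from the centroid ctr; a small value fixes the location of the minimum.
func simplexSize( p [][]float64, ctr []float64 ) float64 {
	dp := 0.0
	for _, v := range p {
		d2 := 0.0
		for i := range ctr {
			d2 += ( v[i] - ctr[i] )*( v[i] - ctr[i] )
		}
		if d2 > dp { dp = d2 }
	}
	return dp
}

// fSpread returns the maximum relative deviation of the objective f at any
// vertex from its value fc at the centroid; a small value fixes the minimum
// value of the function, but not necessarily its location. Assumes fc != 0.
func fSpread( f func( []float64 ) float64, p [][]float64, fc float64 ) float64 {
	df := 0.0
	for _, v := range p {
		tmp := f(v)/fc - 1.0
		if tmp < 0.0 { tmp = -tmp }
		if tmp > df { df = tmp }
	}
	return df
}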

By contrast, my experience with conditions that focus on the “rate of improvement” (the reduction in the function value from one iteration step to the next, or the reduction in the size of the simplex) has been less positive. The reason is that the present algorithm occasionally makes steps that predominantly reshape the simplex, without improving the objective function significantly. Stopping criteria that end the iteration when the “rate of improvement” is small are easily confused by such behavior and may end the iteration too early.

Usage

The examples below show how to use the current implementation of the Nelder-Mead Simplex algorithm. Notice that the dimensionality of the space does not need to be specified explicitly; instead, it is inferred from the dimensionality of the starting points.

The examples below use a simple quadratic function with a well-defined, isolated minimum, and the Rosenbrock function, which is often used as a test function for optimization algorithms.

func simple() {
	f := func( x []float64 ) float64 {
		return (x[0]-.3)*(x[0]-.3) + 2.0*x[1]*x[1] + 0.5
	}
	start := [][]float64{ {1, 1}, {2, 1}, {-1, 2} }
	limits := []float64{50., 1e-9, 1e-9}
	fmt.Println( "Simple, 2dim" )
	fmt.Println( NelderMead( f, start, limits ) )
}

// rosenbrock implements the generalized Rosenbrock function, which has
// its global minimum value of 0 at the point (1, ..., 1).
func rosenbrock( x []float64 ) float64 {
	s := 0.0
	for i:=0; i<len(x)-1; i++ {
		s += 100.0*(x[i+1]-x[i]*x[i])*(x[i+1]-x[i]*x[i]) + (1.0 - x[i])*(1.0 - x[i])
	}
	return s
}

func rosenbrock2D() {
	start := [][]float64{ {1, 1}, {2, 1}, {-1, 2} }
	limits := []float64{50., 1e-9, 1e-9}
	fmt.Println( "Rosenbrock, 2dim" )
	fmt.Println( NelderMead( rosenbrock, start, limits ) )
}

func rosenbrock3D() {
	start := [][]float64{ {1, 1, 1}, {2, 1, 2},
		{-1, 2, -1}, {-1, -1, -1} }
	limits := []float64{50., 1e-9, 1e-9}
	fmt.Println( "Rosenbrock, 3dim" )
	fmt.Println( NelderMead( rosenbrock, start, limits ) )
}

func main() {
	simple()
	rosenbrock2D()
	rosenbrock3D()
}

Resources