Nelder-Mead Method

Standard

A while without posting…

Well, this is part of a set of exercises that I’ll be solving for, you know, personal stuff. I’ll do Copy-Paste of some stuff just to set the context and the idea, which was taken from the book Numerical Methods Using Matlab by John H. Mathews and Kurtis K. Fink. The implementation shown at the end was done all by me using Python.

 

Nelder-Mead Method

A simplex method for finding a local minimum of a function of several variables has been devised by Nelder and Mead. For two variables, a simplex is a triangle, and the method is a pattern search that compares function values at the three vertices of a triangle. The worst vertex, where f (x, y) is largest, is rejected and replaced with a new vertex. A new triangle is formed and the search is continued. The process generates a sequence of triangles (which might have different shapes), for which the function values at the vertices get smaller and smaller. The size of the triangles is reduced and the coordinates of the minimum point are found.

The algorithm is stated using the term simplex (a generalized triangle in N dimensions) and will find the minimum of a function of N variables. It is effective and computationally compact.

Initial Triangle BGW

Let f (x, y) be the function that is to be minimized. To start, we are given three vertices of a triangle: V_k = (x_k, y_k), k = 1, 2, 3. The function f (x, y) is then evaluated at each of the three points: z_k = f (x_k , y_k ) for k = 1, 2, 3. The subscripts are then reordered so that z_1 \leq z_2 \leq z_3. We use the notation

B =(x_1,y_1), G =(x_2,y_2), and W =(x_3,y_3)

to help remember that B is the best vertex, G is good (next to best), and W is the worst vertex.

Midpoint of the Good Side

The construction process uses the midpoint of the line segment joining B and G . It is found by averaging the coordinates:

 M = \dfrac{B + G}{2} = \left(\dfrac{x_1 + x_2}{2},\dfrac{y_1 + y_2}{2} \right).

Reflection Using the Point R

The function decreases as we move along the side of the triangle from W to B, and it decreases as we move along the side from W to G. Hence it is feasible that f (x, y) takes on smaller values at points that lie away from W on the opposite side of the line between B and G. We choose a test point R that is obtained by “reflecting” the triangle through the side \overline{BG}. To determine R, we first find the midpoint M of the side \overline{BG}. Then draw the line segment from W to M and call its length d. This last segment is extended a distance d through M to locate the point R (see Figure 8.6). The vector formula for R is

R = M + (M - W) = 2M - W.

screen-shot-2017-02-24-at-2-32-52-pm

Expansion Using the Point E

If the function value at R is smaller than the function value at W , then we have moved in the correct direction toward the minimum. Perhaps the minimum is just a bit farther than the point R. So we extend the line segment through M and R to the point E. This forms an expanded triangle BGE. The point E is found by moving an additional distance d along the line joining M and R (see Figure 8.7). If the function value at E is less than the function value at R, then we have found a better vertex than R. The vector formula for E is

E = R + (R - M) = 2R -M.

screen-shot-2017-02-24-at-2-37-21-pm

Contraction Using the Point C

If the function values at R and W are the same, another point must be tested. Perhaps the function is smaller at M, but we cannot replace W with M because we must have a triangle. Consider the two midpoints C_1 and C_2 of the line segments \overline{WM} and \overline{MR}, respectively (see Figure 8.8). The point with the smaller function value is called C, and the new triangle is BGC. Note. The choice between C_1 and C_2 might seem inappropriate for the two-dimensional case, but it is important in higher dimensions.

screen-shot-2017-02-24-at-2-41-52-pm

Shrink toward B

If the function value at C is not less than the value at W, the points G and W must be shrunk toward B (see Figure 8.9). The point G is replaced with M, and W is replaced with S, which is the midpoint of the line segment joining B with W.

screen-shot-2017-02-24-at-2-42-01-pm

Logical Decisions for Each Step

A computationally efficient algorithm should perform function evaluations only if needed. In each step, a new vertex is found, which replaces W. As soon as it is found, further investigation is not needed, and the iteration step is completed. The logical details for two-dimensional cases are explained in Table 8.5.

screen-shot-2017-02-24-at-2-44-13-pm

Testing the implementation on a provided example

Use the Nelder-Mead algorithm to find the minimum of f (x, y) = x2 - 4x + y2 - y - xy. Start with the three vertices

V_1 = (0,0)\text{, }V_2 = (1.2,0.0)\text{, }V_3 = (0.0,0.8).

The process done in the example generates a sequence of triangles that converges down on the solution point (see Figure 8.10). Table 8.6 gives the function values at vertices of the triangle for several steps in the iteration. (Again, this results are given by the author, my implementation results are presented at the end.)

screen-shot-2017-02-24-at-2-51-30-pm

screen-shot-2017-02-24-at-2-52-00-pm

Now, my results are shown. The 3D plot of the graph in [0,5]\times[0,5] looks as follows:

fig1

So, it’s easy to see that this function has a local minimum. Now, running the code for 10 iterations, we obtain the following data:

screen-shot-2017-02-24-at-2-56-51-pm

The iterative plotting in two dimensions (because a picture is worth a thousand words) of the approximation to the local minimum is shown below. The cyan triangle is the current triangle in the iteration, and the red dot is the current minimum.

gif1

I hope you find this post useful, even without explaining the code (check my Github for it).

References:

  1. Numerical Methods Using Matlab, 4th Edition, 2004. John H. Mathews and Kurtis K. Fink.
  2. NumPy Reference. (https://docs.scipy.org/doc/numpy/reference/)
  3. Matplotlib. (http://matplotlib.org/)

One thought on “Nelder-Mead Method

Leave a comment