Anais do XX Congresso Brasileiro de Automática
Belo Horizonte, MG, 20 a 24 de Setembro de 2014
DISCRETE BARYCENTER METHOD FOR DIRECT OPTIMIZATION
Guilherme Scabin Vicinansa∗, Diego Colón∗, Felipe Miguel Pait∗

∗Laboratório de Automação e Controle - LAC/PTC, Escola Politécnica da USP
Av. Prof. Luciano Gualberto, trav 3, 158
Cidade Universitária, São Paulo, BRAZIL
Emails: [email protected], [email protected], [email protected]
Abstract— The Discrete Barycenter Method is a direct, derivative-free optimization algorithm which uses the weighted average of a sequence of guesses to search for minimum points of a function. Good guesses receive larger weights, while bad ones are penalized. The method converges with superlinear order. Simulations show that its performance is at least comparable to that of existing methods, and examples of applications to controller design are presented. Finally, conclusions and suggestions for future work are given.
Keywords— Direct Optimization, Barycenter.
Resumo— The Discrete Barycenter Method is a derivative-free optimization algorithm that uses the weighted average of estimates to search for minima of a given cost function. Good estimates receive larger weights, while worse ones are penalized. The Discrete Barycenter Method converges with superlinear order. Simulations show that the performance of this method is at least comparable to that of currently existing ones, and examples of application to controller design are presented. Finally, conclusions and suggestions for future work are presented.
Palavras-chave— Otimização Direta, Baricentro.

1 Introduction

Although the success of derivative-based optimization methods, such as Newton, quasi-Newton, and steepest descent, is fully demonstrated in many applications, there are cases in which they cannot be applied, for instance when the derivatives are unavailable or cannot be calculated reliably (Conn et al., 2008). To solve those problems, one may use direct search methods, also called derivative-free methods.

Direct search methods arose in the 1950s, but their golden age was the 1960s (Lewis et al., 1998), when several successful applications were proposed. Because of the lack of convergence proofs, and the fact that direct search methods were developed heuristically, they were neglected by the mathematical optimization community (Kolda et al., 2003). Nonetheless, interest in such methods continued for three main reasons:

• Many work well in practice and some are very intuitive.

• They are normally easier to implement than derivative-based methods.

• They often work well when other methods fail.

We present the Discrete Barycenter Method, a new direct search algorithm that uses a weighted mean to search for the minimum point of a nonlinear function and solve unconstrained optimization problems. This weighted mean is computed much in the same manner as the center of mass of a set of particles. The main characteristic of the recursive version of the method is to randomly choose a curiosity that is added to the current estimate, providing the dynamics of the search. Each new estimate is considered in the calculation of the next barycenter, and the goal of the algorithm is for the sequence of barycenters to converge to a minimum.

This paper is organized as follows: in Section 2, we define the method and its parameters, and show an equivalent expression that is convenient for analysis. In Section 3, we prove convergence and superlinear convergence order. In Section 4, we compare the barycenter method with the Compass Search Method (Kolda et al., 2003) and Particle Swarm Optimization (Kennedy and Eberhart, 1995), a meta-heuristic method that has become popular in recent years. In Section 5, we design a proportional (P) and a proportional plus derivative (PD) controller using the Discrete Barycenter Method. Conclusions about the method's performance and directions for future work are presented in Section 6.

Throughout this paper ||·|| is the usual Euclidean norm in R^m. Also, R_+ and R^*_+ denote respectively the sets of nonnegative and strictly positive real numbers.

2 The Discrete Barycenter Method

In this section, we define the main concepts related to the barycenter method.

Definition 2.1 (Curiosity) The curiosity of the method is a nonzero vector b ∈ R^m with random components b_i ≥ 0, ∀i ∈ {1, ..., m}.

Roughly speaking, the higher the curiosity, the larger the search region: for a large curiosity the method is more appropriate for searching for a global minimum, while for a small curiosity the method searches locally.
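The search scheme just described can be sketched in a few lines of code. This is a minimal illustration of ours, not the paper's implementation: the curiosity distribution (the absolute value of a Gaussian, scaled by a hypothetical `scale` parameter) and all names are our own choices, and the shape factor defined later in the paper is omitted.

```python
import numpy as np

def barycenter_method(z, x0, mu=1.0, scale=0.5, iters=100, seed=0):
    """Minimal sketch of the Discrete Barycenter Method: each guess x_k
    is weighted by exp(-mu * z(x_k)), and the weighted mean (barycenter)
    of all guesses is tracked recursively."""
    rng = np.random.default_rng(seed)
    xhat = np.asarray(x0, dtype=float)      # barycenter, xhat_0 = x_0
    S = np.exp(-mu * z(xhat))               # running sum of weights
    for _ in range(iters):
        b = scale * np.abs(rng.standard_normal(xhat.size))  # nonnegative curiosity
        x = xhat + b                        # new guess around the barycenter
        w = np.exp(-mu * z(x))              # weight of the new guess
        S += w
        xhat = xhat + b * w / S             # recursive barycenter update
    return xhat

# Illustrative run on a simple quadratic with minimum at (1, 2)
zq = lambda p: float((p[0] - 1.0)**2 + (p[1] - 2.0)**2)
xstar = barycenter_method(zq, [0.0, 0.0], mu=5.0, scale=0.3, iters=2000)
```

Since the curiosity components are nonnegative, the guesses in this sketch only move "up and to the right" of the current barycenter, which is why the starting point is chosen below the minimum.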
Definition 2.2 (Discrete Barycenter Method) Let z(x) be an objective function from R^m → R_+ and \mu \in R^*_+. The sequences {x_n}_n \subset R^m and {\hat{x}_n}_n \subset R^m are given by

    \hat{x}_n = \frac{\sum_{k=0}^{n} x_k \, e^{-\mu z_k}}{\sum_{k=0}^{n} e^{-\mu z_k}}    (1)

    x_{n+1} = \hat{x}_n + b_{n+1},    (2)

where:

1. b_n is the curiosity sequence;

2. z_k is the value of the objective function z(x) at the point x_k; and

3. {\hat{x}_n}_n is the sequence of barycenters,

with x_0 an initial condition.

We wish the latter sequence to approach the minimum point. The idea behind the method is that if the value of the objective function at a point is small, that point receives a large weight; conversely, if the value of the function is large, the weight will be small and the point will contribute little to the barycenter computation. Notice that even if the barycenter approaches the minimum, a high curiosity can take it away from there, so some modifications to the method should be implemented. In the following, we define another parameter that can be used to improve convergence.

Definition 2.3 (Shape Factor) The shape factor is

    \gamma^{\hat{z}_{n-1} / \max_{i \in \{1,...,n-1\}} \hat{z}_i}

where:

1. \hat{z}_k is the value of the objective function z(x) at the point \hat{x}_k;

2. \gamma \in R is such that 0 \le \gamma \le 1.

If a_n \in R^m is a nonzero vector, the curiosity becomes

    b_n = a_n \, \gamma^{\hat{z}_{n-1} / \max_{i \in \{1,...,n-1\}} \hat{z}_i}.

The purpose of the shape factor is to diminish the amplitude of the search step as the function approaches its minimum, by reducing the curiosity.

3 Convergence

In order to prove convergence in Theorem 2, we require Lemma 1, which provides a recursive form for the Discrete Barycenter Method.

Lemma 1 For n \ge 1, equations (1) and (2) are equivalent to

    \hat{x}_n = \hat{x}_0 + \sum_{k=1}^{n} \frac{b_k \, e^{-\mu z_k}}{S_k},    (3)

where

    S_k = \sum_{i=0}^{k} e^{-\mu z_i},

and, if n = 0, \hat{x}_0 = x_0.

Proof: By induction. For n = 1, equation (3) reads:

    \hat{x}_1 = \hat{x}_0 + b_1 \frac{e^{-\mu z_1}}{S_1}
              = \hat{x}_0 + b_1 \frac{e^{-\mu z_1}}{e^{-\mu z_0} + e^{-\mu z_1}}
              = \frac{\hat{x}_0 e^{-\mu z_0} + \hat{x}_0 e^{-\mu z_1} + b_1 e^{-\mu z_1}}{S_1}
              = \frac{\hat{x}_0 e^{-\mu z_0} + e^{-\mu z_1} (\hat{x}_0 + b_1)}{S_1}
              = \frac{\hat{x}_0 e^{-\mu z_0} + x_1 e^{-\mu z_1}}{S_1}
              = \frac{x_0 e^{-\mu z_0} + x_1 e^{-\mu z_1}}{S_1}
              = \frac{\sum_{k=0}^{1} x_k e^{-\mu z_k}}{S_1},

which is a particular case of (1). For n = s + 1:

    \hat{x}_{s+1} = \hat{x}_0 + \sum_{k=1}^{s+1} \frac{b_k e^{-\mu z_k}}{S_k}
                  = \hat{x}_0 + \sum_{k=1}^{s} \frac{b_k e^{-\mu z_k}}{S_k} + \frac{b_{s+1} e^{-\mu z_{s+1}}}{S_{s+1}}
                  = \hat{x}_s + \frac{b_{s+1} e^{-\mu z_{s+1}}}{S_{s+1}}
                  = \frac{\hat{x}_s S_{s+1} + b_{s+1} e^{-\mu z_{s+1}}}{S_{s+1}}.

From the induction hypothesis,

    \hat{x}_s = \frac{\sum_{k=0}^{s} x_k e^{-\mu z_k}}{S_s},
so

    \hat{x}_{s+1} = \frac{\left( \sum_{k=0}^{s} x_k e^{-\mu z_k} \right) \frac{S_{s+1}}{S_s}}{S_{s+1}} + \frac{b_{s+1} e^{-\mu z_{s+1}}}{S_{s+1}}
                  = \frac{\left( \sum_{k=0}^{s} x_k e^{-\mu z_k} \right) \left( 1 + \frac{e^{-\mu z_{s+1}}}{S_s} \right)}{S_{s+1}} + \frac{b_{s+1} e^{-\mu z_{s+1}}}{S_{s+1}}
                  = \frac{\sum_{k=0}^{s} x_k e^{-\mu z_k} + \left( \frac{\sum_{k=0}^{s} x_k e^{-\mu z_k}}{S_s} + b_{s+1} \right) e^{-\mu z_{s+1}}}{S_{s+1}}
                  = \frac{\sum_{k=0}^{s} x_k e^{-\mu z_k} + (\hat{x}_s + b_{s+1}) e^{-\mu z_{s+1}}}{S_{s+1}}
                  = \frac{\sum_{k=0}^{s+1} x_k e^{-\mu z_k}}{S_{s+1}},

which is a particular case of (1), concluding the proof. □

Our main theorem shows that the Discrete Barycenter Method converges. Ideally, we would like the limit point to be a minimum of the objective function, but this cannot be guaranteed without extra assumptions. Issues concerning convergence to the minimum will be reported in a subsequent contribution (Vicinansa et al., 2014, submitted).

Theorem 2 (Convergence) Let z(x) be a function from R^m → R_+, \mu \in R^*_+, and {b_n}_n the sequence of curiosities, such that ||b_n|| \le M, \forall n, with M \in R^*_+. Let the vector sequences {x_n}_n \subset R^m and {\hat{x}_n}_n \subset R^m be generated by the Discrete Barycenter Method as described in equation (3), both of them belonging to the function's domain. Then \lim_{n \to \infty} ||\hat{x}_{n+1} - \hat{x}_n|| = 0.

Proof: There are two cases to consider.

1. The first case occurs when the sequence {z_n}_n diverges, which means \lim_{n \to \infty} z_n = +\infty. Using Lemma 1:¹

    ||\hat{x}_{n+1} - \hat{x}_n|| = \left\| \frac{b_{n+1} e^{-\mu z_{n+1}}}{S_{n+1}} \right\| \le \frac{M e^{-\mu z_{n+1}}}{S_{n+1}}.

Taking limits:

    \lim_{n \to \infty} ||\hat{x}_{n+1} - \hat{x}_n|| \le \lim_{n \to \infty} \frac{M e^{-\mu z_{n+1}}}{S_{n+1}} = 0,

thus \lim_{n \to \infty} ||\hat{x}_{n+1} - \hat{x}_n|| = 0, and the method converges.

¹ Note that the sequence {z_n}_n cannot diverge for a finite value of n, because there are no singularities in the function's domain. Hence, the function can only diverge if the point x_n falls on the domain's boundary, but in that case x_n must diverge as well. Since the curiosity sequence is bounded, the sequence {x_n}_n can only diverge for an infinite value of n.

2. In the second case, the sequence of weights {e^{-\mu z_n}}_n is contained² in the semi-open interval (0, 1] and \lim_{n \to \infty} z_n \ne \infty. Hence there exists w \in R^*_+ such that 0 < w \le e^{-\mu z_n}, \forall n \in N. It follows that

    (n+1) \, w \le \sum_{k=0}^{n} e^{-\mu z_k}, \quad \text{so} \quad \frac{1}{\sum_{k=0}^{n} e^{-\mu z_k}} \le \frac{1}{(n+1) \, w}.

² Because the image set of the function z(x) is R_+, the image set of e^{-\mu z(x)}, with \mu positive, is (0, 1].

Then:

    ||\hat{x}_{n+1} - \hat{x}_n|| = \left\| \frac{b_{n+1} e^{-\mu z_{n+1}}}{S_{n+1}} \right\| \le \frac{M e^{-\mu z_{n+1}}}{S_{n+1}} \le \frac{M e^{-\mu z_{n+1}}}{(n+1) \, w},

and taking the limit,

    \lim_{n \to \infty} ||\hat{x}_{n+1} - \hat{x}_n|| = 0,

which concludes the proof. □

Note that the convexity of the function's domain guarantees that the barycenter sequence remains inside the function's domain. Moreover, the barycenter sequence cannot diverge for finite values of n (see footnote 1).

Now we show that the method converges superlinearly.

Definition 3.1 If the sequence {\hat{x}_n}_n converges to the point x^*, the rate of convergence is said to be superlinear if

    \lim_{n \to \infty} \frac{||x^* - \hat{x}_{n+1}||}{||x^* - \hat{x}_n||} = 0.

Theorem 3 (Order of Convergence) Let z be a function from R^m → R_+, \mu \in R^*_+, and {b_n}_n a bounded sequence of curiosities, such that ||b_n|| \le M, \forall n, with M \in R^*_+, and ||b_n|| \ge r, \forall n, with r \in R^*_+. Then the convergence order of the barycenter sequence {\hat{x}_n}_n is superlinear.

Proof: By formula (3),

    ||x^* - \hat{x}_{n+1}|| = \left\| x^* - \left( \sum_{k=1}^{n+1} \frac{e^{-\mu z_k} \, b_k}{S_k} + \hat{x}_0 \right) \right\| = \left\| \sum_{k=n+2}^{\infty} \frac{e^{-\mu z_k} \, b_k}{S_k} \right\| \le M \sum_{k=n+2}^{\infty} \frac{e^{-\mu z_k}}{S_k}.

It is also true that

    ||x^* - \hat{x}_n|| = \left\| x^* - \left( \sum_{k=1}^{n} \frac{e^{-\mu z_k} \, b_k}{S_k} + \hat{x}_0 \right) \right\| = \left\| \sum_{k=n+1}^{\infty} \frac{e^{-\mu z_k} \, b_k}{S_k} \right\| \ge r \sum_{k=n+1}^{\infty} \frac{e^{-\mu z_k}}{S_k},
hence

    \frac{||x^* - \hat{x}_{n+1}||}{||x^* - \hat{x}_n||} \le \frac{M}{r} \cdot \frac{\sum_{k=n+2}^{\infty} \frac{e^{-\mu z_k}}{S_k}}{\sum_{k=n+1}^{\infty} \frac{e^{-\mu z_k}}{S_k}} = \frac{M}{r} \cdot \frac{\sum_{k=n+2}^{\infty} \frac{e^{-\mu z_k}}{S_k}}{\frac{e^{-\mu z_{n+1}}}{S_{n+1}} + \sum_{k=n+2}^{\infty} \frac{e^{-\mu z_k}}{S_k}}.

Taking the limit, the tail sum in the numerator tends to zero, so

    \lim_{n \to \infty} \frac{||x^* - \hat{x}_{n+1}||}{||x^* - \hat{x}_n||} = 0,

concluding the proof. □

4 Simulation Results and Comparative Analysis

In this section, we use the Compass Search Method (Kolda et al., 2003), Particle Swarm Optimization (PSO) (Kennedy and Eberhart, 1995) and the Discrete Barycenter Method to minimize Rosenbrock's function, a traditional benchmark for derivative-free optimization methods (Conn et al., 2008). The simulations in this article concern unimodal functions. Simulation results with a multimodal function (Ackley's function) appear in (Vicinansa et al., 2014, submitted), using an alternative version of the Barycenter Method.

The parameters of the Discrete Barycenter Method are \mu = 100000 and \gamma = 0.5 (shape factor), and the step-size parameter of the Compass Search Method is 10. For the PSO, we used the parameters³ w = 0.5, c1 = 2.0 and c2 = 2.0, with 100 iterations and 30 particles. All methods start from the same initial point, (x_0, y_0) = (0.0, 1.2).

³ The parameter w is the inertia factor, c1 is the self-confidence factor, and c2 is the swarm confidence factor.

The Rosenbrock function is given by

    z(x, y) = 100 \, (x^2 - y)^2 + (1 - x)^2.

It has a single minimum at (1, 1), where its value is zero. After 100 iterations, the Compass Search Method converged to the point (x, y) = (0.544434, 0.29668), with objective function value z(x, y) = 0.205895, while the Discrete Barycenter Method converged to the point (x, y) = (0.935595, 0.875188), with objective function value z(x, y) = 0.00415028. PSO, an evolutionary method based on the flocking patterns of birds (Kennedy and Eberhart, 1995), converged to the point (x, y) = (0.846025, 0.714902), with objective function value z(x, y) = 0.0237817. Figures 1 and 2 show the evolution of the three methods in the domain space and in time, respectively. The color code is the following: the Discrete Barycenter Method sequence is represented in red, the Compass Search Method sequence in blue, and the PSO in green.⁴ In the case presented, the PSO was faster. On the other hand, its performance is highly variable, and after several simulations we concluded that it may be comparable to the Barycenter Method.⁵

⁴ The graph that shows the evolution of the objective function value takes into consideration each time the PSO method finds a new candidate for the minimum point.

⁵ However, further research and statistical analysis must be carried out to confirm this.

Figure 1: Evolution of Compass Search (blue), Barycenter Method (red), and PSO (green).

Figure 2: Value of the objective function versus number of iterations.

5 Application to Optimal Controller Design

In this section, we present applications of the barycenter method to problems of interest in control: the off-line optimal design of closed-loop P (proportional) and PD (proportional plus derivative) controllers for linear time-invariant plants.
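Before presenting the designs, it may help to see how a cost of this kind can be evaluated numerically. The sketch below is our own illustration (the paper's simulations were run in Mathematica): it approximates the quadratic cost of the following subsection for a proportional controller u = k(v - y) on the plant G(s) = b/(s(s+a)) by forward-Euler integration; the function name and step size are our choices.

```python
def pcost(k, a=1.0, b=1.0, q=1.0, r=1.0, tf=1.0, dt=1e-4):
    """Approximate the quadratic cost integral for a proportional
    controller u = k*(v - y) on the plant G(s) = b/(s*(s+a)),
    with a unit step reference v = 1 (forward-Euler integration)."""
    y, yd = 0.0, 0.0              # plant output and its derivative
    cost = 0.0
    for _ in range(int(tf / dt)):
        u = k * (1.0 - y)         # proportional control action
        cost += (q * (1.0 - y)**2 + r * u**2) * dt
        ydd = b * u - a * yd      # plant dynamics: y'' + a*y' = b*u
        y += dt * yd
        yd += dt * ydd
    return cost

print(pcost(0.136357))  # near the optimal cost reported in Section 5.1
```

Any of the direct search methods discussed above can then be run on `pcost` as a black-box objective, which is exactly the setting where derivative-free methods are attractive.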
We also apply the Compass Search and the PSO methods for comparison.

5.1 Proportional Controller

For the proportional controller design, we chose a quadratic cost function, as usual in optimal control.

Definition 5.1 (Cost Function) The cost function in the interval from t = 0 to t = t_f is

    z = \int_0^{t_f} \left( q \, (y(t) - v(t))^2 + r \, u(t)^2 \right) dt

where:

1. y(t) is the plant's output,

2. v(t) is the reference signal,

3. u(t) is the controller output,

and q and r are the weights of the error between the plant output and the reference signal and of the control effort, respectively.

Considering a linear time-invariant SISO plant with transfer function G(s) = b/(s(s+a)), with a = 1 and b = 1, and a unit step as reference signal, we used the Discrete Barycenter Method, Compass Search, and PSO to find a proportional controller that minimizes the cost function for 0 ≤ t ≤ 1, with the parameters q and r equal to 1.

The PSO method was used with 5 particles and 50 iterations. The method parameters are c1 = 1.0, c2 = 1.0 and w = 0.5. The method found a proportional gain k = 0.0813761, with corresponding cost function value z(k) = 0.985255. For Compass Search, 50 iterations were performed with step-size parameter 1.0. The method found a proportional gain k = 0.133467, with corresponding cost function value z(k) = 0.982671. Finally, we used the Discrete Barycenter Method with 50 iterations; the parameters are \mu = 10000 and shape factor \gamma = 0.5. The method found a proportional gain k = 0.136357, with corresponding cost function value z(k) = 0.982679.

Figure 3 shows the closed-loop unit step response for the proportional controllers found by the three methods: PSO in green, Compass Search in blue, and Barycenter in red. Figure 4 shows the evolution of the cost with the number of iterations, with the same color code. For the sake of comparison, Figure 5 presents the (numerically determined) cost function for the proportional controller.

Figure 3: Closed-loop step response with proportional control tuned by PSO (green), Compass Search (blue) and Barycenter Method (red).

Figure 4: Cost function evolution for PSO (green), Compass Search (blue), and the Barycenter Method (red).

The differences in the step responses (particularly the one found by PSO, in green) tend to disappear with longer simulation times.

Figure 5: Cost function of the closed-loop system with proportional control.

As shown in Figure 4, the PSO and the Barycenter Method converged to a final value after
a small number of steps. The bigger the number of particles in PSO, the faster the convergence (in terms of steps), but the longer the time between successive steps. In the case of Compass Search, the execution time was estimated to be, on average, 1.3 times longer than that of the Barycenter Method. Thus, as far as execution time is concerned, the barycenter method can perform similarly to the compass search method and to PSO, if the parameters of the latter are chosen properly. It is important to say that our aim is to present a new, simple, and intuitive method that works at least as well as the established ones.

5.2 PD Controller

We also used the Discrete Barycenter Method, Compass Search and PSO to design a PD (proportional-derivative) controller for an unstable plant G(s) = b/(s(s+a)) with a = -1 and b = 1. The cost function, which has the same expression as in the last section, is now defined for 0 ≤ t ≤ 10 and with weights q = 10.0 and r = 1.0 (with an increased penalty on the output error term); again the reference input is a unit step. The graph of the cost function under these conditions is shown in Figure 6.

Figure 6: Cost function for the PD design.

The PSO method was used with 5 particles and 50 iterations. The method parameters are c1 = 1.0, c2 = 1.0 and w = 0.5. The method found a proportional gain kp = 3.16214 and a derivative gain kd = 3.70651, with corresponding cost function value z(kp, kd) = 8.55836. For Compass Search, 50 iterations were performed with step-size parameter 2.5. The method found a proportional gain kp = 3.16228 and a derivative gain kd = 3.70639, with corresponding cost function value z(kp, kd) = 8.55836. Finally, we used the Discrete Barycenter Method with 50 iterations, \mu = 10 and shape factor \gamma = 0.5. The method found a proportional gain kp = 2.97849 and a derivative gain kd = 3.56687, with cost function value z(kp, kd) = 8.56515.

Figure 7 shows the evolution of the gains in the cost function domain for PSO (green), Compass Search (blue) and Barycenter (red). Figure 8 shows the cost evolution as a function of the number of iterations, and Figure 9 the unit step response of the closed-loop system (the colors are the same in the three figures).

Figure 7: Evolution of the methods; the x axis represents kd and the y axis represents kp.

Figure 8: Evolution of the objective function with the number of iterations.

Figure 9: Closed-loop system output with the PD controller designed by the three optimization methods.
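The PD cost can be checked numerically in the same spirit as the proportional case. The sketch below is our own forward-Euler approximation, not the paper's code; since v is a unit step, v' = 0 for t > 0, so the derivative action uses -y' only.

```python
def pdcost(kp, kd, a=-1.0, b=1.0, q=10.0, r=1.0, tf=10.0, dt=1e-4):
    """Approximate the PD design cost of Section 5.2 for the unstable
    plant G(s) = b/(s*(s+a)) with a = -1, under a unit step reference."""
    y, yd = 0.0, 0.0                   # plant output and its derivative
    cost = 0.0
    for _ in range(int(tf / dt)):
        u = kp * (1.0 - y) - kd * yd   # PD law with v' = 0
        cost += (q * (1.0 - y)**2 + r * u**2) * dt
        ydd = b * u - a * yd           # plant dynamics: y'' + a*y' = b*u
        y += dt * yd
        yd += dt * ydd
    return cost

print(pdcost(3.16214, 3.70651))  # close to the cost of about 8.56 reported above
```

Evaluating the gains found by the three methods with such a routine gives essentially the same cost, while detuned gain pairs that fail to stabilize the plant produce costs that are orders of magnitude larger.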
The methods worked equally well in minimizing the cost. Finding the PD controller with the Barycenter Method took approximately 10 minutes on a personal computer with an AMD Turion X2 Ultra Dual-Core CPU and 3 GB of memory running Mathematica, while the PSO and Compass Search Methods took approximately 20 and 13 minutes, respectively. In this particular case, the Barycenter Method outperformed the others from the point of view of computation time.

6 Conclusion

In this article, we presented the Discrete Barycenter Method, a derivative-free approach proposed by the authors for solving optimization problems, especially those that cannot be solved by derivative-based methods such as Newton's and gradient-type methods. We have shown that the method converges superlinearly; however, we could not yet prove that it converges to the minimum point (local or global), if it exists (see (Vicinansa et al., 2014, submitted) for a modification of the method that guarantees convergence to the minimum).

In order to argue that the method reaches the minimum in practice, we applied it to find the minimum of the Rosenbrock function, a benchmark function for testing nonlinear optimization algorithms. We also compared the Barycenter Method with the Compass Search and PSO (particle swarm) methods. The Barycenter Method performed at least as well as the other two, and in many cases better. The method was also applied to designing optimal controllers off-line. Based on the simulations, we conclude that the Barycenter Method was able to quickly find an optimal proportional controller for the given cost function with good precision, after a small number of iterations. For the PD controller, the Barycenter Method performed better.

Future work will analyze conditions under which the method converges to the actual minimum point, and apply the Barycenter Method to real-time optimization. The goal is to find minima of functions whose values for specified parameters can be computed by experiment or simulation, but whose closed-form mathematical expression is not known a priori. One application we have in mind is Adaptive Control, where heretofore methods such as recursive least squares have been employed.

7 Acknowledgements

The author Guilherme Scabin Vicinansa thanks FAPESP for the scholarship under grant 2014/00539-0. The author Diego Colón also thanks FAPESP for the grant to participate in this conference. The authors are also grateful for the useful comments of the anonymous reviewer, which were helpful in improving and clarifying the text.

References

Conn, A. R., Scheinberg, K. and Vicente, L. N. (2008). Introduction to Derivative-Free Optimization, SIAM, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.

Kennedy, J. and Eberhart, R. (1995). Particle swarm optimization, IEEE International Conference on Neural Networks 4: 1942-1948.

Kolda, T. G., Lewis, R. M. and Torczon, V. (2003). Optimization by direct search: New perspectives on some classical and modern methods, SIAM Review 45: 385-482.

Lewis, R. M., Torczon, V. and Trosset, M. W. (1998). Direct search methods: Then and now, Optima 59: 1-7.

Vicinansa, G. S., Colón, D. and Pait, F. (2014, submitted). Directional barycenter method for control engineering optimization, 4th International Conference on Engineering Optimization.