JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS: Vol. 105, No. 2, pp. 347–369, May 2000

Optimal Control Problem for the Lyapunov Exponents of Random Matrix Products

N. H. Du
Professor, Faculty of Mathematics, Mechanics, and Informatics, Vietnam National University, Thanh Xuan, Hanoi, Vietnam

Communicated by G. P. Papavassilopoulos

The author thanks Professor C. C. Heyde, Dr. D. J. Daley, Mrs. Lynne Simpson, and Ms. Jenny Goodwin for their help during his stay at the School of Mathematical Sciences, Australian National University, Canberra, Australia. He also thanks the referee for suggesting constructive ideas for this article.

Abstract. This paper deals with the optimal control problem for the Lyapunov exponents of stochastic matrix products when these matrices depend on a controlled Markov process with values in a finite or countable set. Under some hypotheses, the reduced process satisfies the Doeblin condition and the existence of an optimal control is proved. Furthermore, with this optimal control, the spectrum of the system consists of only one element.

Key Words. Random matrix products, Lyapunov exponents, Markov processes, decision models, optimal policy, optimal control, system spectrum.

1. Introduction

In this article, we deal with an optimal decision problem in which the objective function is the essential supremum of the Lyapunov exponents for a dynamical system described by random matrix products, where these matrices depend on a controlled Markov process $(\xi_n)$ with values in a finite or countable set $I$. We assume that $(\xi_n)$ has transition probability $P(a) = (P_{ij}(a) : i, j \in I)$, which depends on a control parameter $a$.

For any admissible control $(u_t)$, which we shall define precisely below, we consider the $\mathbb{R}^d$-valued random variables $(X_n : n = 0, 1, \dots)$ given by the following difference equation:

$$X_{n+1} = M(\xi_{n+1}, Y_{n+1}) X_n, \tag{1a}$$
$$X_0 = x \in \mathbb{R}^d, \tag{1b}$$

where $(Y_n)$ is a sequence of i.i.d. random variables and the $M(i, y)$ are invertible $d \times d$ matrices. The behavior of the solutions of system (1) when the transition probability does not depend on $a$ has been studied by many authors; see Refs. 1–3.

Let $X_n^u(x)$ be the solution of (1) associated with the control $(u_t)$. The process $(u_t)$ affects the solutions of the system through the transition probability $P(a)$. We define the Lyapunov exponent of $X_n^u(x)$ by

$$\lambda^u[x] = \limsup_{n \to \infty} (1/n) \log |X_n^u(x)|. \tag{2}$$

For any admissible control $u$, the Lyapunov exponent of system (1) is in general a random variable; in order to exclude the randomness, we introduce here the new concept of the essential Lyapunov exponent of $X_n^u(x)$, defined as

$$\Lambda^u[x] = P\text{-}\operatorname{ess\,sup} \lambda^u[x],$$

where $P\text{-}\operatorname{ess\,sup}$ denotes the essential supremum taken under the probability $P$. Because of the linearity of (1), it is easy to verify that the map $x \mapsto \Lambda^u[x]$ satisfies the general properties of Lyapunov exponents, i.e.,

(a) $\Lambda^u[\alpha x] = \Lambda^u[x]$, for any $\alpha \neq 0$, $x \in \mathbb{R}^d$;
(b) $\Lambda^u[x + y] \leq \max\{\Lambda^u[x], \Lambda^u[y]\}$.

Therefore, $\Lambda^u[\cdot]$ takes finitely many values, namely $\Lambda_1^u \leq \Lambda_2^u \leq \cdots \leq \Lambda_d^u$. It is well known that the trivial solution $X \equiv 0$ of (1) is stable if

$$\sup_{|x| \leq 1} \Lambda^u[x] < 0. \tag{3}$$

Hence, for the trivial solution $X \equiv 0$ to be stable, it suffices to choose a control $(u_t)$ such that condition (3) holds. However, this condition cannot always be met because, in some cases, among the class of admissible controls we are not able to find a control $(u_t)$ that yields negative Lyapunov exponents for the solutions of (1). So, in view of applications, it is natural to look for a control $(u_t)$ with which system (1) is nearest to stability; this means that the Lyapunov exponents of our system must be as small as possible.
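To fix ideas, here is a minimal simulation sketch of the recursion (1) and the exponent (2). Everything concrete in it, the two-state driving chain, the matrices, and the Gaussian noise, is an illustrative assumption and is not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-state driving chain (I = {0, 1}); in the paper the
# transition matrix would depend on the control parameter a.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

def M(i, y):
    # Hypothetical invertible 2x2 matrices M(i, y).
    if i == 0:
        return np.array([[1.0, y],
                         [0.0, 1.0]])
    return np.array([[0.5, 0.0],
                     [y,   2.0]])

def lyapunov_estimate(x0, n_steps=100_000):
    """Estimate lambda[x] = lim (1/n) log|X_n(x)| along one trajectory."""
    xi = 0
    s = x0 / np.linalg.norm(x0)          # unit vector; renormalized each step
    log_norm = 0.0
    for _ in range(n_steps):
        xi = int(rng.choice(2, p=P[xi])) # next state of the driving chain
        v = M(xi, rng.normal()) @ s      # the Y_n are i.i.d. standard normal
        r = np.linalg.norm(v)
        log_norm += np.log(r)            # increment log|X_n| - log|X_{n-1}|
        s = v / r
    return log_norm / n_steps

print(lyapunov_estimate(np.array([1.0, 0.0])))
```

Renormalizing the state at every step avoids overflow; it is the same telescoping device that Section 3 turns into the reduced process.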
This question leads us to consider the problem of minimizing the function $\Lambda^u[x]$ over the class of admissible controls. In this article, the main idea for solving this problem is to relate it to the Markov decision problem with per-unit average cost.

The paper is organized as follows. Section 2 introduces the fundamental notations and hypotheses, in terms of which we define the policies and the objective function for the problem. Sections 3–4 contain the main results: we reduce the state space and prove that, under the assumptions introduced in Section 2, our model satisfies the Doeblin condition. From this, we can use the methods dealt with in Ref. 4 and the properties of Lyapunov exponents to show the existence of an optimal policy. Furthermore, with this policy, the spectrum of system (1) consists of only one element.

2. Notations and Hypotheses

Let $A$ be a compact metric space, called the space of actions, and let $N = \{1, 2, \dots\}$ be the set of natural numbers. Throughout this paper, if $m(\cdot)$ is a measure and $f$ is an $m$-integrable function, we denote $\int f(x)\,dm$ by $m(f)$; and if $S$ is a topological space, we write $\mathcal{B}(S)$ for the Borel sets of $S$.

Let $Y$ be a measurable space endowed with the $\sigma$-algebra $\mathcal{B}(Y)$, and suppose that $\mu$ is a probability measure on $(Y, \mathcal{B}(Y))$. Let $I$ be a finite or countable set. Suppose that, for every $a \in A$, we have a transition matrix $P(a) = (P_{ij}(a) : i, j \in I)$. Let $M : I \times Y \to Gl(d, \mathbb{R})$ be a measurable map from $I \times Y$ into the group of invertible matrices $Gl(d, \mathbb{R})$. Throughout this paper, we shall make the following hypotheses.

Assumption A1. For some $p > 1$,

$$\sup_{i \in I} \int_Y \Big[ \big|\log|M(i, y)^{-1}|\big|^p + \big|\log|M(i, y)|\big|^p \Big] \mu(dy) < \infty. \tag{4}$$

Assumption A2. The map $a \mapsto P(a)$ is weakly continuous in the sense that, for any $i \in I$, the $i$th row vector of the matrix $P(a)$ is continuous in $a$ in $l^1$. Moreover, the Markov chain $P(a)$ satisfies the Doeblin condition; namely, there exist a finite set $K \subset I$ and numbers $\alpha > 0$, $n_0 > 0$ such that

$$\sum_{j_{n_0} \in K} \; \sum_{j_1, \dots, j_{n_0-1} \in I} P_{i j_1}(a_1) P_{j_1 j_2}(a_2) \cdots P_{j_{n_0-1} j_{n_0}}(a_{n_0}) \geq \alpha, \tag{5}$$

for any $i \in I$ and $a_1, a_2, \dots, a_{n_0} \in A$.

Assumption A3. For the distribution $Q(i, H) = \mu(y \in Y : M(i, y) \in H)$, $H \in \mathcal{B}(Gl(d, \mathbb{R}))$, of $M(i, \cdot)$ on $Gl(d, \mathbb{R})$, there exists a number $n_1 > 0$ such that

$$Q(i_1, i_2, \dots, i_{n_1}, \cdot) = Q(i_1, \cdot) \ast Q(i_2, \cdot) \ast \cdots \ast Q(i_{n_1}, \cdot), \tag{6}$$

where $\ast$ denotes the convolution operation, has a nonvanishing absolutely continuous part in its Lebesgue decomposition, for any $i_1, i_2, \dots, i_{n_1} \in I$.

Assumption A3 means that, if $Q(i_1, i_2, \dots, i_{n_1}, \cdot) = Q^c(\cdot) + Q^s(\cdot)$, where $Q^c$ and $Q^s$ are respectively the absolutely continuous and singular parts with respect to the Lebesgue measure, then $Q^c(Gl(d, \mathbb{R})) > 0$.

Example 2.1. Let $I = \{+, -\}$ and let

$$M(\pm, y) = \begin{pmatrix} \pm y & 1 \\ 1 & 0 \end{pmatrix}.$$

Suppose that $Y_n \sim \gamma$, where $\gamma$ has a continuous distribution; then, it is easy to see that Assumption A3 is true with $n_1 = 4$.
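The Doeblin condition (5) can be checked numerically for a concrete family $P(a)$. The sketch below evaluates the left-hand side of (5) with $n_0 = 2$ over a grid of action pairs for a toy three-state kernel (an invented example, not from the paper) and reports the worst case, which serves as a candidate $\alpha$.

```python
import numpy as np

# A toy controlled chain on I = {0, 1, 2} used to test condition (5);
# the kernel P(a) below is an illustrative assumption.
def P(a):
    # a in A = [0, 1]; rows are probability vectors, continuous in a.
    return np.array([[0.5,         0.3 * a + 0.1, 0.4 - 0.3 * a],
                     [0.2 + 0.2*a, 0.5 - 0.2*a,   0.3],
                     [0.3,         0.3,           0.4]])

K = [0, 1]        # candidate finite set K of Assumption A2
grid = np.linspace(0.0, 1.0, 21)

# Minimum over initial states i and action sequences (a1, a2) of the
# two-step mass sent into K, i.e. the left-hand side of (5) with n0 = 2.
alpha = min(
    (P(a1) @ P(a2))[i, K].sum()
    for a1 in grid for a2 in grid for i in range(3)
)
print(alpha)      # (5) holds with this alpha provided it is > 0
```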
We shall now formulate the problem in canonical space. Denote by $\Omega_0$ the set of all sequences $(w_n)$ with $w_n = (\xi_n, y_n, a_n)$, where $\xi_n \in I$, $y_n \in Y$, and $a_n \in A$; i.e.,

$$\Omega_0 = \{f : N \to I \times Y \times A\}, \qquad \Omega = \mathbb{R}^d \times \Omega_0.$$

Let $x_0$ be a random variable, let $W_n = (\xi_n, Y_n, A_n)$ be the canonical process defined on $\Omega$ by

$$W_n(x, \omega) = \omega_n, \quad x_0(x, \omega) = x, \qquad (x, \omega) \in \mathbb{R}^d \times \Omega_0 = \Omega,$$

and let $\mathcal{F}_n = \sigma(x_0, W_t : t \leq n)$ be the canonical filtration on $\Omega$; $\mathcal{F}_t$ is called the $\sigma$-field of observable events up to $t$. We write $\mathcal{F}$ for $\mathcal{F}_\infty$, with

$$\mathcal{F}_\infty = \bigvee_{n=1}^{\infty} \mathcal{F}_n, \qquad \mathcal{F}_0 = \sigma(x_0).$$

A decision $\pi_t$ at time $t$ is a stochastic kernel on $\mathcal{B}(A) \times \mathbb{R}^d \times (I \times Y \times A)^{t-1} \times (I \times Y)$; namely,

$$\pi_t = \pi_t(\cdot \,|\, x, w_1, w_2, \dots, w_{t-1}, \xi_t, y_t).$$

A sequence of decisions $\pi = (\pi_1, \pi_2, \dots)$ is called a policy. We use $\Pi$ to denote the class of all policies.

Let $\pi \in \Pi$ be a policy, and let $q \in \mathcal{P}(I)$, $\nu \in \mathcal{P}(\mathbb{R}^d)$, where $\mathcal{P}(S)$ denotes the set of probability measures on $S$ for any measurable space $S$. Then, we can define a probability measure $P$ on $(\Omega, \mathcal{F}_t, \mathcal{F})$ such that the following conditions are satisfied, for any $n = 1, 2, \dots$, $B \in \mathcal{B}(Y)$, and $i, j \in I$:

(i) $P(Y_n \in B \,|\, \mathcal{F}_{n-1}, \xi_n) = \mu(B)$; (7)
(ii) $P(\xi_{n+1} = j \,|\, \mathcal{F}_n) = P_{\xi_n j}(A_n)$; (8)
(iii) $P(A_n \in \cdot \,|\, \mathcal{F}_{n-1}, \xi_n, Y_n) = \pi_n(\cdot \,|\, x_0, W_1, W_2, \dots, W_{n-1}, \xi_n, Y_n)$; (9)
(iv) $P(\xi_0 = i) = q_i$, with $q = (q_1, q_2, \dots)$;
(v) $P(x_0 \in B) = \nu(B)$, for all $B \in \mathcal{B}(\mathbb{R}^d)$;

with the convention $W_0 = \text{const}$. The probability $P$ is called the control associated with the policy $\pi$ and the initial distributions $q, \nu$. We denote by $\mathcal{R}(q, \nu)$ the class of controls starting from $q$. It is well known that $\mathcal{R}(q, \nu)$ is a convex, closed set.

Let $P \in \mathcal{R}(q, \nu)$ be a control associated with the policy $\pi \in \Pi$ and $q, \nu$. We consider a difference equation in the form

$$X_{n+1} = M(\xi_{n+1}, Y_{n+1}) X_n, \tag{10a}$$
$$X_0 = x_0. \tag{10b}$$

Suppose that $X(n, x_0)$ is the solution of (10) starting at $x_0$, i.e., $X(0, x_0) = x_0$, $P$-a.s. We consider the following two objective functions:

$$\Lambda(q, \nu, \pi) = P\text{-}\operatorname{ess\,sup}\Big\{ \limsup_{t \to \infty} (1/t) \log|X(t, x_0)| \Big\}, \tag{11}$$

with the essential supremum taken over the probability $P$, and

$$\Psi(q, \nu, \pi) = E_{q,\nu}^{\pi} \limsup_{t \to \infty} (1/t) \log|X(t, x_0)|, \tag{12}$$

where $E_{q,\nu}^{\pi}$ denotes the expectation with respect to the measure $P_{q,\nu}$. If $q$ and $\nu$ are degenerate at $i$ and $x$, we will write simply $\Lambda(i, x, \pi)$ and $\Psi(i, x, \pi)$ instead of $\Lambda(q, \nu, \pi)$ and $\Psi(q, \nu, \pi)$, respectively. It is evident that

$$\Lambda(q, \nu, \pi) \geq \Psi(q, \nu, \pi), \tag{13}$$

for any $q, \nu, \pi$. Let

$$\Lambda(q, \nu) = \inf\{\Lambda(q, \nu, \pi) : \pi \in \Pi\}, \qquad \Lambda^* = \inf_{q,\nu} \Lambda(q, \nu), \tag{14a}$$
$$\Psi(q, \nu) = \inf\{\Psi(q, \nu, \pi) : \pi \in \Pi\}, \qquad \Psi^* = \inf_{q,\nu} \Psi(q, \nu). \tag{14b}$$

The triplet $(q, \nu, \pi)$ is said to be minimum for problem (11) [respectively, (12)] if $\Lambda(q, \nu, \pi) = \Lambda^*$ [respectively, $\Psi(q, \nu, \pi) = \Psi^*$], and $\pi^* \in \Pi$ is called optimal if $\Lambda(i, x, \pi^*) = \Lambda(i, x)$ [respectively, $\Psi(i, x, \pi^*) = \Psi(i, x)$] for any $i \in I$, $x \in \mathbb{R}^d$. From (13), we get $\Lambda^* \geq \Psi^*$. So, if $(q, \nu, \pi)$ is minimum for problem (12) and $\Lambda(q, \nu, \pi) = \Psi^*$, then $(q, \nu, \pi)$ is also minimum for problem (11). Therefore, we hope that, under suitable hypotheses, it is sufficient to consider problem (12) in order to find an optimal control for problem (11).

3. Reduced Markov Decision Model

It is well known that the objective function given in the form (11) or (12) is independent of the length of the vectors. Therefore, we may reduce the state space. Any two nonzero vectors are said to be equivalent if they are proportional. The space of equivalence classes is denoted by $P^{d-1}$. The action of a matrix $g$ on $\mathbb{R}^d$ preserves the equivalence relation; we use $g$ again to denote the quotient action on $P^{d-1}$.

Let us consider the $\mathcal{F}_n$-adapted reduced process

$$Z_n = (\xi_n, S_n), \qquad n = 1, 2, \dots, \tag{15}$$

defined on $\Omega$ with values in $I \times P^{d-1}$, where $S_n = X_n / |X_n|$, $n = 0, 1, 2, \dots$. We put

$$\rho_n(i, s) = \log\big[ 1 / |M^{-1}(i, Y_n) s| \big], \qquad i \in I, \; s \in P^{d-1}, \; n = 1, 2, \dots.$$

Then, it is easy to check that

$$\log|X(t, x_0)| = \rho_1(Z_1) + \rho_2(Z_2) + \cdots + \rho_t(Z_t) + \log|x_0|, \tag{16}$$

where

$$\rho_k(Z_k) = \log\big[ 1 / |M^{-1}(\xi_k, Y_k) S_k| \big] = \log|X_k| - \log|X_{k-1}|.$$

Hence,

$$\limsup_{t \to \infty} (1/t) \log|X(t, x_0)| = \limsup_{t \to \infty} (1/t) \sum_{n=1}^{t} \rho_n(Z_n). \tag{17}$$
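The identity (16) is what makes the reduction practical: $\log|X_t|$ can be accumulated from the projective process alone, in a numerically stable way. A short sketch, with assumed matrices and noise, checks the two sides of (16) against each other.

```python
import numpy as np

rng = np.random.default_rng(2)

def M(i, y):
    # Illustrative invertible matrices (an assumption, not from the paper).
    return np.array([[1.0 + 0.1 * i, y],
                     [0.0,           0.8]])

x = np.array([1.0, 2.0])
s = x / np.linalg.norm(x)                 # S_0 = X_0 / |X_0|
telescoped = np.log(np.linalg.norm(x))    # log|x_0| term of (16)
for n in range(1000):
    i = int(rng.integers(0, 2))           # driving state (i.i.d. for brevity)
    y = rng.normal()
    x = M(i, y) @ x                       # raw recursion (10a)
    v = M(i, y) @ s
    telescoped += np.log(np.linalg.norm(v))   # increment rho_n(Z_n) of (16)
    s = v / np.linalg.norm(v)             # next projective point S_n
print(np.log(np.linalg.norm(x)), telescoped)  # both sides of (16) agree
```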
If the policy is constant, i.e., $\pi_t(\cdot \,|\, x_0, w_1, w_2, \dots, w_{t-1}, \xi_t, y_t) = \delta_a$, where $a \in A$ is fixed and $\delta_a$ is the Dirac mass at $a$, then $(Z_n)$ is a Markov process with transition operator

$$P(\xi_{t+1} = j, S_{t+1} \in B \,|\, \mathcal{F}_{t-1}, \xi_t = i, S_t = s) = P\big(\xi_{t+1} = j, \; M(j, Y_{t+1})s / |M(j, Y_{t+1})s| \in B \,\big|\, \xi_t = i, S_t = s\big) = P_{ij}(a) \int_Y 1_B\big( M(j, y)s / |M(j, y)s| \big) \mu(dy),$$

for any $i, j \in I$, $s \in P^{d-1}$, $B \in \mathcal{B}(P^{d-1})$. We denote this transition by

$$T(\{j\} \times B \,|\, i, s, a) = P_{ij}(a) \int_Y 1_B\big( M(j, y)s / |M(j, y)s| \big) \mu(dy). \tag{18}$$

The policy $\pi = (\pi_1, \pi_2, \dots)$ is said to be Markov stationary for the control problem of Lyapunov exponents (or randomized stationary; see Ref. 4) if there exists a kernel $\Phi$ on $\mathcal{B}(A) \times (I \times P^{d-1})$ such that, for $t = 1, 2, \dots$,

$$\pi_t(da \,|\, x_0, W_1, W_2, \dots, W_{t-1}, \xi_t, Y_t) = \Phi(da \,|\, Z_t).$$

We write $\Phi^\infty$ for the policy $(\Phi, \Phi, \dots)$. A Markov stationary policy $\Phi$ is called a stationary policy (or determined stationary policy) if $\Phi(\cdot \,|\, i, s)$ is a Dirac mass for any $i \in I$, $s \in P^{d-1}$. In this case, a stationary policy is described completely by a measurable mapping $f : I \times P^{d-1} \to A$ such that $\Phi(\{f(i, s)\} \,|\, i, s) = 1$, for $i \in I$, $s \in P^{d-1}$; see Refs. 5–6. We denote this policy by $f^\infty$.

Let $\Phi(da \,|\, i, s)$ be a Markov stationary policy; then, under the probability associated with $\Phi$, the process $(Z_t)$ is Markov with transition probability $T_\Phi$ given by

$$T_\Phi(C \,|\, i, s) = \int_A T(C \,|\, i, s, a) \Phi(da \,|\, i, s).$$
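In simulation, the kernel (18) under a Markov stationary policy amounts to three draws per step: an action $a \sim \Phi(\cdot \,|\, i, s)$, a next state $j$ from the row $P_{i\cdot}(a)$, and a direction update through a random matrix $M(j, Y)$. The sketch below uses invented stand-ins for $P$, $M$, and $\Phi$; it only illustrates the mechanics of the reduced chain.

```python
import numpy as np

rng = np.random.default_rng(3)

def P(a):
    # Hypothetical controlled kernel on I = {0, 1}, a in [0, 1].
    return np.array([[1.0 - a, a],
                     [0.5,     0.5]])

def M(i, y):
    # Hypothetical matrices, invertible for a.e. y.
    return np.array([[1.0,          y * (i + 1)],
                     [0.2,          1.0]])

def sample_phi(i, s):
    # A randomized stationary policy: mix two actions, weights depend on s.
    w = 0.5 + 0.4 * abs(s[0])
    return 0.3 if rng.random() < w else 0.7

def step(i, s):
    a = sample_phi(i, s)                  # a ~ Phi(. | i, s)
    j = int(rng.choice(2, p=P(a)[i]))     # j ~ P_i.(a)
    v = M(j, rng.normal()) @ s            # direction update by M(j, Y)
    return j, v / np.linalg.norm(v)

i, s = 0, np.array([1.0, 0.0])
for n in range(5):
    i, s = step(i, s)                     # one step of Z_n = (xi_n, S_n)
    print(i, s)
```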
Lemma 3.1. Under Assumptions A2 and A3, for any Markov stationary policy, the Markov chain $(Z_n)$ satisfies the Doeblin condition (see Refs. 7–8) with respect to the product of a counting measure on $I$ and the Lebesgue measure $\operatorname{meas}(\cdot)$ on $P^{d-1}$.

Proof. We have to prove that, for any Markov stationary policy $\Phi(\cdot \,|\, i, s)$, there exist a counting measure $\gamma$ on $I$ and numbers $\epsilon > 0$, $\delta > 0$, and $m_0$ such that, for every $i \in I$ and $s \in P^{d-1}$,

$$T_\Phi^{m_0}(C \,|\, i, s) \leq 1 - \epsilon, \tag{19}$$

for any $C \in \mathcal{B}(I) \times \mathcal{B}(P^{d-1})$ such that $\gamma \times \operatorname{meas}(C) < \delta$, where $\operatorname{meas}(\cdot)$ denotes the Lebesgue measure on $P^{d-1}$.

Let $K$ and $\alpha$, $n_0$ be given as in Assumption A2. Then,

$$\sum_{j \notin K} \; \sum_{j_1, \dots, j_{n_0-1} \in I} P_{i j_1}(a_1) P_{j_1 j_2}(a_2) \cdots P_{j_{n_0-1} j}(a_{n_0}) \leq 1 - \alpha, \tag{20}$$

for any $i \in I$ and $a_1, a_2, \dots, a_{n_0} \in A$. We note that, if (20) is satisfied for $n_0$, then it will be satisfied for any $n \geq n_0$. Indeed,

$$\sum_{j \notin K} \sum_{j_1, \dots, j_{n_0} \in I} P_{i j_1}(a_1) P_{j_1 j_2}(a_2) \cdots P_{j_{n_0} j}(a_{n_0+1}) = \sum_{j_1 \in I} P_{i j_1}(a_1) \sum_{j \notin K} \sum_{j_2, \dots, j_{n_0} \in I} P_{j_1 j_2}(a_2) \cdots P_{j_{n_0} j}(a_{n_0+1}) \leq 1 - \alpha.$$

Furthermore, if Assumption A3 is true for $n_1$, then it is still true for any $n \geq n_1$, by the following property: if one of the measures $\sigma_1$ and $\sigma_2$ is absolutely continuous with respect to $\sigma$ on a topological group, then their convolution is absolutely continuous with respect to $\sigma$. Hence, without loss of generality, we can suppose that $n_0 = n_1 = 1$, and we shall show that (19) is satisfied for $m_0 = 1$.

To avoid complexities, we put

$$Q(i, s, B) = \mu\{y : M(i, y)s / |M(i, y)s| \in B\}, \qquad Q(i, H) = \mu\{y : M(i, y) \in H\}, \qquad C_i = \{s \in P^{d-1} : (i, s) \in C\},$$

for any $i \in I$, $H \in \mathcal{B}(Gl(d, \mathbb{R}))$, and $C \in \mathcal{B}(I \times P^{d-1})$. Let us choose $\gamma$ to be a probability measure on $I$ such that $\gamma(i) = 1/r$ if $i \in K$, where $r$ is the number of elements of $K$. We denote by $m(\cdot)$ the product measure $\gamma(\cdot) \times \operatorname{meas}(\cdot)$ on $I \times P^{d-1}$. Suppose that $\delta_1 < 1/r$; then, from $m(C) < \delta_1^2$, it follows that $\operatorname{meas}(C_i) < \delta_1$ for any $i \in K$.

By using the definition of $T_\Phi$, we have

$$T_\Phi(C \,|\, i, s) = \sum_{j \in I} \int P_{ij}(a) Q(j, s, C_j) \Phi(da \,|\, i, s) = \sum_{j \notin K} \int P_{ij}(a) Q(j, s, C_j) \Phi(da \,|\, i, s) + \sum_{j \in K} \int P_{ij}(a) Q(j, s, C_j) \Phi(da \,|\, i, s) \leq \sum_{j \notin K} \int P_{ij}(a) \Phi(da \,|\, i, s) + \sup_{j \in K} Q(j, s, C_j) \sum_{j \in K} \int P_{ij}(a) \Phi(da \,|\, i, s). \tag{21}$$

Let $i \in K$ be fixed. By Assumption A3, there exists a function $F$ defined on $Gl(d, \mathbb{R})$ such that

$$Q^c(i, H) = \int_H F(g)\,dg.$$

Since $F \neq 0$, we can find a bounded Borel set $H_0$ such that $Q^c(i, H_0) = \sigma > 0$ and $F(g)$ is essentially bounded on $H_0$; we suppose that $|g| \leq c$ and $F(g) \leq k$, for any $g \in H_0$. From this, we have

$$Q^s(i, H_0) + Q(i, \bar{H}_0) \leq 1 - \sigma,$$

where $\bar{H}_0$ denotes the complement of $H_0$. Letting

$$c \cdot B = \{x : x/|x| \in B, \; |x| < c\},$$

we get

$$Q(i, s, B) \leq Q(i, \bar{H}_0) + Q^s(i, H_0) + Q^c(i, \{g \in H_0 : gs/|gs| \in B\}) \leq 1 - \sigma + Q^c(i, H_0 \cap \{g : gs \in c \cdot B\}).$$

But

$$Q^c(i, H_0 \cap \{g : gs \in c \cdot B\}) \leq k \cdot \operatorname{meas}(c \cdot B) \leq k c^d \operatorname{meas}(B), \tag{22}$$

where there is an abuse of notation between $\operatorname{meas}(\cdot)$ on $Gl(d, \mathbb{R})$ and $\operatorname{meas}(\cdot)$ on $\mathbb{R}^d$. This implies that, if $\operatorname{meas}(B) < \delta_2 := \sigma / (2 k c^d)$, then by (22),

$$Q(i, s, B) \leq 1 - (1/2)\sigma.$$

From (21), we have

$$T_\Phi(C \,|\, i, s) \leq \sum_{j \notin K} \int P_{ij}(a) \Phi(da \,|\, i, s) + \big(1 - (1/2)\sigma\big) \sum_{j \in K} \int P_{ij}(a) \Phi(da \,|\, i, s) \leq 1 - (1/2)\sigma \sum_{j \in K} \int P_{ij}(a) \Phi(da \,|\, i, s) \leq 1 - (1/2)\alpha\sigma.$$

The proof of the lemma is completed by putting $\epsilon = (1/2)\alpha\sigma$ and $\delta = \min\{1/r^2, \delta_1^2\}$. □

In connection with the value functions $\Lambda(\cdot)$ and $\Psi(\cdot)$, we consider the Markov decision model mentioned above with value function in the familiar form

$$V(i, s, \pi) = \limsup_{t \to \infty} (1/t)\, E_i^\pi \sum_{n=1}^{t} \rho_n(Z_n), \qquad Z_0 = (i, s). \tag{23}$$

We put

$$V(i, s) = \inf_\pi V(i, s, \pi), \qquad V(q, \nu) = \inf_\pi V(q, \nu, \pi), \qquad V^* = \inf_{q,\nu} V(q, \nu),$$

as in (14). Let

$$\rho(i, s, a) = \sum_{j \in I} P_{ij}(a)\, E \log|M(j, Y_n)s|.$$

By Assumption A1, $\rho(i, s, a)$ is a bounded continuous function, and it is easy to see that

$$E_i^\pi \rho_n(Z_n) = E_i^\pi \rho(Z_{n-1}, A_{n-1}), \qquad n = 1, 2, \dots.$$

From this, we get

$$V(i, s, \pi) = \limsup_{t \to \infty} (1/t) \sum_{n=1}^{t} E_i^\pi \rho(Z_{n-1}, A_{n-1}). \tag{24}$$

Because $(Y_n)$ is an i.i.d. sequence, by Assumption A1 the sequence $(\rho_n(Z_n))$ is uniformly integrable for any $\pi \in \Pi$. By virtue of the Fatou lemma, we have

$$\Psi(i, s, \pi) \geq V(i, s, \pi), \qquad i \in I, \; s \in P^{d-1}, \; \pi \in \Pi. \tag{25}$$

Hence,

$$\Psi(i, s) \geq V(i, s). \tag{26}$$
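The average cost (23) of a fixed stationary policy can be estimated by plain Monte Carlo, averaging the increments $\rho_n$ over long runs. This is only a crude evaluation device; the chain, the matrices, and the policies below are the same invented stand-ins used earlier, not objects from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

def P(a):
    return np.array([[1.0 - a, a], [0.5, 0.5]])

def M(i, y):
    return np.array([[1.0, y * (i + 1)], [0.2, 1.0]])

def V_estimate(i, s, a_of, t=20_000, runs=3):
    """Monte Carlo estimate of (23) under the stationary policy a_of."""
    vals = []
    for _ in range(runs):
        ii, ss, total = i, s.copy(), 0.0
        for _ in range(t):
            a = a_of(ii, ss)
            ii = int(rng.choice(2, p=P(a)[ii]))
            v = M(ii, rng.normal()) @ ss
            total += np.log(np.linalg.norm(v))   # rho_n(Z_n)
            ss = v / np.linalg.norm(v)
        vals.append(total / t)
    return float(np.mean(vals))

# Comparing a few constant policies is the crude analogue of minimizing (23).
print(V_estimate(0, np.array([1.0, 0.0]), lambda i, s: 0.3))
print(V_estimate(0, np.array([1.0, 0.0]), lambda i, s: 0.7))
```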
Theorem 3.1. If $\Phi^\infty$ is a Markov stationary policy, then

$$\Psi(i, s, \Phi^\infty) = V(i, s, \Phi^\infty). \tag{27}$$

Proof. Under the policy $\Phi^\infty$, $(Z_n)$ is a Markov process with transition probability

$$T(C \,|\, i, s, \Phi) = \int_A T(C \,|\, i, s, a) \Phi(da \,|\, i, s) =: T_\Phi(C \,|\, i, s).$$

By Lemma 3.1, the Markov process $(Z_n)$ satisfies the Doeblin condition with respect to the measure $m(\cdot)$ defined in the proof of Lemma 3.1, with constants $\delta$ and $\epsilon$. So, we can define a decomposition of the state space $I \times P^{d-1}$ into a transient set $F$ and a finite number of ergodic sets $C^1, C^2, \dots, C^p$, with $m(C^r) \geq \delta$, $1 \leq r \leq p$. The restriction of $(Z_n)$ to $C^r$ is ergodic, so it is Harris recurrent with respect to the invariant measure $\gamma_r(\cdot)$ defined by

$$\gamma_r(\cdot) = \lim_{t \to \infty} (1/t) \sum_{n=1}^{t} T_\Phi^n(\cdot \,|\, i, s), \qquad (i, s) \in C^r.$$

By Exercise 3.1 in Ref. 9, $T_\Phi$ is quasicompact. On the other hand, if we put

$$\rho_\Phi = \int_A \rho(i, s, a) \Phi(da \,|\, i, s),$$

then it is easy to show that

$$\int [\rho_\Phi(i, s) - V(i, s, \Phi^\infty)]\, \gamma_r(di, ds) = 0.$$

This implies that $\rho_\Phi - V(i, s, \Phi^\infty)$ is a charge on $C^r$. Hence, the Poisson equation

$$(E - T_\Phi) h = \rho_\Phi - V, \tag{28}$$

where $E$ is the identity operator on $C^r$, has a bounded solution. Let $h$ be such a solution of (28), and put

$$H_n = \rho_{n+1}(Z_{n+1}) + h(Z_{n+1}) - h(Z_n) - V(Z_n); \tag{29}$$

we remark that, for any $r$, $V$ is constant on $C^r$ by the ergodicity of $(Z_n)$. From (28), we can prove that $\sum_{n=0}^{t} H_n$ is an $\mathcal{F}_n$-martingale. Indeed,

$$E[H_n \,|\, \mathcal{F}_n] = \rho_\Phi(Z_n) + T_\Phi h(Z_n) - h(Z_n) - V(Z_n) = 0.$$

From Assumption A1, it follows that

$$\sup_n E\big[ |H_n| (\log^+ |H_n|)^2 \big] < \infty.$$

By using the law of large numbers for martingale sequences, we get

$$\lim_{t \to \infty} (1/t) \sum_{n=0}^{t} H_n = 0, \qquad \text{a.s.}$$

Hence,

$$\lim_{t \to \infty} (1/t) \sum_{n=0}^{t} \rho_{n+1}(Z_{n+1}) = V(i, s, \Phi^\infty), \qquad \text{a.s.,} \quad (i, s) \in C^r. \tag{30}$$

This means that $\Psi(i, s, \Phi^\infty) = V(i, s, \Phi^\infty)$, for any $(i, s) \in \bigcup_{r=1}^{p} C^r$.

We consider now $(i, s) \in F$. Let $\tau$ be the last exit time from the set $F$, i.e.,

$$\tau = \sup\{n > 0 : Z_{n-1} \in F\}.$$

Since $F$ is a transient set, $P(\tau < \infty) = 1$, and $\tau$ is a stopping time, because every ergodic set is absorbing. Hence,

$$\Psi(i, s, \Phi^\infty) = E \limsup_{t \to \infty} (1/t) \sum_{n=1}^{t} \rho_n(Z_n) = E \limsup_{t \to \infty} (1/t) \sum_{n=\tau}^{t} \rho_n(Z_n) = \sum_{k=1}^{\infty} \int_{\bigcup C^r} E\Big[ \limsup_{t \to \infty} (1/t) \sum_{n=k}^{t} \rho_n(Z_n) \,\Big|\, \tau = k, Z_k = (j, u) \Big] P(\tau = k, Z_k \in (dj, du)).$$

Using a proof similar to that of (30), we can show that

$$E\Big[ \lim_{t \to \infty} (1/t) \sum_{n=k}^{t} \rho_n(Z_n) \,\Big|\, \tau = k, Z_k = (j, u) \Big] = V(j, u, \Phi^\infty),$$

for any $(j, u) \in \bigcup C^r$. Then, we get

$$\Psi(i, s, \Phi^\infty) = \sum_{k=1}^{\infty} \sum_{r=1}^{p} V_r\, P(\tau = k, Z_k \in C^r), \qquad (i, s) \in F, \tag{31}$$

where $V_r = V(i, s)$ when $(i, s) \in C^r$. On the other hand, if $(i, s) \in F$, then

$$V(i, s, \Phi^\infty) = \limsup_{t \to \infty} (1/t)\, E \sum_{n=1}^{t} \rho_n(Z_n) \geq E \liminf_{t \to \infty} (1/t) \sum_{n=1}^{t} \rho_n(Z_n) = \sum_{k=1}^{\infty} \int_{\bigcup C^r} E\Big[ \liminf_{t \to \infty} (1/t) \sum_{n=k}^{t} \rho_n(Z_n) \,\Big|\, \tau = k, Z_k = (j, u) \Big] P(\tau = k, Z_k \in (dj, du)) = \sum_{k=1}^{\infty} \sum_{r=1}^{p} V_r\, P(\tau = k, Z_k \in C^r). \tag{32}$$

By comparing (31) and (32), we get

$$E \lim_{t \to \infty} (1/t) \sum_{n=1}^{t} \rho_n(Z_n) = \lim_{t \to \infty} (1/t)\, E \sum_{n=1}^{t} \rho_n(Z_n),$$

for any $(i, s) \in F$; i.e., $\Psi(i, s, \Phi^\infty) = V(i, s, \Phi^\infty)$, $(i, s) \in F$. From this and (30), Theorem 3.1 is proved. □

Since (30) holds $P$-a.s., it is easy to establish a relation between the value functions (11) and (23).

Theorem 3.2. Under the assumptions of Theorem 3.1, for any Markov stationary policy $\Phi^\infty = (\Phi, \Phi, \dots)$, let $\tau$ and $C^r$, $V_r$, $1 \leq r \leq p$, be defined as in the proof of Theorem 3.1. Then,

$$\Lambda(i, s, \Phi^\infty) = \begin{cases} V(i, s, \Phi^\infty), & \text{if } (i, s) \in C^r, \\ P\text{-}\operatorname{ess\,sup}\big\{ \sum_{r=1}^{p} 1_{\{Z_\tau \in C^r\}}\, V_r \big\}, & \text{if } (i, s) \in F. \end{cases}$$

We now turn to the reduced problem. Denote by $\bar{\Pi}$ the subclass of $\Pi$ consisting of kernels of the form $\bar{\pi}_t(da \,|\, Z_1, A_1, \dots, Z_{t-1}, A_{t-1}, Z_t)$, which may be considered as kernels on $\mathcal{B}(A) \times (I \times P^{d-1} \times A)^{t-1} \times (I \times P^{d-1})$. Let $\pi \in \Pi$ be an arbitrary policy, and let $\bar{\mathcal{F}}_t = \sigma\{Z_n, A_n : n \leq t\}$. Suppose that $\bar{\pi}$ is the dual projection of $\pi$ on $\bar{\mathcal{F}}_{n-1} \vee \sigma(Z_n)$, i.e.,

$$E[\pi_t(f) \,|\, \bar{\mathcal{F}}_{t-1} \vee \sigma(Z_t)] = \bar{\pi}_t(f \,|\, Z_1, A_1, Z_2, A_2, \dots, Z_{t-1}, A_{t-1}, Z_t),$$

for any measurable bounded $f \geq 0$. It is obvious that $\bar{\pi} \in \bar{\Pi}$, and it is easy to verify that

$$E_{is}^{\pi} \rho(Z_n, A_n) = E_{is}^{\bar{\pi}} \rho(Z_n, A_n),$$

for any $n \in N$; i.e., the control $P$ associated with $\pi$ and the control $\bar{P}$ associated with $\bar{\pi}$ agree on $(\Omega, \bar{\mathcal{F}}_n, \bar{\mathcal{F}}_\infty)$. So,

$$V(i, s, \pi) = \limsup_{t \to \infty} (1/t) \sum_{n=1}^{t} E_{is}^{\pi} \rho(Z_n, A_n) = \limsup_{t \to \infty} (1/t) \sum_{n=1}^{t} E_{is}^{\bar{\pi}} \rho(Z_n, A_n) = V(i, s, \bar{\pi});$$

i.e., $\pi$ and $\bar{\pi}$ have the same value. Therefore, for the control problem of Lyapunov exponents, we can reduce our model by considering $(Z_n, A_n)$ as a canonical process defined on the canonical space

$$\bar{\Omega} = \{f : N \to I \times P^{d-1} \times A\},$$

with controlled transition probability (18) and with policies in the class $\bar{\Pi}$. This reduced model has many advantages because $P^{d-1}$ is compact. Therefore, in the following, we consider only the reduced model. Furthermore, we can find optimal policies in $\bar{\Pi}$, as we now show.
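The Poisson equation (28) is the workhorse of the proof of Theorem 3.1. For a finite ergodic chain, it can be solved directly; the sketch below, with made-up numbers for the transition matrix and cost, computes the average cost $V$, solves $(E - T_\Phi)h = \rho_\Phi - V$ under the normalization that $h$ integrates to zero against the invariant measure, and verifies the identity.

```python
import numpy as np

# Illustrative ergodic chain on three states with a bounded cost rho.
T = np.array([[0.6, 0.4, 0.0],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
rho = np.array([0.5, -0.2, 0.1])

# Invariant measure: left eigenvector of T for eigenvalue 1.
w, vl = np.linalg.eig(T.T)
pi = np.real(vl[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()

V = float(pi @ rho)                    # average cost; rho - V is a "charge"

# (E - T) is singular on constants; pin h down by requiring pi @ h = 0.
A = np.vstack([np.eye(3) - T, pi])
b = np.concatenate([rho - V, [0.0]])
h, *_ = np.linalg.lstsq(A, b, rcond=None)

print(V, h)
print(np.allclose((np.eye(3) - T) @ h, rho - V))   # verifies (28)
```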
4. Existence of an Optimal Policy

To prove the existence of an optimal policy, we use the Kurano ideas, which are explained in Ref. 4. In this section, we replace Assumption A2 by the following assumption.

Assumption A2′. The map $a \mapsto P(a)$ is continuous and the family $\{P(a) : a \in A\}$ is tight; i.e., for any $\epsilon > 0$, there is a finite set $K \subset I$ such that, for any $i \in I$, $a \in A$,

$$\sum_{j \in K} P_{ij}(a) \geq 1 - \epsilon.$$

This assumption implies the tightness of the sequence $(Z_n, A_n)$ under any policy $\pi \in \bar{\Pi}$ and any initial distribution of $Z_0$. Indeed, $P^{d-1}$ and $A$ are compact, so we have only to prove that the sequence $(\xi_n)$ is tight. Taking any policy $\pi \in \bar{\Pi}$ and any initial distribution of $Z_0$, we have

$$\sum_{j \in K} P(\xi_n = j) = \sum_{j \in K} E\, P(\xi_n = j \,|\, \mathcal{F}_{n-1}) = E \sum_{j \in K} P_{\xi_{n-1}, j}(A_{n-1}) \geq 1 - \epsilon.$$

Lemma 4.1. (See Ref. 4, Lemma 2.1.) For any policy $\pi \in \bar{\Pi}$ and any initial distributions $q, \nu$ of $\xi_0, S_0$, we can find a probability measure $\sigma$ on $I \times P^{d-1} \times A$ such that

$$\int_{I \times P^{d-1} \times A} \rho(z, a)\, \sigma(dz, da) \leq V(q, \nu, \pi) \tag{33}$$

and

$$\int_{I \times P^{d-1} \times A} g(z)\, \sigma(dz, da) = \int_{I \times P^{d-1} \times A} \sigma(dz, da) \int_{I \times P^{d-1}} g(z')\, T(dz' \,|\, z, a), \tag{34}$$

for any bounded continuous function $g$.

Proof. For given $\pi \in \bar{\Pi}$, $q \in \mathcal{P}(I)$, and $\nu \in \mathcal{P}(P^{d-1})$, we put

$$\mu_T(D) = (1/T) \sum_{n=1}^{T} E\, 1_D(Z_n, A_n),$$

where the expectation is taken with respect to the control associated with $(q, \nu, \pi)$. The family $\{\mu_T(\cdot) : T = 1, 2, \dots\}$ is tight, so there exist a sequence $\{T_n\}$ and a probability measure $\sigma$ on $I \times P^{d-1} \times A$ such that $\mu_{T_n}(\cdot) \xrightarrow{w} \sigma(\cdot)$. This implies that

$$\sigma(\rho) = \int \rho(z, a)\, \sigma(dz, da) = \lim_n \mu_{T_n}(\rho) \leq \limsup_{T \to \infty} \mu_T(\rho) = \limsup_{T \to \infty} (1/T) \sum_{n=1}^{T} E \rho(Z_n, A_n) = V(q, \nu, \pi),$$

so we have (33). On the other hand, for any bounded continuous function $g$, we have

$$0 = (1/T)\, E \Big( \sum_{n=1}^{T} g(Z_n) - E[g(Z_n) \,|\, \bar{\mathcal{F}}_{n-1}] \Big),$$

and

$$E[g(Z_n) \,|\, \bar{\mathcal{F}}_{n-1}] = \int g(z)\, T(dz \,|\, Z_{n-1}, A_{n-1}).$$

Hence,

$$\int g(z)\, \sigma(dz, da) = \lim_{n \to \infty} (1/T_n) \sum_{t=1}^{T_n} E\, g(Z_t) = \lim_{n \to \infty} (1/T_n) \sum_{t=1}^{T_n} E\, T(g \,|\, Z_{t-1}, A_{t-1}) = \int T(g \,|\, z, a)\, \sigma(dz, da),$$

so we get (34). The lemma is proved. □

This lemma allows us to conclude that the set of Markov policies is complete, in the sense that

$$\inf_{q,\nu} \inf\{V(q, \nu, \pi) : \pi \text{ is a policy}\} = \inf_{q,\nu} \inf\{V(q, \nu, \Phi) : \Phi \text{ is Markov}\}.$$

Lemma 4.2. If $\{\sigma_n\}$ is a sequence of probability measures which satisfy (34), then $\{\sigma_n\}$ is tight.

Proof. Let $\epsilon > 0$, and let $K$ be as mentioned in Assumption A2′. Then, from (34), we have

$$\sigma_n(K \times P^{d-1} \times A) = \int \sigma_n(dz, da)\, T(K \times P^{d-1} \,|\, z, a) = \int \sigma_n(dz, da) \sum_{j \in K} P_{ij}(a) \geq 1 - \epsilon,$$

for any $n$, where $z = (i, s)$. This means that the sequence $\{\sigma_n\}$ is tight. Hence, the lemma is proved. □
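The occupation measures $\mu_T$ from the proof of Lemma 4.1 are easy to examine by simulation. Dropping the projective component and keeping only the finite chain (an illustrative simplification, with the same invented kernel as before), the sketch below builds the empirical measure of $(Z_n, A_n)$ and checks the invariance identity (34) with $g$ ranging over indicators of states.

```python
import numpy as np

rng = np.random.default_rng(5)

def P(a):
    return np.array([[1.0 - a, a], [0.5, 0.5]])

actions = [0.3, 0.7]

def policy(i):
    return actions[i]              # a simple stationary policy

T = 200_000
counts = np.zeros((2, len(actions)))   # joint occupation of (state, action)
i = 0
for _ in range(T):
    a = policy(i)
    counts[i, actions.index(a)] += 1
    i = int(rng.choice(2, p=P(a)[i]))
mu = counts / T                    # empirical analogue of mu_T

# Check (34) with g = indicator of each state: the state marginal of mu
# should match the one-step image of mu under the controlled kernel.
lhs = mu.sum(axis=1)
rhs = sum(mu[i, k] * P(actions[k])[i] for i in range(2) for k in range(2))
print(lhs, rhs)                    # close for large T
```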
From Lemma 4.2 and using the same argument as in Ref. 4, we can show that there exist an invariant measure $\nu$ on $I \times P^{d-1}$ and a Markov policy $\Phi^\infty$ such that

$$V(\nu, \Phi^\infty) = V^*;$$

here, there is an abuse of notation, but we can define $V(\nu, \pi)$ exactly as $V(i, s, \pi)$. Under the Doeblin condition on the process $(Z_n)$, we have a decomposition of the state space as in the proof of Theorem 3.1, namely, the subsets $F, C^1, \dots, C^l$, for which $m(C^r) \geq \delta$ for any $1 \leq r \leq l$, and

$$V(i, s, \Phi^\infty) = V^*, \tag{35}$$

and, by Theorem 3.2,

$$\Lambda(i, s, \Phi^\infty) = V^*, \tag{36}$$

for any $(i, s) \in C := \bigcup C^r$, where the union is over all $r \in \{r : \nu(C^r) > 0\}$. On the other hand, for any fixed $i \in I$ such that $\operatorname{meas}(C_i) \neq 0$, where $C_i = \{s : (i, s) \in C\}$, the map $s \mapsto V(i, s, \Phi^\infty)$ satisfies the general property of the Lyapunov exponents (see Ref. 10). Hence, the set

$$L_i = \{s : V(i, s, \Phi^\infty) = V^*\}$$

is the projection of a linear subspace of $\mathbb{R}^d$ containing $C_i$, so it must be $P^{d-1}$. Let

$$S = \{(i, s) \in I \times P^{d-1} : V(i, s, \Phi^\infty) = V^*\}.$$

We remark that $C \subset S$, so there exists an $i \in I$ such that $\operatorname{meas}(S_i) > 0$. This implies that

$$S_i = P^{d-1}, \tag{37}$$

for some $i \in I$.

Lemma 4.3. $S$ is an invariant set.

Proof. Suppose that there is $(i_0, s_0) \in S$ such that

$$P_{i_0 s_0}^{\Phi^\infty}\{\omega : Z_n(\omega) \notin S, \text{ for some } n > 0\} > 0.$$

Then, we can find an integer $k > 0$ and a nonempty Borel set $B \subset I \times P^{d-1}$ such that $P\{\omega : Z_k(\omega) \in B\} > 0$ and

$$V(i, s, \Phi^\infty) \geq \alpha > V^*, \qquad \text{for any } (i, s) \in B.$$

Because $\Phi^\infty$ is a Markov policy, we have by Theorem 3.1 that $V(i, s, \Phi^\infty) = \Psi(i, s, \Phi^\infty)$ for any $(i, s)$. This implies that

$$V^* = V(i_0, s_0, \Phi^\infty) = E \lim_{t \to \infty} (1/t) \sum_{n=1}^{t} \rho_n(Z_n) = E \lim_{t \to \infty} (1/t) \sum_{n=k}^{t} \rho_n(Z_n) = E\Big[ \lim_{t \to \infty} (1/t) \sum_{n=k}^{t} \rho_n(Z_n) 1_B(Z_k) + \lim_{t \to \infty} (1/t) \sum_{n=k}^{t} \rho_n(Z_n) 1_{\bar{B}}(Z_k) \Big] = E[\Psi(Z_k, \Phi^\infty) 1_B(Z_k)] + E[\Psi(Z_k, \Phi^\infty) 1_{\bar{B}}(Z_k)] \geq \alpha P(Z_k \in B) + V^* P(Z_k \in \bar{B}) > V^*.$$

This is a contradiction. Thus, $S$ is an invariant set. □

Theorem 4.1. Suppose that there exists a Markov policy $L^\infty(\cdot \,|\, i, s)$ such that

$$P_i^{L^\infty}\Big\{ \bigcup_{n=1}^{\infty} \{\xi_n = j\} \Big\} = 1, \tag{38}$$

for any $i, j \in I$. Then, there exists an optimal stationary policy for all problems with the objective functions $\Lambda$, $\Psi$, $V$.

Proof. We define a new policy

$$\Phi'(\cdot \,|\, i, s) = \begin{cases} \Phi(\cdot \,|\, i, s), & \text{if } (i, s) \in S, \\ L(\cdot \,|\, i, s), & \text{otherwise}, \end{cases}$$

where $\Phi$ and $S$ are mentioned in (36)–(37). Let $(i_0, s_0) \in I \times P^{d-1}$ be fixed, and let $\tau = \tau(i_0, s_0)$ be the first hitting time of the set $S$ by $(Z_n)$. From (37)–(38), it follows that $P(\tau < \infty) = 1$. So, in a way similar to the proofs of Theorems 3.1 and 3.2, we get

$$V(i_0, s_0, \Phi'^\infty) = \sum_{k=1}^{\infty} V^*\, P(\tau = k, Z_k \in S) = V^*\, P(\tau < \infty) = V^*,$$

and this means that $\Phi'^\infty$ is an optimal policy for the objective function $V(\cdot)$. The fact that $\Phi'^\infty$ is optimal for the objective function $\Lambda(\cdot)$ follows from Theorem 3.2, and for $\Psi(\cdot)$ it is deduced from Inequalities (13) and (25). In this case, we have

$$\Lambda^* = \Psi^* = V^*.$$

Let $F, C^1, \dots, C^m$ be a decomposition of $I \times P^{d-1}$ with respect to $\Phi'^\infty$ as in Theorem 3.1. Then, there exists a function $h$ defined on $I \times P^{d-1}$ such that

$$(E - T_{\Phi'}) h = \rho_{\Phi'} - V^*, \tag{39}$$

where

$$T_{\Phi'}(\cdot \,|\, i, s) = T(\cdot \,|\, i, s, \Phi') = \int_A T(\cdot \,|\, i, s, a)\, \Phi'(da \,|\, i, s), \qquad \rho_{\Phi'}(i, s) = \rho(i, s, \Phi') = \int_A \rho(i, s, a)\, \Phi'(da \,|\, i, s).$$

Indeed, the function $h$ can be defined on $C = \bigcup_{i=1}^{m} C^i$, because $T_{\Phi'}$ is quasicompact and $\rho_{\Phi'} - V^*$ is a charge on every $C^r$. On the other hand, the set $F$ is transient, so

$$\sum_n T_{\Phi'}^n(F \,|\, i, s) < \infty,$$

for any $(i, s) \in F$. Since $\rho_{\Phi'} - V^*$ is bounded, we can define the function $h$ on $F$ by

$$h(i, s) = \sum_{n=0}^{\infty} T_{\Phi'}^n(\rho_{\Phi'} - V^* \,|\, i, s).$$

It is easy to check that $h$ satisfies Eq. (39). Let $h$ be such a solution of (39); we put

$$S(i, s) = \{a \in A : h(i, s) - T(h \,|\, i, s, a) \geq \rho(i, s, a) - V^*\};$$

here, $S(i, s)$ is measurable and, by (39), $\Phi'(S(i, s) \,|\, i, s) > 0$ for any $(i, s)$; so it follows from the selection theorem that we can find a map $f : I \times P^{d-1} \to A$ such that

$$h(i, s) - T(h \,|\, i, s, f(i, s)) \geq \rho(i, s, f(i, s)) - V^*. \tag{40}$$

Putting $T_f(h \,|\, i, s) = T(h \,|\, i, s, f(i, s))$ and $\rho_f(i, s) = \rho(i, s, f(i, s))$, from (39)–(40) we get

$$h(i, s) - T_f(h \,|\, i, s) \geq \rho_f(i, s) - V^*,$$
$$T_f(h \,|\, i, s) - T_f^2(h \,|\, i, s) \geq T_f(\rho_f \,|\, i, s) - V^*,$$
$$\vdots$$
$$T_f^{n-1}(h \,|\, i, s) - T_f^n(h \,|\, i, s) \geq T_f^{n-1}(\rho_f \,|\, i, s) - V^*.$$

This implies that

$$0 \geq \limsup_{t \to \infty} (1/t) \sum_{n=1}^{t} T_f^n(\rho_f \,|\, i, s) - V^*,$$

i.e.,

$$V(i, s, f^\infty) = \limsup_{t \to \infty} (1/t) \sum_{n=1}^{t} T_f^n(\rho_f \,|\, i, s) = V^*.$$

This means that $f^\infty$ is an optimal stationary policy. Therefore, the theorem is proved. □

Corollary 4.1. If there exists $a_0 \in A$ such that the chain $P(a_0)$ is Harris recurrent, then there exists an optimal stationary policy.

In conclusion, we see that

$$\Lambda(i, s, \pi) \geq \Psi(i, s, \pi) \geq V(i, s, \pi),$$

for any policy $\pi \in \Pi$; and if $\Phi^\infty$ is a Markov policy, then under the hypotheses of Theorem 4.1, we have

$$\Lambda(i, s, \Phi^\infty) = \Psi(i, s, \Phi^\infty) = V(i, s, \Phi^\infty).$$

But we have proved that there exists a stationary policy $f^\infty$ such that

$$\Lambda(i, s, f^\infty) = V(i, s) = V(i, s, f^\infty) = V^*,$$

for any $(i, s) \in I \times P^{d-1}$. This implies that $\Lambda(i, s) = \Lambda(i, s, f^\infty)$.

Theorem 4.2. If there exists a Markov policy such that condition (38) is satisfied, then there exists a stationary policy $f^\infty$ which minimizes the Lyapunov exponents of the solutions of system (10). In this case, the spectrum of system (10) consists of only one element, namely $\Lambda^*$. This means that

$$\lim_{t \to \infty} (1/t) \log|X^{f^\infty}(t, x)| = \Lambda^*, \qquad P_i^{f^\infty}\text{-a.s.},$$

for any $i \in I$ and $x \in \mathbb{R}^d \setminus \{0\}$.

Example 4.1. We consider an example for $d = 2$, $I = \{1, 2\}$, and

$$P(a) = \begin{pmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{pmatrix}, \qquad \forall a = (\alpha, \beta) \in A = [1/3, 2/3] \times [1/3, 2/3].$$

Suppose that, for any $i \in I$, the expectation

$$\lambda_i = E \log|M(i, Y_n)s|$$

does not depend on $s \in P^{d-1}$. Without loss of generality, we can suppose that $\lambda_1 \leq \lambda_2$. In this case, the function $\rho$ defined in Section 3 is given by

$$\rho(i, s, a) = p_{i1}\lambda_1 + p_{i2}\lambda_2.$$

By Lemma 4.1, it suffices to consider only stationary policies. Let $\{f(i, s)\}$ be such a policy, and let $\nu(di, ds)$ be an invariant measure associated with $f^\infty$. Then,

$$V(\nu, f^\infty) = \int \rho(i, s, f(i, s))\, \nu(di, ds) = \int \sum_{j \in I} p_{ij}(f(i, s))\, \lambda_j\, \nu(di, ds) = \lambda_1 + (\lambda_2 - \lambda_1) \int p_{i2}(f(i, s))\, \nu(di, ds) \geq \lambda_1 + (1/3)(\lambda_2 - \lambda_1) = (2/3)\lambda_1 + (1/3)\lambda_2.$$

The inequality becomes an equality if and only if the policy uses the effective components

$$f(1, s) = 1/3, \qquad f(2, s) = 2/3, \tag{41}$$

i.e., $\alpha = 1/3$ in state 1 and $\beta = 2/3$ in state 2. This means that an optimal policy is given by (41), and in this case

$$V^* = (2/3)\lambda_1 + (1/3)\lambda_2.$$

It is easy to see that we can choose an example where $V^* < 0$ while $V(\nu, f) \geq 0$ for other stationary policies; i.e., under the optimal policy, our system is stable, while under other policies, our system may not be stable.
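A quick numerical check of Example 4.1 (with illustrative values of $\lambda_1, \lambda_2$): for the stationary policy that uses $\alpha$ in state 1 and $\beta$ in state 2, the invariant distribution of the chain puts mass $\alpha/(\alpha+\beta)$ on state 2, so $V = \lambda_1 + (\lambda_2 - \lambda_1)\,\alpha/(\alpha+\beta)$. Minimizing over a grid on $A$ recovers (41).

```python
import numpy as np

# lambda1 <= lambda2 are illustrative numbers, not from the paper.
lam1, lam2 = -1.0, 2.0

def V(alpha, beta):
    # Cost of the stationary policy (alpha in state 1, beta in state 2):
    # the invariant mass of state 2 is alpha / (alpha + beta).
    nu2 = alpha / (alpha + beta)
    return lam1 + (lam2 - lam1) * nu2

grid = np.linspace(1/3, 2/3, 101)
best = min((V(a, b), a, b) for a in grid for b in grid)
print(best)                          # minimum attained at alpha=1/3, beta=2/3
print((2/3) * lam1 + (1/3) * lam2)   # the value V* predicted by (41)
```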
References

1. FURSTENBERG, H., and KIFER, Y., Random Matrix Products and Measures on Projective Spaces, Israel Journal of Mathematics, Vol. 46, pp. 12–32, 1982.
2. ARNOLD, L., and KLIEMANN, W., Qualitative Theory of Stochastic Systems, Probabilistic Analysis and Related Topics, Vol. 3, pp. 1–79, 1983.
3. DU, N. H., and NHUNG, T. V., Relation between the Sample and Moment Lyapunov Exponents, Stochastics and Stochastics Reports, Vol. 37, pp. 201–211, 1991.
4. KURANO, M., The Existence of a Minimum Pair of State and Policy for Markov Decision Processes under the Hypothesis of Doeblin, SIAM Journal on Control and Optimization, Vol. 27, pp. 296–307, 1989.
5. KUSHNER, H., Stochastic Stability and Control, Academic Press, New York, NY, 1967.
6. DYNKIN, E. B., and YUSHKEVICH, A. A., Controlled Markov Processes and Applications, Nauka, Moscow, Russia, 1976 (in Russian).
7. ROSENBLATT, M., Markov Processes: Structure and Asymptotic Behavior, Springer Verlag, Heidelberg, Germany, 1983.
8. DOOB, J. L., Stochastic Processes, John Wiley and Sons, New York, NY, 1953.
9. REVUZ, D., Markov Chains, North Holland, Amsterdam, Holland, 1975.
10. BYLOV, B. F., VINOGRAD, R. E., GROBMAN, D. M., and NEMYCKII, V. V., Theory of Lyapunov Exponents, Nauka, Moscow, Russia, 1986 (in Russian).
