MAKING COMPLEX DECISIONS
Artificial Intelligence: A Modern Approach, Stuart Russell and Peter Norvig, Prentice Hall, 2003, pages 462~488
p613
4 × 3 +1 -1
p614
Figure 1 4 × 3 0.8 0.2 +1 -1 -0.04
[Up, Up, Right, Right,
Right] (1, 1) Up (1, 2)
(2, 1) (1, 1) [Up, Up,
Right, Right, Right] (4, 3)
0.32776
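The figure 0.32776 can be checked by pushing a probability distribution through the action sequence. The sketch below assumes the usual layout of the 4 × 3 world, which the text above does not spell out: an obstacle at (2, 2), terminal states at (4, 3) and (4, 2), and the 0.2 slip probability split as 0.1 to each side of the intended direction. The names `move`, `transition`, and `distribution_after` are illustrative only.

```python
from collections import defaultdict

# Assumed layout (not recoverable from the text above): columns 1..4, rows 1..3,
# an obstacle at (2, 2), terminal states at (4, 3) and (4, 2).
WALL = (2, 2)
TERMINALS = {(4, 3), (4, 2)}
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
SIDEWAYS = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
            "Left": ("Up", "Down"), "Right": ("Up", "Down")}


def move(state, direction):
    """Deterministic move; bumping into a wall or the obstacle leaves the state unchanged."""
    x, y = state
    dx, dy = MOVES[direction]
    nx, ny = x + dx, y + dy
    if (nx, ny) == WALL or not (1 <= nx <= 4 and 1 <= ny <= 3):
        return state
    return (nx, ny)


def transition(state, action):
    """Return {s': T(s, a, s')} for one state-action pair."""
    if state in TERMINALS:
        return {state: 1.0}  # terminal states absorb
    dist = defaultdict(float)
    dist[move(state, action)] += 0.8        # intended direction
    for side in SIDEWAYS[action]:
        dist[move(state, side)] += 0.1      # slip at right angles
    return dict(dist)


def distribution_after(start, actions):
    """Distribution over states after executing a fixed action sequence."""
    dist = {start: 1.0}
    for action in actions:
        nxt = defaultdict(float)
        for s, p in dist.items():
            for s2, q in transition(s, action).items():
                nxt[s2] += p * q
        dist = dict(nxt)
    return dist


final = distribution_after((1, 1), ["Up", "Up", "Right", "Right", "Right"])
p_goal = final[(4, 3)]
print(f"{p_goal:.5f}")  # 0.32776
```

Under these assumptions the total splits into 0.8^5 = 0.32768 for the intended route plus 0.1^4 × 0.8 = 0.00008 for the accidental route around the other side of the obstacle.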
T(s, a, s') s' a s s' s s T (s, a, s')
s R(s) -0.04 +1 -1
p615
+1 10 0.6 -0.04
T(s, a, s')
R(s)
π π(s) π
π* π* s π*(s)
(4, 2) (3, 1) (4, 2)
R(s) R(s) R(s) ≤ -1.6284 -1 -0.4278 ≤ R(s) ≤ -0.0850 +1 -1 (3, 1) (-0.0221 < R(s) < 0) (4, 1) (3, 2)
p616
Figure 2 R(s) = -0.04 R(s)
-1 R(s) > 0 (4, 1), (3, 2), (4, 2) R(s)
N
k > 0 (3, 1) 4 × 3
p617
N = 3 +1 Up N = 100
1.
4 × 3
2.
0 1
0
1
(1/γ) - 1
γ = 1
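As a small illustration of discounting, the sketch below computes a discounted sum of rewards, the interest rate (1/γ) - 1 that corresponds to a discount factor γ, and the geometric-series bound R_max / (1 - γ) on the utility of an infinite sequence when γ < 1. The function names and the sample reward sequence are illustrative assumptions, not taken from the slides.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma^t * R(s_t) over a finite reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def implied_interest_rate(gamma):
    """Interest rate equivalent to discount factor gamma: (1/gamma) - 1."""
    return 1.0 / gamma - 1.0


def utility_bound(r_max, gamma):
    """Upper bound R_max / (1 - gamma) on a discounted infinite sum (needs gamma < 1)."""
    if not 0 <= gamma < 1:
        raise ValueError("the bound is finite only for 0 <= gamma < 1")
    return r_max / (1.0 - gamma)


# Hypothetical sequence: four steps at -0.04 followed by the +1 terminal reward.
rewards = [-0.04, -0.04, -0.04, -0.04, 1.0]
print(discounted_return(rewards, 1.0))   # gamma = 1: plain additive rewards
print(discounted_return(rewards, 0.9))   # gamma < 1: later rewards count less
print(implied_interest_rate(0.5))        # 1.0, i.e. 100 % interest
```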
p618
+∞ -∞ +∞
1.
(1)
2.
3. 4 × 3 (1, 1)
π π*
(2)
p619
π
π t
(3)
U(s) U(s) R(s) R(s) U(s)
s 4 × 3 +1
Figure 3 4 × 3 R(s) = -0.04
U(s)
(4)
(5)
p620
4 × 3 (1, 1)
n n
n
(6)
4 × 3
p621
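function VALUE-ITERATION(mdp, ε) returns a utility function
  inputs: mdp, an MDP with states S, transition model T, reward function R, discount γ; ε, the maximum error allowed
  local variables: U, U', vectors of utilities for states in S, initially zero; δ, the maximum change in utility in an iteration
  repeat
    U ← U' ; δ ← 0
    for each state s in S do
      U'[s] ← R[s] + γ max_a Σ_s' T(s, a, s') U[s']
      if |U'[s] - U[s]| > δ then δ ← |U'[s] - U[s]|
  until δ < ε(1 - γ)/γ
  return U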
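A minimal Python sketch of value iteration on the 4 × 3 world follows. It assumes the usual layout (obstacle at (2, 2), terminals +1 at (4, 3) and -1 at (4, 2), slip probability 0.1 to each side) and R(s) = -0.04. Because γ = 1 makes the textbook stopping test ε(1 - γ)/γ equal to zero, the sketch swaps in a plain tolerance on the largest change per sweep; that substitution, and all function names, are assumptions of this example.

```python
WALL = (2, 2)
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}   # assumed terminal rewards
STEP_REWARD = -0.04
STATES = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) != WALL]
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
SIDEWAYS = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
            "Left": ("Up", "Down"), "Right": ("Up", "Down")}


def move(state, direction):
    """Deterministic move; bumping into a wall or the obstacle leaves the state unchanged."""
    x, y = state
    dx, dy = MOVES[direction]
    nx, ny = x + dx, y + dy
    if (nx, ny) == WALL or not (1 <= nx <= 4 and 1 <= ny <= 3):
        return state
    return (nx, ny)


def T(s, a):
    """Transition model as a list of (probability, s') pairs."""
    return [(0.8, move(s, a))] + [(0.1, move(s, side)) for side in SIDEWAYS[a]]


def R(s):
    return TERMINALS.get(s, STEP_REWARD)


def value_iteration(gamma=1.0, tol=1e-9, max_iters=10_000):
    U = {s: 0.0 for s in STATES}
    for _ in range(max_iters):
        U_new, delta = {}, 0.0
        for s in STATES:
            if s in TERMINALS:
                U_new[s] = R(s)  # no actions are available in a terminal state
            else:
                best = max(sum(p * U[s2] for p, s2 in T(s, a)) for a in MOVES)
                U_new[s] = R(s) + gamma * best  # Bellman update
            delta = max(delta, abs(U_new[s] - U[s]))
        U = U_new
        if delta < tol:  # simplified stopping test (see lead-in)
            break
    return U


U = value_iteration()
for y in (3, 2, 1):  # print the grid with row 3 on top
    print(" ".join(f"{U[(x, y)]:6.3f}" if (x, y) in U else "  wall"
                   for x in range(1, 5)))
```

Under these assumptions the utilities should match the textbook values to three decimals, for example about 0.705 at (1, 1) and 0.918 at (3, 3).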
Figure 4
Figure 5 k c
p622
(7)
U BU = U
N
N
N
p623
(8)
i U
s
π*
if then
(9)
4 × 3
Figure 6
p624
s
POLICY-ITERATION(mdp) mdp S T U, U' S π s S P
Figure 7
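A minimal Python sketch of policy iteration on the same 4 × 3 world follows. Several choices here are assumptions made for this example rather than details recovered from the slide: the world layout, a discount of γ = 0.9 (so that evaluating a poor initial policy stays bounded), a fixed initial policy of "Up" everywhere instead of a random one, and policy evaluation by repeated simplified Bellman sweeps rather than by solving the linear system exactly.

```python
WALL = (2, 2)
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}   # assumed terminal rewards
STEP_REWARD = -0.04
STATES = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) != WALL]
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
SIDEWAYS = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
            "Left": ("Up", "Down"), "Right": ("Up", "Down")}


def move(state, direction):
    """Deterministic move; bumping into a wall or the obstacle leaves the state unchanged."""
    x, y = state
    dx, dy = MOVES[direction]
    nx, ny = x + dx, y + dy
    if (nx, ny) == WALL or not (1 <= nx <= 4 and 1 <= ny <= 3):
        return state
    return (nx, ny)


def T(s, a):
    """Transition model as a list of (probability, s') pairs."""
    return [(0.8, move(s, a))] + [(0.1, move(s, side)) for side in SIDEWAYS[a]]


def R(s):
    return TERMINALS.get(s, STEP_REWARD)


def expected_utility(s, a, U):
    return sum(p * U[s2] for p, s2 in T(s, a))


def policy_evaluation(pi, U, gamma, sweeps=200):
    """Approximate U^pi with repeated simplified Bellman updates (no max over actions)."""
    for _ in range(sweeps):
        U = {s: R(s) if s in TERMINALS
             else R(s) + gamma * expected_utility(s, pi[s], U)
             for s in STATES}
    return U


def policy_iteration(gamma=0.9, max_rounds=1000):
    U = {s: 0.0 for s in STATES}
    pi = {s: "Up" for s in STATES if s not in TERMINALS}  # fixed, not random
    for _ in range(max_rounds):
        U = policy_evaluation(pi, U, gamma)
        unchanged = True
        for s in pi:
            best = max(MOVES, key=lambda a: expected_utility(s, a, U))
            # Small tolerance so floating-point ties do not flip the policy forever.
            if expected_utility(s, best, U) > expected_utility(s, pi[s], U) + 1e-12:
                pi[s] = best
                unchanged = False
        if unchanged:
            break
    return pi


pi = policy_iteration()
for y in (3, 2, 1):  # print the grid with row 3 on top
    print(" ".join(f"{pi.get((x, y), '.'):>5}" for x in range(1, 5)))
```

Swapping the fixed number of sweeps for an exact linear solve would give the textbook form of policy evaluation; the sweep version corresponds to what is usually called modified policy iteration.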
p625
s
(10)
n n
k
p626
π(s) s s s
4 × 3 +1 77.5 % 81.8 %
Figure 8
T(s, a, s') R(s) O(s, o) o s
p627
b(s) s b b(s) a o
(11)
α
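Equation (11) itself did not survive extraction. Assuming it is the standard belief-state update b'(s') = α O(s', o) Σ_s T(s, a, s') b(s), with α the normalizing constant, a minimal Python sketch looks like this. The two-state example world, its action and observation names, and the 0.8 sensor accuracy are all hypothetical.

```python
def update_belief(b, a, o, T, O, states):
    """b'(s') = alpha * O(s', o) * sum_s T(s, a, s') * b(s)."""
    unnormalized = {
        s2: O(s2, o) * sum(T(s, a, s2) * b[s] for s in states)
        for s2 in states
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    alpha = 1.0 / total  # normalizing constant
    return {s2: alpha * v for s2, v in unnormalized.items()}


# Hypothetical two-state world used only to exercise the function.
STATES = ("left", "right")


def T(s, a, s2):
    """The single action 'stay' leaves the state unchanged."""
    return 1.0 if s == s2 else 0.0


def O(s, o):
    """The sensor reports the true state with probability 0.8."""
    return 0.8 if o == s else 0.2


b0 = {"left": 0.5, "right": 0.5}
b1 = update_belief(b0, "stay", "left", T, O, STATES)
b2 = update_belief(b1, "stay", "left", T, O, STATES)
print(b1)
print(b2)
```

Each consistent observation sharpens the belief; a contradictory one would pull it back toward the other state.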
1. b
2. o
3.
4 × 3
b b'
a b' o a b
s'
b' b
a
π(b) b
4 × 3
+1 +1 86.6 %
p629
t
t
t
O(s, o)
t
t
Figure 9 t
p630
Figure 10
U
t
p631
d
E
p632
1. O E f f O f E f E f O
n-player n > 2 O E
            O : one           O : two
E : one     E = 2, O = -2     E = -3, O = 3
E : two     E = -3, O = 3     E = 4, O = -4
O two E two 4 E -4 O
a p b [p : a ; (1 - p) : b] [0.5 : one ; 0.5 : two]
p633
                 B : testify          B : refuse
A : testify      A = -5, B = -5       A = 0, B = -10
A : refuse       A = -10, B = 0       A = -1, B = -1
s p s' s
p s' s s'
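Strong dominance can be checked mechanically on the payoff table above. The sketch below assumes that the rows are A's choices, the columns are B's choices, and that the two actions are "testify" and "refuse"; these labels come from the standard prisoner's dilemma and are not present in the extracted table.

```python
# Payoffs indexed by (A's action, B's action) -> (A's payoff, B's payoff).
# The action names are assumed; only the numbers appear in the table.
PAYOFFS = {
    ("testify", "testify"): (-5, -5),
    ("testify", "refuse"): (0, -10),
    ("refuse", "testify"): (-10, 0),
    ("refuse", "refuse"): (-1, -1),
}
ACTIONS = ("testify", "refuse")


def payoff(player, own, other):
    """Payoff to `player` (0 = A, 1 = B) when it plays `own` and the opponent plays `other`."""
    profile = (own, other) if player == 0 else (other, own)
    return PAYOFFS[profile][player]


def strongly_dominates(player, s, s_prime):
    """True if s is strictly better than s_prime for every choice of the other player."""
    return all(payoff(player, s, other) > payoff(player, s_prime, other)
               for other in ACTIONS)


for player, name in enumerate("AB"):
    print(name, strongly_dominates(player, "testify", "refuse"))
```

With these payoffs, "testify" strongly dominates "refuse" for both players, so both testify and end up at (-5, -5), even though (-1, -1) would be better for each of them.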
p634
(-1, -1)
(-1, -1)
             B : dvd             B : cd
A : dvd      A = 9, B = 9        A = -3, B = -1
A : cd       A = -4, B = -1      A = 5, B = 5
two (dvd, dvd) (cd, cd)
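The two equilibria can be found by brute force: a pure-strategy profile is a Nash equilibrium when neither player can gain by deviating alone. The sketch below assumes that the rows of the table are A's choices and the columns are B's choices; the function name is illustrative.

```python
from itertools import product

# Payoffs indexed by (A's action, B's action) -> (A's payoff, B's payoff).
# Row/column orientation is an assumption about the flattened table.
PAYOFFS = {
    ("dvd", "dvd"): (9, 9),
    ("dvd", "cd"): (-3, -1),
    ("cd", "dvd"): (-4, -1),
    ("cd", "cd"): (5, 5),
}
ACTIONS = ("dvd", "cd")


def pure_nash_equilibria(payoffs, actions):
    """Return every profile where no single player gains by switching actions."""
    equilibria = []
    for a, b in product(actions, repeat=2):
        a_ok = all(payoffs[(a, b)][0] >= payoffs[(alt, b)][0] for alt in actions)
        b_ok = all(payoffs[(a, b)][1] >= payoffs[(a, alt)][1] for alt in actions)
        if a_ok and b_ok:
            equilibria.append((a, b))
    return equilibria


equilibria = pure_nash_equilibria(PAYOFFS, ACTIONS)
print(equilibria)  # [('dvd', 'dvd'), ('cd', 'cd')]
```

Neither player has a dominant strategy here, which is why the game has more than one equilibrium and the players need some way to coordinate on one of them.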
p635
(dvd, dvd) (dvd, dvd) O E
E E E e O
o
E O
O U
-3
O E
E U
+2 U ≤ +2
p636
U
U [p : one ; (1 - p) : two]
E E [p : one ; (1 - p) : two] O p O one E 2p - 3 (1 - p) = 5p - 3 -3p + 4 (1 - p) = 4 - 7p p x-axis O E p
E
O O [q : one ; (1 - q) : two] E q 2q - 3 (1 - q) = 5q - 3 -3q + 4 (1 - q) = 4 - 7q O
E
-1/12 -1/12 -1/12 O E [7/12 : one ; 5/12 : two] -1/12
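The value -1/12 and the mixture [7/12 : one ; 5/12 : two] can be reproduced by intersecting the two payoff lines 5p - 3 and 4 - 7p. The sketch below uses exact fractions; it assumes the game has no pure-strategy saddle point, so the maximin point is where the two lines cross. Function names are illustrative.

```python
from fractions import Fraction

# Payoff to E, indexed by (E's action, O's action); O receives the negative.
PAYOFF_E = {
    ("one", "one"): 2, ("one", "two"): -3,
    ("two", "one"): -3, ("two", "two"): 4,
}


def expected_payoff_E(p, o_action):
    """E's expected payoff when E plays [p : one ; (1 - p) : two] against a pure O action."""
    return p * PAYOFF_E[("one", o_action)] + (1 - p) * PAYOFF_E[("two", o_action)]


def maximin_mixture():
    """Intersect the two lines u_one(p) and u_two(p); valid when they cross inside [0, 1]."""
    lines = {}
    for o_action in ("one", "two"):
        u0 = expected_payoff_E(Fraction(0), o_action)  # intercept
        u1 = expected_payoff_E(Fraction(1), o_action)
        lines[o_action] = (u0, u1 - u0)                # (intercept, slope)
    (a0, a1), (b0, b1) = lines["one"], lines["two"]
    p = (b0 - a0) / (a1 - b1)
    return p, a0 + a1 * p


p, value = maximin_mixture()
print(p, value)  # 7/12 -1/12
```

Because the payoff matrix is symmetric, the same computation for O yields q = 7/12 and the same game value of -1/12 for E.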
p637
Figure 11
n
p638
[7/12 : one ; 5/12 : two] -1/12 E
-1/12 E
p639
n m
> n
p641
U
p642
i
p643
= LENGTH(path with ) - LENGTH(path with )
n