-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathBoard.lhs
177 lines (131 loc) · 6.5 KB
/
Board.lhs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
Board.lhs
Cody Shepherd
> module Board where
> import Data.List
> import Math.Geometry.Grid
> import Math.Geometry.Grid.Square
The Rob type is an instance of Rob's location on the board. We would like to
be able to compare this location to a pair of Ints later on, so we will provide a couple
of conversion functions.
> data Rob = Rob (Int, Int)
> deriving (Show, Eq)
> rfst :: Rob -> Int
> rfst (Rob (a, b)) = a
> rsnd :: Rob -> Int
> rsnd (Rob (a, b)) = b
> rToPair :: Rob -> (Int, Int)
> rToPair (Rob a) = a
The first thing Rob has to do is move around a board.
It seems like we don't need to store all the empty locations... we can just
track where there are cans. A can, like Rob, is defined by its location.
A location will be defined in its implementation as (a, b), where a represents the "row"
or y-axis location, and b will represent the "column" or x-axis location. This is reversed
from the format of cartesian coordinates because this version makes it much easier to
use in the context of lists of lists - a being the number of the outermost list, and b
being the offset within that list.
However, this format will behave as expected (as a cartesian pair) when visualized. I.e. specifying
Rob's location at (0, 0) will put him at the bottom left corner of the grid, and moving him up will
result in a location of (0, 1). This should turn out to not matter much in the actual execution of
the q-learning algorithm, as it is primarily concerned with the "key" of each grid square rather
than its grid coordinates.
We will specify the locations of Cans in much the same way.
> data Can = Can (Int, Int)
> deriving (Show, Eq)
> cfst :: Can -> Int
> cfst (Can (a, b)) = a
> csnd :: Can -> Int
> csnd (Can (a, b)) = b
> cToPair :: Can -> (Int, Int)
> cToPair (Can a) = a
For the board, we will have to define limits, but otherwise we don't need to store
empty cells.
We will define Board as a 3-tuple with its height and width limits, plus a list
of cans and the location of Rob: ((MAXHEIGHT, MAXWIDTH), Cans, Rob)
> data Board = Board ((Int, Int), [Can], Rob)
> deriving (Eq, Show)
> bfst :: Board -> (Int, Int)
> bfst (Board (a, b, c)) = a
> bsnd :: Board -> [Can]
> bsnd (Board (a, b, c)) = b
> bthd :: Board -> Rob
> bthd (Board (a, b, c)) = c
> top :: Board -> Int
> top (Board (a, b, c)) = (fst a) - 1
> right :: Board -> Int
> right (Board (a, b, c)) = (snd a) - 1
If Rob is going to move, we would like some way of constraining his possible movements
to the cardinal directions. We'll define a direction in the order of "nsew", though it
doesn't make any difference as long as we are consistent throughout the program.
> data Dir = U
> | D
> | R
> | L
> | P
> deriving (Show, Eq, Enum, Ord)
At some point we will need to convert a Dir (which represents an action) to an index for
getting values out of the Q Table.
> dirIndex :: Dir -> Int
> dirIndex d
> | d == U = 0
> | d == D = 1
> | d == R = 2
> | d == L = 3
> | d == P = 4
We want a nice way to visualize the board.
First we need to break up a list of "[ ]" cell tokens into a list of n cell tokens, delimited
by newlines:
> splitEvery :: Int -> [String] -> [String]
> splitEvery _ [] = []
> splitEvery n s = take n s ++ "\n" : splitEvery n ss
> where ss = drop n s
Then we will process the list of cans into their places on the "grid" and build a graphical
representation of the grid for printing.
> showBoard :: Board -> IO ()
> showBoard b = do let r = (rfst (bthd b), rsnd (bthd b))
> cs = map (\x -> if any (\y -> (cfst y == fst x) && (csnd y == snd x)) (bsnd b) && (r == x)
> then "[%]"
> else if any (\y -> (cfst y == fst x) && (csnd y == snd x)) (bsnd b)
> then "[.]"
> else if r == x
> then "[o]"
> else "[ ]") (indices (rectSquareGrid (fst (bfst b)) (snd (bfst b))))
> ls = splitEvery (snd (bfst b)) cs
> s = reverse (lines (concat ls))
> r <- return s
> mapM putStrLn r
> putStrLn ""
> return ()
The very first thing we need to work out is movement around the board.
A movement is an action performed on Rob - none of the other pieces move.
Rob's movement does not affect any of the state on the board other than
Rob (with the exception of a pick-up movement).
> moveRob :: Rob -> Dir -> Rob
> moveRob (Rob (x, y)) d = case d of
> U -> Rob (x+1, y)
> D -> Rob (x-1, y)
> R -> Rob (x, y+1)
> L -> Rob (x, y-1)
> P -> Rob (x, y)
In the Q-Learning problem, a move is conflated with a reward. Every move returns some
notion of a reward, even if the value of the reward is zero.
A move is also essentially a permutation on a board, so we will need to return the updated
board as well.
The robot is rewarded 10 points for picking up a can, and is penalized 1 point for attempting to
pick up a can when none is present, and 5 points for running into a wall.
Note that here we must make sure that if the robot chooses to move into a wall, it "bounces back,"
i.e. b' is identical to b for starting board b.
> move :: Dir -> Board -> (Board, Double)
> move dir (Board (dims, cans, rob))
> | r' == r = case find (\c -> cToPair c == r) cans of
> Just a -> (Board(dims, filter (\c -> cToPair c /= r) cans, rob), 10.0)
> Nothing -> (Board(dims, cans, rob), -1.0)
> | otherwise =
> if (fst r' > (fst dims) -1 ) ||
> (fst r' < 0) ||
> (snd r' > (snd dims) -1) ||
> (snd r' < 0)
> then (Board(dims, cans, rob), -5.0)
> else (Board(dims, cans, Rob r'), 0.0)
> where
> r = rToPair rob
> r' = rToPair $ moveRob rob dir