Planned Improvements
Clean up info passing
    Actual class or at least abstract class
Have goal not in corner
    Maybe one or two away
    Agents wouldn't be scared off by hitting walls near goal
    Agents wouldn't be able to crawl along walls
Time (see the sketch after this block)
    Punish slightly for taking time
    Adjust exploration rate based on time, maybe time since positive reward
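    A minimal sketch of both time ideas; TimeAwareAgent, STEP_PENALTY, and the growth rate 0.001 are illustrative assumptions, not existing code:

        // Small per-step penalty, plus exploration that grows the longer
        // the agent goes without a positive reward. All names are assumed.
        public class TimeAwareAgent {
            private static final double STEP_PENALTY = 0.01;  // cost per move
            private static final double BASE_EPSILON = 0.05;  // exploration floor
            private int stepsSincePositiveReward = 0;

            // Call once per move with the raw reward for the square reached.
            public double shapeReward(double rawReward) {
                if (rawReward > 0) {
                    stepsSincePositiveReward = 0;
                } else {
                    stepsSincePositiveReward++;
                }
                return rawReward - STEP_PENALTY;  // punish slightly for taking time
            }

            // Exploration rate climbs with time since the last positive reward,
            // capped at 1.0 so it stays a valid probability.
            public double explorationRate() {
                return Math.min(1.0, BASE_EPSILON + 0.001 * stepsSincePositiveReward);
            }
        }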
Organize Planned Improvements
const double correctionFactor for all agents (see the sketch below)
    That way rewards for each square only have to be relative to each other
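    A minimal sketch; in Java, const double becomes static final. The value 0.1 and the Agent base class are assumptions:

        // One shared scaling constant applied to every reward, so per-square
        // rewards only need to be correct relative to each other.
        public abstract class Agent {
            protected static final double CORRECTION_FACTOR = 0.1;  // assumed value

            protected double scaledReward(double relativeReward) {
                return CORRECTION_FACTOR * relativeReward;
            }
        }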
Make default map actually default map
    Maybe only update map when scrollableList changes
        Maybe temporary solution
Database system for square types
Breadcrumbs that are removable
Let LevelEditor actually EDIT levels, instead of just creating them
Document file format rules
Implement better copying system for GameMap and LevelEditor
Consider changing how the size of a map is measured (count the mandatory edge walls?)
Make sure 0 reward never matters
New, more interesting maps
Copy Minds
Model that uses past move or two to decide (see the sketch below)
    Make sure it knows whether it's hit a wall
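    A minimal sketch of the state such a model could decide from; the Move enum and all field names are assumptions:

        // Folds the last move or two, plus a hit-a-wall flag, into the key
        // the learner indexes by. Names are illustrative, not existing code.
        public class MoveHistoryState {
            enum Move { UP, DOWN, LEFT, RIGHT }

            private Move lastMove;
            private Move secondLastMove;
            private boolean lastMoveHitWall;

            public void record(Move move, boolean hitWall) {
                secondLastMove = lastMove;
                lastMove = move;
                lastMoveHitWall = hitWall;  // so the model knows it hit a wall
            }

            // Compact key a tabular learner can look its values up by.
            public String key() {
                return lastMove + "," + secondLastMove + "," + lastMoveHitWall;
            }
        }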
Bring back SpeedDemon-like system (see the sketch after this block)
    Save changes as working model in part of memory
    Only update for real at end if better
    Separate current situation (walls) from global training
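    A minimal sketch of the save-and-commit idea, assuming the model is just an array of weights; the names and the scoring hook are guesses:

        // Train on a scratch copy of the weights; commit only if the episode
        // beat the best score so far. The double[] layout is an assumption.
        public class WorkingModel {
            private double[] committed;   // the "real" model
            private double[] working;     // scratch copy trained each episode
            private double bestScore = Double.NEGATIVE_INFINITY;

            public WorkingModel(int size) {
                committed = new double[size];
                working = committed.clone();
            }

            public void startEpisode() {
                working = committed.clone();  // fresh scratch copy
            }

            public double[] weights() {
                return working;  // mid-episode updates only touch the copy
            }

            public void endEpisode(double score) {
                if (score > bestScore) {      // only update for real if better
                    bestScore = score;
                    committed = working.clone();
                }
            }
        }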
Model with understanding of position
Model with understanding of current and surrounding squares
    Maybe use binary stuff before neural network
Model that can stay
    To help build towards Labyrinth AI
    To demonstrate wireheading
    Include the allowed moves in constructors
Model that knows opposites (see the sketch after this block)
    e.g. reinforcing right positively reinforces left negatively
    Oddly enough, it seems to do that on its own: when it hits a wall, it already careens in the other direction
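    A minimal sketch of explicit opposite coupling; the 0.5 coupling strength and the value table are assumptions, and Move is the same kind of direction enum as in the earlier sketch:

        // Reinforcing a direction nudges its opposite the other way.
        import java.util.EnumMap;
        import java.util.Map;

        public class OppositeAwareLearner {
            enum Move { UP, DOWN, LEFT, RIGHT }

            private final Map<Move, Double> values = new EnumMap<>(Move.class);

            public void reinforce(Move move, double reward) {
                values.merge(move, reward, Double::sum);
                // Half-strength push in the opposite direction (assumed ratio).
                values.merge(opposite(move), -0.5 * reward, Double::sum);
            }

            private static Move opposite(Move m) {
                switch (m) {
                    case UP:    return Move.DOWN;
                    case DOWN:  return Move.UP;
                    case LEFT:  return Move.RIGHT;
                    default:    return Move.LEFT;
                }
            }
        }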
One model class that can act as any of the different agents with only parameters changed (see the sketch after this block)
    Like ExponentialLearner is ExponentialWithDecay with decay=1
    Would clean up code substantially
    Maybe put agents in a subfolder?
    Really, only linear vs. exponential is a major difference; the rest is minor
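    A minimal sketch of the consolidation; ExponentialLearner and ExponentialWithDecay are the classes named above, but the constructor shape and field names are guesses:

        // One parameterized class instead of near-duplicate agents.
        public class ExponentialWithDecay {
            private final double decay;
            private double learningRate;

            public ExponentialWithDecay(double learningRate, double decay) {
                this.learningRate = learningRate;
                this.decay = decay;
            }

            public void endStep() {
                learningRate *= decay;  // decay = 1 leaves the rate untouched
            }

            // The old ExponentialLearner is just the decay = 1 special case.
            public static ExponentialWithDecay exponentialLearner(double rate) {
                return new ExponentialWithDecay(rate, 1.0);
            }
        }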
Model that can tune decay, correction factor, etc. based on how it's improving
    "Introspective"
"Game" that lets the player see what the thing agent sees
Visualize what is avaiable to get rewarded or punished
Histograms for the directions
Path (fading) along map
Make comparison to automata
Two agents running side by side for easy comparison
Slider for animation speed
Checkpoints
    Can be datapoints: agent knows what the last checkpoint was and can change behavior accordingly
Better info display system
    Change toString to use scientific notation (see the sketch after this block)
    Implement toString in KnowsLastMove
    Maybe save good info to a file, so I don't have to run it every time
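    A minimal sketch of the toString change, assuming the interesting fields are doubles; the field names and example values are made up:

        public class ScientificToString {
            private final double weight = 3.0e-7;  // assumed example values
            private final double decay = 0.999;

            // %.3e prints scientific notation with three digits of precision.
            @Override
            public String toString() {
                return String.format("weight=%.3e, decay=%.3e", weight, decay);
            }
        }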
Pure punishment
Document properly
    Comments
    Modularity
Fix strings in map (don't be lazy)
Make softReset public and remove from reward
Different classes for menu and game
Redo checkMenu() in ReinforcementLearner to only reset if an actual change occurred (see the sketch below)
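    A minimal sketch of the change check; representing the menu state as a String and passing the reset as a Runnable are stand-ins for whatever ReinforcementLearner really uses:

        // Only reset when the menu selection actually changed.
        public class MenuWatcher {
            private String lastApplied = "";

            public void checkMenu(String currentSelection, Runnable reset) {
                if (!currentSelection.equals(lastApplied)) {
                    lastApplied = currentSelection;
                    reset.run();  // a real change occurred; reset now
                }
            }
        }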
Metric for success of different agents
    Expected value of model
    Time to reach optimal model
Handle overflows in RP2
Find interesting seeds
Pillbug demonstration
    Give orientation
Make boxes so small as to appear continuous
    Maybe actually make continuous
Add reset for all agents
Make menu look better
Ecosystem? See notebook
Multiple agents?
Multiple starts? Disorientation, prevent memorization