This repository has been archived by the owner on Oct 26, 2023. It is now read-only.
.openai.yml
envs:
- id: GuessingGame
version: 1
entry_point: GuessingGame:GuessingGame
timestep_limit: 200
description: |
The goal of the game is to effectively use the reward provided
in order to understand the best action to take.
After each step the agent receives an observation of:
0 - No guess yet submitted (only after reset)
1 - Guess is lower than the target
2 - Guess is equal to the target
3 - Guess is higher than the target
The reward is calculated as:
((min(action, self.number) + self.bounds) / (max(action, self.number) + self.bounds)) ** 2
This is essentially the squared percentage of the way the
agent has guessed toward the target.
Ideally an agent will be able to recognise the 'scent' of a
higher reward and increase the rate at which it guesses in that
direction until the reward reaches its maximum.
requirements:
- gym
- numpy
files:
- GuessingGame/__init__.py
- GuessingGame/guessing_game
commit_hash: dcd3e4e279a9b4a5c21ea366b1bc157d58ce2490
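# Worked example of the GuessingGame reward formula above (illustrative only;
# the values bounds = 1000, target = 100 and guess = 50 are assumptions and
# are not taken from the environment source):
#   ((min(50, 100) + 1000) / (max(50, 100) + 1000)) ** 2
#     = (1050 / 1100) ** 2
#     ≈ 0.911
# A guess equal to the target gives (1100 / 1100) ** 2 = 1.0, the maximum.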
envs:
- id: HotterColder
version: 1
entry_point: HotterColder:HotterColder
timestep_limit: 200
description: |
The goal of the game is to guess within 1% of the randomly
chosen number within 200 time steps.
After each step the agent is provided with one of four possible
observations which indicate where the guess is in relation to
the randomly chosen number:
0 - No guess yet submitted (only after reset)
1 - Guess is lower than the target
2 - Guess is equal to the target
3 - Guess is higher than the target
The rewards are:
0 if the agent's guess is outside of 1% of the target
1 if the agent's guess is inside 1% of the target
The episode terminates after the agent guesses within 1% of
the target or 200 steps have been taken.
The agent will need to use a memory of previously submitted
actions and observations in order to efficiently explore
the available actions.
requirements:
- gym
- numpy
files:
- HotterColder/__init__.py
- HotterColder/hotter_colder.py
commit_hash: dcd3e4e279a9b4a5c21ea366b1bc157d58ce2490
envs:
- id: EightPuzzle
version: 0
entry_point: EightPuzzle:EightPuzzle
timestep_limit: 200
requirements:
- gym
- numpy
- six
files:
- EightPuzzle/__init__.py
- EightPuzzle/eight_puzzle.py
commit_hash: e334adee9b26bf1d3f687accf30b53fcaa50bc18
envs:
- id: BanditTwoArmedDeterministicFixed
version: 0
entry_point: Bandit:BanditTwoArmedDeterministicFixed
timestep_limit: 1
requirements:
- gym
- numpy
files:
- Bandit/__init__.py
- Bandit/eight_puzzle.py
envs:
- id: BanditTwoArmedHighHighFixed
version: 0
entry_point: Bandit:BanditTwoArmedHighHighFixed
timestep_limit: 1
requirements:
- gym
- numpy
files:
- Bandit/__init__.py
- Bandit/eight_puzzle.py
envs:
- id: BanditTwoArmedHighLowFixed
version: 0
entry_point: Bandit:BanditTwoArmedHighLowFixed
timestep_limit: 1
requirements:
- gym
- numpy
files:
- Bandit/__init__.py
- Bandit/eight_puzzle.py
envs:
- id: BanditTwoArmedHighLowFixedNegative
version: 0
entry_point: Bandit:BanditTwoArmedHighLowFixedNegative
timestep_limit: 1
requirements:
- gym
- numpy
files:
- Bandit/__init__.py
- Bandit/eight_puzzle.py
envs:
- id: BanditTwoArmedLowLowFixed
version: 0
entry_point: Bandit:BanditTwoArmedLowLowFixed
timestep_limit: 1
requirements:
- gym
- numpy
files:
- Bandit/__init__.py
- Bandit/eight_puzzle.py
envs:
- id: BanditTenArmedRandomFixed
version: 0
entry_point: Bandit:BanditTenArmedRandomFixed
timestep_limit: 1
requirements:
- gym
- numpy
files:
- Bandit/__init__.py
- Bandit/eight_puzzle.py
envs:
- id: BanditTenArmedRandomRandom
version: 0
entry_point: Bandit:BanditTenArmedRandomRandom
timestep_limit: 1
requirements:
- gym
- numpy
files:
- Bandit/__init__.py
- Bandit/eight_puzzle.py
envs:
- id: BanditTenArmedRandomStochastic
version: 0
entry_point: Bandit:BanditTenArmedRandomStochastic
timestep_limit: 1
requirements:
- gym
- numpy
files:
- Bandit/__init__.py
- Bandit/eight_puzzle.py