This is the University of Delaware's Self-Aware FramEwork (SAFE).
Requirements:
----------------------------------------------------------------------------------------------------
- To generate synthetic benchmarks:
* Python v2.7+ (should work with Python 3 but not tested)
- To generate the framework:
* g++ v4.8+ (we are using some C++11 features not available before this version).
* GNU make (not tested with BSD make)
Building:
----------------------------------------------------------------------------------------------------
make mrproper
make
* A binary ("safe") should have been built.
Content of the Framework folder:
----------------------------------------------------------------------------------------------------
./input/ --> Folder that contains the .queue and .task files.
./logs/ --> Folder that contains the output log files
(Overwritten if the folder exists; created if it does not.)
./instructionNewFormat_3_0_8_PerOpClassEnergyMap_TechScaled.txt --> Table with energy per
instruction (pJ), provided
by Ganesh Venkatesh.
./queueGenerationScript.py --> Python queue generation script
./taskGenerationScript.py --> Python task generation script
Usage:
----------------------------------------------------------------------------------------------------
The framework uses two folders during execution: "input" and "logs". A Task is a collection of
serial instructions and a Queue is a collection of tasks. The following steps are required to run
the framework.
1. Generate as many synthetic tasks as required, using the following script:
>> taskGenerationScript.py <task name> <inst count> <full op header> <ntv op header> <inst>:<percent> ...
* The task name is defined by the user.
* The instruction count is the number of instructions that the task contains.
* Full op header and NTV op header are the columns in the
"instructionNewFormat_3_0_8_PerOpClassEnergyMap_TechScaled.txt" file.
* Currently, only 22nm has NTV information. Its header is 22nm_NTV.
* Instructions are the rows in the "instructionNewFormat_3_0_8_PerOpClassEnergyMap_TechScaled.txt"
file. Each instruction named on the command line must exist in that file.
* The sum of all the percentages of the instructions must be 100 %.
* The output of this script is a file named <task name>.task.
Example:
python taskGenerationScript.py exampleTask1 1000 22nm 22nm_NTV div_int:50 div_fp:50
This example creates the file exampleTask1.task with 500 integer divisions and 500 floating point
divisions organized in random order.
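The arithmetic behind this example can be sketched in a few lines of Python. This is not the code of taskGenerationScript.py; the function name expand_mix and the seeded shuffle are assumptions made only to illustrate how the percentages map to instruction counts in random order.

```python
import random
from collections import Counter


def expand_mix(inst_count, mix, seed=None):
    """Expand {'div_int': 50, 'div_fp': 50} into a shuffled list of names.

    Hypothetical helper: it mirrors the rule that percentages must sum
    to 100, but it is not taken from taskGenerationScript.py.
    """
    if sum(mix.values()) != 100:
        raise ValueError("instruction percentages must sum to 100")
    instructions = []
    for name, percent in mix.items():
        # Integer share of the total; any rounding remainder is ignored here.
        instructions.extend([name] * (inst_count * percent // 100))
    # Assumption: a seeded shuffle stands in for the script's random order.
    random.Random(seed).shuffle(instructions)
    return instructions


task = expand_mix(1000, {"div_int": 50, "div_fp": 50}, seed=0)
print(Counter(task))
```

The same rule applies one level up: in a queue, each percentage selects a share of task entries instead of instructions.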
2. Generate one queue of tasks, using the following script:
>> queueGenerationScript.py <queue name> <inst count> <task>:<percent> ...
* queue name is defined by the user.
* inst count is the number of instructions that the queue contains.
* For each task, a file named <task name>.task must exist.
* The sum of all the percentages of the tasks must be 100%.
* The output of this script is a file named <queue name>.queue.
Example:
python queueGenerationScript.py exampleQueue 10000 exampleTask1:30 exampleTask2:30 exampleTask3:40
This example creates the file exampleQueue.queue containing 3000 entries of exampleTask1, 3000
entries of exampleTask2 and 4000 entries of exampleTask3.
3. Then run safe with the queue name as the only parameter.
./safe <queue name>
Example: ./safe exampleQueue
* The input/ folder is implicitly assumed
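The three steps above can be chained from a small driver. The sketch below only assembles the command lines in the order described and prints them; the helper name build_commands and the layout of its tasks argument are hypothetical, and it assumes the commands would be run from the framework folder.

```python
def build_commands(queue_name, queue_inst_count, tasks,
                   full_header="22nm", ntv_header="22nm_NTV"):
    """Return the argv lists for steps 1-3, in order.

    tasks maps a task name to
    (inst_count, {instruction: percent}, queue_percent).
    This layout is hypothetical, chosen only for this example.
    """
    commands = []
    queue_mix = []
    for name, (inst_count, inst_mix, queue_percent) in tasks.items():
        # Step 1: one taskGenerationScript.py call per task.
        cmd = ["python", "taskGenerationScript.py", name, str(inst_count),
               full_header, ntv_header]
        cmd += ["%s:%d" % (inst, pct) for inst, pct in inst_mix.items()]
        commands.append(cmd)
        queue_mix.append("%s:%d" % (name, queue_percent))
    # Step 2: a single queueGenerationScript.py call that lists every task.
    commands.append(["python", "queueGenerationScript.py", queue_name,
                     str(queue_inst_count)] + queue_mix)
    # Step 3: run the simulator with the queue name as its only parameter.
    commands.append(["./safe", queue_name])
    return commands


example_tasks = {"exampleTask1": (1000, {"div_int": 50, "div_fp": 50}, 100)}
for argv in build_commands("exampleQueue", 10000, example_tasks):
    print(" ".join(argv))
```

Each returned list could be handed to subprocess.run once the generated files are known to be where ./safe expects them.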
Output:
----------------------------------------------------------------------------------------------------
The output of the framework is divided into two parts.
1. Standard output: corresponds to the statistics of the execution. These statistics are:
* Chip dimensions, number of blocks, units and chips.
* Input file.
* Temperature average, variance and skew (See Bugs and Future Work section).
* Power average, variance and skew (See Bugs and Future Work section).
* Number of tasks executed.
* Number of instructions executed.
* Number of cycles.
* Roles time: Corresponds to the time the program spent sending and receiving messages using the
API.
* Simulation time: Corresponds to the time spent updating the clock of the system and scheduling.
* Barriers time: Corresponds to the time spent in synchronization between all the threads.
* Updating Temperature and Power models time: Corresponds to the time the program spent using the
power and temperature models.
* Total execution time.
2. Detailed log files: include the temperature, power, changes in state and messages sent (See Bugs
and Future Work section).
For each of the threads, a log file is created. The name of the log files contains the information
of the specific block, as explained below:
>> temperatureRun.log.brdWW.chpXX.untYY.blkZZ
* WW corresponds to the board number (See Bugs and Future Work section).
* XX corresponds to the chip number (See Bugs and Future Work section).
* YY corresponds to the unit number.
* ZZ corresponds to the block number.
Threads are given an ID for each level in the hierarchy (block, unit, chip, board). Currently, if a
thread is given ID 0 for a given component, it is also given the "Super-CE" role for that level of
the architecture.
For example, for brd00.chp00.unt00.blk00, the thread executes all the roles of "Super-CE" for its
unit, chip, and board. But for brd00.chp00.unt10.blk00, the thread executes the role of unit
"Super-CE" only. In every case, a thread is assigned at least the role of managing a local block.
This means that each log file contains log information for every role its thread takes.
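The naming scheme and the role rule can be sketched as follows. The helper names parse_log_name and super_ce_roles are hypothetical, the IDs are assumed to be decimal, and the hierarchical reading of the rule (a thread is "Super-CE" for a level only when its ID is 0 at every level below it) is inferred from the two examples above rather than taken from the framework's source.

```python
import re

# Pattern for names such as temperatureRun.log.brd00.chp00.unt10.blk00
LOG_NAME = re.compile(
    r"^(?P<prefix>.+)\.log"
    r"\.brd(?P<board>\d+)\.chp(?P<chip>\d+)"
    r"\.unt(?P<unit>\d+)\.blk(?P<block>\d+)$"
)


def parse_log_name(name):
    """Return the board, chip, unit and block IDs encoded in a log name."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError("unexpected log file name: %r" % name)
    # Assumption: the IDs are written as decimal numbers.
    return {key: int(match.group(key))
            for key in ("board", "chip", "unit", "block")}


def super_ce_roles(ids):
    """List the roles a thread takes, starting with its local block."""
    roles = ["block"]  # every thread manages a local block
    # Walk up the hierarchy while the ID at the level below is 0.
    for child, level in (("block", "unit"), ("unit", "chip"), ("chip", "board")):
        if ids[child] != 0:
            break
        roles.append(level)
    return roles


for name in ("temperatureRun.log.brd00.chp00.unt00.blk00",
             "temperatureRun.log.brd00.chp00.unt10.blk00"):
    print(name, super_ce_roles(parse_log_name(name)))
```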
Each line of a log file has the following format:
>> [TYPE_OF_LOG] [ELEMENT](Optional) {description, value and cycle}
Examples of log file lines are:
>> [RMD_TRACE_AGGREGATE] [Chip] Temperature Average of 50.00C at cycle 0.
>> [RMD_TRACE_AGGREGATE] [Unit] Temperature Average of 50.00C at cycle 0.
>> [RMD_TRACE_TEMPERATURE] Temperature of 50.0C at cycle 200000.
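Lines in this format are straightforward to post-process. The following sketch parses them with regular expressions; the function name parse_log_line and the returned dictionary keys are hypothetical, and the value/cycle pattern is based only on the three example lines shown above (without the leading ">>" marker), so other message types may need different handling.

```python
import re

# [TYPE_OF_LOG], an optional [ELEMENT], then the free-text description.
LINE = re.compile(
    r"^\[(?P<type>[A-Z_]+)\]\s*(?:\[(?P<element>[^\]]+)\]\s*)?(?P<text>.*)$"
)
# Temperature messages end with "... of <value>C at cycle <cycle>."
READING = re.compile(r"of (?P<value>[-+]?\d+(?:\.\d+)?)C at cycle (?P<cycle>\d+)")


def parse_log_line(line):
    """Split one log line into type, element, text, value and cycle."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized log line: %r" % line)
    record = match.groupdict()  # element is None when the field is absent
    reading = READING.search(record["text"])
    # Lines without a temperature reading keep value and cycle as None.
    record["value"] = float(reading.group("value")) if reading else None
    record["cycle"] = int(reading.group("cycle")) if reading else None
    return record


sample = "[RMD_TRACE_AGGREGATE] [Chip] Temperature Average of 50.00C at cycle 0."
print(parse_log_line(sample))
```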
Configuration File:
----------------------------------------------------------------------------------------------------
The file ss-conf.h requires a more detailed explanation, because it provides some configurations
that have to be set before compiling the code. This section explains some of these parameters.
Name                          Value             Description
====================================================================================================
LOGGING_LEVEL                 0 or 1            Enable/Disable the generation of log files.
LOGGING_INTERVAL              In cycles         How often the log is written.
EXECUTION_TIMES               0 or 1            Enable/Disable the time statistics as part of the
                                                output (See Output section)
N_UNITS_IN_CHIP               16UL              Number of units per chip (See Bugs and Future Work
                                                section)
N_BLOCKS_IN_UNIT              16UL              Number of blocks per unit (See Bugs and Future Work
                                                section)
N_CORES_IN_UNIT               8UL               Number of XEs per block (See Bugs and Future Work
                                                section)
ROLLING_ENERGY_WINDOW         100               Size, in cycles, of the window used for
                                                computing the power.
MAX_XE_CLOCK_SPEED_MHZ        4200              Clock speed of XEs in MHz at Full State
S_CHIP_IN_MM                  500               Chip size in mm^2.
TEMPERATURE_JUNCTION          127               Maximum junction point temperature.
TEMPERATURE_OPERATION         100               Maximum operating temperature threshold.
TEMPERATURE_AMBIENT           50                Minimum temperature threshold.
CYCLES_PER_ITERATION          1                 Cycles passed per iteration of temperature model
                                                (affects the temperature model).
COMM_INTERVAL                 10000000          In cycles, how often to send the status information
                                                up the tree.
BARRIER_INTERVALS             1000000           In cycles, how often the simulation is synchronized.
QUEUE_FILE_SUFFIX             ".queue"          Extension of the files that describe a queue (See
                                                Usage section)
TASK_FILE_SUFFIX              ".task"           Extension of the files that describe a task (See
                                                Usage section)
OUT_FILE_PREFIX               "temperatureRun"  Prefix of the output log files (See Output section)
MAX_POWER_PER_BLOCK           10.0              Maximum power a block can transmit in its status
                                                messages
HALF_STATE_STATIC_ENERGY      0.0               How much energy is consumed when no instruction is
                                                being executed, during Half freq. state
                                                (Static Energy model)
FULL_STATE_STATIC_ENERGY      0.0               How much energy is consumed when no instruction is
                                                being executed, during Full freq. state
                                                (Static Energy model)
STATIC_ENERGY_FACTOR_FULL     0.2               Factor of the instruction's energy that is added to
                                                it as static energy, in Full freq. state.
                                                (Static Energy model)
STATIC_ENERGY_FACTOR_HALF     0.5               Factor of the instruction's energy that is added to
                                                it as static energy, in Half freq. state.
                                                (Static Energy model)
BLOCK_QUEUE_MAX_SIZE          100               Limits the max number of tasks in the queue of each
                                                block. (See Bugs and Future Work section)
LOAD_INSTRUCTIONS_CHUNK_SIZE  100000000         Large files must be loaded in chunks (memory
                                                limit). This is the size of a chunk, in
                                                instructions.
====================================================================================================
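To make the role of ROLLING_ENERGY_WINDOW and the static energy factors more concrete, here is a rough sketch of how an average power figure could be derived from per-cycle energies in pJ. It is not the framework's power model: the function name window_power_watts is hypothetical, and both the windowed average (energy divided by window duration) and the reading of the static factor as a multiplier on the dynamic energy are assumptions.

```python
def window_power_watts(energies_pj, clock_mhz=4200, window_cycles=100,
                       static_factor=0.0):
    """Average power over the last window_cycles cycles, in watts.

    energies_pj holds the dynamic energy spent in each cycle, in pJ.
    Assumption: static energy is modelled as static_factor times the
    dynamic energy, which is only one reading of the
    STATIC_ENERGY_FACTOR_* description.
    """
    recent = energies_pj[-window_cycles:]
    total_pj = sum(recent) * (1.0 + static_factor)
    # Duration of the window in seconds at the given clock speed.
    seconds = window_cycles / (clock_mhz * 1e6)
    # 1 pJ = 1e-12 J; power is energy divided by time.
    return total_pj * 1e-12 / seconds


# 100 cycles at 2 pJ each, Full state clock, 0.2 static factor.
print(window_power_watts([2.0] * 100, static_factor=0.2))
```

With the default values, 200 pJ spent over 100 cycles at 4200 MHz corresponds to roughly 8.4 mW before any static contribution is added.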
Constraints
----------------------------------------------------------------------------------------------------
* The framework is currently fixed to 1 chip, 16 units per chip and 16 blocks per unit. Changing
these values will result in crashes.
* The framework does not support more than 1 board.
* For visualization of results it is necessary to have an additional script. Currently, our videos
have been generated in MATLAB.
* Besides the state messages, no control-order messages are being sent right now. However, the
communication infrastructure is already available.
Known Bugs
----------------------------------------------------------------------------------------------------
* Currently, BLOCK_QUEUE_MAX_SIZE is not functional; hence, the queues are unbounded.
* The statistics for temperature variance and skew are not available at the end of the execution.
* The statistics for power average, variance and skew are not available at the end of the execution.
TODO
----------------------------------------------------------------------------------------------------
* Some of the log messages have not been defined yet; however, the current format allows these
messages to be defined.
* The current scheduling is completely random; changes are necessary.
* Add top-down resource management orders.