==common data types seen in Director/Shockwave formats==
format
{
shortened name(bytesize)
long name
aka
purposes/description
}
Uint32(4)
Unsigned 32-bit Integer
Unsigned Integer, Uint
commonly used for encoding lengths in binary files
Int32(4)
Signed 32-bit Integer
Signed Integer, Integer, Int
like a Uint32, but able to hold both positive and negative values. The most significant bit is a 'sign' bit,
that is, it's used to indicate whether the number is negative or positive. These are
used for storing a multitude of values, excluding file/data sizes
Uint16(2)
Unsigned 16-bit Integer
Unsigned Short, Ushort
half the size of a Uint32, so it can only store significantly smaller values (0-65,535), hence
being more commonly referred to as an unsigned "short". Used especially for
image dimensions, because few images have a width or height exceeding 65,535
Int16(2)
Signed 16-bit Integer
Signed Short, Short
like an Int32, but two bytes instead of four.
Uint8(1)
Unsigned 8-bit Integer
Unsigned Byte, Ubyte, Byte
For values that won't exceed the range 0-255
The Shockwave formats use these especially in encoding the length of shorter text strings
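A minimal sketch of how the five integer types above could be read with Python's struct module. The helper name read_int and the sample bytes are made up for illustration, and little-endian is assumed here only to keep the example short.

```python
import struct

# struct format codes for each integer type described above.
# "<" selects little-endian; swap in ">" for big-endian data.
INT_FORMATS = {
    "Uint32": "<I",
    "Int32": "<i",
    "Uint16": "<H",
    "Int16": "<h",
    "Uint8": "<B",
}

def read_int(type_name, data, offset=0):
    """Decode one integer of the named type; return (value, next_offset)."""
    fmt = INT_FORMATS[type_name]
    (value,) = struct.unpack_from(fmt, data, offset)
    return value, offset + struct.calcsize(fmt)

# The same four bytes read as unsigned and as signed.
sample = bytes([0xFF, 0xFF, 0xFF, 0xFF])
unsigned, _ = read_int("Uint32", sample)  # 4294967295
signed, _ = read_int("Int32", sample)     # -1
```

The only difference between the signed and unsigned variants is how the top bit is interpreted, which is why the same bytes give two different numbers.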
String(?)
String
String
Values that store text. These have an arbitrary length, which is defined either somewhere before the text
or somewhere completely isolated from the actual data.
{modifiers}
Null-terminated: character data followed immediately by a byte 0x00
Length-prefixed: character data prefixed by some number indicating its length
ASCII: text is encoded as ASCII (specifics omitted here); character size: 1 byte
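A rough sketch of the two string modifiers above, assuming ASCII text and a Uint8 length prefix (the notes mention Uint8 lengths for shorter strings; other prefix sizes are possible). The function names and sample bytes are hypothetical.

```python
def read_cstring(data, offset=0):
    """Null-terminated: read up to (not including) the next 0x00 byte."""
    end = data.index(b"\x00", offset)
    return data[offset:end].decode("ascii"), end + 1

def read_pstring(data, offset=0):
    """Length-prefixed: one Uint8 length, then that many characters."""
    length = data[offset]
    start = offset + 1
    return data[start:start + length].decode("ascii"), start + length

# Both helpers return the text plus the offset of whatever follows it.
text, after_text = read_cstring(b"score\x00rest")  # "score", 6
name, after_name = read_pstring(b"\x04castXYZ")    # "cast", 5
```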
Varint(?)
Varying Length Integer
Variable-Length Quantity
a special type of integer that, for all practical purposes, is composed of a group of 7-bit values, one per byte.
The high bit of each byte signifies whether the next byte is also part of the value. The DCR format uses these almost exclusively
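A small sketch of decoding and encoding a Varint as described above. It assumes the most significant 7-bit group comes first, as in the Variable-Length Quantity scheme named in the entry; the function names are made up for illustration.

```python
def read_varint(data, offset=0):
    """Decode one Varint; return (value, next_offset)."""
    value = 0
    while True:
        byte = data[offset]
        offset += 1
        # The low 7 bits carry data; shift earlier groups up to make room.
        value = (value << 7) | (byte & 0x7F)
        # A clear high bit marks the last byte of the value.
        if not (byte & 0x80):
            return value, offset

def write_varint(value):
    """Encode a non-negative integer as a Varint."""
    groups = [value & 0x7F]           # final byte: high bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)  # earlier bytes: high bit set
        value >>= 7
    return bytes(reversed(groups))
```

Values up to 127 fit in a single byte; 128 becomes the two bytes 0x81 0x00.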
Boolean(1)
Boolean
Bool
A value storing true(1) or false(0). The smallest addressable size in memory or on disk is a byte,
so these always take up an entire byte when stored on their own. To save space, especially when writing to a file,
several of them can sometimes be stored together in a Bitfield.
Bitfield(?)
Bitfield
Flags, Bitmask
Effectively, a super-compact array of Booleans
It's not always efficient to store each boolean as a whole byte in a
file; this was especially true of older systems, which
had limited storage space. As the name suggests, each
individual bit is its own value. These are typically read
using bitwise operators.
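A quick sketch of reading and writing a Bitfield with bitwise operators. The flag names and their bit positions are purely hypothetical and do not come from any actual Director structure.

```python
# Hypothetical flag names; bit 0 is the lowest bit of the byte.
FLAG_NAMES = ("visible", "locked", "looping")

def unpack_flags(byte, names=FLAG_NAMES):
    """Turn one byte into a dict of Booleans, one per named bit."""
    return {name: bool(byte & (1 << bit)) for bit, name in enumerate(names)}

def pack_flags(flags, names=FLAG_NAMES):
    """Pack a dict of Booleans back into a single byte."""
    byte = 0
    for bit, name in enumerate(names):
        if flags.get(name):
            byte |= 1 << bit
    return byte

flags = unpack_flags(0b101)  # bits 0 and 2 are set
```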
FourCC(4)
Four Character Code
FourCC
A string of exactly four characters, used by IFF and RIFF formats as ID codes for various sections/blocks
Tag
RIFF tag
RIFF tag, RIFF chunk, Chunk, Section
These are used to provide the basic structure of Shockwave files, as Shockwave is a variant of
R.I.F.F. : Resource Interchange File Format
The basic structure is this:
{
FourCC ID
Uint32 length
<length> bytes data
}
the main tag of the entire file also has one extra FourCC in the first four bytes of its data:
FourCC formID
DCRTag
DCRTag
DCRTag, DCRchunk, DCRsection
Similar to a RIFF tag, but these use Varints to encode their lengths
UTCDate(4)
Coordinated Universal Time (UTC) Date/Time
Date
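counts milliseconds since January 1, 1970 // unit unverified: an unsigned 32-bit millisecond count wraps after about 49.7 days, so this may really be seconds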
GUID
Globally Unique Identifier
UUID, Universally Unique Identifier
16 bytes representing a unique id for an object or codebase
Shockwave/Director makes use of these in XTRAs
{
// for more info: https://en.wikipedia.org/wiki/Universally_unique_identifier#Format
Uint32 time_low
Uint16 time_mid
Uint16 time_hi_and_version
Uint16 clock_seq_hi_and_res clock_seq_low //two separate bytes: clock_seq_hi_and_res, then clock_seq_low
Uint48 node //6 bytes
}
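A minimal sketch of turning the 16 bytes above into a readable GUID with Python's uuid module. Which byte order Director uses for the first three fields is not stated here, so it is left as a parameter; read_guid and the sample bytes are made up for illustration.

```python
import uuid

def read_guid(data, offset=0, little_endian_fields=False):
    """Read 16 bytes as a GUID.

    little_endian_fields=True treats time_low, time_mid and
    time_hi_and_version as little-endian (uuid's bytes_le layout);
    the remaining 8 bytes are taken as-is either way.
    """
    raw = bytes(data[offset:offset + 16])
    if little_endian_fields:
        return uuid.UUID(bytes_le=raw)
    return uuid.UUID(bytes=raw)

raw = bytes(range(16))
guid_be = read_guid(raw)
guid_le = read_guid(raw, little_endian_fields=True)
```

The individual fields from the layout above are then available as attributes such as guid_be.time_low and guid_be.node.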
Block
Block
Struct //specifically in languages like C
Some arbitrarily sized and formatted piece of data, generally a composite of other data types, which
is effectively a small format in and of itself. Pretty much any custom format has these in some shape or
form. There's no universal rule on how to lay these out, and it varies widely between developers.
==other stuff==
[<big|little>-end] , [<big|little|lit>]
{
used to indicate whether a value's byte order is big-endian or little-endian
By default, Shockwave tends to use little-endian byte order, including for the FourCC
}
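To make the byte-order difference concrete, here is a tiny sketch that reads the same four bytes both ways. The helper name read_u32 and the sample bytes are just for illustration.

```python
def read_u32(data, offset=0, byteorder="little"):
    """Read a Uint32 with the given byte order ("big" or "little")."""
    return int.from_bytes(data[offset:offset + 4], byteorder)

raw = bytes([0x00, 0x00, 0x01, 0x02])
as_big = read_u32(raw, byteorder="big")        # 0x00000102 = 258
as_little = read_u32(raw, byteorder="little")  # 0x02010000 = 33619968
```

Reading a length with the wrong byte order usually shows up as an absurdly large number, which is a handy sanity check when analyzing an unknown structure.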
[ghost]
{
The actual data does not exist for this entry, but it's been pointed to for an unknown reason
}
/* */
{
multi-line comment
Not all that necessary in documentation, but nonetheless used to
keep side notes separate from actual information about the format
}
//
{
single-line comment, same purpose as a multi-line comment, but only occupies one line
}
()
{
primarily used to indicate possible values and meanings of certain data
indicates a size if appended directly to a name like section(len)
}
@
{
specific to a certain type of data
}
[loose]
{
this structure is not concrete and might be split into pieces
}
Byte-Aligned
{
Something has been forced to always be divisible by a certain length in bytes (for example, padded out to an even number of bytes)
}
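The arithmetic behind byte alignment can be sketched like this. The function names are made up for illustration; boundary is the number of bytes things must be divisible by.

```python
def align(offset, boundary):
    """Round offset up to the next multiple of boundary."""
    return (offset + boundary - 1) // boundary * boundary

def padding(length, boundary):
    """Number of pad bytes needed to make length a multiple of boundary."""
    return -length % boundary
```

For example, a 5-byte piece of data aligned to 2 bytes needs 1 pad byte, and an offset of 13 aligned to 4 bytes moves up to 16.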
==RIFF/RIFX==
RIFF: http://www.johnloomis.org/cpe102/asgn/asgn1/riff.html
basically,
(
FourCC(4) chunkID
Uint32(4) chunkLength
<chunkLength> bytes chunkdata (plus extra byte if not even length)
)
special chunks (like "RIFF") have one extra field
(
FourCC(4) chunkID
Uint32(4) chunkLength
<chunkLength> bytes chunkdata (plus extra byte if not even length)
(
FourCC(4) formType ("WAVE", for example)
sub-chunks
)
)
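A rough sketch of walking the chunk layout above, including the extra pad byte after odd-length data. Plain RIFF lengths are little-endian, so that is the default; iter_chunks, read_form, and the sample file are all made up for illustration.

```python
import struct

def iter_chunks(data, offset=0, order="<"):
    """Yield (chunkID, chunkdata) pairs until the data runs out."""
    end = len(data)
    while offset + 8 <= end:
        chunk_id = data[offset:offset + 4]
        (length,) = struct.unpack_from(order + "I", data, offset + 4)
        start = offset + 8
        yield chunk_id, data[start:start + length]
        # Odd-length data is followed by one pad byte.
        offset = start + length + (length & 1)

def read_form(data, order="<"):
    """Read a special chunk: its formType plus the sub-chunks after it."""
    chunk_id, body = next(iter_chunks(data, order=order))
    form_type = body[:4]
    subchunks = list(iter_chunks(body, offset=4, order=order))
    return chunk_id, form_type, subchunks

# A tiny hand-built file: "fmt " has odd length, so it carries a pad byte.
sub1 = b"fmt " + struct.pack("<I", 3) + b"abc" + b"\x00"
sub2 = b"data" + struct.pack("<I", 2) + b"hi"
body = b"WAVE" + sub1 + sub2
riff = b"RIFF" + struct.pack("<I", len(body)) + body
parsed = read_form(riff)
```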
RIFX:
network byte order / big-endian
XFIR:
network byte order / big-endian
EXCEPT
(
chunkID
chunkLength
formType
), which are little-endian
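Going by the notes above, the first four bytes tell you which variant a file is. This sketch reads just the three header fields and un-reverses the little-endian FourCCs of an XFIR file; read_file_header is a made-up name and the form type "TEST" is a placeholder, not a real Director value.

```python
import struct

def read_file_header(data):
    """Return (order, chunkLength, formType) for a RIFX or XFIR file."""
    magic = data[0:4]
    if magic == b"RIFX":
        order = ">"   # header fields are big-endian
    elif magic == b"XFIR":
        order = "<"   # header fields are little-endian
    else:
        raise ValueError("not a RIFX/XFIR file")
    (length,) = struct.unpack_from(order + "I", data, 4)
    form_type = data[8:12]
    if order == "<":
        # A little-endian FourCC is stored with its characters reversed.
        form_type = form_type[::-1]
    return order, length, form_type.decode("ascii")

rifx = b"RIFX" + struct.pack(">I", 100) + b"TEST"
xfir = b"XFIR" + struct.pack("<I", 100) + b"TSET"
```

Both sample headers decode to the same length and form type; only the returned order differs.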
==VLIST==
Vlist { // or Vector List
// Thanks to Tommysshadow for figuring this out
U32 length // inclusive of these 4 bytes
/*
There is no known way to magically identify a VList's type
It must be manually analyzed in a hex editor or similar
In any case, the only difference between a VLIST16 and a VLIST32 is
the type of numbers stored in its numbers array
The numbers Array, also, is not technically considered part of the VList
*/
@VLIST16 : {
int numlength = ((length - 4) / 2)
U16Array(numlength) numbers[]
}
@VLIST32 : {
int numlength = ((length - 4) / 4)
U32Array(numlength) numbers[]
}
U16 entrycount
U32Array(entrycount) endpointers[]
U32 listendpointer // or can just read the array with one extra value
Array(entrycount) vlist [ // could call it entries, whatever...
/*
The length of a given entry <x> is calculated with
endpointers[x + 1] - endpointers[x]
This is also why including the listendpointer in the endpointers array
is useful, because it avoids declaring an extra variable and needing an
if statement to detect the one time that variable needs to be used
It would probably be more accurate to call endpointers 'pointers',
because the pointer corresponding to any given entry is in fact its
relative offset from the base of the vlist. It is the next entry which
marks the end of the current entry's data
the base offset of the vlist is immediately after listendpointer,
in case that wasn't obvious
*/
]
}
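A minimal sketch of reading a Vlist as laid out above. It reads listendpointer as one extra value in the endpointers array, as the comment suggests. The byte order and the choice between VLIST16 and VLIST32 have to be supplied by the caller; big-endian is only a default picked for the example, and read_vlist plus the sample data are made up for illustration.

```python
import struct

def read_vlist(data, offset=0, number_size=2, order=">"):
    """Return (numbers, entries); number_size is 2 (VLIST16) or 4 (VLIST32)."""
    (length,) = struct.unpack_from(order + "I", data, offset)
    # length includes its own 4 bytes, hence the "- 4".
    count = (length - 4) // number_size
    code = "H" if number_size == 2 else "I"
    numbers = list(struct.unpack_from(f"{order}{count}{code}", data, offset + 4))
    pos = offset + length
    (entry_count,) = struct.unpack_from(order + "H", data, pos)
    pos += 2
    # entry_count pointers plus listendpointer, read as a single array.
    pointers = struct.unpack_from(f"{order}{entry_count + 1}I", data, pos)
    base = pos + 4 * (entry_count + 1)  # entry data starts right after
    entries = [
        data[base + pointers[i]:base + pointers[i + 1]]
        for i in range(entry_count)
    ]
    return numbers, entries

# Hand-built VLIST16: three numbers, then two entries "ab" and "cde".
sample = (
    struct.pack(">I", 10) + struct.pack(">3H", 1, 2, 3)
    + struct.pack(">H", 2) + struct.pack(">3I", 0, 2, 5)
    + b"abcde"
)
numbers, entries = read_vlist(sample)
```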