forked from adrianstone55/SymbolSort
//-----------------------------------------------------------------------------
// This is an example application for analyzing the symbols from an executable
// extracted either from the PDB or from a dump using DumpBin /headers. More
// documentation is available at http://gameangst.com/?p=320
//
// This code was originally authored and released by Adrian Stone
// ([email protected]). It is available for use under the
// Apache 2.0 license. See LICENCE file for details.
//-----------------------------------------------------------------------------
OVERVIEW:
SymbolSort is a utility for analyzing code bloat in C++ applications. It works
by extracting the symbols from a dump generated by the Microsoft DumpBin utility
or by reading a PDB file. It processes the symbols it extracts and generates
lists sorted by a number of different criteria. The lists are:
- Raw Symbols, sorted by size -
This list is generated from the complete set of symbols. No deduplication is
performed so this list is intended to highlight individual large symbols.
- File contributions, sorted by size -
This list is generated by calculating the total size of symbols that contribute
to a folder path. If the input is a COMDAT dump, the source location for
symbols is the .obj or .lib file that DumpBin was run on (see usage for
details). It is important to note that for COMDAT dumps individual symbols
will appear multiple times coming from different .obj files. If the input is a
PDB file, the source location for symbols is the actual source file in which
the symbol is defined. The source file for data symbols is not always clearly
defined within the PDB so in some cases it is a best guess.
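As a rough illustration of the folder totals described above, here is a small
Python sketch (SymbolSort itself is written in C#). The (source_path, size)
pair shape and the function name are assumptions made for this example, not
SymbolSort's internal data model.

```python
from collections import defaultdict
from pathlib import PureWindowsPath


def folder_contributions(symbols):
    """Sum symbol sizes for every folder on each symbol's source path.

    `symbols` is an iterable of (source_path, size) pairs. This shape is
    hypothetical and chosen only for illustration.
    """
    totals = defaultdict(int)
    for source_path, size in symbols:
        # Credit the size to the containing folder and to all its ancestors.
        for folder in PureWindowsPath(source_path).parents:
            totals[str(folder)] += size
    # Largest contributors first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Because every ancestor folder is credited, the totals of nested folders
overlap by design and should not be added together.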
- File contributions, sorted by path -
This is a complete, hierarchical list of the size of symbols in all
contributing source files.
- Symbol Sections / Types, sorted by total size and by total count -
This shows a breakdown of symbols by section or type, depending on the kind
of information that can be extracted from the input source.
- Merged Duplicate Symbols, sorted by total size and by total count -
This list is generated by merging symbols with identical names. The symbols
are not guaranteed to be the same symbol. In the case of PDB input there will
be very few duplicate symbols. COMDAT input, however, should contain a large
number of duplicate symbols. This list is useful for estimating the total compile
and link cost of a particular symbol. A relatively small symbol that appears
in a very large number of .obj files will have a large total size and appear
near the top of this list.
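The merge by name can be sketched in a few lines of Python. This is only an
illustration under assumed inputs: the (name, size) pairs and the function
name are hypothetical, and SymbolSort's own C# implementation may differ.

```python
from collections import defaultdict


def merge_duplicates(symbols):
    """Merge symbols with identical names.

    `symbols` is an iterable of (name, size) pairs (an assumed shape).
    Returns (name, total_size, count) tuples, largest total size first.
    """
    totals = defaultdict(lambda: [0, 0])  # name -> [total_size, count]
    for name, size in symbols:
        entry = totals[name]
        entry[0] += size
        entry[1] += 1
    merged = [(name, size, count) for name, (size, count) in totals.items()]
    return sorted(merged, key=lambda row: row[1], reverse=True)
```

With this kind of tally, a 16-byte inline function emitted into 1000 .obj
files totals 16000 bytes and ranks above a single 4096-byte symbol.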
- Merged Template Symbols, sorted by total size and by total count -
This list is generated by stripping template parameters from symbols and then
merging duplicates. Symbols std::auto_ptr<int> and std::auto_ptr<float> will
be transformed into std::auto_ptr<T> in this list and be counted together.
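One way to strip template parameters is sketched below in Python. It is an
assumption about how such stripping could work, not a copy of SymbolSort's
code, and it does not try to handle names such as operator< that contain
unbalanced angle brackets.

```python
import re

# Matches an innermost <...> group, i.e. one with no nested angle brackets.
_INNERMOST = re.compile(r"<[^<>]*>")
_MARK = "\0"  # placeholder character that cannot appear in a symbol name


def strip_template_params(name):
    """Replace every top-level template argument list with <T>."""
    previous = None
    while previous != name:
        previous = name
        # Collapse innermost groups first so nested lists vanish outward.
        name = _INNERMOST.sub(_MARK, name)
    return name.replace(_MARK, "<T>")
```

Both std::auto_ptr<int> and std::auto_ptr<float> come out of this function as
std::auto_ptr<T>, so they can then be merged like any other duplicates.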
- Merged Overloaded Symbols, sorted by total size and by total count -
This list is generated by stripping template parameters and function parameters
from symbols and then merging duplicates. Overloaded functions sqrt(float) and
sqrt(double) will be transformed into sqrt(...) in this list and be counted
together.
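The overload merge can be sketched the same way: collapse template arguments,
then collapse parameter lists, then tally by the resulting key. The helper
names and the (name, size) input shape are hypothetical, and the exact output
spelling SymbolSort uses for mixed cases is not confirmed here.

```python
def collapse_brackets(name, open_ch, close_ch, placeholder):
    """Replace each top-level open_ch...close_ch group with placeholder."""
    out = []
    depth = 0
    for ch in name:
        if ch == open_ch:
            if depth == 0:
                out.append(placeholder)
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
        elif depth == 0:
            # Keep characters that are outside any bracketed group.
            out.append(ch)
    return "".join(out)


def merge_overloads(symbols):
    """Return {stripped_name: (total_size, count)} for (name, size) pairs."""
    totals = {}
    for name, size in symbols:
        key = collapse_brackets(name, "<", ">", "<T>")
        key = collapse_brackets(key, "(", ")", "(...)")
        total_size, count = totals.get(key, (0, 0))
        totals[key] = (total_size + size, count + 1)
    return totals
```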
- Symbol Tags, sorted by total size and by total count -
This list represents a tag cloud generated from the symbol names. The symbols
are tokenized and the total size and count are tallied for each token. I'm not
sure what this list is good for, but I'm all about tag clouds so I couldn't
resist including it.
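A tag tally along these lines might look like the Python sketch below. The
tokenization rule (runs of identifier characters) and the choice to count a
token once per symbol are assumptions; the text above does not say how
SymbolSort splits names.

```python
import re
from collections import Counter

# Assumed token rule: C-style identifiers found anywhere in the symbol name.
_TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tally_tags(symbols):
    """Tally total size and count per token for (name, size) pairs."""
    size_by_tag = Counter()
    count_by_tag = Counter()
    for name, size in symbols:
        # set() makes a token count once per symbol even if it repeats.
        for token in set(_TOKEN.findall(name)):
            size_by_tag[token] += size
            count_by_tag[token] += 1
    return size_by_tag, count_by_tag
```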
USAGE:
SymbolSort [options]
Options:
-in[:type] filename
Specify an input file with optional type. Exe and PDB files are
identified automatically by extension. Otherwise type may be:
comdat - the format produced by DumpBin /headers
sysv - the format produced by nm --format=sysv
bsd - the format produced by nm --format=bsd --print-size
-out filename
Write output to specified file instead of stdout
-count num_symbols
Limit the number of symbols displayed to num_symbols
-exclude substring
Exclude symbols that contain the specified substring
-diff[:type] filename
Use this file as a basis for generating a differences report.
See -in option for valid types.
-searchpath path
Specify the symbol search path when loading an exe
-path_replace regex_match regex_replace
Specify a regular expression search/replace for symbol paths.
Multiple path_replace sequences can be specified for a single
run. The match term is escaped but the replace term is not.
For example: -path_replace d:\\SDK_v1 c:\SDK -path_replace
d:\\SDK_v2 c:\SDK
-complete
Include a complete listing of all symbols sorted by address.
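To illustrate the -path_replace semantics described above (match term is a
regular expression, replace term is taken literally), here is a Python sketch.
It is not SymbolSort's code, the function name is hypothetical, and details
such as case sensitivity are assumptions.

```python
import re


def apply_path_replacements(path, replacements):
    """Apply (regex_match, replacement) pairs to a path, in order."""
    for pattern, replacement in replacements:
        # A function replacement keeps backslashes in the replacement text
        # literal; a plain string would be parsed for escape sequences.
        path = re.sub(pattern, lambda match, text=replacement: text, path)
    return path


rules = [(r"d:\\SDK_v1", r"c:\SDK"), (r"d:\\SDK_v2", r"c:\SDK")]
print(apply_path_replacements(r"d:\SDK_v1\include\foo.h", rules))
```

The backslash in each match term is doubled because a single backslash is the
regex escape character, which is why the example above writes d:\\SDK_v1.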
Options specific to Exe and PDB inputs:
-include_public_symbols
Include 'public symbols' from PDB inputs. Many symbols in the
PDB are listed redundantly as 'public symbols.' These symbols
provide a slightly different view of the PDB as they are named
more descriptively and usually include padding for alignment
in their sizes.
-keep_redundant_symbols
Normally symbols are processed to remove redundancies. Partially
overlapped symbols are adjusted so that their sizes aren't
over-reported, and completely overlapped symbols are discarded.
This option preserves all symbols and their reported sizes.
-include_sections_as_symbols
Attempt to extract entire sections and treat them as individual
symbols. This can be useful when mapping sections of an
executable that don't otherwise contain symbols (such as .pdata).
-include_unmapped_addresses
Insert fake symbols representing any unmapped addresses in the
PDB. This option can highlight sections of the executable that
aren't directly attributable to symbols. In the complete view
this will also highlight space lost due to alignment padding.
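The redundancy removal that -keep_redundant_symbols turns off can be pictured
with the Python sketch below. The (address, size, name) tuples, the function
name, and the choice to trim the later of two overlapping symbols are all
assumptions made for illustration.

```python
def remove_redundant(symbols):
    """Drop fully overlapped symbols and trim partially overlapped ones.

    `symbols` is a list of (address, size, name) tuples (an assumed shape).
    """
    result = []
    covered_end = 0  # first address not yet claimed by a kept symbol
    # Visit by address; at equal addresses the larger symbol goes first.
    for address, size, name in sorted(symbols, key=lambda s: (s[0], -s[1])):
        end = address + size
        if end <= covered_end:
            continue  # completely overlapped: discard
        if address < covered_end:
            # Partially overlapped: keep only the uncovered tail.
            address = covered_end
            size = end - covered_end
        result.append((address, size, name))
        covered_end = end
    return result
```

After this pass the kept sizes sum to the number of distinct bytes covered,
so no address is counted twice.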
SymbolSort supports several types of input files:
- COMDAT dump -
A COMDAT dump is generated using the DumpBin utility with the /headers option.
DumpBin is included with the Microsoft compiler toolchain. SymbolSort can
accept the dump from a single .lib or .obj file, but the best way to use it is
to create a complete dump of all the .obj files from an entire application.
The Windows command line utility FOR can be used for this:
for /R "c:\obj_file_location" %n in (*.obj) do "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin\DumpBin.exe" /headers "%n" >> c:\comdat_dump.txt
This will generate a concatenated dump of all the headers in all the .obj
files in c:\obj_file_location. Beware, for large applications this could
produce a multi-gigabyte file.
- PDB or EXE -
SymbolSort supports reading debug symbol information from .exe files and .pdb
files. The .exe file will only be used to find the location of its matching
.pdb file, and then the symbols will be extracted from the PDB. SymbolSort
uses msdia140.dll to extract data from the PDB file. Msdia140.dll is included
with the Microsoft compiler toolchain. In order to use it you will probably
have to register the dll by running this command from an elevated command
prompt:
regsvr32 "c:\Program Files (x86)\Microsoft Visual Studio 14.0\DIA SDK\bin\amd64\msdia140.dll"
It is important that you register the 64-bit version of msdia140.dll on 64-bit
Windows and the 32-bit version on 32-bit Windows. Note that SymbolSort works
with multiple versions of msdia*.dll, from at least msdia90.dll to
msdia140.dll.
- NM dump -
Similar to the COMDAT dump, SymbolSort can accept symbol dumps from the unix
utility nm. The symbols can be extracted from .obj files or entire .elfs.
SymbolSort supports bsd and sysv format dumps. Sysv is preferred because it
contains more information. The recommended nm command lines are:
nm --format=sysv --demangle --line-numbers input_file.elf
nm --format=bsd --demangle --line-numbers --print-size input_file.elf
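For orientation, a bsd-format line produced with --print-size has the shape
"address size type name", and --line-numbers appends a tab followed by
file:line when debug information is available. The Python sketch below parses
one such line. It reflects my understanding of GNU nm output, not SymbolSort's
own parser, and the field names are hypothetical.

```python
import re

# address and size are 8 or 16 hex digits; type is a single letter; the
# optional source location follows a tab after the (demangled) name.
_BSD_LINE = re.compile(
    r"^(?P<address>[0-9A-Fa-f]{8,16})\s+"
    r"(?P<size>[0-9A-Fa-f]{8,16})\s+"
    r"(?P<type>\S)\s+"
    r"(?P<name>[^\t]+)"
    r"(?:\t(?P<location>.*))?$"
)


def parse_bsd_line(line):
    """Parse one nm bsd line; return None for lines without a size."""
    match = _BSD_LINE.match(line.rstrip("\n"))
    if match is None:
        return None  # e.g. undefined symbols, which have no address or size
    return {
        "address": int(match.group("address"), 16),
        "size": int(match.group("size"), 16),
        "type": match.group("type"),
        "name": match.group("name"),
        "location": match.group("location"),
    }
```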
BUILDING:
The source for SymbolSort is distributed as a single file, SymbolSort.cs. It
can be built as a simple C# command line utility. In order to get the msdia140
interop to work you must add msdia140.dll as a reference to the C# project.
That is done either by dragging and dropping the dll onto the references folder
in the C# project or by right clicking the references folder, selecting "Add
Reference" and then browsing for the msdia140 dll.
You may get this error message:
A reference to 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\DIA
SDK\bin\amd64\msdia140.dll' could not be added. Please make sure that the
file is accessible, and that it is a valid assembly or COM component.
This just means that msdia140.dll has not been registered. This is easily
fixed by running this command from an administrator command prompt:
regsvr32 "c:\Program Files (x86)\Microsoft Visual Studio 14.0\DIA SDK\bin\amd64\msdia140.dll"
REVISION HISTORY:
1.2 + Upgraded to Visual Studio 2010 / msdia100.dll
+ Added -path_replace option to convert paths stored in PDBs.
+ Added -complete option to dump a full list of all symbols sorted by
address.
+ Added several options for controlling what symbols are included in PDB
dumps since PDBs often list the same address redundantly under
different labels.
1.1 + Added support for computing differences between multiple input sources
+ Added support for nm output for PS3 / unix platforms.
+ Changed command line parameters. See usage for details.
+ Added section / type information to output.
1.0 + First release!
FUTURE WORK (to be done by someone else!):
* Add a GUI frontend to allow interactive filtering and sorting.
* Read both the PDB and the COMDAT dump simultaneously and cross-reference
the two. This would enable new kinds of analysis and richer dumps.
* Produce additional merged symbol reports by merging all symbols from the
same class or namespace or that match based on some more clever fuzzy
comparison.
* Improve relative -> absolute path conversion for nm inputs.
* Figure out how to extract string literal information from PDB.