This repository is a fork of a Teensy 3/4 gprof implementation by ftrias.
gprof is a profiler for applications on Unix systems. A profiler is a software analysis tool that shows where a program spends most of its execution time, so that the slowest functions can be identified and optimized. gprof is a simple profiler and works most easily with the GCC compiler because of GCC’s built-in ‘-pg’ compilation option. The ‘-pg’ option enables gprof profiling by inserting a call to the instrumentation function ‘mcount’ at the entry of each compiled function. These calls allow the number of times each function is called to be counted and later reported.

In addition to counting function calls, gprof uses interrupts to pause the program at set intervals and record the value of the processor’s ‘program counter’ (‘pc’) register. The ‘pc’ register holds the memory address of the next instruction to be executed by the running program. When the program exits, a gmon.out file is written. gprof then analyzes this file together with the compiled program binary, resolving each recorded ‘pc’ sample to a function and estimating the run time of each function.

From these inputs gprof generates two tables, the flat profile and the call graph. The flat profile lists the estimated time spent in each function of the program, whereas the call graph lists the relationships between each function’s callers and callees. The two tables can be used together to determine the slowest functions of the profiled program.
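The sampling idea described above can be illustrated with a minimal Python sketch. It is not gprof's implementation: the symbol table, addresses, and sampling period below are hypothetical values chosen only to show how recorded ‘pc’ samples might be resolved to functions and turned into estimated run times.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start address, function name), sorted by address.
SYMBOLS = [(0x1000, "setup"), (0x1040, "loop"), (0x10C0, "read_sensor")]
TEXT_END = 0x1100  # hypothetical end of the profiled code region


def resolve(pc, symbols=SYMBOLS, text_end=TEXT_END):
    """Map one sampled program-counter value to the enclosing function name."""
    if pc < symbols[0][0] or pc >= text_end:
        return None  # the sample fell outside the profiled code
    starts = [start for start, _ in symbols]
    # The enclosing function is the last symbol starting at or before pc.
    return symbols[bisect_right(starts, pc) - 1][1]


def flat_profile(samples, sample_period_s):
    """Estimate time per function as (number of samples) * (sampling period)."""
    counts = Counter(name for name in map(resolve, samples) if name is not None)
    return {name: n * sample_period_s for name, n in counts.most_common()}


if __name__ == "__main__":
    samples = [0x1044, 0x1050, 0x10C4, 0x1048, 0x1004, 0x2000]
    for name, seconds in flat_profile(samples, 0.001).items():
        print(f"{name:12s} {seconds:.3f} s")
```

Because the estimate is statistical, functions that run only briefly may receive few or no samples, which is why a longer profiling run generally gives a more reliable flat profile.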
gprof2dot is a program that makes gprof output easier to interpret by converting it into a graph of the program. The generated graph summarizes both the call graph and the percentage of time spent in each function in a single image, giving a holistic overview of the program’s execution.
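As a rough illustration of how these tools chain together, the Python sketch below assembles a shell pipeline that feeds gprof output through gprof2dot and Graphviz's dot. The function name, the default file names, and the assumption that gprof2dot and dot are on the PATH are all illustrative; the sketch only builds the command string and does not run it, and it is not how gprof_read.py is known to work internally.

```python
import shlex


def build_pipeline(elf_path, gmon_path="gmon.out", image_path="profile.png",
                   gprof="arm-none-eabi-gprof"):
    """Build a 'gprof | gprof2dot | dot' pipeline as a single shell string.

    All default file names here are assumptions made for illustration.
    """
    stages = [
        [gprof, elf_path, gmon_path],        # text report from binary + gmon.out
        ["gprof2dot"],                       # reads gprof output on stdin
        ["dot", "-Tpng", "-o", image_path],  # Graphviz renders the graph as PNG
    ]
    return " | ".join(" ".join(shlex.quote(arg) for arg in stage)
                      for stage in stages)


if __name__ == "__main__":
    print(build_pipeline("/tmp/build.elf"))
```

Quoting each argument with shlex.quote keeps the generated command safe to paste into a shell even when a path contains spaces.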
To install gprof and prepare it for use with the Teensy 4.1 microcontroller board, the following requirements must be met:
- The Arduino Legacy IDE must be installed. (It is known to work with version 1.8.19.)
- Python 3.0 or higher must be installed.
- The OS of the host computer must list connected devices in ‘/dev/’. (Nearly all Linux distributions satisfy this requirement.)
- Git and the GitHub CLI must be installed on the host computer.
- gprof2dot must be installed. Installation instructions are in the provided link.
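Some of the requirements above can be checked automatically. The following Python sketch is one possible way to do so; the executable names (`git`, `gh`, `gprof2dot`) are assumptions about how each tool is exposed on the PATH, and the Arduino Legacy IDE is not checked because its install location varies.

```python
import os
import shutil
import sys

# Executable names assumed for each command-line requirement.
REQUIRED_TOOLS = ("git", "gh", "gprof2dot")


def check_requirements(dev_dir="/dev"):
    """Return a mapping of requirement name to whether it appears satisfied."""
    results = {
        "python3": sys.version_info >= (3, 0),
        "dev_directory": os.path.isdir(dev_dir),
    }
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not on the PATH.
        results[tool] = shutil.which(tool) is not None
    return results


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name:14s} {'ok' if ok else 'MISSING'}")
```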
If all requirements are satisfied, then the following steps should be undertaken:
1. Navigate to where you would like to install gprof and run the following command in your terminal:

       git clone https://github.com/JoshuaTurner3/TeensyGProf

2. Open your installed Legacy Arduino IDE and open its preferences (File > Preferences). Under the 'Additional Boards Manager URLs' section, add the following URL, and then click Ok to close the preferences menu:

       https://www.pjrc.com/teensy/package_teensy_index.json

3. Add the TeensyGProf library by selecting Sketch > Include Library > Add .ZIP library... and selecting the TeensyGProf.zip file in the repository cloned in step 1.
4. Close the Legacy Arduino IDE.
5. Open your file explorer and navigate to where you installed the TeensyGProf repository.
6. Copy boards.local.txt and platform.local.txt from the TeensyGProf repository folder and then navigate to your .arduino15 folder. It is likely located at ~/.arduino15. (You may need to enable Show Hidden Files in your file explorer.)
7. Inside the .arduino15 folder, navigate to packages/teensy/hardware/avr/<teensy-board-version>/ and then paste the files copied in step 6 into this folder.
8. To install the cross-compiled ARM gprof binary, open your terminal and run the following command (or an equivalent command for a different package manager):

       sudo apt-get install -y binutils-arm-none-eabi

   Confirm that the binary was installed correctly by running the following command and ensuring the output is not empty:

       ls /usr/bin | grep arm-none-eabi-gprof

Your Legacy Arduino IDE is now set up to profile Teensy 4.1 programs.
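The manual copy of boards.local.txt and platform.local.txt described in the steps above could also be scripted. The Python sketch below is one hedged way to do it: the function name is hypothetical, and copying into every installed version folder under packages/teensy/hardware/avr/ is an assumption, since the steps only describe a single <teensy-board-version> folder.

```python
import shutil
from pathlib import Path

LOCAL_FILES = ("boards.local.txt", "platform.local.txt")


def install_local_files(repo_dir, arduino15_dir=None):
    """Copy the two *.local.txt files into each installed Teensy version folder.

    Returns the list of destination paths that were written.
    """
    if arduino15_dir is None:
        arduino15_dir = Path.home() / ".arduino15"  # likely location per the steps
    hardware_root = Path(arduino15_dir) / "packages" / "teensy" / "hardware" / "avr"
    copied = []
    # Assumption: every sub-folder here is one installed board-package version.
    for version_dir in sorted(p for p in hardware_root.glob("*") if p.is_dir()):
        for name in LOCAL_FILES:
            destination = version_dir / name
            shutil.copy2(Path(repo_dir) / name, destination)
            copied.append(destination)
    return copied


if __name__ == "__main__":
    for path in install_local_files(Path.cwd()):
        print("installed", path)
```

If the Teensy package has not been installed through the Boards Manager yet, the hardware folder does not exist and the function returns an empty list without copying anything.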
1. Open your terminal (Ctrl+Alt+T), cd to the TeensyGProf repository cloned during installation, and run the following command with your desired selection of parameters (listed below):

       python3 gprof_read.py <your-choice-of-options>

   | Parameter | Function | Example |
   | --- | --- | --- |
   | --hex | Convert from ASCII to HEX | python3 gprof_read.py --hex |
   | --serial | Read serial device with given name | python3 gprof_read.py --serial /dev/ttyACM0 |
   | --elf | The ELF file to read (default is /tmp/build.elf) | python3 gprof_read.py --elf /path/to/elf.elf |
   | --img | Generate an image from gmon.out | python3 gprof_read.py --img "test_1" |
   | --save | Save gprof output to a text file | python3 gprof_read.py --save "test_1" |
   | --project | Save gmon.out, the gprof2dot image, and the gprof text output in a folder with the project name | python3 gprof_read.py --project "test_1" |
   | --exclude | Exclude named functions from gprof processing | python3 gprof_read.py --exclude "foo bar func1 func2" |

   The recommended command is:

       python3 gprof_read.py --serial /dev/<serial-name> --project <project-name>

   The serial device name for the connected Teensy 4.1 is most easily found in the Arduino IDE under Tools > Port, which lists the detected Teensy 4.1 port.
2. Open the sketch you would like to profile and select Tools > Board > Teensyduino > Teensy 4.1 and Tools > Profile > On.
3. Edit your sketch to resemble the following:

       // Other code above...

       // Duration of time (ms) to profile the program for
       unsigned long gprof_duration(30000);
       unsigned long gprof_timer_start(millis());

       void loop() {
         if (gprof_timer_start && millis() - gprof_timer_start > gprof_duration) {
           if (gprof_end() != 0) {
             // Failure
             Serial.println("gprof failure");
             Serial.println("Requested memory: ");
             Serial.println(gprof_memory());
           }
           // Clear the start time so that profiling is ended only once
           gprof_timer_start = 0;
         }
         // Rest of your loop code below
       }

       // Other code below...

4. Upload the sketch and observe the terminal running gprof_read.py from step 1. The program exits when it is done processing, and its results can be found in the TeensyGProf repository folder.
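When the Arduino IDE is not open, the serial device can often be found by listing ‘/dev/’ directly. The Python sketch below looks for candidate devices and assembles the recommended command from the steps above. The `ttyACM*` pattern is an assumption based on the `/dev/ttyACM0` example in the parameter table, and both function names are hypothetical.

```python
import glob
import os


def find_serial_candidates(dev_dir="/dev", pattern="ttyACM*"):
    """List device paths that may belong to a connected Teensy.

    The ttyACM* pattern is an assumption; other systems may name ports differently.
    """
    return sorted(glob.glob(os.path.join(dev_dir, pattern)))


def recommended_command(serial_device, project_name):
    """Assemble the recommended gprof_read.py invocation as an argument list."""
    return ["python3", "gprof_read.py",
            "--serial", serial_device,
            "--project", project_name]


if __name__ == "__main__":
    candidates = find_serial_candidates()
    if candidates:
        print(" ".join(recommended_command(candidates[0], "test_1")))
    else:
        print("No candidate serial device found; check Tools > Port in the IDE.")
```

The command is returned as a list so that it can be passed to `subprocess.run` without shell quoting concerns, should you choose to launch the script that way.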
1. TeensyGProf will commandeer the systick_isr handler. It will perform its own profiling before handing control to the original systick_isr handler. Thus EventResponder and all other timing code should be unaffected. The sampling data is stored in RAM.
2. It adds the -pg compiler flag to cpp files. This causes the compiler to add a call to __gnu_mcount_nc at the start of every function. That is how it keeps track of which function called which. These caller/callee pairs (called arcs) are also stored in RAM.
3. You can configure the amount of RAM used by the sampler in step 1 and the call tracker in step 2. The more memory you allocate, the more accurate your results. Look at the file gmon.h and modify HASHFRACTION and ARCDENSITY.
4. If you call gprof.begin() and pass a duration in milliseconds, it will start a timer that calls gprof.end() when it expires, which outputs the data. Otherwise you must call gprof.end() yourself. That call processes all the data and outputs the contents of gmon.out to the desired port in the format requested. This file, along with a copy of the elf file, is used by gprof to generate a report. You can customize the output method by subclassing the class GProfOutput. For example, you could send this file via a network or HTTP.
5. For some reason, Teensy 4 puts its code in a section called .text.itcm. gprof expects it in a section called .text, which is the standard in Linux. Teensy 3 puts it in the right place. So the platform.local.txt file tells Arduino to run objcopy to rename the section.
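The section rename in the last item can be pictured with a short Python sketch that builds an objcopy command using its `--rename-section` option. The exact recipe in platform.local.txt is not reproduced here; the function name, the `arm-none-eabi-objcopy` executable name, and the output file name are assumptions made for illustration.

```python
def rename_text_section_command(elf_in, elf_out, objcopy="arm-none-eabi-objcopy"):
    """Build an objcopy command that renames .text.itcm to .text.

    The command is returned as an argument list and is not executed here.
    """
    return [objcopy, "--rename-section", ".text.itcm=.text", elf_in, elf_out]


if __name__ == "__main__":
    # /tmp/build.elf is the default ELF path used by gprof_read.py;
    # the output name is a hypothetical choice.
    print(" ".join(rename_text_section_command("/tmp/build.elf",
                                                "/tmp/build_renamed.elf")))
```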
For the ARM solution this project is based on, see: https://mcuoneclipse.com/2015/08/23/tutorial-using-gnu-profiling-gprof-with-arm-cortex-m/
For an interesting overview of gprof: http://wwwcdf.pd.infn.it/localdoc/gprof.pdf
Original repository: https://github.com/ftrias/TeensyGProf