FAQ
Does X have an API? If so, don't think of it as a choice between timemory and X; think of it as timemory being able to provide X in addition to a whole lot of other things. If X is not already provided as a component, create a feature request or open a pull request with the implementation.
To compare a new set of results against a previous (baseline) run:
- If you don't have the first set of results yet, it is often convenient to place them in a known directory, e.g. TIMEMORY_OUTPUT_PATH=baseline
- Once you have the first set of results, set the environment variable TIMEMORY_INPUT_PATH to the directory containing those results
- Enable TIMEMORY_TIME_OUTPUT -- this will create another sub-folder with the time-stamp of the run
- Enable TIMEMORY_DIFF_OUTPUT -- this will instruct timemory to search the input path, load any results found, and compute the difference
- Run the executable again; when the output is generated, there will be additional outputs reporting the differences.
$ export TIMEMORY_OUTPUT_PATH=baseline
$ ./myexe
$ ls -1 baseline/
wall.txt
wall.json
wall.jpeg
$ export TIMEMORY_INPUT_PATH=baseline
$ export TIMEMORY_TIME_OUTPUT=ON
$ export TIMEMORY_DIFF_OUTPUT=ON
$ ./myexe
$ ls -1 baseline/
wall.txt
wall.json
wall.jpeg
2020-08-27_04.12_PM/
$ ls -1 baseline/2020-08-27_04.12_PM/
wall.txt
wall.json
wall.jpeg
wall.diff.txt
wall.diff.json
wall.diff.jpeg
It will probably be most productive if you don't modify the timemory source code initially. In C++, with the template API, there is no significant difference between creating a component in an external project and defining the component within the timemory source. The only thing gained by building a component within the source is that the component can be assigned an enumeration ID (timemory/enums.h) and can then be mapped into C, Python, and Fortran, but that is not critical in the early stages. The most important thing starting out is getting the contents of the component definition written, and doing that in a stand-alone executable makes things very clean, straightforward, and productive.
Recommended Steps:
- create a folder, component-dev
- create a simple component-dev/CMakeLists.txt
- create a component-dev/ex_component_dev.cpp file which is just your:
  - main
  - component definition
  - some code in main that uses your component around some code that will produce data for it to collect
cmake_minimum_required(VERSION 3.11 FATAL_ERROR)
project(component-dev LANGUAGES CXX)
set(timemory_FIND_COMPONENTS_INTERFACE timemory-component-dev)
find_package(timemory REQUIRED COMPONENTS headers cxx)
add_executable(ex_component_dev ex_component_dev.cpp)
target_link_libraries(ex_component_dev timemory-component-dev)
# find/add any include-dirs, libs, etc. that you need for your component
find_library(TPL_LIBRARY
NAMES some-library
# ... etc.
)
target_link_libraries(ex_component_dev ${TPL_LIBRARY})
#include "timemory/timemory.hpp"
using namespace tim::component;
TIMEMORY_DECLARE_COMPONENT(component_dev) // forward declare your component
using dev_bundle_t = tim::component_tuple<component_dev>; // use an alias like this to call the component
int main(int argc, char** argv)
{
    tim::timemory_init(argc, argv);

    // create a measurement instance and give it the label "work"
    dev_bundle_t _obj("work");
    _obj.start();  // start recording
    // ... do something that produces data your component can collect ...
    _obj.stop();   // stop recording

    tim::timemory_finalize();
    return EXIT_SUCCESS;
}
namespace tim
{
namespace component
{
struct component_dev : public base<component_dev, some_data_type>
{
    static std::string label() { return "component_dev"; }
    static std::string description() { return "collects some component data"; }
    auto record() { /* ... take a raw measurement ... */ }
    auto start() { /* ... store the starting measurement in value ... */ }
    auto stop() { /* ... compute the difference from start() and accumulate ... */ }
};
}  // namespace component
}  // namespace tim

// this is only really necessary if threads are used
TIMEMORY_INITIALIZE_STORAGE(component_dev)
Once you have this set up, just try to encapsulate taking one measurement as one instance of component_dev, where:
- void start() starts the measurement and stores the measurement data as member data for that instance
- void stop() stops the measurement and computes the difference between the current measurement and the measurement taken in start()
A couple things to note:
- The some_data_type in base<component_dev, some_data_type> above doesn't have to be the "final" data type reported by the component. You should set that data type to whatever data type is optimal for use between start/stop.
- The base-class will provide some_data_type value and some_data_type accum. In general, it is recommended to record to value in start() and then, in stop(), record value as value = (record() - value) and then accum += value. This way, start() and stop() can be called multiple times without issue: value represents the most recent measurement or delta and accum is the sum/max/etc. of one or more phases.
- The "final" data type is what is returned the "get() const" member function, e.g. for the wall-clock timer:
-
some_data_type
isint64_t
and the values are always in nanoseconds - the
get()
function isdouble get() const
and it takesaccum
and converts it seconds.
-
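Putting those notes together, a minimal sketch of a filled-in component_dev might look like the following (the std::chrono clock and the nanosecond-to-second conversion are purely illustrative assumptions, not the actual wall_clock implementation):
#include "timemory/timemory.hpp"

#include <chrono>
#include <cstdint>

namespace tim
{
namespace component
{
// stores nanoseconds (int64_t) between start/stop, reports seconds from get()
struct component_dev : public base<component_dev, int64_t>
{
    static std::string label() { return "component_dev"; }
    static std::string description() { return "collects some component data"; }

    static int64_t record()
    {
        // one raw measurement: a steady-clock timestamp in nanoseconds
        return std::chrono::duration_cast<std::chrono::nanoseconds>(
                   std::chrono::steady_clock::now().time_since_epoch()).count();
    }

    void start() { value = record(); }

    void stop()
    {
        value = (record() - value);  // delta for this phase
        accum += value;              // accumulate across phases
    }

    // the "final" reported type: convert accumulated nanoseconds to seconds
    double get() const { return static_cast<double>(accum) * 1.0e-9; }
};
}  // namespace component
}  // namespace tim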
For the most part, I would just recommend looking at the existing components in the components.hpp files in the folders under source/timemory/components -- in particular, timing/components.hpp, rusage/components.hpp, and io/components.hpp -- and modeling your component definition after the component which is most similar. Those component definitions are really the gist of what needs to be done; all the other stuff in */types.hpp, etc. is just window dressing and enhancements (like adding support for statistics, unit conversion, mapping the type to strings and enums, etc.) which is not at all necessary for an initial implementation.
When you have a simple stand-alone implementation like ex_component_dev.cpp above, you can open a PR and we can tell you how to migrate it into the actual source code so that it then becomes universally available in C, C++, Python, and Fortran.
NOTE: Although data_tracker is recommended below, you can always create a custom component which will accept any arguments desired and store the data however necessary.
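For example, a minimal sketch of such a custom component might look like this (the name my_throughput, its store(bytes, seconds) signature, and the summing behavior are all illustrative assumptions; a bundle's store(...) call is forwarded to components that provide a matching store member, which is also how the data_tracker below is driven):
#include "timemory/timemory.hpp"

namespace tim
{
namespace component
{
// hypothetical component that accepts its own store(...) arguments
struct my_throughput : public base<my_throughput, double>
{
    static std::string label() { return "my_throughput"; }
    static std::string description() { return "accumulates user-provided rates"; }

    // a bundle containing this component forwards its store(...) call here
    void store(double bytes, double seconds)
    {
        value = bytes / seconds;  // most recent rate
        accum += value;           // running sum of rates
    }

    double get() const { return accum; }
};
}  // namespace component
}  // namespace tim

// usage sketch:
//   tim::component_tuple<tim::component::my_throughput> _obj{ "copy" };
//   _obj.start();
//   _obj.store(1.0e6, 0.25);
//   _obj.stop();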
There is a templated tim::component::data_tracker<Tp, Tag> where Tp is the data type you want to track and Tag is just an arbitrary struct to differentiate component X that tracks ints from component Y that also tracks ints. You basically put the data_tracker in a bundle and call the store(...) function, passing in any data that you want to store (but beware of implicit conversions when using multiple data trackers). Three concrete (i.e. non-templated) implementations are provided:
data_tracker_integer
data_tracker_unsigned
data_tracker_floating
It can be quite useful to create a dedicated auto_tuple bundle for just the components which track data, because the auto_* bundles call start() and stop() at construction/destruction, so you can append a value in a single line.
This is probably best demonstrated with an example.
using namespace tim::component;
// component_tuple requires explicit start, stop is optional (will call stop if started when it gets destroyed)
using bundle_t = tim::component_tuple<wall_clock, data_tracker_integer>;
// auto_tuple automatically calls start and stop
using tracker_t = tim::auto_tuple<data_tracker_integer>;
// auto_* bundles will write to stdout when destroyed
tim::settings::destructor_report() = true;
bundle_t _obj{ "example" };
_obj.start();
for(int i = 0; i < 10; ++i)
{
long ans = fibonacci(10) + fibonacci(10 + (i % 3));
// store a single iteration
tracker_t{ TIMEMORY_JOIN("", _obj.key(), "#", i % 3) }.store(ans);
// join macro is like pythons "#".join(...)
// accumulate into parent bundle and demonstrate using lambda
// specifying how to update variable
_obj.store([](long cur, long upd) { return cur + upd; }, ans);
}
The result:
>>> example#0 : 110 data_integer [laps: 1]
>>> example#1 : 144 data_integer [laps: 1]
>>> example#2 : 199 data_integer [laps: 1]
>>> example#0 : 110 data_integer [laps: 1]
>>> example#1 : 144 data_integer [laps: 1]
>>> example#2 : 199 data_integer [laps: 1]
>>> example#0 : 110 data_integer [laps: 1]
>>> example#1 : 144 data_integer [laps: 1]
>>> example#2 : 199 data_integer [laps: 1]
>>> example#0 : 110 data_integer [laps: 1]
[data_integer]|0> Outputting 'timemory-ex-derived-output/data_integer.json'...
[data_integer]|0> Outputting 'timemory-ex-derived-output/data_integer.tree.json'...
[data_integer]|0> Outputting 'timemory-ex-derived-output/data_integer.txt'...
|---------------------------------------------------------------------------------------------------------------------------------------------------------|
| STORES SIGNED INTEGER DATA W.R.T. CALL-GRAPH |
|---------------------------------------------------------------------------------------------------------------------------------------------------------|
| LABEL | COUNT | DEPTH | METRIC | UNITS | SUM | MEAN | MIN | MAX | STDDEV | % SELF |
|---------------------|------------|------------|--------------|------------|------------|------------|------------|------------|------------|------------|
| >>> example | 1 | 0 | data_integer | | 1469 | 1469 | 1469 | 1469 | 0 | 0 |
| >>> |_example#0 | 4 | 1 | data_integer | | 440 | 110 | 110 | 110 | 0 | 100 |
| >>> |_example#1 | 3 | 1 | data_integer | | 432 | 144 | 144 | 144 | 0 | 100 |
| >>> |_example#2 | 3 | 1 | data_integer | | 597 | 199 | 199 | 199 | 0 | 100 |
|---------------------------------------------------------------------------------------------------------------------------------------------------------|
The default behavior is to add entries together, but the store member function of the data tracker also supports passing a binary function or lambda:
using namespace tim::component;
struct A_data {}; // differentiator type for data of type "A"
struct B_data {}; // differentiator type for data of type "B"
using A_tracker = data_tracker<int, A_data>;
using B_tracker = data_tracker<double, B_data>;
using bundle_t = tim::auto_tuple<A_tracker, B_tracker>;
void foo(int A, double B)
{
    // update lambda for B: keep the maximum value seen
    auto B_update = [](double current, double incoming) {
        return std::max(current, incoming);
    };

    bundle_t obj{ "foo" };
    obj.store(A);            // adds A instances
    obj.store(B_update, B);  // records the max of B
}
Timemory is supported by a Python project called Hatchet which converts the *.tree.json output files into Pandas data-frames for extended analysis. In the longer term, we will be developing a framework for performance unit testing which will automate historical comparisons but, at present, the roll-your-own solution would be to import multiple JSON files into Python and do the comparison there. The regular .json files (w/o .tree) provide the results in a flat JSON array, so they are relatively easy to traverse (the JSON trees require recursion to process).
Also, you can specify TIMEMORY_INPUT_PATH=/path/to/some/folder/of/results + TIMEMORY_DIFF_OUTPUT=ON in the environment (or the corresponding tim::settings::input_path(), etc.) and, when the results are finalized, timemory will search that input folder, try to find a corresponding input, and then generate additional *.diff.* files in the output folder with the difference between any matches it can find.
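If you prefer to configure this programmatically rather than through the environment, a minimal sketch would be the following (assuming the setting accessors mirror the environment variables, e.g. tim::settings::diff_output() for TIMEMORY_DIFF_OUTPUT):
#include "timemory/timemory.hpp"

int main(int argc, char** argv)
{
    // equivalent to TIMEMORY_INPUT_PATH=baseline and TIMEMORY_DIFF_OUTPUT=ON
    tim::settings::input_path()  = "baseline";
    tim::settings::diff_output() = true;

    tim::timemory_init(argc, argv);
    // ... application work ...
    tim::timemory_finalize();  // *.diff.* files are generated during finalization
    return 0;
}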
To keep separate runs or processes from overwriting each other's output, one option is to prefix the files with some key (such as the PID):
tim::settings::output_prefix() = std::to_string(tim::process::get_id()) + "-";
Another option is to specify TIMEMORY_TIME_OUTPUT=ON in the environment (tim::settings::time_output()) and that will create sub-directories which are time-stamped. This time-stamp gets fixed for the process the first time settings::get_global_output_prefix() is called while time-output is enabled. Also, there is a time_format() in settings which allows you to customize the time-stamping format using strftime format codes.
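As a small sketch, both options can also be combined programmatically (the format string below is just an assumption matching the 2020-08-27_04.12_PM style shown earlier):
// prefix output files with the PID and place them in a time-stamped sub-directory
tim::settings::output_prefix() = std::to_string(tim::process::get_id()) + "-";
tim::settings::time_output()   = true;
tim::settings::time_format()   = "%Y-%m-%d_%I.%M_%p";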
First off, do not call timemory_finalize() and then try to use timemory after that point. That routine is designed to delete a lot of things and seg-faults are basically a certainty. However, you can pretty much dump the current state of the call-stack storage at any point, for any thread, on any process, as long as you are fine with the side-effects. The only known issue is flushing the output on the primary thread while secondary threads are attempting to update their data. Using this approach, you have full control of where the files get written. The side-effects are:
- Any components that have been pushed onto the call-stack will be popped off the call-stack unless you set settings::stack_clearing() = false
  - Setting this to false may cause issues: you might get nonsensical results for components because they might represent a single sample as a phase measurement, you might get zero in the laps column, etc.
- If you use the higher-level storage<Tp>::instance()->get() on the master thread, it will merge in the child threads and clear their storage
- If you use the higher-level storage<Tp>::instance()->dmp_get() (distributed memory parallelism, i.e. MPI or UPC++) on rank zero, it will return the storage from all processes
- You will likely artificially inflate the values for components measuring memory usage if those are still collecting when you retrieve the storage of, say, the timers
The best reference for how to dump output is the array-of-bundles example.
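As a rough sketch of what retrieving the storage mid-run can look like (the exact contents of the returned records vary per component, so this only reports how many records were found; the wall_clock component and the .size() call are assumptions for illustration):
#include "timemory/timemory.hpp"

#include <cstdio>

void report_wall_clock_records()
{
    using wall_storage_t = tim::storage<tim::component::wall_clock>;

    // get() on the master thread merges in the child threads and clears their storage
    auto _records = wall_storage_t::instance()->get();
    printf("wall_clock storage currently holds %zu records\n", (size_t) _records.size());

    // dmp_get() would additionally aggregate across MPI/UPC++ ranks onto rank zero
    // auto _all = wall_storage_t::instance()->dmp_get();
}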
Within Python, the process is quite easy:
import timemory
def write_json(fname, specific_types = [], hierarchy = False):
data = timemory.get(hierarchy=hierarchy, components=specific_types)
with open(fname, "w") as f:
f.write(data)