From 3aecdac02d56eb3c0cdca2a7f1fe4a9374e325f2 Mon Sep 17 00:00:00 2001
From: Fred Hebert
Date: Tue, 4 Mar 2014 08:56:00 -0500
Subject: [PATCH] 2.1.0 docs

---
 index.html       |  28 +++-
 recon.html       |   8 +-
 recon_alloc.html |   4 +-
 recon_lib.html   |   2 +-
 recon_trace.html | 331 +++++++++++++++++++++++++++++++++++++++++++++++
 screen.css       |   2 +-
 6 files changed, 362 insertions(+), 13 deletions(-)
 create mode 100644 recon_trace.html

diff --git a/index.html b/index.html
index 7214663..501ba2d 100644
--- a/index.html
+++ b/index.html
@@ -13,13 +13,13 @@

Recon

Recon Application

-

Copyright © 2012-2013 Fred Hebert (BSD 3-Clause License) +

Copyright © 2012-2014 Fred Hebert (BSD 3-Clause License)

Authors: Fred Hebert (mononcqc@ferd.ca) [web site: http://ferd.ca/].

Recon is a library to be dropped into any other Erlang project, @@ -41,6 +41,13 @@

Recon Application

calls with distributed Erlang nodes. +
recon_alloc
+
+ Regroups functions to deal with Erlang's memory allocators, or + particularly, to try to present the allocator data in a way that + makes it simpler to discover the presence of possible problems. +
+
recon_lib
Regroups useful functionality used by recon when dealing @@ -48,11 +55,10 @@

Recon Application

if you were looking to extend Recon's functionality
-
recon_alloc
+
recon_trace
- Regroups functions to deal with Erlang's memory allocators, or - particularly, to try to present the allocator data in a way that - makes it simpler to discover the presence of possible problems. + Provides production-safe tracing facilities, to dig into the + execution of programs and function calls as they are running.
@@ -79,6 +85,16 @@

Recon Application

information that can be useful in determining the most common causes of node failure. + +
queue_fun.awk
+
+ Awk script to run on an Erlang crash dump as + awk -v threshold=<queue size> -f queue_fun.awk <crashdump>. It will + show what function processes with queue sizes larger than or equal to + <queue size> were operating at the time of the crash dump. May help + find out if most processes were stuck blocking on a given function + call while accumulating messages forever. +
diff --git a/recon.html b/recon.html index 339fb8a..aab9d80 100644 --- a/recon.html +++ b/recon.html @@ -13,7 +13,7 @@

Recon

@@ -221,7 +221,8 @@

Function Index

the biggest Num consumers. proc_window/3Fetches a given attribute from all processes and returns the biggest entries, over a sliding time window. -remote_load/1Equivalent to remote_load(nodes(), Mod). +remote_load/1Equivalent to remote_load(nodes(), Mod). + remote_load/2Loads one or more modules remotely, in a diskless manner. rpc/1Shorthand for rpc([node()|nodes()], Fun). rpc/2Shorthand for rpc(Nodes, Fun, infinity). @@ -475,7 +476,8 @@

proc_window/3

remote_load/1

remote_load(Mod::module()) -> term()

-

Equivalent to remote_load(nodes(), Mod).

+

Equivalent to remote_load(nodes(), Mod).

+

remote_load/2

diff --git a/recon_alloc.html b/recon_alloc.html index 9d0fb69..75b8711 100644 --- a/recon_alloc.html +++ b/recon_alloc.html @@ -13,7 +13,7 @@

Recon

@@ -369,7 +369,7 @@

snapshot_load/1

load a snapshot from a given file. The format of the data in the file can be either the same as output by snapshot_save(), or the output obtained by calling - {erlang:memory(),[{A,erlang:system_info({allocator,A})} || A <- element(3,erlang:system_info(allocator))]}. + {erlang:memory(),[{A,erlang:system_info({allocator,A})} || A <- erlang:system_info(alloc_util_allocators)++[sys_alloc,mseg_alloc]]}. and storing it in a file. If the latter option is taken, please remember to add a full stop at the end of the resulting Erlang term, as this function uses file:consult/1 to load diff --git a/recon_lib.html b/recon_lib.html index 42c4345..c3aac77 100644 --- a/recon_lib.html +++ b/recon_lib.html @@ -13,7 +13,7 @@

Recon

diff --git a/recon_trace.html b/recon_trace.html new file mode 100644 index 0000000..c16376c --- /dev/null +++ b/recon_trace.html @@ -0,0 +1,331 @@ + + + + + + + + + Recon Library + + +
+

Recon

+ +
+
+

Module recon_trace

+ + recon_trace is a module that handles tracing in a safe manner for single +Erlang nodes, currently for function calls only. + +

Authors: Fred Hebert (mononcqc@ferd.ca) [web site: http://ferd.ca/].

+ +

Description

+ recon_trace is a module that handles tracing in a safe manner for single +Erlang nodes, currently for function calls only. Functionality includes:

+ + + +

Tracing Erlang Code

+ +

The Erlang Trace BIFs allow you to trace any Erlang code at all. They work in +two parts: pid specifications and trace patterns.

+ +

Pid specifications let you decide which processes to target. They can be + specific pids, all pids, existing pids, or new pids (those not +spawned at the time of the function call).

+ +

The trace patterns represent functions. Functions can be specified in two + parts: first the module, function, and arguments, and then + Erlang match specifications that add constraints to the arguments (see + calls/3 for details).

+ +

Whether a call gets traced is defined by the intersection of both:

+ +
           _,--------,_      _,--------,_
+        ,-'            `-,,-'            `-,
+     ,-'              ,-'  '-,              `-,
+    |   Matching    -'        '-   Matching    |
+    |     Pids     |  Getting   |    Trace     |
+    |              |   Traced   |  Patterns    |
+    |               -,        ,-               |
+     '-,              '-,  ,-'              ,-'
+        '-,_          _,-''-,_          _,-'
+            '--------'        '--------'
+ +

If either the pid specification excludes a process or a trace pattern +excludes a given call, no trace will be received.
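As a rough illustration of that intersection, here is a minimal sketch that uses the raw trace BIFs directly instead of recon_trace. The module name trace_sketch and its run/0 function are hypothetical; only erlang:trace/3 and erlang:trace_pattern/3 come from OTP. It has none of the limits recon_trace enforces, so it is meant for a scratch node only.

```erlang
-module(trace_sketch).
-export([run/0]).

%% Hypothetical sketch: trace one call to queue:in/2 made by one process.
run() ->
    Self = self(),
    %% Trace patterns only apply to loaded code, so load the module first.
    {module, queue} = code:ensure_loaded(queue),
    Worker = spawn(fun() ->
        receive go -> ok end,
        _ = queue:in(a, queue:new()),
        Self ! done
    end),
    %% Pid specification: only calls made by Worker are candidates.
    %% With no tracer option given, the calling process receives the traces.
    1 = erlang:trace(Worker, true, [call]),
    %% Trace pattern: only queue:in/2 is a candidate.
    _ = erlang:trace_pattern({queue, in, 2}, true, []),
    Worker ! go,
    receive done -> ok end,
    %% A trace message arrives only for calls matching both parts.
    Result = receive
                 {trace, Worker, call, {queue, in, Args}} -> {traced, Args}
             after 1000 -> not_traced
             end,
    %% Disable the pattern again before returning.
    _ = erlang:trace_pattern({queue, in, 2}, false, []),
    Result.
```

The call to queue:new/0 made by the same process produces no trace message, because it falls outside the trace pattern even though the pid matches.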

+ +

Example Session

+ +

First let's trace the queue:new functions in any process:

+ +
   1> recon_trace:calls({queue, new, '_'}, 1).
+   1
+   13:14:34.086078 <0.44.0> queue:new()
+   Recon tracer rate limit tripped.
+ +

The limit was set to 1 trace message at most, and recon let us +know when that limit was reached.

+ +

Let's instead look for all the queue:in/2 calls, to see what it is +we're inserting in queues:

+ +
   2> recon_trace:calls({queue, in, 2}, 1).
+   1
+   13:14:55.365157 <0.44.0> queue:in(a, {[],[]})
+   Recon tracer rate limit tripped.
+ +

In order to see the content we want, we should change the trace patterns + to use a fun that matches on all arguments in a list (_) and returns + return_trace(). This last part will generate a second trace for each +call that includes the return value:

+ +
   3> recon_trace:calls({queue, in, fun(_) -> return_trace() end}, 3).
+   1
+  
+   13:15:27.655132 <0.44.0> queue:in(a, {[],[]})
+  
+   13:15:27.655467 <0.44.0> queue:in/2 --> {[a],[]}
+  
+   13:15:27.757921 <0.44.0> queue:in(a, {[],[]})
+   Recon tracer rate limit tripped.
+ +

Matching on argument lists can be done in a more complex manner:

+ +
   4> recon_trace:calls(
+   4>   {queue, '_', fun([A,_]) when is_list(A); is_integer(A) andalso A > 1 -> return_trace() end},
+   4>   {10,100}
+   4> ).
+   32
+  
+   13:24:21.324309 <0.38.0> queue:in(3, {[],[]})
+  
+   13:24:21.371473 <0.38.0> queue:in/2 --> {[3],[]}
+  
+   13:25:14.694865 <0.53.0> queue:split(4, {[10,9,8,7],[1,2,3,4,5,6]})
+  
+   13:25:14.695194 <0.53.0> queue:split/2 --> {{[4,3,2],[1]},{[10,9,8,7],[5,6]}}
+  
+   5> recon_trace:clear().
+   ok
+ +

Note that in the pattern above, no specific function ('_') was + matched against. Instead, the fun used restricted functions to those + having two arguments, the first of which is either a list or an integer + greater than 1.
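For reference, funs of this kind correspond to Erlang match specifications. The sketch below uses dbg:fun2ms/1 together with the ms_transform parse transform from OTP to show the translation at compile time. The module name ms_sketch and its function names are hypothetical, and the sketch does not claim to describe how recon_trace converts shell funs internally.

```erlang
-module(ms_sketch).
%% Enables the compile-time translation of dbg:fun2ms/1 calls.
-include_lib("stdlib/include/ms_transform.hrl").
-export([all_args_return_trace/0, two_args_guarded/0]).

%% Matches any argument list and asks for a return trace.
%% Evaluates to [{'_',[],[{return_trace}]}].
all_args_return_trace() ->
    dbg:fun2ms(fun(_) -> return_trace() end).

%% Matches two-argument calls whose first argument is a list,
%% or an integer greater than 1.
two_args_guarded() ->
    dbg:fun2ms(fun([A, _]) when is_list(A); is_integer(A) andalso A > 1 ->
                       return_trace()
               end).
```

The resulting lists have the shape of the matchspec() type and can be passed wherever a match specification is accepted as the args() element of a trace pattern.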

+ +

The limit was also set using {10,100} instead of an integer, which sets the +rate limit to 10 messages per 100 milliseconds instead of an absolute +value.

+ +

Any tracing can be manually interrupted by calling recon_trace:clear(), +or killing the shell process.

+ +

Be aware that extremely broad patterns with lax rate-limiting (or very + high absolute limits) may impact your node's stability in ways + recon_trace cannot easily help you with.

+ +

When in doubt, start with the most restrictive tracing possible, with low +limits, and progressively increase your scope.

+ +

See calls/3 for more details and tracing possibilities.

+ +

Structure

+ +

This library is production-safe because it uses the following structure for +tracing:

+ +
   [IO/Group leader] <---------------------,
+     |                                     |
+   [shell] ---> [tracer process] ----> [formatter]
+ +

The tracer process receives trace messages from the node, and enforces +limits in absolute terms or trace rates, before forwarding the messages +to the formatter. This is done so the tracer can do as little work as +possible and never block while building up a large mailbox.
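The two kinds of limits can be pictured with a small, purely functional sketch. Everything here is an assumption made for illustration: the module limit_sketch, its function names, and the fixed-window policy are hypothetical and are not taken from recon_trace's source. Events are modeled as {TimestampMs, Msg} pairs rather than live trace messages.

```erlang
-module(limit_sketch).
-export([count_limit/2, rate_limit/2]).

%% Absolute limit: forward at most Max messages, then report tripping.
count_limit(Msgs, Max) when length(Msgs) > Max ->
    {lists:sublist(Msgs, Max), tripped};
count_limit(Msgs, _Max) ->
    {Msgs, ok}.

%% Rate limit: forward messages until more than Max of them fall
%% within the same window of Interval milliseconds.
rate_limit(Events, {Max, Interval}) ->
    rate_limit(Events, Max, Interval, undefined, 0, []).

rate_limit([], _Max, _Interval, _Start, _Count, Acc) ->
    {lists:reverse(Acc), ok};
rate_limit([{T, Msg} | Rest], Max, Interval, Start, Count, Acc) ->
    %% Open a new window on the first event or once the old one expired.
    {Start1, Count1} =
        case Start =:= undefined orelse T - Start >= Interval of
            true -> {T, 1};
            false -> {Start, Count + 1}
        end,
    case Count1 > Max of
        true -> {lists:reverse(Acc), tripped};
        false -> rate_limit(Rest, Max, Interval, Start1, Count1, [Msg | Acc])
    end.
```

With {2, 100}, three events at 0, 10 and 20 ms trip the limit after two forwarded messages, while three events spaced 150 ms apart are all forwarded.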

+ +

The tracer process is linked to the shell, and the formatter to the +tracer process. The formatter also traps exits to be able to handle +all received trace messages until the tracer terminates, but will then +shut down as soon as possible.

+ + In case the operator is tracing from a remote shell which gets + disconnected, the links between the shell and the tracer should make it + so tracing is automatically turned off once you disconnect. +

Data Types

+ +

args()

+

args() = '_' | 0..255 | matchspec() | shellfun()

+ + +

fn()

+

fn() = '_' | atom()

+ + +

matchspec()

+

matchspec() = [{[term()], [term()], [term()]}]

+ + +

max()

+

max() = max_traces() | max_rate()

+ + +

max_rate()

+

max_rate() = {max_traces(), millisecs()}

+ + +

max_traces()

+

max_traces() = non_neg_integer()

+ + +

mfa()

+

mfa() = {mod(), fn(), args()}

+ + +

millisecs()

+

millisecs() = non_neg_integer()

+ + +

mod()

+

mod() = '_' | module()

+ + +

num_matches()

+

num_matches() = non_neg_integer()

+ + +

options()

+

options() = [{pid, pidspec() | [pidspec(), ...]} | {timestamp, formatter | trace} | {args, args | arity} | {scope, global | local}]

+ + +

pidspec()

+

pidspec() = all | existing | new | recon:pid_term()

+ + +

shellfun()

+

shellfun() = fun((term()) -> term())

+ + +

Function Index

+ + + +
calls/2Equivalent to calls({Mod, Fun, Args}, Max, []). +
calls/3Allows setting trace patterns and pid specifications to trace +function calls.
clear/0Stops all tracing at once.
+ +

Function Details

+ +

calls/2

+
+

calls(X1::mfa(), Max::max()) -> num_matches()

+

Equivalent to calls({Mod, Fun, Args}, Max, []).

+ + +

calls/3

+
+

calls(MFAs::mfa() | [mfa(), ...], Max::max(), Opts::options()) -> num_matches()

+

Allows setting trace patterns and pid specifications to trace +function calls.

+ +

The basic calls take the trace patterns as tuples of the form + {Module, Function, Args} where:

+ + + +

There is also an argument specifying either a maximal count (a number) + of trace messages to be received, or a maximal frequency ({Num, Millisecs}).

+ +

Here are examples of things to trace:

+ + + +

There are a few more combinations possible, with multiple trace patterns per call and more +options:

+ + + + Also note that using extremely large Max values (e.g. 99999999 or + {10000,1}) will probably negate most of the safeguarding this library + does and be dangerous to your node. Similarly, tracing extremely large + amounts of function calls (all of them, or all of io for example) + can be risky if more trace messages are generated than any process on + the node could ever handle, despite the precautions taken by this library.

+ +

clear/0

+
+

clear() -> ok

+

Stops all tracing at once.

+ +
+ + diff --git a/screen.css b/screen.css index f5746ee..b4bc0e1 100644 --- a/screen.css +++ b/screen.css @@ -56,7 +56,7 @@ nav { nav ul li { float: left; list-style-type: none; - width: 10em; + width: 7.5em; text-align: center; }