From aee802bfdb21507e5e5b9f1c614093177e96fb6d Mon Sep 17 00:00:00 2001 From: Jeffrey Novotny Date: Wed, 26 Jun 2024 13:36:02 -0400 Subject: [PATCH] Add new how to file on causal profiling and infra changes. --- docs/data/causal-foobar.png | Bin 0 -> 27358 bytes ...rumenting-rewriting-binary-application.rst | 4 +- docs/how-to/performing-causal-profiling.rst | 618 ++++++++++++++++++ docs/sphinx/_toc.yml.in | 4 +- 4 files changed, 623 insertions(+), 3 deletions(-) create mode 100644 docs/data/causal-foobar.png create mode 100644 docs/how-to/performing-causal-profiling.rst diff --git a/docs/data/causal-foobar.png b/docs/data/causal-foobar.png new file mode 100644 index 0000000000000000000000000000000000000000..a887b126aefb474c4130451c94e47bab9135edf1 GIT binary patch literal 27358 zcmcG01yq$^`{lJj2~k2oN};x?!4m1L}EuJ=Yu3OZ6=qX%j)d+gCG0C-t89?W?_8IN;IK+ZVn6a}46!mj(_q=|5M~ z89@JD?5#c&_2loxO+V>h{=N9CxaRx67o()Mdj7o_N8msC!4=6%A#C-?p}##Z2xyDT ze%11ng?+Ge?CV?THdPTxTbnc5HdISi@3n`0HpK<^`%xM{TjO@^W5DkgiANZape3&| z?W)tz6Injk`M#dy`c<(Sr)zCDf$;9m>4~}Hv}?IH&U-MX3X|imG-%WzyKW56Z_5vp z^Dc%cz1MlTx;&1-d%}$3xZt@Kz8LWavt7wwnN9{RJ*$}CT67K_Cw3@XZp!5cNWPzbiv~84Yh=PbM=h<6`H5-88Fju&pD6 zH0II<<(dGA_v#?OYf@Dr@(y3~3$P$l1xTMR=Gf1tM=(`q)2EK>ckF2Pc$btN(i>cS z<9Ga*`>Prs?(fxiH${>fZ?E(h`kk_rIjKE0ynf<6uGKiPBF{o}8TLDfukTkAqmwww zaZWhQxc{XP_x9@*S5D2Ef{3Hi>*b)j?TuQzC&tfp+#-;%0U)5l!dmXa2@lc`*Fui;!elG02n7LNSzA!H4*(aMqrv6*@4Qnzom7}0kOpOCb7jmcI|6jS20dC){6%d_ohT_<`!rz zNgT^$uQh3ZO>WsZ7IJIbuvXPY`)bxd)^V(F+8GN9n*hI0cJ97L7!y$|`?YCEWEZsK z#T!&=N>Mb<_>4!(RE=SryRx^2u2w1Yat#k{j9Y}X$~OPOaiS3d{$sEf6{P%$E#n_t zH5081hD>PB`mge5y(Xri(nC*3RUMZsV>c_7ueV3lj6QWx9_N{iiJ&5mN5*GG zWt>(;ZYt-Db-UM<<#Q8*h`KhP(!roPnGiQwLXdQ{CGuN)8$;OJCXcH~Xvn(VcvwD50lDlbe3zCZm2F$HE9}1R^^}KT zxLjYpe88{zbasSq#COmC>?8d6m<(W|ikLKm57Xm(rTVZ(gB|*29qYF-@uP#R8D{9)_ z%)kUulAjf<8ZXUmh-$EY+h5wm%26os4VrQKx;7H9`-Z!B-I8BBgWhPB%`#7evB6}$ z`SgdW+`))ZI`8!xW21}lGi4qd$$O0?jRi1bkF;)Mt3wv`-s`}4@=7u(KECFo_?1xY 
zjrJnV(^9C8g2-P_9RYiX*)tj|DdGx@yoUDZ-G)-U%9g7|dBG>M#IXMgvd&|C-ARQxuzEASg_!A--(=QO!zN%9;KRz!VA?ylZcA4=&W ze)ng17HnP)hqa%4+b`5?ybLsfP3&kOHsCuDos^V9UgMu#Jvn2>HSC?2Yqj5udW3@X z6ZX1gNleWRk5uS4TTF7RXFDp$n!$MC@xEwrOwp9J3)xE$SE45^nv1J@>b`j5^t8oh zXSzoAM=!Z(ia&*BU?*UeihoqkB!H^eIZE$c73`nc;CXqqjMfN)@c zSAQ-TFBsO!srlaco>Q%{bJDLvx~1jJ%d0Z{!;%2V+U=+v$Ahv)pQ`M5Ub?pFlqME? z{0>m$^lX^1wZHIFG9o1+Ji8#P<|IkrYU088$!j;Sxl(nzQLUWch^j;w zYOBzhaf+E0>5k}X%Q5=-#la-sRxj^zhMR>tPTA^$EgrxeFYZXN)ctNBV?9g8gu0;Z zYt<&mJUrIdV6 zF>`&D4@I;zm4m3D#`Mjz-ow4_70JQJ3dDQsff#kw&)Lj9=AQ0yb$`}|_u6!Y%t&=# zF(%qQ8*yO0DjaZ@$YExC?i`T@@mqqT1;reIDgI$nwH!{CYyCWXk*J~jj zwtW0X<&4KuJn;uzM%8F|840*QE34zN!NDt-pc{5L zlMR~E*_N*od>`g8E^|VM81`bc}a<;N3+kdo(nJ;lFYEO2*&qh|)Tw&&Z?V1~&Eo=F} z3`bs4-+uck?mS-HbWg?aG;Q=a!7e<`&407a5K8UdNd)w(cE`sunw) zS+tG$`=g|!rs_|X(;uw%BF-&Gs*-YW#7tM&4*t@Ygy{6deUgws=d{^;!O9xBJzX_e zYOIKWMz}d9V`N0t^c9tafuYF7R*sJkWpA%`k?NcxK7X+2k?AppxVpQ){jLnGddfeL zw1kh@3xv@I!!P=$=;)2%Vr1YhTa#rN&z~c` zm6E~)-|B)n?@Hjtb3?^rp|jgnd;j6XBUDuFdbc|{s&zBse6L7JEq3SL$>b_XyK+(x z6Mywly&~weCA5TQXsrZ?L<>rEn2!Y1igo+oc^V*rl^y3K&=uv|fgL;X|W~|e0 z1LBnH^=07QyDzAy{7t4S3pHANkv3wX=@ezLp`lG0-mEG}}DwfrBrJHs36${HVS65ev zXd+K{p=R6ZmZ__v{%k18sJE-o&I+tabGmm8|Ec{JA-w6sCP*|MBHPR(s? 
z7z6|&*H@Pt*4KO{I`(HRKk#{*upKSSdxyUN9JwPPSh=q3p!ne<32kHS8U5b8=r6q` zZL41t@w61lNn--s!Brd%-6kUX+^OyBSTcBgw%Zkdh2-2Ta5F^ zto>yXtur&YgU^$0g6D{hUvmKrUF&eIKX5&mM3BLDTdB>;#)btF@SHj8m&x0=Zkqmy ziA0NSL1ZK(9_{TPr>3TG=r!(t_=@t9ojp37QZ^(crC5{rXnE1O4Yv%&xsN0SvD4h> z^<+KJ9I2&d98;-AmFu2eTO%-?6&!%V@^&Pr~!dnm9_T z5}bk-$mBF7mLr!;{Anym!b0E&$)(fj6&zKb>xFaegMB|C0@SaK=y(rW(nd?_#&s!q z4|gL26FO`s(*1|xZdZO0Gr*^f(9wU=q$*%)N+~2XG@8|v(B9r21d$AmZAB+Jt@6tH z`fu3T0RdX^m!e{_!oosmJV*H4oLU5xeB;vZ(BB;$DS|havE#jE#LyY1doeJ8R&+BGlxZolFfbH>qjxnITDd$1WpMA<;?{^=GQt7h3}sGcyv1IXEWv3bNwj;)FILdMSB3#l!Un)8}_~dcB{L zj8#~Y(bLmAfj{Bn=ih6_5TkQA)Qg}|R#>i7S67$qR*k0@Aiq{OKlX>;aR-NxJG{4i zd3MiQ!N%)M)8Ut9mq*$2usmo%|IeH)4yeWxvcF!~LzYHlnMpo_GIxeE!ln!9GXF#qZc>VxE1;@1MF;=S* z48LHU#wfOkH`26hNXG+t14~&C1v7Hu4RwjQM#km|a=^C*2|TaYr3OTUFql(yb>3yy zRlnqFemP9ag&a|IeD_xbugg!EUu@cD(6a)$Ig3a82M2fZC^ekr%6{I$O6t^0o+PpD5c*it&AAo9-?3joV0#N!e!GsQVTvA!2^1 z^cp_*1Wg;snsS^)70Zw>%-+987t4CM(DJP*-|%7q_D_n%!$?a@lcNgv>j6pfp4mn&cJDdQ z@t7u+T!9;jU|?U7wlHL1aPX6;C@Bric2VL}JiN~KV3PHz3d``$t>bLv5GvN}YM2IT zzQ}y%P$IjcT!-;XveTxT+_mY<%;XBuRqSG`i+)O%y~MMjicqdwd(x5#m4@l@6t#AMM;6=0cmPPh$+D-^)y2znlaE*X z3NvXz0p%J@jaR{PxF;?G$Lp0S#?b2h4oai(A}>&-p5o&}LGi$2HCbvx<6TPrMhM@S z^v4ye37YEzX@~2B->_Be?zly&x|_{ioQZB1_wmg3vmR{as|K%t>S_JW zMfVn?#u}JkY1(ElspL5Kj85pI;>=B&yT#*D$5p>Q$DP!#yp~sxHAS#n6}nb@H!d~& z&q9tMR5}=BVhQy|4E=`@iZuQeR`L72_d31wLoi6>ww7AubOmn^gK2?f{eHnw)qt(v3BUO@;4z>n=&?A*QO8zysRvRM$ z@tn2{U?N20*xX}d@!;@dBuCz#kh`+cBNLtuEi4%J_V!TlSh@U!Zm;RUm+vlF|=c_MP2wW$3cUSGeL(1ikKXe*!b5@oxTE zF8V5xFq;_+MPGjPV=s@O!w7p(FV?UgURr5(VmM{tq+2H|X$Ni7N%@jdc?QTi=#FK6 zkrR$`U3TU2X0&y$W(^vf49w&BZR6(a%yec9$g6T(4;rWzB3)&dbuS#`&gzoIMUO~5 zfBm`(X>9bbAx@_$(OYV1cA2lPF5Qp4@5G(+3Z-(|O^UeW!@mF;0fF*7O6#!aw{OKD z%nSlhQBamZ<&&oI^YM8}N}4{ANXZ&m4O{y67O17^(hr7(7?vO|?+_ z?|!KqpG9uY=yN7w*ywR-w>2mw!La{Oe~I6&fR~mBKX3ARb6RFPP>Ri!MbCDHpOS1< zb$q~c&t{?N!3^6KadGE3CQe!iCzXuHv>?*Pp@^M_<{w(6s&SkL72XVzgFHa}K(r4?8%81km3^l;}y zVqXv`JaE8xw)8NN* z>?o_O9n1R)pUcK|%+0pd*+i|TTBu7GkE*_ZRZABEHJ*i-oLac7>`*%V>AeOao|)vg 
z`aBI>QKA57vnApY8lAofl4|-@*Lb6Z39^2B6ya9E_?EwuEjGJ!^`}(fj~^;akQaLT z1cqH6`~W!F+q=CR3jVxv_rPLt_?_}cVc{ye#b?iq*}EDrux7L7P3SSEX}quEW6k{% zGwyuizZthuB;Zf7=II4t%!n4_q1GvlPP}52H5|-owi)TC4r_w&$_LoA!rEk5JC+eo zR7{Wq#mHYb5NAj8A19cMC7)T^m_~PsvlIhYz12t_}Q@oz7U%vbAzt?bqYm_wh zWY>+xY^xiqFPdO>ud1J_8wlMzvG)+ihgn^_(_M8HyZHx|Hd{h~JUv1}N>MB`$&oKk z9~fI7%+O?rrnoeiDmV8s2PCk$rR6CmCZCxZT_~wY*00E*L(V9~?KOvYZC65xGK*2c|Vg&+Z{U)>>*0j^nfqY#on(>p6kVab|!P3ggK#imAM5%G-T!ROJ zriP{_E}(3|!NJyLmJ196MLqNfi}#iiXl=TxYWP1SxXj5^Pf`Df=Qj5$;pc0{{dv1p zzp{`5_SwGHpGt{&7d%3mWQHh!)FmCn3wv!uZ3Z2cFQa~toA0>E!c>V zN`@a;T3Fz4+E7`p_O!IMtxmk}L|s&!43yFnI<=EK+i04HuT}Yre?G94Pg+PwXspy2 z4}=RAO+M9Vxw+sS#dfVkPtI!6NJQDD2HB9pHolCM$&@eT?i z12p>T@?x|t&l@B5;OwQ$HCYq=G;w`ww3roXb#$V8CRi$#+k}QHdLk1Tsp8-Rlm#Bx zZWGeMhxhO8uFh6{{QaAonmlq9%l6MV@3eEb+e4R*SxrKR-*7uN?ADiQ&I zwf65XUy+d&TPzDnNJ#jPQJbR!D)h=Jq+=#glyhZzCgo-CvC+4xmi2QhVqyz|+ZaIz zoi-`MBS@KQ^g@QOE|3Sf@&U+@ZZjP#aOZJ8r_<@e(dkWa9yjOb=eJrP@DfiIc#4hP z7V=s+ip!osUS1wBL=V7sF)%TumMiu3^<}$uuP$~N_4<;sb+8uWDU0ZLHWsy(l%3kO zrG!-zHu2TeD4qs)G=hQ@IJbX03{jDJrVQ^{`&ELkXhfn;PEI9A58b{II(U|f_#xbfUXYd9)s z1Wgn3Ld;(CvQR38U*5?dzA+dK{iIakkO$jRf97JRPE1tv@yU$ij`pZuK!7&bzE;~) z6&=f>R%Cud?p=cqpfEimd0k|JgM_XAya^G|a5X2$A_wV6DKuFYnj=^E1x!<8Qe4+1b56`9Le7H=BDqQDzzhVrR0{So3$t>t9CWAwZmvskm(M!wyYI=!s!A z#6NI2-u&I{gCSpLLICzuS1fDqo~Lm|AuP%{kbL9VGCir8-qdO5F<+^X9$3|I zk4z5m0WqslHaQNcNI05fM_1PmKoTK`Ta%Guyq94h35kFH{PyF=hfUwmR`XpiDMf&o z^~(eZ41i^l(9=IPH8pL6fnYCEYq%>A@SI;p<{28#=?lx1uFR@<%n$fn<`J?ZJ7je~ zhv;{DojLm0`M?AO;|1R`eQiBW;+HSoJv}`Q0+h?m)B{QhaB$jxhmvJ0J;T8G0EI#W zO3CsT=X{Ex^0DvLeY)vQHR{Yb9$er?3Gwm$3HhrOVWQPxW>WL&Bxd)8nUzIF?|w$S zk!-;n$Avw|uxMhlp-2Q1AzLo7(Cf@Okerzh-`4Qgk`Rh3<+qQ1uOOZ&=nHD>)Aa1MX zH$3^7%ZHgFf)yPdk@2h(1GCMA1}a7;rS>6a0HFv zN*m(G%QJt?bD#Dn+osrBYQ(oSFq0e%<#qQZ>`7JN5=VMf39?3C{7VE<^qaSX4dDZc z3$F^?TTZ5Gv(oDP>c2M9TF-YoOp9EAf+F`@8mO*n4x%(Gr>SfGccvTu>)YJ$w+m@W zf>>DyETCloEF7Ty10WVU;BT6-ZQvJ&)Y?K>_ia|aT;}a66jA<}gwf8v)@HE7?dQF) z*Ct9RL78~<=8ZR8pwsD&p=kTRl1M`@Lnf^Ep)d1uX2sr_lS2SQA?3jjtc#3o0Hw28 
z{$Eu~CP%J8(X@si$dIlVle4EAs@EOSjJj|s*0g$0z^f1!7ne^^Ff<~f73f9LyskXo z9uoix76T+P&erZY_Rt?cUfFDpHJl%9*k2xvo{iRBNvkUYMG4Rc|MvEFhjFnAn^ixd zs7e!FT_hp=gM^*IU^Q2LDZfs=c6^-}wlI zE`hJ0nVLm-B9$loUV@_S011C^(V=8@gp}uI{dI41r!}ra3tzTOE(PWjh>D5|po-?O zQ*$GhSWP$ss=qv$-Ffo@(2K^kkz71VnVg>msG+~5mRFaTmgsCY( zuJ6RhcOQsuuW9o)k&Sj{YV1#@ZP_iB9y*?EYajg?KKwI`>~?x;2XyT9**e$Dt?x>q zC5{Z^VGaZI+)2W8XunJ~;PGzGe4Cv-lFKZlmO=Z2n3_5OjThEOPs_=P=Oz$9kw}g9 z@S&Oi-ob(9|GK{Z(G^^Z+2XSBZO(VKf(m*)*uSV!cpmi; zg6ow%0t(J&{j+77jB2asjEs!McQVVqzP{cdum2g!ypM=jto0k6R4f4s(j2$Lq1qMf z?EZrX(iPLce*MZ`$?L&MWp`}dcfz&juRdNj_cgYGn4lhNy*2ccT(bR&BD=T=FDh)V z0=m73J&FC}2cNcpxM%xs#l69)j=dhv9Xh8|vxUV)`{i(13Myn2l=~gqtFW|v`(<7B zEC%BVCfMaAE|7FNZMT6y`W28^Gk{+$0Epstu%h1fQOd6VBC%r1Ij<+gbYQKyrd;X# zfsfBX;ATk;&3!3JXW$k6F+sQ7#Tf%d+>Xb6Dg4NwzmYX% zV_-n>=+PsfZ9FI*hk}*^Sa?t@u-!nEgHLc6H~N=)xU;}+*qV>FTHvz!5>C)!uC*jI zY}z`Jzc_J6?YA4BI#%Ek^sfiKEEk1O2*=ma51=)(w%K_k4kqQ0(dANU8RyG^uGO(q zlBONE6^h|E%3;i}s`Q0={lxI${uEEe&tJ{z&#GNKj+T_dHVImC3y(*(&wMa+k@jrLcHhqi ziuQ*A>NuKwgGiXcu4?whqVxuwnu{xLZA~wNR#ipa9%%gp`Bx6x6{~WUR^-m-hamZ> zKyKPHFJ8Dk!XTCUKPb?dM4(du8*R*8!U~g$QZ?}S6{?F1b{(}#0zw*S5y}3uvnQg) zDy=C&7smPGm>wWDp-x6dM*WFj?)8$5Z;!VOrOqqpy>hJlxAIAT#U?`PPqnk!ds1X8CncMfzDTXaWM1XbwGZN)q4?)#?N(XA~b zL4lir4AGlYSKlLY@gk6`J9tv}80|IJ*mYK!{Vn5b58w}Yl!={h%GG`)JX5-tfoSrY zg6fHzBAbX@R9?u9jPSGqcU|Ujrsa;arDtwrHSwWsok2s5@m*ASa)iHnfVV7D5)UnI zu4#t>f3UxdWu<>I77sp=@K>AJawBjRi_ZA+j<1k0JyOzisGOFn9UWhG4)jTfqOxh!9fP^627`r|UDcbJ1-UYiz!^!aeP z`6ALH5;C%npC1XR-k_5LFujz%A}E%PZEfF_t89jU^GxR;2^&Eb>k>nCk?sJXWxt*L znCa+%i7ndOdkW>2Bam*K%(|kz?Tnz_+uzSRfq){6016w|`C4lA*-ES|L7mY!L)VJm zB_P!AGPLw0#aIy_gWNV-&So46*W)nxJI2!a=WwO2PV0&$a3bd|@|I?zo z^M)851B2ae_XVh=StZU-!Ln^cx%{c6+t%yvgM%gsRe%))LGpY3TDxrTh8hOP z6o^LAxi>bx2X(!z(t2a8+MW^Qp(*n=f+O^~0YnoUjbxjRz)fAX;c3v{FZve;d zA5WStE~YY`m1qxdCyte0>$V-b(KK+4CHuJh0OHAGJn_q?jmR}_Bv&yT&eVgSGm0hh z#E%#2hK7YT1K_=Tc=(EkXJ$<)60G5Wp7-K?hqWF89hhp`Wr+SVTjmbzDlY<#D}v7> z0J|vC?!2Ab(Uis`zp_~u&R(tq6v=cbJw>m!m#610?4TC|klSzH5P?Fn4BFN3bOfRW 
z$lfURr+bUP+uL6-Frc%tvI_X)G<<*g;iIUi&T>aM;B+|EWUPTT&?r?-?~Q4^P78=B5Lx#qp-Rle4q^`Fe)^R>^2q zNe{?=Yl9hRnyvoo$D8Ak1faXi7fGXx!)$<3zX+fKNajvUq$Qso?Xbck-sUu)Az$YTzA{>YsT;{b`4M)stXF}Qb~r8 zhB2|=>m7W)Lcz?RW*Q8DRFZ0)i-1;U;eUz;ABCN^npGx+xjJ1Ir;^WasH-n=iwuLp z6O8otvE{%rZ`pm822O+M=x8dX^1eMS_K&l6&k6t z@#u*TyV{?g71_tP6&UoupoD!s5qBb8&E%{ZW`HMxrd;ShRAHs9oJz*?!NjX(qN`m* zfEuMFf2Pxg6<_;Tb$vQk34T7AUet`FUMhJ zEkin(alLj1K%^YdW7;YeU_}7>%MYOINYk)7)G2l6{E%W0#b!%yJ2tXaz9GiWfH2C+zG%0Dba~n`_xmqW72pZQeiO9TROVBj3!fTAhA$ZCsrGd=D}NF zK%WT?4{z~6b|K>64AF+T#LGHYA=$Dn|p)Zmo3#a zXr*Y^&ks$cBDaEAOKGbA`<9fXht*D(ZFM{U40;$YU$G;nn0~Jaefaod;;=t|7h-v zhhVQj<%-z=2wRrVd7R1@dvB2bkMta1YO#eHO;?83I$!Kh80O>ReDaTw{oEKw?#d`MD^os^4<3!bwWb-L*2 zLWaunUD{V_Oas}{U56>$UWzB*5I$K8eQI&h8}u#}X}vL$8wDl3d-v``RMeK~>FG77 zM34l-!RduABEg_>>?vKTE+-HiyuGu>E3LEtHRC`h63~6(0Sk-m&H1rIFxYUFz;*H& z@#a!W32Jyf9Zel*qFrA3Zkgwuxp5q-Duv`$7l}uEh}_oJWMIZ{t4vfx%PlT0W-ERF z{{1uU9e^=9WL2G80tqOGk<@lhWiUV^?sp+h!p58bfNlHqJ-fS&5Sa~&2!5~IJ*9A8Q4l5zx2)l=XqIP+H z6i7@=Pao`oh*o4UoK-o?PIh?S>ap3|WeE*h8~N(Ja&;RYqo5xEpHmL2wHuodpF!6Y z@PmQm^#Smfv%^8Ly`v*-Aj}?N*DKYuDg7I3l_p$_pKJ%h?dDivS3D;+5mBNc+7|J# z_R`*7!=RJXF4G0WzTHkNiIGUB4Hc1B+|ZwfdQ zR3-F%9lI)>rX&I*i*}!B4|W&q6aw5gkQ7tnE{x!p2xM45WOQ%Q$Yc_`-6*nYvUG7A z*1n9oJvhP3MMcG(J^QRc$Q>vGP4_mqf3G!&NK8Ng0W?m^D=O$Unjd~e!F>-J$&|8r zKx|G^DaKO=CMaN?s~LcSzB;?rK8o1X_?O_5vfOJmkkKS>q#)v9@nm{)z`uc@(iv|% zqc4>}Svj>F7D;n(c&G*Rg1~R`cQCQA;C6?hOvy)&AL{^B7n;Boos&c5aJbe0;_?wv z)_KI&8%ZInC+a-JKE1pnov#Aiw1T95TPnJtB5;2B&Ek>evD2+geWNI>`nmvYi&Vz% zX5gDut{`I0-je_%26u)Lgn#4)_Zu>u*UVwL0^PLXyg1p>9WT-z+p^K=ic+`AL%MtK zo_o;+yKAPC%9(s%7BrRx6&T{+z^x~q6Y$9S)F2*Wpk0wA*nR0w6)Xmg28vNsWaKJ? 
z+CTo8w1_7Jb$_CUie=9E*~L~sk0gWRYn{{;FTQp)&$(l*3t;%YN!3c3Kb z4xNxNFJ=KDi`?ZC@8W2$bl}TdapbQSP%a^&9#5zKMROs+aT^ZorhwB1K_IJZb~bUP zI~E}N1ffpw>kj}N?j9UuTTBlPNui^oj}>Vn1H=!6JrG@MlV!9J00L16IDUy9_yMNU zUTL-72s$GaUAHA{So`u%c0S-lo?c%4KR#ik9w?2#HH9w^kb$P~kXo_y{tIsI_@yN+ zVAi4eZ*ff7bZdr z`yL(r>D{|K5J0UofZh+`+O5AA*h!w~xXlg>SVDMD zfLTVqSSLVJ1$c|1+02PS;wJ%X0c0BXgaT(0z5aH;$q&zqF62zYJ9kV7>bXIq)B7W} z;lK}{`R<2*am*9|Nj5t(sP&+O)D=c9$p#t>H4ma#jGucTplX2L6a?U#h<6?cKxTL$ zAt_0)_}5MWC>&f5Rd4$G^x3mkF!}m$jo}|`1AhMY-+3xhq_bt1jK*#~6@eq6+MIz4 zI`o4>y~P7Xz~oV2_EGkpgpYT7fYJWa?4~QJcXSbZ6=31u#fAo;g zr%=-VJY~+CV%=WybGPy{$j2b4J3xUC=pY%@1D2*U!F{yA^}ueJ-QN7JaDrZTI{Z%q zz`~hbY?Uo8FE2;^MG>8_PF%px<^jtT&yd#j9930B10`0P zUG+8#0W5L99(3V0IKL^S0%rwOxh(O^f&>KEAncM=*=Dv!vUA7*!92Mg;joke)Cm+W zpb9?^<~R6R(A+v?eE}Rj_uW9>U-jJPRg!xZToC8Gg&#*!dQkZK0k|C`NNC^ulS|@W zzA1G1V^0gCI3Uh~osb>M98lUNE;>o zQN3=@BdG{nzqS>l{D=Bl6#M%7!;P|k8Mp2OxoSIl(oi3l%k#Y;SC~ir+nR=kXy>cT zM6NlcFUI;Pkx;?$uOQQr65##E2Kk+%%WPY z#gbCNOJ8SVd^3CF&Y*UK7I#QYOx6u_Gw5;{Rkp>Af0wlmoTxglop+*4X8Qb2Rkm*N zZaRq2o8~~Up}GmEr&YE)2y?56t~woIcFj!oY@EUF5=h#*eLve7;%>K&i}G zt1O5-2S)pSt?jBEp1%P?myw<>@%c04EiiH;W7B9ge0hM-7NQzcTf1QR?s_WYp1T9} zPhco^2XO_^yw}ZEuXu(fS{B=_L+E@c4xGb!QiPd}vpk<(Q*Xp+p05*@p+6tH!HmS% z#=s$FT6bC6$n|$2!ng0~4W^@lu0A7iCzePe4*`(q*JVMgjnbczis}^&O<+>ei!UA? 
z(M>hBR{}-(mk>_WNKJF z*I?x)`UFcrIHMGwY4El*HKjK0|6AMxpl>UK(HI>u3CRZaNy(VS8$anPR}1LuERDCz z)9xD;MuPFS$Qw{Flh&TC-~q%H&@Mq8#CydiV;Fvk31KlBs|Q^S(D``H#1wYt9zuc3 zrGwy8V8m(%5Jw)^0i5}3Ni%s9)q7UI;SbJt!@RftcCVOkO!9nLsg1e#fH^JKoXji+ z3lh>(h?OX9ZTkJz%Cc|)?N`u&&<0J{)6>%-{$C2_3(%L0Jhi6^ed#U<%_p}TF}PD$ zSUCG1I1q`e1{M=;z60+4sPEsO0Nb5TFNChC91AGVo-6GNTDWbc(yTED|Fo$}AI86Zb36KD|-kV={1(&h`N$~@6 z9OQqEPg>@H0t`f7OG z_W=!Fb3x$5QU(IX^!)1T${&aR3$XZsQ8x(^IrD+mo*38O{Zzpq9pIRh&Qpx9Yo9pn zRJlwOUKipela)_)!fAoKf96At50E z#ljT;2=D>XTdnn-`zofz3ySXL8w&!Vw**P;Gc^Rj*NqgOgpfu%H1{42mwt>&p|c`t!h8P_yX(+L8YF>>&V#$yQ96l#Lqn z#GKP#;ODzK^wATHl;7z+9>QY-Jj7ZGRJvQlL1XCR!UNnOo6}X1o}Tw-TrP|ie(5#% z(bno&l_CKXP77#Q>Vp(%Vq#)6SsL(KI0A0UZ4D$)T;}k@rX>ZChuLt1ZRKclJk$q+ ztjJ`V1CW#`R?~#VN0Ud?Cr@V7 z4VZXtU7K*c$T{a|-v;)PXn zuVnrMEYQqhFdU)cu-WuBK+?RK99h|Q#a)shu1266`nOx~gXEsme_CJvRy2W2ez^$v zF*GzZ3`jS8Gf2E$jX)$>D14Ry`C|lUG*b7cc1!U)G=S)fKh=ak{*M=6@T3fR zA^ER9VByTbzyMrCd31kCOyp6tZbEWF!FSLHDzrb)*)Csve7rr~iA-DlS$z(`ITT;J zMc^MmvE<=hih69iB4|{`a>EHLT!7pGr&mD7yiEj}Bcm#gxsCpy07bxy(7;=)r=zRQ z(eHJfU^SYRnTce3HCFc7K>Mr(^56CBOnh6L)wV0-x=SnMH;d1UFW!7VlayY+w|Z$T z^rXU9I6vKGcp&A`gzs{xLRwbIFCppaEah^F+|PzqPmROz#A_&C^I>*>;d?e7xwC`T z-5wR)&J%rFt(tFdo^a5`-fOQC9V(V!-@`su<5;G`&iDYx3({Zzbc9o;n|w%(RFw1g zx^NK`eMb+e7tL+l_`uOL;wv`yf|&TAjSAjaVoo*~Ob^JM>|3J_@D&u2skO0}al=;7 z8zMD=H>bDMBfzyOF`w_gH8G(99O3il&uV{;nK-gjF`h)mr70(j^W2t)33Q zI!gj50neTXAb?8iR$R=;b9JgIS(e%F?JL|ef6Cd*jkY<06y#V!rw)>x_a6aZi-{r! 
z-~HqWQ!F)l2YLv*r;9-!_~<})84f;%wiq1Ef6L3ANT%3S#AcJUo>#^ez4{(;DBuw;9H5 zZfCV;dwQl3lNl?>H}v97hDa=eqM{?M9`%DhprOcg!8~^ggH7l0A5R?{nZJ|G0hkiRE&9vH+4Iy$|Gdr)+SUn+$Sd@x z)A?bbObwI2p{Di{7^F*lZVrThz~e98fde?eU5E;?TaCBv%U9(Eq5~3~K|-ihs&N1qgUe!ITf+k* z6}Uve_8u_-9PN`GIOy32DzJi~suj$7lRG{~nQ)|rnb0xGKG!%A#8fhvNbuikJWrGD zfxZh6KWnSM4vz4PYt(?e1-s(vW9M+DDSv?Jn+&CbH}^; zgoL2ubJ;xsvB4_s=goc}!KVTU6k;}4e+Sqqnx>}- zwzs!WFJ`X;FE36EKQx0yGzSecJ^=xlDr&F|PS++6yuE9_;)J4-$6!y_Y|g@|eKk#Dl5D-8|jTnId4=%x>BGNp)AI%(Z{&SFu0xn$;h z0)agjMf-mAu44FnZhJv7E@mm4#TJ_y&lf_9=Qiw7?^TtX%{B8C?farU7J|H$Qlfxw zql$NEF-h9i8&HX7q$cd{+1iZ08u?8P3N7!d6_PATfofJ?VmL|%__<_B+fM+K8hd-i z@LoXb2M000+_iy&B$=pBAxWR#pqP+wa}&T@yb~pc>ZYnS!c^d1yS>b9RxadAm=P~B z3hv9F@#FN3KafncOH!C5B_?JN1QKm4D2*wStx$8*uMGS$F7zXpVcwe9gS&~?<W) z=DScNQWKwHazEjo#wg~tf+eD9`i`mq9|fB>-ckI)vvgj{I-IA4vW3@H3@lu;c&GgBFg&;0|Bd>)&%pe9=dOwCYm%e|n=r zQzXXZD6%iFD%FhW(v6<cF4*ITIylDn>|0hcr*6#uo@# z8@WJC!K=MIR@Bng1}AnwSLx0}G_tT!IN3G9`DoPwMyILIQRV3c=E2{ zcxkh;{o-dR6;>pO6|UTuo3Ii^4iEI@Wz2}V<=s#{)y|WPfoS)>AsNoahMtbQ+qReD zS85xR)@TWz-i-yS1}mGbmM6H5-OJ}0?_Zmf1Y4g57l-SNgc_0Em4Fwb)Z5b*py)NR zEUaJAo||;N9ijNye7X_6p>~kGKynoJ$a1^eNL^R@%<@a|%CR;@mFT7Z;VLEZF~#OE zg_X5+j`>B$wb3x2&dQc!4}0tKZ0d?}$uh!)O}l6P{r#svk$ZxTEt^`3F9yvwyS(n0 zh5@L(=F$S9@}vWEQcS5~hkTrQLQq6&1zQ@Yv3TEaZ|eMd&~NCr6$I_=l77)>*Ya&q24NSFsDo*%?xq4}Z!IKN87YBE(HD#qgjj?p!ZkB`sU zU{fpnC^iaRMcJ{ab3Neqijxr53-ci_A(#9SKBIsl(P;5^PHn1~49qE|t zWbq|7u81$1N)qnaaxJ=cxKrO^F*zwIyhUBS)>cze-SS8L-Y!k4SFwU^x}eV+@vA%> zl`@*U<0+D4vWM6HZz!abem+B>+g?c@f?)YuDqu%8lBUgb*+@I$H6y^B(@>L)6LX99 z8I3yR!WG_Y>Kj_sNZhg0xx6!^#_-W46Fn^&RoAV`I+GwKf^$}mLTEl|i2IsawH9_f z>xt^gu8mZT6Qwt@dE6a7FnCla|AvzCBBJR#(7wO5?+dWN4NJuyNDqm%{XXB*7D?@Zqurl_o zBsGU9(}Yfv$)pRDxy{nPyAaJV&{>YNi(E3QrQS0&l|)Hd+5G%azs5R4Eb%MwX#?Mb zm?5xIwhg2S?*T*IyG$t{nIi$)R1%0cEDDepA0MjU zI0_Vg^d3Al+(lXnR5gH_)Pb0*H}Na8Uq1*O;jd2h+k)(7J`$pyWx1<(-zM~^gb z5IKb;E;2GIJs)twh&RCxFCdLFO2U8Om-6xXFRfj9JQV!boh&V$lwDaOEwU%Al1i2m zS&LMbtSv}cLY7EGlB|;?L?pzMQ8Zby6cVy)WJ$K9Xb6?}eCzlAKA-phcm9#*8Dr+V 
z+%5wDA|EMaX7O>VjqE7pvfXJYPI*{ln6FQJsqrP>vW zw?`g26=(0(kTu{mI&~^DGhq+@X{YlI6i1kLTGO${n2>kCCzX-CIO zlH~91@?P4y8?y_bgXBOQ-M)Rh_=XLOVR{7Is7>#B#u)zaBH zB1%%P^|cGN2>_zux#xuqd>l}6YB>8K6++dzR8UZ`y}kXO$9Q*piC>91o$T(*m#l{l z9Rf_%)YQCMKZ|xPx<2f+aSFNrT4y@%*syYBd9F?H}Aq+6=IkfjVd2@o#=tuxUj&$z#lU+ z2e9F^ZeBf=)wpaoF=zul2EhU|Ip|zFnQ+SVef}(qu^pQ#t(wC?l7JNpgJIznFqLg( zD1TF`*7617Xc9JO&Vj!Rh3XRikE1BQ)~To*S9Ti#Yb$^2&Yefu+x&iC8g57qJYM8_ z;83FSIxH*XRRW2QGjnsbq-U*AZTdkoVwGIBSpg3vAN75r8$8q#?bZ?3@J?kdUK|Gv zYm0k*(#mS}hf9;;v~F;c6-`Zs%fwV=VK8q3hfhVnte*DiZIS}ri+KwTE3X|nMed`m zayT2@_*7`*;!=}%N{fU2S5DOrW zExqf;5R z1Nl&66JSEbv5Ogn?g?rqPo{hv92|`X`>2imyN}bXJOJq$H;CY8uEV!zva46`HI|Mc zX6?>=@W7^nflRPa2hN>)?C$w4UAGBM^VjwDIRmMRbV%B+X{JUpN85y#a$ES$q)~2u z{d&&vsFhVRKJV*&%Xi(E^9D{#^PB(OIe@2yWo2c#?@(1WhhmNRa@{;OGG%3Db(&8X zptv?Rjqq_e4&Jq*rRDJ8d&t$+!iW??cB&Mv1nBf0U0}`i`#qWTI4`e^6c0ug$@(kalifpL zy-Z&Cl(+61#DUWVW)JGBsum)d?muv#@tM&a!dvg);35wX4@b!5C1h7Ft*p5wf*esNIAxS$eKr{7n zK|ukHcn&s5OP52->e@YD{*MA=gr!4IMvLhX`8_Ty)Y`v)fA*7-7g8}XbNps8zxzD% zU2xWf>_R(V>KORc@f8G8Q%=WA{dlUstK`>|Foh^pln0I+Ir4R4qAABDtA)5qj^rG_ zR#&G36fwUS8c@jM32*~J+6Ukkl$CXcE*6-{FDTfLGzPCl!qmQ4D)kz=1|v+FbH%y2 z$DgIpySIvqQ}E)Ih(h1GoJ)Ff!>T~XE}(#>8*0lH3kV1p8TTJKU~0MovW>uo2K`hw zh+($5jx;Yp>{A}F{be>axsr>^AFOyE1fmr~KJ6C(>ZPeAA zmFuyvr8)+?>iAl!bqsdKOO0q+*Yimu4+KU`&B9hXdTEf;07!yT`ug<&=M#A2F1_XK zV-t|qD!TNk!%Mgju~6;e#qE%?_@i_661AB^$(fAg zYI?RJTc}LGzrg>+9-i*LFJI*0mbe8`2Mo3Gz)$~T;}ukFUhl}e|Y);|9^Q9IO4 z-$?=mK6yVf%N|$P&FBS(pr6uwpjbEnsy(Z|ifd?(y@aFJt(B2wvB0>!aGXB9N<>tY zl)?DWCxLx_fF>jZD9eOh`8@7XZo1J&i+TU0Uq%t$hA9nL@0#S#&DFuJ(d7?a4A{Y=8meJ=F{bH^x@ ziCTq{=HAHGL=fB|YRMaa%zU4m-`Te3o5Ffz3LzNvIP z-DqUI{J*&z7rwdf*KH}GX~y1%kVP*e{!5w@BNR@F8VacyI zJzf{a+i?4PP!hFy%AF|UqkP4R_T$@Nl)?U8KJf_&Sr;T3_}!REl8UZ5tO=W+Mv#IW zP&aIm6#~G9(T)e!2f6z~R@KqEQWu2Wu)e&NCQi9yX z-+{WRGm;79j?8ruX+qg;+u3%CjEoH1!i7Tbt5G;}!-No_I=jDkteEP(TgdRg*0M5EceIdS7HIR&zGe?TL#NGUVq$VFA|ehPeJH}@B@)0~7Bj(_s_}6Iso<#RTURfmo6MY<1g$8I zQWuL@^a|I;7HkeYX!fTXQ-UBRT>1Vt);6CT0c@kxJ5=~Z8Qp=2XRtoz0 
zNMw8JjtlAC1<{{A0fYcP4nQ*z34lp-PHAYM;y=0Nyt5G#nyx{2hywTEl}k|F=KIXK z&wzuMr`SLps_X`Rog8pKPOl9_XFLJ|>+v1_kM+4O+Yifppm|pw*t>V{CWy!m=;_5F z-sC-g+%PXB|xB;mU3@T^66b?%qL-f-ye2@lxNBvQ{Z+ zFf7wE$VOp@fX2E~Mf=9a6s0C$+F=+}W+*gW+}&dlh)z8)VT|Db z#-TEHc(67G*ETxS=mDw0w(s0o3lJ*I2!B|UF0|5{!H|U2C=!bnhFAbxdeGAH-kn|E ztvxj7_fJ=o-9lYGnmq?Y3)z~KMu(3Wq6Rf^wDJyBBSirTs9L+9O97bjp|Uo}YH7+c zQ6TxI(Q#&GX81i@Rvnn+G%VT4GiPKlg>d4-{O`ge5oLV7v%`|Gs`BFQku(KeiQx^g z-8m+x08Cw6@*bJf&EVgc0)2xv5btnHRvedozUWIJns$fur8sX-*ikD$-kb4S2y}@0r|Oy- z4$9xwZX#;m*dMSZ7qEjb2%lJ%PS_KVXzg|XX`vcJ2yzh_u_0Yom=ACoGCeiS?9gGz zBdu>z`uko)L(%7hm_Q`}%1_E3ga&SYes&Bc;N2EZ-qT*RWH$?OqTl|)$%XGM?mx0^& zChPaan3yZN8P%CXo{jrozlIYU`!9khB7&&_8B^yh171}aOY@JL>8aagf3}NLw=u(8 zw#hMu!rR|0yY*NeCMFO{X=xrR6HEw0C7PMw?RJSqp>XJszY(COe!m($9ZYJx3`R|f z-GH?ktP*GBX|N{?O*X1?ZIK0gobJTJ9k~}d9v^pO(Ovv*D>ScDu4;M?)bOwH0!?N$ zg|~Xl(`6z{1DDjv_D^#bWvE*Gt^p-7)>68W8>Dx#F%u=W1*k zN4O>{jy^79^^n+(nCyh4BxE_evX&Npg!QjeQ&pH(%IS2W;oe?|-@?s3wY53$sxZc9 z%gKBg2eYC+Sf7}DB&!kSI7VA6N2is4YPEs2jqJgVMqFD|>i7=l*6Qpvo;JR!x?SBy z6_9)(N)==mVSoYn2P=TFdOv==(q`dz;uxelKE$5`k=0l97*e30CHnJj_%;YafROS~| zimtBWa%J=hSoXjYoi0~=ejt=`6aUoq@+vyy1D*S+j{G$D9SRD1m|t=qLbh8;Nx4QI zjVT%|dqDWgoP7ymi{1Z@if4Yg7uZ-q3NBBI(cRm(rLkQ+D^_I4Vay3Xg-1x}VPWFx zeOE(3i-6;hZrgS7>pjX%fWMLEovuSSDO{A!A_)l`mcEgZhlPwkXb6MYnqp&^#W684 z$LU^&AN76q4=ZDL~LOGnPP{_0WL zLWf2&-l!6ZpkN$+=0y~!+;t!5B8Ff5M76(}m>7-^ZSu?s`$8GgiDPr}>u*BJ_omS| zE#wqp0|-dWCjtkyJoO1heXQcIE?T;{r%RDouIduejlNoR*tI+Cqo!+1SxY$anOmZ z%Nm>=5jJF6=G+Tv-+NRhyWaIy_M| zu>LeY9)S{=9|J59EbXpS%g>###_2vC@$mD$HfYyj)m)6cK<-Dw(o|ZiP zE;tuUdx}t4a3j)h>3orxD7w|L^U8y}UK|Kdl>nQK=wo2umAJh8GR&i>=%#yr1Jf+T zsOqZ>%k!Rg!nlkwBsHwrk538CT%w}cll!wLs`<>*LC<3PW&F`h%o!CZI&fX35eneOmysUWm!KA`b1oN??%#FW^T z*dvch&xhPV$KxPyd>ZmcXFgaaLMe~@{3fnGEdz#TtUrAz-4~7x4zV$2m9X=W<0Q;K z+>~4K@fk+G=(nM;j70%<4HtkAEVEN%GSGZBc&oHC5Or%kRGinfs*C{GC4yDqG zYp#Rlm_QJE*Nwt%uY(wYBK=t_rE2I)_4{ii*0KCqmuH4Qna`~Nq6LuZwOm{j!B00} zW*zKJq-}(I+K$eE&-Ey-NNNJGLTk>6FmPoV#$HkRb=~M^_kQ|xbzAqvmqXhO)u)BM 
zy}e1O1J}wzz#7QdfT7>XU|3?^%g{R$^Zk+RJ`xiY#10h0Dl9B)R1g8oBlHybg9IF9 z+rNB#)@xQntTO=B6%PwEjO&er2q4Z O<$%T^^&B``) +* Attaching to a process that is currently running (analogous to ``gdb -p ``) * This mode is activated via ``-p `` * Same caveats as ``omnitrace-instrument`` with respect to memory and overhead @@ -514,7 +514,7 @@ were available for instrumentation, which functions were instrumented, which functions were excluded, and which functions contained overlapping function bodies. The default output path of these files will be in a ``omnitrace--output`` folder where ```` is the base name of the targeted binary or -(in the case of binary rewrite, the basename of the resulting executable), e.g. +(in the case of binary rewrite, the base name of the resulting executable), e.g. ``omnitrace-instrument -- ls`` will output its files to ``omnitrace-ls-output`` whereas ``omnitrace-instrument -o ls.inst -- ls`` will output to ``omnitrace-ls.inst-output``. diff --git a/docs/how-to/performing-causal-profiling.rst b/docs/how-to/performing-causal-profiling.rst new file mode 100644 index 000000000..6e4682ada --- /dev/null +++ b/docs/how-to/performing-causal-profiling.rst @@ -0,0 +1,618 @@ +.. meta:: + :description: Omnitrace documentation and reference + :keywords: Omnitrace, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD + +**************************************************** +Performing causal profiling +**************************************************** + +The process of causal profiling can be summarized as: + +*If you speed up a given block of code by X%, the application will execute Y% faster*. + +Causal profiling directs parallel application developers to where they should focus their optimization +efforts by quantifying the potential impact of optimizations. 
Causal profiling is rooted in the concept +that *software execution speed is relative*: speeding up a block of code by X% is mathematically equivalent +to that block of code running at its current speed if all the other code ran slower by X%. +Thus, causal profiling works by performing experiments on blocks of code during program execution which +insert pauses to slow down all other concurrently running code. During post-processing, these experiments +are translated into calculations for the potential impact of speeding up this block of code. + +Consider the following C++ code executing ``foo`` and ``bar`` concurrently in two different threads +where ``foo`` is 30% faster than ``bar`` (ideally): + +.. code-block:: cpp + + #include <cstddef> + #include <thread> + constexpr size_t FOO_N = 7 * 1000000000UL; + constexpr size_t BAR_N = 10 * 1000000000UL; + + void foo() + { + for(volatile size_t i = 0; i < FOO_N; ++i) {} + } + + void bar() + { + for(volatile size_t i = 0; i < BAR_N; ++i) {} + } + + int main() + { + std::thread _threads[] = { std::thread{ foo }, + std::thread{ bar } }; + + for(auto& itr : _threads) + itr.join(); + } + +No matter how many optimizations are applied to ``foo``, the application will always +require the same amount of time +because the end-to-end performance is limited by ``bar``. However, a 5% speed-up +in ``bar`` will result in the +end-to-end performance improving by 5% and this trend will continue linearly (10% speed-up +in ``bar`` yields 10% speed-up in +end-to-end performance, and so on) up to 30% speed-up, at which point ``bar`` executes as fast as ``foo``; +any speed-up to ``bar`` beyond 30% will still only yield an end-to-end performance +speed-up of 30% since the application +will be limited by the performance of ``foo``, as demonstrated below in the causal +profiling visualization: + +.. 
image:: ../data/causal-foobar.png + :alt: Visualization of the performance improvements for two functions with causal profiling + +The full details of the causal profiling methodology can be found in the paper +`Coz: Finding Code that Counts with Causal Profiling `_. +The author's implementation is publicly available on `GitHub `_. + +Getting started +======================================== + +To effectively use causal profiling, it is important to understand a few key +concepts, such as progress points. + +Progress points +----------------------------------- + +Causal profiling requires "progress points" to track progress through the code +in between samples. Progress points must be triggered deterministically via instrumentation. +This can happen in three different ways: + +* `Omnitrace `_ can leverage the callbacks from + Kokkos-Tools, OpenMP-Tools, roctracer, etc. and the wrappers around functions for + MPI, NUMA, RCCL, etc. to act as progress points +* Users can leverage the :doc:`runtime instrumentation capabilities <./instrumenting-rewriting-binary-application>` + to insert progress points +* Users can leverage the :doc:`User API <../reference/using-omnitrace-display-api>`, + for example ``OMNITRACE_CAUSAL_PROGRESS`` + +.. note:: + + Binary rewrite to insert progress points is not supported. When a rewritten binary + is executed, Dyninst translates the instruction pointer address in order to execute + the instrumentation. As a result, call-stack samples never return instruction + pointer addresses in the ranges defined as valid by Omnitrace. 
+ +Key concepts +----------------------------------- + ++------------------+-------------------------------------+----------------------------------+--------------------------------------------+ +| Concept | Setting | Options | Description | ++==================+=====================================+==================================+============================================+ +| Backend | ``OMNITRACE_CAUSAL_BACKEND`` | ``perf``, ``timer`` | Backend for recording samples required | +| | | | to calculate the virtual speed-up | ++------------------+-------------------------------------+----------------------------------+--------------------------------------------+ +| Mode | ``OMNITRACE_CAUSAL_MODE`` | ``function``, ``line`` | Select entire function or individual | +| | | | line of code for causal experiments | ++------------------+-------------------------------------+----------------------------------+--------------------------------------------+ +| End-to-end | ``OMNITRACE_CAUSAL_END_TO_END`` | boolean | Perform a single experiment during the | +| | | | entire run (does not require | +| | | | progress-points) | ++------------------+-------------------------------------+----------------------------------+--------------------------------------------+ +| Fixed speed-up | ``OMNITRACE_CAUSAL_FIXED_SPEEDUP`` | one or more values from [0, 100] | Virtual speed-up or pool of virtual | +| | | | speed-ups to randomly select | ++------------------+-------------------------------------+----------------------------------+--------------------------------------------+ +| Binary scope | ``OMNITRACE_CAUSAL_BINARY_SCOPE`` | regular expression(s) | Dynamic binaries containing code for | +| | | | experiments | ++------------------+-------------------------------------+----------------------------------+--------------------------------------------+ +| Source scope | ``OMNITRACE_CAUSAL_SOURCE_SCOPE`` | regular expression(s) | ```` and/or ``:`` | +| | | | containing code to include in 
experiments | ++------------------+-------------------------------------+----------------------------------+--------------------------------------------+ +| Function scope | ``OMNITRACE_CAUSAL_FUNCTION_SCOPE`` | regular expression(s) | Restricts experiments to matching | +| | | | functions (function mode) or lines of | +| | | | code within matching functions (line mode) | ++------------------+-------------------------------------+----------------------------------+--------------------------------------------+ + +.. note:: + + * Binary scope defaults to ``%MAIN%`` (executable), but the scope can be expanded to include linked libraries. + * ```` and ``:`` support requires debug info (i.e. code was compiled with ``-g`` or, preferably, ``-g3``) + * Function mode does not require debug info but does not support stripped binaries + +Backends +----------------------------------- + +Both causal profiling backends interrupt each thread 1000x per second of CPU-time to apply virtual speed-ups. +The difference between the backends is how the samples which are responsible calculating +the virtual speed-up are recorded. +There are 3 key differences between the two backends: + +* ``perf`` backend requires Linux Perf and elevated security priviledges +* ``perf`` backend interrupts the application less frequently whereas the ``timer`` backend + will interrupt the application 1000x per second of realtime +* ``timer`` backend has less accurate call-stacks due to instruction pointer skid + +In general, the ``perf`` is preferred over the ``timer`` backend when sufficient +security priviledges permit its usage. +If ``OMNITRACE_CAUSAL_BACKEND`` is set to ``auto``, Omnitrace will fallback +to using the ``timer`` backend only if +using the ``perf`` backend fails; if ``OMNITRACE_CAUSAL_BACKEND`` is +set to ``perf`` and using this backend fails, Omnitrace +will abort. 
+ +Instruction pointer skid +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Instruction pointer (IP) skid is how many instructions execute between an event of interest +happening and where the IP is when the kernel is able to stop the application. +For the ``timer`` backend, this translates to the +difference between the IP when the timer generated a signal and the IP when the +signal was actually generated. Although IP skid does still occur with the ``perf`` backend, +the overhead of pausing the entire thread with the ``timer`` backend makes this much more pronounced +and, as such, the ``timer`` backend tends to have a lower resolution than the ``perf`` backend, +especially in ``line`` mode. + +Installing Linux Perf +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Linux Perf is built into the kernel and may already be installed +(e.g., included in the default kernel for OpenSUSE). +The official method of checking whether Linux Perf is installed is +checking for the existence of the file +``/proc/sys/kernel/perf_event_paranoid`` -- if the file exists, the kernel has Perf installed. + +If this file does not exist, on Debian-based systems like Ubuntu, install the following packages (as superuser): + +.. code-block:: shell + + apt-get install linux-tools-common linux-tools-generic linux-tools-$(uname -r) + +and reboot your computer. In order to use the ``perf`` backend, the value +of ``/proc/sys/kernel/perf_event_paranoid`` +should be <= 2. If the value in this file is greater than 2, you will likely be +unable to use the ``perf`` backend. + +To update the paranoid level temporarily (until the system is rebooted), use +one of the following methods +as a superuser (where ``PARANOID_LEVEL=`` with ```` in the range ``[-1, 2]``): + +.. 
code-block:: shell + + echo ${PARANOID_LEVEL} | sudo tee /proc/sys/kernel/perf_event_paranoid + sysctl kernel.perf_event_paranoid=${PARANOID_LEVEL} + +To make the paranoid level persistent after a reboot, add ``kernel.perf_event_paranoid=`` +(where ```` is the desired paranoid level) to the ``/etc/sysctl.conf`` file. + +Speed-up prediction variability and the ``omnitrace-causal`` executable +----------------------------------------------------------------------- + +Causal profiling typically requires executing the application several times in +order to adequately sample all the domains of executing code, experiment +speed-ups, etc. and resolve statistical fluctuations. +The ``omnitrace-causal`` executable is designed to simplify running this procedure: + +.. code-block:: shell + + $ omnitrace-causal --help + [omnitrace-causal] Usage: ./bin/omnitrace-causal [ --help (count: 0, dtype: bool) + --version (count: 0, dtype: bool) + --monochrome (max: 1, dtype: bool) + --debug (max: 1, dtype: bool) + --verbose (count: 1) + --config (min: 0, dtype: filepath) + --launcher (count: 1, dtype: executable) + --generate-configs (min: 0, dtype: folder) + --no-defaults (min: 0, dtype: bool) + --mode (count: 1, dtype: string) + --output-name (min: 1, dtype: filename) + --reset (max: 1, dtype: bool) + --end-to-end (max: 1, dtype: bool) + --wait (count: 1, dtype: seconds) + --duration (count: 1, dtype: seconds) + --iterations (count: 1, dtype: int) + --speedups (min: 0, dtype: integers) + --binary-scope (min: 0, dtype: integers) + --source-scope (min: 0, dtype: integers) + --function-scope (min: 0, dtype: regex-list) + --binary-exclude (min: 0, dtype: integers) + --source-exclude (min: 0, dtype: integers) + --function-exclude (min: 0, dtype: regex-list) + ] + + Causal profiling usually requires multiple runs to reliably resolve the speedup estimates. + This executable is designed to streamline that process. 
+   For example (assume all commands end with '-- '):
+
+       omnitrace-causal -n 5 --                        # runs 5x with causal profiling enabled
+
+       omnitrace-causal -s 0 5,10,15,20                # runs 2x with virtual speedups:
+                                                       #   - 0
+                                                       #   - randomly selected from 5, 10, 15, and 20
+
+       omnitrace-causal -F func_A func_B func_(A|B)    # runs 3x with the function scope limited to:
+                                                       #   1. func_A
+                                                       #   2. func_B
+                                                       #   3. func_A or func_B
+   General tips:
+   - Insert progress points at hotspots in your code or use omnitrace's runtime instrumentation
+       - Note: binary rewrite will produce a incompatible new binary
+   - Run omnitrace-causal in "function" mode first (does not require debug info)
+   - Run omnitrace-causal in "line" mode when you are targeting one function (requires debug info)
+       - Preferably, use predictions from the "function" mode to determine which function to target
+   - Limit the virtual speedups to a smaller pool, e.g., 0,5,10,25,50, to get reliable predictions quicker
+   - Make use of the binary, source, and function scope to limit the functions/lines selected for experiments
+       - Note: source scope requires debug info
+
+
+   Options:
+       -h, -?, --help             Shows this page
+       --version                  Prints the version and exit
+
+       [DEBUG OPTIONS]
+
+       --monochrome               Disable colorized output
+       --debug                    Debug output
+       -v, --verbose              Verbose output
+
+       [GENERAL OPTIONS]
+
+       -c, --config               Base configuration file
+       -l, --launcher             When running MPI jobs, omnitrace-causal needs to be *before* the executable which launches the MPI processes (i.e.
+                                  before `mpirun`, `srun`, etc.). Pass the name of the target executable (or a regex for matching to the name of the
+                                  target) for causal profiling, e.g., `omnitrace-causal -l foo -- mpirun -n 4 foo`. This ensures that the omnitrace
+                                  library is LD_PRELOADed on the proper target
+       -g, --generate-configs     Generate config files instead of passing environment variables directly. If no arguments are provided, the config files
+                                  will be placed in ${PWD}/omnitrace-causal-config folder
+       --no-defaults              Do not activate default features which are recommended for causal profiling. For example: PID-tagging of output files
+                                  and timestamped subdirectories are disabled by default. Kokkos tools support is added by default
+                                  (OMNITRACE_USE_KOKKOSP=ON) because, for Kokkos applications, the Kokkos-Tools callbacks are used for progress points.
+                                  Activation of OpenMP tools support is similar
+
+       [CAUSAL PROFILING OPTIONS (General)]
+       (These settings will be applied to all causal profiling runs)
+
+       -m, --mode                 [ function (func) | line ]
+                                  Causal profiling mode
+       -o, --output-name          Output filename of causal profiling data w/o extension
+       -r, --reset                Overwrite any existing experiment results during the first run
+       -e, --end-to-end           Single causal experiment for the entire application runtime
+       -w, --wait                 Set the wait time (i.e. delay) before starting the first causal experiment (in seconds)
+       -d, --duration             Set the length of time (in seconds) to perform causal experimentationafter the first experiment is started. Once this
+                                  amount of time has elapsed, no more causal experiments will be started but any currently running experiment will be
+                                  allowed to finish.
+       -n, --iterations           Number of times to repeat the combination of run configurations
+
+       [CAUSAL PROFILING OPTIONS (Combinatorial)]
+       (Each individual argument to these options will multiply the number runs by the number of arguments and the number of
+       iterations. E.g. -n 2 -B "MAIN" -F "foo" "bar" will produce 4 runs: 2 iterations x 1 binary scope x 2 function scopes
+       (MAIN+foo, MAIN+bar, MAIN+foo, MAIN+bar))
+
+       -s, --speedups             Pool of virtual speedups to sample from during experimentation. Each space designates a group and multiple speedups can
+                                  be grouped together by commas, e.g. -s 0 0,10,20-50 is two groups: group #1 is '0' and group #2 is '0 10 20 25 30 35 40
+                                  45 50'
+       -B, --binary-scope         Restricts causal experiments to the binaries matching the list of regular expressions. Each space designates a group
+                                  and multiple scopes can be grouped together with a semi-colon
+       -S, --source-scope         Restricts causal experiments to the source files or source file + lineno pairs (i.e. or :) matching
+                                  the list of regular expressions. Each space designates a group and multiple scopes can be grouped together with a
+                                  semi-colon
+       -F, --function-scope       Restricts causal experiments to the functions matching the list of regular expressions. Each space designates a group
+                                  and multiple scopes can be grouped together with a semi-colon
+       -BE, --binary-exclude      Excludes causal experiments from being performed on the binaries matching the list of regular expressions. Each space
+                                  designates a group and multiple excludes can be grouped together with a semi-colon
+       -SE, --source-exclude      Excludes causal experiments from being performed on the code from the source files or source file + lineno pair (i.e.
+                                  or :) matching the list of regular expressions. Each space designates a group and multiple excludes
+                                  can be grouped together with a semi-colon
+       -FE, --function-exclude    Excludes causal experiments from being performed on the functions matching the list of regular expressions. Each space
+                                  designates a group and multiple excludes can be grouped together with a semi-colon
+
+Examples
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: shell
+
+   #!/bin/bash -e
+
+   module load omnitrace
+
+   N=20
+   I=3
+
+   # when providing speedups to omnitrace-causal, speedup
+   # groups are separated by a space so "0,10" results in
+   # one speedup group where omnitrace samples from
+   # the speedup set of {0, 10}. Passing "0 10" (without
+   # quotes) to omnitrace-causal multiplies the
+   # number of runs by 2, where the first half of the
+   # runs instruct omnitrace to only use 0 as the
+   # speedup and the second half of the runs instruct
+   # omnitrace to only use 10 as the speedup.
+   SPEEDUPS="0,0,0,10,20,30,40,50,50,75,75,75,90,90,90"
+   # thus, -s ${SPEEDUPS} only multiplies the number
+   # of runs by 1 whereas -s ${SPEEDUPS_E2E} multiplies
+   # the number of runs by 15:
+   #   - 3 runs with speedup of 0
+   #   - 1 run for each of the speedups 10, 20, 30, and 40
+   #   - 2 runs with speedup of 50
+   #   - 3 runs with speedup of 75
+   #   - 3 runs with speedup of 90
+   SPEEDUPS_E2E=$(echo "${SPEEDUPS}" | sed 's/,/ /g')
+
+
+   # 20 iterations in function mode with 1 speedup group
+   # and source scope set to .cpp files
+   #
+   # outputs to files:
+   #   - causal/experiments.func.coz
+   #   - causal/experiments.func.json
+   #
+   # total executions: 20
+   #
+   omnitrace-causal \
+       -n ${N} \
+       -s ${SPEEDUPS} \
+       -m function \
+       -o experiments.func \
+       -S ".*\\.cpp" \
+       -- \
+       ./causal-omni-cpu "${@}"
+
+
+   # 20 iterations in line mode with 1 speedup group
+   # and source scope restricted to lines 100 and 110
+   # in the causal.cpp file.
+   #
+   # outputs to files:
+   #   - causal/experiments.line.coz
+   #   - causal/experiments.line.json
+   #
+   # total executions: 20
+   #
+   omnitrace-causal \
+       -n ${N} \
+       -s ${SPEEDUPS} \
+       -m line \
+       -o experiments.line \
+       -S "causal\\.cpp:(100|110)" \
+       -- \
+       ./causal-omni-cpu "${@}"
+
+
+   # 3 iterations in function mode of 15 singular speedups
+   # in end-to-end mode with 2 different function scopes
+   # where one is restricted to "cpu_slow_func" and
+   # another is restricted to "cpu_fast_func".
+   #
+   # outputs to files:
+   #   - causal/experiments.func.e2e.coz
+   #   - causal/experiments.func.e2e.json
+   #
+   # total executions: 90
+   #
+   omnitrace-causal \
+       -n ${I} \
+       -s ${SPEEDUPS_E2E} \
+       -m func \
+       -e \
+       -o experiments.func.e2e \
+       -F "cpu_slow_func" \
+          "cpu_fast_func" \
+       -- \
+       ./causal-omni-cpu "${@}"
+
+   # 3 iterations in line mode of 15 singular speedups
+   # in end-to-end mode with 2 different source scopes
+   # where one is restricted to line 100 in causal.cpp
+   # and another is restricted to line 110 in causal.cpp.
+   #
+   # outputs to files:
+   #   - causal/experiments.line.e2e.coz
+   #   - causal/experiments.line.e2e.json
+   #
+   # total executions: 90
+   #
+   omnitrace-causal \
+       -n ${I} \
+       -s ${SPEEDUPS_E2E} \
+       -m line \
+       -e \
+       -o experiments.line.e2e \
+       -S "causal\\.cpp:100" \
+          "causal\\.cpp:110" \
+       -- \
+       ./causal-omni-cpu "${@}"
+
+
+   export OMP_NUM_THREADS=8
+   export OMP_PROC_BIND=spread
+   export OMP_PLACES=threads
+
+   # set number of iterations to 5
+   N=5
+
+   # 5 iterations in function mode of 1 speedup
+   # group with the source scope restricted
+   # to files containing "lulesh" in their filename
+   # and exclude functions which start with "Kokkos::"
+   # or "std::enable_if".
+   #
+   # outputs to files:
+   #   - causal/experiments.func.coz
+   #   - causal/experiments.func.json
+   #
+   # total executions: 5
+   #
+   # First of 5 executions overwrites any
+   # existing causal/experiments.func.(coz|json)
+   # file due to "--reset" argument
+   #
+   omnitrace-causal \
+       --reset \
+       -n ${N} \
+       -s ${SPEEDUPS} \
+       -m func \
+       -o experiments.func \
+       -S "lulesh.*" \
+       -FE "^(Kokkos::|std::enable_if)" \
+       -- \
+       ./lulesh-omni -i 50 -s 200 -r 20 -b 5 -c 5 -p
+
+
+   # 5 iterations in line mode of 1 speedup
+   # group with the source scope restricted
+   # to files containing "lulesh" in their filename
+   # and exclude functions which start with "exec_range"
+   # or "execute" and which contain either
+   # "construct_shared_allocation" or "._omp_fn." in
+   # the function name.
+   #
+   # outputs to files:
+   #   - causal/experiments.line.coz
+   #   - causal/experiments.line.json
+   #
+   # total executions: 5
+   #
+   # First of 5 executions overwrites any
+   # existing causal/experiments.line.(coz|json)
+   # file due to "--reset" argument
+   #
+   omnitrace-causal \
+       --reset \
+       -n ${N} \
+       -s ${SPEEDUPS} \
+       -m line \
+       -o experiments.line \
+       -S "lulesh.*" \
+       -FE "^(exec_range|execute);construct_shared_allocation;\\._omp_fn\\." \
+       -- \
+       ./lulesh-omni -i 50 -s 200 -r 20 -b 5 -c 5 -p
+
+
+   # 5 iterations in line mode of 1 speedup
+   # group with the source scope restricted
+   # to files whose basename is "lulesh.cc"
+   # for 3 different functions:
+   #   - ApplyMaterialPropertiesForElems
+   #   - CalcHourglassControlForElems
+   #   - CalcVolumeForceForElems
+   #
+   # outputs to files:
+   #   - causal/experiments.line.targeted.coz
+   #   - causal/experiments.line.targeted.json
+   #
+   # total executions: 15
+   #
+   # First of 5 executions overwrites any
+   # existing causal/experiments.line.targeted.(coz|json)
+   # file due to "--reset" argument
+   #
+   omnitrace-causal \
+       --reset \
+       -n ${N} \
+       -s ${SPEEDUPS} \
+       -m line \
+       -o experiments.line.targeted \
+       -F "ApplyMaterialPropertiesForElems" \
+          "CalcHourglassControlForElems" \
+          "CalcVolumeForceForElems" \
+       -S "lulesh\\.cc" \
+       -- \
+       ./lulesh-omni -i 50 -s 200 -r 20 -b 5 -c 5 -p
+
+Using ``omnitrace-causal`` with other launchers like ``mpirun``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``omnitrace-causal`` executable is intended to assist with application replay
+and is designed to always be at the start of the command line (that is, the primary process).
+``omnitrace-causal`` typically adds an ``LD_PRELOAD`` of the Omnitrace libraries
+into the environment before launching the command in order to inject the functionality
+required to start the causal profiling tooling. However, this is problematic
+when the target application for causal profiling requires another command-line
+tool in order to run. For example, ``foo`` is the target application but executing ``foo``
+requires ``mpirun -n 2 foo``. Simply running ``omnitrace-causal -- mpirun -n 2 foo``
+would apply the causal profiling to ``mpirun`` instead of ``foo``.
+``omnitrace-causal`` remedies this by providing the command-line option ``-l`` / ``--launcher``
+to indicate that the target application is using a launcher script or executable. The
+argument to this option is the name of (or a regex for) the target application
+on the command line. When ``--launcher`` is used, ``omnitrace-causal`` generates
+all the replay configurations and executes them but delays adding the ``LD_PRELOAD``. Instead, it
+injects a call to itself into the command line right before the target
+application. This recursive call inherits the configuration from the
+parent ``omnitrace-causal`` executable, inserts an ``LD_PRELOAD`` into the environment,
+and then invokes ``execv`` to replace itself with the target application
+process.
+
+In other words, the following command:
+
+.. code-block:: shell
+
+   omnitrace-causal -l foo -n 3 -- mpirun -n 2 foo
+
+effectively results in:
+
+.. code-block:: shell
+
+   mpirun -n 2 omnitrace-causal -- foo
+   mpirun -n 2 omnitrace-causal -- foo
+   mpirun -n 2 omnitrace-causal -- foo
+
+Visualizing the causal output
+-------------------------------------------------------------------------
+
+Omnitrace generates ``causal/experiments.json`` and ``causal/experiments.coz`` files in
+``${OMNITRACE_OUTPUT_PATH}/${OMNITRACE_OUTPUT_PREFIX}``. A standalone GUI for viewing the causal profiling
+results is under development. Until it is available, visit
+`plasma-umass.org/coz `_ and open the ``*.coz`` file.
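
To find the files to open in the viewer, a small shell helper can list everything that looks like causal output under a directory. This is a minimal sketch and not part of Omnitrace: the function name ``list_causal_outputs`` is hypothetical, and the only assumption is that results sit in a ``causal`` subdirectory with ``.coz`` and ``.json`` extensions, as described in this section.

```shell
# Hypothetical helper (not an Omnitrace command): list causal profiling
# result files found in any "causal" subdirectory of the given directory.
list_causal_outputs() {
    # Default to the current directory when no argument is given.
    local output_dir="${1:-.}"
    find "${output_dir}" -type f -path '*/causal/*' \
        \( -name '*.coz' -o -name '*.json' \) | sort
}
```

Calling ``list_causal_outputs "${OMNITRACE_OUTPUT_PATH}"`` prints one matching path per line in sorted order, which makes it easier to pick the ``*.coz`` file from a particular run.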
+
+Omnitrace versus Coz
+=======================================
+
+This comparison is intended for readers who are familiar with the
+`Coz profiler `_.
+Omnitrace provides several additional features and utilities for causal profiling:
+
+.. csv-table::
+   :header: "Feature", "Coz", "Omnitrace", "Notes"
+   :widths: 20, 60, 60, 30
+
+   "Debug info", "Requires debug info in DWARF v3 format (``-gdwarf-3``)", "Optional, supports any DWARF format version", "See Note #1 below"
+   "Experiment selection", "Source file and line number pair", "Function, or source file and line number pair", "See Note #2 below"
+   "Experiment speed-ups", "Randomly samples between 0 and 100 in increments of 5, or uses one fixed speed-up", "Supports specifying a smaller subset", "See Note #3 below"
+   "Scope options", "Supports binary and source scopes", "Supports binary, source, and function scopes", "See Notes #4, #5, and #6 below"
+   "Scope inclusion", "Uses ``%`` as wildcard for binary and source scopes", "Full regex support for binary, source, and function scopes", ""
+   "Scope exclusion", "Not supported", "Supports regexes for excluding binary/source/function", "See Note #7 below"
+   "Call-stack sampling", "Linux perf", "Linux perf, libunwind", "See Note #8 below"
+
+.. note::
+
+   #. Omnitrace supports a "function" mode which does not require debug info.
+   #. Omnitrace supports selecting the entire range of instruction pointers for a function instead
+      of the instruction pointer for one line. In large codes, "function" mode
+      can resolve in fewer iterations. Once a target function is identified, you can
+      switch to "line" mode and limit the function scope to the target function.
+   #. Omnitrace supports randomly sampling from subsets, e.g. { 0, 0, 5, 10 },
+      where 0% is randomly selected 50% of the time and 5% and 10% are each randomly selected 25% of the time.
+   #. Omnitrace and Coz have the same definition for binary scope: the binaries
+      loaded at runtime (e.g. the executable and linked libraries).
+   #. The Omnitrace "source scope" supports both the source file format and the source file plus
+      line number format, in contrast to the Coz "source scope", which requires the source file
+      plus line number format.
+   #. Omnitrace supports a "function" scope which narrows the functions/lines
+      which are eligible for causal experiments to those within the matching functions.
+   #. Omnitrace supports a second filter on scopes for removing any binary/source/function
+      caught by an inclusive match, e.g. ``BINARY_SCOPE=.*`` + ``BINARY_EXCLUDE=libmpi.*``
+      initially includes all binaries but the exclude regex removes the MPI libraries.
+   #. In Omnitrace, the Linux perf backend is preferred over libunwind. However,
+      Linux perf usage can be restricted for security reasons.
+      Omnitrace falls back to using a second POSIX timer and libunwind if
+      Linux perf is not available.
diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in
index cc199b48d..6dae2c560 100644
--- a/docs/sphinx/_toc.yml.in
+++ b/docs/sphinx/_toc.yml.in
@@ -24,7 +24,9 @@ subtrees:
     - file: how-to/sampling-call-stack.rst
       title: Sampling the call stack
     - file: how-to/instrumenting-rewriting-binary-application.rst
-      title: Instrumenting and rewriting a binary application
+      title: Instrumenting and rewriting a binary application
+    - file: how-to/performing-causal-profiling.rst
+      title: Performing causal profiling
 
   - caption: Conceptual
     entries: