<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>SLOCCount User's Guide</title>
</head>
<body bgcolor="#FFFFFF">
<center>
<font size="+3"><b><span class="title">SLOCCount User's Guide</span></b></font>
<br>
<font size="+2"><span class="author">by David A. Wheeler (dwheeler, at, dwheeler.com)</span></font>
<br>
<font size="+2"><span class="pubdate">August 1, 2004</span></font>
<br>
<font size="+2"><span class="version">Version 2.26</span></font>
</center>
<p>
<h1><a name="introduction">Introduction</a></h1>
<p>
SLOCCount (pronounced "sloc-count") is a suite of programs for counting
physical source lines of code (SLOC) in potentially large software systems.
Thus, SLOCCount is a "software metrics tool" or "software measurement tool".
SLOCCount was developed by David A. Wheeler,
originally to count SLOC in a GNU/Linux distribution, but it can be
used for counting the SLOC of arbitrary software systems.
<p>
SLOCCount is known to work on Linux systems, and has been tested
on Red Hat Linux versions 6.2, 7, and 7.1.
SLOCCount should run on many other Unix-like systems (if Perl is installed);
in particular, I would expect a *BSD system to work well.
Windows users can run sloccount by first installing
<a href="http://sources.redhat.com/cygwin">Cygwin</a>.
SLOCCount is much slower on Windows/Cygwin, and it's not as easy to install
or use on Windows, but it works.
Of course, feel free to upgrade to an open source Unix-like system
(such as Linux or *BSD) instead :-).
<p>
SLOCCount can count physical SLOC for a wide range of languages.
Listed alphabetically, they are
Ada, Assembly (for many machines and assemblers),
awk (including gawk and nawk),
Bourne shell (and relatives such as bash, ksh, zsh, and pdksh),
C, C++, C# (also called C-sharp or cs), C shell (including tcsh),
COBOL, Expect, Fortran (including Fortran 90), Haskell,
Java, lex (including flex),
LISP (including Scheme),
makefiles (though they aren't usually shown in final reports),
Modula3, Objective-C, Pascal, Perl, PHP, Python, Ruby, sed,
SQL (normally not shown),
TCL, and Yacc.
It can gracefully handle awkward situations in many languages;
for example, it can determine the
syntax used in different assembly language files and adjust appropriately,
it knows about Python's use of string constants as comments, and it
can handle various Perl oddities (e.g., perlpods, here documents,
and Perl's __END__ marker).
It even has a "generic" SLOC counter that you may be able to use to count the
SLOC of other languages (depending on the language's syntax).
<p>
SLOCCount can also take a large list of files and automatically categorize
them using a number of different heuristics.
The heuristics automatically determine if a file
is a source code file or not, and if so, which language it's written in.
For example,
it knows that ".pc" is usually a C source file for an Oracle preprocessor,
but it can detect many circumstances where it's actually a file about
a "PC" (personal computer).
For another example, it knows that ".m" is the standard extension for
Objective-C, but it will check the file contents to
see if it really is Objective-C.
It will even examine file headers to attempt to accurately determine
the file's true type.
As a result, you can analyze large systems completely automatically.
<p>
Finally, SLOCCount has some report-generating tools
to collect the data generated,
and then present it in several different formats and sorted in different ways.
The report-generating tool can also generate simple tab-separated files
so data can be passed on to other analysis tools (such as spreadsheets
and database systems).
<p>
SLOCCount will try to quickly estimate development time and effort given only
the lines of code it computes, using the original Basic COCOMO model.
This estimate can be improved if you can give more information about the project.
See the
<a href="#cocomo">discussion below about COCOMO, including intermediate COCOMO</a>,
if you want to improve the estimates by giving additional information about
the project.
<p>
SLOCCount is open source software/free software (OSS/FS),
released under the GNU General Public License (GPL), version 2;
see the <a href="#license">license below</a>.
The master web site for SLOCCount is
<a href="http://www.dwheeler.com/sloccount">http://www.dwheeler.com/sloccount</a>.
You can learn a lot about SLOCCount by reading the paper that caused its
creation, available at
<a href="http://www.dwheeler.com/sloc">http://www.dwheeler.com/sloc</a>.
Feel free to see my master web site at
<a href="http://www.dwheeler.com">http://www.dwheeler.com</a>, which has
other material such as the
<a href="http://www.dwheeler.com/secure-programs"><i>Secure Programming
for Linux and Unix HOWTO</i></a>,
my <a href="http://www.dwheeler.com/oss_fs_refs.html">list of
OSS/FS references</a>, and my paper
<a href="http://www.dwheeler.com/oss_fs_why.html"><i>Why OSS/FS? Look at
the Numbers!</i></a>
Please send improvements by email
to dwheeler, at, dwheeler.com (DO NOT SEND SPAM - please remove the
commas, remove the spaces, and change the word "at" into the at symbol).
<p>
The following sections first give a "quick start"
(discussing how to use SLOCCount once it's installed),
discuss basic SLOCCount concepts,
how to install it, how to set your PATH,
how to install source code on RPM-based systems if you wish, and
more information on how to use the "sloccount" front-end.
This is followed by material for advanced users:
how to use SLOCCount tools individually (for when you want more control
than the "sloccount" tool gives you), designer's notes,
the definition of SLOC, and miscellaneous notes.
The last sections state the license used (GPL) and give
hints on how to submit changes to SLOCCount (if you decide to make changes
to the program).
<p>
<h1><a name="quick-start">Quick Start</a></h1>
<p>
Once you've installed SLOCCount (discussed below),
you can measure an arbitrary program by typing everything
after the dollar sign into a terminal session:
<pre>
$ sloccount <i>topmost-source-code-directory</i>
</pre>
<p>
The directory listed and all its descendants will be examined.
You'll see output while it calculates,
culminating with physical SLOC totals and
estimates of development time, schedule, and cost.
If the directory contains a set of directories, each of which is
a different project developed independently,
use the "--multiproject" option so the effort estimations
can correctly take this into account.
<p>
You can redisplay the data in different ways by using the "--cached"
option, which skips the calculation stage and re-prints previously
computed information.
You can use other options to control what's displayed:
"--filecount" shows counts of files instead of SLOC, and
"--details" shows the detailed information about every source code file.
So, to display all the details of every file once you've previously
calculated the results, just type:
<pre>
sloccount --cached --details
</pre>
<p>
You'll notice that the default output ends with a request.
If you use this data (e.g., in a report), please
credit that data as being "generated using 'SLOCCount' by David A. Wheeler."
I make no money from this program, so at least please give me some credit.
<p>
SLOCCount tries to ignore all automatically generated files, but its
heuristics to detect this are necessarily imperfect (after all, even humans
sometimes have trouble determining if a file was automatically generated).
If possible, try to clean out automatically generated files from
the source directories --
in many situations "make clean" does this.
<p>
There's more to SLOCCount than this, but first we'll need to
explain some basic concepts, then we'll discuss other options
and advanced uses of SLOCCount.
<p>
<h1><a name="concepts">Basic Concepts</a></h1>
<p>
SLOCCount counts physical SLOC, also called "non-blank, non-comment lines".
More formally, physical SLOC is defined as follows:
``a physical source line of code (SLOC) is a line ending
in a newline or end-of-file marker,
and which contains at least one non-whitespace non-comment character.''
Comment delimiters (characters other than newlines starting and ending
a comment) are considered comment characters.
Data lines only including whitespace
(e.g., lines with only tabs and spaces in multiline strings) are not included.
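<p>
To make the definition concrete, here is a minimal shell sketch that applies it to text whose only comment syntax is "#" to end of line. The function name count_hash_sloc is made up for this illustration, and the sketch ignores "#" characters inside strings, so it is a simplification and not how SLOCCount's real per-language counters work.

```shell
# Hypothetical helper: count physical SLOC in files (or standard input)
# whose only comment syntax is "#" to end of line.
# Simplification: a "#" inside a string is wrongly treated as a comment.
count_hash_sloc() {
  awk '{
    line = $0
    sub(/#.*/, "", line)           # drop the comment part, if any
    if (line ~ /[^ \t\r]/) n++     # count lines with a non-whitespace character left
  } END { print n + 0 }' "$@"
}

# Example: of these five lines, only "x=1" and "y=2" are physical SLOC.
printf '# header\n\nx=1  # set x\n   \t\ny=2\n' | count_hash_sloc   # prints 2
```

The comment-only line, the empty line, and the whitespace-only line are all skipped, which matches the "non-blank, non-comment" wording above.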
<p>
In SLOCCount, there are 3 different directories:
<ol>
<li>The "source code directory", a directory containing the source code
being measured
(possibly in recursive subdirectories). The directories immediately
contained in the source code directory will normally be counted separately,
so it helps if your system is designed so that this top set of directories
roughly represents the system's major components.
If it doesn't, there are various tricks you can use to group source
code into components, but it's more work.
You don't need write access to the source code directory, but
you do need read access to all files, and read and search (execute) access
to all subdirectories.
<li>The "bin directory", the directory containing the SLOCCount executables.
By default, installing the program creates a subdirectory
named "sloccount-VERSION" which is the bin directory.
The bin directory must be part of your PATH.
<li>The "data directory", which stores the analysis results.
When measuring programs using "sloccount", by default
this is the directory ".slocdata" inside your home directory.
When you use the advanced SLOCCount tools directly,
in many cases this must be your "current" directory.
Inside the data directory are "data directory children" - these are
subdirectories that contain a file named "filelist", and each child
is used to represent a different project or a different
major component of a project.
</ol>
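<p>
As a small illustration of the "data directory children" idea, the following shell sketch lists the subdirectories of a data directory that contain a file named "filelist". The helper name list_children is hypothetical; the sketch relies only on the layout described above and does not depend on any SLOCCount tool.

```shell
# Hypothetical helper: print the names of the data directory children,
# i.e. the subdirectories of "$1" that contain a file named "filelist".
list_children() {
  for d in "$1"/*/; do
    [ -f "${d}filelist" ] && basename "$d"
  done
  return 0
}

# Example: inspect the default data directory used by "sloccount", if present.
if [ -d "$HOME/.slocdata" ]; then
  list_children "$HOME/.slocdata"
fi
```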
<p>
SLOCCount can handle many different programming languages, and separate
them by type (so you can compare the use of each).
Here is the set of languages, sorted alphabetically;
common filename extensions are in
parentheses, with SLOCCount's ``standard name'' for the language
listed in brackets:
<ol>
<li>Ada (.ada, .ads, .adb, .pad) [ada]
<li>Assembly for many machines and assemblers (.s, .S, .asm) [asm]
<li>awk (.awk) [awk]
<li>Bourne shell and relatives such as bash, ksh, zsh, and pdksh (.sh) [sh]
<li>C (.c, .pc, .ec, .ecp) [ansic]
<li>C++ (.C, .cpp, .cxx, .cc, .pcc) [cpp]
<li>C# (.cs) [cs]
<li>C shell including tcsh (.csh) [csh]
<li>COBOL (.cob, .cbl, .COB, .CBL) [cobol]
<li>Expect (.exp) [exp]
<li>Fortran 77 (.f, .f77, .F, .F77) [fortran]
<li>Fortran 90 (.f90, .F90) [f90]
<li>Haskell (.hs, .lhs) [haskell]; deals with both types of literate files.
<li>Java (.java) [java]
<li>lex (.l) [lex]
<li>LISP including Scheme (.cl, .el, .scm, .lsp, .jl) [lisp]
<li>makefiles (makefile) [makefile]
<li>ML (.ml, .ml3) [ml]
<li>Modula3 (.m3, .mg, .i3, .ig) [modula3]
<li>Objective-C (.m) [objc]
<li>Pascal (.p, .pas) [pascal]
<li>Perl (.pl, .pm, .perl) [perl]
<li>PHP (.php, .php[3456], .inc) [php]
<li>Pig (.pig, .piglett) [pig]
<li>Python (.py) [python]
<li>Ruby (.rb) [ruby]
<li>sed (.sed) [sed]
<li>sql (.sql) [sql]
<li>TCL (.tcl, .tk, .itk) [tcl]
<li>Yacc (.y) [yacc]
</ol>
<p>
<h1><a name="installing">Installing SLOCCount</a></h1>
<p>
Obviously, before using SLOCCount you'll need to install it.
SLOCCount depends on other programs, in particular perl, bash,
a C compiler (gcc will do), and md5sum
(you can get a useful md5sum program in the ``textutils'' package
on many Unix-like systems), so you'll need to get them installed
if they aren't already.
<p>
If your system uses RPM version 4 or greater to install software
(e.g., Red Hat Linux 7 or later), just download the SLOCCount RPM
and install it using a normal installation command; from the command line
you can use:
<pre>
rpm -Uvh sloccount*.rpm
</pre>
<p>
Everyone else will need to install from a tar file, and Windows users will
have to install Cygwin before installing sloccount.
<p>
If you're using Windows, you'll need to first install
<a href="http://sources.redhat.com/cygwin">Cygwin</a>.
By installing Cygwin, you'll install an environment and a set of
open source Unix-like tools.
Cygwin essentially creates a Unix-like environment in which sloccount can run.
You may be able to run parts of sloccount without Cygwin, in particular,
the perl programs should run in the Windows port of Perl, but you're
on your own - many of the sloccount components expect a Unix-like environment.
If you want to install Cygwin, go to the
<a href="http://sources.redhat.com/cygwin">Cygwin main page</a>
and install it.
If you're using Cygwin, <b>install it to use Unix newlines, not
DOS newlines</b> - DOS newlines will cause odd errors in SLOCCount
(and probably other programs, too).
I have only tested a "full" Cygwin installation, so I suggest installing
everything.
If you're short on disk space, at least install
binutils, bash, fileutils, findutils,
gcc, grep, gzip, make, man, perl, readline,
sed, sh-utils, tar, textutils, unzip, and zlib;
you should probably install vim too,
and there may be other dependencies.
By default Cygwin will create a directory C:\cygwin\home\NAME,
and will set up the ability to run Unix programs
(which will think that the same directory is called /home/NAME).
Now double-click on the Cygwin icon, or select from the Start menu
the selection Programs / Cygnus Solutions / Cygwin Bash shell;
you'll see a terminal screen with a Unix-like interface.
Now follow the instructions (next) for tar file users.
<p>
If you're installing from the tar file, download the file
(into your home directory is fine).
Unpacking the file will create a subdirectory, so if you want the
unpacked subdirectory to go somewhere special, "cd" to where you
want it to go.
Most likely, your home directory is just fine.
Now gunzip and untar SLOCCount (the * replaces the version #) by typing
this at a terminal session:
<pre>
gunzip -c sloccount*.tar.gz | tar xvf -
</pre>
Replace "sloccount*.tar.gz" shown above
with the full path of the downloaded file, wherever that is.
You've now created the "bin directory", which is simply the
"sloccount-VERSION" subdirectory created by the tar command
(where VERSION is the version number).
<p>
Now you need to compile the few compiled programs in the "bin directory" so
SLOCCount will be ready to go.
First, cd into the newly-created bin directory, by typing:
<pre>
cd sloccount*
</pre>
<p>
You may then need to override some installation settings.
You can do this by editing the supplied makefile, or alternatively,
by providing options to "make" whenever you run make.
The supplied makefile assumes your C compiler is named "gcc", which
is true for most Linux systems, *BSD systems, and Windows systems using Cygwin.
If this isn't true, you'll need to set
the "CC" variable to the correct value (e.g., "cc").
You can also modify where the files are stored; this variable is
called PREFIX and its default is /usr/local
(older versions of sloccount defaulted to /usr).
<p>
If you're using Windows and Cygwin, you
<b>must</b> override one of the installation
settings, EXE_SUFFIX, for installation to work correctly.
One way to set this value is to edit the "makefile" file so that
the line beginning with "EXE_SUFFIX" reads as follows:
<pre>
EXE_SUFFIX=.exe
</pre>
If you're using Cygwin and you choose to modify the "makefile", you
can use any text editor on the Cygwin side, or you can use a
Windows text editor if it can read and write Unix-formatted text files.
Cygwin users are free to use vim, for example.
If you're installing into your home directory and using the default locations,
Windows text editors will see the makefile as file
C:\cygwin\home\NAME\sloccount-VERSION\makefile.
Note that the Windows "Notepad" application doesn't work well, because it's not
able to handle Unix text files correctly.
Since this can be quite a pain, Cygwin users may instead decide to override
the makefile values during installation.
<p>
Finally, compile the few compiled programs in it by typing "make":
<pre>
make
</pre>
If you didn't edit the makefile in the previous step, you
need to provide options to make invocations to set the correct values.
This is done by simply saying (after "make") the name of the variable,
an equal sign, and its correct value.
Thus, to compile the program on a Windows system using Cygwin, you can
skip modifying the makefile file by typing this instead of just "make":
<pre>
make EXE_SUFFIX=.exe
</pre>
<p>
If you want, you can install sloccount for system-wide use without
using the RPM version.
Windows users using Cygwin should probably do this, particularly
if they chose a "local" installation.
To do this, first log in as root (Cygwin users don't need to do this
for local installation).
Edit the makefile to match your system's conventions, if necessary,
and then type "make install":
<pre>
make install
</pre>
If you need to set some make options, remember to do that here too.
If you use "make install", you can uninstall it later using
"make uninstall".
Installing sloccount for system-wide use is optional;
SLOCCount works without a system-wide installation.
However, if you don't install sloccount system-wide, you'll need to
set up your PATH variable; see the section on
<a href="#path">setting your path</a>.
<p>
A note for Cygwin users (and some others): some systems, including Cygwin,
don't set up the environment quite right and thus can't display the manual
pages as installed.
The problem is that they forget to search /usr/local/share/man for
manual pages.
If you want to read the installed manual pages, type this
into a Bourne-like shell:
<pre>
MANPATH=/usr/local/share/man:/usr/share/man:/usr/man
export MANPATH
</pre>
Or, if you use a C shell:
<pre>
setenv MANPATH "/usr/local/share/man:/usr/share/man:/usr/man"
</pre>
From then on, you'll be able to view the reference manual pages
by typing "man sloccount" (or by using whatever manual page display system
you prefer).
<p>
<p>
<h1><a name="installing-source">Installing The Source Code To Measure</a></h1>
<p>
Obviously, you must install the software source code you're counting,
so somehow you must create the "source directory"
with the source code to measure.
You must also make sure that permissions are set so the software can
read these directories and files.
<p>
For example, if you're trying to count the SLOC for an RPM-based Linux system,
install the software source code by doing the following as root
(which will place all source code into the source directory
/usr/src/redhat/BUILD):
<ol>
<li>Install all source rpm's:
<pre>
mount /mnt/cdrom
cd /mnt/cdrom/SRPMS
rpm -ivh *.src.rpm
</pre>
<li>Remove RPM spec files you don't want to count:
<pre>
cd ../SPECS
(look in contents of spec files, removing what you don't want)
</pre>
<li>build/prep all spec files:
<pre>
rpm -bp *.spec
</pre>
<li>Set permissions so the source files can be read by all:
<pre>
chmod -R a+rX /usr/src/redhat/BUILD
</pre>
</ol>
<p>
Here's an example of how to download source code from an
anonymous CVS server.
Let's say you want to examine the source code in GNOME's "gnome-core"
directory, as stored at the CVS server "anoncvs.gnome.org".
Here's how you'd do that:
<ol>
<li>Set up site and login parameters:
<pre>
export CVSROOT=':pserver:anonymous@anoncvs.gnome.org:/cvs/gnome'
</pre>
<li>Log in:
<pre>
cvs login
</pre>
<li>Check out the software (copy it to your local directory), using
mild compression to save on bandwidth:
<pre>
cvs -z3 checkout gnome-core
</pre>
</ol>
<p>
Of course, if you have a non-anonymous account, you'd set CVSROOT
to reflect this. For example, to log in using the "pserver"
protocol as ACCOUNT_NAME, do:
<pre>
export CVSROOT=':pserver:ACCOUNT_NAME@<i>cvs-server-host</i>:/cvs/gnome'
</pre>
<p>
You may need root privileges to install the source code and to give
another user permission to read it, but please avoid running the
sloccount program as root.
Although I know of no specific reason this would be a problem,
running any program as root turns off helpful safeguards.
<p>
Although SLOCCount tries to detect (and ignore) many cases where
programs are automatically generated, these heuristics are necessarily
imperfect.
So, please don't run any programs that generate other programs - just
do enough to get the source code prepared for counting.
In general you shouldn't run "make" on the source code, and if you have,
consider running "make clean" or "make really_clean" on the source code first.
It often doesn't make any difference, but identifying those circumstances
is difficult.
<p>
SLOCCount will <b>not</b> automatically uncompress files that are
compressed/archive files (such as .zip, .tar, or .tgz files).
Often such files are just "left over" old versions or files
that you're already counting.
If you want to count the contents of compressed files, uncompress them first.
<p>
SLOCCount also doesn't delve into files using "literate programming"
techniques, in part because there are too many incompatible formats
that implement it.
Thus, run the tools to extract the code from the literate programming files
before running SLOCCount. Currently, the only exception to this rule is
Haskell.
<h1><a name="path">Setting your PATH</a></h1>
Before you can run SLOCCount, you'll need to make sure
the SLOCCount "bin directory" is in your PATH.
If you've installed SLOCCount in a system-wide location
such as /usr/bin, then you needn't do more; the RPMs and "make install"
commands essentially do this.
<p>
Otherwise, in Bourne-shell variants, type:
<pre>
PATH="$PATH:<i>the directory with SLOCCount's executable files</i>"
export PATH
</pre>
Csh users should instead type:
<pre>
setenv PATH "$PATH:<i>the directory with SLOCCount's executable files</i>"
</pre>
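<p>
If you set PATH from a startup file, you may want to avoid appending the same directory every time the file is read. Here is a minimal sketch for Bourne-shell variants; the function name add_to_path and the example directory are assumptions for illustration, so substitute your real bin directory.

```shell
# Hypothetical helper: append a directory to PATH only if it is not there yet.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                         # already present, nothing to do
    *) PATH="$PATH:$1"; export PATH ;;
  esac
}

# Example: assumes SLOCCount was unpacked in your home directory as
# "sloccount-2.26"; adjust to wherever your bin directory really is.
add_to_path "$HOME/sloccount-2.26"
```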
<h1><a name="using-basics">Using SLOCCount: The Basics</a></h1>
Normal use of SLOCCount is very simple.
In a terminal window just type "sloccount", followed by a
list of the source code directories to count.
If you give it only a single directory, SLOCCount tries to be
a little clever and break the source code into
subdirectories for purposes of reporting:
<ol>
<li>If the directory has at least
two subdirectories, then those subdirectories will be used as the
breakdown (see the example below).
<li>If the single directory contains files as well as directories
(or if you give sloccount some files as parameters), those files will
be assigned to the directory "top_dir" so you can tell them apart
from other directories.
<li>If there's a subdirectory named "src", then that subdirectory is again
broken down, with all the further subdirectories prefixed with "src_".
So if directory "X" has a subdirectory "src", which contains subdirectory
"modules", the program will report a separate count from "src_modules".
</ol>
In the terminology discussed above, each of these directories would become
"data directory children."
<p>
You can also give "sloccount" a list of directories, in which case the
report will be broken down by these directories
(make sure that the basenames of these directories differ).
SLOCCount normally considers all descendants of these directories,
though unless told otherwise it ignores symbolic links.
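<p>
The single-directory breakdown rules above can be sketched as a small shell function that maps a file's path (relative to the directory you gave sloccount) to the label it would be reported under. This is only an illustration of the naming rules as described here, assuming the directory has at least two subdirectories; the function name child_name and the sample paths are made up, and this is not SLOCCount's own code.

```shell
# Hypothetical helper: map a path relative to the top directory to the
# label used in the report, following the three rules described above.
child_name() {
  case "$1" in
    src/*/*) rest=${1#src/}; echo "src_${rest%%/*}" ;;  # inside a subdirectory of src
    src/*)   echo "src_top_dir" ;;                      # file directly inside src
    */*)     echo "${1%%/*}" ;;                         # inside any other subdirectory
    *)       echo "top_dir" ;;                          # file directly in the top directory
  esac
}

child_name src/modules/standard/mod_cgi.c   # prints src_modules
child_name src/Configure                    # prints src_top_dir
child_name htdocs/index.html                # prints htdocs
child_name README                           # prints top_dir
```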
<p>
This is all easier to explain by example.
Let's say that we want to measure Apache 1.3.12 as installed using an RPM.
Once it's installed, we just type:
<pre>
sloccount /usr/src/redhat/BUILD/apache_1.3.12
</pre>
The output we'll see shows status reports while it analyzes things,
and then it prints out:
<pre>
SLOC Directory SLOC-by-Language (Sorted)
24728 src_modules ansic=24728
19067 src_main ansic=19067
8011 src_lib ansic=8011
5501 src_os ansic=5340,sh=106,cpp=55
3886 src_support ansic=2046,perl=1712,sh=128
3823 src_top_dir sh=3812,ansic=11
3788 src_include ansic=3788
3469 src_regex ansic=3407,sh=62
2783 src_ap ansic=2783
1378 src_helpers sh=1345,perl=23,ansic=10
1304 top_dir sh=1304
104 htdocs perl=104
31 cgi-bin sh=24,perl=7
0 icons (none)
0 conf (none)
0 logs (none)
ansic: 69191 (88.85%)
sh: 6781 (8.71%)
perl: 1846 (2.37%)
cpp: 55 (0.07%)
Total Physical Source Lines of Code (SLOC) = 77873
Estimated Development Effort in Person-Years (Person-Months) = 19.36 (232.36)
(Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Estimated Schedule in Years (Months) = 1.65 (19.82)
(Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule) = 11.72
Total Estimated Cost to Develop = $ 2615760
(average salary = $56286/year, overhead = 2.4).
Please credit this data as "generated using 'SLOCCount' by David A. Wheeler."
</pre>
<p>
Interpreting this should be straightforward.
The Apache directory has several subdirectories, including "htdocs", "cgi-bin",
and "src".
The "src" directory has many subdirectories in it
("modules", "main", and so on).
Code directly
contained in the main directory /usr/src/redhat/BUILD/apache_1.3.12
is labelled "top_dir", while
code directly contained in the src subdirectory is labelled "src_top_dir".
Code in the "src/modules" directory is labelled "src_modules" here.
The output shows each major directory broken
out, sorted from largest to smallest.
Thus, the "src/modules" directory had the most code of the directories,
24728 physical SLOC, all of it in C.
The "src/helpers" directory had a mix of shell, perl, and C; note that
when multiple languages are shown, the list of languages in that child
is also sorted from largest to smallest.
<p>
Below the per-component set is a list of all languages used,
with their total SLOC shown, sorted from most to least.
After this is the total physical SLOC (77,873 physical SLOC in this case).
<p>
Next is an estimation of the effort and schedule (calendar time)
it would take to develop this code.
For effort, the units shown are person-years (with person-months
shown in parentheses); for schedule, total years are shown first
(with months in parentheses).
When invoked through "sloccount", the default assumption is that all code is
part of a single program; the "--multiproject" option changes this
to assume that all top-level components are independently developed
programs.
When "--multiproject" is invoked, each project's efforts are estimated
separately (and then summed), and the schedule estimate presented
is the largest estimated schedule of any single component.
<p>
By default the "Basic COCOMO" model is used for estimating
effort and schedule; this model
includes design, code, test, and documentation time (both
user/admin documentation and development documentation).
<a href="#cocomo">See below for more information on COCOMO</a>
as it's used in this program.
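<p>
If you want to check the arithmetic, the Basic COCOMO figures in the Apache example can be reproduced with a few lines of awk, using the two formulas printed in the report. The function name basic_cocomo is made up for this sketch; it covers only the default single-project case, not "--multiproject".

```shell
# Hypothetical helper: Basic COCOMO effort and schedule from a SLOC total.
basic_cocomo() {
  awk -v sloc="$1" 'BEGIN {
    ksloc  = sloc / 1000
    effort = 2.4 * exp(1.05 * log(ksloc))     # person-months = 2.4 * KSLOC**1.05
    sched  = 2.5 * exp(0.38 * log(effort))    # months = 2.5 * person-months**0.38
    printf "effort_pm=%.2f schedule_months=%.2f developers=%.2f\n",
           effort, sched, effort / sched
  }'
}

basic_cocomo 77873
# prints effort_pm=232.36 schedule_months=19.82 developers=11.72
```

These are the same person-month, month, and average-developer values shown in the sample output.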
<p>
Next are several numbers that attempt to estimate what it would have cost
to develop this program.
This is simply the amount of effort, multiplied by the average annual
salary and by the "overhead multiplier".
The default annual salary is
$56,286 per year; this value comes from the
<i>ComputerWorld</i> Salary Survey of September 4, 2000, which reported it as
the average U.S. programmer/analyst salary in the year 2000.
You might consider using other numbers
(<i>ComputerWorld</i>'s September 3, 2001 Salary Survey found
U.S. programmer/analysts averaging $55,100, senior
systems programmers averaging $68,900, and senior systems analysts averaging
$72,300).
<p>
Overhead is much harder to estimate; I did not find a definitive source
for information on overheads.
After informal discussions with several cost analysts,
I determined that an overhead of 2.4
would be representative of the overhead sustained by
a typical software development company.
As discussed in the next section, you can change these numbers too.
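<p>
The cost figure is just the effort in person-years multiplied by the salary and the overhead. As a rough sketch (the function name estimate_cost is an assumption for this example), the following reproduces the default estimate for the Apache example from the SLOC total:

```shell
# Hypothetical helper: cost = (person-months / 12) * annual salary * overhead,
# with person-months taken from the Basic COCOMO effort formula.
estimate_cost() {
  awk -v sloc="$1" -v salary="$2" -v overhead="$3" 'BEGIN {
    effort_pm = 2.4 * exp(1.05 * log(sloc / 1000))
    printf "%.0f\n", (effort_pm / 12) * salary * overhead
  }'
}

estimate_cost 77873 56286 2.4    # prints 2615760
```

Note that the cost is computed from the unrounded effort; starting from the rounded 232.36 person-months gives a slightly different total.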
<p>
You may be surprised by the high cost estimates, but remember,
these include design, coding, testing, documentation (both for users
and for programmers), and a wrap rate for corporate overhead
(to cover facilities, equipment, accounting, and so on).
Many programmers forget these other costs and are shocked by the high figures.
If you only want to know the cost of the coding itself, you'd need to
obtain figures for coding alone.
<p>
Note that if any top-level directory has a file named PROGRAM_LICENSE,
that file is assumed to contain the name of the license
(e.g., "GPL", "LGPL", "MIT", "BSD", "MPL", and so on).
If there is at least one such file, sloccount will also report statistics
on licenses.
<p>
Note: sloccount internally uses MD5 hashes to detect duplicate files,
and thus needs some program that can compute MD5 hashes.
Normally it will use "md5sum" (available, for example, as a GNU utility).
If that doesn't work, it will try to use "md5" and "openssl", and you may
see error messages in this format:
<pre>
Can't exec "md5sum": No such file or directory at
/usr/local/bin/break_filelist line 678, &lt;CODE_FILE&gt; line 15.
Can't exec "md5": No such file or directory at
/usr/local/bin/break_filelist line 678, &lt;CODE_FILE&gt; line 15.
</pre>
You can safely ignore these error messages; these simply show that
SLOCCount is probing for a working program to compute MD5 hashes.
For example, Mac OS X users normally don't have md5sum installed, but
do have md5 installed, so they will probably see the first error
message (because md5sum isn't available), followed by a note that a
working MD5 program was found.
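<p>
The probing order described above can be pictured with the following Python
sketch. It is not sloccount's actual code (which tries to run each program
and so produces the messages shown); it merely looks each candidate up on the
PATH, and the function name is hypothetical.

```python
import shutil

# Candidate commands, in the order the text above describes.
CANDIDATES = ["md5sum", "md5", "openssl"]

def find_md5_program(candidates=CANDIDATES, which=shutil.which):
    """Return the first candidate found on PATH, or None if none is available."""
    for name in candidates:
        if which(name) is not None:
            return name
    return None

print(find_md5_program())
```

The "which" parameter exists only so the lookup can be replaced in a test; by
default it is the standard library's shutil.which.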
<h1><a name="options">Options</a></h1>
The program "sloccount" has a large number of options
so you can control what is selected for counting and how the
results are displayed.
<p>
There are several options that control which files are selected
for counting:
<pre>
--duplicates Count all duplicate files as normal files
--crossdups Count duplicate files if they're in different data directory
children.
--autogen Count automatically generated files
--follow Follow symbolic links (normally they're ignored)
--addlang Add languages to be counted that normally aren't shown.
--append Add more files to the data directory
</pre>
Normally, files which have exactly the same content are counted only once
(data directory children are counted alphabetically, so the child
"first" in the alphabet will be considered the owner of the master copy).
If you want them all counted, use "--duplicates".
Sometimes when you use sloccount, each directory represents a different
project, in which case you might want to specify "--crossdups".
The program tries to reject files that are automatically generated
(e.g., a C file generated by bison), but you can disable this with "--autogen".
You can use "--addlang" to show makefiles and SQL files, which aren't
usually counted.
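<p>
The default duplicate handling described above (identical content counted
once, with the alphabetically first child owning the master copy) can be
sketched as follows. This is an illustrative Python model, not sloccount's
implementation; the function name and the in-memory input format are
assumptions, and it does not model "--duplicates" or "--crossdups".

```python
import hashlib

def unique_files(files_by_child):
    """files_by_child maps a child name to a list of (path, content_bytes).

    Returns the (child, path) pairs that would be counted when identical
    content is counted only once.
    """
    seen = set()
    counted = []
    for child in sorted(files_by_child):      # children in alphabetical order
        for path, content in files_by_child[child]:
            digest = hashlib.md5(content).hexdigest()
            if digest in seen:
                continue                      # duplicate of an earlier file
            seen.add(digest)
            counted.append((child, path))
    return counted
```

For example, if the children "alpha" and "zlib" both contain a file with the
same bytes, only the copy under "alpha" is returned.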
<p>
Possibly the most important option is "--cached".
Normally, when sloccount runs, it computes a lot of information and
stores this data in a "data directory" (by default, "~/.slocdata").
The "--cached" option tells sloccount to use data previously computed,
greatly speeding up use once you've done the computation once.
The "--cached" option can't be used along with the options used to
select what files should be counted.
You can also select a different data directory by using the
"--datadir" option.
<p>
There are many options for controlling the output:
<pre>
--filecount Show counts of files instead of SLOC.
--details Present details: present one line per source code file.
--wide Show "wide" format. Ignored if "--details" selected
--multiproject Assume each directory is for a different project
(this modifies the effort estimation calculations)
--effort F E Change the effort estimation model, so that it uses
F as the factor and E as the exponent.
--schedule F E Change the schedule estimation model, so that it uses
F as the factor and E as the exponent.
--personcost P Change the average annual salary to P.
--overhead O Change the annual overhead to O.
-- End of options
</pre>
<p>
Basically, the first time you use sloccount, if you're measuring
a set of projects (not a single project) you might consider
using "--crossdups" instead of the defaults.
Then, you can redisplay data quickly by using "--cached",
combining it with options such as "--filecount".
If you want to send the data to another tool, use "--details".
<p>
If you're measuring a set of projects, you probably ought to pass
the option "--multiproject".
When "--multiproject" is used, efforts are computed for each component
separately and summed, and the time estimate used is the maximum
single estimated time.
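<p>
The "--multiproject" combination rule can be written down directly. The
sketch below is a minimal Python illustration with a made-up function name
and made-up sample figures; it only shows the sum-and-maximum rule, not how
each component's estimate is produced.

```python
def combine_estimates(components):
    """components: list of (effort, schedule) pairs, one per top-level directory.

    Efforts are summed; the schedule is the largest single schedule.
    """
    total_effort = sum(effort for effort, _ in components)
    overall_schedule = max(schedule for _, schedule in components)
    return total_effort, overall_schedule

# Three hypothetical components.
print(combine_estimates([(10.0, 5.0), (30.0, 8.0), (2.5, 3.0)]))
```

Taking the maximum schedule reflects the assumption that independently
developed projects proceed in parallel.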
<p>
The "--details" option dumps the available data in 4 columns,
tab-separated, where each line
represents a source code file in the data directory children identified.
The first column is the SLOC, the second column is the language type,
the third column is the name of the data directory child
(as it was given to get_sloc_details),
and the last column is the absolute pathname of the source code file.
You can then pipe this output to "sort" or some other tool for further
analysis (such as a spreadsheet or RDBMS).
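<p>
As one example of such further analysis, the following Python sketch totals
SLOC per language from text in the four-column, tab-separated layout
described above. The function name and the sample lines are invented for
illustration; lines that do not have four fields with a numeric first column
are skipped.

```python
from collections import Counter

def sloc_by_language(details_text):
    """Sum SLOC per language from tab-separated "--details" style lines."""
    totals = Counter()
    for line in details_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 4 or not fields[0].isdigit():
            continue  # not a data line in the expected layout
        sloc, language, _child, _path = fields
        totals[language] += int(sloc)
    return dict(totals)

# Made-up sample in the layout: SLOC, language, child, absolute path.
sample = (
    "120\tansic\tsrc\t/home/u/p/src/main.c\n"
    "30\tsh\tsrc\t/home/u/p/src/build.sh\n"
    "80\tansic\tlib\t/home/u/p/lib/util.c\n"
)
print(sloc_by_language(sample))
```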
<p>
You can change the parameters used to estimate effort using "--effort".
For example, if you believe that in the environment being used
you can produce 2 KSLOC/month scaling linearly, then
that means that the factor for effort you should use is 1/2 = 0.5 month/KSLOC,
and the exponent for effort is 1 (linear).
Thus, you can use "--effort 0.5 1".
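<p>
The reasoning behind "--effort 0.5 1" can be checked with a small Python
sketch. The function names are hypothetical; the formula assumed is the
usual factor times KSLOC raised to the exponent.

```python
def effort_factor_from_productivity(ksloc_per_month):
    """Convert a productivity figure (KSLOC/month) into months per KSLOC."""
    return 1.0 / ksloc_per_month

def estimate_effort(ksloc, factor, exponent):
    """Effort = factor * KSLOC ** exponent (months, in the example above)."""
    return factor * ksloc ** exponent

factor = effort_factor_from_productivity(2.0)   # 0.5, as in "--effort 0.5 1"
print(estimate_effort(10.0, factor, 1.0))       # a 10 KSLOC program
```

With an exponent of 1, doubling the code size simply doubles the estimated
effort.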
<p>
You can also set the annual salary and overheads used to compute
estimated development cost.
While "$" is shown, there's no reason you have to use dollars;
the unit of development cost is the same unit as the unit used for
"--personcost".
<h1><a name="cocomo">More about COCOMO</a></h1>
<p>
By default SLOCCount uses a very simple estimating model for effort and schedule:
the basic COCOMO model in the "organic" mode (modes are more fully discussed below).
This model estimates effort and schedule, including design, code, test,
and documentation time (both user/admin documentation and development documentation).
Basic COCOMO is a nice simple model, and it's used as the default because
it doesn't require any information about the code other than the SLOC count
already computed.
<p>
However, basic COCOMO's accuracy is limited for the same reason:
basic COCOMO doesn't take a number of important factors into account.
If you have the necessary information, you can improve the model's accuracy
by taking these factors into account. At a minimum, you can quickly check
whether the right "mode" is being used. You can also
use the "Intermediate COCOMO" and "Detailed COCOMO" models, which take more
factors into account and are likely to produce more accurate estimates as
a result. Either way, you pass this additional information to sloccount using its
"--effort" and "--schedule" options (as discussed in
<a href="#options">options</a>).
Take these estimates as just that: estimates, not grand truths.
<p>
To use the COCOMO model, you first need to determine your application's
mode, which can be "organic", "semidetached", or "embedded".
Most software is "organic" (which is why it's the default).
Here are simple definitions of these modes:
<ul>
<li>Organic: Relatively small software teams develop software in a highly
familiar, in-house environment. It has a generally stable development
environment, minimal need for innovative algorithms, and requirements can
be relaxed to avoid extensive rework.</li>
<li>Semidetached: This is an intermediate
step between organic and embedded. This is generally characterized by reduced
flexibility in the requirements.</li>
<li>Embedded: The project must operate
within tight (hard-to-meet) constraints, and requirements
and interface specifications are often non-negotiable.
The software will be embedded in a complex environment that the
software must deal with as-is.</li>
</ul>
By default, SLOCCount uses the basic COCOMO model in the organic mode.
For the basic COCOMO model, here are the critical factors for --effort and --schedule:<br>
<ul>
<li>Organic: effort factor = 2.4, exponent = 1.05; schedule factor = 2.5, exponent = 0.38</li>
<li>Semidetached: effort factor = 3.0, exponent = 1.12; schedule factor = 2.5, exponent = 0.35</li>
<li>Embedded: effort factor = 3.6, exponent = 1.20; schedule factor = 2.5, exponent = 0.32</li>
</ul>
Thus, if you want to use SLOCCount but the project is actually semidetached,
you can use the options "--effort 3.0 1.12 --schedule 2.5 0.35"
to get a more accurate estimate.
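<p>
To see how the mode changes the result, the basic COCOMO figures listed above
can be placed in a small Python table. This is a sketch under the usual
basic COCOMO assumptions (effort in person-months computed from KSLOC, and
schedule in months computed from effort); the function names are invented
for this example.

```python
BASIC_COCOMO = {
    # mode: (effort factor, effort exponent, schedule factor, schedule exponent)
    "organic":      (2.4, 1.05, 2.5, 0.38),
    "semidetached": (3.0, 1.12, 2.5, 0.35),
    "embedded":     (3.6, 1.20, 2.5, 0.32),
}

def basic_cocomo(ksloc, mode="organic"):
    """Return (effort in person-months, schedule in months) for the given mode."""
    ef, ee, sf, se = BASIC_COCOMO[mode]
    effort = ef * ksloc ** ee
    schedule = sf * effort ** se
    return effort, schedule

def sloccount_options(mode):
    """Build the option string that selects the given mode's figures."""
    ef, ee, sf, se = BASIC_COCOMO[mode]
    return f"--effort {ef} {ee} --schedule {sf} {se}"

print(sloccount_options("semidetached"))
for mode in BASIC_COCOMO:
    effort, schedule = basic_cocomo(10.0, mode)
    print(f"{mode}: {effort:.1f} person-months, {schedule:.1f} months")
```

For the same 10 KSLOC input, the estimated effort rises from organic to
semidetached to embedded, since both the factor and the exponent increase.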
<br>
For more accurate estimates, you can use the intermediate COCOMO models.
For intermediate COCOMO, use the following figures:<br>
<ul>
<li>Organic: effort base factor = 2.3, exponent = 1.05; schedule factor = 2.5, exponent = 0.38</li>
<li>Semidetached: effort base factor = 3.0, exponent = 1.12; schedule factor = 2.5, exponent = 0.35</li>
<li>Embedded: effort base factor = 2.8, exponent = 1.20; schedule factor = 2.5, exponent = 0.32</li>
</ul>
The intermediate COCOMO values for schedule are exactly the same as the basic
COCOMO model; the starting effort values are not quite the same, as noted
in Boehm's book. However, in the intermediate COCOMO model, you don't
normally use the effort factors as-is; you adjust them with various corrective factors
(called cost drivers). To use these corrections, you consider
all the cost drivers, determine which rating best describes your project for each,
and multiply the corresponding corrective values by the effort base factor.
The result is the final effort factor.
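<p>
The multiplication can be sketched in Python as follows. The function name
is hypothetical and the chosen ratings are purely illustrative: a
semidetached project (base factor 3.0, exponent 1.12) with RELY rated "High"
(1.15) and STOR rated "High" (1.06), and every other driver left at its
nominal value of 1.00.

```python
import math

def final_effort_factor(base_factor, driver_multipliers):
    """Multiply the effort base factor by every selected cost-driver value."""
    return base_factor * math.prod(driver_multipliers)

# Illustrative ratings only: RELY "High" (1.15) and STOR "High" (1.06).
factor = final_effort_factor(3.0, [1.15, 1.06])
print(f"--effort {factor:.3f} 1.12")
```

Drivers rated nominal contribute a multiplier of 1.00, so they can be left
out of the list without changing the result.
<p>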
Here are the cost drivers (from Boehm's book, tables 8-2 and 8-3):
<table cellpadding="2" cellspacing="2" border="1" width="100%">
<tbody>
<tr>
<th rowspan="1" colspan="2">Cost Drivers
</th>
<th rowspan="1" colspan="6">Ratings
</th>
</tr>
<tr>
<th>ID
</th>
<th>Driver Name
</th>
<th>Very Low
</th>
<th>Low
</th>
<th>Nominal
</th>
<th>High
</th>
<th>Very High
</th>
<th>Extra High
</th>
</tr>
<tr>
<td>RELY
</td>
<td>Required software reliability
</td>
<td>0.75 (effect is slight inconvenience)
</td>
<td>0.88 (easily recovered losses)
</td>
<td>1.00 (recoverable losses)
</td>
<td>1.15 (high financial loss)
</td>
<td>1.40 (risk to human life)
</td>
<td>
</td>
</tr>
<tr>
<td>DATA
</td>
<td>Database size
</td>
<td>
</td>
<td>0.94 (database bytes/SLOC &lt; 10)
</td>
<td>1.00 (D/S between 10 and 100)
</td>
<td>1.08 (D/S between 100 and 1000)
</td>
<td>1.16 (D/S > 1000)
</td>
<td>
</td>
</tr>
<tr>
<td>CPLX
</td>
<td>Product complexity
</td>
<td>0.70 (mostly straightline code, simple arrays, simple expressions)
</td>
<td>0.85
</td>
<td>1.00
</td>
<td>1.15
</td>
<td>1.30
</td>
<td>1.65 (microcode, multiple resource scheduling, device timing dependent coding)
</td>
</tr>
<tr>
<td>TIME
</td>
<td>Execution time constraint
</td>
<td>
</td>
<td>
</td>
<td>1.00 (&lt;50% use of available execution time)
</td>
<td>1.11 (70% use)
</td>
<td>1.30 (85% use)
</td>
<td>1.66 (95% use)
</td>
</tr>
<tr>
<td>STOR
</td>
<td>Main storage constraint
</td>
<td>
</td>
<td>
</td>
<td>1.00 (&lt;50% use of available storage)</td>
<td>1.06 (70% use)
</td>
<td>1.21 (85% use)
</td>
<td>1.56 (95% use)
</td>
</tr>
<tr>
<td>VIRT
</td>
<td>Virtual machine (HW and OS) volatility
</td>
<td>
</td>
<td>0.87 (major change every 12 months, minor every month)
</td>
<td>1.00 (major change every 6 months, minor every 2 weeks)</td>
<td>1.15 (major change every 2 months, minor changes every week)
</td>
<td>1.30 (major changes every 2 weeks, minor changes every 2 days)
</td>
<td>
</td>
</tr>
<tr>
<td>TURN
</td>
<td>Computer turnaround time
</td>
<td>
</td>
<td>0.87 (interactive)