<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <meta name="generator" content="pandoc">
    <title>Software Carpentry: Automation and Make</title>
    <link rel="shortcut icon" type="image/x-icon" href="/favicon.ico" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap.css" />
    <link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap-theme.css" />
    <link rel="stylesheet" type="text/css" href="css/swc.css" />
    <link rel="alternate" type="application/rss+xml" title="Software Carpentry Blog" href="http://software-carpentry.org/feed.xml"/>
    <!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
    <!--[if lt IE 9]>
      <script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
    <![endif]-->
  </head>
  <body class="lesson">
    <div class="container card">
      <div class="banner">
        <a href="http://software-carpentry.org" title="Software Carpentry">
          <img alt="Software Carpentry banner" src="img/software-carpentry-banner.png" />
        </a>
      </div>
      <article>
      <div class="row">
        <div class="col-md-10 col-md-offset-1">
                    <a href="index.html"><h1 class="title">Automation and Make</h1></a>
          <h2 class="subtitle">Introduction</h2>
          <section class="objectives panel panel-warning">
<div class="panel-heading">
<h2><span class="glyphicon glyphicon-certificate"></span>Learning Objectives</h2>
</div>
<div class="panel-body">
<ul>
<li>Explain what Make is for.</li>
<li>Explain why Make differs from shell scripts.</li>
<li>Name other popular build tools.</li>
</ul>
</div>
</section>
<p>Suppose we have a script, <code>wordcount.py</code>, that reads in a text file, counts the words in this text file, and outputs a data file:</p>
<pre class="sourceCode bash"><code class="sourceCode bash">$ <span class="kw">python</span> wordcount.py books/isles.txt isles.dat</code></pre>
<p>If we view the first 5 rows of the data file using <code>head</code>,</p>
<pre class="sourceCode bash"><code class="sourceCode bash">$ <span class="kw">head</span> -5 isles.dat</code></pre>
<p>we can see that the file consists of one row per word.</p>
<pre class="output"><code>the 3822 6.7371760973
of 2460 4.33632998414
and 1723 3.03719372466
to 1479 2.60708619778
a 1308 2.30565838181</code></pre>
<p>Each row shows the word itself, the number of occurrences of that word, and the number of occurrences as a percentage of the total number of words in the text file. As another example:</p>
<pre class="sourceCode bash"><code class="sourceCode bash">$ <span class="kw">python</span> wordcount.py books/abyss.txt abyss.dat
$ <span class="kw">head</span> -5 abyss.dat</code></pre>
<pre class="output"><code>the 4044 6.35449402891
and 2807 4.41074795726
of 1907 2.99654305468
a 1594 2.50471401634
to 1515 2.38057825267</code></pre>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h2><span class="glyphicon glyphicon-pushpin"></span>Zipf’s Law</h2>
</div>
<div class="panel-body">
<p>The most frequent word in a text occurs approximately twice as often as the second most frequent word. This is <a href="http://en.wikipedia.org/wiki/Zipf%27s_law">Zipf’s Law</a>.</p>
</div>
</aside>
<p>Suppose we also have a script, <code>plotcount.py</code>, that reads in a data file and plots the 10 most frequently occurring words:</p>
<pre class="sourceCode bash"><code class="sourceCode bash">$ <span class="kw">python</span> plotcount.py isles.dat show</code></pre>
<p>Close the window to exit the plot.</p>
<p><code>plotcount.py</code> can also save the plot as an image (e.g. a JPEG):</p>
<pre class="sourceCode bash"><code class="sourceCode bash">$ <span class="kw">python</span> plotcount.py isles.dat isles.jpg</code></pre>
<p>Together these scripts implement a common workflow:</p>
<ol style="list-style-type: decimal">
<li>Read a data file.</li>
<li>Perform an analysis on this data file.</li>
<li>Write the analysis results to a new file.</li>
<li>Plot a graph of the analysis results.</li>
<li>Save the graph as an image, so we can put it in a paper.</li>
</ol>
<p>Running <code>wordcount.py</code> and <code>plotcount.py</code> at the shell prompt, as we have been doing, is fine for one or two files. If, however, we had 5 or 10 or 20 text files, this would quickly become monotonous. We could write a shell script to loop over our text files and create data files and images for each in turn, but this too can cause problems. If a text file changes then we need to re-run our analysis and recreate our graph, but only for the text file that changed, not for all of them. Furthermore, we want to do this only if the text file has actually changed.</p>
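<p>As a rough sketch of what such a script would have to do by hand, the loop below assumes the text files live in <code>books/</code> and uses a hypothetical helper, <code>is_out_of_date</code> (an illustration only, not part of the lesson's scripts), that compares file modification times with Bash's <code>-nt</code> ("newer than") test:</p>
<pre class="sourceCode bash"><code class="sourceCode bash">#!/usr/bin/env bash
# Sketch only: is_out_of_date is a hypothetical helper, not part of the lesson.
# It succeeds if the target is missing or older than its source.
is_out_of_date() {
    local source="$1" target="$2"
    [ ! -e "$target" ] || [ "$source" -nt "$target" ]
}

for book in books/*.txt; do
    [ -e "$book" ] || continue    # skip if the pattern matched no files
    name="$(basename "$book" .txt)"
    # Re-run the word count only if the text file has changed.
    if is_out_of_date "$book" "$name.dat"; then
        python wordcount.py "$book" "$name.dat"
    fi
    # Re-draw the plot only if the data file has changed.
    if is_out_of_date "$name.dat" "$name.jpg"; then
        python plotcount.py "$name.dat" "$name.jpg"
    fi
done</code></pre>
<p>Even with only two steps, the script has to restate which file depends on which, and this bookkeeping grows with every step we add.</p>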
<p><a href="http://www.gnu.org/software/make/">Make</a> (also known as GNU Make) is a fast, free and well-documented <a href="reference.html#build-manager">build manager</a>. Make was developed in 1977 by Stuart Feldman, then a summer intern at Bell Labs, and remains in widespread use today. Make can execute the commands needed to run our analysis and plot our results. Like shell scripts, it allows us to execute complex sequences of commands via a single shell command. Unlike shell scripts, it explicitly records the dependencies between files and so can determine when to recreate our data files or image files if our text files change. Make can be used for any commands that follow the general pattern of processing files to create new files, for example:</p>
<ul>
<li>Run analysis scripts on raw data files to get data files that summarise the raw data.</li>
<li>Run visualisation scripts on data files to produce plots.</li>
<li>Parse and combine text files and plots to create papers.</li>
<li>Compile source code into executable programs or libraries.</li>
</ul>
<p>There are now many build tools available, for example <a href="http://ant.apache.org/">Apache Ant</a>, <a href="http://pydoit.org/">doit</a>, and <a href="https://msdn.microsoft.com/en-us/library/dd9y37ha.aspx">nmake</a> for Windows. There are also tools that generate build scripts for use with these and other build tools, e.g. <a href="http://www.gnu.org/software/autoconf/autoconf.html">GNU Autoconf</a> and <a href="http://www.cmake.org/">CMake</a>. Which is best for you depends on your requirements, intended usage, and operating system. However, they all share the same fundamental concepts as Make.</p>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h2><span class="glyphicon glyphicon-pushpin"></span>Why use Make if it is almost 40 years old?</h2>
</div>
<div class="panel-body">
<p>Today, researchers working with legacy codes in C or FORTRAN, which are very common in high-performance computing, will very likely encounter Make.</p>
<p>Researchers are also finding Make of use in implementing reproducible research workflows, automating data analysis and visualisation (using Python or R) and combining tables and plots with text to produce reports and papers for publication.</p>
<p>Make’s fundamental concepts are common across build tools.</p>
</div>
</aside>
        </div>
      </div>
      </article>
      <div class="footer">
        <a class="label swc-blue-bg" href="http://software-carpentry.org">Software Carpentry</a>
        <a class="label swc-blue-bg" href="https://github.com/swcarpentry/lesson-template">Source</a>
        <a class="label swc-blue-bg" href="mailto:admin@software-carpentry.org">Contact</a>
        <a class="label swc-blue-bg" href="LICENSE.html">License</a>
      </div>
    </div>
    <!-- Javascript placed at the end of the document so the pages load faster -->
    <script src="http://software-carpentry.org/v5/js/jquery-1.9.1.min.js"></script>
    <script src="css/bootstrap/bootstrap-js/bootstrap.js"></script>
  </body>
</html>