
Introduction

Introduction

  • Many scientists write code regularly but few have been formally trained to do so
% you know about variables and loops
% good programs consist of so much more

  • Best practices evolved from programmers' folk wisdom
  • They increase productivity and decrease stress
  • Development methodologies, such as Agile Programming and Test Driven Development, are established in the software engineering industry
  • We can learn a lot from them to improve our coding skills
  • When programming in Python: Always bear in mind the \\Zen of Python
% this means stuff that is unique to python
% these are like mantras, repeat them over and over again

Outline

\tableofcontents

Best Practices

Outline

\tableofcontents[currentsection]

Style and Documentation

Outline

\tableofcontents[currentsection,currentsubsection]

Coding Style

  • Readability counts
  • Explicit is better than implicit
  • Beautiful is better than ugly
  • Give your variables intention revealing names
--0.5cm--

<[example] \pyfile{code/my_product.py} [example]>
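
The contents of code/my_product.py are not reproduced in this text. As a minimal sketch of what intention-revealing names buy you, assume a hypothetical function that multiplies a sequence of numbers. All names and bodies below are illustrative assumptions, not the original file.

```python
# Hypothetical illustration; not the contents of code/my_product.py.

# Hard to read: the names say nothing about intent.
def f(l):
    r = 1
    for i in l:
        r = r * i
    return r


# Easier to read: the names reveal what the code is for.
def my_product(numbers):
    total = 1
    for number in numbers:
        total = total * number
    return total
```

Both functions compute the same result; only the second one tells the reader what it is doing without tracing the loop.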

Formatting Code

Documenting Code

Example Docstring

\pyfile{code/my_product_docstring.py}
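
The referenced file is not shown here. A docstring for such a function might look like the sketch below. The NumPy-style section layout is an assumption chosen for illustration, and the function body is hypothetical.

```python
# Hypothetical illustration; not the contents of code/my_product_docstring.py.
def my_product(numbers):
    """Compute the product of a sequence of numbers.

    Parameters
    ----------
    numbers : sequence
        The numbers to multiply.

    Returns
    -------
    product : number
        The product of all elements; 1 for an empty sequence.

    Examples
    --------
    >>> my_product([1, 2, 3])
    6
    """
    total = 1
    for number in numbers:
        total *= number
    return total
```

The docstring is available at run time through `help(my_product)` or `my_product.__doc__`, which is what documentation generators build on.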

Example Autogenerated Website

<[figure]

    <<<images/epydoc.pdf, scale=0.3>>>

[figure]>

Using Exceptions

  • Use the _green_@try except@_ statements to detect anomalous behaviour:
<[example] \pyfile{code/my_product_try.py} [example]>

  • Exceptions allow you to recover or fail gracefully
  • Resist the temptation to use special return values: they will backfire!
    • @(-\textcolor{blue}{1}, \textcolor{blue}{0}, \textcolor{red}{False}, \textcolor{red}{None})@
  • Fail early, fail often
  • Errors should never pass silently...
  • Unless explicitly silenced
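
Why special return values backfire can be sketched as follows. Both functions are hypothetical; the first signals an error with -1, the second raises an exception.

```python
def product_with_error_code(numbers):
    """Anti-pattern: signal an empty input with the special value -1."""
    if len(numbers) == 0:
        return -1
    total = 1
    for number in numbers:
        total *= number
    return total


def product_with_exception(numbers):
    """Signal an empty input by raising ValueError."""
    if len(numbers) == 0:
        raise ValueError("numbers must not be empty")
    total = 1
    for number in numbers:
        total *= number
    return total


# The error code is indistinguishable from a legitimate result.
assert product_with_error_code([]) == product_with_error_code([-1, 1])

# The exception cannot be mistaken for a result; the caller decides
# how to recover or how to fail.
try:
    product_with_exception([])
except ValueError as error:
    print("Cannot compute product:", error)
```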
==== Appropriate Exceptions ====[containsverbatim]

\begin{pycode}
Exception
+-- StandardError
    +-- ArithmeticError
        +-- FloatingPointError
        +-- OverflowError
        +-- ZeroDivisionError
    +-- AssertionError
    +-- IndexError
    +-- TypeError
    +-- ValueError
\end{pycode}
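
The hierarchy matters when catching: an `except` clause for a base class also catches its subclasses. The sketch below uses only Python 3 built-ins. Note that `StandardError` exists only in Python 2; in Python 3 these classes derive from `Exception` directly or through other base classes. The helper `describe_failure` is a hypothetical name.

```python
def describe_failure(operation):
    """Run operation() and report which kind of exception it raised, if any."""
    try:
        operation()
    except ArithmeticError as error:
        # Catches ZeroDivisionError, OverflowError and FloatingPointError.
        return "arithmetic problem: " + type(error).__name__
    except (IndexError, TypeError, ValueError) as error:
        return "bad input: " + type(error).__name__
    return "no failure"


print(describe_failure(lambda: 1 / 0))       # arithmetic problem: ZeroDivisionError
print(describe_failure(lambda: [][0]))       # bad input: IndexError
print(describe_failure(lambda: int("abc")))  # bad input: ValueError
```

When raising, pick the most specific class that describes the problem, so that callers can catch exactly what they are able to handle.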

@import@ Pitfalls

  • Don't use the star import: @\textcolor{red}{import} *@
    • Code is hard to read
    • Modules may overwrite each other
    • Where does this function come from?
    • You will import everything in a module
    • ...unless you are using the interpreter interactively
  • Put all imports at the beginning of the file...
  • ...unless you have a very good reason to do otherwise
==== @import foobar as fb@ \textbf{VS} @from foo import bar@ ====[containsverbatim]

<[example]
\begin{pycode}
import my_product as mp
mp.my_product([1, 2, 3])
\end{pycode}
[example]>

  • _green_+_ origin of @my\_product@ known
  • _red_--_ slightly more to type
  • _red_--_ fails only on call (late)
--1cm--

<[example]
\begin{pycode}
from my_product import my_product
my_product([1, 2, 3])
\end{pycode}
[example]>

  • _green_+_ slightly less to type
  • _green_+_ fails on import (early)
  • _red_--_ must look at @import@ for origin

Unit Tests

Outline

\tableofcontents[currentsection,currentsubsection]

Write and Run Unit Tests

  • We wish to automate testing of our software
  • Instead of testing the whole system we test units
<[block]{Definition of a Unit}
  • The smallest testable piece of code
[block]>
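
A minimal sketch of a unit test using the standard library `unittest` module. The unit under test, `my_product`, is a hypothetical function defined inline so that the example is self-contained.

```python
import unittest


def my_product(numbers):
    """Hypothetical unit under test: the product of a sequence of numbers."""
    total = 1
    for number in numbers:
        total *= number
    return total


class TestMyProduct(unittest.TestCase):
    def test_positive_integers(self):
        self.assertEqual(my_product([1, 2, 3]), 6)

    def test_zero_annihilates(self):
        self.assertEqual(my_product([4, 0, 5]), 0)

    def test_empty_sequence(self):
        self.assertEqual(my_product([]), 1)


# Run the tests programmatically; in a test file one would usually
# call unittest.main() instead.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMyProduct)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test method checks one behaviour of the unit, so a failure points directly at what broke.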

Available Packages

Version Control

Outline

\tableofcontents[currentsection,currentsubsection]

Motivation to use Version Control

<[block]{Problem 1} "Help! My code worked yesterday, but I can't recall what I changed." [block]>

  • Version control is a method to track and retrieve modifications in source code
<[block]{Problem 2} "We would like to work together, but we don't know how!" [block]>

  • Concurrent editing by several developers is possible via merging

Features

  • Checkpoint significant improvements, for example releases
  • Document developer effort
    • Who changed what, when and why?
  • Use version control for anything that's text
    • Code
    • Thesis/Papers
    • (Love) letters
  • Easy collaboration across the globe

Vocabulary

  • Modifications to code are called commits
  • Commits are stored in a repository
  • Adding commits is called committing
<[figure]

    <<<images/repository.pdf, scale=0.2>>>

[figure]>

Centralised Version Control

  • All developers connect to a single resource over the network
  • Any interaction (history, previous versions, committing) requires network access
<[figure]

    <<<images/centralised.pdf, scale=0.25>>>

[figure]>

Distributed Version Control

Distributed like Centralised

  • ... except that each developer has a complete copy of the entire repository
<[figure]

    <<<images/distributed_basic.pdf, scale=0.25>>>

[figure]>

Distributed Supports any Workflow :-)

<[figure]

    <<<images/distributed.pdf, scale=0.2>>>

[figure]>

What we will use...

<[figure]

    <<<images/git.pdf, scale=0.3>>>

[figure]>

  • More tomorrow...

Refactoring

Outline

\tableofcontents[currentsection,currentsubsection]

Refactor Continuously

  • As a program evolves it may become necessary to rethink earlier decisions and adapt the code accordingly
  • Re-organisation of your code without changing its function
  • Increase modularity by breaking large code blocks apart
  • Rename and restructure code to increase readability and reveal intention
  • Always refactor one step at a time, and use the tests to check that the code still works
  • Learn how to use automatic refactoring tools to make your life easier
  • Now is better than never
  • Although never is often better than right now

Common Refactoring Operations

  • Rename class/method/module/package/function
  • Move class/method/module/package/function
  • Encapsulate code in method/function
  • Change method/function signature
  • Organize imports (remove unused and sort)
  • Generally you will improve the readability and modularity of your code
  • Usually refactoring will reduce the number of lines of code

Refactoring Example

\pyfile{code/my_product_refactor.py}

Split into functions, and use built-ins

\pyfile{code/my_product_done.py}
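
The two referenced files are not reproduced here. A hypothetical before/after pair in the same spirit (split into functions, let built-ins do the looping) might look like this sketch. All names are assumptions.

```python
import functools
import operator


# Before (hypothetical): loop written out by hand, validation mixed in.
def my_product_before(numbers):
    total = 1
    for i in range(len(numbers)):
        if not isinstance(numbers[i], (int, float)):
            raise TypeError("not a number: %r" % (numbers[i],))
        total = total * numbers[i]
    return total


# After (hypothetical): one small function per task.
def check_all_numeric(numbers):
    for number in numbers:
        if not isinstance(number, (int, float)):
            raise TypeError("not a number: %r" % (number,))


def my_product_after(numbers):
    check_all_numeric(numbers)
    return functools.reduce(operator.mul, numbers, 1)


# Refactoring must not change behaviour, so both versions are compared.
for case in ([1, 2, 3], [2.5, 4], []):
    assert my_product_before(case) == my_product_after(case)
```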

Do not Repeat Yourself

Outline

\tableofcontents[currentsection,currentsubsection]

Do not Repeat Yourself (DRY Principle)

  • When developing software, avoid duplication
  • No cut\&paste!
  • Not just lines of code, but knowledge of all sorts
  • Do not express the same piece of knowledge in two places
  • If you need to update this knowledge you will have to update it everywhere
  • It is not a question of how this may fail, but instead a question of when
  • Categories of Duplication:
    • Imposed Duplication
    • Inadvertent Duplication
    • Impatient Duplication
    • Interdeveloper Duplication
  • If you detect duplication in code that's already written, refactor mercilessly!

Imposed Duplication

  • When duplication seems to be forced on us
  • We feel like there is no other solution
  • The environment or programming language seems to require duplication
<[example]
  • Duplication of a program version number in:
    • Source code
    • Website
    • Licence
    • README
    • Distribution package
  • Result: Incrementing the version number consistently becomes difficult
[example]>

Inadvertent Duplication

  • When duplication happens by accident
  • You don't realize that you are repeating yourself
<[example]
  • Variable name: \textcolor{blue}{@list\_of\_numbers@} instead of just @\textcolor{blue}{numbers}@
  • Type information duplicated in variable name
  • What happens if the set of possible types grows or shrinks?
  • Side effect: Type information incorrect, function may operate on any sequence such as tuples
[example]>

Impatient Duplication

  • Duplication due to sheer laziness
  • Reasons:
    • End-of-day
    • Deadline
    • Insert _blue_@pretext@_ here
<[example]
  • Copy-and-paste a snippet, instead of refactoring it into a function
  • What happens if the original code contains a bug?
  • What happens if the original code needs to be changed?
[example]>

  • By far the easiest category to avoid, but requires discipline and willingness
  • Be patient, invest time now to save time later! (especially when facing oh so important deadlines)
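
A small sketch of impatient duplication and its cure. The function names and the scaling task are hypothetical, chosen only to illustrate the copy-and-paste pattern.

```python
# Copy-and-paste version: the same normalisation appears twice, so a bug
# or a change has to be handled in two places.
def summary_duplicated(control, treated):
    control_mean = sum(control) / len(control)
    control_scaled = [value / control_mean for value in control]
    treated_mean = sum(treated) / len(treated)
    treated_scaled = [value / treated_mean for value in treated]
    return control_scaled, treated_scaled


# DRY version: the knowledge "how to scale by the mean" lives in one place.
def scale_by_mean(values):
    mean = sum(values) / len(values)
    return [value / mean for value in values]


def summary(control, treated):
    return scale_by_mean(control), scale_by_mean(treated)
```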

Interdeveloper Duplication

  • Repeated implementation by more than one developer
  • Usually concerns utility methods
  • Often caused by lack of communication
  • Or lack of a module to contain utilities
  • Or lack of library knowledge

Interdeveloper Duplication Example

  • Product function may already exist in some library
  • (Though I admit this may also be classified as impatient duplication)
--1cm--

<[example] \pyfile{code/my_product_duplication.py} [example]>
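
The referenced file is not shown here. As one concrete instance of a product function that already exists in a library, the standard library offers `math.prod` (Python 3.8 or newer). The hand-written `my_product` below is a hypothetical re-implementation for comparison.

```python
import math  # math.prod requires Python 3.8 or newer


def my_product(numbers):
    """Hand-written product (hypothetical re-implementation)."""
    total = 1
    for number in numbers:
        total *= number
    return total


# The standard library already provides the same functionality.
assert my_product([1, 2, 3]) == math.prod([1, 2, 3]) == 6
```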

Keep it Simple

Outline

\tableofcontents[currentsection,currentsubsection]

Keep it Simple (Stupid) (KIS(S) Principle)

  • Resist the urge to over-engineer
  • Write only what you need now
  • Simple is better than complex
  • Complex is better than complicated
  • Special cases aren't special enough to break the rules
  • Although practicality beats purity

Development Methodologies

Outline

\tableofcontents[currentsection]

Definition and Motivation

Outline

\tableofcontents[currentsection,currentsubsection]

What is a Development Methodology?

<[block]{Consists of:}

  • An attitude that informs the style and approach towards development
  • A set of tools and models to support that particular approach
[block]>

<[block]{Help answer the following questions:}

  • How far ahead should I plan?
  • What should I prioritize?
  • When do I write tests and documentation?
[block]>

Scenarios

  • Lone student/scientist
<[center]

    <<<images/lucky_luke.jpg, scale=0.30>>>

[center]>

  • Small team of scientists, working on a common library
  • Speed of development more important than execution speed
  • Often need to try out different ideas quickly:
    • rapid prototyping of a proposed algorithm
    • re-use/modify existing code

An Example: The Waterfall Model, Royce 1970

<[figure][ht]

    <<<images/waterfall.pdf, scale=0.2>>>

[figure]>

  • Sequential software development process
  • Originates in the manufacturing and construction industries
  • Rigid, inflexible model---focusing on one stage at a time

Agile Methods

Outline

\tableofcontents[currentsection,currentsubsection]

Agile Methods

  • Generic name for a set of more specific paradigms
  • Set of best practices
  • Particularly suited for:
    • Small teams (Fewer than 10 people)
    • Unpredictable or rapidly changing requirements...
    • ... isn't this what science is all about?

Prominent Features of Agile methods

  • Minimal planning, small development iterations
  • Design/implement/test on a modular level
  • Rely heavily on testing
  • Promote collaboration and teamwork, including frequent input from customer/boss/professor
  • Very adaptive, since nothing is set in stone

The Agile Spiral

<[figure]

    <<<images/agile.pdf, scale=0.2>>>

[figure]>

Agile methods

<[figure][ht]

    <<<images/dilbert-agile_programming.jpg, scale=0.27>>>

[figure]>

Test Driven Development

Outline

\tableofcontents[currentsection,currentsubsection]

Test Driven Development (TDD)

<[figure]

    <<<images/testdriven.pdf, scale=0.2>>>

[figure]>

  • Define unit tests first!
  • Develop one unit at a time!

Benefits of TDD

  • Encourages simple designs and inspires confidence
  • No one ever forgets to write the unit tests
  • Helps you design a good API, since you are forced to use it when testing (dogfooding)
--2em--

  • Perhaps you even want to write the documentation first?

Additional techniques

Outline

\tableofcontents[currentsection,currentsubsection]

Dealing with Bugs --- The Agile Way

  • Write a unit test to expose the bug
  • Isolate the bug using a debugger
  • Fix the code, and ensure the test passes
  • Use the test to catch the bug should it reappear (regression)
<[block]{Debugger} [block]>

Dealing with Bugs?

<[figure][ht]

    <<<images/phd_bug.jpg, scale=0.45>>>

[figure]>

Design by Contract

  • Functions carry their specifications around with them:
    • Keeping specification and implementation together makes both easier to understand
    • ...and improves the odds that programmers will keep them in sync
  • A function is defined by:
    • pre-conditions: what must be true in order for it to work correctly
    • post-conditions: what it guarantees will be true if the pre-conditions are met
  • Pre- and post-conditions constrain how the function can evolve:
    • can only ever relax pre-conditions (i.e., take a wider range of input)...
    • ...or tighten post-conditions (i.e., produce a narrower range of output)
    • tightening pre-conditions, or relaxing post-conditions, would violate the function's contract with its callers
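
A minimal sketch of a function that carries its contract with it, using plain `assert` statements. The function `normalise` and its specific conditions are hypothetical.

```python
def normalise(weights):
    """Scale weights so that they sum to 1 (hypothetical example)."""
    # Pre-conditions: what must hold for the function to work.
    assert len(weights) > 0, "weights must not be empty"
    assert all(weight >= 0 for weight in weights), "weights must be non-negative"
    total = sum(weights)
    assert total > 0, "at least one weight must be positive"

    result = [weight / total for weight in weights]

    # Post-conditions: what the function guarantees in return.
    assert len(result) == len(weights)
    assert abs(sum(result) - 1.0) < 1e-9
    return result
```

Accepting negative weights in a later version would relax a pre-condition, which is allowed. Returning weights that no longer sum to 1 would relax a post-condition and break callers.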

Defensive Programming

  • Specify pre- and post-conditions using assertions:
    • @\textcolor{green}{assert} \textcolor{red}{len}(\textcolor{blue}{numbers}) > \textcolor{cyan}{0}@
    • @\textcolor{green}{raise} \textcolor{blue}{AssertionError}@
  • Use assertions liberally
  • Program as if the rest of the world is out to get you!
  • Fail early, fail often, fail better!
  • The less distance there is between the error and you detecting it, the easier it will be to find and fix
  • It's never too late to do it right
    • Every time you fix a bug, put in an assertion and a comment
    • If you made the error, the right code can't be obvious
    • You should protect yourself against someone “simplifying” the bug back in
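
A sketch of an assertion plus comment left behind after a bug fix. The function and the bug history are hypothetical.

```python
def my_mean(numbers):
    """Return the arithmetic mean of numbers (hypothetical example)."""
    # Guard added after a (hypothetical) bug: an empty list used to surface
    # much later as a ZeroDivisionError. Do not "simplify" this check away.
    assert len(numbers) > 0, "my_mean() needs at least one number"
    return sum(numbers) / len(numbers)
```

Note that `assert` statements are skipped when Python runs with the `-O` option, so checks on external input that must always run are better written as an explicit `raise`.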

Pair Programming

  • Two developers, one computer
  • Two roles: driver and navigator
  • Driver sits at keyboard
    • Can focus on the tactical aspects
    • See only the “road” ahead
  • Navigator observes and instructs
    • Can concentrate on the “map”
    • Pay attention to the big picture
  • Switch roles every so often!
  • In a team: switch pairs every so often!

Pair Programming --- Benefits

  • Know-how is shared/transferred:
    • Specifics of the system
    • Tool usage (editor, interpreter, debugger, version control)
    • Coding style, idioms, knowledge of library
  • Less likely to:
    • Surf the web, read personal email
    • Be interrupted by others
    • Cheat themselves (being impatient, taking shortcuts)
  • Pairs produce code which:\footnote[1]{Cockburn, Alistair, Williams, Laurie (2000). \href{http://collaboration.csc.ncsu.edu/laurie/Papers/XPSardinia.PDF}{\sl The Costs and Benefits of Pair Programming}}
    • Is shorter
    • Incorporates better designs
    • Contains fewer defects...
    • ... 1+1 > 2 !

Optimization for Speed --- My Point of View

  • Readable code is usually better than fast code
  • Programmer/Scientist time is more valuable than computer time
  • Don't optimize early: ensure the code works, has tests and is documented...
  • ...before starting to optimize
  • Only optimize if it is absolutely necessary
  • Only optimize your bottlenecks
  • ...and identify these using a profiler
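
One way to find bottlenecks is the standard library profiler `cProfile` together with `pstats`. The workload below (`slow_sum_of_squares`, `analysis`) is a hypothetical stand-in for real analysis code.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive (hypothetical) hot spot."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def analysis():
    return [slow_sum_of_squares(20000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
analysis()
profiler.disable()

# Sort by cumulative time and show the five most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists call counts and time per function, which shows where optimization effort would actually pay off.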

Profilers and Viewers

<[block]{Profiler}

[block]>

<[block]{Viewer}

[block]>

Prototyping

  • Ever tried to hit a moving target?
  • If you are unsure how to implement something, write a prototype
  • Hack together a proof of concept quickly
  • No tests, no documentation, keep it simple (stupid)
  • Use this to explore the feasibility of your idea
  • When you are ready, scrap the prototype and start with the unit tests
  • In the face of ambiguity, refuse the temptation to guess

Quality Assurance

  • The techniques I have mentioned above help to assure high quality of the software
  • Quality is not just testing:
    • Trying to improve the quality of software by doing more testing is like trying to lose weight by weighing yourself more often
  • Quality is designed in (For example, by using the DRY and KISS principles)
  • Quality is monitored and maintained through the whole software life cycle

Zen of Python

Outline

\tableofcontents[currentsection]

==== The Zen of Python ====[containsverbatim]

\begin{pyconcode}
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
\end{pyconcode}