- Many scientists write code regularly but few have been formally trained to do so
- Best practices evolved from programmers' folk wisdom
- They increase productivity and decrease stress
- Development methodologies, such as Agile Programming and Test Driven Development, are established in the software engineering industry
- We can learn a lot from them to improve our coding skills
- When programming in Python, always bear in mind the \\Zen of Python
\tableofcontents
\tableofcontents[currentsection]
\tableofcontents[currentsection,currentsubsection]
- Readability counts
- Explicit is better than implicit
- Beautiful is better than ugly
- Give your variables intention revealing names
- For example: _blue_@numbers@_ instead of _blue_@nu@_
- For example: _blue_@numbers@_ instead of \textcolor{blue}{@list\_of\_float\_numbers@}
- See also: \href{http://tottinge.blogsome.com/meaningfulnames/}{Ottinger's Rules for Naming}
<[example] \pyfile{code/my_product.py} [example]>
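As a minimal sketch of intention-revealing names (the function below is a hypothetical stand-in, not necessarily the contents of the example file), compare two versions of the same computation:

```python
# Hard to read: what do "f", "l" and "p" stand for?
def f(l):
    p = 1
    for x in l:
        p *= x
    return p


# Same logic, but the names reveal the intention
def my_product(numbers):
    total = 1
    for number in numbers:
        total *= number
    return total


print(f([1, 2, 3]) == my_product([1, 2, 3]))
```

Both functions behave identically; only the second one can be understood without reading its body.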
- Format code to coding conventions
- For example: \href{http://www.python.org/dev/peps/pep-0008/}{PEP-8}
- Or use a consistent style (especially when collaborating)
- Conventions specify:
- Variable naming
- Indentation
- Imports
- Maximum line length
- Blank lines, whitespace, comments
- Use automated tools to check adherence (aka static checking):
- Minimum requirement: at least a single line docstring
- Not only for others, but also for yourself!
- Serves as on-line help in the interpreter
- Document arguments and return objects, including types
- Use the \href{https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt}{numpy docstring conventions}
- Use tools to automatically generate a website from docstrings
- For complex algorithms, document every line, and include equations in docstring
- When your project gets bigger: provide a how-to, FAQ or quick-start on your website
\pyfile{code/my_product_docstring.py}
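A docstring in the numpy style might look roughly like the sketch below. The function body and the wording are assumptions made for illustration; only the section layout (Parameters, Returns) follows the numpy conventions.

```python
def my_product(numbers):
    """Compute the product of a sequence of numbers.

    Parameters
    ----------
    numbers : sequence of float
        Numbers to multiply together.

    Returns
    -------
    product : float
        Product of all elements of `numbers`; 1 for an empty sequence.
    """
    total = 1
    for number in numbers:
        total *= number
    return total
```

In the interpreter, `help(my_product)` then displays this text as on-line help.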
<[figure]
<<<images/epydoc.pdf, scale=0.3>>>
[figure]>
- Use exceptions and the _green_@try except@_ statement to detect and handle anomalous behaviour:
- Allow you to recover or fail gracefully
- Resist the temptation to use special return values: they will backfire!
- @(-\textcolor{blue}{1}, \textcolor{blue}{0}, \textcolor{red}{False}, \textcolor{red}{None})@
- Fail early, fail often
- Errors should never pass silently...
- Unless explicitly silenced
- Python has a \href{http://docs.python.org/library/exceptions.html}{built-in Exception hierarchy}
- These will suit your needs most of the time; if not, subclass them
\begin{pycode}
+-- ArithmeticError
     +-- FloatingPointError
     +-- OverflowError
     +-- ZeroDivisionError
+-- AssertionError
+-- IndexError
+-- TypeError
+-- ValueError
\end{pycode}
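The advice above (raise instead of returning a special value, subclass a built-in exception when needed) can be sketched as follows. The names `EmptyInputError` and `mean` are hypothetical, and Python 3 is assumed.

```python
class EmptyInputError(ValueError):
    """Raised when a function receives an empty sequence (hypothetical)."""


def mean(numbers):
    """Return the arithmetic mean, failing early on empty input."""
    if len(numbers) == 0:
        # Raise instead of returning a special value such as -1 or None
        raise EmptyInputError("mean() requires at least one number")
    return sum(numbers) / len(numbers)


try:
    result = mean([])
except EmptyInputError as error:
    # Recover gracefully: report the problem and fall back to a default
    print("Could not compute mean:", error)
    result = float("nan")
```

Because `EmptyInputError` subclasses `ValueError`, callers that already catch `ValueError` keep working.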
- Don't use the star import: @\textcolor{red}{import} *@
- Code is hard to read
- Modules may overwrite each other
- Where does this function come from?
- You will import everything in a module
- ...unless you are using the interpreter interactively
- Put all imports at the beginning of the file...
- ...unless you have a very good reason to do otherwise
- _green_+_ origin of @my\_product@ known
- _red_--_ slightly more to type
- _red_--_ fails only on call (late)
<[example]
\begin{pycode}
from my_product import my_product
my_product([1,2,3])
\end{pycode}
[example]>
- _green_+_ slightly less to type
- _green_+_ fails on import (early)
- _red_--_ must look at @import@ for origin
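The "modules may overwrite each other" problem with star imports can be illustrated with two standard-library modules that both define `sqrt`. This is only a sketch: `exec` is used here to simulate the two star imports in a separate namespace so that the example stays self-contained.

```python
import cmath
import math

# Simulate two star imports into the same namespace
namespace = {}
exec("from math import *", namespace)
sqrt_after_math = namespace["sqrt"]
exec("from cmath import *", namespace)
sqrt_after_cmath = namespace["sqrt"]

# The second star import silently replaced the first sqrt
print(sqrt_after_math is math.sqrt)
print(sqrt_after_cmath is cmath.sqrt)

# Explicit imports keep the origin of each function visible
print(math.sqrt(4), cmath.sqrt(-1))
```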
\tableofcontents[currentsection,currentsubsection]
- We wish to automate testing of our software
- Instead of testing the whole system we test units
- The smallest testable piece of code
- In Python we have several packages available:
- Tests increase the confidence that your code works correctly, not only for yourself but also for your reviewers
- Tests are the only way to trust your code
- It might take you a while to get used to writing them, but it will pay off quite rapidly
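A minimal unit test using the standard-library `unittest` package might look like this. The function `my_product` is a hypothetical unit under test, and the convention that the empty product equals 1 is an assumption.

```python
import unittest


def my_product(numbers):
    """Multiply all numbers in a sequence (hypothetical unit under test)."""
    total = 1
    for number in numbers:
        total *= number
    return total


class TestMyProduct(unittest.TestCase):
    def test_integers(self):
        self.assertEqual(my_product([1, 2, 3]), 6)

    def test_empty_sequence(self):
        # Convention assumed here: the empty product is 1
        self.assertEqual(my_product([]), 1)

    def test_floats(self):
        self.assertAlmostEqual(my_product([0.5, 4.0]), 2.0)


# Run the tests programmatically; in a script, unittest.main() is the usual choice
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMyProduct)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```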
\tableofcontents[currentsection,currentsubsection]
<[block]{Problem 1} "Help! My code worked yesterday, but I can't recall what I changed." [block]>
- Version control is a method to track and retrieve modifications in source code
- Concurrent editing by several developers is possible via merging
- Checkpoint significant improvements, for example releases
- Document developer effort
- Who changed what, when and why?
- Use version control for anything that's text
- Code
- Thesis/Papers
- (Love) letters
- Easy collaboration across the globe
- Modifications to code are called commits
- Commits are stored in a repository
- Adding commits is called committing
<<<images/repository.pdf, scale=0.2>>>
[figure]>
- All developers connect to a single resource over the network
- Any interaction (history, previous versions, committing) requires network access
<<<images/centralised.pdf, scale=0.25>>>
[figure]>
- Example systems: \href{http://subversion.tigris.org/}{Subversion (svn)}, \href{http://www.cvshome.org/}{Concurrent Versions System (cvs)}
- Several copies of the repository may exist all over the place
- Network access only required when synchronising repositories
- Much more flexible than centralised
- Widely regarded as state-of-the-art
- Example systems: \href{http://git-scm.com/}{git}, \href{http://mercurial.selenic.com/}{Mercurial (hg)}, \href{http://wiki.bazaar.canonical.com/DataStructures}{Bazaar (bzr)}
- ... except that each developer has a complete copy of the entire repository
<<<images/distributed_basic.pdf, scale=0.25>>>
[figure]>
<[figure]
<<<images/distributed.pdf, scale=0.2>>>
[figure]>
<[figure]
<<<images/git.pdf, scale=0.3>>>
[figure]>
- More tomorrow...
\tableofcontents[currentsection,currentsubsection]
- As a program evolves it may become necessary to rethink earlier decisions and adapt the code accordingly
- Re-organisation of your code without changing its function
- Increase modularity by breaking large code blocks apart
- Rename and restructure code to increase readability and reveal intention
- Always refactor one step at a time, and use the tests to check that the code still works
- Learn how to use automatic refactoring tools to make your life easier
- For example: \href{http://rope.sourceforge.net/ropeide.html}{ropeide}
- Now is better than never
- Although never is often better than right now
- Rename class/method/module/package/function
- Move class/method/module/package/function
- Encapsulate code in method/function
- Change method/function signature
- Organize imports (remove unused and sort)
- Generally you will improve the readability and modularity of your code
- Usually refactoring will reduce the lines of code
\pyfile{code/my_product_refactor.py}
\pyfile{code/my_product_done.py}
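One refactoring step, extracting functions from a larger block, could look like the sketch below. All names are hypothetical, and the final assertion stands in for the test suite that confirms the behaviour is unchanged.

```python
# Before: one function mixes parsing and computing
def report_before(line):
    numbers = [float(field) for field in line.split(",")]
    total = 0.0
    for number in numbers:
        total += number
    return total / len(numbers)


# After: each step has a name and can be tested on its own
def parse_numbers(line):
    return [float(field) for field in line.split(",")]


def mean(numbers):
    return sum(numbers) / len(numbers)


def report_after(line):
    return mean(parse_numbers(line))


# The refactoring must preserve the existing behaviour
assert report_before("1,2,3") == report_after("1,2,3")
```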
\tableofcontents[currentsection,currentsubsection]
- When developing software, avoid duplication
- No cut\&paste!
- Not just lines of code, but knowledge of all sorts
- Do not express the same piece of knowledge in two places
- If you need to update this knowledge you will have to update it everywhere
- It is not a question of how this may fail, but instead a question of when
- Categories of Duplication:
- Imposed Duplication
- Inadvertent Duplication
- Impatient Duplication
- Interdeveloper Duplication
- If you detect duplication in code that's already written, refactor mercilessly!
- When duplication seems to be forced on us
- We feel like there is no other solution
- The environment or programming language seems to require duplication
- Duplication of a program version number in:
- Source code
- Website
- Licence
- README
- Distribution package
- Result: incrementing the version number consistently becomes difficult
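One way to reduce this kind of duplication is to write the version number down once and derive every other occurrence from it. The sketch below assumes a hypothetical project layout; the helper functions and the version string are invented for illustration.

```python
# Hypothetical package metadata: the version is written down exactly once
__version__ = "1.2.0"


def readme_header(project_name):
    """Build the README title from the single version constant."""
    return "{} {}".format(project_name, __version__)


def download_filename(project_name):
    """Build the distribution file name from the same constant."""
    return "{}-{}.tar.gz".format(project_name, __version__)


print(readme_header("my_project"))
print(download_filename("my_project"))
```

Bumping the version then means editing a single line.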
- When duplication happens by accident
- You don't realize that you are repeating yourself
- Variable name: \textcolor{blue}{@list\_of\_numbers@} instead of just @\textcolor{blue}{numbers}@
- Type information duplicated in variable name
- What happens if the set of possible types grows or shrinks?
- Side effect: the type information is wrong anyway, since the function may operate on any sequence, such as a tuple
- Duplication due to sheer laziness
- Reasons:
- End-of-day
- Deadline
- Insert _blue_@pretext@_ here
- Copy-and-paste a snippet, instead of refactoring it into a function
- What happens if the original code contains a bug?
- What happens if the original code needs to be changed?
- By far the easiest category to avoid, but requires discipline and willingness
- Be patient, invest time now to save time later! (especially when facing oh so important deadlines)
- Repeated implementation by more than one developer
- Usually concerns utility methods
- Often caused by lack of communication
- Or lack of a module to contain utilities
- Or lack of library knowledge
- Product function may already exist in some library
- (Though I admit this may also be classified as impatient duplication)
<[example] \pyfile{code/my_product_duplication.py} [example]>
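As a sketch of the "it may already exist in some library" point, a product function can delegate to the standard library instead of re-implementing the loop. The name `my_product` is kept from the examples above; its exact original implementation is not assumed.

```python
import functools
import operator


def my_product(numbers):
    """Product of a sequence, delegating to the standard library."""
    # reduce applies operator.mul cumulatively; 1 is the result for empty input
    return functools.reduce(operator.mul, numbers, 1)


print(my_product([1, 2, 3]))
```

Python 3.8 and later also provide `math.prod`, which makes even this wrapper unnecessary.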
\tableofcontents[currentsection,currentsubsection]
- Resist the urge to over-engineer
- Write only what you need now
- Simple is better than complex
- Complex is better than complicated
- Special cases aren't special enough to break the rules
- Although practicality beats purity
\tableofcontents[currentsection]
\tableofcontents[currentsection,]
<[block]{Consists of:}
- An attitude that informs the style and approach towards development
- A set of tools and models to support that particular approach
<[block]{Help answer the following questions:}
- How far ahead should I plan?
- What should I prioritize?
- When do I write tests and documentation?
- Lone student/scientist
<<<images/lucky_luke.jpg, scale=0.30>>>
[center]>
- Small team of scientists, working on a common library
- Speed of development more important than execution speed
- Often need to try out different ideas quickly:
- rapid prototyping of a proposed algorithm
- re-use/modify existing code
<[figure][ht]
<<<images/waterfall.pdf, scale=0.2>>>
[figure]>
- Sequential software development process
- Originates in the manufacturing and construction industries
- Rigid, inflexible model---focusing on one stage at a time
\tableofcontents[currentsection,currentsubsection]
- Generic name for set of more specific paradigms
- Set of best practices
- Particularly suited for:
- Small teams (fewer than 10 people)
- Unpredictable or rapidly changing requirements...
- ... isn't this what science is all about?
- Minimal planning, small development iterations
- Design/implement/test on a modular level
- Rely heavily on testing
- Promote collaboration and teamwork, including frequent input from customer/boss/professor
- Very adaptive, since nothing is set in stone
<[figure]
<<<images/agile.pdf, scale=0.2>>>
[figure]>
<[figure][ht]
<<<images/dilbert-agile_programming.jpg, scale=0.27>>>
[figure]>
\tableofcontents[currentsection,currentsubsection]
<[figure]
<<<images/testdriven.pdf, scale=0.2>>>
[figure]>
- Define unit tests first!
- Develop one unit at a time!
- Encourages simple designs and inspires confidence
- No one ever forgets to write the unit tests
- Helps you design a good API, since you are forced to use it when testing (dogfooding)
- Perhaps you may want to even write the documentation first?
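Writing the documentation first and the tests first can be combined with the standard-library `doctest` module: the examples in the docstring are written before the function body and then executed as tests. This is a minimal sketch with a hypothetical `my_product` function.

```python
import doctest


def my_product(numbers):
    """Return the product of all numbers in a sequence.

    The examples below were written before the function body.

    >>> my_product([1, 2, 3])
    6
    >>> my_product([])
    1
    """
    total = 1
    for number in numbers:
        total *= number
    return total


# Parse the docstring examples and run them against the implementation
test = doctest.DocTestParser().get_doctest(
    my_product.__doc__, {"my_product": my_product}, "my_product", None, 0
)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results)
```

In a module, calling `doctest.testmod()` is the more common way to run all docstring examples at once.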
\tableofcontents[currentsection,currentsubsection]
- Write a unit test to expose the bug
- Isolate the bug using a debugger
- Fix the code, and ensure the test passes
- Use the test to catch the bug should it reappear (regression)
- A program that runs your code one step at a time and lets you inspect the current state
- For example:
<[figure][ht]
<<<images/phd_bug.jpg, scale=0.45>>>
[figure]>
- Functions carry their specifications around with them:
- Keeping specification and implementation together makes both easier to understand
- ...and improves the odds that programmers will keep them in sync
- A function is defined by:
- pre-conditions: what must be true in order for it to work correctly
- post-conditions: what it guarantees will be true if the pre-conditions are met
- Pre- and post-conditions constrain how the function can evolve:
- can only ever relax pre-conditions (i.e., take a wider range of input)...
- ...or tighten post-conditions (i.e., produce a narrower range of output)
- tightening pre-conditions, or relaxing post-conditions, would violate the function's contract with its callers
- Specify pre- and post-conditions using assertions:
- @\textcolor{green}{assert} \textcolor{red}{len}(\textcolor{blue}{numbers}) > \textcolor{cyan}{0}@
- @\textcolor{green}{raise} \textcolor{blue}{AssertionError}@
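Pre- and post-conditions expressed as assertions might look like the following sketch. The function `largest` is a hypothetical example chosen because its contract is easy to state exactly.

```python
def largest(numbers):
    """Return the largest element of a non-empty sequence."""
    # Pre-condition: the caller must supply at least one number
    assert len(numbers) > 0, "numbers must not be empty"

    result = numbers[0]
    for number in numbers[1:]:
        if number > result:
            result = number

    # Post-conditions: the result is an element and no element exceeds it
    assert result in numbers
    assert all(result >= number for number in numbers)
    return result


print(largest([3, 1, 2]))
```

Note that Python skips `assert` statements when run with the `-O` option, so assertions check the contract during development rather than validate user input.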
- Use assertions liberally
- Program as if the rest of the world is out to get you!
- Fail early, fail often, fail better!
- The less distance there is between the error and you detecting it, the easier it will be to find and fix
- It's never too late to do it right
- Every time you fix a bug, put in an assertion and a comment
- If you made the error, the right code can't be obvious
- You should protect yourself against someone “simplifying” the bug back in
- Two developers, one computer
- Two roles: driver and navigator
- Driver sits at keyboard
- Can focus on the tactical aspects
- See only the “road” ahead
- Navigator observes and instructs
- Can concentrate on the “map”
- Pay attention to the big picture
- Switch roles every so often!
- In a team: switch pairs every so often!
- Know-how is shared/transferred:
- Specifics of the system
- Tool usage (editor, interpreter, debugger, version control)
- Coding style, idioms, knowledge of library
- Less likely to:
- Surf the web, read personal email
- Be interrupted by others
- Cheat themselves (being impatient, taking shortcuts)
- Pairs produce code which:\footnote[1]{Cockburn, Alistair, Williams, Laurie (2000). \href{http://collaboration.csc.ncsu.edu/laurie/Papers/XPSardinia.PDF}{\sl The Costs and Benefits of Pair Programming}}
- Is shorter
- Incorporates better designs
- Contains fewer defects...
- ... 1+1 > 2 !
- Readable code is usually better than fast code
- Programmer/Scientist time is more valuable than computer time
- Don't optimize early: ensure the code works, has tests and is documented...
- ...before starting to optimize
- Only optimize if it is absolutely necessary
- Only optimize your bottlenecks
- ...and identify these using a profiler
<[block]{Profiler}
- A tool to measure and provide statistics on the execution of code.
<[block]{Viewer}
- Viewers display the profiler output, usually a call-graph
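A bottleneck can be located with the standard-library profiler `cProfile`, with `pstats` used to format the statistics. The sketch below profiles a deliberately naive, hypothetical function and prints the five most expensive entries by cumulative time.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(count):
    """Deliberately naive loop, used here only as something to profile."""
    total = 0
    for value in range(count):
        total += value * value
    return total


# Profile a single call of the function
profiler = cProfile.Profile()
profiler.runcall(slow_sum_of_squares, 100000)

# Collect the statistics in a string, sorted by cumulative time
stream = io.StringIO()
statistics = pstats.Stats(profiler, stream=stream)
statistics.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

For a whole script, running `python -m cProfile script.py` gives a similar report without modifying the code.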
- Ever tried to hit a moving target?
- If you are unsure how to implement something, write a prototype
- Hack together a proof of concept quickly
- No tests, no documentation, keep it simple (stupid)
- Use this to explore the feasibility of your idea
- When you are ready, scrap the prototype and start with the unit tests
- In the face of ambiguity, refuse the temptation to guess
- The techniques I have mentioned above help to assure high quality of the software
- Quality is not just testing:
- Trying to improve the quality of software by doing more testing is like trying to lose weight by weighing yourself more often
- Quality is designed in (for example, by using the DRY and KISS principles)
- Quality is monitored and maintained through the whole software life cycle
\tableofcontents[currentsection]
==== The Zen of Python ====[containsverbatim]
\begin{pyconcode}
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
\end{pyconcode}