The (Scientist's) Python Inferno
This is a brief description of some of the common stumbling points in Python for scientific programmers, named in deference to The R Inferno (which is much more expansive).
- Indexing starts at 0, not 1
- Dividing one integer by another yields an integer (Python 2 only)
- Some methods change an object in place and return None
- Data type conversion issues
If you want the first item in a list, string, tuple, etc., its index is 0, not 1. For example,
>>> my_list = [1, 2, 3, 4]
>>> my_list[0]
1
>>> my_list[1]
2
Zero-based indexing is standard in most programming languages, but R and Matlab start counting at 1.
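As a minimal sketch of how zero-based indexing plays out in practice, the snippet below picks the first and last items of a list and lists every valid index. The name `samples` and its contents are hypothetical example data, not part of the original text.

```python
samples = ['a', 'b', 'c', 'd']  # hypothetical example data

first = samples[0]                 # the first item lives at index 0
last = samples[len(samples) - 1]   # the last item lives at index length - 1
also_last = samples[-1]            # negative indices count back from the end

# range(n) yields 0 .. n-1, which matches the valid indices exactly
indices = list(range(len(samples)))

print(first, last, also_last, indices)
```

Asking for `samples[len(samples)]` raises an `IndexError`, which is the usual symptom of carrying a 1-based habit over from R or Matlab.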
In Python 2, dividing an integer by another integer returns an integer, even when the two numbers are not evenly divisible. For example,
>>> 2 / 3
0
>>> -3 / 2
-2
The resulting values are rounded down to the nearest integer (toward negative infinity, which is why -3 / 2 gives -2 rather than -1). This happens because computers pay attention to the type of data they are working with and often maintain that type when combining two values of the same type. However, since this result is clearly wrong to most humans (and to all scientists), Python 3 now does this division as expected. In Python 2, there are three approaches to fix this integer division problem:
- Import the expected functionality by typing
from __future__ import division
at the beginning of an interactive session or at the beginning of your script.
- Convert one or more of the integers to floats prior to doing the calculation:
float(2) / 3
- Add a decimal point to one of the integers so that its type is already float:
2 / 3.0
Note that this last approach only works for numbers typed directly into the code, not for values stored in variables, so approaches (1) and (2) are preferred.
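The float-conversion approach can be wrapped in a small helper so that it also works for values held in variables. This is only a sketch: `safe_ratio` is a hypothetical name introduced for illustration. The `//` operator, which performs floor division explicitly in both Python 2 and Python 3, is shown alongside it for comparison.

```python
def safe_ratio(numerator, denominator):
    """Divide two numbers, always returning a float."""
    # Converting one operand to float forces true division,
    # regardless of the Python version in use
    return float(numerator) / denominator


ratio = safe_ratio(2, 3)

# Floor division rounds toward negative infinity in both Python 2 and 3
floored_positive = 2 // 3
floored_negative = -3 // 2

print(ratio, floored_positive, floored_negative)
```

Using `//` whenever integer division is actually intended makes the code behave the same way under both versions.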
Methods are functions that are attached to objects of a particular type.
Some of these methods return modified versions of the object and do not change the
object itself, but some modify the object and return None. For example,
all lists have a method called sort that sorts the values in the list.
When you call sort it sorts the list in place and returns None. So,
>>> my_list = ['A', 'F', 'D', 'B', 'E', 'C']
>>> my_new_list = my_list.sort()
>>> print(my_new_list)
None
>>> my_list
['A', 'B', 'C', 'D', 'E', 'F']
Misunderstanding this behavior leads to a common bug: assigning the result of sort back to the same name accidentally deletes the values in the list:
>>> my_list = ['A', 'F', 'D', 'B', 'E', 'C']
>>> my_list = my_list.sort()
>>> print(my_list)
None
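One way to sidestep this bug is the built-in `sorted` function, which returns a new list and leaves the original alone. The sketch below contrasts the two; the variable names are illustrative only.

```python
original = ['A', 'F', 'D', 'B', 'E', 'C']

# sorted() builds and returns a new list; the input is left untouched
ordered = sorted(original)

# list.sort() reorders the list in place and returns None
in_place = list(original)      # work on a copy to keep `original` for comparison
returned = in_place.sort()

print(original)
print(ordered)
print(in_place)
print(returned)
```

As a rule of thumb, call `my_list.sort()` on its own line when the original order is no longer needed, and use `sorted(my_list)` when the result is going to be assigned to a name.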
Most data type conversion in Python is seamless (unlike a certain capital letter language), but there are still tricky conversions in places.
Lists of lists will not convert to NumPy structured arrays properly, but lists of tuples will.
>>> import numpy as np
>>> my_list = [['A', 1], ['B', 2], ['C', 3]]
>>> np.array(my_list, dtype={'names': ['ID', 'Value'], 'formats': ['a2', 'i2']})
Traceback (most recent call last):
File "<string>", line 1, in <fragment>
TypeError: expected a readable buffer object
>>> my_list = [('A', 1), ('B', 2), ('C', 3)]
>>> np.array(my_list, dtype={'names': ['ID', 'Value'], 'formats': ['a2', 'i2']})
array([('A', 1), ('B', 2), ('C', 3)],
dtype=[('ID', '|S2'), ('Value', '<i2')])
This can be easily addressed using a list comprehension:
>>> my_list = [['A', 1], ['B', 2], ['C', 3]]
>>> my_list = [tuple(row) for row in my_list]
>>> my_list
[('A', 1), ('B', 2), ('C', 3)]
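Putting the conversion and the array construction together might look like the sketch below. The helper name `rows_to_structured` is hypothetical, and the dtype uses `'U2'` (a two-character text field) rather than the `'a2'` format shown above, on the assumption that the code runs under Python 3 with a recent NumPy.

```python
import numpy as np


def rows_to_structured(rows, dtype):
    """Convert a list of row lists into a NumPy structured array."""
    # Each record must be a tuple for NumPy to map it onto the named fields
    return np.array([tuple(row) for row in rows], dtype=dtype)


# Assumed field layout: a short text ID and a 16-bit integer value
record_dtype = [('ID', 'U2'), ('Value', 'i2')]
table = rows_to_structured([['A', 1], ['B', 2], ['C', 3]], record_dtype)

print(table['ID'])
print(table['Value'].sum())
```

Once the array is built, each field can be pulled out by name, as in `table['Value']`.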