
Updates in the Trunk since the last release

Reviewed up to git log -p rel-0.5... (checked with: git log -p rel-0.5... | grep Merge | less): 3ef4a040c6a58ab4246e8ada429cbca9ee4527c4 (merge #857, August 16th), d52fc2e53b329cb4d82f3fde3449127bba8b64f8 (merge #889, August 27th).

  • Make theano.grad able to differentiate between a gradient that is not implemented and one that is undefined. Op.grad functions should return theano.gradient.{grad_not_implemented,grad_undefined} (Ian G.) (see the sketch after this list) TODO: document those functions in doc/tutorial/extending_theano.txt
  • theano.grad(m, x), where m = mean(log(1 - sigm(x))) and x is a scalar, now has the stabilization optimization applied correctly. (Pascal L.)
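A minimal sketch of the new grad contract; MyOp and its inputs are hypothetical, only the two helpers come from theano.gradient:

```python
import theano
from theano.gradient import grad_not_implemented, grad_undefined

class MyOp(theano.Op):
    # make_node()/perform() omitted; MyOp is a hypothetical Op taking a
    # float tensor `x` and an integer `index`
    def grad(self, inputs, output_gradients):
        x, index = inputs
        return [
            # gradient wrt x is mathematically defined, just not written yet:
            grad_not_implemented(self, 0, x),
            # gradient wrt a discrete index is mathematically undefined:
            grad_undefined(self, 1, index,
                           'grad is undefined for an integer index'),
        ]
```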
Bug fixes
  • Outputs of Scan nodes could contain corrupted values: some parts of the output would be repeated a second time, instead of the correct values. It happened randomly, and quite infrequently, but the bug has been present (both in Python and Cython) since April 2011. (Pascal L.)
  • In Sparse sandbox, fix the grad of theano.sparse.sandbox.sp.row_scale. It did not return the right number of elements. (Frederic B.)
  • set_subtensor(x[int vector], new_value), when moved to the GPU, was transformed into inc_subtensor on the GPU. We now have a correct (but slow) GPU implementation (see the sketch after this list). Note 1: set_subtensor(x[slice[,...]], new_value) was working correctly in all cases, as was inc_subtensor(*, *). Note 2: if your code was affected by the incorrect behavior, we now print a warning by default. (Frederic B.)
  • Fixed an issue whereby config values were used as default arguments, with those defaults then stuck at old values if the config variables were changed during program execution. (David W-F)
  • Fixed many subtle bugs involving mutable default arguments which may have led to unexpected behaviour, such as objects sharing instance variables they were not supposed to share. (David W-F)
  • Correctly record the GPU device number used when we let the driver select it. (Frederic B.)
  • CAReduce with NaN in its inputs did not return the correct output. (Pascal L.)
    • This is used in tensor.{all,any,max,mean,prod,sum} and in the grad of PermuteRowElements.
  • The grad of TensorDot was returning the wrong shape for some combinations of axes. We now raise NotImplementedError in those cases. (Frederic B.)
  • conv2d with subsample >2 returned wrong values. (Pascal L.)
    • Fixed when mode==valid, disabled when mode==full
  • The theano.sparse.CSMGrad op (generated by the grad of CSM) did not correctly handle unsorted inputs, nor gradients that are more sparse than the input, and returned a bad result in those cases. This can only happen when a sparse input of a Theano function is not sorted, which occurs for example with sparse advanced indexing from scipy. The usual symptom was NaNs in the graph. (Yann Dauphin)
  • UsmmCSCDense, the optimized version of theano.sparse._dot(CSC matrix, dense), did not correctly handle non-contiguous inputs/outputs. (Pascal L.)
  • Fixed a corner case in CVM updates. (Pascal L.) This happens when, after optimization, the update to a shared variable is the shared variable itself.
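As a reference for the set_subtensor fix above, a minimal sketch of the integer-vector indexing pattern that now has a correct GPU implementation:

```python
import numpy
import theano
import theano.tensor as T

x = T.vector('x')
idx = T.ivector('idx')
new_vals = T.vector('new_vals')
# assign new values at integer-vector indices; this pattern now has a
# correct (but slow) implementation when moved to the GPU
y = T.set_subtensor(x[idx], new_vals)
f = theano.function([x, idx, new_vals], y)
print(f(numpy.zeros(5, dtype=theano.config.floatX),
        numpy.array([1, 3], dtype='int32'),
        numpy.array([7, 9], dtype=theano.config.floatX)))
# -> [ 0.  7.  0.  9.  0.]
```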
Documentation
  • Added documentation to the tutorial on how to extend Theano. It explains how to make a Theano Op from a Python function. http://deeplearning.net/software/theano/tutorial/extending_theano.html (Frederic B.)
  • New installation instructions for Windows using EPD (Pascal L.)
  • New installation method on Windows using a Linux VM from ContinuumIO (Frederic B.)
  • Revisions of the Theano tutorial and addition of exercises to it. (Eric L.)
  • New tutorial on sparse variables. (Nicolas B., Sébastien Lemieux, Frederic Bastien) http://www.deeplearning.net/software/theano/tutorial/sparse.html
  • Installation documentation for CentOS6 (Frederic B.)
  • Installation documentation for Ubuntu (with GPU) (Frederic B., Matthias Zoehrer)
  • Doc typo fixes, doc updates, better error messages: Olivier D., David W.F., Frederic B., James B., Matthew Rocklin, Ian G.
Interface changes
  • In 0.5, we removed the deprecated sharedvar.value property. Now we raise an error if you access it. (Frederic B.)
  • theano.function does not accept duplicate inputs, so function([x, x], ...) does not work anymore. (Pascal L.)
  • theano.function now raises an error if some of the provided inputs are not part of the computational graph needed to compute the output, for instance, function([x, y], [y]). You can use the kwarg on_unused_input={'raise', 'warn', 'ignore'} to control this (see the sketch after this list). (Pascal L.)
  • New Theano flag "on_unused_input" that defines the default value of the previous point. (Frederic B.)
  • tensor.alloc() now raises an error at graph-build time when we try to create fewer dimensions than the provided value has. In the past, the error was raised at run time. (Frederic B.)
  • Removed theano.Value and related code (Ian G.) This was an early test of what ended up as SharedVariable.
  • Renamed Env to FunctionGraph, and the object attribute "env" to "fgraph" (Ian G.) A deprecation warning is printed when you try to access the "env" attribute.
  • Warn when a parameter of the Theano flag nvcc.flags is not handled correctly. (Frederic B.)
  • Do not reorder the user flags passed to the compiler. They are now set after the other flags. (Frederic B.)
  • Made setuptools optional / removed the dependency on it. (Ilan Schnell and others)
  • We now warn when a user tries to use an old GPU that we don't test with. This could cause crashes and will also be very slow. (Frederic B.)
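A minimal sketch of the new unused-input behaviour of theano.function:

```python
import theano
import theano.tensor as T

x, y = T.scalars('x', 'y')
# `x` is not needed to compute the output, which is now an error by
# default; the kwarg relaxes the check per function:
f = theano.function([x, y], y + 1, on_unused_input='warn')
print(f(0, 41))  # prints 42.0 (and a warning about the unused `x`)
```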
New memory output contract:
  • The output memory an Op receives can now be preallocated by means other than reusing the previous output allocated by the Apply node. This means the shape and strides can differ from those of the previous call, other places can hold references to this memory, and the preallocated output may not be c_contiguous.
  • New Theano flag DebugMode.check_preallocated_output to test this contract. (Pascal L.)
  • Updated a few ops to respect this contract. (Pascal L.)
Deprecation
  • Deprecated the Module class (Ian G.) This was a predecessor of SharedVariable with a less pythonic philosophy.
Speed up
  • CPU convolutions are now parallelized (Frederic B.) By default, all cores/hyper-threads are used. To control this, use the OMP_NUM_THREADS=N environment variable, where N is the number of parallel threads to use (by default, it equals the number of CPU cores/hyper-threads you have). There is a new Theano flag, openmp, to allow/disallow OpenMP ops (see the sketch after this list). If your BLAS is parallelized, this flag won't affect it, but the environment variable will.
  • Removed a corner case that produced duplicated dot22/gemm in the graph.
  • Enabled fusion of nodes that have the same clients multiple times. (Frederic B.)
  • New optimization: Remove reduction over broadcastable dimensions (James B., Frederic B.)
  • Faster theano.function compilation. (Pascal L.)
  • Remove GPU transfer around specify_shape op. (Frederic B.)
  • Implemented/tested MANY Op.infer_shape methods (Eric Larsen) This allows Theano to make better shape inferences.
  • The Scan memory optimization now works more frequently. (Razvan P.) Previously, the subtensor optimization printed a warning in those cases.
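A minimal sketch of controlling the parallelized CPU convolution; the flag and the environment variable must be set before Theano is imported:

```python
import os
# number of threads used by OpenMP ops (and by a parallel BLAS):
os.environ['OMP_NUM_THREADS'] = '4'
# the new flag allows/disallows OpenMP ops entirely:
os.environ['THEANO_FLAGS'] = 'openmp=True'

import theano
print(theano.config.openmp)  # -> True
```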
Speed up GPU
  • Convolution on the GPU now checks the generation of the card to make it faster in some cases (especially for medium/big output images). (Frederic B.)
    • We had hardcoded 512 as the maximum number of threads per block; newer cards support up to 1024 threads per block.
  • Faster GpuAdvancedSubtensor1 (Frederic B.)
  • We now pass the GPU architecture to nvcc when compiling (Frederic B.)
Buildbot
  • We now use http://travis-ci.org/ to run all CPU tests with the default mode on all Pull Requests. This should make the trunk more stable. (Frederic B.)
  • Our nightly buildbot now checks Python 2.4 (Frederic B.) This should keep the trunk working on it more consistently.
New Features
  • debugprint has a new parameter, ids=["CHAR", "id", "int", ""]. It makes the printed identifier be a unique char, the Python id, a unique int, or not printed at all. We changed the default to "CHAR" as this is more readable. (Frederic B.)
  • debugprint has a new parameter, stop_on_name=[False, True]. If True, we don't print anything below an intermediate variable that has a name. Defaults to False. (Frederic B.)
  • debugprint no longer prints the "|" symbol in a column after the last input. (Frederic B.)
  • If you use the Enthought Python Distribution (EPD), we now use its BLAS implementation by default. (Frederic B., Graham Taylor, Simon McGregor)
  • MRG random now raises an error with a clear message when the passed shape contains dimensions with invalid values like 0. (Frederic B., reported by Ian G.)
  • "CudaNdarray[*] = ndarray" works in more cases (Frederic B.)
  • "CudaNdarray[*] += ndarray" works in more cases (Frederic B.)
  • We now add dimensions to CudaNdarray to broadcast automatically in more cases. (Frederic B.)
  • New Theano flag cmodule.warn_no_version. Default: False. If True, prints a warning when compiling one or more Ops with C code that can't be cached because no c_code_cache_version() function is associated with at least one of those Ops (which forces recompiling them every time). (Frederic B.)
  • CPU Alloc now always generates C code (Pascal L.)
  • C code reuses preallocated outputs (only done by Scan) (Pascal L.)
  • Garbage collection of intermediate results during Theano function calls for Ops with C code (Pascal L.)
  • The Theano flag compiledir_format now supports the parameters "numpy_version" and "g++". (Frederic B.)
  • Theano GPU variables, shared variables and constants now support <, <=, > and >=, like their non-GPU counterparts.
  • AdvancedIncSubtensor now supports the set_instead_of_inc parameter. (Eric L.)
  • Added Advanced Indexing support to inc_subtensor and set_subtensor. (Eric L.)
  • theano.tensor.{any,all,std,var,mean,prod,sum,argmin,argmax,min,max,max_and_argmax} have a new parameter, keepdims (Eric L.) This allows the result to broadcast correctly against the input data, e.g. to normalize it (see the sketch after this list).
  • The Updates object now checks that the keys are SharedVariables when we pass them to the __init__ function. (Pascal L.)
  • Set a Theano Variable name on the transposed op when the input has one (Frederic B.)
  • The cvm linker now supports garbage collection (enabled by default). (James B., Arnaud B., Pascal L.)
  • The cvm linker is now the default linker. It implements the "loop" around the execution of apply nodes in C, which lowers the overhead.
  • theano_variable[numpy.newaxis] is now supported (James B.)
  • Enable ifelse on the GPU. (Frederic B.)
  • Correctly support numpy.memmap everywhere (Pascal L.) We had only partial support before. Just use normal tensor operations on them and it should work. But take care not to exhaust your computer's memory! (We always generate normal ndarrays as results.)
  • Added an optimization that stabilizes log(softmax(x)). (Ian G.)
  • Re-enabled the Images2Neibs grad. It was not broken; the problem was how we tested it. (Frederic B.)
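A minimal sketch of the new keepdims parameter and the numpy.newaxis support mentioned above:

```python
import numpy
import theano.tensor as T

x = T.matrix('x')
# keepdims keeps the reduced axis as a broadcastable dimension, so the
# row sums broadcast correctly against `x` when normalizing:
normalized = x / x.sum(axis=1, keepdims=True)

v = T.vector('v')
row = v[numpy.newaxis, :]  # shape (1, n): newaxis is now supported
```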
New Op/function
  • Added element-wise operations theano.tensor.{GammaLn,Psi} (John Salvatier, Nicolas Bouchard)
  • Added element-wise operations theano.tensor.{arcsin,arctan,arccosh,arcsinh,arctanh,exp2,arctan2} (Nicolas Bouchard)
  • Added element-wise operations theano.tensor.{gamma,conj,complex_from_polar,expm1,deg2rad,rad2deg,trunc} (Nicolas Bouchard)
  • Added theano.tensor.argsort that wraps numpy.argsort (Hani Almousli).
  • Added theano.tensor.diff that wraps numpy.diff (Nicolas B.)
  • Added theano.tensor.bincount that wraps numpy.bincount (Nicolas B., Pascal L., Frederic B.) (see the sketch after this list)
  • Added theano.tensor.squeeze (Nicolas B.) This removes broadcasted dimensions from the variable; a Theano-esque version of numpy.squeeze.
  • Added theano.tensor.repeat that wraps numpy.repeat (Nicolas B., Pascal L.)
  • Added theano.tensor.bartlett that wraps numpy.bartlett (Eric L.)
  • Added theano.tensor.fill_diagonal that wraps numpy.fill_diagonal (Eric L., Frederic B.)
  • Added Fourier op (Eric L.) TODO: what is the current state of this?
  • Added tensor.square, an alias for tensor.sqr matching NumPy's naming (Ian G.)
  • Added the theano.tensor.load(path, dtype, broadcastable, mmap_mode=None) op that loads a .npy file into a Theano graph (Matthew Rocklin)
  • theano.sandbox.linalg.kron.py: Kron op (Kronecker product). (Eric L.)
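A short sketch exercising some of the new NumPy wrappers listed above:

```python
import numpy
import theano
import theano.tensor as T

v = T.ivector('v')
f = theano.function([v], [T.argsort(v),    # wraps numpy.argsort
                          T.bincount(v),   # wraps numpy.bincount
                          T.diff(v)])      # wraps numpy.diff
print(f(numpy.array([0, 2, 2, 1], dtype='int32')))
```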
Sparse
  • Implemented theano.sparse.mul(sparse1, sparse2) when the two inputs don't have the same sparsity pattern. (Frederic B.)
Sparse Sandbox Additions (not reviewed/documented/tested, but used by some people)
  • They are all in the theano.sparse.sandbox.sp2 module
  • Op class: Cast, Poisson, Multinomial, EliminateZeros, Binomial
  • Op class: SamplingDot, SamplingDotCsr (inserted automatically)
  • Op function: structured_sigmoid, structured_exp, structured_pow, structured_minimum
  • Op class: StructuredAddSV, StructuredAddSVCSR (inserted automatically)
  • opt: local_sampling_dot_csr, local_structured_add_s_v
  • Fixed a crash in sparse.sandbox.sp2.Multinomial when n was a scalar. (Nicolas Bouchard)
Sparse Sandbox graduate (moved from theano.sparse.sandbox.sp)
  • sparse.remove0 (Frederic B., Nicolas B.)
  • sparse.sp_sum(a, axis=None) (Nicolas B.)
    • Bugfix: the non-structured grad was returning a structured grad.
  • sparse.{col_scale,row_scale,ensure_sorted_indices,clean} (Nicolas B.)
  • sparse.{diag,square_diagonal} (Nicolas B.)
Sparse
  • Support for uint* dtype.
  • New Ops: sparse.{expm1,deg2rad,rad2deg,trunc} (Nicolas B.)
  • New Op: sparse.{sqrt,sqr,log1p,floor,ceil,sgn,round_half_to_even,arctanh,tanh,arcsinh,sinh,arctan,arcsin,tan,sin} (Nicolas B.)
  • New Op: sparse.mul_s_v, multiplication of a sparse matrix by a broadcasted vector (Yann D.)
  • New Op: sparse.Cast() (Yann D., Nicolas B.)
    • Added sparse_variable.astype(), theano.sparse.cast() and theano.sparse.{b,w,i,l,f,d,c,z}cast(), like their tensor equivalents (Nicolas B.) (see the sketch after this list)
  • Implemented the CSMProperties grad method (Yann Dauphin)
  • Move optimizations to theano/sparse/opt.py (Nicolas B.)
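A minimal sketch of the new sparse casting interface:

```python
import theano.sparse as sparse

x = sparse.csc_matrix('x', dtype='float64')
y = x.astype('float32')        # sparse_variable.astype()
z = sparse.cast(x, 'float32')  # theano.sparse.cast()
w = sparse.fcast(x)            # per-dtype shortcut, f == float32
```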
New flags
  • The profile=True flag now also prints the sum of all printed profiles (see the sketch after this list). (Frederic B.)
    • It works with the new default vm/cvm linker.
    • Also prints the compile time, optimizer time and linker time.
    • Also prints a summary by Op class.
  • new flag "profile_optimizer" (Frederic B.) when profile=True, will also print the time spent in each optimizer. Useful to find optimization bottleneck.
  • new flag "cmodule.remove_gxx_opt" (Frederic B.) If True, will remove -O* parameter passed to g++. This is useful to debug in gdb module compiled by Theano. The parameter -g is passed by default to g++.
  • New flag cmodule.compilation_warning: if True, prints compilation warnings.
  • New flag allow_gc (Frederic B.)
  • New flag vm.lazy (Frederic B.) Useful only for the vm linkers. When lazy is None, auto-detect whether lazy evaluation is needed and use the appropriate version. If lazy is True/False, force the choice between the Loop/LoopGC and Stack versions.
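A minimal sketch of enabling the new profiling flags; they must be set before Theano is imported:

```python
import os
os.environ['THEANO_FLAGS'] = 'profile=True,profile_optimizer=True'

import theano
import theano.tensor as T

x = T.vector('x')
f = theano.function([x], (x ** 2).sum())
f([1., 2., 3.])
# per-function profiles, the summed profile, compile/optimizer/linker
# times and a per-Op-class summary are printed at process exit
```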
Internal changes
  • Define new exceptions MissingInputError and UnusedInputError, and use them in theano.function, instead of TypeError and ValueError. (Pascal L.)
  • Better handling of bitwidth and max values of integers and pointers across platforms (Pascal L.)
  • Made a few Ops with C code versioned to reduce compilation time. (Frederic B, Pascal L.)
  • Better deletion of files in the compiledir (Frederic B.)
  • Safer import on sort op (Nicolas Pinto)
  • hash_from_dict for elemwise ops (Frederic B.) (TODO: why is this needed?)
  • Renamed BadCLinkerOutput into BadThunkOutput. (PL)
  • tensor.utils.shape_of_variables (Matthew R.)
  • Add the numpy ABI version and the g++/nvcc versions to the key of compiled code. (Frederic B.)
  • env.replace_all_validate_remove (Frederic B.) This allows a global optimizer to ensure that it removed certain nodes from the graph. It is a generic way to catch errors that would otherwise duplicate computation.
    • It was used for the GEMM and Scan optimizations (Frederic B., Razvan P.)
  • Fixed how exceptions are raised in GPU code (James B.)
  • Made the code respect pep8: Olivier D., Frederic B., Pascal L., Nicolas Bouchard, Eric Larsen
  • Moved the sparse optimizations to their own file, theano/sparse/opt.py
  • TensorType and CudaNdarrayType now have a value_zeros method that calls numpy.zeros or CudaNdarray.zeros with the right dtype. (Pascal L., Olivier D.) This allows the same code to work with both types.
  • Renamed the FunctionGraph.extend function to FunctionGraph.attach_feature. (Ian G.)
Crash Fix
  • Fixed an import name conflict (usaar33, Frederic B.)
    • This makes Theano work with PiCloud.
  • Do not try to use the BLAS library when blas.ldflags is manually set to an empty string (Frederic B.)
  • Fixed a crash when importing theano on a computer without a GPU with the Theano flag 'device' or 'init_gpu_device' set to gpu* (Frederic B., reported by Luo Heng)
  • An optimization printed a useless error when scipy was not available. (Frederic B.)
  • GPU conv crash/slowdown on newer hardware (James B.)
  • Better error handling in GPU conv (Frederic B.)
  • GPU optimization that moves element-wise Ops to the GPU. Crash happened in a particular execution order of this optimization and the element-wise fusion optimization when upcasting some inputs to float32 (to compute them on the GPU). (Frederic B., reported by Sander Dieleman)
  • GpuReshape in some particular case when the input is not contiguous (Frederic B., reported by Sander Dieleman)
  • GpuSoftmaxWithBias with shape (0, N) with N > 1. (Frederic B., reported by Razvan P.)
  • Fix crash under 64-bit Windows, when taking subtensors of the form a[n:] (Pascal L., reported by Simon McGregor)
  • Fixed issue with the MaxAndArgmax Op not properly preserving broadcastable dimensions, which could typically result in optimization crashes (Olivier D.)
  • Fixed crash when concatenating some arrays with specific broadcasting patterns (Olivier D.)
  • Work around a known issue with nvcc 4.1 on MacOS X. (Graham Taylor)
  • In advanced indexing, if some inputs are constant, no need to call constant(...) on their value any more. (Pascal L., reported by John Salvatier)
  • Fixed a crash on the GPU where GpuSubtensor did not set the right stride when the result tensor had a dimension of size 1. (Pascal L., reported by Graham T.)
  • Fixed a Scan crash that prevented it from running on the GPU in one case. (Guillaume D.)
  • No longer crash when taking the grad of a random state again. (Razvan P.)
  • Fixed GpuDownsampleFactorMax and its grad with input dimensions 0 and 1 bigger than 65535. (Frederic B., reported by Gabe Schwartz)
  • Potential crash due to parallel compilation when importing theano.sandbox.cuda (Olivier D.)
  • Crash fix on python 2.4 with slicing. (Pascal L.)
  • grad of argmin and argmax (Razvan P.)
  • Don't compute the Rop for shared variables with updates (mostly random ones). We don't use them and they caused crashes. (Razvan P.)
  • Fixed MaxAndArgmax.grad() when one of the gradients it receives is None. (Razvan P., reported by Mark Fenner)
Tests
  • Use less memory (Olivier D.) (fixes a crash on 32-bit computers)
  • Fix test with Theano flag "blas.ldflags=". (Frederic B.)
  • Fixed a crash with advanced subtensors and numpy constants.
  • Fixed random test failures due to random values. (Pascal L.)
  • Always introduce an Alloc node when calling alloc, and let the optimizer remove it if needed. This allows DebugMode to catch some shape errors. (Pascal L.)
  • theano-nose (Pascal L.)
    • --profile-time (Eric L.)
Others
  • Removed a Python warning for some Python versions. (Gabe Schwartz)
  • Removed useless fill ops in fast_compile mode to make the graph more readable. (Frederic B.)
New stuff that will probably be reworked/removed before the release
  • theano.configdefaults.gxx_avail variable (Frederic B.)
  • new flag "time_seq_optimizer" (Frederic B.)
  • new flag "time_eq_optimizer" (Frederic B.)
  • Better PyCUDA sharing of the GPU context (fixes a crash at exit). (Frederic B.) TODO: there is still a crash at exit!
Other thanks:
  • blaxill reported an error introduced into the trunk.