
Theano is a critical component at the root of an ecosystem of machine learning projects, including PyLearn2, PyMC, HyperOpt, and the Deep Learning Tutorials. It is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently, on GPU or CPU. Theano has been in development and active use since 2008, and has achieved significant adoption across a range of industrial and research settings (at least 6K downloads per month). There is also a SymPy->Theano bridge.

This is a list of project ideas for GSoC for Theano. If you have an idea that is not on this list, you can of course approach us on the mailing list (theano-dev) and tell us about it. In all cases you are strongly encouraged to discuss your application with us beforehand.

For more information on the conditions and how to apply, consult the Python SoC2014 page, as we are participating via the Python Software Foundation this year.

The 3 registered mentors are: Frédéric Bastien, Arnaud Bergeron and James Bergstra. Other people have agreed to be co-mentors if they are needed.

Highlighted ideas:

Faster compilation

Project: Make the compile phase of Theano faster. This work would involve optimizing certain parts of the code, bringing algorithmic improvements, and making other modifications where needed.

Details: The compiler works by taking a symbolic graph structure and applying a series of local and global transformations. After this is done, it builds and allocates runtime structures that depend on the executing backend (which may include compiling C code on the fly). Finally it returns a callable Python function that wraps these structures. Pretty much any of these steps could benefit from improvements, especially on large graphs (1000+ ops).
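To get a feel for where the time goes, one can simply time theano.function on an artificially large graph. A minimal sketch (the graph and the timing harness are illustrative, not from the codebase):

```python
import time

import theano
import theano.tensor as T

# Build a deliberately long chain of elementwise ops so that the
# compile phase dominates the trivial execution time.
x = T.vector('x')
y = x
for i in range(500):
    y = y * 2 + 1

t0 = time.time()
f = theano.function([x], y)  # graph optimization + linking happen here
print("compile time: %.2fs" % (time.time() - t0))
```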

Difficulty: medium to hard

Lower memory usage

Project: Analyze and optimize the memory used by Theano functions during their execution. The current code only uses a dumb greedy approach to memory allocation, coupled with a garbage collector that deallocates memory once it is unneeded. This project would do better, possibly by lowering the maximum amount of memory allocated at once and by avoiding reallocations.

Details: Currently every op in Theano is responsible for its own memory and allocates its own outputs. There is some code that attempts to determine whether a certain output is no longer needed, and frees it. Some ops also support reusing previous outputs when certain conditions are met. The work would be to add some way to reuse intermediate allocations when possible and to ensure that ops can use preallocated memory. This could be done with a central cache for allocations, but some smarter management would be better. An option to reorder the execution to lower the maximum memory used would also be interesting.
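To make the current contract concrete, here is a minimal sketch (not code from the codebase) of a custom op whose perform() reuses its previously allocated output buffer when the shape allows it. Note that with the default garbage collector the buffer is usually freed between calls, which is exactly the kind of behaviour this project would improve:

```python
import numpy as np
import theano
import theano.tensor as T


class AddOne(theano.Op):
    """Toy elementwise op that reuses its output buffer when possible."""

    def make_node(self, x):
        x = T.as_tensor_variable(x)
        return theano.Apply(self, [x], [x.type()])

    def perform(self, node, inputs, output_storage):
        x, = inputs
        out = output_storage[0]
        # out[0] holds the previous output (or None); reuse it if the
        # shape still matches instead of allocating a fresh array.
        if out[0] is None or out[0].shape != x.shape:
            out[0] = np.empty_like(x)
        np.add(x, 1, out=out[0])


x = T.dvector('x')
f = theano.function([x], AddOne()(x))
print(f(np.arange(3.0)))  # -> [ 1.  2.  3.]
```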

Difficulty: easy to medium (a cache should be easy, but a central register-based allocation manager or reordering would be medium)

Lower overhead

Project: Theano has a significant execution overhead, especially when doing computations on scalars. This project would be to reduce that overhead as much as possible.

Details: The current code is geared towards computations on large matrices. In the interest of debugging, a lot of checks are done on entry to the compiled functions and for each op. Making all of that code optional or faster would go a long way towards lowering the overhead.
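A minimal sketch of measuring that per-call overhead on a scalar computation (the exact numbers depend on the machine and the linker used):

```python
import timeit

import theano
import theano.tensor as T

x = T.dscalar('x')
f = theano.function([x], x + 1)

# The per-call overhead dwarfs the actual scalar addition.
theano_t = timeit.timeit(lambda: f(1.0), number=10000)
python_t = timeit.timeit(lambda: 1.0 + 1, number=10000)
print("theano: %.4fs  python: %.4fs" % (theano_t, python_t))

# f.trust_input = True skips some of the input checks, at the cost of
# crashing on bad input; more of the checks could be made optional.
```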

Difficulty: easy

Improve pickling of Theano objects

Project: Pickling shared variables (included in graphs) is a problem right now, since they have an explicit type and memory location. The project would be to find a solution that lets us move the memory location of a shared variable when it is unpickled. This should also let us unpickle a function that was compiled for a GPU on a machine without a GPU and run it (by reoptimizing the graph).

Details: Shared variables currently decide their memory location based on which device was active when they were created. You would have to devise a mechanism that lets them migrate to another device when they are unpickled. This is not simple, since the memory location is currently part of the graph and is not trivial to change.
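A minimal sketch of the situation, assuming a CPU-only machine (on a GPU device the shared variable's container would be a CudaNdarray rather than a numpy array, and that type travels with the pickle):

```python
import pickle

import numpy as np
import theano

# The shared variable's type (and thus its device) is fixed here.
w = theano.shared(np.zeros(3), name='w')
f = theano.function([], w + 1)

# The pickle embeds w's type; unpickling on a machine where that
# device does not exist is what currently fails.
g = pickle.loads(pickle.dumps(f))
print(g())  # -> [ 1.  1.  1.]
```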

Difficulty: medium

Generate a dynamic library/dll

Project: Currently most ops can generate C code to implement their computations, but this code is executed independently of the rest. This project would be to generate C code for a complete graph and wrap it in a callable C function that does not depend on Python at all.

Details: We already have a prototype of this, but it needs some work. It still needs some way to support shared variables (with a way to get/set them from C code). It would also benefit from a way to enable the GC of intermediate results. Finally, we need to make sure it works natively on Windows/Mac/Linux.
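To make the goal concrete, here is a hedged sketch of how a consumer might load such a generated library from Python via ctypes; the library name, the exported symbol, and its signature are all made up for illustration:

```python
import ctypes

import numpy as np

# Hypothetical generated library and entry point; neither exists yet.
lib = ctypes.CDLL("./libtheano_graph.so")
lib.run_graph.argtypes = [ctypes.POINTER(ctypes.c_double),
                          ctypes.POINTER(ctypes.c_double),
                          ctypes.c_size_t]
lib.run_graph.restype = ctypes.c_int

x = np.arange(4.0)
y = np.empty_like(x)
err = lib.run_graph(x.ctypes.data_as(ctypes.POINTER(ctypes.c_double)),
                    y.ctypes.data_as(ctypes.POINTER(ctypes.c_double)),
                    x.size)
```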

Difficulty: medium

OpenCL conversion

Project: We have a new backend in development that supports OpenCL and CUDA and is much more flexible than our current CUDA code. We still need to port/convert code from the old CUDA ops to the new backend and possibly write new ops that were not possible before.

Details: This project will involve a lot of work on GPU kernels and auto-tuning. You may also need to make some small code changes to the supporting library for the new backend.

Difficulty: hard

Bridge Theano with other Python compilers (Numba, Cython, Parakeet, ...)

  • Many compilers could give us faster implementations of parts of a graph
  • (done) Make an example of how to use a Numba function in a Theano op (a sketch of the idea follows this list)
  • Allow a Numba function to be used as a Theano op easily (an automatic wrapper?)
  • Make Theano use the C interface of a Numba function
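A minimal sketch of the wrapper idea, using perform() to call a jitted function (this is illustrative; the actual example in the Theano documentation may differ):

```python
import numba
import numpy as np
import theano
import theano.tensor as T


@numba.jit
def numba_double(x):
    return 2 * x


class NumbaDouble(theano.Op):
    """Wrap the jitted function as a Theano op via perform()."""

    def make_node(self, x):
        x = T.as_tensor_variable(x)
        return theano.Apply(self, [x], [x.type()])

    def perform(self, node, inputs, output_storage):
        output_storage[0][0] = numba_double(inputs[0])


x = T.dvector('x')
f = theano.function([x], NumbaDouble()(x))
print(f(np.arange(3.0)))  # -> [ 0.  2.  4.]
```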

Other unsorted and undeveloped ideas:

  • IfElse (lazy evaluation): add C code and allow it to work in-place on its two inputs
  • Faster optimization phase (use a SAT solver?)
  • Allow memory profiling in the CVM (currently it requires the VM)
  • Rewrite DebugMode to reuse the CVM and simplify it
  • Less opaque theano.function()
  • SymPy optimization
  • Track user usage of Theano, with their permission
    • This would also let us find bugs that would have affected a given user in the past