checkpoint on documentation
marcoct committed May 18, 2021
1 parent 3fc5a0e commit b173841
Showing 5 changed files with 115 additions and 52 deletions.
24 changes: 12 additions & 12 deletions docs/src/ref/modeling.md
@@ -254,6 +254,7 @@ See [Generative Function Interface](@ref) for more information about traces.

A `@gen` function may begin with an optional block of *trainable parameter declarations*.
The block consists of a sequence of statements, beginning with `@param`, that declare the name and Julia type for each trainable parameter.
The Julia type must be either a subtype of `Real` or a subtype of `Array{<:Real}`.
The function below has a single trainable parameter `theta` with type `Float64`:
```julia
@gen function foo(prob::Float64)
@@ -264,23 +265,22 @@ The function below has a single trainable parameter `theta` with type `Float64`:
end
```
Trainable parameters obey the same scoping rules as Julia local variables defined at the beginning of the function body.
The value of a trainable parameter is undefined until it is initialized using [`init_param!`](@ref).
After the definition of the generative function, you must register all of the parameters used by the generative function using [`register_parameters!`](@ref) (this is not required if you instead use the [Static Modeling Language](@ref)):
```julia
register_parameters!(foo, [:theta])
```
The value of a trainable parameter is undefined until it is initialized using [`init_parameter!`](@ref):
```julia
init_parameter!((foo, :theta), 0.0)
```
In addition to the current value, each trainable parameter has a current **gradient accumulator** value.
The gradient accumulator value has the same shape (e.g. array dimension) as the parameter value.
It is initialized to all zeros, and is incremented by [`accumulate_param_gradients!`](@ref).

The following methods are exported for the trainable parameters of `@gen` functions:
It is initialized to all zeros, and is incremented by calling [`accumulate_param_gradients!`](@ref) on a trace.
Additional functions for retrieving and manipulating the values of trainable parameters and their gradient accumulators are described in [Optimizing Trainable Parameters](@ref).
```@docs
init_param!
get_param
get_param_grad
set_param!
zero_param_grad!
register_parameters!
```
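For a concrete sketch, the snippet below accumulates a gradient for `theta` from a single trace of `foo`, assuming the parameter has been registered and initialized as above; the precise signatures of `accumulate_param_gradients!` and `get_gradient` (including the default parameter store assumed here) are given in [Optimizing Trainable Parameters](@ref):
```julia
# Generate a trace of `foo` (registered and initialized as above).
trace = simulate(foo, (0.3,))

# Increment the gradient accumulator for `theta` by the gradient of the
# trace's log probability with respect to `theta`.
accumulate_param_gradients!(trace)

# Read the accumulated gradient (assumes the default Julia parameter store).
theta_grad = get_gradient((foo, :theta))
```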

Trainable parameters are designed to be trained using gradient-based methods.
This is discussed in the next section.

## Differentiable programming

Given a trace of a `@gen` function, Gen supports automatic differentiation of the log probability (density) of all of the random choices made in the trace with respect to the following types of inputs:
46 changes: 46 additions & 0 deletions docs/src/ref/parameter_optimization.md
@@ -1,6 +1,52 @@
# Optimizing Trainable Parameters

## Parameter stores

Multiple traces of a generative function typically reference the same trainable parameters of the generative function, which are stored outside of the trace in a **parameter store**.
Different types of generative functions may use different types of parameter stores.
For example, the [`JuliaParameterStore`](@ref) (discussed below) stores parameters as Julia values in the memory of the Julia runtime process.
Other types of parameter stores may store parameters in GPU memory, in a filesystem, or even remotely.

When generating a trace of a generative function with [`simulate`](@ref) or [`generate`](@ref), we may pass in an optional **parameter context**, which is a `Dict` that indicates which parameter store(s) to look up parameter values in.
A generative function obtains a reference to a specific type of parameter store by looking up its key in the parameter context.

If you are just learning Gen and are only using the built-in modeling language to write generative functions, you can ignore this complexity: there is a default store, [`default_julia_parameter_store`](@ref), and a default parameter context, [`default_parameter_context`](@ref), that points to it, and these are used whenever no parameter context is provided in the call to `simulate` or `generate`.
```@docs
default_parameter_context
default_julia_parameter_store
```
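As a sketch, a non-default parameter store can be used by constructing a parameter context explicitly; `foo` and `:theta` below are the running example from the modeling language reference, and the keyword by which the context is passed to `simulate`/`generate` is an assumption:
```julia
# Keep this model's parameters in a separate store instead of the default one.
store = JuliaParameterStore()
init_parameter!((foo, :theta), 0.0, store)

# A parameter context is a Dict mapping keys to parameter stores.
context = Dict{Symbol,Any}(JULIA_PARAMETER_STORE_KEY => store)

# The context is then supplied when generating a trace; the keyword name below
# is an assumption -- check the `simulate`/`generate` signatures.
# trace = simulate(foo, (0.3,); parameter_context=context)
```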

## Julia parameter store

Parameters declared using the `@param` keyword in the built-in modeling language are stored in a type of parameter store called a [`JuliaParameterStore`](@ref).
A generative function can obtain a reference to a `JuliaParameterStore` by looking up the key [`JULIA_PARAMETER_STORE_KEY`](@ref) in a parameter context.
This is how the built-in modeling language implementation finds the parameter stores to use for `@param`-declared parameters.
Note that if you are defining your own [custom generative functions](@ref #Custom-generative-functions), you can also use a [`JuliaParameterStore`](@ref) (including the same parameter store used to store parameters of built-in modeling language generative functions) to store and optimize your trainable parameters.

Different types of parameter stores provide different APIs for reading, writing, and updating the values of parameters and gradient accumulators for parameters.
The `JuliaParameterStore` API is given below.
(Note that most user-written learning code only needs to use [`init_parameter!`](@ref), as the other API functions are called by the [Optimizers](@ref) discussed below.)

```@docs
JuliaParameterStore
init_parameter!
increment_gradient!
reset_gradient!
get_parameter_value
get_gradient
JULIA_PARAMETER_STORE_KEY
```
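The following sketch exercises this API directly on the default store; the argument orders shown for functions other than [`init_parameter!`](@ref) are assumptions patterned after `init_parameter!` (parameter ID first, optional store last):
```julia
# Most user code only needs init_parameter!; the rest is typically called by optimizers.
init_parameter!((foo, :theta), 0.5)
value = get_parameter_value((foo, :theta))   # read the current value
increment_gradient!((foo, :theta), 0.1)      # add 0.1 to the gradient accumulator
grad = get_gradient((foo, :theta))           # read the accumulated gradient
reset_gradient!((foo, :theta))               # zero the accumulator
```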

### Multi-threaded gradient accumulation

Note that the [`increment_gradient!`](@ref) call is thread-safe, so that multiple threads can concurrently increment the gradient for the same parameters. This is helpful for parallelizing gradient computation for a batch of traces within stochastic gradient descent learning algorithms.
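For example, gradients for a batch of traces can be accumulated in parallel with Julia threads; this sketch assumes `traces` is a `Vector` of traces whose trainable parameters live in the same store:
```julia
using Base.Threads: @threads

# Each call increments the shared gradient accumulators; this is safe because
# the underlying gradient increments are thread-safe.
@threads for i in 1:length(traces)
    accumulate_param_gradients!(traces[i])
end
```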

## Optimizers

TODO

Trainable parameters of generative functions are initialized differently depending on the type of generative function.

Trainable parameters of the built-in modeling language are initialized with [`init_param!`](@ref).

Gradient-based optimization of the trainable parameters of generative functions is based on interleaving two steps:
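The sketch below shows one way such a loop could look under this API, interleaving gradient accumulation over a small batch of traces with an optimizer update; `foo`, `:theta`, and the `observations` choice map are stand-ins, and details such as whether `apply_update!` also resets the gradient accumulators may differ:
```julia
conf = FixedStepGradientDescent(0.001)
optimizer = init_optimizer(conf, [(foo, :theta)])

for iter in 1:100
    # Accumulate gradients from a batch of traces.
    for k in 1:10
        trace, _ = generate(foo, (0.3,), observations)
        accumulate_param_gradients!(trace)
    end
    # Apply one update using the accumulated gradients.
    apply_update!(optimizer)
end
```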
Empty file removed src/builtin_optimization.jl
Empty file.
11 changes: 6 additions & 5 deletions src/dynamic/dynamic.jl
@@ -56,13 +56,14 @@ end
"""
register_parameters!(gen_fn::DynamicDSLFunction, parameters)
Register the trainable parameters that are used by a DML generative function.
This includes all parameters used within any calls made by the generative function, and includes any parameters that may be used by any possible trace (stochastic control flow may cause a parameter to be used by one trace but not another).
There are two variants:
# TODO document the variants
The second argument is either a `Vector` or a `Function` that takes a parameter context and returns a `Dict` that maps parameter stores to `Vector`s of parameter IDs.
When the second argument is a `Vector`, each element is either a `Symbol` that is the name of a parameter declared in the body of `gen_fn` using `@param`, or is a tuple `(other_gen_fn::GenerativeFunction, name::Symbol)` where `@param <name>` was declared in the body of `other_gen_fn`.
The `Function` input is used when `gen_fn` uses parameters that come from more than one parameter store, including parameters that are housed in parameter stores that are not `JuliaParameterStore`s (e.g. if `gen_fn` invokes a generative function that executes in another non-Julia runtime).
See [Optimizing Trainable Parameters](@ref) for details on parameter contexts and parameter stores.
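For example, for a hypothetical DML function `foo` that declares `@param theta` and calls another DML function `bar` that declares `@param phi`:
```julia
register_parameters!(foo, [:theta, (bar, :phi)])
```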
"""
function register_parameters!(gen_fn::DynamicDSLFunction, parameters)
gen_fn.parameters = parameters
86 changes: 51 additions & 35 deletions src/optimization.jl
@@ -1,15 +1,14 @@
import Parameters

# we should modify the semantics of the log probability contribution to the gradient
# so that everything is gradient descent instead of ascent. this will also fix
# the misnomer names
# TODO we should modify the semantics of the log probability contribution to
# the gradient so that everything is gradient descent instead of ascent. this
# will also fix the misnomer names
#
# TODO add tests specifically for JuliaParameterStore etc.
#
# TODO in all update and regenerate implementations, need to pass in the parameter context to inner calls to generate

export in_place_add!


export FixedStepGradientDescent
export DecayStepGradientDescent
export init_optimizer
@@ -22,6 +21,10 @@ export increment_gradient!
export reset_gradient!
export get_parameter_value
export get_gradient
export JULIA_PARAMETER_STORE_KEY

export default_julia_parameter_store
export default_parameter_context

#################
# in_place_add! #
@@ -155,11 +158,9 @@ end
# TODO create diagram and document the overall framework
# including parameter contexts and parameter stores, and the default behaviors

abstract type ParameterStore end

"""
optimizer = init_optimizer(
conf, parameter_ids,
conf, parameter_ids::Vector,
store=default_julia_parameter_store)
Initialize an iterative gradient-based optimizer.
@@ -187,24 +188,10 @@ function apply_update!(optimizer)
error("Not implemented")
end

"""
optimizer = CompositeOptimizer(conf, parameter_stores_to_ids::Dict{Any,Vector})
Construct an optimizer that applies the given update to parameters in multiple parameter stores.
The first argument defines the mathematical behavior of the update;
the second argument defines the set of parameters to which the update should be applied at each iteration,
as a map from parameter stores to a vector of IDs of parameters within that parameter store.
optimizer = CompositeOptimizer(conf, gen_fn::GenerativeFunction; parameter_context=default_parameter_context)
Constructs a composite optimizer that applies the given update to all parameters used by the given generative function, even when the parameters exist in multiple parameter stores.
"""
struct CompositeOptimizer
conf::Any
optimizers::Dict{Any,Any}
function CompositeOptimizer(conf, parameter_stores_to_ids::Dict{Any,Vector})
function CompositeOptimizer(conf, parameter_stores_to_ids)
optimizers = Dict{Any,Any}()
for (store, parameter_ids) in parameter_stores_to_ids
optimizers[store] = init_optimizer(conf, parameter_ids, store)
@@ -218,10 +205,23 @@ function CompositeOptimizer(conf, gen_fn::GenerativeFunction; parameter_context=
end

"""
apply_update!(composite_opt::CompositeOptimizer)
Perform one step of an update, possibly mutating the values of parameters in multiple parameter stores.
optimizer = init_optimizer(conf, parameter_stores_to_ids::Dict{Any,Vector})
Construct an optimizer that updates parameters in multiple parameter stores.
The first argument configures the mathematical behavior of the update.
The second argument defines the set of parameters to which the update should be applied at each iteration, given as a map from parameter stores to vectors of IDs of parameters within each store.
optimizer = init_optimizer(conf, gen_fn::GenerativeFunction; parameter_context=default_parameter_context)
Constructs a composite optimizer that updates all parameters used by the given generative function, even when the parameters exist in multiple parameter stores.
"""
function init_optimizer(conf, parameter_stores_to_ids::Dict)
return CompositeOptimizer(conf, parameter_stores_to_ids)
end

function apply_update!(composite_opt::CompositeOptimizer)
for opt in values(composite_opt.optimizers)
apply_update!(opt)
@@ -247,7 +247,7 @@ Construct a parameter store that stores the state of parameters in the memory of the
There is a global Julia parameter store automatically created and named `Gen.default_julia_parameter_store`.
Incrementing the gradients can be safely multi-threaded (see [`increment_gradient!`](@ref)).
Gradient accumulation is thread-safe (see [`increment_gradient!`](@ref)).
"""
function JuliaParameterStore()
return JuliaParameterStore(
@@ -263,29 +263,45 @@ function get_local_parameters(store::JuliaParameterStore, gen_fn)
end
end

const default_parameter_context = Dict{Symbol,Any}()
const default_julia_parameter_store = JuliaParameterStore()

# for looking up in a parameter context when tracing (simulate, generate)
# once a trace is generated, it is bound to use a particular store
"""
JULIA_PARAMETER_STORE_KEY
If a parameter context contains a value for this key, then the value is a `JuliaParameterStore`.
"""
const JULIA_PARAMETER_STORE_KEY = :julia_parameter_store

function get_julia_store(context::Dict)
if haskey(context, JULIA_PARAMETER_STORE_KEY)
return context[JULIA_PARAMETER_STORE_KEY]
else
return default_julia_parameter_store
end
return context[JULIA_PARAMETER_STORE_KEY]::JuliaParameterStore
end

"""
default_julia_parameter_store::JuliaParameterStore
The default global Julia parameter store.
"""
const default_julia_parameter_store = JuliaParameterStore()

"""
default_parameter_context::Dict
The default global parameter context, which is initialized to contain the mapping:
JULIA_PARAMETER_STORE_KEY => Gen.default_julia_parameter_store
"""
const default_parameter_context = Dict{Symbol,Any}(
JULIA_PARAMETER_STORE_KEY => default_julia_parameter_store)


"""
init_parameter!(
id::Tuple{GenerativeFunction,Symbol}, value,
store::JuliaParameterStore=default_julia_parameter_store)
Initialize the value of a named trainable parameter of a generative function.
Also generates the gradient accumulator for that parameter to `zero(value)`.
Also initializes the gradient accumulator for that parameter to `zero(value)`.
Example:
```julia
