Proposal for New Linking Strategy supporting Lazy Evaluation: Op.make_thunk¶
Note
Proposal made June 2010.
Motivation¶
Conditional evaluation is useful to describe many optimization algorithms where the update expressions depend on internal state.
True conditional evaluation requires lazy graph evaluation. Without lazy graph evaluation, the runtime of a graph can be exponential in the number of conditionals instead of linear. No one waits an exponential amount of time, so instead people work around this problem in various other ways, but it would be better if theano had an ‘if-then-else’ expression (call it cond).
A lazily-evaluted ‘cond’ requires a linker to use a different method for interacting with Ops. Neither the current perform() nor c_code() approaches support lazy evaluation. Why do perform (and c_code) not handle lazy evaluation? The syntax of the current perform() could be extended to be compatible with lazy evaluation. For example, the linker could set all inputs to None, and use the return value from perform() to see which inputs are required. But all the Ops that currently implement a perform() function would be broken because their perform implementations do not ask for inputs before using them. I don’t see a way around this. The same restriction applies to c_code.
The way around this is to introduce a new interface for the linker to talk to Ops. I propose that we add an Op.make_thunk() that returns an object satisfying this interface.
At the same time, it appears that as we try to integrate PyCUDA Ops another problem arises. We would like to use Op.perform() to drive the GPU, but it is natural to move compilation of the CUDA kernel to a point after make_node() and a point before perform(). The point where the linker makes an thunk from the Op seems like a natural choice.
A third motivation for introducing an Op.make_thunk function is to clarify the relationship between Ops (the classes you implement in Python) and mathematical operations (the more abstract things in terms of which you think when using Theano). I propose that technically an Op, when conditioned by particular inputs, generates at most one implementation that defines the behaviour of that Op. In intuitive terms, the abstract mathematical steps that we sometimes talk about regarding Theano still correspond to Ops – it’s just that these Ops have relatively generic implementations. The process of optimization is to specialize those generic implementations by using information from the rest of the graph. If we accept that an Op corresponds to at most one implementation, then it makes sense to ask an Op instance to expose that implementation via a standard interface (Op.make_thunk). It does not make sense to pass arguments to Op.make_thunk such as ‘py’ or “c|py” to tell the Op which implementation to use. The Op instance represents just one implementation, and flags such as ‘py’ or ‘c|py’ should be passed to the Op’s constructor.
Proposal: Op.make_thunk¶
There are two interface items I propose to add. The first is a Thunk object (which we have never had before), and the second is a new function (make_thunk) in the PureOp class (a superclass of Op) that will return a Thunk.
class Thunk (object):
"""Abstract class / interface
It describes the interface used by a Theano linker to execute the nodes in a
graph. Thunk instances are in correspondance with Apply instances that
remain in the final form of the graph after optimization.
"""
lazy = property(...,
"""True means the thunk may trigger lazy evaluation.
False means the thunk always requires all inputs and computes all
outputs.
Consequently False implies that __call__ always returns None
"""
def __call__(self):
"""Thunk will compute some number (or zero) of outputs and in the case
that it cannot compute all its outputs for lack of inputs, this function
will return a list of input indexes that are required. The linker will
typically compute those required inputs and then call this
__call__ function again.
The thunk is considered to be finished when it returns an empty list or
None.
"""
class PureOp(object): # recall:
# Op inherits from PureOp
def make_node(self, *inputs): # leave alone
...
def perform(self, node,
inputs, output_storage): # move to `Op` class
...
def make_thunk(self, node, # new function
input_computed, output_computed,
input_registers, output_registers,
):
"""
:type node: Apply instance
:param node: previous rval from make_node(self, *inputs)
:type input_computed: list of len-1 lists, with values in (0,1).
:param input_computed: at runtime, input_computed[i][0]==1 implies
that the i'th input has been computed and stored at
input_registers[i][0], and is available for use.
Otherwise the content of input_registers[i][0] is undefined.
:type output_computed: list of len-1 lists, with values in (0,1).
:param output_computed: at runtime, output_computed[i][0]==1 implies
that the i'th output has already been computed and stored at
output_registers[i][0].
Otherwise, output_registers[i][0] will contain either None, or
a value that was previously computed by this thunk.
:type input_registers: list of len-1 lists
:type output_registers: list of len-1 lists
:param input_registers: the i'th input can be read from
input_registers[i][0] when input_computed[i][0] == 1.
:param output_registers: the i'th output must be stored to
output_registers[i][0], at which point the thunk must set output_computed[i][0] == 1.
:returns: a Thunk (subclass) instance
"""
The Thunk class can have subclasses that use Op.perform and Op.c_code as we use them now. The interface of Thunk is backward-compatible with the thunks built by the CLinker and PerformLinker. If a graph contains zero Thunks with lazy==True, then the current Linkers will continue to work. The new Thunk interface will support a new LazyLinker that can run programs for which some thunks have lazy==True.
The Thunk class can have subclasses that are implemented in C, which might help performance.