Framework Apps
Summary: We describe how to write the five key functions in every NEXT application.
We have written a YAML interface in myApp.yaml that specifies the inputs to the major API functions. Now we will discuss the actual app development.
- NEXT handles serving requests, load balancing, logging, database management, and most other components of a web server.
- This allows you to focus on the dataflow and the algorithm development.
- This page describes the arguments needed for your app. It is your job to glue together the various dataflows in your application.
The main application code belongs in:
apps/PoolBasedBinaryClassification/PoolBasedBinaryClassification.py
In this case the function definitions in the file look something like:
import json
import next.apps.SimpleTargetManager

class MyApp(object):
    def __init__(self, db):
        self.app_id = 'PoolBasedBinaryClassification'
        self.TargetManager = next.apps.SimpleTargetManager.SimpleTargetManager(db)

    def initExp(self, butler, alg, args):
        return ...

    def getQuery(self, butler, alg, args):
        return ...

    def processAnswer(self, butler, alg, args):
        return ...

    def getModel(self, butler, alg, args):
        return ...
First, note that the functions in this class correspond to the major API functions. Second, each API function (excluding __init__, which does some basic setup; more on this later) receives the same three arguments:
- butler: The butler provides a way to store and retrieve application and algorithm variables and targets, effectively hiding explicit database access.
  - Butler-API: the API docs
  - "Why can I not just save n using self.n = n in the class MyApp?" NEXT will usually be run with many instances of the same code running in multiple processes. To guarantee future access to data, we have to store it in a database that is shared across all these processes. The butler is our (nice) interface to these databases.
  - The butler provides a way to access four main collections:
    - experiment, which stores experiment information
    - queries, which stores all queries made by NEXT
    - algorithms, a collection specific to each algorithm
    - targets, a list of the targets uploaded in an experiment initialization
    - there's more; take a look at next/apps/Butler.py
  - The butler also contains a job function, which is used to run asynchronous jobs (such as logging or model updates).
- alg: Refers to an algorithm's implementation of initExp, getQuery, processAnswer, getStats or getModel (depending on which function it is received in).
  - Each alg is treated like a black box: we specify the arguments and returns in Algs.yaml.
  - The application must be agnostic to the algorithm; the app only defines the interface.
  - In existing NEXT apps, alg only deals with indices, not the actual targets the user sees.
- args: As demonstrated in Interface, the input of each function contains a dictionary with key args. These parameters are specified in myApp.yaml.
  - If specified in myApp.yaml, these are almost guaranteed to exist (even if optional).
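The question above about self.n = n can be made concrete with a small simulation. The sketch below models two worker processes as two separate MyApp instances that share one store. SharedStore is a hypothetical in-memory stand-in for the database behind the butler; it is not the real Butler API (see next/apps/Butler.py for that), and a real deployment shares a database across processes rather than a Python dict.

```python
class SharedStore:
    """Hypothetical stand-in for the shared database the butler wraps.

    A plain dict shared by reference between instances is enough to
    illustrate the idea; it is NOT the real NEXT Butler API.
    """

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


class MyApp:
    def __init__(self, store):
        self.store = store

    def initExp(self, n):
        self.n = n              # lives only in this instance ("process")
        self.store.set('n', n)  # visible to every instance sharing the store

    def lookup(self):
        # getattr avoids an AttributeError on instances that never ran initExp
        return getattr(self, 'n', None), self.store.get('n')


store = SharedStore()
worker_a = MyApp(store)  # the process that happened to handle initExp
worker_b = MyApp(store)  # a different process handling a later request

worker_a.initExp(50)
print(worker_b.lookup())  # (None, 50)
```

Only the value written through the shared store survives the hop to the second worker, which is exactly why app and algorithm state goes through the butler.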
def initExp(self, butler, alg, args):
    # Set the experiment args to contain an additional key n, with the number of targets
    args['n'] = len(args['targets']['targetset'])
    # Get the first target, extract its feature vector and save its length as the dimension.
    # This assumes that the feature dimension is consistent across all targets.
    args['d'] = len(args['targets']['targetset'][0]['meta']['features'])
    # Save the target set to the TargetManager associated with this app.
    self.TargetManager.set_targetset(butler.exp_uid, args['targets']['targetset'])
    # We do not want the experiment dictionary to contain the targets; this could make its size very large if there are many targets.
    del args['targets']
    # Run the algorithm's initExp
    alg({'n': args['n'], 'd': args['d']})
    # The args are now stored in the butler and can be accessed through butler.experiment
    return args
initExp is responsible for setting up the experiment and saving any variables that might be needed later. The return value of initExp is a dictionary that will be saved in the butler.experiment collection. It is best practice to add any necessary values to (or remove them from) the input args dictionary and then return args. For example, in the code above we add to args n, the number of targets, and d, the dimension of the feature vector corresponding to each target, and we delete the targets dictionary.
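The args bookkeeping in initExp above is a plain dictionary transformation, so it can be checked in isolation. The sketch below pulls it into a standalone function; the name prepare_experiment_args, the toy targets, and the extra failure_probability key are all hypothetical, used only to show that unrelated experiment args pass through untouched.

```python
def prepare_experiment_args(args):
    """Sketch of the args bookkeeping done in initExp (hypothetical helper).

    Adds n (number of targets) and d (feature dimension, read from the
    first target) and drops the bulky target list before args is stored.
    """
    targetset = args['targets']['targetset']
    args['n'] = len(targetset)
    # Assumes every target has a feature vector of the same length.
    args['d'] = len(targetset[0]['meta']['features'])
    del args['targets']
    return args


args = {
    'targets': {'targetset': [
        {'target_id': 0, 'meta': {'features': [0.5, 1.0]}},
        {'target_id': 1, 'meta': {'features': [-0.2, 0.3]}},
        {'target_id': 2, 'meta': {'features': [1.732, -1.882]}},
    ]},
    'failure_probability': 0.05,  # hypothetical extra experiment arg
}
print(prepare_experiment_args(args))
# {'failure_probability': 0.05, 'n': 3, 'd': 2}
```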
Generally, initExp is also expected to save the experiment's targets using a TargetManager, which is available through butler.targets (the code above calls it as self.TargetManager). Note that the specific type of TargetManager, here SimpleTargetManager, is specified in __init__. You can learn more about creating a TargetManager here.
Finally, initExp runs the algorithm's initExp in the line alg({'n': args['n'], 'd': args['d']}), passing on the number of targets and the ambient dimension.
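To make the TargetManager's role concrete, here is a hypothetical in-memory stand-in exposing set_targetset (as called in initExp) and get_target_item (the lookup used in the getQuery example through butler.targets). The real SimpleTargetManager persists targets to a database; the class name, the exp_uid value, and the lookup-by-target_id behavior here are assumptions for illustration.

```python
import copy


class InMemoryTargetManager:
    """Hypothetical in-memory stand-in for a TargetManager.

    Mirrors the two calls used on this page; the real
    SimpleTargetManager stores targets in a database instead.
    """

    def __init__(self):
        self._targetsets = {}

    def set_targetset(self, exp_uid, targetset):
        # Keep a private copy per experiment.
        self._targetsets[exp_uid] = copy.deepcopy(targetset)

    def get_target_item(self, exp_uid, target_id):
        # Find the target by its target_id and return a copy, so callers
        # can delete keys such as 'meta' without touching stored data.
        for target in self._targetsets[exp_uid]:
            if target['target_id'] == target_id:
                return copy.deepcopy(target)
        raise KeyError(target_id)


tm = InMemoryTargetManager()
tm.set_targetset('exp-1', [
    {'target_id': 0, 'primary_description': 'Target 0', 'meta': {'features': [0.1, 0.2]}},
    {'target_id': 1, 'primary_description': 'Target 1', 'meta': {'features': [0.3, 0.4]}},
])
item = tm.get_target_item('exp-1', 1)
del item['meta']
print(item)                                      # {'target_id': 1, 'primary_description': 'Target 1'}
print('meta' in tm.get_target_item('exp-1', 1))  # True
```

Returning copies matters because getQuery deletes the meta key before sending the target to the client; the stored target should keep its features for the algorithm.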
def getQuery(self, butler, alg, args):
    # Get the target_id that we wish to label from the algorithm.
    target_id = alg({'participant_uid': args['participant_uid']})
    # Get the associated target and remove the feature vector
    target = butler.targets.get_target_item(butler.exp_uid, target_id)
    del target['meta']
    # Return a dictionary with the target; this will be returned to the user.
    return {'target_indices': target}
The output of getQuery is returned directly to the client that requested a query. So in general, getQuery retrieves the index of a target from the algorithm, gets the associated target, makes any manipulations to it, and then returns it. NEXT also assigns every query a unique query_uid and stores the returned query dictionary in the butler.queries collection.
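The query_uid bookkeeping is done by NEXT, not by your app, but a rough model helps when reading processAnswer, which looks queries up by that uid. In the sketch below, record_query is a hypothetical helper, a plain dict stands in for butler.queries, and the uuid4 hex id format is an assumption rather than NEXT's actual scheme.

```python
import uuid


def record_query(queries, query):
    """Hypothetical model of the storage NEXT performs after getQuery.

    `queries` is a plain dict standing in for butler.queries; the uuid4
    hex format of query_uid is an assumption for illustration.
    """
    query_uid = uuid.uuid4().hex
    stored = dict(query, query_uid=query_uid)
    queries[query_uid] = stored
    return stored


queries = {}
q1 = record_query(queries, {'target_indices': {'target_id': 2}})
q2 = record_query(queries, {'target_indices': {'target_id': 5}})
print(q1['query_uid'] != q2['query_uid'])  # True
print(len(queries))                        # 2
```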
In this example, getQuery first calls the algorithm's getQuery with the participant_uid as input. It receives an integer, the target_id, back from the algorithm. The target_id is then used to retrieve the target from the TargetManager. So, for example, if the algorithm returns a target_id of 2, we may imagine that a dictionary like
{
  "meta": {"features": [1.732, -1.882]},
  "alt_type": "text",
  "primary_type": "text",
  "primary_description": "Target 2",
  "alt_description": "2",
  "target_id": 2
}
is passed back. The meta key is then removed, since the user does not need the feature vector to answer the question.
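The walk-through above can be replayed end to end with plain-Python stand-ins. In this sketch, get_query, the always_two stub algorithm, and the targets dict (standing in for the TargetManager) are hypothetical; the target itself is the example dictionary from the text.

```python
def get_query(alg, targets, participant_uid):
    """Sketch of the getQuery flow with hypothetical stand-ins.

    `alg` is any callable returning a target_id and `targets` maps
    target_id -> target dict, standing in for the TargetManager.
    """
    target_id = alg({'participant_uid': participant_uid})
    # Shallow-copy before deleting so the stored target keeps its features.
    target = dict(targets[target_id])
    del target['meta']
    return {'target_indices': target}


def always_two(args):
    # Stub algorithm that always asks about target 2.
    return 2


targets = {
    2: {
        "meta": {"features": [1.732, -1.882]},
        "alt_type": "text",
        "primary_type": "text",
        "primary_description": "Target 2",
        "alt_description": "2",
        "target_id": 2,
    }
}
print(get_query(always_two, targets, 'participant-1'))
# {'target_indices': {'alt_type': 'text', 'primary_type': 'text',
#   'primary_description': 'Target 2', 'alt_description': '2', 'target_id': 2}}
```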
def processAnswer(self, butler, alg, args):
    # Get the query associated with this answer
    query = butler.queries.get(uid=args['query_uid'])
    # Extract the target that the query was made about
    target = query['target_indices']
    # Get the label returned by the user from the args dictionary.
    target_label = args['target_label']
    # Update the number of reported answers
    num_reported_answers = butler.experiment.increment(key='num_reported_answers_for_' + query['alg_label'])
    # Pass the answer on to the algorithm for processing.
    alg({'target_id': target['target_id'], 'target_label': target_label})
    # Get a copy of the model and log it roughly every n/4 answers. This will later help us track how our algorithm
    # performed as the number of labelled items increased. Integer division (//) keeps the period a whole number.
    experiment = butler.experiment.get()
    if num_reported_answers % ((experiment['n'] + 4) // 4) == 0:
        butler.job('getModel', json.dumps({'exp_uid': butler.exp_uid, 'args': {'alg_label': query['alg_label'], 'logging': True}}))
    # Return the answer. This is added to the associated query entry stored in butler.queries for future reference.
    return {'target_index': target['target_id'], 'target_label': target_label}
processAnswer begins by getting a copy of the original query corresponding to this answer. It then extracts the associated target and passes the target_id (obtained from the query) and the target_label (passed by the client) to the algorithm.
Next, processAnswer queues a getModel job in the butler roughly every n/4 answers. As described below and in the algorithms section, the getModel command just returns an internal, algorithm-specific representation of the model. By logging this model every n/4 answers, we can track how the model changed over time and check its performance on a test set. We will delve further into this when we discuss the dashboard.
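The logging cadence is easy to misread, so here is the condition from processAnswer pulled out into a hypothetical helper, should_log_model, that reports which answer counts would queue a logging getModel job.

```python
def should_log_model(num_reported_answers, n):
    """True when processAnswer's condition would queue a logging job.

    The period is (n + 4) // 4 answers, i.e. roughly n/4; integer
    division keeps it a whole number.
    """
    period = (n + 4) // 4
    return num_reported_answers % period == 0


n = 10  # number of targets, so the period is 14 // 4 = 3 answers
print([k for k in range(1, 13) if should_log_model(k, n)])  # [3, 6, 9, 12]
```

The + 4 is what keeps the period at least 1: for n < 4, a bare n // 4 would be 0 and the modulo would raise ZeroDivisionError.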
Finally, processAnswer returns the answer. NEXT appends the answer to the query dictionary for the specific query_uid in the butler.queries collection.
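The processAnswer round trip can be traced with plain-Python stand-ins: a dict for butler.queries, a list's append method as the algorithm, and a dict merge for NEXT appending the answer to the stored query. All three stand-ins, the process_answer name, and the RandomSampling label are hypothetical; the counting and logging steps are left out to keep the data flow visible.

```python
def process_answer(queries, alg, args):
    """Sketch of processAnswer's data flow with hypothetical stand-ins.

    `queries` is a dict standing in for butler.queries and `alg` is any
    callable standing in for the algorithm's processAnswer.
    """
    query = queries[args['query_uid']]
    target = query['target_indices']
    target_label = args['target_label']
    alg({'target_id': target['target_id'], 'target_label': target_label})
    return {'target_index': target['target_id'], 'target_label': target_label}


queries = {'q1': {'alg_label': 'RandomSampling',  # hypothetical alg label
                  'target_indices': {'target_id': 2}}}
received = []  # records what the stub algorithm was given
answer = process_answer(queries, received.append,
                        {'query_uid': 'q1', 'target_label': 1})
# Model of NEXT storing the answer with its query (assumed to be a key merge).
queries['q1'].update(answer)
print(received)  # [{'target_id': 2, 'target_label': 1}]
print(queries['q1'])
```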
def getModel(self, butler, alg, args):
    # Run the algorithm's getModel and return the result. Note that we are implicitly assuming there are no inputs to the alg.
    return alg()
The getModel command is very simple: it just returns the result of the algorithm's getModel. This agrees with what we saw earlier, when getModel was not overridden in the application-specific YAML interface file.
Note that if algorithms return slightly different types of models and the model output should be uniform across algorithms, this is the place to do that conversion. Otherwise, this function may simply package the algorithm's response into a dictionary and return it.
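As an example of that kind of conversion, suppose (hypothetically) that one algorithm's getModel returns a bare sequence of weights while another returns a dict with a 'weights' key plus extra fields. A small normalizer can give both the same shape; normalize_model and the {'weights': [...]} format are illustrative assumptions, not part of NEXT.

```python
def normalize_model(alg_output):
    """Hypothetical getModel post-processing giving every algorithm
    the same output shape: {'weights': [...]}.

    Assumes an algorithm returns either a bare sequence of weights or a
    dict that already contains a 'weights' key (extra keys are dropped).
    """
    if isinstance(alg_output, dict):
        return {'weights': list(alg_output['weights'])}
    return {'weights': list(alg_output)}


print(normalize_model([0.1, -0.4]))
# {'weights': [0.1, -0.4]}
print(normalize_model({'weights': (0.1, -0.4), 'num_reported_answers': 7}))
# {'weights': [0.1, -0.4]}
```

Inside the app, getModel would then return normalize_model(alg()) instead of alg().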
TODO: ALG_LABEL ISSUES IN GET_MODEL?