This repository has been archived by the owner on Dec 26, 2018. It is now read-only.

How to save the trained model to HDFS? #7

Open
echoyes opened this issue Aug 29, 2016 · 3 comments

Comments

@echoyes

echoyes commented Aug 29, 2016

After training the model, I can't find an interface for saving the trained model to HDFS. Is there any way to solve this problem? Many thanks.

@illuzen
Contributor

illuzen commented Aug 30, 2016

We originally had an API for saving it to local disk, for example here

self.saver.save(self.model.session, './models/parameter_server_model', global_step=int(time.time()))

However, we took it out when we removed the Sacred integration, I think. We just use TF's Saver object: https://www.tensorflow.org/versions/r0.10/api_docs/python/state_ops.html#Saver

As for persisting the model to HDFS, I imagine you can hook the Saver up to HDFS through whatever HDFS API you are using. In tensorspark, the model is not distributed; the driver and each worker have their own copy of the model. I hope this helps.
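For example, here is a minimal sketch of what that hookup could look like, assuming you just shell out to the Hadoop CLI: checkpoint locally with tf.train.Saver, then push the checkpoint files to HDFS. The save_to_hdfs helper, the directory names, and the CLI approach are placeholders, not part of tensorspark.

# Rough sketch, not tensorspark code: checkpoint locally with TF's Saver,
# then copy the files to HDFS with the Hadoop CLI. Paths are placeholders.
import glob
import subprocess
import time

import tensorflow as tf

def save_to_hdfs(session, local_dir='./models', hdfs_dir='/models'):
    saver = tf.train.Saver()  # saves all saveable variables by default
    local_path = saver.save(
        session,
        local_dir + '/parameter_server_model',
        global_step=int(time.time()))
    # saver.save writes the checkpoint data plus a .meta graph file; push both.
    subprocess.check_call(['hdfs', 'dfs', '-mkdir', '-p', hdfs_dir])
    for path in glob.glob(local_path + '*'):
        subprocess.check_call(['hdfs', 'dfs', '-put', '-f', path, hdfs_dir])
    return local_path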

@echoyes
Author

echoyes commented Aug 30, 2016

@snakecharmer1024 thanks for answering. I have deployed tensorspark on a cluster with 6 workers and a driver. As far as I can see, tensorspark is data parallel. One question remains: if I hook in model-saving code in the on_close function, does the model have the gradients from all the workers integrated by that point? Also, will there be a model-parallel version in the future?

@illuzen
Contributor

illuzen commented Aug 31, 2016

Glad to hear someone is using it! Yes, tensorspark is data parallel.

I don't think on_close is where you want to save, as there is one ParameterServerWebsocketHandler per websocket connection, so you would be saving the model 6 times for 6 workers. It would, however, probably guarantee getting the most up-to-date gradients if you had the workers close the websocket connection...

You could probably put the saving code at the end of train_epochs and it would provide the same guarantee.
https://github.com/adatao/tensorspark/blob/master/tensorspark.py#L153
Since the gradients have to be pushed before returning... although it's asynchronous, so conceivably the last gradient could still be in transit when train_partition returns... you might be able to just grab the lock before saving (see the sketch below):
https://github.com/adatao/tensorspark/blob/master/tensorspark.py#L84
https://github.com/adatao/tensorspark/blob/master/parameterwebsocketclient.py#L51
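For concreteness, here is a rough sketch of what "grab the lock before saving" could look like; lock, session, and saver stand in for whatever the parameter server object actually exposes in tensorspark.py, so treat the names as assumptions.

import threading
import time

import tensorflow as tf

# Sketch only; in tensorspark the lock, session, and saver would come from the
# parameter server rather than being created at module level like this.
lock = threading.Lock()

def save_checkpoint(session, saver, path='./models/parameter_server_model'):
    # Hold the lock so no asynchronous gradient update is applied to the
    # model while the checkpoint is being written.
    with lock:
        return saver.save(session, path, global_step=int(time.time()))

Calling something like this at the end of train_epochs would checkpoint the driver's copy of the model after all partitions have reported back.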

As you can probably see, we used this for collecting data, but did not productionize it.

There are no current plans to make a model parallel version, but pull requests are welcome!
