How to save the trained model to HDFS? #7
After training the model, I cannot find an interface to save the trained model to HDFS. Is there any way to solve this problem? Many thanks.
We originally had an API for saving it to local disk, for example here. However, we took it out when we removed the Sacred integration, I think. We just use TF's Saver object: https://www.tensorflow.org/versions/r0.10/api_docs/python/state_ops.html#Saver. As for persisting the model to HDFS, I imagine you can hook the Saver up to HDFS through whatever HDFS API you are using. In tensorspark, the model is not distributed; the driver and each worker have their own copy of the model. I hope this helps.
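A minimal sketch of what that could look like, assuming a TF 0.x-style graph and session. The checkpoint paths and the `hdfs dfs -put` copy step are illustrative assumptions, not tensorspark's actual API; any HDFS client library would work in place of the CLI call:

```python
import subprocess
import tensorflow as tf

# Build or restore your graph, then create a Saver over its variables.
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())  # TF r0.10-era initializer
    # ... training ...

    # Save the checkpoint to local disk first; save() returns the path.
    local_path = saver.save(sess, '/tmp/model.ckpt')

    # Copy the checkpoint into HDFS with the hdfs CLI. Note that Saver
    # also writes companion files (e.g. model.ckpt.meta), which you would
    # need to copy as well to restore the graph elsewhere.
    subprocess.check_call(['hdfs', 'dfs', '-put', '-f', local_path,
                           '/user/me/models/model.ckpt'])
```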
@snakecharmer1024 thanks for answering. I have deployed tensorspark to a cluster that has 6 workers and a driver. As far as I can see, tensorspark is data parallel. One question remains: when I hook in the code by adding model-saving code to the on_close function, does the model have the gradients from all the workers integrated? Also, will there be a model-parallel version in the future?
Glad to hear someone is using it! Yes, tensorspark is data parallel. I don't think on_close is where you want to save, as there is one ParameterServerWebsocketHandler per websocket connection, so you would be saving the model 6 times for 6 workers. It would probably guarantee getting the most up-to-date gradients if you had the workers close the websocket connection first, but you could instead put the saving code at the end of train_epochs and it would provide the same guarantee; see the sketch below. As you can probably see, we used this for collecting data but did not productionize it. There are no current plans to make a model-parallel version, but pull requests are welcome!
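A sketch of saving at the end of train_epochs, as suggested above. The function body and the `run_one_epoch` helper are hypothetical stand-ins, not tensorspark's actual code; only the idea of saving once on the driver after all epochs finish, rather than once per connection in on_close, comes from the discussion:

```python
def train_epochs(num_epochs, model, saver, sess):
    for epoch in range(num_epochs):
        # Hypothetical helper: one pass over the training data, with
        # workers pushing gradients to the parameter server as they go.
        run_one_epoch(model, sess)

    # Save exactly once, after all epochs have completed, instead of
    # once per websocket connection in on_close.
    saver.save(sess, '/tmp/tensorspark_final.ckpt')
```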