TF_Serving: challenges and errors faced in freezing and loading the graph

To serve models with TensorFlow Serving, we have to export the model as a protobuf file. We can export it in two ways:

  1. Keep the graph definition in a protobuf file and the variables' values in a separate file.
    1. This gives us more flexibility.
  2. Serialize the weights and the graph into the same protobuf file (a "frozen" graph; see the sketch after this list).
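
For orientation, here is a minimal sketch of the first flavor; the stand-in variable and the /tmp paths are illustrative assumptions, not from the original run:

import tensorflow as tf

x = tf.Variable(0.0, name='x')  # stand-in for the real model variables

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Flavor 1: graph definition in one protobuf, variable values in a checkpoint
    tf.train.write_graph(sess.graph_def, '/tmp/export', 'graph.pbtxt')
    tf.train.Saver().save(sess, '/tmp/export/model.ckpt')
    # Flavor 2 would instead bake the variable values into the GraphDef itself
    # via graph_util.convert_variables_to_constants, as done later in this write-up.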

We first trained the model and saved it as a checkpoint, which created the following files:

model.ckpt-22001-00000-of-00001
model.ckpt-22001.meta
checkpoint

These files alone cannot be used with TensorFlow Serving; we have to generate a protobuf file from them.

The following code can be used to generate the protobuf file:

import tensorflow as tf
from tensorflow.python.framework import graph_util

export_dir = '/home/cdpdemo/tensorflow/reterival-based-model/chatbot-retrieval/runs/1494400825'  # this directory contains the files mentioned above
output_graph = '/home/cdpdemo/tensorflow/reterival-based-model/chatbot-retrieval/runs/1494400825/export.pb'
clear_devices = True  # make sure we don't carry device placement information into the protobuf file

checkpoint = tf.train.get_checkpoint_state(export_dir)
input_checkpoint = checkpoint.model_checkpoint_path
# clear_devices must actually be passed here, otherwise device placements are kept
saver = tf.train.import_meta_graph('/home/cdpdemo/tensorflow/reterival-based-model/chatbot-retrieval/runs/1494400825/export.meta', clear_devices=clear_devices)

graph = tf.get_default_graph()
input_graph_def = graph.as_graph_def()

output_graph_node = []

with tf.Session() as sess:
    saver.restore(sess, input_checkpoint)
    # export all the operations/nodes existing in our model to the protobuf file
    for op in graph.get_operations():
        output_graph_node.append(op.name)
    # replace every variable with a constant holding its current value
    output_graph_def = graph_util.convert_variables_to_constants(sess, input_graph_def, output_graph_node)
    with tf.gfile.GFile(output_graph, "wb") as f:
        f.write(output_graph_def.SerializeToString())
    print("%d ops in the final graph." % len(output_graph_def.node))

Initially I was not sure which nodes to export to the protobuf file, so I tried exporting with:

output_graph_node = ['prediction']

I got the following error while doing so:

Traceback (most recent call last):
  File "convert2.py", line 21, in <module>
    output_graph_def = graph_util.convert_variables_to_constants(sess,input_graph_def,["prediction"])
  File "/home/cdpdemo/tensorflow/reterival-based-model/tensorflow-old-version-for-this-use-case/local/lib/python2.7/site-packages/tensorflow/python/framework/graph_util.py", line 234, in convert_variables_to_constants
    inference_graph = extract_sub_graph(input_graph_def, output_node_names)
  File "/home/cdpdemo/tensorflow/reterival-based-model/tensorflow-old-version-for-this-use-case/local/lib/python2.7/site-packages/tensorflow/python/framework/graph_util.py", line 158, in extract_sub_graph
    assert d in name_to_node_map, "%s is not in graph" % d
AssertionError: prediction is not in graph

This error occurred because output_graph_node is not a scalar but a list of node names, and each name must match an operation that actually exists in the TensorFlow graph. There is no node named exactly prediction in this graph; the prediction-related operations live under the prediction/ scope, as the listing below shows. Basically the list must contain the exact names of the nodes/operations we want to export to the "pb" file.
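
A quick defensive check (my sketch, continuing from the export script above where graph is the restored default graph) would have surfaced this before the assertion fired:

requested = 'prediction'  # the name I originally tried
node_names = [op.name for op in graph.get_operations()]
if requested not in node_names:
    # show the operations under the 'prediction/' scope instead
    print([n for n in node_names if n.startswith(requested + '/')])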

For example, if I list all the operations in the TensorFlow graph using the code below:

for op in graph.get_operations():
    print(op.name)

I get the following nodes:

[u'global_step',
 u'global_step/Initializer/zeros',
 u'global_step/Assign',
 u'global_step/read',
 u'read_batch_features_train/file_name_queue/input',
 u'read_batch_features_train/file_name_queue/Size',
 u'read_batch_features_train/file_name_queue/Greater/y',
 u'read_batch_features_train/file_name_queue/Greater',
 u'read_batch_features_train/file_name_queue/Assert/AssertGuard/Switch',
 u'read_batch_features_train/file_name_queue/Assert/AssertGuard/switch_t',
 u'read_batch_features_train/file_name_queue/Assert/AssertGuard/switch_f',
 u'read_batch_features_train/file_name_queue/Assert/AssertGuard/pred_id',
 u'read_batch_features_train/file_name_queue/Assert/AssertGuard/NoOp',
 u'read_batch_features_train/file_name_queue/Assert/AssertGuard/control_dependency',
 u'read_batch_features_train/file_name_queue/Assert/AssertGuard/Assert/data_0',
 u'read_batch_features_train/file_name_queue/Assert/AssertGuard/Assert/Switch',
 u'read_batch_features_train/file_name_queue/Assert/AssertGuard/Assert',
...
...
 u'prediction/logistic_loss/add/x',
 u'prediction/logistic_loss/add',
 u'prediction/logistic_loss/Log',
 u'prediction/logistic_loss',
 u'Const_3',
 u'mean_loss']

Among these nodes, I have to decide which ones to export to the "pb" file. For now I am exporting everything.
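
If one later wants to freeze only the sub-graph feeding a chosen output, the call narrows to just that node. A minimal sketch, assuming prediction/logistic_loss (visible in the listing above) is the output we care about, and reusing sess, input_graph_def and graph_util from the export script:

output_graph_def = graph_util.convert_variables_to_constants(
    sess, input_graph_def, ['prediction/logistic_loss'])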

After successfully exporting, I get the message below:

Converted 7 variables to const ops.
169 ops in the final graph.

Once I have exported everything into the "pb" file, I rename and move the files as follows:

 model.ckpt-22001.meta --> export.meta
 model.ckpt-22001-00000-of-00001 --> export-00000-of-00001
 export.pb as it is

# Now move the files into the versioned directory structure that TensorFlow Serving expects:
# <model base path>/<version>/

  mv export.meta export-00000-of-00001 export.pb /home/cdpdemo/tensorflow/reterival-based-model/chatbot-retrieval/runs/1494400825/tf_serving/1/

After this, to test whether I am able to load this model into TensorFlow Serving, I use the command below:

bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=export --model_base_path=/home/cdpdemo/tensorflow/reterival-based-model/chatbot-retrieval/runs/1494400825/tf_serving/

Below is the output of the above command:

2017-05-19 04:31:49.719962: I tensorflow_serving/model_servers/main.cc:155] Building single TensorFlow model file config:  model_name: export model_base_path: /home/cdpdemo/tensorflow/reterival-based-model/chatbot-retrieval/runs/1494400825/tf_serving/ model_version_policy: 0
2017-05-19 04:31:49.720457: I tensorflow_serving/model_servers/server_core.cc:375] Adding/updating models.
2017-05-19 04:31:49.720486: I tensorflow_serving/model_servers/server_core.cc:421]  (Re-)adding model: export
2017-05-19 04:31:49.824347: I tensorflow_serving/core/basic_manager.cc:698] Successfully reserved resources to load servable {name: export version: 1}
2017-05-19 04:31:49.824393: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: export version: 1}
2017-05-19 04:31:49.824422: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: export version: 1}
2017-05-19 04:31:49.824513: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:366] Attempting to up-convert SessionBundle to SavedModelBundle in bundle-shim from: /home/cdpdemo/tensorflow/reterival-based-model/chatbot-retrieval/runs/1494400825/tf_serving/1
2017-05-19 04:31:49.824559: I external/org_tensorflow/tensorflow/contrib/session_bundle/session_bundle.cc:161] Attempting to load a SessionBundle from: /home/cdpdemo/tensorflow/reterival-based-model/chatbot-retrieval/runs/1494400825/tf_serving/1
2017-05-19 04:31:49.824643: I external/org_tensorflow/tensorflow/contrib/session_bundle/session_bundle.cc:162] Using RunOptions:
2017-05-19 04:31:49.843385: W external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-19 04:31:49.843444: W external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-19 04:31:49.843458: W external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-05-19 04:31:49.843464: W external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-05-19 04:31:49.843473: W external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-05-19 04:31:49.970640: I external/org_tensorflow/tensorflow/contrib/session_bundle/session_bundle.cc:135] Running restore op for SessionBundle: save/restore_all, save/Const:0
2017-05-19 04:31:50.219595: I external/org_tensorflow/tensorflow/contrib/session_bundle/session_bundle.cc:244] Loading SessionBundle: success. Took 395024 microseconds.
2017-05-19 04:31:50.320556: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: export version: 1}
2017-05-19 04:31:50.383813: I tensorflow_serving/model_servers/main.cc:298] Running ModelServer at 0.0.0.0:9000 ...

If there were no "pb" file in the directory, running the above command would generate the following error:

2017-05-19 04:30:39.187463: I tensorflow_serving/model_servers/main.cc:155] Building single TensorFlow model file config:  model_name: export model_base_path: /home/cdpdemo/tensorflow/reterival-based-model/chatbot-retrieval/runs/1494400825/tf_serving/ model_version_policy: 0
2017-05-19 04:30:39.187827: I tensorflow_serving/model_servers/server_core.cc:375] Adding/updating models.
2017-05-19 04:30:39.187850: I tensorflow_serving/model_servers/server_core.cc:421]  (Re-)adding model: export
2017-05-19 04:30:39.288368: I tensorflow_serving/core/basic_manager.cc:698] Successfully reserved resources to load servable {name: export version: 1}
2017-05-19 04:30:39.288421: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: export version: 1}
2017-05-19 04:30:39.288435: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: export version: 1}
2017-05-19 04:30:39.288567: E tensorflow_serving/util/retrier.cc:38] Loading servable: {name: export version: 1} failed: Not found: Session bundle or SavedModel bundle not found at specified export location

If I use a completely wrong path, I get the error below:

bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=export --model_base_path=/home/cdpdemo/tensorflow/reterival-based-model/chatbot-retrieval/runs/temp/
2017-05-19 04:27:42.683155: I tensorflow_serving/model_servers/main.cc:155] Building single TensorFlow model file config:  model_name: export model_base_path: /home/cdpdemo/tensorflow/reterival-based-model/chatbot-retrieval/runs/temp/ model_version_policy: 0
2017-05-19 04:27:42.683973: I tensorflow_serving/model_servers/server_core.cc:375] Adding/updating models.
2017-05-19 04:27:42.684148: I tensorflow_serving/model_servers/server_core.cc:421]  (Re-)adding model: export
2017-05-19 04:27:42.695901: E tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:306] FileSystemStoragePathSource encountered a file-system access error: Could not find base path /home/cdpdemo/tensorflow/reterival-based-model/chatbot-retrieval/runs/temp/ for servable export
2017-05-19 04:27:43.695941: E tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:306] FileSystemStoragePathSource encountered a file-system access error: Could not find base path /home/cdpdemo/tensorflow/reterival-based-model/chatbot-retrieval/runs/temp/ for servable export
2017-05-19 04:27:44.696045: E tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:306] FileSystemStoragePathSource encountered a file-system access error: Could not find base path /home/cdpdemo/tensorflow/reterival-based-model/chatbot-retrieval/runs/temp/ for servable export


The main problem in saving the frozen model was that I didn't understand which part of the graph to save.

Next: loading the saved frozen model. Useful references:

https://blog.metaflow.fr/tensorflow-how-to-freeze-a-model-and-serve-it-with-a-python-api-d4f3596b3adc

https://github.com/tensorflow/tensorflow/issues/3628

http://www.nqbao.com/blog/2017/02/tensorflow-exporting-model-for-serving/
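
For reference, a minimal sketch (along the lines of the metaflow post above, not yet validated against this model) of loading the frozen graph back in Python:

import tensorflow as tf

def load_frozen_graph(pb_path):
    # read the serialized GraphDef from the frozen .pb file
    with tf.gfile.GFile(pb_path, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    # import it into a fresh graph so the node names stay exactly as exported
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def, name="")
    return graph

graph = load_frozen_graph('/home/cdpdemo/tensorflow/reterival-based-model/chatbot-retrieval/runs/1494400825/tf_serving/1/export.pb')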

<TODO>
