Fixing the model-loading error "NotFoundError (see above for traceback): Key v1 not found in checkpoint"
NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Key v1 not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
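This error usually occurs because the variable names stored in the checkpoint file do not match the variable names the program asks for when it restores the model. The fix is to inspect the variable names in the checkpoint file and change the names used by the program to match them. The rest of this post shows how to inspect the variable names in a checkpoint file and how to change the names the program uses.
The example below is the model-persistence example from the book 《TensorFlow實戰Google深度學習框架》 (TensorFlow: Google Deep Learning Framework in Action). It also resolves the rename-on-load problem from Chapter 5 of the book:
The code that saves the model: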
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import tensorflow as tf
# Save a model that computes the sum of two variables.
# Note that neither variable is given an explicit name.
v1 = tf.Variable(tf.random_normal([1], stddev=1, seed=1))
v2 = tf.Variable(tf.random_normal([1], stddev=1, seed=1))
result = v1 + v2
init_op = tf.global_variables_initializer()
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(init_op)
    saver.save(sess, "Saved_model/model.ckpt")
The code that loads the model (loading the whole model):
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import tensorflow as tf
# Declare the same two variables, in the same order, as in the saving script.
v1 = tf.Variable(tf.random_normal([1], stddev=1, seed=1))
v2 = tf.Variable(tf.random_normal([1], stddev=1, seed=1))
result = v1 + v2
saver = tf.train.Saver()
# Load the saved model, restoring all variables.
with tf.Session() as sess:
    saver.restore(sess, "Saved_model/model.ckpt")
    print(sess.run(result))
This code runs normally, without any error, and prints the value of result.
Loading the model with renamed variables. Code:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import tensorflow as tf
# tf.reset_default_graph()
# Declare the variables under new names.
V1 = tf.Variable(tf.constant(1.0, shape=[1]), name="a1")
V2 = tf.Variable(tf.constant(2.0, shape=[1]), name="a2")
result = V1 + V2
# Keys are the names looked up in the checkpoint; values are the variables to restore into.
saver = tf.train.Saver({"v1": V1, "v2": V2})
# Load the saved model, restoring all variables.
with tf.Session() as sess:
    saver.restore(sess, "Saved_model/model.ckpt")
    print(sess.run(result))
Running this code produces the following error:
NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Key v1 not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
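The error occurs because the keys "v1" and "v2" given in saver = tf.train.Saver({"v1": V1, "v2": V2}) do not match the variable names stored in the checkpoint file.
Run the code below to list the variable names in the checkpoint file (for details, see the blog post "Viewing the variable names and their values in a checkpoint file in TensorFlow"):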
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import os
from tensorflow.python import pywrap_tensorflow
model_dir = "Saved_model"
checkpoint_path = os.path.join(model_dir, "model.ckpt")
# Open the checkpoint and read the name -> shape map of everything stored in it.
reader = pywrap_tensorflow.NewCheckpointReader(checkpoint_path)
var_to_shape_map = reader.get_variable_to_shape_map()
# Print each stored variable name followed by its value.
for key in var_to_shape_map:
    print("tensor_name: ", key, end=' ')
    print(reader.get_tensor(key))
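Running it shows that the variable names in the checkpoint file are Variable and Variable_1, not v1 and v2. These are the default names TensorFlow assigns when tf.Variable is called without a name argument, as in the saving script above. Changing v1 and v2 to Variable and Variable_1 in saver = tf.train.Saver({"v1": V1, "v2": V2}) in the renamed-variable loading code resolves the error.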
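The mismatch can also be caught before calling saver.restore. The following is only a minimal sketch in plain Python and is not part of the original example: check_saver_keys is a hypothetical helper, and the two lists are hard-coded with the names that appear in this post. In a real script the checkpoint names could come from reader.get_variable_to_shape_map() instead.

```python
def check_saver_keys(saver_keys, checkpoint_keys):
    """Return, in sorted order, the Saver keys that are absent from the checkpoint."""
    available = set(checkpoint_keys)
    return sorted(key for key in saver_keys if key not in available)


# Names taken from this post's example (hard-coded here for illustration).
saver_keys = ["v1", "v2"]                      # keys passed to tf.train.Saver({...})
checkpoint_keys = ["Variable", "Variable_1"]   # names stored in model.ckpt

missing = check_saver_keys(saver_keys, checkpoint_keys)
if missing:
    print("Not found in checkpoint:", missing)
    print("Available in checkpoint:", sorted(checkpoint_keys))
```

An empty result means every key handed to the Saver exists in the checkpoint, so this particular NotFoundError should not be raised.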
The modified code:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import tensorflow as tf
# tf.reset_default_graph()
# Declare the variables under new names.
V1 = tf.Variable(tf.constant(1.0, shape=[1]), name="a1")
V2 = tf.Variable(tf.constant(2.0, shape=[1]), name="a2")
result = V1 + V2
# The keys now match the names actually stored in the checkpoint.
saver = tf.train.Saver({"Variable": V1, "Variable_1": V2})
# Load the saved model, restoring all variables.
with tf.Session() as sess:
    saver.restore(sess, "Saved_model/model.ckpt")
    print(sess.run(result))
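With this change the checkpoint restores without the NotFoundError, and the script prints the value of result computed from the restored variables.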
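When a checkpoint holds many default-named variables, typing the dictionary by hand becomes error-prone. The sketch below pairs checkpoint names with graph variables by creation order. It rests on assumptions that go beyond the original post: every checkpoint name is a TensorFlow default name (Variable, Variable_1, ...), and the loading script declares its variables in the same order as the saving script. Both helper functions are hypothetical, and the strings "V1" and "V2" stand in for the real tf.Variable objects.

```python
import re


def default_name_index(name):
    """Map a default variable name to its creation index: 'Variable' -> 0, 'Variable_3' -> 3."""
    match = re.fullmatch(r"Variable(?:_(\d+))?", name)
    if match is None:
        raise ValueError("not a default variable name: %r" % name)
    return int(match.group(1) or 0)


def pair_by_creation_order(checkpoint_keys, graph_variables):
    """Build a {checkpoint name: graph variable} dict, assuming both follow creation order."""
    ordered = sorted(checkpoint_keys, key=default_name_index)
    if len(ordered) != len(graph_variables):
        raise ValueError("checkpoint and graph hold different numbers of variables")
    return dict(zip(ordered, graph_variables))


# Placeholders for the V1 and V2 variables of the loading script.
mapping = pair_by_creation_order(["Variable_1", "Variable"], ["V1", "V2"])
print(mapping)
```

In the loading script the resulting dictionary, built from the actual V1 and V2 objects, would be passed to tf.train.Saver in place of the hand-written one. Sorting by the numeric suffix rather than alphabetically keeps Variable_10 after Variable_2.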