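I am running the Keras example on knowledge distillation, and my question is: which model is the resulting compressed model that I can use for prediction, the distiller or the student? And in that case, how do I add a softmax classification layer and run predictions with the resulting model?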
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
batch_size = 64
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0
x_train = np.reshape(x_train, (-1, 28, 28, 1))
x_test = x_test.astype("float32") / 255.0
x_test = np.reshape(x_test, (-1, 28, 28, 1))
teacher = keras.Sequential(
    [
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(256, (3, 3), strides=(2, 2), padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding="same"),
        layers.Conv2D(512, (3, 3), strides=(2, 2), padding="same"),
        layers.Flatten(),
        layers.Dense(10),
    ],
    name="teacher",
)
# Create the student
student = keras.Sequential(
    [
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(16, (3, 3), strides=(2, 2), padding="same"),
        layers.LeakyReLU(alpha=0.2),
        layers.MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding="same"),
        layers.Conv2D(32, (3, 3), strides=(2, 2), padding="same"),
        layers.Flatten(),
        layers.Dense(10),
    ],
    name="student",
)
teacher.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
teacher.fit(x_train, y_train, epochs=5)
teacher.evaluate(x_test, y_test)
# Distiller is the custom keras.Model subclass defined in the Keras
# knowledge distillation example (its definition is not shown here)
distiller = Distiller(student=student, teacher=teacher)
distiller.compile(
    optimizer=keras.optimizers.Adam(),
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
    student_loss_fn=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    distillation_loss_fn=keras.losses.KLDivergence(),
    alpha=0.1,
    temperature=10,
)
# Distill teacher to student
distiller.fit(x_train, y_train, epochs=3)
# Evaluate student on test dataset
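distiller.evaluate(x_test, y_test)
Although I can run this example, the information is still not clear to me. I want to test the model on unseen data, so I would like to know: how do I build a model from knowledge distillation, make predictions with it, and check its classification report?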
Posted on 2022-09-04 20:19:16
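The "compressed" model is the student model. The Distiller is just a wrapper used to train the student to mimic the teacher's outputs (alongside the usual label loss, weighted by alpha), as opposed to training the student to estimate the true labels alone. Because the wrapper holds a reference to the student you passed in, that same student object carries the trained weights after distiller.fit and is the model to use for prediction.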
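The page you linked has a section that compares the distilled student with an equivalent lightweight student architecture trained "from scratch" on the true labels, so the tutorial itself shows that predicting with the student is fairly straightforward.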
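Since the question also asks about a classification report, here is a minimal NumPy-only sketch of per-class precision and recall computed from predicted labels. The helper `per_class_report` and the small `logits` array are hypothetical stand-ins: in the question's script the logits would come from `student.predict(x_test)`, and `sklearn.metrics.classification_report(y_true, y_pred)` is the usual tool if scikit-learn is installed.

```python
import numpy as np

def per_class_report(y_true, y_pred, num_classes):
    """Return {class: (precision, recall, support)} from integer labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    report = {}
    for c in range(num_classes):
        true_pos = int(np.sum((y_pred == c) & (y_true == c)))
        predicted = int(np.sum(y_pred == c))  # samples predicted as class c
        support = int(np.sum(y_true == c))    # samples truly in class c
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / support if support else 0.0
        report[c] = (precision, recall, support)
    return report

# Hypothetical logits standing in for student.predict(x_test):
# 4 samples, 3 classes.
logits = np.array([[2.0, 0.1, -1.0],
                   [0.2, 1.5, 0.3],
                   [0.0, 0.4, 2.2],
                   [1.1, 0.9, -0.5]])
y_true = np.array([0, 1, 2, 1])

# The argmax of the logits is the same as the argmax of the softmax scores.
y_pred = np.argmax(logits, axis=1)

report = per_class_report(y_true, y_pred, num_classes=3)
for cls, (precision, recall, support) in report.items():
    print(f"class {cls}: precision={precision:.2f} "
          f"recall={recall:.2f} support={support}")
```

Only the predicted labels are needed for this kind of report, so it works directly on the student's raw outputs.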
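Note that both the teacher and the student end in a plain Dense layer with no activation, so training assumes the loss is computed by treating the model outputs as logits (hence from_logits=True). The outputs of both the teacher and the student therefore need a simple tf.nn.softmax applied to them to obtain standard classification scores.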
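One way to add the softmax the question asks about is to wrap the student in a second model that appends a `layers.Softmax()`. The sketch below is an assumption-laden illustration, not the tutorial's own code: it builds a small untrained stand-in student and feeds it random images so that it runs on its own. In the question's script you would reuse the already trained `student` and pass real unseen data such as `x_test`.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Stand-in for the trained student: a small model ending in Dense(10) logits.
# In the question's script, reuse the existing trained `student` instead.
student = keras.Sequential(
    [
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(16, (3, 3), strides=(2, 2), padding="same"),
        layers.Flatten(),
        layers.Dense(10),
    ],
    name="student",
)

# Append a softmax so the wrapped model outputs class probabilities.
student_with_softmax = keras.Sequential([student, layers.Softmax()])

# Random images stand in for unseen data such as x_test.
x_new = np.random.rand(5, 28, 28, 1).astype("float32")

probs = student_with_softmax.predict(x_new, verbose=0)  # shape (5, 10)
pred_classes = np.argmax(probs, axis=1)                 # predicted digit per image
print(probs.shape, pred_classes)
```

The wrapper shares weights with the student, so no retraining is needed. Applying `tf.nn.softmax(student.predict(x))` directly should give the same probabilities.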
Don't forget to rescale by the softmax temperature if you need to.
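One reading of the temperature remark is that the logits are divided by a temperature before the softmax, with a temperature of 1 giving the standard scores. The following NumPy sketch illustrates that idea; the helper `softmax_with_temperature` and the example logits are hypothetical and not part of the Keras example.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over the last axis of logits / temperature."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    # Subtract the row maximum for numerical stability; the result is unchanged.
    scaled = scaled - scaled.max(axis=-1, keepdims=True)
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical logits for a single sample with 3 classes.
logits = np.array([[4.0, 1.0, -2.0]])

p_t1 = softmax_with_temperature(logits, temperature=1.0)    # standard scores
p_t10 = softmax_with_temperature(logits, temperature=10.0)  # softened scores

print("T=1: ", np.round(p_t1, 3))
print("T=10:", np.round(p_t10, 3))
```

A higher temperature flattens the distribution but does not change which class has the highest score, so the predicted label is the same either way.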
https://stackoverflow.com/questions/73602436