I am trying to switch between several different "expert" layers based on the output of a "gating" layer (as a mixture of experts). I created a custom layer that takes the outputs of the expert and gating layers, but this ends up discarding some of the expert outputs rather than never computing them in the first place.

How can I make the model "short-circuit" so that only the gating layer and the selected expert layer are evaluated, to save computation time?

I am using tensorflow 2.0 gpu and the keras functional api.
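For contrast, the wasteful pattern described in the question, where every expert runs on the full batch and the gate only selects among the already-computed results, might look like the following sketch (the layer names and shapes here are hypothetical, not from the question's actual code):

```python
import numpy as np
import tensorflow as tf

# Hypothetical experts and gate; in this pattern EVERY expert runs on
# the full batch, and the gate merely picks among the finished results.
expert_a = tf.keras.layers.Dense(8)
expert_b = tf.keras.layers.Dense(8)
gate = tf.keras.layers.Dense(1, activation="sigmoid")

def wasteful_mixture(x):
    z = gate(x)        # gating output in (0, 1), shape [batch, 1]
    ya = expert_a(x)   # computed for ALL examples
    yb = expert_b(x)   # computed for ALL examples
    # Hard routing by thresholding the gate: for every example, one of
    # the two expert results above is computed and then thrown away.
    cond = tf.broadcast_to(z > 0.5, tf.shape(ya))
    return tf.where(cond, yb, ya)

x = np.random.random([4, 8]).astype(np.float32)
print(wasteful_mixture(x).shape)  # (4, 8)
```

The answer below avoids this by slicing the batch *before* calling the experts, so each expert only sees the examples routed to it.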
Posted on 2019-12-24 23:03:46
A Keras model can be implemented fully dynamically to support the efficient routing you mention. The example below demonstrates one way this can be done. It is written under the following premises:

- It assumes two experts (LayerA and LayerB).
- It assumes a mixture-of-experts model (MixOfExpertsModel) whose gating layer is a Keras Dense layer.

Note the comments in the code to see how the switching is done.
import numpy as np
import tensorflow as tf


# This is your Expert A class.
class LayerA(tf.keras.layers.Layer):

    def build(self, input_shape):
        self.weight = self.add_weight("weight_a", shape=input_shape[1:])

    @tf.function
    def call(self, x):
        return x + self.weight


# This is your Expert B class.
class LayerB(tf.keras.layers.Layer):

    def build(self, input_shape):
        self.weight = self.add_weight("weight_b", shape=input_shape[1:])

    @tf.function
    def call(self, x):
        return x * self.weight


class MixOfExpertsModel(tf.keras.models.Model):

    def __init__(self):
        super(MixOfExpertsModel, self).__init__()
        self._expert_a = LayerA()
        self._expert_b = LayerB()
        self._gating_layer = tf.keras.layers.Dense(1, activation="sigmoid")

    @tf.function
    def call(self, x):
        z = self._gating_layer(x)
        # The switching logic:
        # - examples with gating output <= 0.5 are routed to expert A
        # - examples with gating output > 0.5 are routed to expert B.
        mask_a = tf.squeeze(tf.less_equal(z, 0.5), axis=-1)
        mask_b = tf.squeeze(tf.greater(z, 0.5), axis=-1)
        # `input_a` is a subset of slices of the original input (`x`).
        # So is `input_b`. As such, no compute is wasted.
        input_a = tf.boolean_mask(x, mask_a, axis=0)
        input_b = tf.boolean_mask(x, mask_b, axis=0)
        if tf.size(input_a) > 0:
            output_a = self._expert_a(input_a)
        else:
            output_a = tf.zeros_like(input_a)
        if tf.size(input_b) > 0:
            output_b = self._expert_b(input_b)
        else:
            output_b = tf.zeros_like(input_b)
        # Return `mask_a` and `mask_b` so that the caller can know
        # which example is routed to which expert and whether its output
        # appears in `output_a` or `output_b`. This is necessary
        # for writing a (custom) loss function for this class.
        return output_a, output_b, mask_a, mask_b


# Create an instance of the mix-of-experts model.
mix_of_experts_model = MixOfExpertsModel()

# Generate some dummy data.
num_examples = 32
xs = np.random.random([num_examples, 8]).astype(np.float32)

# Call the model.
print(mix_of_experts_model(xs))

I have not written a custom loss function that would support training this class. However, that is doable by using the return values of MixOfExpertsModel.call(), i.e., the outputs together with the masks.
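One way such a custom loss could be written is sketched below. The helper names (`reassemble`, `moe_loss`) are hypothetical, not part of the answer above; the idea is to use `tf.dynamic_stitch` with the indices recovered from the two masks to put the per-expert outputs back into the original batch order, and then apply an ordinary loss:

```python
import numpy as np
import tensorflow as tf

def reassemble(output_a, output_b, mask_a, mask_b):
    # Recover per-example outputs in the original batch order.
    # The indices where `mask_a` / `mask_b` are True tell us which rows
    # of `output_a` / `output_b` belong to which original examples.
    idx_a = tf.cast(tf.where(mask_a)[:, 0], tf.int32)
    idx_b = tf.cast(tf.where(mask_b)[:, 0], tf.int32)
    return tf.dynamic_stitch([idx_a, idx_b], [output_a, output_b])

def moe_loss(y_true, output_a, output_b, mask_a, mask_b):
    # Plain MSE over the reassembled predictions; any per-expert
    # weighting or load-balancing term could be added here.
    y_pred = reassemble(output_a, output_b, mask_a, mask_b)
    return tf.reduce_mean(tf.square(y_true - y_pred))
```

For example, if examples 0 and 2 were routed to expert A and example 1 to expert B, `reassemble` interleaves the two output tensors back into batch order before the loss is computed.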
https://stackoverflow.com/questions/59460638