Q: TensorFlow 2.x multiprocessing custom data generator

Stack Overflow user

Asked on 2020-10-14 15:33:36

Answers: 1 · Views: 4K · Followers: 0 · Votes: 2

I just upgraded to TensorFlow 2.3. I want to write my own data generator for training. In TensorFlow 1.x I did it like this:

def get_data_generator(test_flag):
  item_list = load_item_list(test_flag)
  print('data loaded')
  while True:
    X = []
    Y = []
    for _ in range(BATCH_SIZE):
      x, y = get_random_augmented_sample(item_list)
      X.append(x)
      Y.append(y)
    yield np.asarray(X), np.asarray(Y)

data_generator_train = get_data_generator(False)
data_generator_test = get_data_generator(True)
model.fit_generator(data_generator_train, validation_data=data_generator_test, 
                    epochs=10000, verbose=2,
                    use_multiprocessing=True,
                    workers=8,
                    validation_steps=100,
                    steps_per_epoch=500,
                    )

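This code worked fine in TensorFlow 1.x. Eight processes were created on the system, and both the CPU and the GPU were well loaded. "data loaded" was printed 8 times.

With TensorFlow 2.3 I get this warning:

WARNING:tensorflow:multiprocessing can interact badly with TensorFlow, causing nondeterministic deadlocks. For high performance data pipelines tf.data is recommended.

"data loaded" is printed only once (it should be printed 8 times). The GPU is not fully utilized. Memory also leaks on every epoch, so training stops after a few epochs. The use_multiprocessing flag does not help.

How can I write a generator/iterator in tensorflow(keras) 2.x that is easy to parallelize across multiple CPU processes? Deadlocks and data order do not matter.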

tf.keras

data-generation

custom-training

tensorflow

keras

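1 Answer

Stack Overflow user

Answered on 2020-10-18 22:36:21

With a tf.data pipeline, there are several points where you can parallelize. Depending on how your data is stored and read, you can parallelize reading. You can also parallelize augmentation, and you can prefetch data during training so that your GPU (or other hardware) is never starved for data.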
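As a rough illustration of the first point (parallel reading), here is a minimal sketch that assumes the data can be split into independent sources, each read by its own generator. The names `sample_generator`, `make_source_dataset`, `NUM_SOURCES`, and `SAMPLES_PER_SOURCE`, as well as the synthetic data, are hypothetical and not part of the original post. It uses `output_signature`, which requires TensorFlow 2.4 or newer.

```python
import numpy as np
import tensorflow as tf

X_SHAPE = (32, 32, 3)
NUM_CLASSES = 10
NUM_SOURCES = 4          # hypothetical: number of independent data sources
SAMPLES_PER_SOURCE = 6   # hypothetical: samples produced by each source


def sample_generator(source_id):
    """Yield synthetic (image, label) pairs for one source (stand-in for real I/O)."""
    rng = np.random.default_rng(int(source_id))
    for _ in range(SAMPLES_PER_SOURCE):
        x = rng.random(X_SHAPE, dtype=np.float32)
        y = np.int32(rng.integers(0, NUM_CLASSES))
        yield x, y


def make_source_dataset(source_id):
    """Wrap one generator as a dataset; source_id is forwarded through `args`."""
    return tf.data.Dataset.from_generator(
        sample_generator,
        args=(source_id,),
        output_signature=(
            tf.TensorSpec(shape=X_SHAPE, dtype=tf.float32),
            tf.TensorSpec(shape=(), dtype=tf.int32),
        ),
    )


# Pull from several generators concurrently; element order is not preserved.
dataset = tf.data.Dataset.range(NUM_SOURCES).interleave(
    make_source_dataset,
    cycle_length=NUM_SOURCES,
    num_parallel_calls=tf.data.AUTOTUNE,
    deterministic=False,
)
dataset = dataset.batch(8, drop_remainder=True).prefetch(tf.data.AUTOTUNE)
```

Note that the generator bodies are ordinary Python running inside the training process, so they share the interpreter lock. This pattern is likely to help most when each generator spends its time waiting on disk or network reads rather than on pure-Python computation.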

In the code below, I demonstrate how to parallelize augmentation and add prefetching.

import numpy as np
import tensorflow as tf

x_shape = (32, 32, 3)
y_shape = ()  # A single item (not array).
classes = 10

# This is tf.data.experimental.AUTOTUNE in older tensorflow.
AUTOTUNE = tf.data.AUTOTUNE

def generator_fn(n_samples):
    """Return a function that takes no arguments and returns a generator."""
    def generator():
        for i in range(n_samples):
            # Synthesize an image and a class label.
            x = np.random.random_sample(x_shape).astype(np.float32)
            y = np.random.randint(0, classes, size=y_shape, dtype=np.int32)
            yield x, y
    return generator

def augment(x, y):
    return x * tf.random.normal(shape=x_shape), y

samples = 10
batch_size = 5
epochs = 2

# Create dataset.
gen = generator_fn(n_samples=samples)
dataset = tf.data.Dataset.from_generator(
    generator=gen, 
    output_types=(np.float32, np.int32), 
    output_shapes=(x_shape, y_shape)
)
# Parallelize the augmentation.
dataset = dataset.map(
    augment, 
    num_parallel_calls=AUTOTUNE,
    # Order does not matter.
    deterministic=False
)
dataset = dataset.batch(batch_size, drop_remainder=True)
# Prefetch some batches.
dataset = dataset.prefetch(AUTOTUNE)

# Prepare model.
model = tf.keras.applications.VGG16(weights=None, input_shape=x_shape, classes=classes)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Train. Do not specify batch size because the dataset takes care of that.
model.fit(dataset, epochs=epochs)
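If the sampling and augmentation logic already exists as a NumPy-based Python function, as in the question, one possible way to reuse it is to wrap it with `tf.numpy_function` and call it from a parallel `map`. The sketch below is only an assumption about how that could look: `get_random_augmented_sample` here is a hypothetical stand-in that takes an index and returns synthetic data, not the asker's real function, and `tf_sample` is an illustrative helper name.

```python
import numpy as np
import tensorflow as tf

X_SHAPE = (32, 32, 3)
NUM_CLASSES = 10
BATCH_SIZE = 4
STEPS = 3


def get_random_augmented_sample(index):
    """Hypothetical stand-in for a NumPy-based load-and-augment function."""
    rng = np.random.default_rng(int(index))
    x = rng.random(X_SHAPE, dtype=np.float32)
    y = np.int32(rng.integers(0, NUM_CLASSES))
    return x, y


def tf_sample(index):
    """Call the Python function from the tf.data graph."""
    x, y = tf.numpy_function(
        get_random_augmented_sample, [index], (tf.float32, tf.int32)
    )
    # tf.numpy_function drops static shapes; restore them so Keras can build.
    x.set_shape(X_SHAPE)
    y.set_shape(())
    return x, y


dataset = (
    tf.data.Dataset.range(BATCH_SIZE * STEPS)
    # Run several calls of the Python function concurrently; order is not kept.
    .map(tf_sample, num_parallel_calls=tf.data.AUTOTUNE, deterministic=False)
    .batch(BATCH_SIZE, drop_remainder=True)
    .prefetch(tf.data.AUTOTUNE)
)
```

For an endless stream like the original generator, the finite `Dataset.range(...)` could be followed by `.repeat()` and combined with `steps_per_epoch` in `model.fit`. The wrapped function still runs as Python inside the training process, so the speedup depends on how much of its work happens in code that releases the interpreter lock; expressing the augmentation with TensorFlow ops, as in the answer above, avoids that limitation.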

Votes: 5

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link: https://stackoverflow.com/questions/64356769