Q: No multi-GPU speedup for GAN

Stack Overflow user

Asked 2019-04-05 23:25:52

1 answer, 858 views, 0 followers, 0 votes

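I have a code base in which I am trying to replicate GAN papers. I recently bought a second GPU and am trying to update my code to take advantage of the extra hardware. I tried the method outlined in the TensorFlow cifar10 multi-GPU example. However, when I run my code with 2 GPUs it is not any faster; in fact, it runs about 10% slower than on a single GPU. Resource Manager reports both GPUs at around 50% utilization.

I am running on Windows 10, using Python 3.7 and TF 1.13. I am using two 2080s with a 2950 CPU.

My first thought was that something was wrong with my input pipeline, so I tried many different things: using multiple data iterators, using tf.data.experimental.prefetch_to_device(), not feeding in my latent vectors, and so on. Nothing had any effect, and since my CPU utilization is around 5%, I am fairly sure I am not bottlenecked there.

I have also tried a few different ways of setting up the variable scopes for the towers, but that did not help.

I also tried doubling the batch size in case I was simply not pushing enough data through the GPUs, but that doubled the time to compute each batch, at the same 50% GPU utilization.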
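The batch-size experiment is easier to read as throughput (samples per second) than as time per batch. A minimal sketch of that arithmetic follows; the batch sizes and timings are hypothetical placeholders, since the question gives no absolute numbers, and `throughput` is a helper invented for this illustration.

```python
def throughput(batch_size, seconds_per_batch):
    """Return samples processed per second for one training step."""
    if seconds_per_batch <= 0:
        raise ValueError("seconds_per_batch must be positive")
    return batch_size / seconds_per_batch


# Hypothetical numbers: the original batch, then a doubled batch
# that also takes twice as long per step.
base = throughput(batch_size=64, seconds_per_batch=0.5)
doubled = throughput(batch_size=128, seconds_per_batch=1.0)

print(base, doubled)
```

If doubling the batch doubles the step time, throughput is unchanged, which is consistent with the extra work not running in parallel across the two GPUs.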

My code is here, and the relevant part is:

        d_grads = []
        g_grads = []
        for i in range(FLAGS.num_gpus):
            with tf.device('/gpu:{:d}'.format(i)):
                with tf.variable_scope('D', reuse=tf.AUTO_REUSE):
                    Dx, Dx_logits = self.discriminator(xs[i], yxs[i])
                with tf.variable_scope('G', reuse=tf.AUTO_REUSE):
                    G = self.generator(z[i], labels[i])
                with tf.variable_scope('D', reuse=tf.AUTO_REUSE):
                    Dg, Dg_logits = self.discriminator(G, labels[i])

                loss_d, loss_g = self.losses(Dx_logits, Dg_logits, Dx, Dg)

                vars = tf.trainable_variables()
                for v in vars:
                    print(v.name)
                d_params = [v for v in vars if v.name.startswith('D/')]
                g_params = [v for v in vars if v.name.startswith('G/')]

                d_grads.append(d_adam.compute_gradients(loss_d, var_list=d_params))
                g_grads.append(g_adam.compute_gradients(loss_g, var_list=g_params))

        d_opt = d_adam.apply_gradients(average_gradients(d_grads))
        g_opt = g_adam.apply_gradients(average_gradients(g_grads))
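The snippet calls `average_gradients`, which is not shown in the question. In the TensorFlow cifar10 multi-GPU example, the helper of that name averages each variable's gradient across towers. The sketch below illustrates only that arithmetic in plain Python: gradients are flat lists of floats and variables are name strings, both stand-ins chosen for this example rather than the asker's actual implementation.

```python
def average_gradients(tower_grads):
    """Average per-tower gradients.

    tower_grads holds one list per tower; each is a list of
    (gradient, variable) pairs in the same variable order.
    Gradients are flat lists of floats standing in for tensors.
    """
    averaged = []
    # zip(*tower_grads) groups the pairs that belong to the same variable.
    for pairs in zip(*tower_grads):
        grads = [g for g, _ in pairs]
        var = pairs[0][1]  # variables are shared, so take the name from tower 0
        mean = [sum(vals) / len(vals) for vals in zip(*grads)]
        averaged.append((mean, var))
    return averaged


# Two hypothetical towers with gradients for the same two variables.
tower_0 = [([1.0, 2.0], "D/w"), ([0.5], "D/b")]
tower_1 = [([3.0, 4.0], "D/w"), ([1.5], "D/b")]
result = average_gradients([tower_0, tower_1])
print(result)
```

The averaged pairs are what a single `apply_gradients` call would then consume, so the shared variables are updated once per step regardless of the number of towers.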

tensorflow

1 Answer

Stack Overflow user

Answered 2019-04-06 00:42:12

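In your gan.py file, see line 17: gpus is set to 1. Second, check this link on allowing GPU memory growth. By default, TensorFlow maps nearly all of the GPU memory of all GPUs visible to the process (subject to CUDA_VISIBLE_DEVICES). In some cases it is desirable for the process to allocate only a subset of the available memory, or to grow its memory usage only as needed. TensorFlow provides two Config options on the Session to control this.

The first is the allow_growth option, which attempts to allocate only as much GPU memory as runtime allocations require: it starts out allocating very little memory, and as Sessions run and more GPU memory is needed, the GPU memory region used by the TensorFlow process is extended.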

import tensorflow as tf  # TF 1.x API

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # grow GPU memory on demand
session = tf.Session(config=config)  # other Session arguments omitted

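The second is the per_process_gpu_memory_fraction option, which determines the fraction of the total memory that should be allocated on each visible GPU. For example, you can tell TensorFlow to allocate only 40% of the total memory of each GPU by: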

import tensorflow as tf  # TF 1.x API

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4  # 40% of each GPU
session = tf.Session(config=config)  # other Session arguments omitted

This is useful if you want to truly bound the amount of GPU memory available to the TensorFlow process.
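As a rough worked example of the fraction setting, assume a hypothetical card with 8192 MiB of memory (the answer does not state a card size). The helper name `gpu_memory_budget_mib` is made up for this illustration and is not part of TensorFlow.

```python
def gpu_memory_budget_mib(total_mib, fraction):
    """Return the memory budget implied by a per-process fraction."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return total_mib * fraction


# Assumed card size of 8192 MiB, with the 0.4 fraction from the example above.
budget = gpu_memory_budget_mib(total_mib=8192, fraction=0.4)
print("budget per GPU (MiB):", budget)
```

Under that assumption the process would be limited to roughly 3277 MiB on each visible GPU.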

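Using a single GPU on a multi-GPU system

If you have more than one GPU in your system, the GPU with the lowest ID is selected by default. If you would like to run on a different GPU, you need to specify the preference explicitly: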

import tensorflow as tf  # TF 1.x API

# Creates a graph pinned to a specific GPU.
with tf.device('/device:GPU:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

If the device you have specified does not exist, you will get an InvalidArgumentError:

InvalidArgumentError: Invalid argument: Cannot assign a device to node 'b':
Could not satisfy explicit device specification '/device:GPU:2'
   [[{{node b}} = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [3,2]
   values: 1 2 3...>, _device="/device:GPU:2"]()]]

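If you would like TensorFlow to automatically choose an existing and supported device to run the operations when the specified one does not exist, set allow_soft_placement to True in the configuration option when creating the session.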

import tensorflow as tf  # TF 1.x API

# Creates a graph that requests a GPU which may not exist.
with tf.device('/device:GPU:2'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
  c = tf.matmul(a, b)
# Creates a session with allow_soft_placement and log_device_placement set
# to True.
sess = tf.Session(config=tf.ConfigProto(
      allow_soft_placement=True, log_device_placement=True))
# Runs the op.
print(sess.run(c))

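Using multiple GPUs

If you would like to run TensorFlow on multiple GPUs, you can construct your model in a multi-tower fashion, where each tower is assigned to a different GPU. For example: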

import tensorflow as tf  # TF 1.x API

# Creates a graph with one matmul tower per GPU.
c = []
for d in ['/device:GPU:2', '/device:GPU:3']:
  with tf.device(d):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
    c.append(tf.matmul(a, b))
# Combines the tower outputs on the CPU.
with tf.device('/cpu:0'):
  total = tf.add_n(c)  # renamed from `sum` to avoid shadowing the builtin
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(total))
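In the question's code each tower reads its own inputs (`xs[i]`, `z[i]`, `labels[i]`). One common way to produce such per-tower inputs is to cut a batch into equal shards, one per GPU. A minimal sketch in plain Python, where `split_batch` is a hypothetical helper and a list of integers stands in for a batch of samples:

```python
def split_batch(batch, num_towers):
    """Split a batch into num_towers equal, contiguous shards."""
    if num_towers <= 0:
        raise ValueError("num_towers must be positive")
    if len(batch) % num_towers != 0:
        raise ValueError("batch size must be divisible by num_towers")
    shard = len(batch) // num_towers
    return [batch[i * shard:(i + 1) * shard] for i in range(num_towers)]


# A stand-in batch of 8 samples split across 2 towers.
shards = split_batch(list(range(8)), num_towers=2)
print(shards)
```

With this kind of split, each GPU processes half of the global batch per step, so the global batch size has to grow with the number of towers to keep the per-GPU workload constant.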

0 votes

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/55544489
