
Is it possible to run a scatter matmul in PyTorch?

Stack Overflow user
Asked on 2022-02-06 19:58:33
1 answer · 129 views · 0 followers · score 0

Edit: apparently DGL is already working on this: https://github.com/dmlc/dgl/pull/3641

I have several types of embeddings, and each type needs its own linear projection. I can solve the problem with a for loop over the types:

emb_out = dict()
for ntype in ntypes:
    emb_out[ntype] = self.lin_layer[ntype](emb[ntype])

But ideally I would like to run it in parallel with some scatter operation. Something like:

pytorch_scatter(lin_layers, embeddings, layer_map, reduce='matmul'), where layer_map indicates which layer each embedding should go through. If I had two types of linear layers and batch_size = 5, layer_map would be something like [1, 0, 1, 1, 0].
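For concreteness, the intended semantics of that hypothetical call (pytorch_scatter with reduce='matmul' does not exist; scatter_matmul below is a made-up name) can be spelled out as a per-row loop:

```python
import torch

def scatter_matmul(lin_layers, embeddings, layer_map):
    # Route each embedding row through the linear layer named by layer_map.
    out = torch.empty(embeddings.shape[0], lin_layers[0].shape[0])
    for row, layer_idx in enumerate(layer_map):
        out[row] = lin_layers[layer_idx] @ embeddings[row]
    return out

lin_layers = [torch.eye(4), 2 * torch.eye(4)]  # two layer types
embeddings = torch.ones(5, 4)                  # batch_size = 5
layer_map = [1, 0, 1, 1, 0]                    # which layer per row

out = scatter_matmul(lin_layers, embeddings, layer_map)
# rows mapped to layer 1 are doubled, rows mapped to layer 0 are unchanged
```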

Is it possible to vectorize the for loop in an efficient way, as is done in torch_scatter? Please check the minimal example below.

import torch
import random
import numpy as np 

seed = 42
torch.manual_seed(seed)
random.seed(seed)

def matmul_single_embtype(lin_layers, embeddings, layer_map):
    # run a single linear layer over all embeddings, irrespective of type
    output_embeddings = torch.matmul(lin_layers[0], embeddings.T).T
    return output_embeddings

def matmul_for_loop(lin_layers, embeddings, layer_map):
    # let each embedding type have its own projection, looping over emb types
    output_embeddings = dict()
    for emb_type in np.unique(layer_map):
        output_embeddings[emb_type] = torch.matmul(lin_layers[emb_type], embeddings[layer_map == emb_type].T).T
    return output_embeddings

def matmul_scatter(lin_layers, embeddings, layer_map):
    # parallelize the for loop by building a block-diagonal matrix of lin layers
    # this is very inefficient: it copies the layer for each embedding instead of broadcasting
    mapped_lin_layers = [lin_layers[i] for i in layer_map]
    mapped_lin_layers = torch.block_diag(*mapped_lin_layers)  # batch_size*inp_size x batch_size*output_size
    embeddings_stacked = embeddings.view(-1, 1)  # stack all embeddings to multiply by the block-diagonal matrix
    output_embeddings = torch.matmul(mapped_lin_layers, embeddings_stacked).view(embeddings.shape)
    return output_embeddings

"""
GENERATE DATA
lin_layers:
   List of matrices of size n_layer x inp_size x output_size
embeddings:
   Matrix of size batch_size x inp_size
layer_map:
   Vector os size batch_size stating which embedding should go thorugh each layer
"""

emb_size = 32
batch_size = 500
emb_types = 20
layer_map = np.array([random.choice(range(emb_types)) for i in range(batch_size)])  # array, so layer_map == i gives a boolean mask

lin_layers = [torch.arange(emb_size*emb_size, dtype=torch.float32).view(emb_size,emb_size) for i in range(emb_types)]
embeddings = torch.arange(batch_size*emb_size, dtype=torch.float32).view(batch_size,emb_size)
grouped_emb = {i: embeddings[layer_map==i] for i in np.unique(layer_map)} #separate embeddings by embedding type

#Run experiments
%timeit matmul_scatter(lin_layers, embeddings, layer_map)
%timeit matmul_for_loop(lin_layers, embeddings, layer_map)
%timeit matmul_single_embtype(lin_layers, embeddings, layer_map)

>>>>>133 ms ± 2.47 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>>>>1.64 ms ± 14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>>>>31.4 µs ± 805 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
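For what it's worth, one way to vectorize the loop without block_diag is to gather one weight matrix per row and apply a single batched matmul: this still materializes a copy of the weights per sample, but keeps them as a contiguous batch that maps onto batched GEMM kernels. A minimal self-contained sketch (not from the original question), checked against the per-type loop:

```python
import torch

torch.manual_seed(42)

emb_size, batch_size, emb_types = 32, 500, 20
layer_map = torch.randint(emb_types, (batch_size,))
lin_layers = torch.randn(emb_types, emb_size, emb_size)  # stacked: n_types x out x in
embeddings = torch.randn(batch_size, emb_size)

# Gather one weight matrix per sample: batch_size x out x in.
per_sample_w = lin_layers[layer_map]

# out[b] = per_sample_w[b] @ embeddings[b], as one batched contraction.
vectorized = torch.einsum('bij,bj->bi', per_sample_w, embeddings)

# Reference: loop over types, projecting each group with its own layer.
reference = torch.empty_like(vectorized)
for t in range(emb_types):
    mask = layer_map == t
    reference[mask] = embeddings[mask] @ lin_layers[t].T

assert torch.allclose(vectorized, reference, atol=1e-4)
```

Whether the gather-plus-einsum version beats the per-type loop depends on the number of types and the batch size, so it is worth timing both on realistic shapes.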

Related Stack Overflow question: a vectorized way to do discrete matrix operations

Related PyTorch issue: https://github.com/pytorch/pytorch/issues/31942


1 Answer

Stack Overflow user

Accepted answer

Posted on 2022-02-07 17:09:00

Just found out that DGL is already working on this feature: https://github.com/dmlc/dgl/pull/3641

Score 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/71011135
