To quickly summarize the problem: I need to transfer images (of size (1920, 1200, 3)) between PyTorch Docker containers and process them. The containers run on the same host. Speed is critical, and a transfer should take no more than 2-3 ms. The two containers share an IPC namespace, so I had no trouble transferring NumPy arrays through shared-memory buffers (e.g. https://docs.python.org/3/library/multiprocessing.shared_memory.html). I am curious whether there is a similar approach for PyTorch tensors allocated on the GPU.
As far as I understand, CUDA tensors already live in shared memory. I tried transferring them as PyTorch tensor storage objects over a socket, but a one-way trip takes around 50-60 ms, which is far too slow. For testing purposes, I simply run the two programs in separate terminals.
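For reference, the CPU-side shared-memory transfer mentioned above can be sketched roughly as follows. This is a minimal sketch, not the exact code from the question; both "sides" are shown in one script, whereas in practice the reader process would attach using a block name passed out of band:

```python
import numpy as np
from multiprocessing import shared_memory

# Writer side: allocate a shared block sized for one frame and copy into it.
frame = np.full((1920, 1200, 3), 7, dtype=np.uint8)
shm = shared_memory.SharedMemory(create=True, size=frame.nbytes)
shared = np.ndarray(frame.shape, dtype=frame.dtype, buffer=shm.buf)
shared[:] = frame  # a single memcpy into shared memory

# Reader side: attach to the same block by name (no copy of the pixel data).
shm_reader = shared_memory.SharedMemory(name=shm.name)
view = np.ndarray((1920, 1200, 3), dtype=np.uint8, buffer=shm_reader.buf)
first_value = int(view[0, 0, 0])

# Release the ndarray views before closing, then unlink once both sides are done.
del shared, view
shm_reader.close()
shm.close()
shm.unlink()
```

The copy in and the attach are both cheap relative to the frame size, which is why this approach comfortably meets a 2-3 ms budget for CPU data.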
Container 1 code:
import torch
import zmq


def main():
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REQ)
    sock.connect('tcp://0.0.0.0:6000')

    x = torch.randn((1, 1920, 1200, 3), device='cuda')
    storage = x.storage()

    while True:
        sock.send_pyobj(storage)
        sock.recv()


if __name__ == "__main__":
    main()

Container 2 code:
import torch
import zmq
import time


def main():
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind('tcp://*:6000')

    for i in range(10):
        before = time.time()
        storage = sock.recv_pyobj()
        tensor = torch.tensor((), device=storage.device)
        tensor.set_(storage)
        after = time.time()
        print(after - before)
        sock.send_string('')


if __name__ == "__main__":
    main()

EDIT:
I found a similar topic from 4 years ago. There, the author extracts additional information from the storage using the _share_cuda_() function, which yields a cudaIpcMemHandle_t.
Is there a way to reconstruct the storage/tensor from the cudaIpcMemHandle_t, or from the information returned by _share_cuda_(), using PyTorch functions? Or is there a better way to achieve the same result?
Posted on 2022-07-26 16:51:47
I found a function in torch.multiprocessing.reductions that rebuilds a tensor from the output of _share_cuda_(). My code now looks like this:
Container 1 code:
import torch
import zmq


def main():
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REQ)
    sock.connect('tcp://0.0.0.0:6000')

    image = torch.randn((1, 1920, 1200, 3), dtype=torch.float, device='cuda:0')
    storage = image.storage()
    (storage_device, storage_handle, storage_size_bytes, storage_offset_bytes,
     ref_counter_handle, ref_counter_offset, event_handle,
     event_sync_required) = storage._share_cuda_()

    while True:
        sock.send_pyobj({
            "dtype": image.dtype,
            "tensor_size": (1920, 1200, 3),
            "tensor_stride": image.stride(),
            "tensor_offset": image.storage_offset(),  # !Not sure about this one.
            "storage_cls": type(storage),
            "storage_device": storage_device,
            "storage_handle": storage_handle,
            "storage_size_bytes": storage_size_bytes,
            "storage_offset_bytes": storage_offset_bytes,
            "requires_grad": False,
            "ref_counter_handle": ref_counter_handle,
            "ref_counter_offset": ref_counter_offset,
            "event_handle": event_handle,
            "event_sync_required": event_sync_required,
        })
        sock.recv_string()


if __name__ == "__main__":
    main()

Container 2 code:
import torch
import zmq
import time
from torch.multiprocessing.reductions import rebuild_cuda_tensor


def main():
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind('tcp://*:6000')

    for i in range(10):
        before = time.time()
        cuda_tensor_info = sock.recv_pyobj()
        rebuilt_tensor = rebuild_cuda_tensor(torch.Tensor, **cuda_tensor_info)
        after = time.time()
        print(after - before)
        sock.send_string('')


if __name__ == "__main__":
    main()

https://stackoverflow.com/questions/73024975
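As an aside (not part of the original answer): for CPU tensors, the analogous mechanism is exposed publicly as torch.Tensor.share_memory_(), which moves a tensor's storage into shared memory in place so that torch.multiprocessing can hand it to another process without copying the data. A minimal sketch:

```python
import torch

# Move a CPU tensor's storage into shared memory in place.
t = torch.arange(6, dtype=torch.float32).reshape(2, 3)
t.share_memory_()

# The values are unchanged; only the backing storage has moved.
is_shared = t.is_shared()
total = float(t.sum())
```

For CUDA tensors, torch.multiprocessing performs the equivalent step automatically when a tensor is put on one of its queues, using the same _share_cuda_() / rebuild_cuda_tensor machinery shown above.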