Question: Fine-tuning SOTA video models on your own dataset - Sign Language

Stack Overflow user

Asked 2021-04-20 19:04:01

Answers: 1 | Views: 154 | Followers: 0 | Votes: 1

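I am trying to implement a sign language classifier using gluoncv as part of my final-year university project.

Dataset: http://facundoq.github.io/datasets/lsa64/

I followed the "Fine-tuning SOTA video models on your own dataset" tutorial and fine-tuned two models. Tutorial: custom.html

i3d_resnet50_v1_custom: accuracy graph (I3D)
slowfast_4x16_resnet50_custom: accuracy graph (SlowFast)

The graphs show almost 90% accuracy, but when I run inference I get misclassifications, even on videos I used for training.

So I am stuck. Could you give me some guidance? Anything would be a great help.

Thanks

My data loader for I3D: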

import os

import mxnet as mx
from mxnet import gluon
from gluoncv.data import VideoClsCustom
from gluoncv.data.transforms import video

num_gpus = 1
ctx = [mx.gpu(i) for i in range(num_gpus)]

# Training-time augmentation and ImageNet mean/std normalization
transform_train = video.VideoGroupTrainTransform(size=(224, 224), scale_ratios=[1.0, 0.8],
                                                 mean=[0.485, 0.456, 0.406],
                                                 std=[0.229, 0.224, 0.225])
per_device_batch_size = 5
num_workers = 0
batch_size = per_device_batch_size * num_gpus

# train.txt lists the training videos and their labels
train_dataset = VideoClsCustom(root=os.path.expanduser('DataSet/train/'),
                               setting=os.path.expanduser('DataSet/train/train.txt'),
                               train=True,
                               new_length=64,
                               new_step=2,
                               video_loader=True,
                               use_decord=True,
                               transform=transform_train)

print('Load %d training samples.' % len(train_dataset))
train_data = gluon.data.DataLoader(train_dataset, batch_size=batch_size,
                                   shuffle=True, num_workers=num_workers)
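
For reference, the temporal sampling implied by new_length and new_step can be sketched in plain Python. This assumes, as the GluonCV documentation describes these parameters, that new_length is the number of frames in a clip and new_step is the stride between them. clip_frame_ids is a hypothetical helper written for illustration, not a GluonCV function.

```python
def clip_frame_ids(start, new_length, new_step):
    """Return the source-frame indices of one clip.

    Assumption: `new_length` frames are taken, `new_step` apart,
    beginning at frame `start`.
    """
    return [start + i * new_step for i in range(new_length)]


# Settings from the training loader above: 64 frames, every other frame.
train_ids = clip_frame_ids(0, new_length=64, new_step=2)

# The inference snippet in this question reads range(0, 64, 2).
infer_ids = list(range(0, 64, 2))

print(len(train_ids), train_ids[-1])  # frames per clip, last index used
print(len(infer_ids), infer_ids[-1])
```

Under that assumption the training clips hold 64 frames while the inference clip holds 32, so it may be worth confirming that both sides use the same clip length.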

Inference run:

import mxnet as mx
import numpy as np
from mxnet import nd
from gluoncv.data.transforms import video
from gluoncv.utils.filesystem import try_import_decord

decord = try_import_decord()

# Read 32 frames (every other frame of the first 64) from the test video
video_fname = 'DataSet/test/006_001_001.mp4'
vr = decord.VideoReader(video_fname)
frame_id_list = range(0, 64, 2)
video_data = vr.get_batch(frame_id_list).asnumpy()
clip_input = [video_data[vid, :, :, :] for vid, _ in enumerate(frame_id_list)]

# Validation-time preprocessing, then reshape to
# (batch, channels, frames, height, width)
transform_fn = video.VideoGroupValTransform(size=(224, 224), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
clip_input = transform_fn(clip_input)
clip_input = np.stack(clip_input, axis=0)
clip_input = clip_input.reshape((-1,) + (32, 3, 224, 224))
clip_input = np.transpose(clip_input, (0, 2, 1, 3, 4))
print('Video data has been read and preprocessed.')

# Running the prediction. `net` (the fine-tuned model) and `CLASS_MAP`
# (index-to-label mapping) are defined elsewhere and not shown here.
pred = net(nd.array(clip_input, ctx=mx.gpu(0)))
topK = 5
ind = nd.topk(pred, k=topK)[0].astype('int')
print('The input video clip is classified to be')
for i in range(topK):
    print('\t[%s], with probability %.3f.' %
          (CLASS_MAP[ind[i].asscalar()], nd.softmax(pred)[0][ind[i]].asscalar()))
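
The last lines of the snippet rank the raw network outputs and convert them to probabilities. A minimal pure-Python sketch of that step follows. The logits and class names are made up for illustration and are not real model output.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores, k):
    """Return the indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Illustrative values only; a real run would use the network output `pred`.
logits = [0.5, 2.0, -1.0, 1.0]
class_map = {0: 'sign_a', 1: 'sign_b', 2: 'sign_c', 3: 'sign_d'}  # hypothetical labels

probs = softmax(logits)
for idx in top_k(logits, k=2):
    print('[%s], with probability %.3f' % (class_map[idx], probs[idx]))
```

Because softmax preserves ordering, ranking the logits and ranking the probabilities select the same classes.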

Tags: video, mxnet, mxnet-gluon, python, machine-learning

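1 Answer

Stack Overflow user

Answered 2021-04-27 18:30:39

I found my mistake: it was caused by too little augmentation. I changed the transform used by both the training data loader and inference as shown below, and it now works properly.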

from mxnet.gluon.data.vision import transforms
from gluoncv.data.transforms import video

transform_train = transforms.Compose([
    # Fix the input video frame size at 256×340 and randomly sample the cropping width and height from
    # {256, 224, 192, 168}. After that, resize the cropped regions to 224×224.
    video.VideoMultiScaleCrop(size=(224, 224), scale_ratios=[1.0, 0.875, 0.75, 0.66]),
    # Randomly flip the video frames horizontally
    video.VideoRandomHorizontalFlip(),
    # Transpose the video frames from height*width*num_channels to num_channels*height*width
    # and map values from [0, 255] to [0, 1]
    video.VideoToTensor(),
    # Normalize the video frames with mean and standard deviation calculated across all images
    video.VideoNormalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
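
The crop sizes named in the first comment follow from the scale ratios. The sketch below works through that arithmetic. It assumes the shorter frame side is 256, as the comment states, and that each ratio is multiplied by it and truncated to an integer, which is an assumption about how VideoMultiScaleCrop derives its sizes. candidate_crop_sizes is a hypothetical helper, not part of GluonCV.

```python
def candidate_crop_sizes(short_side, scale_ratios):
    """Candidate crop edge lengths before resizing to the network input size."""
    return [int(short_side * ratio) for ratio in scale_ratios]


# Ratios used in the answer's transform.
answer_sizes = candidate_crop_sizes(256, [1.0, 0.875, 0.75, 0.66])

# Ratios used in the question's VideoGroupTrainTransform.
question_sizes = candidate_crop_sizes(256, [1.0, 0.8])

print(answer_sizes)
print(question_sizes)
```

The answer's ratios give four candidate sizes where the question's gave two, which is one concrete sense in which the new pipeline augments more.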

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/67184869
