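I have been looking into how to make the best use of OpenCV on ARMv8 systems.
After going through several tutorials found via Google, I noticed that the VFPV3 and NEON options are often not enabled when OpenCV is built from source.
I was then told: "Normally GCC takes care of the extensions that match the processor. ARMv7 comes in different processor variants, and only some of them support VFPV3 and NEON, hence those flags. Every ARMv8 chip, such as the Xavier AGX, has these extensions built in, so GCC is smart enough to use them when compiling."
Does this mean that VFPV3 and NEON do not need to be specified when building OpenCV for an ARMv8 system? Are they enabled by default?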
Posted on 2021-05-06 21:57:51
According to the ARM documentation, "AArch64 Floating-point and NEON":
Both floating-point and NEON are required in all standard ARMv8 implementations. However, implementations targeting specialized markets may support the following combinations:
No NEON or floating-point.
Full floating-point and SIMD support with exception trapping.
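Full floating-point and SIMD support without exception trapping.
In other words, if the Armv8-A implementation you are using is a "standard" one, it should support full floating point and SIMD, and if you specify -march=armv8-a+simd, the compiler should use them in all cases.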
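One way to confirm what the compiler assumes about the target is to look at the ACLE feature macros it predefines. The following is a minimal sketch, not part of the original answer: the helper names (`neon_enabled`, `hw_fp_enabled`, `is_aarch64`, `report_simd_support`) are hypothetical, and it relies only on the predefined macros `__ARM_NEON`, `__ARM_FP` and `__aarch64__`.

```c
#include <stdio.h>

/* Hypothetical helper: 1 if the compiler targets Advanced SIMD (NEON), else 0.
   __ARM_NEON is an ACLE macro predefined by the compiler for such targets. */
int neon_enabled(void)
{
#if defined(__ARM_NEON)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 if the compiler targets hardware floating point, else 0. */
int hw_fp_enabled(void)
{
#if defined(__ARM_FP)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 when the translation unit is compiled for AArch64. */
int is_aarch64(void)
{
#if defined(__aarch64__)
    return 1;
#else
    return 0;
#endif
}

/* Prints a one-line summary of the compile-time view; call it from main(). */
void report_simd_support(void)
{
    printf("aarch64=%d neon=%d fp=%d\n",
           is_aarch64(), neon_enabled(), hw_fp_enabled());
}
```

The same macros can be listed without writing any code, for example with `aarch64-none-linux-gnu-gcc -march=armv8-a -dM -E - < /dev/null | grep __ARM_NEON`. Note that this only reflects what the compiler was told to target, not what the CPU running the binary actually supports.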
The result appears to be the same with gcc 10.2.0:
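op.c:
/* v1 and v2 were omitted in the post; reconstructed from the .rodata entries in the listings (1.0 and 2.0). */
static const double v1 = 1.0;
static const double v2 = 2.0;

double op(double value)
{
    double v3 = v1 + v2 + value;
    return v3;
}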
/opt/arm/10/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc -march=armv8-a -S op.c
cat op.s
.arch armv8-a
.file "op.c"
.text
.section .rodata
.align 3
.type v1, %object
.size v1, 8
v1:
.word 0
.word 1072693248
.align 3
.type v2, %object
.size v2, 8
v2:
.word 0
.word 1073741824
.text
.align 2
.global op
.type op, %function
op:
.LFB0:
.cfi_startproc
sub sp, sp, #32
.cfi_def_cfa_offset 32
str d0, [sp, 8]
fmov d1, 1.0e+0
fmov d0, 2.0e+0
fadd d0, d1, d0
ldr d1, [sp, 8]
fadd d0, d1, d0
str d0, [sp, 24]
ldr d0, [sp, 24]
add sp, sp, 32
.cfi_def_cfa_offset 0
ret
.cfi_endproc
.LFE0:
.size op, .-op
.ident "GCC: (GNU Toolchain for the A-profile Architecture 10.2-2020.11 (arm-10.16)) 10.2.1 20201103"
.section .note.GNU-stack,"",@progbits
/opt/arm/10/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc -march=armv8-a+simd -S op.c
cat op.s
.arch armv8-a
.file "a.c"
.text
.section .rodata
.align 3
.type v1, %object
.size v1, 8
v1:
.word 0
.word 1072693248
.align 3
.type v2, %object
.size v2, 8
v2:
.word 0
.word 1073741824
.text
.align 2
.global op
.type op, %function
op:
.LFB0:
.cfi_startproc
sub sp, sp, #32
.cfi_def_cfa_offset 32
str d0, [sp, 8]
fmov d1, 1.0e+0
fmov d0, 2.0e+0
fadd d0, d1, d0
ldr d1, [sp, 8]
fadd d0, d1, d0
str d0, [sp, 24]
ldr d0, [sp, 24]
add sp, sp, 32
.cfi_def_cfa_offset 0
ret
.cfi_endproc
.LFE0:
.size op, .-op
.ident "GCC: (GNU Toolchain for the A-profile Architecture 10.2-2020.11 (arm-10.16)) 10.2.1 20201103"
.section .note.GNU-stack,"",@progbits
https://stackoverflow.com/questions/67412826
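Both listings are identical, and the scalar `fadd` on `d` registers shows that hardware floating point is already in use with plain `-march=armv8-a`. The function `op` has nothing to vectorize, so no vector instructions can appear either way. As a rough sketch of code that does use NEON explicitly, the following adds two arrays with the `arm_neon.h` intrinsics when the compiler predefines `__ARM_NEON` on AArch64, and falls back to a scalar loop otherwise. The function name `add_f64` is hypothetical and not from the original post.

```c
#include <stddef.h>
#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical example: element-wise out[i] = a[i] + b[i] for n doubles. */
void add_f64(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) && defined(__aarch64__)
    /* float64x2_t is available on AArch64: handle two doubles per iteration. */
    for (; i + 2 <= n; i += 2) {
        float64x2_t va = vld1q_f64(a + i);
        float64x2_t vb = vld1q_f64(b + i);
        vst1q_f64(out + i, vaddq_f64(va, vb));
    }
#endif
    /* Scalar tail, and the complete loop on targets without NEON. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Compiling this with `-S` for AArch64 should show a vector `fadd` on `v` registers in the main loop, whereas the scalar path is what any other target gets. Whether OpenCV's own optimized paths are taken depends on its build configuration, which this sketch does not cover.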