arma::Mat gives different results with GCC and NVCC

Stack Overflow user
Asked 2022-09-07 08:07:12
Answers 2 · Views 78 · Followers 0 · Votes 2

#include <iostream>
#include <armadillo>
using namespace std;

int main()
{
    arma::Mat<float> a;
    cout << sizeof(a) << "\n";
    return 0;
}

The code above prints a different size when I compile it with NVCC (for CUDA) than when I compile it with GCC.

$ g++ -o main test.cu.cpp -O3 -larmadillo
$ ./main
112
$ nvcc -o main test.cu.cpp -O3 -larmadillo
$ ./main
104

I would like the NVCC build to behave the same as the GCC build.

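Where does this difference come from? My project has to compile different parts separately, and because the legacy parts must be built with GCC, it is not possible to convert everything to NVCC.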

Edit: here are the compile logs from NVCC and GCC. I don't know what to look for in them.

[huyduc@ny5-dtlgpu06 test]$ nvcc -o main test.cu.cpp -O3 -larmadillo --verbose
#$ _NVVM_BRANCH_=nvvm
#$ _SPACE_= 
#$ _CUDART_=cudart
#$ _HERE_=/usr/local/cuda-11.4/bin
#$ _THERE_=/usr/local/cuda-11.4/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_DIR_=targets/x86_64-linux
#$ TOP=/usr/local/cuda-11.4/bin/..
#$ NVVMIR_LIBRARY_DIR=/usr/local/cuda-11.4/bin/../nvvm/libdevice
#$ LD_LIBRARY_PATH=/usr/local/cuda-11.4/bin/../lib:/local/export/scratch/pulse_packages/lib/:/local/export/scratch/pulse_packages/lib64/:/local/export/scratch/pulse_packages/usr/lib:/local/export/scratch/pulse_packages/usr/local/lib/:/local/export/scratch/pulse_packages/mkl/lib:/local/export/scratch/pulse_packages/mkl/mkl/lib/intel64:/local/export/scratch/pulse_packages/mods/libtorch/lib::/usr/local/cuda-11.4/lib64:/usr/lib/x86_64-linux-gnu
#$ PATH=/usr/local/cuda-11.4/bin/../nvvm/bin:/usr/local/cuda-11.4/bin:/local/export/scratch/pulse_packages/bin:/local/export/scratch/pulse_packages/usr/bin:/local/export/scratch/pulse_packages/mods/libtorch/bin:/usr/lib64/ccache:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibutils/bin:/home/huyduc/.local/bin:/home/huyduc/bin:/usr/local/cuda-11.4/bin
#$ INCLUDES="-I/usr/local/cuda-11.4/bin/../targets/x86_64-linux/include"  
#$ LIBRARIES=  "-L/usr/local/cuda-11.4/bin/../targets/x86_64-linux/lib/stubs" "-L/usr/local/cuda-11.4/bin/../targets/x86_64-linux/lib"
#$ CUDAFE_FLAGS=
#$ PTXAS_FLAGS=
#$ gcc -c -x c++ -D__NVCC__  -O3 "-I/usr/local/cuda-11.4/bin/../targets/x86_64-linux/include"    -D__CUDACC_VER_MAJOR__=11 -D__CUDACC_VER_MINOR__=4 -D__CUDACC_VER_BUILD__=100 -D__CUDA_API_VER_MAJOR__=11 -D__CUDA_API_VER_MINOR__=4 -m64 "test.cu.cpp" -o "/tmp/tmpxft_000140a2_00000000-5_test.cu.o" 
#$ nvlink --arch=sm_52 --register-link-binaries="/tmp/tmpxft_000140a2_00000000-3_main_dlink.reg.c"  -m64 -larmadillo   "-L/usr/local/cuda-11.4/bin/../targets/x86_64-linux/lib/stubs" "-L/usr/local/cuda-11.4/bin/../targets/x86_64-linux/lib" -cpu-arch=X86_64 "/tmp/tmpxft_000140a2_00000000-5_test.cu.o"  -lcudadevrt  -o "/tmp/tmpxft_000140a2_00000000-6_main_dlink.sm_52.cubin"
#$ fatbinary -64 -link "--image3=kind=elf,sm=52,file=/tmp/tmpxft_000140a2_00000000-6_main_dlink.sm_52.cubin" --embedded-fatbin="/tmp/tmpxft_000140a2_00000000-4_main_dlink.fatbin.c" 
#$ rm /tmp/tmpxft_000140a2_00000000-4_main_dlink.fatbin
#$ gcc -c -x c++ -DFATBINFILE="\"/tmp/tmpxft_000140a2_00000000-4_main_dlink.fatbin.c\"" -DREGISTERLINKBINARYFILE="\"/tmp/tmpxft_000140a2_00000000-3_main_dlink.reg.c\"" -I. -D__NV_EXTRA_INITIALIZATION= -D__NV_EXTRA_FINALIZATION= -D__CUDA_INCLUDE_COMPILER_INTERNAL_HEADERS__  -O3 "-I/usr/local/cuda-11.4/bin/../targets/x86_64-linux/include"    -D__CUDACC_VER_MAJOR__=11 -D__CUDACC_VER_MINOR__=4 -D__CUDACC_VER_BUILD__=100 -D__CUDA_API_VER_MAJOR__=11 -D__CUDA_API_VER_MINOR__=4 -m64 "/usr/local/cuda-11.4/bin/crt/link.stub" -o "/tmp/tmpxft_000140a2_00000000-7_main_dlink.o" 
#$ g++ -O3 -m64 -Wl,--start-group "/tmp/tmpxft_000140a2_00000000-7_main_dlink.o" "/tmp/tmpxft_000140a2_00000000-5_test.cu.o" -larmadillo   "-L/usr/local/cuda-11.4/bin/../targets/x86_64-linux/lib/stubs" "-L/usr/local/cuda-11.4/bin/../targets/x86_64-linux/lib"  -lcudadevrt  -lcudart_static  -lrt -lpthread  -ldl  -Wl,--end-group -o "main" 
[huyduc@ny5-dtlgpu06 test]$ g++ -o gccc test.cu.cpp -O3 -larmadillo --verbose
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-10.2.0/configure --prefix=/local/export/scratch/pulse_packages --libexecdir=/local/export/scratch/pulse_packages/lib --enable-shared --enable-threads=posix --enable-__cxa_atexit --disable-multilib --enable-bootstrap --enable-clocale=gnu --enable-languages=c,c++,fortran --with-zstd=no
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.2.0 (GCC) 
COLLECT_GCC_OPTIONS='-o' 'gccc' '-O3' '-v' '-shared-libgcc' '-mtune=generic' '-march=x86-64'
 /local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/cc1plus -quiet -v -D_GNU_SOURCE test.cu.cpp -quiet -dumpbase test.cu.cpp -mtune=generic -march=x86-64 -auxbase test.cu -O3 -version -o /tmp/cctgH2WC.s
GNU C++14 (GCC) version 10.2.0 (x86_64-pc-linux-gnu)
    compiled by GNU C version 10.2.0, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/local/export/scratch/pulse_packages/usr/include/"
ignoring nonexistent directory "/local/export/scratch/pulse_packages/usr/local/include"
ignoring nonexistent directory "/usr/include/x86_64-linux-gnu"
ignoring duplicate directory "/local/export/scratch/pulse_packages/mods/libtorch/include"
ignoring duplicate directory "/local/export/scratch/pulse_packages/include"
ignoring nonexistent directory "/local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../x86_64-pc-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /local/export/scratch/pulse_packages/include/
 /local/export/scratch/pulse_packages/mods/libtorch/include
 /local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0
 /local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/x86_64-pc-linux-gnu
 /local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/backward
 /local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include
 /usr/local/include
 /local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/include-fixed
 /usr/include
End of search list.
GNU C++14 (GCC) version 10.2.0 (x86_64-pc-linux-gnu)
    compiled by GNU C version 10.2.0, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: fcab5cdad8fab5c0a9dfd14a10ab3fb4
COLLECT_GCC_OPTIONS='-o' 'gccc' '-O3' '-v' '-shared-libgcc' '-mtune=generic' '-march=x86-64'
 /local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../x86_64-pc-linux-gnu/bin/as -v --64 -o /tmp/ccPwcVUC.o /tmp/cctgH2WC.s
GNU assembler version 2.36.1 (x86_64-pc-linux-gnu) using BFD version (GNU Binutils) 2.36.1
COMPILER_PATH=/local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/:/local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/:/local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/:/local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/:/local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/:/local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../x86_64-pc-linux-gnu/bin/
LIBRARY_PATH=/local/export/scratch/pulse_packages/lib/../lib64/:/local/export/scratch/pulse_packages/lib64/../lib64/:/local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/:/local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib64/:/lib/../lib64/:/usr/lib/../lib64/:/local/export/scratch/pulse_packages/lib/:/local/export/scratch/pulse_packages/lib64/:/local/export/scratch/pulse_packages/mkl/lib/:/local/export/scratch/pulse_packages/mkl/mkl/lib/intel64/:/local/export/scratch/pulse_packages/mods/libtorch/lib/:./:/local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../x86_64-pc-linux-gnu/lib/:/local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-o' 'gccc' '-O3' '-v' '-shared-libgcc' '-mtune=generic' '-march=x86-64'
 /local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/collect2 -plugin /local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/liblto_plugin.so -plugin-opt=/local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/lto-wrapper -plugin-opt=-fresolution=/tmp/ccd147SC.res -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lgcc --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o gccc /lib/../lib64/crt1.o /lib/../lib64/crti.o /local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/crtbegin.o -L/local/export/scratch/pulse_packages/lib/../lib64 -L/local/export/scratch/pulse_packages/lib64/../lib64 -L/local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0 -L/local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/local/export/scratch/pulse_packages/lib -L/local/export/scratch/pulse_packages/lib64 -L/local/export/scratch/pulse_packages/mkl/lib -L/local/export/scratch/pulse_packages/mkl/mkl/lib/intel64 -L/local/export/scratch/pulse_packages/mods/libtorch/lib -L. -L/local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../x86_64-pc-linux-gnu/lib -L/local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../.. /tmp/ccPwcVUC.o -larmadillo -lstdc++ -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc /local/export/scratch/pulse_packages/lib/gcc/x86_64-pc-linux-gnu/10.2.0/crtend.o /lib/../lib64/crtn.o
COLLECT_GCC_OPTIONS='-o' 'gccc' '-O3' '-v' '-shared-libgcc' '-mtune=generic' '-march=x86-64'
2 Answers

Stack Overflow user

Accepted answer

Posted 2022-09-07 15:39:43

"I would like the NVCC build to behave the same as the GCC build."

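I don't know which version of Armadillo you are using. I installed 10.8 because that was easy for me to do. (I am using CUDA 11.4, the same as you.) When I run your test case, the sizes do differ between nvcc and g++, although they do not exactly match yours. I also get potentially useful output from the Armadillo developers; did you notice it?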

$ cat t2106.cpp
#include <iostream>
#ifdef USE_SUGGESTION
#define ARMA_ALLOW_FAKE_GCC
#endif
#include <armadillo>
using namespace std;

int main()
{
    arma::arma_version ver;
    cout << "ARMA version: "<< ver.as_string() << std::endl;

    arma::Mat<float> a;
    cout << sizeof(a) << "\n";
    return 0;
}
$ g++ t2106.cpp -o t2106g++ -larmadillo
$ ./t2106g++
ARMA version: 10.8.2 (Realm Raider)
128
$ nvcc t2106.cpp -o t2106nvcc -larmadillo
In file included from /usr/include/armadillo:68:0,
                 from t2106.cpp:5:
/usr/include/armadillo_bits/compiler_setup.hpp:151:106: note: #pragma message: WARNING: this compiler is pretending to be GCC but it may not be fully compatible;
 NG: this compiler is pretending to be GCC but it may not be fully compatible;"

/usr/include/armadillo_bits/compiler_setup.hpp:152:110: note: #pragma message: WARNING: to allow this compiler to use GCC features such as data alignment attributes,
 to allow this compiler to use GCC features such as data alignment attributes,"

/usr/include/armadillo_bits/compiler_setup.hpp:153:88: note: #pragma message: WARNING: #define ARMA_ALLOW_FAKE_GCC before #include <armadillo>
 ma message ("WARNING: #define ARMA_ALLOW_FAKE_GCC before #include <armadillo>"

$ ./t2106nvcc
ARMA version: 10.8.2 (Realm Raider)
112
$ nvcc t2106.cpp -o t2106nvcc -larmadillo -DUSE_SUGGESTION
$ ./t2106nvcc
ARMA version: 10.8.2 (Realm Raider)
128
$

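So the suggestion from the developers, already indicated there in 10.8 (define ARMA_ALLOW_FAKE_GCC before including armadillo), makes the sizes equal in both cases, which seems to resolve the issue for the test case you provided.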

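I don't know what all the ramifications are here, and I certainly haven't done any exhaustive testing. But the sizes appear to be the same with that define in the code.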

"It is not possible to convert everything to NVCC."

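Another option: you don't have to convert everything to use nvcc. Include armadillo only in the .c/.cpp files compiled with g++, then use wrapper functions to connect your nvcc-compiled files with your gcc/g++-compiled files. That way you avoid this issue altogether, without needing the special #define.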
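The wrapper approach might look roughly like the following sketch. Everything in it is illustrative: `MatView`, `make_view`, and `sum_elements` are hypothetical names, the three "files" are shown in one listing for brevity, and `std::vector<float>` stands in for `arma::Mat<float>` so the sketch compiles without Armadillo. With Armadillo, the view would presumably be built from `a.memptr()`, `a.n_rows`, and `a.n_cols`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// --- wrapper.h (hypothetical): the only header shared by both compilers. ---
// It names no Armadillo type, so nvcc never has to agree with g++ on
// sizeof(arma::Mat<float>).
struct MatView {
    const float* data;    // column-major element storage, not owned
    std::size_t  n_rows;
    std::size_t  n_cols;
};

float sum_elements(MatView m);  // implemented on the nvcc side

// --- legacy.cpp (hypothetical): compiled with g++, owns the matrix. ---
// std::vector<float> is a stand-in for arma::Mat<float> in this sketch.
MatView make_view(const std::vector<float>& storage,
                  std::size_t n_rows, std::size_t n_cols) {
    assert(storage.size() == n_rows * n_cols);
    return MatView{storage.data(), n_rows, n_cols};
}

// --- kernels.cu (hypothetical): compiled with nvcc. ---
// A host loop stands in for whatever CUDA work the real code would launch.
float sum_elements(MatView m) {
    float total = 0.0f;
    const std::size_t n = m.n_rows * m.n_cols;
    for (std::size_t i = 0; i < n; ++i) {
        total += m.data[i];
    }
    return total;
}
```

The g++ side would call `sum_elements(make_view(...))`, so only raw pointers and sizes cross the compiler boundary.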

Additional discussion:

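The relevant object definition is in the Armadillo source (linked in the original answer). Looking at that code, together with the warning messages above, my guess is that the arma_aligned and/or arma_align_mem decorators behave differently depending on the compiler detection the developers have implemented. I haven't studied it beyond that.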
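To illustrate that guess, here is a minimal sketch of how an alignment decorator that expands to nothing under one compiler and to a GCC alignment attribute under another can change `sizeof`. The structs and the `DEMO_*` macros are hypothetical stand-ins, not Armadillo's actual definitions; the member list is only loosely modelled on a matrix header followed by a small local buffer, and the attribute syntax assumes GCC or a GCC-compatible compiler.

```cpp
#include <cstddef>
#include <cstdint>
#include <ostream>

// Hypothetical stand-ins for an alignment macro in its two states.
#define DEMO_ALIGN_MEM_ON  __attribute__((__aligned__(16)))
#define DEMO_ALIGN_MEM_OFF /* expands to nothing */

// Layout when the decorator is empty.
struct MatLikePlain {
    std::uint64_t n_rows, n_cols, n_elem;
    std::uint16_t vec_state, mem_state;
    const float*  mem;
    DEMO_ALIGN_MEM_OFF float mem_local[16];
};

// Same members, but the local buffer requests 16-byte alignment.
struct MatLikeAligned {
    std::uint64_t n_rows, n_cols, n_elem;
    std::uint16_t vec_state, mem_state;
    const float*  mem;
    DEMO_ALIGN_MEM_ON float mem_local[16];
};

// The attribute raises the alignment of the whole struct, and the size
// must then be a multiple of that alignment.
static_assert(alignof(MatLikeAligned) == 16, "buffer alignment propagates to the struct");
static_assert(sizeof(MatLikeAligned) % 16 == 0, "size is padded to the alignment");

// Prints both layouts so the effect of the decorator is visible.
void report_layouts(std::ostream& os) {
    os << "plain:   size " << sizeof(MatLikePlain)
       << ", align " << alignof(MatLikePlain) << "\n";
    os << "aligned: size " << sizeof(MatLikeAligned)
       << ", align " << alignof(MatLikeAligned) << "\n";
}
```

Calling `report_layouts(std::cout)` on a typical 64-bit Linux target (8-byte pointers) should show 104 and 112 bytes, the same 8-byte gap reported in the question; other targets may pad differently.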

Votes 3

Stack Overflow user

Posted 2022-09-08 04:10:23

I have found a solution that works with older versions of Armadillo (useful when you cannot change the legacy code).

The problem comes from these lines in /usr/include/armadillo_bits/compiler_setup.hpp, lines 150 to 156 in Armadillo 7.500.2 (the line numbers differ in other versions; search for the text #define ARMA_GOOD_COMPILER and look nearby).

#if (defined(__GNUG__) || defined(__GNUC__)) && (defined(__clang__) || defined(__INTEL_COMPILER) || defined(__NVCC__) || defined(__CUDACC__) || defined(__PGI) || defined(__PATHSCALE__) || defined(__ARMCC_VERSION) || defined(__IBMCPP__))
  #undef  ARMA_FAKE_GCC
  #define ARMA_FAKE_GCC
#endif


#if defined(__GNUG__) && !defined(ARMA_FAKE_GCC)
... define arma_aligned and arma_align_mem here
#endif

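Newer versions have a flag, #define ARMA_ALLOW_FAKE_GCC, that disables the ARMA_FAKE_GCC check, so they work with CUDA. Older versions do not have it, so you have to edit /usr/include/armadillo_bits/compiler_setup.hpp yourself to add the check (this path may differ between machines):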

#if defined(__GNUG__) && (!defined(ARMA_FAKE_GCC) || defined(__NVCC__) || defined(__CUDACC__))
... define arma_aligned and arma_align_mem here
#endif

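Below is a sample program to test the difference. On my machine, with Armadillo 7.500.2, nvcc gives 104 before the change and 112 after it (the same as g++). Also, before the change arma_aligned and arma_align_mem expand to nothing (empty strings), whereas afterwards they become __attribute__((__aligned__)) and __attribute__((__aligned__(16))) (again, the same as g++).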

#include <iostream>
#define ARMA_ALLOW_FAKE_GCC
#define ARMA_ALLOW_FAKE_CLANG
#include <armadillo>
#include <cstdint>
using namespace std;

#define STRINGIFY(x) #x
#define STRINGIFYMACRO(y) STRINGIFY(y)

int main()
{
    arma::arma_version ver;
    std::cout << "ARMA version: " << ver.as_string() << std::endl;
    
    arma::Mat<float> a;
    cout << sizeof(a) << "\n";
    
    cout << STRINGIFYMACRO(arma_aligned) << "\n";
    cout << STRINGIFYMACRO(arma_align_mem) << "\n";    

    return 0;
}
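
Since the project mixes g++ and nvcc translation units, one cautious option is to have each side report the layout it sees and compare the two at startup. The sketch below is an illustration under stated assumptions, not part of either answer: `LayoutInfo`, `layout_of`, and `layouts_match` are hypothetical helpers, and the two demo structs merely stand in for `arma::Mat<float>` as seen by each compiler.

```cpp
#include <cstddef>

// Size and alignment of a type as seen by one translation unit.
struct LayoutInfo {
    std::size_t size;
    std::size_t align;
};

// Captures the layout of T for the compiler that instantiates it.
template <typename T>
constexpr LayoutInfo layout_of() {
    return LayoutInfo{sizeof(T), alignof(T)};
}

// Two layouts are compatible only if both size and alignment agree.
constexpr bool layouts_match(LayoutInfo a, LayoutInfo b) {
    return a.size == b.size && a.align == b.align;
}

// Demo stand-ins: the same members with and without an alignment request.
struct SeenWithoutAlignment { char tag; float buf[4]; };
struct SeenWithAlignment    { char tag; alignas(16) float buf[4]; };

static_assert(layouts_match(layout_of<SeenWithAlignment>(),
                            layout_of<SeenWithAlignment>()),
              "a type always matches itself");
static_assert(!layouts_match(layout_of<SeenWithoutAlignment>(),
                             layout_of<SeenWithAlignment>()),
              "dropping the alignment request changes the layout");
```

In a real build, each side would export the result of `layout_of<arma::Mat<float>>()` from its own translation unit (for example through a plain function returning the two numbers), and the program could refuse to continue if `layouts_match` returns false.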
Votes 0

Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/73632076