GATK: HaplotypeCaller IntelPairHmm only detecting one thread

Stack Overflow user

Asked 2022-02-09 17:17:48

1 answer, 201 views, 0 followers, 0 votes

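I can't seem to get GATK to recognise the number of available threads. I am running GATK (4.2.4.1) in a conda environment, as part of a nextflow (v20.10.0) pipeline I'm writing. For whatever reason, I cannot get GATK to see more than one thread. I have tried different node types, increasing and decreasing the number of CPUs available, passing Java options, and using taskset, but it always detects only 1.

Here is the command from .command.sh: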

gatk HaplotypeCaller \
  --tmp-dir tmp/ \
  -ERC GVCF \
  -R VectorBase-54_AgambiaePEST_Genome.fasta \
  -I AE12A_S24_BP.bam \
  -O AE12A_S24_BP.vcf

Here is the top of the .command.log file:

12:10:00.695 INFO  HaplotypeCaller - ------------------------------------------------------------
12:10:00.695 INFO  HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.2.4.1
12:10:00.695 INFO  HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
12:10:00.696 INFO  HaplotypeCaller - Executing on Linux v4.18.0-193.6.3.el8_2.x86_64 amd64
12:10:00.696 INFO  HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v11.0.13+7-b1751.21
12:10:00.696 INFO  HaplotypeCaller - Start Date/Time: 9 February 2022 at 12:10:00 GMT
12:10:00.696 INFO  HaplotypeCaller - ------------------------------------------------------------
12:10:00.696 INFO  HaplotypeCaller - ------------------------------------------------------------
12:10:00.697 INFO  HaplotypeCaller - HTSJDK Version: 2.24.1
12:10:00.697 INFO  HaplotypeCaller - Picard Version: 2.25.4
12:10:00.697 INFO  HaplotypeCaller - Built for Spark Version: 2.4.5
12:10:00.697 INFO  HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
12:10:00.697 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
12:10:00.697 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
12:10:00.697 INFO  HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
12:10:00.697 INFO  HaplotypeCaller - Deflater: IntelDeflater
12:10:00.697 INFO  HaplotypeCaller - Inflater: IntelInflater
12:10:00.697 INFO  HaplotypeCaller - GCS max retries/reopens: 20
12:10:00.698 INFO  HaplotypeCaller - Requester pays: disabled
12:10:00.698 INFO  HaplotypeCaller - Initializing engine
12:10:01.126 INFO  HaplotypeCaller - Done initializing engine
12:10:01.129 INFO  HaplotypeCallerEngine - Tool is in reference confidence mode and the annotation, the following changes will be made to any specified annotations: 'StrandBiasBySample' will be enabled. 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio' and 'QualByDepth' annotations have been disabled
12:10:01.143 INFO  HaplotypeCallerEngine - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
12:10:01.143 INFO  HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
12:10:01.162 INFO  NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/home/anaconda3/envs/NF_GATK/share/gatk4-4.2.4.1-0/gatk-package-4.2.4.1-local.jar!/com/intel/gkl/native/libgkl_utils.so
12:10:01.169 INFO  NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/home/anaconda3/envs/NF_GATK/share/gatk4-4.2.4.1-0/gatk-package-4.2.4.1-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
12:10:01.209 INFO  IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
12:10:01.210 INFO  IntelPairHmm - Available threads: 1
12:10:01.210 INFO  IntelPairHmm - Requested threads: 4
12:10:01.210 WARN  IntelPairHmm - Using 1 available threads, but 4 were requested
12:10:01.210 INFO  PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
12:10:01.271 INFO  ProgressMeter - Starting traversal

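I found a thread on the Broad Institute's website suggesting the OMP library might be the cause, but the library appears to be loaded (see the NativeLibraryLoader lines above), and I am already using the newer version they suggested updating to.

Needless to say, this is rather slow. I can always parallelise with the -L option, but that doesn't fix the underlying problem: every step in the pipeline would still be very slow.

Thanks in advance.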

gatk

java

multithreading

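1 Answer

Stack Overflow user

Accepted answer

Answered 2022-02-15 17:02:30

In case anyone else has the same problem: it turned out I had to configure the submission as an MPI job.

So, on the HPC I use, here is the nextflow process: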
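A plausible explanation, offered as an assumption rather than something the log confirms: the "Available threads" figure appears to come from the OpenMP runtime, which honours the `OMP_NUM_THREADS` environment variable, and a scheduler may set that variable to 1 unless OpenMP threads are requested explicitly. The sketch below is one way to check what a job actually sees; `report_threads` is a hypothetical helper name, not part of GATK or Nextflow, and it assumes a Linux node with GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the thread limits visible to the current job.
report_threads() {
  # OMP_NUM_THREADS caps OpenMP thread teams when it is set.
  echo "OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"
  if command -v nproc >/dev/null 2>&1; then
    # GNU nproc honours both the CPU affinity mask and OMP_NUM_THREADS.
    echo "nproc=$(nproc)"
    # nproc --all ignores those limits and reports installed processors.
    echo "nproc_all=$(nproc --all)"
  fi
}

report_threads
```

Running this at the top of the job script (for example from `beforeScript`) shows whether the limit comes from the environment rather than from GATK itself.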

process DNA_HCG {
  errorStrategy { sleep(Math.pow(2, task.attempt) * 600 as long); return 'retry' }
  maxRetries 3
  maxForks params.HCG_Forks

  tag { SampleID+"-"+chrom }

  executor = 'pbspro'
  clusterOptions = "-lselect=1:ncpus=${params.HCG_threads}:mem=${params.HCG_memory}gb:mpiprocs=1:ompthreads=${params.HCG_threads} -lwalltime=${params.HCG_walltime}:00:00"

  publishDir(
    path: "${params.HCDir}",
    mode: 'copy',
  )

  input:
  each chrom from chromosomes_ch
  set SampleID, path(bam), path(bai) from processed_bams
  path ref_genome
  path ref_dict
  path ref_index

  output:
  tuple chrom, path("${SampleID}_${chrom}.vcf") into HCG_ch
  path("${SampleID}_${chrom}.vcf.idx") into idx_ch
  
  beforeScript 'module load anaconda3/personal; source activate NF_GATK'

  script:
  """
  mkdir tmp
  n_slots=`expr ${params.GVCF_threads} / 2 - 3`
  if [ \$n_slots -le 0 ]; then n_slots=1; fi
  taskset -c 0-\${n_slots} gatk --java-options \"-Xmx${params.HCG_memory}G -XX:+UseParallelGC -XX:ParallelGCThreads=\${n_slots}\" HaplotypeCaller \\
    --tmp-dir tmp/ \\
    --pair-hmm-implementation AVX_LOGLESS_CACHING_OMP \\
    --native-pair-hmm-threads \${n_slots} \\
    -ERC GVCF \\
    -L ${chrom} \\
    -R ${ref_genome} \\
    -I ${bam} \\
    -O ${SampleID}_${chrom}.vcf ${params.GVCF_args}
  """
}
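The thread arithmetic in the script block is easy to misread, so here is a minimal standalone sketch of it. `compute_slots` is a hypothetical helper that mirrors the `expr` line and the clamp to 1. It assumes `params.GVCF_threads` holds the same value as the `params.HCG_threads` used in `clusterOptions`, which the process itself does not guarantee.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring: n_slots=`expr threads / 2 - 3`, clamped to >= 1.
compute_slots() {
  threads=$1
  # Integer division happens first, then the subtraction.
  n_slots=$(( threads / 2 - 3 ))
  # Fall back to a single slot when the result is zero or negative.
  if [ "$n_slots" -le 0 ]; then n_slots=1; fi
  echo "$n_slots"
}

compute_slots 16   # prints 5
compute_slots 4    # prints 1
```

Note that `taskset -c 0-N` pins the process to CPUs 0 through N inclusive, so the command in the process is pinned to one more CPU than the thread count it passes to `--native-pair-hmm-threads`.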

0 votes

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/71053941
