
PowerShell: testing the performance/efficiency of asynchronous tasks with Start-Job and Start-Process

Stack Overflow user
Asked 2022-10-08 13:10:47
2 answers, 111 views, 0 followers, 2 votes

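I'm curious to test the performance/usefulness of asynchronous tasks in PowerShell with Start-ThreadJob, Start-Job and Start-Process. I have a folder with about 100 zip files, so I came up with the following test: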

New-Item "000" -ItemType Directory -Force   # Move the old zip files in here
foreach ($i in $zipfiles) {
    $name = $i -split ".zip"
    Start-Job -scriptblock {
        7z.exe x -o"$name" .\$name
        Move-Item $i 000\ -Force
        7z.exe a $i .\$name\*.*
    }
}

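The problem with this is that it starts jobs for all 100 zip files at once, which is probably too much. I'd like to set a value $numjobs, say 5, that I can change, so that only $numjobs jobs are started at the same time; the script would then check that all 5 jobs have ended before the next block of 5 starts. I'd then like to watch CPU and memory usage depending on the value of $numjobs.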

How do I tell a loop to run only 5 times, then wait for the jobs to finish before continuing?

I see that it's easy to wait for jobs to finish:

$jobs = $commands | Foreach-Object { Start-ThreadJob $_ }
$jobs | Receive-Job -Wait -AutoRemoveJob

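But how do I wait for Start-Process tasks to end?

Although I would like to use ForEach-Object -Parallel, the enterprise I work in will be tied to PowerShell 5.1 for the next 3-4 years, and I expect no chance of installing PowerShell 7.x (although I'd be curious to test ForEach-Object -Parallel on my home system to compare all approaches).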


2 Answers

Stack Overflow user

Answered 2022-10-08 17:51:01

To add to Santiago Squarzon's helpful answer:

Below is the helper function Measure-Parallel, which lets you compare the speed of the following approaches to parallelism:

  • Start-Job:
    - Child-process-based: creates a child PowerShell process behind the scenes, which makes this approach both slow and resource-intensive.

  • Start-ThreadJob - ships with PowerShell (Core) (v6+); installable via Install-Module ThreadJob in Windows PowerShell v5.1:
    - Thread-based: Much lighter-weight than `Start-Job` while providing the same functionality; additionally avoids potential loss of type fidelity due to cross-process serialization / deserialization.

  • ForEach-Object -Parallel - available in PowerShell (Core) v7+:
    - Thread-based: In essence a simplified wrapper around `Start-ThreadJob` with **support for direct pipeline input and direct output**, with invariably synchronous overall execution (all launched threads are waited for).

  • Start-Process:
    - Child-process-based: Invokes an _external program_ asynchronously by default, on Windows in a _new window_ by default.
    - Note that **this approach only makes sense if your parallel tasks _only_ consist of a _single_ call to an _external program_**, as opposed to needing to execute _a block of PowerShell code_.
    - Notably, **the only way to _capture output_ with this approach is by _redirection to a file_**, invariably as _text only_.
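The Start-Process points above (one external program per task, output captured only via file redirection, waiting with Wait-Process) can be sketched as follows. This is only an illustration under stated assumptions: the input names are hypothetical, and the platform-native shell stands in for the real external program (such as 7z.exe).

```powershell
# Hypothetical sample inputs; replace with your real file names.
$items = 'f0.zip', 'f1.zip', 'f2.zip'

# Stand-in external program: the platform-native shell, which just echoes its input.
$isWin = $env:OS -eq 'Windows_NT'
$exe   = if ($isWin) { 'cmd.exe' } else { 'sh' }

# Temporary directory that receives one stdout file per process.
$outDir = Join-Path ([IO.Path]::GetTempPath()) ([IO.Path]::GetRandomFileName())
$null = New-Item -ItemType Directory -Path $outDir

# Launch all processes asynchronously; -PassThru returns the process objects.
$procs = foreach ($item in $items) {
  $exeArgs = if ($isWin) { "/c `"echo $item`"" } else { "-c `"echo $item`"" }
  Start-Process -NoNewWindow -PassThru -FilePath $exe -ArgumentList $exeArgs `
    -RedirectStandardOutput (Join-Path $outDir "$item.out")
}

# Block until every launched process has exited.
$procs | Wait-Process

# Output is only available as text, read back from the redirection files.
$captured = @(
  Get-ChildItem -LiteralPath $outDir | Sort-Object Name |
    ForEach-Object { (Get-Content -LiteralPath $_.FullName -Raw).Trim() }
)
$captured
```

To limit how many processes run at once, the same launch-then-Wait-Process pattern can be applied to one batch of inputs at a time.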

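Note:

  • Given that the tests below wrap a single call to an external executable (such as 7z.exe in your case), the Start-Process approach will perform best, because it doesn't have the overhead of job management. However, as noted above, this approach has fundamental limitations.

  • Due to its complexity, the runspace-pool-based approach from Santiago's answer is not included; if Start-ThreadJob or ForEach-Object -Parallel are available to you, you won't need to resort to it.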

A sample Measure-Parallel call, which contrasts the runtime performance of the approaches:

# Run 20 jobs / processes in parallel, 5 at a time, comparing
# all approaches.
# Note: Omit the -Approach argument to enter interactive mode.
Measure-Parallel -Approach All -BatchSize 5 -JobCount 20

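Sample output from a macOS machine running PowerShell 7.2.6 (timings vary based on many factors, but the ratios should give a sense of relative performance):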

# ... output from the jobs

JobCount                         : 20
BatchSize                        : 5
BatchCount                       : 4
Start-Job (secs.)                : 2.20
Start-ThreadJob (secs.)          : 1.17
Start-Process (secs.)            : 0.84
ForEach-Object -Parallel (secs.) : 0.94

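Conclusions:

  • ForEach-Object -Parallel adds the least thread/job-management overhead, followed by Start-ThreadJob.

  • Start-Job is noticeably slower, because it needs an extra child process (the hidden PowerShell instance that runs each task). On Windows, the performance discrepancy appears to be much more pronounced.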

Measure-Parallel source code:

  • Important:
    - **The function _hard-codes_ sample input objects as well as what external program to invoke** - you'll have to edit it yourself as needed; the hard-coded external program is the platform-native shell in this case (`cmd.exe` on Windows, `/bin/sh` on Unix-like platforms), which is passed a command to simply _echo_ each input object.
        - It wouldn't be too hard to modify the function to accept a script block as an argument, and to receive input objects for the jobs via the pipeline (though that would preclude the `Start-Process` approach, except if you explicitly call the block via the PowerShell CLI - but in that case `Start-Job` could just be used).
    - What the jobs / processes _output_ goes _directly to the display_ and cannot be captured.

  • The batch size defaults to 5 and can be changed with -BatchSize; for the thread-based approaches, the batch size is also used as the -ThrottleLimit argument, i.e. the limit on how many threads are allowed to run at the same time. By default, a single batch is run, but you can request multiple batches indirectly by passing the total number of parallel runs to -JobCount.

  • You can select approaches via the array-valued -Approach parameter, which supports Job, ThreadJob, Process, ForEachParallel, and All, the last of which combines all the preceding ones.
    - If `-Approach` isn't specified, _interactive_ mode is entered, where you're (repeatedly) prompted for the desired approach.

  • Except in interactive mode, a custom object with comparative timings is output.

function Measure-Parallel {

  [CmdletBinding()]
  param(
    [ValidateRange(2, 2147483647)] [int] $BatchSize = 5,
    [ValidateSet('Job', 'ThreadJob', 'Process', 'ForEachParallel', 'All')] [string[]] $Approach,
    [ValidateRange(2, 2147483647)] [int] $JobCount = $BatchSize # pass a higher count to run multiple batches
  )

  $noForEachParallel = $PSVersionTable.PSVersion.Major -lt 7
  $noStartThreadJob = -not (Get-Command -ErrorAction Ignore Start-ThreadJob)

  $interactive = -not $Approach
  if (-not $interactive) {
    # Translate the approach arguments into their corresponding hashtable keys (see below).
    if ($Approach -contains 'All') { $Approach = 'Job', 'ThreadJob', 'Process', 'ForEachParallel' }
    $approaches = $Approach.ForEach({
      if ($_ -eq 'ForEachParallel') { 'ForEach-Object -Parallel' }
      else { $_ -replace '^', 'Start-' }
    })
  }

  if ($noStartThreadJob) {
    if ($interactive -or $approaches -contains 'Start-ThreadJob') {
      Write-Warning "Start-ThreadJob is not installed, omitting its test; install it with ``Install-Module ThreadJob``"
      $approaches = $approaches.Where({ $_ -ne 'Start-ThreadJob' })
    }
  }
  if ($noForEachParallel) {
    if ($interactive -or $approaches -contains 'ForEach-Object -Parallel') {
      Write-Warning "ForEach-Object -Parallel is not available in this PowerShell version (requires v7+), omitting its test."
      $approaches = $approaches.Where({ $_ -ne 'ForEach-Object -Parallel' })
    }
  }

  # Simulated input: Create 'f0.zip', 'f1.zip', ... file names.
  $zipFiles = 0..($JobCount - 1) -replace '^', 'f' -replace '$', '.zip'

  # Sample executables to run - here, the native shell is called to simply 
  # echo the argument given.
  # The external program to invoke.
  $exe = if ($env:OS -eq 'Windows_NT') { 'cmd.exe' } else { 'sh' }
  # The list of its arguments *as a single string* - use '{0}' as the placeholder for where the input object should go.
  $exeArgList = if ($env:OS -eq 'Windows_NT') { '/c "echo {0}"' } else { '-c "echo {0}"' }

  # An ordered hashtable with script blocks that implement the available approaches to parallelism.
  $approachImpl = [ordered] @{}

  $approachImpl['Start-Job'] = { # child-process-based job
    param([array] $batch)
    $batch | 
    ForEach-Object {
      Start-Job { Invoke-Expression ($using:exe + ' ' + ($using:exeArgList -f $args[0])) } -ArgumentList $_
    } |
    Receive-Job -Wait -AutoRemoveJob # wait for all jobs, relay their output, then remove them.
  }

  if (-not $noStartThreadJob) {
    # If Start-ThreadJob is available, add an approach for it.
    $approachImpl['Start-ThreadJob'] = { # thread-based job - requires Install-Module ThreadJob in WinPS
      param([array] $batch)
      $batch |
      ForEach-Object {
        Start-ThreadJob -ThrottleLimit $BatchSize { Invoke-Expression ($using:exe + ' ' + ($using:exeArgList -f $args[0])) } -ArgumentList $_
      } |
      Receive-Job -Wait -AutoRemoveJob
    }
  }

  if (-not $noForEachParallel) {
    # If ForEach-Object -Parallel is supported (v7+), add an approach for it.
    $approachImpl['ForEach-Object -Parallel'] = {  
      param([array] $batch)
      $batch | ForEach-Object -ThrottleLimit $BatchSize -Parallel {
        Invoke-Expression ($using:exe + ' ' + ($using:exeArgList -f $_)) 
      }
    }
  }

  $approachImpl['Start-Process'] = { # direct execution of an external program
    param([array] $batch)
    $batch |
    ForEach-Object {
      Start-Process -NoNewWindow -PassThru $exe -ArgumentList ($exeArgList -f $_)
    } |
    Wait-Process # wait for all processes to terminate.
  }

  # Partition the array of all indices into subarrays (batches)
  $batches = @(
    0..([math]::Ceiling($zipFiles.Count / $batchSize) - 1) | ForEach-Object {
      , $zipFiles[($_ * $batchSize)..($_ * $batchSize + $batchSize - 1)]
    }
  )

  # In interactive use, print verbose messages by default
  if ($interactive) { $VerbosePreference = 'Continue' }

  :menu while ($true) {
    if ($interactive) {
      # Prompt for the approach to use.
      $choices = $approachImpl.Keys.ForEach({
        if ($_ -eq 'ForEach-Object -Parallel') { '&' + $_ }
        else { $_ -replace '-', '-&' }
      }) + '&Quit'
      $choice = $host.ui.PromptForChoice("Approach", "Select parallelism approach:", $choices, 0)
      if ($choice -eq $approachImpl.Count) { break }
      $approachKey = @($approachImpl.Keys)[$choice]
    }
    else {
      # Use the given approach(es)
      $approachKey = $approaches
    }
    $tsTotals = foreach ($appr in $approachKey) {
      $i = 0; $tsTotal = [timespan] 0
      $batches | ForEach-Object {
        $ts = Measure-Command { & $approachImpl[$appr] $_ | Out-Host }
        Write-Verbose "$batchSize-element '$appr' batch finished in $($ts.TotalSeconds.ToString('N2')) secs."
        $tsTotal += $ts
        if (++$i -eq $batches.Count) {
          # last batch processed.
          if ($batches.Count -gt 1) {
            Write-Verbose "'$appr' processing of $JobCount items overall finished in $($tsTotal.TotalSeconds.ToString('N2')) secs." 
          }
          $tsTotal # output the overall timing for this approach
        }
        elseif ($interactive) {
          $choice = $host.ui.PromptForChoice("Continue?", "Select action", ('&Next batch', '&Return to Menu', '&Quit'), 0)
          if ($choice -eq 1) { continue menu }
          if ($choice -eq 2) { break menu }
        }
      }
    }
    if (-not $interactive) {
      # Output a result object with the overall timings.
      $oht = [ordered] @{}; $i = 0
      $oht['JobCount'] = $JobCount
      $oht['BatchSize'] = $BatchSize
      $oht['BatchCount'] = $batches.Count
      foreach ($appr in $approachKey) {        
        $oht[($appr + ' (secs.)')] = $tsTotals[$i++].TotalSeconds.ToString('N2')
      }
      [pscustomobject] $oht
      break # break out of the infinite :menu loop
    }
  }

}
3 votes

Stack Overflow user

Answered 2022-10-08 13:20:32

You could add a counter to your foreach loop and break out of the loop once the counter reaches the desired value.

$numjobs = 5
$counter = 0
foreach ($i in $zipfiles) {
  $counter++
  if ($counter -gt $numjobs) {  # -gt rather than -ge, so that exactly $numjobs items are processed
    break 
  }
  <your code>
}

Or use PowerShell's Select-Object:

$numjobs = 5
$zipfiles | select -first $numjobs | Foreach-Object {
  <your code>
}

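If you want to process the entire array in batches and wait for each batch to complete, you have to save the objects returned by Start-Job and pass them to Wait-Job, like this: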

$items = 1..100

$batchsize = 5

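while ($true) {
    $jobs = @()
    $counter = 0
    # Start at most $batchsize jobs from the front of the remaining items.
    foreach ($i in $items) {
        if ($counter -ge $batchsize) {
            break 
        }
        $jobs += Start-Job -ScriptBlock { Start-Sleep 10 }
        $counter++
    }
    # Wait for every job of the current batch to finish.
    foreach ($job in $jobs) {
        $job | Wait-Job | Out-Null
    }
    # Drop the processed items; doing this after the loop also handles
    # a final batch of $batchsize or fewer items, so the outer loop terminates.
    $items = $items[$batchsize..($items.Length)]
    if (!$items) {
        break
    }
}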

By design, arrays have a fixed length, which is why I re-assign the whole array with $items = $items[$batchsize..($items.Length)].
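As an alternative to re-assigning the array, the batches can also be taken by stepping an index through it. The following is a minimal sketch of that idea rather than the code above: the sample input (1..7), the batch size of 3 and the doubling job body are all placeholders, and Receive-Job -Wait -AutoRemoveJob is used to wait for, collect from and clean up each batch in one step.

```powershell
$items     = 1..7   # hypothetical sample input
$batchSize = 3      # number of jobs to run at the same time

$results = for ($i = 0; $i -lt $items.Count; $i += $batchSize) {
  # Index of the last element of this batch (the final batch may be shorter).
  $last  = [math]::Min($i + $batchSize, $items.Count) - 1
  $batch = $items[$i..$last]

  # Start one job per item in the batch; the item is passed via -ArgumentList.
  $jobs = foreach ($item in $batch) {
    Start-Job -ScriptBlock { param($n) $n * 2 } -ArgumentList $item
  }

  # Wait for the whole batch, relay its output, and remove the finished jobs
  # before the next batch is started.
  $jobs | Receive-Job -Wait -AutoRemoveJob
}

$results
```

The order of results within a batch should not be relied on, since jobs may finish at different times.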

1 vote
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/73997250
