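I was curious to test the performance and usefulness of asynchronous tasks in PowerShell with Start-ThreadJob, Start-Job, and Start-Process. I have a folder with about 100 zip files, so I came up with the following test: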
New-Item "000" -ItemType Directory -Force # Move the old zip files in here
foreach ($i in $zipfiles) {
    $name = $i -split ".zip"
    Start-Job -scriptblock {
        7z.exe x -o"$name" .\$name
        Move-Item $i 000\ -Force
        7z.exe a $i .\$name\*.*
    }
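}
The problem with this is that it starts jobs for all 100 zip files at once, which is probably too much. I would like to set a value $numjobs, say 5, that I can change, so that only $numjobs jobs run at the same time; the script should then wait for all 5 jobs to finish before starting the next block of 5. I would then like to watch CPU and memory usage depending on the value of $numjobs.
How do I tell a loop to run only 5 times, then wait for the jobs to finish, and then continue?
I know that waiting for jobs to finish is easy: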
$jobs = $commands | Foreach-Object { Start-ThreadJob $_ }
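$jobs | Receive-Job -Wait -AutoRemoveJob
But how would I wait for Start-Process tasks to finish?
Although I would like to use Parallel-ForEach (that is, ForEach-Object -Parallel), the enterprises I work in will be firmly tied to PowerShell 5.1 for the next 3-4 years, and I expect no chance to install PowerShell 7.x (although I would be curious to test Parallel-ForEach on my home system to compare all approaches).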
Answered 2022-10-08 17:51:01
To add to Santiago Squarzon's helpful answer:
Below is the helper function `Measure-Parallel`, which allows you to compare the speed of the following approaches to parallelism:
- `Start-Job`:
  - Child-process-based: creates a child PowerShell process behind the scenes, which makes this approach both slow and resource-intensive.
- `Start-ThreadJob` - ships with PowerShell (Core) (v6+); installable via `Install-Module ThreadJob` in Windows PowerShell v5.1:
  - Thread-based: Much lighter-weight than `Start-Job` while providing the same functionality; additionally avoids potential loss of type fidelity due to cross-process serialization / deserialization.
- `ForEach-Object -Parallel` - available only in PowerShell (Core) 7.0+:
  - Thread-based: In essence a simplified wrapper around `Start-ThreadJob` with **support for direct pipeline input and direct output**, with invariably synchronous overall execution (all launched threads are waited for).
- `Start-Process`:
  - Child-process-based: Invokes an _external program_ asynchronously by default, on Windows in a _new window_ by default.
  - Note that **this approach only makes sense if your parallel tasks** _**only**_ **consist of a** _**single**_ **call to an** _**external program**_, as opposed to needing to execute _a block of PowerShell code_.
  - Notably, **the only way to** _**capture output**_ **with this approach is by** _**redirection to a file**_, invariably as _text only_.
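To illustrate the last point, here is a minimal sketch of capturing the output of a `Start-Process` call by redirecting it to a file. The temp-file name and the `echo hello` command are placeholders chosen for illustration; the native shell stands in for a real program such as `7z.exe`.

```powershell
# Assumption: the platform-native shell stands in for the real external program.
$exe     = if ($env:OS -eq 'Windows_NT') { 'cmd.exe' } else { 'sh' }
$argList = if ($env:OS -eq 'Windows_NT') { '/c "echo hello"' } else { '-c "echo hello"' }

# Hypothetical output file; stdout can only be captured via a file, as text.
$outFile = Join-Path ([IO.Path]::GetTempPath()) 'start-process-demo.out.txt'

# -PassThru returns the process object so that it can be waited for.
$proc = Start-Process -NoNewWindow -PassThru -RedirectStandardOutput $outFile $exe -ArgumentList $argList
$proc | Wait-Process

# Read the captured text back; objects and types are not preserved.
$captured = Get-Content $outFile
$captured
```

With several processes running in parallel, each one would need its own output file, which is part of why this approach is limited compared with jobs.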
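Note:
- Given that the tests below wrap a single call to an external executable (such as `7z.exe` in your case), the `Start-Process` approach will perform best, because it doesn't have the overhead of job management. However, as noted above, this approach has fundamental limitations.
- If `Start-ThreadJob` or `ForEach-Object -Parallel` is available to you, you won't need to resort to this approach.
Sample `Measure-Parallel` call, which contrasts the runtime performance of the approaches: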
# Run 20 jobs / processes in parallel, 5 at a time, comparing
# all approaches.
# Note: Omit the -Approach argument to enter interactive mode.
Measure-Parallel -Approach All -BatchSize 5 -JobCount 20
Sample output from a macOS machine running PowerShell 7.2.6 (timings vary based on many factors, but the ratios should provide a sense of relative performance):
# ... output from the jobs
JobCount                         : 20
BatchSize                        : 5
BatchCount                       : 4
Start-Job (secs.)                : 2.20
Start-ThreadJob (secs.)          : 1.17
Start-Process (secs.)            : 0.84
ForEach-Object -Parallel (secs.) : 0.94
Conclusions:
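- `ForEach-Object -Parallel` adds the least thread/job-management overhead, followed by `Start-ThreadJob`.
- `Start-Job`, due to needing an extra child process (for the hidden PowerShell instance running each task), is noticeably slower. On Windows, the performance discrepancy seems to be much more pronounced.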
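Given the low overhead of `Start-ThreadJob` and its availability in Windows PowerShell 5.1, the "only `$numjobs` at a time" requirement from the question can also be met with `-ThrottleLimit` instead of manual batching. The following is a sketch under assumptions: the ThreadJob module is installed, the file names are simulated, and the string output is a placeholder for the real work.

```powershell
# Assumption: the ThreadJob module is available (it ships with PowerShell 7+;
# in Windows PowerShell 5.1 it requires Install-Module ThreadJob).
$numjobs  = 5                                      # max. number of threads running at once
$zipFiles = 0..19 | ForEach-Object { "f$_.zip" }   # simulated input file names

$results = $zipFiles |
  ForEach-Object {
    # Jobs beyond the throttle limit are queued until a running job finishes.
    Start-ThreadJob -ThrottleLimit $numjobs -ArgumentList $_ -ScriptBlock {
      param($file)
      "processed $file"   # placeholder for the real work, e.g. a 7z.exe call
    }
  } |
  Receive-Job -Wait -AutoRemoveJob   # wait for all jobs, collect output, clean up

$results
```

Note that this throttles on a rolling basis: a new thread starts as soon as a running one finishes, rather than waiting for a whole batch of 5 to complete.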
`Measure-Parallel` source code:
Important:
- **The function** _**hard-codes**_ **sample input objects as well as what external program to invoke** - you'll have to edit it yourself as needed; the hard-coded external program is the platform-native shell in this case (`cmd.exe` on Windows, `/bin/sh` on Unix-like platforms), which is passed a command to simply _echo_ each input object.
  - It wouldn't be too hard to modify the function to accept a script block as an argument, and to receive input objects for the jobs via the pipeline (though that would preclude the `Start-Process` approach, except if you explicitly call the block via the PowerShell CLI - but in that case `Start-Job` could just be used).
- What the jobs / processes _output_ goes _directly to the display_ and cannot be captured.
- The batch size, which defaults to `5`, can be modified with `-BatchSize`; for the thread-based approaches, the batch size is also used as the `-ThrottleLimit` argument, i.e. the limit on how many threads are allowed to run at the same time. By default, a single batch is run, but you can request multiple batches indirectly by passing the total number of parallel runs to `-JobCount`.
- You can select approaches via the `-Approach` parameter, which supports `Job`, `ThreadJob`, `Process`, `ForEachParallel`, and `All`, the latter combining all of the preceding.
  - If `-Approach` isn't specified, _interactive_ mode is entered, where you're (repeatedly) prompted for the desired approach.
function Measure-Parallel {
  [CmdletBinding()]
  param(
    [ValidateRange(2, 2147483647)] [int] $BatchSize = 5,
    [ValidateSet('Job', 'ThreadJob', 'Process', 'ForEachParallel', 'All')] [string[]] $Approach,
    [ValidateRange(2, 2147483647)] [int] $JobCount = $BatchSize # pass a higher count to run multiple batches
  )
  $noForEachParallel = $PSVersionTable.PSVersion.Major -lt 7
  $noStartThreadJob = -not (Get-Command -ErrorAction Ignore Start-ThreadJob)
  $interactive = -not $Approach
  if (-not $interactive) {
    # Translate the approach arguments into their corresponding hashtable keys (see below).
    if ('All' -eq $Approach) { $Approach = 'Job', 'ThreadJob', 'Process', 'ForEachParallel' }
    $approaches = $Approach.ForEach({
      if ($_ -eq 'ForEachParallel') { 'ForEach-Object -Parallel' }
      else { $_ -replace '^', 'Start-' }
    })
  }
  if ($noStartThreadJob) {
    if ($interactive -or $approaches -contains 'Start-ThreadJob') {
      Write-Warning "Start-ThreadJob is not installed, omitting its test; install it with ``Install-Module ThreadJob``"
      $approaches = $approaches.Where({ $_ -ne 'Start-ThreadJob' })
    }
  }
  if ($noForEachParallel) {
    if ($interactive -or $approaches -contains 'ForEach-Object -Parallel') {
      Write-Warning "ForEach-Object -Parallel is not available in this PowerShell version (requires v7+), omitting its test."
      $approaches = $approaches.Where({ $_ -ne 'ForEach-Object -Parallel' })
    }
  }
  # Simulated input: Create 'f0.zip', 'f1.zip', ... file names.
  $zipFiles = 0..($JobCount - 1) -replace '^', 'f' -replace '$', '.zip'
  # Sample executable to run - here, the native shell is called to simply
  # echo the argument given.
  # The external program to invoke.
  $exe = if ($env:OS -eq 'Windows_NT') { 'cmd.exe' } else { 'sh' }
  # The list of its arguments *as a single string* - use '{0}' as the placeholder for where the input object should go.
  $exeArgList = if ($env:OS -eq 'Windows_NT') { '/c "echo {0}"' } else { '-c "echo {0}"' }
  # A hashtable with script blocks that implement the approaches to parallelism.
  $approachImpl = [ordered] @{}
  $approachImpl['Start-Job'] = { # child-process-based job
    param([array] $batch)
    $batch |
      ForEach-Object {
        Start-Job { Invoke-Expression ($using:exe + ' ' + ($using:exeArgList -f $args[0])) } -ArgumentList $_
      } |
      Receive-Job -Wait -AutoRemoveJob # wait for all jobs, relay their output, then remove them.
  }
  if (-not $noStartThreadJob) {
    # If Start-ThreadJob is available, add an approach for it.
    $approachImpl['Start-ThreadJob'] = { # thread-based job - requires Install-Module ThreadJob in WinPS
      param([array] $batch)
      $batch |
        ForEach-Object {
          Start-ThreadJob -ThrottleLimit $BatchSize { Invoke-Expression ($using:exe + ' ' + ($using:exeArgList -f $args[0])) } -ArgumentList $_
        } |
        Receive-Job -Wait -AutoRemoveJob
    }
  }
  if (-not $noForEachParallel) {
    # If ForEach-Object -Parallel is supported (v7+), add an approach for it.
    $approachImpl['ForEach-Object -Parallel'] = {
      param([array] $batch)
      $batch | ForEach-Object -ThrottleLimit $BatchSize -Parallel {
        Invoke-Expression ($using:exe + ' ' + ($using:exeArgList -f $_))
      }
    }
  }
  $approachImpl['Start-Process'] = { # direct execution of an external program
    param([array] $batch)
    $batch |
      ForEach-Object {
        Start-Process -NoNewWindow -PassThru $exe -ArgumentList ($exeArgList -f $_)
      } |
      Wait-Process # wait for all processes to terminate.
  }
  # Partition the array of all indices into subarrays (batches)
  $batches = @(
    0..([math]::Ceiling($zipFiles.Count / $batchSize) - 1) | ForEach-Object {
      , $zipFiles[($_ * $batchSize)..($_ * $batchSize + $batchSize - 1)]
    }
  )
  # In interactive use, print verbose messages by default
  if ($interactive) { $VerbosePreference = 'Continue' }
  :menu while ($true) {
    if ($interactive) {
      # Prompt for the approach to use.
      $choices = $approachImpl.Keys.ForEach({
        if ($_ -eq 'ForEach-Object -Parallel') { '&' + $_ }
        else { $_ -replace '-', '-&' }
      }) + '&Quit'
      $choice = $host.ui.PromptForChoice("Approach", "Select parallelism approach:", $choices, 0)
      if ($choice -eq $approachImpl.Count) { break }
      $approachKey = @($approachImpl.Keys)[$choice]
    }
    else {
      # Use the given approach(es)
      $approachKey = $approaches
    }
    $tsTotals = foreach ($appr in $approachKey) {
      $i = 0; $tsTotal = [timespan] 0
      $batches | ForEach-Object {
        $ts = Measure-Command { & $approachImpl[$appr] $_ | Out-Host }
        Write-Verbose "$batchSize-element '$appr' batch finished in $($ts.TotalSeconds.ToString('N2')) secs."
        $tsTotal += $ts
        if (++$i -eq $batches.Count) {
          # last batch processed.
          if ($batches.Count -gt 1) {
            Write-Verbose "'$appr' processing of $JobCount items overall finished in $($tsTotal.TotalSeconds.ToString('N2')) secs."
          }
          $tsTotal # output the overall timing for this approach
        }
        elseif ($interactive) {
          $choice = $host.ui.PromptForChoice("Continue?", "Select action", ('&Next batch', '&Return to Menu', '&Quit'), 0)
          if ($choice -eq 1) { continue menu }
          if ($choice -eq 2) { break menu }
        }
      }
    }
    if (-not $interactive) {
      # Output a result object with the overall timings.
      $oht = [ordered] @{}; $i = 0
      $oht['JobCount'] = $JobCount
      $oht['BatchSize'] = $BatchSize
      $oht['BatchCount'] = $batches.Count
      foreach ($appr in $approachKey) {
        $oht[($appr + ' (secs.)')] = $tsTotals[$i++].TotalSeconds.ToString('N2')
      }
      [pscustomobject] $oht
      break # break out of the infinite :menu loop
    }
  }
}
Answered 2022-10-08 13:20:32
You can add a counter to your foreach loop and break out of the loop once the counter reaches the desired value.
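$numjobs = 5
$counter = 0
foreach ($i in $zipfiles) {
    $counter++
    # Stop once $numjobs items have been processed
    # (-gt rather than -ge, so that the 5th item is still handled).
    if ($counter -gt $numjobs) {
        break
    }
    <your code>
}
Or use PowerShell's Select-Object cmdlet: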
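$numjobs = 5
$zipfiles | Select-Object -First $numjobs | ForEach-Object {
    <your code>
}
If you want to process the entire array in batches and wait for each batch to complete, you have to save the objects returned by Start-Job and pass them to Wait-Job, like this: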
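$items = 1..100
$batchsize = 5
while (@($items).Count -gt 0) {
    # Start one job per item in the current batch.
    $jobs = @()
    foreach ($i in $items | Select-Object -First $batchsize) {
        $jobs += Start-Job -ScriptBlock { Start-Sleep 10 }
    }
    # Wait for every job in the batch before starting the next batch.
    $jobs | Wait-Job | Out-Null
    # Drop the processed items; out-of-range indices yield nothing,
    # so $items ends up empty after the last batch and the loop ends.
    $items = $items[$batchsize..($items.Length)]
}
By design, arrays have a fixed length, which is why I rewrite the whole array with $items = $items[$batchsize..($items.Length)]. The trimming happens after each batch rather than inside the foreach loop, so that the final batch is removed too and the loop terminates.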
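The batching pattern above also carries over to Start-Process, which the question asks about: with -PassThru, Start-Process returns process objects that can be piped to Wait-Process. Below is a minimal sketch; the native shell and its echo command are stand-ins for the real program, and $exe, $argTemplate and the sample items are illustrative names that are not part of the question.

```powershell
$numjobs = 5
$items   = 1..12   # sample input; replace with the real file names

# Assumption: the platform-native shell stands in for the real program (e.g. 7z.exe).
$exe         = if ($env:OS -eq 'Windows_NT') { 'cmd.exe' } else { 'sh' }
$argTemplate = if ($env:OS -eq 'Windows_NT') { '/c "echo {0}"' } else { '-c "echo {0}"' }

$batchCount = 0
for ($start = 0; $start -lt $items.Count; $start += $numjobs) {
    # Slice out the current batch; the last batch may be shorter.
    $end   = [math]::Min($start + $numjobs, $items.Count) - 1
    $batch = $items[$start..$end]

    # Launch all processes of the batch, then block until every one has exited.
    $batch |
        ForEach-Object { Start-Process -NoNewWindow -PassThru $exe -ArgumentList ($argTemplate -f $_) } |
        Wait-Process

    $batchCount++
}
```

Changing $numjobs and watching CPU and memory while this runs gives the comparison the question is after; the output of the processes goes straight to the console unless it is redirected to files.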
https://stackoverflow.com/questions/73997250