PowerShell: get the number of lines in a big (large) file

Stack Overflow user
Asked 2012-08-23 12:13:53
8 answers · 78.2K views · 0 followers · 45 votes

One way to get the number of lines in a file in PowerShell is the following:

PS C:\Users\Pranav\Desktop\PS_Test_Scripts> $a=Get-Content .\sub.ps1
PS C:\Users\Pranav\Desktop\PS_Test_Scripts> $a.count
34
PS C:\Users\Pranav\Desktop\PS_Test_Scripts> 

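But when I have a large 800 MB text file, how do I get the line count without reading the whole file?

The method above consumes too much RAM, so the script either crashes or takes too long to finish.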

8 Answers

Stack Overflow user

Answered 2012-08-23 12:50:04

Use Get-Content -Read $nLinesAtTime (-Read is short for -ReadCount) to read the file in chunks:

$nlines = 0

# Read the file 1000 lines at a time; each pipeline object is an array of lines
Get-Content $YOURFILE -ReadCount 1000 | ForEach-Object { $nlines += $_.Count }
"{0} has {1} lines" -f $YOURFILE, $nlines

And here is a simple but slow command for validating the result on a small file:

gc $YOURFILE | Measure-Object -Line
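
To check that the chunked count and the slow reference count agree, something like the following could be tried. It is only a sketch: the temp file, the 2500-line test content, and the variable names $tempFile, $chunked and $reference are assumptions made for illustration, not part of the answer.

```powershell
# Hypothetical sanity check: create a throwaway file with a known number of lines.
$tempFile = [System.IO.Path]::GetTempFileName()
1..2500 | ForEach-Object { "line $_" } | Set-Content -Path $tempFile

# Chunked count: each pipeline object is an array of up to 1000 lines.
$chunked = 0
Get-Content -Path $tempFile -ReadCount 1000 | ForEach-Object { $chunked += $_.Count }

# Slow reference count: one pipeline object per line.
$reference = (Get-Content -Path $tempFile | Measure-Object -Line).Lines

"chunked=$chunked reference=$reference"

# Clean up the throwaway file.
Remove-Item -Path $tempFile
```

If both numbers match on a small file, the chunked form can then be pointed at the large file.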
33 votes

Stack Overflow user

Answered 2015-12-14 04:21:07

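Here is a PowerShell script I cobbled together that demonstrates several ways of counting the lines in a text file, along with the time and memory each one needs. The results (below) show clear differences in time and memory requirements. In my tests, Get-Content with ReadCount set to 100 looked like the sweet spot. The other tests needed significantly more time, more memory, or both.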

#$testFile = 'C:\test_small.csv' # 245 lines, 150 KB
#$testFile = 'C:\test_medium.csv' # 95,365 lines, 104 MB
$testFile = 'C:\test_large.csv' # 285,776 lines, 308 MB

# Using ArrayList just because they are faster than Powershell arrays, for some operations with large arrays.
$results = New-Object System.Collections.ArrayList

function AddResult {
param( [string] $sMethod, [string] $iCount )
    $result = New-Object -TypeName PSObject -Property @{
        "Method" = $sMethod
        "Count" = $iCount
        "Elapsed Time" = ((Get-Date) - $dtStart)
        "Memory Total" = [System.Math]::Round((GetMemoryUsage)/1mb, 1)
        "Memory Delta" = [System.Math]::Round(((GetMemoryUsage) - $dMemStart)/1mb, 1)
    }
    [void]$results.Add($result)
    Write-Output "$sMethod : $iCount"
    [System.GC]::Collect()
}

function GetMemoryUsage {
    # return ((Get-Process -Id $pid).PrivateMemorySize)
    return ([System.GC]::GetTotalMemory($false))
}

# Get-Content -ReadCount 1
[System.GC]::Collect()
$dMemStart = GetMemoryUsage
$dtStart = Get-Date
$count = 0
Get-Content -Path $testFile -ReadCount 1 |% { $count++ }
AddResult "Get-Content -ReadCount 1" $count

# Get-Content -ReadCount 10,100,1000,0
# Note: ReadCount = 1 returns a string.  Any other value returns an array of strings.
# Thus, the Count property only applies when ReadCount is not 1.
@(10,100,1000,0) |% {
    $dMemStart = GetMemoryUsage
    $dtStart = Get-Date
    $count = 0
    Get-Content -Path $testFile -ReadCount $_ |% { $count += $_.Count }
    AddResult "Get-Content -ReadCount $_" $count
}

# Get-Content | Measure-Object
$dMemStart = GetMemoryUsage
$dtStart = Get-Date
$count = (Get-Content -Path $testFile -ReadCount 1 | Measure-Object -line).Lines
AddResult "Get-Content -ReadCount 1 | Measure-Object" $count

# Get-Content.Count
$dMemStart = GetMemoryUsage
$dtStart = Get-Date
$count = (Get-Content -Path $testFile -ReadCount 1).Count
AddResult "Get-Content.Count" $count

# StreamReader.ReadLine
$dMemStart = GetMemoryUsage
$dtStart = Get-Date
$count = 0
# Use this constructor to avoid file access errors, like Get-Content does.
$stream = New-Object -TypeName System.IO.FileStream(
    $testFile,
    [System.IO.FileMode]::Open,
    [System.IO.FileAccess]::Read,
    [System.IO.FileShare]::ReadWrite)
if ($stream) {
    $reader = New-Object IO.StreamReader $stream
    if ($reader) {
        while(-not ($reader.EndOfStream)) { [void]$reader.ReadLine(); $count++ }
        $reader.Close()
    }
    $stream.Close()
}

AddResult "StreamReader.ReadLine" $count

$results | Select Method, Count, "Elapsed Time", "Memory Total", "Memory Delta" | ft -auto | Write-Output

Here are the results for the text file with about 95k lines, 104 MB:

Method                                    Count Elapsed Time     Memory Total Memory Delta
------                                    ----- ------------     ------------ ------------
Get-Content -ReadCount 1                  95365 00:00:11.1451841         45.8          0.2
Get-Content -ReadCount 10                 95365 00:00:02.9015023         47.3          1.7
Get-Content -ReadCount 100                95365 00:00:01.4522507         59.9         14.3
Get-Content -ReadCount 1000               95365 00:00:01.1539634         75.4         29.7
Get-Content -ReadCount 0                  95365 00:00:01.3888746          346        300.4
Get-Content -ReadCount 1 | Measure-Object 95365 00:00:08.6867159         46.2          0.6
Get-Content.Count                         95365 00:00:03.0574433        465.8        420.1
StreamReader.ReadLine                     95365 00:00:02.5740262         46.2          0.6

Here are the results for a larger file (about 285k lines, 308 MB):

Method                                    Count  Elapsed Time     Memory Total Memory Delta
------                                    -----  ------------     ------------ ------------
Get-Content -ReadCount 1                  285776 00:00:36.2280995         46.3          0.8
Get-Content -ReadCount 10                 285776 00:00:06.3486006         46.3          0.7
Get-Content -ReadCount 100                285776 00:00:03.1590055         55.1          9.5
Get-Content -ReadCount 1000               285776 00:00:02.8381262         88.1         42.4
Get-Content -ReadCount 0                  285776 00:00:29.4240734        894.5        848.8
Get-Content -ReadCount 1 | Measure-Object 285776 00:00:32.7905971         46.5          0.9
Get-Content.Count                         285776 00:00:28.4504388       1219.8       1174.2
StreamReader.ReadLine                     285776 00:00:20.4495721           46          0.4
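
Going by the tables above, chunked Get-Content with a ReadCount between 100 and 1000 combined low memory use with short run times. One way to package that is a small helper function. This is a sketch under stated assumptions: the function name Get-LineCount, its parameters, and the temp-file demo are hypothetical and not part of the benchmark script.

```powershell
# Hypothetical helper; the name Get-LineCount and its parameters are illustrative.
function Get-LineCount {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [int] $ReadCount = 1000   # chunk size; assumed default, tune for your files
    )
    $count = 0
    # @($_) wraps a single string (ReadCount 1) in an array, so .Count works either way.
    Get-Content -Path $Path -ReadCount $ReadCount | ForEach-Object { $count += @($_).Count }
    return $count
}

# Demo on a throwaway file with a known number of lines.
$demoFile = [System.IO.Path]::GetTempFileName()
1..1234 | Set-Content -Path $demoFile

$withDefault = Get-LineCount -Path $demoFile
$withSmallChunks = Get-LineCount -Path $demoFile -ReadCount 100
"default chunk: $withDefault, chunk of 100: $withSmallChunks"

Remove-Item -Path $demoFile
```

The chunk size only trades memory for speed; the returned count should be the same for any positive ReadCount.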
31 votes

Stack Overflow user

Answered 2017-06-16 18:00:39

This is a one-liner based on Pseudothink's post.

Lines in one specific file:

"the_name_of_your_file.txt" |% {$n = $_; $c = 0; Get-Content -Path $_ -ReadCount 1000 |% { $c += $_.Count }; "$n; $c"}

All files in the current directory (counted individually):

Get-ChildItem "." -File |% {$n = $_; $c = 0; Get-Content -Path $_.FullName -ReadCount 1000 |% { $c += $_.Count }; "$n; $c"}

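Explanation:

"the_name_of_your_file.txt" -> does nothing by itself, it only supplies the file name to the next step; it must be in double quotes

|% -> alias for ForEach-Object; iterates over the items supplied (only one in this case), takes the pipeline content as input, and stores the current item in $_

$n = $_ -> saves the supplied file name from $_ into $n; this may not actually be necessary

$c = 0 -> initializes $c as the counter

Get-Content -Path $_ -ReadCount 1000 -> reads the supplied file 1000 lines at a time (see the other answers in this thread)

|% -> this foreach adds the number of lines actually read to $c (something like 1000 + 1000 + 123)

"$n; $c" -> once the file has been read, prints file name; line count

Get-ChildItem "." -> simply puts more items into the pipeline than the single file name did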
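
The same steps can be spelled out as a multi-line loop, which may be easier to read and adapt. This is a sketch rather than the answer's own code: the demo folder, the file names a.txt and b.txt, and the variables $folder, $file, $lineCount and $report are assumptions for illustration, and -File assumes PowerShell 3.0 or later.

```powershell
# Hypothetical demo folder with two small files of known length.
$folder = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $folder | Out-Null
1..3    | Set-Content -Path (Join-Path $folder 'a.txt')
1..1200 | Set-Content -Path (Join-Path $folder 'b.txt')

# Expanded form of the one-liner: one "name; count" string per file.
$report = Get-ChildItem -Path $folder -File | Sort-Object Name | ForEach-Object {
    $file = $_          # keep the file object; $_ is reused by the inner loop
    $lineCount = 0
    Get-Content -Path $file.FullName -ReadCount 1000 |
        ForEach-Object { $lineCount += @($_).Count }   # add the size of each chunk
    "$($file.Name); $lineCount"
}
$report

# Clean up the demo folder.
Remove-Item -Path $folder -Recurse
```

Replacing $folder with "." gives the per-file counts for the current directory.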

21 votes

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/12084642
