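I've been writing a Perl script for my master's thesis that extracts a small block of text (CAE) from 10-K filings (a company's annual report). After a lot of work I finally finished that script. Now I need to write a new one, but with a deadline next week I'm afraid I won't get it done in time. I was wondering whether anyone could help me with the following problem:
I have nearly 52,000 .txt files, each containing a small block of text. I need a script that writes down the name of each .txt file together with the number of words and/or characters in that file, and collects all of that in a single text file.
Can anyone help? I'd really appreciate it!
Here is what I have so far: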
#!/usr/bin/perl -w
use strict;
use warnings;
my $folder; #Base directory for the 10K filings
my $subfolder="2012"; #Subdirectory where 10K filings are placed (Default is ./10K/10K_Raw/2012/*.txt)
my $folder10kcae="10K_CAE"; #Name of subdirectory for output (CAE)
my $folderwc="10K_WC"; #Name of subdirectory for output (WordCount)
my $target_cae; #Name of target directory for output (CAE)
my $target_wc; #Name of target directory for output (WordCount)
my $slash; #Declare slash (dependent on operating system)
my $file; #Filename
my @allfiles; #All files in directory, put into an array
my $allfiles; #Total files in directory
my $data; #Input file contents
my $cae; #Results of the search query (CAE)
my $wc; #Results of the search query (WordCount)
my $output_cae; #Output file with CAE
my $output_wc; #Output file with WordCount
my $log; #Log file (also used to determine point to continue progress)
my $logfile="$subfolder".".log";#Filename of log file
my @filesinlog; #Files that have been processed according to log file
{
#Set folders for Windows. Put raw 10K filings in folder\subfolder
$slash="\\";
$folder="C:\\10KK\\"; ###specify correct base-map###
}
#Open source folder and read all files
opendir(DIR,"$folder$slash$subfolder") or die $!;
@allfiles=grep /\.txt$/i, readdir DIR; #Keep only names ending in .txt
closedir(DIR);
#Creates destination folder
$target_wc="$folder$slash$folderwc$slash$subfolder";
mkdir "$folder$slash$folderwc";
mkdir $target_wc;
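#Count lines, words and characters per file and write all results to one output file
#(the output file name "wordcount.txt" is a placeholder)
open $output_wc, ">", "$target_wc${slash}wordcount.txt" or die $!;
foreach $file (@allfiles) {
    my ($lines, $words, $chars) = (0,0,0);
    open my $in, "<", "$folder$slash$subfolder$slash$file" or die "$file: $!";
    while ($data=<$in>) {
        $lines++;
        $chars += length($data);
        my @fields = split(' ', $data); #Split on runs of whitespace
        $words += @fields;
    }
    close $in;
    print $output_wc "$file\tlines=$lines\twords=$words\tchars=$chars\n";
}
close $output_wc;
Posted on 2014-06-04 10:09:11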
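I'd say there's some reinventing the wheel going on here, and I wouldn't use a Perl script for this. There is a Unix command-line tool called 'wc' (short for "word count") that reports the number of lines, words and bytes in each file, so it can do what you want without any programming.
On Unix:
$ wc /path/to/my/folder/* > /path/to/my/output/file.txt
On Windows, you can download wc as part of the GNU Coreutils package for Windows and then run the same command with Windows-style paths:
C:\ > wc \path\to\my\folder\* > \path\to\my\output\file.txt
https://stackoverflow.com/questions/24034390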
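If installing Coreutils isn't an option, the same three counts can be approximated in a few lines of Perl. The following is only a minimal sketch, not a drop-in replacement for wc: `count_text` is a hypothetical helper name, the files to count are assumed to be passed as command-line arguments, and the third column is the string length as Perl reads it (bytes by default), which is what wc prints without options.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# count_text is a hypothetical helper: it returns (lines, words, chars)
# for a string, roughly mirroring the three columns wc prints by default.
sub count_text {
    my ($text) = @_;
    my $lines = ($text =~ tr/\n//);   # count newline characters, as wc does
    my @words = split ' ', $text;     # split on runs of whitespace
    return ($lines, scalar @words, length $text);
}

# Print one wc-style row for every file named on the command line.
for my $path (@ARGV) {
    open my $in, '<', $path or do { warn "$path: $!\n"; next };
    local $/;                         # slurp mode: read the whole file at once
    my $text = <$in>;
    close $in;
    $text = '' unless defined $text;
    printf "%7d %7d %7d %s\n", count_text($text), $path;
}
```

Saved under an assumed name such as wc_sketch.pl, this could be run on Unix as `perl wc_sketch.pl /path/to/my/folder/*.txt > file.txt`. Note that cmd.exe on Windows does not expand the wildcard itself, and with tens of thousands of files a Unix shell may hit its argument-length limit, so in those cases the file list would have to be built inside the script instead (for example with opendir/readdir).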