I'm hacking together some AWK. I'm a beginner. I've done my homework on the problem below, but I just can't get it to work.
Sample of the raw data:
Start Date 12/3/17
End Date 12/30/17
Report Type Report1
Currency ZAR
Country Identifier MType Quantity Net Net Net Code Title Contrib I_Type M_Type Vendor Identifier Offline Indicator LSN
ZA 44057330 FMP 1 0.050666 0.050666 USYYYYYYYYYY ABC Tom 1 1 USYYYYYYYYYY 0 SUT
ZA 1267456726 SIMT 1 0.03 0.03 USXXXXXXXXXX DEF Frances 1 1 USXXXXXXXXXX 0 XYZ
Row Count 657
Storefront Name MType Quantity Net Net
ZA FMP 601 30.45
ZA IAP 13 0.68
ZA IMP 1035 69.36
ZA SIMP 54 1.4
ZA FMT 70 0.53
ZA IMT 92 1.68
ZA SIMT 6 0.18
Expected output:
(I left the special characters in here unescaped.)
"Filename" "Start Date" "End Date" "Currency" "Country" "Identifier" "MType" "Quantity" "Net" "NetNet" "Code" "Title" "Contrib" "I_Type" "M_Type" "Vendor Identifier" "Offline Indicator" "LSN"
"rawfile.txt" "12/3/17" "12/30/17" "ZAR" "ZA" "44057330" "FMP" "1" "0.050666" "0.050666" "USYYYYYYYYYY" "ABC" "Tom" "1" "1" "USYYYYYYYYYY" "0" "SUT"
"rawfile.txt" "12/3/17" "12/30/17" "ZAR" "ZA" "1267456726" "SIMT" "1" "0.03" "0.03" "USXXXXXXXXXX" "DEF" "Frances" "1" "1" "USXXXXXXXXXX" "0" "XYZ"
Basically, I need to take most of the header from line 5, but three of the fields I need are in lines 1-4. Also, I don't need any of the data from the line starting with "Row Count" onward.
My best "guess" so far:
gawk '
function basename(file) {
    sub(".*/", "", file)
    return file
}
/^Row Count/ {nextfile}
FNR == 1 { StartDate=$2; }
FNR == 2 { EndDate=$2; }
FNR == 4 { curr=$2; }
NR == 5 {$0 = "StartDate" OFS "EndDate" OFS "Filename" OFS "curr" OFS $0; print}
FNR > 5 {$0 = StartDate OFS EndDate OFS basename(FILENAME) OFS curr OFS $0; print}
' OFS='\t' path/to/sourcefiles/*.txt > path/to/outfile.txt
Thanks!
Edit:
New table
These are the lines that come before the field headers in each file. The content starts on line 4:
Provider ,,,,,,,,,,,,
01/01/2018 - 01/31/2018,,,,,,,,,,,,
"My" script
It almost works, but it includes lines 1-3 of each file:
gawk 'function basename(file) {
    sub(".*/", "", file)
    return file
}
BEGIN { FS=OFS="," }
NR < 3 {
    if ( NR == 2 ) {
        hdr = "Report_Period" OFS
        val = val $1 OFS
    }
    next
}
FNR>3 { print" { print basename(FILENAME), val $0 }
' OFS="," /path/to/input/files > ~/path/to/output/file/file.csv
End of edit
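Answered on 2018-07-02 23:11:18
The format of your sample input isn't clear, but this may be what you're looking for, or it may do too much, or it may be something else entirely: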
$ cat tst.awk
BEGIN { FS=OFS="\t" }
/^Row Count/ { nextfile }          # ignore this line and the rest of the file
FNR==1 {
    fname = FILENAME
    sub(/.*[/]/,"",fname)          # strip the directory part
    hdr = val = ""                 # reset the accumulators for each input file
}
{
    gsub(/[\\]t/,FS)               # literal \t becomes a real tab
    gsub(/[\\][/]/,"/")            # literal \/ becomes /
    gsub(/[^\t]+/,"\"&\"")         # wrap every field in double quotes
}
FNR < 5 {
    if ( FNR != 3 ) {              # line 3 (Report Type) is not wanted
        hdr = hdr $1 OFS
        val = val $2 OFS
    }
    next
}
FNR==5 {
    print "\"Filename\"", hdr $0
    next
}
{ print "\""fname"\"", val $0 }
$ awk -f tst.awk file
"Filename" "Start Date" "End Date" "Currency" "Country" "Identifier" "MType" "Quantity" "Net" "Net Net" "Code" "Title" "Contrib" "I_Type" "M_Type" "Vendor Identifier" "Offline Indicator" "LSN"
"file" "12/3/17" "12/30/17" "ZAR" "ZA" "44057330" "FMP" "1" "0.050666" "0.050666" "USYYYYYYYYYY" "ABC" "Tom" "1" "1" "USYYYYYYYYYY" "0" "SUT"
"file" "12/3/17" "12/30/17" "ZAR" "ZA" "1267456726" "SIMT" "1" "0.03" "0.03" "USXXXXXXXXXX" "DEF" "Frances" "1" "1" "USXXXXXXXXXX" "0" "XYZ"
The above uses GNU awk for nextfile, which you're already using.
https://stackoverflow.com/questions/51144323
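A side note on the script in the question's edit: NR counts records across all input files, while FNR restarts at 1 for each file, so tests such as NR < 3 can only ever match in the first file. That is one likely reason the preamble lines of later files leak into the output. The following is a minimal sketch rather than a definitive fix. It assumes comma-separated files with exactly three preamble lines, the report period in the first field of line 2, and the header row on line 4. The file names a.csv and b.csv, their contents, and the variable names fname, period and printed are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample files: three preamble lines, a header row, then data.
tmp=$(mktemp -d)
printf '%s\n' 'Provider,,' '01/01/2018 - 01/31/2018,,' ',,' \
    'Country,MType,Quantity' 'ZA,FMP,601' > "$tmp/a.csv"
printf '%s\n' 'Provider,,' '02/01/2018 - 02/28/2018,,' ',,' \
    'Country,MType,Quantity' 'ZA,IAP,13' > "$tmp/b.csv"

out=$(awk '
BEGIN { FS = OFS = "," }
FNR == 1 {                      # first line of each file: reset per-file state
    fname = FILENAME
    sub(/.*\//, "", fname)      # strip the directory part
    period = ""
}
FNR == 2 { period = $1 }        # assumed: report period is field 1 of line 2
FNR <= 3 { next }               # skip the preamble of every file, not only the first
FNR == 4 {                      # header row: print it once, for the first file only
    if (!printed++) print "Filename", "Report_Period", $0
    next
}
{ print fname, period, $0 }     # data rows, prefixed with file name and period
' "$tmp"/a.csv "$tmp"/b.csv)

printf '%s\n' "$out"
rm -rf "$tmp"
```

Run as written, this should print a single header row followed by one row per data line, each prefixed with its file name and report period. Every condition uses FNR, so the same rules apply to each file regardless of how many files come before it.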