I have the following text file, in which the fields are separated by spaces; in some rows, some fields are empty.
DISTRIBUTION MINIMUM_SYSTEM_REQUIREMENTS BASED_ON IMAGE_SIZE LATEST_RELEASE_YEAR FOUNDER
------------------.------------------------------------.-- -------------.------------------------.-----------------------.--------------------
Absolute Linux CPU: Intel/AMD 64bit RAM: 64 MB Slackware 2020 Absolute Linux Team
Alpine Linux RAM: 128MB BusyBox, musl 2022 LEAF Project members
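antiX CPU: Intel/AMD X86, RAM 256MB 700 MB Base, 1GB Full 2020 Anticapitalista
I want to add , or | as a delimiter at the starting position of each header, because if I use a regex to replace runs of 2 or more spaces, the empty fields end up in the wrong position or I get the wrong number of fields.
What I have tried so far is to read the first line and store all the headers in an array. But for now, the header positions I get are wrong.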
headers=("DISTRIBUTION" "MINIMUM SYSTEM REQUIREMENTS" "BASED ON" "IMAGE SIZE" "LATEST RELEASE YEAR" "FOUNDER")
firstline=$(head -n 1 input.txt)
for w in "${headers[@]}"; do
echo $firstline| grep -b -o "$w"
done
0:DISTRIBUTION
13:MINIMUM SYSTEM REQUIREMENTS
41:BASED ON
50:IMAGE SIZE
61:LATEST RELEASE YEAR
81:FOUNDER
I get the header positions 0,13,41,50,61,81, while the correct ones are 0,20,57,74,99,123.
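A note on the offsets above: 13, 41, 50, 61 and 81 are exactly what you get when every run of spaces in the header is squeezed to a single space, which is what the unquoted $firstline in echo $firstline does (the shell word-splits it and echo rejoins the words with single spaces). A minimal sketch of the effect, using a made-up two-column header and a hypothetical offset_of helper rather than the real file:

```shell
# Hypothetical two-column header, first column padded to 19 characters.
firstline=$(printf '%-19s%s' 'DISTRIBUTION' 'MINIMUM SYSTEM REQUIREMENTS')

# Print the 0-based offset of the second header within the given line.
offset_of() {
    prefix=${1%%MINIMUM*}        # everything before the second header
    echo "${#prefix}"
}

collapsed=$(echo $firstline)     # unquoted: word splitting squeezes the padding
preserved=$(echo "$firstline")   # quoted: the padding is kept

unquoted_offset=$(offset_of "$collapsed")
quoted_offset=$(offset_of "$preserved")
echo "unquoted: $unquoted_offset"
echo "quoted:   $quoted_offset"
```

This prints 13 for the unquoted case and 19 for the quoted one, so quoting the expansion (echo "$firstline") in the loop is likely enough for grep -b to report the real offsets.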
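Maybe someone can help me do this in bash or awk; I think that might be easier, but I don't know how to implement it. Thanks.
The output I want looks like this: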
DISTRIBUTION |MINIMUM SYSTEM REQUIREMENTS |BASED ON |IMAGE SIZE |LATEST RELEASE YEAR |FOUNDER
-------------------|------------------------------------|-- -------------|------------------------|-----------------------|--------------------
Absolute Linux |CPU: Intel/AMD 64bit RAM: 64 MB |Slackware | |2020 |Absolute Linux Team
Alpine Linux |RAM: 128MB |BusyBox, musl | |2022 |LEAF Project members
antiX |CPU: Intel/AMD X86, RAM 256MB | |700 MB Base, 1GB Full |2020 |Anticapitalista
Update
Output of the solutions kindly provided.
Mark's output:
DISTRIBUTION |MINIMUM_SYSTEM_REQUIREMENTS |BASED_ON |IMAGE_SIZE |LATEST_RELEASE_YEAR |FOUNDER
------------------|------------------------------------|----------------|------------------------|-----------------------|--------------------
Absolute Linux |CPU: Intel/AMD 64bit RAM: 64 MB | Slackware | | 2020 | Absolute Linux Team
Alpine Linux |RAM: 128MB | BusyBox, musl | | 2022 | LEAF Project members
antiX |CPU: Intel/AMD X86, RAM 256MB | | 700 MB Base, 1GB Full | 2020 | Anticapitalista
Ed Morton's output:
DISTRIBUTION |MINIMUM_SYSTEM_REQUIREMENTS |BASED_ON |IMAGE_SIZE |LATEST_RELEASE_YEAR |FOUNDER ||
------------------|------------------------------------|----------------|------------------------|-----------------------|--------------------||
Absolute Linux |CPU: Intel/AMD 64bit RAM: 64 MB | Slackware | | 2020 | Absolute Linux Te||
Alpine Linux |RAM: 128MB | BusyBox, musl | | 2022 | LEAF Project memb||s
antiX |CPU: Intel/AMD X86, RAM 256MB | | 700 MB Base, 1GB Full | 2020 | Anticapitalista||
kvantour's output:
DISTRIBUTION |MINIMUM_SYSTEM_REQUIREMENTS |BASED_ON |IMAGE_SIZE |LATEST_RELEASE_YEAR |FOUNDER
Absolute Linux |CPU: Intel/AMD 64bit RAM: 64 MB | Slackware | | 2020 | Absolute Linux Tea|m
Alpine Linux |RAM: 128MB | BusyBox, musl | | 2022 | LEAF Project membe|r
antiX |CPU: Intel/AMD X86, RAM 256MB | | 700 MB Base, 1GB Full | 2020 | Anticapitalista
齐奥诺's output:
DISTRIBUTION |MINIMUM SYSTEM REQUIREMENTS |BASED ON |IMAGE SIZE |LATEST RELEASE YEAR |FOUNDER
------------------|------------------------------------|-- -------------|------------------------|-----------------------|--------------------
Absolute Linux |CPU: Intel/AMD 64bit RAM: 64 MB | Slackware | | 2020 | Absolute Linux Team
Alpine Linux |RAM: 128MB | BusyBox, musl | | 2022 | LEAF Project members
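antiX |CPU: Intel/AMD X86, RAM 256MB | | 700 MB Base, 1GB Full | 2020 | Anticapitalista
Differences in the last field (note: I am not allowed to show images, only links)
Posted on 2022-08-22 13:23:01
One awk idea is to replace the period characters of the separator line, and the characters at the same positions on every other line, with pipes: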
awk '
function addpipe(line) {
offset=0
for (i=1;i<n;i++) { # loop through separator array using lengths to break line into chunks and piece back together with a pipe
line= substr(line,1,offset+length(a[i])) "|" substr(line,offset+length(a[i])+2)
offset=offset + length(a[i]) +1
}
print line
}
FNR==1 { header=$0; next }
FNR==2 { n=split($0,a,".") # split separator line on periods
addpipe(header)
}
{ addpipe($0) }
' file
This produces:
DISTRIBUTION |MINIMUM SYSTEM REQUIREMENTS |BASED ON |IMAGE SIZE |LATEST RELEASE YEAR |FOUNDER
------------------|------------------------------------|-- -------------|------------------------|-----------------------|--------------------
Absolute Linux |CPU: Intel/AMD 64bit RAM: 64 MB |Slackware | |2020 |Absolute Linux Team
Alpine Linux |RAM: 128MB |BusyBox, musl | |2022 |LEAF Project members
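antiX |CPU: Intel/AMD X86, RAM 256MB | |700 MB Base, 1GB Full |2020 |Anticapitalista
Posted on 2022-08-22 07:42:50
You can use GNU awk with FIELDWIDTHS to handle the input as fixed-width fields. To do so, set FIELDWIDTHS in a BEGIN rule to a space-separated string holding the width of each field. To get the separators, you can insert (with sub()) a '|' as the new first character of every field after the first. You can do: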
awk '
BEGIN { FIELDWIDTHS = "19 37 17 25 24 20" }   # one width per column, matching the separator line
{ for (i=2; i<=NF; i++) sub(/^/,"|",$i) }1
' file
Output
With the sample data in file, the above gives:
DISTRIBUTION |MINIMUM SYSTEM REQUIREMENTS |BASED ON |IMAGE SIZE |LATEST RELEASE YEAR |FOUNDER
------------------. |------------------------------------. |-- -------------. |------------------------. |-----------------------. |--------------------
Absolute Linux |CPU: Intel/AMD 64bit RAM: 64 MB |Slackware | |2020 |Absolute Linux Team
Alpine Linux |RAM: 128MB |BusyBox, musl | |2022 |LEAF Project members
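antiX |CPU: Intel/AMD X86, RAM 256MB | |700 MB Base, 1GB Full |2020 |Anticapitalista
The only difference I see is that the '.' characters are not replaced by another '-'. I don't know whether that matters; this looks like it will meet your needs. Let me know if it needs adjusting.
Also note that FIELDWIDTHS is provided by GNU awk (gawk), not by standard awk.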
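For an awk without FIELDWIDTHS, the same fixed-width cut can be approximated with substr(). The following is a minimal sketch, not part of the answer above: the add_pipes function name, its widths argument and the three-column sample are made up for illustration, and the last column is simply whatever remains of the line.

```shell
# Insert a pipe after each fixed-width column; widths are given as "w1 w2 ...".
add_pipes() {
    awk -v widths="$1" '
    BEGIN { n = split(widths, w, " ") }
    {
        out = ""; pos = 1
        for (i = 1; i <= n; i++) {              # cut one fixed-width column at a time
            out = out substr($0, pos, w[i]) "|"
            pos += w[i]
        }
        print out substr($0, pos)               # the remainder is the last column
    }'
}

# Hypothetical sample: columns of 6 and 5 characters, then a free-width last column.
# The second row has an empty middle field.
result=$(printf '%-6s%-5s%s\n' NAME SIZE NOTE ab '' x cd 12 y | add_pipes '6 5')
printf '%s\n' "$result"
```

Unlike FIELDWIDTHS, this does not rebuild the record with OFS, so no extra space is added in front of each pipe; the empty middle field of the second row still gets its own pair of pipes.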
If you want the ----- separators to span the full width of each field, you can add an extra rule that replaces '.' and ' ' with '-'.
awk '
BEGIN { FIELDWIDTHS = "19 37 17 25 24 20" }
{ for (i=2; i<=NF; i++) sub(/^/,"|",$i) }
/^-/ {gsub(/(\.| )/,"-") }1
' file
Output
DISTRIBUTION |MINIMUM SYSTEM REQUIREMENTS |BASED ON |IMAGE SIZE |LATEST RELEASE YEAR |FOUNDER
--------------------|--------------------------------------|------------------|--------------------------|-------------------------|--------------------
Absolute Linux |CPU: Intel/AMD 64bit RAM: 64 MB |Slackware | |2020 |Absolute Linux Team
Alpine Linux |RAM: 128MB |BusyBox, musl | |2022 |LEAF Project members
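antiX |CPU: Intel/AMD X86, RAM 256MB | |700 MB Base, 1GB Full |2020 |Anticapitalista
Posted on 2022-08-22 08:03:45
As mentioned in David C. Rankin's answer, FIELDWIDTHS is a GNU extension, and using it is the answer here. See the "Reading Fixed-Width Data" section of the GNU awk manual. The field widths are in fact defined by the second line.
The idea is to store the header in memory, parse the second line, define FIELDWIDTHS from it, and then parse the whole file: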
awk 'BEGIN{OFS="|"}
(FNR==1){ header=$0; next }
(FNR==2){ n=split($0,a,"[.]")
for(i=1;i<=n;++i) s = s length(a[i])+1 " "
FIELDWIDTHS=s; $0=header
}
     { $1=$1 }1' file
Given the input above, this returns:
DISTRIBUTION |MINIMUM SYSTEM REQUIREMENTS |BASED ON |IMAGE SIZE |LATEST RELEASE YEAR |FOUNDER
Absolute Linux |CPU: Intel/AMD 64bit RAM: 64 MB |Slackware | |2020 |Absolute Linux Team
Alpine Linux |RAM: 128MB |BusyBox, musl | |2022 |LEAF Project members
antiX |CPU: Intel/AMD X86, RAM 256MB | |700 MB Base, 1GB Full |2020 |Anticapitalista
For a POSIX-compliant equivalent, see the solution that uses plain substr() instead of FIELDWIDTHS!
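To see what the FNR==2 block computes, the width string can be derived on its own. This is only a sketch: the shortened separator line in sep is hypothetical, and the loop joins the widths with single spaces rather than appending a trailing space, which should make no difference to FIELDWIDTHS.

```shell
# Hypothetical, shortened separator line: dash groups of 4, 6 and 5 characters
# (the last one contains a space, like "-- -------------" in the question).
sep='----.------.-- --'

widths=$(printf '%s\n' "$sep" | awk '{
    n = split($0, a, "[.]")                     # one piece per column
    s = ""
    for (i = 1; i <= n; i++)                    # each width = dashes plus the separator character
        s = s (i > 1 ? " " : "") (length(a[i]) + 1)
    print s
}')
echo "$widths"
```

Here the result is 5 7 6. The last width is one larger than the last dash group, which should be harmless because it only lets the final field run one character further.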
https://stackoverflow.com/questions/73440828