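I have two files, par1.txt and par2.txt. I want to look at the first field (column) of each file, compare them, and, if they match, print the records (lines) that matched.
Sample files: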
par1.txt
ocean;stuff about an ocean;definitions of oeans
park;stuff about parks;definitions of parks
ham;stuff about ham;definitions of ham
par2.txt
hand,stuff about hands,definitions of hands
bread,stuff about bread,definitions of bread
ocean,different stuff about an ocean,difference definitions of oceans
ham,different stuff about ham,different definitions of ham
For my output, what I want is:
ocean:stuff about an ocean:definitions of oeans
ocean:different stuff about an ocean:difference definitions of oceans
ham:different stuff about ham:different definitions of ham
ham:stuff about ham:definitions of ham
As the example shows, the field separator (FS) differs between the files. The output FS doesn't have to be ":", but it can't be a space.
Posted 2014-08-07 22:18:14
Edited answer
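Based on your comment, I believe you have more than two files, which sometimes use commas and sometimes semicolons as the separator, and you want to print every line whose first field matches, as long as that first field appears more than once. If so, I think you want this: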
awk -F, '
{
gsub(/;/,",");$0=$0; # Replace ";" with "," and reparse line using new field sep
sep=""; # Start with an empty separator between stored lines
if(counts[$1]++) sep="\n"; # Add newline if anything already stored in records[$1]
records[$1] = records[$1] sep $0; # Append this record to other records with same key
}
END { for (x in counts) if (counts[x]>1) print records[x] }' par*.txt
Original answer
I came up with this:
awk -F';' '
FNR==NR {x[$1]=$0; next}
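$1 in x {printf "%s\n%s\n",$0,x[$1]}' par1.txt <(sed 's/,/;/g' par2.txt)
This reads par1.txt and stores each line in the array x[], indexed by its first field. The commas in par2.txt are replaced with semicolons (the g flag makes sed replace every comma, not just the first) so that the separators match. As each line of par2.txt is read, awk checks whether its first field is a key of x[]; if it is, it prints the current line followed by the stored line from par1.txt. Note that <(...) is process substitution, which needs a shell such as bash, ksh or zsh.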
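sed's s command replaces only the first match on each line unless the g flag is given. The sketch below runs one line of par2.txt through both forms. It also shows why the lookup would still succeed without g (awk only compares $1), while the printed par2.txt line would keep a stray comma:

```shell
#!/bin/sh
# One line from par2.txt; it contains two commas.
line='ocean,different stuff about an ocean,difference definitions of oceans'

# Without g, only the first comma becomes a semicolon:
printf '%s\n' "$line" | sed 's/,/;/'
# ocean;different stuff about an ocean,difference definitions of oceans

# With g, every comma on the line is replaced:
printf '%s\n' "$line" | sed 's/,/;/g'
# ocean;different stuff about an ocean;difference definitions of oceans

# Either way, splitting on ";" yields the same first field, so the key
# lookup in x[] still succeeds; only the printed $0 differs.
printf '%s\n' "$line" | sed 's/,/;/' | awk -F';' '{ print $1 }'
# ocean
```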
Posted 2014-08-07 22:07:06
Using awk:
awk -v OFS=":" '
{ $1 = $1 }
NR==FNR { lines[$1] = $0; next }
($1 in lines) { print lines[$1] RS $0 }
' FS=";" par1.txt FS="," par2.txt
Output:
ocean:stuff about an ocean:definitions of oeans
ocean:different stuff about an ocean:difference definitions of oceans
ham:stuff about ham:definitions of ham
ham:different stuff about ham:different definitions of ham
Explanation:
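-v OFS=":" sets the output field separator to ":". If you want the output separated by spaces, you don't need to set -v OFS at all. $1=$1 forces awk to rebuild the whole line, so that the reconstructed record uses the value of OFS. The NR==FNR block reads the first file into the lines array. FS=";" par1.txt FS="," par2.txt is a technique for giving different files different field separators.
If both files can contain duplicate first columns and you want to capture all of them, use the following. The logic is similar, but we keep all the lines in the array and print them at the end.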
awk -v OFS=":" '
{ $1 = $1 }
NR==FNR {
lines[$1] = (lines[$1] ? lines[$1] RS $0 : $0);
next
}
($1 in lines) {
lines[$1] = lines[$1] RS $0;
seen[$1]++
}
END { for (patt in seen) print lines[patt] }
' FS=";" par1.txt FS="," par2.txt
https://stackoverflow.com/questions/25192783
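Both scripts in the last answer rely on two awk features: assigning $1 = $1 makes awk rebuild $0 with OFS, and a var=value operand on the command line is applied just before the file that follows it is read. A small sketch that checks each one in isolation (the files f1 and f2 are throwaway examples, not part of the question):

```shell
#!/bin/sh
# Work in a scratch directory so the test files don't clobber anything.
cd "$(mktemp -d)" || exit 1

# 1) Before the assignment, $0 prints unchanged; after $1 = $1,
#    awk rebuilds $0 with OFS between the fields.
printf 'a;b;c\n' | awk -F';' -v OFS=':' '{ print; $1 = $1; print }'
# a;b;c
# a:b:c

# 2) FS=... operands take effect for the file that follows them.
printf 'x;1\n' > f1
printf 'y,2\n' > f2
awk '{ print FILENAME, $2 }' FS=';' f1 FS=',' f2
# f1 1
# f2 2
```

Also note that for (patt in seen) visits keys in an order awk does not specify, so the groups printed by the second script are not guaranteed to follow the order of par2.txt.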