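Note: I have already asked a similar question in "AWK: Quick way to insert target words after a source term", and I am still at a beginner level with AWK.
This question is about inserting multiple target terms after source terms in multiple randomly selected lines.
With this AWK snippet
awk '(NR==FNR){a[$1];next}
FNR in a { gsub(/\<source term\>/,"& target term") }
1
' <(shuf -n 5 -i 1-$(wc -l < file)) file
I want to insert a target term after a source term in 5 random lines of file.
For example: I have a bilingual dictionary dict, which contains source terms on the left and target terms on the right, like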
apple : Apfel
banana : Banane
raspberry : Himbeere
My file consists of the following lines:
I love the Raspberry Pi.
The monkey loves eating a banana.
Who wants an apple pi?
Apple pen... pineapple pen... pen-pineapple-apple-pen!
The banana is tasty and healthy.
An apple a day keeps the doctor away.
Which fruit is tastes better: raspberry or strawberry?
Suppose that for the first word, apple, the random lines 1, 3, 5, 4, 7 are selected. The output for the word apple would look like this:
I love the Raspberry Pi.
The monkey loves eating a banana.
Who wants an apple Apfel pi?
Apple Apfel pen... pineapple pen... pen-pineapple-apple-pen!
The banana is tasty and healthy.
An apple a day keeps the doctor away.
Which fruit is tastes better: raspberry or strawberry?
Then 5 random lines are selected again for the word banana: 3, 3, 5, 6, 7:
I love the Raspberry Pi .
The monkey loves eating a banana .
Who wants an apple Apfel pi ?
Apple Apfel pen... pineapple pen... pen-pineapple-apple-pen!
The banana Banane is tasty and healthy .
An apple a day keeps the doctor away .
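Which fruit is tastes better: raspberry or strawberry?
The same goes for every other entry in dict, until the last entry has been matched.
I want to select 5 random lines. If those lines contain a complete source term such as apple, I only want to match apple as a whole word and insert Apfel after it (words like "pineapple" should be ignored). If a line contains a source term twice, e.g. apple, I want the target term inserted after each occurrence. Matching should be case-insensitive, so that I can match source terms like apple and Apple alike.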
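To make the three matching rules concrete (whole words only, every occurrence, case-insensitive), here is a minimal sketch in portable awk that applies them to a single source/target pair. It does not pick random lines, and it deliberately avoids gawk extensions; the function name insert_after and its two arguments are hypothetical names chosen for this illustration. A word character is assumed to be a letter, a digit, or an underscore.

```shell
# insert_after WORD TARGET: read lines on stdin and append TARGET after each
# whole-word, case-insensitive occurrence of WORD (hypothetical helper).
insert_after() {
    awk -v word="$1" -v target="$2" '
    # assumption: word characters are ASCII letters, digits and underscore
    function is_word_char(c) { return c ~ /^[A-Za-z0-9_]$/ }
    {
        out = ""; rest = $0; prev = ""
        lw = tolower(word); len = length(word)
        # scan the line left to right, one literal (lowercased) hit at a time
        while ((pos = index(tolower(rest), lw)) > 0) {
            before = (pos > 1) ? substr(rest, pos - 1, 1) : prev
            after  = substr(rest, pos + len, 1)
            hit    = substr(rest, pos, len)      # keeps the original casing
            out = out substr(rest, 1, pos - 1) hit
            # only a hit with no word character on either side is a whole word
            if (!is_word_char(before) && !is_word_char(after))
                out = out " " target
            prev = substr(hit, len, 1)
            rest = substr(rest, pos + len)
        }
        print out rest
    }'
}
```

For instance, `insert_after apple Apfel < file` would leave "pineapple" alone while handling both "Apple" and "-apple-" on the fourth sample line.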
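My question is: how can I rewrite the snippet above so that it uses the dictionary dict, selects random lines in file, and inserts the target terms after the source terms?
Posted on 2022-01-31 23:23:23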
Here is how to randomly pick 5 line numbers from an input file using awk (with wc run first just to count the lines):
$ awk -v numLines="$(wc -l < file)" 'BEGIN{srand(); for (i=1; i<=5; i++) print int(1+rand()*numLines)}'
7
2
88
13
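18
Now all you have to do is take my previous answer and, for every "old" string read in the ARGIND==1 block, generate 5 line numbers and populate an array that maps each generated line number to the "old" strings associated with it. Then, while reading the final input file, check whether the current line number is in the array and, if so, loop over the "old"s stored for that line number, doing the gsub() shown in my previous answer.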
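Note that the one-liner above draws with replacement, so the same line number can come up more than once and a term may end up with fewer than 5 distinct lines. If distinct lines matter, a minimal sketch using portable awk could reject repeats as shown here; the function name pick_lines and its arguments are hypothetical, not part of the answer's script.

```shell
# pick_lines COUNT TOTAL: print COUNT distinct random integers in 1..TOTAL
# (hypothetical helper; prints TOTAL numbers if COUNT exceeds TOTAL).
pick_lines() {
    awk -v count="$1" -v total="$2" '
    BEGIN {
        srand()
        if (count > total) count = total   # cannot draw more distinct lines than exist
        while (picked < count) {
            n = int(1 + rand() * total)    # same formula as the one-liner above
            if (!(n in seen)) {            # skip numbers that were already drawn
                seen[n]
                picked++
                print n
            }
        }
    }'
}
```

It could be called as `pick_lines 5 "$(wc -l < file)"`. Because srand() seeds from the clock, two calls within the same second may print the same numbers.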
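Using GNU awk for ARGIND, IGNORECASE, word boundaries, arrays of arrays, and \s as shorthand for [[:space:]]:
$ cat tst.sh
#!/usr/bin/env bash
awk -v numLines="$(wc -l < file)" '
BEGIN {
    FS = "\\s*:\\s*"        # dict entries look like "source : target"
    IGNORECASE = 1          # match apple and Apple alike
    srand()
}
ARGIND == 1 {               # first file: dict
    old = "\\<" $1 "\\>"    # source term as a whole word
    new = "& " $2           # matched text followed by the target term
    for (i=1; i<=5; i++) {
        lineNr = int(1+rand()*numLines)
        map[lineNr][old] = new
    }
    next
}
FNR in map {                # second file: line picked for at least one term
    for ( old in map[FNR] ) {
        new = map[FNR][old]
        gsub(old,new)
    }
}
{ print }
' dict file
$ ./tst.sh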
I love the Raspberry Pi.
The monkey loves eating a banana Banane.
Who wants an apple Apfel pi?
Apple Apfel pen... pineapple pen... pen-pineapple-apple Apfel-pen!
The banana Banane is tasty and healthy.
An apple a day keeps the doctor away.
Which fruit is tastes better: raspberry Himbeere or strawberry?
Posted on 2022-02-01 05:27:13
Using GNU sed with extended regular expressions (-E) and the e modifier of the s command:
n=$(< file wc -l)
sed -E '/\n/ba
s#^(\S+)\s*:\s*(\S+)$#s/\\<\1\\>/\& \2/Ig#;h'"
s/.*/shuf -n 5 -i '1-$n'/e;G
:a
s/^([0-9]+)(\n.*\n(.*))/\1 \3\2/
/\n.*\n/!s/\n/ /
P;D
" dict | sed -f /dev/stdin file
https://unix.stackexchange.com/questions/688689