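AWK: insert target terms from a dictionary after source terms in randomly selected lines

Unix & Linux user

Asked 2022-01-31 12:39:16

Answers: 2, Views: 138, Followers: 0, Score: 1

Note: I already asked a similar question in "AWK: a fast way to insert a target word after a source word", and I am still an AWK beginner.

This question is about inserting multiple target terms after their source terms in several randomly selected lines.

With this AWK snippet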

awk '(NR==FNR){a[$1];next}
    FNR in a { gsub(/\<source term\>/,"& target term") }
     1
    ' <(shuf -n 5 -i 1-$(wc -l < file)) file

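I want to insert a target term after the source term in 5 random lines of file.

For example: I have a bilingual dictionary dict with the source words on the left and the target words on the right, like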

apple     : Apfel
banana    : Banane
raspberry : Himbeere

My file consists of the following lines:

I love the Raspberry Pi.
The monkey loves eating a banana.
Who wants an apple pi?
Apple pen... pineapple pen... pen-pineapple-apple-pen!
The banana is tasty and healthy.
An apple a day keeps the doctor away.
Which fruit is tastes better: raspberry or strawberry?

Suppose that for the first word, apple, the random lines 1, 3, 5, 4, 7 are selected. The output for the word apple would look like this:

I love the Raspberry Pi.
The monkey loves eating a banana.
Who wants an apple Apfel pi?
Apple Apfel pen... pineapple pen... pen-pineapple-apple-pen!
The banana is tasty and healthy.
An apple a day keeps the doctor away.
Which fruit is tastes better: raspberry or strawberry?

Then another 5 random lines are selected for the word banana: 3, 3, 5, 6, 7:

I love the Raspberry Pi .
The monkey loves eating a banana .
Who wants an apple Apfel pi ?
Apple Apfel pen... pineapple pen... pen-pineapple-apple-pen!
The banana Banane is tasty and healthy .
An apple a day keeps the doctor away .
Which fruit is tastes better: raspberry or strawberry?

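The same goes for all other entries in dict, until the last entry has been processed.

I want to select 5 random lines. If these lines contain a complete source word like apple, I want to match apple only as a whole word and insert Apfel after it (words like "pineapple" should be ignored). If a line contains a source term twice, like apple, then I also want the target term inserted after it. Matching should be case-insensitive, so that I also match source terms like apple and Apple.

My question is: how can I rewrite the snippet above so that it uses the dictionary dict, picks random lines in file, and inserts the target terms after the source terms?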

awk

files

2 Answers

Unix & Linux user

Accepted answer

Posted 2022-01-31 23:23:23

Here is how to pick 5 random line numbers from the input file using awk (with wc counting the lines first):

$ awk -v numLines="$(wc -l < file)" 'BEGIN{srand(); for (i=1; i<=5; i++) print int(1+rand()*numLines)}'
7
2
88
13
18
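Note that rand() can return the same line number more than once, so fewer than 5 distinct lines may end up selected. If distinct lines are wanted, one possible variation is sketched below. The function name pick_lines and the cap on the count are illustrative assumptions, not part of the answer; only POSIX awk features are used.

```shell
# Hypothetical helper (not from the answer): print `count` distinct random
# line numbers in the range 1..numLines, one per line.
pick_lines() {
    awk -v numLines="$1" -v count="$2" 'BEGIN {
        srand()
        # Cannot pick more distinct numbers than there are lines.
        if (count > numLines) count = numLines
        picked = 0
        while (picked < count) {
            lineNr = int(1 + rand() * numLines)
            if (!(lineNr in seen)) {    # skip numbers already chosen
                seen[lineNr] = 1
                picked++
                print lineNr
            }
        }
    }'
}

# Example: 5 distinct line numbers for a 7-line file.
pick_lines 7 5
```

Rejection sampling like this is fine for small counts; when the count is close to the number of lines, shuf -n as used in the question avoids the repeated retries.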

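Now all you have to do is take my previous answer and, for each "old" string read in the ARGIND==1 block, generate 5 line numbers and populate an array that maps each generated line number to the "old" strings associated with it. Then, while reading the final input file, check whether the current line number is in the array and, if so, loop over the "old" strings stored for that line number, performing the gsub() shown in my previous answer.

Using GNU awk for ARGIND, IGNORECASE, word boundaries, arrays of arrays, and \s as shorthand for [[:space:]]: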

$ cat tst.sh
#!/usr/bin/env bash

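awk -v numLines="$(wc -l < file)" '
    BEGIN {
        FS = "\\s*:\\s*"    # split dict lines on the colon and surrounding spaces
        IGNORECASE = 1      # match "apple" and "Apple" alike
        srand()
    }
    ARGIND == 1 {           # first file: dict
        old = "\\<" $1 "\\>"    # source term as a whole word
        new = "& " $2           # matched text followed by the target term
        for (i=1; i<=5; i++) {
            lineNr = int(1+rand()*numLines)
            map[lineNr][old] = new
        }
        next
    }
    FNR in map {            # second file: a line picked for at least one term
        for ( old in map[FNR] ) {
            new = map[FNR][old]
            gsub(old,new)
        }
    }
    { print }
' dict file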

$ ./tst.sh
I love the Raspberry Pi.
The monkey loves eating a banana Banane.
Who wants an apple Apfel pi?
Apple Apfel pen... pineapple pen... pen-pineapple-apple Apfel-pen!
The banana Banane is tasty and healthy.
An apple a day keeps the doctor away.
Which fruit is tastes better: raspberry Himbeere or strawberry?
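To see the whole-word, case-insensitive matching in isolation, here is a minimal sketch that applies one fixed dictionary entry to every input line rather than to randomly chosen ones. It assumes GNU awk is installed as gawk; the function name annotate is illustrative and not part of the answer.

```shell
# Hypothetical helper: append the target term ($2) after every whole-word,
# case-insensitive occurrence of the source term ($1) in each line of stdin.
# Requires GNU awk for IGNORECASE and the \< \> word-boundary operators.
annotate() {
    gawk -v src="$1" -v tgt="$2" '
        BEGIN { IGNORECASE = 1 }
        { gsub("\\<" src "\\>", "& " tgt); print }
    '
}

printf '%s\n' 'Apple pen... pineapple pen... pen-pineapple-apple-pen!' |
    annotate apple Apfel
```

On that sample line this prints "Apple Apfel pen... pineapple pen... pen-pineapple-apple Apfel-pen!": "pineapple" is skipped because no word boundary precedes its "apple", while the hyphen-delimited "apple" counts as a whole word.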

Score: 1

Unix & Linux user

Posted 2022-02-01 05:27:13

Using GNU sed with extended regular expressions (-E) and the /e modifier of the s/// command:

n=$(< file wc -l)
sed -E '/\n/ba
  s#^(\S+)\s*:\s*(\S+)$#s/\\<\1\\>/\& \2/Ig#;h'"
  s/.*/shuf -n 5 -i '1-$n'/e;G
  :a
  s/^([0-9]+)(\n.*\n(.*))/\1 \3\2/
  /\n.*\n/!s/\n/ /
  P;D
" dict | sed -f /dev/stdin file

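Generate GNU sed commands from the contents of the dict file.
Save each command in the hold space.
Roll the dice and generate 5 random numbers within the line count of the input file.
Append the hold space to the pattern space and generate sed commands that run only on those specific lines.
Apply the generated commands to the input file.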
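For illustration, the first sed emits one command per dictionary entry and picked line, each of the form "line number, space, substitution". The sketch below applies a hand-written script of that shape; the line numbers 1 and 2 are stand-ins for whatever shuf would return, and the names generated and apply_generated are hypothetical. GNU sed is assumed for \< \> and the I flag.

```shell
# Hand-written stand-in for the generated script (line numbers are assumed,
# not real shuf output): each command edits only its addressed line.
generated='1 s/\<apple\>/& Apfel/Ig
2 s/\<banana\>/& Banane/Ig'

# Hypothetical helper: run the generated script over stdin.
apply_generated() {
    sed -e "$generated"
}

printf '%s\n' 'Who wants an apple pi?' 'The Banana is tasty.' 'An apple a day.' |
    apply_generated
```

Only lines 1 and 2 are changed ("Who wants an apple Apfel pi?" and "The Banana Banane is tasty."); line 3 passes through untouched because no command addresses it.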

Score: 1

Original page content provided by Unix & Linux.

Original link:

https://unix.stackexchange.com/questions/688689
