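I am writing a gawk program that checks a string STRING to see where it matches SUBSTRING. One problem I have run into is that the match function only reports the leftmost match in the string. My current idea is to use gsub to count how many times SUBSTRING occurs, then call match repeatedly on the remainder of the string, substr(STRING, RSTART+1), and add up the offsets to recover the true starting position of each occurrence. Is there a simpler way than this, or a built-in function that returns all of the RSTART values?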
Example:
STRING=DDDADDCDFFDFGSDD
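SUBSTRING=D
Edit:
I looked at the array argument of match (thank you for pointing me to more recent documentation than what I had been reading). It still does not do what I need: it lets you capture several subexpressions within the same match, but it still reports only the leftmost position of each.
For example: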
$ echo DDDADDCDFFDFGSDD | gawk '{match($0,/D/,a); for (i in a) print i,a[i]}'
0start 1
0length 1
0 D
It can find the leftmost occurrence of several things at once:
$ echo gDDDADDCDFFDFGSDD | gawk '{match($0,/(D)(A)/,a); for (i in a) print i,a[i]}'
0start 4
0length 2
1start 4
2start 5
2length 1
1length 1
0 DA
1 D
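2 A
So we are still only getting the leftmost match (which is what the documentation says).
Posted on 2018-06-13 18:08:33
I found no native way to handle this, so I wrote the function below. It only works with gawk versions that support true multidimensional arrays (arrays of arrays); it would be simple to adapt to older versions of awk, but parsing the results afterwards would be harder.
The function searches string for the regex and fills the array MM. It returns -1 on error, 0 if no match was found, and otherwise the number of matches found.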
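# Fill the global array MM with every match of the regex subs in string.
# t, s, found and n are local variables (declared as extra parameters).
function multiMatch(string, subs,    t, s, found, n) {
    split("", MM)                  # clear results from any earlier call
    n = 0                          # index of the next match; restart at 0 on every call
    t = 0                          # number of characters of string already skipped
    RSTART = 0
    RLENGTH = 0
    if (length(string) == 0 || length(subs) == 0) {
        print "Must have string and regex to look for" > "/dev/stderr"
        return -1
    }
    while (1) {
        t += RSTART                # resume one character after the previous match start
        s = substr(string, t + 1)
        if (length(s) == 0)
            break
        match(s, subs)
        if (RLENGTH == -1)         # no further match
            break
        # Show the match delimited by dashes: prefix-match-suffix
        found = substr(string, 1, t + RSTART - 1) "-" substr(string, t + RSTART, RLENGTH) "-" substr(string, t + RSTART + RLENGTH)
        MM[n]["RSTART"] = RSTART   # relative to the remaining substring s; the absolute start is t + RSTART
        MM[n]["RLENGTH"] = RLENGTH
        MM[n]["STR"] = found
        n++
    }
    return n
}
Example:
echo doogggogogggggggooogggogggggooogoooggoooo 'g*o' | gawk '
# the multiMatch() definition above goes here
BEGIN{PROCINFO["sorted_in"]="@ind_num_asc"}
{
    print "Found "multiMatch($1,$2)" Matches"
    for (x in MM) {
        print x,MM[x]["RSTART"],MM[x]["RLENGTH"],MM[x]["STR"]
    }
}'
Output:
Found 40 Matches
0 2 1 d-o-ogggogogggggggooogggogggggooogoooggoooo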
1 1 1 do-o-gggogogggggggooogggogggggooogoooggoooo
2 1 4 doo-gggo-gogggggggooogggogggggooogoooggoooo
3 1 3 doog-ggo-gogggggggooogggogggggooogoooggoooo
4 1 2 doogg-go-gogggggggooogggogggggooogoooggoooo
5 1 1 dooggg-o-gogggggggooogggogggggooogoooggoooo
6 1 2 doogggo-go-gggggggooogggogggggooogoooggoooo
7 1 1 doogggog-o-gggggggooogggogggggooogoooggoooo
8 1 8 doogggogo-gggggggo-oogggogggggooogoooggoooo
9 1 7 doogggogog-ggggggo-oogggogggggooogoooggoooo
10 1 6 doogggogogg-gggggo-oogggogggggooogoooggoooo
11 1 5 doogggogoggg-ggggo-oogggogggggooogoooggoooo
12 1 4 doogggogogggg-gggo-oogggogggggooogoooggoooo
13 1 3 doogggogoggggg-ggo-oogggogggggooogoooggoooo
14 1 2 doogggogogggggg-go-oogggogggggooogoooggoooo
15 1 1 doogggogoggggggg-o-oogggogggggooogoooggoooo
16 1 1 doogggogogggggggo-o-ogggogggggooogoooggoooo
17 1 1 doogggogogggggggoo-o-gggogggggooogoooggoooo
18 1 4 doogggogogggggggooo-gggo-gggggooogoooggoooo
19 1 3 doogggogogggggggooog-ggo-gggggooogoooggoooo
20 1 2 doogggogogggggggooogg-go-gggggooogoooggoooo
21 1 1 doogggogogggggggoooggg-o-gggggooogoooggoooo
22 1 6 doogggogogggggggooogggo-gggggo-oogoooggoooo
23 1 5 doogggogogggggggooogggog-ggggo-oogoooggoooo
24 1 4 doogggogogggggggooogggogg-gggo-oogoooggoooo
25 1 3 doogggogogggggggooogggoggg-ggo-oogoooggoooo
26 1 2 doogggogogggggggooogggogggg-go-oogoooggoooo
27 1 1 doogggogogggggggooogggoggggg-o-oogoooggoooo
28 1 1 doogggogogggggggooogggogggggo-o-ogoooggoooo
29 1 1 doogggogogggggggooogggogggggoo-o-goooggoooo
30 1 2 doogggogogggggggooogggogggggooo-go-ooggoooo
31 1 1 doogggogogggggggooogggogggggooog-o-ooggoooo
32 1 1 doogggogogggggggooogggogggggooogo-o-oggoooo
33 1 1 doogggogogggggggooogggogggggooogoo-o-ggoooo
34 1 3 doogggogogggggggooogggogggggooogooo-ggo-ooo
35 1 2 doogggogogggggggooogggogggggooogooog-go-ooo
36 1 1 doogggogogggggggooogggogggggooogooogg-o-ooo
37 1 1 doogggogogggggggooogggogggggooogoooggo-o-oo
38 1 1 doogggogogggggggooogggogggggooogoooggoo-o-o
39 1 1 doogggogogggggggooogggogggggooogoooggooo-o-
https://stackoverflow.com/questions/35327841
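The RSTART values stored in MM are relative to the substring that remained at each step, not to the full string. As a complementary sketch (not part of the original answer), the hypothetical helper allStarts below keeps a running offset so that it reports absolute start positions. It uses only a plain one-dimensional array, so it should not need gawk's arrays of arrays. The shell wrapper all_starts and every name in it are assumptions made for illustration. It is run here on the example string from the question.

```shell
#!/bin/sh
# all_starts STRING REGEX: print the absolute start position of every match.
# Hypothetical wrapper, for illustration only.
all_starts() {
    printf '%s\n' "$1" | awk -v re="$2" '
    # Store the absolute start of every match of re in str into pos[1..n]
    # and return n. off and n are local variables.
    function allStarts(str, re, pos,    off, n) {
        split("", pos)               # clear any previous results
        off = 0                      # characters of str already skipped
        n = 0
        while (off < length(str) && match(substr(str, off + 1), re)) {
            pos[++n] = off + RSTART  # convert the relative RSTART to an absolute position
            off += RSTART            # resume one character after this match start
        }
        return n
    }
    {
        n = allStarts($0, re, starts)
        for (i = 1; i <= n; i++)
            printf "%s%d", (i > 1 ? " " : ""), starts[i]
        print ""
    }'
}

all_starts DDDADDCDFFDFGSDD D    # prints: 1 2 3 5 6 8 11 15 16
```

Like multiMatch, this resumes one character after each match start, so overlapping matches are reported as well. For a regex that cannot match the empty string, advancing the offset by RSTART + RLENGTH - 1 instead would report only non-overlapping matches.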