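I previously asked how to print the text inside two consecutive pairs of double quotes. For example, I have the following strings: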
gfdg "jkfgh" "jkfd fdgj fd-" ghjhgj
gfggf "kfdjfdgfhbg" "fhfghg" jhgj
jhfjhg "dfgdf" fgf
fgfdg "dfj jfdg jhfgjd" "hfgdh jfdhgd jkfghfd" hgjghj

I want to print only the following:
"jkfgh" "jkfd fdgj fd-"
"kfdjfdgfhbg" "fhfghg"
"dfgdf"
"dfj jfdg jhfgjd" "hfgdh jfdhgd jkfghfd"

I got an answer that uses this command:
awk -F'"' '{for (i=2;i<5;i+=2) printf "%s%s%s%s", FS, $i, FS, (i>5-2?"\n":" ")}' sample.txt

Now I need to extend my question to cover ' ' as well: my text can be enclosed either in ' ' or in " ". For example:
gfdg "jkfgh" "jkfd fdgj fd-" ghjhgj
gfggf "kfdjfdgfhbg" "fhfghg" jhgj
jhfjhg "dfgdf 'ffdg' gfd" "dgffd 'fdg'"fgf
fgfdg 'dfj "jfdg" jhfgjd' 'hfgdh jfdhgd jkfghfd' hgjghj

I would like to get the following result:
"jkfgh" "jkfd fdgj fd-"
"kfdjfdgfhbg" "fhfghg"
"dfgdf 'ffdg' gfd" "dgffd 'fdg'"
'dfj "jfdg" jhfgjd' 'hfgdh jfdhgd jkfghfd'

Can anyone help me?
Answered on 2015-04-10 22:37:04
{
    a = ""
    s = $0
    # while s contains a delimiter (either " or ')
    while (match(s, /['"]/)) {
        # save the delimiter
        c = substr(s, RSTART, 1)
        # remove up to and including the delimiter
        s = substr(s, RSTART + 1)
        # find the matching delimiter
        i = index(s, c)
        # append the saved delimiter and the first segment of s to the accumulator
        a = a " " c substr(s, 1, i)
        # remove the segment
        s = substr(s, i + 1)
    }
    # print the accumulator (dropping the first space)
    print substr(a, 2)
}

Answered on 2015-04-10 20:41:43
The simplest thing is probably to process the input one character at a time:
$ cat tst.awk
BEGIN { FS="" }
{
    rec = ""
    for (i=1;i<=NF;i++) {
        if ( ($i=="\"") && !inSq ) {
            rec = rec (inDq ? $i : (rec ? " " : ""))
            inDq = !inDq
        }
        else if ( ($i=="'") && !inDq ) {
            rec = rec (inSq ? $i : (rec ? " " : ""))
            inSq = !inSq
        }
        if ( inDq || inSq ) {
            rec = rec $i
        }
    }
    print rec
}
$ awk -f tst.awk file
"jkfgh" "jkfd fdgj fd-"
"kfdjfdgfhbg" "fhfghg"
"dfgdf 'ffdg' gfd" "dgffd 'fdg'"
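'dfj "jfdg" jhfgjd' 'hfgdh jfdhgd jkfghfd'

You could probably come up with a regexp to use as FPAT in gawk instead, but I can't be bothered to work one out. The above will also work, in various ways, even if there are newlines within the quotes, for example by using RS='^$' in gawk to read the whole file as one record.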
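The script above relies on FS="" splitting each record into single characters. gawk and some other awks support that, but POSIX leaves the behavior of an empty FS unspecified. As a minimal sketch, assuming you want the same state machine in an awk that does not split on an empty FS, the characters can be walked with substr() instead. The function name extract_quoted_chars and the sq variable are made up for this illustration; sq only exists to pass a single quote into the awk program without fighting shell quoting.

```shell
# Hypothetical helper: same quote-tracking state machine as tst.awk,
# but iterating with substr() instead of relying on FS="".
extract_quoted_chars() {
  awk -v sq="'" '
  {
    rec = ""
    n = length($0)
    for (i = 1; i <= n; i++) {
      ch = substr($0, i, 1)
      if (ch == "\"" && !inSq) {
        # closing double quote: append it; opening: add a separator if needed
        rec = rec (inDq ? ch : (rec ? " " : ""))
        inDq = !inDq
      }
      else if (ch == sq && !inDq) {
        # same logic for single quotes
        rec = rec (inSq ? ch : (rec ? " " : ""))
        inSq = !inSq
      }
      if (inDq || inSq) {
        # inside a quoted string (this also appends the opening quote)
        rec = rec ch
      }
    }
    print rec
  }' "$@"
}

# Usage: reads the named files, or standard input if none are given.
printf '%s\n' "jhfjhg \"dfgdf 'ffdg' gfd\" \"dgffd 'fdg'\"fgf" | extract_quoted_chars
# prints: "dfgdf 'ffdg' gfd" "dgffd 'fdg'"
```

As in the original, inDq and inSq are not reset between records, so a quote left open on one line carries over to the next.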
I really like Dave Sines's answer (https://stackoverflow.com/a/29564199/1745001), but I think it could be a bit more concise, so I reworked it into:
$ cat tst.awk
{
    rec = ""
    while (match($0,/['"]/)) {
        delim = substr($0,RSTART,1)
        fldLgth = index(substr($0,RSTART+1),delim) + 1
        rec = (rec ? rec " " : "") substr($0,RSTART,fldLgth)
        $0 = substr($0,RSTART+fldLgth)
    }
    print rec
}
$ awk -f tst.awk file
"jkfgh" "jkfd fdgj fd-"
"kfdjfdgfhbg" "fhfghg"
"dfgdf 'ffdg' gfd" "dgffd 'fdg'"
'dfj "jfdg" jhfgjd' 'hfgdh jfdhgd jkfghfd'

If you like it, then please accept Dave's answer and treat this as an alternative implementation.
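One edge case the match()/index() loops above do not address is an opening quote with no matching close: index() then returns 0 and a lone quote character ends up in the output. The following is only a sketch of one way to guard against that, assuming you would rather drop an unterminated string than print part of it (the question does not say what should happen). The names extract_quoted, sq, rest and endPos are made up for this illustration.

```shell
# Hypothetical helper: the match()/index() loop with a guard for
# a quote that is never closed on the same line.
extract_quoted() {
  awk -v sq="'" '
  BEGIN { re = "[\"" sq "]" }   # dynamic regexp matching either quote character
  {
    rec = ""
    s = $0
    while (match(s, re)) {
      delim  = substr(s, RSTART, 1)      # the opening quote character
      rest   = substr(s, RSTART + 1)     # everything after it
      endPos = index(rest, delim)        # position of the matching close, 0 if none
      if (endPos == 0) break             # unterminated: stop scanning this line
      rec = (rec ? rec " " : "") delim substr(rest, 1, endPos)
      s = substr(rest, endPos + 1)
    }
    print rec
  }' "$@"
}

# Usage: the trailing unterminated single quote is ignored.
printf '%s\n' "abc \"def\" 'ghi" | extract_quoted
# prints: "def"
```

Working on a copy of the record (s) rather than assigning to $0 also avoids re-splitting the fields on every iteration, though for short lines the difference is unlikely to matter.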
Answered on 2015-04-11 01:21:55
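To quote the (adapted) core of my answer at https://stackoverflow.com/a/29513125/45375, where you had already asked essentially the same question (only muddled by some misunderstandings):
If you have GNU Awk, you can use the special FPAT variable, which, instead of defining a separator, lets you define a regular expression that describes the fields (tokens that do not match are ignored), to get an approximate recognition of quoted strings: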
gawk -v FPAT="\"[^\"]*\"|'[^']*'" '{
for(i=1;i<=NF;++i) printf "%s%s", $i, (i==NF ? "\n" : " ")
}' sample.txt

This handles both single- and double-quoted strings, but does not support embedded escaped quotes of the same type.
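If embedded escaped quotes do matter, the pattern can be widened. The sketch below assumes that a backslash is the escape character, which the question never states, and it swaps FPAT for a plain match() loop so that it does not require GNU Awk. The function name extract_quoted_esc and the variables sq, dq, bs and re are made up for this illustration.

```shell
# Hypothetical helper: quoted strings that may contain backslash-escaped
# characters, e.g. "x \"y\" z". Backslash as the escape is an assumption.
extract_quoted_esc() {
  awk -v sq="'" '
  BEGIN {
    dq = "\""
    bs = "\\\\"   # two backslashes in the string, i.e. one literal backslash in the regexp
    # dq ( non-quote-non-backslash | backslash any-char )* dq, and the same for sq
    re = dq "([^" dq bs "]|" bs ".)*" dq "|" sq "([^" sq bs "]|" bs ".)*" sq
  }
  {
    out = ""
    s = $0
    while (match(s, re)) {
      out = (out ? out " " : "") substr(s, RSTART, RLENGTH)
      s = substr(s, RSTART + RLENGTH)   # continue after the matched string
    }
    print out
  }' "$@"
}

# Usage: the escaped double quotes stay inside the first string.
printf '%s\n' 'a "x \"y\" z" b' | extract_quoted_esc
# prints: "x \"y\" z"
```

The same regular expression should also be usable as an FPAT value in gawk, but the backslashes then need an extra level of escaping on the command line, so that variant is left out here.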
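Explanation:
FPAT="\"[^\"]*\"|'[^']*'" defines a field as a double-quoted or single-quoted string, possibly an empty one. $1, ... and the loop for(i=1;i<=NF;++i) are therefore limited to enumerating only the matching fields. The fields include their quotes, as desired here.

https://stackoverflow.com/questions/29559774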