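I am trying to extract the line containing domain:, followed by the run of lines from nameservers" through ]. I know how to do each part separately with grep and awk, but not how to do both in a single pass.
Input data: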
domain: stackexchange.com date: Mon Jul 3 00:43:49 2017 output_dir: /tmp/stackexchange.com.12653
INFO:root:crawl: exiting dom: 'stackexchange.com' took: 10s
INFO:root:2017-07-03 00:44:06:370 slave.py: exiting args.url: 'stackexchange.com' took: 3s
+ comparing web systems
"mail_server_ip": [], | "mail_server_ip": []
"nameservers": [
"ns-925.awsdns-51.net.",
"ns-1029.awsdns-00.org.",
"ns-cloud-d1.googledomains.com.",
"ns-cloud-d2.googledomains.com.",
],
"nameservers_domains": [ | "nameservers_domains": [],
"m
Expected output:
domain: stackexchange.com date: Mon Jul 3 00:43:49 2017 output_dir:
"nameservers": [
"ns-925.awsdns-51.net.",
"ns-1029.awsdns-00.org.",
"ns-cloud-d1.googledomains.com.",
"ns-cloud-d2.googledomains.com.",
],
Commands that extract each part separately:
grep "domain:" test_sample.txt
awk '/nameservers"/,/]/' test_sample.txt
Posted on 2017-07-03 21:48:03
awk approach
awk '/^domain:/{print}/"nameservers":/,/]/' test_sample.txt
Posted on 2017-07-03 21:03:32
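A note on the awk one-liner above: it prints the whole domain: line, including the /tmp/... path, whereas the expected output in the question stops at output_dir:. If that trimming matters, one possible variant is sketched below. The function name extract_ns is made up for illustration, and the here-document simply recreates the sample input from the question.

```shell
# Recreate the sample input from the question.
cat > test_sample.txt <<'EOF'
domain: stackexchange.com date: Mon Jul 3 00:43:49 2017 output_dir: /tmp/stackexchange.com.12653
INFO:root:crawl: exiting dom: 'stackexchange.com' took: 10s
INFO:root:2017-07-03 00:44:06:370 slave.py: exiting args.url: 'stackexchange.com' took: 3s
+ comparing web systems
"mail_server_ip": [], | "mail_server_ip": []
"nameservers": [
"ns-925.awsdns-51.net.",
"ns-1029.awsdns-00.org.",
"ns-cloud-d1.googledomains.com.",
"ns-cloud-d2.googledomains.com.",
],
"nameservers_domains": [ | "nameservers_domains": [],
EOF

# extract_ns is a hypothetical helper name, not part of the original answer.
extract_ns() {
  awk '
    # On the domain: line, cut everything after "output_dir:" and print it.
    /^domain:/ { sub(/output_dir:.*/, "output_dir:"); print }
    # Print from the "nameservers": line through the next line containing ].
    /"nameservers":/, /]/
  ' "$1"
}

extract_ns test_sample.txt
```

The range pattern is unchanged from the answer; only the sub() call on the domain: line is new.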
grep approach using the -P (PCRE) option:
grep -Poz 'domain: .+ output_dir:|\s*"nameservers": \[[^][]+\],\n' test_sample.txt
Output:
domain: stackexchange.com date: Mon Jul 3 00:43:49 2017 output_dir:
"nameservers": [
"ns-925.awsdns-51.net.",
"ns-1029.awsdns-00.org.",
"ns-cloud-d1.googledomains.com.",
"ns-cloud-d2.googledomains.com.",
],
The pattern is built on a regex alternation group: <domain_line>|<nameservers_lines>.
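One detail worth knowing when piping this further: with -z, grep reads the file as a single NUL-terminated record and also terminates each printed match with a NUL byte rather than a newline. On a terminal that is invisible, but later pipeline stages may care. The sketch below, assuming GNU grep built with PCRE support, converts the NULs to newlines and drops the resulting blank lines. The function name extract_ns_grep is made up for illustration.

```shell
# Recreate the sample input from the question.
cat > test_sample.txt <<'EOF'
domain: stackexchange.com date: Mon Jul 3 00:43:49 2017 output_dir: /tmp/stackexchange.com.12653
INFO:root:crawl: exiting dom: 'stackexchange.com' took: 10s
INFO:root:2017-07-03 00:44:06:370 slave.py: exiting args.url: 'stackexchange.com' took: 3s
+ comparing web systems
"mail_server_ip": [], | "mail_server_ip": []
"nameservers": [
"ns-925.awsdns-51.net.",
"ns-1029.awsdns-00.org.",
"ns-cloud-d1.googledomains.com.",
"ns-cloud-d2.googledomains.com.",
],
"nameservers_domains": [ | "nameservers_domains": [],
EOF

# extract_ns_grep is a hypothetical helper name, not part of the original answer.
extract_ns_grep() {
  # Same alternation as the answer: <domain_line>|<nameservers_lines>.
  grep -Poz 'domain: .+ output_dir:|\s*"nameservers": \[[^][]+\],\n' "$1" |
    tr '\0' '\n' |   # turn the NUL terminators written by -z into newlines
    sed '/^$/d'      # drop the blank lines left over after the conversion
}

extract_ns_grep test_sample.txt
```

Without -P support (for example with a non-GNU grep) this will not run as written; the awk approach is the more portable of the two.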
https://unix.stackexchange.com/questions/375065