Parsing semi-structured JSON data (Python/R)

Stack Overflow user
Asked on 2014-11-18 06:01:18
Answers: 2 | Views: 383 | Followers: 0 | Votes: 0

I'm not good with regular expressions or programming.

I have data like this in a text file:

RAMCHAR@HOTMAIL.COM (): 
PATTY.FITZGERALD327@GMAIL.COM ():
OHSCOACHK13@AOL.COM (19OB3IRCFHHYO): [{"num":1,"name":"Bessey VAS23 Vario Angle Strap Clamp","link":"http:\/\/www.amazon.com\/dp\/B0000224B3\/ref=wl_it_dp_v_nS_ttl\/181-6441163-6563619?_encoding=UTF8&colid=37XI10RRD17X2&coliid=I1YMLERDXCK3UU&psc=1","old-price":"N\/A","new-price":"","date-added":"October 19, 2014","priority":"","rating":"N\/A","total-ratings":"","comment":"","picture":"http:\/\/ecx.images-amazon.com\/images\/I\/51VMDDHT20L._SL500_SL135_.jpg","page":1},{"num":2,"name":"Designers Edge L-5200 500-Watt Double Bulb Halogen 160 Degree Wide Angle Surround Portable Worklight, Red","link":"http:\/\/www.amazon.com\/dp\/B0006OG8MY\/ref=wl_it_dp_v_nS_ttl\/181-6441163-6563619?_encoding=UTF8&colid=37XI10RRD17X2&coliid=I1BZH206RPRW8B","old-price":"N\/A","new-price":"","date-added":"October 8, 2014","priority":"","rating":"N\/A","total-ratings":"","comment":"","picture":"http:\/\/ecx.images-amazon.com\/images\/I\/5119Z4RDFYL._SL500_SL135_.jpg","page":1},{"num":3,"name":"50 Pack - 12"x12" (5) Bullseye Splatterburst Target - Instantly See Your Shots Burst Bright Florescent Yellow Upon Impact!","link":"http:\/\/www.amazon.com\/dp\/B00C88T12K\/ref=wl_it_dp_v_nS_ttl\/181-6441163-6563619?_encoding=UTF8&colid=37XI10RRD17X2&coliid=I31RJXFVF14TBM","old-price":"N\/A","new-price":"","date-added":"October 8, 2014","priority":"","rating":"N\/A","total-ratings":"67","comment":"","picture":"http:\/\/ecx.images-amazon.com\/images\/I\/51QwsvI43IL._SL500_SL135_.jpg","page":1},{"num":4,"name":"DEWALT DW618PK 12-AMP 2-1\/4 HP Plunge and Fixed-Base Variable-Speed Router Kit","link":"http:\/\/www.amazon.com\/dp\/B00006JKXE\/ref=wl_it_dp_v_nS_ttl\/181-6441163-6563619?_encoding=UTF8&colid=37XI10RRD17X2&coliid=I39QDQSBY00R56&psc=1","old-price":"N\/A","new-price":"","date-added":"September 3, 2012","priority":"","rating":"N\/A","total-ratings":"","comment":"","picture":"http:\/\/ecx.images-amazon.com\/images\/I\/416a5nzkYTL._SL500_SL135_.jpg","page":1}]

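Can anyone suggest a way to split this data into two columns (the first column holding the email ID, the second the JSON data)? Some lines may contain only an email ID (like line 1) with no corresponding JSON data.

Please help. Thanks!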


2 Answers

Stack Overflow user

Accepted answer

Posted on 2014-11-18 06:59:17

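Try the following solution (for Python 2). It assumes that each entry sits on a single line, which means the JSON substring must not contain any newlines. I chose in.txt as the name of the data file; change it to your actual filename/path: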

import csv
import re

regex = re.compile("""
    ([^:]*)  # Match and capture any characters except colons
    :[ ]*    # Match a colon, followed by optional spaces
    (.*)     # Match and capture the rest of the line""",
    re.VERBOSE)

# On Python 2, csv.writer expects the output file opened in binary mode ("wb")
with open("in.txt") as infile, open("out.csv", "wb") as outfile:
    writer = csv.writer(outfile)
    for line in infile:
        match = regex.match(line)
        if match:  # skip blank lines and lines without a colon
            writer.writerow(match.groups())
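
For readers on Python 3, the same idea might be adapted roughly as follows. This is a sketch rather than part of the original answer: the names `split_line` and `convert` are hypothetical, and the file is opened in text mode with `newline=""` because Python 3's csv module expects that instead of "wb".

```python
import csv
import re

# Same pattern as in the answer: everything before the first colon,
# optional spaces, then the remainder of the line.
LINE_RE = re.compile(r"([^:]*):[ ]*(.*)")

def split_line(line):
    """Return (prefix, json_text) for one line, or None if it has no colon.

    split_line is a hypothetical helper, not part of the original answer.
    """
    match = LINE_RE.match(line)
    if match is None:
        return None
    prefix, json_text = match.groups()
    return prefix.strip(), json_text.strip()

def convert(in_path, out_path):
    # Text mode with newline="" replaces Python 2's "wb" for csv output.
    with open(in_path) as infile, \
         open(out_path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        for line in infile:
            row = split_line(line)
            if row is not None:  # skip blank or malformed lines
                writer.writerow(row)
```

Calling `convert("in.txt", "out.csv")` would write one row per input line. Lines that carry only an email ID end up with an empty second column.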
Votes: 0

Stack Overflow user

Posted on 2014-11-18 07:08:59

If you are in a Linux/Unix environment, you can use sed like this (a.txt is your input file):

<a.txt sed 's/\(^[^ (]*\)[^:]*: */\1 /'

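The regular expression ^[^ (]* matches the start of each line (^) followed by zero or more characters that are neither a space nor an opening parenthesis ([^ (]*). Wrapping it in \(\) makes sed "remember" the matched string as \1. The expression [^:]*: * then matches every character up to and including the colon, plus zero or more spaces after it. On each line, everything matched is replaced with the remembered \1 string, which is effectively the email address. The rest of the line is the JSON data, which is left unchanged.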
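
To see what the pattern captures without running sed, the substitution can be approximated in Python. This is only an illustrative sketch: the function name `email_and_json` is hypothetical, and sed's basic-regex group \( \) is written as plain ( ) in Python's syntax.

```python
import re

# Python spelling of the sed expression  s/\(^[^ (]*\)[^:]*: */\1 /
SED_LIKE = re.compile(r"(^[^ (]*)[^:]*: *")

def email_and_json(line):
    # count=1 mirrors sed's default of replacing only the first match per line
    return SED_LIKE.sub(r"\1 ", line, count=1)

print(email_and_json('OHSCOACHK13@AOL.COM (19OB3IRCFHHYO): [{"num":1}]'))
# prints: OHSCOACHK13@AOL.COM [{"num":1}]
```

The parenthesised list ID between the email and the colon is consumed by [^:]* and therefore dropped from the output.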

If you want a CSV or tab-delimited file, replace the space character after \1 with the delimiter you need. For example, with a comma:

<a.txt sed 's/\(^[^ (]*\)[^:]*:/\1,/'
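
One caveat with inserting a bare comma: the JSON text itself contains commas and double quotes, so a strict CSV reader may split it into many fields. If that matters, a CSV library can do the quoting. A minimal sketch in Python, where `to_csv_row` is a hypothetical helper used only for illustration:

```python
import csv
import io

def to_csv_row(email, json_text):
    """Format one (email, json_text) pair as a properly quoted CSV line.

    to_csv_row is a hypothetical helper used only for illustration.
    """
    buffer = io.StringIO()
    # csv.writer quotes fields that contain commas or quotes and doubles
    # any embedded double quotes.
    csv.writer(buffer, lineterminator="\n").writerow([email, json_text])
    return buffer.getvalue()

row = to_csv_row("OHSCOACHK13@AOL.COM", '[{"num":1,"page":1}]')
print(row, end="")
# prints: OHSCOACHK13@AOL.COM,"[{""num"":1,""page"":1}]"

# Reading the line back yields exactly two columns again.
parsed = next(csv.reader(io.StringIO(row)))
```

After the round trip, `parsed[1]` holds the original JSON text unchanged.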
Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/26987662
