Suppose I have a text file that contains the following two observations:
liame@ziggo.nl:horse22| homeAddress = {
"city": "AMSTERDAM",
"houseNumber": "5",
"houseNumberAddition": null,
"postalCode": "1111 AN",
"street": "Walker",
"__typename": "ShopperAddress"
}
johndoe@live.nl:pizzalover1 | homeAddress = {
"city": "NEW YOK",
"houseNumber": "23",
"houseNumberAddition": null,
"postalCode": "9999 HV",
"street": "Marie Curie",
"__typename": "ShopperAddress"
}
Is there a way to read this text file so that the resulting DataFrame looks like this?
username1        username2    city       housenumber  housenumber_addition  postalcode  street       typename
liam@ziggo.nl    horse22      AMSTERDAM  5            null                  1111 AN     Walker       ShopperAddress
johndoe@live.nl  pizzalover1  NEW YORK   23           null                  9999 HV     Marie Curie  ShopperAddress
Thanks.
Posted on 2021-08-21 14:11:59
Your text file shows a pattern in how the data is encoded:
<username1>:<username2> | homeAddress = {
<json_data>
}
We will parse the file in two passes: the first pass separates one record from the next, and the second pass picks out the fields within each record:
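import json
import re

import pandas as pd

data = []
# Matches <username1>:<username2> | homeAddress = <json_data>
pattern = re.compile(r"(.+?):(.+?)\s*\|\s*homeAddress = (.+)", re.DOTALL)
with open('data.txt') as fp:
    record = ""
    for line in fp:
        record += line
        # First pass: a line holding only "}" closes the current record.
        # rstrip() also covers a final line without a trailing newline.
        if line.rstrip() == "}":
            # Second pass: split the record into its fields.
            m = pattern.match(record)
            if m:
                username1 = m.group(1)
                username2 = m.group(2)
                home_address = json.loads(m.group(3))
                data.append({
                    "username1": username1,
                    "username2": username2,
                    **home_address
                })
            record = ""
df = pd.DataFrame(data).rename(columns={"__typename": "typename"})
Posted on 2021-08-21 15:40:26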
You can modify the original text slightly so that it becomes valid JSON, and then feed it to pandas.read_json:
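import io, re
import pandas as pd
# 'data.txt' is an assumed file name; the snippet needs the file contents in `text`.
with open('data.txt') as fp:
    text = fp.read()
# Recent pandas versions expect a file-like object rather than a literal JSON string.
(pd.read_json(io.StringIO('[%s]' % re.sub(r'([^:\n]+):([^\|:]+)\s*\|\s*homeAddress = {',
                                          r',{\n "username1":"\1",\n "username2":"\2",',
                                          text)[1:]))
   .rename(columns={'houseNumber': 'housenumber',
                    'houseNumberAddition': 'housenumber_addition',
                    'postalCode': 'postalcode',
                    '__typename': 'typename'})
)
Output: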
         username1    username2       city  housenumber  housenumber_addition  postalcode       street        typename
0   liame@ziggo.nl      horse22  AMSTERDAM            5                   NaN     1111 AN       Walker  ShopperAddress
1  johndoe@live.nl  pizzalover1    NEW YOK           23                   NaN     9999 HV  Marie Curie  ShopperAddress
Intermediate reprocessed data:
[{
"username1":"liame@ziggo.nl",
"username2":"horse22",
"city": "AMSTERDAM",
"houseNumber": "5",
"houseNumberAddition": null,
"postalCode": "1111 AN",
"street": "Walker",
"__typename": "ShopperAddress"
}
,{
"username1":"johndoe@live.nl",
"username2":"pizzalover1 ",
"city": "NEW YOK",
"houseNumber": "23",
"houseNumberAddition": null,
"postalCode": "9999 HV",
"street": "Marie Curie",
"__typename": "ShopperAddress"
}]
https://stackoverflow.com/questions/68873272
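The two-pass idea from the first answer can be tried without a file on disk. The following is a minimal sketch, assuming the sample from the question is held in a string; `SAMPLE`, `COLUMN_NAMES`, `HEADER` and `parse_records` are hypothetical names introduced here for illustration and do not appear in either answer. It also renames the JSON keys to the column names requested in the question.

```python
import json
import re

# Sample text mirroring the two records shown in the question.
SAMPLE = """liame@ziggo.nl:horse22| homeAddress = {
"city": "AMSTERDAM",
"houseNumber": "5",
"houseNumberAddition": null,
"postalCode": "1111 AN",
"street": "Walker",
"__typename": "ShopperAddress"
}
johndoe@live.nl:pizzalover1 | homeAddress = {
"city": "NEW YOK",
"houseNumber": "23",
"houseNumberAddition": null,
"postalCode": "9999 HV",
"street": "Marie Curie",
"__typename": "ShopperAddress"
}"""

# Maps the JSON keys to the column names requested in the question.
COLUMN_NAMES = {
    "houseNumber": "housenumber",
    "houseNumberAddition": "housenumber_addition",
    "postalCode": "postalcode",
    "__typename": "typename",
}

# <username1>:<username2> | homeAddress = <json_data>
HEADER = re.compile(r"\s*(.+?):(.+?)\s*\|\s*homeAddress = (.+)", re.DOTALL)


def parse_records(text):
    """Return one dict per record found in `text`."""
    rows = []
    record = []
    for line in text.splitlines():
        record.append(line)
        # First pass: a line holding only "}" ends the current record.
        if line.rstrip() == "}":
            m = HEADER.match("\n".join(record))
            if m:
                # Second pass: split off the usernames and decode the JSON body.
                address = json.loads(m.group(3))
                row = {"username1": m.group(1), "username2": m.group(2)}
                for key, value in address.items():
                    row[COLUMN_NAMES.get(key, key)] = value
                rows.append(row)
            record = []
    return rows


rows = parse_records(SAMPLE)
print(rows[1]["username2"], rows[1]["housenumber"])  # prints: pizzalover1 23
```

Passing `rows` to `pd.DataFrame(rows)` should then give a frame with the requested column names. Because the pattern puts a lazy group in front of `\s*\|`, `username2` comes out without the trailing space that precedes the `|` in the second record.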