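Hi, I'm trying to clean some data, but I'm having trouble reading a JSON file into separate DataFrame columns. I have thousands of records like this one in a single file: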
{"hotel_class": 4.0,
"region_id": 60763,
"url": "http://www.tripadvisor.com/Hotel_Review-g60763-d113317-Reviews Casablanca_Hotel_Times_Square-New_York_City_New_York.html",
"phone": "",
"details": null,
"address": {"region": "NY", "street-address": "147 West 43rd Street", "postal-code": "10036", "locality": "New York City"},
"type": "hotel",
"id": 113317,
"name": "Casablanca Hotel Times Square"}

I tried to load it with:
import json

with open('offering.txt') as datafile:
    data_json = json.load(datafile)

but it raises this error:
JSONDecodeError: Extra data: line 2 column 1 (char 398)

So I tried:
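import pandas as pd

df = pd.read_json('offering.txt', lines=True)

That loads the file, but the address column contains nested values, and I would like to split them into separate columns. How can I do that?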
df['address']
0 {'region': 'NY', 'street-address': '147 West 4...
1 {'region': 'CA', 'street-address': '300 S Dohe...
2 {'region': 'NY', 'street-address': '790 Eighth...
3 {'region': 'NY', 'street-address': '152 West 5...
4 {'region': 'NY', 'street-address': '130 West 4...
Name: address, Length: 4333, dtype: object

Answer posted on 2021-07-11 20:34:59
Try:
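import pandas as pd

df = pd.read_json("offering.txt", lines=True)
# Pop the nested "address" column and expand each dict into its own columns
df_out = pd.concat([df, df.pop("address").apply(pd.Series)], axis=1)
print(df_out)

This prints: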
hotel_class region_id url phone details type id name region street-address postal-code locality
0 4 60763 http://www.tripadvisor.com/Hotel_Review-g60763-d113317-Reviews Casablanca_Hotel_Times_Square-New_York_City_New_York.html NaN hotel 113317 Casablanca Hotel Times Square NY 147 West 43rd Street 10036 New York City
1 5 60763 http://www.tripadvisor.com/Hotel_Review-g60763-d113317-Reviews Casablanca_Hotel_Times_Square-New_York_City_New_York.html NaN hotel 113317 Casablanca Hotel Times Square CA 147 West 43rd Street 10036 New York City
...

https://stackoverflow.com/questions/68336247
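A possible alternative to `apply(pd.Series)`, which builds one Series per row and can be slow on large frames, is `pd.json_normalize`, which turns a list of dicts into a DataFrame with one column per key. The sketch below assumes each record sits on its own line of the file, as the `lines=True` call implies. To keep it self-contained, it reads two hypothetical inline records instead of the real `offering.txt`:

```python
import io

import pandas as pd

# Two hypothetical records in JSON Lines form (one JSON object per line),
# standing in for the contents of offering.txt.
raw = (
    '{"id": 113317, "name": "Casablanca Hotel Times Square", '
    '"address": {"region": "NY", "street-address": "147 West 43rd Street", '
    '"postal-code": "10036", "locality": "New York City"}}\n'
    '{"id": 1, "name": "Example Hotel", '
    '"address": {"region": "CA", "street-address": "1 Example Street", '
    '"postal-code": "90000", "locality": "Example City"}}\n'
)

df = pd.read_json(io.StringIO(raw), lines=True)

# Flatten the nested dicts in "address" into one column per key.
address_cols = pd.json_normalize(df["address"].tolist())

# Both frames have a default 0..n-1 index, so concatenating along
# axis=1 lines the rows up one-to-one.
df_out = pd.concat([df.drop(columns="address"), address_cols], axis=1)

print(df_out.columns.tolist())
```

If some records may lack an `address` object, check the result for missing values before relying on the new columns.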
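The `JSONDecodeError` from the first attempt is consistent with the file holding one JSON object per line (JSON Lines). `json.load` expects a single JSON document, so it stops with "Extra data" where the second record begins. The sketch below parses such a file with only the standard library by decoding each non-blank line separately with `json.loads`. The helper name `read_json_lines` is hypothetical, and the demo writes a temporary two-record file instead of using `offering.txt`:

```python
import json
import os
import tempfile


def read_json_lines(path):
    """Hypothetical helper: parse a JSON Lines file into a list of dicts."""
    records = []
    with open(path, encoding="utf-8") as datafile:
        for line in datafile:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline at EOF
                records.append(json.loads(line))
    return records


# Write a temporary two-record file (hypothetical data).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('{"id": 113317, "address": {"region": "NY"}}\n')
    tmp.write('{"id": 1, "address": {"region": "CA"}}\n')
    path = tmp.name

# Reproduce the original error: json.load sees a second top-level object.
try:
    with open(path, encoding="utf-8") as datafile:
        json.load(datafile)
except json.JSONDecodeError as exc:
    err_msg, err_line, err_col = exc.msg, exc.lineno, exc.colno
    print(exc)  # reports "Extra data" at line 2, column 1

records = read_json_lines(path)
os.remove(path)
print(len(records), records[1]["address"]["region"])  # 2 CA
```

The resulting list can also be passed straight to `pd.json_normalize(records)`, which flattens nested keys into dotted column names such as `address.region`.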