I have a file that contains the following data, and I am trying to parse it.
08/23/21 04:00:05 AM
/* ----------------- data1----------------- */
make: honda model: civic
year: 2019
trim: "lx"
owner: phillip
/* ----------------- data2----------------- */
make: toyota model: highlander
year: 2021
trim: "Platinum"
I would like to see the data like this:
Make, Model, Year, trim, Owner
Honda, civic, 2019, lx, phillip
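toyota, highlander, 2021, platinum, Rex
Here is my code. I tried to build a dictionary and then load it into a pandas DataFrame, but I think I am going in the wrong direction.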
def fix_line(record):
    # split every field and value into a separate line
    results = []
    mini_collection = []
    if not record.startswith("/*"):
        # for data in record.rstrip('\n').strip().split(' '):
        for data in record.rstrip('\n').split(' '):
            if ':' not in data:
                mini_collection.append(data)
            else:
                results.append(data)
    return results
def create_dictionary(data):
    record = {}
    for line in fix_line(data):
        line = line.strip()
        name, value = line.split(':', 1)
        record[name.strip()] = value.strip()
    return record
Posted on 2021-09-19 04:06:46
Here is one approach:
import re
import yaml  # python -m pip install pyyaml
import pandas as pd

s = """08/23/21 04:00:05 AM
/* ----------------- data1----------------- */
make: honda
model: civic
year: 2019
trim: lx
owner: phillip
/* ----------------- data2----------------- */
make: toyota
model: highlander
year: 2021
trim: Platinum
owner: Rex
"""

# Split on the /* ... */ section headers; each remaining chunk is a small YAML mapping.
lines = re.split(r"/\*.*?\*/", s)
# safe_load avoids the Loader argument that newer PyYAML versions require for yaml.load.
records = [yaml.safe_load(line) for line in lines if "make:" in line]
df = pd.DataFrame(records)
Output:
     make       model  year      trim    owner
0   honda       civic  2019        lx  phillip
1  toyota  highlander  2021  Platinum      Rex
Posted on 2021-09-19 03:02:52
Try using re.finditer with the pattern below to build a dictionary from each match, then load the dictionaries into a DataFrame.
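import re
import pandas as pd

# `data` is assumed to hold the full text of the file from the question as one string.
# Raw string so the regex escapes reach the re module unchanged.
pattern = r"""
(?P<make>(?<=(make:\ ))\w+)  # use a lookbehind to get make
(\s+ model: \ )              # skip to model
(?P<model>\w+)               # get model
(\s year: \ )                # skip to year
(?P<year>\d+)                # get year
(\s+ trim: \ ")              # skip to trim
(?P<trim>\w+)                # get trim
("\s)                        # skip to owner
(?P<owner>.*)                # get owner (still carries the "owner: " prefix)
"""
df = pd.DataFrame([item.groupdict() for item in re.finditer(pattern, data, re.VERBOSE)])
# Strip the literal "owner: " prefix captured by the last group.
df["owner"] = df["owner"].str.replace("owner: ", "", regex=False)
df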
Out[563]:
     make       model  year      trim    owner
0   honda       civic  2019        lx  phillip
1  toyota  highlander  2021  Platinum
https://stackoverflow.com/questions/69239802
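For comparison, the dictionary idea from the question can also be sketched with the standard library alone. This is only a rough sketch, not taken from either answer: it assumes the layout shown in the question (several fields may share a line, trim is quoted, the second record has no owner), and the names `FIELD` and `parse_records` are made up for illustration.

```python
import re

# Sample text in the layout from the question.
text = '''08/23/21 04:00:05 AM
/* ----------------- data1----------------- */
make: honda model: civic
year: 2019
trim: "lx"
owner: phillip
/* ----------------- data2----------------- */
make: toyota model: highlander
year: 2021
trim: "Platinum"
'''

# A key is a word followed by a colon; the value is one token, optionally quoted.
FIELD = re.compile(r'(\w+):\s*"?([^\s"]+)"?')


def parse_records(text):
    """Return one dict per /* ... */ section (hypothetical helper)."""
    records = []
    current = None
    for line in text.splitlines():
        if line.startswith("/*"):
            # A header line starts a new record.
            current = {}
            records.append(current)
        elif current is not None:
            # Lines before the first header (the timestamp) are skipped.
            current.update(FIELD.findall(line))
    return records


records = parse_records(text)
```

Passing `records` to `pd.DataFrame(records)` should then give one row per section, with the missing owner showing up as NaN. All values, including the year, stay strings in this sketch.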