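I have a text file in which the data I want to store is listed, in order, after each name it should be assigned to. Basically, I want to take a text file that looks like this:
Fred
quiz1,B
quiz2,C
Suzie
quiz1,A
quiz2,B
and create a DataFrame that looks like this:
Name,Assignment,Grade
Fred,quiz1,B
Fred,quiz2,C
Suzie,quiz1,A
Suzie,quiz2,B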
Posted on 2019-12-23 21:27:41
You can try something like this:
import pandas as pd
from io import StringIO
# Create textfile
txtfile = StringIO("""Fred
quiz1, B
quiz2, C
Suzie
quiz1, A
quiz2, B""")
# Use pandas to read the text file in as a single column
df = pd.read_csv(txtfile, header=None, sep=r'\s\s+', engine='python')
# Use str.split to separate the columns
df = df[0].str.split(',', expand=True)
# Use groupby with transform to take the first value of each group (the name) and copy it down to the rest of the group
df[2] = df.groupby(df[1].isna().cumsum())[0].transform('first')
# Drop the name-only rows, which have None in column 1
df_out = df.dropna()
print(df_out)
Output:
0 1 2
1 quiz1 B Fred
2 quiz2 C Fred
4 quiz1 A Suzie
5 quiz2 B Suzie
Posted on 2019-12-23 21:48:38
Here is an example:
from io import StringIO
import pandas as pd
import numpy as np
data = """
Fred
quiz1, B
quiz2, C
Suzie
quiz1, A
quiz2, B
Susy
quiz1, E
quiz2, F
"""
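# Name lines have no comma, so they land in 'Assignment' with a missing 'Grade'
df = pd.read_csv(StringIO(data), sep=',', names=['Assignment', 'Grade', 'Name'], header=None)
# Copy the name into 'Name' on name-only rows (np.nan, since np.NaN was removed in NumPy 2.0)
df['Name'] = np.where(df['Grade'].isnull(), df['Assignment'], np.nan)
# Forward-fill each name down to the grade rows that follow it
df['Name'] = df['Name'].ffill()
print(df.dropna(subset=['Grade']))
Posted on 2019-12-23 21:37:16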
I'd suggest not loading this file directly into pandas. Just read it line by line and build a new list that can then be converted to a DataFrame.
import pandas as pd

grades = []
with open("your_file.txt", "r") as f:
    for line in f:
        line = line.strip()
        if not line:
            # skip blank lines (e.g. a trailing newline)
            continue
        if "," not in line:
            # should be a name line
            name = line
        else:
            # split into [test, grade] and append to grades with the name
            test, grade = line.split(',', 1)
            grades.append([name, test.strip(), grade.strip()])
# convert to DataFrame
grades = pd.DataFrame(grades, columns=['Name', 'Assignment', 'Grade'])
https://stackoverflow.com/questions/59461205
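As a variation on the line-by-line idea above, here is a minimal sketch that parses from a string instead of a file and complains if a grade line appears before any name. The names `parse_grades` and `sample` are made up for illustration, and the sketch assumes that a name never contains a comma. It uses only the standard library.

```python
def parse_grades(text):
    """Flatten 'name line followed by assignment,grade lines' into rows.

    Hypothetical helper: assumes a line without a comma is always a name.
    """
    rows = []
    name = None  # most recent name line seen so far
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if "," not in line:
            name = line  # a line without a comma starts a new person
            continue
        if name is None:
            raise ValueError(f"grade line before any name: {line!r}")
        # split once on the first comma and trim stray spaces
        assignment, grade = (part.strip() for part in line.split(",", 1))
        rows.append((name, assignment, grade))
    return rows


# Sample input in the same shape as the question's file (illustrative only)
sample = """Fred
quiz1, B
quiz2, C
Suzie
quiz1, A
quiz2, B
"""

rows = parse_grades(sample)
for row in rows:
    print(",".join(row))
```

The resulting `rows` list should be usable with `pd.DataFrame(rows, columns=['Name', 'Assignment', 'Grade'])`, which gives the column order asked for in the question.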