I am trying to change the structure of the data in a text file (.txt). The data looks like this:
:1:A
:2:B
:3:C
:1:D
:2:E
:3:F
:4:G
:1:H
:3:I
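:4:J
I want to convert it into the following format (like a pivot table in Excel: the column names are the characters between the ":" delimiters, and each group always starts with :1:).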
Group  :1:  :2:  :3:  :4:
1      A    B    C
2      D    E    F    G
3      H         I    J
Does anyone know how to do this? Thanks in advance.
Posted on 2019-03-11 16:34:44
Use:
import pandas as pd

# Read the text file (assumed to be stored in CSV format; pd.read_fwf also works)
df = pd.read_csv('SO.csv', header=None)
# Split each line on ':' and keep the tag and value columns
ndf = df.iloc[:, 0].str.split(':', expand=True).iloc[:, 1:]
# Group by tag and build a DataFrame, then drop the NaNs
res = ndf.groupby(1)[2].apply(pd.DataFrame).apply(lambda x: pd.Series(x.dropna().values))
# Post-processing (optional)
res.columns = [':' + tag + ':' for tag in ndf[1].unique()]
# Assign the new index first and name it afterwards; replacing the index later would discard the name
res.index = range(1, res.shape[0] + 1)
res.index.name = 'Group'
res
Group :1: :2: :3: :4:
1 A B C
2 D E F G
3 H I J
Posted on 2019-03-11 16:17:29
First, create a DataFrame with read_csv using header=None, because the file has no header row:
import pandas as pd
from io import StringIO  # pd.compat.StringIO is not available in recent pandas versions

temp = u""":1:A
:2:B
:3:C
:1:D
:2:E
:3:F
:4:G
:1:H
:3:I
:4:J"""
# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp), header=None)
print(df)
0
0 :1:A
1 :2:B
2 :3:C
3 :1:D
4 :2:E
5 :3:F
6 :4:G
7 :1:H
8 :3:I
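9 :4:J
Extract the original column with DataFrame.pop, remove the leading ':' with Series.str.strip, and split the remaining values into two new columns with Series.str.split. Then build the group numbers by comparing the tag column with the string '1' using Series.eq (the method form of ==) and taking Series.cumsum, create a MultiIndex with DataFrame.set_index, and finally reshape with Series.unstack.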
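The group numbering described above can be checked on its own. Here is a minimal sketch, assuming a hypothetical Series named tags that holds the tag column of the sample data. Every '1' marks the start of a new group, so the running sum of those matches gives each row its group number.

```python
import pandas as pd

# Hypothetical stand-in for the tag column of the sample data
tags = pd.Series(['1', '2', '3', '1', '2', '3', '4', '1', '3', '4'])

# True where a group starts; the cumulative sum turns those flags into group numbers
group_ids = tags.eq('1').cumsum()
print(group_ids.tolist())
```

In the answer's code, the same cumulative sum is passed to set_index as the first index level.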
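# 'a' holds the tag (e.g. '1'), 'b' holds the value (e.g. 'A')
df[['a','b']] = df.pop(0).str.strip(':').str.split(':', expand=True)
# Index by (group number, tag), then move the tags into columns
df1 = df.set_index([df['a'].eq('1').cumsum(), 'a'])['b'].unstack(fill_value='')
print(df1)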
a  1  2  3  4
a
1  A  B  C
2  D  E  F  G
3  H     I  J
Posted on 2019-03-11 18:14:39
Another approach:
import pandas as pd

# Read the file
with open("t.txt") as f:
    content = f.readlines()

# Build a dictionary with the column names (e.g. ':1:') as keys
# and the lists of values (e.g. 'A') as values.
my_dict = {}
for v in content:
    v = v.strip()   # drop the trailing newline
    if not v:
        continue    # skip blank lines
    key = v[0:3]    # the tag, e.g. ':1:' (assumes single-character tags)
    value = v[3:]   # the value, e.g. 'A'
    my_dict.setdefault(key, []).append(value)

# Convert the dictionary to a DataFrame and transpose it
df = pd.DataFrame.from_dict(my_dict, orient='index').transpose()
df
The output looks like this:
   :1:   :2:  :3:   :4:
0  A     B    C     G
1  D     E    F     J
2  H     None I     None
https://stackoverflow.com/questions/55097539
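In the dictionary-based output above, each column is filled from the top, so G and J land in rows 0 and 1 rather than in the groups they came from. Here is a minimal plain-Python sketch of a variant that keeps values in their groups, relying on the question's rule that every group starts with :1:. The names lines, rows, columns and table are illustrative, and the sample data is inlined instead of read from a file.

```python
# Sample data from the question; replace with the lines of the real file
lines = [":1:A", ":2:B", ":3:C", ":1:D", ":2:E",
         ":3:F", ":4:G", ":1:H", ":3:I", ":4:J"]

rows = []  # one dict per group, mapping tag -> value
for line in lines:
    line = line.strip()
    if not line:
        continue  # skip blank lines
    # ':1:A' splits into ['', '1', 'A']; maxsplit=2 keeps any ':' inside the value
    _, tag, value = line.split(":", 2)
    if tag == "1" or not rows:
        rows.append({})  # a ':1:' tag opens a new group
    rows[-1][tag] = value

# Assumption: tags sort correctly as strings (true for single-digit tags)
columns = sorted({tag for row in rows for tag in row})
# Missing tags become empty cells so every row has the same width
table = [[row.get(tag, "") for tag in columns] for row in rows]

for number, cells in enumerate(table, start=1):
    print(number, cells)
```

Passing table and columns to pd.DataFrame would give a frame comparable to the pivoted layout requested in the question.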