Aggregating data with pandas

Stack Overflow user
Asked on 2020-07-06 17:30:04
Answers: 3, Views: 160, Followers: 0, Votes: 1

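I have some loosely structured data, shown below. Note that the first item of each tuple is a name, and names are repeated across lines (this is an important consideration).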

('Alex', ['String1', 'String34'])
('Piper', ['String5', 'String64', 'String12'])
('Nicky', ['String3', 'String21', 'String42', 'String51'])
('Linda', ['String14'])
('Suzzane', ['String11', 'String36', 'String16'])
('Alex', ['String64', 'String34', 'String12', 'String5'])
('Linda', ['String3', 'String77'])
('Piper', ['String41', 'String64', 'String11', 'String34'])
('Suzzane', ['String12'])
('Nicky', ['String11',  'String51'])
('Alex', ['String77', 'String64', 'String3', 'String5'])
('Linda', ['String51'])
('Nicky', ['String77', 'String12', 'String34'])
('Suzzane', ['String51', 'String3'])
('Piper', ['String11', 'String64', 'String5'])

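If the data above is stored in a file named "output.txt", how can I import it and aggregate it as shown below?

Each name should appear only once, and for each name the list should contain only the unique strings collected from all of that name's duplicate entries.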

('Alex', ['String1', 'String34', 'String64', 'String12', 'String5', 'String77', 'String3'])
('Piper', ['String5', 'String64', 'String12', 'String11', 'String41', 'String34'])
('Nicky', ['String3', 'String21', 'String42', 'String51', 'String11', 'String77', 'String12', 'String34'])
('Linda', ['String14', 'String3', 'String77', 'String51'])
('Suzzane', ['String11', 'String36', 'String16', 'String12', 'String51', 'String3'])

3 Answers

Stack Overflow user

Accepted answer

Posted on 2020-07-26 22:16:26

import ast
import pandas as pd

# Load the data from the text file (it does not have to be a CSV file).
# The separator "/n" never occurs in the data, so each whole line lands in the single column "data".
df = pd.read_csv(r"D:\test\output.txt", sep="/n", header=None, names=["data"], engine='python')

# Convert each line of text into a (name, list of strings) tuple
df["data"] = df["data"].map(ast.literal_eval)
# Extract the name
df["surename"] = df["data"].map(lambda x: x[0])
# Extract the list of strings
df["strings"] = df["data"].map(lambda x: x[1])
# Create one row for each string in the list of strings
df = df.explode("strings")
# Remove duplicate (name, string) pairs
df = df.drop_duplicates(subset=["surename", "strings"], keep="first")
# Group by name to get a list of unique strings (unique because duplicates were removed; string order is kept).
# sort=False keeps the names in order of first appearance instead of sorting them alphabetically.
df_result = df.groupby(["surename"], sort=False).aggregate({"strings": list}).reset_index()
# Combine the extracted name and the deduplicated list of strings into a tuple again
df_result["result"] = df_result.apply(lambda x: (x["surename"], x["strings"]), axis=1)

# Write the result to a file of your choice, one tuple per line.
# Plain file I/O is used here because csv.QUOTE_NONE with an empty escapechar is rejected on Python 3.11+.
with open(r"D:\test\result.txt", "w") as f:
    for row in df_result["result"]:
        f.write(f"{row}\n")
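The explode, drop-duplicates, and group-by steps of this answer can also be tried without any file. The following is a minimal sketch under stated assumptions, not the answerer's code: `records` is a small hypothetical sample standing in for the parsed lines of output.txt, and the column name `name` is chosen for illustration.

```python
import pandas as pd

# Hypothetical in-memory sample standing in for the parsed lines of output.txt
records = [
    ("Alex", ["String1", "String34"]),
    ("Piper", ["String5", "String64"]),
    ("Alex", ["String64", "String34", "String5"]),
    ("Piper", ["String64", "String11"]),
]

df = pd.DataFrame(records, columns=["name", "strings"])

merged = (
    df.explode("strings")                      # one row per (name, string) pair
      .drop_duplicates(["name", "strings"])    # keep the first occurrence of each pair
      .groupby("name", sort=False)["strings"]  # sort=False keeps first-seen name order
      .agg(list)                               # collect the remaining strings per name
)

# Turn the resulting Series back into a list of (name, strings) tuples
result = list(merged.items())
print(result)
```

Passing `sort=False` matters only if the names should stay in the order they first appear in the file; by default `groupby` sorts the group keys.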
Votes: 1

Stack Overflow user

Posted on 2020-07-26 18:00:23

You can load the data into a pandas DataFrame.

import pandas as pd

df = pd.DataFrame(data=[('Alex', ['String1', 'String34']),
('Alex', ['String64', 'String34', 'String12', 'String5']),
('Nicky', ['String11',  'String51']),
('Nicky', ['String77', 'String12', 'String34'])])
df = df.rename(columns={0:'name', 1:'strings'})

Then create a function that concatenates the lists held in a pandas column:

def concatenate(strings):
    # Flatten a sequence of lists into one list (repeated strings are kept)
    strings_agg = []
    for string in strings:
        strings_agg.extend(string)
    return strings_agg

Finally, apply the function to the column, grouped by name:

df.groupby('name').apply(lambda x: concatenate(x['strings'])).to_frame()
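Note that `concatenate` keeps repeated strings, while the question asks for unique ones. Below is a minimal sketch of a deduplicating variant, assuming that first-seen order should be preserved. The name `concatenate_unique` is hypothetical and not part of the original answer.

```python
def concatenate_unique(lists):
    """Flatten an iterable of lists, keeping only the first occurrence of each item."""
    seen = {}
    for items in lists:
        for item in items:
            # dicts preserve insertion order (Python 3.7+), so this keeps first-seen order
            seen.setdefault(item, None)
    return list(seen)

# With the DataFrame df built earlier in this answer, it could be used as, for example:
# df.groupby('name')['strings'].apply(concatenate_unique)

merged_alex = concatenate_unique([
    ['String1', 'String34'],
    ['String64', 'String34', 'String12', 'String5'],
])
print(merged_alex)
```

Using a dict rather than a set is a deliberate choice here: a set would also remove duplicates, but it would not keep the strings in the order they first appear.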
Votes: 3

Stack Overflow user

Posted on 2020-08-02 00:22:03

import ast

data = []
a_dict = {}
unique = []

# The file is assumed to be named a.txt here.
# Each line is parsed with ast.literal_eval (a safer alternative to eval for literal data),
# so data ends up holding one (name, list of strings) tuple per non-empty line of the file.
with open('a.txt', 'r') as file:
    for line in file:
        if line.strip():
            data.append(ast.literal_eval(line.strip()))

# Collect every unique name, in order of first appearance
for i in data:
    if i[0] not in unique:
        unique.append(i[0])

# For each unique name, iterate over all of the data and gather the strings
# of every entry with a matching name into a_list.
# Once the inner loop finishes, the strings are deduplicated (keeping first occurrences)
# and stored in a_dict under that name; a_list then starts empty for the next name.
for i in unique:
    a_list = []
    for j in data:
        if i == j[0]:
            a_list.extend(j[1])
    a_dict[i] = list(dict.fromkeys(a_list))

# Print the data in the requested format
for i, j in a_dict.items():
    print("('{}', {})".format(i, j))
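The nested loops above scan the whole data list once per unique name. As a hedged alternative, the same grouping can be sketched as a single pass using a dict of dicts. The function name `merge_lines` and the `sample` lines are hypothetical and used only for illustration.

```python
import ast

def merge_lines(lines):
    """Merge "(name, [strings])" lines into one entry per name with unique strings."""
    merged = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, strings = ast.literal_eval(line)
        # Each name maps to a dict used as an ordered set of its strings
        bucket = merged.setdefault(name, {})
        bucket.update(dict.fromkeys(strings))
    return [(name, list(bucket)) for name, bucket in merged.items()]

# Hypothetical sample lines in the same format as the question's file
sample = [
    "('Linda', ['String14'])",
    "('Suzzane', ['String11', 'String36'])",
    "('Linda', ['String3', 'String14'])",
]
rows = merge_lines(sample)
for row in rows:
    print(row)
```

To run it on a real file, an open file object can be passed directly, for example `merge_lines(open('a.txt'))`, since iterating a file yields its lines.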
Votes: 1
The original content of this page is provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/62761394
