我有一个带有嵌入列表的文档结构,在我使用聚合管道展开它之后,我得到了以下内容。
注意:我在这个查询中使用Python和pymongo。
pipeline = [
{'$unwind': '$saved_alloys'},
{
'$project': {
'_id': 0,
'name': '$saved_alloys.name',
'compositions': '$saved_alloys.compositions'
}
}
]
res = db['alloys'].aggregate(pipeline)
for e in res:
print(e)输出(截断):
{
'name': 'alloy-1',
'compositions': [
{'symbol': 'C', 'weight': 0.36},
{'symbol': 'Mn', 'weight': 1.41}
{'symbol': 'Si', 'weight': 1.03},
{'symbol': 'Ni', 'weight': 1.7}
]
}
{
'name': 'alloy-2',
'compositions': [
{'symbol': 'C', 'weight': 0.21},
{'symbol': 'Mn', 'weight': 0.23},
{'symbol': 'Si', 'weight': 0.86},
{'symbol': 'Ni', 'weight': 0.67},
{'symbol': 'Cr', 'weight': 0.12},
]
}
...我试图通过将构图的维度减少到笛卡尔平面来对此进行一些数据分析。因此,我想通过执行pd.DataFrame(list(res))将其放入熊猫DataFrame中。
我想要得到一个具有以下结构的表:
{
"name": "alloy-1",
"C": 0.36,
"Mn": 1.41,
"Si": 1.03,
"Ni": 1.7,
"Cr": 0.0
},
{
"name": "alloy-2",
"C": 0.21,
"Mn": 0.23,
"Si": 0.86,
"Ni": 0.67,
"Cr": 0.12
}请注意,compositions列表可以具有可变大小,因此对于那些不在列表中的元素,我希望添加它们,但weights值为零(就像alloy-1中的Cr一样)。
提前感谢您的帮助。
发布于 2019-10-20 18:34:29
毫无疑问,这是可以优化的,但作为一种开始的简单方法,为返回的每个输出构造一个pandas序列,并将其附加到DataFrame;最后将任何“缺少”的值替换为0.0。
from pymongo import MongoClient
import pandas as pd
import numpy as np
db = MongoClient()["mydatabase"]
db.alloys.insert_one({
'saved_alloys': [{
'name': 'alloy-1',
'compositions': [
{'symbol': 'C', 'weight': 0.36},
{'symbol': 'Mn', 'weight': 1.41},
{'symbol': 'Si', 'weight': 1.03},
{'symbol': 'Ni', 'weight': 1.7}
]
},
{
'name': 'alloy-2',
'compositions': [
{'symbol': 'C', 'weight': 0.21},
{'symbol': 'Mn', 'weight': 0.23},
{'symbol': 'Si', 'weight': 0.86},
{'symbol': 'Ni', 'weight': 0.67},
{'symbol': 'Cr', 'weight': 0.12},
]
}]
}
)
pipeline = [
{'$unwind': '$saved_alloys'},
{
'$project': {
'_id': 0,
'name': '$saved_alloys.name',
'compositions': '$saved_alloys.compositions'
}
}
]
res = db['alloys'].aggregate(pipeline)
df = pd.DataFrame()
for alloy in res:
ser = pd.Series()
# Set the series name as the alloy
ser.name = alloy['name']
for composition in alloy['compositions']:
# Add in each alloy to the series
ser.at[composition['symbol']] = composition['weight']
df = df.append(ser)
# Once we have our DataFrame, replace any missing values with 0.0
df = df.replace(np.nan, 0.0)
print(df)结果:
C Mn Ni Si Cr
alloy-1 0.36 1.41 1.70 1.03 0.00
alloy-2 0.21 0.23 0.67 0.86 0.12https://stackoverflow.com/questions/58469999
复制相似问题