
Q: How to extract dictionary keys and set them as column headers in a Pandas DataFrame

Stack Overflow user

Asked on 2016-10-20 16:57:20

Answers: 1 · Views: 3K · Followers: 0 · Votes: 0

This question is an offshoot of How to remove curly braces, apostrophes and square brackets from dictionaries in a Pandas dataframe (Python).

I have the following data in a CSV file:

try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3
import pandas as pd

the_data = """
ABC,2016-6-9 0:00,95,"{'//PurpleCar': [115L], '//YellowCar': [403L]}","{'GBP/NOK PAWS': [151L], 'CAD/EUR': [41L], 'EDM8-EDM9': [1833L]}"   
ABC,2016-6-10 0:00,0,"{'//PurpleCar': [219L], '//YellowCar': [381L]}","{'FBTPM5 2015-06-08': [472L], 'HKD/MXN': [0L], 'AUD/SEK DEWS': [19482L]}"   
ABC,2016-6-11 0:00,0,"{'//PurpleCar': [572L], '//YellowCar': [184L]}","{'V 2.000 03/31/25': [759L], 'AUD/JPY': [742L], 'AUD/SEK PAWS': [1784L]}"   
ABC,2016-6-12 0:00,0,"{'//PurpleCar': [80L], '//YellowCar': [2011L]}","{'CAR/FIN SWAP': [151L], 'HKD/MXN': [41L], 'RU4': [5829L]}"   
ABC,2016-6-13 0:00,0,"{'//PurpleCar': [32L], '//YellowCar': [15L]}","{'TRY/CHY OIS': [673L], 'NZD/MXN': [582L], 'AUD/SEK PAPS': [4846242L]}"   
DEF,2016-6-9 0:00,0,"{'//PurpleCar': [19L], '//BlackCar': [17L]}","{'ULM5-ULU5 2015-06-19': [18L], 'HKD/MXN': [64L], 'USD/JPY OPTS': [14714L]}"   
DEF,2016-6-10 0:00,0,"{'//PurpleCar': [32L], '//BlackCar': [15L]}","{'U 4.500 2/15/14': [151L], 'FVU6-FVZ6 2016-09-30': [194], 'AUD/SEK': [0L]}"   
DEF,2016-6-11 0:00,0,"{'//PurpleCar': [32L], '//BlackCar': [15L]}","{'EUR/JPY': [158L], 'ARS/MXN': [562L], 'GBP/JPY PAWS': [1759L]}"   
DEF,2016-6-12 0:00,0,"{'//PurpleCar': [28L], '//BlackCar': [96L]}","{'GBP/NOK OIS': [319], 'HKD/SAG': [103L], 'USD/INR': [3L]}"  
DEF,2016-6-13 0:00,0,"{'//PurpleCar': [32L], '//BlackCar': [15L]}","{'TNM6 2016-06-21': [193], 'EDH9': [1713L], 'GZ5': [0]}"
"""

As the first row of this new dataset shows, there are two dictionaries, each wrapped in double quotes and separated by a comma:

"{'//PurpleCar': [115L], '//YellowCar': [403L]}"

and

"{'GBP/NOK PAWS': [151L], 'CAD/EUR': [41L], 'EDM8-EDM9': [1833L]}"

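(The original question I asked in How to remove curly braces, apostrophes and square brackets from dictionaries in a Pandas dataframe (Python) involved only ONE dictionary.)

Also note that in this new dataset, the keys of the second dictionary can be basically anything.

I read the data with the following code. The first three columns are fixed and we leave them as they are. The fourth column ("Cars_str") I parse with ast.literal_eval, because it is a dict: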

import ast
import pandas as pd

fixed_columns = pd.read_csv(StringIO(the_data),
                            names=["Company", "Date", "Value", "Cars_str",
                                   "Currency_str"])


cars = fixed_columns["Cars_str"].apply(ast.literal_eval)
del fixed_columns["Cars_str"]
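The snippets in this question are Python 2 code: `from StringIO import StringIO` and long literals such as `115L` only exist there, and under Python 3 `ast.literal_eval` raises `SyntaxError` on the `L` suffix. A minimal sketch of one workaround, assuming every `L` to remove sits directly after a digit and directly before `]`, `,` or `}` (the helper name `parse_py2_dict` is hypothetical):

```python
import ast
import re

import pandas as pd

# Assumption: a Python 2 long suffix is an "L" right after a digit and right
# before "]", "," or "}". An "L" inside a quoted key such as 'AB5L' is
# followed by a quote, so it is left alone.
LONG_SUFFIX = re.compile(r"(?<=\d)L(?=\s*[\],}])")


def parse_py2_dict(text):
    """Parse a Python 2 dict literal string such as "{'//PurpleCar': [115L]}"."""
    return ast.literal_eval(LONG_SUFFIX.sub("", text.strip()))


raw = pd.Series([
    "{'//PurpleCar': [115L], '//YellowCar': [403L]}",
    "{'GBP/NOK PAWS': [151L], 'CAD/EUR': [41L], 'EDM8-EDM9': [1833L]}",
])
parsed = raw.apply(parse_py2_dict)
print(parsed[0])  # {'//PurpleCar': [115], '//YellowCar': [403]}
```

In the question's code this would stand in for `ast.literal_eval` inside `fixed_columns["Cars_str"].apply(...)`, together with `from io import StringIO` instead of the Python 2 module. A regex is a blunt tool here; it is only safe as long as the data keeps this simple shape.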

Next, we prepare functions to process the dict's keys and values.

def get_single_item(list_that_always_has_single_item):
    v, = list_that_always_has_single_item
    return v

def extract_car_name(car_str):
    assert car_str.startswith("//"), car_str
    return car_str[2:]
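Both helpers are deliberately strict: the one-element unpacking `v, = ...` raises `ValueError` for any list that does not hold exactly one value, and the `assert` rejects car keys without the `//` prefix. A quick check of that behaviour, with the two functions copied from above and sample values taken from the first row:

```python
def get_single_item(list_that_always_has_single_item):
    # "v, = seq" unpacks a one-element sequence; any other length raises
    # ValueError, so an unexpected row fails loudly instead of being truncated.
    v, = list_that_always_has_single_item
    return v


def extract_car_name(car_str):
    # Car keys look like "//PurpleCar"; drop the two leading slashes.
    assert car_str.startswith("//"), car_str
    return car_str[2:]


print(get_single_item([115]))           # 115
print(extract_car_name("//PurpleCar"))  # PurpleCar

try:
    get_single_item([115, 403])
except ValueError as exc:
    print("rejected:", exc)
```

Note that `assert` statements are skipped when Python runs with `-O`, so if the prefix check matters outside of exploration, an explicit `if ...: raise ValueError(...)` is the safer choice.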

Then we apply these functions to build one pd.Series per row.

dynamic_columns = cars.apply(
    lambda x: pd.Series({
            extract_car_name(k): get_single_item(v) 
            for k, v in x.items()
    }))
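This step works because, when the function passed to `Series.apply` returns a `pd.Series`, pandas assembles the per-row results into a `DataFrame`: the union of the returned labels becomes the columns, and a row that lacks a label gets `NaN` there. A reduced sketch on two hand-made rows (already parsed, `L` suffixes dropped):

```python
import pandas as pd

# Two already-parsed rows shaped like the "cars" Series.
cars = pd.Series([
    {"//PurpleCar": [115], "//YellowCar": [403]},
    {"//PurpleCar": [19], "//BlackCar": [17]},
])

# Each call returns a Series keyed by car name; k[2:] and v[0] stand in for
# extract_car_name and get_single_item to keep the sketch short.
dynamic_columns = cars.apply(
    lambda d: pd.Series({k[2:]: v[0] for k, v in d.items()})
)
print(dynamic_columns)
```

Row 0 has no `BlackCar` and row 1 has no `YellowCar`, so those cells come out as `NaN`, exactly as in the result table further down.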

Finally, we add the columns to the DataFrame:

result = pd.concat([fixed_columns, dynamic_columns], axis=1)
result

This gives us the following:

    Company Date    Value   Currency_str                                        BlackCar    PurpleCar   YellowCar
0   ABC 2016-6-9 0:00   95  {'GBP/NOK PAWS': [151L], 'CAD/EUR': [41L], 'ED...   NaN         115.0       403.0
1   ABC 2016-6-10 0:00  0   {'FBTPM5 2015-06-08': [472L], 'HKD/MXN': [0L],...   NaN         219.0       381.0
2   ABC 2016-6-11 0:00  0   {'V 2.000 03/31/25': [759L], 'AUD/JPY': [742L]...   NaN         572.0       184.0
3   ABC 2016-6-12 0:00  0   {'CAR/FIN SWAP': [151L], 'HKD/MXN': [41L], 'RU...   NaN         80.0        2011.0
4   ABC 2016-6-13 0:00  0   {'TRY/CHY OIS': [673L], 'NZD/MXN': [582L], 'AU...   NaN         32.0        15.0
5   DEF 2016-6-9 0:00   0   {'ULM5-ULU5 2015-06-19': [18L], 'HKD/MXN': [64...   17.0        19.0        NaN
6   DEF 2016-6-10 0:00  0   {'U 4.500 2/15/14': [151L], 'FVU6-FVZ6 2016-09...   15.0        32.0        NaN
7   DEF 2016-6-11 0:00  0   {'EUR/JPY': [158L], 'ARS/MXN': [562L], 'GBP/JP...   15.0        32.0        NaN
8   DEF 2016-6-12 0:00  0   {'GBP/NOK OIS': [319], 'HKD/SAG': [103L], 'USD...   96.0        28.0        NaN
9   DEF 2016-6-13 0:00  0   {'TNM6 2016-06-21': [193], 'EDH9': [1713L], 'G...   15.0        32.0        NaN
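In this output the car counts are shown as 115.0, 219.0 and so on: the expanded columns are float64, and a plain int64 column cannot hold the `NaN` that appears wherever a row lacks a car. If integers are preferred, one option (not part of the original post; it needs pandas 0.24 or later) is the nullable `Int64` dtype. A sketch on a hand-built excerpt of rows 0 and 5:

```python
import numpy as np
import pandas as pd

# Excerpt of rows 0 and 5 of the result above, stored as float64 like the
# original output (a plain int64 column cannot hold NaN).
dynamic_columns = pd.DataFrame({
    "BlackCar": [np.nan, 17.0],
    "PurpleCar": [115.0, 19.0],
    "YellowCar": [403.0, np.nan],
})

# The nullable "Int64" dtype (capital I) keeps whole numbers as integers and
# shows missing entries as <NA> instead of forcing a float column.
as_int = dynamic_columns.astype("Int64")
print(as_int)
print(as_int.dtypes)
```

The conversion only succeeds when every non-missing float is a whole number, which holds for these counts.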

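The problem I have is that I want to take the 'Currency_str' column and do the following:

1) Extract the keys and set them as column headers in the DataFrame.

2) Keep the values associated with those keys as the row elements.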

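This is exactly what we did with the 'Cars_str' dictionary (and what the solution I accepted in How to remove curly braces, apostrophes and square brackets from dictionaries in a Pandas dataframe (Python) does).

Essentially, I want to apply the code above to both dictionary columns, with the keys as column headers and the values as row elements.

Could someone help me modify the code so that we can accomplish points 1 and 2 above?

Thanks!

UPDATE -> Solution:

I found a working solution. Here it is: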
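A minimal sketch of a reusable helper that applies the same expansion to any dict column, so that 'Cars_str' and 'Currency_str' go through one code path. The name `expand_dict_column` and its `key_func` parameter are hypothetical, and the two sample rows are taken from the question with the Python 2 `L` suffixes removed so that `ast.literal_eval` also works under Python 3:

```python
import ast
from io import StringIO

import pandas as pd


def expand_dict_column(df, column, key_func=lambda k: k):
    """Replace a column of dict-literal strings with one column per key.

    Hypothetical helper: every cell is parsed with ast.literal_eval, every
    value is assumed to be a one-element list, and key_func renames the keys.
    """
    parsed = df[column].apply(ast.literal_eval)
    # "(v,)" unpacks the one-element list and raises ValueError otherwise.
    rows = [{key_func(k): v for k, (v,) in d.items()} for d in parsed]
    expanded = pd.DataFrame(rows, index=df.index)
    return pd.concat([df.drop(columns=column), expanded], axis=1)


# Two rows from the question, with the Python 2 "L" suffixes removed.
data = """ABC,2016-6-9 0:00,95,"{'//PurpleCar': [115], '//YellowCar': [403]}","{'GBP/NOK PAWS': [151], 'CAD/EUR': [41], 'EDM8-EDM9': [1833]}"
DEF,2016-6-9 0:00,0,"{'//PurpleCar': [19], '//BlackCar': [17]}","{'ULM5-ULU5 2015-06-19': [18], 'HKD/MXN': [64], 'USD/JPY OPTS': [14714]}"
"""

df = pd.read_csv(StringIO(data),
                 names=["Company", "Date", "Value", "Cars_str", "Currency_str"])
df = expand_dict_column(df, "Cars_str",
                        key_func=lambda k: k[2:] if k.startswith("//") else k)
df = expand_dict_column(df, "Currency_str")  # instrument keys kept as-is
print(df)
```

Building the new columns from a list of dicts with `pd.DataFrame` avoids creating one `pd.Series` per row; keys missing from a row still become `NaN`, and passing `index=df.index` keeps the rows aligned for the final `pd.concat`.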

import ast
import pandas as pd

fixed_columns = pd.read_csv(StringIO(the_data),
                            names=["Company", "Date", "Value", "Cars_str",
                                   "Currency_str"])


cars = fixed_columns["Cars_str"].apply(ast.literal_eval)
del fixed_columns["Cars_str"]

currencies = fixed_columns["Currency_str"].apply(ast.literal_eval)
del fixed_columns["Currency_str"]

def get_single_item(list_that_always_has_single_item):
    v, = list_that_always_has_single_item
    return v

def extract_car_name(car_str):
    assert car_str.startswith("//"), car_str
    return car_str[2:]

def extract_instrument_name(currency_str):
    # Instrument keys carry no "//" prefix, so keep them unchanged; slicing off
    # two characters would map 'HKD/MXN' and 'NZD/MXN' to one column 'D/MXN'
    return currency_str


dynamic_column_01 = cars.apply(
    lambda x: pd.Series({
            extract_car_name(k): get_single_item(v) 
            for k, v in x.items()
    }))

dynamic_column_02 = currencies.apply(
    lambda x: pd.Series({
            extract_instrument_name(k): get_single_item(v) 
            for k, v in x.items()
    }))


result = pd.concat([fixed_columns, dynamic_column_01, dynamic_column_02], axis=1)
result
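The final `pd.concat(..., axis=1)` lines rows up by index label, not by position. It gives the right result here because `dynamic_column_01` and `dynamic_column_02` are built from `cars` and `currencies`, which keep the index of `fixed_columns`. A toy sketch of what happens when the indexes differ:

```python
import pandas as pd

fixed = pd.DataFrame({"Company": ["ABC", "DEF"]})  # index 0, 1
dynamic = pd.DataFrame({"PurpleCar": [115, 19]})   # index 0, 1

# Same index labels: rows are matched one-to-one.
aligned = pd.concat([fixed, dynamic], axis=1)

# Different labels (1, 2): concat takes the union of the indexes and fills
# the gaps with NaN, so the result has three rows instead of two.
misaligned_part = dynamic.copy()
misaligned_part.index = [1, 2]
misaligned = pd.concat([fixed, misaligned_part], axis=1)

print(aligned)
print(misaligned)
```

If one of the frames has been filtered or re-indexed along the way, calling `reset_index(drop=True)` on each of them before concatenating is one way to get positional alignment back.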

Tags: python, pandas, dictionary, dataframe

1 Answer

Stack Overflow user

Accepted answer

Posted on 2016-10-21 16:40:20

Here is the answer:

import ast
import pandas as pd

fixed_columns = pd.read_csv(StringIO(the_data),
                            names=["Company", "Date", "Value", "Cars_str",
                                   "Currency_str"])


cars = fixed_columns["Cars_str"].apply(ast.literal_eval)
del fixed_columns["Cars_str"]

currencies = fixed_columns["Currency_str"].apply(ast.literal_eval)
del fixed_columns["Currency_str"]

def get_single_item(list_that_always_has_single_item):
    v, = list_that_always_has_single_item
    return v

def extract_car_name(car_str):
    assert car_str.startswith("//"), car_str
    return car_str[2:]

def extract_instrument_name(currency_str):
    # Instrument keys carry no "//" prefix, so keep them unchanged; slicing off
    # two characters would map 'HKD/MXN' and 'NZD/MXN' to one column 'D/MXN'
    return currency_str


dynamic_column_01 = cars.apply(
    lambda x: pd.Series({
            extract_car_name(k): get_single_item(v) 
            for k, v in x.items()
    }))

dynamic_column_02 = currencies.apply(
    lambda x: pd.Series({
            extract_instrument_name(k): get_single_item(v) 
            for k, v in x.items()
    }))


result = pd.concat([fixed_columns, dynamic_column_01, dynamic_column_02], axis=1)
result

Votes: 0
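Because the instrument keys have no common prefix, whatever renaming function is applied to them (such as `extract_instrument_name`) decides whether distinct keys end up in the same column. Removing the first two characters, for example, turns both 'HKD/MXN' and 'NZD/MXN' into 'D/MXN', so their values would silently share one column. A hypothetical check like the sketch below can catch that before the columns are built:

```python
def find_key_collisions(keys, key_func):
    """Return {new_name: sorted original keys} for names produced by 2+ keys.

    Hypothetical helper for checking that a key-renaming function used to
    build column headers does not merge distinct dictionary keys.
    """
    groups = {}
    for key in set(keys):
        groups.setdefault(key_func(key), []).append(key)
    return {new: sorted(old) for new, old in groups.items() if len(old) > 1}


# Instrument keys taken from the question's Currency_str column.
keys = ["HKD/MXN", "NZD/MXN", "AUD/JPY", "USD/INR"]

print(find_key_collisions(keys, lambda k: k[2:]))  # {'D/MXN': ['HKD/MXN', 'NZD/MXN']}
print(find_key_collisions(keys, lambda k: k))      # {}
```

On the real data the key list could be collected with `{k for d in currencies for k in d}`, where `currencies` is the parsed Series from the code above.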

Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/40160268
