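I have working code that classifies data based on rules stored in a dictionary of lists. I would like to know whether the nested for loops can be eliminated, using list/dictionary comprehensions or .values(), to make the code more efficient.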
import pandas as pd
df = pd.DataFrame({
    'Animals': ['Python', 'Anaconda', 'Viper', 'Cardinal', 'Trout',
                'Robin', 'Bass', 'Salmon', 'Turkey', 'Chicken'],
    'Noise': ['Hiss', 'SSS', 'Hisss', 'Chirp', 'Splash',
              'Chirp', 'Gulp', 'Splash', 'Gobble', 'Cluck'],
})
snakenoise = ['Hiss', 'SSS', 'Hisss', 'Wissss', 'tseee']
birdnoise = ['Chirp', 'squeak', 'Cluck', 'Gobble']
fishnoise = ['Splash', 'Gulp', 'Swim']
AnimalDex = {'Snake': ['0', 'slither', snakenoise],
             'Bird': ['2', 'fly', birdnoise],
             'Fish': ['0', 'swim', fishnoise],
             }
df['movement'] = ''
for key, value in AnimalDex.items():
    for i in range(len(AnimalDex[key][2])):
        df.loc[df.Noise.str.contains(AnimalDex[key][2][i]), 'movement'] = AnimalDex[key][1]
print(df)

Here is the output:
    Animals   Noise  movement
0    Python    Hiss   slither
1  Anaconda     SSS   slither
2     Viper   Hisss   slither
3  Cardinal   Chirp       fly
4     Trout  Splash      swim
5     Robin   Chirp       fly
6      Bass    Gulp      swim
7    Salmon  Splash      swim
8    Turkey  Gobble       fly
9   Chicken   Cluck       fly

Posted on 2015-04-28 18:44:10
You can simplify your loops considerably if you iterate over just the values instead of the keys and indices.

for animal in AnimalDex.values():
    for value in animal[2]:
        df.loc[df.Noise.str.contains(value), 'movement'] = animal[1]

Posted on 2015-04-28 20:22:59
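Efficiency does not come from rewriting the loops as comprehensions, because comprehensions mainly offer nicer syntax for loops. What matters instead is the efficiency of the data-structure lookups. The problem is that df.Noise.str.contains(AnimalDex[key][2][i]) performs a brute-force match: every pattern is searched for in every row.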
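To make the brute-force point concrete, here is a minimal sketch using a four-row subset of the question's Noise column. The variable names `noise`, `substring_hits`, and `exact_hits` are illustrative and not part of the original code. It shows that `str.contains` scans the whole column for one pattern at a time and reports substring hits, whereas an exact comparison (which is what a keyed lookup amounts to) only matches identical strings.

```python
import pandas as pd

# A small subset of the question's Noise column (illustrative only).
noise = pd.Series(['Hiss', 'SSS', 'Hisss', 'Chirp'])

# str.contains treats its argument as a regular expression and scans every
# row for it, so the pattern 'Hiss' also flags 'Hisss'.
substring_hits = noise.str.contains('Hiss')

# An exact comparison only matches the identical string.
exact_hits = noise == 'Hiss'

print(substring_hits.tolist())  # [True, False, True, False]
print(exact_hits.tolist())      # [True, False, False, False]
```

With the question's rules, the nested loop performs one such full-column scan for each of the 12 listed noises. Note also that switching to exact lookups changes the semantics slightly: a noise that merely contains a listed pattern would no longer match.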
If your goal is to merge the movement defined in AnimalDex into df, joining on the noise, then it is worth building a dictionary that maps noise to movement:

noise_to_movement = {}
for order in AnimalDex.values():
    for noise in order[2]:
        noise_to_movement[noise] = order[1]

For comparison, here is another way to construct noise_to_movement, using an incomprehensible comprehension:

import itertools
noise_to_movement = dict(itertools.chain(*[
    list(itertools.product(order[2], [order[1]]))
    for order in AnimalDex.values()
]))

Whichever approach you take, once the dictionary is built, setting the 'movement' column becomes trivial:

df['movement'] = [noise_to_movement[n] for n in df.Noise]

Posted on 2015-04-28 20:29:29
To really improve performance, you should not iterate over the dictionary at all. Instead, build a pandas.DataFrame from the data and join the two DataFrames.
import pandas as pd

df = pd.DataFrame({
    'Animals': ['Python', 'Anaconda', 'Viper', 'Cardinal', 'Trout',
                'Robin', 'Bass', 'Salmon', 'Turkey', 'Chicken'],
    'Noise': ['Hiss', 'SSS', 'Hisss', 'Chirp', 'Splash',
              'Chirp', 'Gulp', 'Splash', 'Gobble', 'Cluck'],
})
snakenoise = ['Hiss', 'SSS', 'Hisss', 'Wissss', 'tseee']
birdnoise = ['Chirp', 'squeak', 'Cluck', 'Gobble']
fishnoise = ['Splash', 'Gulp', 'Swim']
noises = [(snakenoise, 'Snake', '0', 'slither'),
          (birdnoise, 'Bird', '2', 'fly'),
          (fishnoise, 'Fish', '0', 'swim')]
animal_dex = {'Animal Type': [],
              'Whatever': [],
              'Movement': [],
              'Noise': []}
for noise in noises:
    # Repeat each attribute once per noise so all columns stay the same length.
    animal_dex['Noise'] += noise[0]
    animal_dex['Animal Type'] += [noise[1]] * len(noise[0])
    animal_dex['Whatever'] += [noise[2]] * len(noise[0])
    animal_dex['Movement'] += [noise[3]] * len(noise[0])
df1 = pd.DataFrame(animal_dex)
df = df.merge(df1, on='Noise')
df
    Animals   Noise  Animal Type  Movement  Whatever
0    Python    Hiss        Snake   slither         0
1  Anaconda     SSS        Snake   slither         0
2     Viper   Hisss        Snake   slither         0
3  Cardinal   Chirp         Bird       fly         2
4     Robin   Chirp         Bird       fly         2
5     Trout  Splash         Fish      swim         0
6    Salmon  Splash         Fish      swim         0
7      Bass    Gulp         Fish      swim         0
8    Turkey  Gobble         Bird       fly         2
9   Chicken   Cluck         Bird       fly         2

Source: https://stackoverflow.com/questions/29916372
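As a possible variation on the merge-based answer, the sketch below builds the lookup table with a comprehension and joins with how='left', so rows whose noise has no rule are kept rather than dropped. The 'Cow'/'Moo' row is made up for illustration, and the names `lookup`, `merged`, and `movement_by_noise` are assumptions, not part of the original code. This is one way to do it under those assumptions, not the answerer's method.

```python
import pandas as pd

df = pd.DataFrame({
    'Animals': ['Python', 'Cardinal', 'Trout', 'Cow'],
    'Noise': ['Hiss', 'Chirp', 'Splash', 'Moo'],  # 'Moo' is a made-up noise with no rule
})

AnimalDex = {
    'Snake': ['0', 'slither', ['Hiss', 'SSS', 'Hisss', 'Wissss', 'tseee']],
    'Bird': ['2', 'fly', ['Chirp', 'squeak', 'Cluck', 'Gobble']],
    'Fish': ['0', 'swim', ['Splash', 'Gulp', 'Swim']],
}

# One row per (noise, animal type, movement), built with a comprehension.
lookup = pd.DataFrame(
    [(noise, animal, movement)
     for animal, (_, movement, noises) in AnimalDex.items()
     for noise in noises],
    columns=['Noise', 'Animal Type', 'Movement'],
)

# how='left' keeps every row of df in its original order;
# noises without a rule get NaN in the joined columns.
merged = df.merge(lookup, on='Noise', how='left')
print(merged)

# If only the movement is needed, Series.map with a plain dict does the same
# exact-match lookup without a merge; unknown noises also become NaN.
movement_by_noise = dict(zip(lookup['Noise'], lookup['Movement']))
df['movement'] = df['Noise'].map(movement_by_noise)
print(df)
```

Unlike indexing the dictionary directly, which raises a KeyError for an unknown noise, both the left merge and `Series.map` leave unmatched rows in place with a missing value.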