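I have a dataframe with 70 million rows, and I'm trying to add a column to it using `apply` with the help of the swifter library.
swifter library: https://github.com/jmcarpenter2/swifter/blob/master/README.md
When I try to run it, I get this error:
'Level must be same as name (None)'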
```python
import pandas as pd
import swifter  # registers the .swifter accessor on pandas objects

# my function
def alert(c):
    if c.count(" ") == 0:
        return 'ngram1'
    elif c.count(" ") == 1:
        return 'ngram2'
    elif c.count(" ") == 2:
        return 'ngram3'
    else:
        return 'NotAvailable'

all_dfs['ngram'] = all_dfs["word"].swifter.apply(alert, axis=1)

# sample dataframe
df = pd.DataFrame({'word': ["abc", "abd cds", "abc cds fgh"], 'freq': [5, 6, 7], "doc": ["666", "5555", "333"]})
```

The expected output is a new column filled with the matching label, but I get the error above instead.
My guess is that swifter can only handle numeric columns.
Any other approach would be appreciated.
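Since other approaches are welcome, here is a minimal sketch in plain pandas (no swifter). Two points, both hedged. First, `Series.apply` already calls the function once per element, so no `axis` argument is needed; extra keyword arguments such as `axis=1` are forwarded to `alert` itself. Second, one plausible reading of the error (not confirmed here) is that swifter first tries calling the function on the whole Series, where `c.count(" ")` becomes `Series.count(" ")`, whose first parameter in pandas versions of that era was `level`. The vectorized version below avoids per-row Python calls entirely by counting spaces with the `.str` accessor and mapping counts to labels. The fourth row is a hypothetical extra example.

```python
import pandas as pd

def alert(c):
    # Same logic as the question: label by the number of literal spaces.
    if c.count(" ") == 0:
        return "ngram1"
    elif c.count(" ") == 1:
        return "ngram2"
    elif c.count(" ") == 2:
        return "ngram3"
    else:
        return "NotAvailable"

df = pd.DataFrame({
    # The last row is hypothetical, added to exercise the "NotAvailable" branch.
    "word": ["abc", "abd cds", "abc cds fgh", "a b c d"],
})

# 1) Element-wise apply on a Series: no axis argument.
df["ngram"] = df["word"].apply(alert)

# 2) Vectorized alternative: count spaces for the whole column at once,
#    map 0/1/2 to labels; any other count maps to NaN, filled with "NotAvailable".
labels = {0: "ngram1", 1: "ngram2", 2: "ngram3"}
df["ngram_vec"] = df["word"].str.count(" ").map(labels).fillna("NotAvailable")

print(df)
```

Both columns hold the same labels; the vectorized form is usually the better fit for 70 million rows because the counting runs inside pandas rather than through a Python function call per row.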
Posted on 2019-03-31 05:00:26
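I think this has to do with the `count` method. I tried your code on the `freq` field and it didn't work either.
However, this gets your expected result on the small example: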
```python
import string
import swifter  # needed for the .swifter accessor

# alert() is the function from the question, applied element-wise to the Series
df['ngram'] = df["word"].apply(alert)

def alert_1(s):
    # Count whitespace-separated tokens that are alphabetic once
    # surrounding punctuation is stripped
    ng = sum(i.strip(string.punctuation).isalpha() for i in s.split())
    if ng == 1:
        return 'ngram1'
    elif ng == 2:
        return 'ngram2'
    elif ng == 3:
        return 'ngram3'
    else:
        return 'NotAvailable'

df.loc[:, "ngram_2"] = df["word"].swifter.apply(alert_1)
df
```
```
          word  freq   doc   ngram ngram_2
0          abc     5   666  ngram1  ngram1
1      abd cds     6  5555  ngram2  ngram2
2  abc cds fgh     7   333  ngram3  ngram3
```

Let me know if this works on your large dataset.
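Before switching to `alert_1` on the full dataset, note that it does not count quite the same thing as `alert`: `alert` counts literal space characters, while `alert_1` counts whitespace-separated alphabetic tokens. The sketch below rewrites both with a dict lookup for brevity (behaviour unchanged) and runs them on a few hypothetical inputs where they can disagree.

```python
import string

def alert(c):
    # Question's version: label by the number of literal " " characters.
    n = c.count(" ")
    return {0: "ngram1", 1: "ngram2", 2: "ngram3"}.get(n, "NotAvailable")

def alert_1(s):
    # Answer's version: label by the number of whitespace-separated tokens
    # that are alphabetic after stripping surrounding punctuation.
    ng = sum(tok.strip(string.punctuation).isalpha() for tok in s.split())
    return {1: "ngram1", 2: "ngram2", 3: "ngram3"}.get(ng, "NotAvailable")

# Hypothetical inputs chosen to show where the two versions differ.
for text in ["abc cds", "abc  cds", " abc", "abc, cds!", "abc 123"]:
    print(repr(text), alert(text), alert_1(text))
```

A double space ("abc  cds") gives `ngram3` with `alert` but `ngram2` with `alert_1`, and a numeric token ("abc 123") is not counted by `alert_1` at all, so pick the version whose definition of an n-gram matches your data.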
https://stackoverflow.com/questions/55435380