I want to use a loop to remove the word DISCONTINUED from a column, and then train a model on the cleaned data for further use. The code I tried:
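# 1.
for i in df['DESCRIPTION'][0]:
    if i[0]== 'DISCONTINUED':
        df.i[0].pop(0)
# 2.
for item in df['DESCRIPTION']:
    if str(item)[0]=='DISCONTINUED':
        df.remove(item[0])
Note: df is the name of the dataset and DESCRIPTION is the column name. The column has dtype object. I tried converting it to str, but that did not work.
The values in the DESCRIPTION column are: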
data = {'DESCRIPTION':
['ANDREW, 245173, 1/2-1/2 COLD SHRINK KIT, CEQ.24038',
'COMMSCOPE, 245174, 1/2, 3/8 COLDSRINK WTHRPRFNG KIT, CEQ.24753',
'DISCONTINUED, COMMSCOPE, 252107, LACE UP I-LINE HOISTING GRIP FOR 1/2 CABLES',
'COMMSCOPE, 252110, LACE UP HOISTING GRIP FOR 1-1/4 COAX & EW63/64 WAVEGUIDE',
'ANDREW, 252111, 1-5/8 HOISTING GRIP, LACE UP']}
I want to remove DISCONTINUED from a column whose values each contain multiple comma-separated items.
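Answered on 2022-10-16 08:43:49
Your initial attempts fail partly because you treat the values in df.DESCRIPTION as lists containing strings, when they are simply strings. For example: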
print(type(df['DESCRIPTION'][0]))
<class 'str'>
In this case, you can use Series.replace:
Data
import pandas as pd
# just adding some extra `DISCONTINUED` in `3, 4`
data = {'DESCRIPTION': ['ANDREW, 245173, 1/2-1/2 COLD SHRINK KIT, CEQ.24038',
'COMMSCOPE, 245174, 1/2, 3/8 COLDSRINK WTHRPRFNG KIT, CEQ.24753',
'DISCONTINUED, COMMSCOPE, 252107, LACE UP I-LINE HOISTING GRIP FOR 1/2 CABLES',
'COMMSCOPE, 252110, DISCONTINUED, LACE UP HOISTING GRIP FOR 1-1/4 COAX & EW63/64 WAVEGUIDE',
'ANDREW, 252111, 1-5/8 HOISTING GRIP, LACE UP, DISCONTINUED']}
df = pd.DataFrame(data)
print(df)
DESCRIPTION
0 ANDREW, 245173, 1/2-1/2 COLD SHRINK KIT, CEQ.24038
1 COMMSCOPE, 245174, 1/2, 3/8 COLDSRINK WTHRPRFNG KIT, CEQ.24753
2 DISCONTINUED, COMMSCOPE, 252107, LACE UP I-LINE HOISTING GRIP FOR 1/2 CABLES
3 COMMSCOPE, 252110, DISCONTINUED, LACE UP HOISTING GRIP FOR 1-1/4 COAX & EW63/64 WAVEGUIDE
4 ANDREW, 252111, 1-5/8 HOISTING GRIP, LACE UP, DISCONTINUED
# So, we have `DISCONTINUED` in `2` (at start), in `3` (third "elem"), and `4` (at end).
Code
df.DESCRIPTION = df.DESCRIPTION.replace(r',?\s?DISCONTINUED,?\s?','', regex=True)
print(df)
DESCRIPTION
0 ANDREW, 245173, 1/2-1/2 COLD SHRINK KIT, CEQ.24038
1 COMMSCOPE, 245174, 1/2, 3/8 COLDSRINK WTHRPRFNG KIT, CEQ.24753
2 COMMSCOPE, 252107, LACE UP I-LINE HOISTING GRIP FOR 1/2 CABLES
3 COMMSCOPE, 252110LACE UP HOISTING GRIP FOR 1-1/4 COAX & EW63/64 WAVEGUIDE
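4 ANDREW, 252111, 1-5/8 HOISTING GRIP, LACE UP
Explanation of the regex ,?\s?DISCONTINUED,?\s?
? matches the preceding token zero or one times. Here it is applied to a potential comma (,) and whitespace character (\s) before and/or after DISCONTINUED. Note that when DISCONTINUED sits in the middle of a value (row 3), the separators on both sides are removed, so the neighbouring items end up joined (252110LACE UP).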
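To make the behaviour of the pattern in each position easier to see, here is a minimal sketch using the standard-library re module on three shortened sample strings. The second pattern, STRICT, is an assumption of this sketch and not part of the answer: it removes the word together with only one adjacent separator, so a DISCONTINUED in the middle does not fuse its neighbours.

```python
import re

# Pattern from the answer: optional comma/space on both sides of the word.
LOOSE = re.compile(r',?\s?DISCONTINUED,?\s?')

# Hypothetical stricter variant (an assumption, not from the answer):
# remove the word with its trailing separator, or else with its leading one.
STRICT = re.compile(r'\bDISCONTINUED,\s*|,\s*DISCONTINUED\b')

samples = [
    'DISCONTINUED, COMMSCOPE, 252107',           # word at the start
    'COMMSCOPE, 252110, DISCONTINUED, LACE UP',  # word in the middle
    'ANDREW, 252111, LACE UP, DISCONTINUED',     # word at the end
]

loose = [LOOSE.sub('', s) for s in samples]
strict = [STRICT.sub('', s) for s in samples]

for before, a, b in zip(samples, loose, strict):
    print(f'{before!r}\n  loose : {a!r}\n  strict: {b!r}')
```

Either pattern string can be passed to Series.replace with regex=True, as in the line of code above. STRICT leaves a value unchanged if it consists of nothing but DISCONTINUED, since there is no separator to anchor on.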
An alternative approach could be:
df.DESCRIPTION = df.DESCRIPTION.apply(lambda row: ', '.join(
    [c for c in row.split(', ') if c != 'DISCONTINUED']))
Answered on 2022-10-16 08:12:28
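If your dataset is a matrix in which each row is a list and the first column may contain the word DISCONTINUED, then it should probably look like this:
for row in df['DESCRIPTION']:
    if row[0] == 'DISCONTINUED':
        # A DataFrame has no remove() method, so drop the first element of the row's list instead.
        row.pop(0)
The problem with the code you tried is that #1 only iterates over the first row of df, and #2 only checks the first character of the string created by str(item), so it will never equal DISCONTINUED. Changing str(item)[0] to str(item[0]) would also work in that case, but if the value is already a string there is no reason to "re-convert" it to a string.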
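A small sketch of why both attempts miss, using a plain list of strings as a stand-in for the column (the names rows, first_chars, single_char and cleaned are only for illustration). Iterating over a string yields single characters, and indexing a string with [0] yields its first character, so neither can ever equal the whole word. Splitting each value on ', ' first gives items that can actually be compared.

```python
# Stand-in for df['DESCRIPTION']: each value is one plain string.
rows = [
    'ANDREW, 245173, 1/2-1/2 COLD SHRINK KIT, CEQ.24038',
    'DISCONTINUED, COMMSCOPE, 252107, LACE UP I-LINE HOISTING GRIP FOR 1/2 CABLES',
]

# Attempt #1 loops over the characters of the first string only.
first_chars = [ch for ch in rows[0]][:3]

# Attempt #2 compares a single character with the whole word.
single_char = str(rows[1])[0]

# Loop-based version: split into items, drop the unwanted one, re-join.
cleaned = []
for row in rows:
    items = row.split(', ')
    kept = [item for item in items if item != 'DISCONTINUED']
    cleaned.append(', '.join(kept))

print(first_chars)   # ['A', 'N', 'D']
print(single_char)   # D
print(cleaned[1])
```

With a DataFrame, a list built this way from df['DESCRIPTION'] could be assigned back with df['DESCRIPTION'] = cleaned, assuming it keeps the same length and order as the column.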
https://stackoverflow.com/questions/74085472