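I have a dataset that mixes numeric and character data. I only want to extract the number together with the letter "W" (I don't need entries such as '2 x HDMI | 2 x USB').
In this example, that means values like 20 W, 30 W, and so on. Thanks for your help.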
import pandas as pd

v = ['2 x HDMI | 2 x USB', '20 W Speaker Output', '10 W Speaker Output',
     '20 W Speaker Output', '20 W Speaker Output',
     '20 W Speaker Output', '20 W Speaker Output', '20 Speaker Output',
     '20 W Speaker Output', '20 W Speaker Output',
     '30 W Speaker Output', '20 W Speaker Output',
     '20 W Speaker Output', '2 x HDMI | 2 x USB', '20 W Speaker Output',
     '20 Speaker Output', '24 W Speaker Output', '20 W Speaker Output']
df = pd.DataFrame({"col_1": v})

Posted on 2022-07-06 15:40:17
You can use a regular expression and a list comprehension to get what you want:
import re
import pandas as pd
v=['2 x HDMI | 2 x USB', '20 W Speaker Output', '10 W Speaker Output',
'20 W Speaker Output', '20 W Speaker Output',
'20 W Speaker Output', '20 W Speaker Output', '20 Speaker Output',
'20 W Speaker Output', '20 W Speaker Output',
'30 W Speaker Output', '20 W Speaker Output',
'20 W Speaker Output', '2 x HDMI | 2 x USB', '20 W Speaker Output',
'20 Speaker Output', '24 W Speaker Output', '20 W Speaker Output']
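# Search each string for digits followed by an optional space and W/w; keep only the matches
df = pd.DataFrame({"col_1": [m.group(0) for m in (re.search(r'\d+\s?[Ww]', s) for s in v) if m]})

...which results in: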
>>> df
    col_1
0    20 W
1    10 W
2    20 W
3    20 W
4    20 W
5    20 W
6    20 W
7    20 W
8    30 W
9    20 W
10   20 W
11   20 W
12   24 W
13   20 W

Posted on 2022-07-06 15:40:38
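Try:
import re

# Compile the pattern once, outside the loop
pattern = re.compile(r'\d+\s[wW]')
for x in v:
    m = pattern.search(x)
    if m:  # skip entries without a wattage, e.g. '2 x HDMI | 2 x USB'
        print(m.group())

Posted on 2022-07-06 15:54:15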
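print(df['col_1'].str.extract(r'(\d+ W)'))
The extract method above uses a regular expression to pull out the wattage as expected. Rows without a match come back as NaN, which you can then drop.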
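As a rough sketch of the "extract, then drop the NaN values" step, something like the following should work. The shortened sample list and the names `watts` and `result` are only for illustration, and the pattern `\d+\s?[Ww]` is borrowed from the first answer rather than from this one:

```python
import pandas as pd

# Shortened sample data, for illustration only
v = ['2 x HDMI | 2 x USB', '20 W Speaker Output', '20 Speaker Output',
     '30 W Speaker Output', '24 W Speaker Output']
df = pd.DataFrame({"col_1": v})

# expand=False returns a Series; rows without a match become NaN
watts = df['col_1'].str.extract(r'(\d+\s?[Ww])', expand=False)

# Drop the NaN rows and renumber the index
result = watts.dropna().reset_index(drop=True)
print(result.tolist())
```

If you would rather keep the original row positions, leave out `reset_index(drop=True)`.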
https://stackoverflow.com/questions/72886233