Character replacement and splitting into new columns in a CSV DataFrame using Python, sklearn, and Pandas

Stack Overflow user
Asked on 2021-03-02 02:03:55
Answers: 2, Views: 42, Following: 0, Votes: 0

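I am currently trying to convert column 6 from a date format that uses slashes (for example, 2/4/09) to one that uses dashes and drops the zero (2-4-9). In addition, I would like to take each value and give it its own column (as shown in the desired output). I have researched and tried several solutions, but I cannot figure it out. I am still stuck on how to replace or remove the characters (see below). I am very new to working with DataFrames in Python. Any tips or help would be much appreciated. Thank you.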

from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from sklearn import ensemble
import pandas as pd
import numpy as np

df = pd.read_csv('file.csv')

df[6].replace(['\/'],['-'],regex=True, inplace=True)
df[6].replace('0','',regex=True,inplace=True)

Error:

classifier_v1.4.py:18: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.dropna(inplace=True, subset=['Name', 'TRY', 'LOC', 'OUTPUT', 'TYPE_A', 'SIGNAL', 'A-B', 'SPOT'])
Traceback (most recent call last):
  File "/Users/namel/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1618, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1626, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 5

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "file.py", line 20, in <module>
    df[5].replace(['\/'],['-'],regex=True)
  File "/Users/name/opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/name/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1618, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1626, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 5

Current DataFrame:

         0    1    2        3          4       5        6     7  
0     Name  TRY  LOC   OUTPUT     TYPE_A   SIGNAL     A-B  SPOT 
1    inc 1    2   20   TYPE-1    TORPEDO   ULTRA   2/4/09   -21
2    inc 2    3   16   TYPE-2    TORPEDO     ILH   2/4/09   -14
3    inc 3    2   20  BLACK47    TORPEDO    LION   2/4/09    49
4    inc 4    3   12   TYPE-2  CENTRALPA    LION   2/4/09    25
5    inc 5    3   10   TYPE-2      THREE    LION   2/4/09   -21
6    inc 6    2   20   TYPE-2        ATF    LION   2/4/09   -48
7    inc 7    4    2  NIVEA-1        ATF    LION   7/3/03   -23
8    inc 8    3   16  NIVEA-1        ATF    LION   7/3/03    18
9    inc 9    3   18  BLENDER  CENTRALPA    LION   7/3/03    48
10   inc 10   4   20    DELCO        ATF    LION   7/3/03   -26
11   inc 11   3   20    VE248        ATF    LION   7/3/03    44
12   inc 12   1   20   SILVER  CENTRALPA    LION   5/9/02   -35
13   inc 13   2   20  CALVIN3     SEVENX    LION   5/9/02   -20
14   inc 14   3   14  DECK-BT  CENTRALPA    LION   5/9/02   -38
15   inc 15   4    4  10-LEVI    BERWYEN     OWL   5/9/02   -29
16   inc 16   4   14   TYPE-2        ATF     NOV   5/9/02   -31
17   inc 17   4   10     NYNY    TORPEDO     NOV   5/9/02    21
18   inc 18   2   20  NIVEA-1  CENTRALPA     NOV   1/7/06    45
19   inc 19   3   27   FMRA97    TORPEDO     NOV   1/7/06   -26
20   inc 20   4   18   SILVER        ATF     NOV   1/7/06   -46

Desired output:

         0    1    2        3          4       5       6   7   8   9    10
0     Name  TRY  LOC   OUTPUT     TYPE_A   SIGNAL    A-B  D1  D2  D3  SPOT 
1    inc 1    2   20   TYPE-1    TORPEDO   ULTRA   2-4-9   2   4   9   -21
2    inc 2    3   16   TYPE-2    TORPEDO     ILH   2-4-9   2   4   9   -14
3    inc 3    2   20  BLACK47    TORPEDO    LION   2-4-9   2   4   9    49
4    inc 4    3   12   TYPE-2  CENTRALPA    LION   2-4-9   2   4   9    25
5    inc 5    3   10   TYPE-2      THREE    LION   2-4-9   2   4   9   -21
6    inc 6    2   20   TYPE-2        ATF    LION   2-4-9   2   4   9   -48
7    inc 7    4    2  NIVEA-1        ATF    LION   7-3-3   7   3   3   -23
8    inc 8    3   16  NIVEA-1        ATF    LION   7-3-3   7   3   3    18
9    inc 9    3   18  BLENDER  CENTRALPA    LION   7-3-3   7   3   3    48
10   inc 10   4   20    DELCO        ATF    LION   7-3-3   7   3   3   -26
11   inc 11   3   20    VE248        ATF    LION   7-3-3   7   3   3    44
12   inc 12   1   20   SILVER  CENTRALPA    LION   5-9-2   5   9   2   -35
13   inc 13   2   20  CALVIN3     SEVENX    LION   5-9-2   5   9   2   -20
14   inc 14   3   14  DECK-BT  CENTRALPA    LION   5-9-2   5   9   2   -38
15   inc 15   4    4  10-LEVI    BERWYEN     OWL   5-9-2   5   9   2   -29
16   inc 16   4   14   TYPE-2        ATF     NOV   5-9-2   5   9   2   -31
17   inc 17   4   10     NYNY    TORPEDO     NOV   5-9-2   5   9   2    21
18   inc 18   2   20  NIVEA-1  CENTRALPA     NOV   1-7-6   1   7   6    45
19   inc 19   3   27   FMRA97    TORPEDO     NOV   1-7-6   1   7   6   -26
20   inc 20   4   18   SILVER        ATF     NOV   1-7-6   1   7   6   -46

2 Answers

Stack Overflow user

Answered on 2021-03-02 03:18:23

There may be a more efficient way to do this, but the code below will produce the result you want.

from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from sklearn import ensemble
import pandas as pd
import numpy as np

df = pd.read_csv('file.csv')

# insert columns
df.insert(7, 'D1', '')
df.insert(8, 'D2', '')
df.insert(9, 'D3', '')

# replace
df['A-B'] = df['A-B'].str.replace('/', '-')
df['A-B'] = df['A-B'].str.replace('0', '')

# update new columns values
df['D1'] = df.apply(lambda x: str(x['A-B']).split('-')[0], axis=1)
df['D2'] = df.apply(lambda x: str(x['A-B']).split('-')[1], axis=1)
df['D3'] = df.apply(lambda x: str(x['A-B']).split('-')[2], axis=1)

print(df)
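
One caveat with replacing every '0': it also removes zeros that are not leading zeros. The following is a minimal sketch, not part of the original answer, that strips only leading zeros from each date component. It assumes the column could contain values such as 10/20/09, which do not appear in the question's data; the sample values and the names `dates`, `dashed`, `trimmed`, and `naive` are hypothetical.

```python
import pandas as pd

# Hypothetical sample values (only the first one appears in the question).
dates = pd.Series(['2/4/09', '10/20/09', '7/3/00'])

# Step 1: swap the slashes for dashes (literal replacement, no regex).
dashed = dates.str.replace('/', '-', regex=False)

# Step 2: remove zeros only when they start a number and another digit follows,
# so "09" becomes "9" while "10" and "20" are left alone.
trimmed = dashed.str.replace(r'(?<![0-9])0+(?=[0-9])', '', regex=True)

# For comparison: removing every "0" also changes "10" and "20".
naive = dashed.str.replace('0', '', regex=False)

print(trimmed.tolist())
print(naive.tolist())
```

If the data only ever contains single-digit days and months, as in the question, both approaches give the same result.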
Votes: 0

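Stack Overflow user

Answered on 2021-03-02 04:46:32

KeyError: 5 means that the key 5 does not exist. In this case the column label is a string, not an integer, so it needs to be written in quotes.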
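
To illustrate the point, here is a minimal sketch assuming the CSV's first line holds the column names. The inline `csv_text` is a hypothetical stand-in for the real file, reduced to three columns.

```python
import io
import pandas as pd

# Hypothetical three-column stand-in for file.csv.
csv_text = "Name,TRY,A-B\ninc 1,2,2/4/09\ninc 2,3,7/3/03\n"
df = pd.read_csv(io.StringIO(csv_text))

# The column labels are strings taken from the header line.
print(df.columns.tolist())

try:
    df[2]  # integer key: no column is labeled 2
except KeyError as exc:
    print("KeyError:", exc)

by_label = df['A-B']         # select by string label
by_position = df.iloc[:, 2]  # select by integer position
```

Selecting with `df['A-B']` or `df.iloc[:, 2]` returns the same column; only `df[2]` fails, because `[]` on a DataFrame looks up column labels rather than positions.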

An alternative (and probably more practical) approach is to drop the first row and use it as the column headers.
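
A minimal sketch of that idea, assuming the frame was loaded with integer column labels and the names in row 0, as in the question's printout. The inline data and the names `raw` and `promoted` are hypothetical.

```python
import io
import pandas as pd

# Hypothetical three-column stand-in for the real file.
csv_text = "Name,TRY,A-B\ninc 1,2,2/4/09\ninc 2,3,7/3/03\n"

# header=None gives integer column labels and leaves the names in row 0.
raw = pd.read_csv(io.StringIO(csv_text), header=None)

# Drop row 0 from the data and use its values as the column labels.
promoted = raw.iloc[1:].reset_index(drop=True)
promoted.columns = raw.iloc[0].tolist()

print(promoted)
```

Note that every value stays a string with this approach, because each column originally mixed a name with its data. Letting `read_csv` parse the header line itself avoids that.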

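Your use of .replace with lists of original and new values is fine. There are several alternatives, two of which are shown below.

Using split as shown below, you can add the three new columns at once.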

from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from sklearn import ensemble
import pandas as pd
import numpy as np

df = pd.read_csv('/Users/ciit2/downloads/test.csv', header=1)
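# Assign the result back: replace(..., inplace=True) on a selected column
# may not update df in newer pandas versions.
df['A-B'] = df['A-B'].replace({'/': '-'}, regex=True)
df['A-B'] = df['A-B'].replace('0', '', regex=True)
# expand=True returns one column per piece, aligned on df's index.
df[['D1', 'D2', 'D3']] = df['A-B'].str.split('-', expand=True)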
df
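
The snippet above appends D1 to D3 at the end of the frame, whereas the desired output places them between A-B and SPOT. The following self-contained sketch shows one way to get that ordering. It assumes the CSV's first line is the header row, and it uses three rows taken from the question's data as inline text in place of the real file; `parts` and `pos` are illustrative names.

```python
import io
import pandas as pd

# Three rows from the question, inlined so the example runs without a file.
csv_text = (
    "Name,TRY,LOC,OUTPUT,TYPE_A,SIGNAL,A-B,SPOT\n"
    "inc 1,2,20,TYPE-1,TORPEDO,ULTRA,2/4/09,-21\n"
    "inc 7,4,2,NIVEA-1,ATF,LION,7/3/03,-23\n"
    "inc 12,1,20,SILVER,CENTRALPA,LION,5/9/02,-35\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Replace slashes with dashes, then drop the zeros (literal replacements).
df['A-B'] = (
    df['A-B']
    .str.replace('/', '-', regex=False)
    .str.replace('0', '', regex=False)
)

# Split into one column per piece.
parts = df['A-B'].str.split('-', expand=True)
parts.columns = ['D1', 'D2', 'D3']

# Insert the new columns right after A-B so that SPOT stays last.
pos = df.columns.get_loc('A-B') + 1
for offset, name in enumerate(parts.columns):
    df.insert(pos + offset, name, parts[name])

print(df)
```

The D1 to D3 values are strings at this point; `astype(int)` could be applied afterwards if numeric columns are needed.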
Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/66427279
