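I have a data file like the one shown below, and all I want to do is replace the strings in Maturity with the number they contain. For example, I want to replace FZCY0D with 0, and so on.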
         Date Maturity  Yield_pct Currency
0  2009-01-02   FZCY0D       4.25      AUS
1  2009-01-05   FZCY0D       4.25      AUS
2  2009-01-06   FZCY0D       4.25      AUS

My code is below. I tried to replace these strings with numbers, but the line result.Maturity.replace(result['Maturity'], [int(s) for s in result['Maturity'].split() if s.isdigit()]) raises an error, and I am struggling to understand how to do this.
from pandas.io.excel import read_excel
import pandas as pd
import numpy as np
import xlrd
url = 'http://www.rba.gov.au/statistics/tables/xls/f17hist.xls'
xls = pd.ExcelFile(url)
# Gets rid of the information that I don't need in my dataframe
df = xls.parse('Yields', skiprows=10, index_col=None, na_values=['NA'])
df.rename(columns={'Series ID': 'Date'}, inplace=True)
# This line assumes you want datetime, ignore if you don't
#combined_data['Date'] = pd.to_datetime(combined_data['Date'])
result = pd.melt(df, id_vars=['Date'])
result['Currency'] = 'AUS'
result.rename(columns={'value': 'Yield_pct'}, inplace=True)
result.rename(columns={'variable': 'Maturity'}, inplace=True)
result.Maturity.replace(result['Maturity'], [int(s) for s in result['Maturity'].split() if s.isdigit()])
print(result)

Posted on 2015-06-19 18:06:00
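For context on the error in the question, here is a minimal sketch of why that line cannot work. The three-row Series is a hypothetical stand-in for the melted Maturity column, not the real RBA data. Two things go wrong: a Series has no .split() method of its own (string methods live under the .str accessor), and even on a single string the comprehension finds nothing, because "FZCY0D" contains no whitespace and the whole token is not made of digits.

```python
import pandas as pd

# Hypothetical stand-in for the melted 'Maturity' column from the question.
maturity = pd.Series(["FZCY0D", "FZCY0D", "FZCY0D"])

# A Series has no .split(); string methods are reached through .str instead.
try:
    maturity.split()
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__

# Even per string, nothing is extracted: split() returns the whole code as
# one token, and "FZCY0D".isdigit() is False because of the letters.
tokens = [int(s) for s in "FZCY0D".split() if s.isdigit()]

print(error_name)  # AttributeError
print(tokens)      # []
```

So the digits have to be pulled out of each string rather than tested token by token.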
You can use the vectorized str methods and pass a regular expression to extract the digits:
In [15]:
df['Maturity'] = df['Maturity'].str.extract(r'(\d+)')
df
Out[15]:
         Date Maturity  Yield_pct Currency
0  2009-01-02        0       4.25      AUS
1  2009-01-05        0       4.25      AUS
2  2009-01-06        0       4.25      AUS

You can call astype(int) to convert the Series to int:
In [17]:
df['Maturity'] = df['Maturity'].str.extract(r'(\d+)').astype(int)
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3 entries, 0 to 2
Data columns (total 4 columns):
Date 3 non-null object
Maturity 3 non-null int32
Yield_pct 3 non-null float64
Currency 3 non-null object
dtypes: float64(1), int32(1), object(2)
memory usage: 108.0+ bytes

https://stackoverflow.com/questions/30944507
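As a self-contained sketch of the approach above, the snippet below rebuilds the three sample rows from the question and adds one made-up row ("FZCYXX", purely hypothetical) whose code has no digits. It assumes a reasonably recent pandas, where passing expand=False makes str.extract return a Series for a single capture group. Because a non-matching row comes back as NaN, astype(int) would raise on it, so pd.to_numeric is used here as one possible alternative; the column then ends up as floats with NaN for the non-match.

```python
import pandas as pd

# Sample rows copied from the question; the last row is a made-up code
# without digits, added only to show how a non-match is handled.
df = pd.DataFrame({
    "Date": ["2009-01-02", "2009-01-05", "2009-01-06", "2009-01-07"],
    "Maturity": ["FZCY0D", "FZCY0D", "FZCY0D", "FZCYXX"],
    "Yield_pct": [4.25, 4.25, 4.25, 4.25],
    "Currency": ["AUS"] * 4,
})

# Raw string avoids the invalid-escape warning for \d;
# expand=False returns a Series for the single capture group.
digits = df["Maturity"].str.extract(r"(\d+)", expand=False)

# Non-matches are NaN, so convert with to_numeric instead of astype(int).
df["Maturity"] = pd.to_numeric(digits)

print(df)
```

If every code is known to contain digits, as in the three rows shown in the question, astype(int) works as in the answer and keeps an integer column.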