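I'm new to Pandas, so apologies if there's an obvious solution... I imported a CSV with only 2 columns and created a 3rd column. Below is a screenshot of the first 10 rows and the headers:
[Screenshot of DataFrame]
I already know how to find the min and max values in the "Amount Changed" column, but I also need to extract the dates associated with the min and max, not the index and "Profit/Losses". I've tried iloc and loc, and I've read about groupby, but I can't get any of them to return a single value (in this case a date) that I can use again.
My goal is to create a new variable, Gi_Date, that holds the date from the "Date" column in the same row as the maximum value in "Amount Changed".
I'm trying to keep the variables separate so I can use them in print statements, write them to a txt file, and so on.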
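Once each result is held as a plain Python scalar (a str for the date, a number for the change) rather than a DataFrame, the same variables can feed both print and a text file. A minimal sketch, assuming the values have already been extracted (the numbers below are just the example figures from this question) and using the hypothetical output filename results.txt:

```python
# Hypothetical, already-extracted scalar results (example figures from the question)
months = 86
total_funds = 38382578
gi_date, greatest_increase = "Feb-2012", 1926159
gd_date, greatest_decrease = "Sept-2013", -2196167

# Build the report once so console and file output stay identical
lines = [
    f"Total Months: {months}",
    f"Total: ${total_funds}",
    f"Greatest Increase in Profits: {gi_date} (${greatest_increase})",
    f"Greatest Decrease in Profits: {gd_date} (${greatest_decrease})",
]
report = "\n".join(lines)

print(report)

# Write the same text to a file (the filename is an assumption)
with open("results.txt", "w") as f:
    f.write(report + "\n")
```

Building the lines once keeps the printed report and the file from drifting apart if the format changes later.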
import os
import csv
import pandas as pd
import numpy as np
#path for CSV file
csvpath = ("budget_data.csv")
#Read CSV into Pandas and give it the variable name Bank_pd
Bank_pd = pd.read_csv(csvpath, parse_dates=True)
#Number of month records in the CSV
Months = Bank_pd["Date"].count()
#Total amount of money captured in the data converted to currency
Total_Funds = '${:.0f}'.format(Bank_pd["Profit/Losses"].sum())
#Determine the amount of increase or decrease from the previous month
AmtChange = Bank_pd["Profit/Losses"].diff()
Bank_pd["Amount Changed"] = AmtChange
#Identify the greatest positive change
GreatestIncrease = '${:.0f}'.format(Bank_pd["Amount Changed"].max())
Gi_Date = Bank_pd[Bank_pd["Date"] == GreatestIncrease]
#Identify the greatest negative change
GreatestDecrease = '${:.0f}'.format(Bank_pd["Amount Changed"].min())
Gd_Date = Bank_pd[Bank_pd['Date'] == GreatestDecrease]
print(f"Total Months: {Months}")
print(f"Total: {Total_Funds}")
print(f"Greatest Increase in Profits: {Gi_Date} ({GreatestIncrease})")
print(f"Greatest Decrease in Profits: {Gd_Date} ({GreatestDecrease})")
When I run the script in Git Bash I no longer get errors, so I think I'm getting closer, but instead of the date it displays this:
$ python PyBank.py
Total Months: 86
Total: $38382578
Greatest Increase in Profits: Empty DataFrame
Columns: [Date, Profit/Losses, Amount Changed]
Index: [] ($1926159)
Greatest Decrease in Profits: Empty DataFrame
Columns: [Date, Profit/Losses, Amount Changed]
Index: [] ($-2196167)
I'd like it to print like this:
$ python PyBank.py
Total Months: 86
Total: $38382578
Greatest Increase in Profits: Feb-2012 ($1926159)
Greatest Decrease in Profits: Sept-2013 ($-2196167)
Here is one year's worth of the original DataFrame:
bank_pd = pd.DataFrame({'Date':['Jan-10', 'Feb-10', 'Mar-10', 'Apr-10', 'May-10', 'Jun-10', 'Jul-10', 'Aug-10', 'Sep-10', 'Oct-10', 'Nov-10', 'Dec-10'],
'Profit/Losses':[867884, 984655, 322013, -69417, 310503, 522857, 1033096, 604885, -216386, 477532, 893810, -80353]})
The expected output for the sample df is:
Total Months: 12
Total: $5651079
Greatest Increase in Profits: Oct-10 ($693918)
Greatest Decrease in Profits: Dec-10 ($-974163)
I also had an error in the sample DataFrame above: I missed a month when I typed it in quickly. That's now fixed.
Thanks!
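The empty DataFrame comes from the comparison itself: GreatestIncrease is a formatted string such as '$693918', and no value in the "Date" column equals it, so the boolean mask is all False. Even a matching mask would return a DataFrame rather than a single date. A minimal sketch using the 12-month sample above (lowercase variable names are just for illustration):

```python
import pandas as pd

bank_pd = pd.DataFrame({
    "Date": ["Jan-10", "Feb-10", "Mar-10", "Apr-10", "May-10", "Jun-10",
             "Jul-10", "Aug-10", "Sep-10", "Oct-10", "Nov-10", "Dec-10"],
    "Profit/Losses": [867884, 984655, 322013, -69417, 310503, 522857,
                      1033096, 604885, -216386, 477532, 893810, -80353],
})
bank_pd["Amount Changed"] = bank_pd["Profit/Losses"].diff()

# Formatting turns the max into a string like '$693918'
greatest_increase = "${:.0f}".format(bank_pd["Amount Changed"].max())

# Comparing dates to that string never matches, so the filter is empty
mask = bank_pd["Date"] == greatest_increase
print(mask.any())            # False
print(bank_pd[mask].empty)   # True

# Compare the numeric column against the numeric max instead,
# then select the 'Date' column and take the first match as a scalar
max_change = bank_pd["Amount Changed"].max()
gi_date = bank_pd.loc[bank_pd["Amount Changed"] == max_change, "Date"].iloc[0]
print(gi_date)  # Oct-10
```

The key points are to keep the numeric value for the comparison and format it only when printing, and to pass the column name to loc so the result can be reduced to one value.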
Posted on 2018-09-06 13:51:00
I see a few small glitches in the variables you're using.
Bank_pd["Amount Changed"] = AmtChange
The statement above adds an "Amount Changed" column to the DataFrame. After this statement you can use that column for any further manipulation.
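To see that assigning the diff() result appends a column while the existing columns stay in place, here is a small sketch using only the first three rows of the sample data (the three-row frame is just for illustration):

```python
import pandas as pd

bank_pd = pd.DataFrame({
    "Date": ["Jan-10", "Feb-10", "Mar-10"],
    "Profit/Losses": [867884, 984655, 322013],
})

# diff() subtracts the previous row; the first row has no predecessor, so it is NaN
amt_change = bank_pd["Profit/Losses"].diff()

# Assigning to a new key appends a column; the existing columns are untouched
bank_pd["Amount Changed"] = amt_change
print(list(bank_pd.columns))  # ['Date', 'Profit/Losses', 'Amount Changed']
print(bank_pd)
```

The NaN in the first row matters later: max/min and idxmax/idxmin skip it, but sorting places it at one end of the result.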
Below is the updated code. You can add further formatting as needed:
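import pandas as pd
csvpath = ("budget_data.csv")
Bank_pd = pd.read_csv(csvpath, parse_dates=True)
inp_bank_pd = pd.DataFrame(Bank_pd)
Months = Bank_pd["Date"].count()
Total_Funds = '${:.0f}'.format(Bank_pd["Profit/Losses"].sum())
AmtChange = Bank_pd["Profit/Losses"].diff()
#Store the change as a column so it can be filtered on below
Bank_pd["Amount Changed"] = AmtChange
#Keep the max as a number (not a formatted string) so it can be compared to the column
GreatestIncrease = Bank_pd["Amount Changed"].max()
#The boolean mask selects the row(s) holding the max
Gi_Date = inp_bank_pd.loc[Bank_pd["Amount Changed"] == GreatestIncrease]
print(Months)
print(Total_Funds)
#.values[0] pulls the first matching date out as a plain value
print(Gi_Date['Date'].values[0])
print(GreatestIncrease)
Posted on 2018-09-06 14:07:30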
In your sample code, Gi_Date and Gd_Date build a new (filtered) DataFrame instead of retrieving a value. Change Gi_Date and Gd_Date to:
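#Sort by the change column, not 'Profit/Losses'
#dropna removes the first row's NaN, which sort_values would otherwise place last (where tail(1) would pick it up)
Gi_Date = Bank_pd.dropna(subset=['Amount Changed']).sort_values('Amount Changed').tail(1).Date
Gd_Date = Bank_pd.dropna(subset=['Amount Changed']).sort_values('Amount Changed').head(1).Date
Check the output:
Gi_Date
Oct-10
Gd_Date
Dec-10
To print it the way you want using string formatting:
print("Total Months: %s" %(Months))
print("Total: %s" %(Total_Funds))
print("Greatest Increase in Profits: %s %s" %(Gi_Date.to_string(index=False), GreatestIncrease))
print("Greatest Decrease in Profits: %s %s" %(Gd_Date.to_string(index=False), GreatestDecrease))
Note that if you don't use:
Gd_Date.to_string(index=False)
the pandas object information will be included in the printed output, just like the DataFrame information you saw in your example. Output for the 12-month sample DF:
Total Months: 12
Total: $5651079
Greatest Increase in Profits: Oct-10 $693918
Greatest Decrease in Profits: Dec-10 $-974163
Posted on 2018-09-06 14:48:54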
Use Series.idxmin and Series.idxmax inside loc:
df.loc[df['Amount Changed'].idxmin(), 'Date']
df.loc[df['Amount Changed'].idxmax(), 'Date']
A complete example based on your sample DataFrame:
df = pd.DataFrame({'Date':['Jan-2010', 'Feb-2010', 'Mar-2010', 'Apr-2010', 'May-2010',
'Jun-2010', 'Jul-2010', 'Aug-2010', 'Sep-2010', 'Oct-2010'],
'Profit/Losses': [867884,984655,322013,-69417,310503,522857,
1033096,604885,-216386,477532]})
df['Amount Changed'] = df['Profit/Losses'].diff()
print(df)
Date Profit/Losses Amount Changed
0 Jan-2010 867884 NaN
1 Feb-2010 984655 116771.0
2 Mar-2010 322013 -662642.0
3 Apr-2010 -69417 -391430.0
4 May-2010 310503 379920.0
5 Jun-2010 522857 212354.0
6 Jul-2010 1033096 510239.0
7 Aug-2010 604885 -428211.0
8 Sep-2010 -216386 -821271.0
9 Oct-2010 477532 693918.0
print(df.loc[df['Amount Changed'].idxmin(), 'Date'])
print(df.loc[df['Amount Changed'].idxmax(), 'Date'])
Sep-2010
Oct-2010
https://stackoverflow.com/questions/52196651