I have the pandas DataFrame below, which contains employee sales data for October.
   Employee        Timerange    Dials  Conn  Conv  Mtg Bkd  Talk    Dial
0  Ricky Ponting   10/3 - 10/7  1,869  102   60.0  2.0      3h:08m  5h:23m
1  Adam Gilchrist  10/3 - 10/7  1,336  53    30.0  1.0      1h:10m  3h:58m
2  Michael Clarke  10/3 - 10/7  1,960  74    42.0  1.0      2h:02m  5h:28m
3  Shane Warne     10/3 - 10/7  1,478  62    45.0  1.0      1h:55m  4h:07m

Schema:
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 1   Timerange  40 non-null     object
 2   Dials      40 non-null     object
 3   Conn       40 non-null     int64
 4   Conv       39 non-null     float64
 5   Mtg Bkd    39 non-null     float64
 6   Talk       40 non-null     object
 7   Dial       40 non-null     object

I basically want to check the average dials-to-connection and dials-to-conversation rates for the whole team for the month. A sample output looks like this:
Month    Dials  Conn  Dials -> Conn  Dials -> Conv
October  60517  2702  0.045          0.026

I tried using pd.DatetimeIndex(df['Timerange']).month and splitting the column, but it gave me the error dateutil.parser._parser.ParserError: Unknown string format: 10/3 - 10/7.
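The error above comes from handing the whole range string to the date parser. As a minimal sketch (the one-column frame and the variable names here are hypothetical, built only from the Timerange value shown in the question), the failure can be reproduced and sidestepped by parsing just the start date:

```python
import pandas as pd

# Hypothetical one-column frame reusing the Timerange value from the question.
df = pd.DataFrame({"Timerange": ["10/3 - 10/7", "10/3 - 10/7"]})

# Parsing the whole range string fails because it is not a single date.
try:
    pd.DatetimeIndex(df["Timerange"])
    parsed_whole_range = True
except ValueError:  # dateutil's ParserError is a ValueError subclass
    parsed_whole_range = False

# Keep only the start date ("10/3"), then parse it with an explicit format.
start = df["Timerange"].str.split("-").str[0].str.strip()
months = pd.to_datetime(start, format="%m/%d").dt.month_name()
print(parsed_whole_range, months.tolist())
```

This assumes every range starts in the month of interest; a range that straddles two months would be attributed to its starting month.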
Answered 2022-11-19 21:46:09
Here is one possible approach:
import pandas as pd

# Extract the month number from the start date and convert it to a month name
df["Month"] = pd.to_datetime(
    df["Timerange"].str.extract(r"(\d+)/\d+", expand=False), format="%m"
).dt.month_name()
# Convert comma-separated strings to numbers
df["Dials"] = df["Dials"].str.replace(",", "").astype(float)
out = (
    df.groupby("Month", as_index=False)
      .apply(lambda x: pd.Series({
          "Dials": x["Dials"].sum(),
          "Conn": x["Conn"].sum(),
          "Dials -> Conn": x["Conn"].sum() / x["Dials"].sum(),
          "Dials -> Conv": x["Conv"].sum() / x["Dials"].sum(),
      }))
)

# Output:
print(out)
     Month   Dials   Conn  Dials -> Conn  Dials -> Conv
0  October  6643.0  291.0       0.043806       0.026645

Answered 2022-11-19 21:10:36
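I am assuming that your Timerange always starts in the month you are interested in, and that all of the data is from the same year (this year). If those are reasonable assumptions, this works: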
import datetime

import pandas as pd

emps = [
    "Ricky Ponting", "Adam Gilchrist", "Michael Clarke", "Shane Warne"
]
timeranges = [
    "10/3 - 10/7", "10/3 - 10/7", "10/3 - 10/7", "10/3 - 10/7"
]
dials = [1869, 1336, 1960, 1478]
conn = [102, 53, 74, 62]
conv = [60, 30, 42, 45]

df = pd.DataFrame(
    {
        "Employee": emps,
        "Timerange": timeranges,
        "Dials": dials,
        "Conn": conn,
        "Conv": conv
    }
)


def get_month(row):
    # The month number is the text before the first "/" in the range
    month = int(row["Timerange"].split("/")[0])
    # The year is assumed to be 2022; only the month name is used
    return datetime.date(year=2022, month=month, day=1).strftime("%B")


df["Month"] = df.apply(get_month, axis=1)
# Sum only the numeric columns so the text columns are not concatenated
sums = df.groupby("Month")[["Dials", "Conn", "Conv"]].sum()
sums["Dials -> Conn"] = sums["Conn"] / sums["Dials"]
sums["Dials -> Conv"] = sums["Conv"] / sums["Dials"]
sums
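
The snippet above builds Dials as integers, whereas the schema in the question lists Dials as an object column holding strings such as "1,869". The following is a rough sketch of how the same groupby approach might be applied to that string column, using only the four sample rows from the question; the variable names are illustrative, and the assumption that each range starts in the month of interest still applies.

```python
import pandas as pd

# The four sample rows from the question, with Dials kept as comma-formatted text.
df = pd.DataFrame({
    "Employee": ["Ricky Ponting", "Adam Gilchrist", "Michael Clarke", "Shane Warne"],
    "Timerange": ["10/3 - 10/7"] * 4,
    "Dials": ["1,869", "1,336", "1,960", "1,478"],
    "Conn": [102, 53, 74, 62],
    "Conv": [60.0, 30.0, 42.0, 45.0],
})

# Strip thousands separators so Dials can be summed as a number.
df["Dials"] = df["Dials"].str.replace(",", "", regex=False).astype(int)

# Month name from the leading month number of the range.
start_month = df["Timerange"].str.extract(r"^(\d+)/", expand=False)
df["Month"] = pd.to_datetime(start_month, format="%m").dt.month_name()

# Sum the numeric columns per month, then divide the sums for team-wide rates.
sums = df.groupby("Month")[["Dials", "Conn", "Conv"]].sum()
sums["Dials -> Conn"] = sums["Conn"] / sums["Dials"]
sums["Dials -> Conv"] = sums["Conv"] / sums["Dials"]
print(sums.round(3))
```

Dividing the monthly sums gives a rate weighted by each employee's dial volume, which matches the shape of the sample output in the question. Since `sum()` skips missing values by default, a row with a null Conv in the full data would add nothing to the numerator while its Dials still count in the denominator.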

https://stackoverflow.com/questions/74503501