我正在尝试计算此水质数据集(整个英格兰,2019年,合规性和监测) https://environment.data.gov.uk/water-quality/view/download/new中sample.purpose.label列中每月的值数
例如,我想知道1月份发布了多少次合规正式(许可),等等。
我似乎不能计算出按月完成的代码。我确信这是一个非常简单的解决方案,但已经用了好几天了,没有任何运气。我创建了一个pandas数据帧,然后又创建了一个单独的月份列。这是我到目前为止所拥有的代码:
import pandas as pd
import numpy as np
import datetime as dt
df = pd.read_csv (r"C:\Users\Downloads\water_testing\2019.csv")
df['sample.sampleDateTime'] = pd.to_datetime(df['sample.sampleDateTime'])
df.drop(['sample.samplingPoint','sample.samplingPoint.notation','sample.samplingPoint.label', 'determinand.label', 'determinand.definition', 'determinand.notation', 'resultQualifier.notation', 'result', 'codedResultInterpretation.interpretation', 'determinand.unit.label', 'sample.sampledMaterialType.label', 'sample.isComplianceSample', 'sample.samplingPoint.easting', 'sample.samplingPoint.northing'],inplace=True,axis=1)
df.rename(columns = {'sample.sampleDateTime':'time'}, inplace = True)
df['time'].dt.month_name()任何指导都将非常感谢,非常感谢。
发布于 2021-07-12 21:48:04
您可以使用pandas.Grouper按月分组,使用value_counts计算sample.purpose.label中每个项目的所有出现次数,使用unstack调整其形状
df = pd.read_csv('2019.csv')
df['sample.sampleDateTime'] = pd.to_datetime(df['sample.sampleDateTime'])
df.set_index('sample.sampleDateTime').groupby(pd.Grouper(freq='M'))['sample.purpose.label'].value_counts().unstack(level=1)输出:
sample.purpose.label COMPLIANCE AUDIT (PERMIT) COMPLIANCE FORMAL (PERMIT) ENVIRONMENTAL MONITORING STATUTORY (EU DIRECTIVES) IPPC/IPC MONITORING (AGENCY AUDIT - PERMIT) IPPC/IPC MONITORING (AGENCY INVESTIGATION) IPPC/IPC MONITORING (FORMAL SAMPLE) MONITORING (NATIONAL AGENCY POLICY) MONITORING (UK GOVT POLICY - NOT GQA OR RE) PLANNED FORMAL NON-STATUTORY (PERMIT/ENV MON) PLANNED INVESTIGATION (LOCAL MONITORING) PLANNED INVESTIGATION (NATIONAL AGENCY POLICY) STATUTORY FAILURES (FOLLOW UPS AT DESIGNATED POINTS) STATUTORY FAILURES (FOLLOW UPS AT NON-DESIGNATED POINTS) UNPLANNED REACTIVE MONITORING (POLLUTION INCIDENTS) UNPLANNED REACTIVE MONITORING FORMAL (POLLUTION INCIDENTS) WASTE MONITORING (AGENCY AUDIT - PERMIT) WASTE MONITORING (AGENCY INVESTIGATION) WASTE MONITORING (FORMAL SAMPLE) WATER QUALITY OPERATOR SELF MONITORING COMPLIANCE DATA WATER QUALITY UWWTD MONITORING DATA
sample.sampleDateTime
2019-01-31 5219.0 44.0 47360.0 NaN 887.0 NaN 2498.0 10624.0 NaN 8534.0 823.0 2599.0 7556.0 2787.0 1491.0 NaN 12.0 NaN 11800.0 8359.0
2019-02-28 3971.0 22.0 53000.0 38.0 536.0 NaN 2537.0 10053.0 NaN 10688.0 702.0 3023.0 8295.0 2930.0 1151.0 NaN 534.0 624.0 11781.0 5443.0
2019-03-31 3838.0 52.0 42524.0 31.0 689.0 18.0 5548.0 9273.0 9.0 9630.0 786.0 2084.0 5995.0 2792.0 734.0 NaN 415.0 443.0 12275.0 5672.0
2019-04-30 3865.0 34.0 73070.0 24.0 55.0 NaN 3450.0 6665.0 NaN 8202.0 7073.0 1261.0 4827.0 2225.0 1144.0 NaN 2175.0 NaN 12168.0 5409.0
2019-05-31 3695.0 44.0 96911.0 27.0 58.0 NaN 3053.0 7006.0 NaN 7663.0 9299.0 1356.0 4665.0 2293.0 1296.0 NaN 194.0 188.0 11982.0 5934.0
2019-06-30 3029.0 36.0 83428.0 13.0 8.0 NaN 5284.0 5362.0 24.0 10340.0 9402.0 2344.0 5318.0 2299.0 1217.0 NaN 278.0 14.0 12109.0 5643.0
2019-07-31 4385.0 62.0 100697.0 24.0 NaN NaN 6574.0 11328.0 NaN 7820.0 10448.0 1551.0 4539.0 3536.0 1647.0 NaN 1278.0 535.0 13413.0 5858.0
2019-08-31 3196.0 38.0 85826.0 38.0 15.0 NaN 4018.0 6266.0 NaN 9474.0 10375.0 2253.0 5989.0 2304.0 1023.0 240.0 95.0 186.0 12303.0 5514.0
2019-09-30 3519.0 8.0 88239.0 17.0 244.0 NaN 7008.0 7007.0 NaN 8234.0 10809.0 1702.0 4151.0 1774.0 1345.0 NaN NaN 2422.0 12895.0 5617.0
2019-10-31 4569.0 28.0 91347.0 24.0 NaN NaN 7416.0 7214.0 NaN 16760.0 12467.0 1152.0 3567.0 1690.0 1078.0 120.0 595.0 373.0 13023.0 5975.0
2019-11-30 4290.0 8.0 76318.0 38.0 NaN 30.0 8172.0 6024.0 NaN 5315.0 11514.0 31.0 1352.0 1540.0 1036.0 NaN 1255.0 360.0 13196.0 6042.0
2019-12-31 3356.0 17.0 71900.0 24.0 14.0 NaN 5331.0 6138.0 NaN 6520.0 10673.0 849.0 1930.0 1725.0 1010.0 NaN 2894.0 895.0 9425.0 4329.0发布于 2021-07-12 21:43:47
试试这个:
df["Month"] = df["time"].dt.month
>>> df.groupby(["Month", "sample.purpose.label"]).count().unstack()
@id ... time
sample.purpose.label COMPLIANCE AUDIT (PERMIT) ... WATER QUALITY UWWTD MONITORING DATA
Month ...
1 5219.0 ... 8359.0
2 3971.0 ... 5443.0
3 3838.0 ... 5672.0
4 3865.0 ... 5409.0
5 3695.0 ... 5934.0
6 3029.0 ... 5643.0
7 4385.0 ... 5858.0
8 3196.0 ... 5514.0
9 3519.0 ... 5617.0
10 4569.0 ... 5975.0
11 4290.0 ... 6042.0
12 3356.0 ... 4329.0
[12 rows x 40 columns]https://stackoverflow.com/questions/68348298
复制相似问题