原型Dataframe缺少A Prime列中的连续数字。一旦它们被填充,B列中缺少的值需要从A素数中插值。
完成之后,我可以将Prototype Dataframe与主Dataframe合并。
插值按原型名称分组。
什么是pythonic方法来重新创建这个数据的值,插值B列,并将Prototype dataframe连接到主Dataframe?
Prototype Dataframe
| Proto Name| A Prime | B |
|-----------|---------|---------|
| Alpha | 3.26 | 0.16608 |
| Alpha | 3.24 | 0.16575 |
| Alpha | 3.22 | 0.16540 |
| Alpha | 3.20 | 0.16506 |
| Alpha | 3.18 | 0.16470 |
| Alpha | 3.16 | 0.16434 |
| Alpha | 3.14 | 0.16398 |
| Bravo | 1.52 | 0.10759 |
| Bravo | 1.50 | 0.10687 |
| Bravo | 1.48 | 0.10614 |
| Bravo | 1.46 | 0.10541 |
| Bravo | 1.44 | 0.10469 |
| Bravo | 1.42 | 0.10396 |下面是我在使用左联接时合并的主要数据格式。
Primary Dataframe
| Name | Date | A | A Prime | B | Proto Name |
|--------|-----------|------|---------|---------|------------|
| Z3Pats | 6/1/2018 | 3.24 | 3.24 | 0.16575 | Alpha |
| Z3Pats | 5/1/2018 | 3.23 | NaN | NaN | Alpha |
| Z3Pats | 4/1/2018 | 3.21 | NaN | NaN | Alpha |
| Z3Pats | 3/1/2018 | 3.20 | 3.20 | 0.16506 | Alpha |
| Z3Pats | 2/1/2018 | 3.16 | 3.16 | 0.16434 | Alpha |
| CO-119 | 5/1/2018 | 1.53 | NaN | NaN | Bravo |
| CO-119 | 4/1/2018 | 1.51 | NaN | NaN | Bravo |
| CO-119 | 3/1/2018 | 1.48 | 1.48 | 0.10614 | Bravo |
| CO-119 | 2/1/2018 | 1.48 | 1.48 | 0.10614 | Bravo |
| CO-119 | 1/1/2018 | 1.45 | NaN | NaN | Bravo |
| CO-119 | 12/1/2017 | 1.44 | 1.44 | 0.10469 | Bravo |
| CO-119 | 11/1/2017 | 1.41 | 1.41 | 0.10396 | Bravo |发布于 2018-06-26 16:29:08
我使用multiIndex、重新索引和内插:
data = {'A Prime': {0: 3.26, 1: 3.24, 2: 3.22, 3: 3.2, 4: 3.18, 5: 3.16,
6: 3.14, 7: 1.52, 8: 1.5, 9: 1.48, 10: 1.46, 11: 1.44, 12: 1.42},
'B': {0: 0.16608, 1: 0.16575, 2: 0.1654, 3: 0.16505999999999998, 4: 0.1647, 5: 0.16434, 6: 0.16398, 7: 0.10759, 8: 0.10687, 9: 0.10614000000000001,
10: 0.10540999999999999, 11: 0.10469, 12: 0.10396}, 'Proto Name': {0: 'Alpha',
1: 'Alpha', 2: 'Alpha', 3: 'Alpha', 4: 'Alpha', 5: 'Alpha', 6: 'Alpha', 7: 'Bravo', 8: 'Bravo', 9: 'Bravo', 10: 'Bravo', 11: 'Bravo', 12: 'Bravo'}}
df = pd.DataFrame(data)
df_ranges = df[['Proto Name', 'A Prime']].groupby(['Proto Name']).agg(['min', 'max'])
df =df.set_index(['Proto Name','A Prime'])
print df_ranges
A Prime
min max
Proto Name
Alpha 3.1400 3.2600
Bravo 1.4200 1.5200
l_idx = []
for i in range(len(df_ranges)):
PN = df_ranges.index[i]
min = df_ranges.iloc[i][0]
max = df_ranges.iloc[i][1]
l_idx.extend([(PN,i) for i in np.array(range(int(min*100),int(max*100)+1,1))/100.]) # make tuples to reindex with
idx = pd.MultiIndex.from_tuples(l_idx)
df = df.reindex(idx)
print df
print df.interpolate()这给
B
Alpha 3.14 0.16398
3.15 NaN
3.16 0.16434
3.17 NaN
3.18 0.16470
3.19 NaN
3.20 0.16506
3.21 NaN
3.22 0.16540
3.23 NaN
3.24 0.16575
3.25 NaN
3.26 0.16608
Bravo 1.42 0.10396
1.43 NaN
1.44 0.10469
1.45 NaN
1.46 0.10541
1.47 NaN
1.48 0.10614
1.49 NaN
1.50 0.10687
1.51 NaN
1.52 0.10759
B
Alpha 3.14 0.163980
3.15 0.164160
3.16 0.164340
3.17 0.164520
3.18 0.164700
3.19 0.164880
3.20 0.165060
3.21 0.165230
3.22 0.165400
3.23 0.165575
3.24 0.165750
3.25 0.165915
3.26 0.166080
Bravo 1.42 0.103960
1.43 0.104325
1.44 0.104690
1.45 0.105050
1.46 0.105410
1.47 0.105775
1.48 0.106140
1.49 0.106505
1.50 0.106870
1.51 0.107230
1.52 0.107590https://stackoverflow.com/questions/51046122
复制相似问题