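I want to check coordinates (x, y, z) from dataframe-1 (df1) to see whether each position is close enough to an irregular surface, whose own coordinates (x, y, z) are stored in dataframe-2 (df2).
I could iterate over each coordinate in df1, loop over all coordinates in df2, and check the distance, then repeat for every coordinate in df1. But with more than 1,000,000 coordinates in df1 to check, this would take a very long time.
I am using pandas and would like to know whether this can be done without loops.
If a coordinate in df1 is close to df2, I want to select it and store it in df3.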
Posted on 2019-04-16 06:37:47
SciPy can help here. See the hypothetical example below:
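import numpy as np
import pandas as pd
from scipy.spatial import cKDTree
dataset1 = pd.DataFrame(np.random.rand(100, 3))
dataset2 = pd.DataFrame(np.random.rand(10, 3))
# build a k-d tree on dataset1, then find the dataset1 rows within r of each dataset2 point
ck = cKDTree(dataset1.values)
ck.query_ball_point(dataset2.values, r=0.1)
This returns an array of lists of dataset1 row indices, of the form (exact values vary with the random data):
array([list([]), list([28, 83]), list([79]), list([]), list([86]), list([40]), list([29, 60, 95])], dtype=object)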
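To connect this back to the question, here is a minimal sketch that builds the tree on the surface points (df2) instead and keeps the df1 rows whose nearest surface point lies within a threshold. The column names `x`, `y`, `z`, the random data, and the `radius` value are assumptions for illustration, not part of the original post.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the asker's data: df1 holds the points to test,
# df2 holds the points sampled on the surface.
df1 = pd.DataFrame(rng.random((1000, 3)), columns=["x", "y", "z"])
df2 = pd.DataFrame(rng.random((50, 3)), columns=["x", "y", "z"])

radius = 0.1  # assumed "close enough" threshold

# Index the surface points once, then look up each df1 point's nearest neighbour.
tree = cKDTree(df2[["x", "y", "z"]].to_numpy())
dist, _ = tree.query(df1[["x", "y", "z"]].to_numpy(), k=1)

# Keep only the df1 rows whose nearest surface point is within the radius.
df3 = df1[dist <= radius]
```

Building the tree on the smaller surface set and querying with the large set means the 1,000,000 points are handled in a single vectorized call, with no Python-level loop.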
Posted on 2019-04-16 08:25:41
Using a NumPy approach:
If your two dataframes look like this:
df1
coords
0 (4,3,5)
1 (5,4,3)
df2
coords
0 (6,7,8)
1 (8,7,6)
then:
import numpy as np
import pandas as pd
from itertools import product
# convert dataframes into numpy arrays
df1_arr = np.array([np.array(x) for x in df1.coords.values])
df2_arr = np.array([np.array(x) for x in df2.coords.values])
# create array of the cartesian product of elements of the two arrays
cart_arr = np.array([x for x in product(df1_arr, df2_arr)])
# compute Euclidean distance (or norm) between pairs of elements in the two arrays
# outputs a new array with one value per pair of coordinates
norms_arr = np.linalg.norm(np.diff(cart_arr, axis=1)[:, 0, :], axis=1)
# create distance threshold for "close enough"
radius = 5.5
# find values in norms array that are less than or equal to the distance threshold
good_idxs = np.argwhere(norms_arr <= radius)[:, 0]
good_coord_pairs = cart_arr[good_idxs]
# store corresponding pairs of coordinates and distances in a new dataframe
final_df = pd.DataFrame({'df1_coords': list(map(tuple, good_coord_pairs[:, 0, :])),
                         'df2_coords': list(map(tuple, good_coord_pairs[:, 1, :])),
                         'distance': norms_arr[good_idxs]},
                        index=list(range(len(good_coord_pairs))))
which produces:
final_df
df1_coords df2_coords distance
0 (4,3,5) (6,7,8) 5.385165
1 (5,4,3) (8,7,6) 5.196152
https://stackoverflow.com/questions/55701301
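The Cartesian-product approach in this answer materializes every (df1, df2) pair at once, which may not fit in memory for 1,000,000 points. Below is a hedged sketch of one possible variation that processes df1 in chunks with NumPy broadcasting and returns only a boolean mask. The helper name `near_surface_mask` and the `chunk_size` parameter are hypothetical, introduced here for illustration.

```python
import numpy as np
import pandas as pd

def near_surface_mask(points, surface, radius, chunk_size=10_000):
    """Return a boolean mask: True where a point lies within `radius` of any surface point."""
    mask = np.zeros(len(points), dtype=bool)
    for start in range(0, len(points), chunk_size):
        chunk = points[start:start + chunk_size]
        # Broadcast (chunk, 1, 3) against (1, surface, 3) to get all pairwise differences.
        diff = chunk[:, None, :] - surface[None, :, :]
        dist = np.linalg.norm(diff, axis=2)
        mask[start:start + chunk_size] = (dist <= radius).any(axis=1)
    return mask

# The example coordinates from this answer, stored as tuples in a `coords` column.
df1 = pd.DataFrame({"coords": [(4, 3, 5), (5, 4, 3)]})
df2 = pd.DataFrame({"coords": [(6, 7, 8), (8, 7, 6)]})

mask = near_surface_mask(np.array(df1.coords.tolist(), dtype=float),
                         np.array(df2.coords.tolist(), dtype=float),
                         radius=5.5)
df3 = df1[mask]
```

This still loops in Python, but only once per chunk rather than once per point, and peak memory is bounded by `chunk_size` times the number of surface points.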