Best approach for geospatial indexing in Palantir Foundry

Stack Overflow user
Asked 2022-08-06 15:24:14
Answers 1 · Views 115 · Followers 0 · Votes 2

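What is the recommended approach in Palantir Foundry for building a pipeline that needs to find points within polygons (shapes)? In the past this has been fairly difficult in Spark. GeoSpark has been popular, but it can still lag. If there is nothing specific to Foundry, I can implement something with GeoSpark. I have 13K shapes and batches of thousands of points.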


1 Answer

Stack Overflow user

Accepted answer

Posted 2022-08-10 18:30:01

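How large are the datasets? With a large enough driver and some optimizations, I have gotten this to work using geopandas before. Just make sure the coordinate points are in the same projection as the polygons.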
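
The projection advice can be checked in code. The following is a minimal sketch, assuming the points arrive in EPSG:4326 (longitude/latitude) and the polygons in EPSG:3857; both CRS codes and all sample data are illustrative assumptions, not details from the question. It uses geopandas `to_crs` to reproject the points before joining.

```python
import geopandas
from shapely.geometry import Point, box

# Assumed sample data: two points given as longitude/latitude (EPSG:4326).
points = geopandas.GeoDataFrame(
    {"point_id": [1, 2]},
    geometry=[Point(0.5, 0.5), Point(5.0, 5.0)],
    crs="EPSG:4326",
)

# Assumed sample data: one square polygon expressed in Web Mercator metres (EPSG:3857).
polygons = geopandas.GeoDataFrame(
    {"shape_id": ["a"]},
    geometry=[box(0, 0, 111000, 111000)],
    crs="EPSG:3857",
)

# Reproject the points if the two layers disagree on their CRS.
if points.crs != polygons.crs:
    points = points.to_crs(polygons.crs)

# `predicate` requires geopandas >= 0.10; older releases used `op` instead.
joined = geopandas.sjoin(points, polygons, how="inner", predicate="within")
```

Without the reprojection step, the raw longitude/latitude values would be compared against coordinates in metres and the join would silently match the wrong points.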

Here is a helper function:

from shapely import geometry
import json
import geopandas
from pyspark.sql import functions as F



def geopandas_spatial_join(df_left, df_right, geometry_left, geometry_right, how='inner', op='intersects'):
    '''
    Computes a spatial join of two pandas dataframes. Implements the Geopandas "sjoin" method, reference: https://geopandas.org/reference/geopandas.sjoin.html.
    Expects both dataframes to contain a GeoJSON geometry column, whose names are passed as the 'geometry_left' and 'geometry_right' arguments.

    Inputs:
        df_left (PANDAS_DATAFRAME): Left input dataframe.
        df_right (PANDAS_DATAFRAME): Right input dataframe.
        geometry_left (string): Name of the geometry column of the left dataframe.
        geometry_right (string): Name of the geometry column of the right dataframe.
        how (string): The type of join, one of {'left', 'right', 'inner'}.
        op (string): Binary predicate, one of {'intersects', 'contains', 'within'}.

    Outputs:
        (PANDAS_DATAFRAME): Joined dataframe.
    '''

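    # Work on a copy so the caller's dataframe is not modified.
    df1 = df_left.copy()
    # Parse each GeoJSON string and convert it to a shapely geometry.
    df1["geometry_left_shape"] = df1[geometry_left].apply(json.loads)
    df1["geometry_left_shape"] = df1["geometry_left_shape"].apply(geometry.shape)
    gdf_left = geopandas.GeoDataFrame(df1, geometry="geometry_left_shape")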

    # Same conversion for the right dataframe, again on a copy.
    df2 = df_right.copy()
    df2["geometry_right_shape"] = df2[geometry_right].apply(json.loads)
    df2["geometry_right_shape"] = df2["geometry_right_shape"].apply(geometry.shape)
    gdf_right = geopandas.GeoDataFrame(df2, geometry="geometry_right_shape")

    # geopandas >= 0.10 names this argument `predicate`; older releases used `op`.
    joined = geopandas.sjoin(gdf_left, gdf_right, how=how, predicate=op)
    joined = joined.drop(joined.filter(items=["geometry_left_shape", "geometry_right_shape"]).columns, axis=1)

    return joined
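
The helper expects each geometry column to hold GeoJSON strings. As a small illustration of that input format, the sketch below builds such strings with `shapely.geometry.mapping` and `json.dumps`, then parses them back the same way the helper does. The sample coordinates and the `point_id` and `shape_id` columns are made-up examples.

```python
import json

import pandas as pd
from shapely import geometry

# Hypothetical sample rows; each geometry is stored as a GeoJSON string.
points_pdf = pd.DataFrame({
    "point_id": [1, 2],
    "point_geometry": [
        json.dumps(geometry.mapping(geometry.Point(0.5, 0.5))),
        json.dumps(geometry.mapping(geometry.Point(5.0, 5.0))),
    ],
})
polygons_pdf = pd.DataFrame({
    "shape_id": ["a"],
    "polygon_geometry": [json.dumps(geometry.mapping(geometry.box(0, 0, 1, 1)))],
})

# Parse the strings back into shapely geometries, as the helper does internally.
parsed_points = points_pdf["point_geometry"].apply(json.loads).apply(geometry.shape)
parsed_polygon = geometry.shape(json.loads(polygons_pdf["polygon_geometry"][0]))

# Check each point against the unit square.
inside = [parsed_polygon.intersects(p) for p in parsed_points]
```

Dataframes shaped like `points_pdf` and `polygons_pdf` can be passed straight to `geopandas_spatial_join` together with the names of their geometry columns.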

Then we can run the join:

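def join_points_to_polygons(spark, points_df, polygon_df):
    # Hypothetical wrapper name; points_df and polygon_df are Spark DataFrames
    # and spark is the active SparkSession.

    # Collect both inputs to the driver as pandas dataframes.
    left_df = points_df.toPandas()
    left_geo_column = "point_geometry"

    right_df = polygon_df.toPandas()
    right_geo_column = "polygon_geometry"

    pdf = geopandas_spatial_join(left_df, right_df, left_geo_column, right_geo_column)

    # Convert the joined result back to a Spark DataFrame.
    return_df = spark.createDataFrame(pdf).dropDuplicates()

    return return_df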
Votes 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/73261022
