
PySpark UDF function return type

Stack Overflow user
Asked 2021-07-30 20:44:09
Answers: 1 · Views: 17 · Followers: 0 · Votes: 0

Here is the schema of my Spark DataFrame:

root
 |-- locations: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- address_line_2: string (nullable = true)
 |    |    |-- continent: string (nullable = true)
 |    |    |-- country: string (nullable = true)
 |    |    |-- geo: string (nullable = true)
 |    |    |-- is_primary: boolean (nullable = true)
 |    |    |-- last_updated: string (nullable = true)
 |    |    |-- locality: string (nullable = true)
 |    |    |-- most_recent: boolean (nullable = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- postal_code: string (nullable = true)
 |    |    |-- region: string (nullable = true)
 |    |    |-- street_address: string (nullable = true)
 |    |    |-- subregion: string (nullable = true)
 |    |    |-- type: string (nullable = true)
 |    |    |-- zip_plus_4: string (nullable = true)

Here is a sample of the locations column:

[Row(locations=[Row(address_line_2=None, continent='north america', country='united states', geo='40.41,-74.36', is_primary=True, last_updated=None, locality='old bridge', most_recent=True, name='old bridge, new jersey, united states', postal_code=None, region='new jersey', street_address=None, subregion=None, type=None, zip_plus_4=None)])]

As you can see, there is a field named is_primary, which is the one I want to select on. Here is the function I wrote:

def geoLambda(locations):

    """
    Pre-process geo locations.
    :param locations: list of location structs
    :return: dict
    """
    try:
        for x in locations:
            if x.get("is_primary") == "True" or x.get("is_primary") == True:
                data = x
                data = data.get("geo", None)
                if data is None:
                    lat,lon = -83, 135
                else:
                    lat,lon = data.split(",")
                Payload = {"lat":float(lat), "lon":float(lon)}
                return Payload
            else:
                pass
    except Exception as e:
        print("EXCEPTION: {} ".format(e))
        lat,lon = -83, 135
        Payload = {"lat":float(lat), "lon":float(lon)}
        return Payload

from pyspark.sql.functions import udf
from pyspark.sql.types import StructType
udfValueToCategoryGeo = udf(geoLambda, StructType())
df = df.withColumn("myloc", udfValueToCategoryGeo("locations"))

Output:

 |-- myloc: struct (nullable = true)

+-----+
|myloc|
+-----+
|   {}|
|   {}|
|   {}|
|   {}|
|   {}|
|   {}|
|   {}|

If I set the return type to StringType instead:

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
udfValueToCategoryGeo = udf(geoLambda, StringType())
df = df.withColumn("myloc", udfValueToCategoryGeo("locations"))

+--------------------+
|               myloc|
+--------------------+
|{lon=135.0, lat=-...|
|{lon=135.0, lat=-...|
|{lon=135.0, lat=-...|
|{lon=135.0, lat=-...|
|{lon=135.0, lat=-...|
|{lon=135.0, lat=-...|
|{lon=135.0, lat=-...|
|{lon=135.0, lat=-...|
|{lon=135.0, lat=-...|
|{lon=135.0, lat=-...|
|{lon=135.0, lat=-...|
+--------------------+

I keep getting the same value in every row and don't know why.

The same function works fine in pandas, but I don't want to use pandas. Any help would be appreciated.

Here is what the locations value of a single row looks like:
[{'name': 'princeton, new jersey, united states',
  'locality': 'princeton',
  'region': 'new jersey',
  'subregion': None,
  'country': 'united states',
  'continent': 'north america',
  'type': None,
  'geo': '40.34,-74.65',
  'postal_code': None,
  'zip_plus_4': None,
  'street_address': None,
  'address_line_2': None,
  'most_recent': True,
  'is_primary': True,
  'last_updated': '2021-03-01'}]

Any help?


1 Answer

Stack Overflow user

Answered 2021-07-30 21:08:18

This is how I solved it:

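def geoLambda(locations):
    # Each array element arrives as a pyspark.sql.Row, which supports
    # x["field"] lookups but has no dict-style .get() method.
    for x in locations or []:  # "or []" guards against a null array
        if x["is_primary"]:
            data = x["geo"]
            if data is None:
                lat, lon = -83, 135  # fallback when geo is missing
            else:
                lat, lon = data.split(",")
            return {"lat": float(lat), "lon": float(lon)}
    return None  # no primary location in this row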
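The essential change is indexing each element with x["is_primary"] instead of calling x.get(...). Inside a PySpark UDF, the elements of an array<struct> column arrive as pyspark.sql.Row objects rather than dicts; Row supports lookups by field name but has no .get() method. That is most likely why the question's version always produced lat=-83, lon=135: x.get raised AttributeError on the first element, the broad except caught it, and the fallback was returned for every row. It would also explain why the same code worked in pandas, where the elements are plain dicts. The sketch below reproduces this without Spark; RowLike is a hypothetical stand-in (not the real Row class), and original_lookup/fixed_lookup are condensed, hypothetical versions of the question's and the answer's functions.

```python
class RowLike(tuple):
    """Hypothetical stand-in for pyspark.sql.Row: fields can be read with
    row["name"], but (like Row) there is no dict-style .get() method."""

    def __new__(cls, **fields):
        row = super().__new__(cls, tuple(fields.values()))
        row.names = tuple(fields)  # field names, in declaration order
        return row

    def __getitem__(self, key):
        # Translate a field name into its tuple position, like Row does.
        if isinstance(key, str):
            key = self.names.index(key)
        return super().__getitem__(key)


FALLBACK = {"lat": -83.0, "lon": 135.0}


def original_lookup(locations):
    """Condensed version of the question's geoLambda (dict-style .get())."""
    try:
        for x in locations:
            if x.get("is_primary") is True:
                lat, lon = x.get("geo").split(",")
                return {"lat": float(lat), "lon": float(lon)}
    except Exception:
        # On a Row-like element, x.get raises AttributeError and lands here.
        return dict(FALLBACK)
    return None


def fixed_lookup(locations):
    """Condensed version of the answer's geoLambda (x["field"] indexing)."""
    for x in locations:
        if x["is_primary"]:
            lat, lon = x["geo"].split(",")
            return {"lat": float(lat), "lon": float(lon)}
    return None


row_sample = [RowLike(geo="40.34,-74.65", is_primary=True)]
dict_sample = [{"geo": "40.34,-74.65", "is_primary": True}]

print(original_lookup(row_sample))   # {'lat': -83.0, 'lon': 135.0}
print(original_lookup(dict_sample))  # {'lat': 40.34, 'lon': -74.65}
print(fixed_lookup(row_sample))      # {'lat': 40.34, 'lon': -74.65}
```

Because the broad except hides the real error, narrowing it (or removing it while debugging) would have surfaced the AttributeError directly.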
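
from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType, StructField, StructType

# Declare both fields so the returned dict's "lat"/"lon" keys map onto them
udfValueToCategoryGeo = udf(geoLambda, StructType([
    StructField("lat", FloatType(), nullable=True),
    StructField("lon", FloatType(), nullable=True),
]))
df = df.withColumn("myloc", udfValueToCategoryGeo("locations"))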
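The other half of the fix is the declared return type. StructType() with no fields describes an empty struct, so the dict returned by the UDF has no fields to map onto, which matches the {} shown in every row of the first attempt; with StringType() the dict was rendered as text instead. Below is a minimal sketch, not the answerer's exact code, that keeps the row logic in a plain function (testable without a Spark session) and builds the UDF with an explicit two-field schema. It assumes DoubleType rather than the answer's FloatType, since Python floats are 64-bit; FloatType also works but stores 32-bit values. The names primary_geo and make_geo_udf are hypothetical.

```python
from typing import Iterable, Optional

DEFAULT_LAT, DEFAULT_LON = -83.0, 135.0  # fallback used in the answer when geo is null


def primary_geo(locations: Optional[Iterable]) -> Optional[dict]:
    """Return {"lat", "lon"} for the first location flagged is_primary.

    Uses only x["field"] lookups, so it accepts plain dicts as well as
    pyspark Row objects. Returns None for a null array or no primary entry.
    """
    for loc in locations or []:
        if loc["is_primary"]:
            geo = loc["geo"]
            if geo is None:
                return {"lat": DEFAULT_LAT, "lon": DEFAULT_LON}
            lat, lon = geo.split(",")
            return {"lat": float(lat), "lon": float(lon)}
    return None


def make_geo_udf():
    """Wrap primary_geo as a UDF whose struct schema names both fields."""
    # Imported lazily so primary_geo can be tested without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType, StructField, StructType

    schema = StructType([
        StructField("lat", DoubleType(), nullable=True),
        StructField("lon", DoubleType(), nullable=True),
    ])
    # The keys of the dict returned by primary_geo must match these names.
    return udf(primary_geo, schema)
```

With Spark available, this would be applied as df.withColumn("myloc", make_geo_udf()("locations")). Keeping the parsing in a plain function means the logic can be checked against sample rows such as the one in the question before it ever runs on a cluster.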
Votes: 0
Page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/68596914
