PySpark: detecting the "actor"

Stack Overflow user
Asked on 2022-05-18 05:21:47
Answers 1 | Views 43 | Following 0 | Votes 0

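I have a matrix (a DataFrame) and I want to find every row where the row and column intersect at '1', meaning the column whose name matches the row's `character` value contains '1'.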

For example, Sam is an actor: his `character` value is 'actor' and his `actor` column holds '1'. That is a row I want returned.

df = spark.createDataFrame(
    [
        ("actor", "sam", "1", "0", "0", "0", "0"),  
        ("villan", "jack", "0", "0", "0", "0", "0"),
        ("actress", "rose", "0", "0", "0", "1", "0"),
        ("comedian", "mike", "0", "1", "1", "0", "1"),
        ("musician", "young", "1", "1", "1", "1", "0")
    ],
    ["character", "name", "actor", "villan", "comedian", "actress", "musician"]  
)
+---------+-----+-----+------+--------+-------+--------+
|character| name|actor|villan|comedian|actress|musician|
+---------+-----+-----+------+--------+-------+--------+
|    actor|  sam|    1|     0|       0|      0|       0|
|   villan| jack|    0|     0|       0|      0|       0|
|  actress| rose|    0|     0|       0|      1|       0|
| comedian| mike|    0|     1|       1|      0|       1|
| musician|young|    1|     1|       1|      1|       0|
+---------+-----+-----+------+--------+-------+--------+
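
Stated outside Spark, the rule amounts to a per-row lookup: take the row's `character` value, read the column with that name, and keep the row if it holds '1'. The following is a minimal plain-Python sketch of that rule using the same sample data; the helper name `plays_own_character` is hypothetical and only illustrates the intended logic.

```python
# Sample rows from the question, as plain dicts (no Spark required).
columns = ["character", "name", "actor", "villan", "comedian", "actress", "musician"]
data = [
    ("actor", "sam", "1", "0", "0", "0", "0"),
    ("villan", "jack", "0", "0", "0", "0", "0"),
    ("actress", "rose", "0", "0", "0", "1", "0"),
    ("comedian", "mike", "0", "1", "1", "0", "1"),
    ("musician", "young", "1", "1", "1", "1", "0"),
]
rows = [dict(zip(columns, values)) for values in data]


def plays_own_character(row):
    # Hypothetical helper: read the column named by the row's own
    # `character` value and check whether it holds the string "1".
    return row[row["character"]] == "1"


wanted = [row["name"] for row in rows if plays_own_character(row)]
print(wanted)
```

With this data the rows for sam, rose, and mike satisfy the rule, while jack and young do not.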

1 Answer

Stack Overflow user

Answered on 2022-05-18 15:18:09

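from pyspark.sql import functions as f
from pyspark.sql.types import StringType

# Create the function: return the struct field named by the row's `character` value.
def myMatch(needle, haystack):
    return haystack[needle]

# Create the UDF; your existing data is strings, so it returns a string.
matched = f.udf(myMatch, StringType())

# Apply the UDF. The struct is a shortcut that packs all columns
# into one value so the whole row can be passed to the UDF.
df.select(
    df.name,
    matched(
        df.character,
        f.struct(*[df[col] for col in df.columns]),
    ).alias("IsPlayingCharacter"),
).show()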
Votes 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/72283569
