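I have a matrix (a DataFrame) and I want to find all rows where the row and column intersect at a '1', that is, rows where the column whose name matches the row's "character" value contains '1'.
For example: sam is an actor. (He has a '1' in the column 'actor', and his row's 'character' value is 'actor'.) This is a row I would want returned.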
df = spark.createDataFrame(
    [
        ("actor", "sam", "1", "0", "0", "0", "0"),
        ("villan", "jack", "0", "0", "0", "0", "0"),
        ("actress", "rose", "0", "0", "0", "1", "0"),
        ("comedian", "mike", "0", "1", "1", "0", "1"),
        ("musician", "young", "1", "1", "1", "1", "0")
    ],
    ["character", "name", "actor", "villan", "comedian", "actress", "musician"]
)
+---------+-----+-----+------+--------+-------+--------+
|character| name|actor|villan|comedian|actress|musician|
+---------+-----+-----+------+--------+-------+--------+
|    actor|  sam|    1|     0|       0|      0|       0|
|   villan| jack|    0|     0|       0|      0|       0|
|  actress| rose|    0|     0|       0|      1|       0|
| comedian| mike|    0|     1|       1|      0|       1|
| musician|young|    1|     1|       1|      1|       0|
+---------+-----+-----+------+--------+-------+--------+

Posted on 2022-05-18 15:18:09
from pyspark.sql import functions as f
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Create a function that returns the value of the column named by `needle`
def myMatch(needle, haystack):
    return haystack[needle]

# Create the UDF; it returns a string because your existing data is strings
matched = udf(myMatch, StringType())

# Apply the UDF; the struct is a shortcut to pass all columns to the UDF at once
df.select(
    df.name,
    matched(
        df.character,
        f.struct(*[df[col] for col in df.columns])
    ).alias("IsPlayingCharacter")
).show()

https://stackoverflow.com/questions/72283569
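To make the per-row lookup concrete, here is a minimal plain-Python sketch (no Spark involved) of what the UDF does for each row, followed by a filter that keeps only the rows where the looked-up value is "1". The names `my_match`, `rows`, and `matching_names` are hypothetical and used only for illustration, and filtering on "1" is an assumption about what the question wants returned; the answer's code only displays the looked-up value.

```python
# Plain-Python illustration of the lookup; this is not Spark code.
columns = ["character", "name", "actor", "villan", "comedian", "actress", "musician"]
data = [
    ("actor", "sam", "1", "0", "0", "0", "0"),
    ("villan", "jack", "0", "0", "0", "0", "0"),
    ("actress", "rose", "0", "0", "0", "1", "0"),
    ("comedian", "mike", "0", "1", "1", "0", "1"),
    ("musician", "young", "1", "1", "1", "1", "0"),
]


def my_match(needle, haystack):
    """Return the value stored under the key whose name equals `needle`."""
    return haystack[needle]


# Each row becomes a dict keyed by column name, standing in for the struct.
rows = [dict(zip(columns, values)) for values in data]

# Keep the names of rows whose "character" column points at a "1".
matching_names = [
    row["name"] for row in rows if my_match(row["character"], row) == "1"
]
print(matching_names)  # ['sam', 'rose', 'mike']
```

In this sample data, jack and young are dropped because the columns named by their "character" values ('villan' and 'musician') hold "0".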