I have a DataFrame made up of an id column and a text column.
temp = spark.createDataFrame([
(0, ['Julia', 'is', 'awesome']),
(1, ['Data-science', 'is','cool']),
(2, ['Machine,learning,was,my,subject'])
], ["id", "words"])
+---+---------------------------------+
|id |words                            |
+---+---------------------------------+
|0  |[Julia, is, awesome]             |
|1  |[Data-science, is, cool]         |
|2  |[Machine,learning,was,my,subject]|
+---+---------------------------------+
I want to convert it into tuples. I used to do this with a pandas DataFrame. This is how I built the tuples:
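tup = []
# df is the pandas DataFrame; each row becomes (words, {'text_id': id})
for _, row in df.iterrows():
    tup.append((row['words'], {'text_id': row['id']}))
Sample output:
[(['Julia', 'is', 'awesome'], {'text_id': 0})]
How can I do the same for the whole PySpark DataFrame? Is there a way to do this in PySpark?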
Posted on 2022-08-02 09:41:49
You can use map() on the DataFrame's underlying RDD to do the conversion.
# use the RDD API and map() to build the tuples
# (data_sdf is the DataFrame from the question, created there as `temp`)
data_sdf.rdd. \
    map(lambda k: (k.words, {"text_id": k.id})). \
    collect()
# [(['Julia', 'is', 'awesome'], {'text_id': 0}),
# (['Data-science', 'is', 'cool'], {'text_id': 1}),
# (['Machine,learning,was,my,subject'], {'text_id': 2})]
https://stackoverflow.com/questions/73205205
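If it helps, the lambda in the answer can also be written as a named function, which makes the mapping easy to check without a Spark session. The sketch below is only an illustration: `FakeRow` and `row_to_tuple` are hypothetical names, and the namedtuple is a stand-in for `pyspark.sql.Row`, which likewise exposes its columns as attributes.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row, used only so the mapping
# can be exercised locally without starting Spark.
FakeRow = namedtuple("FakeRow", ["id", "words"])


def row_to_tuple(row):
    """Build (words, {'text_id': id}) from one row, like the lambda above."""
    return (row.words, {"text_id": row.id})


# Local check with plain Python objects.
sample_rows = [
    FakeRow(0, ["Julia", "is", "awesome"]),
    FakeRow(1, ["Data-science", "is", "cool"]),
]
result = [row_to_tuple(r) for r in sample_rows]
print(result)

# With the DataFrame from the question (assumed to be named `temp`):
#   tup = temp.rdd.map(row_to_tuple).collect()
```

Note that collect() pulls every row back to the driver, so this approach suits data that fits in driver memory.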