This is my dataset:
lastvalue_month
DataFrame[msisdn: string, year: string, month: string, day: string, date_id: string, province: string, district: string, sub_district: string, handset_brand: string, handset_os: string, handset_type: string, payment_category: string, sub_status: string, hpos_to_ios: string, hpos_from_ios: string, hptype_to_smart: string, hptype_from_smart: string, hpbrand_change: string]
This is my code:
from pyspark.sql import functions as F
lastvalue_month = [ F.upper(F.col(x)).alias(x) for x in lastvalue_month.columns ]
Below is the output:
[Column<b'upper(msisdn) AS `msisdn`'>,
Column<b'upper(year) AS `year`'>,
Column<b'upper(month) AS `month`'>,
Column<b'upper(day) AS `day`'>,
Column<b'upper(date_id) AS `date_id`'>,
Column<b'upper(province) AS `province`'>,
Column<b'upper(district) AS `district`'>,
Column<b'upper(sub_district) AS `sub_district`'>,
Column<b'upper(handset_brand) AS `handset_brand`'>,
Column<b'upper(handset_os) AS `handset_os`'>,
Column<b'upper(handset_type) AS `handset_type`'>,
Column<b'upper(payment_category) AS `payment_category`'>,
Column<b'upper(sub_status) AS `sub_status`'>,
Column<b'upper(hpos_to_ios) AS `hpos_to_ios`'>,
Column<b'upper(hpos_from_ios) AS `hpos_from_ios`'>,
Column<b'upper(hptype_to_smart) AS `hptype_to_smart`'>,
Column<b'upper(hptype_from_smart) AS `hptype_from_smart`'>,
Column<b'upper(hpbrand_change) AS `hpbrand_change`'>]
What I want is a PySpark DataFrame with the same column names but with all entries in upper case. How can I do this in PySpark?
Posted on 2021-09-03 12:17:41
I think you can handle this with withColumn:
for x in lastvalue_month.columns:
    lastvalue_month = lastvalue_month.withColumn(x, F.upper(F.col(x)))
Or use select:
lastvalue_month = lastvalue_month.select(
    *[F.upper(F.col(x)).alias(x) for x in lastvalue_month.columns]
)
https://stackoverflow.com/questions/69044512