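I have a list of objects and I want to create a DataFrame from it using a schema. How can I map the object attribute names to the schema column names? Here is what our class looks like: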

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class UberLog(object):
    upi: int
    event_name: str
    event_date: datetime
    source: str
    page_url: str = ""
    event_string: str = ""
    event_details: dict = field(default_factory=dict)
    _event_time: int = field(init=False)
    _id: str = field(init=False)

Here is the schema I want to map to:

from typing import Final
from pyspark.sql.types import (IntegerType, LongType, MapType, StringType,
                               StructField, StructType, TimestampType)

UBER_LOG_SCHEMA: Final[StructType] = StructType([
    StructField('id', StringType(), False),
    StructField('upi', IntegerType(), False),
    StructField('eventName', StringType(), True),
    StructField('eventDate', TimestampType(), False),
    StructField('eventEpoch', LongType(), False),
    StructField('source', StringType(), False),
    StructField('pageUrl', StringType(), True),
    StructField('eventString', StringType(), True),
    StructField('eventDetails', MapType(StringType(), StringType()), True)
])

When I run the following:

df = spark.createDataFrame(logs, schema=UBER_LOG_SCHEMA)
df.show()

I get the following error:

ValueError: field id: This field is not nullable, but got None

How can I tell createDataFrame that _id maps to id, and so on for the other fields? Or is there some other way to map the column names?
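
A likely explanation for the error, assuming PySpark resolves each schema field by looking its name up among the instance's attributes when it is handed plain objects: the schema asks for `id`, the instance only carries `_id`, and the lookup yields None. The sketch below reproduces that name lookup in plain Python, without Spark. The `__post_init__` body and the `SCHEMA_NAMES` list are assumptions added for illustration, since the post does not show how `_id` and `_event_time` get their values.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class UberLog:
    upi: int
    event_name: str
    event_date: datetime
    source: str
    page_url: str = ""
    event_string: str = ""
    event_details: dict = field(default_factory=dict)
    _event_time: int = field(init=False)
    _id: str = field(init=False)

    def __post_init__(self):
        # Assumption: the original post does not show how these are derived.
        self._event_time = int(self.event_date.timestamp())
        self._id = f"{self.upi}-{self._event_time}"


# Column names copied from UBER_LOG_SCHEMA, in schema order.
SCHEMA_NAMES = ["id", "upi", "eventName", "eventDate", "eventEpoch",
                "source", "pageUrl", "eventString", "eventDetails"]

log = UberLog(upi=1, event_name="click",
              event_date=datetime(2022, 4, 18), source="web")

# Look each schema column name up among the instance attributes.
lookup = {name: vars(log).get(name) for name in SCHEMA_NAMES}

# Columns whose name has no matching attribute come back as None.
missing = [name for name, value in lookup.items() if value is None]
print(missing)
```

Only `upi` and `source` share a name between the class and the schema, so every other column comes back empty, and `id` is the first non-nullable one to be hit.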
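
*Update*
The only way I have found to convert the dataclass so that it matches the schema is to rename the columns. Is there a better solution than this?
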
df = spark.createDataFrame(logs)
df = df.select(df['*']) \
    .withColumnRenamed("_id", "id") \
    .withColumnRenamed("event_name", "eventName") \
    .withColumnRenamed("event_date", "eventDate") \
    .withColumnRenamed("_event_time", "eventEpoch") \
    .withColumnRenamed("page_url", "pageUrl") \
    .withColumnRenamed("event_string", "eventString") \
    .withColumnRenamed("event_details", "eventDetails")
df.show()

Posted on 2022-04-18 04:37:27
The only way I have found to convert the dataclass so that it matches the schema is to rename the columns. Is there a better solution than this?

df = spark.createDataFrame(logs)
df = df.select(df['*']) \
    .withColumnRenamed("_id", "id") \
    .withColumnRenamed("event_name", "eventName") \
    .withColumnRenamed("event_date", "eventDate") \
    .withColumnRenamed("_event_time", "eventEpoch") \
    .withColumnRenamed("page_url", "pageUrl") \
    .withColumnRenamed("event_string", "eventString") \
    .withColumnRenamed("event_details", "eventDetails")
df.show()

https://stackoverflow.com/questions/71898284
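
One possible alternative to the rename chain, offered as a sketch rather than a definitive answer: keep the attribute-to-column mapping in a single dict and build plain tuples in schema order before handing them to Spark. `ATTR_TO_COLUMN`, `to_row`, and the `__post_init__` body are hypothetical names and details introduced here, not something from the original post. The block itself runs without Spark.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class UberLog:
    upi: int
    event_name: str
    event_date: datetime
    source: str
    page_url: str = ""
    event_string: str = ""
    event_details: dict = field(default_factory=dict)
    _event_time: int = field(init=False)
    _id: str = field(init=False)

    def __post_init__(self):
        # Assumption: the original post does not show how these are derived.
        self._event_time = int(self.event_date.timestamp())
        self._id = f"{self.upi}-{self._event_time}"


# Attribute name -> schema column name, listed in schema column order.
ATTR_TO_COLUMN = {
    "_id": "id",
    "upi": "upi",
    "event_name": "eventName",
    "event_date": "eventDate",
    "_event_time": "eventEpoch",
    "source": "source",
    "page_url": "pageUrl",
    "event_string": "eventString",
    "event_details": "eventDetails",
}


def to_row(obj) -> tuple:
    """Return the object's values ordered like the schema columns."""
    return tuple(getattr(obj, attr) for attr in ATTR_TO_COLUMN)


logs = [UberLog(upi=1, event_name="click",
                event_date=datetime(2022, 4, 18), source="web")]
rows = [to_row(log) for log in logs]
print(rows[0])
```

With a live session, `spark.createDataFrame(rows, schema=UBER_LOG_SCHEMA)` should then match the values to the schema by position, so no renaming is needed afterwards. The same dict could also drive the rename approach, for example by looping over its items and calling `withColumnRenamed` for each pair.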