I am using Python in a Spark environment and want to convert a DataFrame column from the timestamp data type to bigint (a UNIX timestamp). The column looks like this ("yyyy-MM-dd hh:mm:ss.SSSSSS"):
timestamp_col
2014-06-04 10:09:13.334422
2015-06-03 10:09:13.443322
2015-08-03 10:09:13.232431

I have read around and tried the following:
from pyspark.sql.functions import from_unixtime, unix_timestamp
from pyspark.sql.types import TimestampType
df1 = df.select((from_unixtime(unix_timestamp(df.timestamp_col, "yyyy-MM-dd hh:mm:ss.SSSSSS"))).cast(TimestampType()).alias("unix_time_col"))

But the output comes back as nothing but nulls:
+-------------+
|unix_time_col|
+-------------+
|         null|
|         null|
|         null|
+-------------+

I am using Python 3.7 in a Hadoop environment; the Spark & Hadoop build is spark-2.3.1-bin-hadoop2.7, running on Google Colaboratory. I must be missing something. Any help?
Posted on 2019-09-16 15:33:52
Remove the ".SSSSSS" from your format string and the conversion to a unix timestamp will work; that is, instead of "yyyy-MM-dd hh:mm:ss.SSSSSS", use the pattern below. (In Spark 2.x the pattern is parsed with java.text.SimpleDateFormat, whose S field means milliseconds, so a six-digit microsecond fraction fails to parse and unix_timestamp returns null.)
df1 = df.select(unix_timestamp(df.timestamp_col,"yyyy-MM-dd hh:mm:ss"))
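A minimal end-to-end sketch of that fix, using the sample rows from the question (the app name and variable names here are illustrative, and HH is used for the 24-hour clock rather than the question's hh, which is the 12-hour field):

from pyspark.sql import SparkSession
from pyspark.sql.functions import unix_timestamp

spark = SparkSession.builder.appName('ts_to_bigint').getOrCreate()
df = spark.createDataFrame(
    [('2014-06-04 10:09:13.334422',),
     ('2015-06-03 10:09:13.443322',),
     ('2015-08-03 10:09:13.232431',)],
    ['timestamp_col'])
# SimpleDateFormat parses the leading "yyyy-MM-dd HH:mm:ss" portion of each
# string and ignores the trailing microseconds; unix_timestamp already
# returns a bigint, so no extra cast is needed.
df1 = df.select(unix_timestamp(df.timestamp_col, "yyyy-MM-dd HH:mm:ss")
                .alias("unix_time_col"))
df1.show()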
Posted on 2019-09-16 01:32:54
from pyspark.sql import SparkSession
from pyspark.sql.functions import unix_timestamp
from pyspark.sql.types import StructType, StructField, StringType
spark = SparkSession.builder.appName('abc').getOrCreate()
column_schema = StructType([StructField("timestamp_col", StringType())])
data = [['2014-06-04 10:09:13.334422'], ['2015-06-03 10:09:13.443322'], ['2015-08-03 10:09:13.232431']]
data_frame = spark.createDataFrame(data, schema=column_schema)
data_frame.withColumn("timestamp_col", data_frame['timestamp_col'].cast(DateType()))
data_frame = data_frame.withColumn('timestamp_col', unix_timestamp('timestamp_col'))
data_frame.show()

Output:
+-------------+
|timestamp_col|
+-------------+
|   1401894553|
|   1433344153|
|   1438614553|
+-------------+

Source: https://stackoverflow.com/questions/57945174
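One caveat: unix_timestamp truncates to whole seconds, so the microseconds in the original strings are lost in both answers. As a sketch of one way to keep sub-second precision (my own suggestion, not part of either answer): Spark can cast such a string directly to a timestamp, and casting a timestamp to double yields epoch seconds with the fractional part preserved:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName('precise_ts').getOrCreate()
df = spark.createDataFrame([('2014-06-04 10:09:13.334422',)], ['timestamp_col'])
# string -> timestamp keeps the microseconds; timestamp -> double then gives
# epoch seconds with the fraction (the exact value depends on the session
# time zone).
df.select(col('timestamp_col').cast('timestamp').cast('double')
          .alias('epoch_seconds')).show(truncate=False)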