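I am learning about recommender systems, and I want to pass the userFactors and itemFactors produced by an ALS model to K-means. K-means expects a VectorUDT column, but the model gives me ArrayType(FloatType).
I also tried passing userFactors through VectorAssembler to build a vector column for K-means, but that raises the same error. Any help is appreciated.
I am new to this.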
from pyspark.sql.types import IntegerType
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler
userFactorsDF= alsmodel.userFactors.select("features")
vecAssembler = VectorAssembler(inputCols=["features"], outputCol="features")
featuresdf = vecAssembler.transform(userFactorsDF)
kmeans = KMeans().setK(2).setSeed(1)
model1 = kmeans.fit(featuresdf)
ERROR
IllegalArgumentException: u'Data type ArrayType(FloatType,false) is not supported.'
---------------------------------------------------------------------------
IllegalArgumentException Traceback (most recent call last)
<ipython-input-77-05324b5cde72> in <module>()
7 vecAssembler = VectorAssembler(inputCols=["features"], outputCol="features")
8
----> 9 featuresdf = vecAssembler.transform(userFactorsDF)
10
11 kmeans = KMeans().setK(2).setSeed(1)
Posted on 2018-11-06 13:38:27
You can try writing a UDF that extracts the values from the array before passing them to VectorAssembler.
UserDefinedFunction mode = udf((UDF3<Seq<String>, Integer, Integer, String>) (array, fromIndex, toIndex) ->
    array.slice(fromIndex, toIndex).mkString(","), DataTypes.StringType);
https://stackoverflow.com/questions/42130387
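Since the question uses PySpark, here is a rough sketch of the same UDF idea in Python. Instead of going through VectorAssembler, it converts the ArrayType(FloatType) column directly into a VectorUDT column with Vectors.dense and fits KMeans on that. The helper names to_float_list and cluster_user_factors are made up for this illustration, and the sketch assumes als_model is an already fitted ALSModel like the alsmodel in the question.

```python
def to_float_list(factors):
    """Turn one ALS factor array into a plain list of Python floats."""
    return None if factors is None else [float(x) for x in factors]


def cluster_user_factors(als_model, k=2, seed=1):
    """Fit KMeans on the user factors of a fitted ALSModel (assumed input)."""
    # Imported here so the pure-Python helper above is usable without Spark.
    from pyspark.ml.clustering import KMeans
    from pyspark.ml.linalg import Vectors, VectorUDT
    from pyspark.sql.functions import udf

    # UDF: ArrayType(FloatType) -> VectorUDT (one DenseVector per row).
    to_vector = udf(
        lambda factors: None
        if factors is None
        else Vectors.dense(to_float_list(factors)),
        VectorUDT(),
    )

    # Keep the id so cluster assignments can be joined back to users.
    user_vectors = als_model.userFactors.select(
        "id", to_vector("features").alias("features")
    )

    kmeans = KMeans(k=k, seed=seed, featuresCol="features")
    model = kmeans.fit(user_vectors)
    # transform() adds a "prediction" column with the cluster index.
    return model, model.transform(user_vectors)
```

Calling cluster_user_factors(alsmodel) should then return the fitted KMeansModel together with a DataFrame of id, features and prediction. The same approach should work for itemFactors. If you are on Spark 3.1 or later, pyspark.ml.functions.array_to_vector may let you skip the Python UDF entirely, but I have not tried it against ALS output.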