Converting a sparse vector to a dense vector in PySpark

Stack Overflow user
Asked 2016-12-26 16:39:05
Answers: 0 · Views: 9.9K · Followers: 0 · Votes: 6

I have sparse vectors like the following:

>>> countVectors.rdd.map(lambda vector: vector[1]).collect()
[SparseVector(13, {0: 1.0, 2: 1.0, 3: 1.0, 6: 1.0, 8: 1.0, 9: 1.0, 10: 1.0, 12: 1.0}), SparseVector(13, {0: 1.0, 1: 1.0, 2: 1.0, 4: 1.0}), SparseVector(13, {0: 1.0, 1: 1.0, 3: 1.0, 4: 1.0, 7: 1.0}), SparseVector(13, {1: 1.0, 2: 1.0, 5: 1.0, 11: 1.0})]

I tried to convert them to dense vectors in PySpark 2.0.0 as follows:

>>> frequencyVectors = countVectors.rdd.map(lambda vector: vector[1])
>>> frequencyVectors.map(lambda vector: Vectors.dense(vector)).collect()

I get the following error:

16/12/26 14:03:35 ERROR Executor: Exception in task 0.0 in stage 13.0 (TID 13)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/BIG-DATA/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 172, in main
    process()
  File "/opt/BIG-DATA/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 167, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/opt/BIG-DATA/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "<stdin>", line 1, in <lambda>
  File "/opt/BIG-DATA/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/mllib/linalg/__init__.py", line 878, in dense
    return DenseVector(elements)
  File "/opt/BIG-DATA/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/mllib/linalg/__init__.py", line 286, in __init__
    ar = np.array(ar, dtype=np.float64)
  File "/opt/BIG-DATA/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/ml/linalg/__init__.py", line 701, in __getitem__
    raise ValueError("Index %d out of bounds." % index)
ValueError: Index 13 out of bounds.

How can I perform this conversion? What is going wrong here?

Answers

Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/41328549
