Syntax error when defining a schema for a Spark SQL DataFrame

Stack Overflow user
Asked on 2017-03-08 06:58:25
1 answer, 238 views, 0 followers, 0 votes

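My pyspark console tells me there is invalid syntax on the line after my for loop. The console never runs the for loop; it stops at the schema = StructType(fields) line with a SyntaxError. The for loop itself looks fine to me...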

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *
sqlContext = SQLContext(sc)

lines = sc.textFile('file:///home/w205/hospital_compare/surveys_responses.csv')
parts = lines.map(lambda l: l.split(','))
surveys_responses = parts.map(lambda p: (p[0:33]))
schemaString = 'Provider Number, Hospital Name, Address, City, State, ZIP Code, County Name, Communication with Nurses Achievement Points, Communication with Nurses Improvement Points, Communication with Nurses Dimension Score, Communication with Doctors Achievement Points, Communication with Doctors Improvement Points, Communication with Doctors Dimension Score, Responsiveness of Hospital Staff Achievement Points, Responsiveness of Hospital Staff Improvement Points, Responsiveness of Hospital Staff Dimension Score, Pain Management Achievement Points, Pain Management Improvement Points, Pain Management Dimension Score, Communication about Medicines Achievement Points, Communication about Medicines Improvement Points, Communication about Medicines Dimension Score, Cleanliness and Quietness of Hospital Environment Achievement Points, Cleanliness and Quietness of Hospital Environment Improvement Points, Cleanliness and Quietness of Hospital Environment Dimension Score, Discharge Information Achievement Points, Discharge Information Improvement Points, Discharge Information Dimension Score, Overall Rating of Hospital Achievement Points, Overall Rating of Hospital Improvement Points, Overall Rating of Hospital Dimension Score, HCAHPS Base Score, HCAHPS Consistency Score'
fields = []
for field_name in schemaString.split(", "):
    if field_name != ("HCAHPS Base Score" | "HCAHPS Consistency Score"):
        fields.append(StructField(field_name, StringType(), True))
    else:
        fields.append(StructField(field_name, IntegerType(), True))
schema = StructType(fields)
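A SyntaxError reported on the line after a loop is consistent with how Python's interactive shell reads pasted input. Each top-level entry is compiled as a single statement, so a compound statement such as for has to be closed by a blank line before the next top-level line. The following minimal sketch assumes the plain CPython interpreter (not IPython). It approximates that behaviour with the built-in compile() in "single" mode, the mode CPython uses for interactive statements. The snippet strings and the helper compiles_interactively are illustrative, not part of the original code.

```python
# Sketch: approximate how the interactive shell compiles one pasted entry.
# compile(..., "single") is the mode CPython uses for interactive statements.

# The loop immediately followed by another top-level line (no blank line)
no_blank_line = (
    "for field_name in names:\n"
    "    fields.append(field_name)\n"
    "schema = fields\n"
)

# The loop alone, terminated by a blank line as when you press Enter twice
loop_only = (
    "for field_name in names:\n"
    "    fields.append(field_name)\n"
    "\n"
)


def compiles_interactively(source):
    """Return True if source compiles as one interactive statement."""
    try:
        compile(source, "<stdin>", "single")
        return True
    except SyntaxError:
        return False


print(compiles_interactively(no_blank_line))  # False
print(compiles_interactively(loop_only))      # True
```

Only compilation is tested here, so the undefined names (names, fields) do not matter. In practice, leaving an empty line after the loop body before schema = StructType(fields) avoids this particular error when pasting into the shell.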

1 Answer

Stack Overflow user

Accepted answer

Answered on 2017-03-08 12:29:40

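The | in your != condition is what's wrong here. In Python, | does not mean "either of these values"; applied to two strings it raises TypeError: unsupported operand type(s) for |: 'str' and 'str'. Test membership in a tuple with not in instead: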

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *
sqlContext = SQLContext(sc)  # sc is the SparkContext the pyspark shell creates

lines = sc.textFile('file:///home/w205/hospital_compare/surveys_responses.csv')
parts = lines.map(lambda l: l.split(','))
surveys_responses = parts.map(lambda p: (p[0:33]))
schemaString = 'Provider Number, Hospital Name, Address, City, State, ZIP Code, County Name, Communication with Nurses Achievement Points, Communication with Nurses Improvement Points, Communication with Nurses Dimension Score, Communication with Doctors Achievement Points, Communication with Doctors Improvement Points, Communication with Doctors Dimension Score, Responsiveness of Hospital Staff Achievement Points, Responsiveness of Hospital Staff Improvement Points, Responsiveness of Hospital Staff Dimension Score, Pain Management Achievement Points, Pain Management Improvement Points, Pain Management Dimension Score, Communication about Medicines Achievement Points, Communication about Medicines Improvement Points, Communication about Medicines Dimension Score, Cleanliness and Quietness of Hospital Environment Achievement Points, Cleanliness and Quietness of Hospital Environment Improvement Points, Cleanliness and Quietness of Hospital Environment Dimension Score, Discharge Information Achievement Points, Discharge Information Improvement Points, Discharge Information Dimension Score, Overall Rating of Hospital Achievement Points, Overall Rating of Hospital Improvement Points, Overall Rating of Hospital Dimension Score, HCAHPS Base Score, HCAHPS Consistency Score'
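fields = []
for field_name in schemaString.split(", "):
    # Every column except the two HCAHPS scores is typed as a string
    if field_name not in ("HCAHPS Base Score", "HCAHPS Consistency Score"):
        fields.append(StructField(field_name, StringType(), True))
    else:
        # The two HCAHPS score columns are typed as integers
        fields.append(StructField(field_name, IntegerType(), True))

# The blank line above ends the for block when pasted into the pyspark shell
schema = StructType(fields)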
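The difference between the question's condition and the answer's condition can be checked without Spark. In the following minimal pure-Python sketch, the helper names broken_check and fixed_check and the constant INT_COLUMNS are hypothetical. The first helper reproduces the question's test and the second uses the answer's not in test.

```python
# The two column names that the schema types as IntegerType
INT_COLUMNS = ("HCAHPS Base Score", "HCAHPS Consistency Score")


def broken_check(field_name):
    # Question's condition: | is not "either of", and str does not support it,
    # so evaluating the right-hand side raises TypeError.
    return field_name != ("HCAHPS Base Score" | "HCAHPS Consistency Score")


def fixed_check(field_name):
    # Answer's condition: True for every column that should stay a string.
    return field_name not in INT_COLUMNS


try:
    broken_check("Provider Number")
except TypeError as exc:
    print("TypeError:", exc)  # unsupported operand type(s) for |: 'str' and 'str'

print(fixed_check("Provider Number"))    # True
print(fixed_check("HCAHPS Base Score"))  # False
```

A tuple is used for the membership test because the set of integer columns is small and fixed. A set literal would work the same way with not in.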
Votes: 1
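The original page content is provided by Stack Overflow; translation support was provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link: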

https://stackoverflow.com/questions/42659944
