
Spark: unable to read multi-line JSON (corrupt_record: string (nullable = true))

Stack Overflow user
Asked 2018-08-20 04:34:37
1 answer, 674 views, 0 followers, 1 vote

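I'm looking for advice on the problem in the title. I read in the Databricks documentation (https://docs.databricks.com/spark/latest/data-sources/read-json.html) that I can read a multi-line JSON file into a DataFrame with the following expression: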

println("2.2 Dataframe Multiline")
// MULTILINE MODE!!
val df2 = spark.read.option("multiline", "true").option("charset", "UTF-8").json("EXPORT1.json")
df2.printSchema()

This does not work for me. If I manually remove all the line breaks from the JSON, the resulting schema is as follows:

root
 |-- results: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- address_components: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- long_name: string (nullable = true)
 |    |    |    |    |-- short_name: string (nullable = true)
 |    |    |    |    |-- types: array (nullable = true)
 |    |    |    |    |    |-- element: string (containsNull = true)
 |    |    |-- formatted_address: string (nullable = true)
 |    |    |-- geometry: struct (nullable = true)
 |    |    |    |-- bounds: struct (nullable = true)
 |    |    |    |    |-- northeast: struct (nullable = true)
 |    |    |    |    |    |-- lat: double (nullable = true)
 |    |    |    |    |    |-- lng: double (nullable = true)
 |    |    |    |    |-- southwest: struct (nullable = true)
 |    |    |    |    |    |-- lat: double (nullable = true)
 |    |    |    |    |    |-- lng: double (nullable = true)
 |    |    |    |-- location: struct (nullable = true)
 |    |    |    |    |-- lat: double (nullable = true)
 |    |    |    |    |-- lng: double (nullable = true)
 |    |    |    |-- location_type: string (nullable = true)
 |    |    |    |-- viewport: struct (nullable = true)
 |    |    |    |    |-- northeast: struct (nullable = true)
 |    |    |    |    |    |-- lat: double (nullable = true)
 |    |    |    |    |    |-- lng: double (nullable = true)
 |    |    |    |    |-- southwest: struct (nullable = true)
 |    |    |    |    |    |-- lat: double (nullable = true)
 |    |    |    |    |    |-- lng: double (nullable = true)
 |    |    |-- place_id: string (nullable = true)
 |    |    |-- types: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)
 |-- status: string (nullable = true)
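
For reference, one way to obtain this schema without editing the file by hand is to read each file as a single string and let Spark parse that string as one JSON document. The sketch below is only an illustration: the object and method names (`ReadWholeFileJson`, `readJsonFiles`) are hypothetical, and the guess that the `multiline` option was ignored because the runtime predates Spark 2.2 (where the option was introduced) is an assumption, since the Spark version is not stated here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadWholeFileJson {
  // Hypothetical helper: reads every file under `path` as ONE JSON document,
  // so line breaks inside the document no longer split it into records.
  def readJsonFiles(spark: SparkSession, path: String): DataFrame = {
    // wholeTextFiles yields (fileName, fileContent) pairs; keep the content only.
    val contents = spark.sparkContext.wholeTextFiles(path).values
    // json(RDD[String]) is deprecated since Spark 2.2 but still available;
    // on newer versions a Dataset[String] can be passed instead.
    spark.read.json(contents)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-multiline-json")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()
    readJsonFiles(spark, "EXPORT1.json").printSchema()
    spark.stop()
  }
}
```

Because each file is loaded fully into memory as one string, this approach suits small responses such as a single geocoding result rather than very large files.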

This is the sample JSON I downloaded from Google:

{
   "results" : [
      {
         "address_components" : [
            {
               "long_name" : "30152",
               "short_name" : "30152",
               "types" : [ "postal_code" ]
            },
            {
               "long_name" : "Murcia",
               "short_name" : "Murcia",
               "types" : [ "locality", "political" ]
            },
            {
               "long_name" : "Murcia",
               "short_name" : "MU",
               "types" : [ "administrative_area_level_2", "political" ]
            },
            {
               "long_name" : "Region of Murcia",
               "short_name" : "Region of Murcia",
               "types" : [ "administrative_area_level_1", "political" ]
            },
            {
               "long_name" : "Spain",
               "short_name" : "ES",
               "types" : [ "country", "political" ]
            }
         ],
         "formatted_address" : "30152 Murcia, Spain",
         "geometry" : {
            "bounds" : {
               "northeast" : {
                  "lat" : 37.9659196,
                  "lng" : -1.1346723
               },
               "southwest" : {
                  "lat" : 37.9442828,
                  "lng" : -1.1687921
               }
            },
            "location" : {
               "lat" : 37.9569734,
               "lng" : -1.1496969
            },
            "location_type" : "APPROXIMATE",
            "viewport" : {
               "northeast" : {
                  "lat" : 37.9659196,
                  "lng" : -1.1346723
               },
               "southwest" : {
                  "lat" : 37.9442828,
                  "lng" : -1.1687921
               }
            }
         },
         "place_id" : "ChIJZbDcb0Z_Yw0RUK0TPnKvAhw",
         "types" : [ "postal_code" ]
      }
   ],
   "status" : "OK"
}

Because I want to send many requests to Google, I cannot remove the line breaks by hand.

Can anyone help me? Thanks in advance.


1 Answer

Stack Overflow user

Answered 2018-08-22 04:56:12

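To work around this, I stored the JSON with all line breaks removed.

The following class takes the address, components, ... and writes the geocoding response to a JSON file: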

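import java.io.{BufferedWriter, File, FileWriter}
import scala.concurrent.Future
import scala.util.{Failure, Success}
import dispatch._, Defaults._

// Sends one geocoding request and writes the response to <JSONName>_LatLon.json.
class Geolocation(var Address: String, var Component: String, var APIKey: String, var JSONName: Int) {
  val GeoLocURL_REQ = "https://maps.googleapis.com/maps/api/geocode/json?address=" + Address + "&components=" + Component + "&key=" + APIKey
  val filename = JSONName.toString + "_LatLon.json"
  val file = new File(filename)
  val bw = new BufferedWriter(new FileWriter(file))
  val svc = url(GeoLocURL_REQ)
  val response: Future[String] = Http(svc OK as.String)

  response onComplete {
    case Success(content) =>
      println("worked!" + content)
      // "\\s" strips all whitespace, including spaces inside string values
      // such as formatted_address; replacing only "\\n" also works.
      bw.write(content.replaceAll("\\s", ""))
      //bw.write(content)
      bw.close()
    case Failure(t) =>
      println("failed:! " + t.getMessage)
      bw.close() // release the file handle on failure as well
  }
}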
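
Since `replaceAll("\\s", "")` also deletes the spaces inside values such as "30152 Murcia, Spain", a gentler variant is to drop whitespace only outside string literals. The following is a minimal sketch of that idea, not part of the original answer; the names `JsonCompact` and `compact` are hypothetical, and it assumes the input is well-formed JSON.

```scala
object JsonCompact {
  /** Removes whitespace outside JSON string literals; text inside quotes is kept as-is. */
  def compact(json: String): String = {
    val out = new StringBuilder(json.length)
    var inString = false // true while between an opening and closing quote
    var escaped = false  // true right after a backslash inside a string
    for (c <- json) {
      if (inString) {
        out.append(c)
        if (escaped) escaped = false
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
      } else if (c == '"') {
        inString = true
        out.append(c)
      } else if (!c.isWhitespace) {
        out.append(c)
      }
    }
    out.toString
  }
}
```

In the class above, `bw.write(JsonCompact.compact(content))` would then produce a single-line file while leaving the address text readable.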

import dispatch._, Defaults._

var APIKey = "TYPE YOUR OWN API HERE"
var PostalCode = 30152
var Localidad = "Murcia"
val Component = "postal_code=" + PostalCode + "%7Ccountry=ES"  // "|" = %7C
var Address = Localidad + "+" + PostalCode

val geolocation = new Geolocation(Address, Component, APIKey, PostalCode)
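
The hand-written "+" and "%7C" above are URL escapes for a space and for "|". As a small sketch of how the same query string could be assembled with `java.net.URLEncoder` instead, consider the helper below; `GeocodeUrl` and `build` are hypothetical names, and the `key=value` component format simply mirrors the code above rather than any checked API specification.

```scala
import java.net.URLEncoder

object GeocodeUrl {
  private def enc(s: String): String = URLEncoder.encode(s, "UTF-8")

  /** Builds the request URL; values are percent-encoded, pairs are joined with an encoded "|". */
  def build(address: String, components: Seq[(String, String)], apiKey: String): String = {
    val comp = components
      .map { case (k, v) => s"$k=${enc(v)}" } // same key=value shape as the original code
      .mkString(enc("|"))                     // "|" becomes %7C
    "https://maps.googleapis.com/maps/api/geocode/json" +
      s"?address=${enc(address)}&components=$comp&key=${enc(apiKey)}"
  }
}
```

For example, `GeocodeUrl.build("Murcia 30152", Seq("postal_code" -> "30152", "country" -> "ES"), APIKey)` yields the same address and components segments as the manual concatenation, which helps when addresses contain spaces or accented characters.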

Hope this helps someone!

Page content originally from Stack Overflow.
Original link: https://stackoverflow.com/questions/51921636
