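I'm looking for advice on the problem described in the title. I read in the Databricks documentation (https://docs.databricks.com/spark/latest/data-sources/read-json.html) that a multi-line JSON file can be read into a DataFrame with the following expression: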
println("2.2 Dataframe Multiline")
// MULTILINE MODE!!
val df2=spark.read.option("multiline","true").option("charset","UTF-8").json("EXPORT1.json")
df2.printSchema()
This doesn't work for me. If I manually remove all the line breaks from the JSON, the resulting schema is as follows:
root
|-- results: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- address_components: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- long_name: string (nullable = true)
| | | | |-- short_name: string (nullable = true)
| | | | |-- types: array (nullable = true)
| | | | | |-- element: string (containsNull = true)
| | |-- formatted_address: string (nullable = true)
| | |-- geometry: struct (nullable = true)
| | | |-- bounds: struct (nullable = true)
| | | | |-- northeast: struct (nullable = true)
| | | | | |-- lat: double (nullable = true)
| | | | | |-- lng: double (nullable = true)
| | | | |-- southwest: struct (nullable = true)
| | | | | |-- lat: double (nullable = true)
| | | | | |-- lng: double (nullable = true)
| | | |-- location: struct (nullable = true)
| | | | |-- lat: double (nullable = true)
| | | | |-- lng: double (nullable = true)
| | | |-- location_type: string (nullable = true)
| | | |-- viewport: struct (nullable = true)
| | | | |-- northeast: struct (nullable = true)
| | | | | |-- lat: double (nullable = true)
| | | | | |-- lng: double (nullable = true)
| | | | |-- southwest: struct (nullable = true)
| | | | | |-- lat: double (nullable = true)
| | | | | |-- lng: double (nullable = true)
| | |-- place_id: string (nullable = true)
| | |-- types: array (nullable = true)
| | | |-- element: string (containsNull = true)
|-- status: string (nullable = true)
This is the JSON sample I downloaded from Google:
{
"results" : [
{
"address_components" : [
{
"long_name" : "30152",
"short_name" : "30152",
"types" : [ "postal_code" ]
},
{
"long_name" : "Murcia",
"short_name" : "Murcia",
"types" : [ "locality", "political" ]
},
{
"long_name" : "Murcia",
"short_name" : "MU",
"types" : [ "administrative_area_level_2", "political" ]
},
{
"long_name" : "Region of Murcia",
"short_name" : "Region of Murcia",
"types" : [ "administrative_area_level_1", "political" ]
},
{
"long_name" : "Spain",
"short_name" : "ES",
"types" : [ "country", "political" ]
}
],
"formatted_address" : "30152 Murcia, Spain",
"geometry" : {
"bounds" : {
"northeast" : {
"lat" : 37.9659196,
"lng" : -1.1346723
},
"southwest" : {
"lat" : 37.9442828,
"lng" : -1.1687921
}
},
"location" : {
"lat" : 37.9569734,
"lng" : -1.1496969
},
"location_type" : "APPROXIMATE",
"viewport" : {
"northeast" : {
"lat" : 37.9659196,
"lng" : -1.1346723
},
"southwest" : {
"lat" : 37.9442828,
"lng" : -1.1687921
}
}
},
"place_id" : "ChIJZbDcb0Z_Yw0RUK0TPnKvAhw",
"types" : [ "postal_code" ]
}
],
"status" : "OK"
}
Since I want to send many requests to Google, I can't remove the line breaks manually.
Can anyone help me? Thanks in advance.
Posted on 2018-08-22 04:56:12
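To solve this, I stored the JSON with all the line breaks removed.
The class below takes the address, the component filter, the API key and a numeric file name, and writes the response of the geolocation request to a JSON file: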
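import java.io.{BufferedWriter, File, FileWriter}
import scala.concurrent.Future
import scala.util.{Failure, Success}
import dispatch._, Defaults._

class Geolocation(var Address: String, var Component: String, var APIKey: String, var JSONName: Int) {
  // Build the Geocoding API request URL from the address, component filter and API key.
  val GeoLocURL_REQ = "https://maps.googleapis.com/maps/api/geocode/json?address=" + Address + "&components=" + Component + "&key=" + APIKey
  val filename = JSONName.toString + "_LatLon.json"
  val file = new File(filename)
  val bw = new BufferedWriter(new FileWriter(file))
  val svc = url(GeoLocURL_REQ)
  // Send the request asynchronously; the Future completes with the response body.
  val response: Future[String] = Http(svc OK as.String)
  response onComplete {
    case Success(content) =>
      println("worked!" + content)
      // Remove only line breaks so the JSON fits on one line.
      // Note: "\\s" would also strip the spaces inside string values.
      bw.write(content.replaceAll("[\\r\\n]+", ""))
      bw.close()
    case Failure(t) =>
      println("failed:! " + t.getMessage)
      bw.close() // release the file handle even when the request fails
  }
}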
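
A note on the regular expression used when flattening the response: a pattern that matches all whitespace also removes the spaces inside string values such as "Region of Murcia", whereas a pattern that matches only line breaks leaves the values intact. Here is a minimal sketch comparing the two, using only the Scala standard library. The object name, the helper names and the sample string are illustrative assumptions, not part of the original code.

```scala
object FlattenJsonSketch {
  // Removes only carriage returns and line feeds; spaces inside string values survive.
  def stripLineBreaks(json: String): String = json.replaceAll("[\\r\\n]+", "")

  // Removes every whitespace character, including spaces inside string values.
  def stripAllWhitespace(json: String): String = json.replaceAll("\\s", "")

  def main(args: Array[String]): Unit = {
    // Hypothetical three-line JSON fragment shaped like the Google response.
    val sample = "{\n  \"long_name\" : \"Region of Murcia\"\n}"
    println(stripLineBreaks(sample))    // value keeps its spaces
    println(stripAllWhitespace(sample)) // value becomes "RegionofMurcia"
  }
}
```

Either variant produces a single-line document that the default line-delimited JSON reader should accept, but only the first keeps values such as formatted_address unchanged.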
import dispatch._, Defaults._
var APIKey="TYPE YOUR OWN API KEY HERE"
var PostalCode=30152
var Localidad = "Murcia"
val Component="postal_code="+PostalCode+"%7Ccountry=ES" // "|" = %7C
var Address=Localidad+"+"+PostalCode
val geolocation= new Geolocation(Address,Component,APIKey, PostalCode )
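
The snippet above escapes "|" as %7C and joins the address parts with "+" by hand. As a hedged alternative, the same escaping can be delegated to java.net.URLEncoder from the JDK. The sketch below is only an illustration: GeocodeUrlSketch, encode and buildUrl are hypothetical names, and the API key is a placeholder.

```scala
import java.net.URLEncoder

object GeocodeUrlSketch {
  // Endpoint taken from the class above.
  private val Base = "https://maps.googleapis.com/maps/api/geocode/json"

  // Form-encodes one query parameter value: a space becomes "+", "|" becomes "%7C".
  def encode(value: String): String = URLEncoder.encode(value, "UTF-8")

  // Assembles the request URL from raw (unescaped) parameter values.
  def buildUrl(address: String, components: String, apiKey: String): String =
    s"$Base?address=${encode(address)}&components=${encode(components)}&key=${encode(apiKey)}"

  def main(args: Array[String]): Unit = {
    println(encode("Murcia 30152")) // Murcia+30152
    println(encode("|"))            // %7C
    println(buildUrl("Murcia 30152", "postal_code=30152|country=ES", "YOUR_API_KEY"))
  }
}
```

Note that URLEncoder also escapes "=" inside a value as %3D, so the components parameter looks different from the hand-built string even though it decodes to the same text.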
Hope this helps someone!
https://stackoverflow.com/questions/51921636