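I have an Avro schema that currently lives in a single avsc file, shown below. I want to move the address record into a separate, shared avsc file that many other avsc files can reference, so that Customer and Address are separate avsc files. How can I split them so that the Customer avsc file references the Address avsc file? And how do I process both files from Python? I currently use fastavro in Python 3 to process the single avsc file, but I'm open to any other utility in Python 3 or PySpark.
File name: customer_details.avsc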
[
  {
    "type": "record",
    "namespace": "com.company.model",
    "name": "AddressRecord",
    "fields": [
      {
        "name": "streetaddress",
        "type": "string"
      },
      {
        "name": "city",
        "type": "string"
      },
      {
        "name": "state",
        "type": "string"
      },
      {
        "name": "zip",
        "type": "string"
      }
    ]
  },
  {
    "namespace": "com.company.model",
    "type": "record",
    "name": "Customer",
    "fields": [
      {
        "name": "firstname",
        "type": "string"
      },
      {
        "name": "lastname",
        "type": "string"
      },
      {
        "name": "email",
        "type": "string"
      },
      {
        "name": "phone",
        "type": "string"
      },
      {
        "name": "address",
        "type": {
          "type": "array",
          "items": "com.company.model.AddressRecord"
        }
      }
    ]
  }
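]

import fastavro
s1 = fastavro.schema.load_schema('customer_details.avsc')

How can I split the schema into separate files so that the address record file can be referenced from other avsc files? And how do I then process multiple avsc files with fastavro (Python) or any other Python utility?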
Posted on 2020-08-18 00:13:49
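To do this, the AddressRecord schema should live in a file named com.company.model.AddressRecord.avsc (its full name, namespace plus name, followed by .avsc) with the following content: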
{
  "type": "record",
  "namespace": "com.company.model",
  "name": "AddressRecord",
  "fields": [
    {
      "name": "streetaddress",
      "type": "string"
    },
    {
      "name": "city",
      "type": "string"
    },
    {
      "name": "state",
      "type": "string"
    },
    {
      "name": "zip",
      "type": "string"
    }
  ]
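}

The Customer schema doesn't strictly need a special naming convention because it is the top-level schema, but following the same convention is probably a good idea. So it would go in a file named com.company.model.Customer.avsc with the following content: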
{
  "namespace": "com.company.model",
  "type": "record",
  "name": "Customer",
  "fields": [
    {
      "name": "firstname",
      "type": "string"
    },
    {
      "name": "lastname",
      "type": "string"
    },
    {
      "name": "email",
      "type": "string"
    },
    {
      "name": "phone",
      "type": "string"
    },
    {
      "name": "address",
      "type": {
        "type": "array",
        "items": "com.company.model.AddressRecord"
      }
    }
  ]
}

Both files must be in the same directory.
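If you have combined files like customer_details.avsc to convert, the split can be scripted instead of done by hand. Below is a minimal, stdlib-only sketch; the helper names `full_name`, `per_type_files` and `split_avsc` are hypothetical, not part of fastavro. It assumes every top-level entry in the combined file is a named type (record, enum or fixed) with a `name` and optionally a `namespace`, and writes each one to `<namespace>.<name>.avsc` in a single directory, matching the convention above. References such as `"com.company.model.AddressRecord"` already use full names, so the Customer definition needs no changes.

```python
import json
import os
import sys


def full_name(schema):
    """Return the Avro full name (namespace.name) of a named schema."""
    name = schema["name"]
    namespace = schema.get("namespace")
    # A name that already contains a dot is a full name; namespace is ignored.
    if "." in name or not namespace:
        return name
    return f"{namespace}.{name}"


def per_type_files(schemas):
    """Map '<full name>.avsc' file names to the schema each file should hold.

    `schemas` is the parsed JSON of a combined .avsc file: either a single
    named schema (dict) or a list of them, as in customer_details.avsc.
    """
    if isinstance(schemas, dict):
        schemas = [schemas]
    return {f"{full_name(s)}.avsc": s for s in schemas}


def split_avsc(combined_path, out_dir):
    """Hypothetical helper: write one .avsc file per named type into out_dir."""
    with open(combined_path) as fh:
        schemas = json.load(fh)
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for filename, schema in per_type_files(schemas).items():
        path = os.path.join(out_dir, filename)
        with open(path, "w") as fh:
            json.dump(schema, fh, indent=2)
        written.append(path)
    return written


if __name__ == "__main__":
    # Usage: python split_avsc.py customer_details.avsc schemas/
    for written_path in split_avsc(sys.argv[1], sys.argv[2]):
        print(written_path)
```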
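Then you should be able to call fastavro.schema.load_schema('com.company.model.Customer.avsc'). When it reaches the reference to com.company.model.AddressRecord, it loads that schema from the file of the same name in the same directory.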
https://stackoverflow.com/questions/63443131