I'm trying to run a job from my Node.js server that loads an AVRO file from Cloud Storage into a BigQuery table. The job runs fine, but the date column is loaded into the table as INTEGER. I included the useAvroLogicalTypes parameter in the load job, but it doesn't seem to have any effect.
If I force a cast in the table with SELECT DATE(TIMESTAMP_MILLIS(date)), I do get the correct date, but I'd like to avoid that extra conversion step. Everywhere I've read that Avro logical types are converted implicitly when the parameter is set, but I haven't been able to make it work. The table is created by the job, so there is no pre-existing schema.
The client library versions I'm using are: 4.4.0 for @google-cloud/bigquery and 4.1.2 for @google-cloud/storage.
AVRO schema:
const schema = {
  "name": "root",
  "type": "record",
  "fields": [
    { "name": "date", "type": ["null", { "type": "long", "logicalType": "date" }] },
    { "name": "medium", "type": ["null", "string"] },
    { "name": "source", "type": ["null", "string"] },
    { "name": "campaign", "type": ["null", "string"] }
  ]
};

Job code:
const options = {
  sourceFormat: 'AVRO',
  writeDisposition: 'WRITE_TRUNCATE',
  useAvroLogicalTypes: true,
  datasetID,
};

bigquery
  .dataset(datasetID)
  .table(tableID)
  .load(storage.bucket(bucketName).file(fileName), options)
  .then(results => {
    res = results[0];
    // load() waits for the job to finish
    console.log(`Job ${res.id} completed.`);
    // Check the job's status for errors
    const errors = res.status.errors;
    if (errors && errors.length > 0) {
      E = errors;
    }
    // This kicks the execution back to where the Fiber.yield() statement stopped it
    fiber.resume();
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

Raw data sample:
data = [
  {"date":"2019-08-01","medium":"(none)","source":"(direct)","campaign":"(not set)","users":3053},
  {"date":"2019-08-01","medium":"(not set)","source":"email-client","campaign":"(not set)","users":3},
  {"date":"2019-08-01","medium":"affiliate","source":"sdn","campaign":"(not set)","users":1},
  {"date":"2019-08-01","medium":"email","source":"corner","campaign":"onboarding","users":1},
  {"date":"2019-08-01","medium":"email","source":"custom-playlist","campaign":"fonboarding","users":1},
  {"date":"2019-08-01","medium":"email","source":"deref-mail.com","campaign":"(not set)","users":2},
  {"date":"2019-08-01","medium":"email","source":"faketempmail","campaign":"(not set)","users":1},
  {"date":"2019-08-01","medium":"email","source":"fundx","campaign":"email_campaign","users":1},
  {"date":"2019-08-01","medium":"email","source":"email-client","campaign":"(not set)","users":14},
  {"date":"2019-08-01","medium":"email","source":"email-client","campaign":"100k","users":2},
];

I convert the date property to long using momentJS and a simple underscoreJS map function:
data = _.map(data, row => {
  row.date = moment(row.date).isValid() ? +moment(row.date).valueOf() : null;
  return row;
});

Posted on 2019-11-15 22:04:34
What you describe is the "default" behavior: when you do not use useAvroLogicalTypes, the Avro logical type date is stored in BigQuery as INTEGER.
It also depends on how your AVRO schema was generated. For instance, I had to build an AVRO file using the schema:
"fields": [
  {"logicaltype": "date", "type": "string", "name": "field1"}
]

and I uploaded my date data correctly with the following code:
const metadata = {
  sourceFormat: 'AVRO',
  useAvroLogicalTypes: true,
  createDisposition: 'CREATE_IF_NEEDED',
  writeDisposition: 'WRITE_TRUNCATE',
  schema: {
    fields: [
      {
        name: "field1",
        type: "DATE",
        logicalType: "STRING",
        mode: "NULLABLE"
      }
    ],
  },
  location: 'US',
};

const [job] = await bigquery
  .dataset(datasetId)
  .table(tableId)
  .load(storage.bucket(bucketName).file(filename), metadata);

Given the data you shared, it should work with this configuration, since your date values are strings.
Hope this helps.
https://stackoverflow.com/questions/58839444