
BigQuery: Node.js client library does not seem to respect useAvroLogicalTypes when running a Cloud Storage load job

Stack Overflow user
Asked on 2019-11-13 14:34:15
1 answer · 765 views · 0 following · 0 votes

I am trying to run a job from my Node.js server that loads an AVRO file from Cloud Storage into a BigQuery table. The job runs fine, but the date column is loaded into the table as INTEGER. I included the useAvroLogicalTypes parameter in the load job, but it does not seem to have any effect.

If I cast the date column in the table with SELECT DATE(TIMESTAMP_MILLIS(date)), I do get the correct date, but I would like to avoid that extra conversion step. Everywhere I read that Avro logical types are converted implicitly if the parameter is set, but I have not been able to get it to work. The table is created by the job, so there is no pre-existing schema.

The client library versions I am using are 4.4.0 for @google-cloud/bigquery and 4.1.2 for @google-cloud/storage.

AVRO schema:

const schema = {
    "name": "root",
    "type": "record",
    "fields": [
      { "name": "date", "type": ["null", { "type": "long", "logicalType": "date" }]},
      { "name": "medium", "type": ["null", "string"] },
      { "name": "source", "type": ["null", "string"] },
      { "name": "campaign", "type": ["null", "string"] },
    ]
  };
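For reference, the Avro specification pairs each logical type with a specific underlying type: `date` annotates `int` (days since the Unix epoch), while epoch milliseconds belong to `timestamp-millis` on `long`. A minimal sketch of what a schema for a true DATE column would look like under that rule (this is illustrative, not the poster's code; `schemaAsDate` is a hypothetical name):

```javascript
// Sketch: per the Avro spec, the `date` logical type must annotate `int`
// (days since 1970-01-01). A `long` holding epoch milliseconds would match
// `timestamp-millis` instead, which BigQuery maps to TIMESTAMP, not DATE.
const schemaAsDate = {
  name: "root",
  type: "record",
  fields: [
    // DATE column: nullable int annotated with logicalType "date"
    { name: "date", type: ["null", { type: "int", logicalType: "date" }] },
    { name: "medium", type: ["null", "string"] },
    { name: "source", type: ["null", "string"] },
    { name: "campaign", type: ["null", "string"] },
  ],
};
```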

Job code:

const options = {
    sourceFormat: 'AVRO',
    writeDisposition: 'WRITE_TRUNCATE',
    useAvroLogicalTypes: true,
    datasetID,
  };

bigquery
    .dataset(datasetID)
    .table(tableID)
    .load(storage.bucket(bucketName).file(fileName), options)
    .then(results => {

      res = results[0];

      // load() waits for the job to finish
      console.log(`Job ${res.id} completed.`);

      // Check the job's status for errors
      const errors = res.status.errors;

      if (errors && errors.length > 0) {
        E = errors;
      }
      // This kicks the execution back to where the Fiber.yield() statement stopped it
      fiber.resume();
    })
    .catch(err => {
      console.error('ERROR:', err);
    });

Raw data sample:

data = [
{"date":"2019-08-01","medium":"(none)","source":"(direct)","campaign":"(not set)","users":3053},
{"date":"2019-08-01","medium":"(not set)","source":"email-client","campaign":"(not set)","users":3},
{"date":"2019-08-01","medium":"affiliate","source":"sdn","campaign":"(not set)","users":1},
{"date":"2019-08-01","medium":"email","source":"corner","campaign":"onboarding","users":1},
{"date":"2019-08-01","medium":"email","source":"custom-playlist","campaign":"fonboarding","users":1},
{"date":"2019-08-01","medium":"email","source":"deref-mail.com","campaign":"(not set)","users":2},
{"date":"2019-08-01","medium":"email","source":"faketempmail","campaign":"(not set)","users":1},
{"date":"2019-08-01","medium":"email","source":"fundx","campaign":"email_campaign","users":1},
{"date":"2019-08-01","medium":"email","source":"email-client","campaign":"(not set)","users":14},
{"date":"2019-08-01","medium":"email","source":"email-client","campaign":"100k","users":2},
]

I convert the date property to a long using momentJS and a simple underscoreJS map function:

data = _.map(data, row => {
    row.date = moment(row.date).isValid() ? +moment(row.date).valueOf() : null;
    return row;
  });
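Note that `moment(...).valueOf()` yields epoch milliseconds, which fits a `long`/`timestamp-millis` field. If the Avro field were instead declared as `int` with logical type `date`, the value would need to be days since the epoch. A plain-`Date` sketch of that conversion, without momentJS (the helper name `toEpochDays` is hypothetical):

```javascript
// Sketch: convert an ISO date string ("YYYY-MM-DD") to days since the
// Unix epoch, the representation Avro's `date` logical type expects.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function toEpochDays(isoDate) {
  // A bare "YYYY-MM-DD" string is parsed as UTC midnight by Date.parse
  const ms = Date.parse(isoDate);
  return Number.isNaN(ms) ? null : Math.floor(ms / MS_PER_DAY);
}

const converted = [{ date: "2019-08-01", users: 3053 }].map(row => ({
  ...row,
  date: toEpochDays(row.date),
}));
```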

1 Answer

Stack Overflow user
Answered on 2019-11-15 22:04:34

What you are describing is the default behavior: when you do not use useAvroLogicalTypes, the Avro logical type date is stored as INTEGER in BigQuery.

It also depends on how your AVRO schema was generated. For example, I had to build an AVRO file with this schema:

"fields": [
    {"logicalType": "date", "type": "string", "name": "field1"}
]

and I uploaded my date data correctly with the following code:

const metadata = {
  sourceFormat: 'AVRO',
  useAvroLogicalTypes: true,
  createDisposition: 'CREATE_IF_NEEDED',
  writeDisposition: 'WRITE_TRUNCATE',
  schema: {
    fields: [
      {
        name: "field1",
        type: "DATE",
        logicalType: "STRING",
        mode: "NULLABLE"
      }
    ],
  },
  location: 'US',
};

const [job] = await bigquery
  .dataset(datasetId)
  .table(tableId)
  .load(storage.bucket(bucketName).file(filename), metadata);

Based on the data you shared, it should work with this configuration, since your date data is a String.

Hope this helps.

Votes: 0
Original page content provided by Stack Overflow; translated with support from Tencent Cloud's translation engine.
Original link: https://stackoverflow.com/questions/58839444
