I'm trying to run a job from my Node.js server that loads an AVRO file from Cloud Storage into a BigQuery table. The job runs fine, but the date column is loaded into the table as INTEGER. I included the useAvroLogicalTypes parameter in the load job, but it doesn't seem to have any effect.
If I force a cast in the table with SELECT DATE(TIMESTAMP_MILLIS(date)), I do get the correct date, but I'd like to avoid that extra conversion step. Everywhere I've read that Avro logical types are converted implicitly when the parameter is set, but I haven't been able to make it work. The table is created by the job, so there is no pre-existing schema.
The client library versions I'm using are: 4.4.0 for @google-cloud/bigquery and 4.1.2 for @google-cloud/storage.
AVRO schema:
const schema = {
  "name": "root",
  "type": "record",
  "fields": [
    { "name": "date", "type": ["null", { "type": "long", "logicalType": "date" }] },
    { "name": "medium", "type": ["null", "string"] },
    { "name": "source", "type": ["null", "string"] },
    { "name": "campaign", "type": ["null", "string"] }
  ]
};

Job code:
const options = {
  sourceFormat: 'AVRO',
  writeDisposition: 'WRITE_TRUNCATE',
  useAvroLogicalTypes: true,
  datasetID,
};

bigquery
  .dataset(datasetID)
  .table(tableID)
  .load(storage.bucket(bucketName).file(fileName), options)
  .then(results => {
    res = results[0];
    // load() waits for the job to finish
    console.log(`Job ${res.id} completed.`);
    // Check the job's status for errors
    const errors = res.status.errors;
    if (errors && errors.length > 0) {
      E = errors;
    }
    // This kicks the execution back to where the Fiber.yield() statement stopped it
    fiber.resume();
  })
  .catch(err => {
    console.error('ERROR:', err);
  });

Raw data sample:
data = [
  {"date":"2019-08-01","medium":"(none)","source":"(direct)","campaign":"(not set)","users":3053},
  {"date":"2019-08-01","medium":"(not set)","source":"email-client","campaign":"(not set)","users":3},
  {"date":"2019-08-01","medium":"affiliate","source":"sdn","campaign":"(not set)","users":1},
  {"date":"2019-08-01","medium":"email","source":"corner","campaign":"onboarding","users":1},
  {"date":"2019-08-01","medium":"email","source":"custom-playlist","campaign":"fonboarding","users":1},
  {"date":"2019-08-01","medium":"email","source":"deref-mail.com","campaign":"(not set)","users":2},
  {"date":"2019-08-01","medium":"email","source":"faketempmail","campaign":"(not set)","users":1},
  {"date":"2019-08-01","medium":"email","source":"fundx","campaign":"email_campaign","users":1},
  {"date":"2019-08-01","medium":"email","source":"email-client","campaign":"(not set)","users":14},
  {"date":"2019-08-01","medium":"email","source":"email-client","campaign":"100k","users":2},
];

I convert the date property to long using momentJS and a simple underscoreJS map function:
data = _.map(data, row => {
  row.date = moment(row.date).isValid() ? +moment(row.date).valueOf() : null;
  return row;
});

Posted on 2019-11-15 22:04:34
What you describe is the "default" behavior: when you do not use useAvroLogicalTypes, the Avro logical type date is stored in BigQuery as INTEGER.
It also depends on how your AVRO schema was generated. For instance, I had to build an AVRO file using the schema:
"fields": [
  {"logicaltype": "date", "type": "string", "name": "field1"}
]

and I uploaded my date data correctly with the following code:
const metadata = {
  sourceFormat: 'AVRO',
  useAvroLogicalTypes: true,
  createDisposition: 'CREATE_IF_NEEDED',
  writeDisposition: 'WRITE_TRUNCATE',
  schema: {
    fields: [
      {
        name: "field1",
        type: "DATE",
        logicalType: "STRING",
        mode: "NULLABLE"
      }
    ],
  },
  location: 'US',
};

const [job] = await bigquery
  .dataset(datasetId)
  .table(tableId)
  .load(storage.bucket(bucketName).file(filename), metadata);

Given the data you shared, it should work with this configuration, since your date values are strings.
Hope this helps.
https://stackoverflow.com/questions/58839444