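I have a physician licensing registry dataset that includes, for each physician, the total_submitted_charge_amount and the number of Medicare and Medicaid entitlements. I used the query from the answer suggested below: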
with datamart AS
(SELECT npi,
provider_last_name,
provider_first_name,
provider_mid_initial,
provider_address_1,
provider_address_2,
provider_city,
provider_zipcode,
provider_state_code,
provider_country_code,
provider_type,
number_of_services,
CASE
WHEN REPLACE(num_entitlement_medicare_medicaid,',', '') ='' THEN
null
ELSE CAST(REPLACE(num_entitlement_medicare_medicaid,',', '') AS DECIMAL)
END AS medicare_medicaid_entitlement,
CASE
WHEN REPLACE(total_submitted_charge_amount,',', '') ='' THEN
null
ELSE CAST(REPLACE(num_entitlement_medicare_medicaid,',', '') AS DECIMAL)
END AS total_submitted_charge_amount
FROM cmsaggregatepayment2017)
SELECT *
FROM datamart
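ORDER BY total_submitted_charge_amount DESC
Unfortunately, I get this error:
INVALID_CAST_ARGUMENT: Cannot cast VARCHAR '' to DECIMAL(38, 0)
This query ran against the aggregatepayment_data_2017 database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: be01d1e8-dc4d-4c75-a648-428dcb6be3a5
I have tried DECIMAL, REAL, and BIGINT, and none of them works for casting num_entitlement_medicare_medicaid. Here is a screenshot of the data: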

Can anyone suggest how to rewrite this query?
Posted 2019-12-02 05:33:01
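You get the error because the column contains blank values (empty strings, not nulls), and the VARCHAR '' cannot be cast to DECIMAL. You can handle that with a CASE expression. Also, judging by the dataset, num_entitlement_medicare_medicaid contains commas (','), which have to be removed before casting.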
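-- REPLACE treats its second argument as a literal string, so a character
-- class such as '[^0-9.]' only works with REGEXP_REPLACE.
-- The same pattern is used in the check and in the CAST so that a value
-- with no digits at all becomes null instead of failing the cast.
SELECT npi,
case
when REGEXP_REPLACE(num_entitlement_medicare_medicaid,'[^0-9.]', '') ='' then null
else CAST(REGEXP_REPLACE(num_entitlement_medicare_medicaid,'[^0-9.]', '') AS DECIMAL)
end as medicare_medicaid_entitlement,
case
when REGEXP_REPLACE(total_submitted_charge_amount,'[^0-9.]', '') ='' then null
else CAST(REGEXP_REPLACE(total_submitted_charge_amount,'[^0-9.]', '') AS DECIMAL)
end as total_submitted_charge_amount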
FROM cmsaggregatepayment2017
Posted 2019-12-04 01:08:29
Instead of putting the CAST/REPLACE logic in every query, you can convert the data once into a new table that holds the "clean" data:
CREATE TABLE clean_table
WITH (format='Parquet', external_location='s3://my_bucket/clean_data/')
AS
SELECT
npi,
provider_last_name,
provider_first_name,
...
CASE WHEN REPLACE(num_entitlement_medicare_medicaid,',', '') ='' THEN null
ELSE CAST(REPLACE(num_entitlement_medicare_medicaid,',', '') AS DECIMAL)
END AS medicare_medicaid_entitlement,
CASE WHEN REPLACE(total_submitted_charge_amount,',', '') ='' THEN null
ELSE CAST(REPLACE(total_submitted_charge_amount,',', '') AS DECIMAL)
END AS total_submitted_charge_amount
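FROM cmsaggregatepayment2017
You can then run SELECT ... FROM clean_table without needing any conversions.
In data warehousing, this type of process is known as ETL (Extract, Transform, Load). The cleaning step is the "Transform": it converts the data into a more useful format.
Posted 2020-12-17 06:43:05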
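As a rough sketch of what a query against the cleaned table could look like: this assumes the CREATE TABLE ... AS statement above has finished and that the columns kept the aliases shown there. Which provider columns you select is up to you; the ones here are only illustrative.

```sql
-- Assumes clean_table was created by the CREATE TABLE ... AS statement above,
-- so total_submitted_charge_amount is already DECIMAL and needs no CAST or REPLACE.
SELECT npi,
       provider_last_name,
       provider_first_name,
       total_submitted_charge_amount
FROM clean_table
ORDER BY total_submitted_charge_amount DESC
```

The cleaning cost is paid once when the table is built, and every later query stays short.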
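You might want to try try_cast() in Presto. It works like cast(), but returns null when the cast fails instead of raising an error, so the query carries on with the next row.
Docs: https://prestodb.io/docs/current/functions/conversion.html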
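For illustration, here is a minimal sketch of how try_cast() might be applied to the table from the question. The DECIMAL(38, 2) precision is an assumption (a bare DECIMAL defaults to scale 0, which drops cents), and the CTE simply mirrors the shape of the original query.

```sql
-- Sketch only: strip thousands separators, then let TRY_CAST return NULL
-- for anything that is still not a valid number (for example '').
-- DECIMAL(38, 2) is an assumed precision; adjust it to your data.
WITH datamart AS (
    SELECT npi,
           TRY_CAST(REPLACE(num_entitlement_medicare_medicaid, ',', '') AS DECIMAL(38, 2))
               AS medicare_medicaid_entitlement,
           TRY_CAST(REPLACE(total_submitted_charge_amount, ',', '') AS DECIMAL(38, 2))
               AS total_submitted_charge_amount
    FROM cmsaggregatepayment2017
)
SELECT *
FROM datamart
ORDER BY total_submitted_charge_amount DESC NULLS LAST
```

This removes the CASE expressions entirely. The trade-off is that bad values silently become null, so it may be worth counting the nulls afterwards to see how many rows were affected.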
https://stackoverflow.com/questions/59132598