我正在尝试查询CloudFront日志,以根据IP地址推导出最大请求速率的流量基准。我正在努力将日期和时间列组合成一个时间戳,我可以用它来生成5分钟间隔。我是不是接近这个查询了?
SELECT from_iso8601_timestamp(CONCAT(CAST(date as varchar), 'T', time, 'Z')) AS datetime, request_ip, COUNT(request_ip) AS count
FROM cloudfront_logs
WHERE "date"
BETWEEN DATE '2020-10-28'
AND DATE '2020-11-04'
GROUP BY CONCAT(CAST(date as varchar), 'T', time, 'Z'), request_ip
ORDER BY datetime ASC
LIMIT 10;我正在尝试达到一个点,我可以做这样的事情:
SELECT
FROM_UNIXTIME((UNIX_TIMESTAMP(table.timestamp) DIV 300) * 300) AS Timestamp,
request_ip,
COUNT(request_ip) AS count
FROM cloudfront_logs
WHERE "date"
BETWEEN DATE '2020-10-28'
AND DATE '2020-11-04'
GROUP BY CONCAT(CAST(date as varchar), 'T', time, 'Z'), request_ip, count
ORDER BY datetime ASC
LIMIT 10;对于每个请求,以下是基于日志格式的cloudfront_logs表的格式。在创建表的过程中,我曾尝试创建连接的时间戳,但出现了很多语法错误:
CREATE EXTERNAL TABLE IF NOT EXISTS default.cloudfront_logs (
`date` DATE,
time STRING,
location STRING,
bytes BIGINT,
request_ip STRING,
method STRING,
host STRING,
uri STRING,
status INT,
referrer STRING,
user_agent STRING,
query_string STRING,
cookie STRING,
result_type STRING,
request_id STRING,
host_header STRING,
request_protocol STRING,
request_bytes BIGINT,
time_taken FLOAT,
xforwarded_for STRING,
ssl_protocol STRING,
ssl_cipher STRING,
response_result_type STRING,
http_version STRING,
fle_status STRING,
fle_encrypted_fields INT,
c_port INT,
time_to_first_byte FLOAT,
x_edge_detailed_result_type STRING,
sc_content_type STRING,
sc_content_len BIGINT,
sc_range_start BIGINT,
sc_range_end BIGINT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://CloudFront_bucket_name/CloudFront/'
TBLPROPERTIES ( 'skip.header.line.count'='2' )摘自Querying Amazon CloudFront Logs上的亚马逊网络服务官方文档
失败示例:
CREATE EXTERNAL TABLE IF NOT EXISTS default.cloudfront_logs (
timestamp AS from_iso8601_timestamp(CONCAT(CAST(date as varchar), 'T', time, 'Z')) DATE,
`date` DATE,
time STRING,
location STRING,
bytes BIGINT,语法错误:
line 1:8: no viable alternative at input 'create external' (service: amazonathena; status code: 400; error code: invalidrequestexception; request id: e5147782-2e62-465e...)更新:
以下是来自实际日志的示例:
#Version: 1.0
#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version fle-status fle-encrypted-fields c-port time-to-first-byte x-edge-detailed-result-type sc-content-type sc-content-len sc-range-start sc-range-end
2020-10-29 04:59:03 MEL50-C1 1524 192.168.0.2 GET example.cloudfront.net /mobile/js/category/category.js 200 https://www.example.com/ Mozilla/5.0%20(Linux;%20Android%2010;%20A7)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/192.168.0.1%20Mobile%20Safari/537.36 v=2020102601 cookie=redacted Hit BSVYL08_EXAMPLE== cdn.example.com https 2001 0.005 - TLSv1.3 TLS_AES_128_GCM_SHA256 Hit HTTP/2.0
- - 26906 0.004 Hit application/javascript 787 - -解决方案:
SELECT from_unixtime(floor(to_unixtime(from_iso8601_timestamp(CONCAT(CAST(date AS VARCHAR), 'T', time, 'Z'))) / 300) * 300), request_ip, COUNT(request_ip) AS count
FROM cloudfront_logs
WHERE "date"
BETWEEN DATE '2020-10-28'
AND DATE '2020-11-04'
GROUP BY from_unixtime(floor(to_unixtime(from_iso8601_timestamp(CONCAT(CAST(date AS VARCHAR), 'T', time, 'Z'))) / 300) * 300), request_ip
ORDER BY count DESC;发布于 2020-11-06 01:50:12
也许有一种更短的方式来表达它,但你应该能够使用以下表达式将日期+时间舍入为5分钟间隔:
from_unixtime(floor(to_unixtime(from_iso8601_timestamp(CONCAT(CAST(date AS VARCHAR), 'T', time, 'Z'))) / 300) * 300)如果你在第一个查询中使用它,你应该会得到你想要的结果。
https://stackoverflow.com/questions/64683996
复制相似问题