Python: decoded response is corrupted
Stack Overflow user
Asked 2021-02-14 11:35:18

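I am new to Python and to data scraping.

I am trying to fetch data for some car models with a Python script.

The problem is that Python decodes the response into jumbled text that does not match the response content.

I found that the information I need is inside a script tag in the HTML head element.

Here is a simplified version of the script I am using: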

import requests
import lxml.html
urls = "https://www.ultimatespecs.com/car-specs/Audi/119438/Audi-A3-(8Y)-Sedan-35-TDI.html"
res = requests.get(urls)
print(res.headers)
tree = lxml.html.fromstring(res.content)
helem = lxml.html.tostring(tree.xpath('//head/script[@type=\'application/ld+json\']')[0])
print(helem)
print(helem.decode('utf-8'))

Response headers:

{'Date': 'Sun, 14 Feb 2021 10:54:09 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Set-Cookie': '__cfduid=d938bb826c443ab15f20272199e2f18141613300048; expires=Tue, 16-21 10:54:08 GMT; path=/; domain=.ultimatespecs.com; HttpOnly; SameSite=Lax, PHPSESSID=ea60d27909207143c5ccd860e6fb3b76; path=/', 'Expires': 'Thu, 19 Nov 1981 08:52:00 GMT', 'Cache-Control': 'no-store, no-cache, must-revalidate', 'Pragma': 'no-cache', 'Vary': 'Accept-Encoding,User-Agent', 'CF-Cache-Status': 'DYNAMIC', 'cf-request-id': '0841c63a9c0000b61bda3810000001', 'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"', 'Report-To': '{"group":"cf-nel","endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report?s=kB6vGZn5zLDoI%2FeQt9AF8174Aanh5La%2Bvh2beLKlCdnrHv5jbEIhC0h3FUVb56wTidKKSMFq1zuWhbakIydNto3EBXMZRt%2BwLD2FZgMsmHH53aJpanc%3D"}],"max_age":604800}', 'NEL': '{"max_age":604800,"report_to":"cf-nel"}', 'Server': 'cloudflare', 'CF-RAY': '62163fd76b76b61b-TLL', 'Content-Encoding': 'gzip'}

helem as bytes:

b'\r\t\t{\r\t\t"@context":"http://schema.org/",\r\t\t"@type":"Car",\r\t\t"brand":"Audi",\r\t\t"manufacturer":"Audi",\r\t\t"name":"Audi A3 (8Y) Sedan 35 TDI","description":"35 TDI Specs:Power 150 PS (148 hp); Diesel;Average consumption:3.6 l/100km (65 MPG);Dimensions: Length:449.5 cm (176.97 inches); Width:181.6 cm (71.5 inches);Height:142.5 cm (56.1 inches);Weight:1390 kg (3064 lbs);Model Years 2020,2021","productionDate":"2020","mainEntityOfPage":"https://www.ultimatespecs.com/car-specs/Audi/119438/Audi-A3-(8Y)-Sedan-35-TDI.html","image":{\r\t\t"@type":"ImageObject",\r\t\t"contentUrl":"https://www.ultimatespecs.com/wallpaper.php?id=7243"\r\t\t\t\t\t}\r\t\t\t\t\t,"height":{\r\t\t"@type":"QuantitativeValue",\r\t\t"unitCode":"CMT",\r\t\t"value":"142.5"\r\t\t},"width":{\r\t\t"@type":"QuantitativeValue",\r\t\t"unitCode":"CMT",\r\t\t"value":"181.6"\r\t\t},"weight":{\r\t\t"@type":"QuantitativeValue",\r\t\t"unitCode":"KGM",\r\t\t"value":"1390"\r\t\t},"accelerationTime":{\r\t\t"@type":"QuantitativeValue",\r\t\t"unitCode":"SEC",\r\t\t"value":"8.4"\r\t\t},"driveWheelConfiguration":{\r\t\t"@type":"DriveWheelConfigurationValue",\r\t\t"@id":"https://schema.org/FrontWheelDriveConfiguration"\r\t\t},"bodyType":"Sedan","cargoVolume":{"@type":"QuantitativeValue","unitCode":"LTR","value":"425"},"emissionsCO2":"96","fuelCapacity":{\r\t\t"@type":"QuantitativeValue",\r\t\t"unitCode":"LTR",\r\t\t"value":"50"\r\t\t},"fuelConsumption":{\r\t\t"@type":"QuantitativeValue",\r\t\t"unitText":"L/100 km",\r\t\t"valueReference":"Average",\r\t\t"value":"3.6"\r\t\t},"fuelEfficiency":{\r\t\t"@type":"QuantitativeValue",\r\t\t"unitText":"US MPG",\r\t\t"valueReference":"Average",\r\t\t"value":"65"\r\t\t},"fuelType":"Diesel","numberOfDoors":"4","vehicleSeatingCapacity":"5","numberOfForwardGears":"7","vehicleTransmission":"Dualclutch Automatic","wheelbase":{\r\t\t"@type":"QuantitativeValue",\r\t\t"unitCode":"CMT",\r\t\t"value":"263.6"\r\t\t},"speed":{\r\t\t"@type":"QuantitativeValue",\r\t\t"unitCode":"KMH",\r\t\t"value":"232"\r\t\t},"vehicleConfiguration":"35 TDI","vehicleEngine":[{"@type":"EngineSpecification","fuelType":"Diesel","engineDisplacement":{\r\t\t"@type":"QuantitativeValue",\r\t\t"unitCode":"CMQ",\r\t\t"value":"1968"\r\t\t},"torque":{\r\t\t"@type":"QuantitativeValue",\r\t\t"unitCode":"NU",\r\t\t"value":"360"\r\t\t},"enginePower":{\r\t\t"@type":"QuantitativeValue",\r\t\t"unitCode":"N12",\r\t\t"value":"150"\r\t\t}}]}'

helem as text:

"value":"150"}:{cement":{eEngine":[SeatingCapacity":"5","numberOfForwardGears":"7","vehicleTransmission":"Dualclutch Automatic","wheelbase":{(176.97 inches); Width:181.6 cm (71.5 inches);Height:142.5 cm (56.1 inches);Weight:1390 kg (3064 lbs);Model Years 2020,2021","productionDate":"2020","mainEntityOfPage":"https://www.ultimatespecs.com/car-specs/Audi/119438/Audi-A3-(8Y)-Sedan-35-TDI.html","image":{

As you can see, the decoded text overlaps itself several times.

What am I doing wrong?

1 Answer

Stack Overflow user

Accepted answer

Posted 2021-02-14 13:21:22

If I understand correctly, you are looking for the following data.

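import json
import pprint as pp

import lxml.html
import requests

urls = "https://www.ultimatespecs.com/car-specs/Audi/119438/Audi-A3-(8Y)-Sedan-35-TDI.html"
res = requests.get(urls)
# Parse the raw response bytes into an HTML tree.
tree = lxml.html.fromstring(res.content)
# Take the text inside the first JSON-LD <script> in <head>, not the serialized element.
helem = tree.xpath("//head/script[@type='application/ld+json']")[0].text
# json.loads treats the \r and \t between tokens as insignificant whitespace.
data = json.loads(helem)
pp.pprint(data)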

Output

{'@context': 'http://schema.org/',
 '@type': 'Car',
 'accelerationTime': {'@type': 'QuantitativeValue',
                      'unitCode': 'SEC',
                      'value': '8.4'},
 'bodyType': 'Sedan',
 'brand': 'Audi',
 'cargoVolume': {'@type': 'QuantitativeValue',
                 'unitCode': 'LTR',
                 'value': '425'},
 'description': '35 TDI Specs:Power 150 PS (148 hp); Diesel;Average '
                'consumption:3.6 l/100km (65 MPG);Dimensions: Length:449.5 cm '
                '(176.97 inches); Width:181.6 cm (71.5 inches);Height:142.5 cm '
                '(56.1 inches);Weight:1390 kg (3064 lbs);Model Years 2020,2021',
 'driveWheelConfiguration': {'@id': 'https://schema.org/FrontWheelDriveConfiguration',
                             '@type': 'DriveWheelConfigurationValue'},
 'emissionsCO2': '96',
 'fuelCapacity': {'@type': 'QuantitativeValue',
                  'unitCode': 'LTR',
                  'value': '50'},
 'fuelConsumption': {'@type': 'QuantitativeValue',
                     'unitText': 'L/100 km',
                     'value': '3.6',
                     'valueReference': 'Average'},
 'fuelEfficiency': {'@type': 'QuantitativeValue',
                    'unitText': 'US MPG',
                    'value': '65',
                    'valueReference': 'Average'},
 'fuelType': 'Diesel',
 'height': {'@type': 'QuantitativeValue', 'unitCode': 'CMT', 'value': '142.5'},
 'image': {'@type': 'ImageObject',
           'contentUrl': 'https://www.ultimatespecs.com/wallpaper.php?id=7243'},
 'mainEntityOfPage': 'https://www.ultimatespecs.com/car-specs/Audi/119438/Audi-A3-(8Y)-Sedan-35-TDI.html',
 'manufacturer': 'Audi',
 'name': 'Audi A3 (8Y) Sedan 35 TDI',
 'numberOfDoors': '4',
 'numberOfForwardGears': '7',
 'productionDate': '2020',
 'speed': {'@type': 'QuantitativeValue', 'unitCode': 'KMH', 'value': '232'},
 'vehicleConfiguration': '35 TDI',
 'vehicleEngine': [{'@type': 'EngineSpecification',
                    'engineDisplacement': {'@type': 'QuantitativeValue',
                                           'unitCode': 'CMQ',
                                           'value': '1968'},
                    'enginePower': {'@type': 'QuantitativeValue',
                                    'unitCode': 'N12',
                                    'value': '150'},
                    'fuelType': 'Diesel',
                    'torque': {'@type': 'QuantitativeValue',
                               'unitCode': 'NU',
                               'value': '360'}}],
 'vehicleSeatingCapacity': '5',
 'vehicleTransmission': 'Dualclutch Automatic',
 'weight': {'@type': 'QuantitativeValue', 'unitCode': 'KGM', 'value': '1390'},
 'wheelbase': {'@type': 'QuantitativeValue',
               'unitCode': 'CMT',
               'value': '263.6'},
 'width': {'@type': 'QuantitativeValue', 'unitCode': 'CMT', 'value': '181.6'}}

Process finished with exit code 0
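
A likely explanation for the overlapping text, offered as a hedged sketch rather than a confirmed diagnosis: the bytes in the question contain bare carriage returns (\r). Many terminals move the cursor back to the start of the line on \r, so each segment is drawn over the previous one, while the underlying data stays intact. The short sample below is made up for illustration and is much smaller than the real page payload; it shows how to inspect such text with repr(), normalize the line endings, and parse it with json.loads.

```python
import json

# Hypothetical sample that mimics the bare "\r" separators seen in the question.
raw = b'{\r\t\t"@type":"Car",\r\t\t"brand":"Audi",\r\t\t"name":"Audi A3 (8Y) Sedan 35 TDI"\r\t\t}'

text = raw.decode("utf-8")

# repr() shows the control characters instead of letting the terminal act on them.
print(repr(text))

# Turning "\r\n" and bare "\r" into "\n" makes the printed text readable.
normalized = text.replace("\r\n", "\n").replace("\r", "\n")
print(normalized)

# JSON allows "\r" and "\t" as whitespace between tokens, so parsing needs no cleanup.
data = json.loads(text)
print(data["name"])
```

If repr() shows the full, correct content, the response was decoded properly and only the terminal display was misleading.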
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/66195006
