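I am trying to write a spider that scrapes some information from Steam's top-sellers list, but my code has a problem. I think it is related to the 're' module, because nothing inside the for loop is ever printed. When I run the code, it always writes "[]" to the file I specified.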
def getDetail(self, url):
    source = self.getSource(url)
    pattern = re.compile('<div class="col search_name ellipsis"><span class="title">(.*?)</span>', re.S)
    items = re.findall(pattern, source)
    print(re.findall(pattern, source))
    number = 1
    for item in items:
        print('Crawling No.%d game' % number)
        print('Name: %s' % item[0])
        number += 1
        time.sleep(0.1)
    return items

Here is my full code.
import requests
import re
import time


class Spider(object):
    def __init__(self):
        self.siteURL = 'http://store.steampowered.com/search/?filter=topsellers'

    def getSource(self, url):
        user_agent = 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) ' \
                     'Chrome/45.0.2454.101 Safari/537.36'
        headers = {'User_agent': user_agent}
        r = requests.get(url, headers=headers)
        return r.text

    def getDetail(self, url):
        source = self.getSource(url)
        pattern = re.compile('<div class="col search_name ellipsis"><span class="title">(.*?)</span>', re.S)
        items = re.findall(pattern, source)
        print(re.findall(pattern, source))
        number = 1
        for item in items:
            print('Crawling No.%d game' % number)
            print('Name: %s' % item[0])
            number += 1
            time.sleep(0.1)
        return items

    def saveDetail(self):
        data = str(self.getDetail(self.siteURL))
        fileName = 'SteamTopseller.txt'
        f = open(fileName, 'wb')
        f.write(data.encode('utf-8'))
        print('Successfully written!')
        f.close()


if __name__ == '__main__':
    spider = Spider()
    spider.saveDetail()

Please help me solve this small problem, thanks! By the way, I am writing this code in Python 3.
Posted on 2017-04-18 17:13:04
re.findall(pattern, string, flags=0)
Returns all non-overlapping matches of pattern in string, as a list of strings.
So if the string contains no match, it returns an empty list, i.e. [].
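A small sketch of that behavior, using a made-up HTML fragment (the `html` string and both patterns are assumptions for illustration, not the actual Steam page source). It also shows that with a single capture group, findall returns plain strings, so indexing an item with [0] gives only its first character rather than the whole name.

```python
import re

# Hypothetical HTML fragment; the real Steam markup may differ.
html = '<span class="title">Game A</span><span class="title">Game B</span>'

title_pattern = re.compile(r'<span class="title">(.*?)</span>', re.S)
no_match_pattern = re.compile(r'<span class="name">(.*?)</span>', re.S)

titles = title_pattern.findall(html)      # one capture group -> list of strings
nothing = no_match_pattern.findall(html)  # no match anywhere -> empty list

print(titles)        # the captured titles
print(nothing)       # an empty list
print(titles[0][0])  # indexing a string yields a single character
```

If the loop in the question is meant to print the full name, printing `item` instead of `item[0]` would likely be what is wanted.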
+++++++++++++++++++++++++++++++++++++++++++++++++++++++
To skip the "[]", you can write code like this:
items = re.findall(pattern, source)
if items:
    print(items)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++
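Remove the line breaks and tabs from the source before matching. The pattern expects the <div> and <span> tags to be directly adjacent, so any line break or tab between them in the page source would prevent a match:
def getDetail(self, url):
    source = self.getSource(url).replace("\r", "").replace("\n", "").replace("\t", "")

https://stackoverflow.com/questions/43467275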
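As an alternative sketch, the pattern itself can be made to tolerate whitespace between the two tags with `\s*`, instead of stripping characters from the source. The `sample` string below is a hypothetical stand-in for the page source, assuming the tags are separated by line breaks and tabs; `strip_breaks` is an illustrative helper name, not part of the original code.

```python
import re

# Hypothetical stand-in for the page source: the <div> and <span> are
# separated by a line break and tabs, which the original pattern does not allow.
sample = (
    '<div class="col search_name ellipsis">\r\n'
    '\t\t<span class="title">Game A</span>\r\n'
    '</div>'
)

strict = re.compile(
    r'<div class="col search_name ellipsis"><span class="title">(.*?)</span>', re.S)
tolerant = re.compile(
    r'<div class="col search_name ellipsis">\s*<span class="title">(.*?)</span>', re.S)


def strip_breaks(text):
    """Remove carriage returns, newlines and tabs, as the answer suggests."""
    return text.replace("\r", "").replace("\n", "").replace("\t", "")


strict_raw = strict.findall(sample)                  # tags not adjacent -> no match
strict_clean = strict.findall(strip_breaks(sample))  # tags adjacent after stripping
tolerant_raw = tolerant.findall(sample)              # \s* absorbs the whitespace

print(strict_raw, strict_clean, tolerant_raw)
```

One point in favor of `\s*` is that it also covers plain spaces between the tags, which the three replace calls leave in place.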