Hi guys, I'm trying to get citation counts from Google for some papers. Here is my code:
import urllib
import mechanize
from bs4 import BeautifulSoup
import csv
import os #change directory
import re #for regular expressions
br = mechanize.Browser()
br.set_handle_equiv(False)
br.set_handle_robots(False) # ignore robots
br.addheaders = [('User-agent', 'Firefox')] # [()]
br.open('http://google.com/')
br.select_form(name='f') # Note: select the form named 'f' here
term = "Multinational Study of the Efficacy and Safety of Humanized Anti-HER2 Monoclonal Antibody in Women Who Have HER2-Overexpressing Metastatic Breast Cancer That Has Progressed After Chemotherapy for Metastatic Disease".replace(" ","+")
br.form['q'] = term # query
data = br.submit()
soup = BeautifulSoup(data)
cite= soup.findAll('div',{'class': 'f slp'})
ref = str(cite[1])
print ref

Whatever I try, I don't get a usable result. I want the number of citations for this paper.
Posted on 2014-03-14 05:34:46
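The problem is that the page you get back after submitting the form contains no citation information. In other words, there is no div with the class `f slp` on it, so `soup.findAll('div', {'class': 'f slp'})` has nothing to match.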
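To make that failure visible instead of crashing on `cite[1]`, here is a rough sketch of a defensive check. It is not the asker's code: it is written for Python 3 and uses only the standard library (`html.parser` and `re`) rather than BeautifulSoup, so it runs without extra installs. The names `DivClassCollector` and `citation_count` are made up for illustration, and it assumes the count shows up as the literal text "Cited by N" inside the matching div, which may not hold for the page you actually receive.

```python
import re
from html.parser import HTMLParser


class DivClassCollector(HTMLParser):
    """Collect the text of every <div> whose class attribute equals `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.depth = 0      # > 0 while we are inside a matching div
        self.blocks = []    # accumulated text of each matching div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            # A div nested inside a match: track it so the end tags balance.
            self.depth += 1
        elif dict(attrs).get("class") == self.wanted:
            self.depth = 1
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.blocks[-1] += data


def citation_count(html, css_class="f slp"):
    """Return the first 'Cited by N' count found, or None if there is none.

    Assumption: the count appears as the text 'Cited by <digits>'.
    """
    parser = DivClassCollector(css_class)
    parser.feed(html)
    for text in parser.blocks:
        match = re.search(r"Cited by (\d+)", text)
        if match:
            return int(match.group(1))
    return None
```

Calling `citation_count(page_html)` on a page with no `f slp` div returns `None`, so you can tell "the markup isn't there" apart from a parsing bug, whereas indexing an empty result list raises `IndexError`.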
You have several options:
See also:
Hope this helps.
https://stackoverflow.com/questions/22396751