My list (f3) isn't printing any output and keeps giving me StaleElementReferenceException?

Stack Overflow user
Asked on 2021-11-16 20:40:22
Answers: 1 · Views: 68 · Followers: 0 · Votes: 0

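So, I've been working on this part of the code for the past week, and I've managed to understand the code logic better. For context: I'm trying to scrape founder information (name, gender, school info) for every electric vehicle company on Crunchbase. I figured I should create separate dictionaries, because some of the information sits in different sections of the page. Here is the code: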

#imports
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import StaleElementReferenceException
from selenium.common.exceptions import TimeoutException
import pandas as pd
import time

#driver path
PATH = "C:/Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)

#access crunchbase ui
driver.get("https://www.crunchbase.com/search/organizations/field/organization.companies/categories/electric-vehicle")
driver.maximize_window()
time.sleep(5)
print(driver.title)
    
time.sleep(3)
   
#await element location

WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, ('//a[@aria-label="Next"][@aria-disabled="false"][@type="button"]'))))
    
#next page
page = driver.find_element_by_xpath('/html/body/chrome/div/mat-sidenav-container/mat-sidenav-content/div/search/page-layout/div/div/form/div[2]/results/div/div/div[1]/div/results-info/h3/a[2]')

company_list = [] ###create dictionary
   
counter = 0
for _ in range(2):
    if counter == 1:
        break
    counter += 1
    
    if page.is_displayed():
        
        time.sleep(25)
        
        #webscrape through iterations/rows
        all_rows = driver.find_elements_by_css_selector("grid-row")
                                                      
        for row in all_rows:
            companyname = row.find_element_by_xpath('.//*[@class="identifier-label"]')
            companyname.click()
            time.sleep(10)
            
            ###founder info
            founders = driver.find_element_by_css_selector("body > chrome > div > mat-sidenav-container > mat-sidenav-content > div > ng-component > entity-v2 > page-layout > div > div > div > page-centered-layout:nth-child(3) > div > div > div.main-content > row-card:nth-child(2) > profile-section > section-card > mat-card > div.section-content-wrapper > div > fields-card:nth-child(1) > ul > li:nth-child(4) > field-formatter > identifier-multi-formatter > span > a")
            ActionChains(driver).move_to_element(founders).perform()
            founders.click()
            f1 = {
                'founder name': driver.find_element_by_xpath('.//*[@class="profile-name"]').text.strip(),
                'founder gender': driver.find_element_by_css_selector('body > chrome > div > mat-sidenav-container > mat-sidenav-content > div > ng-component > entity-v2 > page-layout > div > div > div > page-centered-layout.ng-star-inserted > div > div > div.main-content > row-card:nth-child(1) > profile-section > section-card > mat-card > div.section-content-wrapper > div > fields-card:nth-child(3) > ul > li:nth-child(3) > field-formatter > span').text.strip(),
                }
            fschool = driver.find_element_by_css_selector('body > chrome > div > mat-sidenav-container > mat-sidenav-content > div > ng-component > entity-v2 > page-layout > div > div > div > page-centered-layout.ng-star-inserted > div > div > div.main-content > row-card:nth-child(7) > profile-section > section-card > mat-card > div.section-content-wrapper > div > image-list-card > ul > li > div > field-formatter:nth-child(5) > span')
            ActionChains(driver).move_to_element(fschool).perform()
            f2 = {
                'school': driver.find_element_by_css_selector('body > chrome > div > mat-sidenav-container > mat-sidenav-content > div > ng-component > entity-v2 > page-layout > div > div > div > page-centered-layout.ng-star-inserted > div > div > div.main-content > row-card:nth-child(7) > profile-section > section-card > mat-card > div.section-content-wrapper > div > image-list-card > ul > li > div > a').text.strip(),
                'degree type': driver.find_element_by_css_selector('body > chrome > div > mat-sidenav-container > mat-sidenav-content > div > ng-component > entity-v2 > page-layout > div > div > div > page-centered-layout.ng-star-inserted > div > div > div.main-content > row-card:nth-child(7) > profile-section > section-card > mat-card > div.section-content-wrapper > div > image-list-card > ul > li > div > field-formatter:nth-child(2) > span').text.strip(),
                'degree': driver.find_element_by_css_selector('body > chrome > div > mat-sidenav-container > mat-sidenav-content > div > ng-component > entity-v2 > page-layout > div > div > div > page-centered-layout.ng-star-inserted > div > div > div.main-content > row-card:nth-child(7) > profile-section > section-card > mat-card > div.section-content-wrapper > div > image-list-card > ul > li > div > field-formatter:nth-child(3) > span').text.strip()
                }
            f3 = {**f1, **f2}
            print(f3)
            company_list.append(f1)
        print("next")
        page.click()
    
    
#create dataframe    
df = pd.DataFrame(company_list)

print(df)

#create excel writer object
writer=pd.ExcelWriter('crunchbasedemo.xlsx')

#export to excel
df.to_excel(writer)

writer.save()



print("It's alive!")

For some reason, f3 (the merged f1 and f2 dictionaries) never prints, and when execution reaches the print statement I keep getting this error:

StaleElementReferenceException: stale element reference: element is not attached to the page document

Any ideas?

Edited code:

hrefs=[x.get_attribute('href') for x in driver.find_elements_by_xpath('//a[@class="component--field-formatter field-type-identifier link-accent ng-star-inserted"]')]
names=[x.get_attribute('title') for x in driver.find_elements_by_xpath('//a[@class="component--field-formatter field-type-identifier link-accent ng-star-inserted"]')]
print(names)
print(hrefs)

company_list=[]

for href in hrefs:    
    driver.get(href)
    try: 
        founders=[x.get_attribute('href') for x in driver.find_elements_by_xpath("//li[@class='ng-star-inserted' and contains(.,'Founders')]//a[@class='link-accent ng-star-inserted']")]
        founder_names = [x.get_attribute('title') for x in driver.find_elements_by_xpath("//li[@class='ng-star-inserted' and contains(.,'Founders')]//a[@class='link-accent ng-star-inserted']")]
        print(founder_names)
        for founder in founders:
            driver.get(founder)
            try:
                fschool = driver.find_elements_by_xpath("(//li[@class='ng-star-inserted']//a[@class='link-accent'])[5]")
                ActionChains(driver).move_to_element(fschool).perform() 
                print(fschool)
            except:
                pass
    except: 
        pass

1 Answer

Stack Overflow user

Answered on 2021-11-16 21:53:04

This answer is incomplete.

# Open the search results for the electric-vehicle category.
driver.get("https://www.crunchbase.com/search/organizations/field/organization.companies/categories/electric-vehicle")
driver.maximize_window()
time.sleep(5)
print(driver.title)

# Read the company links once and keep only plain strings (URL and title),
# because WebElement handles go stale as soon as the driver navigates away.
company_links = driver.find_elements_by_xpath('//a[@class="component--field-formatter field-type-identifier link-accent ng-star-inserted"]')
hrefs = [x.get_attribute('href') for x in company_links]
names = [x.get_attribute('title') for x in company_links]
print(names)
print(hrefs)
company_list = []
for href in hrefs:
    driver.get(href)
    try:
        # Links inside the "Founders" list item; the list is empty if that item is missing.
        founder_links = driver.find_elements_by_xpath("//li[@class='ng-star-inserted' and contains(.,'Founders')]//a[@class='link-accent ng-star-inserted']")
        founders = [x.get_attribute('href') for x in founder_links]
        founder_names = [x.get_attribute('title') for x in founder_links]
        print(founder_names)
        for founder in founders:
            driver.get(founder)
    except Exception:
        # Skip companies whose page could not be read.
        pass

Output:

Query Builder | Organizations | Crunchbase
['Bird', 'Rivian', 'Zoomo', 'NIO', 'Tesla', 'Ample', 'Rad Power Bikes', 'Wallbox', 'ChargePoint', 'Ather Energy', 'SES', 'Ola Electric', 'EVgo', 'Canoo', 'Wayve']
['https://www.crunchbase.com/organization/bird', 'https://www.crunchbase.com/organization/rivian-automotive', 'https://www.crunchbase.com/organization/bolt-bikes', 'https://www.crunchbase.com/organization/nextev', 'https://www.crunchbase.com/organization/tesla-motors', 'https://www.crunchbase.com/organization/ample-6b70', 'https://www.crunchbase.com/organization/rad-power-bikes', 'https://www.crunchbase.com/organization/wallbox', 'https://www.crunchbase.com/organization/chargepoint', 'https://www.crunchbase.com/organization/ather-energy', 'https://www.crunchbase.com/organization/solidenergy', 'https://www.crunchbase.com/organization/ola-electric', 'https://www.crunchbase.com/organization/nrg-evgo', 'https://www.crunchbase.com/organization/canoo-tech', 'https://www.crunchbase.com/organization/wayve-9739']
['Travis VanderZanden']
['Robert J. Scaringe']
['Jack Cheng', 'Lihong Qin', 'William Li']
['Elon Musk', 'JB Straubel', 'Marc Tarpenning', 'Martin Eberhard']
['Mike Radenbaugh', 'Tyler Collins']
['Eduard Castañeda Mañé', 'Enric Asunción']
['Arun Vinayak', 'Swapnil Jain', 'Tarun Mehta']
[]
['Andrew Wolstan', 'Richard Kim', 'Stefan Krause', 'Ulrich Kranz']
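
The question ultimately wants a DataFrame, so the printed lists need to become one row per founder. The following is only a sketch of that step: `founder_rows` is a hypothetical helper, and the two company-to-founder pairings in `sample` are assumed to line up with the lists printed above, since the output does not print the company name next to each founder list.

```python
from typing import Dict, List


def founder_rows(founders_by_company: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Expand {company: [founder, ...]} into one dict per founder."""
    return [
        {"company": company, "founder name": name}
        for company, names in founders_by_company.items()
        for name in names
    ]


# Two pairings assumed from the printed output (illustrative only).
sample = {
    "Bird": ["Travis VanderZanden"],
    "Tesla": ["Elon Musk", "JB Straubel", "Marc Tarpenning", "Martin Eberhard"],
}
rows = founder_rows(sample)
```

A list of dicts shaped like `rows` can be passed straight to `pd.DataFrame(rows)`, which is what the question's `company_list` is used for.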

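So, to visit each company page, first collect every link's URL with .get_attribute('href'), then call driver.get(href) on each one. This prevents the StaleElementReferenceException that is raised when you navigate away from a page while still iterating over its elements. The code checks whether each company has a Founders section and, if it lists one or more founders, goes to each founder's page.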
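
As a rough sketch of that idea, and not the answerer's exact code: the hypothetical helper `collect_attribute` below reads one attribute from every matching element up front and returns only strings. It assumes nothing more than a driver object exposing Selenium's generic `find_elements(by, value)` method. The `SupportsFindElements` protocol and the helper name are made up for illustration, and the literal `"xpath"` is the value of Selenium's `By.XPATH` constant.

```python
from typing import Iterable, List, Protocol


class SupportsFindElements(Protocol):
    """Anything with Selenium's generic find_elements(by, value) method."""

    def find_elements(self, by: str, value: str) -> Iterable: ...


def collect_attribute(
    driver: SupportsFindElements, xpath: str, attribute: str = "href"
) -> List[str]:
    """Return one attribute of every element matching xpath as plain strings.

    Plain strings stay valid after navigation; WebElement handles do not.
    """
    # "xpath" is the value of selenium's By.XPATH constant.
    elements = driver.find_elements("xpath", xpath)
    values = (element.get_attribute(attribute) for element in elements)
    # Drop elements that lack the attribute (get_attribute returns None).
    return [value for value in values if value]
```

With a real driver, the navigation loop would then be `for href in collect_attribute(driver, company_xpath): driver.get(href)`, where `company_xpath` stands for whatever locator matches the company links. Nothing inside the loop touches an element from the previous page, so there is no handle left to go stale.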

Votes: 1
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/69995792
