Python: How to extract specific rows of text with Selenium when all table rows have the same class

Stack Overflow user

Asked 2019-08-20 21:52:38

Answers: 1 | Views: 724 | Followers: 0 | Votes: 0

Just a quick question: I want to scrape the data on this page using Python and Selenium.

This is the script:

from selenium import webdriver
import os
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import pandas as pd
import time
import sys


options = Options()
options.binary_location = r'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe'
options.add_experimental_option('excludeSwitches', ['enable-logging'])
options.add_argument("--headless")
driver = webdriver.Chrome(options=options, executable_path='/mnt/c/Users/kela/Desktop/selenium/chromedriver.exe')
for i in range(4):
    driver.get('http://isyslab.info/NeuroPep/search_info?pepNum=NP0000' + str(i))
    # element = driver.find_element_by_css_selector('[id=pmid]')
    # pmid = element.text

    element2 = driver.find_element_by_css_selector('[id=content]')
    print(element2.text)
    print('**')

It prints the following output (abbreviated here):

NPID NP00003
Name C-terminal peptide (By similarity)
Organism Mus musculus
NCBI Taxa ID 10090
Tissue Specificity
Family 7B2
UniProt ID 7B2_MOUSE
Length 13
Modification NA
Gene Ontology
GO ID GO Term Definition
Evidence
GO:0005576 Cellular Component extracellular region IEA
GO:0030141 Cellular Component secretory granule ISS
GO:0004857 Molecular Function enzyme inhibitor activity IDA
GO:0051082 Molecular Function unfolded protein binding ISS
GO:0006886 Biological Process intracellular protein transport IDA
GO:0043086 Biological Process negative regulation of catalytic activity IDA
GO:0007218 Biological Process neuropeptide signaling pathway IEA
GO:0016486 Biological Process peptide hormone processing IDA
GO:0046883 Biological Process regulation of hormone secretion IDA
Sequence SVPHFSEEEK[10]EAE
Properties View
Structure NA
Reference NA
**

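There are certain rows I don't want to scrape; specifically, I don't want (1) Tissue Specificity, (2) Family, (3) Gene Ontology, (4) Properties, (5) Structure, or (6) Length.

Put another way, I only want (1) NPID, (2) Name, (3) Organism, (4) NCBI Taxa ID, (5) UniProt ID, (6) Modification, and (7) Reference.

The source HTML of the page I want to scrape is here:

As you can see, there is no specific attribute (e.g. id=XXX) that separates the rows I want from the rows I don't want; they all have the same header class, and so on.

Could someone show me an example of how to find one specific row and extract it from the table (e.g. how to pull "NP0003" out of the table)? I could then do the same for the remaining rows.

Edit 1: Per the comment below, adding a screenshot of an example row to extract: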

selenium

python

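1 Answer

Stack Overflow user

Accepted answer

Answered 2019-08-20 22:24:56

In this case, locating the element with XPath gives you more flexibility: you can match the label cell by its text and then take the cell that follows it. Try this solution: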

for i in range(4):
    print(i+1)
    driver.get('http://isyslab.info/NeuroPep/search_info?pepNum=NP0000' + str(i+1))
    time.sleep(3)
    NPID = driver.find_element_by_xpath("//tbody/tr/td[contains(.,'NPID')]/following::td[1]")
    print(NPID.text)
    print('**')
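
The same idea can be looped over every label the question lists. The following is only a minimal sketch and not part of the original answer: `WANTED_LABELS`, `value_xpath`, and `scrape_fields` are hypothetical names, and it assumes that each label cell's text is exactly the label (if the page pads the text differently, `contains(., ...)` as used above may be the safer match). It keeps the `following::td[1]` axis from the answer.

```python
# Labels to keep, written as they appear in the printed output of the question.
WANTED_LABELS = [
    "NPID", "Name", "Organism", "NCBI Taxa ID",
    "UniProt ID", "Modification", "Reference",
]


def value_xpath(label):
    """Build an XPath for the first <td> after the cell whose text equals `label`.

    Assumption: the label text contains no single quote.
    """
    return f"//tbody/tr/td[normalize-space(.)='{label}']/following::td[1]"


def scrape_fields(driver, labels=WANTED_LABELS):
    """Return {label: text} for each label; None when no matching cell is found."""
    fields = {}
    for label in labels:
        # "xpath" is the string value of selenium's By.XPATH.
        # find_elements returns an empty list instead of raising when nothing matches.
        cells = driver.find_elements("xpath", value_xpath(label))
        fields[label] = cells[0].text if cells else None
    return fields
```

Inside the loop above, `scrape_fields(driver)` could then be called after `driver.get(...)` to collect one dictionary per peptide page; a list of such dictionaries can be passed to `pd.DataFrame` if a table is wanted.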

Votes: 1

Original page content provided by Stack Overflow; translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/57575255
