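I'm new to Python and not comfortable with loops yet. I want to iterate over a list of URLs for the genes I'm interested in and extract specific fields from each page (for example the gene name, the full gene name, and its biotype) as a single row:
gene1 full_name1 biotype1
and then add a new row for each subsequent gene, like this: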
gene1 full_name1 biotype1
gene2 full_name2 biotype2
gene3 full_name3 biotype3
...
I don't know how to do this. Maybe I should use multiple loops?
Here is my code:
gene_list = [gene1, gene2, gene3, ...]
i = 0
while (i in len(gene_list):
    url = 'https://www.xxxxxxxx?gene=' + str(gene_list[i])
    driver.get(url)
    gene = driver.find_element_by_css_selector('em:nth-of-type(1)').text
    full = driver.find_element_by_css_selector('h2:nth-of-type(1)').text
    biotype = driver.find_element_by_css_selector('span.gc-category').text
    i = i + 1
Can anyone help me? Thanks.
Answered 2020-08-17 00:17:35
You'll want to use a for loop.
First, build the list of URLs with a list comprehension.
gene_list = [gene1, gene2, gene3, ...]
url_list = ['https://www.xxxxxxxx?gene={}'.format(i) for i in gene_list]
Next, initialize an empty list for each column.
genes = []
full_names = []
biotypes = []
Finally, for each URL in url_list, extract the information and append it to the matching list.
for url in url_list:
    driver.get(url)
    genes.append(driver.find_element_by_css_selector('em:nth-of-type(1)').text)
    full_names.append(driver.find_element_by_css_selector('h2:nth-of-type(1)').text)
    biotypes.append(driver.find_element_by_css_selector('span.gc-category').text)
If you want to get fancier from there, you can put the results into a pandas DataFrame:
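import pandas as pd
# One column per list, so each gene becomes one row; the column names are illustrative.
df = pd.DataFrame({'gene_id': gene_list, 'gene': genes, 'full_name': full_names, 'biotype': biotypes})
Once the data is in a pandas DataFrame, it is much easier to process and visualize.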
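If pandas is more than you need, the row layout from the question (one gene per line) can also be sketched with the standard library alone. The values below are hypothetical stand-ins for what the scraping loop would collect, and the header names are only illustrative.

```python
import csv
import io

# Hypothetical scraped values standing in for the three lists built by the loop.
genes = ["gene1", "gene2", "gene3"]
full_names = ["full_name1", "full_name2", "full_name3"]
biotypes = ["biotype1", "biotype2", "biotype3"]

# zip() pairs up the i-th entry of each list, giving one tuple per gene.
rows = list(zip(genes, full_names, biotypes))

# Write the rows as tab-separated text into an in-memory buffer.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["gene", "full_name", "biotype"])  # illustrative header
writer.writerows(rows)

print(buffer.getvalue(), end="")
```

To save a file instead of printing, the buffer could be swapped for something like open("genes.tsv", "w", newline=""), where the file name is just an example.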
Answered 2020-08-17 00:20:59
Try this:
gene_list = [gene1, gene2, gene3, ...]
i = 0
gene_info = []
while i < len(gene_list):
    url = 'https://www.xxxxxxxx?gene=' + str(gene_list[i])
    driver.get(url)
    gene = driver.find_element_by_css_selector('em:nth-of-type(1)').text
    full = driver.find_element_by_css_selector('h2:nth-of-type(1)').text
    biotype = driver.find_element_by_css_selector('span.gc-category').text
    gene_info.append([gene, full, biotype])
    i = i + 1
print(gene_info)
https://stackoverflow.com/questions/63439090
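The loops above call find_element_by_css_selector, which worked when these answers were written. Newer Selenium 4 releases no longer ship the find_element_by_* helpers, so the following is a rough sketch of the same idea using find_element(by, value) and a plain for loop. The function name scrape_genes, the SELECTORS dictionary, and BASE_URL are hypothetical names introduced here. The URL is the placeholder from the question, and the driver is assumed to be an already started Selenium WebDriver.

```python
# CSS selectors copied from the code above; the dictionary keys are illustrative names.
SELECTORS = {
    "gene": "em:nth-of-type(1)",
    "full_name": "h2:nth-of-type(1)",
    "biotype": "span.gc-category",
}

BASE_URL = "https://www.xxxxxxxx?gene="  # placeholder URL from the question


def scrape_genes(driver, gene_ids):
    """Return one [gene, full_name, biotype] row per gene ID."""
    rows = []
    for gene_id in gene_ids:
        driver.get(BASE_URL + str(gene_id))
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR constant.
        row = [
            driver.find_element("css selector", css).text
            for css in SELECTORS.values()
        ]
        rows.append(row)
    return rows
```

With a real browser session this might be called as rows = scrape_genes(driver, gene_list). Iterating over the list directly removes the manual counter, which is where the original while condition went wrong.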