
Storing data parsed with BeautifulSoup4

Stack Overflow user
Asked on 2017-03-22 13:23:52

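I am trying to extract some information of interest for each compound into a table containing the name, FORMULA, EXACTMASS, MOLWEIGHT, and CAS, but when I run my loop it puts every letter/number or byte (I am not sure which is the right term) into its own cell. I want it to store the full text that the print statements show, as a single string in each cell for each compound. When the loop starts again for the next link, I want it to begin on a new row. I am not sure where I am going wrong.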

import urllib
import urllib.request
from bs4 import BeautifulSoup
import os
import csv

def make_soup(url):
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata


compoundlist = []
soup = make_soup("http://www.genome.jp/dbget-bin/www_bget?ko00020")
i = 1
file = open("Compoundlist.csv", "w")
for record in soup.findAll("nobr"):
    compound = ''
    if (record.text[0] == "C" and record.text[1] == '0') or (record.text[0] == "C" and record.text[1] == '1'):
        compoundlist ="http://www.genome.jp/dbget-bin/www_bget?cpd:" + record.text[:6] + '\n'
        file.write(compoundlist)
        # print(compoundlist)

file.close()
compoundinfo = []
linklist =open('Compoundlist.csv')

#
# def CASnumber(soup):
#     for tag in soup.findAll("div", {"style":"margin-left:3em"}):
#         tag = tag.text
#     return tag


for items in linklist:
    soupcomp = make_soup(items)
    for data in soupcomp.findAll("div", {"style":"width:555px;overflow-x:auto;overflow-y:hidden"}):
            for NAMES in soupcomp.findAll("div", {"style":"width:555px;overflow-x:auto;overflow-y:hidden"})[0]:
                NAMES = NAMES.text
    print(NAMES)
    for data in soupcomp.findAll("div", {"style":"width:555px;overflow-x:auto;overflow-y:hidden"}):
            for INFO in soupcomp.findAll("div", {"style":"width:555px;overflow-x:auto;overflow-y:hidden"})[0:3]:
                FORMULA = INFO.text
    print(FORMULA)
    for data in soupcomp.findAll("div", {"style":"width:555px;overflow-x:auto;overflow-y:hidden"}):
            for INFO in soupcomp.findAll("div", {"style":"width:555px;overflow-x:auto;overflow-y:hidden"})[0:4]:
                EXACTMASS = INFO.text
    print(EXACTMASS)
    for data in soupcomp.findAll("div", {"style":"width:555px;overflow-x:auto;overflow-y:hidden"}):
            for INFO in soupcomp.findAll("div", {"style":"width:555px;overflow-x:auto;overflow-y:hidden"})[0:5]:
                MOLWEIGHT = INFO.text
    print(MOLWEIGHT)
    for data in soupcomp.findAll("div", {"style":"width:555px;overflow-x:auto;overflow-y:hidden"}):
            for CAS in soupcomp.findAll("div", {"style":"margin-left:3em"}):
                CAS = CAS.text
    print(CAS)
    with open("Compoundinfo.csv", 'a') as csv_file:
            writer = csv.writer(csv_file)
            writer.writerows([NAMES,FORMULA,EXACTMASS,MOLWEIGHT,CAS])

1 Answer

Stack Overflow user

Answered on 2017-03-22 16:51:53

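Two things:

1) Move with open("Compoundinfo.csv", 'a') as csv_file: to before for items in linklist: - there is no need to reopen the file on every iteration;

2) The correct method in your case is writer.writerow (you have writerows).

writerow takes one-dimensional data (a single row) as its argument, while writerows takes two-dimensional data (a sequence of rows). Because a string is itself a sequence, writerows treats each of your strings as a row and each character as a separate cell.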
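The difference between the two methods, and the idea of opening the file once, can be sketched roughly as follows. This is a minimal illustration and not the asker's final script: the values in record are hypothetical placeholders rather than scraped output, rows_written and save_compounds are helper names invented for this example, and the header row and the "w" file mode are assumptions made for the sketch.

```python
import csv
import io

# Hypothetical placeholder values standing in for one scraped compound.
record = ["Pyruvate", "C3H4O3", "88.016", "88.06", "127-17-3"]


def rows_written(method_name, data):
    """Write data with the named csv.writer method and return the rows produced."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    getattr(writer, method_name)(data)
    # Parse the text back so the resulting cells are easy to inspect.
    return list(csv.reader(buffer.getvalue().splitlines()))


# writerows expects a sequence of rows, so each string becomes a row
# and each character becomes a cell.
wrong = rows_written("writerows", record)

# writerow expects a single row, so each string becomes one cell.
right = rows_written("writerow", record)


def save_compounds(path, compounds):
    """Open the CSV once, then write one row per compound (illustrative helper)."""
    # newline="" is what the csv module documentation recommends for file objects.
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        # Header row added for this sketch only.
        writer.writerow(["NAME", "FORMULA", "EXACTMASS", "MOLWEIGHT", "CAS"])
        for compound in compounds:
            writer.writerow(compound)


if __name__ == "__main__":
    print(wrong[0])  # first "row" produced by writerows
    print(right)     # the single row produced by writerow
```

In a scraping loop, the same pattern would mean collecting [NAMES, FORMULA, EXACTMASS, MOLWEIGHT, CAS] for each link and passing that list to writer.writerow inside a single with block.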

The original page content is provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/42943393
