A way to extract the pickles going in and out of IPython / Jupyter notebooks

Stack Overflow user
Asked on 2017-02-17 09:52:35
Answers: 1 · Views: 637 · Followers: 0 · Votes: 1

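I am trying to summarize a data analysis project that is spread across many IPython / Jupyter notebooks, each of which is fairly long. One thing that would help is knowing, at a minimum, which pickle files go into each notebook as "input" and which pickle files come out as "output".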

What is the cleanest / fastest / most efficient way to do this?

1 Answer

Stack Overflow user

Answered on 2017-02-17 09:52:35

I'm not sure this is the best way, but it is at least one way...

def summarize_pickles(notebook_path):
    """Return the pickle files a notebook reads ("input") and writes ("output")."""
    import re
    # Legacy import path for the v3 notebook API; on newer installs the
    # notebook format code ships as the separate "nbformat" package.
    from IPython.nbformat import current as nbformat

    with open(notebook_path) as fh:
        nb = nbformat.reads_json(fh.read())

    list_of_input_pickles = []
    list_of_output_pickles = []

    for cell in nb["worksheets"][0]["cells"]:
        # Skip cells that aren't code, and code cells that never mention "pickle".
        if cell["cell_type"] != "code" or "pickle" not in cell["input"]:
            continue

        # A cell can hold several statements, so inspect it line by line.
        for line in cell["input"].splitlines():
            if "pickle.dump" in line or ".to_pickle" in line:
                code_type = "output"
            elif "pickle.load" in line or ".read_pickle" in line:
                code_type = "input"
            else:
                continue   # Skips lines like "import cPickle as pickle"

            # Get all the content between double quotes.
            # See: http://stackoverflow.com/questions/171480/regex-grabbing-values-between-quotation-marks
            filename = re.findall(r'"(.*?)"', line)
            if not filename:
                continue   # No double-quoted string on this line, so nothing to record

            if code_type == "input":
                list_of_input_pickles.append(filename[0])
            else:
                list_of_output_pickles.append(filename[0])

    pickles_dict = {"input_pickles": list_of_input_pickles,
                    "output_pickles": list_of_output_pickles}

    return pickles_dict
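
The function above reads the older notebook layout (`worksheets`, `input`). As a rough sketch of the same line-scanning idea for notebooks saved in the current v4 format, the version below uses only the standard library and reads the `.ipynb` file as plain JSON. The name `summarize_pickles_v4` and the two marker tuples are hypothetical, chosen for illustration; the sketch assumes cells live under a top-level `cells` key with their code in `source`, and it also accepts single-quoted file names.

```python
import json
import re

# Matches the first single- or double-quoted string on a line.
QUOTED = re.compile(r"""(["'])(.*?)\1""")

# Substrings treated as "writes a pickle" / "reads a pickle" (same heuristics as above).
WRITE_MARKERS = ("pickle.dump", ".to_pickle")
READ_MARKERS = ("pickle.load", ".read_pickle")


def summarize_pickles_v4(notebook_path):
    """Collect pickle file names read and written by a v4 .ipynb file."""
    with open(notebook_path, encoding="utf-8") as fh:
        nb = json.load(fh)

    summary = {"input_pickles": [], "output_pickles": []}
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # On disk, "source" is either one string or a list of line strings.
        if isinstance(source, list):
            source = "".join(source)
        for line in source.splitlines():
            if any(marker in line for marker in WRITE_MARKERS):
                key = "output_pickles"
            elif any(marker in line for marker in READ_MARKERS):
                key = "input_pickles"
            else:
                continue  # e.g. "import pickle"
            match = QUOTED.search(line)
            if match:  # lines without a quoted string are skipped
                summary[key].append(match.group(2))
    return summary
```

Like the original, this is a text heuristic rather than a parser: it only sees a file name when the quoted string sits on the same line as the `pickle.dump` / `pickle.load` / `.to_pickle` / `.read_pickle` call, so a path stored in a variable or opened on a separate `with open(...)` line will be missed.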
Votes: 1
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/42288023
