Extracting attachments from EML files

Stack Overflow user
Asked 2017-06-02 08:07:45
1 answer · 3.4K views · 0 followers · 1 vote

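I currently use the code below to extract attachments from EML files. I would like to know whether the attachments can be linked back to the mail (the EML file) they came from, that is, by adding the EML file name as a prefix to each attachment name, so that I can tell which mail an attachment belongs to. Thanks.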

Code language: python
import os, re
import email
import argparse
import olefile

def extractAttachment(msg, eml_files, output_path):
    #print len(msg.get_payload())
    #print msg.get_payload()
    if len(msg.get_payload()) > 2:
        if isinstance(msg.get_payload(), str):
            try:
                extractOLEFormat(eml_files, output_path)
            except IOError:
                #print 'Could not process %s. Try manual extraction.' % (eml_files)
                #print '\tHeader of file: %s\n' % (msg.get_payload()[:8])
                pass

        elif isinstance(msg.get_payload(), list):
            count = 0
            while count < len(msg.get_payload()):
                payload = msg.get_payload()[count]
                # Retrieve the attachment's file name
                filename = payload.get_filename()
                #os.rename(filename,'rrrrr'+filename)
                #filename=os.path.join(str(filename), str(eml_files))
                if filename is not None:
                    try:
                        magic = payload.get_payload(decode=True)[:4]
                    except TypeError:
                        magic = "None"                    
                    # Print the magic header and the filename for reference.
                    printIT(eml_files, magic, filename)
                    # Write the payload out.
                    writeFile(filename, payload, output_path)
                count += 1

    elif len(msg.get_payload()) == 2:
        payload = msg.get_payload()[1]
        filename = payload.get_filename()
        try:
            magic = payload.get_payload(decode=True)[:4]
        except TypeError:
            magic = "None"
        # Print the magic header and the filename for reference.
        printIT(eml_files, magic, filename)
        # Write the payload out.
        writeFile(filename, payload, output_path)        

    elif len(msg.get_payload()) == 1:
        attachment = msg.get_payload()[0]
        payload = attachment.get_payload()[1]
        filename = attachment.get_payload()[1].get_filename()
        try:
            magic = payload.get_payload(decode=True)[:4]
        except TypeError:
            magic = "None"        
        # Print the magic header and the filename for reference.
        printIT(eml_files, magic, filename)
        # Write the payload out.
        writeFile(filename, payload, output_path)
    #else:
    #    print 'Could not process %s\t%s' % (eml_files, len(msg.get_payload()))

def extractOLEFormat(eml_files, output_path):
    data = '__substg1.0_37010102'
    # Open the OLE container once; the attachment streams are read from it below.
    msg = olefile.OleFileIO(eml_files)
    attachmentDirs = []
    for directories in msg.listdir():
        if directories[0].startswith('__attach') and directories[0] not in attachmentDirs:
            attachmentDirs.append(directories[0])

    for dir in attachmentDirs:
        filename = [dir, data]
        if isinstance(filename, list):
            filenames = "/".join(filename)
            filename = msg.openstream(dir + '/' + '__substg1.0_3707001F').read().replace('\000', '')


            payload = msg.openstream(filenames).read()
            magic = payload[:4]
            # Print the magic header and the filename for reference.
            printIT(eml_files, magic, filename)
            # Write the payload out.
            writeOLE(filename, payload, output_path)
#filename = str(eml_files)+"--"+str(filename)
def printIT(eml_files, magic, filename):
    filename = str(eml_files)+"--"+str(filename)
    print ('Email Name: %s\n\tMagic: %s\n\tSaved File as: %s\n' % (eml_files, magic, filename))

def writeFile(filename, payload, output_path):

    filename = str(eml_files)+"--"+str(filename)
    try:
        file_location = output_path + filename
        open(os.path.join(file_location), 'wb').write(payload.get_payload(decode=True))
    except (TypeError, IOError):
        pass

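def writeOLE(filename, payload, output_path):
    # Write the raw attachment bytes read from the OLE stream.
    with open(os.path.join(output_path, filename), 'wb') as out_file:
        out_file.write(payload)
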
def main():
    parser = argparse.ArgumentParser(description='Attempt to parse the attachment from EML messages.')
    parser.add_argument('-p', '--path',default='C:\\Users\\hamd\\Desktop\\TEX\\emails' ,help='eml')#Path to EML files
    parser.add_argument('-o', '--out', default='C:\\Users\\hamd\\Desktop\\TEX\\PJ\\eml_files\\',help='pj')#Path to write attachments to.
    args = parser.parse_args()    

    if args.path:
        input_path = args.path
    else:
        print ("You need to specify a path to your EML files.")
        exit(0)

    if args.out:
        output_path = args.out
    else:
        print ("You need to specify a path to write your attachments to.")
        exit(0)

    for root, subdirs, files in os.walk(input_path):
        for file_names in files:
            eml_files = os.path.join(root, file_names)
            # Close the EML file handle as soon as the message is parsed.
            with open(eml_files) as eml_handle:
                msg = email.message_from_file(eml_handle)
            extractAttachment(msg, eml_files, output_path)

if __name__ == "__main__":
    main()

1 answer

Stack Overflow user

Answered 2021-05-15 15:17:47

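I tried to write this as a comment, but it was too long. I won't give a complete solution, but I will explain the idea.

One possible solution is to create a hard link to each extracted attachment and give the hard link the same name as the EML file. If the same EML file contains more than one attachment, you can append an incrementing suffix: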

Code language: text
whatever.eml    (original email file)
whatever_001.attch    (hard link to first extracted attachment)
whatever_002.attch    (hard link to second extracted attachment)
...
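
The naming scheme above could be generated along these lines. This is only a sketch: the helper name `link_name` and its `suffix` parameter are hypothetical, and the three-digit zero padding is simply read off the listing.

```python
from pathlib import Path


def link_name(eml_path, index, suffix=".attch"):
    """Return the hard-link path for the index-th attachment of eml_path.

    Hypothetical helper: the link sits next to the EML file and reuses its
    base name, followed by a zero-padded counter.
    """
    eml_path = Path(eml_path)
    return eml_path.with_name(f"{eml_path.stem}_{index:03d}{suffix}")


if __name__ == "__main__":
    for i in (1, 2):
        print(link_name("mail/whatever.eml", i))
```

Keeping the counter zero-padded means the links sort in attachment order in a directory listing.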

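This way:

  • You are free to move the extracted attachments anywhere else, as long as they stay on the same disk (more precisely, the same filesystem or volume), because hard links by definition only work within one.
  • You can keep a copy of each attachment (the hard link) alongside the EML file without using extra disk space.
  • If the extracted file is deleted, you still have a backup copy of the attachment (the hard link), again without using extra disk space.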
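
The points above can be checked with a small experiment. The sketch below assumes a filesystem that supports hard links; the file names `report.pdf` and `whatever_001.attch` are made up for illustration.

```python
import os
import tempfile
from pathlib import Path


def demo_hard_link_survives_delete():
    """Create a file, hard-link it, delete the original name, read the link."""
    with tempfile.TemporaryDirectory() as tmp:
        extracted = Path(tmp) / "report.pdf"      # hypothetical extracted attachment
        link = Path(tmp) / "whatever_001.attch"   # hard link kept beside the EML file
        extracted.write_bytes(b"attachment bytes")

        os.link(extracted, link)

        # Both names refer to the same data on disk, so no extra space is used.
        same_file = os.path.samefile(extracted, link)
        link_count = extracted.stat().st_nlink

        # Removing one name leaves the data reachable through the other.
        extracted.unlink()
        survived = link.read_bytes()
        return same_file, link_count, survived


if __name__ == "__main__":
    print(demo_hard_link_survives_delete())
```

The data is only freed once the last name pointing at it has been removed.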

In Python, you can create a hard link with just:

Code language: python
import os
os.link(existing_target_file, new_link_name)
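
Putting the idea together, a minimal sketch might look like the following. It is not the asker's script: it parses the message with the standard library's `email` package and `policy.default`, and the function name `extract_and_link`, the `.attch` suffix handling and the choice to also prefix the extracted file with the EML base name are assumptions made for illustration.

```python
import os
from email import message_from_binary_file, policy
from pathlib import Path


def extract_and_link(eml_path, output_dir):
    """Save each attachment to output_dir and hard-link it next to the EML file."""
    eml_path = Path(eml_path)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    with eml_path.open("rb") as fh:
        msg = message_from_binary_file(fh, policy=policy.default)

    links = []
    for index, part in enumerate(msg.iter_attachments(), start=1):
        data = part.get_payload(decode=True)
        if data is None:  # e.g. an attached message; skipped in this sketch
            continue
        # Use only the last path component of the declared file name.
        name = Path(part.get_filename() or "attachment").name
        # Assumption: prefix with the EML base name and a counter so that
        # attachments from different mails (or with equal names) never collide.
        target = output_dir / f"{eml_path.stem}_{index:03d}_{name}"
        target.write_bytes(data)

        link = eml_path.with_name(f"{eml_path.stem}_{index:03d}.attch")
        if link.exists():
            link.unlink()  # replace a stale link from an earlier run
        os.link(target, link)  # raises OSError across filesystems
        links.append(link)
    return links
```

Because `os.link` fails with `OSError` when the output directory is on a different filesystem from the EML file, a caller may want to catch that error and fall back to copying the file.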
Votes 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/44323859
