Q: Simple multithreaded Python download program produces corrupted files

Stack Overflow user

Asked 2015-09-14 04:26:30

1 answer, 393 views, 0 followers, 0 votes

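This is my first post. I have been programming in Python for a while, and recently I have been working on a multithreaded download program. The problem is that my files (JPGs are my target) end up corrupted. Also, with the following input: web/headerAUMLogo.jpg

it shows an error.

And with this input: 1600-1200.jpg

the file is corrupted.

Here is the code: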

import os, sys, requests
import threading
import urllib2
import time

URL = sys.argv[1]

def buildRange(value, numsplits):
    lst = []
    for i in range(numsplits):
        if i == 0:
            lst.append('%s-%s' % (i, int(round(1 + i * value/(numsplits*1.0) + value/(numsplits*1.0)-1, 0))))
        else:
            lst.append('%s-%s' % (int(round(1 + i * value/(numsplits*1.0),0)), int(round(1 + i * value/(numsplits*1.0) + value/(numsplits*1.0)-1, 0))))
    return lst

def main(url=None, splitBy=5):
    start_time = time.time()
    if not url:
        print "Please Enter some url to begin download."
        return

    fileName = "image.jpg"
    sizeInBytes = requests.head(url, headers={'Accept-Encoding': 'identity'}).headers.get('content-length', None)
    print "%s bytes to download." % sizeInBytes
    if not sizeInBytes:
        print "Size cannot be determined."
        return

    dataDict = {}

    # split total num bytes into ranges
    ranges = buildRange(int(sizeInBytes), splitBy)

    def downloadChunk(idx, irange):
        req = urllib2.Request(url)
        req.headers['Range'] = 'bytes={}'.format(irange)
        dataDict[idx] = urllib2.urlopen(req).read()

    # create one downloading thread per chunk
    downloaders = [
        threading.Thread(
            target=downloadChunk,
            args=(idx, irange),
        )
        for idx,irange in enumerate(ranges)
        ]

    # start threads, let run in parallel, wait for all to finish
    for th in downloaders:
        th.start()
    for th in downloaders:
        th.join()

    print 'done: got {} chunks, total {} bytes'.format(
        len(dataDict), sum( (
            len(chunk) for chunk in dataDict.values()
        ) )
    )

    print "--- %s seconds ---" % str(time.time() - start_time)

    if os.path.exists(fileName):
        os.remove(fileName)

    # reassemble file in correct order
    with open(fileName, 'w') as fh:
        for _idx,chunk in sorted(dataDict.iteritems()):
            fh.write(chunk)

    print "Finished Writing file %s" % fileName
    print 'file size {} bytes'.format(os.path.getsize(fileName))

if __name__ == '__main__':
    main(URL)

The indentation here may be wrong, so here is the code: pastebin(dot)com/wGEkp878

I would be very grateful if someone could point out the error.

Edit: as recommended by someone

def buildRange(value, numsplits):
    lst = []
    for i in range(numsplits):
        first = i if i == 0 else buildRange().start(i, value, numsplits)
        second = buildRange().end(i, value, numsplits)
        lst.append("{}-{}".format(first, second))
    return lst
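
For comparison, here is a minimal sketch of one way to build inclusive start-end byte ranges using integer arithmetic only, so no rounding is involved. The name build_ranges and its parameters are hypothetical and not part of the original program. It is written to run on Python 3, although nothing in it is specific to that version.

```python
def build_ranges(total_size, num_splits):
    """Return inclusive 'start-end' strings covering bytes 0..total_size-1."""
    # Spread the remainder over the first `extra` ranges, one byte each.
    base, extra = divmod(total_size, num_splits)
    ranges = []
    start = 0
    for i in range(num_splits):
        length = base + (1 if i < extra else 0)
        if length == 0:
            # More splits than bytes: skip empty ranges.
            continue
        end = start + length - 1
        ranges.append('%d-%d' % (start, end))
        start = end + 1
    return ranges

ranges = build_ranges(10, 3)
```

Each string can be placed after 'bytes=' in a Range header, as downloadChunk does. The last range ends at total_size - 1, because byte offsets start at 0.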

Can someone tell me how to keep the downloaded parts as separate files with names such as part1, part2, and so on?

python

multithreading

1 Answer

Stack Overflow user

Accepted answer

Posted 2015-09-14 05:25:23

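It turns out the file has to be opened in binary mode: use 'wb' instead of 'w'. If you open it with 'w', a bunch of extra characters get written. This comes from the derpy Windows vs. Linux newline semantics: in text mode on Windows, every '\n' byte is written out as '\r\n'. If you use 'wb', exactly the bytes you put in are written to the file.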
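
A minimal sketch of the binary-mode reassembly step, under some assumptions: the helper name write_chunks and the sample chunk bytes are hypothetical, and the code is written for Python 3 (where writing bytes to a text-mode file raises a TypeError instead of silently altering them).

```python
import os
import tempfile

def write_chunks(path, chunks_by_index):
    """Write chunks in index order using binary mode, so no byte is translated."""
    with open(path, 'wb') as fh:
        for _idx, chunk in sorted(chunks_by_index.items()):
            fh.write(chunk)
    return os.path.getsize(path)

# Hypothetical chunks, deliberately containing newline bytes.
chunks = {1: b'\n\xff\xd9', 0: b'\xff\xd8\r\n'}
path = os.path.join(tempfile.mkdtemp(), 'image.jpg')
size = write_chunks(path, chunks)
```

Comparing the resulting file size with the sum of the chunk lengths is a quick check that nothing was added during the write.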

Edit: if you want to store the individual file parts, you can change

# reassemble file in correct order
with open(fileName, 'w') as fh:
    for _idx,chunk in sorted(dataDict.iteritems()):
        fh.write(chunk)

print "Finished Writing file %s" % fileName
print 'file size {} bytes'.format(os.path.getsize(fileName))

to

# write each chunk to its own part file, in index order
for _idx,chunk in sorted(dataDict.iteritems()):
    with open(fileName + ".part-" + str(_idx), 'wb') as fh:
        fh.write(chunk)

print "Finished Writing file %s" % fileName
#print 'file size {} bytes'.format(os.path.getsize(fileName))
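
If the parts are kept on disk, they can be joined back into one file later. The following is a rough sketch assuming the same '.part-N' naming as above. The helpers write_parts and join_parts and the sample data are hypothetical, and the code targets Python 3.

```python
import os
import tempfile

def write_parts(file_name, chunks_by_index):
    """Write each chunk to '<file_name>.part-<idx>' and return the paths in order."""
    paths = []
    for idx, chunk in sorted(chunks_by_index.items()):
        part_path = '%s.part-%d' % (file_name, idx)
        with open(part_path, 'wb') as fh:
            fh.write(chunk)
        paths.append(part_path)
    return paths

def join_parts(file_name, part_paths):
    """Concatenate the part files, in the given order, into file_name."""
    with open(file_name, 'wb') as out:
        for part_path in part_paths:
            with open(part_path, 'rb') as fh:
                out.write(fh.read())
    return os.path.getsize(file_name)

# Hypothetical data standing in for the downloaded chunks.
target = os.path.join(tempfile.mkdtemp(), 'image.jpg')
parts = write_parts(target, {0: b'abc', 1: b'de\n', 2: b'f'})
total = join_parts(target, parts)
```

Keeping the list of paths in numeric index order avoids relying on file-name sorting, which would put 'part-10' before 'part-2'.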

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/32557305
