
GCS - Python: downloading blobs with a directory structure

Stack Overflow user
Asked 2018-01-26 02:07:37
2 answers · 2.9K views · 0 followers · 4 votes

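I'm using a combination of the GCS Python SDK and the Google API client to iterate over a versioning-enabled bucket and download specific objects based on their metadata.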

from google.cloud import storage
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

def downloadepoch_objects():
    request = service.objects().list(
        bucket=bucket_name,
        versions=True
    )
    response = request.execute()

    for item in response['items']:
        if item['metadata']['epoch'] == restore_epoch:
            print(item['bucket'])
            print(item['name'])
            print(item['metadata']['epoch'])
            print(item['updated'])
            blob = source_bucket.blob(item['name'])
            blob.download_to_filename(
                '/Users/admin/git/data-processing/{}'.format(item['name']))


downloadepoch_objects()

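The function above works fine for blobs that are not in a directory (gs://bucketname/test1.txt), because the item passed in is just test1.txt. The problem I run into is downloading files from a complex directory tree (gs://bucketname/nfs/media/docs/test1.txt), where the item passed in is nfs/media/docs/test1.txt. Is it possible to have the .download_to_file() method create these directories if they don't exist?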


2 Answers

Stack Overflow user

Answered 2018-02-22 08:58:45

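Here is a working solution. I ended up stripping the path from the object name and creating the directory structure on the fly. A better approach might be what @Brandon Yarbrough suggested, combining the prefix with response['prefixes'], but I couldn't quite figure that out. Hopefully this helps someone else.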

#!/usr/local/bin/python3

from google.cloud import storage
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
import json
import os
import pathlib

bucket_name = 'test-bucket'
restore_epoch = '1519189202'
restore_location = '/Users/admin/data/'

credentials = GoogleCredentials.get_application_default()
service = discovery.build('storage', 'v1', credentials=credentials)

storage_client = storage.Client()
source_bucket = storage_client.get_bucket(bucket_name)


def listall_objects():
    request = service.objects().list(
        bucket=bucket_name,
        versions=True
    )
    response = request.execute()
    print(json.dumps(response, indent=2))


def listname_objects():
    request = service.objects().list(
        bucket=bucket_name,
        versions=True
    )
    response = request.execute()

    for item in response['items']:
        print(item['name'] + ' Uploaded on: ' + item['updated'] +
              ' Epoch: ' + item['metadata']['epoch'])


def downloadepoch_objects():
    request = service.objects().list(
        bucket=bucket_name,
        versions=True
    )
    response = request.execute()

    # 'items' is absent from the response when nothing matches.
    for item in response.get('items', []):
        # Objects without custom metadata have no 'metadata' key.
        if item.get('metadata', {}).get('epoch') != restore_epoch:
            continue
        print('Downloading ' + item['name'] + ' from ' +
              item['bucket'] + '; Epoch= ' + item['metadata']['epoch'])
        print('Saving to: ' + restore_location)
        destination = pathlib.Path(restore_location) / item['name']
        # Create every missing parent directory; os.mkdir() creates only
        # one level and fails for nested paths such as nfs/media/docs/.
        destination.parent.mkdir(parents=True, exist_ok=True)
        blob = source_bucket.blob(item['name'])
        blob.download_to_filename(str(destination))
        print('Download complete')


# listall_objects()
# listname_objects()
downloadepoch_objects()
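
The directory handling in this answer can also be isolated into a small helper. The following is only a minimal sketch, not the answerer's code: `local_path_for` is a hypothetical name, and the sketch assumes nothing beyond object names using `/` as their separator. It maps an object name to a path under the restore location, creates any missing parent directories in one step, and rejects names that would resolve outside that location.

```python
import pathlib


def local_path_for(object_name, restore_root):
    """Return the local path for a GCS object name, creating parent dirs.

    Hypothetical helper for illustration; not part of any Google library.
    """
    root = pathlib.Path(restore_root).resolve()
    # Object names always use '/', whatever the local separator is.
    target = root.joinpath(*object_name.split('/')).resolve()
    # Refuse names such as '../x' that would land outside the restore root.
    if root not in target.parents:
        raise ValueError('object name escapes restore root: ' + object_name)
    # parents=True builds the whole chain; exist_ok=True makes reruns safe.
    target.parent.mkdir(parents=True, exist_ok=True)
    return target
```

Inside the download loop it could be used as `blob.download_to_filename(str(local_path_for(item['name'], restore_location)))`.
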
Votes: 2

Stack Overflow user

Answered 2018-01-26 03:20:04

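GCS has no notion of "directories", although tools like gsutil do a good job of pretending for convenience. If you want all the objects under the "nfs/media/docs/" path, you can specify it as a prefix, like so: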

request = service.objects().list(
    bucket=bucket_name,
    versions=True,
    prefix='nfs/media/docs/',  # Only show objects beginning like this
    delimiter='/'  # Consider this character a directory marker.
)
response = request.execute()
# Either key is omitted from the response when it would be empty.
subdirectories = response.get('prefixes', [])
objects = response.get('items', [])

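Because of the prefix parameter, only objects whose names begin with 'nfs/media/docs/' are returned in response['items']. Because of the delimiter parameter, "subdirectories" are returned in response['prefixes']. You can find more details in the Python documentation of the objects.list method.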
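
To make the effect of the two parameters concrete, here is a minimal sketch that emulates the grouping locally on a plain list of names. It does not call GCS at all; `list_with_delimiter` and the sample object names are made up for illustration.

```python
def list_with_delimiter(names, prefix='', delimiter='/'):
    """Emulate how prefix and delimiter split a listing into items and prefixes."""
    items, prefixes = [], set()
    for name in sorted(names):
        if not name.startswith(prefix):
            continue  # filtered out by the prefix
        rest = name[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            # No delimiter after the prefix: a direct "child" object.
            items.append(name)
        else:
            # Everything deeper is rolled up into one "subdirectory" entry.
            prefixes.add(prefix + rest[:cut + len(delimiter)])
    return items, sorted(prefixes)


names = [
    'test1.txt',
    'nfs/media/docs/test1.txt',
    'nfs/media/docs/archive/old.txt',
    'nfs/media/img/logo.png',
]
items, prefixes = list_with_delimiter(names, prefix='nfs/media/docs/')
print(items)     # ['nfs/media/docs/test1.txt']
print(prefixes)  # ['nfs/media/docs/archive/']
```

Leaving the delimiter out of a real listing skips the roll-up, so every object under the prefix comes back in the items, however deeply nested.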

If you were to use the newer google-cloud Python library, which I'd recommend for new code, the same call would look pretty similar:

from google.cloud import storage

client = storage.Client()
bucket = client.bucket(bucket_name)
iterator = bucket.list_blobs(
    versions=True,
    prefix='nfs/media/docs/',
    delimiter='/'
)
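# iterator.prefixes is only filled in as pages are consumed,
# so exhaust the iterator before reading it.
objects = list(iterator)
subdirectories = iterator.prefixes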
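
Tying this back to the original question, the listing can be combined with local directory creation. The following is a rough sketch under stated assumptions rather than a definitive implementation: `download_tree` is a hypothetical name, `bucket` is expected to behave like a `google.cloud.storage` bucket (only `list_blobs(prefix=...)`, `blob.name` and `blob.download_to_filename()` are used), and only live objects are listed, because with `versions=True` the same name can appear once per generation.

```python
import pathlib


def download_tree(bucket, prefix, restore_root):
    """Download every object under prefix, mirroring its path locally."""
    saved = []
    # No delimiter here, so the listing descends into all "subdirectories".
    for blob in bucket.list_blobs(prefix=prefix):
        if blob.name.endswith('/'):
            continue  # "folder" placeholder object, nothing to save as a file
        target = pathlib.Path(restore_root).joinpath(*blob.name.split('/'))
        # Create the whole local directory chain before writing the file.
        target.parent.mkdir(parents=True, exist_ok=True)
        blob.download_to_filename(str(target))
        saved.append(target)
    return saved
```

With credentials configured, it could be called as `download_tree(storage.Client().bucket(bucket_name), 'nfs/media/docs/', '/Users/admin/data/')`.
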
Votes: 1
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/48449299
