Python: Sync two buckets via boto3
Is there any way to use boto3 to loop over the contents of two different buckets (source and target) and, if a key in the source has no match in the target, upload it to the target bucket? Please note that I do not want to use aws s3 sync. I am currently using the following code for this:
import boto3
s3 = boto3.resource('s3')
src = s3.Bucket('sourcenabcap')
dst = s3.Bucket('destinationnabcap')
objs = list(dst.objects.all())
for k in src.objects.all():
    if (k.key != objs[0].key):
        # copy the k.key to target
        pass
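If you only want to compare by key (ignoring differences within the objects), you could use something like this:
import boto3

s3 = boto3.resource('s3')
source_bucket = s3.Bucket('source')
destination_bucket = s3.Bucket('destination')
# A set makes each membership test O(1) instead of scanning a list.
destination_keys = {obj.key for obj in destination_bucket.objects.all()}
for obj in source_bucket.objects.all():
    if obj.key not in destination_keys:
        # copy obj.key to destination
        destination_bucket.copy({'Bucket': source_bucket.name, 'Key': obj.key}, obj.key)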
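If differences in content also matter, the key comparison above can be extended to look at object size as well. The following is only a minimal sketch and not part of the original answer: the helper names keys_to_copy and sync_buckets are hypothetical, and comparing sizes is a cheap heuristic (two objects of equal size can still differ).

```python
def keys_to_copy(source, destination):
    """Return keys from `source` that are missing from `destination`
    or whose size differs. Both arguments map key -> size in bytes."""
    return sorted(
        key for key, size in source.items()
        if destination.get(key) != size
    )


def sync_buckets(source_name, destination_name):
    """Copy missing or size-mismatched objects from one bucket to another."""
    import boto3  # imported lazily so keys_to_copy works without boto3

    s3 = boto3.resource('s3')
    source_bucket = s3.Bucket(source_name)
    destination_bucket = s3.Bucket(destination_name)

    # ObjectSummary exposes both the key and the size of each object.
    source = {obj.key: obj.size for obj in source_bucket.objects.all()}
    destination = {obj.key: obj.size for obj in destination_bucket.objects.all()}

    for key in keys_to_copy(source, destination):
        destination_bucket.copy({'Bucket': source_name, 'Key': key}, key)
```

Keeping the comparison in a pure function makes it possible to check the decision logic without touching S3.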
In case you decide not to use boto3: the sync command is still not available in boto3, so you can call the AWS CLI directly.
# python 3
import os
sync_command = "aws s3 sync s3://source-bucket/ s3://destination-bucket/"
os.system(sync_command)
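The same call can also be made through the subprocess module, which the Python documentation describes as preferable to os.system. This is a minimal sketch, assuming the AWS CLI is installed and on the PATH; build_sync_command and run_sync are hypothetical helper names.

```python
import subprocess


def build_sync_command(source_bucket, destination_bucket):
    """Build the argument list for `aws s3 sync` between two buckets."""
    return [
        'aws', 's3', 'sync',
        f's3://{source_bucket}/',
        f's3://{destination_bucket}/',
    ]


def run_sync(source_bucket, destination_bucket):
    """Run the sync and raise CalledProcessError on a non-zero exit code."""
    command = build_sync_command(source_bucket, destination_bucket)
    # Passing a list (without shell=True) avoids shell quoting issues.
    subprocess.run(command, check=True)
```

With check=True a failed sync raises an exception instead of being silently ignored, which os.system would not do on its own.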
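I have just implemented a simple class for this (syncing a local folder to a bucket). I am posting it here in the hope that it helps anyone with the same problem. You could modify S3Sync.sync to take file size into account.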
import boto3
from bisect import bisect_left
from pathlib import Path


class S3Sync:
    """
    Class that holds the operations needed to synchronize local dirs to a given bucket.
    """

    def __init__(self):
        self._s3 = boto3.client('s3')

    def sync(self, source: str, dest: str) -> None:
        """
        Sync source to dest: every element that exists in source
        but not in dest is copied to dest.
        No element will be deleted.
        :param source: Source folder.
        :param dest: Destination bucket name.
        :return: None
        """
        paths = self.list_source_objects(source_folder=source)
        objects = self.list_bucket_objects(dest)
        # Get the keys and sort them so we can binary search
        # each time we check whether a path is already there.
        object_keys = [obj['Key'] for obj in objects]
        object_keys.sort()
        object_keys_length = len(object_keys)
        for path in paths:
            # Binary search: bisect_left returns the insertion point,
            # so the key at that index must also be compared with path.
            index = bisect_left(object_keys, path)
            if index == object_keys_length or object_keys[index] != path:
                # Path not found in object_keys, so it has to be synced.
                self._s3.upload_file(str(Path(source).joinpath(path)), Bucket=dest, Key=path)

    def list_bucket_objects(self, bucket: str) -> [dict]:
        """
        List all objects for the given bucket.
        :param bucket: Bucket name.
        :return: A [dict] containing the elements in the bucket.
        Example of a single object.
        {
            'Key': 'example/example.txt',
            'LastModified': datetime.datetime(2019, 7, 4, 13, 50, 34, 893000, tzinfo=tzutc()),
            'ETag': '"b11564415be7f58435013b414a59ae5c"',
            'Size': 115280,
            'StorageClass': 'STANDARD',
            'Owner': {
                'DisplayName': 'webfile',
                'ID': '75aa57f09aa0c8caeab4f8c24e99d10f8e7faeebf76c078efc7c6caea54ba06a'
            }
        }
        """
        # list_objects returns at most 1000 keys per call, so paginate.
        paginator = self._s3.get_paginator('list_objects')
        contents = []
        for page in paginator.paginate(Bucket=bucket):
            # An empty bucket yields a page without a 'Contents' key.
            contents.extend(page.get('Contents', []))
        return contents

    @staticmethod
    def list_source_objects(source_folder: str) -> [str]:
        """
        :param source_folder: Root folder for resources you want to list.
        :return: A [str] containing relative names of the files.
        Example:
            /tmp
                - example
                    - file_1.txt
                    - some_folder
                        - file_2.txt
            >>> sync.list_source_objects("/tmp/example")
            ['file_1.txt', 'some_folder/file_2.txt']
        """
        path = Path(source_folder)
        paths = []
        for file_path in path.rglob("*"):
            if file_path.is_dir():
                continue
            str_file_path = str(file_path)
            str_file_path = str_file_path.replace(f'{str(path)}/', "")
            paths.append(str_file_path)
        return paths
if __name__ == '__main__':
    sync = S3Sync()
    sync.sync("/temp/some_folder", "some_bucket_name")
Also, replacing if file_path.is_dir(): with if not file_path.is_file(): lets it bypass links that do not resolve and other such nonsense. Thanks to @keithpjolley for pointing this out.
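The answer notes that S3Sync.sync could be modified to take file size into account. One possible way to do that is sketched below; it is an illustration rather than the author's implementation. The helpers list_local_files and files_to_upload are hypothetical names, the is_file() check follows the suggestion above, and a size match is only a heuristic for "unchanged".

```python
from pathlib import Path


def list_local_files(source_folder):
    """Map POSIX-style relative path -> size in bytes for every regular file."""
    root = Path(source_folder)
    return {
        path.relative_to(root).as_posix(): path.stat().st_size
        for path in root.rglob('*')
        if path.is_file()  # skips directories and links that do not resolve
    }


def files_to_upload(local_files, bucket_objects):
    """Return relative paths that are missing from the bucket or differ in size.

    `bucket_objects` is a list of dicts with 'Key' and 'Size' entries,
    in the shape returned by S3Sync.list_bucket_objects.
    """
    remote_sizes = {obj['Key']: obj['Size'] for obj in bucket_objects}
    return sorted(
        key for key, size in local_files.items()
        if remote_sizes.get(key) != size
    )
```

Each path returned by files_to_upload could then be passed to upload_file in the same way S3Sync.sync does. A dictionary lookup replaces the binary search here, so the keys do not need to be sorted first.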
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DelegateS3Access",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::DEST_ACCOUNT_ID:root"
            },
            "Action": [
                "s3:ListBucket",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::s3-copy-test/*",
                "arn:aws:s3:::s3-copy-test"
            ]
        }
    ]
}
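Yes, this seems fine, but since the objects in the destination sit inside a folder (say ABC), the object names differ from the source, so I had to use filter(Prefix='ABC/'). For example, an object is named name1 in the source while in the destination it is ABC/name1. Do you have any idea how to make them comparable?
You can strip off the part of the string before the last slash.
os.system() is no longer recommended. Use the subprocess module instead.
Replacing if file_path.is_dir(): with if not file_path.is_file(): lets it bypass links that do not resolve and other such nonsense.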
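For the comment about destination objects stored under a folder such as ABC/, one way to make the two listings comparable is to strip the prefix from the destination keys before comparing. This is a minimal sketch under that assumption; missing_in_destination is a hypothetical helper name, and it removes only the given prefix rather than everything before the last slash, so nested keys stay intact.

```python
def missing_in_destination(source_keys, destination_keys, prefix):
    """Return source keys that have no `prefix + key` counterpart in the destination."""
    # Strip the prefix from destination keys so both sides use the same names.
    stripped = {
        key[len(prefix):]
        for key in destination_keys
        if key.startswith(prefix)
    }
    return [key for key in source_keys if key not in stripped]
```

With boto3, destination_keys could come from destination_bucket.objects.filter(Prefix='ABC/'), and each missing key would then be copied to prefix + key in the destination bucket.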