Is it possible to loop through an Amazon S3 bucket with Python and count the number of lines in its files/keys?


python amazon-web-services amazon-s3


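Is it possible to loop through the files/keys in an Amazon S3 bucket with Python, read their contents, and count the number of lines?

For example:

  1. My bucket: "my-bucket-name"
  2. File/Key : "test.txt"

I need to loop through the file "test.txt" and count the number of lines in the raw file.

Sample code: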

for bucket in conn.get_all_buckets():
    if bucket.name == "my-bucket-name":
        for file in bucket.list():
            #need to count the number lines in each file and print to a log.

Amazon S3 is only a storage service. You must fetch a file before you can perform any operation on it (for example, counting its lines).

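You can loop through a bucket using boto3's list_objects_v2. Because list_objects_v2 lists at most 1000 keys per call (even if you specify a larger MaxKeys), you must check whether NextContinuationToken is present in the response dict, and if so pass it as ContinuationToken to read the next page.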

I wrote sample code for this in another answer, but I cannot recall which one.
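The pagination loop described above can be sketched roughly as follows. This is only a minimal sketch, not the code from that other answer: `iter_keys` is a hypothetical helper name, and `client` is assumed to be a boto3 S3 client such as `boto3.client('s3')`.

```python
def iter_keys(client, bucket):
    """Yield every key in `bucket`, one page (up to 1000 keys) at a time.

    `client` is assumed to be a boto3 S3 client; `iter_keys` is a
    hypothetical helper name, not part of boto3.
    """
    kwargs = {"Bucket": bucket}
    while True:
        response = client.list_objects_v2(**kwargs)
        # "Contents" is absent from the response when a page has no keys.
        for item in response.get("Contents", []):
            yield item["Key"]
        token = response.get("NextContinuationToken")
        if token is None:
            break  # no further pages
        # Ask for the page that follows the one just read.
        kwargs["ContinuationToken"] = token


# Example usage (needs boto3 and AWS credentials):
#   import boto3
#   for key in iter_keys(boto3.client("s3"), "my-bucket-name"):
#       print(key)
```

Passing the client in as an argument keeps the loop independent of how the connection is created.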

Then use get_object() to read the file and count its lines.

(Update)

If you only need the keys under a particular prefix, add a Prefix filter.
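As one possible way to apply the Prefix filter, boto3 clients also provide a built-in paginator that follows the continuation tokens for you. The sketch below assumes `client` is a boto3 S3 client; `keys_with_prefix` is a hypothetical helper name.

```python
def keys_with_prefix(client, bucket, prefix):
    """Return all keys in `bucket` that start with `prefix`.

    `client` is assumed to be a boto3 S3 client; `keys_with_prefix` is a
    hypothetical helper name, not part of boto3.
    """
    paginator = client.get_paginator("list_objects_v2")
    keys = []
    # paginate() issues as many list_objects_v2 calls as needed.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(item["Key"] for item in page.get("Contents", []))
    return keys


# Example usage (needs boto3 and AWS credentials):
#   import boto3
#   print(keys_with_prefix(boto3.client("s3"), "my-bucket-name", "logs/"))
```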

Using boto3, you can do the following:

import boto3

# create the s3 resource
s3 = boto3.resource('s3')

# get the file object
obj = s3.Object('bucket_name', 'key')

# read the file contents in memory
file_contents = obj.get()["Body"].read()

# print the occurrences of the new line character to get the number of lines
print(file_contents.count(b'\n'))

If you want to do this for every object in the bucket, you can use the following snippet:

bucket = s3.Bucket('bucket_name')
for obj in bucket.objects.all():
    file_contents = obj.get()["Body"].read()
    print(file_contents.count(b'\n'))

For more functionality, see the boto3 documentation.

Update (using boto 2):
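For boto 2 (for example 2.38.0, which has no s3.Object), a minimal sketch might look like the following. It assumes `bucket` is a boto 2 Bucket object, for instance one returned by `boto.connect_s3().get_bucket(...)`; `count_lines_boto2` is a hypothetical helper name.

```python
def count_lines_boto2(bucket, key_name):
    """Count newline characters in one object using a boto 2 Bucket.

    `bucket` is assumed to be a boto 2 Bucket object; `count_lines_boto2`
    is a hypothetical helper name, not part of boto.
    """
    key = bucket.get_key(key_name)  # None when the key does not exist
    if key is None:
        raise KeyError(key_name)
    # Reads the whole object into memory, so best suited to small files.
    contents = key.get_contents_as_string()
    return contents.count(b"\n")


# Example usage (needs boto 2 and AWS credentials):
#   import boto
#   conn = boto.connect_s3()
#   bucket = conn.get_bucket("my-bucket-name")
#   print(count_lines_boto2(bucket, "test.txt"))
```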

Reading a large file into memory is sometimes undesirable. Instead, you may find the following more useful:

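import boto3

s3 = boto3.client('s3')
fileKey = 'test.txt'  # placeholder: the key of the object to read
obj = s3.get_object(Bucket='bucketname', Key=fileKey)

# stream the body line by line instead of loading it all into memory
nlines = 0
for _ in obj['Body'].iter_lines():
    nlines += 1

print(nlines)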
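Since the question asks for the counts to be printed to a log, the streaming approach could be wrapped as in the sketch below. It assumes `client` is a boto3 S3 client and `keys` is any iterable of key names; `log_line_counts` and the logger name are hypothetical.

```python
import logging

# Hypothetical logger name; configure handlers as needed in your application.
logger = logging.getLogger("s3_line_counts")


def log_line_counts(client, bucket, keys):
    """Log and return the number of lines in each of the given keys.

    `client` is assumed to be a boto3 S3 client; `log_line_counts` is a
    hypothetical helper name, not part of boto3.
    """
    counts = {}
    for key in keys:
        body = client.get_object(Bucket=bucket, Key=key)["Body"]
        # iter_lines() streams the object, so it is never fully in memory.
        counts[key] = sum(1 for _ in body.iter_lines())
        logger.info("%s: %d lines", key, counts[key])
    return counts


# Example usage (needs boto3 and AWS credentials):
#   import boto3
#   logging.basicConfig(level=logging.INFO)
#   log_line_counts(boto3.client("s3"), "my-bucket-name", ["test.txt"])
```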

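Hi, thanks. Perhaps I did not phrase my question clearly. I want to loop through specific files in S3 and count the number of lines in them.

@Renukadevi: Please clarify what you mean by "specific". Do you mean files with a prefix?

The problem is that I am not using Boto 3.0. My Boto version is 2.38.0, so I cannot try the s3.Object method. Another problem is that my files are all in .gz format, and things get worse when I try to use Key.open_read as the fd for gzip.GzipFile. It fails with AttributeError: 'str' object has no attribute 'tell' or 'seek'. I would like to know whether there is a workaround.

@Renukadevi, I updated my post to add an example for boto2. To decompress the gzip data, you can probably use the zlib library. Hope this helps.

Thank you very much. This is very simple. I am new to AWS and your solution helped a lot.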
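The zlib suggestion for .gz files could be sketched along these lines. The function only needs an object with a `read(n)` method, so it avoids the `tell`/`seek` requirement of gzip.GzipFile. It assumes each object is a single gzip stream; `count_lines_gzip` is a hypothetical helper name.

```python
import zlib


def count_lines_gzip(body, chunk_size=64 * 1024):
    """Count lines in a gzip-compressed stream without buffering it all.

    `body` is any object with a read(n) method that returns bytes.
    Assumes a single gzip member; hypothetical helper, not a library API.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    newlines = 0
    last_byte = b""
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        data = decompressor.decompress(chunk)
        if data:
            newlines += data.count(b"\n")
            last_byte = data[-1:]
    data = decompressor.flush()  # any output still buffered by zlib
    if data:
        newlines += data.count(b"\n")
        last_byte = data[-1:]
    # Count a final line that does not end with a newline.
    if last_byte and last_byte != b"\n":
        newlines += 1
    return newlines
```

With boto3 the `Body` returned by get_object could be passed in directly; with boto 2 a Key object, which also offers a `read(size)` method, may work the same way.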
