Python: Download files from S3 based on file name and date


Tags: python, csv, amazon-s3, boto3, python-2.6


I am trying to pull files from S3 based on the ID and the date in the file name.

Naming convention:

ID_NAME_DATE.csv: all file names follow this same pattern.

Example: 9919USEN_File_20180216.csv

Example: 9919GBEN_File_20180211.csv

Code:

Error:

It skips the loop "for fileId in cFiles.lower():" and the script exits.

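Goal:

Pull files from S3 and download them to tmp_path so they can be used as needed. When pulling files, I want the script to select files based on their ID and date. For example: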

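Rule (pseudocode):

If S3 has both 9919USEN_File_20180216.csv and 9919USEN_File_20180217.csv, pick 9919USEN_File_20180217.csv to download. Also, if S3 has 991USEN_File_2018.csv, do not pick that file, because it does not match the rules fileidreg = '[0-9]{4}[a-zA-Z]{4}' and dateIdReg = '[0-9]{8}'.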

Rule (visual):

9919USEN_File_20180217.csv > 9919USEN_File_20180216.csv [due to the date]

9919USEN_File_20180217.csv > 991USEN_File_2018.csv [due to the incorrect ID and date]
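The two rules above can be sketched as a pure function over base names (file names without the folder part). This is a minimal sketch, not the asker's code: the combined pattern NAME_RE (which assumes NAME contains no underscore and the extension is .csv) and the function latest_per_id are hypothetical.

```python
import re

# Hypothetical combined pattern: ID (fileidreg), a NAME without underscores,
# DATE (dateIdReg) and a .csv extension, anchored at both ends.
NAME_RE = re.compile(r'^([0-9]{4}[a-zA-Z]{4})_[^_]+_([0-9]{8})\.csv$')


def latest_per_id(basenames):
    """Return {file_id: base name with the newest DATE}, skipping names that break the rules."""
    best = {}  # file_id -> (date, base name)
    for name in basenames:
        m = NAME_RE.match(name)
        if m is None:
            continue  # e.g. 991USEN_File_2018.csv: wrong ID and wrong date
        file_id, date = m.group(1), m.group(2)
        # YYYYMMDD strings of equal length sort the same way as the dates they encode
        if file_id not in best or date > best[file_id][0]:
            best[file_id] = (date, name)
    return dict((file_id, pair[1]) for file_id, pair in best.items())


names = ['9919USEN_File_20180216.csv', '9919USEN_File_20180217.csv',
         '991USEN_File_2018.csv', '9919GBEN_File_20180211.csv']
chosen = latest_per_id(names)
print(chosen['9919USEN'])  # 9919USEN_File_20180217.csv
```

Because it only works on strings, the selection can be tested without S3 and then applied to the base names of the listed keys before anything is downloaded.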

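Solution

The problem was the structure of the code. I reorganized it and put the loop inside a try/except block. I also used fileIDPrefix.search instead of fileIDPrefix.match, because match only tries to match at the beginning of the string (index 0), which does not suit this problem.

Final solution: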
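The difference between match and search that the answer relies on can be seen with a key that still carries its folder prefix (the key below is hypothetical):

```python
import re

fileIDPrefix = re.compile(r'[0-9]{4}[a-zA-Z]{4}')
key = 'folder/example/9919USEN_File_20180216.csv'  # hypothetical key with a folder prefix

# match() only tries position 0; the key starts with "folder/", so there is no match.
print(fileIDPrefix.match(key))           # None

# search() scans the whole string and finds the ID after the folder prefix.
print(fileIDPrefix.search(key).group())  # 9919USEN

# Because search() is unanchored, it also finds a valid-looking run inside a
# longer ID such as 19919USEN: it checks "contains", not "is exactly".
print(fileIDPrefix.search('folder/example/19919USEN_File_20180216.csv').group())  # 9919USEN
```

If a name must be rejected when the ID is too long, anchor the pattern or apply it to a fixed-width slice of the base name.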

import re
import sys

import boto3
from botocore.exceptions import ClientError

# connect to S3 (resource API, used to list the objects in the bucket)
client = boto3.resource(u's3', aws_access_key_id=u'KEY',
                        aws_secret_access_key=u'TOKEN')

# low-level client, used for downloading
s3 = boto3.client(u's3', aws_access_key_id=u'KEY',
                  aws_secret_access_key=u'TOKEN')


def downloadFiletest(bucketname, folder=u"folder/example/"):
    # S3 keys normally have no leading "/", so the prefix starts with the folder name
    dateSuffix = re.compile(r'[0-9]{8}')               # regex used to check the date of the file
    fileIDPrefix = re.compile(r'[0-9]{4}[a-zA-Z]{4}')  # regex used to check the fileID of the file name

    bucket = client.Bucket(bucketname)

    try:
        for cList in bucket.objects.filter(Prefix=folder):  # only objects under the folder
            key = cList.key
            print(key)

            # Take the base name (ID_NAME_DATE.ext) so the slice positions
            # no longer depend on the length of the folder path.
            name = key.split(u'/')[-1]
            dot_index = name.find(u'.')
            fileID = name[:8]                         # first 8 characters: the fileID
            fileDate = name[dot_index - 8:dot_index]  # 8 characters before the extension: the date

            # Download only if both parts match the rules. Note: this returns
            # after the first match; it does not pick the newest date per ID.
            if dot_index >= 8 and fileIDPrefix.search(fileID) and dateSuffix.search(fileDate):
                tmp_path = u"/tmp/mpcmt/" + name  # /tmp/mpcmt/ must already exist
                s3.download_file(bucketname, key, tmp_path)
                return name, tmp_path, fileID, fileDate

            # Otherwise report why the file was skipped.
            if not fileIDPrefix.search(fileID):
                print('File has wrong fileID')
            if not dateSuffix.search(fileDate):
                print('File has wrong fileDate format')

    except ClientError as e:  # S3 error: missing bucket, no permission, failed download, ...
        sys.stderr.write("ALERT: could not read {0}/{1}\n".format(bucketname, folder))
        sys.stderr.write("Exception: %s\n" % str(e))
        sys.exit(1)

    # the loop finished without finding a file that matches the rules
    sys.stderr.write("No matching file in {0}/{1}\n".format(bucketname, folder))
    return None


downloadFiletest(u'my-bucket')  # hypothetical bucket name
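The final solution returns after downloading the first file that passes its checks, so it does not apply the "newest date wins" rule from the question. One way to apply a rule to the whole listing first and only then download is sketched below. It assumes a boto3 S3 client and uses its list_objects_v2 paginator and download_file; the helper download_selected and its choose argument are hypothetical, and choose could be, for example, a function that keeps the newest file per ID.

```python
import os


def download_selected(s3_client, bucket, prefix, tmp_dir, choose):
    """List every key under `prefix`, let `choose` pick base names, download those.

    Assumptions (hypothetical helper, not part of the original answer):
    - s3_client is a boto3 S3 client, e.g. boto3.client('s3');
    - choose(list_of_base_names) returns the base names to download;
    - tmp_dir already exists (download_file does not create directories).
    Returns the local paths written, in the order they were downloaded.
    """
    keys_by_name = {}
    paginator = s3_client.get_paginator('list_objects_v2')
    # paginate() follows continuation tokens, so listings over 1,000 keys are covered
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            key = obj['Key']
            name = key.rsplit('/', 1)[-1]
            if name:  # skip "folder/" placeholder keys
                # keys in nested sub-folders with the same base name overwrite each other here
                keys_by_name[name] = key

    paths = []
    for name in choose(sorted(keys_by_name)):
        local_path = os.path.join(tmp_dir, name)
        s3_client.download_file(bucket, keys_by_name[name], local_path)
        paths.append(local_path)
    return paths

# Usage (bucket name is hypothetical):
#   import boto3
#   download_selected(boto3.client('s3'), 'my-bucket', 'folder/example/',
#                     '/tmp/mpcmt', lambda names: [n for n in names if n.endswith('.csv')])
```

Keeping the selection in a separate function means the listing and download code never needs to change when the file-picking rule does.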

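I can't see where the cFiles variable is created.

@Jundiaius It's updated now.

If cFiles is a string or unicode, the line "for fileId in cFiles.lower():" will loop over the letters of that string, which will not give the expected result.

How do I fix this? Use "for fileId in cFiles:" instead?

You write "date = ..." and the next line uses "dates". Should they be the same? Where is bucketname defined? Where is cList defined? Should it be cuList? And the cu in print(cu): where is that defined? Should fileDateSuffix be dateSuffix?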
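The comment about cFiles.lower() is easy to check: looping over a string visits its characters, while looping over a list visits its items. The value of cFiles below is hypothetical, since the question's original code is not shown.

```python
# Hypothetical value: a single file name stored in a string.
cFiles = '9919USEN_File_20180216.csv'

# Looping over a string visits each character, so "fileId" is never a whole ID.
string_items = [fileId for fileId in cFiles.lower()]

# Looping over a list of names visits each name, which is what the loop needs.
cFilesList = ['9919USEN_File_20180216.csv', '9919GBEN_File_20180211.csv']
list_items = [fileId.lower() for fileId in cFilesList]

print(len(string_items))  # 26
print(list_items[0])      # 9919usen_file_20180216.csv
```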
