PyAthena "s3_staging_dir" file: how do I get this filename so I can use it?
Tags: python, sql, amazon-s3, amazon-athena, pyathena

I am using PyAthena to run a basic query:

from pyathena import connect as pyathena_connect #to distinguish from other connect methods
import pandas as pd

class AthenaDataConnection():
    def __init__(self, S3_STAGING_DIR, SEP=';', REGION='us-east-1', ACCESS_KEY=None, S_KEY=None):
        self.S3_STAGING_DIR = S3_STAGING_DIR
        self.REGION = REGION
        self.SEP = SEP
        
        if ACCESS_KEY and S_KEY:
            self.athena_conn = pyathena_connect(s3_staging_dir=self.S3_STAGING_DIR, region_name=self.REGION,
                                                aws_access_key_id=ACCESS_KEY, aws_secret_access_key=S_KEY)
        else:
            self.athena_conn = pyathena_connect(s3_staging_dir=self.S3_STAGING_DIR, region_name=self.REGION)

    def get_athena_data(self, sql_dict):
        print("Athena connection established; starting to query data using pd-sql integration")
        sql_results = {}
        for filename, sql in sql_dict.items():
            try:
                load_data = pd.read_sql(sql,self.athena_conn)
                print(f"{filename} data fetched from Athena but not saved (returned in dict only).")
                sql_results[filename] = load_data
            except Exception as exc:  # a bare except would also swallow KeyboardInterrupt
                print(f"Reading {filename} failed: {exc}")
        return sql_results

athena = AthenaDataConnection('s3://athena-staging/',ACCESS_KEY=ACCESS_KEY, S_KEY=S_KEY)
sql_dict = {'foobar':"select * from foo.bar where foo='bar'"}
df_dict = athena.get_athena_data(sql_dict)
df = df_dict.get('foobar')
#assume this is the end of the script; i.e., I did NOT save the query results myself
So when the query runs, a file appears in the staging folder, for example:

s3://athena-staging/abc123_45678_91011.csv

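I want my code to capture that filename and save it for other purposes. But how? I couldn't find anything about this in the PyAthena documentation.

Update: I just learned that the filename is the query ID + .csv! So what I am now looking for is a way to get the Athena query ID.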

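OK, once I knew that the filename is not random but is the Athena query ID, I was able to search more effectively and found a solution. Using the object I already created above: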

cursor = athena.athena_conn.cursor()
cursor.execute(sql)  # sql is the query string, e.g. sql_dict['foobar']
cursor.query_id
This returns the query ID, which is the filename without the trailing .csv. Now I can fetch the file as needed.
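
To tie the pieces together, here is a minimal sketch of turning the query ID into the full result-file location. It assumes, as in the example above, that the CSV lands directly under the staging directory and is named after the query ID. The helper names (staging_file_uri, split_s3_uri, run_and_locate) are made up for illustration and are not part of PyAthena; the only cursor features used are cursor.execute() and cursor.query_id, as shown in the answer.

```python
from urllib.parse import urlparse


def staging_file_uri(s3_staging_dir, query_id):
    """Build the S3 URI of the CSV written for query_id.

    Assumption: the file sits directly under the staging directory
    and is named <query_id>.csv, as observed in the question.
    """
    return f"{s3_staging_dir.rstrip('/')}/{query_id}.csv"


def split_s3_uri(uri):
    """Split s3://bucket/key into (bucket, key), e.g. for a later download."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")


def run_and_locate(conn, sql_dict, s3_staging_dir):
    """Run each query through a cursor and record where its result file lands.

    conn is any connection whose cursors expose execute() and query_id,
    such as the athena.athena_conn object created above.
    """
    locations = {}
    for name, sql in sql_dict.items():
        cursor = conn.cursor()
        cursor.execute(sql)
        locations[name] = staging_file_uri(s3_staging_dir, cursor.query_id)
    return locations
```

With the objects from the question, this might be called as run_and_locate(athena.athena_conn, sql_dict, athena.S3_STAGING_DIR), and each resulting URI can be passed to split_s3_uri to get a bucket and key. If the workgroup overrides the output location, the file may end up somewhere other than the staging directory; PyAthena cursors also appear to expose an output_location attribute, which is worth checking against the documentation for your version.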