Python: Appending to a DataFrame in a while loop

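I'm having some trouble sorting a DataFrame. My code can only fetch 1,000 rows at a time; the API then sends a continuation URL, which my script follows in a while loop. The problem is that on every pass I write the page out and append it to a CSV. That works fine, but now I need to sort the whole DataFrame, and that is where I'm stuck.

How can I add to a DataFrame on each pass and then write the DataFrame to a CSV? Should I append to one DataFrame inside the loop, or create a new DataFrame on each pass and combine them at the end? I don't know how to do this, and I barely got this much working, so any advice would be appreciated.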

import requests
import json
import pandas as pd
import time
import os
from  itertools import product

#what I need to loop through
instrument = ('btc-usd')
exchange = ('cbse')  
interval = ('1m','3m')  
start_time = '2021-01-14T00:00:00Z'
end_time = '2021-01-16T23:59:59Z'


for (interval) in product(interval):
    page_size = '1000'
    url = f'https://us.market-api.kaiko.io/v2/data/trades.v1/exchanges/{exchange}/spot/{instrument}/aggregations/count_ohlcv_vwap'
    #params = {'interval': interval, 'page_size': page_size, 'start_time': start_time, 'end_time': end_time }
    params = {'interval': interval, 'page_size': page_size }
    KEY = 'xxx'
    headers = {
        "X-Api-Key": KEY,
        "Accept": "application/json",
        "Accept-Encoding": "gzip"
    }

    csv_file = f"{exchange}-{instrument}-{interval}.csv"
    c_token = True

    while(c_token):
        res = requests.get(url, params=params, headers=headers)
        j_data = res.json()
        parse_data = j_data['data']
        c_token = j_data.get('continuation_token')
        today = time.strftime("%Y-%m-%d")
        params = {'continuation_token': c_token}

        if c_token:   
            url = f'https://us.market-api.kaiko.io/v2/data/trades.v1/exchanges/cbse/spot/btc-usd/aggregations/count_ohlcv_vwap?continuation_token={c_token}'        

        # create dataframe
        df = pd.DataFrame.from_dict(pd.json_normalize(parse_data), orient='columns')
        df.insert(1, 'time', pd.to_datetime(df.timestamp.astype(int),unit='ms'))          
        df['range'] = df['high'].astype(float) - df['low'].astype(float)
        df.range = df.range.astype(float)

        #sort
        df = df.sort_values(by='range')
        
        #that means file already exists need to append
        if(csv_file in os.listdir()): 
            csv_string = df.to_csv(index=False, encoding='utf-8', header=False)
            with open(csv_file, 'a') as f:
                f.write(csv_string)
        #that means writing file for the first time        
        else: 
            csv_string = df.to_csv(index=False, encoding='utf-8')
            with open(csv_file, 'w') as f:
                f.write(csv_string)

Probably the cleanest and most efficient way is to create an empty DataFrame and then append to it.

import requests
import json
import pandas as pd
import time
import os
from  itertools import product

#what I need to loop through
instruments = ('btc-usd',)
exchanges = ('cbse',)
intervals = ('1m','3m')  
start_time = '2021-01-14T00:00:00Z'
end_time = '2021-01-16T23:59:59Z'
params = {'page_size': 1000}
KEY = 'xxx'
    
headers = {
        "X-Api-Key": KEY,
        "Accept": "application/json",
        "Accept-Encoding": "gzip"
    }

for instrument, exchange, interval in product(instruments, exchanges, intervals):
    # Fresh params for each combination, so a continuation token left over
    # from the previous interval is never sent with the first request
    params = {'page_size': 1000, 'interval': interval}
    # f-string, so {exchange} and {instrument} are actually substituted
    url = f'https://us.market-api.kaiko.io/v2/data/trades.v1/exchanges/{exchange}/spot/{instrument}/aggregations/count_ohlcv_vwap'
    csv_file = f"{exchange}-{instrument}-{interval}.csv"
    df = pd.DataFrame()   # start with empty dataframe

    while True:
        res = requests.get(url, params=params, headers=headers)
        j_data = res.json()
        parse_data = j_data['data']
        # append this page to the dataframe (DataFrame.append was removed in pandas 2.0)
        df = pd.concat([df, pd.json_normalize(parse_data)], ignore_index=True)
        if j_data.get('continuation_token'):
            params['continuation_token'] = j_data['continuation_token']
        else:
            break
        
    # These parts can be done outside of the while loop, once all the data has been compiled
    df.insert(1, 'time', pd.to_datetime(df.timestamp.astype(int),unit='ms'))          
    df['range'] = df['high'].astype(float) - df['low'].astype(float)
    df.range = df.range.astype(float)
    df = df.sort_values(by='range')
    df.to_csv(csv_file, index=False, encoding='utf-8')  # write the whole CSV at once
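
As a variation on the loop above, the pages can be collected in a list and concatenated once at the end, which avoids copying the growing DataFrame on every pass. This is only a sketch: `collect_pages`, `fetch_page`, and `FAKE_PAGES` are hypothetical names introduced here, and the fake pages stand in for the real API responses so that the example runs without network access.

```python
import pandas as pd


def collect_pages(fetch_page):
    """Follow continuation tokens and return all rows as one DataFrame.

    fetch_page is a hypothetical callable: it takes a continuation token
    (None for the first request) and returns the decoded JSON as a dict.
    """
    frames = []
    token = None
    while True:
        payload = fetch_page(token)
        frames.append(pd.json_normalize(payload["data"]))
        token = payload.get("continuation_token")
        if not token:
            break
    # One concatenation at the end instead of one per page
    return pd.concat(frames, ignore_index=True)


# Made-up stand-in for the API: two pages, the first pointing at the second
FAKE_PAGES = {
    None: {"data": [{"timestamp": 1, "high": "5", "low": "1"}],
           "continuation_token": "p2"},
    "p2": {"data": [{"timestamp": 2, "high": "4", "low": "3"}]},
}

df = collect_pages(FAKE_PAGES.get)
df["range"] = df["high"].astype(float) - df["low"].astype(float)
df = df.sort_values(by="range")  # sort once, over the complete data
```

In real use, `fetch_page` would wrap the `requests.get` call and pass the token along in `params`.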

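If the combined DataFrame is too large to fit in memory, you can instead read one page at a time and append it to the CSV, provided the column headers are the same on every page. (You may still need to make sure that pandas writes the columns in the same order each time.)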
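
A minimal sketch of that page-by-page approach, assuming the column names are known in advance. `append_page` and `COLUMNS` are hypothetical names, and the two small DataFrames are made-up stand-ins for API pages. The header is written only when the file does not exist yet, and `reindex` pins the column order.

```python
import os
import tempfile

import pandas as pd

COLUMNS = ["timestamp", "high", "low"]  # assumed column names and order


def append_page(page_df, csv_path, columns=COLUMNS):
    """Append one page to csv_path, writing the header only for a new file."""
    write_header = not os.path.exists(csv_path)
    page_df.reindex(columns=columns).to_csv(
        csv_path, mode="a", header=write_header, index=False, encoding="utf-8"
    )


csv_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
append_page(pd.DataFrame({"timestamp": [1], "high": [5.0], "low": [1.0]}), csv_path)
# The second page arrives with its columns in a different order
append_page(pd.DataFrame({"low": [3.0], "timestamp": [2], "high": [4.0]}), csv_path)

result = pd.read_csv(csv_path)
```

Note that a file built this way is only in arrival order; sorting the whole data set would still need a separate pass over the finished CSV.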

You can use df.loc together with len to add a list of values as a new row:

import pandas as pd

win_results_df = pd.DataFrame(columns=['GameId', 'Team', 'TeamOpponent',
                                       'HomeScore', 'VisitorScore', 'Target'])

# teamOpponent (a dict) and key come from the surrounding code, not shown here
df_length = len(win_results_df)
win_results_df.loc[df_length] = [teamOpponent['gameId'], key,
                                 teamOpponent['visitorDisplayName'],
                                 teamOpponent['HomeScore'],
                                 teamOpponent['VisitorScore'], True]
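
For illustration, here is the same idiom as a self-contained loop. The column subset and the row values are made up, and the sketch assumes the DataFrame keeps its default 0, 1, 2, ... index, so that `len(results)` is always the next free label.

```python
import pandas as pd

results = pd.DataFrame(columns=["GameId", "Team", "Target"])

# Made-up rows standing in for data produced inside a loop
rows = [(101, "Home", True), (102, "Away", False)]

for game_id, team, target in rows:
    # len(results) is the next unused index label, so this adds a new row
    results.loc[len(results)] = [game_id, team, target]
```

Adding rows one at a time like this is convenient for small tables; for thousands of rows per pass, building each page as a DataFrame and concatenating is likely to be faster.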

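Thanks for the reply. I tried running this, but I get this error: Traceback (most recent call last): File "kaiko df.py", line 40, in <module> df.insert(1, 'time', pd.to_datetime(df.timestamp.astype(int),unit='ms')) File "/home/robothead/scripts/python/venvs/kaiko/lib/python3.6/site-packages/pandas/core/generic.py", line 5141, in __getattr__ return object.__getattribute__(self, name) AttributeError: 'DataFrame' object has no attribute 'timestamp'. I ran into this before while trying to get the continuation URL to work, and I don't know why it is happening now.

That line only works if your data has a column labelled timestamp. You will have to look at the raw data to find out what is going wrong. Try running without that line and check whether the resulting DataFrame has the right shape.

There are columns for timestamp, high and low. The original code worked and produced a complete CSV, so I'm sure all of those columns exist.

I got it: it was the for statement. Because exchange and instrument each have only one item in the list, it messed things up. I changed it back to for (interval) in product(interval): and now your code works! Thank you, this taught me a lot. Writing the CSV page by page worked fine, but I think this is better.

Good! But for (interval) in product(interval) doesn't make much sense! Maybe for interval in intervals?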
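
The behaviour discussed in these comments can be checked with a short sketch. The thread does not show the exact loop the asker ran, so the string-versus-tuple case is only one plausible reading. The variable names below are hypothetical; the values are taken from the question.

```python
from itertools import product

intervals = ("1m", "3m")

# product() over a single iterable yields 1-tuples, not bare strings
single = list(product(intervals))

# Parentheses alone do not make a tuple; the trailing comma does
not_a_tuple = ("btc-usd")   # this is just the string 'btc-usd'
is_a_tuple = ("btc-usd",)   # this is a one-element tuple

# A string is iterated character by character
by_char = list(product(not_a_tuple, intervals))
by_item = list(product(is_a_tuple, intervals))
```

Here `single` is `[('1m',), ('3m',)]`, so `interval` in the question's loop is a 1-tuple, while `by_char` starts with `('b', '1m')` because the string is split into characters. A plain `for interval in intervals:` avoids both surprises when there is only one sequence to loop over.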