Tags: python-2.7, ubuntu, google-bigquery

Assume floating-point math is non-deterministic:

"The IEEE standard does not guarantee that the same program will deliver identical results on all conforming systems."

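To make this concrete, here is a minimal Python 2.7 sketch (not tied to BigQuery): the "same" sum already comes out differently depending on how it is carried out, which is exactly the freedom the standard leaves to implementations.

# The builtin sum() rounds after every addition; math.fsum() computes a
# correctly-rounded total. Same inputs, different results.
import math

values = [0.1] * 10

print repr(sum(values))        # 0.9999999999999999
print repr(math.fsum(values))  # 1.0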


You mention that these values come from a SUM over floating-point data. As Felipe mentioned, floating point is awkward to work with; it violates some of the mathematical identities that we tend to take for granted.

In this case, the associative property is the one that bites us. That is, normally (A + B) + C == A + (B + C). In floating-point math, however, this is not the case. Each operation is an approximation; you can see this better if you wrap everything in an "approx" function: approx(approx(A + B) + C) is clearly different from approx(A + approx(B + C)).

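This is easy to reproduce in Python; the sketch below only illustrates the associativity point and has nothing BigQuery-specific in it:

# Associativity fails in binary floating point: (A + B) + C != A + (B + C).
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c
right = a + (b + c)

print repr(left)     # 0.6000000000000001
print repr(right)    # 0.6
print left == right  # False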
If you think about how BigQuery computes aggregates, it builds an execution tree and computes the values to be aggregated at the leaves of the tree. As those answers become ready, they are passed back up to higher levels of the tree and aggregated (let's say they get added). The "as they become ready" part is what makes it non-deterministic.

A node might return its results in the order A, B, C the first time and in the order C, A, B the second time. That means the grouping of the additions changes: the first time you get approx(approx(A + B) + C), and the second time you get approx(approx(C + A) + B). Note that because we are talking about ordering, it might look like the commutative property is the problem, but it isn't; A + B in floating-point math is the same as B + A. The real problem is that you are adding up partial results, and those additions are not associative.

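You can simulate the effect of leaves reporting back in a different order with a few lines of Python. The shuffle below stands in for the "as they become ready" scheduling; it is only a sketch of the idea, not how BigQuery is actually implemented:

# Partial aggregates added in a different order give (slightly) different totals.
import random

# Pretend these are the partial sums computed at the leaves of the execution tree.
partials = [0.1 * i for i in range(1, 1001)]

totals = set()
for trial in range(20):
    random.shuffle(partials)   # the leaves "finish" in a different order each run
    acc = 0.0
    for p in partials:         # the parent node adds them as they arrive
        acc += p
    totals.add(repr(acc))

print totals  # usually more than one distinct value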

Floating-point math has all sorts of nasty properties, and it should generally be avoided if you rely on precision.

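If the totals have to be exact and order-independent, one option on the Python side is to do the arithmetic in decimal (or in scaled integers such as cents) instead of binary floating point. This is a general sketch of that idea, not something the BigQuery query itself does:

# Decimal arithmetic is exact for these values, so the order of addition no longer matters.
from decimal import Decimal
import random

partials = [Decimal('0.1') * i for i in range(1, 1001)]

random.shuffle(partials)
print sum(partials)  # always 50050.0, regardless of the shuffle order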

…but let's just say that those varying floating-point values come from an aggregated SUM over FLOAT-type data:
import sys
import pdb
import json
from collections import OrderedDict
from csv import DictWriter
from pprint import pprint
from apiclient import discovery
from apiclient.errors import HttpError
from oauth2client import tools
from oauth2client.client import AccessTokenRefreshError

import functools
import argparse
import httplib2

import time
from subprocess import call



def authenticate_SERVICE_ACCOUNT(service_acct_email, private_key_path):
    """ Generic authentication through a service accounts.

    Args:
        service_acct_email: The service account email associated 
        with the private key private_key_path: The path to the private key file
    """

    from oauth2client.client import SignedJwtAssertionCredentials

    with open(private_key_path, 'rb') as pk_file:
       key = pk_file.read()

    credentials = SignedJwtAssertionCredentials(
      service_acct_email, 
      key, 
      scope='https://www.googleapis.com/auth/bigquery')

    http = httplib2.Http()
    auth_http = credentials.authorize(http)

    return discovery.build('bigquery', 'v2', http=auth_http)

def create_query(number_of_days_ago):
  """ Create a query 

      Args:
        number_of_days_ago: Default value of 1 gets yesterday's data

  """
  q = 'SELECT xxxxxxxxxx'


  return q;

def translate_row(row, schema):
        """Apply the given schema to the given BigQuery data row.
        Args:
            row: A single BigQuery row to transform.
            schema: The BigQuery table schema to apply to the row, specifically
                    the list of field dicts.
        Returns:
            Dict containing keys that match the schema and values that match
            the row.

        Adapted from bigquery client
        https://github.com/tylertreat/BigQuery-Python/blob/master/bigquery/client.py
        """

        log = {}
        #pdb.set_trace()
        # Match each schema column with its associated row value
        for index, col_dict in enumerate(schema):
            col_name = col_dict['name']
            row_value = row['f'][index]['v']

            if row_value is None:
                log[col_name] = None
                continue

            # Cast the value for some types
            if col_dict['type'] == 'INTEGER':
                row_value = int(row_value)

            elif col_dict['type'] == 'FLOAT':
                row_value = float(row_value)

            elif col_dict['type'] == 'BOOLEAN':
                row_value = row_value in ('True', 'true', 'TRUE')

            log[col_name] = row_value

        return log

def extractResult(queryReply):
  """ Extract a result from the query reply.  Uses schema and rows to translate.

      Args:
        queryReply: the object returned by bigquery

  """
  #pdb.set_trace()
  result = []
  schema = queryReply.get('schema', {'fields': None})['fields']
  rows = queryReply.get('rows',[])

  for row in rows:
    result.append(translate_row(row, schema))
  return result


def writeToCsv(results, filename, ordered_fieldnames, withHeader=True):
  """ Create a csv file from a list of rows.

      Args:
        results: list of rows of data (first row is assumed to be a header)
        ordered_fieldnames: a dict with names of fields in order desired - names must exist in results header
        withHeader: a boolean to indicate whether to write out header -
          Set to false if you are going to append data to existing csv

  """
  try:
    the_file = open(filename, "w")    
    writer = DictWriter(the_file, fieldnames=ordered_fieldnames)
    if withHeader:
      writer.writeheader()
    writer.writerows(results)
    the_file.close()
  except:
    print "Unexpected error:", sys.exc_info()[0]
    raise


def runSyncQuery (client, projectId, query, timeout=0):
  results = []
  try:
    print 'timeout:%d' % timeout
    jobCollection = client.jobs()
    queryData = {'query':query,
                 'timeoutMs':timeout}

    queryReply = jobCollection.query(projectId=projectId,
                                     body=queryData).execute()

    jobReference=queryReply['jobReference']

    # Timeout exceeded: keep polling until the job is complete.
    while(not queryReply['jobComplete']):
      print 'Job not yet complete...'
      queryReply = jobCollection.getQueryResults(
                          projectId=jobReference['projectId'],
                          jobId=jobReference['jobId'],
                          timeoutMs=timeout).execute()

    # If the result has rows, print the rows in the reply.
    if('rows' in queryReply):
      #print 'has a rows attribute'
      #pdb.set_trace();
      result = extractResult(queryReply)
      results.extend(result)

      # Track how many rows have been fetched so far, so we can page through the rest.
      currentRow = len(queryReply['rows'])

      # Loop through each page of data
      while('rows' in queryReply and currentRow < int(queryReply['totalRows'])):
        queryReply = jobCollection.getQueryResults(
                          projectId=jobReference['projectId'],
                          jobId=jobReference['jobId'],
                          startIndex=currentRow).execute()
        if('rows' in queryReply):
          result = extractResult(queryReply)
          results.extend(result)
          currentRow += len(queryReply['rows'])

  except AccessTokenRefreshError:
    print ("The credentials have been revoked or expired, please re-run"
    "the application to re-authorize")

  except HttpError as err:
    print 'Error in runSyncQuery:'
    pprint(err.content)

  except Exception as err:
    print 'Undefined error: %s' % err

  return results;


# Main
if __name__ == '__main__':
  # Name of file
  FILE_NAME = "results.csv"

  # Default prior number of days to run query
  NUMBER_OF_DAYS = "1"

  # BigQuery project id as listed in the Google Developers Console.
  PROJECT_ID = 'xxxxxx'

  # Service account email address as listed in the Google Developers Console.
  SERVICE_ACCOUNT = 'xxxxxx@developer.gserviceaccount.com'
  KEY = "/usr/local/xxxxxxxx"

  query = create_query(NUMBER_OF_DAYS)

  # Authenticate
  client = authenticate_SERVICE_ACCOUNT(SERVICE_ACCOUNT, KEY)

  # Get query results
  results = runSyncQuery (client, PROJECT_ID, query, timeout=0)
  #pdb.set_trace();

  # Write results to csv without header
  ordered_fieldnames = OrderedDict([('f_split',None),('m_members',None),('f_day',None),('visitors',None),('purchasers',None),('demand',None), ('dmd_per_mem',None),('visitors_per_mem',None),('purchasers_per_visitor',None),('dmd_per_purchaser',None)])
  writeToCsv(results, FILE_NAME, ordered_fieldnames, False) 

  # Backup current data
  backupfilename = "data_bk-" + time.strftime("%y-%m-%d") + ".csv"
  call(['cp','../data/data.csv',backupfilename])

  # Concatenate new results to data
  with open("../data/data.csv", "ab") as outfile:
    with open("results.csv","rb") as infile:
      line = infile.read()
      outfile.write(line)