How to loop over macros in PySpark the way you would in SAS?


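I want to run the same code for different sets of macro variables (as you would with macros in SAS) and then append all of the resulting tables together. Coming from a SAS background, I am quite confused about how to do this in a PySpark environment. Any help is much appreciated.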

Sample code is below:

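Step 1: Define the macro variables.
Step 2: Loop the code through the various macro variables.
Step 3: Append each dataset populated above to the base table.

Hi Pushkr, thanks. Can I also use string values in the list? I mean, can it be something like [['a', 'b', 'c'], [1, 2, 'x']]?

Yes, you can use strings.

Can I also define a macro variable separately and reference it in the array, for example: a = """case when spend>0 then 1 else 0 end""" and then [[a, 1, 2], [a, 2, 4]]?

I think you can. For example, you can write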

a=1 if spend>0 else 0

and then use it in the array/list:

macroVar = [[a, 1, 2], [a, 2, 4]]
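To make the comment exchange above concrete, here is a minimal sketch of one way to carry a SQL fragment through the list. It assumes the fragment stays a plain Python string (rather than being evaluated as a Python expression) and is substituted into the query text with str.format. The names flag_expr, macro_vars and has_spend are hypothetical and do not come from the original thread; tableA is reused from the question.

```python
# Hypothetical SQL fragment kept as a plain Python string (the "macro variable").
flag_expr = "case when spend > 0 then 1 else 0 end"

# Each inner list mixes the string fragment with numeric start/end week ids.
macro_vars = [[flag_expr, 201615, 201622], [flag_expr, 201715, 201722]]

queries = []
for expr, start, end in macro_vars:
    # Substitute the fragment and the numeric bounds into the query text.
    query = (
        "select customer_code, {expr} as has_spend "
        "from tableA where week_id between {start} and {end}"
    ).format(expr=expr, start=start, end=end)
    queries.append(query)

# Inspect the generated SQL before handing it to sqlContext.sql(...)
for q in queries:
    print(q)
```

Each generated string could then be passed to sqlContext.sql in the same way as a hand-written query. Note that this is plain text substitution, so it is only suitable for values you control yourself.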

@Pushkr, how would I extend this if I have multiple queries, for example if I want to create a query that builds on the customer_spend result?

lastyear_st=201615
lastyear_end=201622

thisyear_st=201715
thisyear_end=201722

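# Pseudocode, not valid Python: the intent is to run the same query
# once for each (start, end) pair defined above.
customer_spend=sqlContext.sql("""
select a.customer_code,
sum(case when a.week_id between %d and %d then a.spend else 0 end) as spend
from tableA a
group by a.customer_code
"""
%(lastyear_st,lastyear_end)
(thisyear_st,thisyear_end))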

import os
import tempfile

# macroVars holds the start and end values as a list of lists,
# where each inner list contains one start value and one end value.
macroVars = [[201615, 201622], [201715, 201722]]

# Choose the output location once, so that every iteration appends to the same place.
# A temporary directory is used here only as an example.
output_path = os.path.join(tempfile.mkdtemp(), 'data')

# loop through the list of lists
for start, end in macroVars:

    # prepare the query using the values of start and end
    query = """
        SELECT a.customer_code,
               SUM(CASE WHEN a.week_id BETWEEN {} AND {}
                        THEN a.spend ELSE 0 END) AS spend
        FROM tableA a
        GROUP BY a.customer_code
    """.format(start, end)

    # execute the query
    customer_spend = sqlContext.sql(query)

    # depending on your base table setup, use the appropriate write command, for example:
    customer_spend \
        .write.mode('append') \
        .parquet(output_path)
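If the goal is a single combined DataFrame rather than appended files, the loop above can also be sketched as "collect one result per pair, then union them". This is an alternative, not what the answer itself does. The helper names QUERY_TEMPLATE, build_queries and run_and_union are hypothetical. The query runner is passed in as an argument, so the snippet itself does not import Spark; in a real session it would be sqlContext.sql.

```python
from functools import reduce

# Hypothetical template; named placeholders make the substitution explicit.
QUERY_TEMPLATE = (
    "SELECT a.customer_code, "
    "SUM(CASE WHEN a.week_id BETWEEN {start} AND {end} "
    "THEN a.spend ELSE 0 END) AS spend "
    "FROM tableA a GROUP BY a.customer_code"
)


def build_queries(macro_vars):
    """Return one SQL string per (start, end) pair."""
    return [QUERY_TEMPLATE.format(start=s, end=e) for s, e in macro_vars]


def run_and_union(run_query, macro_vars):
    """Run every query with run_query and stack the results with .union()."""
    frames = [run_query(q) for q in build_queries(macro_vars)]
    # Fold the list pairwise: ((f1 union f2) union f3) ...
    return reduce(lambda left, right: left.union(right), frames)
```

Assuming sqlContext and tableA exist as in the question, a call could look like combined = run_and_union(sqlContext.sql, [[201615, 201622], [201715, 201722]]). DataFrame.union is available from Spark 2.0 onward; on Spark 1.x the equivalent method is unionAll.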