PySpark inner join fails to resolve what is clearly present...

Tags: pyspark, apache-spark-sql, pyspark-dataframes

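I have two PySpark DataFrames, tsteval and top_rec. I am trying to create a new DataFrame, top_rec_tckts, that keeps only the records in tsteval whose storeid and tz_brand_id match those in top_rec, so that I can get the storeid and ticketid of those records from tsteval. Sample output from both DataFrames is below; both have storeid and tz_brand_id fields. I don't understand why I get the error below when I try to filter tsteval with an inner join. Does anyone know what the problem is, or can you suggest another way to accomplish this? Sorry, I had to cut out a large part of the error message below to make it fit. I kept the beginning and the end, which I hope gives enough clues about what is going on.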

tsteval.show(truncate=False)

print('')
top_rec.show(truncate=False)
Sample data:

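+-----------+-------+---+----------+-------------------+------------------------------------+------------+-----------+----------+----------+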
|tz_brand_id|storeid|qty|dateclosed|grossreceipts      |ticketid                            |current_date|filter_date|min_dt    |max_dt    |
+-----------+-------+---+----------+-------------------+------------------------------------+------------+-----------+----------+----------+
|2847       |87     |1.0|2020-06-15|21.1453375         |02c8ec06-a75a-4dd2-89e2-dbbf1dxxxxxx|2020-07-15  |2020-03-17 |2020-03-17|2020-05-16|
|2847       |87     |1.0|2020-05-23|21.1453375         |67a34306-6608-4b00-bf72-f1f42xxxxxx|2020-07-15  |2020-03-17 |2020-03-17|2020-05-16|
|2847       |87     |1.0|2020-05-19|26.129683025000002 |82665853-66ad-4e52-851e-f1cdf8xxxxxx|2020-07-15  |2020-03-17 |2020-03-17|2020-05-16|
|3285       |127    |1.0|2020-06-02|20.642125          |d0898233-64b3-48d8-9a46-a03eefxxxxxx|2020-07-15  |2020-03-17 |2020-03-17|2020-05-16|
|3285       |127    |1.0|2020-05-22|20.642125          |941d2889-230f-4a19-9cb9-90f7b2xxxxxx|2020-07-15  |2020-03-17 |2020-03-17|2020-05-16|
|2747       |77     |1.0|2020-05-30|21.3902            |72c3c7dd-a436-45ae-9adb-f19618xxxxxx|2020-07-15  |2020-03-17 |2020-03-17|2020-05-16|
|9601       |85     |1.0|2020-05-30|23.0               |74328e66-6371-4323-bdf9-057d2xxxxxx|2020-07-15  |2020-03-17 |2020-03-17|2020-05-16|
|9601       |85     |1.0|2020-05-29|20.7               |997ab6b3-b4b5-48e4-884d-00834xxxxxx|2020-07-15  |2020-03-17 |2020-03-17|2020-05-16|
+-----------+-------+---+----------+-------------------+------------------------------------+------------+-----------+----------+----------+
only showing top 20 rows


+-------+----------+-----------+
|storeid|max_dt    |tz_brand_id|
+-------+----------+-----------+
|127    |2020-05-16|2799       |
|127    |2020-05-16|2799       |
|127    |2020-05-16|2799       |
|127    |2020-05-16|2799       |
|127    |2020-05-16|2799       |
|127    |2020-05-16|2799       |
|127    |2020-05-16|2799       |
|127    |2020-05-16|2799       |
|127    |2020-05-16|2799       |
|127    |2020-05-16|2799       |
|127    |2020-05-16|2799       |
|127    |2020-05-16|2799       |
|127    |2020-05-16|2799       |
|127    |2020-05-16|2799       |
|127    |2020-05-16|2799       |
|127    |2020-05-16|2799       |
|127    |2020-05-16|2799       |
|127    |2020-05-16|2799       |
|127    |2020-05-16|2799       |
|127    |2020-05-16|2799       |
+-------+----------+-----------+
Code:

Error:

 An error was encountered:
    'Resolved attribute(s) max_dt#6786 missing from storeid#3445,qty#3375,max_dt#3299,min_dt#3289,grossreceipts#3381,filter_date#411,tz_brand_id#3449,ticketid#3387,dateclosed#3390,current_date#403 in operator !Filter ((dateclosed#3390 > min_dt#3289) && (dateclosed#3390 <= max_dt#6786)). Attribute(s) with the same name appear in the operation: max_dt. Please check if the right attribute(s) are used.;;\nJoin Inner, ((storeid#292 = storeid#7081) && (tz_brand_id#296 = tz_brand_id#4560))\n:- SubqueryAlias `a`\n:  +- Filter ((dateclosed#237 > max_dt#3299) && (dateclosed#237 <= date_add(max_dt#3299, 30)))\n:     +- Project [tz_brand_id#296, storeid#292, qty#222, dateclosed#237, grossreceipts#228, ticketid#234, current_date#403, filter_date#411, min_dt#3289, date_add(filter_date#411, 60) AS max_dt#3299]\n:        +- Project [tz_brand_id#296, storeid#292, qty#222, dateclosed#237, grossreceipts#228, ticketid#234, current_date#403, filter_date#411, date_add(filter_date#411, 0) AS min_dt#3289]\n:           +- Filter (dateclosed#237 > filter_date#411)\n:              +- Project [tz_brand_id#296, storeid#292, qty#222, dateclosed#237, grossreceipts#228, ticketid#234, current_date#403, date_add(current_date#403, -120) AS filter_date#411]\n:                 +- Filter storeid#292 IN (85,130,77,127,87)\n:                    +- Project [tz_brand_id#296, storeid#292, qty#222, dateclosed#237, grossreceipts#228, ticketid#234, to_date(cast(unix_timestamp(2020-07-15 21:17:18, yyyy-MM-dd, None) as timestamp), None) AS current_date#403]\n:                       +- Filter isnotnull(tz_brand_id#296)\n:                          +- Filter NOT (storeid#292 = 230)\n:                             +- Project [tz_brand_id#296, storeid#292, qty#222, dateclosed#237, grossreceipts#228, ticketid#234]\n:                                +- Filter (producttype#211 = EDIBLE)\n:                                   +- LogicalRDD [cbd_perc#199, thc_perc#200, register#201, customer_type#202, type#203, 
customer_state#204, customer_city#205, zip_code#206, age#207, age_group#208, cashier#209, approver#210, producttype#211, productsubtype#212, productattributes#213, productbrand#214, productname#215, classification#216, tier#217, weight#218, unitofmeasure#219, size#220, priceunit#221, qty#222, ... 75 more fields], false\n+- SubqueryAlias `b`\n   +- Project [storeid#7081, max_dt#6786, tz_brand_id#4560]\n      +- Project [storeid#7081, max_dt#6786, tz_brand_id#7085, prediction#4548, tz_brand_id#4560, tz_brand_id#4560]\n         +- Window [first(tz_brand_id#7085, true) windowspecdefinition(storeid#7081, max_dt#6786, prediction#4548 DESC NULLS LAST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) AS tz_brand_id#4560], [storeid#7081, max_dt#6786], [prediction#4548 DESC NULLS LAST]\n            +- Project [storeid#7081, max_dt#6786, tz_brand_id#7085, prediction#4548]\n               +- Filter AtLeastNNulls(n, prediction#4548)\n                  +- Project [storeid#7081, tz_brand_id#7085, max_dt#6786, accepted_date#4366, UDF(features#4493, features#4503) AS prediction#4548]\n                     +- Join LeftOuter, (UDF(tz_brand_id#7085) = id#4502)\n                        :- Join LeftOuter, (UDF(storeid#7081) = id#4492)\n                        :  :- Deduplicate [storeid#7081, tz_brand_id#7085, max_dt#6786, accepted_date#4366]\n                        :  :  +- Filter ((cast(max_dt#6786 as timestamp) < accepted_date#4366) || isnull(accepted_date#4366))\n                        :  :     +- Project [storeid#7081, tz_brand_id#7085, max_dt#6786, accepted_date#4366]\n                        :  :        +- Join LeftOuter, ((storeid#7081 = storeid#4102) && (tz_brand_id#7085 = tz_brand_id#4103))\n                        :  :           :- SubqueryAlias `a`\n                        :  :           :  +- Join Cross\n                        :  :           :     :- Project [storeid#7081, max_dt#6786]\n                        :  :           :     :  +- Project 
[storeid#7081, max_dt#6786]\n                        :  :           :     :     +- Project [tz_brand_id#7085, min_dt#3289, max_dt#6786, coalesce((brand_qty#3346 / total_qty#3326), cast(0 as double)) AS norm_qty#3472, storeid#7081]\n                        :  :           :     :        +- Join LeftOuter, (storeid#7081 = storeid#3445)\n                        :  :           :     :           :- SubqueryAlias `a`\n                        :  :           :     :           :  +- Project [storeid#7081, min_dt#3289, max_dt#6786, tz_brand_id#7085, sum(qty)#3339 AS brand_qty#3346]\n                        :  :           :     :           :     +- Aggregate [storeid#7081, min_dt#3289, max_dt#6786, tz_brand_id#7085], [storeid#7081, min_dt#3289, max_dt#6786, tz_brand_id#7085, sum(qty#7011) AS sum(qty)#3339]\n                        :  :           :     :           :        +- Filter ((dateclosed#7026 > min_dt#3289) && (dateclosed#7026 <= max_dt#6786))\n                        :  :           :     :           :           +- Project [tz_brand_id#7085, storeid#7081, qty#7011, dateclosed#7026, grossreceipts#7017, ticketid#7023, current_date#403, filter_date#411, min_dt#3289, date_add(filter_date#411, 60) AS max_dt#6786]\n                        :  :           :     :           :              +- Project [tz_brand_id#7085, storeid#7081, qty#7011, dateclosed#7026, grossreceipts#7017, ticketid#7023, current_date#403, filter_date#411, date_add(filter_date#411, 0) AS min_dt#3289]\n                        :  :           :     :           :                 +- Filter (dateclosed#7026 > filter_date#411)\n                        :  :           :     :           :                    +- Project [tz_brand_id#7085, storeid#7081, qty#7011, dateclosed#7026, grossreceipts#7017, ticketid#7023, current_date#403, date_add(current_date#403, -120) AS filter_date#411]\n                        :  :           :     :           :                       +- Filter storeid#7081 IN (85,130,77,127,87)\n            
            :  :           :     :           :                          +- Project [tz_brand_id#7085, storeid#7081, qty#7011, dateclosed#7026, grossreceipts#7017, ticketid#7023, to_date(cast(unix_timestamp(2020-07-15 21:17:18, yyyy-MM-dd, None) as timestamp), None) AS current_date#403]\n                        :  :           :     :           :                             +- Filter isnotnull(tz_brand_id#7085)\n                        :  :           :     :           :                                +- Filter NOT (storeid#7081 = 230)\n                        :  :           :     :           :                                   +- Project [tz_brand_id#7085, storeid#7081, qty#7011, dateclosed#7026, grossreceipts#7017, ticketid#7023]\n                        :  :           :     :           :                                      +- Filter (producttype#7000 = EDIBLE)\n                        :  :           :     :           :                                         +- LogicalRDD [cbd_perc#6988, thc_perc#6989, register#6990, customer_type#6991, type#6992, customer_state#6993, customer_city#6994, zip_code#6995, age#6996, age_group#6997, cashier#6998, approver#6999, producttype#7000, productsubtype#7001, productattributes#7002, productbrand#7003, productname#7004, classification#7005, tier#7006, weight#7007, unitofmeasure#7008, size#7009, priceunit#7010, qty#7011, ... 
75 more fields], false\n                        :  :           :     :           +- SubqueryAlias `b`\n                        :  :           :     :              +- Project [storeid#3445, sum(qty)#3320 AS total_qty#3326]\n                        :  :           :     :                 +- !Aggregate [storeid#3445, min_dt#3289, max_dt#6786], [storeid#3445, min_dt#3289, max_dt#6786, sum(qty#3375) AS sum(qty)#3320]\n                        :  :           :     :                    +- !Filter ((dateclosed#3390 > min_dt#3289) && (dateclosed#3390 <= max_dt#6786))\n                        :  :           :     :                       +- Project [tz_brand_id#3449, storeid#3445, qty#3375, dateclosed#3390, grossreceipts#3381, ticketid#3387, current_date#403, filter_date#411, min_dt#3289, date_add(filter_date#411, 60) AS max_dt#3299]\n                        :  :           :     :                          +- Project [tz_brand_id#3449, storeid#3445, [tz_brand_id#7085, max_dt#6786]\n                        :  :           :           +- Project [tz_brand_id#7085, min_dt#3289, max_dt#6786, coalesce((brand_qty#3346 / total_qty#3326), cast(0 as double)) AS norm_qty#3472, storeid#7081]\n                        :  :           :              +- Join LeftOuter, (storeid#7081 = storeid#3445)\n                        :  :           :                 :- SubqueryAlias `a`\n                        :  :           :                 :  +- Project [storeid#7081, min_dt#3289, max_dt#6786, tz_brand_id#7085, sum(qty)#3339 AS brand_qty#3346]\n                        :  :           :                 :     +- Aggregate [storeid#7081, min_dt#3289, max_dt#6786, tz_brand_id#7085], [storeid#7081, min_dt#3289, max_dt#6786, tz_brand_id#7085, sum(qty#7011) AS sum(qty)#3339]\n                        :  :           :                 :        +- Filter ((dateclosed#7026 > min_dt#3289) && (dateclosed#7026 <= max_dt#6786))\n                        :  :           :                 :           +- Project 
[tz_brand_id#7085, storeid#7081, qty#7011, dateclosed#7026, grossreceipts#7017, ticketid#7023, current_date#403, filter_date#411, min_dt#3289, date_add(filter_date#411, 60) AS max_dt#6786]\n                        :  :           :                 :              +- Project [tz_brand_id#7085, storeid#7081, qty#7011, dateclosed#7026, grossreceipts#7017, ticketid#7023, current_date#403, filter_date#411, date_add(filter_date#411, 0) AS min_dt#3289]\n                        :  :           :                 :                 +- Filter (dateclosed#7026 > filter_date#411)\n                        :  :           :                 :                    +- Project [tz_brand_id#7085, storeid#7081, qty#7011, dateclosed#7026, grossreceipts#7017, ticketid#7023, current_date#403, date_add(current_date#403, -120) AS filter_date#411]\n                        :  :           :                 :                       +- Filter storeid#7081 IN (85,130,77,127,87)\n                        :  :           :                 :                          +- Project [tz_brand_id#7085, storeid#7081, qty#7011, dateclosed#7026, grossreceipts#7017, ticketid#7023, to_date(cast(unix_timestamp(2020-07-15 21:17:18, yyyy-MM-dd, None) as timestamp), None) AS current_date#403]\n                        :  :           :                 :                             +- Filter isnotnull(tz_brand_id#7085)\n                        :  :           :                 :                                +- Filter NOT (storeid#7081 = 230)\n                        :  :           :                 :                                   
Qty#3948, ... 10 more fields], false\n                        :  +- Project [_1#4489 AS id#4492, _2#4490 AS features#4493]\n                        :     +- SerializeFromObject [assertnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1 AS _1#4489, staticinvoke(class org.apache.spark.sql.catalyst.expressions.UnsafeArrayData, ArrayType(FloatType,false), fromPrimitiveArray, assertnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2, true, false) AS _2#4490]\n                        :        +- ExternalRDD [obj#4488]\n                        +- Project [_1#4499 AS id#4502, _2#4500 AS features#4503]\n                           +- SerializeFromObject [assertnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1 AS _1#4499, staticinvoke(class org.apache.spark.sql.catalyst.expressions.UnsafeArrayData, ArrayType(FloatType,false), fromPrimitiveArray, assertnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2, true, false) AS _2#4500]\n                              +- ExternalRDD [obj#4498]\n'
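The first sentence of the message already names the symptom: the failing `!Filter` refers to `max_dt#6786`, while its input only provides `max_dt#3299`, so one column name is carried under two different expression IDs. In a plan this long it can help to list every name that appears with more than one ID. The following is a minimal sketch using only the standard library; the function name `conflicting_attributes` and the regular expression are illustrative assumptions, not part of Spark, and generated names such as `sum(qty)#3339` are deliberately not matched.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches plan attributes of the form name#id, e.g. max_dt#6786.
ATTRIBUTE = re.compile(r"([A-Za-z_]\w*)#(\d+)")


def conflicting_attributes(message: str) -> Dict[str, List[int]]:
    """Map each attribute name to its expression IDs when it has more than one."""
    ids_by_name = defaultdict(set)
    for name, expr_id in ATTRIBUTE.findall(message):
        ids_by_name[name].add(int(expr_id))
    return {
        name: sorted(ids)
        for name, ids in ids_by_name.items()
        if len(ids) > 1
    }


# Excerpt taken from the start of the error message above.
excerpt = (
    "Resolved attribute(s) max_dt#6786 missing from storeid#3445,qty#3375,"
    "max_dt#3299,min_dt#3289 in operator !Filter ((dateclosed#3390 > "
    "min_dt#3289) && (dateclosed#3390 <= max_dt#6786))."
)
print(conflicting_attributes(excerpt))  # {'max_dt': [3299, 6786]}
```

Run on the full message, the helper also reports names such as storeid and tz_brand_id, which legitimately differ between the two sides of the join, so the output is a list of candidates to inspect rather than a diagnosis.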
    from pyspark.sql import functions as F

    # `sc` is assumed to be an existing SparkContext (the pyspark shell defines one).
    tsteval = sc.parallelize([
        (2847, 87, 1.0, "2020-06-15", 21.1453375, "02c8ec06-a75a-4dd2-89e2-dbbf1dxxxxxx", "2020-05-16"),
        (3285, 127, 1.0, "2020-06-02", 20.642125, "941d2889-230f-4a19-9cb9-90f7b2xxxxxx", "2020-05-16"),
        (2799, 127, 1.0, "2020-06-03", 23.642125, "997ab6b3-b4b5-48e4-884d-00834xxxxxx", "2020-05-16")
    ]).toDF(["tz_brand_id", "storeid", "qty", "dateclosed", "grossreceipts ", "ticketid", "max_dt "])


    # Rename storeid on the left side so it does not clash with top_rec.storeid after the join.
    tsteval_rn = tsteval.withColumnRenamed("storeid", "storeid_a")

    tsteval_rn.show()

    # +-----------+---------+---+----------+--------------+--------------------+----------+
    # |tz_brand_id|storeid_a|qty|dateclosed|grossreceipts |            ticketid|   max_dt |
    # +-----------+---------+---+----------+--------------+--------------------+----------+
    # |       2847|       87|1.0|2020-06-15|    21.1453375|02c8ec06-a75a-4dd...|2020-05-16|
    # |       3285|      127|1.0|2020-06-02|     20.642125|941d2889-230f-4a1...|2020-05-16|
    # |       2799|      127|1.0|2020-06-03|     23.642125|997ab6b3-b4b5-48e...|2020-05-16|
    # +-----------+---------+---+----------+--------------+--------------------+----------+



    top_rec = sc.parallelize([
        (127, "2020-05-16", 2799), (127, "2020-05-16", 2799)
    ]).toDF(["storeid", "date", "tz_brand_id"])

    top_rec.show()

    # +-------+----------+-----------+
    # |storeid|      date|tz_brand_id|
    # +-------+----------+-----------+
    # |    127|2020-05-16|       2799|
    # |    127|2020-05-16|       2799|
    # +-------+----------+-----------+

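    # Inner join on both keys; tz_brand_id exists on both sides, so it is referenced through its DataFrame.
    df3 = tsteval_rn.join(
        top_rec,
        (tsteval_rn.storeid_a == top_rec.storeid) & (tsteval_rn.tz_brand_id == top_rec.tz_brand_id),
        how='inner'
    )

    # top_rec holds repeated rows, so drop the duplicates the join produces.
    df3.select(F.col('storeid_a').alias("storeid"), 'ticketid').dropDuplicates().show(truncate=False)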

    # +-------+-----------------------------------+
    # |storeid|ticketid                           |
    # +-------+-----------------------------------+
    # |127    |997ab6b3-b4b5-48e4-884d-00834xxxxxx|
    # +-------+-----------------------------------+
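Because the result only needs columns from tsteval, a left semi join is another way to write the same filter. It keeps each left-hand row that has at least one match and returns no columns from top_rec, so neither the rename nor the repeated rows in top_rec matter. The following is a minimal sketch, assuming both DataFrames use the key column names storeid and tz_brand_id; the function name `tickets_for_top_recs` is hypothetical. It is a different way to express the filter and is not guaranteed to avoid the "Resolved attribute(s) ... missing" error when both DataFrames derive from the same source, as they appear to in the plan above.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # needed for the type hints only, not at runtime
    from pyspark.sql import DataFrame


def tickets_for_top_recs(tsteval: "DataFrame", top_rec: "DataFrame") -> "DataFrame":
    """Return distinct (storeid, ticketid) pairs of tsteval rows that match top_rec."""
    keys = ["storeid", "tz_brand_id"]
    # left_semi keeps only tsteval's columns and never multiplies rows,
    # even if top_rec contains the same key pair many times.
    matched = tsteval.join(top_rec, on=keys, how="left_semi")
    # A ticket can span several rows, so deduplicate the selected pair.
    return matched.select("storeid", "ticketid").dropDuplicates()
```

With the sample frames above it would be called as `tickets_for_top_recs(tsteval, top_rec).show(truncate=False)`, passing the original tsteval rather than the renamed tsteval_rn.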