Java: iterating over custom objects in Spark


My program contains the following code.

finalJoined is of type Dataset<Row>.

RuleParams and RuleOutputParams are Java POJO classes with setters and getters.

I am calling the Drools rule engine with the code below:

        List<Row> finalList = finalJoined.collectAsList();
        List<RuleOutputParams> ruleOutputParamsList = new ArrayList<RuleOutputParams>();
        Dataset<RuleOutputParams> rulesParamDS = null;
        Iterator<Row> iterator = finalList.iterator();
        while (iterator.hasNext()) {
            Row row = iterator.next();
            RuleParams ruleParams = new RuleParams();
            String outputString = (String) row.get(1);
            // setting up parameters
            System.out.println("Value of TXN DTTM is : " + row.getString(0));
            ruleParams.setTxnDttm(row.getString(0));
            ruleParams.setCisDivision(row.getString(1));
            System.out.println("Division is  : " + ruleParams.getCisDivision());
            ruleParams.setTxnVol(row.getInt(2));
            System.out.println("TXN Volume is  : " + ruleParams.getTxnVol());
            ruleParams.setTxnAmt(row.getInt(3));
            System.out.println("TXN Amount is  : " + ruleParams.getTxnAmt());
            ruleParams.setCurrencyCode(row.getString(4));
            ruleParams.setAcctNumberTypeCode(row.getString(5));
            ruleParams.setAccountNumber(row.getLong(6));
            ruleParams.setUdfChar1(row.getString(7));
            System.out.println("UDF Char1 is : " + ruleParams.getUdfChar1());
            ruleParams.setUdfChar2(row.getString(8));
            ruleParams.setUdfChar3(row.getString(9));
            ruleParams.setAccountId(row.getLong(10));
            kSession.insert(ruleParams);
            int output = kSession.fireAllRules();

            System.out.println("fireAllRules output: " + output);
            System.out.println("After firing  rules");
            System.out.println(ruleParams.getPriceItemParam1());
            System.out.println(ruleParams.getCisDivision());
            // generating output objects depending on the size of priceitems
            // derived.
            System.out.println("No. of priceitems derived : " + ruleParams.getPriceItemCd().size());
            for (int index = 0; index < ruleParams.getPriceItemCd().size(); index++) {

                System.out.println("Inside a for loop");

                RuleOutputParams ruleOutputParams = new RuleOutputParams();

                ruleOutputParams.setTxnDttm(ruleParams.getTxnDttm());
                ruleOutputParams.setCisDivision(ruleParams.getCisDivision());
                ruleOutputParams.setTxnVol(ruleParams.getTxnVol());
                ruleOutputParams.setTxnAmt(ruleParams.getTxnAmt());
                ruleOutputParams.setCurrencyCode(ruleParams.getCurrencyCode());
                ruleOutputParams.setAcctNumberTypeCode(ruleParams.getAcctNumberTypeCode());
                ruleOutputParams.setAccountNumber(ruleParams.getAccountNumber());
                ruleOutputParams.setAccountId(ruleParams.getAccountId());
                ruleOutputParams.setPriceItemCd(ruleParams.getPriceItemCd().get(index));
                System.out.println(ruleOutputParams.getPriceItemCd());
                ruleOutputParams.setPriceItemParam(ruleParams.getPriceItemParams().get(index));
                System.out.println(ruleOutputParams.getPriceItemParam());
                ruleOutputParams.setPriceItemParamCode(ruleParams.getPriceItemParamCodes().get(index));
                ruleOutputParams.setProcessingDate(new SimpleDateFormat("yyyy-MM-dd").format(new Date()));
                ruleOutputParams.setUdfChar1(ruleParams.getUdfChar1());
                ruleOutputParams.setUdfChar2(ruleParams.getUdfChar2());
                ruleOutputParams.setUdfChar3(ruleParams.getUdfChar3());

                ruleOutputParamsList.add(ruleOutputParams);
            }
        }
        System.out.println("Size of ruleOutputParamsList is : " + ruleOutputParamsList.size());
        Encoder<RuleOutputParams> rulesOutputParamEncoder = Encoders.bean(RuleOutputParams.class);
        rulesParamDS = sparkSession.createDataset(Collections.unmodifiableList(ruleOutputParamsList),
                rulesOutputParamEncoder);
        rulesParamDS.show();
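I use a while loop and a for loop in this code.

Can it be rewritten using Spark's map, flatMap or foreach functions? How would I do that?

The problem is that the Drools rule engine is invoked sequentially, one row at a time. I want to execute it in parallel.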


Edit: as the code above shows, I first convert the DataFrame into a List and then iterate over it. Can I work on the DataFrame or RDD directly instead?
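In Spark's Java API, Dataset.flatMap(FlatMapFunction<T, U>, Encoder<U>) takes a function whose call(T) returns an Iterator<U>, so the inner for loop above (one RuleOutputParams per derived price item) already has the right shape. The stdlib-only sketch below isolates that per-row function; Input and Output are hypothetical, trimmed-down stand-ins for RuleParams and RuleOutputParams, and the Drools call is left out. In real Spark code, the body of expand would go into the FlatMapFunction and Encoders.bean(RuleOutputParams.class) would serve as the encoder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class FlatMapShapeDemo {
    // Hypothetical, trimmed-down stand-in for RuleParams after the rules have fired.
    static final class Input {
        final String cisDivision;
        final List<String> priceItemCd; // one entry per derived price item

        Input(String cisDivision, List<String> priceItemCd) {
            this.cisDivision = cisDivision;
            this.priceItemCd = priceItemCd;
        }
    }

    // Hypothetical, trimmed-down stand-in for RuleOutputParams.
    static final class Output {
        final String cisDivision;
        final String priceItemCd;

        Output(String cisDivision, String priceItemCd) {
            this.cisDivision = cisDivision;
            this.priceItemCd = priceItemCd;
        }

        @Override
        public String toString() {
            return cisDivision + "/" + priceItemCd;
        }
    }

    // One input in, zero or more outputs out: the contract of Spark's
    // FlatMapFunction<T, U>, whose call(T) returns an Iterator<U>.
    static Iterator<Output> expand(Input in) {
        List<Output> out = new ArrayList<>();
        for (String cd : in.priceItemCd) {
            out.add(new Output(in.cisDivision, cd));
        }
        return out.iterator();
    }

    // Local, sequential analogue of Dataset.flatMap: apply expand to every
    // element and concatenate the results.
    static List<Output> flatMapAll(List<Input> inputs) {
        List<Output> result = new ArrayList<>();
        for (Input in : inputs) {
            expand(in).forEachRemaining(result::add);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Input> rows = Arrays.asList(
                new Input("D1", Arrays.asList("P1", "P2")),
                new Input("D2", Collections.emptyList()),
                new Input("D3", Arrays.asList("P3")));
        System.out.println(flatMapAll(rows)); // [D1/P1, D1/P2, D3/P3]
    }
}
```

A row whose rules derive no price items (D2 here) simply produces no output rows, which is what the for loop in the question does as well.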

Here is a very simple demo from my own tests, using parallelStream and CompletableFuture.

For parallelStream:

int parallelGet() {
    return IntStream.rangeClosed(0, TOP).parallel().map(i -> getIoBoundNumber(i)).sum();
}
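The snippet above relies on TOP and getIoBoundNumber, which the answer does not define. A self-contained version, assuming a hypothetical getIoBoundNumber that simulates blocking I/O with a short sleep and then returns its argument, and an arbitrary TOP of 20:

```java
import java.util.stream.IntStream;

class ParallelStreamDemo {
    // Hypothetical upper bound; not given in the original answer.
    static final int TOP = 20;

    // Hypothetical stand-in for an I/O-bound call: blocks briefly, then returns i.
    static int getIoBoundNumber(int i) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return i;
    }

    // Runs getIoBoundNumber for 0..TOP on the common ForkJoinPool via a
    // parallel stream and sums the results.
    static int parallelGet() {
        return IntStream.rangeClosed(0, TOP).parallel().map(i -> getIoBoundNumber(i)).sum();
    }

    public static void main(String[] args) {
        System.out.println(parallelGet()); // 0 + 1 + ... + 20 = 210
    }
}
```

Parallel streams run on the shared common ForkJoinPool, whose size is tied to the number of CPU cores, so for blocking work the achievable concurrency is limited by that pool.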
For CompletableFuture:

int concurrencyGetBasic() {
    List<CompletableFuture<Integer>> futureList = IntStream.rangeClosed(0, TOP).boxed()
            .map(i -> CompletableFuture.supplyAsync(() -> getIoBoundNumber(i)))
            .collect(Collectors.toList());
    return futureList.stream().map(CompletableFuture::join).reduce(0, Integer::sum);
}
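CompletableFuture.supplyAsync(Supplier) also runs on the common ForkJoinPool unless an Executor is passed. A self-contained sketch using the two-argument supplyAsync(Supplier, Executor) overload with a dedicated fixed thread pool, again assuming the hypothetical TOP and getIoBoundNumber stand-ins (the thread count is an arbitrary parameter):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class CompletableFutureDemo {
    // Hypothetical upper bound; not given in the original answer.
    static final int TOP = 20;

    // Hypothetical stand-in for an I/O-bound call: blocks briefly, then returns i.
    static int getIoBoundNumber(int i) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return i;
    }

    // Same idea as concurrencyGetBasic, but the tasks run on a dedicated pool
    // instead of the common ForkJoinPool.
    static int concurrencyGetWithExecutor(int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Collecting into a list first submits every task before any join() blocks.
            List<CompletableFuture<Integer>> futureList = IntStream.rangeClosed(0, TOP).boxed()
                    .map(i -> CompletableFuture.supplyAsync(() -> getIoBoundNumber(i), pool))
                    .collect(Collectors.toList());
            return futureList.stream().map(CompletableFuture::join).reduce(0, Integer::sum);
        } finally {
            pool.shutdown(); // lets the worker threads exit once the tasks are done
        }
    }

    public static void main(String[] args) {
        System.out.println(concurrencyGetWithExecutor(8)); // 210
    }
}
```

Note that both demos parallelize work inside a single JVM (the Spark driver, if used there); they do not distribute rows across Spark executors.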
For more tutorials, check out … and ….

As mentioned, finalJoined is already a Dataset, so there is no need to collect it to the driver. You can write something like the following:

This is the base Dataset containing the data:

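Dataset<Row> finalJoined;

public void droolprocess(Row row) {
    // Put the whole body of the while loop here (everything except the iterator).
    // It will be executed in parallel on the workers.

    // Pass in connection parameters as needed, or obtain a new connection here.
}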

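Then call it for every row; the calls run in parallel on the executors:

    finalJoined.foreach((ForeachFunction<Row>) row -> droolprocess(row));

If you want to get a return value from processing each row, use map instead of foreach.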
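When each row needs an expensive resource (the connection mentioned in droolprocess, or a rules session), a common Spark alternative is Dataset.mapPartitions(MapPartitionsFunction<T, U>, Encoder<U>), which hands the function an Iterator over a whole partition so the resource is built once per partition instead of once per row. The stdlib-only sketch below imitates that locally under stated assumptions: RuleSession is a hypothetical stand-in for a Drools KieSession, its EVEN/ODD "rule" is made up, and partitions are simulated by chunking a list and processing each chunk on its own thread. A fresh session per partition also sidesteps a detail of the question's loop: facts inserted into a stateful session stay there until they are retracted, so a single shared session keeps accumulating every row inserted so far.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class PerPartitionDemo {
    // Counts session constructions, to show one per partition rather than one per row.
    static final AtomicInteger SESSIONS_CREATED = new AtomicInteger();

    // Hypothetical stand-in for an expensive rules session such as a KieSession.
    static final class RuleSession {
        RuleSession() {
            SESSIONS_CREATED.incrementAndGet();
        }

        // Hypothetical "rule": classifies a value as EVEN or ODD.
        String evaluate(int value) {
            return value % 2 == 0 ? "EVEN" : "ODD";
        }
    }

    // Analogue of a MapPartitionsFunction: one session serves every row of the partition.
    static List<String> processPartition(List<Integer> partition) {
        RuleSession session = new RuleSession();
        List<String> out = new ArrayList<>();
        for (int value : partition) {
            out.add(value + ":" + session.evaluate(value));
        }
        return out;
    }

    // Splits rows into contiguous chunks (standing in for Spark partitions),
    // processes each chunk on its own thread, and keeps the input order.
    static List<String> run(List<Integer> rows, int partitions) throws Exception {
        List<List<Integer>> chunks = new ArrayList<>();
        int size = Math.max(1, (rows.size() + partitions - 1) / partitions);
        for (int start = 0; start < rows.size(); start += size) {
            chunks.add(rows.subList(start, Math.min(start + size, rows.size())));
        }
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<Integer> chunk : chunks) {
                futures.add(pool.submit(() -> processPartition(chunk)));
            }
            List<String> result = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                result.addAll(f.get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> rows = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            rows.add(i);
        }
        System.out.println(run(rows, 3));
        System.out.println("sessions created: " + SESSIONS_CREATED.get()); // 3
    }
}
```

With 10 rows and 3 partitions the chunks hold 4, 4 and 2 rows, so three sessions are created in total. In Spark, partitioning and scheduling are done for you; only the per-partition body would carry over.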