MySQL: batch-processing a file and syncing the differences with the database


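I am currently developing a Spring Boot application that periodically processes a file containing user data. Each line contains a userId and a departmentid separated by |, for example 123534 | 13. The file will contain several million records.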
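
As an illustration of the line format described above, here is a minimal parsing sketch. It assumes the separator is a single `|` with optional surrounding whitespace and that both fields are integers. The names `UserLineParser` and `parse` are hypothetical and do not come from the original post.

```java
// Hypothetical helper: parses one "userId | departmentId" line.
final class UserLineParser {

    private UserLineParser() {
    }

    /**
     * Returns {userId, departmentId} for a line such as "123534 | 13".
     * Throws IllegalArgumentException if the line does not have exactly
     * two fields or if a field is not a valid long.
     */
    static long[] parse(String line) {
        // The -1 limit keeps trailing empty fields, so "1|2|" is rejected too.
        String[] parts = line.split("\\|", -1);
        if (parts.length != 2) {
            throw new IllegalArgumentException(
                    "Expected 'userId | departmentId' but got: " + line);
        }
        try {
            long userId = Long.parseLong(parts[0].trim());
            long departmentId = Long.parseLong(parts[1].trim());
            return new long[] { userId, departmentId };
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Non-numeric field in line: " + line, e);
        }
    }
}
```

Rejecting malformed lines early keeps bad rows out of the later database steps. Whether to skip or abort on such a line is a design choice the post does not specify.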

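My requirement is to load this data into a MySQL database as follows:

If a user with the processed ID already exists, do nothing
If the user does not exist, create a new user
If a user is not in the file but exists in the database, delete it
If the department does not yet exist in the database, create it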
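
The user rules above amount to a set difference between the IDs in the file and the IDs in the database. The sketch below shows that reading of the rules. It assumes both ID sets fit in memory, which is an assumption about the deployment rather than something the post states. `UserDiff` and its members are hypothetical names.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: derives the users to create and to delete.
final class UserDiff {

    final Set<Long> toCreate; // in the file, not in the database
    final Set<Long> toDelete; // in the database, not in the file

    private UserDiff(Set<Long> toCreate, Set<Long> toDelete) {
        this.toCreate = Collections.unmodifiableSet(toCreate);
        this.toDelete = Collections.unmodifiableSet(toDelete);
    }

    /** IDs present on both sides need no action, so they appear in neither set. */
    static UserDiff between(Set<Long> idsInFile, Set<Long> idsInDatabase) {
        Set<Long> toCreate = new HashSet<>(idsInFile);
        toCreate.removeAll(idsInDatabase);

        Set<Long> toDelete = new HashSet<>(idsInDatabase);
        toDelete.removeAll(idsInFile);

        return new UserDiff(toCreate, toDelete);
    }
}
```

With this shape, the existing IDs can be fetched with a single query instead of one lookup per record.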

I have made some optimizations, such as:

Caching departments to populate the entities
Collecting users into batches and saving them with the JpaRepository saveAll method

But I am still making too many database calls: for every record I check whether the user exists in order to create the entity to save.

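My entities are fairly simple:

// javax.persistence here; on Spring Boot 3+ the package is jakarta.persistence
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "departaments")
public class Departament {

    @Id
    @Column(name = "id")
    private Long id;

    @Column(name = "name")
    private String name;

    // ...
}

and a second entity for users.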

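Has anyone run into a problem like this?

Can it be optimized further?

Are there any good patterns for this kind of processing?

Here are a few points: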

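For users, your main source of truth seems to be the CSV file. Why not simply truncate and recreate the USER table? You may run into some issues (I understand referential integrity is not one of the concerns in your scenario, or is it?), but you would get user deletion for free (TBH I am not quite sure how user deletion is handled in your current setup). It will also run much faster.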

Do you actually see a performance improvement when using saveAll? It does not limit the number of SELECT statements that get executed.

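Are you sure you are operating at the right level of abstraction? Maybe you could use plain JDBC instead of JPA. With JPA there is a lot of caching and mapping involved, which adds considerable overhead. With JDBC you can take advantage of MySQL's INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE statements to get what you need.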

If you go with any of the above, you could try a dedicated batch-processing tool to make the processing more declarative.
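
For the plain-JDBC suggestion in the points above, a minimal sketch of batched `INSERT IGNORE` might look like this. The table and column names (`users`, `id`, `department_id`) and the class name `UserBatchWriter` are assumptions, since the post does not show the user table. `INSERT IGNORE` skips rows whose key already exists, but it also downgrades some other errors to warnings, so treat this as a starting point rather than a complete solution.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical writer; table and column names are assumptions.
final class UserBatchWriter {

    // MySQL skips rows whose primary key already exists.
    static final String INSERT_IGNORE_SQL =
            "INSERT IGNORE INTO users (id, department_id) VALUES (?, ?)";

    private UserBatchWriter() {
    }

    /**
     * Sends {userId, departmentId} rows in JDBC batches of batchSize.
     * Returns the number of executeBatch() calls that were made.
     */
    static int insertAll(Connection connection, List<long[]> rows, int batchSize)
            throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        try (PreparedStatement statement = connection.prepareStatement(INSERT_IGNORE_SQL)) {
            int pending = 0;
            for (long[] row : rows) {
                statement.setLong(1, row[0]);
                statement.setLong(2, row[1]);
                statement.addBatch();
                if (++pending == batchSize) {
                    statement.executeBatch();
                    batches++;
                    pending = 0;
                }
            }
            // Flush the last, partially filled batch.
            if (pending > 0) {
                statement.executeBatch();
                batches++;
            }
        }
        return batches;
    }
}
```

With MySQL Connector/J, adding `rewriteBatchedStatements=true` to the JDBC URL lets the driver rewrite such batches into multi-row inserts, which usually cuts the number of round trips further.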

If it is a "replace", do this to avoid any downtime:

CREATE TABLE new LIKE real;
LOAD DATA INFILE ... (and any other massaging)
RENAME TABLE real TO old, new TO real;
DROP TABLE old;

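If it is a "delta", then:

Load it into a separate table, then run the appropriate SQL statements to apply the updates. That is roughly one SQL statement per bullet point in your question. No loops.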
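
The "delta" approach can be sketched as three set-based statements run in one transaction. Everything named here is an assumption: the staging table `users_staging(user_id, department_id)` is taken to be loaded already, `users(id, department_id)` is a guessed user table, and only `departaments` comes from the question's entity mapping. The class name `DeltaSync` is hypothetical as well.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; all table and column names are assumptions.
final class DeltaSync {

    // Order matters: departments must exist before users that reference them.
    static final List<String> STATEMENTS = Collections.unmodifiableList(Arrays.asList(
            // Create departments that are missing.
            "INSERT IGNORE INTO departaments (id) "
                    + "SELECT DISTINCT department_id FROM users_staging",
            // Create users that are missing; existing users are left untouched.
            "INSERT IGNORE INTO users (id, department_id) "
                    + "SELECT user_id, department_id FROM users_staging",
            // Delete users that are no longer in the file.
            "DELETE u FROM users u "
                    + "LEFT JOIN users_staging s ON s.user_id = u.id "
                    + "WHERE s.user_id IS NULL"));

    private DeltaSync() {
    }

    /** Runs all statements in a single transaction and rolls back on failure. */
    static void apply(Connection connection) throws SQLException {
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false);
        try (Statement statement = connection.createStatement()) {
            for (String sql : STATEMENTS) {
                statement.executeUpdate(sql);
            }
            connection.commit();
        } catch (SQLException e) {
            connection.rollback();
            throw e;
        } finally {
            connection.setAutoCommit(previousAutoCommit);
        }
    }
}
```

The rule "if the user exists, do nothing" needs no statement of its own, because `INSERT IGNORE` already skips those rows.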

Just try Spring Batch.

context.xml
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
        http://www.springframework.org/schema/beans 
        http://www.springframework.org/schema/beans/spring-beans-3.2.xsd">

    <!-- store job metadata in the database -->
    <bean id="jobRepository"
        class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean">
        <property name="dataSource" ref="dataSource" />
        <property name="transactionManager" ref="transactionManager" />
        <property name="databaseType" value="mysql" />
    </bean>

    <bean id="jobLauncher"
        class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
        <property name="jobRepository" ref="jobRepository" />
    </bean>

</beans>

database.xml
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:jdbc="http://www.springframework.org/schema/jdbc" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.springframework.org/schema/beans 
        http://www.springframework.org/schema/beans/spring-beans-3.2.xsd
        http://www.springframework.org/schema/jdbc 
        http://www.springframework.org/schema/jdbc/spring-jdbc-3.2.xsd">

    <!-- connect to database -->
    <bean id="dataSource"
        class="org.springframework.jdbc.datasource.DriverManagerDataSource">
        <!-- Connector/J 8 and later name this class com.mysql.cj.jdbc.Driver -->
        <property name="driverClassName" value="com.mysql.jdbc.Driver" />
        <property name="url" value="jdbc:mysql://localhost:3306/test" />
        <property name="username" value="root" />
        <property name="password" value="User@1234" />
    </bean>

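    <!-- DataSourceTransactionManager makes each chunk a real JDBC transaction;
         ResourcelessTransactionManager is meant for setups without a transactional resource -->
    <bean id="transactionManager"
        class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
        <property name="dataSource" ref="dataSource" />
    </bean>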

    <!-- drop and re-create the job metadata tables on every start (this discards job history) -->
    <jdbc:initialize-database data-source="dataSource">
        <jdbc:script location="org/springframework/batch/core/schema-drop-mysql.sql" />
        <jdbc:script location="org/springframework/batch/core/schema-mysql.sql" />
    </jdbc:initialize-database>

</beans>

job-report.xml
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:batch="http://www.springframework.org/schema/batch" 
    xmlns:task="http://www.springframework.org/schema/task"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.springframework.org/schema/batch
        http://www.springframework.org/schema/batch/spring-batch-2.2.xsd
        http://www.springframework.org/schema/beans 
        http://www.springframework.org/schema/beans/spring-beans-3.2.xsd">

    <bean id="report" class="com.om.model.Report" scope="prototype" />

    <batch:job id="reportJob">
        <batch:step id="step1">
            <batch:tasklet>
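                <!-- commit-interval is the chunk size: items written per transaction.
                     2 is only for demonstration; use a much larger value for millions of rows -->
                <batch:chunk reader="cvsFileItemReader" writer="mysqlItemWriter"
                    commit-interval="2">
                </batch:chunk>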
            </batch:tasklet>
        </batch:step>
    </batch:job>

    <bean id="cvsFileItemReader" class="org.springframework.batch.item.file.FlatFileItemReader">

        <!-- Read a csv file -->
        <property name="resource" value="classpath:cvs/report.csv" />

        <property name="lineMapper">
            <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">

                <!-- split it -->
                <property name="lineTokenizer">
                    <bean
                        class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
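                        <!-- the question's file uses '|'; the tokenizer's default delimiter is ',' -->
                        <property name="delimiter" value="|" />
                        <property name="names" value="userId,departmentId" />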
                    </bean>
                </property>

                <property name="fieldSetMapper">

                    <!-- return back to reader, rather than a mapped object. -->
                    <!--
                        <bean class="org.springframework.batch.item.file.mapping.PassThroughFieldSetMapper" />
                    -->

                    <!-- map to an object -->
                    <bean
                        class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
                        <property name="prototypeBeanName" value="report" />
                    </bean>

                </property>

            </bean>
        </property>

    </bean>

    <bean id="mysqlItemWriter"
        class="org.springframework.batch.item.database.JdbcBatchItemWriter">
        <property name="dataSource" ref="dataSource" />
        <property name="sql">
            <value>
            <![CDATA[        
                insert into RAW_REPORT(userId,departmentId) values (:userId, :departmentId)
            ]]>
            </value>
        </property>
        <!-- maps bean properties to the named SQL parameters -->
        <property name="itemSqlParameterSourceProvider">
            <bean
                class="org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider" />
        </property>
    </bean>

</beans>

Report.java (this is your POJO)
package com.om.model;

public class Report {

    private String userId;
    private String departmentId;
    public String getUserId() {
        return userId;
    }
    public void setUserId(String userId) {
        this.userId = userId;
    }
    public String getDepartmentId() {
        return departmentId;
    }
    public void setDepartmentId(String departmentId) {
        this.departmentId = departmentId;
    }


}

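Now put report.csv, containing the millions of userId and departmentId pairs, into the resources folder.
After the job runs, you can simply look at the database tables to see how the job was executed and which entries were written.
Ask if you need any help.

App.java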
package com.om;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class App {
    public static void main(String[] args) throws IllegalStateException {

        String[] springConfig  = 
            {   "spring/batch/config/database.xml", 
                "spring/batch/config/context.xml",
                "spring/batch/jobs/job-report.xml" 
            };

        ApplicationContext context = 
                new ClassPathXmlApplicationContext(springConfig);

        JobLauncher jobLauncher = (JobLauncher) context.getBean("jobLauncher");
        Job job = (Job) context.getBean("reportJob");

        try {

            JobExecution execution = jobLauncher.run(job, new JobParameters());
            System.out.println("Exit Status : " + execution.getStatus());

        } catch (Exception e) {
            e.printStackTrace();
        }

        System.out.println("Done");

    }
}
