MySQL: batch-processing a file and diffing it against the database
I am currently developing a Spring Boot application that periodically processes a file containing user data. Each line contains a userId and a departmentId separated by |, for example 123534 | 13. The file will contain several million records.
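To make the line format concrete, here is a minimal parsing sketch in plain Java. The class name UserRecord and its parse method are hypothetical helpers, not part of the original application, and the sketch assumes both fields are numeric and that | is the only separator.

```java
/** One line of the input file, e.g. "123534 | 13" (userId | departmentId). */
final class UserRecord {
    final long userId;
    final long departmentId;

    UserRecord(long userId, long departmentId) {
        this.userId = userId;
        this.departmentId = departmentId;
    }

    /**
     * Parses "userId|departmentId"; whitespace around either field is ignored.
     * Throws IllegalArgumentException for malformed lines (NumberFormatException
     * is a subclass, so non-numeric fields are covered as well).
     */
    static UserRecord parse(String line) {
        // The -1 limit keeps trailing empty fields, so "1|2|" is rejected below.
        String[] parts = line.split("\\|", -1);
        if (parts.length != 2) {
            throw new IllegalArgumentException(
                    "Expected 'userId|departmentId' but got: " + line);
        }
        return new UserRecord(
                Long.parseLong(parts[0].trim()),
                Long.parseLong(parts[1].trim()));
    }
}
```

Rejecting malformed lines early keeps bad rows out of any later batch insert; whether to skip or abort on such a line is a choice the original post does not specify.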
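My requirement is to load this data into a MySQL database as follows:
- If a user with the processed ID already exists, do nothing
- If the user does not exist, create a new user
- If a user is not in the list but exists in the database, delete it
- If the current department does not exist in the database, create it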
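I have made some optimizations, such as:
- caching departments to populate the entities
- collecting users in batches and saving them with the JpaRepository method saveAll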
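The four requirements amount to a set difference between the IDs in the file and the IDs in the database. The sketch below shows that logic in plain Java, independent of JPA. The class SyncPlan and its parameters are hypothetical, and it assumes all IDs fit in memory, which may or may not hold for several million rows on a given heap.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Result of comparing the file contents with what is already in the database. */
final class SyncPlan {
    final Set<Long> usersToCreate;
    final Set<Long> usersToDelete;
    final Set<Long> departmentsToCreate;

    private SyncPlan(Set<Long> usersToCreate, Set<Long> usersToDelete,
                     Set<Long> departmentsToCreate) {
        this.usersToCreate = usersToCreate;
        this.usersToDelete = usersToDelete;
        this.departmentsToCreate = departmentsToCreate;
    }

    /**
     * @param existingUserIds       user IDs currently in the database
     * @param existingDepartmentIds department IDs currently in the database
     * @param fileRows              userId -> departmentId as read from the file
     */
    static SyncPlan compute(Set<Long> existingUserIds,
                            Set<Long> existingDepartmentIds,
                            Map<Long, Long> fileRows) {
        // In the file but not in the database: create.
        Set<Long> usersToCreate = new HashSet<>(fileRows.keySet());
        usersToCreate.removeAll(existingUserIds);

        // In the database but not in the file: delete.
        Set<Long> usersToDelete = new HashSet<>(existingUserIds);
        usersToDelete.removeAll(fileRows.keySet());

        // Departments referenced by the file that the database lacks: create.
        Set<Long> departmentsToCreate = new HashSet<>(fileRows.values());
        departmentsToCreate.removeAll(existingDepartmentIds);

        // Users present on both sides appear in neither set, so they are left alone.
        return new SyncPlan(usersToCreate, usersToDelete, departmentsToCreate);
    }
}
```

Computing the plan first means only the rows that actually change are sent to the database, instead of one lookup per file row.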
The department entity:
@Entity
@Table(name = "departaments")
public class Departament {
    @Id
    @Column(name = "id")
    private Long id;

    @Column(name = "name")
    private String name;
    // ... (remaining members not shown)
}
and:
Has anyone run into a problem like this?
Can it be optimized further?
Is there a good pattern for handling this?
Here are a few points:
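- Why not simply clear the USER table and reload it? You may run into some problems (I assume referential integrity is not one of them in your scenario, or is it?), but you get user deletion for free (TBH I don't quite see how user deletion is handled in your current setup). It will also run faster.
- Do you actually see a performance improvement when using saveAll? It does not limit the number of SELECT statements that get executed.
- Use INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE statements to get what you need.
If the file is a full replacement:
CREATE TABLE new LIKE real;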
LOAD DATA INFILE ... (and any other massaging)
RENAME TABLE real TO old, new TO real;
DROP TABLE old;
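If it is "incremental", load the file into a separate table and then run the appropriate SQL statements to apply the changes. There is roughly one SQL statement per bullet in your question. No loops.
Just try using Spring Batch.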
context.xml
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-3.2.xsd">

    <!-- stored job-meta in database -->
    <bean id="jobRepository"
        class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean">
        <property name="dataSource" ref="dataSource" />
        <property name="transactionManager" ref="transactionManager" />
        <property name="databaseType" value="mysql" />
    </bean>

    <bean id="jobLauncher"
        class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
        <property name="jobRepository" ref="jobRepository" />
    </bean>
</beans>
database.xml
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:jdbc="http://www.springframework.org/schema/jdbc"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-3.2.xsd
        http://www.springframework.org/schema/jdbc
        http://www.springframework.org/schema/jdbc/spring-jdbc-3.2.xsd">

    <!-- connect to database -->
    <bean id="dataSource"
        class="org.springframework.jdbc.datasource.DriverManagerDataSource">
        <property name="driverClassName" value="com.mysql.jdbc.Driver" />
        <property name="url" value="jdbc:mysql://localhost:3306/test" />
        <property name="username" value="root" />
        <property name="password" value="User@1234" />
    </bean>

    <bean id="transactionManager"
        class="org.springframework.batch.support.transaction.ResourcelessTransactionManager" />

    <!-- create job-meta tables automatically -->
    <jdbc:initialize-database data-source="dataSource">
        <jdbc:script location="org/springframework/batch/core/schema-drop-mysql.sql" />
        <jdbc:script location="org/springframework/batch/core/schema-mysql.sql" />
    </jdbc:initialize-database>
</beans>
job-report.xml
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:batch="http://www.springframework.org/schema/batch"
    xmlns:task="http://www.springframework.org/schema/task"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.springframework.org/schema/batch
        http://www.springframework.org/schema/batch/spring-batch-2.2.xsd
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-3.2.xsd">

    <bean id="report" class="com.om.model.Report" scope="prototype" />

    <batch:job id="reportJob">
        <batch:step id="step1">
            <batch:tasklet>
                <!-- commit-interval is the number of items written per transaction;
                     2 is only suitable for a demo, use a much larger value for millions of rows -->
                <batch:chunk reader="cvsFileItemReader" writer="mysqlItemWriter"
                    commit-interval="2">
                </batch:chunk>
            </batch:tasklet>
        </batch:step>
    </batch:job>

    <bean id="cvsFileItemReader" class="org.springframework.batch.item.file.FlatFileItemReader">
        <!-- Read a csv file -->
        <property name="resource" value="classpath:cvs/report.csv" />
        <property name="lineMapper">
            <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
                <!-- split it -->
                <property name="lineTokenizer">
                    <bean
                        class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                        <!-- the input file separates fields with | (the default delimiter is a comma) -->
                        <property name="delimiter" value="|" />
                        <property name="names" value="userId,departmentId" />
                    </bean>
                </property>
                <property name="fieldSetMapper">
                    <!-- return back to reader, rather than a mapped object. -->
                    <!--
                    <bean class="org.springframework.batch.item.file.mapping.PassThroughFieldSetMapper" />
                    -->
                    <!-- map to an object -->
                    <bean
                        class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
                        <property name="prototypeBeanName" value="report" />
                    </bean>
                </property>
            </bean>
        </property>
    </bean>

    <bean id="mysqlItemWriter"
        class="org.springframework.batch.item.database.JdbcBatchItemWriter">
        <property name="dataSource" ref="dataSource" />
        <property name="sql">
            <value>
                <![CDATA[
                insert into RAW_REPORT(userId,departmentId) values (:userId, :departmentId)
                ]]>
            </value>
        </property>
        <!-- It will take care matching between object property and sql name parameter -->
        <property name="itemSqlParameterSourceProvider">
            <bean
                class="org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider" />
        </property>
    </bean>
</beans>
Report.java (this is your POJO)
package com.om.model;

public class Report {
    private String userId;
    private String departmentId;

    public String getUserId() {
        return userId;
    }

    public void setUserId(String userId) {
        this.userId = userId;
    }

    public String getDepartmentId() {
        return departmentId;
    }

    public void setDepartmentId(String departmentId) {
        this.departmentId = departmentId;
    }
}
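Now put report.csv, containing the millions of userId and departmentId rows, into the resources folder.
You can simply look at the database tables to see how the job ran automatically and which entries it created.
Feel free to ask if you need any help.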
App.java
package com.om;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class App {
    public static void main(String[] args) throws IllegalStateException {
        String[] springConfig = {
            "spring/batch/config/database.xml",
            "spring/batch/config/context.xml",
            "spring/batch/jobs/job-report.xml"
        };
        ApplicationContext context =
                new ClassPathXmlApplicationContext(springConfig);
        JobLauncher jobLauncher = (JobLauncher) context.getBean("jobLauncher");
        Job job = (Job) context.getBean("reportJob");
        try {
            JobExecution execution = jobLauncher.run(job, new JobParameters());
            System.out.println("Exit Status : " + execution.getStatus());
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Done");
    }
}
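The job above only fills the RAW_REPORT staging table; the rules from the question still have to be applied to the real tables. A possible follow-up step, sketched with plain JDBC, uses one set-based statement per rule and no loops. The users table and its columns id and department_id are assumptions (the post never shows the user schema), id is assumed to be a primary key in both target tables, and RawReportSync is a hypothetical class name. departaments and RAW_REPORT are the names used earlier in the thread.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

final class RawReportSync {
    // Executed in this order: departments first so new users can reference them.
    static final List<String> STATEMENTS = List.of(
            // Create departments that are referenced by the file but missing.
            "INSERT IGNORE INTO departaments (id) "
                    + "SELECT DISTINCT departmentId FROM RAW_REPORT",
            // Create missing users; rows whose id already exists are skipped.
            // users(id, department_id) is an assumed schema.
            "INSERT IGNORE INTO users (id, department_id) "
                    + "SELECT userId, departmentId FROM RAW_REPORT",
            // Delete users that exist in the database but not in the file.
            "DELETE u FROM users u "
                    + "LEFT JOIN RAW_REPORT r ON r.userId = u.id "
                    + "WHERE r.userId IS NULL");

    /** Runs all statements in one transaction and rolls back if any of them fails. */
    static void apply(Connection connection) throws SQLException {
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false);
        try (Statement statement = connection.createStatement()) {
            for (String sql : STATEMENTS) {
                statement.executeUpdate(sql);
            }
            connection.commit();
        } catch (SQLException e) {
            connection.rollback();
            throw e;
        } finally {
            connection.setAutoCommit(previousAutoCommit);
        }
    }
}
```

INSERT IGNORE downgrades other errors to warnings as well, not only duplicate keys, so INSERT ... ON DUPLICATE KEY UPDATE may be the safer choice where silent skips would hide real problems.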