Java: Is it possible to partition across a single file in Spring Batch?
Tags: java, spring, multithreading, spring-batch, partitioning
I have read about partitioning in Spring Batch and found an example that demonstrates it. The example reads persons from CSV files, does some processing, and inserts the data into a database. In this example, 1 partition = 1 file, so the partitioner implementation looks like this:
import java.io.IOException;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.core.io.Resource;

public class MultiResourcePartitioner implements Partitioner {

    private final Logger logger = LoggerFactory.getLogger(MultiResourcePartitioner.class);

    public static final String FILE_PATH = "filePath";
    private static final String PARTITION_KEY = "partition";

    private final Collection<Resource> resources;

    public MultiResourcePartitioner(Collection<Resource> resources) {
        this.resources = resources;
    }

    // Creates one ExecutionContext per resource; gridSize is only used as a capacity hint.
    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> map = new HashMap<>(gridSize);
        int i = 0;
        for (Resource resource : resources) {
            ExecutionContext context = new ExecutionContext();
            context.putString(FILE_PATH, getPath(resource)); // Depends on what logic you want to use to split
            map.put(PARTITION_KEY + i++, context);
        }
        return map;
    }

    private String getPath(Resource resource) {
        try {
            return resource.getFile().getPath();
        } catch (IOException e) {
            logger.warn("Can't get file from resource {}", resource);
            throw new RuntimeException(e);
        }
    }
}
Logic used to split the file:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public static int splitTextFiles(File bigFile, int maxRows) throws IOException {
    // Write every chunk, including the first, into the same "split" directory.
    Path splitDir = Paths.get("split");
    Files.createDirectories(splitDir);
    int fileCount = 1;
    try (BufferedReader reader = Files.newBufferedReader(bigFile.toPath())) {
        String line;
        int lineNum = 1;
        // Default open options truncate an existing file instead of overwriting it in place.
        BufferedWriter writer = Files.newBufferedWriter(splitDir.resolve(fileCount + "split.txt"));
        try {
            while ((line = reader.readLine()) != null) {
                if (lineNum > maxRows) {
                    // Current chunk is full: close it and start the next one.
                    writer.close();
                    lineNum = 1;
                    fileCount++;
                    writer = Files.newBufferedWriter(splitDir.resolve(fileCount + "split.txt"));
                }
                writer.append(line);
                writer.newLine();
                lineNum++;
            }
        } finally {
            writer.close();
        }
    }
    return fileCount;
}
So I put all the files into a special directory. But this does not work, because the folder /split does not exist yet when the context is initialized.
Update
I have come up with a workaround that works:
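import java.io.IOException;
import java.util.Map;

// Assumed to be Spring Batch's own MultiResourcePartitioner (no-arg constructor plus
// setResources), not the custom class of the same name shown earlier.
import org.springframework.batch.core.partition.support.MultiResourcePartitioner;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.core.io.Resource;
import org.springframework.core.io.support.ResourcePatternResolver;

public class MultiResourcePartitionerWrapper implements Partitioner {

    private final MultiResourcePartitioner multiResourcePartitioner = new MultiResourcePartitioner();
    private final ResourcePatternResolver resourcePatternResolver;
    private final String pathPattern;

    public MultiResourcePartitionerWrapper(ResourcePatternResolver resourcePatternResolver, String pathPattern) {
        this.resourcePatternResolver = resourcePatternResolver;
        this.pathPattern = pathPattern;
    }

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        try {
            // Resolve the pattern only now, after the split files have been written.
            Resource[] resources = resourcePatternResolver.getResources(pathPattern);
            multiResourcePartitioner.setResources(resources);
            return multiResourcePartitioner.partition(gridSize);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}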
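But it looks ugly. Is this a correct solution?

Spring Batch allows you to partition, but how you partition is up to you. You can simply split the 10TB file in the partitioner class (for example by a maximum number of rows per file), and each partition then reads one of the split files. You can find many examples of how to split large files in Java.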
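Since "how you partition is up to you", one alternative to physically splitting the file is to give each partition a line range of the single file. The following is only a minimal, Spring-free sketch of that arithmetic; the names `LineRangePartitions`, `LineRange`, `countLines`, and `ranges` are hypothetical and do not come from the thread or from Spring Batch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

public class LineRangePartitions {

    /** Hypothetical value type: skip {@code start} lines, then read {@code count} lines. */
    public static final class LineRange {
        public final long start;
        public final long count;

        public LineRange(long start, long count) {
            this.start = start;
            this.count = count;
        }
    }

    /** Counts the lines of a file by streaming through it once. */
    public static long countLines(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            long n = 0;
            while (reader.readLine() != null) {
                n++;
            }
            return n;
        }
    }

    /**
     * Splits totalLines into at most gridSize contiguous ranges whose sizes
     * differ by at most one line. Empty ranges are not emitted.
     */
    public static Map<String, LineRange> ranges(long totalLines, int gridSize) {
        if (gridSize <= 0) {
            throw new IllegalArgumentException("gridSize must be positive");
        }
        Map<String, LineRange> result = new LinkedHashMap<>();
        long base = totalLines / gridSize;
        long remainder = totalLines % gridSize;
        long start = 0;
        for (int i = 0; i < gridSize; i++) {
            // The first `remainder` partitions take one extra line each.
            long count = base + (i < remainder ? 1 : 0);
            if (count == 0) {
                break;
            }
            result.put("partition" + i, new LineRange(start, count));
            start += count;
        }
        return result;
    }
}
```

In a real job, each range would presumably be stored in the partition's `ExecutionContext` and applied to a step-scoped reader, for instance through `FlatFileItemReader`'s `setLinesToSkip` and `setMaxItemCount`; that wiring is an assumption and is not shown here. Note that skipping lines still means reading through them, so for a file of this size physically splitting it, as suggested above, may well be the faster option.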
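Could you give some advice? Do I have to use FlatFileItemReader, or should I implement a reader myself? Or should the splitting be an additional step, followed by a separate processing step? Could you please guide me here?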
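On the two-step variant (a split step followed by a processing step): the wrapper from the update works because it looks up the files when partition() is called, not when the context is built. The same deferral can be illustrated without Spring. This is a rough sketch only; `LazySplitFileLister` and its `list` method are hypothetical names, and the glob pattern is an assumption based on the `Nsplit.txt` file names used in the split logic.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LazySplitFileLister {

    private final Path directory;
    private final String glob;

    public LazySplitFileLister(Path directory, String glob) {
        // Only remember where to look; the directory may not exist yet.
        this.directory = directory;
        this.glob = glob;
    }

    /**
     * Lists the matching files at call time. Returns an empty list if the
     * directory has not been created yet.
     */
    public List<Path> list() throws IOException {
        if (!Files.isDirectory(directory)) {
            return Collections.emptyList();
        }
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(directory, glob)) {
            for (Path path : stream) {
                files.add(path);
            }
        }
        // Sort for a stable partition order (lexicographic, so "10split.txt" sorts before "2split.txt").
        Collections.sort(files);
        return files;
    }
}
```

Constructing the lister before the split step has run is harmless, because nothing touches the file system until `list()` is invoked, which is the same reason the wrapper partitioner avoids the missing /split folder problem.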