It seems to me that the order in which the reader, the processor and the writer run inside a step and a chunk is not always clear to everybody (including me when I started with Spring Batch).
So I made the following two tables, which I hope will help clarify things.
1) First scenario: we read, process and write all the data at once. No commit interval is defined.
Here is an excerpt of a job with one step. The reader reads the data from a datasource such as a database.
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns:b="http://www.springframework.org/schema/batch"
       xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch-2.1.xsd
                           http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd">
  ...
  <b:job id="doSomething" incrementer="idIncrementer">
    <b:step id="mainStep">
      <b:tasklet>
        <b:chunk reader="doSomethingReader"
                 processor="doSomethingProcessor"
                 writer="doSomethingWriter"
                 chunk-completion-policy="defaultResultCompletionPolicy" />
      </b:tasklet>
    </b:step>
  </b:job>
</beans>
If the database returns 6 lines (items), here is how Spring Batch will process each item:
Order of execution with defaultResultCompletionPolicy
Execution order | Reader | Processor | Writer | Transactions |
---|---|---|---|---|
1 | 1st item | | | T1 |
2 | 2nd item | | | T1 |
3 | 3rd item | | | T1 |
4 | 4th item | | | T1 |
5 | 5th item | | | T1 |
6 | 6th item | | | T1 |
7 | | 1st item | | T1 |
8 | | 2nd item | | T1 |
9 | | 3rd item | | T1 |
10 | | 4th item | | T1 |
11 | | 5th item | | T1 |
12 | | 6th item | | T1 |
13 | | | The 6 items at the same time | T1 |
In this configuration, there is a single transaction (T1), and the items are all written at once.
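To make this ordering concrete, here is a minimal plain-Java sketch that simulates the first scenario. It does not use any Spring Batch classes: the class name `SingleChunkSketch`, the `run` method and the upper-casing "processor" are all hypothetical and only there for illustration. It simply records the calls in the order described in the table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java simulation (no Spring Batch classes): a single chunk that
// ends only when the reader has nothing left to return.
class SingleChunkSketch {

    // Returns the ordered log of reader / processor / writer calls.
    static List<String> run(List<String> source) {
        List<String> log = new ArrayList<>();
        Iterator<String> reader = source.iterator();

        log.add("begin T1");

        // 1) Read every item first.
        List<String> read = new ArrayList<>();
        while (reader.hasNext()) {
            String item = reader.next();
            log.add("read " + item);
            read.add(item);
        }

        // 2) Then process every item that was read.
        //    Upper-casing stands in for a real processor (assumption).
        List<String> processed = new ArrayList<>();
        for (String item : read) {
            log.add("process " + item);
            processed.add(item.toUpperCase());
        }

        // 3) Finally, one single write call for the whole list.
        log.add("write " + processed);
        log.add("commit T1");
        return log;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d", "e", "f");
        run(items).forEach(System.out::println);
    }
}
```

Running `main` prints the six reads, then the six process calls, then one write, all between a single "begin T1" and "commit T1".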
2) Second scenario: we set the chunk size to 4 items with commit-interval="4". That means there will be a commit every 4 items.
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns:b="http://www.springframework.org/schema/batch"
       xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/batch http://www.springframework.org/schema/batch/spring-batch-2.1.xsd
                           http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd">
  ...
  <b:job id="doSomething" incrementer="idIncrementer">
    <b:step id="mainStep">
      <b:tasklet>
        <b:chunk reader="doSomethingReader"
                 processor="doSomethingProcessor"
                 writer="doSomethingWriter"
                 commit-interval="4" />
      </b:tasklet>
    </b:step>
  </b:job>
</beans>
Chunk processing will then occur in this order (assuming there are 6 items):
Order of execution with chunk processing
Execution order | Reader | Processor | Writer | Transactions |
---|---|---|---|---|
1 | 1st item | | | T1 |
2 | 2nd item | | | T1 |
3 | 3rd item | | | T1 |
4 | 4th item | | | T1 |
5 | | 1st item | | T1 |
6 | | 2nd item | | T1 |
7 | | 3rd item | | T1 |
8 | | 4th item | | T1 |
9 | | | The first 4 lines, at the same time | T1 |
10 | 5th item | | | T2 |
11 | 6th item | | | T2 |
12 | | 5th item | | T2 |
13 | | 6th item | | T2 |
14 | | | The last 2 lines, at the same time | T2 |
That means that if there is a problem with the 6th item, the first 4 items (1st chunk) will already have been processed and committed.
The rollback will only affect the items of the 2nd chunk (items 5 and 6).
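The commit and rollback behaviour of the second scenario can be sketched in plain Java as well. This is only a simulation under stated assumptions, not Spring Batch code: the class `ChunkRollbackSketch`, its fields and the `failingItem` parameter are hypothetical, the "transaction" is just a staging list, and the sketch assumes no skip or retry is configured, so the step stops after the rollback.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation (no Spring Batch classes) of chunks of a fixed size,
// each chunk running in its own simulated transaction.
class ChunkRollbackSketch {

    final List<String> log = new ArrayList<>();        // ordered trace of calls
    final List<String> committed = new ArrayList<>();  // items that survived a commit

    void run(List<String> source, int commitInterval, String failingItem) {
        int tx = 0;
        for (int start = 0; start < source.size(); start += commitInterval) {
            tx++;
            int end = Math.min(start + commitInterval, source.size());
            List<String> chunk = source.subList(start, end);

            // Read phase: all items of the chunk are read first.
            for (String item : chunk) {
                log.add("T" + tx + " read " + item);
            }

            // Items are staged and only become visible on commit.
            List<String> staged = new ArrayList<>();
            try {
                // Process phase: the failing item throws (assumption for the demo).
                for (String item : chunk) {
                    log.add("T" + tx + " process " + item);
                    if (item.equals(failingItem)) {
                        throw new IllegalStateException("cannot process " + item);
                    }
                    staged.add(item);
                }
                // Write phase: one call for the whole chunk, then commit.
                log.add("T" + tx + " write " + staged);
                committed.addAll(staged);
                log.add("T" + tx + " commit");
            } catch (IllegalStateException e) {
                // The staged items of this chunk are discarded; earlier chunks stay committed.
                log.add("T" + tx + " rollback");
                return;
            }
        }
    }

    public static void main(String[] args) {
        ChunkRollbackSketch sketch = new ChunkRollbackSketch();
        sketch.run(Arrays.asList("item1", "item2", "item3", "item4", "item5", "item6"), 4, "item6");
        sketch.log.forEach(System.out::println);
        System.out.println("committed = " + sketch.committed);
    }
}
```

With a commit interval of 4 and a failure on `item6`, the trace ends with "T2 rollback" and only `item1` to `item4` remain in `committed`, which matches the two transactions shown in the table.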