
Q: JPA batch job running extremely slowly

Stack Overflow user

Asked 2019-01-04 21:35:22

1 answer · 387 views · 0 followers · 0 votes

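I have a performance problem with a batch job written with JPA (using OpenJPA) that runs as a plain Java application. I'm trying to insert a very large list of objects, more than 10 million records. I know this design is not right, but the data arrives all at once and there is no way to split up the job.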

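I split the list into sublists of 100,000 each and call a transactional JPA method for each sublist. Inside each transaction, I flush (and clear) the EntityManager every 2,000 entities. As I understand it, 10 million records therefore means 100 transactional calls.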
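The splitting described above can be sketched as a small helper that computes the [start, end) range of each sublist, which makes the number of transactional calls easy to check. This is an illustrative sketch, not the asker's code; the `ChunkBounds` class and its `of` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits `total` items into consecutive [start, end)
// ranges of at most `chunkSize` items each, like the loop in BatchService.
final class ChunkBounds {

    static List<int[]> of(int total, int chunkSize) {
        if (total < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("total must be >= 0 and chunkSize > 0");
        }
        List<int[]> bounds = new ArrayList<>();
        int start = 0;
        while (start < total) {
            // total - start cannot overflow, so end never exceeds total
            int end = start + Math.min(chunkSize, total - start);
            bounds.add(new int[] {start, end});
            start = end;
        }
        return bounds;
    }
}
```

Each pair `b` could then be passed as `personList.subList(b[0], b[1])` to one transactional call, which is what the quotient/modulo loop in `BatchService` does by hand: 10,000,000 records at 100,000 per sublist give 100 ranges, and 1,000,000 records give 10.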

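Once the job starts, about 6 million records are inserted in roughly 15 to 20 minutes, an average of about one minute per 300,000 records. After about 6 to 6.5 million records, though, the job slows down drastically, to something like 10,000 records every 4 to 6 minutes, so it looks as if it has stopped. It keeps running, however, and never runs out of heap memory.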
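Using only the figures from the paragraph above, the drop in throughput works out to roughly 120 to 240 times:

```latex
\begin{aligned}
\text{before the slowdown:}\quad & \frac{6\,000\,000}{20\ \text{min}} = 300\,000\ \tfrac{\text{records}}{\text{min}}
  \quad\text{to}\quad \frac{6\,000\,000}{15\ \text{min}} = 400\,000\ \tfrac{\text{records}}{\text{min}} \\
\text{after the slowdown:}\quad & \frac{10\,000}{6\ \text{min}} \approx 1\,667\ \tfrac{\text{records}}{\text{min}}
  \quad\text{to}\quad \frac{10\,000}{4\ \text{min}} = 2\,500\ \tfrac{\text{records}}{\text{min}} \\
\text{ratio:}\quad & \frac{300\,000}{2\,500} = 120
  \quad\text{to}\quad \frac{400\,000}{1\,667} \approx 240
\end{aligned}
```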

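Can anyone tell me what is wrong with my code? I have tried different chunk sizes for the sublists (25K, 50K, 100K), and I don't know what causes the slowdown partway through the job. Should I clear the EntityManager after each transaction? I have also increased the connection pool size.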

Here is my code:

```java
@Stateless()
@LocalBean
@TransactionAttribute(TransactionAttributeType.NEVER)
public class BatchService {

    @EJB
    private PersonService personService;

    public void run(List<Person> personList) {
        int totalEventSize = personList.size();
        int quotient = totalEventSize / 100000;
        int modulo = totalEventSize % 100000;
        int totalIterations = quotient + (modulo != 0 ? 1 : 0);
        int startCount = 0;
        int endCount = 0;
        for (int i = 1; i <= totalIterations; i++) {
            if (i == totalIterations) {
                endCount = totalEventSize;
            } else {
                endCount = startCount + 100000;
            }
            List<Person> subList = personList.subList(startCount, endCount);
            personService.create(subList);
            startCount = endCount;
        }
    }
}

@Stateless
@LocalBean
public class PersonService implements Serializable {

    @EJB
    private PersonDLService personDLService;

    public void create(List<Person> list) {
        try {
            personDLService.createPerson(list);
        } catch (RuntimeException e) {
            e.printStackTrace();
        }
    }
}

@Stateless
@LocalBean
@TransactionAttribute(TransactionAttributeType.MANDATORY)
public class PersonDLService implements Serializable {
    private static final long serialVersionUID = 1L;

    @PersistenceContext(unitName = Constants.PERSISTENCE_UNIT_NAME)
    private transient EntityManager entityManager;

    public void createPerson(List<Person> personObj) {
        for (int i = 0; i < personObj.size(); i++) {
            entityManager.persist(personObj.get(i));
            if (i % 2000 == 0) {
                entityManager.flush();
                entityManager.clear();
            }
        }
        System.out.println("***************** COMMITED ****************" + personObj.size());
    }
}
```
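The persist/flush/clear loop in `createPerson` can be sketched independently of JPA to make its batch boundaries visible. In this hypothetical helper (`BatchLoop.forEachInBatches` is an illustrative name), `perItem` stands in for `entityManager::persist` and `atBatchEnd` for `() -> { entityManager.flush(); entityManager.clear(); }`. Counting with the `seen` counter (equivalent to `i + 1`) runs the flush after every full batch of 2,000; the condition `i % 2000 == 0` in the code above is also true for `i == 0`, so it flushes after the very first entity and then after entities 2,001, 4,001, and so on.

```java
import java.util.function.Consumer;

// Hypothetical helper mirroring createPerson: perItem plays the role of
// entityManager.persist, atBatchEnd the role of flush() + clear().
final class BatchLoop {

    /** Returns how many times atBatchEnd ran. */
    static <T> int forEachInBatches(Iterable<? extends T> items, int batchSize,
                                    Consumer<? super T> perItem, Runnable atBatchEnd) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int seen = 0;
        int batches = 0;
        for (T item : items) {
            perItem.accept(item);
            seen++;                       // equals i + 1 in the original loop
            if (seen % batchSize == 0) {  // after 2,000, 4,000, ... items
                atBatchEnd.run();
                batches++;
            }
        }
        if (seen % batchSize != 0) {      // trailing partial batch, if any
            atBatchEnd.run();
            batches++;
        }
        return batches;
    }
}
```

The trailing call for a partial batch is optional inside a JTA transaction, because the persistence context is flushed at commit anyway.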

Tags: transactions, batch-processing, java, performance, jpa

1 answer

Stack Overflow user

Answered 2019-02-04 04:43:47

Too much data is often a problem when using JPA in a batch job, and ten million rows is a lot to insert.

First, with more than 100,000 rows I would use a better-suited API: JDBC batching.

For example, with JDBC batching:

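```java
import java.io.Serializable;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import javax.ejb.LocalBean;
import javax.ejb.Stateless;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Stateless
@LocalBean
@TransactionAttribute(TransactionAttributeType.MANDATORY)
public class PersonService implements Serializable {
    private static final long serialVersionUID = 1L;

    @PersistenceContext(unitName = Constants.PERSISTENCE_UNIT_NAME)
    private transient EntityManager entityManager;

    public void doIt(List<Person> personList) throws SQLException {
        // get a jdbc connection from the entityManager (unwrap(Connection.class) is openjpa specific)
        // or you may as well get a jdbc connection from a jdbc DataSource
        try (Connection connection = entityManager.unwrap(Connection.class)) {
            // if Postgresql or Oracle DB, you may need to add a nextval for a sequence in the sql
            String sql = "insert into person (name) values (?)";
            try (PreparedStatement statement = connection.prepareStatement(sql)) {
                int i = 0;
                for (Person person : personList) {
                    i++;
                    statement.setString(1, person.getName());
                    statement.addBatch();
                    if (i == 1000) {
                        // send the 1,000 queued inserts to the database in one batch
                        statement.executeBatch();
                        i = 0;
                    }
                }
                if (i > 0) {
                    // send the remaining rows of the last, partial batch
                    statement.executeBatch();
                }
            }
        }
    }
}
```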

If that is still not enough, you can try adding `connection.commit()` every million rows.
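The commit-every-N-rows suggestion can be sketched with plain JDBC as below. This is a minimal sketch under assumptions: the connection is a plain, non-JTA connection, for example obtained directly from a `DataSource` outside a container-managed transaction, because a connection enlisted in a JTA transaction (such as the `MANDATORY` one in the code above) must not be committed directly; the JDBC specification expects the driver to throw an `SQLException` in that case. The class name `PeriodicCommitInsert`, the `batchSize`/`commitSize` parameters, and the `List<String>` of names (instead of `Person` objects) are hypothetical choices; the table and column come from the answer's SQL.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical sketch: JDBC batch insert that sends a batch every `batchSize`
// rows and commits every `commitSize` rows. Assumes a plain, non-JTA
// connection; the caller opens and closes it.
final class PeriodicCommitInsert {

    // Table and column taken from the answer's SQL.
    static final String SQL = "insert into person (name) values (?)";

    static void insertNames(Connection connection, List<String> names,
                            int batchSize, int commitSize) throws SQLException {
        if (batchSize <= 0 || commitSize <= 0 || commitSize % batchSize != 0) {
            throw new IllegalArgumentException(
                    "commitSize must be a positive multiple of batchSize");
        }
        connection.setAutoCommit(false); // commit only where we decide to
        try (PreparedStatement statement = connection.prepareStatement(SQL)) {
            int rows = 0;
            for (String name : names) {
                statement.setString(1, name);
                statement.addBatch();
                rows++;
                if (rows % batchSize == 0) {
                    statement.executeBatch(); // send the queued rows together
                }
                if (rows % commitSize == 0) {
                    connection.commit();      // end the current chunk's transaction
                }
            }
            if (rows % batchSize != 0) {
                statement.executeBatch();     // send the last partial batch
            }
            if (rows % commitSize != 0) {
                connection.commit();          // commit rows after the last full chunk
            }
        } catch (SQLException e) {
            try {
                connection.rollback();        // undo only the uncommitted chunk
            } catch (SQLException rollbackFailure) {
                e.addSuppressed(rollbackFailure);
            }
            throw e;
        }
    }
}
```

Committing in chunks limits how much uncommitted work the database has to track at once, at the cost of atomicity: if the job fails partway through, the rows from earlier chunks stay committed.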

Votes: 0

Page content provided by Stack Overflow. Translation provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/54039993

