http://java.dzone.com/articles/performance-tuning

For most typical Spring/Hibernate enterprise applications, performance depends almost entirely on the performance of the persistence layer.

This post explains how to confirm that an application is 'database-bound', and then walks through 7 frequently used 'quick-win' tips that can help improve application performance.

How to confirm that an application is 'database-bound'

To confirm that an application is 'database-bound', start by doing a typical run in some development environment, using VisualVM for monitoring. VisualVM is a Java profiler shipped with the JDK and launchable via the command line by calling jvisualvm.

After launching VisualVM, try the following steps:

  • Double-click on your running application
  • Select Sampler
  • Click on the Settings checkbox
  • Choose Profile only packages, and type in the following packages:
    • your.application.packages.*
    • org.hibernate.*
    • org.springframework.*
    • your.database.driver.package, for example oracle.*
  • Click Sample CPU

The CPU profiling of a typical 'database-bound' application should look something like this:

We can see that the client Java process spends 56% of its time waiting for the database to return results over the network.

This is a good sign that the database queries are what is keeping the application slow. The 32.7% spent in Hibernate reflection calls is normal, and not much can be done about it.

First step for tuning - obtaining a baseline run

The first step in tuning is to define a baseline run for the program. We need to identify a set of functionally valid input data that makes the program go through a typical execution similar to the production run.

The main difference is that the baseline run should complete in a much shorter period of time; as a guideline, an execution time of around 5 to 10 minutes is a good target.

What makes a good baseline?

A good baseline should have the following characteristics:

  • it's functionally correct
  • the input data is similar to production in its variety
  • it completes in a short amount of time
  • optimizations in the baseline run can be extrapolated to a full run

Getting a good baseline is solving half of the problem.

What makes a bad baseline?

For example, in a batch run for processing call data records in a telecommunications system, taking the first 10 000 records could be the wrong approach.

The reason is that the first 10 000 records might be mostly voice calls, while the unknown performance problem is in the processing of SMS traffic. Taking the first records of a large run would give us a bad baseline, from which wrong conclusions would be drawn.
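One way to reduce this kind of skew is to pick records spread across the whole input instead of only the head. The following is a minimal sketch in Java, assuming the records fit in a list; the class and method names are hypothetical and not part of the original article:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: picks up to `sampleSize` records spread evenly
// over the whole input instead of taking only the first ones.
class BaselineSampler {

    static <T> List<T> sampleEvenly(List<T> records, int sampleSize) {
        if (sampleSize <= 0) {
            throw new IllegalArgumentException("sampleSize must be positive");
        }
        List<T> sample = new ArrayList<>();
        // Distance between picked records; at least 1 so small inputs are kept whole.
        int step = Math.max(1, records.size() / sampleSize);
        for (int i = 0; i < records.size() && sample.size() < sampleSize; i += step) {
            sample.add(records.get(i));
        }
        return sample;
    }

    public static void main(String[] args) {
        // Toy input: the first 8 records are voice calls, the last 4 are SMS.
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 8; i++) records.add("VOICE");
        for (int i = 0; i < 4; i++) records.add("SMS");

        System.out.println(records.subList(0, 4));    // head only: no SMS at all
        System.out.println(sampleEvenly(records, 4)); // spread over the input
    }
}
```

Even sampling does not guarantee that rare record types are represented; if the record type is known up front, sampling a fixed share per type would likely be a stronger choice.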

Collecting SQL logs and query timings

The executed SQL queries, together with their execution times, can be collected using, for example, log4jdbc. See this blog post for how to collect SQL queries using log4jdbc: Spring/Hibernate improved SQL logging with log4jdbc.

The query execution time is measured from the Java client side, and it includes the network round-trip to the database. The SQL query logs look like this:

16 avr. 2014 11:13:48 | SQL_QUERY /* insert your.package.YourEntity */insert into YOUR_TABLE (...) values (...) {executed in 13 msec}

The prepared statements themselves are also a good source of information: they make it easy to identify frequent query types. They can be logged by following this blog post: Why and where is Hibernate doing this SQL query?

What metrics can be extracted from SQL logs

The SQL logs can answer these questions:

  • What are the slowest queries being executed?
  • What are the most frequent queries?
  • How much time is spent generating primary keys?
  • Is there some data that could benefit from caching?

How to parse the SQL logs

Probably the only viable option for large log volumes is to use command line tools. This approach has the advantage of being very flexible.

At the expense of writing a small script or command, we can extract almost any metric needed. Any command line tool will work as long as you are comfortable with it.

If you are used to the Unix command line, bash might be a good option. Bash can also be used on Windows workstations, for example via Cygwin or Git, which includes a bash command line.
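The same kind of extraction can also be done in a few lines of Java. The sketch below assumes log lines shaped like the sample shown earlier (a "| SQL_QUERY" marker followed by the statement and "{executed in N msec}"); the class name, method names and the sample lines in main are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical analyzer for lines of the form:
//   <timestamp> | SQL_QUERY <statement> {executed in <n> msec}
class SqlLogStats {

    private static final Pattern QUERY_LINE =
            Pattern.compile("\\| SQL_QUERY (.*?)\\s*\\{executed in (\\d+) msec\\}");

    // Per statement: [0] = number of executions, [1] = total time in msec.
    static Map<String, long[]> aggregate(List<String> lines) {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = QUERY_LINE.matcher(line);
            if (!m.find()) {
                continue; // not a query line, ignore it
            }
            long[] s = stats.computeIfAbsent(m.group(1), k -> new long[2]);
            s[0]++;
            s[1] += Long.parseLong(m.group(2));
        }
        return stats;
    }

    // Statements ordered by total time spent, largest first.
    static List<String> slowestFirst(Map<String, long[]> stats) {
        List<String> statements = new ArrayList<>(stats.keySet());
        statements.sort(Comparator.comparingLong((String k) -> stats.get(k)[1]).reversed());
        return statements;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the format shown above.
        List<String> lines = Arrays.asList(
            "16 avr. 2014 11:13:48 | SQL_QUERY insert into YOUR_TABLE (...) values (...) {executed in 13 msec}",
            "16 avr. 2014 11:13:48 | SQL_QUERY insert into YOUR_TABLE (...) values (...) {executed in 7 msec}",
            "16 avr. 2014 11:13:48 | SQL_QUERY select ... from OTHER_TABLE {executed in 50 msec}");
        Map<String, long[]> stats = aggregate(lines);
        for (String sql : slowestFirst(stats)) {
            long[] s = stats.get(sql);
            System.out.println(s[1] + " msec total, " + s[0] + " calls: " + sql);
        }
    }
}
```

Grouping by statement text gives both the most frequent and the most expensive queries from a single pass. For large files, the lines would be streamed rather than held in a list.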

Frequently applied Quick-Wins

The quick-wins below identify common performance problems in Spring/Hibernate applications, and their corresponding solutions.

Quick-win Tip 1 - Reduce primary key generation overhead

In processes that are 'insert-intensive', the choice of a primary key generation strategy can matter a lot. One common way to generate IDs is to use database sequences, usually one per table to avoid contention between inserts on different tables.

The problem is that if 50 records are inserted, we want to avoid making 50 network round-trips to the database just to obtain 50 IDs, which would leave the Java process waiting most of the time.

How does Hibernate usually handle this?

Hibernate provides new optimized ID generators that avoid this problem. For sequences, a HiLo ID generator is used by default. This is how the HiLo sequence generator works:

  • call a sequence once and get 1000 (the High value)
  • calculate 50 IDs like this:
    • 1000 * 50 + 0 = 50000
    • 1000 * 50 + 1 = 50001
    • ...
    • 1000 * 50 + 49 = 50049, Low value (50) reached
  • call the sequence again for a new High value, 1001, and so on
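The arithmetic in the list above can be sketched in plain Java. This is a simplified illustration of the idea, not Hibernate's actual generator code; the class name is hypothetical, and the LongSupplier stands in for the database sequence call:

```java
import java.util.function.LongSupplier;

// Simplified hi/lo generator following the arithmetic described above.
class HiLoSketch {
    private final LongSupplier sequence; // stands in for the database sequence call
    private final int blockSize;         // number of ids handed out per sequence call
    private long hi;
    private int lo;

    HiLoSketch(LongSupplier sequence, int blockSize) {
        this.sequence = sequence;
        this.blockSize = blockSize;
        this.lo = blockSize; // forces a sequence call on first use
    }

    long nextId() {
        if (lo >= blockSize) {
            hi = sequence.getAsLong(); // the only (simulated) network round-trip
            lo = 0;
        }
        return hi * blockSize + lo++;
    }

    public static void main(String[] args) {
        long[] nextSequenceValue = {1000}; // fake sequence starting at 1000
        int[] sequenceCalls = {0};
        HiLoSketch ids = new HiLoSketch(() -> {
            sequenceCalls[0]++;
            return nextSequenceValue[0]++;
        }, 50);

        for (int i = 0; i < 51; i++) {
            long id = ids.nextId();
            if (i == 0 || i == 49 || i == 50) {
                System.out.println("id #" + (i + 1) + " = " + id);
            }
        }
        System.out.println("sequence calls: " + sequenceCalls[0]);
    }
}
```

With a block size of 50, the first 50 IDs (50000 to 50049) cost a single sequence call, and the 51st ID (50050) triggers the second one.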

So from a single sequence call, 50 keys were generated, reducing the overhead caused by numerous network round-trips.

These new optimized key generators are on by default in Hibernate 4, and can even be turned off if needed by setting hibernate.id.new_generator_mappings to false.

Why can primary key generation still be a problem?

The problem is that if you declared the key generation strategy as AUTO, the optimized generators are still off, and your application will end up with a huge number of sequence calls.

To make sure the new optimized generators are on, use the SEQUENCE strategy instead of AUTO:

@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "your_key_generator")
private Long id;

With this simple change, an improvement in the range of 10%-20% can be measured in 'insert-intensive' applications, with basically no code changes.

Quick-win Tip 2 - Use JDBC batch inserts/updates

For batch programs, JDBC drivers usually provide an optimization for reducing network round-trips named 'JDBC batch inserts/updates'. When these are used, inserts/updates are queued at the driver level before being sent to the database.

When a threshold is reached, the whole batch of queued statements is sent to the database in one go. This prevents the driver from sending the statements one by one, which would waste multiple network round-trips.

This is the entity manager factory configuration needed to activate batch inserts/updates:

<prop key="hibernate.jdbc.batch_size">100</prop>
<prop key="hibernate.order_inserts">true</prop>
<prop key="hibernate.order_updates">true</prop>

Setting only the JDBC batch size is not enough. This is because the JDBC driver will batch statements only while it keeps receiving inserts/updates for the exact same table.

If an insert to a new table is received, then the JDBC driver will first flush the batched statements on the previous table, before starting to batch statements on the new table.
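To see why the ordering settings matter, the flushing rule described above can be modelled with a small counter. This is a toy model written for illustration, not real driver or Hibernate code; the class name and table names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model: a batch is sent whenever the target table changes
// or the batch size is reached.
class BatchFlushModel {

    static int countRoundTrips(List<String> tablesInStatementOrder, int batchSize) {
        int roundTrips = 0;
        int queued = 0;
        String currentTable = null;
        for (String table : tablesInStatementOrder) {
            if (queued > 0 && (!table.equals(currentTable) || queued == batchSize)) {
                roundTrips++; // send what was queued so far
                queued = 0;
            }
            currentTable = table;
            queued++;
        }
        return queued > 0 ? roundTrips + 1 : roundTrips; // final flush
    }

    public static void main(String[] args) {
        // 100 parent rows, each immediately followed by a child row.
        List<String> interleaved = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            interleaved.add("ORDERS");
            interleaved.add("ORDER_LINES");
        }
        // The same statements grouped by table, which is what ordering aims for.
        List<String> ordered = new ArrayList<>(interleaved);
        Collections.sort(ordered);

        System.out.println("interleaved: " + countRoundTrips(interleaved, 100));
        System.out.println("ordered:     " + countRoundTrips(ordered, 100));
    }
}
```

In this model, 200 interleaved statements cost 200 round-trips because every statement targets a different table than the previous one, while the same statements grouped by table cost 2.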

Similar functionality is used implicitly with Spring Batch. This optimization can easily buy you a 30% to 40% improvement in 'insert-intensive' programs, without changing a single line of code.

Quick-win Tip 3 - Periodically flush and clear the Hibernate session

When adding/modifying data in the database, Hibernate keeps in the session a version of the entities already persisted, just in case they are modified again before the session is closed.

But many times we can safely discard entities once the corresponding inserts have been done in the database. This releases memory in the Java client process, preventing performance problems caused by long-running Hibernate sessions.

Such long-running sessions should be avoided as much as possible, but if for some reason they are needed, this is how to contain memory consumption:

entityManager.flush();
entityManager.clear();

The flush will trigger the inserts from new entities to be sent to the database. The clear releases the new entities from the session.
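In a long-running insert loop, the two calls are typically issued every N entities. The sketch below shows that pattern under one assumption: PersistenceOps is a hypothetical stand-in interface exposing only the persist, flush and clear methods, so the example compiles without JPA on the classpath. In a real application the EntityManager would be used directly:

```java
import java.util.List;

// Hypothetical stand-in for the three EntityManager methods used here.
interface PersistenceOps {
    void persist(Object entity);
    void flush();
    void clear();
}

class PeriodicFlush {

    // Persists all entities, flushing and clearing every `interval` entities
    // so the session never holds more than `interval` of them.
    static void persistAll(PersistenceOps em, List<?> entities, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        int pending = 0;
        for (Object entity : entities) {
            em.persist(entity);
            if (++pending == interval) {
                em.flush(); // send the queued inserts to the database
                em.clear(); // release the entities from the session
                pending = 0;
            }
        }
        if (pending > 0) { // leftovers of the last, partial chunk
            em.flush();
            em.clear();
        }
    }
}
```

A common choice is to make the interval equal to the configured JDBC batch size, so that each flush sends full batches.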

Quick-win Tip 4 - Reduce Hibernate dirty-checking overhead

Hibernate internally uses a mechanism called dirty-checking to keep track of modified entities. This mechanism is not based on the equals and hashCode methods of the entity classes.

Hibernate does its best to keep the performance cost of dirty-checking to a minimum, and to dirty-check only when it needs to, but the mechanism does have a cost, which is more noticeable in tables with a large number of columns.

Before applying any optimization, the most important step is to measure the cost of dirty-checking using VisualVM.

How to avoid dirty-checking?

In Spring business methods that we know are read-only, dirty-checking can be turned off like this:

@Transactional(readOnly=true)
public void someBusinessMethod() {
    ...
}

An alternative way to avoid dirty-checking is to use the Hibernate Stateless Session, which is detailed in the documentation.

Quick-win Tip 5 - Search for 'bad' query plans

Check the queries in the slowest queries list to see if they have good query plans. The most usual 'bad' query plans are:

  • Full table scans: the whole table is scanned, usually because of a missing index or outdated table statistics.

  • Full cartesian joins: This means that the full cartesian product of several tables is being computed. Check for missing join conditions, or if this can be avoided by splitting a step into several.

Quick-win Tip 6 - Check for wrong commit intervals

If you are doing batch processing, the commit interval can make a large difference in performance, on the order of 10 to 100 times faster.

Confirm that the commit interval is the one expected (usually around 100-1000 for Spring Batch jobs). This parameter is often not configured correctly.

Quick-win Tip 7 - Use the second-level and query caches

If some data is identified as being eligible for caching, then have a look at this blog post for how to set up Hibernate caching: Pitfalls of the Hibernate Second-Level / Query Caches

Conclusions

To solve application performance problems, the most important action is to collect metrics that make it possible to find the current bottleneck.

Without metrics, it is often not possible to guess the actual cause of the problem in a reasonable amount of time.

Also, many but not all of the typical performance pitfalls of a 'database-driven' application can be avoided in the first place by using the Spring Batch framework.

 

Published at DZone with permission of Aleksey Novik, author and DZone MVB. (source)
