
Jackrabbit 1.5 vs 1.6 Query Performance

September 2nd, 2009 | No Comments | Posted in java

Yes, I’m still talking about Jackrabbit query performance. But this time, I finally have something positive to say.

In our existing Jackrabbit setup, we are using version 1.5.0. I thought I would try out version 1.6 to see if it provides any query performance boost. The short answer: yes, it does.

Test Setup

My test setup is really basic. I created a simple program that would create 100 threads, each running the same query at the same time. I then measured how long it took for all 100 queries to complete. You might say this vaguely represents 100 concurrent connections, but I just intended the test to run the same query over and over. For each query type (more on that later), I ran the test program 3 separate times for Jackrabbit 1.5.0 and 3 separate times for Jackrabbit 1.6.
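
A minimal sketch of a harness along those lines, not the actual test program. The class name `ConcurrentQueryTimer`, the method `timeConcurrentRuns`, and the sleeping `fakeQuery` are all assumptions made for illustration; a real run would replace the placeholder with code that executes the XPath query through a Jackrabbit session.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ConcurrentQueryTimer {

    /**
     * Starts `threads` workers, releases them at the same moment, and returns
     * the wall-clock milliseconds until the last worker has finished.
     */
    public static long timeConcurrentRuns(int threads, Callable<?> query) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch ready = new CountDownLatch(threads); // every worker has started
        CountDownLatch start = new CountDownLatch(1);       // shared starting gun
        List<Future<Object>> results = new ArrayList<>();
        try {
            Callable<Object> worker = () -> {
                ready.countDown();
                start.await();       // block until all workers are lined up
                return query.call();
            };
            for (int i = 0; i < threads; i++) {
                results.add(pool.submit(worker));
            }
            ready.await();
            long begin = System.nanoTime();
            start.countDown();
            for (Future<Object> result : results) {
                result.get();        // rethrows any failure from a worker
            }
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder workload (assumption): stands in for one repository query.
        Callable<Void> fakeQuery = () -> {
            Thread.sleep(20);
            return null;
        };
        // Three separate runs, as in the test described above.
        for (int run = 1; run <= 3; run++) {
            System.out.println("Run " + run + ": " + timeConcurrentRuns(100, fakeQuery) + " ms");
        }
    }
}
```

The two latches make sure the clock only starts once all threads exist, so thread start-up cost is not counted as query time.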

Query Types

Looking through our application code, I came up with some basic query types that we use. These are very general queries intended to help point out what types of queries perform better in version 1.6. All the queries tested are written in XPath.

Single Property
//element(*,my:type)[@property='value']

Two Properties
//element(*,my:type)[@property1='value1' and @property2='value2']

Like on Property
//element(*,my:type)[jcr:like(@property,'value%')]

Like on Child Property
//element(*,my:type)[jcr:like(./child/@property,'value%')]

Likes on Two Child Properties
//element(*,my:type)[jcr:like(./child/@property1,'value1%') and jcr:like(./child/@property2,'value2%')]

If Child Property Exists Or Is Not
//element(*,my:type)[not(./child/@property) or ./child/@property!='value']

Results

Query Type                      v1.5 Avg    v1.6 Avg    Improvement
Single Property                 28.5 s      20.3 s      29 %
Two Properties                  16.7 s      9.7 s       42 %
Like on Property                17.8 s      10.2 s      43 %
Like on Child Property          94.5 s      42.8 s      55 %
Like on Two Child Properties    65.3 s      34.3 s      47 %
If Child Exists Or Is Not       137.4 s     55.4 s      60 %
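
The improvement column is consistent with the relative reduction in average time, (v1.5 - v1.6) / v1.5. The sketch below recomputes it from the averages in the table; the class and method names are made up for illustration, and the formula is an assumption inferred from the numbers rather than something stated in the post.

```java
public class Improvement {

    /** Percent reduction in time going from oldSeconds to newSeconds. */
    public static double percentImprovement(double oldSeconds, double newSeconds) {
        return (oldSeconds - newSeconds) / oldSeconds * 100.0;
    }

    public static void main(String[] args) {
        // Averages copied from the results table: {v1.5, v1.6} in seconds.
        double[][] rows = {
            {28.5, 20.3}, {16.7, 9.7}, {17.8, 10.2},
            {94.5, 42.8}, {65.3, 34.3}, {137.4, 55.4}
        };
        for (double[] row : rows) {
            // Rounded to the nearest whole percent, as in the table.
            System.out.println(Math.round(percentImprovement(row[0], row[1])) + " %");
        }
    }
}
```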

Summary

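So what do the results show us? First, that if you want increased query performance, moving to v1.6 is something you should really consider. Second, v1.6 shows its largest gains on queries that traverse the child axis: the three child-property queries improved by 47 to 60 percent, against 29 to 43 percent for the queries on a node's own properties.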


High Performance Jackrabbit, Where Are You?

August 14th, 2009 | 4 Comments | Posted in java

So I’ve had a good amount of time running a high-traffic content site using Apache Jackrabbit as the content store. Jackrabbit provides a nice, flexible way to store a variety of content. The one thing that is lacking for me is performance.

I’ve looked around the Jackrabbit mailing list and wiki and there are a few points about how to get better performance out of Jackrabbit. Most of these center around how you structure your nodes and how to write better “optimized” queries. That is all fine and dandy, but my problem comes when Jackrabbit is put under heavy load from many concurrent connections.

With lots of concurrent queries, I noticed the site's response times getting dramatically worse. I tweaked the queries as much as I could, but I soon figured that I would have to get under the hood of Jackrabbit to make any gains. And just to give you the short answer, I didn’t find any answers.

First, Jackrabbit does not have a pluggable cache system, so the idea of “maybe if I just tweak the cache, things will get better” doesn't go far. I’ve read many postings on the mailing list saying that search results are tied to a search session, so even if you could cache search results, you could run into problems with that session down the line. Fixing this is very hard unless you want to change the cache code within org.apache.jackrabbit itself. I didn’t feel like maintaining a custom fork of Jackrabbit just to play with caching, so I soon backed off the caching idea.

Another thing I thought about was increasing the number of connections accessing the Jackrabbit repository. Well, Jackrabbit isn’t able to use a connection pool. Instead, it opens a handful of persistent connections to our database (in my case, MySQL). So just adding more connections is out.

I asked on the mailing list several times about how Jackrabbit handles concurrent query requests. I never got a straight answer. But I was lucky enough to talk with two other people who had previously used Jackrabbit in similar projects. Through them I got the answer I didn’t want to hear: Jackrabbit isn’t actually able to handle concurrent queries well. One of them told me that deep within the bowels of the Jackrabbit code, there are bits of synchronized code that ultimately turn Jackrabbit into a single-threaded process. So there goes your ability to handle simultaneous queries. The few answers I got from the mailing list did mention that most Jackrabbit queries actually hit the internal cache, not the database. So I don’t know if these synchronized bits of code affect this or not.
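
To illustrate the general effect being described, here is a small sketch of how a single synchronized method serializes callers. It is not Jackrabbit code: `SharedIndex`, `query`, and `runThreads` are hypothetical names, and the sleep is a stand-in for real work done while holding the lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class SynchronizedBottleneck {

    // Hypothetical shared component guarded by one lock (not Jackrabbit code).
    static class SharedIndex {
        synchronized void query(long workMillis) throws InterruptedException {
            Thread.sleep(workMillis); // simulated work; the lock stays held while sleeping
        }
    }

    /** Runs `threads` callers against one SharedIndex and returns elapsed milliseconds. */
    public static long runThreads(int threads, long workMillis) throws InterruptedException {
        SharedIndex index = new SharedIndex();
        List<Thread> workers = new ArrayList<>();
        long begin = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            Thread worker = new Thread(() -> {
                try {
                    index.query(workMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
    }

    public static void main(String[] args) throws InterruptedException {
        // Ten callers at 50 ms each: the calls run one after another, not in parallel.
        System.out.println("Elapsed: " + runThreads(10, 50) + " ms");
    }
}
```

Because only one thread can be inside `query` at a time, total time grows roughly with the number of callers multiplied by the work per call, no matter how many threads are started.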

Well, maybe there is a way to have a read-only version of Jackrabbit to speed things up? Nope. As of version 1.5, this isn’t available.

So where does that leave me? I’ve had to start splitting my data between Jackrabbit and a traditional database structure fronted by Hibernate. I put all content where the schema is flexible, like articles, in Jackrabbit. For content that has a rigid schema, like comments, I put those in the traditional database.

I know that Magnolia uses Jackrabbit, but I haven’t spent a good deal of time with their code. For my system, I am using Spring and Spring Modules to access Jackrabbit. Magnolia doesn’t use Spring, and I thought I saw a class that mentioned something about multi-threaded requests. So maybe they have figured out a way around the performance problems.

Until then, I will just have to keep banging on Jackrabbit in hopes that it will speed up.
