
Jackrabbit 1.5 vs 1.6 Query Performance

September 2nd, 2009 | No Comments | Posted in java

Yes, I’m still talking about Jackrabbit query performance. But this time, I finally have something positive to say.

In our existing Jackrabbit setup, we are using version 1.5.0. I thought I would try out version 1.6 to see if it provides any query performance boost. The short answer: yes, it does.

Test Setup

My test setup is really basic. I created a simple program that would create 100 threads, each running the same query at the same time. I then measured how long it took for all 100 queries to complete. You might say this vaguely represents 100 concurrent connections, but I just intended the test to run the same query over and over. For each query type (more on that later), I ran the test program 3 separate times for Jackrabbit 1.5.0 and 3 separate times for Jackrabbit 1.6.
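A test program along these lines could be sketched as follows. This is not the original program: the class name ConcurrentQueryTimer, the method timeConcurrentRuns, and the placeholder task are assumptions for illustration, and the real Jackrabbit query call would go where the placeholder Runnable is.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical harness: not the author's original test program.
final class ConcurrentQueryTimer {

    /**
     * Runs the task once on each of threadCount threads, releasing them
     * together, and returns the wall-clock milliseconds until the last
     * one finishes.
     */
    static long timeConcurrentRuns(int threadCount, Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        CountDownLatch ready = new CountDownLatch(threadCount);
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threadCount);
        try {
            for (int i = 0; i < threadCount; i++) {
                pool.execute(() -> {
                    ready.countDown();          // this worker is waiting at the gate
                    try {
                        start.await();          // block until every worker is ready
                        task.run();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();       // count the run even if the task failed
                    }
                });
            }
            ready.await();
            long begin = System.nanoTime();
            start.countDown();                  // release all workers at once
            done.await();
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while timing", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        AtomicInteger executed = new AtomicInteger();
        // Placeholder task: replace with the code that executes the query under test.
        Runnable query = executed::incrementAndGet;
        long elapsedMs = timeConcurrentRuns(100, query);
        System.out.println(executed.get() + " runs finished in " + elapsedMs + " ms");
    }
}
```

The start latch keeps thread start-up time out of the measurement, so the number reflects only the time the 100 runs spend competing with each other.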

Query Types

Looking through our application code, I came up with some basic query types that we use. These are very general queries intended to help point out what types of queries perform better in version 1.6. All the queries tested are written in XPath.

Single Property
//element(*,my:type)[@property='value']

Two Properties
//element(*,my:type)[@property1='value1' and @property2='value2']

Like on Property
//element(*,my:type)[jcr:like(@property,'value%')]

Like on Child Property
//element(*,my:type)[jcr:like(./child/@property,'value%')]

Likes on Two Child Properties
//element(*,my:type)[jcr:like(./child/@property1,'value1%') and jcr:like(./child/@property2,'value2%')]

If Child Property Exists Or Is Not
//element(*,my:type)[not(./child/@property) or ./child/@property!='value']

Results

Query Type                     v1.5 Avg   v1.6 Avg   % Improvement
Single Property                28.5 s     20.3 s     29 %
Two Properties                 16.7 s     9.7 s      42 %
Like on Property               17.8 s     10.2 s     43 %
Like on Child Property         94.5 s     42.8 s     55 %
Like on Two Child Properties   65.3 s     34.3 s     47 %
If Child Exists Or Is Not      137.4 s    55.4 s     60 %
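
The improvement column appears to be the relative reduction in average time, (v1.5 - v1.6) / v1.5. That reading is an assumption, though it matches the rounded figures in the results. A minimal sketch of the arithmetic, with a hypothetical class and method name:

```java
// Hypothetical helper: reproduces the "% Improvement" arithmetic under the
// assumption that it is the relative reduction in average elapsed time.
final class Improvement {

    /** Percentage reduction in elapsed time going from oldSeconds to newSeconds. */
    static double percentImprovement(double oldSeconds, double newSeconds) {
        return (oldSeconds - newSeconds) / oldSeconds * 100.0;
    }

    public static void main(String[] args) {
        // Averages from the Single Property row of the results
        System.out.println(Math.round(percentImprovement(28.5, 20.3)) + " %"); // prints 29 %
    }
}
```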

Summary

So what do the results show us? First, if you want better query performance, moving to v1.6 is something you should really consider. Second, v1.6 shows its largest gains (47 to 60 %) on queries that reach across the child axis to test a child node’s property.


Jackrabbit Query Tips: Better Where Clauses

August 27th, 2009 | No Comments | Posted in java

If you’ve paid attention, you have probably noticed that I have a love/hate relationship with Jackrabbit. Luckily, this week I ran into a developer who has been successfully running a high-traffic Jackrabbit site for several years. One of the major tips he gave me was to look at how I structured my queries. This was something that I toyed with a few months ago, but never put into production. So for the last few days I’ve been tweaking Jackrabbit queries. Everything that I’m doing can be found on the Jackrabbit mailing list, but I thought I would summarize it here for those who are interested.

Note: All my queries are in XPath. I’m sure these same ideas apply to SQL queries; I just haven’t done the conversions myself.

Use Meaningful Where Clauses

Where clauses are a must in Jackrabbit queries. The way that a Jackrabbit query works is that it first finds all entries that match the where clause, then filters those results by any path limitations. So if your where clauses are not restrictive, Jackrabbit will have to do a lot of extra work to find the desired results.

Say we have blog post data mixed in with product review data. If our content is organized using Rule 2 of David’s Model, it would look something like:

/mysite/mycontent/blogs/2009/08/27/…
/mysite/mycontent/reviews/2009/08/27/…

In this setup, our content is organized hierarchically by content type and then date.

Now we can query the content to find all blogs by doing:

/jcr:root/mysite/mycontent/blogs//*

The downside is that this query will actually get ALL elements, blogs and reviews, then loop through those to find which ones belong in the /mysite/mycontent/blogs path. So what you can do is add a property to your content. I use something like @contentType. In your app, you would assign values to this property like 'blog' or 'review'. So all blog entries would get @contentType='blog' and all reviews would get @contentType='review'. This will greatly help our query because now we can do:

/jcr:root/mysite/mycontent/blogs//*[@contentType='blog']

What happens in this query is that Jackrabbit first matches all elements with @contentType='blog', then it filters by the path /mysite/mycontent/blogs. Say you have 1,000 blogs and 1,000 reviews. Just by adding @contentType='blog', you essentially cut in half the number of nodes that Jackrabbit has to analyze during the final part of this query.
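
If the application assembles these queries in code, a small helper keeps the where clause from being forgotten. The sketch below is only an illustration: XPathQueryBuilder and descendantsWhere are hypothetical names, not part of Jackrabbit or the JCR API, and it only builds the query string. Executing it through the repository is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds the XPath string only, it does not run the query.
final class XPathQueryBuilder {

    /** Builds /jcr:root<path>//*[@name='value' and ...] from a path and property filters. */
    static String descendantsWhere(String path, Map<String, String> properties) {
        StringBuilder query = new StringBuilder("/jcr:root").append(path).append("//*");
        if (!properties.isEmpty()) {
            StringJoiner predicates = new StringJoiner(" and ", "[", "]");
            for (Map.Entry<String, String> p : properties.entrySet()) {
                // A single quote inside an XPath string literal is escaped by doubling it
                String value = p.getValue().replace("'", "''");
                predicates.add("@" + p.getKey() + "='" + value + "'");
            }
            query.append(predicates);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> where = new LinkedHashMap<>();
        where.put("contentType", "blog");
        // prints /jcr:root/mysite/mycontent/blogs//*[@contentType='blog']
        System.out.println(descendantsWhere("/mysite/mycontent/blogs", where));
    }
}
```

A LinkedHashMap is used so the predicates appear in the order they were added, which makes the generated query easy to compare against the DEBUG log.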

So look at your queries. Are there any other properties that you can add to the where clause? Possibly a date field like start date or created date?

Move Some Path Data to Properties

The mailing list mentions that there are ways to have Jackrabbit index the full path of a node, but it isn’t an easy thing to change and it also hinders moving nodes around easily. So what I would suggest is look for parts of your path that can work as properties like we did with the @contentType above.

The system I am using hosts multiple websites within the same Jackrabbit workspace. Each site is separated into a different path.

/sites/site1
/sites/site2

One thing that we did is add the site as a property. So all nodes for site "site1" have the property @site='site1'. Then in our query, we are able to add that property to the where clause:

//*[@site='site1' and @contentType='blog']
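
For this to work, the @site value has to be written whenever a node is saved. One way to keep it in step with the path is to derive it from the path at write time. The following is a minimal sketch under that assumption; SiteProperty and siteFromPath are hypothetical names, and setting the property on the node itself is not shown.

```java
// Hypothetical helper: derives the value for @site from a node path under /sites.
final class SiteProperty {

    private static final String SITES_PREFIX = "/sites/";

    /** Returns the path segment directly below /sites, or null if there is none. */
    static String siteFromPath(String nodePath) {
        if (!nodePath.startsWith(SITES_PREFIX)) {
            return null;
        }
        int start = SITES_PREFIX.length();
        int end = nodePath.indexOf('/', start);
        String site = end < 0 ? nodePath.substring(start) : nodePath.substring(start, end);
        return site.isEmpty() ? null : site;
    }

    public static void main(String[] args) {
        // Example path is made up for illustration
        System.out.println(siteFromPath("/sites/site1/blogs/2009/08/27/post")); // prints site1
    }
}
```

The trade-off is the one noted above for path indexing: if a node is moved to another site, the property has to be updated along with it.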

Debugging Help

A great way to find what queries are running is to turn on DEBUG logging for org.apache.jackrabbit.core.query.QueryImpl. Every time a query is executed, the log will show the query and how long it took to execute. By watching the logs, you can focus your attention on queries that take a long time to run.

Summary

As you can see, just by tweaking your query you can greatly improve your Jackrabbit performance. One thing that helped me a lot is I created a script that runs the same query 100 times simultaneously and records how long it took to run all 100 queries. I then continually tweak the query and re-run the script until I find a query that works best.


High Performance Jackrabbit, Where Are You?

August 14th, 2009 | 3 Comments | Posted in java

So I’ve had a good amount of time running a high-traffic content site using Apache Jackrabbit as the content store. Jackrabbit provides a nice, flexible way to store a variety of content. The one thing that is lacking for me is performance.

I’ve looked around the Jackrabbit mailing list and wiki and there are a few points about how to get better performance out of Jackrabbit. Most of these center around how you structure your nodes and how to write better “optimized” queries. That is all fine and dandy, but my problem comes when Jackrabbit is put under heavy load from many concurrent connections.

With lots of concurrent queries, I noticed the site response time dropping dramatically. I tweaked the queries as much as I could, but I soon figured that I would have to get under the hood of Jackrabbit to make any gains. And just to give you the short answer, I didn’t find any answers.

First, Jackrabbit does not have a pluggable cache system, so the idea that things will get better “if I just tweak the cache” doesn’t get very far. I’ve read many postings on the mailing list saying that search results are tied to a search session. So even if you could cache search results, you could run into problems with the session down the line. Fixing this is very hard unless you want to change the cache code within org.apache.jackrabbit. I didn’t feel like maintaining a custom fork of Jackrabbit just to play with caching, so I soon backed off the caching idea.

Another thing I thought about was increasing the number of connections accessing the Jackrabbit repository. Well, Jackrabbit isn’t able to use a connection pool. Instead, it opens a handful of persistent connections to our database (in my case, MySQL). So just adding more connections is out.

I asked on the mailing list several times about how Jackrabbit handles concurrent query requests. I never got a straight answer. But I was lucky enough to talk with two other people who had previously used Jackrabbit in similar projects. Through them I got the answer I didn’t want to hear: Jackrabbit isn’t actually able to handle concurrent queries well. One of them told me that deep within the bowels of the Jackrabbit code, there are bits of synchronized code that ultimately turn Jackrabbit into a single-threaded process. So there goes your ability to handle simultaneous queries. The few answers I got from the mailing list did mention that most Jackrabbit queries actually hit the internal cache, not the database. So I don’t know if these synchronized bits of code affect this or not.
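
To make the effect of a synchronized section concrete, here is a generic Java illustration. It is not Jackrabbit code, and all names in it are made up. However many threads call query(), only one at a time is ever inside the synchronized block, so the calls run one after another.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Generic illustration only: NOT Jackrabbit code. Shows how one synchronized
// block serializes callers regardless of how many threads are used.
final class SynchronizedBottleneck {

    private final Object lock = new Object();
    private final AtomicInteger inside = new AtomicInteger();
    private final AtomicInteger maxInside = new AtomicInteger();

    /** Stand-in for a query whose work happens inside a synchronized block. */
    void query() {
        synchronized (lock) {
            int now = inside.incrementAndGet();
            maxInside.accumulateAndGet(now, Math::max);  // record peak overlap
            try {
                Thread.sleep(5);                         // simulated work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                inside.decrementAndGet();
            }
        }
    }

    /** Calls query() from threadCount threads and returns the peak number that overlapped. */
    static int peakConcurrency(int threadCount) {
        SynchronizedBottleneck repo = new SynchronizedBottleneck();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(repo::query);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return repo.maxInside.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent queries: " + peakConcurrency(8)); // prints peak concurrent queries: 1
    }
}
```

With eight threads and 5 ms of simulated work each, the total time is roughly eight times the single-call time, which is the pattern you would expect to see if a shared lock sits on the query path.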

Well, maybe there is a way to have a read-only version of Jackrabbit to speed things up? Nope. As of version 1.5, this isn’t available.

So where does that leave me? I’ve had to start splitting my data between Jackrabbit and a traditional database structure fronted by Hibernate. I put all content where the schema is flexible, like articles, in Jackrabbit. For content that has a rigid schema, like comments, I put those in the traditional database.

I know that Magnolia uses Jackrabbit, but I haven’t spent much time with their code. For my system, I am using Spring and Spring Modules to access Jackrabbit. Magnolia doesn’t use Spring, and I thought I saw a class that mentioned something about multi-threaded requests. So maybe they have figured out a way around the performance problems.

Until then, I will just have to keep banging on Jackrabbit in hopes that it will speed up.


Using Jackrabbit to Store Velocity Templates in Spring

November 23rd, 2008 | 1 Comment | Posted in java

I love Velocity. It is simple and quick to pick up. Plus, it plays nicely with MVC design patterns. On a recent project, I wanted to use Velocity templates as my views in Spring MVC. Setting that up is pretty straightforward. You just need to add a few entries to your ***-servlet.xml:

<bean id="velocityConfig"
      class="org.springframework.web.servlet.view.velocity.VelocityConfigurer">
    <property name="resourceLoaderPath" value="/" />
</bean>

<bean id="viewResolver"
      class="org.springframework.web.servlet.view.velocity.VelocityViewResolver">
    <property name="cache" value="true" />
    <property name="prefix" value="" />
    <property name="suffix" value=".vm" />
    <property name="exposeSpringMacroHelpers" value="false" />
</bean>

The velocityConfig bean initializes the Velocity engine and sets any Velocity-specific properties. The viewResolver bean tells Spring to use Velocity as the view layer instead of the default JSP.

So it’s that simple to get velocity working in Spring. On my current project, I am using Apache Jackrabbit to store content. I started thinking, why not store my velocity templates in Jackrabbit also? It may be a bit of a philosophical reason on where to store your templates, but for this project I wanted to limit the amount of server access needed by the site administrators. So by storing the templates in Jackrabbit, or a database, I can allow the templates to be modified through web forms. Again, this decision is more philosophical and not really the intent of this posting.

Back to the main topic: how to store your Velocity templates in Jackrabbit and then have Spring use those templates. This is actually really simple to do. First, you need to get your Spring application set up to handle Jackrabbit. I did this by following the instructions to set up the springmodules-jcr module of Spring Modules. Once you set up springmodules-jcr, you will most likely have a Spring bean that interacts with Jackrabbit. For this example, I will call that bean “jcrService”.

We are going to use the ResourceLoader feature of Velocity. The ResourceLoader was built to do exactly what we are doing. It allows you to override where and how your template files are stored. Some existing ResourceLoaders include DataSourceResourceLoader, JarResourceLoader and URLResourceLoader. We need to make a JcrResourceLoader.

Here is the code for my JcrResourceLoader:

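import java.io.InputStream;
import javax.jcr.Node;
import org.apache.commons.collections.ExtendedProperties;
import org.apache.velocity.exception.ResourceNotFoundException;
import org.apache.velocity.runtime.resource.Resource;
import org.apache.velocity.runtime.resource.loader.ResourceLoader;

public class JcrResourceLoader extends ResourceLoader {

    // Spring-injected bean that wraps access to the Jackrabbit repository
    private JcrService jcrService;

    public JcrService getJcrService() {
        return jcrService;
    }

    public void setJcrService(JcrService jcrService) {
        this.jcrService = jcrService;
    }

    @Override
    public void init(ExtendedProperties configuration) {
        // Nothing to configure here; jcrService is injected by Spring
    }

    @Override
    public InputStream getResourceStream(String name)
            throws ResourceNotFoundException {
        try {
            // The template body is read from the jcr:data property of jcr:content
            Node node = jcrService.getNode(name);
            Node content = node.getNode("jcr:content");
            if (content.hasProperty("jcr:data")) {
                return content.getProperty("jcr:data").getStream();
            }
        } catch (Exception e) {
            log.error("could not load template for path: " + name, e);
        }
        // Signal a missing template with an exception instead of returning null
        throw new ResourceNotFoundException("could not load template for path: " + name);
    }

    @Override
    public boolean isSourceModified(Resource resource) {
        // Prototype: always report modified, so a cached template is re-read when checked
        return true;
    }

    @Override
    public long getLastModified(Resource resource) {
        // Prototype: modification times are not tracked
        return 0;
    }
}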

Now we need to tell our velocity configuration about this resource loader. In our ***-servlet.xml file, we need to create a bean for our resource loader and to change our velocity config parameters.

<bean id="velocityConfig"
      class="org.springframework.web.servlet.view.velocity.VelocityConfigurer">
    <property name="resourceLoaderPath" value="/" />
    <property name="velocityPropertiesMap">
        <map>
            <entry key="resource.loader" value="jcr" />
            <entry key="jcr.resource.loader.instance" value-ref="jcrResourceLoader" />
        </map>
    </property>
</bean>

<bean id="jcrResourceLoader" class="JcrResourceLoader">
    <property name="jcrService" ref="jcrService" />
</bean>

And that’s it. Now when your Spring controller tries to load a Velocity template, it will use the JcrResourceLoader to look up and load it. This code is just a first prototype and will need to be cleaned up for error checking and performance.
