
Centos and mod_ssl dependency problems

January 23rd, 2010 | 1 Comment | Posted in Uncategorized

I recently wanted to add mod_ssl to a Centos 5 server that I administer. Simple enough. Everything online says to just do a “yum install mod_ssl” and you’re done. Well, I did “yum install mod_ssl” and I got the error message:

...
Finished Dependency Resolution
httpd-devel-2.2.3-31.el5.centos.2.i386 from updates has depsolving
problems
--> Missing Dependency: httpd = 2.2.3-31.el5.centos.2 is needed by
package httpd-devel-2.2.3-31.el5.centos.2.i386 (updates)
Error: Missing Dependency: httpd = 2.2.3-31.el5.centos.2 is needed by
package httpd-devel-2.2.3-31.el5.centos.2.i386 (updates)

Great! Yum was supposed to handle all my dependencies for me, but it looks like it is broken. I googled around for a bit and didn’t find anything helpful, just other people with a similar problem. Then I ran across a message on the Centos mailing list that solved my problem.

Basically, the server I am running is 64 bit, but for some reason there was a 32 bit version of httpd-devel installed that was causing the conflict. Remove the 32 bit httpd-devel by doing “rpm -e httpd-devel.i386”. Then “yum install mod_ssl” will work just as easily as everyone says it does.

Announcing Tweet Plus1

October 7th, 2009 | No Comments | Posted in social media

I recently launched a new site, Tweet Plus1 (http://tweetplus1.com). Tweet Plus1 is an easy way for people to acknowledge tweets that they think are of high quality. The system is simple and users don’t even have to create an account.

So what do you do? Just include #+1 in your tweet. That’s it. Then you can go to http://tweetplus1.com and see which users and topics have the most +1s. For more examples and information about Tweet Plus1, visit the How It Works page.


Adding External Datasources to Lucene Scoring

September 22nd, 2009 | No Comments | Posted in java

Here is a common scenario that a lot of websites encounter. Say you have a nice lucene index setup to handle searching on your site. For an example, let’s use an ecommerce site where all the products are stored in a lucene index. You’ve tweaked the query parameters and you think the results are fairly accurate. Now you want to make your results even better by adding a boost to products that are popular in your store. How do you do this?

The simplest, but least scalable, solution is to add a popularity field to your lucene index. Periodically you would run a job that ranks all your products by popularity in 1 to X order, then saves this rank as a field in each lucene document. Then, using lucene’s FieldCacheSource and ValueSourceQuery, you can add this popularity field as part of the query score.

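import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.function.DocValues;
import org.apache.lucene.search.function.FieldCacheSource;

public class PopularityFieldSource extends FieldCacheSource {

    public PopularityFieldSource(String field) {
        super(field);
    }

    @Override
    public DocValues getCachedFieldValues(FieldCache cache,
            String field, IndexReader reader) throws IOException {
        // one popularity rank per document, in index order
        int[] popularities = cache.getInts(reader, field);

        final float[] weights = new float[popularities.length];

        for (int i = 0; i < popularities.length; i++) {
            // create an inverse of the popularity value; the 1f forces float
            // division (in int math, 1 / rank is 0 for every rank above 1)
            if (popularities[i] > 0) {
                weights[i] = 1f + 1f / popularities[i];
            }
            else {
                weights[i] = 1f;
            }
        }

        return new DocValues() {

            public float floatVal(int doc) {
                return weights[doc];
            }

            public int intVal(int doc) {
                return (int) weights[doc];
            }

            public String toString(int doc) {
                return description() + '=' + floatVal(doc);
            }
        };

    }

    // FieldCacheSource also declares these two methods as abstract
    public boolean cachedFieldSourceEquals(FieldCacheSource other) {
        return other.getClass() == PopularityFieldSource.class;
    }

    public int cachedFieldSourceHashCode() {
        return PopularityFieldSource.class.hashCode();
    }
}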

What happens in the line with

int[] popularities = cache.getInts(reader, field);

is that lucene will create an array of the popularity field values for all the lucene documents. This is a key point: this array follows the document order within the lucene index. If the document order changes (you add or delete documents from the index), the order of this array will change.

Now you just need to use a ValueSourceQuery to use this PopularityFieldSource.

Query query = ... your lucene query
PopularityFieldSource fieldSource = new PopularityFieldSource("name_of_popularity_field");
ValueSourceQuery valueQuery = new ValueSourceQuery(fieldSource);
CustomScoreQuery customQuery = new CustomScoreQuery(query, valueQuery);

So now your customQuery will incorporate popularity into your search results. I based this on Rob Young’s blog about Extending Lucene’s Scoring to use the document creation date to boost newer documents.

As I said earlier, this works, but it is not the most scalable solution. First, you probably keep all your analytics data in a separate database instead of in lucene. And secondly, your lucene index is changing all the time, so you cannot constantly run a job to update the popularity field in your index.

Don’t worry, there is a way to include data external to the lucene index at query time.

The first assumption is that each document in your lucene index has a field that is used as the “id” for the document. In our ecommerce example, that field would normally be the “sku” or product id. The second assumption is that we can create a map, Map<String, Float>, of the popularity rankings for our products. This is done outside of lucene and can just be a simple database call that ranks all your products and then stores that ranking in a map with the product sku as the map key.

Now we can change our PopularityFieldSource to use this map of rankings.

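import java.io.IOException;
import java.util.Arrays;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.function.DocValues;
import org.apache.lucene.search.function.FieldCacheSource;

public class PopularityFieldSource extends FieldCacheSource {

    private static final String POPULAR_FIELD = "_popular";
    private static final String ID_FIELD = "sku";

    // typed map, so values.get(...) can be assigned to a float
    private final Map<String, Float> values;

    public PopularityFieldSource(Map<String, Float> values) {
        super(POPULAR_FIELD);
        this.values = values;
    }

    @Override
    public DocValues getCachedFieldValues(FieldCache cache,
            String field, IndexReader reader) throws IOException {
        // one sku per document, in index order
        String[] skus = cache.getStrings(reader, ID_FIELD);

        final float[] weights = new float[skus.length];

        if (values != null) {
            for (int i = 0; i < skus.length; i++) {
                if (values.get(skus[i]) != null) {
                    weights[i] = values.get(skus[i]);
                }
                else {
                    weights[i] = 1;
                }
            }
        }
        else {
            Arrays.fill(weights, 1);
        }

        return new DocValues() {

            public float floatVal(int doc) {
                return weights[doc];
            }

            public int intVal(int doc) {
                return (int) weights[doc];
            }

            public String toString(int doc) {
                return description() + '=' + floatVal(doc);
            }
        };

    }

    // FieldCacheSource also declares these two methods as abstract
    public boolean cachedFieldSourceEquals(FieldCacheSource other) {
        return other.getClass() == PopularityFieldSource.class;
    }

    public int cachedFieldSourceHashCode() {
        return PopularityFieldSource.class.hashCode();
    }

}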

Let’s go over the few changes.

First, since the popularity field is not stored in our lucene index, we have to fiddle with the “field” name used. In this example we are hard coding our id field, in this case “sku”. We are also saying that our PopularityFieldSource will be used for field “_popular”. The “_popular” field doesn’t exist, but don’t worry, that field name is only used for debugging, so you can name it whatever you want.

When we create the PopularityFieldSource, we pass in our map of weights. The weight values are based on our popularity rankings. In this case, the weight = 1 + 1/ranking. I wanted to make the weight non-zero because I found that when the weight was zero, documents would be excluded from the search results. So this weight is just a simple way to keep the weights in the range of 2 (the highest) down to 1 (the lowest). We also handle the case where a document does not appear in the popularity rankings: it still gets a weight of 1.
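As a rough illustration of the weight = 1 + 1/ranking idea, here is a minimal sketch that turns an ordered list of skus (most popular first) into the map of weights. The class name PopularityWeights, the method fromRanking and the sample skus are all made up for this example; in practice the ordered list would come from whatever analytics query you already have.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PopularityWeights {

    /**
     * Converts an ordered list of skus (most popular first) into weights.
     * Rank 1 gets 1 + 1/1 = 2.0, rank 2 gets 1.5, and so on toward 1.0.
     */
    public static Map<String, Float> fromRanking(List<String> skusByPopularity) {
        Map<String, Float> weights = new HashMap<String, Float>();
        for (int i = 0; i < skusByPopularity.size(); i++) {
            int rank = i + 1; // ranks start at 1, so we never divide by zero
            weights.put(skusByPopularity.get(i), 1f + 1f / rank);
        }
        return weights;
    }

    public static void main(String[] args) {
        // hypothetical skus, as they might come back from an analytics query
        List<String> ranked = Arrays.asList("sku-42", "sku-7", "sku-13");
        Map<String, Float> weights = fromRanking(ranked);
        System.out.println(weights.get("sku-42")); // prints 2.0
        System.out.println(weights.get("sku-7"));  // prints 1.5
    }
}
```

Skus that never appear in the list are simply absent from the map, which is the case where PopularityFieldSource falls back to a weight of 1.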

As in the first version of PopularityFieldSource, the document order within the lucene index is important. So we have to find a way to relate our weightings to the particular document in the lucene index.

In the line:

String[] skus = cache.getStrings(reader, ID_FIELD);

we get an array of the sku field values for all documents in the lucene index. This array will change if we alter our lucene index, so we have to get this array at query time. But this also makes the process nice, because we can continually modify both our lucene index and our product popularity rankings at different times.

So once we have an array of all our lucene documents, we loop through the sku array and pull in the corresponding weight from our values map.

for (int i = 0; i < skus.length; i++) {
   if (values.get(skus[i])!=null) {
      weights[i] = values.get(skus[i]);
   }
   else {
      weights[i] = 1;
   }
}

This loop has successfully applied our external popularity weights to each document in the lucene index. The final query is exactly the same as above:

Query query = ... your lucene query
Map<String, Float> popularityWeights = ... external process to create weights for each product
PopularityFieldSource fieldSource = new PopularityFieldSource(popularityWeights);
ValueSourceQuery valueQuery = new ValueSourceQuery(fieldSource);
CustomScoreQuery customQuery = new CustomScoreQuery(query, valueQuery);

Since the ValueSourceQuery implements the base lucene Query, you can tweak the query even more by applying boosts to the valueQuery or you can adjust how you calculate your popularityWeights. And if you want to see exactly how the scores are calculated, you can use the IndexSearcher.explain(customQuery, docid) to see the full details.

There you go. Your search engine just got a little smarter and can continually adjust itself based on your website traffic. In a later post, I will tell how you can create a custom sorter so you can find the most popular items that match a query.

Jackrabbit 1.5 vs 1.6 Query Performance

September 2nd, 2009 | No Comments | Posted in java

Yes, I’m still talking about Jackrabbit query performance. But this time, I finally have something positive to say.

In our existing Jackrabbit setup, we are using version 1.5.0. I thought I would try out version 1.6 to see if it provides any query performance boosts. The short answer, yes it does.

Test Setup

My test setup is really basic. I created a simple program that would create 100 threads, each running the same query at the same time. I then measured how long it took for all 100 queries to complete. You might say this vaguely represents 100 concurrent connections, but I just intended the test to run the same query over and over. For each query type (more on that later), I ran the test program 3 separate times for Jackrabbit 1.5.0 and 3 separate times for Jackrabbit 1.6.
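For anyone who wants to try something similar, here is a minimal sketch of that kind of harness. It is not the program I used; the class name ConcurrentQueryTimer and the placeholder task are assumptions for illustration, and a real test would run the JCR query inside the Runnable.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentQueryTimer {

    /** Runs the task on `threads` threads at once and returns elapsed milliseconds. */
    public static long timeConcurrently(int threads, final Runnable task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        final CountDownLatch start = new CountDownLatch(1);
        final CountDownLatch done = new CountDownLatch(threads);

        for (int i = 0; i < threads; i++) {
            pool.execute(new Runnable() {
                public void run() {
                    try {
                        start.await();  // hold back until every task is submitted
                        task.run();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                }
            });
        }

        long begin = System.nanoTime();
        start.countDown();              // release all threads together
        try {
            done.await();               // wait for the slowest one to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdown();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
    }

    public static void main(String[] args) {
        // placeholder task; a real test would execute the query here
        Runnable fakeQuery = new Runnable() {
            public void run() {
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };
        System.out.println(timeConcurrently(100, fakeQuery) + " ms");
    }
}
```

The start latch is there so that thread start-up cost is not counted as query time; the timer only begins once every task has been handed to the pool.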

Query Types

Looking through our application code, I came up with some basic query types that we use. These are very general queries intended to help point out what types of queries perform better in version 1.6. All the queries tested are written in XPath.

Single Property
//element(*,my:type)[@property='value']

Two Properties
//element(*,my:type)[@property1='value1' and @property2='value2']

Like on Property
//element(*,my:type)[jcr:like(@property,'value%')]

Like on Child Property
//element(*,my:type)[jcr:like(./child/@property,'value%')]

Likes on Two Child Properties
//element(*,my:type)[jcr:like(./child/@property1,'value1%') and jcr:like(./child/@property2,'value2%')]

If Child Property Exists Or Is Not
//element(*,my:type)[not(./child/@property) or ./child/@property!='value']

Results

Query Type                      v1.5 Avg    v1.6 Avg    % Improvement
Single Property                   28.5 s      20.3 s             29 %
Two Properties                    16.7 s       9.7 s             42 %
Like on Property                  17.8 s      10.2 s             43 %
Like on Child Property            94.5 s      42.8 s             55 %
Like on Two Child Properties      65.3 s      34.3 s             47 %
If Child Exists Or Is Not        137.4 s      55.4 s             60 %

Summary

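So what do the results show us? First, if you want increased query performance, moving to v1.6 is something you should really consider. Second, v1.6 shows its largest gains on queries that cross the child axis: the child property queries improved by 47 to 60 %, versus 29 to 43 % for queries on the node’s own properties.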


A Java Plugin Framework Wishlist

September 1st, 2009 | 2 Comments | Posted in java

Yes, there are times that I am jealous of the apparent simplicity of php driven sites. Take Drupal, for instance. I really like all the functionality it has. I know there has been a ton of work and support to get Drupal where it is today, but the thing that makes me really jealous is their plugin framework. Anyone can write a plugin and distribute it to other Drupal users. This just seems so simple. And if you look at other php systems like Joomla, they all have similar plugin/module frameworks. So where is the java equivalent?

Now from a technical aspect, I understand why it is easier to write a plugin system in php. Since php is a scripting language, everything happens at runtime; there is no pre-compile stage. With java, everything has to be compiled first before deploying it. So in java, it is a little harder to add things at runtime that have not been pre-compiled with the rest of your app. There are definitely ways around this, but it is a big nightmare to handle all the different configuration settings to make it work.

I’ve looked at a few existing java projects that use plugins: Hudson, Magnolia and OpenCMS. All of them work; I just haven’t immersed myself enough to completely understand how each of the different systems works. With java, unlike php, you will sooner or later have to address dependency management and how to organize all the different versions of jars that get thrown into your app. This goes back to handling the configuration of all your different plugins.

So finally, to my wish list. What I would love to have is a basic framework that accepts different plugins that will ultimately build an app. An easy example is a simple website. I would want one module for the admin section, one module for blogs, one module for message boards, one module for comments. I think you get the picture. Everything is modular, so you can just stack together your modules to create the functionality of your site.

In my research, OSGi almost looks like what I am looking for. I say almost, because one of my requirements is that it be easy to use within a servlet container. With OSGi, your servlet container is actually a module itself. To me, this just adds another layer to the stack. And most developers already know how to use a servlet container, so I don’t want to make them learn how to use OSGi.

Now to my wishlist of what I want the plugin framework to do:

1. Be able to assign a plugin to act like a servlet filter

This functionality is used to do things before or after a web request. You could use this to intercept incoming request parameters and do something with them, like set a cookie based on the referring source of the request.

2. Be able to register new url actions

I want to be able to add new pages to the site. So I would need to add all the actions associated with viewing a blog, for instance. This would include both the page logic and the view layer (images, templates, etc).

3. Be able to assign plugins to a specific lifecycle phase

This is just like “hooks” in php.  Say I want certain logic to fire every time I save something or run some special code when the page is rendered.
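To make the lifecycle idea a bit more concrete, here is a minimal sketch of what registering and firing hooks could look like in plain java. Everything in it (HookRegistry, Hook, the phase names “save” and “render”) is invented for illustration and is not taken from any existing framework. It also ignores thread safety and plugin loading, which are the hard parts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HookRegistry {

    /** A plugin callback; the context map is left loose on purpose. */
    public interface Hook {
        void fire(Map<String, Object> context);
    }

    private final Map<String, List<Hook>> hooksByPhase =
            new HashMap<String, List<Hook>>();

    /** Attaches a hook to a named lifecycle phase, e.g. "save" or "render". */
    public void register(String phase, Hook hook) {
        List<Hook> hooks = hooksByPhase.get(phase);
        if (hooks == null) {
            hooks = new ArrayList<Hook>();
            hooksByPhase.put(phase, hooks);
        }
        hooks.add(hook);
    }

    /** Calls every hook registered for the phase, in registration order. */
    public void fire(String phase, Map<String, Object> context) {
        List<Hook> hooks = hooksByPhase.get(phase);
        if (hooks == null) {
            return; // nothing registered for this phase
        }
        for (Hook hook : hooks) {
            hook.fire(context);
        }
    }

    public static void main(String[] args) {
        HookRegistry registry = new HookRegistry();
        registry.register("save", new Hook() {
            public void fire(Map<String, Object> context) {
                context.put("audited", Boolean.TRUE);
            }
        });

        Map<String, Object> context = new HashMap<String, Object>();
        registry.fire("save", context);
        System.out.println(context.get("audited")); // prints true
    }
}
```

The app itself would call fire("save", ...) at the right moment, and plugins would only ever touch register.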

So there you have it. Sounds simple right? Well, hopefully I can find something that will meet my needs, otherwise I may have to start writing my own.

Jackrabbit Query Tips: Better Where Clauses

August 27th, 2009 | No Comments | Posted in java

If you’ve paid attention, you have probably noticed that I have a love/hate relationship with Jackrabbit. Luckily this week I ran into a developer who has been successfully running a high traffic Jackrabbit site for several years. One of the major tips he gave me was to look at how I structured my queries. This was something that I toyed with a few months ago, but never put into production. So for the last few days I’ve been tweaking jackrabbit queries. Everything that I’m doing is found in the Jackrabbit mailing list, but I thought I would just summarize here for those who are interested.

Note: All my queries are in XPath. I’m sure these same ideas apply to SQL queries, I just haven’t done the conversions myself.

Use Meaningful Where Clauses

Where clauses are a must in Jackrabbit queries. The way that a Jackrabbit query works is that it first finds all entries that match the where clause, then filters those results by any path limitations. So if your where clauses are not restrictive, Jackrabbit will have to do a lot of extra work to find the desired results.

Say we have blog post data mixed in with product review data. If our content is organized using Rule 2 of David’s Model, it would look something like:

/mysite/mycontent/blogs/2009/08/27/…
/mysite/mycontent/reviews/2009/08/27/…

In this setup, our content is organized hierarchically by content type and then date.

Now we can query the content to find all blogs by doing:

/jcr:root/mysite/mycontent/blogs//*

The downside is that this query will actually get ALL elements, blogs and reviews, then loop through those to find which ones belong in the /mysite/mycontent/blogs path. So what you can do is add a property to your content. I use something like @contentType. In your app, you would assign values to this property like ‘blog’ or ‘review’. So all blog entries would get a property of @contentType=’blog’ and all reviews would get a @contentType=’review’. This will greatly help our query because now we can do:

/jcr:root/mysite/mycontent/blogs//*[@contentType='blog']

What happens in this query is that Jackrabbit first matches all elements with @contentType=’blog’ then it filters by the path /mysite/mycontent/blogs. Say you have 1,000 blogs and 1,000 reviews. Just by adding @contentType=’blog’, you essentially cut in half the number of nodes that Jackrabbit has to analyze during the final part of this query.

So look at your queries. Are there any other properties that you can add to the where clause? Possibly a date field like start date or created date?

Move Some Path Data to Properties

The mailing list mentions that there are ways to have Jackrabbit index the full path of a node, but it isn’t an easy thing to change and it also hinders moving nodes around easily. So what I would suggest is look for parts of your path that can work as properties like we did with the @contentType above.

The system I am using hosts multiple websites within the same Jackrabbit workspace. Each site is separated into a different path.

/sites/site1
/sites/site2

One thing that we did is add the site as a property. So all nodes for site “site1” have the property @site=’site1’. Then in our query, we are able to add that property to the where clause:

//*[@site='site1' and @contentType='blog']
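If your app assembles these where clauses from several properties, a small helper keeps the string building in one place. This is only a sketch: the class XPathWhere is made up for this example, and doubling a single quote to escape it inside a literal is my assumption about how the query parser wants it, so check that against your Jackrabbit version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class XPathWhere {

    /** Builds //*[@a='x' and @b='y'] from property/value pairs. */
    public static String build(Map<String, String> properties) {
        StringBuilder xpath = new StringBuilder("//*");
        if (properties.isEmpty()) {
            return xpath.toString(); // no where clause at all
        }
        xpath.append('[');
        boolean first = true;
        for (Map.Entry<String, String> e : properties.entrySet()) {
            if (!first) {
                xpath.append(" and ");
            }
            // assumption: a single quote inside a literal is escaped by doubling it
            String value = e.getValue().replace("'", "''");
            xpath.append('@').append(e.getKey())
                 .append("='").append(value).append('\'');
            first = false;
        }
        return xpath.append(']').toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the predicates in insertion order
        Map<String, String> where = new LinkedHashMap<String, String>();
        where.put("site", "site1");
        where.put("contentType", "blog");
        System.out.println(build(where));
        // prints //*[@site='site1' and @contentType='blog']
    }
}
```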

Debugging Help

A great way to find what queries are running is to turn on DEBUG logging for org.apache.jackrabbit.core.query.QueryImpl. Every time a query is executed, the log will show the query and how long it took to execute. By watching the logs, you can focus your attention on queries that take a long time to run.

Summary

As you can see, just by tweaking your query you can greatly improve your Jackrabbit performance. One thing that helped me a lot is I created a script that runs the same query 100 times simultaneously and records how long it took to run all 100 queries. I then continually tweak the query and re-run the script until I find a query that works best.


High Performance Jackrabbit, Where Are You?

August 14th, 2009 | 3 Comments | Posted in java

So I’ve had a good amount of time running a high traffic content site using Apache Jackrabbit as the content store. Jackrabbit provides a nice, flexible way to store a variety of content. The one thing that is lacking for me is performance.

I’ve looked around the Jackrabbit mailing list and wiki and there are a few points about how to get better performance out of Jackrabbit. Most of these center around how you structure your nodes and how to write better “optimized” queries. That is all fine and dandy, but my problem comes when Jackrabbit is put under heavy load from many concurrent connections.

With lots of concurrent queries, I noticed the site response time degrading dramatically. I tweaked the queries as much as I could, but I soon figured that I would have to get under the hood of Jackrabbit to make any gains. And just to give you the short answer, I didn’t find any answers.

First, Jackrabbit does not have a pluggable cache system. So much for the idea of “maybe if I just tweak the cache, things will get better.” I’ve read many postings on the mailing list saying that search results are tied to a search session. So even if you could cache search results, you could run into problems with this session variable down the line. Any fix here is very hard to do unless you want to actually change the cache code within org.apache.jackrabbit. I didn’t feel like maintaining a custom version of jackrabbit just to play with caching, so I soon backed off the caching idea.

Another thing I thought about was increasing the number of connections accessing the Jackrabbit repository. Well, Jackrabbit isn’t able to use a connection pool. Instead, it opens a handful of persistent connections to the database (in my case, MySQL). So just adding more connections is out.

I asked on the mailing list several times about how Jackrabbit handles concurrent query requests. I never got a straight answer. But I was lucky enough to talk with 2 other people who had previously used Jackrabbit in similar projects. Through them I got the answer I didn’t want to hear. Jackrabbit isn’t actually able to handle concurrent queries well. One of the previous Jackrabbit users told me that deep within the bowels of the Jackrabbit code, there are bits of synchronized code that ultimately turn Jackrabbit into a single threaded process. So there goes your ability to handle simultaneous queries. The few answers I got from the mailing list did mention that most Jackrabbit queries actually hit the internal cache, not the database. So I don’t know if these synchronized bits of code affect this or not.

Well, maybe there is a way to have a read-only version of Jackrabbit to speed things up? Nope. As of version 1.5, this isn’t available.

So where does that leave me? I’ve had to start splitting my data between Jackrabbit and a traditional database structure fronted by Hibernate. I put all content where the schema is flexible, like articles, in Jackrabbit. For content that has a rigid schema, like comments, I put those in the traditional database.

I know that Magnolia uses Jackrabbit, but I haven’t spent a good deal of time with their code. For my system, I am using Spring and Spring Modules to access Jackrabbit. Magnolia doesn’t use Spring, and I thought I saw a class that mentioned something about multi-threaded requests. So maybe they have figured out a way around the performance problems.

Until then, I will just have to keep banging on Jackrabbit in hopes that it will speed up.


Using Jackrabbit to Store Velocity Templates in Spring

November 23rd, 2008 | 1 Comment | Posted in java

I love Velocity. It is simple and quick to pick up. Plus, it fits nicely into MVC design patterns. On a recent project, I wanted to use velocity templates as my views in Spring MVC. Setting that up is pretty straightforward. You just need to add a few entries to your ***-servlet.xml:

<bean id="velocityConfig"
      class="org.springframework.web.servlet.view.velocity.VelocityConfigurer">
    <property name="resourceLoaderPath" value="/" />
</bean>

<bean id="viewResolver"
      class="org.springframework.web.servlet.view.velocity.VelocityViewResolver">
    <property name="cache" value="true" />
    <property name="prefix" value="" />
    <property name="suffix" value=".vm" />
    <property name="exposeSpringMacroHelpers" value="false" />
</bean>

The velocityConfig bean initializes the velocity engine and sets any velocity specific properties. The viewResolver bean tells spring to use velocity as the view layer instead of the default jsp.

So it’s that simple to get velocity working in Spring. On my current project, I am using Apache Jackrabbit to store content. I started thinking, why not store my velocity templates in Jackrabbit also? It may be a bit of a philosophical reason on where to store your templates, but for this project I wanted to limit the amount of server access needed by the site administrators. So by storing the templates in Jackrabbit, or a database, I can allow the templates to be modified through web forms. Again, this decision is more philosophical and not really the intent of this posting.

Back to the main topic, how to store your velocity templates in Jackrabbit and then have Spring use those templates. This is actually really simple to do. First, you need to get your Spring application setup to handle Jackrabbit. I did this by following the instructions to setup the springmodules-jcr module of Spring Modules. Once you setup springmodules-jcr, you will most likely have a spring bean that will interact with Jackrabbit. For this example, I will call that bean “jcrService”.

We are going to use the ResourceLoader feature of velocity. The ResourceLoader was built to do exactly what we are doing. It allows you to override where and how your templates files are stored. Some existing ResourceLoaders include DataSourceResourceLoader, JarResourceLoader and URLResourceLoader. We need to make a JcrResourceLoader.

Here is the code for my JcrResourceLoader:

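import java.io.InputStream;
import javax.jcr.Node;
import org.apache.commons.collections.ExtendedProperties;
import org.apache.velocity.exception.ResourceNotFoundException;
import org.apache.velocity.runtime.resource.Resource;
import org.apache.velocity.runtime.resource.loader.ResourceLoader;

public class JcrResourceLoader extends ResourceLoader {

    // the bean that talks to Jackrabbit, injected by Spring
    private JcrService jcrService;

    public JcrService getJcrService() {
        return jcrService;
    }

    public void setJcrService(JcrService jcrService) {
        this.jcrService = jcrService;
    }

    @Override
    public void init(ExtendedProperties configuration) {
        // nothing to configure here; jcrService is set through Spring
    }

    @Override
    public InputStream getResourceStream(String name)
            throws ResourceNotFoundException {

        try {
            // templates are stored as nt:file style nodes: <name>/jcr:content/jcr:data
            Node node = jcrService.getNode(name);
            Node content = node.getNode("jcr:content");
            if (content.hasProperty("jcr:data")) {
                return content.getProperty("jcr:data").getStream();
            }
        }
        catch (Exception e) {
            // "log" is the logger inherited from ResourceLoader
            log.error("could not load template for path: " + name);
        }
        // signal a missing template the way velocity expects, instead of returning null
        throw new ResourceNotFoundException("could not load template for path: " + name);
    }

    @Override
    public boolean isSourceModified(Resource resource) {
        // no change tracking in this prototype, so always treat the template as modified
        return true;
    }

    @Override
    public long getLastModified(Resource resource) {
        return 0;
    }

}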

Now we need to tell our velocity configuration about this resource loader. In our ***-servlet.xml file, we need to create a bean for our resource loader and to change our velocity config parameters.

<bean id="velocityConfig"
      class="org.springframework.web.servlet.view.velocity.VelocityConfigurer">
    <property name="resourceLoaderPath" value="/" />
    <property name="velocityPropertiesMap">
        <map>
            <entry key="resource.loader" value="jcr" />
            <entry key="jcr.resource.loader.instance" value-ref="jcrResourceLoader" />
        </map>
    </property>
</bean>

<bean id="jcrResourceLoader" class="JcrResourceLoader">
    <property name="jcrService" ref="jcrService" />
</bean>

And that’s it. Now when your spring controller tries to load the velocity template, it will use the JcrResourceLoader to look up and load the velocity template. This code is just a first prototype and will need to be cleaned up for error checking and performance.


Why I WANT to use Apache Jackrabbit

November 13th, 2008 | No Comments | Posted in java

There are times when I come across a new technology and just start to get giddy because it seems so cool. I want to use it right then and there. But then the rational part of my brain kicks in and makes me start to evaluate the situation and determine if this new spiffy technology is even right for my current project. Sometimes it’s a fit, sometimes it’s not.

For my latest project, a new CMS, I have decided to try out Apache Jackrabbit. The title of this post mentions why I WANT to use Jackrabbit. I’ve just started the new CMS, so I haven’t found all the warts and gotcha points of Jackrabbit, but I will try to tell you what excites me about Jackrabbit.

Versioning
Jackrabbit has built-in versioning history. This is great. If I want to keep a history of changes, this is ready for me to use.

Clustering
The CMS I am building is for a high traffic site with a load balanced configuration. Jackrabbit is supposed to be able to cluster. The way I plan to do this is to have a staging server as the master. Once all changes have been made, the staging server will push the changes out to the live web servers. I am pretty sure this is possible, and if so, this should work nicely.

Momentum
For ages relational databases have been the standard. But more recently, people are finding that those types of databases might not be the best solution for all situations. That is why JSR 170 was created as a content repository spec (JCR) for java. JCR is meant to store all types of content (text and binary), not just simple text. So you can use it to store files, snippets and even images. JSR 170 has been adopted by projects like Alfresco and Magnolia. I feel that the momentum is building in this direction and soon most larger projects will start to take the content repository approach.

So there you have it. I’ve just begun my Jackrabbit life. I’m crossing my fingers but I feel confident that Jackrabbit will be a major help to me in my CMS project.

Evaluating Open Source Options

November 11th, 2008 | 2 Comments | Posted in evangalism, java

The great thing about open source is that there are usually a ton of open source projects to help you solve a problem. The bad thing about open source is that there are usually a ton of open source projects to help you solve a problem. Part of being a good open source developer is having the mentality to test and evaluate different projects and then choose the best option for your current problem.

I do a lot of development in java. For every type of framework or library you would need, there are a ton of java options to go through. In this post I will outline my thought process on how I evaluate which java product to use in development.

The first thing I do is find what options are available. This is just a bunch of googling and reading message boards to see which project names are most popular. I also use Open Source Software in Java. This is a great website that categorizes and summarizes the major java open source projects. Once I am familiar with the major names, I go to the project website and start to work through the introductory documentation and “Hello World” examples. At this stage, I don’t actually write the “Hello World” code; instead I just read over the documentation and get a grasp of how things work. I know many people will start doing prototypes and such to evaluate options, but I just prefer to do more reading.

So now that I know my options and roughly how they work, how do I choose? There is no set formula, but here are some of the criteria that I use:

Documentation
This goes along with my research phase. Projects with good documentation are much easier to work with. If the project leaders have not written a lot of documentation, then they will most likely not be very forthcoming with support requests. Sometimes programmers put the documentation inside the code itself and solely rely on things like javadocs. This is nice that they wrote something, but digging through javadocs can be a pain because it is often hard to piece together how the classes interact.

Community Activity
The project community is another great place to get support. I look for projects that have active communities with participants outside the sponsoring company. If the community is just full of employees from the sponsoring company, I worry that the project has not been picked up by a lot of people. But a vibrant community is a great thing because people outside the company start to take ownership of the project.

Project Lifetime & Release Cycle
Look at how long the project has been around and how often it releases code. A project that has been around a few years with regular releases is probably solid. A project that hasn’t released in five years is probably dead. I don’t mind using really new projects, you just have to be careful with them. For new projects, I put more emphasis on the project momentum.

Project Momentum
Project momentum is probably the most important factor for me. I want to use an open source project that will be active throughout the life cycle of my development project. Otherwise you start to have the situation of supporting legacy code that no-one is familiar with. Momentum is tricky and you have to take into account the overall open source atmosphere. What trends is the industry following? What types of technologies are now hot?

A good example of this is Struts. Struts became the standard for java MVC programming a long time ago. There are a ton of projects and programmers that use Struts. But Struts is now on the way out. People have figured out ways to design frameworks that go beyond the functionality of Struts. Would I choose Struts for a new project? No. I feel that a safer bet would be to use a newer, more up and coming framework like Spring MVC.

So evaluating your open source options is not an exact science and involves some fuzzy math. In the end, choose a library that you are comfortable with and that you think will have longevity in the industry.