Friday, May 11, 2007

Wikipedia internals

I've noticed that my post about eBay internals was very popular. So here is another post about a big and complex system: Wikipedia. It is a presentation from the MySQL Conference & Expo. A LAMP environment under heavy load... They use Apache Lucene for search. But here is what they say about Java:

Due to licensing issues Wikipedia did not run Java Virtual Machine from Sun Microsystems (non-free software is not matching free content ideals), so alternatives were chosen - at first GCJ-based solution, afterwards .net Lucene port was used on top of Mono .NET framework.
Interesting point of view... A *nix implementation of the .NET framework (sponsored by... not Microsoft but Novell) with a port of a well-known solution is considered better than using "native" Java with a JVM.

Another hint:
The major components for search are:
Mono (or GCJ.. or JVM... depends on mood - we have support for all).
"Depends on mood" is great! :) It is still not clear to me what they actually use. But that is not a problem; I like Wikipedia and use it all the time. If a solution works, let it be.

Another question is: why Lucene? Why not Swish-e, for example? It is a nice product; the Apache folks are using it at


