WB-Tree : Robust Disk Based B+ Tree B-Tree Tutorial

Wanna B-Tree Tutorial

by AbiusX (9 June 2011)

Introduction

Wanna B-Tree (abbreviated WB-Tree) is a disk based B+ Tree implementation with considerable speed and stability. It has been mainly developed by jaffer@mit using the SCM programming language (a free implementation of Scheme). SCM has some auxilary libraries allowing it's programs to be easily ported to other programming languages, hence WB-Tree is available in SCM, Java, C#, C and C++.

I had many difficulties using WB-Tree for test and benchmark purposes, let alone development, thus I contacted Jaffer and he helped me compile, build, install, link and benchmark the library, indeed with a few changes made to the source code. The code available at Jaffer website is not fully functional, So I've uploaded a modified version obtained from the actual source code CVS in a stable state here, For you to download and use (the license was GPL , So Yipee!)

WB-Tree was designed to be used simply as an Associative Array, The simplest data strcuture available! thus the learning curve is quite gradual. Currently the only limitation of WB-Tree is that it can only store key/value pairs of each at most 256 Bytes, Thus you can't store the actual data in the WB-Tree, but pointers to the actual data fit easily.

Download & Install

Download

To obtain a working copy of the code, either export/checkout a copy from the cvs repository at

http://savannah.gnu.org/cvs/?group=wb
or click here to download a checked out tar.gz working version from the CVS repository (350 KB).

I've only tested the build/install process on Ubuntu 11.04 ( Natty Narwhal ) but it should work fine on any other Linux flavour/distro. I strongly doubt it working on *NIX, OS X, FreeBSD and of course Windows, And I've tested a few, with no avail.

Dependencies

After extracting the archive, You're going to have a folder, namely wb, which contains all the necessary code. But before building/installing the package you need to meet a few dependencies :

  • You need SCM libraries to be able to build and install. To install them, use the following command in Ubuntu or equivalent (2MB approximately, dependencies included which are slib and scm) :
    sudo apt-get install libscm-dev
    
  • If you want the C# version library (mono), you also need to get the following dependency (also 2MB) :
    sudo apt-get install mono-mcs
    
  • To build the Java version, you're going to need Java Compiler (javac), to obtain OpenJDK in Ubuntu (53 MB) , do :
    sudo apt-get install openjdk-6-jdk
    
  • You would most likely need makeinfo which is available in package texinfo (430 KB) to install :
    sudo apt-get install texinfo
    

Building

You can build C, C++ libraries with only meeting libscm-dev dependency, without having mono or OpenJDK. You can also build C# libs without having OpenJDK (of course you're gonna need mono-mcs). To build the code, inside wb folder, type the following command (you need gcc, make and etc. which are almost always available on any UNIX flavor) :

make all

If you have met all the dependencies described above, it should end without an error. If any of them is unmet, there should be an error. Don't mind the errors if you're only building for C (or C#) and not all. To ensure that build was successful, simply check into the respective folders, c, csharp and java. There should be a libwb.a and libwb.so in c folder for C library to be successful, a Wb.dll in csharp folder and a wb.jar in java folder.

Installation

To install, simply execute the following command in the wb folder, keep in mind that you need root priviledges to be able to install :

sudo make install
Also note that you don't need to install WB-Tree into your system, Instead you can copy the respective library files (libwb.so, libwb.a, wb.jar, Wb.dll) into your project folder (indeed if you know what you're doing :) )

If you didn't meet all the dependencies mentioned above, You're going to need to run the following command as well, which allows ld(1) to be able to understand your shared libraries (libwb.so). Since running this command won't hurt, I suggest running it anyways after installation :

sudo ldconfig

Now you should have a few libraries residing in your /usr/lib folder, as well as include files in your /usr/include. Try the following commands to ensure they are intact :

ls -lh /usr/local/lib/*wb* | wc -l
# outputs 10 on my machine
ls -lh /usr/local/include/*wb* | wc -l
# outputs 17 on my machine

If you encountered any problems building/installing WB-Tree, contact me at wb-tree[at]abiusx[dot]com

Using WB-Tree

Running Benchmark Code

I'm not going to describe how static/shared libraries are used in programming languages, Since it's a whole story. Instead I've prepared a C++ example and I shall demonstrate and describe some technics to use the C shared library of WB-Tree in your C++ code (using GCC compilers, And don't ask, I don't support Microsoft compilers).

First of all, review and download the benchmark/test code I've provided for WB-Tree : View Benchmark Code

To be able to compile it, You need to issue the following command in your terminal :

g++ wb.cpp -lwb
If the compile/link process succeeds, you can run
./a.out
and see the benchmark results. Otherwise you haven't completed the steps described above succesfully. You could also try the following command if the one above failed :
g++ wb.cpp -lwb -L/usr/local/lib -L/usr/local/lib/wb -I/usr/local/include
and if this fails as well, You haven't installed libraries properly.

If you're using Eclipse (or any similar IDE), right click on the Project in "Project Explorer", click "Properties", Head to "C/C++ Build", "Settings", Under "GCC C++ Linker" then "Libraries", simply add wb. This would help Eclipse understand which libraries it should link when compiling, So undefined reference errors won't occur. Any other advanced C++ IDE would have the same functionality somewhere, Contact me for more specific details if necessary.

Describing the Code

There are a few points of importance in the benchmark code, And I'm going to describe them here :

Since libwb.so and libwb.a are actually shared and static C libraries (and not C++ libraries), We would have difficulties linking them to C++ code if we didn't explain it to the compiler. The 'extern "C"' block tells the compiler to consider the contained code, C and not C++, thus solving the issue.

extern "C"
{
#include 
}

	init_wb(75, 150, 4096);
is mandatory and its functionality is explained in the WB-Tree official PDF file. Also make_seg function file creates a WB-Tree database (a file containing a few B-Trees) and clears it, instead open_seg function opens the file for read/write.

In every SEG, we can have plenty of WB-Trees, each accessible with three functions, create_db, open_db, bt_close, which are self explanatory in the code.

bt_write creates a node on the B-Tree, but fails if it already exists. bt_put overwrites an existing node, and bt_get reads a node. Each node is identified with a unique key of at most 255 bytes, and can have a value of at most 255 bytes.

If you had any problems understanding any of the API, simply refer to the manual PDF file (available at Jaffer's WB page), or contact me. I shall edit this page whenever I feel something's missing.

Where to go from here

After studying the benchmark code thoroughly, You should be able to write codes using the WB-Tree library. Try developing a simple program, then employ it to save indexes to your own databased data, So that you can't seek and read/write with only a feed disk operations.

In case of any problems, don't forget to contact me at wb-tree[at]abiusx[dot]com

Site Footer

Sliding Sidebar