BTree: Complete-ish!

Feb 9, 2018 15:51 · 1089 words · 6 minutes read Edible

I started work on the BTree implementation for Edible about 21 days ago. 27 commits later the BTree is mostly complete and I’ve learned a ton along the way!

Lua was definitely the wrong language

And that’s not the worst thing in the world. One of the first things I implemented was Page to hold contents with a bound on upper size. SQLite uses a similar construction to keep nodes under a certain size. One of the things you need to know in order to maintain the page size invariant is how big exactly the contents of the page are. In something like C (or any language with manual memory management) it’s very apparent how big each structure is in memory. This provides an easy way to maintain the size of contents of a page and make sure the page never contains more stuff than it should hold.

In Lua there’s no direct way to tell how big something is. The virtual machine covers this up. One option is to write an extension that lets me get the size of objects in memory. Another is to pull in a dependency that can give the size of objects. Neither is an appealing solution right now.

In order to get around this for the time being, I wrote a small function (probably mostly wrong) to give the size of various data items:

    local function get_size(data)
    -- Return the size of the data based on type.  This makes it
    -- easier to enforce page size rules
    -- ASSUMPTION: We are operating on the standard Lua with 64 bit
    -- integers and 8 byte per code point strings

    if type(data) == "number" then
        return 8
    elseif type(data) == "string" then
        return #data

This allows getting data sizes and maintaining the page size invariant (logically) but probably won’t yield the correct performance characteristics based on the actual machine that Lua’s running on. Had I chosen a lower level language, I could have avoided this.

BTrees are self balancing by design (sort of)

I had anticipated having to build something special to rebalance the BTree on each insert (sort of like the rotations from AVL trees). Fortunately this isn’t necessary at all. BTrees are self balancing by design because nodes get added from the root level instead of new nodes being added in the leaves. I could write an entire post about this but the best example I have is this test case:

    it("Should split nodes fter the page size is exceeded", function()

        -- With this page size, the root is set up in such
        -- a way as to allow two elements per page
        local tree = BTree:new(24)

        tree:insert(Row:new(1, {Cell:new(1)}))
        tree:insert(Row:new(2, {Cell:new(2)}))

        -- At this point the tree looks like this
        --          *
        --         / \
        --        1   2
        assert.equal([1].page.elements[1]:id(), 1)
        assert.equal([2].page.elements[1]:id(), 2)

        -- Now we are inserting 3, making the tree look like this
        --          *
        --         /|\
        --        1 2 3
        tree:insert(Row:new(3, {Cell:new(3)}))

        -- Because of the page size, the resulting tree
        -- must look like:
        --          *
        --        /   \
        --       *     *
        --      /     / \
        --     1     2   3

        --  This looks a little weird because of the balancing algorithm used
        --  where the split off pages can be potentially bigger than the original
        --  pages, but it works fine in practice
        assert.equal([1].page.elements[1].page.elements[1]:id(), 1)
        assert.equal([2].page.elements[1].page.elements[1]:id(), 2)
        assert.equal([2].page.elements[2].page.elements[1]:id(), 3)

        -- And I can still get to node 2 or 3
        assert.equal(tree:select(2):id(), 2)
        assert.equal(tree:select(3):id(), 3)

        -- If we add 4 then the table will become:
        --           *
        --         /   \
        --        *     *
        --       /     / \
        --      *     *   *
        --     /     /   / \
        --    1     2   3   4
        tree:insert(Row:new(4, {Cell:new(4)}))

                              .page.elements[1]:id(), 1)

                              .page.elements[1]:id(), 2)

                              .page.elements[1]:id(), 3)

                              .page.elements[1]:id(), 4)


This property is not necessarily maintained when deleting from B-Trees, which is something I still need to address.


There are two things that need to be fixed with the current implementation. The first is that deleting is completely unsupported. I won’t need to delete entries from the B-Tree until I need to support DELETE, which won’t be for a while.

The second feature that isn’t supported is fast traversal. The current B-Tree implementation is sort of half way between a traditional B-Tree and a B+Tree. The defining characteristic of B+Trees is: the data in a B+Tree is stored at the leaf nodes, interior nodes only hold pointers.

Edible’s B-Tree implementation works this way, except that the leaf nodes are not linked together via pointers. In an actual B+Tree, the leaf nodes would operate sort of like a doubly linked list so that they can be traversed in O(n). I believe with Edible’s current implementation (if I’ve done the analysis correctly) the traversal is in O(nlog(n)).

Bonus: Benchmarks!

I’ve written a small benchmarking script that runs two benchmarks (more to come later): 1. Checks performance as it relates to page size 2. Checks performance of iterating over a large B-Tree

NOTE: These benchmarks are far from scientific, they serve only to give a rough estimate of the performance of Edible on somewhat typical hardware in typical circumstances. Benchmarks were performed on the latest version of Lua 5.3, Ubuntu, and a computer made of God-only-knows what hardware built sometime in late 2015.

The results of the first benchmark:

    Runs for    Insert 100000 integers (one cell rows)
    Ps      Time (seconds)
    64      15.827768666667
    128     12.091246666667
    256     10.708329666667
    512     13.285101666667
    1024    19.158698333333
    2048    33.191851666667

The page size clearly has some impact on performance, with the best performance being attained for a page size of 256. I did some benchmarking for larger page sizes, and the amount of time it takes to insert the rows roughly doubles as the page size doubles beyond 2048. I’m not sure why 256 bytes is the ideal page size, and I’m interested to try some benchmarks with bigger rows (not just a single cell) to see if it changes anything.

The second benchmark builds a tree of 100,000 integers and then iterates over the entire tree. This benchmark takes an average 350 ms to run, which is fast enough for now. This number is part of the reason I opted to leave out linking leaf nodes together at this stage.

So what’s next

SQL syntax support is next, and the first task is to create a Lexer and Parser for whatever subset of SQL is going to be supported.

Expect another post when I have something significantly interesting to show for my efforts!