article banner (priority)

Persistent dictionary

Our journey to understand persistent memory began with building an in-memory dictionary. Our in-memory dictionary was built using the trie data structure and we represented this dictionary with the following classes:

class Trie {
    private:
    TrieNode* root;

    public:
    void add(string word);
    bool contains(string word);
    bool containsPrefix(string prefix);    
}

class TrieNode {
    private:
    bool endOfWord;
    std::map<char, TrieNode*> nodeByCharacter;

    public:
    void add(string word);
    bool contains(string word);
    bool containsPrefix(string prefix); 
}

Now is the right time to make our dictionary persistent, but before we do that, let's understand a few concepts of PMDK. To understand PMDK concepts, we will build a very simple and naive append-only key/value store.

Append only key/value store

The overall idea can be summarized as:

  • We will be using linked list as the data structure for persisting key/value on persistent memory
  • Keys and values will be of type string
  • All the nodes of the linked list will be on persisting memory
  • We do not care if the same key with a different value is added again
  • Our key/value store will provide the following behaviors:
    • open(const char *fileName, uint64_t fileSize) that opens the key/value store for read and write,
    • write(std::string key, std::string value) that writes key and value to the persistent linked list,
    • countEntries() that returns the total number of entries in the persistent linked list,
    • lastKey(): that returns the last key in the persistent linked list,
    • lastValue(): that returns the last value in the persistent linked list.

Overview of various key/value components

AppendOnlyKv class represents our append-only key/value store. It maintains references to PersistentList and PersistentPool.

namespace dictionary {
    namespace persistent {
        namespace kv {
            class AppendOnlyKv {
            private:
                ::dictionary::persistent::kv::PersistentPool *persistentPool;
                ::dictionary::persistent::kv::PersistentList *persistentList;
            };
        }
    }
}

PersistentList class represents a persistent linked list. It contains a reference to the head node. This reference is represented by persistent_ptr to PersistentNode. The head node is the starting node for all the read/write operations and all the key/value pairs will be added to its right.

namespace dictionary {
    namespace persistent {
        namespace kv {
            class PersistentList {
            private:
                persistent_ptr<PersistentNode> head;
            };
        }
    }
}

persistent_ptr represents a pointer to an object on persistent memory. It provides member access, dereference and array access operators.

Each PersistentNode contains 2 fields:

  • next represents the next node in the linked list. This field is represented as persistent_ptr to PersistentNode.
  • keyValue represents the persistent key/value. This field is represented as persistent_ptr to char[]. We will be converting our key and value to a char* to write it to a persistent node.
namespace dictionary {
    namespace persistent {
        namespace kv {
            struct PersistentNode {
                persistent_ptr<PersistentNode> next;
                persistent_ptr<char[]> keyValue;
            };
        }
    }
}

PersistentPool class creates a memory pool containing the root object. It creates a memory pool of a given size, an identifier called layout, and a file path. Memory pools reside on DAX-mounted file systems.

namespace dictionary {
    namespace persistent {
        namespace kv {
            class PersistentPool {
            public:
                static PersistentPool *initialize(const char *filePath, uint64_t size);
                static PersistentPool *getInstance();
                pmem::obj::pool_base getPmpool();
            protected:
                struct Root {
                    pmem::obj::persistent_ptr<PersistentNode> ptr;
                };
                pmem::obj::pool_base pmpool;
                PMEMoid *root_oid;
            private:
                static PersistentPool *instance;
                PersistentPool(const char *filePath, uint64_t size = 8 * 1024 * 1024);
                pmem::obj::pool<Root> createOrFail(const char *path, const std::size_t size, const std::string &layout);
            };
        }
    }
}

This implementation of PersistentPool is not thread-safe

Persistent pointer, Memory Pools, Pool Object Pointer (POP), and the Root Object

Persistent pointer

Most operating systems implement address space layout randomization (ASLR). ASLR randomly arranges the address space positions of key data areas of a process, including the base of the executable and the positions of the stack, heap, and libraries. Because of ASLR, files can be mapped at different addresses of the process address space each time the application executes. As a result, traditional pointers that store absolute addresses cannot be used. To solve this problem in persistent memory programming, libpmemobj introduced a C struct called PMEMoid, which consists of an identifier of the pool and an offset from its beginning. This fat pointer is encapsulated in libpmemobj C++ bindings as a template class pmem::obj::persistent_ptr. We used persistent_ptr in PersistentNode.

struct PersistentNode {
    persistent_ptr<PersistentNode> next;
    persistent_ptr<char[]> keyValue;
};

Memory pools, Pool Object Pointer (POP), and the Root Object

DAX (Direct access) allows applications to memory map files on a persistent memory-aware file system to provide direct load/store operations without paging blocks from a block storage device. It bypasses the kernel, avoids context switches and interrupts, and allows applications to read and write directly to the byte-addressable persistent storage.

Memory-mapped files are at the core of the persistent memory programming model. The libpmemobj library provides a convenient API to easily manage pool creation and access, avoiding the complexity of directly mapping and synchronizing data. Memory pools reside on DAX-mounted file systems.

    pmem::obj::pool<Root>::create(path, layout, size);

Location of the pool once memory mapped into the application address space can differ between executions and system reboots due to ASLR. Without a way to access the data within the pool, it would be challenging to locate the data within a pool. PMDK-based pools have a small amount of metadata to solve this problem. Every pmemobj type pool has a root object. This root object is necessary because it is used as an entry point from which it is possible to find all the other objects created in a pool. An application will locate the root object using a special object called pool object pointer (POP). The POP object resides in volatile memory and is created with every program invocation. It keeps track of the metadata related to the pool, such as the offset to the root object inside the pool.

    pmem::obj::pool_base pmpool = pmem::obj::pool<Root>::create(path, layout, size);

    PMEMoid* root_oid = static_cast<pmem::obj::pool<Root>>(pmpool).root()->ptr.raw_ptr();

Let's now understand various operations supported by the key/value store.

Opening key/value store

The idea behind open can be summarized as:

  • Initialize a persistent pool with a file name and the file size
  • Instantiate PersistentList. As a part of instantiating PersistentList, do the following:
    • Get the reference to the persistent pool
    • Create a new PersistentNode using make_persistent<PersistentNode>()
    • Put an empty key/value pair in the header node. The idea is to create a fixed header node containing an empty key/value pair when the append-only key/value store is opened.

This is how the above approach can be implemented in C++:

AppendOnlyKv *AppendOnlyKv::open(const char *fileName, uint64_t fileSize) {
    auto *kv = new AppendOnlyKv();
    kv->persistentPool = PersistentPool::initialize(fileName, fileSize);
    kv->persistentList = new PersistentList();
    return kv;
}

PersistentList::PersistentList() {
    auto pmpool = PersistentPool::getInstance()->getPmpool();
    transaction::run(pmpool, [&] {
        this->head = make_persistent<PersistentNode>();
        this->head.get()->put("", "");
    });
}

make_persistent transactionally allocates and constructs an object of type T. This function can be used to allocate an object inside a transaction.

transaction::run executes a closure-like transaction. It starts a new transaction (nested, if inside another transaction) and executes the passed tx function transactionally.

Writing key/value in persistent memory

The idea behind write can be summarized as:

  • Start with the head node this->head of the persistent list. this->head.get() returns the raw pointer from a persistent_ptr.
  • Use the raw pointer to iterate until the last node is reached. (We do not maintain a current pointer to simulate iteration while adding a new key/value pair)
  • Get the reference to the persistent pool
  • Use make_persistent to allocate new PersistentNode
  • Get the raw pointer reference to the newly created now using newNode.get()
  • Put the key/value pair in the newly allocated node. To put a key/value pair in persistent memory, we need to encode the incoming key/value pair and convert it to a char*. The way to do this includes:
    • Getting the size of incoming key and value
    • Computing the total size (+2 is for null characters in C strings)
    • The way we write our key/value to persistent memory is by writing the key size, followed by the value size, followed by the key, and then the value
    • Allocate memory for char[] using make_persistent<char[]>(size). This time we use make_persistent to allocate an array (char[]) of fixed size
    • Write key size in the first 32 bits (4 bytes)
    • Write value size in the next 32 bits (4 bytes)
    • Perform memcpy to write the key
    • Perform memcpy to write the value

This is how the above approach can be implemented in C++:

void PersistentList::write(std::string key, std::string value) {
    PersistentNode *current = this->head.get();
    //iterate till the last node
    while (current->next.get()) {
        current = current->next.get();
    }

    auto pmpool = PersistentPool::getInstance()->getPmpool();
    transaction::run(pmpool, [&] {
        persistent_ptr<PersistentNode> newNode = make_persistent<PersistentNode>();
        current->next = newNode;
        //put key/value in the new node
        newNode.get()->put(key.data(), value.data());
    });
}

//PersistentNode.h
void put(const char *key, const char *value) {
    size_t ksize = strlen(key);
    size_t vsize = strlen(value);
    size_t size = ksize + vsize + sizeof(uint32_t) + sizeof(uint32_t) + 2;

    keyValue = make_persistent<char[]>(size);

    char *p = keyValue.get();
    setKeySize(p, (uint32_t) ksize);
    setValueSize(p, (uint32_t) vsize);

    char *kvptr = p + sizeof(uint32_t) + sizeof(uint32_t);
    memcpy(kvptr, key, ksize);
    kvptr += ksize + 1;
    memcpy(kvptr, value, vsize);
}

Getting the last key

The idea behind lastKey can be summarized as:

  • Start with the head node this->head of the persistent list. this->head.get() returns the raw pointer.
  • Use a raw pointer to iterate until the last node is reached
  • Extract the key from persistent memory on the last node, current->key()

This is how the above approach can be implemented in C++:

std::string PersistentList::lastKey() {
    PersistentNode *current = this->head.get();
    while (current->next.get()) {
        current = current->next.get();
    }
    return current->key();
}

Extracting key from persistent memory

The idea behind extracting a key from persistent memory can be summarized as:

  • keyValue contains the size of the key, followed by the size of the value, followed by the key and then the value. To get the key, use the raw pointer from keyValue using keyValue.get() and follow pointer arithmetic:
    • Add sizeof(uint32_t) to the raw keyValue pointer to move the pointer ahead by key size
    • Add another sizeof(uint32_t) to move the pointer ahead by value size
  • Finally, convert the result in char * and return the value to the client

This is how the above approach can be implemented in C++:

const char *key() const {
    return ((char *) (keyValue.get()) + sizeof(uint32_t) + sizeof(uint32_t));
}

Testing append-only key/value store

Let's add some tests to validate our tiny append-only key/value store. To test it, we have created a small fixture that opens AppendOnlyKv as a part of the SetUp method and just removes the file as a part of the TearDown method.

#include <gtest/gtest.h>
#include <string>
#include "../../src/append-only-kv/AppendOnlyKv.h"

using namespace dictionary::persistent::kv;

class AppendOnlyKvFixture : public ::testing::Test {
private:
    AppendOnlyKv *kv;
    const char *fileName = "./append-only-kv-tests.log";
public:
    void SetUp() {
        kv = AppendOnlyKv::open(fileName, 8 * 1024 * 1028);
    }
    void TearDown() {
        remove(fileName);
    }
    AppendOnlyKv *getKv() {
        return kv;
    }
};

TEST_F(AppendOnlyKvFixture, AppendOnlyKv_AddsAKeyValue) {
    AppendOnlyKv* kv = AppendOnlyKvFixture::getKv();
    kv->write("A", "B");

    auto totalEntries = kv->countEntries();
    ASSERT_EQ(1, totalEntries);
}

TEST_F(AppendOnlyKvFixture, AppendOnlyKv_GetsTheLastKey) {
    AppendOnlyKv* kv = AppendOnlyKvFixture::getKv();
    kv->write("A", "B");
    kv->write("C", "D");

    auto lastKey = kv->lastKey();
    ASSERT_EQ("C", lastKey);
}

Quick takeaways

  • If we have to convert an in-memory data structure to a persistent memory data structure, we would need persistent_ptr instead of a raw pointer. (assuming the in-memory data structure involves a raw pointer).

  • Allocating an object on persistent memory would be happening transactionally using make_persistent.

  • Storing data in persistent memory would involve encoding the data (in some form) and writing it as persistent_ptr<char[]>.

I guess we are ready to convert our in-memory dictionary to a persistent dictionary.

Convert In-memory dictionary to persistent dictionary

Below is what our in-memory dictionary looked like:

class Trie {
    private:
    TrieNode* root;
}

class TrieNode {
    private:
    bool endOfWord;
    std::map<char, TrieNode*> nodeByCharacter;
}

Based on our learnings from the persistent linked list, we can put down the tasks involved in making this dictionary persistent.

  • Make the root pointer inside Trie class persistent
  • Figure out a way to make std::map persistent
  • Make changes in the methods to use persistent_ptr
  • Transactionally create new nodes

PMDK already provides a set of persistent containers and one of those containers is a concurrent hash map. This effectively means we can replace the in-memory hashmap in TrieNode with a persistent one, but that would be no fun. Instead, we would like to apply the same concepts that we looked at while building our simple append-only key/value store. These concepts include persistent_ptr, transaction and persistent pool. To do that, we would like to build a persistent dictionary that only supports lower-case English letters. This would allow us to use an array of size 26 and allocate the array using make_persistent(fixed_size).

Let's take a quick look at the structure of persistent Trie and TrieNode classes.

Overview of various dictionary components

Trie class represents our persistent dictionary. It maintains a persistent_ptr to TrieNode which serves as our root node. This root node will be created as a part of the open method.

namespace dictionary {
    namespace persistent {
        class Trie {
        private:
            persistent_ptr<TrieNode> root;
        public:
            static Trie *open(const char *fileName, uint64_t fileSize);
            void add(std::string word);
            bool contains(std::string word);
            bool containsPrefix(std::string prefix);
        };
    }
}

Each node of our dictionary is represented by the TrieNode struct. Each node is expected to contain an array of 26 entries and each entry is expected to be a pointer to another TrieNode. We use p<Slot> slots[26] to create an array of 26 entries and each slot is nothing but a persistent_ptr<TrieNode> next. Each TrieNode also contains an endOfWord field that is represented as p<bool>.

namespace dictionary {
    namespace persistent {
        struct TrieNode;

        struct Slot {
            persistent_ptr<TrieNode> next;

            void setNext(persistent_ptr<TrieNode> next_);
        };

        struct TrieNode {
            p<Slot> slots[26];
            p<bool> endOfWord;

            void add(std::string word, pmem::obj::pool_base pmpool);
            bool contains(std::string word);
            bool containsPrefix(std::string prefix);
        };
    }
}

p class is a property-like template class that has to be used for all variables (excluding persistent pointers). The p property makes sure that changes to a variable within a transaction are made atomically with respect to persistence. It does it by creating a snapshot of the variable when modified in the transaction scope.

TriePool is an object that takes the responsibility of creating a memory pool of a given size and an identifier. Similar to our PersistentPool, this implementation of TriePool is not thread-safe.

namespace dictionary {
    namespace persistent {
        class TriePool {
        public:
            TriePool(TriePool &other) = delete;
            void operator=(const TriePool &) = delete;
            static TriePool *initialize(const char *filePath, uint64_t size);
            static TriePool *getInstance();
            pmem::obj::pool_base getPmpool();
            ~TriePool() {
                try {
                    instance = nullptr;
                    pmpool.close();
                } catch (const std::logic_error &e) {
                    std::terminate();
                }
            }
        protected:
            struct Root {
                pmem::obj::persistent_ptr<::dictionary::persistent::TrieNode> ptr;
            };
            pmem::obj::pool_base pmpool;
            PMEMoid *root_oid;
        private:
            static TriePool *instance;
            TriePool(const char *filePath, uint64_t size = 8 * 1024 * 1024) {
                bool openFailed = false;
                std::string layout = "persistentDictionary";
                try {
                    pmpool = pmem::obj::pool<Root>::open(filePath, layout);
                } catch (pmem::pool_invalid_argument &e) {
                    openFailed = true;
                }

                if (openFailed) {
                    pmpool = createOrFail(filePath, size, layout);
                }
                root_oid = static_cast<pmem::obj::pool<Root>>(pmpool).root()->ptr.raw_ptr();
            }
            pmem::obj::pool<Root>
            createOrFail(const char *path, const std::size_t size, const std::string &layout) {
                try {
                    return pmem::obj::pool<Root>::create(path, layout, size);
                } catch (pmem::pool_invalid_argument &e) {
                    throw std::invalid_argument(e.what());
                }
            }
        };
    }
}

Opening a persistent dictionary

The idea behind open can be summarized as:

  • Initialize a persistent pool (TriePool) with a file name and file size
  • Instantiate Trie. As a part of instantiating Trie do the following:
    • Get the reference to the persistent pool
    • Create a new TrieNode using make_persistent<TrieNode>(). This new node will serve as the root node of our dictionary.

This is how the above approach can be implemented in C++:

Trie *Trie::open(const char *fileName, uint64_t fileSize) {
    auto *trie = new Trie();
    TriePool::initialize(fileName, fileSize);

    auto pmpool = TriePool::getInstance()->getPmpool();
    transaction::run(pmpool, [&] {
        trie->root = make_persistent<TrieNode>();
    });
    return trie;
}

Adding a word

The idea behind add can be summarized as:

  • Start with the root node this->root, dereference it to get the raw pointer this->root.get() and invoke add on the root TrieNode
  • Inside add of TrieNode do the following:
    • Iterate through each character of the word
    • Since, our dictionary is going to contain lower case characters, subtract the code point of the letter a to get the slot index
    • Check if the slot at index contains a persistent_ptr to TrieNode using current->slots[index].get_ro().next
      • current->slots[index].get_ro() returns a read-only reference to the slot at the index
      • using the read-only reference, check the next field which is a persistent_ptr to TrieNode
    • If the slot at index does not contain a persistent_ptr to TrieNode, then create a new persistent node and set the reference to the new node in the slot
      • make_persistent<TrieNode>() creates a new node
      • current->slots[index].get_rw() returns the read-write reference to the slot
      • setNext method of slot sets the next persistent_ptr to TrieNode
    • Move current to the node pointed by the slot at index
    • At the end of the iteration, set endOfWord as true in the current node. Run this entire process inside a transaction.

This is how the above approach can be implemented in C++:

void Trie::add(std::string word) {
    this->root.get()->add(word, TriePool::getInstance()->getPmpool());
}

void add(std::string word, pmem::obj::pool_base pmpool) {
    TrieNode *current = this;
    transaction::run(pmpool, [&] {
        for (auto &ch: word) {
            int index = ch - 'a';
            if (!current->slots[index].get_ro().next) {
                current->slots[index].get_rw().setNext(make_persistent<TrieNode>());
            }
            current = current->slots[index].get_rw().next.get();
        }
        current->endOfWord = true;
    });
}

The idea behind persistent add is exactly the same as that of in-memory add. The only difference is persistent add uses persistent_ptr.

Checking if the dictionary contains a word

The idea behind contains can be summarized as:

  • Start with the root node this->root, dereference it to get the raw pointer this->root.get() and invoke contains on the root TrieNode
  • Inside contains of TrieNode do the following:
    • Iterate through each character of the word
    • Since, our dictionary is going to contain lower case characters, subtract the code point of the letter a to get the slot index
    • Check if the slot at index contains a persistent_ptr to TrieNode using current->slots[index].get_ro().next
    • If the slot does not contain a reference to TrieNode, return false
    • Else, move current to the node pointed by the slot at index
    • At the end of the iteration, check endOfWord in the current node

This is how the above approach can be implemented in C++:

bool Trie::contains(std::string word) {
    return this->root.get()->contains(word);
}

bool contains(std::string word) {
    TrieNode *current = this;
    for (auto &ch: word) {
        int index = ch - 'a';
        if (!current->slots[index].get_ro().next) {
            return false;
        }
        current = current->slots[index].get_ro().next.get();
    }
    return current->endOfWord;
}

Checking if the dictionary contains a word by a prefix

The idea behind containsPrefix is almost the same as contains. The only difference here is that we do not check for endOfWord in the current node at the end of the iteration.

This is how the above approach can be implemented in C++:

bool Trie::containsPrefix(std::string prefix) {
    return this->root.get()->containsPrefix(prefix);
}

bool containsPrefix(std::string prefix) {
    TrieNode *current = this;
    for (auto &ch: prefix) {
        int index = ch - 'a';
        if (!current->slots[index].get_ro().next) {
            return false;
        }
        current = current->slots[index].get_ro().next.get();
    }
    return true;
}

Testing persistent dictionary

Let's add some tests to validate our persistent dictionary. To test it we have created a small fixture that opens our dictionary as a part of the SetUp method and just removes the file as a part of the TearDown method.

#include <gtest/gtest.h>
#include <string>
#include "../../src/dictionary/Trie.h"

using namespace dictionary::persistent;

class DictionaryFixture : public ::testing::Test {
private:
    ::dictionary::persistent::Trie *trie;
    const char *fileName = "./dictionary-tests.log";
public:
    void SetUp() {
        trie = ::dictionary::persistent::Trie::open(fileName, 8 * 1024 * 1028);
    }
    void TearDown() {
        remove(fileName);
    }
    ::dictionary::persistent::Trie *getDictionary() {
        return trie;
    }
};

TEST_F(DictionaryFixture, Dictionary_ContainsAnAddedWord) {
    Trie* dictionary = DictionaryFixture::getDictionary();
    dictionary->add("meet");
    dictionary->add("memory");

    ASSERT_TRUE(dictionary -> contains("memory"));
}

TEST_F(DictionaryFixture, Dictionary_DoesNotContainAWord) {
    Trie* dictionary = DictionaryFixture::getDictionary();
    dictionary->add("meet");
    dictionary->add("memory");

    ASSERT_FALSE(dictionary -> contains("market"));
}

TEST_F(DictionaryFixture, Dictionary_ContainsAWordByPrefix) {
    Trie* dictionary = DictionaryFixture::getDictionary();
    dictionary->add("meet");
    dictionary->add("memory");

    ASSERT_TRUE(dictionary -> containsPrefix("me"));
}

TEST_F(DictionaryFixture, Dictionary_DoesNotContainAWordByPrefix) {
    Trie* dictionary = DictionaryFixture::getDictionary();
    dictionary->add("meet");
    dictionary->add("memory");

    ASSERT_FALSE(dictionary -> containsPrefix("met"));
}

Conclusion

Let's summarize.

  • We made our in-memory dictionary persistent by using various PMDK concepts including persistent_ptr, p, memory pool and APIs like make_persistent and transaction.

  • If we are using PMDK, converting any in-memory data structure to a persistent memory data structure would require persistent_ptr instead of a raw pointer. (assuming the in-memory data structure involves a raw pointer).

  • Allocating an object on persistent memory would be happening transactionally using make_persistent.

  • Storing data in persistent memory would involve encoding the data (in some form) and writing it as persistent_ptr<char[]>.

Code

Code for this article is available here

References