More Like This Queries on SQlite3 FTS

Mon, 12 Dec 2016 03:05:48 -0500

Tags: open source

permalink

SQLite is a great embedded SQL engine, and part of the Android platform. It has an extension FTS ("Full Text Search") that enables Boolean search queries (that is, mostly unranked). For small collections of documents (like a blog), Boolean searches could be a viable temporary solution until a full solution (like elasticsearch) can be deployed.

A common type of query supported by elasticsearch are More Like This (MLT) queries, that allow you to find similar documents to given ones. This type of queries are also very useful for blogs, for example. If you're using SQLite FTS, you can construct a query that will approximate MLT by issueing an OR for all terms in a document (in FTS, the terms are lowercase and uppercsase 'OR' or 'AND' are considered logical Boolean operators). The only issue is to obtain the terms, as assigned by FTS. To do so, it is necessary to access the FTS tokenizer by creating virtual table, for example:

CREATE VIRTUAL TABLE tok1 USING fts3tokenize('simple');

Then, given a document, the terms for it can be extracted by doing (in PHP):

$tokens = $db->query("SELECT token FROM tok1 WHERE input='" . SQLite3::escapeString($all_text) ."';");

The query itself can be assembled by taking an OR of the set of terms:

$query = "";
$query_array = array();
while($row = $tokens->fetchArray()){
    $token = $row['token'];
    if(! isset($query_array[$token])){
        $query_array[$token] = 1;
        if(strlen($query) > 0) {
            $query = $query . " OR ";
        }
        $query = $query . $token;
    }
}

This is the query that can then be used against the FTS to provide a makeshift MLT functionality.

Comments

Your name:

URL (optional):

Your e-mail (optional, won't be displayed):

Something funny using the word 'elephant' (spam filter):

Your comment: