Well since someone submitted us here with no apparent reason or context, allow me to provide something of interest. (primary contributor of bleve here)
Just recently we merged support for a new experimental index scheme called 'scorch'. This new index scheme is designed from the ground up to reduce index size and improve performance. It features:
- a segment based approach, much like Lucene
- vellum FST (finite state transducer) for the term dictionary - https://github.com/couchbase/vellum
- roaring bitmaps for the postings lists - https://github.com/RoaringBitmap/roaring
- and compressed chunked integer storage for all the posting details
It's still experimental at this point, but shows considerable indexing speedup, index size reduction, and similar query performance to the old index format used today.
The code for this new index scheme can be found here: https://github.com/blevesearch/bleve/tree/master/index/scorc...
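To make the "compressed chunked integer storage" idea above a little more concrete, here is a minimal sketch of one common way to do it: delta-encode sorted document IDs, write each gap as a varint, and split the list into independently decodable chunks. This is an illustration of the general technique only, not scorch's actual on-disk format; the function names and the chunk size are made up for the example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeChunk delta-encodes a sorted slice of document IDs and writes each
// gap as an unsigned varint, so small gaps cost a single byte.
// Hypothetical helper, not part of bleve.
func encodeChunk(docIDs []uint64) []byte {
	buf := make([]byte, 0, len(docIDs))
	tmp := make([]byte, binary.MaxVarintLen64)
	var prev uint64
	for _, id := range docIDs {
		n := binary.PutUvarint(tmp, id-prev)
		buf = append(buf, tmp[:n]...)
		prev = id
	}
	return buf
}

// decodeChunk reverses encodeChunk by summing the gaps back into IDs.
func decodeChunk(data []byte) ([]uint64, error) {
	var out []uint64
	var prev uint64
	for len(data) > 0 {
		gap, n := binary.Uvarint(data)
		if n <= 0 {
			return nil, fmt.Errorf("corrupt varint with %d bytes remaining", len(data))
		}
		prev += gap
		out = append(out, prev)
		data = data[n:]
	}
	return out, nil
}

// encodeChunked splits the IDs into fixed-size chunks and encodes each one
// independently, so a reader can skip chunks it does not need.
// chunkSize must be greater than zero.
func encodeChunked(docIDs []uint64, chunkSize int) [][]byte {
	var chunks [][]byte
	for start := 0; start < len(docIDs); start += chunkSize {
		end := start + chunkSize
		if end > len(docIDs) {
			end = len(docIDs)
		}
		chunks = append(chunks, encodeChunk(docIDs[start:end]))
	}
	return chunks
}

func main() {
	ids := []uint64{3, 7, 8, 150, 152, 100000}
	chunks := encodeChunked(ids, 4)
	total := 0
	for _, c := range chunks {
		total += len(c)
	}
	fmt.Printf("%d ids in %d chunks, %d bytes encoded vs %d bytes raw\n",
		len(ids), len(chunks), total, len(ids)*8)
}
```

Because each chunk restarts its delta from zero, any chunk can be decoded on its own, at the cost of storing the first ID of every chunk in full.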
I ran into some significant performance issues with Bleve during a weekend hackathon a few months ago. I was trying to index the Stack Overflow data dump for fun and I couldn't get it to complete successfully. I'm guessing I was running into some boltdb-related limitations, but I didn't have the time to dig deeper before the party ended and I had to get back to the day job.
SQLite's FTS5 allowed me to load the entire data set without breaking a sweat, but query performance was unexpectedly poor for some combinations of terms with no apparent consistency, and it became unusable due to a temp table I couldn't work out how to avoid when attempting to search in descending primary key order.
It was many months ago now and I have forgotten some of the particulars, but I thought it might make an interesting jumping-off point for discussion of any ideas people have for indexing 60 GB of data into a flat-file setup like the ones Bleve or SQLite use.
Would the new scorch experiment perform better with an index of that size than the boltdb backend?
I am using Elasticsearch in production, with Go for both the indexer and the search service. Last holiday I played with blevesearch to make it work like Elasticsearch search, but the work is far, far from complete. https://github.com/wejick/balasticsearch
What kind of index size does Bleve work with well? Megabytes? Gigabytes?
I’d love to understand at what scale people are using this engine.
Some related work:
I'm new to Go - but it looks like the example code completely ignores the error codes.
Is that a common thing?
Is there a C++ library like this?