cancel
Showing results for 
Search instead for 
Did you mean: 

Using Bit Strings to optimize storage and retrieval of integer list types that have a length or maxLength facet

ron_1
Star Contributor
Star Contributor

I have a few mandatory, limited-size, vocabulary-based, multi-valued properties that are part of my Base Document Type. In order to optimize document retrieval, I would like to minimize the number of tables that need to be joined. Normally, each of these multi-valued properties would be represented as a collection fragment table and therefore require an add'l table join on retrieval.

I have defined these limited-size vocabularies to have one-up integer IDs that represent a bit index. The Base Document Type properties that lookup into these vocabularies are defined as lists of integers with a maxLength facet. It would be ideal if Nuxeo provided an option to optimize storage and retrieval of these length-restricted, multi-valued integer properties using Bit strings. CMIS query "quantified comparison predicate" and "quantified in predicate" constructs could be translated into bitwise operators. NXQL queries could also be translated similarly.

Any thoughts or feedback regarding this optimization approach/technique?

4 REPLIES 4

Florent_Guillau
World-Class Innovator
World-Class Innovator

Nice idea yes.

Don't you mean xs:minInclusive and xs:maxInclusive instead of xs:maxLength though? Or xs:enumeration for strings?

However automatically turning those into bit strings (if there are less than 32 or 64 values) would be a bit harsh, I'd think of using some separate configuration flag (maybe in the repository config) to turn that on. For many users (and some kinds of indexing) having the values explicitly stored in the columns is still needed.

But no plans to work on this for now...

Another optimization that's more likely to make it into an upcoming version of Nuxeo is to use array types for databases that support it (PostgreSQL, maybe Oracle) to store the a of values in a single row, and use specialized indexing and operators (<@ in PostgreSQL) to get to the values.

The xs

Understood. No, we don't have a JIRA issue for the array optimisation yet.

https

Getting started

Find what you came for

We want to make your experience in Hyland Connect as valuable as possible, so we put together some helpful links.