cancel
Showing results for 
Search instead for 
Did you mean: 

BLOBS in postgresql?

otheus_uibk
Champ on-the-rise
Champ on-the-rise

As the DBA for our postgresql server, I noticed that the database held BLOBS for use by the nuxeo service. Where and how are these blobs used? Why is nuxeo storing blobs in the database server, when the files themselves are stored locally or via S3? I see that these blobs have ID numbers, but these numbers show up only in the nxp_logs_extinfo table, and no where else. There's no reference to these AFAIK anywhere within the schema.

Are these blobs artifacts of an older version or possibly misconfigured system? Is there a way to determine what they actually are or mean?

The blob entries look like this:

nuxeo=# select loid,substring(encode(data,'escape') for 60) from pg_catalog.pg_largeobject ;
  loid  |                                   substring
--------+--------------------------------------------------------------------------------
 217768 | \254\355\000\x05sr\000\x17java.util.LinkedHashSet\330l\327Z\225\33
 217769 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
 217770 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
 217771 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
 217772 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
 217773 | \254\355\000\x05sr\000\x11java.util.HashMap\x05\x07\332\301\303\x16`\321\x03\0
 217774 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
 217775 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
 298409 | \254\355\000\x05sr\000\x17java.util.LinkedHashSet\330l\327Z\225\33
 298410 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
 298411 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
 298412 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
 298413 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
 298414 | \254\355\000\x05sr\000\x11java.util.HashMap\x05\x07\332\301\303\x16`\321\x03\0
 298415 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
 298416 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
(16 rows)

Here's the correlation to the "documents". I don't understand this, but it does look like some kind of internal object created by Nuxeo on behalf of users:

nuxeo=# select log_extinfo_blob,mapkey, log_event_category,log_event_comment,log_doc_path,log_doc_type,log_date from nxp_logs_extinfo,nxp_logs_mapextinfos,nxp_logs where discriminator = 'BLOB' and log_extinfo_id = info_fk and nxp_logs.log_id = log_fk ;
 log_extinfo_blob |      mapkey       | log_event_category | log_event_comment |                           log_doc_path
                        | log_doc_type  |        log_date
------------------+-------------------+--------------------+-------------------+------------------------------------------
------------------------+---------------+-------------------------
           217772 | workflowVariables | Routing            |                   | /task-root/Task2556.1461083427150
                        | RoutingTask   | 2016-04-19 18:31:01.374
           217773 | data              | Routing            |                   | /task-root/Task2556.1461083427150
                        | RoutingTask   | 2016-04-19 18:31:01.374
           217774 | nodeVariables     | Routing            |                   | /task-root/Task2556.1461083427150
                        | RoutingTask   | 2016-04-19 18:31:01.374
           217775 | workflowVariables | Routing            |                   | /document-route-instances-root/2016/04/19
/ParallelDocumentReview | DocumentRoute | 2016-04-19 18:31:01.385
           217768 | actors            | Routing            |                   | /task-root/Task2556.1461083427150
                        | RoutingTask   | 2016-04-19 18:30:27.521
           217769 | workflowVariables | Routing            |                   | /task-root/Task2556.1461083427150
                        | RoutingTask   | 2016-04-19 18:30:27.521
           217770 | nodeVariables     | Routing            |                   | /task-root/Task2556.1461083427150
                        | RoutingTask   | 2016-04-19 18:30:27.521
           217771 | workflowVariables | Routing            |                   | /document-route-instances-root/2016/04/19
/ParallelDocumentReview | DocumentRoute | 2016-04-19 18:30:27.671
           298409 | actors            | Routing            |                   | /task-root/Task2556.1474371187953
                        | RoutingTask   | 2016-09-20 13:33:08.527
           298410 | workflowVariables | Routing            |                   | /task-root/Task2556.1474371187953
                        | RoutingTask   | 2016-09-20 13:33:08.527
           298411 | nodeVariables     | Routing            |                   | /task-root/Task2556.1474371187953
                        | RoutingTask   | 2016-09-20 13:33:08.527
           298412 | workflowVariables | Routing            |                   | /document-route-instances-root/2016/09/20
/ParallelDocumentReview | DocumentRoute | 2016-09-20 13:33:08.55
           298413 | workflowVariables | Routing            |                   | /task-root/Task2556.1474371187953
                        | RoutingTask   | 2016-09-20 13:33:17.962
           298414 | data              | Routing            |                   | /task-root/Task2556.1474371187953
                        | RoutingTask   | 2016-09-20 13:33:17.962
           298415 | nodeVariables     | Routing            |                   | /task-root/Task2556.1474371187953
                        | RoutingTask   | 2016-09-20 13:33:17.962
           298416 | workflowVariables | Routing            |                   | /document-route-instances-root/2016/09/20
/ParallelDocumentReview | DocumentRoute | 2016-09-20 13:33:17.993
(16 rows)
1 ACCEPTED ANSWER

Florent_Guillau
World-Class Innovator
World-Class Innovator

Ok thanks for the data and indeed there are blobs stored, which is something I wasn't expecting.

The blobs come from the workflow audit. The workflow logs audit event properties at each step, and some of these properties are not simple strings or integers and have to be stored as serialized Java objects.

View answer in original post

7 REPLIES 7

Florent_Guillau
World-Class Innovator
World-Class Innovator

There shouldn't be any blob in a PostgreSQL database used by Nuxeo unless you use nuxeo-core-binarymanager-sql. Please give details about the blobs you see with examples if possible.

I updated my question with the relevant data.

Florent_Guillau
World-Class Innovator
World-Class Innovator

Ok thanks for the data and indeed there are blobs stored, which is something I wasn't expecting.

The blobs come from the workflow audit. The workflow logs audit event properties at each step, and some of these properties are not simple strings or integers and have to be stored as serialized Java objects.

I'll open a ticket for improvement of this, we should store that as JSON in a text field. https

Thanks. I was going to suggest JSON as alternative, especially since newer PG handles json natively (or should I say, intelligently)

FYI I discovered this while doing a dump of the database by a non-nuxeo user. I wanted a non-nuxeo user to ensure the user had only read-only access to the data. PG handles blobs a bit archaically, and the grants had to be handled especially.

We probably won't use native JSON as this goes through a Hibernate/JPA abstraction layer and has to work on several different SQL databases. Also keep in mind that the default for recent versions of Nuxeo is to store the audit data in Elasticsearch and not the SQL database.

Getting started

Find what you came for

We want to make your experience in Hyland Connect as valuable as possible, so we put together some helpful links.