11-02-2016 11:42 AM
As the DBA for our postgresql server, I noticed that the database held BLOBS for use by the nuxeo service. Where and how are these blobs used? Why is nuxeo storing blobs in the database server, when the files themselves are stored locally or via S3? I see that these blobs have ID numbers, but these numbers show up only in the nxp_logs_extinfo
table, and no where else. There's no reference to these AFAIK anywhere within the schema.
Are these blobs artifacts of an older version or possibly misconfigured system? Is there a way to determine what they actually are or mean?
The blob entries look like this:
nuxeo=# select loid,substring(encode(data,'escape') for 60) from pg_catalog.pg_largeobject ;
loid | substring
--------+--------------------------------------------------------------------------------
217768 | \254\355\000\x05sr\000\x17java.util.LinkedHashSet\330l\327Z\225\33
217769 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
217770 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
217771 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
217772 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
217773 | \254\355\000\x05sr\000\x11java.util.HashMap\x05\x07\332\301\303\x16`\321\x03\0
217774 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
217775 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
298409 | \254\355\000\x05sr\000\x17java.util.LinkedHashSet\330l\327Z\225\33
298410 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
298411 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
298412 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
298413 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
298414 | \254\355\000\x05sr\000\x11java.util.HashMap\x05\x07\332\301\303\x16`\321\x03\0
298415 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
298416 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
(16 rows)
Here's the correlation to the "documents". I don't understand this, but it does look like some kind of internal object created by Nuxeo on behalf of users:
nuxeo=# select log_extinfo_blob,mapkey, log_event_category,log_event_comment,log_doc_path,log_doc_type,log_date from nxp_logs_extinfo,nxp_logs_mapextinfos,nxp_logs where discriminator = 'BLOB' and log_extinfo_id = info_fk and nxp_logs.log_id = log_fk ;
log_extinfo_blob | mapkey | log_event_category | log_event_comment | log_doc_path
| log_doc_type | log_date
------------------+-------------------+--------------------+-------------------+------------------------------------------
------------------------+---------------+-------------------------
217772 | workflowVariables | Routing | | /task-root/Task2556.1461083427150
| RoutingTask | 2016-04-19 18:31:01.374
217773 | data | Routing | | /task-root/Task2556.1461083427150
| RoutingTask | 2016-04-19 18:31:01.374
217774 | nodeVariables | Routing | | /task-root/Task2556.1461083427150
| RoutingTask | 2016-04-19 18:31:01.374
217775 | workflowVariables | Routing | | /document-route-instances-root/2016/04/19
/ParallelDocumentReview | DocumentRoute | 2016-04-19 18:31:01.385
217768 | actors | Routing | | /task-root/Task2556.1461083427150
| RoutingTask | 2016-04-19 18:30:27.521
217769 | workflowVariables | Routing | | /task-root/Task2556.1461083427150
| RoutingTask | 2016-04-19 18:30:27.521
217770 | nodeVariables | Routing | | /task-root/Task2556.1461083427150
| RoutingTask | 2016-04-19 18:30:27.521
217771 | workflowVariables | Routing | | /document-route-instances-root/2016/04/19
/ParallelDocumentReview | DocumentRoute | 2016-04-19 18:30:27.671
298409 | actors | Routing | | /task-root/Task2556.1474371187953
| RoutingTask | 2016-09-20 13:33:08.527
298410 | workflowVariables | Routing | | /task-root/Task2556.1474371187953
| RoutingTask | 2016-09-20 13:33:08.527
298411 | nodeVariables | Routing | | /task-root/Task2556.1474371187953
| RoutingTask | 2016-09-20 13:33:08.527
298412 | workflowVariables | Routing | | /document-route-instances-root/2016/09/20
/ParallelDocumentReview | DocumentRoute | 2016-09-20 13:33:08.55
298413 | workflowVariables | Routing | | /task-root/Task2556.1474371187953
| RoutingTask | 2016-09-20 13:33:17.962
298414 | data | Routing | | /task-root/Task2556.1474371187953
| RoutingTask | 2016-09-20 13:33:17.962
298415 | nodeVariables | Routing | | /task-root/Task2556.1474371187953
| RoutingTask | 2016-09-20 13:33:17.962
298416 | workflowVariables | Routing | | /document-route-instances-root/2016/09/20
/ParallelDocumentReview | DocumentRoute | 2016-09-20 13:33:17.993
(16 rows)
11-09-2016 12:10 PM
Ok thanks for the data and indeed there are blobs stored, which is something I wasn't expecting.
The blobs come from the workflow audit. The workflow logs audit event properties at each step, and some of these properties are not simple strings or integers and have to be stored as serialized Java objects.
11-03-2016 11:16 AM
There shouldn't be any blob in a PostgreSQL database used by Nuxeo unless you use nuxeo-core-binarymanager-sql
. Please give details about the blobs you see with examples if possible.
11-09-2016 09:24 AM
I updated my question with the relevant data.
11-09-2016 12:10 PM
Ok thanks for the data and indeed there are blobs stored, which is something I wasn't expecting.
The blobs come from the workflow audit. The workflow logs audit event properties at each step, and some of these properties are not simple strings or integers and have to be stored as serialized Java objects.
11-09-2016 12:19 PM
I'll open a ticket for improvement of this, we should store that as JSON in a text field. https
11-09-2016 01:57 PM
Thanks. I was going to suggest JSON as alternative, especially since newer PG handles json natively (or should I say, intelligently)
11-09-2016 01:59 PM
FYI I discovered this while doing a dump of the database by a non-nuxeo user. I wanted a non-nuxeo user to ensure the user had only read-only access to the data. PG handles blobs a bit archaically, and the grants had to be handled especially.
11-09-2016 02:04 PM
We probably won't use native JSON as this goes through a Hibernate/JPA abstraction layer and has to work on several different SQL databases. Also keep in mind that the default for recent versions of Nuxeo is to store the audit data in Elasticsearch and not the SQL database.
Find what you came for
We want to make your experience in Hyland Connect as valuable as possible, so we put together some helpful links.