07-16-2014 06:21 AM
We have a Nuxeo-5.9.3 cluster consisting in two Tomcat server and a SQL Server geo-cluster. The SQL instance is shared among many applications.
We have observed that the cleanup of the table cluster_invals and cluster_nodes was not working properly, and Nuxeo performance plummeted (due to 30 rows in cluster_nodes and millions in cluster_invals). After doing some research, we found that the way Nuxeo handles the cleanup of those tables only works if you have a dedicated SQL Server instance.
We were able to work-around the problem by:
Adding a column to cluster_nodes (hostname nvarchar(128)).
Modify the file sqlserver.sql.txt included in nuxeo-core-storage-sql-5.9.3.jar. Now nodes are deleted and inserted this way:
-- DB Collation is CI but Nuxeo DB is CS
DELETE FROM [cluster_nodes] WHERE hostname =
(SELECT host_name COLLATE DATABASE_DEFAULT FROM sys.dm_exec_sessions S
WHERE S.session_id = @@SPID)
INSERT INTO [cluster_nodes] (nodeid, created, hostname)
SELECT @@SPID, CURRENT_TIMESTAMP, host_name FROM sys.dm_exec_sessions S
WHERE S.session_id = @@SPID
The work-around can be easily improved (it doesn't currently work for many nuxeo instances on the same node).
Do you think there is a better approach for this problem? Should it be reported as a bug?
Thanks in advance.
07-25-2014 11:01 AM
I can confirm that 5.9.4 indeed has this problem, but it's fixed for the next 5.9.5 (release in around 2 weeks). I don't have an exact ticket to pinpoint when the problem was introduced or when the fix occurred, but we've done a number of changes in the repository shutdown mechanism and one of them has fixed it.
You can try a 5.9.5-SNAPSHOT from http://qa.nuxeo.org/jenkins/job/IT-nuxeo-master-build/ if you want to confirm.
07-17-2014 11:39 AM
Thanks for the report. I opened NXP-14795 to track this issue and its resolution.
We haven't tested or validated Nuxeo with a clustered SQL Server. Can you explain why the host_name is needed in addition to the session_id? The session_id is not unique globally on the server cluster? Also host_name is provided by the client application and marked not reliable in the documentation.
07-17-2014 01:49 PM
Hi Florent,
The problem is not the SQL cluster but the use of shared instance. The original cleanup is:
When there are connections that are not originated from nuxeo this doesn't work as expected. Session_id is unique, but it may not be assigned to a nuxeo session under these circumstances. Let's consider the following:
Adding host_name allows us to clean all previous rows inserted from one node. It's not a full solution and there are still some situation where it doesn't work:
If there were some field or parameter that uniquely identifies the instance, using it would resolve the problem. Is there such a thing?
07-17-2014 02:30 PM
But in step 2, if the Nuxeo server is correctly shut down then it will clean up its entry in cluster_nodes.
07-17-2014 03:53 PM
In that case we are having some problem in our instances, because cluster_nodes is only cleaned up when starting an instance. It's been the behaviour since we install Nuxeo.
07-18-2014 05:25 AM
Hi Florent,
07-18-2014 07:25 AM
If you look at sqlserver.sql.txt you'll see a section
07-18-2014 09:08 AM
As the triggered test showed, apparently there is no DELETE is executed during shutdown.
07-18-2014 09:34 AM
No sql traces during shutdown even if I enable more logging.
07-18-2014 10:23 AM
It's working correctly in the current 5.9.5-SNAPSHOT version. We have refactored a few things related to repository shutdown. Maybe try 5.9.4 first? Your 5.9.3 is a Fast Track version, so you should always upgrade asap.
Find what you came for
We want to make your experience in Hyland Connect as valuable as possible, so we put together some helpful links.