Hyland Connect

mcs130 · 09-16-2012

Hello

We are doing some POC work with AF4 (4.0.e) Community Edition and Apache Chemistry 0.8.0-SNAPSHOT source (built JARs from this)

Much is working fine.

We recently noticed when either running a couple of our unit tests or performing the 2 operations in series in our custom UI's business logic, we sometimes see the 2nd operation fail with an exception. (I have asked the developer to get the detailed trace). In the meantime, here is the failing scenario:

Performs a delete of a document selected from a list displayed in a dialog. This works fine every time.

document.delete(true);

To refresh the list, the developer has the code then hitting the back-end AGAIN with the same query to get the new list (which should now be 1 less since the document object was deleted successfully).

cmisSession.query(queryString, false);

However, the page has no list because the application throws an exception.

Then we noticed that when a breakpoint is used in Debug mode and you are stepping through the code to try and isolate the breakage, which naturally slows the process… it NEVER breaks - both operations then succeed.

This led us to think it might be a "timing" issue, unlikely as that seemed. However, when the developer forced a sleep of, say, 500 ms between the two method calls in the UI, it worked again.
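As an alternative to a fixed 500 ms sleep, the second call could be wrapped in a bounded retry with a growing pause. The following is only a sketch under assumptions: `QueryRetry` and `retry` are hypothetical names, not part of OpenCMIS, and the `Supplier` stands in for whatever runs the query and reads its results. It catches `RuntimeException` because the OpenCMIS client exceptions are unchecked.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of OpenCMIS: retries a call that may fail
// while the repository index is still catching up after a delete.
final class QueryRetry {
    private QueryRetry() {}

    /**
     * Runs the call up to maxAttempts times. After each failed attempt
     * (except the last) it sleeps, doubling the pause each time.
     * Rethrows the last failure if every attempt fails.
     */
    static <T> T retry(Supplier<T> call, int maxAttempts, long initialDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delay);
                    } catch (InterruptedException ie) {
                        // Preserve the interrupt flag and stop retrying.
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("interrupted while waiting to retry", ie);
                    }
                    delay *= 2;
                }
            }
        }
        throw last;
    }
}
```

In the scenario above, the supplier would call `cmisSession.query(queryString, false)` and copy the rows into a list before returning, so that a failure while reading the results is also retried. This only works around the symptom; it does not change how quickly the index catches up.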

Ironically, another ECM product being used for the POC work, an SP2013 preview, does NOT exhibit this behavior for THIS use-case - however, there have been other challenges with SP that we have encountered which AF4 is far better at handling correctly.

Thanks

jpotts · 09-17-2012

Agreed, you should not get an exception. But it looks as though the exception may be coming from OpenCMIS, not Alfresco. I am wondering if you are holding on to that iterator when you shouldn't be. For example, if you delete the object using the web client instead of code and then invoke your page, does the exception happen?

Jeff

Jeff Potts
https://www.metaversant.com | https://ecmarchitect.com

andy · 09-18-2012

Hi

You run a query - you get a list of nodes - then you try and get them …
Why do they have to still be there? Someone else could have deleted it ….changed its permissions… etc etc
Eventual consistency just makes it worse - you are more likely to have nodes that do not exist in the results.

missing:missing//missing is used to indicate a missing node - you have found a node that no longer exists for some reason
When you make a CMIS call for this ID it will not be there.
As this can happen at any time you have to expect it - using any API - hold onto data it can change ….there is no "repeatable read" between calls.
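One way to code for this is to treat the IDs returned by a query as hints and skip any that can no longer be fetched. A minimal sketch, assuming a lookup function that returns an empty `Optional` for a node that has gone away; `StaleResultFilter`, `resolveExisting`, and the lookup are hypothetical and stand in for a real call such as fetching the object by ID and treating a not-found error as empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper: resolves query-result IDs and silently drops the ones
// that no longer exist by the time they are fetched.
final class StaleResultFilter {
    private StaleResultFilter() {}

    /**
     * Looks up each ID in order and keeps only the objects that were found.
     * The lookup must return Optional.empty() for a missing node.
     */
    static <T> List<T> resolveExisting(List<String> ids,
                                       Function<String, Optional<T>> lookup) {
        List<T> found = new ArrayList<>();
        for (String id : ids) {
            // A node deleted between the query and this fetch is skipped.
            lookup.apply(id).ifPresent(found::add);
        }
        return found;
    }
}
```

The result keeps the order of the query rows, so a list in the UI built from it simply shows one entry fewer when a node has disappeared.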

Andy

mcs130 · 09-18-2012

Perhaps I am missing something, so please bear with me on this:

1 - The scenario we described involved making a call that passes an SQL query String:

cmisSession.query(queryString, false);
2 - We get back an ItemIterable<QueryResult> - it's correct.
3 - We iterate through and display the list of documents (relevant info only) in the UI - works great.
4 - The user then selects a document from the list…behind the scenes the application has the objectId of this ONE document and fires off the delete() method on it with its business logic. The delete works fine because when you refresh the local Alfresco repository web page on the Folder that has the documents, there is ONE less… URL form looks like: http://XX.XX.XX.XX:8081/share/page/repository#filter=path%7C%2FAcmeLibrary%2Facme-data%2F99997777000...
5 - Then, the same exact SQL query String is passed in again to perform an entirely NEW query, expecting to get back a new ItemIterable<QueryResult> - why should this be a problem? The query() method is on the Session object. We are not recreating this Session, we are using the current one we have - is that a problem? I would think it shouldn't be.

It is obtained like the following:

Session org.apache.chemistry.opencmis.client.api.SessionFactory.createSession(Map<String, String> parameters)

What "node" is being referred to here in the statement below?

missing:missing//missing is used to indicate a missing node - you have found a node that no longer exists for some reason
When you make a CMIS call for this ID it will not be there.

I would simply expect to get a new ItemIterable<QueryResult> reflecting the same results, minus the Document object that corresponds to the one objectId I just deleted. Nowhere in the second call is there any reference to this objectId.

Is the problem actually the "state" of the cmis:Folder's objectId we have, which I am re-using since the SQL uses the IN_FOLDER predicate? We don't fetch it again, since the Folder objectId does not change, and for the IN_FOLDER predicate you pass the String of the Id and not the object instance itself.

A) If we wait up to 20 seconds and make the SAME call, it works, but a delay like that is not something we can actually implement, OR
B) If we switch from Solr to Lucene, the problem goes away and is no longer seen.

Am I missing the point altogether?

Thanks

karstene · 09-24-2012

I am facing the same issue. A query call fails with the Soap Fault message "Runtime error. Message: org.alfresco.service.cmr.repository.InvalidNodeRefException: Node does not exist: missing://missing/missing(null)"

In my case, the previous call was a CancelCheckOut, which deletes the working copy. The next query returns the fault and no result set.

I read your statement

You run a query - you get a list of nodes - then you try and get them …
Why do they have to still be there? Someone else could have deleted it ….changed its permissions… etc etc

Andy

and would agree if the fault happens in subsequent calls after the query, when trying to access non-existing nodes. But the query itself has to return a valid node set, with or without the deleted node, depending on timing. Instead, it returns a fault and no result set at all.

thanks
Karsten

t_sato · 09-25-2012

Hi,

Are you guys aware of this issue?

Solr CMIS Query After Delete a Node Throws CmisRuntimeException: Node does not exist

This was closed as fixed in 4.0.2, but the problem may persist. Additional test cases would help get it fixed completely.

mcs130 · 09-25-2012

We are using 4.0.e Community Edition. Is it possible that this is why we are seeing this? We wanted to use Solr as it is configured OOTB, but did have to switch the configuration to use Lucene to get around the problem for now. You mention that the defect fix applies to 4.0.2.

Mark

jpotts · 10-11-2012

Means it should be fixed in 4.2a Community Edition. Would be great if someone could confirm.

Jeff

Jeff Potts
https://www.metaversant.com | https://ecmarchitect.com

mcs130 · 10-31-2012

Just a new update: We are currently using 4.2.a COMMUNITY Edition with the OOTB Solr configuration (PostgreSQL) and are still seeing this issue. I will, for now, go in and switch the index configuration to Lucene to get around this. We've been using the Community Edition for POC work.

Our company is now looking at Alfresco 4.2 Enterprise as a solution for this ECM initiative. Perhaps the issue is fixed in that version? We can check once we have a chance to install it.

Thanks

Mark

jspuchau · 11-16-2012

Hi everybody

We are facing a very similar issue in our organization, but we are not using Solr, as we are using Alfresco Enterprise 3.4.10.

Our problem appears while running stress tests on our CMIS layer. When we mix deletions with methods that use the ItemIterable (and PageFetcher) classes from OpenCMIS, Alfresco returns some odd errors to identical requests.

It's worth saying that this happens when using the AtomPub binding of OpenCMIS, but not when using the SOAP binding.

I don't have access to the logs as I'm at the airport, but on Monday I will post them.

Regards, and thanks for the help. I hope we will be able to find a solution.

mcs130 · 11-16-2012

I will say that we have recently observed:

Community Edition: 4.2.a - Exception is no longer thrown back upon the immediate SELECT query after the delete is made, but there is a lag due to Solr indexing (which eventually catches up) - the "eventual consistency" behavior seems to be expected with Solr

Enterprise Edition: 4.1 - Exception is no longer thrown back upon the immediate SELECT query after the delete is made, but there is a lag due to Solr indexing (which eventually catches up) - the "eventual consistency" behavior seems to be expected with Solr
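Given that lag, one client-side option is to remember the IDs just deleted and hide them from query results until the index stops returning them. This is a sketch under that assumption, not something taken from Alfresco or OpenCMIS: `PendingDeletes` and its methods are hypothetical names, and the string IDs stand in for the `cmis:objectId` values read from each query row.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: hides recently deleted IDs from query results while
// an eventually consistent index still returns them.
final class PendingDeletes {
    private final Set<String> pending = new HashSet<>();

    /** Call right after a successful delete of the given object ID. */
    void markDeleted(String objectId) {
        pending.add(objectId);
    }

    /**
     * Returns the queried IDs without the ones deleted locally.
     * IDs the index no longer returns are dropped from the pending set.
     */
    List<String> filter(List<String> queriedIds) {
        // Keep only pending IDs that the index is still reporting.
        pending.retainAll(new HashSet<>(queriedIds));
        List<String> visible = new ArrayList<>();
        for (String id : queriedIds) {
            if (!pending.contains(id)) {
                visible.add(id);
            }
        }
        return visible;
    }

    /** Number of deleted IDs the index has not yet caught up on. */
    int pendingCount() {
        return pending.size();
    }
}
```

With this approach the list in the UI shrinks immediately after the delete, without a sleep, and the pending set empties itself once the index has caught up.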


timing between document.delete() and running query