
Alfresco 4.2.2 + AAAR Extraction problem

pawelb
Champ in-the-making
Hello everyone!

I have tried to parse Alfresco audit logs using AAAR. When I use the default, freshly generated AAAR configuration, I get an authorization error for the /alfresco/cmisatom service, even though the password is correct. The full log is attached in the extract1.log file.

After changing the cmisatom URL:


USE AAAR_DataMart;
UPDATE dm_dim_alfresco SET url_cmis_suffix='/alfresco/api/-default-/public/cmis/versions/1.1/atom' WHERE id=1;


The extraction script continues to process the logs, but shows some errors at the beginning (attached in the extract2.log file).

Example:


2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : Because of an error, this step can't continue:
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 - ERROR (version 5.1.0.0, build 1 from 2014-06-19_19-02-57 by buildguy) : org.pentaho.di.core.exception.KettleException:
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 - Error batch inserting rows into table [stg_cmis_folders_partial].
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 - Errors encountered (first 10):
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 -
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 -
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 - Error updating batch
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 - Duplicate entry '1-cd8a02c3-7770-4fb3-bf15-53981e5ce4e2' for key 'PRIMARY'
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 -
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 -
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 -    at org.pentaho.di.trans.steps.tableoutput.TableOutput.writeToTable(TableOutput.java:342)
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 -    at org.pentaho.di.trans.steps.tableoutput.TableOutput.processRow(TableOutput.java:118)
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 -    at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 -    at java.lang.Thread.run(Unknown Source)
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 - Caused by: org.pentaho.di.core.exception.KettleDatabaseBatchException:
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 - Error updating batch
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 - Duplicate entry '1-cd8a02c3-7770-4fb3-bf15-53981e5ce4e2' for key 'PRIMARY'
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 -
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 -    at org.pentaho.di.core.database.Database.createKettleDatabaseBatchException(Database.java:1365)
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 -    at org.pentaho.di.trans.steps.tableoutput.TableOutput.writeToTable(TableOutput.java:289)
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 -    … 3 more
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 - Caused by: java.sql.BatchUpdateException: Duplicate entry '1-cd8a02c3-7770-4fb3-bf15-53981e5ce4e2' for key 'PRIMARY'
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 -    at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1981)
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 -    at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1388)
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 -    at org.pentaho.di.trans.steps.tableoutput.TableOutput.writeToTable(TableOutput.java:285)
2014/09/02 09:48:24 - stg_cmis_folders_partial 2.0 -    … 3 more


When the process finishes, most of the graphs in Analyse are empty (Repository size, Documents per type, etc.). Could this be related to the errors in the extract2.log file?

Why do I need to change the CMIS URL? Is it because Alfresco 4.2.2 is too recent for AAAR 2.1? Is it not supported?

Please help me. I would be grateful for any answers or suggestions 🙂

Regards,
Paul

fcorti
Elite Collaborator
Hi Paul,

About the first problem you have had:
I have checked on an Alfresco EE 4.2.2 instance, and the default CMIS URL has not changed from previous versions, so the unauthorized message seems to be an authentication problem.
I suggest you double-check the login, the password and the permissions.
In any case, the update you made is correct: it changes the CMIS connection URL in all the AAAR connections.
Good job!
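To tell a credentials problem apart from a wrong URL, one option is to call the CMIS endpoint directly with the same login AAAR uses and look at the HTTP status code. This is a minimal sketch, assuming the endpoint accepts HTTP Basic authentication; `basic_auth_header` and `probe_cmis` are hypothetical helpers, not part of AAAR, and the host and port in the usage note are placeholders.

```python
import base64
import urllib.error
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Build an HTTP Basic 'Authorization' header value."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def probe_cmis(url: str, user: str, password: str, timeout: float = 10.0) -> int:
    """Send one GET to the CMIS endpoint and return the HTTP status code.

    Assumption: the endpoint uses HTTP Basic authentication.
    """
    request = urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(user, password)}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; its .code holds the status.
        return err.code
```

For example, `probe_cmis("http://localhost:8080/alfresco/cmisatom", "admin", "admin")` returning 401 for a login you know is right would point at how that user is authenticated, while a 404 would point at the URL itself.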

About the second problem you have had:
The problem occurs during the CMIS extraction into a temporary table (stg_cmis_folders_partial).
As you can see, the problem is on the primary key: the UUID of the node.
It seems that the CMIS query extracts duplicate nodes… I don't think the problem is on the AAAR side (it's only a feeling 🙂).
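If you want to confirm that the CMIS results really contain the same node twice, one approach is to count the keys before they reach the table. This is a sketch under an assumption: the MySQL message shows a key like '1-cd8a02c3-…', which looks like a numeric id joined to the node UUID, so the hypothetical rows below are `(id, uuid)` pairs.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical row shape: (id, node UUID), joined as "<id>-<uuid>" to mimic
# the key shown in the MySQL "Duplicate entry" message.
Row = Tuple[int, str]


def duplicate_keys(rows: Iterable[Row]) -> List[str]:
    """Return composite keys that occur more than once, in first-seen order."""
    counts = Counter(f"{row_id}-{uuid}" for row_id, uuid in rows)
    return [key for key, n in counts.items() if n > 1]
```

Any key this returns would hit the same `Duplicate entry … for key 'PRIMARY'` error on a batch insert into a table with that primary key.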

Looking at the log file you sent, I was surprised by the message:
2014/09/02 09:48:23 - Cmis Input modified folder.0 - Cmis Input - Retrieved n.100 results from item n.500 on a total of n.137981 results.
With the default installation settings, the Solr settings should prevent extracting more than 1,000 results, yet here you have more than 100K results… this seems quite strange.
Perhaps something has been customized in your Alfresco installation?
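For a sense of scale, here is the arithmetic behind that log line, assuming the step keeps requesting pages of 100 results as it does there:

```python
def pages_needed(total_results: int, page_size: int) -> int:
    """Number of page requests needed to read total_results, rounding up."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_results // page_size)


print(pages_needed(137981, 100))  # 1380 requests for the total in the log
print(pages_needed(1000, 100))    # 10 requests if results were capped at 1,000
```

So the run in the log needs well over a hundred times as many CMIS round trips as a result set capped at 1,000.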

Please let us know how it goes, or contact me privately so we can run more specific tests.

Thanks.

vikash
Champ in-the-making
Hi,

I am using PostgreSQL 9.3, Alfresco 4.2.f and Pentaho 5.2.0. When I execute AAAR_Extract.sh, I get:
org.pentaho.di.core.exception.KettleStepException:
Unable to get queryfields for SQL:
select
*
from
cmis:folder
where
cmis:lastModificationDate >= TIMESTAMP '2001-01-01T00:00:00.000+00:00'
and (
cmis:lastModificationDate > TIMESTAMP '2001-01-01T00:00:00.000+00:00'
or (
  cmis:lastModificationDate = TIMESTAMP '2001-01-01T00:00:00.000+00:00'
  and cmis:name >= ''
))
order by
cmis:lastModificationDate asc,
cmis:name asc


Caused by: org.apache.chemistry.opencmis.commons.exceptions.CmisRuntimeException: 10260013




I am using the AAAR beta version from the Marketplace.
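The WHERE and ORDER BY clauses of the query in the error above read like keyset (seek) pagination: each page resumes from the last `(cmis:lastModificationDate, cmis:name)` pair, here starting from 2001-01-01 and an empty name. The sketch below mirrors that predicate on in-memory items to illustrate the query's logic under that reading; it is not the code AAAR or Pentaho actually runs, and `Item`/`next_page` are hypothetical names. The outer `>=` on the date is already implied by the `OR` branch, so it is omitted.

```python
from datetime import datetime
from typing import List, NamedTuple


class Item(NamedTuple):
    last_modified: datetime  # stands in for cmis:lastModificationDate
    name: str                # stands in for cmis:name


def next_page(items: List[Item], since: datetime, from_name: str,
              page_size: int) -> List[Item]:
    """Apply the query's WHERE and ORDER BY to items and take one page."""
    matching = [
        it for it in items
        if it.last_modified > since
        or (it.last_modified == since and it.name >= from_name)
    ]
    matching.sort(key=lambda it: (it.last_modified, it.name))
    return matching[:page_size]
```

Because the name comparison is inclusive (`>=`), the item a page ends on matches again when the next page resumes from it, so a caller resuming this way has to skip or tolerate that one repeat.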

fcorti
Elite Collaborator
Hi Vikash,

Is it the first run?
The first thing I suggest checking is that the Alfresco server is reachable from Pentaho.
I also suggest checking the URL and the settings in the AAAR_DataMart.dm_dim_alfresco table.
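A quick way to check reachability is to open a plain TCP connection from the machine running Pentaho to the host and port of the configured URL. This is a minimal sketch; `host_and_port` and `is_reachable` are hypothetical helpers, and the URL in the usage note is a placeholder for the value stored in dm_dim_alfresco.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def host_and_port(url: str) -> Tuple[str, int]:
    """Extract host and port from a URL, defaulting the port by scheme."""
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

Run from the Pentaho host, `is_reachable("http://localhost:8080/alfresco/cmisatom")` returning False would point at networking or the configured host/port rather than at CMIS itself.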

We all know that, starting from Alfresco 5.0.a, several changes have been made to the CMIS interface, but with 4.2.f everything should work fine.

I hope this helps you,