cancel
Showing results for 
Search instead for 
Did you mean: 

How to distinguish different nodes of type "content".

Muffex
Confirmed Champ
Confirmed Champ

Hey there.

I've searched several threads regarding SQL statements to gather information on files, folders, etc. I'm well aware that the proper way would be to use the API or CMIS.

But for my case I have to rely on SQL.

When selecting the nodes (from the alf_node table) I filter on alf_qname.local_name = 'content'.
These content nodes, however, contain (uploaded) files, wiki-pages, blog-posts, discussion-topics and other nodes.

How can I properly distinguish these different content nodes? I.e. how to only select (uploaded) files? How to only select blog-posts? etc.

kind regards

1 ACCEPTED ANSWER

Muffex
Confirmed Champ
Confirmed Champ

Well, I finally figured "something" out.

I can retrieve most of the nodes that refer to "files". It's neither well optimized nor can I guarantee it will work.

But for my needs it is currently sufficient.

SELECT n.id AS nodeId, n.uuid AS fileId, n.audit_created AS created, n.audit_modified AS last_updated, anp.string_value AS name, n.audit_creator AS creator, CONCAT('http://alfresco.bas.uni-koblenz.de/share/page/document-details?nodeRef=',store.protocol,'://',store.identifier,'/',n.uuid) AS content
  FROM alf_node AS n
    JOIN alf_qname AS q on (n.type_qname_id = q.id AND q.local_name='content')
    JOIN alf_store AS store ON (store.id = n.store_id)
    JOIN alf_node_properties AS anp on (n.id = anp.node_id)
    LEFT JOIN alf_content_data AS acd ON (anp.long_value = acd.id)
    JOIN alf_child_assoc AS aca ON (aca.child_node_id = n.id)
    JOIN alf_node AS nP ON (aca.parent_node_id = nP.id)
    JOIN alf_qname AS aqn ON (np.type_qname_id = aqn.id)
    WHERE anp.qname_id IN 
      (SELECT id FROM alf_qname
        WHERE local_name='name' 
          AND ns_id IN 
            (SELECT ns2.id FROM alf_namespace AS ns2 WHERE uri LIKE '%content%'))
      AND n.audit_creator NOT LIKE 'System'
      AND aqn.local_name not like 'archiveUser'
      AND aqn.local_name not like 'person'
      AND anp.string_value like '%.%'
      AND anp.string_value NOT LIKE '%~%~%'
      AND anp.string_value NOT LIKE 'dashboard.xml'

View answer in original post

10 REPLIES 10

cristinamr
World-Class Innovator
World-Class Innovator

Good morning!

You'll need to retrieve by content type, e.g.:

SELECT string_value AS document_name
FROM alf_node node
INNER JOIN alf_node_properties ON (id=node_id)
INNER JOIN alf_qname ON (qname_id=alf_qname.id)
WHERE ns_id IN
(SELECT ns.id FROM alf_namespace AS ns
WHERE ns.uri='http://www.alfresco.org/model/content/1.0')
AND local_name='name';

You can find more examples here and here.

Cheers,

Cris.

--
VenziaIT: helping companies since 2005! Our ECM products: AQuA & Seidoc

Hello @cristinamr,

thanks for your answer,  but this is exactly what I did (including the namespace).
But the results contain nodes that refer to wiki-pages, discussion-posts, etc.

Do you know how to filter the results even further to only include files that have actually been uploaded by users?

kind regards

cristinamr
World-Class Innovator
World-Class Innovator

Could you please share with us your tests?

--
VenziaIT: helping companies since 2005! Our ECM products: AQuA & Seidoc

Hello again.

Thanks for your patience!


When running exactly the query you sent me the results contain nodes with names like:

  • Main_Page
  • links
  • several ftl-Files
  • page.component...
  • post-...
  • Comments
  • doclib
  • documentLibrary
  • imgpreview
  • ...

The results thus also contain a plethora of nodes referring to "files" which have not been "uploaded" by users.
Therefore my question: "How to distinguish content-nodes that refer to wiki-pages, discussion-posts, uploaded files, etc.?"

kind regards

afaust
Legendary Innovator
Legendary Innovator

In short, in default Alfresco without any customisations in place it is impossible to differentiate between documents, wiki articles, blog posts etc. since they all are of type cm:content. In some of my past projects I had developed automation which specialises e.g. wiki articles into a custom wiki content type to be able to better differentiate them from "real" content, e.g. by excluding the custom type from super-type queries. In a default Alfresco, the only way to reliably exclude the "other" types of contents in a site from "real" content is by filtering on the parent node and making sure it is either a regular folder or a folder with site component ID set to "documentLibrary", and NOT a folder with site component ID set to anything else...

You can exclude thumbnails / renditions by excluding any content nodes with the cm:thumbnail / rn:rendition type. You can exclude the surf-config component descriptor nodes by using a name-pattern filter, as they unfortunately also share the generic content type and cannot otherwise be differentiated. This should also work for deployed module descriptors as well as dashboard page configuration files. Content in the Data Dictionary also shared the generic content type, and that will be much harder to filter out in a DB query.

Also note that many files will be associated with various users even if they did not "upload" them. E.g. some files are created automatically when you create a site, change your dashboard configuration or simply try to preview a document. For that reason, it is not really possible to distinguish documents in the system between "uploaded by user" (via the upload function) and "otherwise/indirectly created by user".

Generally I recommend using proper Alfresco FTS instead of DB queries, as that would allow you to use PATH / ANCESTOR conditions to limit the sub-structures from which you retrieve content nodes, and implicitly deals with any prefix - namespace URI mapping for you. With EXACTTYPE you can also do targeted retrieval without being affected by child / parent type relationships.

Muffex
Confirmed Champ
Confirmed Champ

Hello @afaust,

thanks for your detailed answer.

I need to access all information via Dremio. Therefore I am afraid that I have to adhere to SQL. I am not aware of any suitable DataSource beside SQL which is offered by Dremio.

I will try to create SQL statements for each node type which is relevant for me. 

I have to admit, though, that I'm a little disappointed that there isn't an easier way. Especially considering that the overall data structure of Alfresco is designed in such a stringent and compelling way.

kind regards

cristinamr
World-Class Innovator
World-Class Innovator

Well, why don't you create your own custom content model and apply to all the items you need it?

If you're applying this kind of solution, you can adapt the query to your new namespace, so will be extracted those data you need it.

You can do an easy test:

  • Go to content model console.
  • Create one as an example (its parent should be cm:content).
  • Upload a document, change it type to yours.
  • Go to sql and adapt the query to see the effects.

Some url that can help you:

Alfresco content model UI step by step

Content model documentation 

If this test is a success, then we will let you know which next steps you're need to perform.

Let us know.

--
VenziaIT: helping companies since 2005! Our ECM products: AQuA & Seidoc

Hi @cristinamr,

if I wanted to change it just for my instance, that would probably be a great idea.
However, I need to design my queries to be compatible with other instances.
I am aware that using the API would be good for this very reason, however, as I said, I am reliant on Dremio.

kind regards

cristinamr
World-Class Innovator
World-Class Innovator

Ok, in that case I don't see any possible solution and I am not so sure if somebody else can help you.

Cheers,

Cristina.

--
VenziaIT: helping companies since 2005! Our ECM products: AQuA & Seidoc