a week ago
Hi!
I'm using Alfresco 23.4 (community, Docker) and I'm trying to get the plain text for a document uploaded to Alfresco using the REST API.
I can access all the renditions that are created automatically, such as the image preview or the PDF. I can also list the renditions of a node and trigger the transformation for them, but I can't find anything about a plain text version.
Here I can see that the text and HTML transformers are built in, but I didn't find a way to enable them automatically (using folder rules?) or even to create a new rendition using the proper endpoint (I receive the error "Renditions not registered: txt").
Can I request the transformation, or enable it so that it runs automatically? Did I get something wrong?
Thanks!
a week ago
I wrote this sample some time ago: https://github.com/aborroy/alfresco-opensearch-neural-search/blob/main/alfresco-neural-search/src/ma...
You need to add the rendition pipeline for "text" in a new file, like 0200-enableText.json:
{
  "renditions": [
    {
      "renditionName": "text",
      "targetMediaType": "text/plain"
    }
  ]
}
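Once the "text" rendition is registered, it can be requested through the public REST API. Below is a minimal sketch in JavaScript (Node 18+, which provides a global fetch), not a definitive client. The base URL, credentials and node id are placeholders, and the helper names (renditionRequest, renditionContentUrl, requestTextRendition) are made up for illustration. It assumes basic authentication and the standard /nodes/{nodeId}/renditions endpoints.

```javascript
// Root of the Alfresco public REST API (v1).
const API_ROOT = '/alfresco/api/-default-/public/alfresco/versions/1';

// Builds the POST request that asks Alfresco to create a rendition.
// baseUrl, user and password are placeholders for your own installation.
function renditionRequest(baseUrl, nodeId, renditionId, user, password) {
  const credentials = Buffer.from(`${user}:${password}`).toString('base64');
  return {
    url: `${baseUrl}${API_ROOT}/nodes/${encodeURIComponent(nodeId)}/renditions`,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Basic ${credentials}`,
      },
      // The rendition id must match "renditionName" in the JSON definition.
      body: JSON.stringify({ id: renditionId }),
    },
  };
}

// URL from which the rendition content can be downloaded once it exists.
function renditionContentUrl(baseUrl, nodeId, renditionId) {
  return `${baseUrl}${API_ROOT}/nodes/${encodeURIComponent(nodeId)}` +
    `/renditions/${encodeURIComponent(renditionId)}/content`;
}

// Sends the request. The API is expected to answer 202 (Accepted),
// because the rendition is generated asynchronously.
async function requestTextRendition(baseUrl, nodeId, user, password) {
  const { url, options } = renditionRequest(baseUrl, nodeId, 'text', user, password);
  const response = await fetch(url, options);
  if (response.status !== 202) {
    throw new Error(`Rendition request failed with HTTP ${response.status}`);
  }
}
```

For example, requestTextRendition('http://localhost:8080', nodeId, 'admin', 'admin') would ask for the rendition, and a later GET on renditionContentUrl(...) with the same Authorization header should return the extracted text.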
a week ago - last edited a week ago
Thanks @angelborroy, I tried it and it worked!
Just one detail: how can I configure it to run the conversion automatically, like the PDF rendition?
a week ago
How about using a behaviour?
https://ecmarchitect.com/alfresco-developer-series-tutorials/behaviors/tutorial/tutorial.html
a week ago
That seems too complex for us: we don't use Java and want to keep the compose.yaml as simple as possible.
As the main goal is to extract the text from each file and then sync it to our application, I think the script approach is good for us. It doesn't matter whether we read it from the renditions endpoint or from the children endpoint.
Thanks for your help!
a week ago - last edited a week ago
I found another way to do it, using a script:
// Rendering engine that converts the content to another MIME type
var renderingEngineName = 'reformat';
var renditionDefinitionName = 'cm:text';
var renditionDef = renditionService.createRenditionDefinition(renditionDefinitionName, renderingEngineName);
// Ask for a plain text version of the document
renditionDef.parameters['mime-type'] = 'text/plain';
var textRendition = renditionService.render(document, renditionDef);
I can add a simple rule to call this script on each document creation in a folder, but it creates a child instead of a rendition. I understand that there is not much difference between a rendition and a hidden child of a node... Is this a better approach?
Anyway, there's something strange here: if I set renditionDefinitionName to my 'text' rendition, the rendition appears as CREATED using the REST API, but when I call the API to get the rendition info it appears as NOT_CREATED...
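Since the REST API generates renditions asynchronously, a polling check might help to rule out a timing issue when comparing the two responses. This is only a sketch in JavaScript: isRenditionReady and waitForRendition are hypothetical helper names, and fetchInfo is assumed to be a caller-supplied function that performs GET /nodes/{nodeId}/renditions/text and returns the parsed JSON body.

```javascript
// True when a rendition info response reports the rendition as created.
// Expects the shape { entry: { id, status, ... } }.
function isRenditionReady(renditionInfo) {
  return Boolean(
    renditionInfo &&
    renditionInfo.entry &&
    renditionInfo.entry.status === 'CREATED'
  );
}

// Calls fetchInfo repeatedly until the rendition is CREATED or the
// attempts run out. fetchInfo is supplied by the caller (assumption).
async function waitForRendition(fetchInfo, { attempts = 10, delayMs = 1000 } = {}) {
  for (let i = 0; i < attempts; i++) {
    const info = await fetchInfo();
    if (isRenditionReady(info)) {
      return info.entry;
    }
    // Wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`Rendition still not created after ${attempts} attempts`);
}
```

If the status stays NOT_CREATED after several attempts, the difference is probably not just timing.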