
Multi-step document transformation

cburghardt
Champ in-the-making
I want to store a file in Alfresco that has a mimetype not yet known to the repository. The file has nested content; think of a zip file that contains other files. Is it possible to create a transformer that extracts the nested content and passes these normal files (PDF, text, whatever) to the transformers that are already available in Alfresco? The goal is to full-text index my file. I saw the complex transformations that take a list of transformers, but as my file can have arbitrary sub-content, that approach is not possible.
7 REPLIES

derek
Star Contributor
Hi,

It seems that you require a transformer from the new mimetype to text/plain. This is quite easy to do and set up. Read the section in the wiki on Content Transformations, which should be enough to get you started. The PdfBoxContentTransformer would be a good example to look at.

If your subdocuments require transformation to text themselves, then you may have to inject the registry into your transformer, etc.  By plugging in your xxx -> text/plain transformer into the transformation registry you will start to get full text indexing.

Regards

cburghardt
Champ in-the-making
I created a transformer that iterates over the parts, requests a transformer from the registry, creates a new ContentReader from a temp file and fires transformer#transform.
Unfortunately, this step tries to get a new writable channel, which fails because there is already a writer. So I created a new writer from a temp file, but then the full-text search does not find my file because only a temporary file is indexed.
So how do I append the text of my sub-parts to the writer of my file?

derek
Star Contributor
Hi,

A ContentReader or ContentWriter cannot be reused. The underlying Channel on a reader or writer can only be created once. Certain operations access the channel and close it automatically (e.g. putFile and getFile), but if you access the channel or stream directly then you are responsible for closing it. However, you can request a new reader from either a reader or writer using #getReader. Be sure to close the output channel before using this method on a ContentWriter.

I'm just going to type here, so forgive any obvious issues:

ContentService contentService;  // by Spring injection
ContentReader sourceReader; // passed in
ContentWriter targetWriter; // passed in

// do the stuff with the source stream

List<ContentReader> innerReaders = new ArrayList<ContentReader>(5);
// some kind of loop

{
   String innerMimetype = …; // you get it somehow
   // use the temp writer so that the file will be cleaned up
   ContentWriter innerSource = contentService.getTempWriter();
   // fill it with the inner document.  Be sure to close the stream if you accessed it directly.
   …
   // get a target to transform to
   ContentWriter innerTarget = contentService.getTempWriter();
   // now transform
   ContentTransformer innerToTextTransformer = contentTransformerRegistry.getTransformer(innerMimetype, MimetypeMap.TEXT_PLAIN);
   innerToTextTransformer.transform(innerSource.getReader(), innerTarget);
   // add the target to the list of readers
   innerReaders.add(innerTarget.getReader());
}

// now you have a list of readers that are the text version of all inner documents
OutputStream targetOs = null;
try
{
   // write the stream to the output target.
   targetOs = targetWriter.getContentOutputStream();   // can only do this once
   for (ContentReader innerReader : innerReaders)
   {
      // you can't use targetWriter.putContent(innerReader) because, as the javadoc states,
      // the output channel will be closed and the reader and writer may only be used once
      InputStream innerSourceIs = innerReader.getContentInputStream();
      copy(innerSourceIs, targetOs);   // inner source input stream will be closed
   }
}
finally
{
   // close the output stream
   …
}
// done
/**
*  Copies the input stream to the output stream, closing only the input stream upon completion
*/
private void copy(InputStream is, OutputStream os)
{
   // see Spring FileCopyUtils for example code, if required
   …
}
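
For the copy helper and the append loop above, here is a minimal, self-contained sketch using only java.io. It is not Alfresco code: the class name StreamConcat and the concat method are hypothetical, and in-memory byte array streams stand in for the inner readers and the target writer. The point it illustrates is that each input stream is closed after copying while the shared output stream stays open for the next part.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the name is an assumption for illustration only.
public class StreamConcat {

    /**
     * Copies the input stream to the output stream, closing only the
     * input stream upon completion. Returns the number of bytes copied.
     */
    static long copy(InputStream is, OutputStream os) throws IOException {
        try (InputStream in = is) {   // closes the input, never the output
            byte[] buffer = new byte[4096];
            long total = 0;
            int read;
            while ((read = in.read(buffer)) != -1) {
                os.write(buffer, 0, read);
                total += read;
            }
            os.flush();
            return total;
        }
    }

    /**
     * Appends every part to one shared output stream, separated by newlines.
     * The in-memory streams stand in for the inner readers and the target writer.
     */
    static String concat(List<String> parts) {
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        try {
            for (String part : parts) {
                InputStream inner =
                    new ByteArrayInputStream(part.getBytes(StandardCharsets.UTF_8));
                copy(inner, target);      // target stays open for the next part
                target.write('\n');
            }
            return target.toString(StandardCharsets.UTF_8.name());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(concat(Arrays.asList("text of part one", "text of part two")));
    }
}
```

In the transformer itself, the target would be the single output stream obtained from targetWriter, closed once in the finally block after all parts have been appended.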

I hope it helps.

cburghardt
Champ in-the-making
Great, works like a charm. Thanks a lot!

jdeus
Champ in-the-making
Hi,
I tried to use your code, but without success.

This instruction is my problem:
ContentTransformer innerToTextTransformer = contentTransformerRegistry.getTransformer(innerMimetype, MimetypeMap.TEXT_PLAIN);

This call returns null. Whatever I do, I get no transformer, and this happens for every kind of transformation I have tried.
Any suggestions, please?

Regards,
Jérôme D.

NB: Sorry for my awful English …

cburghardt
Champ in-the-making
I'm using this:

ContentTransformer transformer = contentTransformerService.getTransformer(mimetype, MimetypeMap.MIMETYPE_TEXT_PLAIN);

(the service is set by Spring)
Works with Alfresco 1.4 and 2.0

jdeus
Champ in-the-making
Thanks a lot, it works like a charm now.