
Not able to run custom transformer (pdf_to_doc) through soffice

hiten_rastogi1
Star Contributor

Hi All,

Here is the link to my project: GitHub - rastoh/ev-pdf-to-doc

Please look at service-context_bak.xml for the beans related to soffice. It is at the path ev-pdf-to-doc/ev-repo-pdftodoc/src/main/resources/alfresco/module/ev-repo-pdftodoc/context

I am trying to convert PDF to DOC using a custom transformer that calls soffice, but it does not work. The same custom transformer runs successfully with AbiWord, but the result is not as good as what soffice produces when I run it manually.

I call the transformer through an action in the Share UI. With soffice the transformation does take place, but the transformed file is generated inside my project directory instead of at the specified target path. Below is the log output I get when running through soffice.

2017-11-28 19:13:55,140 DEBUG [com.eisenvault.service.transform.PDFToDOCContentTransformer] [http-bio-8080-exec-15] **********
Executing transformation **********

2017-11-28 19:13:55,141 DEBUG [com.eisenvault.service.transform.PDFToDOCContentTransformer] [http-bio-8080-exec-15] **********
Got the content data for transformation **********

2017-11-28 19:13:55,141 DEBUG [com.eisenvault.service.transform.PDFToDOCContentTransformer] [http-bio-8080-exec-15] **********
Trying transformation for the first time **********

2017-11-28 19:13:55,142 DEBUG [com.eisenvault.service.transform.PDFToDOCContentTransformer] [http-bio-8080-exec-15] **********
Calling pdfToDocTransformWorker transform method for transformation **********

2017-11-28 19:13:55,142 DEBUG [com.eisenvault.service.transform.PDFToDOCTransformWorker] [http-bio-8080-exec-15] source Path = /tmp/Alfresco/PDFToDOCTransformWorker_source_5703572020147422740.pdf
2017-11-28 19:13:55,143 DEBUG [com.eisenvault.service.transform.PDFToDOCTransformWorker] [http-bio-8080-exec-15] target Path = /tmp/Alfresco/PDFToDOCTransformWorker_source_5703572020147422740.doc
2017-11-28 19:13:56,551 DEBUG [org.alfresco.util.exec.RuntimeExec] [http-bio-8080-exec-15] Execution result:
os: Linux
command: /usr/bin/soffice --infilter=writer_pdf_import --headless --convert-to doc /tmp/Alfresco/PDFToDOCTransformWorker_source_5703572020147422740.pdf --print-to-file /tmp/Alfresco/PDFToDOCTransformWorker_source_5703572020147422740.doc
succeeded: true
exit code: 0
out: convert /tmp/Alfresco/PDFToDOCTransformWorker_source_5703572020147422740.pdf -> /home/hitenrastogi/Documents/Projects/R_and_D_Projects/PDF_to_DOC/ev-repo-pdftodoc/PDFToDOCTransformWorker_source_5703572020147422740.doc using filter : MS Word 97

err: Error: source file could not be loaded

2017-11-28 19:13:56,551 DEBUG [com.eisenvault.service.transform.PDFToDOCTransformWorker] [http-bio-8080-exec-15] EXIT VALUE: 0
2017-11-28 19:13:56,551 DEBUG [com.eisenvault.service.transform.PDFToDOCTransformWorker] [http-bio-8080-exec-15] STDOUT: convert /tmp/Alfresco/PDFToDOCTransformWorker_source_5703572020147422740.pdf -> /home/hitenrastogi/Documents/Projects/R_and_D_Projects/PDF_to_DOC/ev-repo-pdftodoc/PDFToDOCTransformWorker_source_5703572020147422740.doc using filter : MS Word 97

2017-11-28 19:13:56,551 DEBUG [com.eisenvault.service.transform.PDFToDOCTransformWorker] [http-bio-8080-exec-15] STDERR: Error: source file could not be loaded

2017-11-28 19:13:56,552 DEBUG [com.eisenvault.service.transform.PDFToDOCContentTransformer] [http-bio-8080-exec-15] **********
Trying transformation for the second time **********

2017-11-28 19:13:56,553 DEBUG [com.eisenvault.service.transform.PDFToDOCContentTransformer] [http-bio-8080-exec-15] **********
Calling pdfToDocTransformWorker transform method for transformation **********

2017-11-28 19:13:56,553 DEBUG [com.eisenvault.service.transform.PDFToDOCTransformWorker] [http-bio-8080-exec-15] source Path = /tmp/Alfresco/PDFToDOCTransformWorker_source_3942032914925752895.pdf
2017-11-28 19:13:56,553 DEBUG [com.eisenvault.service.transform.PDFToDOCTransformWorker] [http-bio-8080-exec-15] target Path = /tmp/Alfresco/PDFToDOCTransformWorker_source_3942032914925752895.doc // expected target path
2017-11-28 19:13:57,894 DEBUG [org.alfresco.util.exec.RuntimeExec] [http-bio-8080-exec-15] Execution result:
os: Linux
command: /usr/bin/soffice --infilter=writer_pdf_import --headless --convert-to doc /tmp/Alfresco/PDFToDOCTransformWorker_source_3942032914925752895.pdf --print-to-file /tmp/Alfresco/PDFToDOCTransformWorker_source_3942032914925752895.doc
succeeded: true
exit code: 0
out: convert /tmp/Alfresco/PDFToDOCTransformWorker_source_3942032914925752895.pdf -> /home/hitenrastogi/Documents/Projects/R_and_D_Projects/PDF_to_DOC/ev-repo-pdftodoc/PDFToDOCTransformWorker_source_3942032914925752895.doc using filter : MS Word 97  // the transformer runs and creates the doc file

err: Error: source file could not be loaded // but because the file is created at the wrong path, the transformer cannot load it, hence the error

2017-11-28 19:13:57,895 DEBUG [com.eisenvault.service.transform.PDFToDOCTransformWorker] [http-bio-8080-exec-15] EXIT VALUE: 0
2017-11-28 19:13:57,895 DEBUG [com.eisenvault.service.transform.PDFToDOCTransformWorker] [http-bio-8080-exec-15] STDOUT: convert /tmp/Alfresco/PDFToDOCTransformWorker_source_3942032914925752895.pdf -> /home/hitenrastogi/Documents/Projects/R_and_D_Projects/PDF_to_DOC/ev-repo-pdftodoc/PDFToDOCTransformWorker_source_3942032914925752895.doc using filter : MS Word 97

2017-11-28 19:13:57,895 DEBUG [com.eisenvault.service.transform.PDFToDOCTransformWorker] [http-bio-8080-exec-15] STDERR: Error: source file could not be loaded

2017-11-28 19:13:57,895 DEBUG [com.eisenvault.service.transform.PDFToDOCContentTransformer] [http-bio-8080-exec-15] **********
Content data is null **********

In the log, the target path for the file is /tmp/Alfresco/PDFToDOCTransformWorker_source_5703572020147422740.doc, but the file is actually generated at /home/hitenrastogi/Documents/Projects/R_and_D_Projects/PDF_to_DOC/ev-repo-pdftodoc/PDFToDOCTransformWorker_source_5703572020147422740.doc

The above is what I am able to figure out. Any help will be greatly appreciated.

Thanks

Hiten Rastogi

3 REPLIES

afaust
Legendary Innovator

There is no way around it, the transformed file must be located at the path specified via the API. If the transformer is not generating the result at the correct path, you need to implement a thin wrapper that correctly moves / renames the result to the expected file path. Since you already have a custom implementation class (PDFToDOCContentTransformer) it should be trivial to add the necessary logic.
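As a rough illustration of such a wrapper step, here is a minimal sketch in plain Java. The class name `ConvertedFileRelocator` and its methods are hypothetical; they are not part of Alfresco or of the ev-pdf-to-doc project. The sketch assumes the converter names its output after the source file with the extension swapped to `.doc`, as the STDOUT lines in the log above suggest, and that the caller knows the directory in which that output appears.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper, shown only to illustrate the move / rename step. */
public final class ConvertedFileRelocator {

    private ConvertedFileRelocator() {
    }

    /** Swaps the extension of the source file name, e.g. "a.pdf" becomes "a.doc". */
    public static String convertedName(Path source, String newExtension) {
        String name = source.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return base + "." + newExtension;
    }

    /**
     * Moves the file that the converter wrote into outputDir to the path
     * the transformer API expects, replacing any existing file there.
     */
    public static Path relocate(Path outputDir, Path source, Path expectedTarget)
            throws IOException {
        // Assumption: the output keeps the source base name with a .doc extension.
        Path produced = outputDir.resolve(convertedName(source, "doc"));
        if (!Files.isRegularFile(produced)) {
            throw new IOException("Converted file not found: " + produced);
        }
        return Files.move(produced, expectedTarget, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

In the custom worker, `relocate` would be called after the soffice command returns and before the result is read back. The value to pass as `outputDir` is whatever directory the converter actually wrote to; in the log above that appears to be the project directory.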

Thanks, Axel, for your input. I would like to know two things, if possible:

1. Why is soffice putting the generated file in a different location? When I run the command manually, I can see the file generated at the same location. Is something in Alfresco forcing the output into a different directory?

2. As I have only just started working with transformers, I can't fully grasp how to create the wrapper you are suggesting. Any pointer or example would be much appreciated.

Thanks

Hiten Rastogi

  1. Alfresco does not use command-line soffice. It talks to the headless soffice process via a local socket connection, and the source and result documents are streamed through that. Alfresco itself stores the result in the correct file path.
  2. You already have a custom class "PDFToDOCContentTransformer". Simply put any file moving / renaming logic there. That class essentially is your "wrapper" around the soffice conversion process.
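Because the custom worker in this thread does launch command-line soffice (the log shows /usr/bin/soffice being run through RuntimeExec), the output location is decided by that command rather than by Alfresco. soffice's `--convert-to` mode accepts an `--outdir` option that names the output directory, and without it the converted file most likely lands in the working directory of the launching process, which would match the project path in the log. The sketch below builds such an argument list. The class `SofficeCommand` and its `build` method are hypothetical, and the sketch leaves out `--print-to-file`, which as far as I know belongs to soffice's separate printing mode. It is an untested alternative to moving the file afterwards, not a confirmed fix.

```java
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that assembles a soffice conversion command line. */
public final class SofficeCommand {

    private SofficeCommand() {
    }

    /**
     * Builds the argument list for converting a PDF to DOC into targetDir.
     * soffice still derives the output file name from the source file name,
     * so only the directory can be chosen here.
     */
    public static List<String> build(String sofficeExe, Path source, Path targetDir) {
        return Arrays.asList(
                sofficeExe,
                "--headless",
                "--infilter=writer_pdf_import",
                "--convert-to", "doc",
                "--outdir", targetDir.toString(),
                source.toString());
    }
}
```

Since the source and target paths in the log share the same directory and base name, passing the target's parent directory as `targetDir` would, under these assumptions, make the output path coincide with the expected target path. If the names ever differ, a move or rename step inside PDFToDOCContentTransformer is still needed.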