11-27-2017 06:45 AM
Hi All,
Here is the link to my project on GitHub: rastoh/ev-pdf-to-doc
The beans related to soffice are in service-context_bak.xml, at the path ev-pdf-to-doc/ev-repo-pdftodoc/src/main/resources/alfresco/module/ev-repo-pdftodoc/context
I am trying to convert PDF to DOC using a custom transformer that calls soffice, but it is not working. The same custom transformer runs successfully with AbiWord, but the result is not as good as what soffice produces when run manually.
I call the transformer through an action in the Share UI. With soffice the transformation does take place, but the transformed file is generated inside my project directory instead of at the specified target path. Below is the debug log I get when running through soffice.
2017-11-28 19:13:55,140 DEBUG [com.eisenvault.service.transform.PDFToDOCContentTransformer] [http-bio-8080-exec-15] **********
Executing transformation **********
2017-11-28 19:13:55,141 DEBUG [com.eisenvault.service.transform.PDFToDOCContentTransformer] [http-bio-8080-exec-15] **********
Got the content data for transformation **********
2017-11-28 19:13:55,141 DEBUG [com.eisenvault.service.transform.PDFToDOCContentTransformer] [http-bio-8080-exec-15] **********
Trying transformation for the first time **********
2017-11-28 19:13:55,142 DEBUG [com.eisenvault.service.transform.PDFToDOCContentTransformer] [http-bio-8080-exec-15] **********
Calling pdfToDocTransformWorker transform method for transformation **********
2017-11-28 19:13:55,142 DEBUG [com.eisenvault.service.transform.PDFToDOCTransformWorker] [http-bio-8080-exec-15] source Path = /tmp/Alfresco/PDFToDOCTransformWorker_source_5703572020147422740.pdf
2017-11-28 19:13:55,143 DEBUG [com.eisenvault.service.transform.PDFToDOCTransformWorker] [http-bio-8080-exec-15] target Path = /tmp/Alfresco/PDFToDOCTransformWorker_source_5703572020147422740.doc
2017-11-28 19:13:56,551 DEBUG [org.alfresco.util.exec.RuntimeExec] [http-bio-8080-exec-15] Execution result:
os: Linux
command: /usr/bin/soffice --infilter=writer_pdf_import --headless --convert-to doc /tmp/Alfresco/PDFToDOCTransformWorker_source_5703572020147422740.pdf --print-to-file /tmp/Alfresco/PDFToDOCTransformWorker_source_5703572020147422740.doc
succeeded: true
exit code: 0
out: convert /tmp/Alfresco/PDFToDOCTransformWorker_source_5703572020147422740.pdf -> /home/hitenrastogi/Documents/Projects/R_and_D_Projects/PDF_to_DOC/ev-repo-pdftodoc/PDFToDOCTransformWorker_source_5703572020147422740.doc using filter : MS Word 97
err: Error: source file could not be loaded
2017-11-28 19:13:56,551 DEBUG [com.eisenvault.service.transform.PDFToDOCTransformWorker] [http-bio-8080-exec-15] EXIT VALUE: 0
2017-11-28 19:13:56,551 DEBUG [com.eisenvault.service.transform.PDFToDOCTransformWorker] [http-bio-8080-exec-15] STDOUT: convert /tmp/Alfresco/PDFToDOCTransformWorker_source_5703572020147422740.pdf -> /home/hitenrastogi/Documents/Projects/R_and_D_Projects/PDF_to_DOC/ev-repo-pdftodoc/PDFToDOCTransformWorker_source_5703572020147422740.doc using filter : MS Word 97
2017-11-28 19:13:56,551 DEBUG [com.eisenvault.service.transform.PDFToDOCTransformWorker] [http-bio-8080-exec-15] STDERR: Error: source file could not be loaded
2017-11-28 19:13:56,552 DEBUG [com.eisenvault.service.transform.PDFToDOCContentTransformer] [http-bio-8080-exec-15] **********
Trying transformation for the second time **********
2017-11-28 19:13:56,553 DEBUG [com.eisenvault.service.transform.PDFToDOCContentTransformer] [http-bio-8080-exec-15] **********
Calling pdfToDocTransformWorker transform method for transformation **********
2017-11-28 19:13:56,553 DEBUG [com.eisenvault.service.transform.PDFToDOCTransformWorker] [http-bio-8080-exec-15] source Path = /tmp/Alfresco/PDFToDOCTransformWorker_source_3942032914925752895.pdf
2017-11-28 19:13:56,553 DEBUG [com.eisenvault.service.transform.PDFToDOCTransformWorker] [http-bio-8080-exec-15] target Path = /tmp/Alfresco/PDFToDOCTransformWorker_source_3942032914925752895.doc // expected target path
2017-11-28 19:13:57,894 DEBUG [org.alfresco.util.exec.RuntimeExec] [http-bio-8080-exec-15] Execution result:
os: Linux
command: /usr/bin/soffice --infilter=writer_pdf_import --headless --convert-to doc /tmp/Alfresco/PDFToDOCTransformWorker_source_3942032914925752895.pdf --print-to-file /tmp/Alfresco/PDFToDOCTransformWorker_source_3942032914925752895.doc
succeeded: true
exit code: 0
out: convert /tmp/Alfresco/PDFToDOCTransformWorker_source_3942032914925752895.pdf -> /home/hitenrastogi/Documents/Projects/R_and_D_Projects/PDF_to_DOC/ev-repo-pdftodoc/PDFToDOCTransformWorker_source_3942032914925752895.doc using filter : MS Word 97 // the transformer runs and creates the doc file
err: Error: source file could not be loaded // but because the file is created at the wrong path, the transformer cannot load it, hence the error
2017-11-28 19:13:57,895 DEBUG [com.eisenvault.service.transform.PDFToDOCTransformWorker] [http-bio-8080-exec-15] EXIT VALUE: 0
2017-11-28 19:13:57,895 DEBUG [com.eisenvault.service.transform.PDFToDOCTransformWorker] [http-bio-8080-exec-15] STDOUT: convert /tmp/Alfresco/PDFToDOCTransformWorker_source_3942032914925752895.pdf -> /home/hitenrastogi/Documents/Projects/R_and_D_Projects/PDF_to_DOC/ev-repo-pdftodoc/PDFToDOCTransformWorker_source_3942032914925752895.doc using filter : MS Word 97
2017-11-28 19:13:57,895 DEBUG [com.eisenvault.service.transform.PDFToDOCTransformWorker] [http-bio-8080-exec-15] STDERR: Error: source file could not be loaded
2017-11-28 19:13:57,895 DEBUG [com.eisenvault.service.transform.PDFToDOCContentTransformer] [http-bio-8080-exec-15] **********
Content data is null **********
In the log, the expected target path is /tmp/Alfresco/PDFToDOCTransformWorker_source_5703572020147422740.doc, but the file is actually generated at /home/hitenrastogi/Documents/Projects/R_and_D_Projects/PDF_to_DOC/ev-repo-pdftodoc/PDFToDOCTransformWorker_source_5703572020147422740.doc
That is all I have been able to figure out so far. Any help will be greatly appreciated.
Thanks
Hiten Rastogi
11-29-2017 04:44 PM
There is no way around it: the transformed file must be located at the path specified via the API. If the transformer is not generating the result at the correct path, you need to implement a thin wrapper that moves or renames the result to the expected file path. Since you already have a custom implementation class (PDFToDOCContentTransformer), it should be trivial to add the necessary logic.
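A minimal sketch of such a wrapper step, in plain Java with no Alfresco dependencies. The class and method names (ConversionResultMover, moveToExpectedTarget) are hypothetical and not part of the project. It assumes the converter wrote its output under the same file name as the expected target, only in a different directory, which is what the log above shows.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: relocates a converter's output to the path the caller expects. */
public final class ConversionResultMover {

    private ConversionResultMover() {
    }

    /**
     * Looks in actualDir for a file named like expectedTarget and moves it there.
     *
     * @param actualDir      directory where the converter really wrote its output
     *                       (assumption: same file name as the expected target)
     * @param expectedTarget path the transformation framework expects the result at
     * @return true if the file was found and moved, false if nothing was found
     */
    public static boolean moveToExpectedTarget(Path actualDir, Path expectedTarget)
            throws IOException {
        Path actual = actualDir.resolve(expectedTarget.getFileName());
        if (!Files.isRegularFile(actual)) {
            return false; // the converter produced nothing we can relocate
        }
        Path parent = expectedTarget.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // Overwrites a pre-created (possibly empty) target file if one exists.
        Files.move(actual, expectedTarget, StandardCopyOption.REPLACE_EXISTING);
        return true;
    }
}
```

It could be called right after the soffice command returns, for example with Paths.get(System.getProperty("user.dir")) as actualDir if the output really lands in the JVM's working directory, and the transformation treated as failed when it returns false.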
11-30-2017 03:40 AM
Thanks Axel for your input. I would like to know two things, if possible:
1. Why does soffice place the generated file in a different location? When I run the command manually, the file is generated in the same location as the source. Is something in Alfresco forcing the output into a different directory?
2. As I have only just started working with transformers, I can't fully grasp how to create the wrapper you suggest. Any pointer or example would be much appreciated.
Thanks
Hiten Rastogi
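On question 1, one possible explanation, offered as an assumption rather than a confirmed diagnosis: according to the soffice command-line help, --convert-to writes its output to the current working directory unless --outdir is given, and --print-to-file is a separate batch-printing mode rather than a way to name the output file. If so, the trailing .doc path in the logged command may be treated as a second input file, which would also account for "source file could not be loaded". The sketch below builds an argument list that uses --outdir instead. The class and method names (SofficeCommand, build, expectedOutput) are hypothetical.

```java
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: builds a soffice conversion command with an explicit output directory. */
public final class SofficeCommand {

    private SofficeCommand() {
    }

    /** Returns the argument list for converting source to .doc inside outDir. */
    public static List<String> build(String sofficePath, Path source, Path outDir) {
        return Arrays.asList(
                sofficePath,
                "--infilter=writer_pdf_import",
                "--headless",
                "--convert-to", "doc",
                "--outdir", outDir.toString(), // otherwise the working directory is used
                source.toString());
    }

    /** Path soffice is expected to write: the source base name plus ".doc" in outDir. */
    public static Path expectedOutput(Path source, Path outDir) {
        String name = source.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outDir.resolve(base + ".doc");
    }
}
```

Because soffice derives the output name from the source name, the transformer would still need to rename the result if the expected target file name differs from the source base name.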
11-30-2017 03:54 AM