Content Transformation mimetype confusion

deisenlord
Champ in-the-making
Please see my simpler explanation below first. Thanks, David

I need to extract some fields from a PDF form in a workflow. I have a third-party Linux command-line tool that works perfectly, and I added a content transformation to my extensions, which is included below. The final intent is to extract the fields to a CSV file, so the transform is supposed to go from PDF to CSV. As near as I can tell, although the mimetype text/csv exists, no transform from application/pdf to text/csv does, only one to text/plain. Hence my extension at the bottom of the post.

This appears to work just fine at first, but then I find that my preview is broken for PDFs. When I turn on debugging, I see that the system has decided to use my transform for csv -> swf as well as pdf -> text, csv -> text, and damn near everything else.

Incorrect csv -> text

2013-03-20 14:06:30,260  DEBUG [content.transform.TransformerDebug] [http-bio-8443-exec-7] 7.1           csv  txt  <<TemporaryFile>> 2.1 KB transformer.pdf2csvtxt<<Runtime>>
2013-03-20 14:06:30,297  DEBUG [util.exec.RuntimeExec] [http-bio-8443-exec-7] Execution result:
   os:         Linux
   command:    /usr/local/bin/pdftk /opt/alfresco-4.2.d.2/tomcat/temp/Alfresco/RuntimeExecutableContentTransformerWorker_source_661257349285517878.csv dump_data_fields output /opt/alfresco-4.2.d.2/tomcat/temp/Alfresco/RuntimeExecutableContentTransformerWorker_target_1434143270609748430.txt
   succeeded:  false
   exit code:  1
   out:
   err:        Error: Failed to open PDF file:
   /opt/alfresco-4.2.d.2/tomcat/temp/Alfresco/RuntimeExecutableContentTransformerWorker_source_661257349285517878.csv


Incorrect csv -> swf

2013-03-20 14:08:35,389  DEBUG [content.transform.TransformerDebug] [http-apr-80-exec-10] 8.2           csv  swf  dje20130320-4.csv 2.1 KB transformer.pdf2csvtxt<<Runtime>>
2013-03-20 14:08:35,423  DEBUG [util.exec.RuntimeExec] [http-apr-80-exec-10] Execution result:
   os:         Linux
   command:    /usr/local/bin/pdftk /opt/alfresco-4.2.d.2/tomcat/temp/Alfresco/RuntimeExecutableContentTransformerWorker_source_6168968677349312104.csv dump_data_fields output /opt/alfresco-4.2.d.2/tomcat/temp/Alfresco/RuntimeExecutableContentTransformerWorker_target_3388825081834715207.swf
   succeeded:  false
   exit code:  1
   out:
   err:        Error: Failed to open PDF file:
   /opt/alfresco-4.2.d.2/tomcat/temp/Alfresco/RuntimeExecutableContentTransformerWorker_source_6168968677349312104.csv


In fact, when I list the registered mimetypes via localhost/alfresco/service/mimetypes, EVERYTHING that didn't have a default transformation is now handled by my extension. Here are the first few lines for application/pdf:

<code>
application/pdf - pdf
Extractors: org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter
Transformable To:
application/acp = Proxy via: org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker(/usr/local/bin/pdftk)
application/dita+xml = Proxy via: org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker(/usr/local/bin/pdftk)
</code>
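To put a number on how far the worker has spread, a listing like the one above can be tallied by route. This is only a minimal sketch: it assumes the page text has been saved or pasted as plain text in the `target = route` shape shown here, and the function name `summarize_routes` is made up for illustration, not part of Alfresco.

```python
import re
from collections import Counter

# Matches listing lines such as:
#   application/acp = Proxy via: org...Worker(/usr/local/bin/pdftk)
ROUTE_LINE = re.compile(r"^(?P<target>\S+)\s*=\s*(?P<route>.+)$")


def summarize_routes(listing: str) -> Counter:
    """Count how many target mimetypes are served by each route.

    Hypothetical helper: the input is assumed to be text copied from the
    mimetypes listing page, one "target = route" entry per line.
    """
    counts = Counter()
    for raw in listing.splitlines():
        match = ROUTE_LINE.match(raw.strip())
        if not match:
            continue  # headers such as "Transformable To:" contain no "="
        route = match.group("route").strip()
        if route.startswith("Proxy via:"):
            # Key on the worker behind the proxy transformer.
            counts[route[len("Proxy via:"):].strip()] += 1
        elif route.startswith("Complex via:"):
            counts["complex"] += 1
        else:
            counts[route] += 1
    return counts
```

Calling `summarize_routes(text).most_common()` on the saved listing would show at a glance whether the pdftk worker dominates the table.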

Why are all conversions between pdf, csv, text and swf now going through my extension? I specified an explicitTransformations property.

Signed,
Confused.

My extension


<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>

<beans>
        <bean id="transformer.worker.pdf2csv" class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker">
                <property name="mimetypeService">
                        <ref bean="mimetypeService" />
                </property>
                <property name="checkCommand">
                        <bean class="org.alfresco.util.exec.RuntimeExec">
                                <property name="commandsAndArguments">
                                        <map>
                                                <entry key="Linux">
                                                        <list>
                                                                <value>ls</value>
                                                                <value>/usr/local/bin/pdftk</value>
                                                        </list>
                                                </entry>
                                        </map>
                                </property>
                        </bean>
                </property>

                <property name="transformCommand">
                        <bean class="org.alfresco.util.exec.RuntimeExec">
                                <property name="commandsAndArguments">
                                        <map>
                                                <entry key="Linux">
                                                        <list>
                                                           <value>/usr/local/bin/pdftk</value>
                                                           <value>${source}</value>
                                                           <value>dump_data_fields</value>
                                                           <value>output</value>
                                                           <value>${target}</value>
                                                        </list>
                                                </entry>
                                        </map>
                                </property>
                                <property name="errorCodes">
                                        <value>1,2</value>
                                </property>
                        </bean>
                </property>

                <property name="explicitTransformations">
                        <list>
                                <bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails">
                                        <property name="sourceMimetype"><value>application/pdf</value></property>
                                        <property name="targetMimetype"><value>text/csv</value></property>
                                </bean>
                        </list>
                </property>
        </bean>

        <bean id="transformer.pdf2csv" class="org.alfresco.repo.content.transform.ProxyContentTransformer" parent="baseContentTransformer">
                <property name="worker">
                        <ref bean="transformer.worker.pdf2csv" />
                </property>
        </bean>
</beans>
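One side note on the transformCommand above: it writes pdftk's dump_data_fields report straight to ${target}, and that report is a key/value listing rather than comma-separated text. Below is a minimal sketch of a post-processing step that turns such a report into CSV. It assumes the report uses `Key: value` lines with `---` separating fields (for example `FieldName:` and `FieldValue:`), which should be checked against real output from your pdftk version. The function names and the column choice are hypothetical.

```python
import csv
import io


def parse_dump_data_fields(report: str) -> list[dict[str, str]]:
    """Split a report into one dict per form field.

    Assumes "Key: value" lines, with a line of "---" starting each field.
    When a key repeats within a field, only the first value is kept.
    """
    fields = []
    current = {}
    for raw in report.splitlines():
        line = raw.strip()
        if line == "---":
            if current:
                fields.append(current)
            current = {}
            continue
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key not in current:
            current[key] = value.strip()
    if current:
        fields.append(current)
    return fields


def fields_to_csv(fields, columns=("FieldName", "FieldType", "FieldValue")) -> str:
    """Render the parsed fields as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    for field in fields:
        # Missing keys (e.g. a field with no value) become empty cells.
        writer.writerow([field.get(col, "") for col in columns])
    return out.getvalue()
```

A wrapper script that runs pdftk and then pipes the report through these two functions could serve as the transform command in place of the bare pdftk call, so that the target file really is CSV.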

deisenlord
Champ in-the-making
Maybe this will demonstrate, in a much simpler way, why I think something is wrong. On my 4.2.d system (Linux), if I browse to localhost/alfresco/service/mimetypes with no custom transformation XML files installed, I see the following for text/calendar.


Before adding extension xml demo from Alfresco Wiki
text/calendar - ics
No extractors
Transformable To:
text/plain = org.alfresco.repo.content.transform.StringExtractingContentTransformer
Transformable From: Cannot be generated from anything else


After adding the demo DWG2PDF transformation XML file from the Alfresco Wiki, I get the following for text/calendar: every possible transformation, in both directions. Note that I changed the name of the executable to /usr/local/bin/pdftk so that the INIT phase of the transform would pass. I assume this is a bug?


After adding extension xml demo from Alfresco Wiki
text/calendar - ics
No extractors
Transformable To:
application/acp = Proxy via: org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker(/usr/local/bin/pdftk)
application/dita+xml = Proxy via: org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker(/usr/local/bin/pdftk)
application/eps = Complex via: image/jpeg
application/framemaker = Proxy via: org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker(/usr/local/bin/pdftk)
application/illustrator = Proxy via: org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker(/usr/local/bin/pdftk)
application/java = Proxy via: org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker(/usr/local/bin/pdftk)
application/json = Proxy via: org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker(/usr/local/bin/pdftk)
application/mac-binhex40 = Proxy via: org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker(/usr/local/bin/pdftk)
application/msword = Proxy via: org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker(/usr/local/bin/pdftk)
application/octet-stream = Proxy via: org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker(/usr/local/bin/pdftk)
application/oda = Proxy via: org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker(/usr/local/bin/pdftk)
application/ogg = Proxy via: org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker(/usr/local/bin/pdftk)
application/pagemaker = Proxy via: org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker(/usr/local/bin/pdftk)