Convert pdf to html

niamh
Champ in-the-making
Hello,

I am trying to add functionality to Alfresco so that documents can be converted from PDF to HTML.
I have followed the steps at the following link:
http://wiki.alfresco.com/wiki/Content_Transformations
I downloaded an executable called pdftohtml from SourceForge and am trying to add it as a content transformer. I added the following lines to the file content-services-context.xml:

<bean id="transformer.pdftohtml"
        class="org.alfresco.repo.content.transform.pdftohtmlContentTransformer"
        parent="baseContentTransformer"
        init-method="init">
      <property name="transformer">
         <bean name="transformer.pdftohtml.Command" class="org.alfresco.util.exec.RuntimeExec">
            <property name="commandMap">
                <map>
                    <entry key="Windows.*">
                        <value>pdftohtml "${source}" "${target}"</value>
                    </entry>
                </map>
            </property>
         </bean>
      </property>
  
      <property name="explicitTransformations">
         <list>
            <bean class="org.alfresco.repo.content.transform.ContentTransformerRegistry$TransformationKey">
               <constructor-arg><value>pdf/html</value></constructor-arg>
            </bean>
         </list>
      </property>
</bean>

However, it throws errors and Alfresco does not start up when I add this.
Is there a more straightforward list of steps, including a recommended conversion product, that I could use to achieve this?

derek
Star Contributor
Hi,

I'm afraid that's as straightforward as it is going to get. There are several errors in the config:
    Your class name must be RuntimeExecutableContentTransformer, as per the example. It's the class that does the work; you can't make one up.
    You're missing a constructor argument for the TransformationKey. Look at the example: there are two constructor arguments, and both are valid mimetypes. pdf/html isn't a valid mimetype, but the idea is there.
I would highly recommend that you read the Javadoc for the classes that you are attempting to configure: http://dev.alfresco.com/resource/docs/java/

Apart from the command to execute and the mimetypes of the transformation key, the config should be the same as the Wiki sample.  If you continue to struggle, then post the relevant exceptions as well.

Regards

mikef
Champ in-the-making
Here is a working version of the config file. The transformer only supports one-to-one transformations, so images are not handled, as they are separate files. Also, make sure the pdftohtml executable is on your path.

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>

<beans>

   <bean id="transformer.pdftohtml" class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformer" parent="baseContentTransformer">
      <property name="checkCommand">
         <bean class="org.alfresco.util.exec.RuntimeExec">
            <property name="commandMap">
                <map>
                    <entry key=".*">
                        <value>pdftohtml -v</value>
                    </entry>
                </map>
            </property>
            <property name="errorCodes">
               <value>2</value>
            </property>
         </bean>
      </property>
      <property name="transformCommand">
         <bean class="org.alfresco.util.exec.RuntimeExec">
            <property name="commandMap">
                <map>
                    <entry key="Windows.*">
                        <value>pdftohtml -q -p -noframes "${source}" "${target}"</value>
                    </entry>
                </map>
            </property>
         </bean>
      </property>
      <property name="explicitTransformations">
         <list>
            <bean class="org.alfresco.repo.content.transform.ContentTransformerRegistry$TransformationKey" >
                <constructor-arg><value>application/pdf</value></constructor-arg>
                <constructor-arg><value>text/html</value></constructor-arg>
            </bean>
         </list>
      </property>
   </bean>
  
</beans>
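
Note that the transformCommand in the config above only defines a command for operating systems matching Windows.*. The fragment below is a minimal sketch of how the same property might carry one entry per operating system. It assumes the commandMap keys are matched as regular expressions against the operating system name (as the ".*" key used for checkCommand suggests), and the /usr/bin/pdftohtml path is an assumed install location, not something the config above requires. Treat it as an illustration rather than a tested configuration.

```xml
<!-- Sketch only: one transformCommand with an entry per operating system. -->
<property name="transformCommand">
   <bean class="org.alfresco.util.exec.RuntimeExec">
      <property name="commandMap">
         <map>
            <!-- Windows: relies on pdftohtml being on the PATH -->
            <entry key="Windows.*">
               <value>pdftohtml -q -p -noframes "${source}" "${target}"</value>
            </entry>
            <!-- Linux: /usr/bin/pdftohtml is an assumed install location -->
            <entry key="Linux.*">
               <value>/usr/bin/pdftohtml -q -p -noframes "${source}" "${target}"</value>
            </entry>
         </map>
      </property>
   </bean>
</property>
```

The two keys are deliberately non-overlapping, so the sketch does not depend on how overlapping patterns would be prioritised.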

fedoratori
Champ in-the-making
Hello, can you please tell me how you added this XML to activate the conversion from the web interface? I am trying to use the Publican converter to go from XML + CSS + .. to HTML. How can I configure this?

derek
Star Contributor
Add it to an XML file at <tomcat>/shared/classes/alfresco/extension/pdf-html-context.xml. See: Advanced Spring Configuration
Regards

fedoratori
Champ in-the-making
Thank you for the reply, but I still have a problem. Look at the log:


09:34:10,000 INFO  [org.alfresco.config.JndiPropertiesFactoryBean] Loading properties file from class path resource [alfresco/repository.properties]
09:34:10,004 INFO  [org.alfresco.config.JndiPropertiesFactoryBean] Loading properties file from class path resource [alfresco/domain/transaction.properties]
09:34:10,004 INFO  [org.alfresco.config.JndiPropertiesFactoryBean] Loading properties file from file [/opt/Alfresco/tomcat/webapps/alfresco/WEB-INF/classes/alfresco/module/tests/alfresco-global.properties]
09:34:10,004 INFO  [org.alfresco.config.JndiPropertiesFactoryBean] Loading properties file from file [/opt/Alfresco/tomcat/webapps/alfresco/WEB-INF/classes/alfresco/module/test/alfresco-global.properties]
09:34:10,005 INFO  [org.alfresco.config.JndiPropertiesFactoryBean] Loading properties file from URL [file:/opt/Alfresco/tomcat/shared/classes/alfresco-global.properties]
09:34:10,217 INFO  [org.alfresco.config.JndiPropertyPlaceholderConfigurer] Loading properties file from class path resource [alfresco/alfresco-shared.properties]
09:34:34,148 ERROR [org.springframework.web.context.ContextLoader] Context initialization failed
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'transformer.worker.pdftohtml' defined in file [/opt/Alfresco/tomcat/shared/classes/alfresco/extension/pdf-html-context.xml]: Initialization of bean failed; nested exception is org.springframework.beans.ConversionNotSupportedException: Failed to convert property value of type 'java.util.ArrayList' to required type 'java.util.List' for property 'explicitTransformations'; nested exception is java.lang.IllegalStateException: Cannot convert value of type [org.alfresco.repo.content.transform.ContentTransformerRegistry$TransformationKey] to required type [org.alfresco.repo.content.transform.ExplictTransformationDetails] for property 'explicitTransformations[0]': no matching editors or conversion strategy found
   at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:519)
   at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:450)
   at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:290)
   at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
   at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:287)
   at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:189)
   at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:557)
   at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:842)
   at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:416)
   at org.springframework.web.context.ContextLoader.createWebApplicationContext(ContextLoader.java:261)
   at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:192)
   at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:47)
   at org.alfresco.web.app.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:63)
   at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:3972)
   at org.apache.catalina.core.StandardContext.start(StandardContext.java:4467)
   at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
   at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
   at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:546)
   at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:637)
   at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:563)
   at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:498)
   at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1277)
   at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:321)
   at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
   at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
   at org.apache.catalina.core.StandardHost.start(StandardHost.java:785)
   at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
   at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
   at org.apache.catalina.core.StandardService.start(StandardService.java:519)
   at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
   at org.apache.catalina.startup.Catalina.start(Catalina.java:581)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
   at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
Caused by: org.springframework.beans.ConversionNotSupportedException: Failed to convert property value of type 'java.util.ArrayList' to required type 'java.util.List' for property 'explicitTransformations'; nested exception is java.lang.IllegalStateException: Cannot convert value of type [org.alfresco.repo.content.transform.ContentTransformerRegistry$TransformationKey] to required type [org.alfresco.repo.content.transform.ExplictTransformationDetails] for property 'explicitTransformations[0]': no matching editors or conversion strategy found
   at org.springframework.beans.BeanWrapperImpl.convertForProperty(BeanWrapperImpl.java:462)
   at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.convertForProperty(AbstractAutowireCapableBeanFactory.java:1351)
   at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.applyPropertyValues(AbstractAutowireCapableBeanFactory.java:1310)
   at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1067)
   at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:511)
   … 36 more
Caused by: java.lang.IllegalStateException: Cannot convert value of type [org.alfresco.repo.content.transform.ContentTransformerRegistry$TransformationKey] to required type [org.alfresco.repo.content.transform.ExplictTransformationDetails] for property 'explicitTransformations[0]': no matching editors or conversion strategy found
   at org.springframework.beans.TypeConverterDelegate.convertIfNecessary(TypeConverterDelegate.java:289)
   at org.springframework.beans.TypeConverterDelegate.convertToTypedCollection(TypeConverterDelegate.java:575)
   at org.springframework.beans.TypeConverterDelegate.convertIfNecessary(TypeConverterDelegate.java:231)
   at org.springframework.beans.TypeConverterDelegate.convertIfNecessary(TypeConverterDelegate.java:154)
   at org.springframework.beans.BeanWrapperImpl.convertForProperty(BeanWrapperImpl.java:452)
   … 40 more
09:34:44,093 INFO  [org.springframework.extensions.webscripts.DeclarativeRegistry] Registered 227 Web Scripts (+0 failed), 235 URLs
09:34:44,094 INFO  [org.springframework.extensions.webscripts.DeclarativeRegistry] Registered 8 Package Description Documents (+0 failed)
09:34:44,094 INFO  [org.springframework.extensions.webscripts.DeclarativeRegistry] Registered 0 Schema Description Documents (+0 failed)
09:34:44,243 INFO  [org.springframework.extensions.webscripts.AbstractRuntimeContainer] Initialised Spring Surf Container Web Script Container (in 2473.396ms)
09:34:44,359 INFO  [org.springframework.extensions.webscripts.TemplateProcessorRegistry] Registered template processor freemarker for extension ftl
09:34:44,430 INFO  [org.springframework.extensions.webscripts.ScriptProcessorRegistry] Registered script processor javascript for extension js
09:34:44,491 INFO  [org.springframework.extensions.webscripts.TemplateProcessorRegistry] Registered template processor freemarker for extension ftl
09:34:44,495 INFO  [org.springframework.extensions.webscripts.ScriptProcessorRegistry] Registered script processor javascript for extension js
09:34:44,605 INFO  [org.springframework.extensions.webscripts.TemplateProcessorRegistry] Registered template processor freemarker for extension ftl
09:34:44,610 INFO  [org.springframework.extensions.webscripts.ScriptProcessorRegistry] Registered script processor javascript for extension js




I have CentOS 5 and Alfresco 3.3 installed, so I changed the XML to this:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>

<beans>

   <bean id="transformer.worker.pdftohtml" class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker">
      <property name="mimetypeService">
         <ref bean="mimetypeService" />
      </property>
      <property name="checkCommand">
         <bean class="org.alfresco.util.exec.RuntimeExec">
            <property name="commandMap">
                <map>
                    <entry key=".*">
                        <value>/usr/bin/pdftohtml -v</value>
                    </entry>
                </map>
            </property>
            <property name="errorCodes">
               <value>2</value>
            </property>
         </bean>
      </property>
      <property name="transformCommand">
         <bean class="org.alfresco.util.exec.RuntimeExec">
            <property name="commandMap">
                <map>
                    <entry key=".*">
                        <value>/usr/bin/pdftohtml -q -p -noframes "${source}" "${target}"</value>
                    </entry>
                </map>
            </property>
         </bean>
      </property>
      <property name="explicitTransformations">
         <list>
            <bean class="org.alfresco.repo.content.transform.ContentTransformerRegistry$TransformationKey" >
                <constructor-arg><value>application/pdf</value></constructor-arg>
                <constructor-arg><value>text/html</value></constructor-arg>
            </bean>
         </list>
      </property>
   </bean>
   <bean id="transformer.pdftohtml" class="org.alfresco.repo.content.transform.ProxyContentTransformer" parent="baseContentTransformer">
      <property name="worker">
         <ref bean="transformer.worker.pdftohtml" />
      </property>
   </bean>

</beans>




I don't know whether the problem is with the XML or whether there is other configuration to do.

fedoratori
Champ in-the-making
Don't trouble yourself; the problem was with the XML. It now works with this XML:


<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>

<beans>

   <bean id="transformer.worker.pdftohtml" class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformerWorker">
      <property name="mimetypeService">
         <ref bean="mimetypeService" />
      </property>
      <property name="checkCommand">
         <bean class="org.alfresco.util.exec.RuntimeExec">
            <property name="commandMap">
                <map>
                    <entry key=".*">
                        <value>/usr/bin/pdftohtml -v</value>
                    </entry>
                </map>
            </property>
            <property name="errorCodes">
               <value>2</value>
            </property>
         </bean>
      </property>
      <property name="transformCommand">
         <bean class="org.alfresco.util.exec.RuntimeExec">
            <property name="commandMap">
                <map>
                    <entry key=".*">
                        <value>/usr/bin/pdftohtml -q -p -noframes "${source}" "${target}"</value>
                    </entry>
                </map>
            </property>
         </bean>
      </property>
      <property name="explicitTransformations">
         <list>
            <bean class="org.alfresco.repo.content.transform.ExplictTransformationDetails">
               <property name="sourceMimetype">
                  <value>application/pdf</value>
               </property>
               <property name="targetMimetype">
                  <value>text/html</value>
               </property>
            </bean>
         </list>
      </property>
   </bean>

   <bean id="transformer.pdftohtml" class="org.alfresco.repo.content.transform.ProxyContentTransformer" parent="baseContentTransformer">
      <property name="worker">
         <ref bean="transformer.worker.pdftohtml" />
      </property>
   </bean>

</beans>



I replaced the explicitTransformations property section with one from the Alfresco 3.3 samples, and now it works very well.