
HTML to PDF conversion

aditya
Champ in-the-making
Hi,

I would like to know which transformer Alfresco uses to convert HTML or XML to PDF. Does it use a stylesheet for the conversion?

As a developer, is it possible to extend this functionality and use our own stylesheets as required?

Regards
Aditya
4 REPLIES

kevinr
Star Contributor
I believe OpenOffice is used for that transformation, but the PDFBox library is used to convert PDFs back to text (as OpenOffice does not do that).

All transformers in Alfresco are pluggable and can be replaced, or new ones added.

http://wiki.alfresco.com/wiki/Content_Transformations

Thanks,

Kevin

good123
Champ in-the-making
PDF to HTML - Converts PDF files to HTML files while seeking to preserve the original page layout (as best as technically possible).

derek
Star Contributor
Hi,

Add this to your custom-repository-context.xml.  I knocked it up quickly and haven't even tested it, so please let me know whether it works or not.  You will need OpenOffice available, as it uses ODT as an intermediate step: any format that can be converted to ODT is first converted to ODT, and then to HTML.

It won't be fast, but it'll be more versatile (if it works).  If you have another transformer that specializes in PDF to HTML, then you will have the best of both worlds as the repository will choose the fastest when it finds two that can do the same job.

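   <!-- Chains two OpenOffice transformations through an intermediate ODT file -->
   <bean id="transformer.complex.html-doc"
        class="org.alfresco.repo.content.transform.ComplexContentTransformer"
        parent="baseContentTransformer" >
      <property name="transformers">
         <list>
            <!-- step 1: source format to ODT -->
            <ref bean="transformer.OpenOffice" />
            <!-- step 2: ODT to target format -->
            <ref bean="transformer.OpenOffice" />
         </list>
      </property>
      <!-- one intermediate mimetype between each pair of transformers -->
      <property name="intermediateMimetypes">
         <list>
            <value>application/vnd.oasis.opendocument.text</value>
         </list>
      </property>
   </bean>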

If you want to knock together a quick converter for PDF to HTML, you can find a similar project on the forge: http://forge.alfresco.com/projects/xmltransformer/

Regards

derek
Star Contributor
Hi,

I see that the PDF to HTML transformer mentioned comes as an executable, pdf2htm.exe.  There might be libraries, too, but I haven't looked.

It is possible to wrap executables up as transformers, provided you can come up with a command line that takes a ${source} and a ${target} file.  Read the Javadocs for RuntimeExecutableContentTransformer and look at transformer.ImageMagick (in content-services-context.xml) for an example of how to Spring up a RuntimeExec instance to handle the execution itself.
Look at the Wiki for a further example.
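
As a rough, untested sketch of the wrapping described above, a bean along these lines might work. Several things here are assumptions rather than anything confirmed in this thread: the bean id transformer.pdf2htm is made up, the command line assumes pdf2htm.exe simply takes an input file followed by an output file (check the tool's own documentation), and the property names (transformCommand, commandMap, explicitTransformations) and the TransformationKey class are recalled from the wiki example of the time, so verify them against the Javadocs for your Alfresco version.

```xml
<!-- Untested sketch: wraps a command-line PDF to HTML converter as a transformer. -->
<!-- The bean id is hypothetical; property names should be checked against the Javadocs. -->
<bean id="transformer.pdf2htm"
      class="org.alfresco.repo.content.transform.RuntimeExecutableContentTransformer"
      parent="baseContentTransformer">
   <property name="transformCommand">
      <bean class="org.alfresco.util.exec.RuntimeExec">
         <property name="commandMap">
            <map>
               <!-- the key is a pattern matched against the operating system name -->
               <entry key=".*">
                  <!-- ASSUMPTION: pdf2htm.exe accepts "input output" arguments -->
                  <value>pdf2htm.exe ${source} ${target}</value>
               </entry>
            </map>
         </property>
      </bean>
   </property>
   <!-- ASSUMPTION: declares the single source/target mimetype pair this transformer handles -->
   <property name="explicitTransformations">
      <list>
         <bean class="org.alfresco.repo.content.transform.ContentTransformerRegistry$TransformationKey">
            <constructor-arg><value>application/pdf</value></constructor-arg>
            <constructor-arg><value>text/html</value></constructor-arg>
         </bean>
      </list>
   </property>
</bean>
```

The executable would need to be on the path of the user running the Alfresco server, or the command would need to give its full location.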

Regards