How to get/extract content as a text string from a NodeRef?

dynamolalit
Champ on-the-rise
Hi,

I have a workflow that sends MS Word, PowerPoint, PDF and Excel documents to a pool of users for approval. Once a document is approved, I need to extract its content into an XML file. I tried this code for the content transformation:



contentService = services.getContentService();
        ContentReader reader = contentService.getReader(nodeRef, ContentModel.PROP_CONTENT);
        if (reader != null && reader.exists())
        {
                // get the transformer
                ContentTransformer transformer = contentService.getTransformer(reader.getMimetype(), MimetypeMap.MIMETYPE_TEXT_PLAIN);
                // is this transformer good enough?
                if (transformer != null)
                {
                    // We have a transformer that is fast enough
                    ContentWriter writer = contentService.getTempWriter();
                    writer.setMimetype(MimetypeMap.MIMETYPE_TEXT_PLAIN);

                    try
                    {
                        transformer.transform(reader, writer);
                        // point the reader to the new-written content
                        reader = writer.getReader();
                        // Check that the reader is a view onto something concrete
                        if (!reader.exists())
                        {
                            throw new ContentIOException("The transformation did not write any content, yet: \n"
                                    + "   transformer:     " + transformer + "\n" + "   temp writer:     " + writer);
                        }else {
                              content = reader.getContentString();
                        }
                    }
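                    catch (ContentIOException e)
                    {
                        // Do not swallow the failure silently; log it so an empty result can be traced
                        logger.error("Content transformation failed for " + nodeRef, e);
                    }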
                }
            }


        logger.debug("Content as a string  :  "+content);


It gives me the document's content as a string, but it also creates a text file with the same name as the content, which I do not want and which defeats the purpose of the workflow. How can I avoid that? I have read that transforming content always results in a file being created.

Also, the content I got in the previous step is not pure text: it contains stray characters, and lots of them, in the generated XML, which my external application then fails to parse.


How can I meet this requirement? I would appreciate any help or suggestions.
3 REPLIES

dynamolalit
Champ on-the-rise
Hi,

I was able to eliminate the unwanted characters using the code below:

String cleanContent = content.replaceAll("[^\\x20-\\x7e]", "");

It's working fine now.
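
One thing to keep in mind: the character class above keeps only printable ASCII (0x20 to 0x7E), so it also deletes tabs, line breaks and every accented or non-Latin letter. If that turns out to be too aggressive, removing only the characters that XML 1.0 does not allow may be enough for the parser. Below is a minimal sketch of both filters side by side; the class and method names are hypothetical and not part of Alfresco.

```java
import java.util.regex.Pattern;

public class TextCleaner {

    // Same filter as in the post: everything except printable ASCII is removed.
    private static final Pattern NON_PRINTABLE_ASCII = Pattern.compile("[^\\x20-\\x7e]");

    // Everything outside the XML 1.0 "Char" production is removed:
    // tab, LF, CR, 0x20-0xD7FF, 0xE000-0xFFFD and the supplementary planes are kept.
    private static final Pattern INVALID_XML_CHARS = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    /** Strict variant: printable ASCII only. */
    public static String asciiOnly(String s) {
        return NON_PRINTABLE_ASCII.matcher(s).replaceAll("");
    }

    /** Lenient variant: keeps any character an XML 1.0 parser accepts. */
    public static String xmlSafe(String s) {
        return INVALID_XML_CHARS.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String raw = "Caf\u00e9\tmenu\0\nline 2";
        System.out.println(asciiOnly(raw)); // prints "Cafmenuline 2"
        System.out.println(xmlSafe(raw));   // keeps the accent, tab and line break; drops only the NUL
    }
}
```

The lenient variant still needs the usual escaping of &, < and > when the text is written into XML by hand; a DOM or StAX writer does that automatically.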

But I am not able to extract any content from an MS PowerPoint presentation. The extracted content has a length of 0.

How can I get the text out of it?

My organization uses the MS Office suite, and I am able to get text from Word and Excel.

Any help will be appreciated.

dynamolalit
Champ on-the-rise
Hi,

I was able to extract text from Word/Excel/PPT successfully after installing OpenOffice 3.2.0.

Here are my entries in alfresco-global.properties for OpenOffice:

#
# External locations
#-------------
ooo.exe=C:/Program Files/OpenOffice.org 3/program/soffice
ooo.user=D:/Alfresco/alf_data/oouser
jodconverter.officeHome=C:/Program Files/OpenOffice.org 3
jodconverter.portNumbers=8101
ooo.enabled=true
jodconverter.enabled=true
img.root=D:/Alfresco/ImageMagick
swf.exe=D:/Alfresco/bin/pdf2swf
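
Before restarting Alfresco it can help to confirm that the directories named in these entries actually exist. The sketch below is not an Alfresco utility; it only assumes the property names from the listing above, parses them with java.util.Properties, and reports the keys that are unset or do not point to a directory. The class name and the choice of which keys to check are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class OfficeConfigCheck {

    // Keys from the listing above whose values are expected to be directories.
    private static final String[] DIRECTORY_KEYS = {
        "jodconverter.officeHome", "ooo.user", "img.root"
    };

    /** Returns the keys that are unset or whose value is not an existing directory. */
    public static List<String> findMissingDirectories(String configText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(configText));
        } catch (IOException e) {
            throw new IllegalStateException("Could not parse properties text", e);
        }
        List<String> missing = new ArrayList<String>();
        for (String key : DIRECTORY_KEYS) {
            String value = props.getProperty(key);
            if (value == null || !new File(value.trim()).isDirectory()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String sample = "jodconverter.officeHome=C:/Program Files/OpenOffice.org 3\n"
                + "ooo.user=D:/Alfresco/alf_data/oouser\n"
                + "img.root=D:/Alfresco/ImageMagick\n";
        System.out.println("Missing or invalid: " + findMissingDirectories(sample));
    }
}
```

The forward slashes in the paths matter: in a .properties file a backslash is an escape character, so a Windows path written with single backslashes would be read incorrectly.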

But it works only for the MS Office 97-2003 formats, not for MS Office 2007, where it throws the error below:


11:56:25,408 User:xxxxadmin ERROR [km.bpm.XmlFormationBean] Error while forming XML in formXmlFromContent() : null
java.lang.NullPointerException
        at org.apache.xml.serializer.TreeWalker.dispatachChars(TreeWalker.java:244)
        at org.apache.xml.serializer.TreeWalker.startNode(TreeWalker.java:414)
        at org.apache.xml.serializer.TreeWalker.traverse(TreeWalker.java:143)
        at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:389)
        at com.xxxx.alfresco.km.bpm.XmlFormationBean.formXmlFromContent(XmlFormationBean.java:192)
        at com.xxxx.alfresco.km.bpm.KMSolrSearchActionHandler.formSolrXml(KMSolrSearchActionHandler.java:224)
        at com.xxxx.alfresco.km.bpm.KMSolrSearchActionHandler.execute(KMSolrSearchActionHandler.java:94)
        at org.jbpm.graph.def.Action.execute(Action.java:129)
        at org.jbpm.graph.def.GraphElement.executeAction(GraphElement.java:284)
        at org.jbpm.graph.def.GraphElement.executeActions(GraphElement.java:241)
        at org.jbpm.graph.def.GraphElement.fireAndPropagateEvent(GraphElement.java:213)
        at org.jbpm.graph.def.GraphElement.fireEvent(GraphElement.java:196)
        at org.jbpm.graph.def.Node.leave(Node.java:466)
        at org.jbpm.graph.def.Node.leave(Node.java:438)
        at org.jbpm.graph.def.Node.execute(Node.java:429)
        at org.jbpm.graph.def.Node.enter(Node.java:390)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.hibernate.proxy.pojo.cglib.CGLIBLazyInitializer.invoke(CGLIBLazyInitializer.java:157)
        at org.jbpm.graph.def.Node$$EnhancerByCGLIB$$a33a6802.enter(<generated>)
        at org.jbpm.graph.def.Transition.take(Transition.java:167)
        at org.jbpm.graph.def.Node.leave(Node.java:479)
        at org.jbpm.graph.node.StartState.leave(StartState.java:82)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.hibernate.proxy.pojo.cglib.CGLIBLazyInitializer.invoke(CGLIBLazyInitializer.java:157)
        at org.jbpm.graph.def.Node$$EnhancerByCGLIB$$a33a6802.leave(<generated>)
        at org.jbpm.graph.exe.Token.signal(Token.java:223)
        at org.jbpm.graph.exe.Token.signal(Token.java:150)
        at org.jbpm.taskmgmt.exe.TaskInstance.end(TaskInstance.java:490)
        at org.alfresco.repo.workflow.jbpm.WorkflowTaskInstance.end(WorkflowTaskInstance.java:141)
        at org.jbpm.taskmgmt.exe.TaskInstance.end(TaskInstance.java:406)
        at org.alfresco.repo.workflow.jbpm.JBPMEngine$26.doInJbpm(JBPMEngine.java:1703)
        at org.springmodules.workflow.jbpm31.JbpmTemplate$1.doInHibernate(JbpmTemplate.java:87)
        at org.springframework.orm.hibernate3.HibernateTemplate.execute(HibernateTemplate.java:372)
        at org.springframework.orm.hibernate3.HibernateTemplate.execute(HibernateTemplate.java:338)
        at org.springmodules.workflow.jbpm31.JbpmTemplate.execute(JbpmTemplate.java:80)
        at org.alfresco.repo.workflow.jbpm.JBPMEngine.endTask(JBPMEngine.java:1680)
        at org.alfresco.repo.workflow.WorkflowServiceImpl.endTask(WorkflowServiceImpl.java:627)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:304)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
        at org.alfresco.repo.security.permissions.impl.AlwaysProceedMethodInterceptor.invoke(AlwaysProceedMethodInterceptor.java:40)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.alfresco.repo.security.permissions.impl.ExceptionTranslatorMethodInterceptor.invoke(ExceptionTranslatorMethodIntercep
:49)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.alfresco.repo.audit.AuditMethodInterceptor.proceed(AuditMethodInterceptor.java:199)
        at org.alfresco.repo.audit.AuditMethodInterceptor.invoke(AuditMethodInterceptor.java:153)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
        at $Proxy48.endTask(Unknown Source)
        at org.alfresco.repo.workflow.StartWorkflowActionExecuter.executeImpl(StartWorkflowActionExecuter.java:166)
        at org.alfresco.repo.action.executer.ActionExecuterAbstractBase.execute(ActionExecuterAbstractBase.java:127)
        at org.alfresco.repo.action.ActionServiceImpl.directActionExecution(ActionServiceImpl.java:711)
        at org.alfresco.repo.action.ActionServiceImpl.executeActionImpl(ActionServiceImpl.java:648)
        at org.alfresco.repo.action.ActionServiceImpl.executeAction(ActionServiceImpl.java:510)
        at org.alfresco.repo.action.ActionServiceImpl.executeAction(ActionServiceImpl.java:498)
        at org.alfresco.repo.action.ActionServiceImpl.executeAction(ActionServiceImpl.java:719)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:304)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
        at org.alfresco.repo.security.permissions.impl.AlwaysProceedMethodInterceptor.invoke(AlwaysProceedMethodInterceptor.java:40)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.alfresco.repo.security.permissions.impl.ExceptionTranslatorMethodInterceptor.invoke(ExceptionTranslatorMethodIntercep
:49)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.alfresco.repo.audit.AuditMethodInterceptor.proceed(AuditMethodInterceptor.java:199)
        at org.alfresco.repo.audit.AuditMethodInterceptor.invoke(AuditMethodInterceptor.java:153)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
        at $Proxy28.executeAction(Unknown Source)
        at org.alfresco.repo.jscript.ScriptAction.execute(ScriptAction.java:144)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.mozilla.javascript.MemberBox.invoke(MemberBox.java:155)
        at org.mozilla.javascript.NativeJavaMethod.call(NativeJavaMethod.java:243)
        at org.mozilla.javascript.optimizer.OptRuntime.call1(OptRuntime.java:66)
        at org.mozilla.javascript.gen.c5._c0(workspace://SpacesStore/f907489e-4762-4698-a44d-343ebb5f57cf:6)
        at org.mozilla.javascript.gen.c5.call(workspace://SpacesStore/f907489e-4762-4698-a44d-343ebb5f57cf)
        at org.mozilla.javascript.ContextFactory.doTopCall(ContextFactory.java:393)
        at org.mozilla.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:2834)
        at org.mozilla.javascript.gen.c5.call(workspace://SpacesStore/f907489e-4762-4698-a44d-343ebb5f57cf)
        at org.mozilla.javascript.gen.c5.exec(workspace://SpacesStore/f907489e-4762-4698-a44d-343ebb5f57cf)
        at org.alfresco.repo.jscript.RhinoScriptProcessor.executeScriptImpl(RhinoScriptProcessor.java:456)
        at org.alfresco.repo.jscript.RhinoScriptProcessor.execute(RhinoScriptProcessor.java:224)
        at org.alfresco.repo.processor.ScriptServiceImpl.executeScript(ScriptServiceImpl.java:187)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:304)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
        at org.alfresco.repo.security.permissions.impl.AlwaysProceedMethodInterceptor.invoke(AlwaysProceedMethodInterceptor.java:40)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.alfresco.repo.security.permissions.impl.ExceptionTranslatorMethodInterceptor.invoke(ExceptionTranslatorMethodIntercep
:49)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.alfresco.repo.audit.AuditMethodInterceptor.proceed(AuditMethodInterceptor.java:199)
        at org.alfresco.repo.audit.AuditMethodInterceptor.invoke(AuditMethodInterceptor.java:153)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
        at $Proxy234.executeScript(Unknown Source)
        at org.alfresco.repo.action.executer.ScriptActionExecuter.executeImpl(ScriptActionExecuter.java:170)
        at org.alfresco.repo.action.executer.ActionExecuterAbstractBase.execute(ActionExecuterAbstractBase.java:127)
        at org.alfresco.repo.action.ActionServiceImpl.directActionExecution(ActionServiceImpl.java:711)
        at org.alfresco.repo.action.executer.CompositeActionExecuter.executeImpl(CompositeActionExecuter.java:72)
        at org.alfresco.repo.action.executer.ActionExecuterAbstractBase.execute(ActionExecuterAbstractBase.java:127)
        at org.alfresco.repo.action.ActionServiceImpl.directActionExecution(ActionServiceImpl.java:711)
        at org.alfresco.repo.action.ActionServiceImpl.executeActionImpl(ActionServiceImpl.java:648)
        at org.alfresco.repo.action.ActionServiceImpl.executeAction(ActionServiceImpl.java:510)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:304)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
        at org.alfresco.repo.security.permissions.impl.AlwaysProceedMethodInterceptor.invoke(AlwaysProceedMethodInterceptor.java:40)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.alfresco.repo.security.permissions.impl.ExceptionTranslatorMethodInterceptor.invoke(ExceptionTranslatorMethodIntercep
:49)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.alfresco.repo.audit.AuditMethodInterceptor.proceedWithAudit(AuditMethodInterceptor.java:238)
        at org.alfresco.repo.audit.AuditMethodInterceptor.proceed(AuditMethodInterceptor.java:205)
        at org.alfresco.repo.audit.AuditMethodInterceptor.invoke(AuditMethodInterceptor.java:153)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
        at $Proxy28.executeAction(Unknown Source)
        at org.alfresco.repo.rule.RuleServiceImpl.executeRule(RuleServiceImpl.java:1040)
        at org.alfresco.repo.rule.RuleServiceImpl.executePendingRule(RuleServiceImpl.java:1008)
        at org.alfresco.repo.rule.RuleServiceImpl.executePendingRulesImpl(RuleServiceImpl.java:979)
        at org.alfresco.repo.rule.RuleServiceImpl.executePendingRules(RuleServiceImpl.java:952)
        at org.alfresco.repo.rule.RuleTransactionListener.beforeCommit(RuleTransactionListener.java:63)
        at org.alfresco.repo.transaction.AlfrescoTransactionSupport$TransactionSynchronizationImpl.doBeforeCommit(AlfrescoTransactio
.java:744)
        at org.alfresco.repo.transaction.AlfrescoTransactionSupport$TransactionSynchronizationImpl.doBeforeCommit(AlfrescoTransactio
.java:724)
        at org.alfresco.repo.transaction.AlfrescoTransactionSupport$TransactionSynchronizationImpl.beforeCommit(AlfrescoTransactionS
ava:680)
        at org.springframework.transaction.support.TransactionSynchronizationUtils.triggerBeforeCommit(TransactionSynchronizationUti
48)
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.triggerBeforeCommit(AbstractPlatformTransactio
.java:835)
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.processCommit(AbstractPlatformTransactionManag
645)
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java
        at org.springframework.transaction.interceptor.TransactionAspectSupport.commitTransactionAfterReturning(TransactionAspectSup
a:314)
        at org.alfresco.util.transaction.SpringAwareUserTransaction.commit(SpringAwareUserTransaction.java:467)
        at org.alfresco.web.bean.workflow.ManageTaskDialog.transition(ManageTaskDialog.java:452)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.myfaces.el.MethodBindingImpl.invoke(MethodBindingImpl.java:132)
        at org.apache.myfaces.application.ActionListenerImpl.processAction(ActionListenerImpl.java:61)
        at javax.faces.component.UICommand.broadcast(UICommand.java:109)
        at javax.faces.component.UIViewRoot._broadcastForPhase(UIViewRoot.java:97)
        at javax.faces.component.UIViewRoot.processApplication(UIViewRoot.java:171)
        at org.apache.myfaces.lifecycle.InvokeApplicationExecutor.execute(InvokeApplicationExecutor.java:32)
        at org.apache.myfaces.lifecycle.LifecycleImpl.executePhase(LifecycleImpl.java:95)
        at org.apache.myfaces.lifecycle.LifecycleImpl.execute(LifecycleImpl.java:70)
        at javax.faces.webapp.FacesServlet.service(FacesServlet.java:139)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.alfresco.web.app.servlet.AuthenticationFilter.doFilter(AuthenticationFilter.java:110)
        at sun.reflect.GeneratedMethodAccessor521.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.alfresco.repo.management.subsystems.ChainingSubsystemProxyFactory$1.invoke(ChainingSubsystemProxyFactory.java:122)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
        at $Proxy216.doFilter(Unknown Source)
        at org.alfresco.repo.web.filter.beans.BeanProxyFilter.doFilter(BeanProxyFilter.java:88)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.alfresco.repo.web.filter.beans.NullFilter.doFilter(NullFilter.java:74)
        at sun.reflect.GeneratedMethodAccessor521.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.alfresco.repo.management.subsystems.ChainingSubsystemProxyFactory$1.invoke(ChainingSubsystemProxyFactory.java:122)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
        at $Proxy216.doFilter(Unknown Source)
        at org.alfresco.repo.web.filter.beans.BeanProxyFilter.doFilter(BeanProxyFilter.java:88)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:619)

I checked the Alfresco wiki at

http://wiki.alfresco.com/wiki/Setting_up_OpenOffice_for_Alfresco

It says that OpenOffice 3.2.0 can also transform MS Office 2007 documents, but it does not.

Any idea?
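
A note on the trace itself: the NullPointerException is thrown inside the XML serializer (TreeWalker.dispatachChars), reached from formXmlFromContent(), not inside the transformation. A commonly reported cause of that particular failure is a DOM text node created from a null string, which would fit if the extracted text came back null for the 2007 file. This is an assumption, since XmlFormationBean is not shown here. A minimal sketch of the guard, using only the JDK's XML APIs and hypothetical class, method and element names:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NullSafeXml {

    /** Builds a doc/content element pair, treating null text as an empty string. */
    public static String toXml(String content) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("doc");
            doc.appendChild(root);

            Element field = doc.createElement("content");
            // Guard: never hand a null string to createTextNode().
            field.appendChild(doc.createTextNode(content == null ? "" : content));
            root.appendChild(field);

            // Identity transform, the same kind of serialization step seen in the trace.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml("extracted text"));
        System.out.println(toXml(null)); // serializes without a NullPointerException
    }
}
```

If the guard makes the exception disappear but the content element comes out empty, the remaining problem is the extraction itself returning nothing for the 2007 formats.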

dynamolalit
Champ on-the-rise
Hi,

I finally used Apache Tika 0.7 to parse the text from the content.

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika#sup...

It works for MS Office 2007 as well as text, HTML, image, audio, ZIP and PDF files. For XML, however, I kept using the Alfresco transformation because of a SAX parser issue in Tika.

The source is below:


package com.xxxx.alfresco.km.bpm;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import javax.faces.context.FacesContext;

import org.alfresco.model.ContentModel;
import org.alfresco.repo.content.MimetypeMap;
import org.alfresco.repo.content.transform.ContentTransformer;
import org.alfresco.repo.workflow.jbpm.JBPMSpringActionHandler;
import org.alfresco.service.ServiceRegistry;
import org.alfresco.service.cmr.repository.ContentData;
import org.alfresco.service.cmr.repository.ContentIOException;
import org.alfresco.service.cmr.repository.ContentReader;
import org.alfresco.service.cmr.repository.ContentService;
import org.alfresco.service.cmr.repository.ContentWriter;
import org.alfresco.service.cmr.repository.NodeRef;
import org.alfresco.service.cmr.repository.NodeService;
import org.alfresco.service.namespace.QName;
import org.alfresco.web.bean.repository.Node;
import org.alfresco.web.bean.repository.Repository;
import org.alfresco.web.ui.common.Utils;
import org.alfresco.web.ui.common.Utils.URLMode;
import org.apache.log4j.Logger;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;
import org.jbpm.context.exe.ContextInstance;
import org.jbpm.graph.exe.ExecutionContext;
import org.springframework.beans.factory.BeanFactory;
import org.xml.sax.ContentHandler;

import com.xxxx.alfresco.km.property.KmPropertyReader;

/**
* @author Lalit Jangra
* Class to handle SOLR integration with Alfresco.
* Once content is approved & moved back to original upload location,
* it will check for 'km:underWorkflow' property of content.
* If it is set to 'true', another workflow named 'SolrSearchWF' is triggered.
* This workflow will extract all required metadata properties from content noderef &
* form an XML to be posted to SOLR along with content as text string which is parsed using Apache Tika Parser.
* Finally XML will be posted to SOLR using SOLR Content Post Utility.
* Once content is posted successfully to SOLR, 'km:underWorkflow' is set to 'indexed'.
*/
public class KMSolrSearchActionHandler extends JBPMSpringActionHandler{
   
   private static final long serialVersionUID = 1L;
   private static Logger logger = Logger
   .getLogger(KMSolrSearchActionHandler.class);
   private String solrServIp = null;
   private String solrFileLoc = null;
   private   String alfServerIp = null;
   private NodeService nodeService;
   private ServiceRegistry services;
   private ContentService contentService;   
   //private FileFolderService fileFolderService;
   private String webdavUrl = null;
   List<String> categoryListToPass = new ArrayList<String>();
   private InputStream iStream = null;
   
   /**
    * Method to initialize services.
    */
   @Override
   protected void initialiseHandler(BeanFactory factory) {
      services = (ServiceRegistry) factory
      .getBean(ServiceRegistry.SERVICE_REGISTRY);
      nodeService = services.getNodeService();
      contentService = services.getContentService();
      //fileFolderService = services.getFileFolderService();
   }

   /**
    * Method calling formSolrXml() passing noderef of the content
    * forming SOLR specific well-formed XML.
    */
   @Override
   public void execute(ExecutionContext context) throws Exception {
      logger.debug("Inside execute of KMSolrSearchActionHandler");
       try{
              KmPropertyReader kmPropertyReader = new KmPropertyReader();
               solrServIp = kmPropertyReader.getProperty("solr.server.ip");
               logger.debug("solrServIp in KMSolrSearchActionHandler : "+solrServIp);
               solrFileLoc = kmPropertyReader.getProperty("solr.file.location");
               logger.debug("solrFileLoc in KMSolrSearchActionHandler : "+solrFileLoc);
               alfServerIp = kmPropertyReader.getProperty("alfresco.server.ip");
               logger.debug("alfServerIp in KMSolrSearchActionHandler : "+alfServerIp);
           }catch (Exception e) {
            logger.error("Error while reading property in KMSolrSearchActionHandler : "+e.getMessage());
         }
      final ContextInstance contextInstance = context.getContextInstance();
      NodeRef nodeRef = (NodeRef) contextInstance.getVariable("nodeRef");
      //Forming SOLR XML.
      formSolrXml(nodeRef);      
   }
   
   /**
    * Method to form XML to be posted to SOLR.
    * Content text string is parsed using Apache Tika Parser.
    * Once well-formed XML is formed, it will call postSolrXML method to post
    * the same XML to SOLR using Content Post Utility.
    * @param nodeRef
    */
   @SuppressWarnings("unchecked")
   public void formSolrXml(NodeRef nodeRef){
      logger.debug("Inside formSolrXml in KMSolrSearchActionHandler : "+nodeRef);
      Node contentNode = new Node(nodeRef);
      String repoPath = Utils.generateURL(FacesContext.getCurrentInstance(), contentNode, URLMode.WEBDAV);      
      //logger.debug("repoPath "+repoPath);      
      webdavUrl = alfServerIp+repoPath;
      logger.debug("webdavUrl in formSolrXml() : "+webdavUrl);
      String noderef = nodeRef.toString();
      // NodeRef.getId() returns the UUID directly and also works for stores other than SpacesStore
      String contentUuid = nodeRef.getId();
        String category = "";
        String content = "";
       
      //Create an XML using XMLFormationBean & pass it to SOLR.
      XmlFormationBean xmlFormationBean = new XmlFormationBean();
      QName statusQname = QName.createQName("{http://www.xxxx.com/model/km/content/1.0}contentStatus");
      QName ownerQname = QName.createQName("{http://www.xxxx.com/model/km/content/1.0}originalOwner");
      QName ratingQname = QName.createQName("{http://www.xxxx.com/model/km/content/1.0}contentRating");
      QName coAuthorQname = QName.createQName("{http://www.xxxx.com/model/km/content/1.0}coAuthor");
      QName nameQname = QName.createQName("{http://www.alfresco.org/model/content/1.0}name");      
      QName titleQname = QName.createQName("{http://www.alfresco.org/model/content/1.0}title");
      QName descriptionQname = QName.createQName("{http://www.alfresco.org/model/content/1.0}description");      
      QName authorQname = QName.createQName("{http://www.alfresco.org/model/content/1.0}author");
//      QName kTQName = QName.createQName("{http://www.alfresco.org/model/content/1.0}Knowledge Type");
//      QName kPQName = QName.createQName("{http://www.alfresco.org/model/content/1.0}KP Domain");
//      QName typeQname = QName.createQName("{http://www.alfresco.org/model/content/1.0}content");
      
      // Read each property once; a missing (null) property falls back to an empty string
      // instead of relying on a caught NullPointerException.
      Serializable nameValue = nodeService.getProperty(nodeRef, nameQname);
      String cName = (nameValue != null) ? nameValue.toString() : "";
      logger.debug("Content Name in formSolrXml : "+cName);
      Serializable statusValue = nodeService.getProperty(nodeRef, statusQname);
      String cStatus = (statusValue != null) ? statusValue.toString() : "";
      logger.debug("Content Status in formSolrXml : "+cStatus);
      Serializable ownerValue = nodeService.getProperty(nodeRef, ownerQname);
      String cOwner = (ownerValue != null) ? ownerValue.toString() : "";
      logger.debug("Content Owner in formSolrXml :  "+cOwner);
      Serializable ratingValue = nodeService.getProperty(nodeRef, ratingQname);
      String cRating = (ratingValue != null) ? ratingValue.toString() : "";
      logger.debug("Content Rating in formSolrXml :  "+cRating);
      Serializable coAuthorValue = nodeService.getProperty(nodeRef, coAuthorQname);
      String cCoAuthor = (coAuthorValue != null) ? coAuthorValue.toString() : "";
      logger.debug("Content CoAuthor in formSolrXml : "+cCoAuthor);
      Serializable titleValue = nodeService.getProperty(nodeRef, titleQname);
      String cTitle = (titleValue != null) ? titleValue.toString() : "";
      logger.debug("Content Title in formSolrXml :  "+cTitle);
      Serializable descValue = nodeService.getProperty(nodeRef, descriptionQname);
      String cDesc = (descValue != null) ? descValue.toString() : "";
      logger.debug("Content Description in formSolrXml :  "+cDesc);
      Serializable authorValue = nodeService.getProperty(nodeRef, authorQname);
      String cAuthor = (authorValue != null) ? authorValue.toString() : "";
      logger.debug("Content Author in formSolrXml :  "+cAuthor);
      String cType = "";
      try{
         cType = getContentMimeType(nodeRef);
      }catch (NullPointerException e) {
         logger.debug("Null Content Mimetype in formSolrXml()");
      }      
      logger.debug("Content Mimetype in formSolrXml :  "+cType);   
      //Getting Content Categories.
      try{
         Collection<NodeRef> categories = (Collection<NodeRef>)nodeService.getProperty(nodeRef, ContentModel.PROP_CATEGORIES);
         logger.debug("categories size : "+categories.size());
         Iterator itr11 =  categories.iterator();
         List<String> categoryList = new ArrayList<String>();
         while(itr11.hasNext()){
            NodeRef catNodeRef = (NodeRef) itr11.next();
            category = Repository.getNameForNode(nodeService, catNodeRef);
            logger.debug("Content Category in formSolrXml() : "+category);
            categoryList.add(category);
            logger.debug("categoryList size : "+categoryList.size());
         }   
         categoryListToPass = categoryList;
      }catch (NullPointerException e) {
         logger.error("Null Content categories in formSolrXml() ");
         //e.printStackTrace();
      }
      
      /*
       * Parsing content to extract text from content.
       * Apache Tika is used for all supported content types except XML, because of an XML formation issue with Tika's SAX parser.
       * Hence, for XML, Alfresco transformation is used.
       * Currently, .pst files & images are not parsed and give a content string of length 0.
       * Please refer to url http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika#sup....
       */      
      if(cType.equalsIgnoreCase("text/xml")){
         //Using Alfresco transformation to transform content to plain text format & extracting text from content as a string.
         logger.debug("Using Alfresco to transform XML");
           ContentReader reader = contentService.getReader(nodeRef, ContentModel.PROP_CONTENT);
           if (reader != null && reader.exists())
           {
                   // Get the transformer
                   ContentTransformer transformer = contentService.getTransformer(reader.getMimetype(), MimetypeMap.MIMETYPE_TEXT_PLAIN);
                   if (transformer != null)
                   {
                       // A transformer to plain text is available
                       ContentWriter writer = contentService.getTempWriter();
                       writer.setMimetype(MimetypeMap.MIMETYPE_TEXT_PLAIN);
                       try
                       {   
                          transformer.transform(reader, writer);
                           // point the reader to the new-written content
                           reader = writer.getReader();
                           // Check that the reader is a view onto something concrete
                           if (!reader.exists())
                           {
                              logger.error("Error while getting reader in KMSolrSearchActionHandler ");
                               throw new ContentIOException("The transformation did not write any content, yet: \n"
                                       + "   transformer:     " + transformer + "\n" + "   temp writer:     " + writer);
                           }else {
                                 content = reader.getContentString();
                           }
                          
                       }
                       catch (ContentIOException e)
                       {
                          logger.error("Error in transforming content : "+e.getMessage());                          
                       }
                   }
               }     
         
      }else{//Parsing Content Text using Apache Tika.
         logger.debug("Using Tika to parse content.");   
         try {
            QName contentQname = QName.createQName("{http://www.alfresco.org/model/content/1.0}content");
            ContentReader contentReader = contentService.getReader(nodeRef, contentQname);            
            // Using Alfresco API to get input stream
            iStream  = (InputStream)contentReader.getReader().getContentInputStream();            
            // Using BodyContentHandler to get extracted text.
            ContentHandler textHandler = new BodyContentHandler();
            // Setting metadata properties.
            Metadata metadata = new Metadata();
            metadata.add(Metadata.RESOURCE_NAME_KEY, cName);
            // Parsing is done here.
            AutoDetectParser parser = new AutoDetectParser();
            parser.parse(iStream , textHandler , metadata);
            // Getting parsed content as a string.
            content = textHandler.toString();
         } catch (Exception e) {
            logger.error("Error in Tika parsing : "+e.getMessage());
            //e.printStackTrace();
         } finally{
            // iStream may still be null if the reader could not be opened, so guard the close.
            if (iStream != null) {
               try {
                  iStream.close();
                  logger.debug("Input stream closed for Tika.");
               } catch (IOException e) {
                  logger.debug("Error while closing stream for Tika : "+e.getMessage());
                  e.printStackTrace();
               }
            }
         }
      }      
        logger.debug("Length of content string for SOLR indexing  :  "+content.length());
       
        //Forming well-formed SOLR search XML.
        String finalFileName =  xmlFormationBean.formXmlFromContent(contentUuid,content, cType, cAuthor, cCoAuthor, cTitle, cOwner, categoryListToPass, cDesc, cStatus, cRating, cName, solrFileLoc,webdavUrl);
        logger.debug("finalFileName in formSolrXml : "+finalFileName);
        //Posting well-formed XML to SOLR server.
        postSolrXML(finalFileName,cName,nodeRef);
   }   
   
   /**
    * Method to post well-formed XML to SOLR using content post utility.
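    * @param fullFileName fully qualified name of the XML file to post
    * @param contentName name of the content being posted
    * @param nodeRef nodeRef of the content being posted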
    */
   public void postSolrXML(String fullFileName,String contentName, NodeRef nodeRef){
      logger.debug("Entering postSolrXML() with content to be posted to SOLR : "+contentName +" &  fullFileName : "+fullFileName);
      //Only if there is a non-empty file should it be posted to SOLR.
        File file = new File(fullFileName);
        if(file.length() > 0){           
             try {
                //Calling  SolrPostContentUtility.
                logger.debug("\n******************* Posting content to SOLR using utility by passing fully qualified file name; result 0 means OK, 1 means error ****************");
                SolrPostContentUtility utility = new SolrPostContentUtility();
                int outCome = utility.postXmlToSolr(fullFileName);
                //If outCome is 0, it is OK; if it is 1, it's an error.
                logger.debug("Result in postSolrXML() : "+outCome);
                if(outCome == 0){
                   //Content is posted successfully, Set km:underWorkflow to indexed for this content.
                   logger.debug("\n *************** Content named : "+contentName+" : successfully posted to SOLR. *************** \n");
                   logger.debug("\n *************** nodeRef of content posted successfully : " + nodeRef + " ***************");
                   logger.debug("\n *************** webdavUrl of content posted successfully : " + webdavUrl + " ***************");
                   QName underWorkflowQname = QName.createQName("{http://www.xxxx.com/model/km/content/1.0}underWorkflow");
                   String underWorkflowFlag = "";
                   try{
                      underWorkflowFlag = nodeService.getProperty(nodeRef, underWorkflowQname).toString();
                   }catch (NullPointerException e) {
                      logger.info("Null underWorkflowFlag!");
                      underWorkflowFlag = "";
                   }                  
                   Map<QName, Serializable> propertyMap = nodeService.getProperties(nodeRef);
                   propertyMap.put(underWorkflowQname, "indexed");
                   nodeService.setProperties(nodeRef, propertyMap);
                   underWorkflowFlag = nodeService.getProperty(nodeRef, underWorkflowQname).toString();
                   logger.debug("km:underWorkflow value in postSolrXML after successful indexing : "+underWorkflowFlag);
                }else if(outCome == 1){
                   //Content post is not successful.
                   logger.error("\n *************** Content named : "+contentName +" : could NOT be posted successfully to SOLR. *************** \n");
                }                
          } catch (Exception e) {
             logger.error("Error while posting file in postSolrXML() : "+e.getMessage());
             e.printStackTrace();
          }      
        }else{
           logger.error("Empty or missing XML file in postSolrXML(), nothing posted to SOLR");
        }
   }
   
   /**
    * Method to get the MimeType of a content by passing its nodeRef.
    * @param nodeRef
    * @return MimeType
    */
   public String getContentMimeType(NodeRef nodeRef){
      QName PROP_QNAME_CONTENT = QName.createQName("http://www.alfresco.org/model/content/1.0", "content");
       ContentData contentData = (ContentData) nodeService.getProperty(nodeRef, PROP_QNAME_CONTENT);
       String originalMimeType = contentData.getMimetype();
       return originalMimeType;
   }
}

It's working fine for me.
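
The repeated try/catch blocks around nodeService.getProperty(...).toString() in formSolrXml() could probably be collapsed into one null-safe helper. Below is a minimal sketch of that idea, not something taken from Alfresco: PropertyText and asText are hypothetical names, and the properties are assumed to be in a plain Map keyed by String (a stand-in for the QName-keyed map that nodeService.getProperties(nodeRef) returns), so the snippet runs without Alfresco on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Alfresco: class and method names are made up for illustration.
final class PropertyText {

    private PropertyText() {
    }

    // Returns the property's string form, or "" when the property is missing or null,
    // so callers do not need a try/catch for NullPointerException.
    static String asText(Map<String, ?> properties, String key) {
        Object value = properties.get(key);
        return value == null ? "" : value.toString();
    }

    public static void main(String[] args) {
        // Stand-in for nodeService.getProperties(nodeRef); real code would key the map by QName.
        Map<String, Object> properties = new HashMap<>();
        properties.put("cm:name", "report.docx");
        properties.put("cm:title", null);

        System.out.println("name=[" + asText(properties, "cm:name") + "]");
        System.out.println("title=[" + asText(properties, "cm:title") + "]");
        System.out.println("author=[" + asText(properties, "cm:author") + "]");
    }
}
```

With something like this, the property map would be fetched once and each field read in a single line, with an empty string standing in for a missing value just as the catch blocks do now.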