Disable Full Text Indexing

jjf
Champ in-the-making
Is there a way to disable full text indexing of content? If so, will this help the performance of running "index.recovery.mode=FULL"? Right now a full index takes hours to run and over 1GB of memory. We have no need for full text indexing in Alfresco, so if it can be disabled, we'd like to pursue that. Thanks.
24 REPLIES

derek
Star Contributor
Hi, Lee

You can disable the POI transformer and the slack will be picked up by the OpenOffice transformer; POI is much faster but has the drawback that it keeps the document in memory during the transformation.  Contact our support dept. if you have any trouble.

Regards
Derek

loftux
Star Contributor
I'm using 3.1sp1 and have noticed that there IS a problem indexing large Excel files (300+ MB).

Maybe try a File, Save As to see if you can reduce the file size. To me, 300+ MB suggests the file has phantom size added.
Have a look at http://www.ozgrid.com/Excel/ExcelProblems.htm or http://www.brainbell.com/tutorials/ms-office/excel/Reduce_Workbook_Bloat.htm or try searching the internet for "excel large file size". Quite a lot of hits.

Isn't really an answer to your issue, but might be worth looking into.

lee
Champ in-the-making
Hi Loftux,

Thanks for the suggestion.
Unfortunately, our Excel files really are that large.

For now, I've disabled the POI transformer as Derek suggested, and that has worked. A better solution might be to use the low-level, event-oriented POI API when indexing. This would both speed up the indexing and keep memory usage low. If I get around to doing this, I'll post the code for all.

Thanks for your help

zladuric
Champ on-the-rise
I'm pretty new at all this, so I may be asking a newbie question:

Will disabling indexing of content also disable the indexing of metadata?

All my content is just docs scanned to PDF, so my users have no use for the content being indexed.

TIA,

Zlatko

derek
Star Contributor
Hi,
You can switch the indexing of content off by modifying the cm:content property in the model and the other metadata will still be indexed.
Regards

zladuric
Champ on-the-rise
Can I bother you a bit more on this?

What model do I need to modify? My own or some predefined one?
And modify it in what manner?

lee
Champ in-the-making
What model do I need to modify? My own or some predefined one?
And modify it in what manner?

You need to modify the default content model, which should be called contentModel.xml and can be found under WEB-INF/classes/alfresco/model.

You could change the cm:content type definition to something like this:

<type name="cm:content">
   <title>Content</title>
   <parent>cm:cmobject</parent>
   <archive>true</archive>
   <properties>
      <property name="cm:content">
         <type>d:content</type>
         <mandatory>false</mandatory>
         <!-- Index content in the background -->
         <!-- THIS DISABLES INDEXING -->
         <index enabled="false">
            <atomic>true</atomic>
            <stored>false</stored>
            <tokenised>true</tokenised>
         </index>
      </property>
   </properties>
</type>

Beware that this turns off indexing for all cm:content nodes. It's possible this may have unintended consequences elsewhere as there may be other processes that require the content to be indexed in order to work correctly (though the alfresco guys would have a better idea of this). For example, you may have more trouble using the search box - in that you won't get results you expect.

It's probably best to only turn off indexing for the specific content types you want. You can override the cm:content property for a specific type in your custom model. See this blog for help: http://www.ixxus.com/blog/2009/01/aspects/

zladuric
Champ on-the-rise
Would that be possible in my custom model like this?

[… model definition etc…]
   <type name="protenus:DokumentDefault">
      <title>Default dokument</title>
      <parent>cm:content</parent>
      <properties>
         <overrides>
            <property name="cm:content">
               <type>d:content</type>
               <mandatory>false</mandatory>
               <!-- Index content in the background -->
               <index enabled="false">
                  <atomic>true</atomic>
                  <stored>false</stored>
                  <tokenised>true</tokenised>
               </index>
            </property>
         </overrides>
         <property name=…
[… and so on, my custom properties…]

"index enabled" is true in contentModel.xml; I've set it to false here. Will that disable my properties being indexed or just the content?

TIA,

Zlatko

lee
Champ in-the-making
That disables the indexing of cm:content for content type protenus:DokumentDefault only.
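
Since the index setting is declared per property, one way to double-check which properties a model file switches off is to scan it for index elements with enabled="false". The following is only a minimal sketch using the JDK's DOM parser: the class name IndexFlagScan, the sample type my:doc and the property my:title are made up for illustration, and the code only inspects the XML text. It says nothing about how Alfresco itself interprets the model.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not an Alfresco class.
class IndexFlagScan {

    // Names of all <property> elements whose direct <index> child has enabled="false".
    static List<String> unindexedProperties(String modelXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(modelXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse model XML", e);
        }
        List<String> names = new ArrayList<>();
        NodeList indexes = doc.getElementsByTagName("index");
        for (int i = 0; i < indexes.getLength(); i++) {
            Element index = (Element) indexes.item(i);
            Node parent = index.getParentNode();
            boolean disabled = "false".equals(index.getAttribute("enabled"));
            if (disabled && parent instanceof Element
                    && "property".equals(((Element) parent).getTagName())) {
                names.add(((Element) parent).getAttribute("name"));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up sample: content indexing off, a custom metadata property still indexed.
        String sample =
              "<type name=\"my:doc\"><properties>"
            + "<property name=\"cm:content\"><type>d:content</type>"
            + "<index enabled=\"false\"><atomic>true</atomic></index></property>"
            + "<property name=\"my:title\"><type>d:text</type>"
            + "<index enabled=\"true\"><atomic>true</atomic></index></property>"
            + "</properties></type>";
        System.out.println(unindexedProperties(sample)); // prints [cm:content]
    }
}
```

In the sample, only cm:content is reported, because the other property carries its own index element with enabled="true".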

zladuric
Champ on-the-rise
Yes, in our custom model. The docs usually enter our system by being scanned to PDF and assigned one of the protenus: types. If anything else gets into the system, it's gonna be fine.

However, I still have some problems. I just joined a company that uses Alfresco (Community, 3.0). I copied over the database, the contentstore AND the whole Tomcat directory, but I can't get Alfresco to work on my machine with that data. When I point it to an empty contentstore and an empty database, it starts and all is well: the /alfresco app works. However, when I use the existing data, I have to rebuild the index (for some 100k transactions), and I always get an OOM.


Here's my full catalina.out:

Dec 9, 2009 4:04:02 PM org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: /usr/lib/jvm/java-6-openjdk/jre/lib/amd64/server:/usr/lib/jvm/java-6-openjdk/jre/lib/amd64:/usr/lib/jvm/java-6-openjdk/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib/jni:/lib:/usr/lib
Dec 9, 2009 4:04:02 PM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8080
Dec 9, 2009 4:04:02 PM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 1596 ms
Dec 9, 2009 4:04:02 PM org.apache.catalina.core.StandardService start
INFO: Starting service Catalina
Dec 9, 2009 4:04:02 PM org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/6.0.18
Dec 9, 2009 4:04:07 PM org.apache.catalina.core.StandardContext addApplicationListener
INFO: The listener "org.apache.myfaces.webapp.StartupServletContextListener" is already configured for this context. The duplicate definition has been ignored.
16:04:28,390  INFO  [config.xml.XMLConfigService$PropertyConfigurer] Loading properties file from class path resource [alfresco/file-servers.properties]
16:04:29,135  DEBUG [workflow.jbpm.JBPMTransactionTemplate] jBPM persistence service present
16:04:29,135  DEBUG [workflow.jbpm.JBPMTransactionTemplate] creating hibernateTemplate based on jBPM SessionFactory
16:04:32,669  INFO  [alfresco.repo.workflow] Registered Workflow Component 'jbpm' (class org.alfresco.repo.workflow.jbpm.JBPMEngine)
16:04:32,669  INFO  [alfresco.repo.workflow] Registered Task Component 'jbpm' (class org.alfresco.repo.workflow.jbpm.JBPMEngine)
16:04:40,130  INFO  [domain.schema.SchemaBootstrap] Schema managed by database dialect org.hibernate.dialect.MySQLInnoDBDialect.
16:04:48,402  INFO  [domain.schema.SchemaBootstrap] No changes were made to the schema.
16:04:49,254 User:System DEBUG [alfresco.repo.workflow] Attached JBPM Context to transaction db32a60b-f1a1-44ae-b9f0-c864df0466f1
16:04:49,435 User:System DEBUG [alfresco.repo.workflow] Workflow deployer: Definition 'alfresco/workflow/review_processdefinition.xml' already deployed
16:04:49,462 User:System DEBUG [alfresco.repo.workflow] Workflow deployer: Definition 'alfresco/workflow/adhoc_processdefinition.xml' already deployed
16:04:49,485 User:System DEBUG [alfresco.repo.workflow] Workflow deployer: Definition 'alfresco/workflow/submit_processdefinition.xml' already deployed
16:04:49,514 User:System DEBUG [alfresco.repo.workflow] Workflow deployer: Definition 'alfresco/workflow/changerequest_processdefinition.xml' already deployed
16:04:49,607 User:System DEBUG [alfresco.repo.workflow] Workflow deployer: Definition 'alfresco/workflow/submitdirect_processdefinition.xml' already deployed
16:04:49,650 User:System DEBUG [alfresco.repo.workflow] Workflow deployer: Definition 'alfresco/workflow/invitation-nominated_processdefinition.xml' already deployed
16:04:49,747 User:System DEBUG [alfresco.repo.workflow] Workflow deployer: Definition 'alfresco/workflow/invitation-moderated_processdefinition.xml' already deployed
16:04:49,871 User:System DEBUG [alfresco.repo.workflow] Detached (commit) JBPM Context from transaction db32a60b-f1a1-44ae-b9f0-c864df0466f1
16:04:50,512 User:System INFO  [node.index.FullIndexRecoveryComponent] Index recovery started: 108,377 transactions.
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: Java heap space
Exception in thread "RMI RenewClean-[127.0.1.1:50503]" java.lang.OutOfMemoryError: Java heap space
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: Java heap space
        at java.lang.String.valueOf(String.java:2852)
        at java.lang.Thread.getName(Thread.java:1078)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:662)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
Exception in thread "RMI RenewClean-[127.0.1.1:50506]" java.lang.OutOfMemoryError: Java heap space
        at sun.rmi.transport.StreamRemoteCall.getInputStream(StreamRemoteCall.java:133)
        at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:221)
        at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:377)
        at sun.rmi.transport.DGCImpl_Stub.dirty(Unknown Source)
        at sun.rmi.transport.DGCClient$EndpointEntry.makeDirtyCall(DGCClient.java:360)
        at sun.rmi.transport.DGCClient$EndpointEntry.access$1600(DGCClient.java:171)
        at sun.rmi.transport.DGCClient$EndpointEntry$RenewCleanThread.run(DGCClient.java:573)
        at java.lang.Thread.run(Thread.java:636)
16:30:07,379  DEBUG [repo.transaction.RetryingTransactionHelper]
Transaction commit failed:
   Thread: indexTrackerThread1
   Txn:    UserTransaction[object=org.alfresco.util.transaction.SpringAwareUserTransaction@100bfd54, status=4]
   Iteration: 0
   Exception follows:
javax.transaction.RollbackException: Transaction didn't commit: Java heap space
        at org.alfresco.util.transaction.SpringAwareUserTransaction.commit(SpringAwareUserTransaction.java:477)
        at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:336)
        at org.alfresco.repo.node.index.AbstractReindexComponent$ReindexWorkerRunnable.run(AbstractReindexComponent.java:780)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at org.hibernate.loader.Loader.getKeyFromResultSet(Loader.java:1111)
        at org.hibernate.loader.Loader.getRowFromResultSet(Loader.java:565)
        at org.hibernate.loader.Loader.doQuery(Loader.java:701)
        at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
        at org.hibernate.loader.Loader.doList(Loader.java:2213)
        at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2104)
        at org.hibernate.loader.Loader.list(Loader.java:2099)
        at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:378)
        at org.hibernate.hql.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:338)
        at org.hibernate.engine.query.HQLQueryPlan.performList(HQLQueryPlan.java:172)
        at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1121)
        at org.hibernate.impl.QueryImpl.list(QueryImpl.java:79)
        at org.alfresco.repo.node.db.hibernate.HibernateNodeDaoServiceImpl$30.doInHibernate(HibernateNodeDaoServiceImpl.java:2971)
        at org.springframework.orm.hibernate3.HibernateTemplate.execute(HibernateTemplate.java:372)
        at org.springframework.orm.hibernate3.HibernateTemplate.execute(HibernateTemplate.java:338)
        at org.alfresco.repo.node.db.hibernate.HibernateNodeDaoServiceImpl.getParentAssocsInternal(HibernateNodeDaoServiceImpl.java:2974)
        at org.alfresco.repo.node.db.hibernate.HibernateNodeDaoServiceImpl.getParentAssocs(HibernateNodeDaoServiceImpl.java:3005)
        at sun.reflect.GeneratedMethodAccessor241.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:304)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
        at org.alfresco.repo.transaction.TransactionalDaoInterceptor.invoke(TransactionalDaoInterceptor.java:68)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.alfresco.repo.domain.hibernate.DirtySessionMethodInterceptor.invoke(DirtySessionMethodInterceptor.java:381)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
        at $Proxy2.getParentAssocs(Unknown Source)
        at org.alfresco.repo.node.db.DbNodeServiceImpl.getParentAssocs(DbNodeServiceImpl.java:1331)
        at org.alfresco.repo.node.AbstractNodeServiceImpl.getParentAssocs(AbstractNodeServiceImpl.java:580)
        at sun.reflect.GeneratedMethodAccessor246.invoke(Unknown Source)
16:30:14,477  DEBUG [repo.transaction.RetryingTransactionHelper]
Transaction commit failed:
   Thread: indexTrackerThread2
   Txn:    UserTransaction[object=org.alfresco.util.transaction.SpringAwareUserTransaction@77a768c4, status=4]
   Iteration: 0
   Exception follows:
javax.transaction.RollbackException: Transaction didn't commit: Java heap space
        at org.alfresco.util.transaction.SpringAwareUserTransaction.commit(SpringAwareUserTransaction.java:477)
        at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:336)
        at org.alfresco.repo.node.index.AbstractReindexComponent$ReindexWorkerRunnable.run(AbstractReindexComponent.java:780)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Class.java:2444)
        at java.lang.Class.getDeclaredMethod(Class.java:1952)
        at net.sf.cglib.proxy.Enhancer.getCallbacksSetter(Enhancer.java:627)
        at net.sf.cglib.proxy.Enhancer.setCallbacksHelper(Enhancer.java:615)
        at net.sf.cglib.proxy.Enhancer.setThreadCallbacks(Enhancer.java:609)
        at net.sf.cglib.proxy.Enhancer.registerCallbacks(Enhancer.java:578)
        at org.alfresco.repo.domain.hibernate.HibernateLoadListener.onLoad(HibernateLoadListener.java:16)
        at org.hibernate.impl.SessionImpl.fireLoad(SessionImpl.java:878)
        at org.hibernate.impl.SessionImpl.internalLoad(SessionImpl.java:846)
        at org.hibernate.type.EntityType.resolveIdentifier(EntityType.java:557)
        at org.hibernate.type.ManyToOneType.assemble(ManyToOneType.java:196)
        at org.hibernate.type.TypeFactory.assemble(TypeFactory.java:420)
        at org.hibernate.cache.entry.CacheEntry.assemble(CacheEntry.java:96)
        at org.hibernate.cache.entry.CacheEntry.assemble(CacheEntry.java:82)
        at org.hibernate.event.def.DefaultLoadEventListener.assembleCacheEntry(DefaultLoadEventListener.java:557)
        at org.hibernate.event.def.DefaultLoadEventListener.loadFromSecondLevelCache(DefaultLoadEventListener.java:512)
        at org.hibernate.event.def.DefaultLoadEventListener.doLoad(DefaultLoadEventListener.java:357)
        at org.hibernate.event.def.DefaultLoadEventListener.load(DefaultLoadEventListener.java:139)
        at org.hibernate.event.def.DefaultLoadEventListener.proxyOrLoad(DefaultLoadEventListener.java:195)
        at org.hibernate.event.def.DefaultLoadEventListener.onLoad(DefaultLoadEventListener.java:103)
        at org.hibernate.impl.SessionImpl.fireLoad(SessionImpl.java:878)
        at org.hibernate.impl.SessionImpl.get(SessionImpl.java:815)
        at org.hibernate.impl.SessionImpl.get(SessionImpl.java:808)
        at org.alfresco.repo.node.db.hibernate.HibernateNodeDaoServiceImpl.getParentAssocsInternal(HibernateNodeDaoServiceImpl.java:2940)
        at org.alfresco.repo.node.db.hibernate.HibernateNodeDaoServiceImpl.getParentAssocs(HibernateNodeDaoServiceImpl.java:3005)
        at sun.reflect.GeneratedMethodAccessor241.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:304)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
16:30:19,940  ERROR [index.AbstractReindexComponent.threads] Reindexer    62 failed with error: Exception in Transaction..
16:30:19,940  ERROR [index.AbstractReindexComponent.threads] Reindexer    61 failed with error: Exception in Transaction..
16:38:56,118  DEBUG [repo.transaction.RetryingTransactionHelper]
Transaction commit failed:
   Thread: indexTrackerThread3
   Txn:    UserTransaction[object=org.alfresco.util.transaction.SpringAwareUserTransaction@555ce56d, status=4]
   Iteration: 0
   Exception follows:
javax.transaction.RollbackException: Transaction didn't commit: Java heap space
        at org.alfresco.util.transaction.SpringAwareUserTransaction.commit(SpringAwareUserTransaction.java:477)
        at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:336)
        at org.alfresco.repo.node.index.AbstractReindexComponent$ReindexWorkerRunnable.run(AbstractReindexComponent.java:780)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
Caused by: java.lang.OutOfMemoryError: Java heap space
16:38:56,967  ERROR [index.AbstractReindexComponent.threads] Reindexer    63 failed with error: Exception in Transaction..



and it goes on like that.

Do you know how I could get it to start and work on my devel box?

Btw, both machines have 64-bit Sun Java and the same setup, except that the production box has 4 GB of RAM and mine has 2 GB.
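
"java.lang.OutOfMemoryError: Java heap space" refers to the heap ceiling the JVM was started with, which is not necessarily tied to the physical RAM in the box. So it may be worth comparing what the JVM actually gets on each machine. Below is a minimal sketch of such a check. The class name HeapReport is made up, and it assumes you run it with the same java binary and the same memory options (for example, whatever -Xmx value the Tomcat startup uses, which isn't shown in this thread) on both boxes.

```java
// Hypothetical diagnostic for illustration; not part of Alfresco or Tomcat.
class HeapReport {

    // Converts a byte count to whole megabytes (rounded down).
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Formats the heap ceiling, the heap currently committed, and the part of it in use.
    static String describe(long maxBytes, long totalBytes, long freeBytes) {
        return "max=" + toMegabytes(maxBytes) + "MB"
             + ", committed=" + toMegabytes(totalBytes) + "MB"
             + ", used=" + toMegabytes(totalBytes - freeBytes) + "MB";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() is the most heap this JVM will try to use.
        System.out.println(describe(rt.maxMemory(), rt.totalMemory(), rt.freeMemory()));
    }
}
```

If the "max" figure on the devel box is much lower than on production, the heap setting would be the first thing to look at before anything index-related. That is a guess from the stack traces, not something the log proves.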