<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: solr4 folder size and WFSTInputIterator files are very large in Alfresco Archive</title>
    <link>https://connect.hyland.com/t5/alfresco-archive/solr4-folder-size-and-wfstinputiterator-files-are-very-large/m-p/296619#M249749</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;As Axel commented, the size of the indices depends on the number of documents, nodes and related metadata. If your content is mainly text-based (Office, PDF, HTML...), your indices can be a substantial (and important) part of your storage compared to the size of the contentstore. This may not be a problem now, but it can become one as your repository grows. The contentstore may be located on an NFS mount point, while the SOLR indices may be on local disks (or faster disks, for performance), which are generally more expensive. Besides, if the disk holding your indices is slow, you will have IO problems when indexing and searching.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If your index contains many deleted documents (that is, when maxDoc is much larger than numDocs in your searchers), a full reindex can make it smaller. There are other indexing strategies for making your indices smaller:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;disabling full-text indexing in SOLR (if this is possible for your use case)&lt;/LI&gt;&lt;LI&gt;disabling OCR processes (if any, and if possible)&lt;/LI&gt;&lt;LI&gt;disabling automatic metadata extractors (for example, EXIF metadata in images)&lt;/LI&gt;&lt;LI&gt;controlling what gets indexed with the cm:indexControl aspect in Alfresco&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;After applying any of these, reindex. Disabling the archive searcher in SOLR may also help; keep in mind that as your repository grows, your SOLR memory requirements grow too.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Finally, regarding the WFSTInputIterator* files in /tomcat/temp, it is common practice to deactivate the SOLR suggester in &amp;lt;solrRootDir&amp;gt;/workspace-SpacesStore/conf/solrcore.properties (solr.suggester.enabled=false) to avoid these huge files in tomcat/temp. Then you can clean tomcat/temp and restart Alfresco. In fact, this is recommended when migrating from Alfresco 4 to Alfresco 5.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;--C.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 16 Dec 2016 12:24:00 GMT</pubDate>
    <dc:creator>cesarista</dc:creator>
    <dc:date>2016-12-16T12:24:00Z</dc:date>
    <item>
      <title>solr4 folder size and WFSTInputIterator files are very large</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/solr4-folder-size-and-wfstinputiterator-files-are-very-large/m-p/296617#M249747</link>
      <description>Hello, I use Alfresco Community 5.0.d and I find that /alfresco/alf_data/solr4 is very large. I also have very large WFSTInputIterator files (44 GB). Is this normal? Is it possible to decrease the size? Some details on folder sizes: /alfresco/alf_data/contentstore/ : 210 GB /alfresco/alf_</description>
      <pubDate>Thu, 15 Dec 2016 14:34:17 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/solr4-folder-size-and-wfstinputiterator-files-are-very-large/m-p/296617#M249747</guid>
      <dc:creator>mattjourdan</dc:creator>
      <dc:date>2016-12-15T14:34:17Z</dc:date>
    </item>
    <item>
      <title>Re: solr4 folder size and WFSTInputIterator files are very large</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/solr4-folder-size-and-wfstinputiterator-files-are-very-large/m-p/296618#M249748</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;The question "is this normal?" cannot be answered universally. It depends heavily on the amount of metadata associated with your nodes, as well as on the scope and size of the full text that is indexed. Your sizes are definitely not so high that I would consider them extreme or abnormal.&lt;/P&gt;&lt;P&gt;Indices typically fragment over time, so a complete reindex might help reduce their size. Alfresco also provides various templates for SOLR cores, of which the "rerank" template is said to produce more efficient indices. Last but not least, you can technically reduce the amount of full text that is indexed or optimize the amount of metadata you maintain.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 16 Dec 2016 08:44:00 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/solr4-folder-size-and-wfstinputiterator-files-are-very-large/m-p/296618#M249748</guid>
      <dc:creator>afaust</dc:creator>
      <dc:date>2016-12-16T08:44:00Z</dc:date>
    </item>
    <item>
      <title>Re: solr4 folder size and WFSTInputIterator files are very large</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/solr4-folder-size-and-wfstinputiterator-files-are-very-large/m-p/296619#M249749</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;As Axel commented, the size of the indices depends on the number of documents, nodes and related metadata. If your content is mainly text-based (Office, PDF, HTML...), your indices can be a substantial (and important) part of your storage compared to the size of the contentstore. This may not be a problem now, but it can become one as your repository grows. The contentstore may be located on an NFS mount point, while the SOLR indices may be on local disks (or faster disks, for performance), which are generally more expensive. Besides, if the disk holding your indices is slow, you will have IO problems when indexing and searching.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If your index contains many deleted documents (that is, when maxDoc is much larger than numDocs in your searchers), a full reindex can make it smaller. There are other indexing strategies for making your indices smaller:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;disabling full-text indexing in SOLR (if this is possible for your use case)&lt;/LI&gt;&lt;LI&gt;disabling OCR processes (if any, and if possible)&lt;/LI&gt;&lt;LI&gt;disabling automatic metadata extractors (for example, EXIF metadata in images)&lt;/LI&gt;&lt;LI&gt;controlling what gets indexed with the cm:indexControl aspect in Alfresco&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;After applying any of these, reindex. Disabling the archive searcher in SOLR may also help; keep in mind that as your repository grows, your SOLR memory requirements grow too.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Finally, regarding the WFSTInputIterator* files in /tomcat/temp, it is common practice to deactivate the SOLR suggester in &amp;lt;solrRootDir&amp;gt;/workspace-SpacesStore/conf/solrcore.properties (solr.suggester.enabled=false) to avoid these huge files in tomcat/temp. Then you can clean tomcat/temp and restart Alfresco. In fact, this is recommended when migrating from Alfresco 4 to Alfresco 5.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;--C.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 16 Dec 2016 12:24:00 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/solr4-folder-size-and-wfstinputiterator-files-are-very-large/m-p/296619#M249749</guid>
      <dc:creator>cesarista</dc:creator>
      <dc:date>2016-12-16T12:24:00Z</dc:date>
    </item>
    <item>
      <title>Re: solr4 folder size and WFSTInputIterator files are very large</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/solr4-folder-size-and-wfstinputiterator-files-are-very-large/m-p/296620#M249750</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks a lot for your answers.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Matthieu&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 16 Dec 2016 12:51:36 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/solr4-folder-size-and-wfstinputiterator-files-are-very-large/m-p/296620#M249750</guid>
      <dc:creator>mattjourdan</dc:creator>
      <dc:date>2016-12-16T12:51:36Z</dc:date>
    </item>
  </channel>
</rss>

