<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Bad Lucene index distribution among segments in Alfresco Archive</title>
    <link>https://connect.hyland.com/t5/alfresco-archive/bad-lucene-index-distribution-among-segments/m-p/306850#M259980</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hello,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;On a server with a bit more than 1 million documents, I have noticed this poor distribution of documents across the Lucene index segments:&lt;/SPAN&gt;&lt;BR /&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;&lt;BR /&gt;09:00:00,440 DEBUG [org.alfresco.repo.search.impl.lucene.index.IndexInfo] &lt;BR /&gt;Entry List&lt;BR /&gt;1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Name=cb6b6534-1162-403c-acea-59b5d9c55dba Type=INDEX Status=COMMITTED Docs=2445933 Deletions=0 &lt;BR /&gt;2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Name=e9b269b1-2206-448d-b98d-b52e45093000 Type=INDEX Status=COMMITTED Docs=13284 Deletions=0 &lt;BR /&gt;3&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Name=16aa5e8a-f42f-413b-b01f-30b22e77f29f Type=INDEX Status=COMMITTED Docs=4199 Deletions=0 &lt;BR /&gt;4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Name=784c8097-1758-4d1f-8815-4ed84623781d Type=INDEX Status=COMMITTED Docs=2688 Deletions=0 &lt;BR /&gt;5&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Name=ee0adccf-247a-4a9c-85c9-a43386bb7a5a Type=INDEX Status=COMMITTED Docs=321 Deletions=0 &lt;BR /&gt;6&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Name=657e4ccb-b69f-4525-95c4-388bfb7a920e Type=INDEX Status=COMMITTED Docs=64 Deletions=0 &lt;BR /&gt;7&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Name=b3e937ca-c0d6-43ea-af39-a0f7ca88a6bf Type=INDEX Status=COMMITTED Docs=27 Deletions=0 &lt;BR /&gt;8&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Name=bc0f647b-0e3f-4bfd-a487-6d47062ff42a Type=INDEX Status=COMMITTED Docs=12 Deletions=0 &lt;BR /&gt;9&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Name=514d7c32-93eb-4035-852d-e685f68a608a Type=DELTA Status=ACTIVE Docs=0 Deletions=0&lt;BR /&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I use Alfresco 4.2-b, and the configuration in alfresco-global.properties is:&lt;/SPAN&gt;&lt;BR /&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;&lt;BR /&gt;### Lucene indexing ###&lt;BR /&gt;index.subsystem.name=lucene&lt;BR /&gt;lucene.indexer.mergerTargetIndexCount=10&lt;BR /&gt;lucene.indexer.mergerMergeFactor=10&lt;BR /&gt;lucene.indexer.writerMergeFactor=10&lt;BR /&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;This does not look good at all. On another server, with the same index configuration and a comparable number of documents, I get a far better segment distribution.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;How can I force Lucene to optimize its index without doing a full reindex?&lt;/SPAN&gt;&lt;BR /&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Tue, 13 Oct 2015 15:17:07 GMT</pubDate>
    <dc:creator>rivarola</dc:creator>
    <dc:date>2015-10-13T15:17:07Z</dc:date>
    <item>
      <title>Bad Lucene index distribution among segments</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/bad-lucene-index-distribution-among-segments/m-p/306850#M259980</link>
      <description>Hello, On a server with a bit more than 1 million documents, I have noticed this bad Lucene index segments distribution: 09:00:00,440 DEBUG [org.alfresco.repo.search.impl.lucene.index.IndexInfo] Entry List 1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Name=cb6b6534-1162-403c-acea-59b5d9c55dba Type=INDEX Status=COMMITTED Docs=2445933 Dele</description>
      <pubDate>Tue, 13 Oct 2015 15:17:07 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/bad-lucene-index-distribution-among-segments/m-p/306850#M259980</guid>
      <dc:creator>rivarola</dc:creator>
      <dc:date>2015-10-13T15:17:07Z</dc:date>
    </item>
    <item>
      <title>Re: Bad Lucene index distribution among segments</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/bad-lucene-index-distribution-among-segments/m-p/306851#M259981</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hello,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;What precisely do you consider bad about this distribution? It actually looks OK to me: not perfect, but a realistic result of normal operations.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;In your case, the only thing that seems off to me is the ratio between index segments #1 and #2. Ideally, the two should be within an order of magnitude of each other, not several orders apart. But this might be the result of a recent round of merging, with segment #2 only now starting to fill up again as new documents / changes come in.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Otherwise the progression is quite decent. You might even make do with fewer segments to improve search performance, since you (currently) have very few documents in the three highest-numbered segments.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I am not aware of any method to optimize the segment structure apart from performing a full reindex. The only alternative I can think of involves writing a low-level Lucene tool that creates a new index by piping the contents of the old one into it. That would optimise the index implicitly by "writing anew", without any actual reindexing that requires access to the Alfresco DB / content.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Regards&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Axel&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 14 Oct 2015 08:21:16 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/bad-lucene-index-distribution-among-segments/m-p/306851#M259981</guid>
      <dc:creator>afaust</dc:creator>
      <dc:date>2015-10-14T08:21:16Z</dc:date>
    </item>
    <item>
      <title>Re: Bad Lucene index distribution among segments</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/bad-lucene-index-distribution-among-segments/m-p/306852#M259982</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi Axel,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Two things seem strange to me:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;- I asked for 10 segments and there are only 8&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;- the first one is far too big: roughly 200 times the second. Usually the biggest is only 3 times the second.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; Philippe&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 14 Oct 2015 13:46:03 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/bad-lucene-index-distribution-among-segments/m-p/306852#M259982</guid>
      <dc:creator>rivarola</dc:creator>
      <dc:date>2015-10-14T13:46:03Z</dc:date>
    </item>
    <item>
      <title>Re: Bad Lucene index distribution among segments</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/bad-lucene-index-distribution-among-segments/m-p/306853#M259983</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;You shouldn't expect an even spread of documents across segments.&amp;nbsp; Lucene does not use segments to partition the data the way, say, a hash bucket would; it simply creates a segment as and when it needs to persist its in-memory state.&amp;nbsp; A background process then merges old segments together and removes deleted documents from the new, merged segment.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 14 Oct 2015 16:17:19 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/bad-lucene-index-distribution-among-segments/m-p/306853#M259983</guid>
      <dc:creator>mrogers</dc:creator>
      <dc:date>2015-10-14T16:17:19Z</dc:date>
    </item>
    <item>
      <title>Re: Bad Lucene index distribution among segments</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/bad-lucene-index-distribution-among-segments/m-p/306854#M259984</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hello Philippe,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;You have set a "target factor" of 10, which Lucene uses as an input to its optimization / distribution, but it is a "target", not a guarantee. Much like the MERGE index segments, which are created / deleted as needed and almost never number exactly what the merger target factor specifies, the normal index also goes through creation / deletion cycles. The 200-times-larger first segment may very well be the result of Lucene merging the segments previously at positions 1 through 3 into one large segment at position 1, which temporarily leaves 2 fewer segments than the target until Lucene re-creates them when needed. With those previous segments in place, the progression might have been less extreme and more in line with your expectations.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Regards&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Axel&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 15 Oct 2015 08:09:09 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/bad-lucene-index-distribution-among-segments/m-p/306854#M259984</guid>
      <dc:creator>afaust</dc:creator>
      <dc:date>2015-10-15T08:09:09Z</dc:date>
    </item>
  </channel>
</rss>

