Lucene search returns fewer than the maximum number of results

tobikl1
Champ in-the-making

Hi, I have a question about Lucene search and moving/renaming nodes.

We are moving some folders from one location to a backup location and then recreating the same folder structure in the base folder by copying it from a Space Template. Then, in a JavaScript webscript, we move the existing files from the old folder structure into the newly created one.

The problem: when we call search.query with a Lucene query and a maximum result set of, e.g., 10, we initially get 10 results. After running this JavaScript webscript, we only get 9 results, although we would still expect 10.

We have lots of folders with a specific folder structure inside, and rules applied to several of these folders. Because this structure has become outdated on our production server, we are trying to rebuild it from scratch while keeping the original files. To do so, we move all files and folders from the base folder to a backup folder, recreate the folder structure by copying it from a Space Templates folder, and then move some of the existing files from the backup into the newly created subfolders.

E.g.:

We have the folder "Apps/<App_Name>/Channels/General" containing a file "<App_Name>.xml".

In the script, we create a folder "Apps Backup/<App_Name>_backup_<timestamp>" and move the folder "Apps/<App_Name>/Channels" to the folder "Apps Backup/<App_Name>_backup_<timestamp>".

In the same script, we then recreate the folder structure in "Apps/<App_Name>" by copying "Channels/General" from a Space Templates folder. Finally, we move "<App_Name>.xml" from the old General folder (which is now in the backup) into the newly created General folder.

In "Apps" we have 120k+ folders with this structure, but over time we have added new features to our apps, and therefore new rules have been applied to the various subfolders. Since JavaScript has no direct access to the RuleService, we thought a viable way to update the rules would be to recreate the folders from the Space Templates folders, which carry the most recent rule configuration.

When we search for these XML files with search.query, filtering by path and by the specialized XML file type and limiting the results to 10, we get 10 XML files as expected. But once we run the described script on one of the folders containing one of the returned files, the next search (again with 10 max results) returns only 9, even though it should always return 10, since we have more than 120k of these XML files. My guess is that the Lucene search gets something mixed up: it returns some kind of DB ID that can no longer be mapped to a node, and therefore discards that hit from the result, leaving us with only 9 nodes.
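To make the suspected mechanism concrete, here is a toy Python model. It is purely hypothetical and not Alfresco's code: it assumes the index hands back at most maxItems hits, and the repository layer then drops hits that no longer resolve to a node (or that repeat a node already on the page) without fetching replacement hits. `resolve_page` and the sample IDs are made up for illustration.

```python
# Toy model of the suspected behaviour; NOT Alfresco's implementation.
# Assumption: the index hands back at most max_items hit IDs, and the
# repository layer then resolves each one to a live node, silently
# dropping hits that cannot be resolved (or that repeat a node already
# on the page) without fetching replacement hits.


def resolve_page(hit_ids, live_nodes, max_items):
    """Return the node refs left on one result page after post-filtering."""
    page = hit_ids[:max_items]  # the index only returns max_items hits
    seen = set()
    result = []
    for hit_id in page:
        node_ref = live_nodes.get(hit_id)  # None: ID no longer maps to a node
        if node_ref is None or node_ref in seen:
            continue  # dropped, and no later hit is pulled in to refill
        seen.add(node_ref)
        result.append(node_ref)
    return result


if __name__ == "__main__":
    # Hypothetical data: 11 hits in the index, but hit 4 is a stale entry
    # that no longer resolves to a node after the move.
    live = {i: f"node-{i}" for i in range(1, 12) if i != 4}
    hits = list(range(1, 12))
    print(len(resolve_page(hits, live, 10)))  # 9
```

If this model is roughly right, the page is short because filtering happens after the index has already cut the result to maxItems, which would match seeing 9 instead of 10.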

I don't see how this could happen and hope you can give me some insight.

The query we use is:

{
   "query": "PATH:\"/app:company_home/cm:Apps/cm:*/cm:Channels/cm:General/cm:*\" AND TYPE:\"sbd:sam\" AND ASPECT:\"praxis:baseSAM\" AND @praxis\\:visible:\"false\" AND @sbd\\:city:\"Köln\"",
   "sort": [
      {
         "column": "name",
         "ascending": true
      }
   ],
   "language": "lucene",
   "page": {
      "maxItems": 10,
      "skipCount": 0
   }
}
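To check which locations the PATH clause should still match after the script runs, here is a simplified Python model. It is not Alfresco's PATH parser: it assumes each `cm:*` step matches exactly one path segment and ignores `//` and other XPath features. The app name "MyApp" and the backup suffix are hypothetical, and "Apps Backup" is written ISO 9075-encoded (`cm:Apps_x0020_Backup`) as it would appear in a PATH.

```python
from fnmatch import fnmatchcase

# The PATH clause from the query above.
QUERY_PATH = "/app:company_home/cm:Apps/cm:*/cm:Channels/cm:General/cm:*"


def path_matches(pattern, path):
    """Simplified PATH check: same depth, each segment glob-matched.

    Not Alfresco's PATH parser: '//' (descendant) steps and other XPath
    features are ignored, and '*' never spans more than one segment.
    """
    pattern_segments = pattern.strip("/").split("/")
    path_segments = path.strip("/").split("/")
    if len(pattern_segments) != len(path_segments):
        return False
    return all(
        fnmatchcase(segment, pat)
        for segment, pat in zip(path_segments, pattern_segments)
    )


if __name__ == "__main__":
    # Hypothetical app name "MyApp"; "Apps Backup" appears ISO 9075-encoded.
    moved_file = "/app:company_home/cm:Apps/cm:MyApp/cm:Channels/cm:General/cm:MyApp.xml"
    backup_file = ("/app:company_home/cm:Apps_x0020_Backup/cm:MyApp_backup_20161027"
                   "/cm:Channels/cm:General/cm:MyApp.xml")
    print(path_matches(QUERY_PATH, moved_file))   # True
    print(path_matches(QUERY_PATH, backup_file))  # False
```

Under these assumptions the moved file in the new General folder still matches, while nothing under the backup folder does, so the number of matching files should be unchanged by the script.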

Best regards,

Tobias Kleigrewe


mrks_js1
Star Contributor

Hi!

Bulk copy/move operations like these often cause problems with the index.

Could you gather some index-related information, for example the output of: https://your_alfresco_host/solr4/admin/cores?action=REPORT&wt=xml

And which version of Alfresco are you running?
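If you prefer to fetch and read the report from a script, a minimal Python sketch follows. It assumes the `response/lst[@name='report']/lst[@name='alfresco']` layout that the REPORT action returns with `wt=xml`, and that the endpoint is reachable with a plain HTTPS request; a default install may require an SSL client certificate for `/solr4`, which is not handled here. `parse_report` and `fetch_report` are illustrative names.

```python
import urllib.request
import xml.etree.ElementTree as ET


def parse_report(xml_text):
    """Flatten the 'alfresco' core section of a REPORT response to a dict.

    Assumes the layout returned by action=REPORT&wt=xml:
    <response><lst name="report"><lst name="alfresco">...</lst></lst></response>
    """
    root = ET.fromstring(xml_text)
    core = root.find("./lst[@name='report']/lst[@name='alfresco']")
    if core is None:
        raise ValueError("no 'alfresco' core section in report")
    values = {}
    for child in core:
        text = (child.text or "").strip()
        # <long>/<int> hold numbers, <str> holds text such as dates.
        values[child.get("name")] = int(text) if child.tag in ("long", "int") else text
    return values


def fetch_report(solr_base_url):
    """Fetch and parse the REPORT; solr_base_url is e.g. https://host/solr4.

    Assumption: the endpoint is reachable without extra setup. A default
    install may require an SSL client certificate for /solr4, not handled here.
    """
    url = solr_base_url + "/admin/cores?action=REPORT&wt=xml"
    with urllib.request.urlopen(url) as response:
        return parse_report(response.read())
```

For example, `fetch_report("https://your_alfresco_host/solr4")["Index error count"]` would return the error counter as an integer.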

tobikl1
Champ in-the-making

Hi, thanks for the quick reply.

We are running:

Alfresco Community v5.0.0 (c r91299-b145) schema 8009
Spring Surf and Spring WebScripts - v5.0.0 (Release)

Below is the report you asked for, taken from our test server. It shows the same behaviour and the same wrong search results, but with fewer folders in the "Apps" folder:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">52523</int>
  </lst>
  <lst name="report">
    <lst name="alfresco">
      <str name="Alfresco version">5.0.0</str>
      <long name="DB acl transaction count">1066</long>
      <long name="Count of duplicated acl transactions in the index">0</long>
      <long name="Count of acl transactions in the index but not the DB">4</long>
      <long name="First acl transaction in the index but not the DB">88</long>
      <long name="Count of missing acl transactions from the Index">0</long>
      <long name="Index acl transaction count">1787</long>
      <long name="Index unique acl transaction count">1787</long>
      <long name="Last indexed change set commit time">1477579611985</long>
      <str name="Last indexed change set commit date">2016-10-27T16:46:51</str>
      <long name="Last changeset id before holes">-1</long>
      <long name="DB transaction count">56832</long>
      <long name="Count of duplicated transactions in the index">0</long>
      <long name="Count of transactions in the index but not the DB">5042</long>
      <long name="First transaction in the index but not the DB">43</long>
      <long name="Count of missing transactions from the Index">0</long>
      <long name="Index transaction count">178432</long>
      <long name="Index unique transaction count">178432</long>
      <long name="Index node count">72361</long>
      <long name="Count of duplicate nodes in the index">100</long>
      <long name="First duplicate node id in the index">6179</long>
      <long name="Index error count">4</long>
      <long name="Count of duplicate error docs in the index">0</long>
      <long name="Index unindexed count">3115</long>
      <long name="Count of duplicate unindexed docs in the index">17</long>
      <long name="First duplicate unindexed in the index">66049</long>
      <long name="Last indexed transaction commit time">1477650600945</long>
      <str name="Last indexed transaction commit date">2016-10-28T12:30:00</str>
      <long name="Last TX id before holes">-1</long>
      <long name="Node count with FTSStatus Clean">30597</long>
      <long name="Node count with FTSStatus Dirty">0</long>
      <long name="Node count with FTSStatus New">0</long>
    </lst>
  </lst>
</response>

mrks_js1
Star Contributor

Well, it would make sense to compare the report from test with the one from prod. Check the values for unindexed and error nodes.

What happens if you run the same query in the Node Browser? Same behavior?

tobikl1
Champ in-the-making

Hi,

Sorry it took me a while to reply. Below is the report from the production server. As you can see, we have quite a number of unindexed and error nodes on both the production and the test server. Now I am wondering whether it is possible to repair these nodes one by one manually, or whether I have to build a completely new index.

Regarding the bulk copy and move operations, we are now trying a different approach to updating the rules on these folders: exposing the RuleService to JavaScript. That way we probably won't damage the index any more, since we no longer have to copy and move folders.

Either way, we would like to get our index back into a healthy state. We tried the FIX and RETRY actions, but the errors persisted.

I'm not experienced with Solr; could you point me in the right direction for fixing these issues?
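For a node-by-node attempt, here is a small Python helper that only builds the admin URLs rather than calling them. It assumes this Solr 4 build accepts `action=NODEREPORT` and `action=REINDEX` with a `nodeid` parameter alongside the FIX and RETRY actions mentioned above; that should be verified against the Solr core admin documentation for Alfresco 5.0. `admin_url` and `node_repair_urls` are illustrative names.

```python
from urllib.parse import urlencode


def admin_url(solr_base_url, action, **params):
    """Build a Solr core admin URL such as .../admin/cores?action=FIX."""
    query = urlencode({"action": action, **params})
    return f"{solr_base_url}/admin/cores?{query}"


def node_repair_urls(solr_base_url, node_ids):
    """For each node: a NODEREPORT URL to inspect it, then a REINDEX URL.

    Assumption: this Solr 4 build supports action=NODEREPORT and
    action=REINDEX with a nodeid parameter; verify for Alfresco 5.0.
    """
    urls = []
    for node_id in node_ids:
        urls.append(admin_url(solr_base_url, "NODEREPORT", nodeid=node_id))
        urls.append(admin_url(solr_base_url, "REINDEX", nodeid=node_id))
    return urls


if __name__ == "__main__":
    base = "https://your_alfresco_host/solr4"
    print(admin_url(base, "FIX"))
    print(admin_url(base, "RETRY"))
    # 6179 is the "First duplicate node id in the index" from the test report.
    for url in node_repair_urls(base, [6179]):
        print(url)
```

Inspecting with NODEREPORT before REINDEX keeps the repair step explicit per node; with thousands of affected nodes this approach would likely be impractical.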

Thanks for your help and best regards,

Tobias Kleigrewe

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">94942</int>
  </lst>
  <lst name="report">
    <lst name="alfresco">
      <str name="Alfresco version">5.0.0</str>
      <long name="DB acl transaction count">3042</long>
      <long name="Count of duplicated acl transactions in the index">0</long>
      <long name="Count of acl transactions in the index but not the DB">229</long>
      <long name="First acl transaction in the index but not the DB">19</long>
      <long name="Count of missing acl transactions from the Index">0</long>
      <long name="Index acl transaction count">9189</long>
      <long name="Index unique acl transaction count">9189</long>
      <long name="Last indexed change set commit time">1478074760182</long>
      <str name="Last indexed change set commit date">2016-11-02T09:19:20</str>
      <long name="Last changeset id before holes">-1</long>
      <long name="DB transaction count">107188</long>
      <long name="Count of duplicated transactions in the index">0</long>
      <long name="Count of transactions in the index but not the DB">2784</long>
      <long name="First transaction in the index but not the DB">136</long>
      <long name="Count of missing transactions from the Index">0</long>
      <long name="Index transaction count">187530</long>
      <long name="Index unique transaction count">187530</long>
      <long name="Index node count">24337988</long>
      <long name="Count of duplicate nodes in the index">100</long>
      <long name="First duplicate node id in the index">5067</long>
      <long name="Index error count">15</long>
      <long name="Count of duplicate error docs in the index">0</long>
      <long name="Index unindexed count">4141</long>
      <long name="Count of duplicate unindexed docs in the index">3</long>
      <long name="First duplicate unindexed in the index">57254</long>
      <long name="Last indexed transaction commit time">1478084498955</long>
      <str name="Last indexed transaction commit date">2016-11-02T12:01:38</str>
      <long name="Last TX id before holes">-1</long>
      <long name="Node count with FTSStatus Clean">2393970</long>
      <long name="Node count with FTSStatus Dirty">1</long>
      <long name="Node count with FTSStatus New">0</long>
    </lst>
  </lst>
</response>
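To compare the two servers as suggested (unindexed and error nodes), along with the related duplicate and "in the index but not the DB" counters, the values can be lined up in a short Python script. The numbers are copied from the test and production reports in this thread; which counters to watch is our selection, and whether each non-zero value actually indicates damage is not established here.

```python
# Counters to compare; values are copied from the test and production
# REPORT outputs in this thread.
HEALTH_KEYS = [
    "Count of acl transactions in the index but not the DB",
    "Count of transactions in the index but not the DB",
    "Count of duplicate nodes in the index",
    "Index error count",
    "Index unindexed count",
    "Count of duplicate unindexed docs in the index",
    "Node count with FTSStatus Dirty",
]

TEST_REPORT = dict(zip(HEALTH_KEYS, [4, 5042, 100, 4, 3115, 17, 0]))
PROD_REPORT = dict(zip(HEALTH_KEYS, [229, 2784, 100, 15, 4141, 3, 1]))


def nonzero_counters(report):
    """Return only the selected health counters that are not zero."""
    return {key: report[key] for key in HEALTH_KEYS if report.get(key, 0) != 0}


def print_side_by_side(test, prod):
    """Print one row per counter with the test and prod values."""
    print(f"{'counter':55} {'test':>6} {'prod':>6}")
    for key in HEALTH_KEYS:
        print(f"{key:55} {test[key]:>6} {prod[key]:>6}")


if __name__ == "__main__":
    print_side_by_side(TEST_REPORT, PROD_REPORT)
    print(sorted(nonzero_counters(TEST_REPORT)))
```

Both servers report non-zero values for most of these counters, which is consistent with both showing the same search behaviour.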