What is the best way to selectively re-index certain properties for an install with SOLR?

binduwavell
Star Contributor
This is a hypothetical and open-ended question, but we have run into it quite a few times, so it is a real issue.

Imagine I have a property in my data model that has a certain indexing configuration (tokenization, etc.), and for some reason we need to change that configuration in a live system. What tools and techniques do we have to get the index updated without re-indexing the whole repository?

Some real-world examples (somewhat generalized):
<ul>
<li>We have a non-tokenized date property, that we now need to tokenize in order to do range searches on it.</li>
<li>We have a non-tokenized text property that has a constrained list of values. The business requirement changes and now we have to make the field free-text editable, so we need to tokenize the field.</li>
<li>We have a property that is not indexed at all, and now we need to mark it for indexing.</li>
<li>We chose not to index certain types of documents; the business requirements change and we now need those documents to be indexed.</li>
</ul>

Approaches we have considered:
<ul>
<li>Blow away the index and just re-build the whole thing.</li>
<li>Write a script that "touches" each affected node, causing it to be fully re-indexed. We are unsure whether auditing and versioning behaviors can be disabled so that the nodes get re-indexed without being recorded as updated.</li>
<li>Write a script that finds the transaction ids for each affected node and then uses the specialized SOLR URLs to request re-indexing of each of those transactions.</li>
<ul>
<li>Obviously we'd have to blow away the model cache and have it rebuilt with the updated model first. Unsure what issues this would cause.</li>
</ul>
<li>Similar to the above: I think there are SOLR URLs for re-indexing individual nodes rather than full transactions, but it is unclear whether this ends up being the same thing.</li>
</ul>

The ideal would be to post a URL to SOLR that lists one or more properties. SOLR would then find all content with those properties and update the index information for just those properties. For example, if I want to update a date property, I don't really want to re-index all of the document contents.

Questions, comments & suggestions greatly appreciated!

andy
Champ on-the-rise
Hi

At the moment there is no easy way.
You could find the ids of all affected nodes and ask SOLR to reindex those nodes one at a time; a single node can be reindexed via a URL:

  http://localhost:8080/solr/admin/cores?action=REINDEX&nodeid=3

At some point we will make this take a list.
Reindexing a node covers just that node: it assumes you know what you are doing, and only the node specified is re-indexed.
It will not change anything else in the transaction/index.
It is possible the node has been updated in an as-yet un-indexed transaction, but that is unlikely to be an issue.
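
Until the URL accepts a list, the per-node requests can be scripted. The following is a minimal sketch rather than a supported tool: it assumes the admin URL shown above is reachable as-is, and it does not handle any SSL or authentication your install may require. The `fetch` parameter is a hypothetical hook added only so the loop can be dry-run without a live SOLR instance.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the admin endpoint from the post above; adjust host/port for your install.
SOLR_ADMIN_URL = "http://localhost:8080/solr/admin/cores"


def reindex_url(node_id, base_url=SOLR_ADMIN_URL):
    """Build the single-node REINDEX URL described above."""
    return base_url + "?" + urlencode({"action": "REINDEX", "nodeid": node_id})


def reindex_nodes(node_ids, fetch=None):
    """Request a reindex for each node id, one request per node.

    `fetch` is a hypothetical hook (url -> response body) so the loop can be
    dry-run or tested without contacting SOLR.
    """
    if fetch is None:
        # Default: perform a real HTTP GET against the REINDEX URL.
        fetch = lambda url: urlopen(url, timeout=30).read()
    results = {}
    for node_id in node_ids:
        results[node_id] = fetch(reindex_url(node_id))
    return results
```

Passing something like `fetch=print` shows the URLs that would be requested without touching the index.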

For the generic case we intend to add re-index by type or property, using a generic re-index by query.
So "at some point" we will have

http://localhost:8080/solr/admin/cores?action=REINDEX&query=TYPE:"my:type"

to support a limited rebuild of the index.

The query approach would be very flexible.

There is SOLR support for deleting by query which you could use for one of your use cases.

We always sync models at the start of tracking, but they may not update if there is a clash with an existing definition that would require a reindex.
Currently you have to delete the conflicting cached model by hand to get the updated model.
We need an API to force this too.

Hope this helps

Andy

sunnrunner
Champ in-the-making
Using Alfresco 4.0d, I'm trying to use

http://localhost:8080/solr/admin/cores?action=REINDEX&query=TYPE:"my:type"

Is the nodeId taken from the id column of the node table in Alfresco?

It is either not working or not working as expected.

I make a change to the object using CMIS. I then hit the above URL with what I assume is the appropriate id, and I get this response:


<?xml version="1.0" encoding="UTF-8" ?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
</lst>
</response>

So, did the index actually update, or should I see something else? When I run a CMIS query I still get the old value of the object until the indexer runs again. I am looking for an immediate update solution. Is this possible?
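
For anyone scripting this check: the response above can be read programmatically. The sketch below only extracts the `status` value from the `responseHeader`. As far as I know, a status of 0 is SOLR's way of saying the request was handled without error; it does not by itself show that the node has already been re-indexed. The function name is made up for illustration.

```python
import xml.etree.ElementTree as ET


def response_status(xml_text):
    """Return the integer `status` from a SOLR responseHeader, or None if absent."""
    # strip() guards against leading whitespace before the XML declaration.
    root = ET.fromstring(xml_text.strip())
    node = root.find("./lst[@name='responseHeader']/int[@name='status']")
    return int(node.text) if node is not None else None
```

A non-zero or missing status would be a reason to look at the SOLR logs; a zero status still leaves open whether the change is visible to queries yet.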

Thanks in advance.

I have also tried switching to Lucene to see if that would give us what we are looking for. However, our custom fields are not being indexed. I've tried adding index enabled="true", with no luck.

I can do a select * from i3b:batchFolder and I see the documents, but when I try select * from i3b:batchFolder where i3b:batchFolder:queue = 'somestring'; I don't get anything.


        <type name="i3b:batchFolder">
            <title>Batch</title>
            <parent>cm:folder</parent>
            <properties>
                <property name="i3b:batchFolder:pageCount">
                    <title>Page Count</title>
                    <type>d:text</type>
                </property>
                <property name="i3b:batchFolder:status">
                    <title>Status</title>
                    <type>d:text</type>
                    <default>new</default>
                </property>
                <property name="i3b:batchFolder:queue">
                    <title>Queue</title>
                    <type>d:text</type>
                    <default>Unidentified</default>
                    <index enabled="true">
                        <atomic>true</atomic>
                        <stored>true</stored>
                        <tokenised>true</tokenised>
                    </index>
                </property>
            </properties>
        </type>

andy
Champ on-the-rise
Hi

Just noticed the last bit again ….

When we upgrade SOLR we will (most likely) split content out from properties (as we did with Lucene),
so that we can track content and metadata on their own. We can then support reindex for either or both.

Andy 

binduwavell
Star Contributor
Andy,

As usual, a great detailed response, thanks a ton!