<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Custom Analyzer - Different Query/Index behaviors in Alfresco Archive</title>
    <link>https://connect.hyland.com/t5/alfresco-archive/custom-analyzer-different-query-index-behaviors/m-p/288864#M241994</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I have read through this but could not find a question. Did you intend this as a kind of blog post, or is there something that you want third-party input on?&lt;/P&gt;&lt;P&gt;One of the main differences between SOLR 1 and SOLR 4 is that localized text is no longer indexed in one composite field where the values are prefixed with the locale; instead, each locale now gets its own index field. Since the field is already locale-specific, there is no point in including a locale prefix anymore, and for that reason I assume your analyzer is no longer able to detect the locale during indexing. I don't know what you are using to detect a locale at query time, since it stands to reason that queries are now targeted at the specific field(s) depending on the requested locales, and query text should also no longer include any locale prefixes.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Generally, 98% of Alfresco community members will never customize SOLR-tier components, and as such only very few people will actually be familiar with any SOLR internals. Specific / helpful responses might be few and far between...&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Sat, 19 Nov 2016 15:03:41 GMT</pubDate>
    <dc:creator>afaust</dc:creator>
    <dc:date>2016-11-19T15:03:41Z</dc:date>
    <item>
      <title>Custom Analyzer - Different Query/Index behaviors</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/custom-analyzer-different-query-index-behaviors/m-p/288863#M241993</link>
      <description>Good day, I have a fresh installation of Alfresco 5.1, running Solr 4 by default. I wrote a custom analyzer (for my custom needs) which first detects the language of the document/query and then redirects to the correct analyzer. After doing that, I went to the solr schema files and update all locale f</description>
      <pubDate>Fri, 18 Nov 2016 09:27:48 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/custom-analyzer-different-query-index-behaviors/m-p/288863#M241993</guid>
      <dc:creator>akaisora</dc:creator>
      <dc:date>2016-11-18T09:27:48Z</dc:date>
    </item>
    <item>
      <title>Re: Custom Analyzer - Different Query/Index behaviors</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/custom-analyzer-different-query-index-behaviors/m-p/288864#M241994</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I have read through this but could not find a question. Did you intend this as a kind of blog post, or is there something that you want third-party input on?&lt;/P&gt;&lt;P&gt;One of the main differences between SOLR 1 and SOLR 4 is that localized text is no longer indexed in one composite field where the values are prefixed with the locale; instead, each locale now gets its own index field. Since the field is already locale-specific, there is no point in including a locale prefix anymore, and for that reason I assume your analyzer is no longer able to detect the locale during indexing. I don't know what you are using to detect a locale at query time, since it stands to reason that queries are now targeted at the specific field(s) depending on the requested locales, and query text should also no longer include any locale prefixes.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Generally, 98% of Alfresco community members will never customize SOLR-tier components, and as such only very few people will actually be familiar with any SOLR internals. Specific / helpful responses might be few and far between...&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sat, 19 Nov 2016 15:03:41 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/custom-analyzer-different-query-index-behaviors/m-p/288864#M241994</guid>
      <dc:creator>afaust</dc:creator>
      <dc:date>2016-11-19T15:03:41Z</dc:date>
    </item>
    <item>
      <title>Re: Custom Analyzer - Different Query/Index behaviors</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/custom-analyzer-different-query-index-behaviors/m-p/288865#M241995</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;Thank you very much for your reply.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I do understand that each locale has its own field in Solr 4, as opposed to Solr 1.4. The locale can and will be different from the document's language. That's why I'm using Tika (and experimenting with other libraries) to detect the actual document language from my analyzer. After detecting the language, my analyzer calls the right tokenizer and filters for that language.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This solution works perfectly when I run it on Lucene (v4.9.1), and it also works well in the Alfresco Solr admin panel (for both query and index analysis). I get the same result when I search for text (query) from Alfresco's search. The only problem is during document upload: my analyzer is not able to capture or read the document's text. It (apparently) only captures metadata.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;My only guess is that at index time Alfresco does something that I am not aware of, which makes my analyzer unable to read the document text. So the question is: what might this be?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Note: This might be Solr/Lucene-specific, but this is how my analyzer works:&lt;/P&gt;&lt;P&gt;From the entry point "createComponents(String string, Reader reader)", my analyzer reads the entire "reader" as a string, detects the language and then constructs a StringReader that is sent to the correct Analyzer for that language.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you very much!&lt;/P&gt;&lt;P&gt;-- sora&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 21 Nov 2016 09:21:41 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/custom-analyzer-different-query-index-behaviors/m-p/288865#M241995</guid>
      <dc:creator>akaisora</dc:creator>
      <dc:date>2016-11-21T09:21:41Z</dc:date>
    </item>
    <item>
      <title>Re: Custom Analyzer - Different Query/Index behaviors</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/custom-analyzer-different-query-index-behaviors/m-p/288866#M241996</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Metadata and document content are very likely indexed in two separate operations - at least that is the way it has been in Alfresco for a very long time, even with Lucene and SOLR 1. This is because it can take some time to convert a document into indexable text, and node indexing is typically batched together - so separating the metadata from the content during indexing ensures that all batched nodes are at least metadata-indexed in a reasonable amount of time, while content indexing may lag a bit longer.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 21 Nov 2016 09:40:57 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/custom-analyzer-different-query-index-behaviors/m-p/288866#M241996</guid>
      <dc:creator>afaust</dc:creator>
      <dc:date>2016-11-21T09:40:57Z</dc:date>
    </item>
    <item>
      <title>Re: Custom Analyzer - Different Query/Index behaviors</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/custom-analyzer-different-query-index-behaviors/m-p/288867#M241997</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I have found the issue. It seems that during indexing, Solr does not pass the actual data to the filter chain during analyzer construction. Instead, it takes the filter chain out of the analyzer and then feeds it the data, so there is no way to get the data during analyzer construction.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;My bad, this issue is not related to Alfresco. Thank you for the help!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 24 Nov 2016 11:03:30 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/custom-analyzer-different-query-index-behaviors/m-p/288867#M241997</guid>
      <dc:creator>akaisora</dc:creator>
      <dc:date>2016-11-24T11:03:30Z</dc:date>
    </item>
  </channel>
</rss>

