<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Lucene tokenization in Alfresco Archive</title>
    <link>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232568#M185698</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;The Lucene standard analyser, which we wrap, does indeed do some funny things.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;We did not appreciate this in the dim and distant past. It is now a pain to change this default, as everyone would be forced to reindex.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;The standard analyser tries to auto-detect dates, computer names, emails, product codes, acronyms, etc., and may end up grouping tokens together when they are separated by "/", "-", or ".", amongst others.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;However, it is also good as a general cross-language default.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;The way to avoid this is to use another analyser&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;OR&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;tokenise as both, then use Alfresco FTS with "=" to force the use of the untokenised field, and use pattern matching.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Andy&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 15 Oct 2010 14:35:58 GMT</pubDate>
    <dc:creator>andy</dc:creator>
    <dc:date>2010-10-15T14:35:58Z</dc:date>
    <item>
      <title>Lucene tokenization</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232551#M185681</link>
      <description>Searches for files with underscores in the file name are currently unpredictable and returning no results in some cases.&amp;nbsp; How can I prevent indexing from tokenizing file names with underscores into separate tokens?</description>
      <pubDate>Wed, 23 Sep 2009 15:59:59 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232551#M185681</guid>
      <dc:creator>morgand</dc:creator>
      <dc:date>2009-09-23T15:59:59Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene tokenization</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232552#M185682</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;I added some handling for underscores in the alfrescostandardfilter, but I am still having problems when the underscore is followed by a single digit, as in testarticle_1.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Does anyone have any insight?&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 07 Oct 2009 14:47:30 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232552#M185682</guid>
      <dc:creator>morgand</dc:creator>
      <dc:date>2009-10-07T14:47:30Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene tokenization</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232553#M185683</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;I encountered the same issue. Once you understand how Lucene indexes, you will find that characters such as underscores, dashes, etc. are not included in the tokens.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;One possible solution is to add a custom property on the node (of the content item) to capture the file name and tell Lucene not to tokenize this field. It would look like this:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;&lt;BR /&gt;&amp;lt;property name="xxx:filename_property"&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;description&amp;gt;Untokenised filename used by Lucene queries&amp;lt;/description&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;type&amp;gt;d:text&amp;lt;/type&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;mandatory&amp;gt;true&amp;lt;/mandatory&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;multiple&amp;gt;false&amp;lt;/multiple&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;index enabled="true"&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;tokenised&amp;gt;false&amp;lt;/tokenised&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;/index&amp;gt;&lt;BR /&gt;&amp;lt;/property&amp;gt;&lt;BR /&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;BR /&gt;&lt;SPAN&gt;Please confirm if this works.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 07 Oct 2009 18:22:19 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232553#M185683</guid>
      <dc:creator>rliu</dc:creator>
      <dc:date>2009-10-07T18:22:19Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene tokenization</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232554#M185684</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;BLOCKQUOTE class="jive-quote"&gt;I encountered the same issue. Once you understand how Lucene indexes, you will find that characters such as underscores, dashes, etc. are not included in the tokens.&lt;BR /&gt;&lt;BR /&gt;One possible solution is to add a custom property on the node (of the content item) to capture the file name and tell Lucene not to tokenize this field. It would look like this:&lt;BR /&gt;&lt;BR /&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;&lt;BR /&gt;&amp;lt;property name="xxx:filename_property"&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;description&amp;gt;Untokenised filename used by Lucene queries&amp;lt;/description&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;type&amp;gt;d:text&amp;lt;/type&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;mandatory&amp;gt;true&amp;lt;/mandatory&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;multiple&amp;gt;false&amp;lt;/multiple&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;index enabled="true"&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;tokenised&amp;gt;false&amp;lt;/tokenised&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;/index&amp;gt;&lt;BR /&gt;&amp;lt;/property&amp;gt;&lt;BR /&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;BR /&gt;Please confirm if this works.&lt;/BLOCKQUOTE&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;This was actually the first thing I tried, but I found that because the filename wasn't tokenized, searches for partial filenames weren't coming back consistently.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 08 Oct 2009 13:39:48 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232554#M185684</guid>
      <dc:creator>morgand</dc:creator>
      <dc:date>2009-10-08T13:39:48Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene tokenization</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232555#M185685</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;What does your Lucene query syntax look like?&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 08 Oct 2009 16:01:24 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232555#M185685</guid>
      <dc:creator>rliu</dc:creator>
      <dc:date>2009-10-08T16:01:24Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene tokenization</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232556#M185686</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Did you try this?&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;&amp;lt;index enabled="true"&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;atomic&amp;gt;true&amp;lt;/atomic&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;stored&amp;gt;false&amp;lt;/stored&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;tokenised&amp;gt;false&amp;lt;/tokenised&amp;gt;&lt;BR /&gt;&amp;lt;/index&amp;gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;SPAN&gt;This will be applied when you create new content. If you want it applied to the old content, you'll have to reindex Alfresco.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;That's what I've done, and it worked fine.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Regards&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 16 Oct 2009 09:29:41 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232556#M185686</guid>
      <dc:creator>tdt</dc:creator>
      <dc:date>2009-10-16T09:29:41Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene tokenization</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232557#M185687</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;BLOCKQUOTE class="jive-quote"&gt;Hi,&lt;BR /&gt;&lt;BR /&gt;Did you try this?&lt;BR /&gt;&lt;BR /&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;&amp;lt;index enabled="true"&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;atomic&amp;gt;true&amp;lt;/atomic&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;stored&amp;gt;false&amp;lt;/stored&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;tokenised&amp;gt;false&amp;lt;/tokenised&amp;gt;&lt;BR /&gt;&amp;lt;/index&amp;gt;&lt;/CODE&gt;&lt;/PRE&gt;This will be applied when you create new content. If you want it applied to the old content, you'll have to reindex Alfresco.&lt;BR /&gt;That's what I've done, and it worked fine.&lt;BR /&gt;&lt;BR /&gt;Regards&lt;/BLOCKQUOTE&gt;&lt;BR /&gt;&lt;SPAN&gt;Yes, if I'm not mistaken, that is basically the same thing rliu suggested.&amp;nbsp; I guess the issue with this solution is that filenames need to be tokenised.&amp;nbsp; I added tokenisation behavior to the standardfilter so that it splits on underscores, but then we discovered that searches like "test_1" were not returning properly.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 20 Oct 2009 12:42:13 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232557#M185687</guid>
      <dc:creator>morgand</dc:creator>
      <dc:date>2009-10-20T12:42:13Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene tokenization</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232558#M185688</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;So, which Alfresco version are you working with?&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;In my Labs 3.0 the name field is declared with &lt;/SPAN&gt;&lt;STRONG&gt;&amp;lt;tokenised&amp;gt;both&amp;lt;/tokenised&amp;gt;&lt;/STRONG&gt;&lt;SPAN&gt;, which should mean that both the single tokens and the complete name will be indexed:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;&lt;BR /&gt;&amp;lt;property name="cm:name"&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;lt;title&amp;gt;Name&amp;lt;/title&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;lt;type&amp;gt;d:text&amp;lt;/type&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;lt;mandatory enforced="true"&amp;gt;true&amp;lt;/mandatory&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;lt;index enabled="true"&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;atomic&amp;gt;true&amp;lt;/atomic&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;stored&amp;gt;false&amp;lt;/stored&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;tokenised&amp;gt;both&amp;lt;/tokenised&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;lt;/index&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;lt;constraints&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;constraint ref="cm:filename" /&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;lt;/constraints&amp;gt;&lt;BR /&gt;&amp;lt;/property&amp;gt;&lt;BR /&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;BR /&gt;&lt;SPAN&gt;Might &lt;/SPAN&gt;&lt;STRONG&gt;&amp;lt;tokenised&amp;gt;both&amp;lt;/tokenised&amp;gt;&lt;/STRONG&gt;&lt;SPAN&gt; solve your problem?&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Besides this, I noticed a serious problem with the indexing of 'd:text' and 'd:content' fields. In some cases (maybe during index merge processes) the index content is truncated, so the creator "admin" is cropped to "admi" in the index and is no longer searchable with "admin"! I'm currently trying to dig deeper into this.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 27 Oct 2009 09:04:05 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232558#M185688</guid>
      <dc:creator>dbachem</dc:creator>
      <dc:date>2009-10-27T09:04:05Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene tokenization</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232559#M185689</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Can anyone help me out?&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;I am using the web service to search for a file in the Alfresco repository. Here's the code:&lt;/SPAN&gt;&lt;BR /&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;&lt;BR /&gt;RepositoryServiceSoapBindingStub repositoryService = WebServiceFactory.getRepositoryService();&lt;BR /&gt;&lt;BR /&gt;// Create a Lucene query for the item at this path under Company Home&lt;BR /&gt;Query query = new Query(Constants.QUERY_LANG_LUCENE, "PATH:\"/app:company_home/cm:" + searchText + "\"");&lt;BR /&gt;&lt;BR /&gt;// Execute the query&lt;BR /&gt;final Store STORE = new Store(Constants.WORKSPACE_STORE, "SpacesStore");&lt;BR /&gt;QueryResult queryResult = repositoryService.query(STORE, query, false);&lt;BR /&gt;&lt;BR /&gt;// Display the results&lt;BR /&gt;ResultSet resultSet = queryResult.getResultSet();&lt;BR /&gt;ResultSetRow[] rows = resultSet.getRows();&lt;BR /&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;BR /&gt;&lt;SPAN&gt;I am passing the file name without the extension as &lt;/SPAN&gt;&lt;BLOCKQUOTE class="jive-quote"&gt;searchText&lt;/BLOCKQUOTE&gt;&lt;SPAN&gt; &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;For example:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Suppose I have two files, File1.txt and file1.pdf, and I want to find both files just by passing file1 as my &lt;/SPAN&gt;&lt;BLOCKQUOTE class="jive-quote"&gt;searchText&lt;/BLOCKQUOTE&gt;&lt;SPAN&gt; &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;I tried this, and the query returns nothing. When I search for File1.txt, the query returns that exact file.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;What modification should I make to the above query to get the expected result?&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Any suggestions appreciated.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Thanks in advance&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 27 Oct 2009 09:36:59 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232559#M185689</guid>
      <dc:creator>maqsood</dc:creator>
      <dc:date>2009-10-27T09:36:59Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene tokenization</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232560#M185690</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;BLOCKQUOTE class="jive-quote"&gt;Hi,&lt;BR /&gt;&lt;BR /&gt;Can anyone help me out?&lt;BR /&gt;I am using the web service to search for a file in the Alfresco repository. Here's the code:&lt;BR /&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;&lt;BR /&gt;RepositoryServiceSoapBindingStub repositoryService = WebServiceFactory.getRepositoryService();&lt;BR /&gt;&lt;BR /&gt;// Create a Lucene query for the item at this path under Company Home&lt;BR /&gt;Query query = new Query(Constants.QUERY_LANG_LUCENE, "PATH:\"/app:company_home/cm:" + searchText + "\"");&lt;BR /&gt;&lt;BR /&gt;// Execute the query&lt;BR /&gt;final Store STORE = new Store(Constants.WORKSPACE_STORE, "SpacesStore");&lt;BR /&gt;QueryResult queryResult = repositoryService.query(STORE, query, false);&lt;BR /&gt;&lt;BR /&gt;// Display the results&lt;BR /&gt;ResultSet resultSet = queryResult.getResultSet();&lt;BR /&gt;ResultSetRow[] rows = resultSet.getRows();&lt;BR /&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;BR /&gt;I am passing the file name without the extension as &lt;BLOCKQUOTE class="jive-quote"&gt;searchText&lt;/BLOCKQUOTE&gt; &lt;BR /&gt;For example:&lt;BR /&gt;Suppose I have two files, File1.txt and file1.pdf, and I want to find both files just by passing file1 as my &lt;BLOCKQUOTE class="jive-quote"&gt;searchText&lt;/BLOCKQUOTE&gt; &lt;BR /&gt;I tried this, and the query returns nothing. When I search for File1.txt, the query returns that exact file.&lt;BR /&gt;What modification should I make to the above query to get the expected result?&lt;BR /&gt;&lt;BR /&gt;Any suggestions appreciated.&lt;BR /&gt;&lt;BR /&gt;Thanks in advance&lt;/BLOCKQUOTE&gt;&lt;BR /&gt;&lt;SPAN&gt;Post a new thread instead of hijacking this one.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 27 Oct 2009 16:01:50 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232560#M185690</guid>
      <dc:creator>morgand</dc:creator>
      <dc:date>2009-10-27T16:01:50Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene tokenization</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232561#M185691</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;BLOCKQUOTE class="jive-quote"&gt;So, which Alfresco version are you working with?&lt;BR /&gt;&lt;BR /&gt;In my Labs 3.0 the name field is declared with &lt;STRONG&gt;&amp;lt;tokenised&amp;gt;both&amp;lt;/tokenised&amp;gt;&lt;/STRONG&gt;, which should mean that both the single tokens and the complete name will be indexed:&lt;BR /&gt;&lt;BR /&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;&lt;BR /&gt;&amp;lt;property name="cm:name"&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;lt;title&amp;gt;Name&amp;lt;/title&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;lt;type&amp;gt;d:text&amp;lt;/type&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;lt;mandatory enforced="true"&amp;gt;true&amp;lt;/mandatory&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;lt;index enabled="true"&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;atomic&amp;gt;true&amp;lt;/atomic&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;stored&amp;gt;false&amp;lt;/stored&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;tokenised&amp;gt;both&amp;lt;/tokenised&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;lt;/index&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;lt;constraints&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;constraint ref="cm:filename" /&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;lt;/constraints&amp;gt;&lt;BR /&gt;&amp;lt;/property&amp;gt;&lt;BR /&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;BR /&gt;Might &lt;STRONG&gt;&amp;lt;tokenised&amp;gt;both&amp;lt;/tokenised&amp;gt;&lt;/STRONG&gt; solve your problem?&lt;BR /&gt;&lt;BR /&gt;Besides this, I noticed a serious problem with the indexing of 'd:text' and 'd:content' fields. In some cases (maybe during index merge processes) the index content is truncated, so the creator "admin" is cropped to "admi" in the index and is no longer searchable with "admin"! I'm currently trying to dig deeper into this.&lt;/BLOCKQUOTE&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Hello, thanks for the response!&amp;nbsp; I'm using 2.1.5. Do you know if &amp;lt;tokenised&amp;gt;both&amp;lt;/tokenised&amp;gt; is supported there?&amp;nbsp; I added it to the filename property in contentModel.xml and then added a new document to the repository, but it didn't solve the problem.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 27 Oct 2009 16:34:46 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232561#M185691</guid>
      <dc:creator>morgand</dc:creator>
      <dc:date>2009-10-27T16:34:46Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene tokenization</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232562#M185692</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi morgand,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Sorry, that was mistakenly posted in your thread.&amp;nbsp; :cry: &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;I had already started a new topic for my query when I realized my mistake.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 27 Oct 2009 19:14:36 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232562#M185692</guid>
      <dc:creator>maqsood</dc:creator>
      <dc:date>2009-10-27T19:14:36Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene tokenization</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232563#M185693</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Morgand,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Regarding:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BLOCKQUOTE class="jive-quote"&gt;Searches for files with underscores in the file name are currently unpredictable and returning no results in some cases…&lt;/BLOCKQUOTE&gt;&lt;BR /&gt;&lt;SPAN&gt;I found the behavior is quite predictable, but weird.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;If you have not already tried it, you should try this in your custom property "filename_property":&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BLOCKQUOTE class="jive-quote"&gt;&amp;lt;stored&amp;gt;&lt;STRONG&gt;true&lt;/STRONG&gt;&amp;lt;/stored&amp;gt;&lt;BR /&gt;&amp;lt;tokenised&amp;gt;both&amp;lt;/tokenised&amp;gt;&lt;/BLOCKQUOTE&gt;&lt;BR /&gt;&lt;SPAN&gt;This lets the fields be &lt;/SPAN&gt;&lt;STRONG&gt;stored in the index&lt;/STRONG&gt;&lt;SPAN&gt;, so you can then look into the Lucene index with Luke - &lt;/SPAN&gt;&lt;A href="http://www.getopt.org/luke/" rel="nofollow noopener noreferrer"&gt;http://www.getopt.org/luke/&lt;/A&gt;&lt;BR /&gt;&lt;SPAN&gt;That would give you an idea of how the fields are being tokenized and how you could search.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I did the same and observed the following:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;1. test_name =&amp;gt; "test", "name"&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;2. test_my_name =&amp;gt; "test", "my", "name"&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;3. test_name10 =&amp;gt; "test_name10"&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;4. test_my_name10 =&amp;gt; "test", "my_name10"&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;5. test_again_my_name10 =&amp;gt; "test", "again", "my_name10"&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I haven't tried test_10 yet.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 27 Oct 2009 23:05:27 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232563#M185693</guid>
      <dc:creator>nvsreeram</dc:creator>
      <dc:date>2009-10-27T23:05:27Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene tokenization</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232564#M185694</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;BLOCKQUOTE class="jive-quote"&gt;Morgand,&lt;BR /&gt;&lt;BR /&gt;Regarding:&lt;BR /&gt;&lt;BLOCKQUOTE class="jive-quote"&gt;Searches for files with underscores in the file name are currently unpredictable and returning no results in some cases…&lt;/BLOCKQUOTE&gt;&lt;BR /&gt;I found the behavior is quite predictable, but weird.&lt;BR /&gt;If you have not already tried it, you should try this in your custom property "filename_property":&lt;BR /&gt;&lt;BR /&gt;&lt;BLOCKQUOTE class="jive-quote"&gt;&amp;lt;stored&amp;gt;&lt;STRONG&gt;true&lt;/STRONG&gt;&amp;lt;/stored&amp;gt;&lt;BR /&gt;&amp;lt;tokenised&amp;gt;both&amp;lt;/tokenised&amp;gt;&lt;/BLOCKQUOTE&gt;&lt;BR /&gt;This lets the fields be &lt;STRONG&gt;stored in the index&lt;/STRONG&gt;, so you can then look into the Lucene index with Luke - &lt;A href="http://www.getopt.org/luke/" rel="nofollow noopener noreferrer"&gt;http://www.getopt.org/luke/&lt;/A&gt;&lt;BR /&gt;That would give you an idea of how the fields are being tokenized and how you could search.&lt;BR /&gt;&lt;BR /&gt;I did the same and observed the following:&lt;BR /&gt;&lt;BR /&gt;1. test_name =&amp;gt; "test", "name"&lt;BR /&gt;2. test_my_name =&amp;gt; "test", "my", "name"&lt;BR /&gt;3. test_name10 =&amp;gt; "test_name10"&lt;BR /&gt;4. test_my_name10 =&amp;gt; "test", "my_name10"&lt;BR /&gt;5. test_again_my_name10 =&amp;gt; "test", "again", "my_name10"&lt;BR /&gt;&lt;BR /&gt;I haven't tried test_10 yet.&lt;/BLOCKQUOTE&gt;&lt;BR /&gt;&lt;SPAN&gt;Ok, I'm trying to search with Luke, but I have a few nagging questions.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;A)&amp;nbsp; When choosing an index to load into Luke, I look in ..\alfresco_data\alf_data\lucene-indexes\workspace\SpacesStore.&amp;nbsp; Why are there 5-10 different folders in there?&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;B)&amp;nbsp; When I choose an index and look at the available fields in Luke, I don't see filename. Why not?&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 28 Oct 2009 16:23:59 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232564#M185694</guid>
      <dc:creator>morgand</dc:creator>
      <dc:date>2009-10-28T16:23:59Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene tokenization</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232565#M185695</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Regarding:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BLOCKQUOTE class="jive-quote"&gt;A) When choosing an index to load into Luke, I look in ..\alfresco_data\alf_data\lucene-indexes\workspace\SpacesStore. Why are there 5-10 different folders in there?&lt;/BLOCKQUOTE&gt;&lt;BR /&gt;&lt;SPAN&gt;I don't know of a concrete reason for this, but I've noticed that whenever you do a full recovery of the index, the folders are replaced by a single one.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;I suppose Alfresco spreads the index across multiple folders and, upon optimization (optimizing the indexes or creating a fresh index from scratch), merges the multi-folder index into a single folder.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Regarding:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BLOCKQUOTE class="jive-quote"&gt;B) When I choose an index and look at the available fields in Luke, I don't see filename. Why not?&lt;/BLOCKQUOTE&gt;&lt;BR /&gt;&lt;SPAN&gt;There is no field called filename (unless you create a custom field). By default, Alfresco stores the tokenized file name (of the article) in this field - @cm:name or @{&lt;/SPAN&gt;&lt;A href="http://www.alfresco.org/model/content/1.0}name" rel="nofollow noopener noreferrer"&gt;http://www.alfresco.org/model/content/1.0}name&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;That said, I am still trying to understand your actual requirement.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 30 Oct 2009 18:14:36 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232565#M185695</guid>
      <dc:creator>nvsreeram</dc:creator>
      <dc:date>2009-10-30T18:14:36Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene tokenization</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232566#M185696</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Thanks for the response.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;My basic requirement is to split filenames with underscores into separate tokens.&amp;nbsp; The StandardTokenizer seems to handle underscores in strange ways, sometimes splitting on them and sometimes not.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 30 Oct 2009 18:36:27 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232566#M185696</guid>
      <dc:creator>morgand</dc:creator>
      <dc:date>2009-10-30T18:36:27Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene tokenization</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232567#M185697</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;And I assume that would be in order to search by filename.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;If that's the case, you can try searching by ID (just to check whether that fits your need).&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Let's consider an example (I am following an arbitrary path structure):&lt;/SPAN&gt;&lt;BR /&gt;&lt;STRONG&gt;ID&lt;/STRONG&gt;&lt;SPAN&gt; = testsitecom:/www/avm_webapps/ROOT/_content/en_US/testContentType/&lt;/SPAN&gt;&lt;STRONG&gt;test_Content.xml&lt;/STRONG&gt;&lt;BR /&gt;&lt;SPAN&gt;@{&lt;/SPAN&gt;&lt;A href="http://www.alfresco.org/model/content/1.0" rel="nofollow noopener noreferrer"&gt;http://www.alfresco.org/model/content/1.0&lt;/A&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;STRONG&gt;name&lt;/STRONG&gt;&lt;SPAN&gt; = test, content, xml &lt;/SPAN&gt;&lt;EM&gt;(tokenized by underscore and dot)&lt;/EM&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;You can search this way (wildcard search):&lt;/SPAN&gt;&lt;BR /&gt;&lt;STRONG&gt;ID&lt;/STRONG&gt;&lt;SPAN&gt;:testsitecom\:"/www/avm_webapps/ROOT/_content/en_US/testContentType/test*xml"&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;instead of a name search:&lt;/SPAN&gt;&lt;BR /&gt;&lt;STRONG&gt;@cm\:name&lt;/STRONG&gt;&lt;SPAN&gt;:test*xml&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 30 Oct 2009 23:57:35 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232567#M185697</guid>
      <dc:creator>nvsreeram</dc:creator>
      <dc:date>2009-10-30T23:57:35Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene tokenization</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232568#M185698</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;The Lucene standard analyser, which we wrap, does indeed do some funny things.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;We did not appreciate this in the dim and distant past. It is now a pain to change this default, as everyone would be forced to reindex.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;The standard analyser tries to auto-detect dates, computer names, emails, product codes, acronyms, etc., and may end up grouping tokens together when they are separated by /, - or ., amongst others.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;However, it is also good as a general cross-language default.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;The way to avoid this is to use another analyser &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;OR&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;tokenise as both, then use Alfresco FTS with "=" to force the use of the untokenised field, together with pattern matching.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Andy&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 15 Oct 2010 14:35:58 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-tokenization/m-p/232568#M185698</guid>
      <dc:creator>andy</dc:creator>
      <dc:date>2010-10-15T14:35:58Z</dc:date>
    </item>
  </channel>
</rss>

