<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Lucene and stop words in Alfresco in Alfresco Archive</title>
    <link>https://connect.hyland.com/t5/alfresco-archive/lucene-and-stop-words-in-alfresco/m-p/141008#M98777</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Dear list,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Suppose I have a text document in Alfresco containing the phrase "time is money". I want users to be able to enter "money is time" and find the document. That is, I want to find all documents that contain all the words the user enters, in any order.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Reading &lt;/SPAN&gt;&lt;A href="http://wiki.alfresco.com/wiki/Search#Finding_nodes_by_content" rel="nofollow noopener noreferrer"&gt;Alfresco's Search documentation&lt;/A&gt;&lt;SPAN&gt;, I could not find a way to formulate a query for this.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Maybe I am missing something?! If so, apologies!&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Here is what I have found out:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;The query&lt;/SPAN&gt;&lt;BR /&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;TEXT:"money is time"&lt;/CODE&gt;&lt;/PRE&gt;&lt;SPAN&gt;internally drops the stop word "is" and therefore searches for "money" followed by one or more stop words followed by "time". It will therefore NOT match.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;The query&lt;/SPAN&gt;&lt;BR /&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;TEXT:"money" AND TEXT:"is" AND TEXT:"time"&lt;/CODE&gt;&lt;/PRE&gt;&lt;SPAN&gt;searches for all documents containing the three words "money", "is", and "time". As "is" is a stop word, it does not occur in the index and therefore the query returns NO result.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;The query&lt;/SPAN&gt;&lt;BR /&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;TEXT:"money" AND TEXT:"time"&lt;/CODE&gt;&lt;/PRE&gt;&lt;SPAN&gt;searches for all documents containing the two words "money" and "time". It finds the document …&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;… however, I cannot easily generate this query, as it requires me to drop all words that Alfresco's analyzer considers stop words.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Is there another way to perform a query for all documents containing a given set of words (possibly including stop words)?&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;If not, I see two ways out:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;* Alfresco exposes the list of stop words (not nice).&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;* Alfresco's query parser recognizes stop words and handles them accordingly. (It would drop the clause 'AND TEXT:"is"' from the query 'TEXT:"money" AND TEXT:"is" AND TEXT:"time"', for example.)&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Many thanks,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Kaspar&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;P.S. This question is Lucene related. However, I am posting it here rather than to the Lucene mailing list because it depends on Alfresco's particular Lucene adaptation. Not knowing the details, I might be wrong, of course.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Sun, 30 Dec 2007 14:19:01 GMT</pubDate>
    <dc:creator>hbf</dc:creator>
    <dc:date>2007-12-30T14:19:01Z</dc:date>
    <item>
      <title>Lucene and stop words in Alfresco</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-and-stop-words-in-alfresco/m-p/141008#M98777</link>
      <description>Dear list, Suppose I have a text document in Alfresco containing the phrase "time is money". I want users to be able to enter "money is time" and find the document. That is, I want to find all documents that contain all the words the user enters, in any order. Reading Alfresco's Search documentation I</description>
      <pubDate>Sun, 30 Dec 2007 14:19:01 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-and-stop-words-in-alfresco/m-p/141008#M98777</guid>
      <dc:creator>hbf</dc:creator>
      <dc:date>2007-12-30T14:19:01Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene and stop words in Alfresco</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-and-stop-words-in-alfresco/m-p/141009#M98778</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;(For those who need a temporary fix (like me): the stop words Alfresco uses seem to be in the file AlfrescoStandardAnalyser.java. My code reads them and drops all stop words from the query. Not nice, but it works.)&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sun, 30 Dec 2007 14:42:39 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-and-stop-words-in-alfresco/m-p/141009#M98778</guid>
      <dc:creator>hbf</dc:creator>
      <dc:date>2007-12-30T14:42:39Z</dc:date>
    </item>
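<!--
The workaround described in the reply above (drop every stop word, then AND the remaining words together) could be sketched roughly as follows. This is only an illustration under stated assumptions: the STOP_WORDS set is a hypothetical placeholder, not the list from AlfrescoStandardAnalyser.java, and the helper name build_all_words_query is invented here.

```python
# Placeholder stop words for illustration only. The real list would have to be
# read from AlfrescoStandardAnalyser.java and may well differ.
STOP_WORDS = frozenset({"a", "an", "and", "is", "of", "the", "to"})


def build_all_words_query(user_input, stop_words=STOP_WORDS, field="TEXT"):
    """Return a query that requires every non-stop word, in any order."""
    # Compare case-insensitively, on the assumption that the analyser
    # lowercases tokens before checking them against its stop word list.
    words = [w for w in user_input.split() if w.lower() not in stop_words]
    # One quoted clause per remaining word, joined with AND.
    return " AND ".join('{}:"{}"'.format(field, w) for w in words)


print(build_all_words_query("money is time"))
```

For the input "money is time" this yields the third query from the original post, TEXT:"money" AND TEXT:"time". The sketch does not escape Lucene special characters in the user's input and returns an empty string when every word is a stop word, so a caller would need to handle both cases.
-->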
    <item>
      <title>Re: Lucene and stop words in Alfresco</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-and-stop-words-in-alfresco/m-p/141010#M98779</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Alfresco supports the standard Lucene query syntax, so you can write proximity queries (but not span queries). See &lt;/SPAN&gt;&lt;A href="http://lucene.apache.org/java/2_1_0/queryparsersyntax.html#Proximity%20Searches" rel="nofollow noopener noreferrer"&gt;http://lucene.apache.org/java/2_1_0/queryparsersyntax.html#Proximity%20Searches&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Andy&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 28 Jan 2008 16:31:52 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-and-stop-words-in-alfresco/m-p/141010#M98779</guid>
      <dc:creator>andy</dc:creator>
      <dc:date>2008-01-28T16:31:52Z</dc:date>
    </item>
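<!--
As a rough illustration of the proximity syntax mentioned in the reply above, the standard Lucene form is a quoted phrase followed by a tilde and a maximum distance. The helper name build_proximity_query and the default distance of 10 are assumptions made for this sketch, and whether such a query matches words in reversed order when a stop word sits between them has not been verified against Alfresco.

```python
def build_proximity_query(user_input, slop=10, field="TEXT"):
    """Wrap the user's words in a Lucene proximity (sloppy phrase) query."""
    # Normalise whitespace so the phrase contains single spaces only.
    phrase = " ".join(user_input.split())
    # Lucene proximity syntax: "word1 word2"~N
    return '{}:"{}"~{}'.format(field, phrase, slop)


print(build_proximity_query("money is time"))
```

For the input "money is time" this builds TEXT:"money is time"~10. Stop words are left in the phrase here, so the analyser still decides what happens to them.
-->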
    <item>
      <title>Re: Lucene and stop words in Alfresco</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-and-stop-words-in-alfresco/m-p/141011#M98780</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hm, thanks for the hint, Andy. It seems to me, however, that this does not solve the problem entirely.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;If you want to create a custom "Advanced Search" page for Alfresco where users can enter arbitrary words (see my examples in the previous post), you are forced to know the stop word list (from AlfrescoStandardAnalyser). That creates a dependency which is not nice at all.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I use Alfresco as the backend engine for a CMS, so I find myself in exactly this situation. Assuming I am not the only one who does or will do something like this, do you want me to open a JIRA for this?&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Regards,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Kaspar&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 28 Jan 2008 16:44:38 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-and-stop-words-in-alfresco/m-p/141011#M98780</guid>
      <dc:creator>hbf</dc:creator>
      <dc:date>2008-01-28T16:44:38Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene and stop words in Alfresco</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-and-stop-words-in-alfresco/m-p/141012#M98781</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;You could use a customized version of the analyser and set an empty stop word list. This way there will be no stop words. It only takes a simple wrapper class and a change to the tokenisation config.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Andy&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 07 Feb 2008 17:04:18 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-and-stop-words-in-alfresco/m-p/141012#M98781</guid>
      <dc:creator>andy</dc:creator>
      <dc:date>2008-02-07T17:04:18Z</dc:date>
    </item>
  </channel>
</rss>

