<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Noob-ish Lucene content question: stored &amp; tokenized? in Alfresco Archive</title>
    <link>https://connect.hyland.com/t5/alfresco-archive/noob-ish-lucene-content-question-stored-tokenized/m-p/170940#M124205</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hello,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;I want to search Alfresco's indexes from some legacy Lucene code I have. More specifically, I want to get the token stream for the content of the indexed files. Is this possible? Easy? My assumption is that I am missing something obvious :-).&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I found:&lt;/SPAN&gt;&lt;BR /&gt;&lt;A href="http://wiki.alfresco.com/wiki/Full-Text_Search_Configuration" rel="nofollow noopener noreferrer"&gt;http://wiki.alfresco.com/wiki/Full-Text_Search_Configuration&lt;/A&gt;&lt;BR /&gt;&lt;SPAN&gt;which seems to tell me that I can set &lt;/SPAN&gt;&lt;STRONG&gt;tokenized&lt;/STRONG&gt;&lt;SPAN&gt; and &lt;/SPAN&gt;&lt;STRONG&gt;stored&lt;/STRONG&gt;&lt;SPAN&gt; in alfresco\tomcat\webapps\alfresco\web-inf\classes\alfresco\model\contentModel.xml&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I made the change, restarted Alfresco/Tomcat, and added a document to the repository. But I can't seem to find the content in the index. &lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;With Luke 0.8.1, I can find the index, which seems to have my info: I find the file name in the QNAME field, and a search gets a hit on the content. But the &lt;/SPAN&gt;&lt;STRONG&gt;@{http…}content&lt;/STRONG&gt;&lt;SPAN&gt; field comes up as &lt;/SPAN&gt;&lt;EM&gt;not present, or not stored&lt;/EM&gt;&lt;SPAN&gt;.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;In the end, I am using the token stream to find the start/end offset positions of hits so I can analyze the text of the actual hits. I may very well be doing this bass-ackwards, but I have it working with a Lius implementation, and would love to switch over to Alfresco.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Any pointers? &lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Sean&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 25 Apr 2008 00:47:51 GMT</pubDate>
    <dc:creator>seanoc5</dc:creator>
    <dc:date>2008-04-25T00:47:51Z</dc:date>
    <item>
      <title>Noob-ish Lucene content question: stored &amp; tokenized?</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/noob-ish-lucene-content-question-stored-tokenized/m-p/170940#M124205</link>
      <description>Hello, I want to search Alfresco's indexes from some legacy Lucene code I have. More specifically, I want to get the token stream for the content of the indexed files. Is this possible? Easy? My assumption is that I am missing something obvious :-). I found: http://wiki.alfresco.com/wiki/Full-Text_Sear</description>
      <pubDate>Fri, 25 Apr 2008 00:47:51 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/noob-ish-lucene-content-question-stored-tokenized/m-p/170940#M124205</guid>
      <dc:creator>seanoc5</dc:creator>
      <dc:date>2008-04-25T00:47:51Z</dc:date>
    </item>
    <item>
      <title>Re: Noob-ish Lucene content question: stored &amp; tokenized?</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/noob-ish-lucene-content-question-stored-tokenized/m-p/170941#M124206</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Not sure if it gets you much closer, but have you seen &lt;/SPAN&gt;&lt;A href="http://wiki.alfresco.com/wiki/OpenSearch" rel="nofollow noopener noreferrer"&gt;http://wiki.alfresco.com/wiki/OpenSearch&lt;/A&gt;&lt;SPAN&gt;?&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 25 Apr 2008 07:48:16 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/noob-ish-lucene-content-question-stored-tokenized/m-p/170941#M124206</guid>
      <dc:creator>sdavis</dc:creator>
      <dc:date>2008-04-25T07:48:16Z</dc:date>
    </item>
    <item>
      <title>Re: Noob-ish Lucene content question: stored &amp; tokenized?</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/noob-ish-lucene-content-question-stored-tokenized/m-p/170942#M124207</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;sdavis,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;OpenSearch looks useful, but it is not quite what I need at the moment. Thanks for the reply.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Any other suggestions?&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 08 May 2008 20:35:50 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/noob-ish-lucene-content-question-stored-tokenized/m-p/170942#M124207</guid>
      <dc:creator>seanoc5</dc:creator>
      <dc:date>2008-05-08T20:35:50Z</dc:date>
    </item>
  </channel>
</rss>

