<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Lucene search and content indexing in PDF documents in Alfresco Archive</title>
    <link>https://connect.hyland.com/t5/alfresco-archive/lucene-search-and-content-indexing-in-pdf-documents/m-p/212991#M166121</link>
    <description>Forum thread: Lucene search and content indexing in PDF documents (Alfresco Archive)</description>
    <pubDate>Fri, 19 Feb 2010 18:28:59 GMT</pubDate>
    <dc:creator>ricardoc-moreda</dc:creator>
    <dc:date>2010-02-19T18:28:59Z</dc:date>
    <item>
      <title>Lucene search and content indexing in PDF documents</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-search-and-content-indexing-in-pdf-documents/m-p/212989#M166119</link>
      <description>Hi everyone, I'm having trouble getting reliable results from Alfresco's Lucene search on PDF documents.&lt;BR /&gt;&lt;BR /&gt;Examples:&lt;BR /&gt;Search Language: lucene&lt;BR /&gt;Search: PATH:"/app:company_home/cm:Empresa/cm:Expediente/cm://*"&lt;BR /&gt;Results (14 rows)&lt;BR /&gt;Parent Node Name&lt;BR /&gt;_x0032_010 workspace://SpacesStore/837eda52-bc75-4fba-</description>
      <pubDate>Thu, 18 Feb 2010 15:06:52 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-search-and-content-indexing-in-pdf-documents/m-p/212989#M166119</guid>
      <dc:creator>ricardoc-moreda</dc:creator>
      <dc:date>2010-02-18T15:06:52Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene search and content indexing in PDF documents</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-search-and-content-indexing-in-pdf-documents/m-p/212990#M166120</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;My custom properties now use the following index settings:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;lt;index enabled="true"&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;lt;atomic&amp;gt;true&amp;lt;/atomic&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;lt;stored&amp;gt;false&amp;lt;/stored&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;lt;tokenised&amp;gt;both&amp;lt;/tokenised&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;lt;/index&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;After a full reindex, Lucene searches return three of the five documents, one more than before.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;My version is 3.2.0 (2039), schema 2019.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Could this be a known issue in this version?&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 19 Feb 2010 11:58:02 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-search-and-content-indexing-in-pdf-documents/m-p/212990#M166120</guid>
      <dc:creator>ricardoc-moreda</dc:creator>
      <dc:date>2010-02-19T11:58:02Z</dc:date>
    </item>
    <item>
      <title>Re: Lucene search and content indexing in PDF documents</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/lucene-search-and-content-indexing-in-pdf-documents/m-p/212991#M166121</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;For example, I uploaded two test files with the same characteristics (size, PDF conversion engine, and the application used to convert the document to PDF).&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Both are in the same space.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I get the following:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Search: PATH:"/app:company_home/cm:Empresa/cm:EntradasPendentes/cm:Evora//*"&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Results (2 rows)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Parent Node Name&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;actions-article.pdf workspace://SpacesStore/1e8a97a4-a7b7-4 … 08cd3b2fbc workspace://SpacesStore/d1822abb-4be2-4 … 602c2806f8&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;content-article.pdf workspace://SpacesStore/c724c44d-880b-4 … ca1b99dbe1 workspace://SpacesStore/d1822abb-4be2-4 … 602c2806f8&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;****&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Search: PATH:"/app:company_home/cm:Empresa/cm:EntradasPendentes/cm:Evora//*" AND (TEXT:&lt;/SPAN&gt;&lt;STRONG&gt;*admin*&lt;/STRONG&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Results (1 rows)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Parent Node Name&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;content-article.pdf workspace://SpacesStore/c724c44d-880b-4 … ca1b99dbe1 workspace://SpacesStore/d1822abb-4be2-4 … 602c2806f8&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;****&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Search: PATH:"/app:company_home/cm:Empresa/cm:EntradasPendentes/cm:Evora//*" AND (TEXT:&lt;/SPAN&gt;&lt;STRONG&gt;admin&lt;/STRONG&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Results (0 rows)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Parent Node Name&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;****&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Note that &lt;/SPAN&gt;&lt;STRONG&gt;both&lt;/STRONG&gt;&lt;SPAN&gt; files have these properties:&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;{http://www.alfresco.org/model/content/1.0}creator &lt;/SPAN&gt;&lt;STRONG&gt;admin&lt;/STRONG&gt;&lt;BR /&gt;&lt;SPAN&gt;{http://www.alfresco.org/model/content/1.0}modifier &lt;/SPAN&gt;&lt;STRONG&gt;admin&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;In contentModel.xml:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;lt;property name="cm:creator"&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;lt;title&amp;gt;Creator&amp;lt;/title&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;lt;type&amp;gt;d:text&amp;lt;/type&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;lt;protected&amp;gt;true&amp;lt;/protected&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;lt;mandatory enforced="true"&amp;gt;true&amp;lt;/mandatory&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;lt;index enabled="true"&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;lt;atomic&amp;gt;true&amp;lt;/atomic&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;lt;stored&amp;gt;false&amp;lt;/stored&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;lt;tokenised&amp;gt;both&amp;lt;/tokenised&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;lt;/index&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;lt;/property&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;lt;property name="cm:modifier"&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;lt;title&amp;gt;Modifier&amp;lt;/title&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;lt;type&amp;gt;d:text&amp;lt;/type&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;lt;protected&amp;gt;true&amp;lt;/protected&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;lt;mandatory enforced="true"&amp;gt;true&amp;lt;/mandatory&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;lt;index enabled="true"&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;lt;atomic&amp;gt;true&amp;lt;/atomic&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;lt;stored&amp;gt;false&amp;lt;/stored&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; &amp;lt;tokenised&amp;gt;both&amp;lt;/tokenised&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &amp;lt;/index&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;lt;/property&amp;gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;As the examples above show, Lucene search fails even with Alfresco's default property definitions, in this case on the metadata properties cm:creator and cm:modifier.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Does anyone have an idea what is wrong? Are there any settings I should review?&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 19 Feb 2010 18:28:59 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/lucene-search-and-content-indexing-in-pdf-documents/m-p/212991#M166121</guid>
      <dc:creator>ricardoc-moreda</dc:creator>
      <dc:date>2010-02-19T18:28:59Z</dc:date>
    </item>
  </channel>
</rss>

