<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Custom text extractors in Alfresco Archive</title>
    <link>https://connect.hyland.com/t5/alfresco-archive/custom-text-extractors/m-p/94851#M65025</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Alfresco already has a comprehensive framework for configuring and developing new transformation classes. Basically, it's a matter of coding a bean to a specific interface, then using Spring to configure the new bean, specifying that it is capable of transforming a given mimetype to the text/plain mimetype.&lt;/SPAN&gt;&lt;BR /&gt;&lt;A href="http://wiki.alfresco.com/wiki/Content_Transformations#Development" rel="nofollow noopener noreferrer"&gt;http://wiki.alfresco.com/wiki/Content_Transformations#Development&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;There are several examples in the SDK, including the PDFBox (PDF to text) transformer and the OpenOffice transformers. Once you have written and configured your transformer, it will automatically be used by our Lucene integration to convert that file type to text during the indexing process.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Hope this helps,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Kevin&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Tue, 19 Jun 2007 14:41:41 GMT</pubDate>
    <dc:creator>kevinr</dc:creator>
    <dc:date>2007-06-19T14:41:41Z</dc:date>
    <item>
      <title>Custom text extractors</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/custom-text-extractors/m-p/94850#M65024</link>
      <description>Hi all, I need to write some custom text extractors for Lucene in Alfresco because my company has some files that Alfresco doesn't index. I have already looked at the config files and didn't find any tag for assigning a new text extractor class. Is there a way to do this? Thanks in advance.</description>
      <pubDate>Tue, 19 Jun 2007 13:16:15 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/custom-text-extractors/m-p/94850#M65024</guid>
      <dc:creator>gscheibel</dc:creator>
      <dc:date>2007-06-19T13:16:15Z</dc:date>
    </item>
    <item>
      <title>Re: Custom text extractors</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/custom-text-extractors/m-p/94851#M65025</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Alfresco already has a comprehensive framework for configuring and developing new transformation classes. Basically, it's a matter of coding a bean to a specific interface, then using Spring to configure the new bean, specifying that it is capable of transforming a given mimetype to the text/plain mimetype.&lt;/SPAN&gt;&lt;BR /&gt;&lt;A href="http://wiki.alfresco.com/wiki/Content_Transformations#Development" rel="nofollow noopener noreferrer"&gt;http://wiki.alfresco.com/wiki/Content_Transformations#Development&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;There are several examples in the SDK, including the PDFBox (PDF to text) transformer and the OpenOffice transformers. Once you have written and configured your transformer, it will automatically be used by our Lucene integration to convert that file type to text during the indexing process.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Hope this helps,&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Kevin&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 19 Jun 2007 14:41:41 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/custom-text-extractors/m-p/94851#M65025</guid>
      <dc:creator>kevinr</dc:creator>
      <dc:date>2007-06-19T14:41:41Z</dc:date>
    </item>
  </channel>
</rss>

