<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Determining the encoding of a text document in Alfresco Archive</title>
    <link>https://connect.hyland.com/t5/alfresco-archive/determining-the-encoding-of-a-text-document/m-p/148726#M103814</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Another thing: The default buffer size given to your decoder is 8K.&amp;nbsp; To detect HTML encoding, you don't need that much, so you can configure your detector bean with much less.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 18 Jan 2008 14:04:27 GMT</pubDate>
    <dc:creator>derek</dc:creator>
    <dc:date>2008-01-18T14:04:27Z</dc:date>
    <item>
      <title>Determining the encoding of a text document</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/determining-the-encoding-of-a-text-document/m-p/148723#M103811</link>
      <description>Hi,For an AMP I am developing I need a way to determine from an input stream the encoding of a document. I want to store the latter in Alfresco and need to know the mime-type (which I know how to determine) and the encoding.In the Alfresco API I've found that the MimetypeService provides a way:mimet</description>
      <pubDate>Sun, 06 Jan 2008 17:39:15 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/determining-the-encoding-of-a-text-document/m-p/148723#M103811</guid>
      <dc:creator>hbf</dc:creator>
      <dc:date>2008-01-06T17:39:15Z</dc:date>
    </item>
    <item>
      <title>Re: Determining the encoding of a text document</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/determining-the-encoding-of-a-text-document/m-p/148724#M103812</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Yes you will need to write your charset finder for that. If you contribute we can add it to the core &lt;img id="smileyhappy" class="emoticon emoticon-smileyhappy" src="https://connect.hyland.com/i/smilies/16x16_smiley-happy.png" alt="Smiley Happy" title="Smiley Happy" /&gt;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;You can override any bean using a *-custom.xml context file in your alfresco extension folder. Take a look at the Repository Configuration page off the Developer Guide link in my sig below.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Kevin&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 18 Jan 2008 14:00:16 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/determining-the-encoding-of-a-text-document/m-p/148724#M103812</guid>
      <dc:creator>kevinr</dc:creator>
      <dc:date>2008-01-18T14:00:16Z</dc:date>
    </item>
    <item>
      <title>Re: Determining the encoding of a text document</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/determining-the-encoding-of-a-text-document/m-p/148725#M103813</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;You don't have to change the &lt;/SPAN&gt;&lt;STRONG&gt;core-services-context&lt;/STRONG&gt;&lt;SPAN&gt; to change the bean.&amp;nbsp; If you write our own finder, let's say &lt;/SPAN&gt;&lt;EM&gt;org.alfresco.encoding.HtmlCharsetFinder&lt;/EM&gt;&lt;SPAN&gt;, you can add the following bean to your &lt;/SPAN&gt;&lt;STRONG&gt;custom-repository-context.xml&lt;/STRONG&gt;&lt;SPAN&gt;:&lt;/SPAN&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;bean id="charset.finder" class="org.alfresco.repo.content.encoding.ContentCharsetFinder"&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;property name="defaultCharset"&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;value&amp;gt;UTF-8&amp;lt;/value&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;/property&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;property name="mimetypeService"&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;ref bean="mimetypeService"/&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;/property&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;property name="charactersetFinders"&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;list&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;bean class="org.alfresco.encoding.GuessEncodingCharsetFinder" /&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;bean class="org.alfresco.encoding.HtmlCharsetFinder" /&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;/list&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;/property&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;/bean&amp;gt;&lt;SPAN class="line-numbers-rows"&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;SPAN&gt;From the Javadocs:&lt;/SPAN&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; /**&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; * Worker method for implementations to override.&amp;nbsp; All exceptions will be reported and&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; * absorbed and &amp;lt;tt&amp;gt;null&amp;lt;/tt&amp;gt; returned.&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; * &amp;lt;p&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; * The interface contract is that the data buffer must not be altered in any way.&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; * &lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; * @param buffer&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; the buffer of data no bigger than the requested&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; *&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {@linkplain #getBestBufferSize() best buffer size}.&amp;nbsp; This can,&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; *&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; very efficiently, be turned into an &amp;lt;tt&amp;gt;InputStream&amp;lt;/tt&amp;gt; using a&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; *&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;tt&amp;gt;ByteArrayInputStream&amp;lt;tt&amp;gt;. &lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; * @return&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Returns the charset or &amp;lt;tt&amp;gt;null&amp;lt;/tt&amp;gt; if an accurate conclusion&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; *&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; is not possible&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; * @throws Exception&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Any exception, checked or not&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; */&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; protected abstract Charset detectCharsetImpl(byte[] buffer) throws Exception;&lt;SPAN class="line-numbers-rows"&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;SPAN&gt;‍&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;SPAN&gt;All finders will be attempted, in order, until a non-null is returned.&amp;nbsp; So, any contributed finders would be appreciated and attached to that list provided that they don't produce poor guesses: A null guess is better than an incorrect guess.&amp;nbsp; You finder can also specialize in HTML and just return null for everything else.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Regards&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 18 Jan 2008 14:00:46 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/determining-the-encoding-of-a-text-document/m-p/148725#M103813</guid>
      <dc:creator>derek</dc:creator>
      <dc:date>2008-01-18T14:00:46Z</dc:date>
    </item>
    <item>
      <title>Re: Determining the encoding of a text document</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/determining-the-encoding-of-a-text-document/m-p/148726#M103814</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Another thing: The default buffer size given to your decoder is 8K.&amp;nbsp; To detect HTML encoding, you don't need that much, so you can configure your detector bean with much less.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 18 Jan 2008 14:04:27 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/determining-the-encoding-of-a-text-document/m-p/148726#M103814</guid>
      <dc:creator>derek</dc:creator>
      <dc:date>2008-01-18T14:04:27Z</dc:date>
    </item>
  </channel>
</rss>

