<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Virtual Tomcat forcing UTF-8 in Alfresco Archive</title>
    <link>https://connect.hyland.com/t5/alfresco-archive/virtual-tomcat-forcing-utf-8/m-p/117682#M83104</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Apologies if this has been posted elsewhere, but I did my best to search. &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;The virtual Tomcat server in Alfresco WCM 2.1 seems to be forcing content as UTF-8, even if it has been checked in as something else.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;I checked in an HTML file as SHIFT-JIS, but previewing it from the virtual Tomcat garbles on output.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;To reproduce, from the content details screen:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;View in Browser - GOOD; displays file exactly as checked in&lt;/SPAN&gt;&lt;BR /&gt;&lt;A href="http://localhost:8080/alfresco/d/d/avm/mysite-live/-1;www;avm_webapps;ROOT;corp;tos_shiftjis.html/tos_shiftjis.html" rel="nofollow noopener noreferrer"&gt;http://localhost:8080/alfresco/d/d/avm/mysite-live/-1;www;avm_webapps;ROOT;corp;tos_shiftjis.html/tos_shiftjis.html&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Preview File - NO GOOD; seems to add a UTF-8 BOM, which garbles the file&lt;/SPAN&gt;&lt;BR /&gt;&lt;A href="http://mysite-live.www--sandbox.172-100-100-100.ip.alfrescodemo.net:8180/corp/tos_shiftjis.html" rel="nofollow noopener noreferrer"&gt;http://mysite-live.www--sandbox.172-100-100-100.ip.alfrescodemo.net:8180/corp/tos_shiftjis.html&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Note that the latter goes through port 8180, the virtual Tomcat run by Alfresco.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Hopefully there is some configuration we are missing that would avoid this forced UTF-8 encoding?&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 24 Oct 2007 01:10:01 GMT</pubDate>
    <dc:creator>bfranke</dc:creator>
    <dc:date>2007-10-24T01:10:01Z</dc:date>
    <item>
      <title>Virtual Tomcat forcing UTF-8</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/virtual-tomcat-forcing-utf-8/m-p/117682#M83104</link>
      <description>Apologies if this has been posted elsewhere, but I did my best to search. The virtual Tomcat server in Alfresco WCM 2.1 seems to be forcing content as UTF-8, even if it has been checked in as something else. I checked in an HTML file as SHIFT-JIS, but previewing it from the virtual Tomcat garbles on o</description>
      <pubDate>Wed, 24 Oct 2007 01:10:01 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/virtual-tomcat-forcing-utf-8/m-p/117682#M83104</guid>
      <dc:creator>bfranke</dc:creator>
      <dc:date>2007-10-24T01:10:01Z</dc:date>
    </item>
    <item>
      <title>Re: Virtual Tomcat forcing UTF-8</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/virtual-tomcat-forcing-utf-8/m-p/117683#M83105</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;The virtualization server is really just Tomcat with some extra&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;stuff behind the scenes to deal with virtualization.&amp;nbsp; All of the&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;rules governing how it interacts with character sets are exactly&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;the same as Tomcat 5.5.25 (i.e., Servlet 2.4 / JSP 2.0) because&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;it's the exact same codebase.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;First, it might be worthwhile to inspect your files with a low-level&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;tool (e.g., 'dd' or 'vim -b') to see whether the HTML files in question&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;actually contain a BOM.&amp;nbsp; &lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Here's some information on character sets in the context&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;of a servlet/JSP.&amp;nbsp; Note: your examples are HTML pages,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;but because they're being delivered by a servlet container,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;the servlet spec's rules on charset specification/conversion &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;still apply (as do those of HTML):&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;A href="http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/" rel="nofollow noopener noreferrer"&gt;http://java.sun.com/developer/technicalArticles/Intl/HTTPCharset/&lt;/A&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;A href="http://java.sun.com/developer/technicalArticles/Intl/MultilingualJSP/" rel="nofollow noopener noreferrer"&gt;http://java.sun.com/developer/technicalArticles/Intl/MultilingualJSP/&lt;/A&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;A href="http://www.w3.org/TR/REC-html40/charset.html" 
rel="nofollow noopener noreferrer"&gt;http://www.w3.org/TR/REC-html40/charset.html&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Going beyond this question a bit is the issue of how servlet containers&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;deal with form-related I18N issues.&amp;nbsp; When a browser does a POST,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;it should send a Content-Type header that looks like this:&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; Content-Type: application/x-www-form-urlencoded; charset=&lt;/SPAN&gt;&lt;EM&gt;YOUR-CHARSET&lt;/EM&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;However, early versions of Microsoft Internet Explorer (IE)&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;failed to include the ';' between the application type and the&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;charset specifier.&amp;nbsp; As a result, many websites came to handle&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;the correct header badly.&amp;nbsp; To deal with &lt;/SPAN&gt;&lt;EM&gt;that&lt;/EM&gt;&lt;SPAN&gt;, both IE and Firefox&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;send back form data encoded using whatever encoding the page was&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;supplied with (Mozilla attempted to include the proper header, but&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;there were so many compatibility issues that they were forced to yank it).&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Therefore, if you ever do end up dealing with I18N issues in the&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;context of forms, my advice to you is to set page charsets everywhere&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;(both HTTP headers and HTML metadata).&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;While we're on the topic of I18N in general, it's also worth knowing&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;that while UTF-16 and UTF-32 streams commonly carry a BOM to signal byte order, it's optional with&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;UTF-8.&amp;nbsp; Astonishingly (or perhaps not so astonishingly if you're&lt;/SPAN&gt;&lt;BR 
/&gt;&lt;SPAN&gt;a bit cynical about Sun),&amp;nbsp; Java is intolerant of UTF-8 streams&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;that include a BOM, even though they are perfectly legal&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;(see:&amp;nbsp; &lt;/SPAN&gt;&lt;A href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058" rel="nofollow noopener noreferrer"&gt;http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058&lt;/A&gt;&lt;SPAN&gt; ).&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;It's up to the app to deal with it.&amp;nbsp; If you're in the mood for some&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;stomach-churning rationalization, check out Sun's reason for not fixing it:&lt;/SPAN&gt;&lt;BR /&gt;&lt;A href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6378911" rel="nofollow noopener noreferrer"&gt;http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6378911&lt;/A&gt;&lt;BR /&gt;&lt;SPAN&gt;Ultimately, they claim to be hemmed in by the possibility that others may&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;have brittle workarounds in place, and they didn't want to break them…&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;so everybody else relying on the standard has to bump their head&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;and institute their own workaround, one despondent engineer at a time,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;globally, forever (or until the open sourcing of Java starts being felt,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;whichever comes first).&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;In short, check your HTML files, check your web.xml settings,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;and check your HTML meta declarations.&amp;nbsp; Use low-level tools&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;that allow you to see the exact byte stream you get back from&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;the server, rather than merely inspecting things in your browser&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;(that eliminates a whole other set of variables).&amp;nbsp; An example 
of&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;a low-level tool that might be useful to you for advanced debugging&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;of webserver configuration problems is netcat.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; I hope this helps,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;&amp;nbsp; - Jon&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 31 Oct 2007 17:36:24 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/virtual-tomcat-forcing-utf-8/m-p/117683#M83105</guid>
      <dc:creator>jcox</dc:creator>
      <dc:date>2007-10-31T17:36:24Z</dc:date>
    </item>
  </channel>
</rss>

