Adding HTML content, Alfresco 1.4

hannes
Champ in-the-making
There seems to be an unwanted transformation of content when adding HTML content to an Alfresco space. It shows up when I view the HTML document I just added. It looks like some kind of parsing is involved, where "&nbsp;", " ' ", bulleted items and the like get replaced with a " ? ". As a result, a line with multiple tabs in the original (.doc) shows a run of question marks instead when viewed within Alfresco.

The original document was created in MS Word (.doc) and saved as filtered HTML before being added as content to Alfresco.

Would someone have a way around this problem?
5 REPLIES 5

kevinr
Star Contributor
Unless you have applied a transformation via a rule, then there is no modification of the content when it is added to Alfresco. HTML or other content types are not modified by default.

If you view the content directly in the browser before uploading, does it still have the problem? If you compare the content in Alfresco with the original file, does it actually contain different values?

Thanks,

Kevin

hannes
Champ in-the-making
Yes. The contents are different.
Before posting this question I did the following test:
1. Created a simple Word document with a bit of text with embedded tabs, saved as .doc and .htm
2. Viewed the HTML source with Nvu to confirm the tab spaces are coded as &nbsp;'s
3. Added the file as content to Alfresco
4. Clicked the document to view the contents - the tabs are all shown as question marks
5. Saving the document to disk and then viewing it does not show the unwanted characters
6. Checking out for editing and then editing from within the space shows the unwanted characters
7. Saved the working copy of the file to disk, then viewed or edited it, and the unwanted characters do not show

bensai
Champ in-the-making
Try saving your web page (Word document) as Unicode or Unicode (UTF-8) encoded.

MS Word doc to html:
   1. write your document with MS Word
   2. select “File” -> “Save as Web Page”
   3. from the “Save as”-dialog select “Tools” -> “Web Options”
   4. from the “Web Options”-dialog select “Encoding”-tab
   5. select “Unicode” or “Unicode(UTF-8)” encoding instead of windows-1252
   6. click ok
   7. type file name and
   8. click save

OR in html file:
   1. change <meta http-equiv=Content-Type content="text/html; charset=windows-1252"> to <meta http-equiv=Content-Type content="text/html; charset=utf-8">
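For anyone who prefers to script the second option: changing only the meta tag helps only if the bytes in the file really are UTF-8. A minimal sketch, assuming the file Word produced is encoded as windows-1252, is to decode it, rewrite the charset declaration, and save it again as UTF-8. The function name `reencode_html_to_utf8` is hypothetical and not part of Alfresco or Word.

```python
import re


def reencode_html_to_utf8(raw: bytes, source_encoding: str = "windows-1252") -> bytes:
    """Re-encode HTML bytes as UTF-8 and update the first charset declaration.

    Assumption: `raw` really is encoded in `source_encoding`.
    """
    # Decode using the encoding the file was actually saved in.
    text = raw.decode(source_encoding)
    # Rewrite the first "charset=..." value (quoted or unquoted) to utf-8.
    text = re.sub(
        r'(charset\s*=\s*["\']?)[\w-]+',
        r"\g<1>utf-8",
        text,
        count=1,
        flags=re.IGNORECASE,
    )
    # Encode the whole document as UTF-8 so the bytes match the declaration.
    return text.encode("utf-8")


# Example input: a non-breaking space (0xA0), a bullet (0x95) and a
# curly apostrophe (0x92) stored as single windows-1252 bytes.
original = (
    b'<meta http-equiv=Content-Type content="text/html; charset=windows-1252">'
    b"<p>a\xa0b \x95 it\x92s</p>"
)
converted = reencode_html_to_utf8(original)
```

To use it on a real file, read the file in binary mode, pass the bytes through the function, and write the result back in binary mode before adding the file to the space.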

Regards,
Said

kevinr
Star Contributor
7. Saved the working copy of the file on disc, the view or edit, and the unwanted characters will not show

This proves that Alfresco has not modified the content. As user 'bensai' suggests, the problem is the character encoding the browser uses to display the page, not the stored file.
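To illustrate the point, here is a small sketch of how the same unchanged bytes can look fine or broken depending on the encoding used to read them. It assumes the file was saved as windows-1252 and that the page is being displayed as UTF-8; the thread does not confirm which encoding the browser actually picked, and the helper name `decode_as` is made up for the example.

```python
def decode_as(raw: bytes, encoding: str) -> str:
    """Decode bytes the way a viewer would, replacing undecodable bytes."""
    return raw.decode(encoding, errors="replace")


# Non-breaking space, bullet and curly apostrophe stored as windows-1252 bytes.
sample = "tab\u00a0stop \u2022 it\u2019s".encode("windows-1252")

# Read with the wrong encoding: each of the three special bytes is invalid
# UTF-8 and becomes U+FFFD, which browsers typically draw as a question mark.
wrong = decode_as(sample, "utf-8")

# Read with the encoding the file was saved in: the text comes back intact.
right = decode_as(sample, "windows-1252")
```

The bytes in `sample` are never modified. Only the interpretation changes, which matches the observation that the file saved back to disk looks correct.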

Thanks,

Kevin

hannes
Champ in-the-making
Thank you, Kevin, for your prompt reply, and bensai as well.
Things are in order now.