
HTML content management: save as MHT file?

mattv
Champ in-the-making
Hi all,

I'd like some advice on the following subject: I often find very interesting articles on the internet, and it would be very useful to save their content and store it in Alfresco, both for later use and so that it gets indexed.

Storing a simple HTML page is easy, but how would you handle a page that has images attached? I'm not interested in having these images managed independently in Alfresco.

I often save pages using the wonderful ScrapBook Firefox extension:
https://addons.mozilla.org/en-US/firefox/addon/427
My dream would be to have the folder where ScrapBook saves its pages, along with its hierarchy, point directly at an Alfresco space exposed through CIFS.

Another way (maybe more realistic for now) would be to use Internet Explorer to save pages as web archives, where the image files and HTML content are stored in a single MHT file.
I tried this and uploaded an MHT file into Alfresco. Alfresco suggests the 'octet stream' MIME type. With that type, the document never appears in search results, even when I search for keywords it contains.

When I set the type to HTML, it gets indexed correctly. But when I try to open it with IE directly from the Alfresco web client, IE doesn't recognize it as an MHT file (even though the source shows a multipart MIME type) and displays it more or less as an ugly text file. To view it correctly, I need to save it to disk and then open it with IE. Perhaps I need to choose a better MIME type when uploading the document?
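Since an MHT file is essentially a MIME multipart message, one possible workaround for the indexing side would be to pull the HTML part out of the archive before uploading. Here is a minimal sketch using only Python's standard `email` package. The sample archive, the `extract_html` helper, and the example.com URLs are made up for illustration; real files saved by IE may differ in headers and encodings, and nothing here is specific to Alfresco.

```python
import email
from email import policy

# Tiny hand-written MHT-like archive (hypothetical content, for illustration only).
SAMPLE_MHT = (
    b"From: <Saved by a browser>\r\n"
    b"Subject: Example page\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; type="text/html"; boundary="----=_Part_0"\r\n'
    b"\r\n"
    b"------=_Part_0\r\n"
    b'Content-Type: text/html; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"Content-Location: http://example.com/article.html\r\n"
    b"\r\n"
    b"<html><body><h1>Hello</h1><img src=3D\"pic.gif\"></body></html>\r\n"
    b"------=_Part_0\r\n"
    b"Content-Type: image/gif\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"Content-Location: http://example.com/pic.gif\r\n"
    b"\r\n"
    b"R0lGODlhAQABAAAAACw=\r\n"
    b"------=_Part_0--\r\n"
)


def extract_html(mht_bytes):
    """Return the first text/html part of an MHT archive as a string, or None."""
    msg = email.message_from_bytes(mht_bytes, policy=policy.default)
    # walk() visits the container first, then each embedded part in order.
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # get_content() undoes the transfer encoding and decodes the charset.
            return part.get_content()
    return None


if __name__ == "__main__":
    print(extract_html(SAMPLE_MHT))
```

The extracted HTML could then be stored or indexed as a plain HTML document, at the cost of losing the embedded images, so this only helps with search, not with viewing.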

Do you have any suggestions on this? How would you store saved web content?

Thanks.
1 REPLY

mattv
Champ in-the-making
For those interested, I found an intermediate solution: I save the web page as a PDF document by "printing" it with pdf995 (http://www.pdf995.com).