Read document content (doc, docx, odt)
10-28-2009 12:15 PM
In my first application I read document content using the Alfresco Web Service API together with the Apache POI 3.5 and PDFBox libraries.
Now I'm developing an action for Alfresco 3.2, so I cannot use POI 3.5 because Alfresco already bundles version 3.1.
For PDF documents there is no problem, because I can pass the ContentReader input stream to PDFBox and convert it into plain text (a String).
But what about doc/docx/odt? How can I read the document content?
Thanks in advance,
Revenge
Labels: Archive
10-30-2009 06:15 AM
Check the OpenOffice API. Alfresco already includes it, and you can use it to read many document formats.
Regards
Ivo Costa
10-30-2009 10:03 AM
Word97TextExtractor extractor = new Word97TextExtractor(this._stream);
strContent = extractor.getText();
While looking for other classes I found UnoContentTransformer.java, which transforms OpenOffice-supported documents directly in the repository,
but it uses the net.sf.joott.uno package, which I can't find in the SDK.
Do you have any suggestion for solving this problem, or do you know which classes I could use?
Thanks,
Revenge
10-30-2009 11:11 AM
ContentReader reader = contentService.getReader(nodeRef, ContentModel.PROP_CONTENT);
if (reader != null && reader.exists())
{
    // get the transformer
    ContentTransformer transformer = contentService.getTransformer(reader.getMimetype(), MimetypeMap.MIMETYPE_TEXT_PLAIN);
    // is this transformer good enough?
    if (transformer == null)
    {
        // We have a transformer that is fast enough
        ContentWriter writer = contentService.getTempWriter();
        writer.setMimetype(MimetypeMap.MIMETYPE_TEXT_PLAIN);
        try
        {
            transformer.transform(reader, writer);
            // point the reader to the new-written content
            reader = writer.getReader();
            // Check that the reader is a view onto something concrete
            if (!reader.exists())
            {
                throw new ContentIOException("The transformation did not write any content, yet: \n"
                        + " transformer: " + transformer + "\n"
                        + " temp writer: " + writer);
            }
        }
        catch (ContentIOException e)
        {
        }
    }
}
11-02-2009 06:19 AM
I changed
if (transformer == null)
to
if (transformer != null)
and then
reader = writer.getReader();
// Check that the reader is a view onto something concrete
if (!reader.exists())
{
    throw new ContentIOException("The transformation did not write any content, yet: \n"
            + " transformer: " + transformer + "\n"
            + " temp writer: " + writer);
}
else
{
    content = reader.getContentString();
}
At first I had ruled out this transformer because I thought it worked only inside the repository. It does work that way, but on temporary files, so I can get a reader based on the temp file. Once you posted the code, I understood how it works.
Thanks very much!
Bye,
Revenge
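For later readers: the corrected control flow from the posts above (look up a transformer, proceed only if one was found, write to a temporary target, read the text back) can be tried without a running repository. The sketch below is only an illustration under stated assumptions. `TransformSketch`, `Transformer`, `REGISTRY` and `extractText` are hypothetical stand-ins invented here and are not Alfresco classes; the toy HTML "transformer" just strips tags.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Stand-alone sketch of the "find transformer, write to temp target,
 * read text back" flow. Every type here is a hypothetical stand-in,
 * not part of the Alfresco API.
 */
class TransformSketch {

    /** Hypothetical stand-in for a source-to-plain-text transformer. */
    interface Transformer {
        void transform(String source, StringBuilder tempTarget);
    }

    /** Hypothetical registry keyed by source mimetype; the target is always plain text. */
    static final Map<String, Transformer> REGISTRY = new HashMap<>();

    static {
        // Toy transformer: removes anything that looks like a tag.
        REGISTRY.put("text/html", (src, out) -> out.append(src.replaceAll("<[^>]*>", "")));
    }

    /** Returns the extracted text, or null when no transformer is registered. */
    static String extractText(String mimetype, String source) {
        Transformer transformer = REGISTRY.get(mimetype);
        // Corrected check: proceed only when a transformer WAS found.
        if (transformer != null) {
            StringBuilder tempTarget = new StringBuilder(); // plays the role of the temp writer
            transformer.transform(source, tempTarget);
            if (tempTarget.length() == 0) {
                throw new IllegalStateException("The transformation did not write any content");
            }
            return tempTarget.toString(); // plays the role of reading the content string
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(extractText("text/html", "<p>Hello</p>"));
        System.out.println(extractText("application/x-unknown", "data"));
    }
}
```

With the inverted check (`== null`), the body would run only when no transformer exists and fail with a NullPointerException, which is why flipping the condition was the fix.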
11-05-2009 04:36 AM
The "net.sf.joott.uno" package you didn't find is part of the OpenOffice API,
just in case you need something more complicated.
11-05-2009 05:00 AM
07-15-2010 08:43 AM
I'm doing (or trying to do) something similar, and I have some questions about that code.
I want to transform HTML, which I obtain from a JSP, to Word format, because I'm trying to create a document in Alfresco through the web services API. I need to do the transformation before creating the Word document, but I'm stuck on that.
I create a node like this:
Store storeRef = new Store(Constants.WORKSPACE_STORE, "SpacesStore");
ParentReference companyHomeParent = new ParentReference(storeRef, null, "/app:company_home", Constants.ASSOC_CONTAINS, null);
companyHomeParent.setChildName("cm:" + name);
String id = companyHomeParent.getUuid();
Reference nodeRef = new Reference(storeRef, id, null);
and then I tried to test the code posted earlier in this thread, but I don't understand what contentService refers to. If someone can help me, I'll be very thankful!
ContentReader reader = contentService.getReader(nodeRef, ContentModel.PROP_CONTENT);
if (reader != null && reader.exists())
{
    // get the transformer
    ContentTransformer transformer = contentService.getTransformer(reader.getMimetype(), MimetypeMap.MIMETYPE_TEXT_PLAIN);
    // is this transformer good enough?
    if (transformer == null)
    {
        // We have a transformer that is fast enough
        ContentWriter writer = contentService.getTempWriter();
        writer.setMimetype(MimetypeMap.MIMETYPE_TEXT_PLAIN);
        try
        {
            transformer.transform(reader, writer);
            // point the reader to the new-written content
            reader = writer.getReader();
            // Check that the reader is a view onto something concrete
            if (!reader.exists())
            {
                throw new ContentIOException("The transformation did not write any content, yet: \n"
                        + " transformer: " + transformer + "\n"
                        + " temp writer: " + writer);
            }
        }
        catch (ContentIOException e)
        {
        }
    }
}
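On the contentService question: as far as I can tell, in the snippet above it is a field of a class running inside the repository (such as a custom action), and the container assigns the repository's content service to it through a setter. A remote web services client would not have that object available. The sketch below only illustrates the setter-injection pattern. `InjectionSketch`, its nested `ContentService` interface, `ReadContentAction` and `execute` are hypothetical names invented here, not Alfresco's API, and the wiring is done by hand instead of by a container.

```java
import java.util.Map;

/** Setter-injection sketch; all names are hypothetical stand-ins. */
class InjectionSketch {

    /** Hypothetical stand-in for a repository-side content service. */
    interface ContentService {
        String read(String nodeId);
    }

    /** Hypothetical action; a container would call the setter at startup. */
    static class ReadContentAction {
        private ContentService contentService;

        public void setContentService(ContentService contentService) {
            this.contentService = contentService;
        }

        public String execute(String nodeId) {
            if (contentService == null) {
                throw new IllegalStateException("contentService was never injected");
            }
            return contentService.read(nodeId);
        }
    }

    public static void main(String[] args) {
        // In-memory map standing in for stored content.
        Map<String, String> store = Map.of("node-1", "Hello from the store");
        ReadContentAction action = new ReadContentAction();
        action.setContentService(store::get); // manual wiring in place of the container
        System.out.println(action.execute("node-1"));
    }
}
```

If the setter is never called, the field stays null, which matches the confusion of a variable that seems to come from nowhere.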
07-15-2010 08:44 AM
08-25-2014 05:36 AM