I would like to know whether anyone has successfully integrated Alfresco/Share with Apache Stanbol for semantic information extraction and automatic tagging of content with semantic data (tags).
My environment is fairly straightforward:
1. I have a repository of ~75GB of proprietary and sensitive information.
2. I share this repository with my clients/associates to support a number of strategic and operational business processes.
3. The repository is almost exclusively text (PDF, DOC/DOCX) and consists of unstructured data.
4. Effectively 0% of these documents have been tagged in any way.
So, I want to be able to:
1. Configure an Apache Stanbol server in-house.
2. Run my entire repository, or individual folders within it, through Stanbol as a batch.
3. Keep the whole setup self-contained, with no access to the internet.
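The batch requirement above could be prototyped along the following lines. This is a minimal Python sketch, not a tested integration: it assumes the repository files are reachable on disk (for example an export or a mounted folder, not fetched through Alfresco's APIs), that Stanbol's enhancer is served at the hypothetical local URL `http://localhost:8080/enhancer`, and that the enhancement chain includes a text-extraction engine (such as Stanbol's Tika-based engine) able to read PDF and Word files; if it does not, extract the text first and post it as `text/plain`. The helper names (`is_local_endpoint`, `iter_documents`, `build_request`, `enhance`, `enhance_folder`) are hypothetical. The endpoint check refuses anything other than `localhost` or a literal loopback/private IP, to match the no-internet constraint.

```python
"""Minimal sketch: batch-send repository files to an in-house Apache Stanbol enhancer."""
import ipaddress
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# Assumed local Stanbol endpoint; adjust host, port and chain path to your install.
STANBOL_ENHANCER_URL = "http://localhost:8080/enhancer"

# Repository file types (PDF, DOC, DOCX) mapped to the MIME type sent to Stanbol.
MIME_TYPES = {
    ".pdf": "application/pdf",
    ".doc": "application/msword",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def is_local_endpoint(url: str) -> bool:
    """Accept only 'localhost' or literal loopback/private IPs, so content stays in-house."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Other hostnames are rejected rather than resolved via DNS.
        return False
    return addr.is_loopback or addr.is_private


def iter_documents(root):
    """Return all PDF/Word files under root (recursively), in sorted order."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in MIME_TYPES
    )


def build_request(path: Path, url: str) -> urllib.request.Request:
    """Build a POST carrying the raw file bytes, asking for Turtle-serialized RDF back."""
    return urllib.request.Request(
        url,
        data=path.read_bytes(),
        headers={
            "Content-Type": MIME_TYPES[path.suffix.lower()],
            "Accept": "text/turtle",
        },
        method="POST",
    )


def enhance(path: Path, url: str = STANBOL_ENHANCER_URL, timeout: float = 300) -> str:
    """Send one file to the enhancer and return the RDF enhancement results as text."""
    with urllib.request.urlopen(build_request(path, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def enhance_folder(root, url: str = STANBOL_ENHANCER_URL):
    """Check the endpoint eagerly, then enhance documents one at a time as the caller iterates."""
    if not is_local_endpoint(url):
        raise ValueError(f"refusing to send content to non-local endpoint: {url}")
    return ((path, enhance(path, url)) for path in iter_documents(root))


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python stanbol_batch.py /path/to/exported/folder
    for doc, rdf in enhance_folder(sys.argv[1]):
        print(f"{doc}: {len(rdf)} characters of RDF")
```

Because `enhance_folder` yields results lazily, a 75GB repository is processed one file at a time rather than held in memory. Writing the returned entity annotations back into Alfresco as tags is a separate step that this sketch does not attempt.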
From the links I posted above, no clear account of anyone actually integrating Apache Stanbol with Alfresco CE emerges. In one of these threads, someone stated that Zaizi was working towards an open-source Stanbol/Alfresco solution, but I have not seen any evidence of this.
I understand that Semantics4Alfresco, for example, provides some semantic tagging capability by building on OpenCalais, but (again) my restrictions prevent the use of URL-based APIs or any other method that would take data out of my secure server space (Internet baaaaad...).
So, here are a few questions:
1. Has anyone reading this successfully integrated Apache Stanbol and Alfresco CE?
2. Are you willing to share your development path here or with me privately?
3. Can anyone from Zaizi comment on the status of your Stanbol solution?
Many thanks, and please feel free to PM me if you prefer.
Trevor
We have a similar use case in which we need an in-house solution. Were you successful in implementing it with Alfresco and Apache Stanbol?