Alfresco and Apache Stanbol (semantics)

ttownsend
Champ on-the-rise
Hello all,

I am looking to see if anyone has experience with successfully integrating Alfresco/Share and Apache Stanbol for semantic information extraction and auto-tagging of content with semantic data (tags).

Searching the whole of the Alfresco forums for "semantics" brought up only two threads:
1. http://forums.alfresco.com/forum/developer-discussions/repository-services/alfresco-auto-tagging-072...
2. http://forums.alfresco.com/forum/developer-discussions/add-ons/semantic-search-alfresco-05302008-140...

My environment is fairly straightforward:
1. I have a repository of ~75GB of proprietary and sensitive information
2. I share this repository with my clients/associates to support a number of strategic and operational business processes
3. The repository is almost exclusively unstructured text (pdf, doc/docx)
4. Effectively, 0% of these documents have been tagged in any way

So, I wish to be able to:
1. Configure an Apache Stanbol server in-house
2. Process my entire repository, or individual folders within it, as a batch
3. Remain entirely self-contained, with no access to the internet

From the links I posted above, no clear experiences actually integrating Apache Stanbol with Alfresco CE emerge.
In one of these threads, someone stated that Zaizi was working towards an open-source Stanbol/Alfresco solution, but I've not seen any evidence of this.

I understand that, for example, Semantics4Alfresco looks at providing some semantic tagging capability by extending OpenCalais for this purpose, but (again) my restrictions prevent the use of URL-based APIs or any other method that would take data/information out of my secure server space (Internet baaaaad….).

So, here are a few questions:
1. Has anyone reading this successfully integrated Apache Stanbol and Alfresco CE?
2. Are you willing to share your development path here or with me privately?
3. Can anyone from Zaizi comment on the status of your Stanbol solution?

Many thanks and please feel free to PM me if you prefer.
Trevor

3 REPLIES

stevereiner
Champ in-the-making
The open source "semantics4alfresco" project has a port that uses Apache Stanbol instead of the OpenCalais service:
https://code.google.com/p/semantics4alfresco
http://addons.alfresco.com/addons/apache-stanbol-integration-port-opencalais-integration
It auto-tags content and has a Share UI. It can do what you want with an in-house Apache Stanbol server. Its action can be used in a content rule on a folder or through its Share menus.
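
For anyone who wants to check what an in-house Stanbol server returns before wiring it into Alfresco, here is a minimal Python sketch. It is not the add-on's code. It assumes a default Stanbol launcher at `http://localhost:8080/enhancer` (adjust for your install), and the function names (`build_enhancer_request`, `extract_entity_labels`, `suggest_tags`) are made up for illustration. The JSON-LD key naming can differ between Stanbol versions, so the parser just looks for any key ending in `entity-label`.

```python
import json
import urllib.request

# Assumption: a default local Stanbol launcher; change host/port as needed.
STANBOL_ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhancer_request(text, url=STANBOL_ENHANCER_URL):
    """Build a POST request that sends plain text to the Stanbol Enhancer."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": "application/json",  # ask for JSON-LD output
        },
        method="POST",
    )


def _literal(value):
    """Unwrap a JSON-LD literal such as {"@value": "Paris", "@language": "en"}."""
    if isinstance(value, dict):
        return value.get("@value")
    return value


def extract_entity_labels(jsonld):
    """Collect distinct entity labels from the "@graph" of an enhancer response.

    Matches any key ending in "entity-label" so that both prefixed and
    full-URI property names are handled.
    """
    labels = set()
    for node in jsonld.get("@graph", []):
        if not isinstance(node, dict):
            continue
        for key, value in node.items():
            if not key.endswith("entity-label"):
                continue
            values = value if isinstance(value, list) else [value]
            for item in values:
                label = _literal(item)
                if isinstance(label, str):
                    labels.add(label)
    return sorted(labels)


def suggest_tags(text, url=STANBOL_ENHANCER_URL):
    """Send text to Stanbol and return candidate tags (needs a running server)."""
    request = build_enhancer_request(text, url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_entity_labels(json.loads(response.read().decode("utf-8")))
```

With a Stanbol instance running locally, `suggest_tags("some extracted document text")` would return the labels the enhancement chain found. Nothing leaves the machine, since the only network call is to the local enhancer URL.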

Regarding Zaizi, they are holding a webinar tomorrow morning about their solution:
http://www.zaizi.com/events/extracting-knowledge-from-unstructured-documents-in-alfresco

Thanks Steve - I've already signed up for the Zaizi webinar, so I'm hoping to learn more about their path on this, too.

Still very curious to know if anyone in the Alfresco community has done similar work.

Cheers,
Trevor

sramani
Champ in-the-making
Hi,

We have a similar use case where we need to have an in-house solution.  Were you successful in implementing the solution with Alfresco and Apache Stanbol?

Thanks