Extracting all links in published content

dynamolalit
Champ on-the-rise
Hi,

I am using Alfresco 3.1.1 on JBoss AS 4.2.3. I have a requirement to retrieve all the links present in a piece of XML content that has been published, and to update a database table with them. I have gone through

http://wiki.alfresco.com/wiki/Metadata_Extraction#XML_Meta-data_Extractor_Configuration_for_WCM

but it is described for a Tomcat setup only, not for JBoss. I also found the UriExtractor API in Alfresco, which gives you all the links present in a piece of content as a map, as described here

http://dev.alfresco.com/resource/docs/java/link-validation/org/alfresco/linkvalidation/UriExtractor.....

However, I cannot work out how to call this API on my published content so that I can collect all the links and update the DB table.

Does anyone have any idea? This is very critical for my project. Is there any other way to achieve the same thing?
1 REPLY

dynamolalit
Champ on-the-rise
Hi,

I have found a way to do this using the HTML Parser API at

http://htmlparser.sourceforge.net/javadoc/overview-summary.html

and the UriExtractor class at

http://svn.alfresco.com/repos/alfresco-open-mirror/alfresco/HEAD/root/projects/link-validation/sourc....

You have to tweak UriExtractor so that you can pass the HTML content as a String to the extractURIs() method and get all the http links back as a Map.
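To illustrate the general idea of "HTML String in, Map of http links out", here is a minimal sketch. It is not the tweaked UriExtractor and it does not use the HTML Parser library; it swaps in a plain java.util.regex match on href attributes so that it runs with the JDK alone. The class name LinkSketch, the method extractLinks and the choice of map value (an occurrence count) are all assumptions made for illustration, and it assumes Java 8 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT Alfresco's UriExtractor.
final class LinkSketch {

    // Matches href="..." or href='...' where the value starts with http:// or https://.
    // A regex is only a rough stand-in for a real HTML parser.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*([\"'])(https?://.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Returns each absolute http(s) link found in the HTML, mapped to how often it occurs.
    // The map value is an assumption; the real extractURIs() may store something else.
    static Map<String, Integer> extractLinks(String html) {
        Map<String, Integer> links = new LinkedHashMap<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.merge(m.group(2), 1, Integer::sum);
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"http://example.com/a\">A</a> "
                + "<a href='http://example.com/a'>A again</a> "
                + "<a href=\"/local\">local</a></p>";
        // Relative links such as /local are skipped by this sketch.
        System.out.println(extractLinks(html));
    }
}
```

The resulting map can then be iterated to write one row per link into the DB table. For real published pages, a proper HTML parser is likely to cope better with malformed markup than this regex does.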

Alternatively, you can use the XML extractor to extract http links from the XML that is being published.
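If the Alfresco XML extractor configuration is hard to apply on your setup, the same result can be approximated directly on the XML. The sketch below is not the Alfresco XML metadata extractor; it is a plain JDK DOM walk that collects anything that looks like an http(s) URL from attribute values and text nodes. The class name XmlLinkSketch, the method extractLinks and the URL regex are assumptions for illustration, and it assumes Java 8 or later.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, NOT the Alfresco XML metadata extractor.
final class XmlLinkSketch {

    // Rough pattern for an http(s) URL: runs until whitespace, a quote or an angle bracket.
    private static final Pattern URL = Pattern.compile("https?://[^\\s\"'<>]+");

    // Parses the XML string and returns the distinct URLs in document order.
    static Set<String> extractLinks(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPE declarations to avoid external-entity surprises.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Set<String> links = new LinkedHashSet<>();
        collect(doc.getDocumentElement(), links);
        return links;
    }

    // Visits a node: its own text, then its attributes, then its children.
    private static void collect(Node node, Set<String> links) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            addMatches(node.getNodeValue(), links);
        }
        NamedNodeMap attrs = node.getAttributes(); // null for non-element nodes
        if (attrs != null) {
            for (int i = 0; i < attrs.getLength(); i++) {
                addMatches(attrs.item(i).getNodeValue(), links);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), links);
        }
    }

    private static void addMatches(String text, Set<String> links) {
        if (text == null) {
            return;
        }
        Matcher m = URL.matcher(text);
        while (m.find()) {
            links.add(m.group());
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<page><link href=\"http://example.com/x\"/>"
                + "<body>See http://example.org/y for details.</body></page>";
        System.out.println(extractLinks(xml));
    }
}
```

Note that namespace declarations (xmlns attributes) are ordinary attributes to this walk, so namespace URIs would be collected as well; filter them out if that matters for your table.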