Extracting all links in published content

dynamolalit — Tue, 14 Jul 2009 12:37:07 GMT

Hi, I am using Alfresco 3.1.1 on JBoss AS 4.2.3. I need to retrieve all the links present in a piece of published content (XML) and update a table with them. I have gone through http://wiki.alfresco.com/wiki/Metadata_Extraction#XML_Met

Re: Extracting all links in published content

dynamolalit — Mon, 20 Jul 2009 07:50:51 GMT

Hi,

I found a way to do this using the HTML Parser API at

http://htmlparser.sourceforge.net/javadoc/overview-summary.html

and the UriExtractor class at

http://svn.alfresco.com/repos/alfresco-open-mirror/alfresco/HEAD/root/projects/link-validation/source/java/org/alfresco/linkvalidation/UriExtractor.java

You have to tweak UriExtractor so that you can pass the HTML content as a String to the extractURIs() method and get all the HTTP links back as a Map.
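
To give a rough idea of the "String in, Map of links out" shape, here is a minimal sketch. It is not the tweaked UriExtractor and it does not use the HTML Parser library: it swaps in a plain java.util.regex pattern so that it runs on a bare JDK. The class name HtmlLinkSketch, the method extractHttpLinks() and the choice of Map value (how many times each link occurs) are all assumptions made for illustration. A regex only catches quoted href/src values, so a real parser is the better choice for messy HTML.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; this is NOT Alfresco's UriExtractor.
class HtmlLinkSketch {

    // Quoted href=... or src=... values that start with http:// or https://.
    // Group 1 is the opening quote, group 2 is the URL itself.
    private static final Pattern HTTP_ATTR = Pattern.compile(
            "\\b(?:href|src)\\s*=\\s*([\"'])(https?://.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns each absolute HTTP(S) link mapped to its number of occurrences,
    // in order of first appearance. The Map value is an assumption; the
    // original post does not say what the Map holds.
    static Map<String, Integer> extractHttpLinks(String html) {
        Map<String, Integer> links = new LinkedHashMap<>();
        Matcher matcher = HTTP_ATTR.matcher(html);
        while (matcher.find()) {
            links.merge(matcher.group(2), 1, Integer::sum);
        }
        return links;
    }
}
```

Calling HtmlLinkSketch.extractHttpLinks(html) with the published content as a String gives you the Map, which you can then loop over to update your table. Relative links such as /local are skipped on purpose, since only HTTP links were asked for.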

Alternatively, you can use an XML extractor to pull the HTTP links out of the XML that is being published.
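
For the XML route, something along these lines might work as a starting point. It is only a sketch using the JDK's built-in DOM parser, not Alfresco's XML metadata extracter. The class name XmlLinkSketch and the method extractHttpLinks() are made up for illustration, and it assumes a link is any attribute value or text node that starts with http:// or https://.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; not part of Alfresco.
class XmlLinkSketch {

    // Parses the XML string and returns every attribute value or text node
    // that starts with http:// or https://, in tree-walk order.
    static List<String> extractHttpLinks(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            List<String> links = new ArrayList<>();
            collect(doc.getDocumentElement(), links);
            return links;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML content", e);
        }
    }

    // Depth-first walk over elements, their attributes and their text nodes.
    private static void collect(Node node, List<String> links) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            addIfHttp(node.getNodeValue(), links);
            return;
        }
        NamedNodeMap attributes = node.getAttributes();
        if (attributes != null) {
            for (int i = 0; i < attributes.getLength(); i++) {
                addIfHttp(attributes.item(i).getNodeValue(), links);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), links);
        }
    }

    private static void addIfHttp(String value, List<String> links) {
        String trimmed = value.trim();
        if (trimmed.startsWith("http://") || trimmed.startsWith("https://")) {
            links.add(trimmed);
        }
    }
}
```

Links buried in the middle of a longer text node, or inside CDATA sections, are not picked up by this sketch, so you would need to extend addIfHttp() or the node-type check if your content has those.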
