Hyland Connect

samuel_penn · ‎09-10-2008

Hi all,

What I would like to be able to do in WCM, is create pages which are derived from the content of the entire site. For example, a "recentnews.html" which contains a list of the most recent news articles. Currently, the easiest way to do this seems to be to have a web form for news articles, with two rendition templates - one for the news article itself, and a second which creates the "recentnews.html" file by calling parseXMLDocuments() to find the list of all news articles.

This works fine if there's one user of the system. Each time a news article is created or updated, the recentnews.html is updated automatically. Which is good.

With multiple users however, it breaks, since the recentnews.html becomes locked as soon as the first user makes a change, and other users can't change the version in their sandbox (and get errors if they create/edit a news article because one of the rendition targets is locked).

Is there any way of marking a page as being derived from other content, so that it isn't locked and each sandbox effectively has its own version?

Failing that, is there another way of achieving this goal (in Alfresco 2.2)? Methods I can think of:
* Define a recentnews web form, and have users manually refresh it.
* Have the page dynamically generated each time it is viewed, but this could be slow, especially since there will probably be lots of them (navigation bars and the like have the same problem).

I guess this post is as much a 'wish list' as it is a 'how do I?'. If there was a way to generate such derived content from events which are fired when a sandbox changes, then again that could be useful.

Thanks,

Sam.

pmonks · ‎09-11-2008

This has been covered previously, eg. at http://forums.alfresco.com/en/viewtopic.php?f=29&t=13865:

The best solution at this time is to simply punt the problem to the delivery tier, and compose "multi input" pages (such as the job posting listing page) at request time. If performance is a concern, this can be made efficient by pre-generating HTML snippets for each job listing (both the table-of-contents snippet and the body snippet) and then simply including those snippets at request time to compose the final page impression (you might use server side includes for example, or <jsp:includes> or what-have-you). This doesn't have the dependency or race condition problems described above, since it's a simple 1-to-1 mapping of input Web Form content to output renditions (which are now a larger number of snippets, rather than a small number of fully composed pages).

Cheers,
Peter

samuel_penn · ‎09-11-2008

Okay, but I'm not sure what you mean by "multi-input pages". I think what you mean is to have the main pages use includes to bring the navigation/recent news etc content in from snippet pages. However, unless I'm being dense, this doesn't solve the issue of how these snippets are generated.

In my example, recentnews.html is effectively a snippet page, which contains a <div>, with some headings, links and summary text for each news article. The links would point to the full news article page. The front page of the site (and any other page which wants to include it), would <jsp:include> the recentnews.html into a sidebar. Ditto for navigation bars and the like - they're all HTML fragments which are included into the pages which need them.

If recentnews.html contains the most recent 5 news stories, how do we generate that? We either pre-generate it at authoring time (which runs into the serialisation problem you mention, and which I've run into), or we build it dynamically at view time, which could be a performance hit, since we'd have to search all existing news articles, and find the most recent 5.

Am I making sense, or just missing something obvious?

Thanks,
Sam.

pmonks · ‎09-11-2008

"Multi-input pages" was the phrase used in that other thread to describe any page that sources its content from more than one content item. In your example the recentnews.html page would be an example of a "multi-input page", since it gets composed from multiple individual news article content items (amongst other things).

When looking for opportunities to pre-generate various parts of a site, rather than using page level granularity (which, for the reasons discussed in the other thread, isn't practical for all pages, particularly multi-input pages), it's best to look for opportunities to generate sub-page "snippets" of HTML that can then be dynamically (ie. at request time) composed together to form the final full page impression. This still gives a significant performance benefit since there's very little conversion from XML to HTML occurring at request time - the HTML itself is mostly pre-generated and simply needs to be stitched together (via includes) in the right combinations at request time. What's more, there are a variety of different ways to include HTML snippets (from jsp:include to SSIs) that give some options for tuning performance.

Now as you point out the bit that's still dynamic (ie. there is some logic beyond simple includes that has to execute at request time) is the logic that determines which snippets meet the criteria for inclusion in the recentnews.html page. This is where you need some kind of queryable content repository in the delivery environment eg. an ASR (see http://wiki.alfresco.com/wiki/ASR and http://wiki.alfresco.com/wiki/Deployment#Alfresco_To_Alfresco_Deployment_.28ASR.29 for more details) and the webapp would query this repository at request time in order to determine which content items it needs to include into the final page.

If that delivery side content repository is an ASR, this would be accomplished by calling a Web Script (http://wiki.alfresco.com/wiki/Web_Scripts) that executes a Lucene search and returns some kind of reference to the top-N matching items. In many cases this would be a list of paths to the pre-generated HTML snippets for each of the matching items, which the webapp could then simply loop through and include one by one (thereby minimising the processing that's occurring in the webapp yet further). These HTML snippet files would be located on disk (to maximise inclusion performance), having been deployed there via an FSR.
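As a rough illustration of the "loop through and include one by one" step, here is a minimal Java sketch. It assumes the Web Script has already returned a list of snippet paths relative to the directory the FSR deploys to; the class name, method name and file names are hypothetical and are not part of any Alfresco API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

final class SnippetComposer {

    /** Concatenates the snippet files found under root, in the order given. */
    static String compose(Path root, List<String> relativePaths) throws IOException {
        Path base = root.toAbsolutePath().normalize();
        StringBuilder page = new StringBuilder();
        for (String relative : relativePaths) {
            Path snippet = base.resolve(relative).normalize();
            // Skip paths that escape the snippet root, and files that are missing.
            if (!snippet.startsWith(base) || !Files.isRegularFile(snippet)) {
                continue;
            }
            page.append(new String(Files.readAllBytes(snippet), StandardCharsets.UTF_8));
        }
        return page.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the deployed snippet directory (hypothetical file names).
        Path root = Files.createTempDirectory("snippets");
        Files.write(root.resolve("news-1.html"), "<li>First</li>".getBytes(StandardCharsets.UTF_8));
        Files.write(root.resolve("news-2.html"), "<li>Second</li>".getBytes(StandardCharsets.UTF_8));

        // The order of the list (e.g. newest first) decides the order on the page.
        System.out.println(compose(root, Arrays.asList("news-2.html", "news-1.html")));
    }
}
```

In a real webapp the same loop would more likely be a series of includes in the JSP itself; the point is only that request-time work is reduced to reading files in the order the query returned them.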

Now depending on the size of your content set, the complexity of the queries used to determine which content items to include in the page, etc. etc., you may find that performance needs to be tuned further. There are two complementary approaches to this:

* Tuning the ASR
* Adding a caching layer in your webapp that caches Web Script calls
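For the second approach, a very small time-based cache might look like the Java sketch below. It is an assumption-laden illustration rather than a recommended implementation: the class name, the TTL policy and the fetcher function (which stands in for the actual HTTP call to the Web Script) are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class WebScriptCache {

    private static final class Entry {
        final String body;
        final long expiresAt;

        Entry(String body, long expiresAt) {
            this.body = body;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentHashMap<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final Function<String, String> fetcher;

    /** fetcher performs the real Web Script call for a given URL. */
    WebScriptCache(long ttlMillis, Function<String, String> fetcher) {
        this.ttlMillis = ttlMillis;
        this.fetcher = fetcher;
    }

    /** Returns the cached response for url, refetching once the entry has expired. */
    String get(String url) {
        long now = System.currentTimeMillis();
        Entry entry = entries.get(url);
        if (entry == null || entry.expiresAt <= now) {
            // Two threads may both refetch here; that is tolerable for a sketch.
            entry = new Entry(fetcher.apply(url), now + ttlMillis);
            entries.put(url, entry);
        }
        return entry.body;
    }

    public static void main(String[] args) {
        // Fake fetcher standing in for the HTTP request to the Web Script.
        WebScriptCache cache = new WebScriptCache(60_000, url -> "response for " + url);
        System.out.println(cache.get("/alfresco/service/sample/recentnews"));
        System.out.println(cache.get("/alfresco/service/sample/recentnews")); // served from the cache
    }
}
```

The TTL bounds how stale a listing such as recentnews.html can become, so it trades freshness against the number of Lucene queries that reach the ASR.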

Cheers,
Peter

samuel_penn · ‎09-12-2008

Okay, thanks. I guess I need to look into how best to cache the results then.

Sam.

pmonks · ‎09-12-2008

FWIW it's also a good idea to look at caching final page impressions if possible as well - that gives even better results than just caching the output from Web Script calls (and in fact there's no reason not to do both). OSCache (http://www.opensymphony.com/oscache/) has a very handy servlet filter that can do this (http://www.opensymphony.com/oscache/wiki/CacheFilter.html).

Cheers,
Peter

steventux · ‎10-03-2008

Is it possible to call a web script via an SSI or jsp:include? For example if the web script returned the HTML content of items for a search or latest by date, can the script be executed via some sort of page include? I have attempted this but I believe the HTTP basic auth is killing it (a c:import tag complains of the response code 401). I've read on the Web Scripts Wiki pages that they can be run as a specific user - is this the runas attribute in the desc.xml <authentication/> node? Also there is brief mention of a ticket being appended to the url but doesn't give much detail on this.

samuel_penn · ‎10-03-2008

Yes, this is possible. I've managed to do it via a nasty hack, by building up a URLConnection. I have a JSP page which reads a session variable to get the URL (including parameters) to call, sets suitable authentication on the connection, calls the script, and just writes out the results. The code below is what I wrote to get something working - there are better ways to store the user/password etc. (the Authorization value is the base64-encoded version of the string "user:password").


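    // Assumes java.net.URL, java.net.URLConnection, java.io.InputStream and
    // java.io.BufferedInputStream are imported via the page directive.
    Object      scriptAttr = session.getAttribute("scriptName");
    String      scriptName = (scriptAttr != null) ? scriptAttr.toString() : null;

    if (scriptName != null && scriptName.length() > 0) {
        try {
            URL                     url = new URL("http://localhost:8080/alfresco/service" + scriptName);
            URLConnection           connection = url.openConnection();
            // Base64 of "user:password" for HTTP basic auth.
            connection.setRequestProperty("Authorization", "Basic HURtaHJSYHFdDdW4=");
            InputStream             stream = connection.getInputStream();
            BufferedInputStream     in = new BufferedInputStream(stream);

            // Copy the web script response straight into the page output.
            int i;
            while ((i = in.read()) != -1) {
                out.write(i);
            }
            in.close();
            out.flush();

        } catch (Throwable e) {
            // Silently fail.
        }
    }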

I had to read a session variable, because there doesn't seem to be a way to pass a variable in a <jsp:include/>. You can pass constant parameters, but not variable parameters. So I set the session variable before including the page.

If anyone has a better solution, I'd love to see it.

Sam.

steventux · ‎10-03-2008

Thanks Sam, this got me started. I stuffed the Java code into /WEB-INF/tags/webScript.tag:

<%@ attribute name="server" required="true" %>
<%@ attribute name="port" required="true" %>
<%@ attribute name="path" required="true" %>
<%@ attribute name="queryString" required="false" %>
<%
   StringBuilder urlStr = new StringBuilder("http://");
   urlStr.append(server).append(":").append(port);
   urlStr.append("/alfresco/service");
   urlStr.append(path.indexOf("/") == 0 ? "" : "/").append(path);
   if (queryString != null) urlStr.append(queryString.indexOf("?") == 0 ? "" : "?").append(queryString);
   try {
      java.net.URL url = new java.net.URL(urlStr.toString());
      java.net.URLConnection connection = url.openConnection();
      connection.setRequestProperty("Authorization", "Basic YWRtaW46YWRtaW4=");
      java.io.BufferedInputStream in = new java.io.BufferedInputStream(connection.getInputStream());
      int i;
      while ((i = in.read()) != -1) {
         out.write(i);
      }
      out.flush();

   } catch (Throwable e) {
      // Silently fail.
   }
%>

Other alternatives are to use the Jakarta Scrape taglib as this caters for basic auth, or roll your own tag class - at least this would have the advantage of looking up application config for the Alfresco Web Script server hostname, port and security credentials.
Cheers
Steve
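To illustrate the "look up application config" idea, here is a minimal Java sketch of a helper that a custom tag class could call. The class name and the property keys (alfresco.host, alfresco.port) are hypothetical; the only standard API used is java.util.Base64 and java.util.Properties. It also shows where the hard-coded header in the tag file comes from: "admin:admin" base64-encodes to YWRtaW46YWRtaW4=.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Properties;

final class WebScriptConfig {

    /** Builds the value of the HTTP Authorization header for basic auth. */
    static String basicAuthHeader(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    /** Builds the Web Script URL from config, mirroring the logic of the tag file. */
    static String serviceUrl(Properties config, String path, String queryString) {
        StringBuilder url = new StringBuilder("http://")
                .append(config.getProperty("alfresco.host", "localhost")) // hypothetical key
                .append(':')
                .append(config.getProperty("alfresco.port", "8080"))      // hypothetical key
                .append("/alfresco/service");
        url.append(path.startsWith("/") ? "" : "/").append(path);
        if (queryString != null && !queryString.isEmpty()) {
            url.append(queryString.startsWith("?") ? "" : "?").append(queryString);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("alfresco.host", "localhost");
        config.setProperty("alfresco.port", "8080");

        System.out.println(serviceUrl(config, "sample/recentnews", "max=5"));
        System.out.println(basicAuthHeader("admin", "admin"));
    }
}
```

Keeping the host, port and credentials in configuration means they can differ per environment and stay out of the JSP source.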

Derived page content