Hyland Connect

sbzoom · 05-03-2007

Hello again.

I have uploaded about 35 projects as "web projects" in my new Alfresco installation. And, not surprisingly, the virtual server falls over because it runs out of memory.

1) A bunch of the apps aren't even webapps and could easily be run through an Apache server instead of tomcat. Is it possible to make the virtual server NOT recognize certain apps?

2) Along those lines, can the virtual server be configured to look at some apps and not others? That way I could set up 2 or 3 virtual Tomcats (on different machines) and have certain instances preview certain apps. Possibly by setting up <Context> elements inside the <Host> element in the virtual-tomcat/conf/server.xml? I would funnel all the requests through one Apache instance, so the preview domain name would be constant. Is this separation possible?

Thanks again.

Charlie

sbzoom · 05-03-2007

Ok. It's not the <Context> element that I need to mess with. It's the <Host> element. By looking at AVMHost.java and AVMHostConfig.java I see what the <Host> element in the virtual Tomcat config is doing. It is looping through each directory in the "avm" mount and setting it up as a virtual host (or as just a <Context> element - still not 100% sure).

What I want to be able to do is give AVMHost.java or AVMHostConfig.java a list of folder paths to deploy. Like, I give it a list of 5 apps to deploy, maybe 10. This way it doesn't try to deploy all of them and run out of memory.

You can see it happening in the logs:


org.alfresco.catalina.host.AVMHostConfig.deployAllAVMwebappsInRepository

Deploy ALL avm webapps in repository. How can I make it deploy SOME?

Thanks.

Charlie

jcox · 05-04-2007

Charlie,

There are three short-term approaches you could take:

Approach [1]
—————
Obviously, there are limits to what you can do here,
but depending on what you're doing, it might be adequate.
You may be nowhere near running out of virtual memory,
but instead are hitting the artificial limit imposed on the JVM
by a configurable option. On Linux/Mac/Solaris you can increase
this limit by tuning the -Xmx flag:


#—————————————————————-
#  The -server   flag avoids problems on Linux and Mac.
#  The -Xmx512M flag gives the JVM ample memory.
#—————————————————————-
if ! [[ "${CATALINA_OPTS}" =~ '-server' ]]; then
    CATALINA_OPTS="$CATALINA_OPTS -server"
fi

if ! [[ "${CATALINA_OPTS}" =~ '-Xmx' ]]; then
    CATALINA_OPTS="$CATALINA_OPTS -Xmx512M"
fi

On Windows, if you've got the virtualization server installed
as a service, you can adjust the registry settings used by
the JVM with the Tomcat service executable itself. The tunables
on Windows are (thread stack size in KB, initial and maximum heap in MB):


  tomcat5 //US// --JvmSs NNN
  tomcat5 //US// --JvmMs NNN
  tomcat5 //US// --JvmMx NNN
  etc…

Or you could tune them all at once:

tomcat5 //US// --JvmSs NNN --JvmMs NNN --JvmMx NNN
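Whichever way you set it, it's worth confirming the JVM really got the larger heap. A minimal sketch using only the standard Runtime API; the class name HeapCheck is just for illustration. It reports the limits of whatever JVM runs it, so to check the virtualization server itself the same calls would have to run inside that JVM (or you could look at it with a JMX console such as jconsole).

```java
/** Reports the heap limits of the JVM this code runs in (HeapCheck is an illustrative name). */
class HeapCheck {
    /** Converts a byte count to whole megabytes, rounding down. */
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects -Xmx (Long.MAX_VALUE if there is no inherent limit)
        System.out.println("max heap (MB):         " + toMegabytes(rt.maxMemory()));
        // totalMemory() is the heap currently reserved, which starts near -Xms
        System.out.println("reserved heap (MB):    " + toMegabytes(rt.totalMemory()));
        // freeMemory() is the unused part of the reserved heap
        System.out.println("free of reserved (MB): " + toMegabytes(rt.freeMemory()));
    }
}
```

Running it as java -Xmx512M HeapCheck should report a max heap close to 512 MB; some JVMs report slightly less than the -Xmx value.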

Approach [2]
—————-
The feature you want is currently in the planning stage.
It's mentioned in the Virtualization Server FAQ:
http://wiki.alfresco.com/wiki/Virtualization_Server_FAQ#Multiple_virtualization_servers_per_webapp
To do this nicely will require some of the new 2.1 features in the backend
(such as the newly developed AttributeService), as well as GUI support,
and a nice way to configure things for both redundancy and work partitioning.

Speaking of that, here's what I've been thinking about so far:
all virtualization servers (tomcat, apache, etc) will register
themselves with the alfresco webapp and specify what they
are willing & unwilling to virtualize, along with other metadata
that will be useful for handling user/project preferences.
On the webapp side, there could also be project-specific and
user-specific preferences. To keep things simple, the default
config is to place every server in the "global" pool, and do
session-bound load balancing; however, with tuning the
admin will be able to route different web projects to different
servers, and/or to different sub-pools of servers based on the
meta info (e.g.: apache+php vs tomcat).   Ideally, we'll
also have user-level prefs that will allow advanced users
to override project level & global defaults. Eventually, I'd also
like to check heartbeats on the server pool and provide automatic
pool-aware failover in the web client.
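To make the routing part of that idea a little more concrete, here is a rough sketch: servers register the kind of content they'll virtualize and land in a global pool, an admin can pin a project to a sub-pool, and a given session always maps to the same server. Everything here (VirtServerPool, register, pinProject, route) is hypothetical and not an Alfresco API; network registration, heartbeats and failover are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical routing core for a pool of virtualization servers; not an Alfresco API. */
class VirtServerPool {
    /** A registered server and the kind of content it is willing to virtualize. */
    static final class Server {
        final String name;
        final String kind;   // e.g. "tomcat" or "apache+php"

        Server(String name, String kind) {
            this.name = name;
            this.kind = kind;
        }
    }

    private final List<Server> globalPool = new ArrayList<>();
    private final Map<String, String> projectToKind = new HashMap<>();

    /** By default every server that registers lands in the global pool. */
    void register(String name, String kind) {
        globalPool.add(new Server(name, kind));
    }

    /** Admin tuning: restrict a web project to the sub-pool of servers of one kind. */
    void pinProject(String project, String kind) {
        projectToKind.put(project, kind);
    }

    /** Picks a server for a project; the same session id always maps to the same server. */
    String route(String project, String sessionId) {
        String kind = projectToKind.get(project);   // null means "any server will do"
        List<Server> candidates = new ArrayList<>();
        for (Server s : globalPool) {
            if (kind == null || s.kind.equals(kind)) {
                candidates.add(s);
            }
        }
        if (candidates.isEmpty()) {
            throw new IllegalStateException("no registered server can virtualize " + project);
        }
        // Session-bound balancing: a deterministic hash of the session id selects the server
        return candidates.get(Math.floorMod(sessionId.hashCode(), candidates.size())).name;
    }
}
```

One consequence of hashing sessions over the candidate list: adding or removing a server remaps many sessions, which is part of why pool membership changes need the heartbeat and failover handling described above.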

Another thing I've been considering is the role of an ESB (Mule?)
in all of this. A general-purpose message bus would be extremely
useful, and pools of servers is a nice first client.

Anyway, the "wait for a while" approach will give you something
very nice & powerful, but other things are higher priority at the
moment (e.g.: link management), so I probably won't get to
this for a few months.   As I hope you can see, I *have* thought
about it a fair amount, and I regard it as an extremely important
core capability.   We will get there, but I want to do it right.
If you have questions, suggestions, and/or specific requirements
you'd like addressed please let me know.

Approach [3]
—————
Hacking the source probably means throwing away your work
(or maintaining it) when you upgrade.   However, the ability
to choose "swimming at your own risk" is one of the major
benefits of open source, so don't take my words of warning
as discouragement – I just want you to be aware that depending
on the level of integration you're shooting for, it might be a
fair amount of work for what's ultimately a stopgap.

There are a lot of fussy details to get right.   If you
don't have a good understanding of tomcat's internals,
alfresco's internals, transactions, spring, and so forth,
it might end up being a bigger project for you than
it seems.

With these caveats in mind, if you are going to hack
AVMHostConfig, the first thing you'll need to do is filter out
all repositories you don't want to virtualize.   Next, be aware
that repositories form hierarchies that reflect the transparent
overlay structure within the AVM.   These hierarchies allow
the virt server to scale horizontally across many users
per common staging area.   Next, it's important to consider
that updates to the virt server take place via JMX messages
and that while there *is* a registration process that takes
place, this registration is currently just expecting a single
virt server, not a collection.   The code was written with an
eye towards pools but none of it is there yet, and the webapp
does not support any sort of dynamic update when it comes
to registration and/or heartbeat failure.   There's other stuff
to worry about too, such as handling broadcasts gracefully
in the face of intermittent network failures, and so on.
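The point about hierarchies matters for any filter: a user sandbox overlays its staging store, so filtering store by store can leave a sandbox deployed without the layer beneath it. A minimal sketch that filters at project granularity and lists lower layers first, assuming sandbox stores are named after their staging store plus "--" segments (e.g. mysite--alice, mysite--alice--preview); verify that naming against your own repository. StoreFilter and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

/** Hypothetical filter; ASSUMES sandbox stores are named "<staging>--<more>". */
class StoreFilter {
    /** The staging store at the bottom of a store's overlay chain. */
    static String stagingOf(String storeName) {
        int cut = storeName.indexOf("--");
        return cut < 0 ? storeName : storeName.substring(0, cut);
    }

    /** Number of layers above staging: "mysite" is 0, "mysite--alice" is 1, and so on. */
    static int depth(String storeName) {
        int depth = 0;
        for (int i = storeName.indexOf("--"); i >= 0; i = storeName.indexOf("--", i + 2)) {
            depth++;
        }
        return depth;
    }

    /**
     * Keeps every store whose staging store belongs to an allowed project, so each
     * overlay chain is kept or dropped as a whole, and lists lower layers first.
     */
    static List<String> storesToDeploy(List<String> allStores, Set<String> allowedProjects) {
        List<String> kept = new ArrayList<>();
        for (String store : allStores) {
            if (allowedProjects.contains(stagingOf(store))) {
                kept.add(store);
            }
        }
        kept.sort(Comparator.comparingInt(StoreFilter::depth)); // stable: ties keep input order
        return kept;
    }
}
```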

Recommendation
——————–
My best guess is that you should try [1] first.
If you win, great.   If not, wait. However, if you
really want "something that does something" now,
I'd limit the hackery to support for a single server that
just ignores certain projects & sandboxes. Doing a
multi-server config properly seems like too much
time invested in a throw-away.

In any event, I hope this overview helps you to
understand the tradeoffs & make a decision that's
appropriate for your application.

    Cheers,
    -Jon

sbzoom · 05-04-2007

Jon,

Thanks much for the reply. Made lots of sense. I had already been looking at the code in svn to see how it all connects. What I had been thinking was something pretty simple, since I know there is a lot going on with all the mbean registration and all that.

Here in deployAllAVMwebappsInRepository() is where I would try to pass some kind of list of repositories.


    protected void deployAllAVMwebappsInRepository()
    {
        HashMap<String, AVMWebappDescriptor> webapp_descriptors =  
                   new HashMap<String, AVMWebappDescriptor>();

        LinkedList<String> avm_webapp_paths = new LinkedList<String>();
        try 
        {
            Map<String, Map<QName, PropertyValue>> store_dns_entries = 
                AVMRemote_.queryStoresPropertyKey(
                    QName.createQName(null,".dns.%"));

            for ( Map.Entry<String, Map<QName, PropertyValue>> 
                     store_dns_entry  :  store_dns_entries.entrySet() )
            {
                String store_name  = store_dns_entry.getKey();
                …
            }
        }
    }

Then, right after the store name is read, I'd add a check like this:

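String store_name  = store_dns_entry.getKey();

// registeredStoreNames: the stores to deploy (how to pass it in is still TBD)
boolean registered = false;
for( String registeredStoreName : registeredStoreNames )
{
    if( store_name.contains( registeredStoreName ) )
    {
        registered = true;
        break;      // one match is enough; don't deploy the same store twice
    }
}
if( !registered )
{
    continue;       // not on the list: skip to the next DNS entry
}
…
// rest of code as-is from here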

This way everything else would be registered and set up as it was before, but with a limited number of stores. Now, I still haven't figured out how to pass such a list into AVMHostConfig, but I'll get there.   :wink:

Does that sound like a good temporary solution? Or will it fall over immediately?
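On getting the list into AVMHostConfig: one low-tech option is a JVM system property added to CATALINA_OPTS, which keeps server.xml untouched. A minimal sketch; the property name alfresco.virt.stores and the RegisteredStores class are made up for illustration, and an unset or empty property means "deploy everything", so the default behaviour is unchanged.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical: reads the stores to deploy from a comma-separated system property. */
class RegisteredStores {
    /** Made-up property name, e.g. -Dalfresco.virt.stores=mysite,othersite */
    static final String PROPERTY = "alfresco.virt.stores";

    /** Parses " a, ,b " into {a, b}; an empty result means "no filtering". */
    static Set<String> parse(String value) {
        if (value == null) {
            return Collections.emptySet();
        }
        Set<String> names = new LinkedHashSet<>();
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    static Set<String> fromSystemProperties() {
        return parse(System.getProperty(PROPERTY));
    }

    /** True if the store should be deployed; everything is deployed when no list is given. */
    static boolean shouldDeploy(String storeName, Set<String> registered) {
        if (registered.isEmpty()) {
            return true;
        }
        for (String name : registered) {
            // substring match, so sandboxes such as mysite--alice match "mysite" too
            if (storeName.contains(name)) {
                return true;
            }
        }
        return false;
    }
}
```

For example: CATALINA_OPTS="$CATALINA_OPTS -Dalfresco.virt.stores=mysite,othersite". Substring matching is loose (a name like site would also match mysite), so matching exactly on the staging-store part of the name is safer once the naming is pinned down.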

In the meantime my steps will be as follows:
    [1] Move the virtual server to its own machine
         [a] give it plenty of RAM
         [b] increase the JVM heap size
    [2] Bring down the amount of apps I have from 35 to something like
         10 or 12 (the most that Tomcat usually seems to be able to handle).
    [3] Try out my wacky code hacks.

But there is something else to consider here. Say I have only 5 web project spaces. But for each of those spaces, I have 5 users. With the current setup, that loads up 25 webapps into Tomcat. And if one of those apps is particularly heavy, it will definitely kill the server.

The idea you put forth above, registering which apps Tomcat should start and which ones Apache should, is great. That will definitely take a huge load off of Tomcat. But that still doesn't quite fix the issue of lots of users in one space, each with virtualized content (if it is a Tomcat space, that is; Apache can handle all those apps, no problem, since there are no jars to load).

Oh yeah, I also wanted to ask about setting up 2 or 3 virtual servers. If I just copy the config from one machine to another, will it work? Or will the Alfresco server freak out when it receives RMI requests from more than one place? If it copes, an Apache instance with a Tomcat "worker" (ajp13, workers.properties) for each server would work just fine: I could broker requests for certain server names to different workers on different machines.

Thanks again for the help. I'm liking the product a lot and hopefully in a few weeks we will start our move from TeamSite.

Charlie

jcox · 05-04-2007

Charlie,

If you go the hack route, the steps you're considering are the right ones…
just be sure that you handle the JMX-triggered loading/unloading properly.
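Part of handling that properly once stores are filtered: load and unload requests keep arriving for every store, including ones this server chose not to deploy. A minimal sketch of the bookkeeping, assuming the requests boil down to "load store X" / "unload store X"; DeployTracker is hypothetical, and both the JMX plumbing and the actual Tomcat deploy/undeploy calls are omitted.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical bookkeeping for load/unload requests when only some stores are deployed. */
class DeployTracker {
    private final Predicate<String> filter;   // decides which stores this server serves
    private final Set<String> deployed = ConcurrentHashMap.newKeySet();

    DeployTracker(Predicate<String> filter) {
        this.filter = filter;
    }

    /** Returns true only if the store was newly deployed (the real deploy would happen here). */
    boolean onLoad(String store) {
        if (!filter.test(store)) {
            return false;                     // filtered out: ignore the request
        }
        return deployed.add(store);           // duplicate load requests are no-ops
    }

    /** Returns true only if a deployed store was removed (the real undeploy would happen here). */
    boolean onUnload(String store) {
        return deployed.remove(store);        // unknown or filtered stores are no-ops
    }

    boolean isDeployed(String store) {
        return deployed.contains(store);
    }
}
```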

> But there is something else to consider here. Say I have only 5 web
> project spaces. But for each of those spaces, I have 5 users. With the
> current setup, that loads up 25 webapps into Tomcat. And if one of those
> apps is particularly heavy, it will definitely kill the server.

When it comes to handling many user-level sandboxes overlaid on top
of a staging area, a single tomcat-based virtualization server instance is
quite scalable. A *lot* of work went into making this possible.

Some of this is hinted at within the virt. server faq:
http://wiki.alfresco.com/wiki/Virtualization_Server_FAQ#How_is_the_virtualization_server_programmati...

The relevant excerpt is:

> Then, a classloader hierarchy for the virtual webapps is created that
> parallels the overlay structure of the AVM stores containing them
> (this is used whenever possible, to allow the virtualization server to scale).

Thus, even if you have a big set of very large jar files in the
staging area of a web project, virtualizing them for many users
does not cost much at all, due to a combination of custom
AVM-aware classloader logic, some tricks when it comes to
handling Tomcat's "work" dir for unmodified, overlaid WEB-INF
contents, efficient ways of testing for modified files in a subtree using
AVMRemote, and the manner in which virtual hosts are supported
via valve-based reverse proxying + wildcard dns.

The key observation is that most user areas just contain modified
"plain" files like html, txt, jpg, etc… and possibly a few modified jsp
files at a time.   A tiny subset of users will have modified jar files,
and even then, they're prone to submit them sooner or later.
Just like UNIX or Windows can share .so/.dll files amongst apps,
the virt server plays a similar trick per virtual webapp so long as
the user has not modified their personal WEB-INF.   Only if they
do that is a user-level sandbox "heavy" for the virt server.
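A simplified model of that sharing: sandboxes whose WEB-INF is untouched reuse the staging area's classloader, and only a sandbox with its own WEB-INF changes gets a separate loader. This sketch uses plain java.net.URLClassLoader and is not the virt server's AVM-aware implementation; OverlayLoaders and its methods are hypothetical.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Simplified model of sharing a staging classloader among sandboxes (not the virt server's code). */
class OverlayLoaders {
    /** One loader per staging area, over the staging WEB-INF jars; built once. */
    static URLClassLoader stagingLoader(URL[] stagingJars, ClassLoader common) {
        return new URLClassLoader(stagingJars, common);
    }

    /**
     * Chooses the loader for a sandbox webapp. Sandboxes that haven't touched
     * WEB-INF reuse the staging loader, so shared jars are loaded only once.
     * A sandbox with its own WEB-INF changes gets a separate ("heavy") loader.
     */
    static ClassLoader loaderFor(URLClassLoader staging, URL[] sandboxJars, boolean webInfModified) {
        if (!webInfModified) {
            return staging;
        }
        // Same parent as staging, but the sandbox's own jars instead of staging's
        return new URLClassLoader(sandboxJars, staging.getParent());
    }
}
```

With 5 users on one project and nobody touching WEB-INF, all five sandboxes share one staging loader, so the project's jars are loaded once rather than five times.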

The only real downside of this is that when someone submits contents
from WEB-INF to staging, the virt server does a recursive reload of
all virtual webapps that overlay staging once the submission is approved.
What you'll see is a momentary delay when browsing for all "client"
virtual webapps that rely on the common staging area (the webapp
sends back a JMX message to the virt server to do a recursive reload
of classes).
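The recursive reload amounts to a walk over the overlay tree: approving a WEB-INF change in staging reloads staging and every virtual webapp layered over it, at any depth. A minimal sketch; Webapp is a hypothetical stand-in, and its reload counter takes the place of reloading an actual Tomcat context.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical node in the overlay tree: staging at the root, sandboxes and their layers below. */
class Webapp {
    final String name;
    final List<Webapp> overlays = new ArrayList<>();
    int reloads = 0;   // stands in for reloading the Tomcat context

    Webapp(String name) {
        this.name = name;
    }

    /** Adds a webapp that overlays this one and returns it. */
    Webapp overlay(String childName) {
        Webapp child = new Webapp(childName);
        overlays.add(child);
        return child;
    }

    /** Reloads this webapp and everything layered over it; returns how many were reloaded. */
    int reloadRecursively() {
        reloads++;
        int count = 1;
        for (Webapp child : overlays) {
            count += child.reloadRecursively();
        }
        return count;
    }
}
```

Calling reloadRecursively() on a sandbox node instead only reloads that sandbox and the layers above it.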

My belief is that this downside is minimal when compared to the benefit
of having a single server be able to host a huge number of virtual
webapps hanging off a staging area.   Jar mods are rare, and approval
can be managed by workflow so that the momentary browsing outage
isn't that disruptive.

I should probably call more attention to the fact that we can support
a large number of users hanging off a staging area, but the virt server
faq does mention it a few times (search for the word 'scale').

The alfresco webapp isn't prepared to handle more than 1 virt server
at the moment, and doing so reliably is a bit harder than it looks.
That's why I advised against going this route. You're probably better
off waiting for real support there.

I'm very glad you like Alfresco!

   Cheers,
   -Jon

"Freedom of the press is guaranteed only to those who own one." – A. J. Liebling

sbzoom · 05-04-2007

Jon,

Thanks for clearing up the bit about lots of users. I was thinking/hoping that the virtual server did exactly as you described in keeping track of jars in the WEB-INF/lib/ directory. For us, we do not even use JSP pages (since they suck and have to be re-compiled and cache a lot); we use Velocity (similar to FreeMarker). So we should be spared a lot of the watching work that goes on for reloads.

I agree that having the webapp reload for all users is a small price to pay for having the system work in the way that it does (which is nice).

As for making those changes, I'll see. I might be a lot more talk than walk. I will most likely rely on my current practice that I use for TeamSite, which isn't "virtualization" in the strictest sense, but it works - for us.

We do this by simply mounting the virtual drive on our secondary server(s). For Alfresco it doesn't even need to be an AVM directory. It will be something like \\alfresco-server\alfresco\projects\mysite\. On our standalone Tomcat machine we will have /apps/alfresco/ mapped to \\alfresco-server\alfresco\. Then in our /tomcat/webapps/ directory, we create symlinks to the apps that we want running. This doesn't allow a user to click on a file in Alfresco and have it show up in a browser as it should with "true" virtualization. But most of our links are "/somepage.htm", which doesn't actually exist: *.htm is mapped to a Spring Controller and forwarded to a Velocity page for display. So our web developers know to go to http://other-server/mysite/ to see any changes they have made in Alfresco:/space/mysite/.
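The symlink step can be scripted. A minimal sketch with java.nio.file, assuming the share is mounted at /apps/alfresco and Tomcat's webapps directory is /tomcat/webapps as described above; the projects/ subdirectory follows the example path, and the PreviewLinks class is hypothetical. Creating symlinks needs OS and filesystem support, which a Unix-style Tomcat box normally has.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: link selected projects from the mounted share into Tomcat's webapps. */
class PreviewLinks {
    /**
     * Creates webappsDir/<project> -> projectsDir/<project> for each listed project.
     * Projects missing from the share, or already present in webapps, are skipped.
     */
    static List<Path> link(Path projectsDir, Path webappsDir, List<String> projects) throws IOException {
        List<Path> created = new ArrayList<>();
        for (String project : projects) {
            Path target = projectsDir.resolve(project);
            Path link = webappsDir.resolve(project);
            if (!Files.isDirectory(target) || Files.exists(link, LinkOption.NOFOLLOW_LINKS)) {
                continue;   // nothing to link, or already linked: leave it alone
            }
            created.add(Files.createSymbolicLink(link, target));
        }
        return created;
    }

    public static void main(String[] args) throws IOException {
        // Mount point from the setup above; project names come from the command line
        List<Path> made = link(Paths.get("/apps/alfresco/projects"), Paths.get("/tomcat/webapps"),
                Arrays.asList(args));
        System.out.println("linked: " + made);
    }
}
```

For example, java PreviewLinks mysite othersite links only those two projects into webapps and leaves everything else on the share alone.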

This way, I can set up some projects in the AVM spaces and leave the rest in a document space and still have "previews" available. That makes it easier for me to wait the 3 or so months until it is implemented correctly by Alfresco.

There is only one thing about the above that I am not sure of. Versions. The AVM sites have versions. The regular "document" spaces do not. Files are versioned, not the whole space. So it is tough for us to do deployments based on a version when there is only one version.

I will poke around some more in the documentation and try to find a way to do versions (I could very easily have just missed it). Thanks again for all the help and knowledge.

Charlie

kvc · 05-14-2007

Charlie:

Awesome thread. I'm particularly pleased that you see value in our approach to "true" virtualization; we had lots of lessons learned, and our approach at Alfresco centered around truly solving the problem of virtualizing dynamic sites. I'd like to see you take full advantage of this!

Also, I'd love to follow-up with you specifically on your use case integrating content from the DM repo with your website. I am wondering if the new capabilities in 2.0.1E for using a pre-built action as a part of content rule couldn't solve this problem for you (by automatically promoting finalized, approved assets to a specific folder in one or more web projects). Drop me a line at kevinc@alfresco.com to chat.

Kevin


limit on number of web projects