JCR querying is not scalable?

panokhin
Champ in-the-making
Hello.

I'm currently evaluating Alfresco for use as a platform for our electronic tendering service. We really like the architecture and code quality; however, we have some concerns about its scalability.

I'm trying to do a simple test of loading about 1700 objects (I'll include the complete model and sample code below) and querying them using the JCR API (we decided to go with JCR since it seems to be becoming a widely adopted standard).

The model is like this: Site->Buyer->Buyer->Notice->LangProperty
There are about 1300 LangProperties, 300 Notices and 100 Buyers.
The computer is an AMD Athlon 64 3200+ with 2 GB of memory, running both Oracle XE and JBoss+Alfresco.
I ran this XPath query:
Query q = qm.createQuery("//dg:notices/dg:langProperties[jcr:contains(@dg:propText,'solicitation')]/..",Query.XPATH);
and results were quite disappointing - between 60 and 80 seconds on different runs.

So, I wanted to ask 3 questions:
1. Is this poor result due to the fact that JCR support is new in Alfresco, and is it going to be changed in the near future to be more scalable?
2. If we use the Alfresco search API instead, will it scale better, or is the problem in the model we use: few nodes but many (large text) properties?
3. Is there any support (current or planned) in Alfresco's version of JCR search or its own Lucene search for language-specific search modifiers (e.g. stemming, base letter conversion, fuzzy search)?

Thank you,
Looking forward to hearing your advice,
Philipp.
9 REPLIES

panokhin
Champ in-the-making
The model:

<model name="dg:dgmodel" xmlns="http://www.alfresco.org/model/dictionary/1.0">

   <description>dgMarket model</description>
   <author>Philipp Anokhin</author>
   <version>1.0</version>

   <imports>
      <import uri="http://www.alfresco.org/model/dictionary/1.0" prefix="d" />
      <import uri="http://www.alfresco.org/model/content/1.0" prefix="cm" />
   </imports>

   <namespaces>
      <namespace uri="http://www.dgmarket.com/model/dgModel/1.0" prefix="dg" />
   </namespaces>

   <types>

      <type name="dg:category">
         <title>dg Category</title>
         <parent>cm:content</parent>
         <properties>
            <property name="dg:category">
               <type>d:text</type>
            </property>
            <property name="dg:type">
               <type>d:text</type>
               <default>cpv</default>
            </property>
         </properties>
      </type>

      <type name="dg:country">
         <title>dg Country</title>
         <parent>cm:content</parent>
         <properties>
            <property name="dg:iso">
               <type>d:text</type>
            </property>
         </properties>
      </type>

      <type name="dg:contact">
         <title>dg Contact</title>
         <parent>cm:content</parent>
         <properties>
            <property name="dg:contactId">
               <type>d:text</type>
            </property>
            <property name="dg:contactFirstNames">
               <type>d:text</type>
            </property>
            <property name="dg:contactLastName">
               <type>d:text</type>
            </property>
            <property name="dg:contactTitle">
               <type>d:text</type>
            </property>
            <property name="dg:contactOrganization">
               <type>d:text</type>
            </property>
            <property name="dg:contactAddress">
               <type>d:text</type>
            </property>
            <property name="dg:contactAddress2">
               <type>d:text</type>
            </property>
            <property name="dg:contactCity">
               <type>d:text</type>
            </property>
            <property name="dg:contactProvince">
               <type>d:text</type>
            </property>
            <property name="dg:contactPostalCode">
               <type>d:text</type>
            </property>
            <property name="dg:contactPhone">
               <type>d:text</type>
            </property>
            <property name="dg:contactFax">
               <type>d:text</type>
            </property>
            <property name="dg:contactEmail">
               <type>d:text</type>
            </property>
            <property name="dg:contactUrl">
               <type>d:text</type>
            </property>
            <property name="dg:contactCountry">
               <type>d:noderef</type>
            </property>
         </properties>
         <mandatory-aspects>
            <aspect>cm:versionable</aspect>
         </mandatory-aspects>
      </type>

      <type name="dg:langProperty">
         <title>dg Language Property</title>
         <parent>cm:content</parent>
         <properties>
            <property name="dg:propName">
               <type>d:text</type>
               <mandatory>true</mandatory>
            </property>
            <property name="dg:propText">
               <type>d:text</type>  <!-- This can be very large -->
               <mandatory>false</mandatory>
               <index enabled="true">
                  <atomic>true</atomic>
                  <stored>false</stored>
                  <tokenised>true</tokenised>
               </index>
            </property>
            <property name="dg:propLang">
               <type>d:text</type>
               <mandatory>true</mandatory>
            </property>
         </properties>
      </type>

      <type name="dg:notice">
         <title>dg Notice</title>
         <parent>cm:folder</parent>
         <properties>
            <property name="dg:noticeId">
               <type>d:text</type>
            </property>
            <property name="dg:noticeType">
               <type>d:text</type>
            </property>
            <property name="dg:noticeLang">
               <type>d:text</type>
            </property>
            <property name="dg:noticeSubmitted">
               <type>d:date</type>
            </property>
            <property name="dg:noticeDeadline">
               <type>d:datetime</type>
            </property>
            <property name="dg:noticeExpiration">
               <type>d:date</type>
            </property>
            <property name="dg:noticeMethod">
               <type>d:text</type>
            </property>
            <property name="dg:noticeCity">
               <type>d:text</type>
            </property>
            <property name="dg:noticePostalCode">
               <type>d:text</type>
            </property>
            <property name="dg:noticePublisher">
               <type>d:text</type>
            </property>
            <property name="dg:noticeCountry">
               <type>d:noderef</type>
            </property>
            <property name="dg:noticeCategory">
               <type>d:noderef</type>
               <multiple>true</multiple>
            </property>
         </properties>
         <associations>
            <child-association name="dg:noticeContact">
               <source>
                  <mandatory>false</mandatory>
                  <many>false</many>
               </source>
               <target>
                  <class>dg:contact</class>
                  <mandatory>false</mandatory>
                  <many>false</many>
               </target>
               <duplicate>false</duplicate>
            </child-association>
         </associations>
         <mandatory-aspects>
            <aspect>cm:versionable</aspect>
         </mandatory-aspects>
      </type>

      <type name="dg:buyer">
         <title>dg Buyer</title>
         <parent>cm:folder</parent>
         <properties>
            <property name="dg:buyerId">
               <type>d:text</type>
            </property>
            <property name="dg:buyerType">
               <type>d:text</type>
            </property>
            <property name="dg:buyerLang">
               <type>d:text</type>
            </property>
            <property name="dg:buyerCountry">
               <type>d:noderef</type>
            </property>
         </properties>
         <associations>
            <child-association name="dg:buyerContact">
               <source>
                  <mandatory>false</mandatory>
                  <many>false</many>
               </source>
               <target>
                  <class>dg:contact</class>
                  <mandatory>false</mandatory>
                  <many>false</many>
               </target>
               <duplicate>false</duplicate>
            </child-association>
         </associations>
      </type>
     
      <type name="dg:site">
         <title>dg Site</title>
         <parent>cm:folder</parent>
         <properties>
            <property name="dg:siteName">
               <type>d:text</type>
            </property>
         </properties>
      </type>
     
    </types>

</model>

panokhin
Champ in-the-making
Generating code:

   import java.util.Iterator;
   import javax.jcr.Node;

   // Creates a dg:buyer node under the given parent, then adds the buyer's
   // language-specific texts and its notices (with their own texts) beneath it.
   // Buyer, BuyerText, Notice and NoticeText are our own domain classes.
   public static Node addBuyer(Node node, Buyer buyer)
      throws Exception
   {
      Node b = node.addNode("dg:buyers", "dg:buyer");
      b.setProperty("dg:buyerId", buyer.getId().toString());
      b.setProperty("cm:name", "buyer:" + buyer.getId().toString());
      //print(b);

      // One dg:langProperty child per buyer text
      if (buyer.getBuyerTexts() != null)
      {
         Iterator it = buyer.getBuyerTexts().iterator();
         while (it.hasNext())
         {
            BuyerText bt = (BuyerText) it.next();
            Node lp = b.addNode("dg:langProperties", "dg:langProperty");
            lp.setProperty("cm:name", bt.getTextType());
            lp.setProperty("dg:propName", bt.getTextType());
            lp.setProperty("dg:propLang", bt.getLang());
            // The text is stored twice: as a property and as the node content
            lp.setProperty("dg:propText", bt.getText());
            lp.setProperty("cm:content", bt.getText());
            //print(lp);
         }
      }

      // One dg:notice child per notice, each with its own dg:langProperty children
      if (buyer.getBuyerNotices() != null)
      {
         System.out.println("\t\t\t\tNumber of notices is: " + buyer.getBuyerNotices().size());
         System.out.flush();
         Iterator it = buyer.getBuyerNotices().iterator();
         while (it.hasNext())
         {
            Notice notice = (Notice) it.next();
            Node n = b.addNode("dg:notices", "dg:notice");
            n.setProperty("dg:noticeId", notice.getId().toString());
            n.setProperty("cm:name", "notice:" + notice.getId().toString());
            //print(n);
            if (notice.getNoticeTexts() != null)
            {
               Iterator itt = notice.getNoticeTexts().iterator();
               while (itt.hasNext())
               {
                  NoticeText nt = (NoticeText) itt.next();
                  Node lp = n.addNode("dg:langProperties", "dg:langProperty");
                  lp.setProperty("cm:name", nt.getTextType());
                  lp.setProperty("dg:propName", nt.getTextType());
                  lp.setProperty("dg:propLang", nt.getLang());
                  lp.setProperty("dg:propText", nt.getText());
                  lp.setProperty("cm:content", nt.getText());
                  //print(lp);
               }
            }
         }
      }
      return b;
   }

kevinr
Star Contributor
I'm trying to do a simple test of loading about 1700 objects (I'll include the complete model and sample code below) and querying them using the JCR API (we decided to go with JCR since it seems to be becoming a widely adopted standard).

So, I wanted to ask 3 questions:
1. Is this poor result due to the fact that JCR support is new in Alfresco, and is it going to be changed in the near future to be more scalable?
2. If we use the Alfresco search API instead, will it scale better, or is the problem in the model we use: few nodes but many (large text) properties?
3. Is there any support (current or planned) in Alfresco's version of JCR search or its own Lucene search for language-specific search modifiers (e.g. stemming, base letter conversion, fuzzy search)?

1. The problem is the JCR XPath search: it currently involves loading nodes into memory and walking them. We can change this implementation in the future (say, to use the Lucene search), but at the moment it is needed to support the standard.

2. Yes, MUCH better! I would use the Alfresco native APIs, and particularly the Lucene search API. This will give vastly better performance than the JCR search.

3. I'm not sure about this. I believe the Lucene API does have language-specific stemming support.
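
To illustrate point 1, here is a toy sketch of why walking nodes scales worse than an index lookup. It is not Alfresco or JCR code: `ToyNode`, `findByWalking`, `buildIndex` and `findByIndex` are hypothetical names made up for this example, and the tokenisation is deliberately naive. The walk has to touch every node on every query, while the inverted index is built once and then answers a term lookup with a single map access.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy contrast between walking a node tree and consulting an inverted index. */
class WalkVersusIndex {

    /** Hypothetical stand-in for a repository node; not an Alfresco or JCR class. */
    static final class ToyNode {
        final String name;
        final String text; // may be null for pure container nodes
        final List<ToyNode> children = new ArrayList<>();

        ToyNode(String name, String text) {
            this.name = name;
            this.text = text;
        }
    }

    /** Naive tokeniser: lower-case, split on anything that is not a letter or digit. */
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        if (text == null) return out;
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Query by walking: every node is visited and tokenised on every call. */
    static Set<String> findByWalking(ToyNode root, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        Set<String> hits = new LinkedHashSet<>();
        Deque<ToyNode> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            ToyNode n = stack.pop();
            if (tokens(n.text).contains(needle)) hits.add(n.name);
            for (ToyNode c : n.children) stack.push(c);
        }
        return hits;
    }

    /** Builds term -> node names once; this is the work an indexer does at write time. */
    static Map<String, Set<String>> buildIndex(ToyNode root) {
        Map<String, Set<String>> index = new HashMap<>();
        Deque<ToyNode> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            ToyNode n = stack.pop();
            for (String t : tokens(n.text)) {
                index.computeIfAbsent(t, k -> new LinkedHashSet<>()).add(n.name);
            }
            for (ToyNode c : n.children) stack.push(c);
        }
        return index;
    }

    /** Query by index: one map lookup, independent of how many nodes exist. */
    static Set<String> findByIndex(Map<String, Set<String>> index, String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyNode site = new ToyNode("site", null);
        ToyNode notice = new ToyNode("notice:1", null);
        notice.children.add(new ToyNode("title", "Solicitation for road works"));
        notice.children.add(new ToyNode("summary", "Award notice"));
        site.children.add(notice);

        // Both approaches return the same hits; only the per-query cost differs.
        System.out.println(findByWalking(site, "solicitation"));
        System.out.println(findByIndex(buildIndex(site), "solicitation"));
    }
}
```

With about 1700 small nodes the difference is only constant factors in this toy, but once each visit means loading a node and its large text properties from the database, the per-query walk is where the time goes.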

Thanks,

Kevin

panokhin
Champ in-the-making
Thanks for the fast reply. I'll try the Alfresco native search API and post the results here.

paulhh
Champ in-the-making
Hi

You might want to take a look at our new SDK bundle if you're not interested in using any of the web client.

Our native API is more service-oriented than the JCR, which we've found suits more people.

Cheers
Paul.

davidc
Star Contributor
FYI, you can also mix JCR and Alfresco native code.

So, you can continue to code in JCR, but jump out to the Alfresco native API for search. Then, when we've moved the JCR search over to our Lucene-based search, it will take minimal effort to go 100% JCR.

An example of mixed code is given in http://wiki.alfresco.com/wiki/Introducing_the_Alfresco_Java_Content_Repository_API#Mixing_JCR_and_Al...
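
For the search half of that mix, the main thing to get right is the Lucene query string. Below is a minimal sketch that builds one for the `dg:propText` property from the model earlier in this thread. It assumes the Alfresco Lucene syntax for property queries is `@{namespace-uri}localName:"term"` with the Lucene special characters backslash-escaped; please check that against the search documentation for your version. `LucenePropertyQuery`, `escape` and `propertyContains` are names made up for this example, and no Alfresco classes are used: passing the string to the native search service is left out.

```java
/** Builds an Alfresco-style Lucene property query string; uses no Alfresco classes. */
class LucenePropertyQuery {

    // Characters the classic Lucene query parser treats as syntax (assumed set).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    /** Prefixes each Lucene special character with a backslash. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) sb.append('\\');
            sb.append(c);
        }
        return sb.toString();
    }

    /** Returns a query matching nodes whose given property contains the term. */
    static String propertyContains(String namespaceUri, String localName, String term) {
        // Inside a quoted phrase only backslashes and quotes need escaping.
        String phrase = term.replace("\\", "\\\\").replace("\"", "\\\"");
        return "@" + escape("{" + namespaceUri + "}" + localName) + ":\"" + phrase + "\"";
    }

    public static void main(String[] args) {
        // Namespace URI and property name are taken from the dg model in this thread.
        System.out.println(propertyContains(
                "http://www.dgmarket.com/model/dgModel/1.0", "propText", "solicitation"));
    }
}
```

Running it prints `@\{http\://www.dgmarket.com/model/dgModel/1.0\}propText:"solicitation"`, which is the native-search counterpart of the `jcr:contains(@dg:propText,'solicitation')` predicate in the original XPath query.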

panokhin
Champ in-the-making
Thank you, David - I'll do just that.
Meanwhile, can you tell me whether there is already a timeline for the JCR->Lucene search conversion, and what priority it has in your plans?
Thanks,
Philipp.

hazmat
Champ in-the-making
subject has the gist.. i notice a few other services also use xpath queries (filefolder being the most significant imo, since it does per-segment xpath queries when resolving a path). are these also subject to this sort of horrid scaling characteristic?

there are several mentions on these forums of scale testing with 100k and millions of content items (also load tests in svn), but also notes of serious scaling deficiencies (like avoiding large amounts of content in a single container). are these scale tests just repository load tests, or is there any real-world usage or front-end application behind them?

kevinr
Star Contributor
Both we and several of our partners and customers have performed load and benchmark tests against the Alfresco repository and web-client. We have been shown to scale excellently in many areas, and there are other areas where we feel improvements are needed. We made big strides forward for release 1.2, and 1.3 is better again (e.g. the large number of objects per container issue is vastly improved).

For 1.3 we have changed the database schema, and this has made a big improvement. There are also other performance improvements in 1.3; you should see the results of these in the next 1.3 release drop.

The XPath API issue is definitely an implementation issue rather than a repository scaling issue: it could be made a lot, lot faster! When it was implemented originally, both the XPath and JCR APIs were seen as the "second choice" to the native Alfresco APIs. Recently we have been applying resources specifically to improve the JCR performance, as more and more people are now using it.

We have an in-house benchmark suite that performance-tests the Alfresco native APIs and the Alfresco JCR-170 APIs, and also tests against other JCR-170 repositories. It also performs bulk load tests (which Alfresco is particularly good at). We will make all these tests available in the 1.3 source download.

I think that for a 1.X product Alfresco has an impressive amount of useful functionality and has been shown to be very stable and performant (almost all crashing/out-of-memory issues in 1.1/1.2 have been the result of JVM/OS or 3rd party libraries!). I think you can rest assured that it will continue to get faster and more stable in the future.

Thanks,

Kevin