RepositoryServiceSoapBindingStub.fetchMore() method

pitt1
Champ in-the-making
I would like to programmatically grab results from a lucene query in chunks, and furthermore would like to be able to make sure that I can grab all search results.  That is, if there are 2950 results and I want to retrieve them in batches of 100, I want to be able to make 30 successive calls to retrieve the next batch of results.

First I noticed that no matter what I did with web services (or even running the query in the node browser), search results were capped at 1000.  Support directed me to parameters in repository.properties that allow more than 1000 results to come back:

# The maximum time spent pruning results
system.acl.maxPermissionCheckTimeMillis=100000
# The maximum number of results to perform permission checks against
system.acl.maxPermissionChecks=5000

(Default values for these properties are 10000 and 1000, respectively).  After changing these, my web service query returned more than 1000 results.  While this is helpful, it does not solve the problem of controlling results returned via the API.

I noticed that there is a RepositoryServiceSoapBindingStub.fetchMore() method, which, I surmised, must allow batched fetching of results.  I tried the following code:



    QueryResult queryResult = repositoryService.query(store, query, false);
    String querySession = queryResult.getQuerySession();

    ResultSet resultSet = queryResult.getResultSet();
    ResultSetRow[] rows = resultSet.getRows();

    if (rows != null) {
        System.out.println("RESULTS: there are " + rows.length + " results");
        QueryResult q = repositoryService.fetchMore(querySession);
        ResultSet rs = q.getResultSet();
        ResultSetRow[] r = rs.getRows();
        System.out.println("RESULTS: there are " + r.length + " results in the second batch");
    }

This code was run with max permission checks set to 200, even though there were over 500 results to my query, so I knew that there should be more results to fetch.  Axis complained rather badly when I ran this code.

Client-side error:
ERROR: ; nested exception is:
org.xml.sax.SAXParseException: Premature end of file.
AxisFault
faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
faultSubcode:
faultString: org.xml.sax.SAXParseException: Premature end of file.
faultActor:
faultNode:
faultDetail:
{http://xml.apache.org/axis/}stackTrace:org.xml.sax.SAXParseException: Premature end of file.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
at javax.xml.parsers.SAXParser.parse(Unknown Source)
at org.apache.axis.encoding.DeserializationContext.parse(DeserializationContext.java:227)
at org.apache.axis.SOAPPart.getAsSOAPEnvelope(SOAPPart.java:696)
at org.apache.axis.Message.getSOAPEnvelope(Message.java:435)
at org.apache.axis.handlers.soap.MustUnderstandChecker.invoke(MustUnderstandChecker.java:62)
at org.apache.axis.client.AxisClient.invoke(AxisClient.java:206)
at org.apache.axis.client.Call.invokeEngine(Call.java:2784)

Server-side error:
14:13:04,487 ERROR [org.apache.axis.Message] java.io.IOException:
AxisFault
faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.generalException
faultSubcode:
faultString:
faultActor:
faultNode:
faultDetail:
        {http://xml.apache.org/axis/}exceptionName:org.alfresco.repo.webservice.repository.RepositoryFault
        { http://xml.apache.org/axis/}stackTrace:
        at org.alfresco.repo.webservice.repository.RepositoryWebService.fetchMore(RepositoryWebService.java:465)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.axis.providers.java.RPCProvider.invokeMethod(RPCProvider.java:397)
        at org.apache.axis.providers.java.RPCProvider.processMessage(RPCProvider.java:186)
        at org.apache.axis.providers.java.JavaProvider.invoke(JavaProvider.java:323)
        at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32)

So my questions are:

Is fetchMore() supposed to do what I want?  If so, what am I doing wrong?  If not, is there a way to set the batch size and max total number of returned results via web services?
4 REPLIES

gblomqui
Champ in-the-making
I'm very surprised that no one has responded to this post.  I have also noticed this issue.


I attached a debugger to a locally running Alfresco server (2.0 running on JBoss) and managed to track down what I think is going on.

RepositoryWebService.query delegates to  QuerySession.getNextResultsBatch to get the first batch of results based on the given query.

The ResultSetQuerySession.getNextResultsBatch method (ResultSetQuerySession being the ResultSet implementation of the QuerySession interface) does all the work of assembling the query result set.  After it's done building the query result set, it calls AbstractQuerySession.updatePosition.  The updatePosition method sets the position instance variable based on the batchSize instance variable.  However, the ResultSetQuerySession.getNextResultsBatch method never sets the batchSize instance variable, so position is set to -1.

Later, when you call RepositoryWebService.fetchMore, it too delegates to QuerySession.getNextResultsBatch to get the next batch of results based on the established query.  The ResultSetQuerySession.getNextResultsBatch method compares position to -1 and returns null because it believes there is no more data to read.  The RepositoryWebService.fetchMore method does not handle a null return value from QuerySession.getNextResultsBatch, thus you get your esoteric error.
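
To make the sequence above easier to follow, here is a minimal sketch of it in plain Java. It is only a simplified model of what I saw in the debugger, not the Alfresco source: the class QuerySessionSketch, the rows() helper, and the exact position arithmetic are all made up for illustration. It assumes an unset batch size behaves like 0, so the first call hands back everything and leaves position at -1, and the second call returns null.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the behaviour described above.
// This is NOT the Alfresco source; names and logic are illustrative only.
class QuerySessionSketch {
    private final List<String> allRows;
    private final int batchSize; // 0 models "batch size never set"
    private int position = 0;    // -1 means "nothing left to read"

    QuerySessionSketch(List<String> allRows, int batchSize) {
        this.allRows = allRows;
        this.batchSize = batchSize;
    }

    // Returns the next batch, or null once position has been set to -1.
    List<String> getNextResultsBatch() {
        if (position == -1) {
            return null;
        }
        int end = batchSize > 0
                ? Math.min(position + batchSize, allRows.size())
                : allRows.size(); // no batch size: everything goes out in one batch
        List<String> batch = new ArrayList<>(allRows.subList(position, end));
        position = (end < allRows.size()) ? end : -1;
        return batch;
    }

    // Hypothetical helper that builds n dummy rows.
    static List<String> rows(int n) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            rows.add("row-" + i);
        }
        return rows;
    }

    public static void main(String[] args) {
        QuerySessionSketch session = new QuerySessionSketch(rows(5), 0);
        System.out.println("first batch: " + session.getNextResultsBatch().size());
        List<String> second = session.getNextResultsBatch();
        // A caller that dereferences 'second' without a null check would fail here.
        System.out.println("second batch is null: " + (second == null));
    }
}
```

With a positive batch size the same model keeps returning batches until the rows run out, which is the behaviour I would expect fetchMore to rely on.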

I haven't dug into the code enough to determine how easy it is to fix this.

I also have not determined if upgrading to 2.1 fixes this problem.

I also don't know if it's simply something I'm doing wrong that's causing the batchSize instance variable to not be set.  I can't tell yet whether that is the problem, but it easily could be.

gblomqui
Champ in-the-making
Still digging…

I found that the batchSize is actually set by RepositoryWebService.query by calling Utils.getBatchSize.  According to the JavaDoc, that method reads the fetchSize from the QueryConfiguration soap header.

Time to figure out how to send the fetchSize in the QueryConfiguration soap header.

gblomqui
Champ in-the-making
There's actually a good example of how to set the "fetchSize" in the QueryHeader in one of the repository service tests.  Here's what I've got based on that test:

String sQuery = "TYPE:\"" + Constants.TYPE_CONTENT + "\"";

// set the batch size in the query header
int batchSize = 10;
QueryConfiguration queryCfg = new QueryConfiguration();
queryCfg.setFetchSize(batchSize);
RepositoryServiceSoapBindingStub repositoryService = WebServiceFactory.getRepositoryService();
repositoryService.setHeader(new RepositoryServiceLocator().getServiceName().getNamespaceURI(), "QueryHeader", queryCfg);

// get the first batch of results
QueryResult result = repositoryService.query(STORE, getQuery(sQuery), true);
// process the first query result
String querySession = result.getQuerySession();
// a null query session means there are no more batches to fetch
while (querySession != null) {
    // get the next batch of results
    result = repositoryService.fetchMore(querySession);
    // process subsequent query results
    querySession = result.getQuerySession();
}
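
If you want to check the shape of that loop without a running repository, here is a minimal self-contained sketch. FakeRepository and Page are hypothetical stand-ins for the web service stub and QueryResult, not Alfresco classes, and the sketch assumes (as the loop above does) that the query session comes back null once the last batch has been returned. The null check on the page is purely defensive, given the failure described earlier in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the web service stub; not Alfresco classes.
class PagingLoopSketch {

    // One batch of rows plus the session token (null when exhausted).
    static final class Page {
        final List<Integer> rows;
        final String querySession;

        Page(List<Integer> rows, String querySession) {
            this.rows = rows;
            this.querySession = querySession;
        }
    }

    // Serves 'total' dummy rows in batches of 'fetchSize' and counts the calls.
    static final class FakeRepository {
        private final int total;
        private final int fetchSize;
        private int position = 0;
        int calls = 0;

        FakeRepository(int total, int fetchSize) {
            if (fetchSize <= 0) {
                throw new IllegalArgumentException("fetchSize must be positive");
            }
            this.total = total;
            this.fetchSize = fetchSize;
        }

        Page query() {
            position = 0;
            return next();
        }

        Page fetchMore(String querySession) {
            return next();
        }

        private Page next() {
            calls++;
            int end = Math.min(position + fetchSize, total);
            List<Integer> rows = new ArrayList<>();
            for (int i = position; i < end; i++) {
                rows.add(i);
            }
            position = end;
            return new Page(rows, position < total ? "session-1" : null);
        }
    }

    // Same loop shape as above: one query, then fetchMore until the session is null.
    static List<Integer> fetchAll(FakeRepository repo) {
        List<Integer> all = new ArrayList<>();
        Page page = repo.query();
        all.addAll(page.rows);
        String session = page.querySession;
        while (session != null) {
            page = repo.fetchMore(session);
            if (page == null) {
                break; // defensive: stop rather than dereference a null result
            }
            all.addAll(page.rows);
            session = page.querySession;
        }
        return all;
    }

    public static void main(String[] args) {
        FakeRepository repo = new FakeRepository(2950, 100);
        List<Integer> all = fetchAll(repo);
        System.out.println(all.size() + " rows in " + repo.calls + " calls");
    }
}
```

For the original poster's numbers (2950 results in batches of 100) this model makes one query call and 29 fetchMore calls, 30 in total.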

kdejaeger
Champ in-the-making
I had problems too with fetchMore. Setting the fetchSize solved the problem.
Apparently the default batchSize of 1000 in org.alfresco.repo.webservice.Utils is not used, since QueryConfigHandler.ALF_FETCH_SIZE is always present (as new Integer(0)). This results in a NullPointerException in RepositoryWebService.

Can somebody verify this? If so, using fetchMore without setting the fetchSize results in an error. To see the details, enable this line in log4j.properties: log4j.logger.org.alfresco.repo.webservice=debug
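
To illustrate what I think is happening with the default, here is a minimal sketch. It is unverified and is not the code from Utils: BatchSizeSketch and both methods are hypothetical, and the only assumptions taken from above are the default of 1000 and a fetch size header that is always present with the value 0. The first method models a default that only applies when the header is missing, the second shows a more defensive variant that also treats 0 as "not set".

```java
// Hypothetical illustration only; not the Alfresco Utils class.
class BatchSizeSketch {
    static final int DEFAULT_BATCH_SIZE = 1000;

    // Default applies only when the header value is absent (null).
    // If the header is always present as 0, the default is never used.
    static int describedBatchSize(Integer headerFetchSize) {
        return headerFetchSize == null ? DEFAULT_BATCH_SIZE : headerFetchSize;
    }

    // Defensive variant: also fall back to the default for 0 or negative values.
    static int defensiveBatchSize(Integer headerFetchSize) {
        return (headerFetchSize == null || headerFetchSize <= 0)
                ? DEFAULT_BATCH_SIZE
                : headerFetchSize;
    }

    public static void main(String[] args) {
        Integer alwaysPresentHeader = Integer.valueOf(0);
        System.out.println("described: " + describedBatchSize(alwaysPresentHeader));
        System.out.println("defensive: " + defensiveBatchSize(alwaysPresentHeader));
    }
}
```

Either way, setting the fetchSize explicitly in the QueryHeader on the client side avoids depending on the default at all.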