More than 1000 records in web service search API?

itsard
Champ in-the-making
Hi,
I am trying to search the repository documents, but I cannot get more than 1000 records in the search result.
The query I am using is:

PATH:"/app:company_home/cm:TestMySpace//*" AND TEXT:"*a*"

I tried changing these parameters in the repository.properties file
# The maximum time spent pruning results
system.acl.maxPermissionCheckTimeMillis=10000
# The maximum number of results to perform permission checks against
system.acl.maxPermissionChecks=1000
by appending a zero to both values, but I still don't get more than 1000 records, although 1800 documents match those criteria.
If I reduce the maxPermissionChecks value to 500, the change takes effect, but any value above 1000 still returns just 1000 rows.
Kindly let me know the solution.
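A simplified model of how the two limits above could interact, as a standalone Java sketch. It assumes, without checking Alfresco's source, that the repository walks the raw Lucene hits in order, runs one permission check per hit, and stops when either the number of checks reaches system.acl.maxPermissionChecks or the elapsed time exceeds system.acl.maxPermissionCheckTimeMillis. PermissionPruning and its prune method are hypothetical names, and the clock is passed in so the time limit can be exercised deterministically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

// Hypothetical model of permission-check pruning; NOT Alfresco's implementation.
final class PermissionPruning {

    /**
     * Walks the raw hits in order and keeps those the user may read, stopping
     * after maxChecks permission checks or once more than maxMillis have elapsed.
     */
    static <T> List<T> prune(List<T> rawHits, Predicate<T> canRead,
                             int maxChecks, long maxMillis, LongSupplier clockMillis) {
        List<T> allowed = new ArrayList<>();
        long start = clockMillis.getAsLong();
        int checks = 0;
        for (T hit : rawHits) {
            if (checks >= maxChecks) break;                          // count limit reached
            if (clockMillis.getAsLong() - start > maxMillis) break;  // time limit reached
            checks++;                                                // one check per hit
            if (canRead.test(hit)) allowed.add(hit);
        }
        return allowed;
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < 1800; i++) hits.add(i);   // 1800 matching documents
        LongSupplier frozenClock = () -> 0L;           // no time passes in this demo
        System.out.println(prune(hits, h -> true, 1000, 10_000, frozenClock).size());   // 1000
        System.out.println(prune(hits, h -> true, 500, 10_000, frozenClock).size());    // 500
        System.out.println(prune(hits, h -> true, 10_000, 10_000, frozenClock).size()); // 1800
    }
}
```

Under this model, a count limit of 500 trims 1800 hits to 500 and a limit of 10000 returns all 1800. So if the model is right, a result that stays at exactly 1000 after raising both limits points to some other cap applied elsewhere in the call chain.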
9 replies

gokceng
Champ in-the-making
Hi itsard,
Setting lucene.query.maxClauses=10000 in the same file should be enough.
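Some background on why lucene.query.maxClauses is relevant to this query: in older Lucene releases, a wildcard term such as *a* was rewritten into a boolean query with one clause per matching index term, and going over the clause limit raised BooleanQuery.TooManyClauses instead of silently truncating the results. The standalone sketch below models that expansion with a hypothetical in-memory term list and a regex match; WildcardExpansion and its methods are illustrative names, not Lucene code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of wildcard expansion against a clause limit; NOT Lucene's code.
final class WildcardExpansion {

    static final class TooManyClauses extends RuntimeException {
        TooManyClauses(int limit) { super("more than " + limit + " clauses"); }
    }

    /** Converts a Lucene-style wildcard (* and ?) into an anchored regex. */
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    /** Returns one "clause" per matching term, failing once maxClauses would be exceeded. */
    static List<String> expand(String wildcard, List<String> indexTerms, int maxClauses) {
        Pattern p = toRegex(wildcard);
        List<String> clauses = new ArrayList<>();
        for (String term : indexTerms) {
            if (p.matcher(term).matches()) {
                if (clauses.size() >= maxClauses) throw new TooManyClauses(maxClauses);
                clauses.add(term);
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("alfresco", "lucene", "query", "search", "space");
        System.out.println(expand("*a*", terms, 10));  // [alfresco, search, space]
        try {
            expand("*a*", terms, 2);
        } catch (TooManyClauses e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

If this model holds, a clause limit that is too low makes the query fail rather than return 1000 rows, so raising maxClauses is sensible for broad wildcard searches but may not by itself lift the 1000-record cap.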

itsard
Champ in-the-making
Hi,
Thanks for your reply.
Here is my repository.properties file.
It still doesn't return more than 1000 records for the search query mentioned above.


# Repository configuration

repository.name=Main Repository

# Directory configuration

dir.root=./alf_data

dir.contentstore=${dir.root}/contentstore
dir.contentstore.deleted=${dir.root}/contentstore.deleted

dir.auditcontentstore=${dir.root}/audit.contentstore

# The location for lucene index files
dir.indexes=${dir.root}/lucene-indexes

# The location for index backups
dir.indexes.backup=${dir.root}/backup-lucene-indexes

# The location for lucene index locks
dir.indexes.lock=${dir.indexes}/locks

# ######################################### #
# Index Recovery and Tracking Configuration #
# ######################################### #
#
# Recovery types are:
#    NONE:     Ignore
#    VALIDATE: Checks that the first and last transaction for each store is represented in the indexes
#    AUTO:     Validates and auto-recovers if validation fails
#    FULL:     Full index rebuild, processing all transactions in order.  The server is temporarily suspended.
index.recovery.mode=VALIDATE
# FULL recovery continues when encountering errors
index.recovery.stopOnError=false
index.recovery.maximumPoolSize=5
# Set the frequency with which the index tracking is triggered.
# For more information on index tracking in a cluster:
#    http://wiki.alfresco.com/wiki/High_Availability_Configuration_V1.4_to_V2.1#Version_1.4.5.2C_2.1.1_an...
# By default, this is effectively never, but can be modified as required.
#    Examples:
#       Once every five seconds: 0/5 * * * * ?
#       Once every two seconds : 0/2 * * * * ?
#       See http://quartz.sourceforge.net/javadoc/org/quartz/CronTrigger.html
index.tracking.cronExpression=* * * * * ? 2099
index.tracking.adm.cronExpression=${index.tracking.cronExpression}
index.tracking.avm.cronExpression=${index.tracking.cronExpression}
# Other properties.
index.tracking.maxTxnDurationMinutes=10
index.tracking.reindexLagMs=10000
index.tracking.maxRecordSetSize=100
index.tracking.maxTransactionsPerLuceneCommit=100
index.tracking.disableInTransactionIndexing=false
# Index tracking information of a certain age is cleaned out by a scheduled job.
# Any clustered system that has been offline for longer than this period will need to be seeded
# with a more recent backup of the Lucene indexes or the indexes will have to be fully rebuilt.
# Use -1 to disable purging.  This can be switched on at any stage.
index.tracking.minRecordPurgeAgeDays=30

# Change the failure behaviour of the configuration checker
system.bootstrap.config_check.strict=true

#
# How long should shutdown wait to complete normally before
# taking stronger action and calling System.exit()
# in ms, 10,000 is 10 seconds
#
shutdown.backstop.timeout=10000
shutdown.backstop.enabled=true

# Server Single User Mode
# note:
#   only allow named user (note: if blank or not set then will allow all users)
#   assuming maxusers is not set to 0
#server.singleuseronly.name=admin

# Server Max Users - limit number of users with non-expired tickets
# note:
#   -1 allows any number of users, assuming not in single-user mode
#   0 prevents further logins, including the ability to enter single-user mode
server.maxusers=-1

# The Cron expression controlling the frequency with which the OpenOffice connection is tested
openOffice.test.cronExpression=0 * * * * ?

#
# Properties to limit resources spent on individual searches
#
# The maximum time spent pruning results
system.acl.maxPermissionCheckTimeMillis=10000
# The maximum number of results to perform permission checks against
system.acl.maxPermissionChecks=10000

#
# Manually control how the system handles maximum string lengths.
# Any zero or negative value is ignored.
# Only change this after consulting support or reading the appropriate Javadocs for
# org.alfresco.repo.domain.schema.SchemaBootstrap for V2.1.2
system.maximumStringLength=-1

#
# Limit hibernate session size by trying to amalgamate events for the L2 session invalidation
# - hibernate works as is up to this size
# - after the limit is hit events that can be grouped invalidate the L2 cache by type and not instance
# events may not group if there are post action listener registered (this is not the case with the default distribution)
system.hibernateMaxExecutions=20000

#
# Determine if document deletion and archival must cascade delete in the same
# transaction that triggers the operation.
system.cascadeDeleteInTransaction=true

# #################### #
# Lucene configuration #
# #################### #
#
# Millisecond threshold for text transformations
# Slower transformers will force the text extraction to be asynchronous
#
lucene.maxAtomicTransformationTime=20
#
# The maximum number of clauses that are allowed in a lucene query
#
lucene.query.maxClauses=100000
#
# The size of the queue of nodes waiting for index
# Events are generated as nodes are changed, this is the maximum size of the queue used to coalesce event
# When this size is reached the lists of nodes will be indexed
#
# http://issues.alfresco.com/browse/AR-1280:  Setting this high is the workaround as of 1.4.3.
#
lucene.indexer.batchSize=1000000
#
# Lucene index min merge docs - the in memory size of the index
#
lucene.indexer.minMergeDocs=1000
#
# When lucene index files are merged together - it will try to keep this number of segments/files in 
#
lucene.indexer.mergeFactor=10
#
# Roughly the maximum number of nodes indexed in one file/segment
#
lucene.indexer.maxMergeDocs=100000
#
# The number of terms from a document that will be indexed
#
lucene.indexer.maxFieldLength=10000

lucene.write.lock.timeout=10000
lucene.commit.lock.timeout=100000
lucene.lock.poll.interval=100

# Database configuration
db.schema.update=true
db.schema.update.lockRetryCount=24
db.schema.update.lockRetryWaitSeconds=5
db.driver=org.gjt.mm.mysql.Driver
db.name=alfresco
db.url=jdbc:mysql:///${db.name}
db.username=alfresco
db.password=alfresco
db.pool.initial=10
db.pool.max=40
db.txn.isolation=-1

# Email configuration
mail.host=
mail.port=25
mail.username=anonymous
mail.password=
# Set this value to UTF-8 or similar for encoding of email messages as required
mail.encoding=UTF-8
# Set this value to 7bit or similar for Asian encoding of email headers as required
mail.header=
mail.from.default=alfresco@alfresco.org

# System Configuration
system.store=system://system
system.descriptor.childname=sys:descriptor
system.descriptor.current.childname=sys:descriptor-current

# User config
alfresco_user_store.store=user://alfrescoUserStore
alfresco_user_store.system_container.childname=sys:system
alfresco_user_store.user_container.childname=sys:people
alfresco_user_store.authorities_container.childname=sys:authorities

# note: default admin username - should not be changed
alfresco_user_store.adminusername=admin

# note: default guest username - should not be changed
alfresco_user_store.guestusername=guest

# Spaces Archive Configuration
spaces.archive.store=archive://SpacesStore

# Spaces Configuration
spaces.store=workspace://SpacesStore
spaces.company_home.childname=app:company_home
spaces.guest_home.childname=app:guest_home
spaces.dictionary.childname=app:dictionary
spaces.templates.childname=app:space_templates
spaces.templates.content.childname=app:content_templates
spaces.templates.email.childname=app:email_templates
spaces.templates.rss.childname=app:rss_templates
spaces.savedsearches.childname=app:saved_searches
spaces.scripts.childname=app:scripts
spaces.wcm.childname=app:wcm
spaces.wcm_content_forms.childname=app:wcm_forms
spaces.content_forms.childname=app:forms
spaces.user_homes.childname=app:user_homes
spaces.sites.childname=st:sites
spaces.templates.email.invite.childname=cm:invite

# ADM VersionStore Configuration
version.store.deprecated.lightWeightVersionStore=workspace://lightWeightVersionStore
version.store.version2Store=workspace://version2Store
# WARNING: For non-production testing only !!! Do not change (to avoid version store issues, including possible mismatch). Should be false since lightWeightVersionStore is deprecated.
version.store.onlyUseDeprecatedV1=false

# Folders for storing people
system.system_container.childname=sys:system
system.people_container.childname=sys:people

# Folders for storing workflow related info
system.workflow_container.childname=sys:workflow

# Are user names case sensitive?
user.name.caseSensitive=false

# AVM Specific properties.
avm.remote.idlestream.timeout=30000

# ################################## #
# WCM Link Validation Configuration  #
# ################################## #
#
# Note: Link Validation is disabled by default (as per poll interval = 0)
#
# linkvalidation.pollInterval  - Poll interval to check getLatestSnapshotID (in milliseconds), eg. 5000 for 5 sec interval
#                           If pollInterval is 0, link validation is disabled.
#
# linkvalidation.retryInterval - Retry interval (Virtualization server is not accessible or an error has occurred
#                          during link validation.
#
# linkvalidation.disableOnFail - If set to TRUE link validation service will be terminated if an error will be occurred.

linkvalidation.pollInterval=0
linkvalidation.retryInterval=120000
linkvalidation.disableOnFail=false

# ECM content usages/quotas
system.usages.enabled=true

# Repository endpoint - used by Activity Service
repo.remote.endpoint.url=http://localhost:8080/alfresco/service

# The well known RMI registry port is defined in the alfresco-shared.properties file
# alfresco.rmi.services.port=50500
#
# RMI service ports for the individual services.
# These six services are available remotely.
#
# Assign individual ports for each service for best performance
# or run several services on the same port, you can even run everything on 50500 if
# running through a firewall.
#
# Specify 0 to use a random unused port.
#
avm.rmi.service.port=50501
avmsync.rmi.service.port=50502
attribute.rmi.service.port=50503
authentication.rmi.service.port=50504
repo.rmi.service.port=50505
action.rmi.service.port=50506

# External executable locations
ooo.exe=soffice
ooo.user=${dir.root}/oouser
img.root=./ImageMagick
img.dyn=${img.root}/lib
img.exe=${img.root}/bin/convert
swf.exe=./bin/pdf2swf


Please guide.

ra74
Champ in-the-making
How long does it take to execute the query? Maybe the parameter
system.acl.maxPermissionCheckTimeMillis=10000 is too short.

I believe you are fetching the results in a loop, calling

queryResult = repositoryService.fetchMore(querySession);
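A sketch of the loop described above: run the query once, then keep calling fetchMore with the returned query session until the server stops returning one. So that it runs without the Alfresco web service client libraries, it codes against a hypothetical PagedSearch interface whose two methods only mirror the names used in this thread; the Page record and the convention that a null session means "no more rows" are assumptions, not the real RepositoryService signatures. The inMemory stand-in, also hypothetical, keeps every hit and hands out batchSize rows per call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side paging loop. PagedSearch mirrors names from the thread,
// NOT the real Alfresco RepositoryService signatures.
final class FetchAll {

    /** One batch of rows plus the session id to continue from (null when exhausted). */
    record Page(List<String> rows, String querySession) {}

    interface PagedSearch {
        Page query(String luceneQuery);
        Page fetchMore(String querySession);
    }

    /** Runs the query once, then follows fetchMore until no session is returned. */
    static List<String> fetchAll(PagedSearch service, String luceneQuery) {
        List<String> all = new ArrayList<>();
        Page page = service.query(luceneQuery);
        all.addAll(page.rows());
        while (page.querySession() != null) {
            page = service.fetchMore(page.querySession());
            all.addAll(page.rows());
        }
        return all;
    }

    /** In-memory stand-in: holds every hit and hands out batchSize rows per call. */
    static PagedSearch inMemory(List<String> hits, int batchSize) {
        return new PagedSearch() {
            public Page query(String q) { return fetchMore("0"); }
            public Page fetchMore(String session) {
                int from = Integer.parseInt(session);
                int to = Math.min(from + batchSize, hits.size());
                String next = to < hits.size() ? String.valueOf(to) : null;
                return new Page(List.copyOf(hits.subList(from, to)), next);
            }
        };
    }

    public static void main(String[] args) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < 1800; i++) hits.add("doc-" + i);
        PagedSearch service = inMemory(hits, 1000);
        System.out.println(fetchAll(service, "PATH:\"/app:company_home/cm:TestMySpace//*\"").size()); // 1800
    }
}
```

With 1800 hits and a batch size of 1000, fetchAll makes one query call and one fetchMore call and returns all 1800 rows; a single query call on its own would return only the first batch.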

itsard
Champ in-the-making
No, I am just fetching the results from the query once.
Can't I get all the records in one go, like this?
QueryResult queryResult = repositoryService.query(STORE, query, false);

Can you give me the line of code to fetch more than 1000 records from a query?
Thanks a lot

ra74
Champ in-the-making
Then you have to set the batch size to 1000; have a look at e.g. http://forums.alfresco.com/en/viewtopic.php?f=27&t=5599

Anyway, there's a timeout on query execution, so there's no guarantee you will fetch all the records.

itsard
Champ in-the-making
Hi,
What does setting the batch size actually mean?
If all the records are queried in one go and the results are just read from the result set according to the given batchSize, how does that help?
Or does result = repositoryService.fetchMore(querySession) fire another query against the Alfresco repository?
Please let me know.

ra74
Champ in-the-making
I don't develop against Alfresco, so I may be wrong, but according to the source code it seems that all the results are fetched during the first execution of the query and stored in a cache in the user session.
The batch size determines the number of rows sent to the client.

http://wiki.alfresco.com/wiki/Repository_Web_Service#fetchMore

sselvan
Champ in-the-making
I just wanted to contribute a little documentation about a related problem I faced in Share, and its solution.

I was able to upload 1400 files into Alfresco Share via WebDAV, and I could see all 1400 files in Alfresco Explorer but not in Alfresco Share.
Share was showing only 1000 files.

I came across this post and used it to fix the issue; the problem is now solved. Here is the write-up, in case it helps others.

Problem:
Alfresco Explorer shows all the uploaded files (more than 1000), but Share does not; it shows only 1000 documents.

Possible Cause:
Alfresco Explorer shows the files from the repository as they are. Share is an application on top of the Alfresco repository; it accesses the content repository via APIs to display the list of documents/assets in the Share UI.

The Search Service API is probably what the documentLibrary uses to list documents, and the search is limited to processing 1000 records (as set in repository.properties).

Possible Solution:
Hence, increasing the number of records processed should solve this problem; just in case, increase the time limit as well.

Changes made in repository.properties:


#
# Properties to limit resources spent on individual searches
#
# The maximum time spent pruning results (*********I Changed from 10000 to 100000)
system.acl.maxPermissionCheckTimeMillis=100000
# The maximum number of results to perform permission checks against (*********I Changed from 1000 to 10000)
system.acl.maxPermissionChecks=10000
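To confirm which limits a given repository.properties file actually contains, a small standalone check can read it with java.util.Properties. SearchLimits is a hypothetical helper, not part of Alfresco: it only inspects the file it is pointed at and cannot tell which values the running server loaded or whether another configuration file overrides them. When a key is missing it falls back to the stock values quoted at the top of this thread (1000 checks, 10000 ms).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper that reports the search-limit settings found in a properties file.
final class SearchLimits {

    final int maxPermissionChecks;
    final long maxPermissionCheckTimeMillis;

    private SearchLimits(int checks, long millis) {
        this.maxPermissionChecks = checks;
        this.maxPermissionCheckTimeMillis = millis;
    }

    /** Reads the two ACL limits, falling back to the stock values quoted in the thread. */
    static SearchLimits read(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);  // '#' comment lines are skipped by Properties
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // trim(): Properties keeps trailing whitespace in values
        int checks = Integer.parseInt(
                props.getProperty("system.acl.maxPermissionChecks", "1000").trim());
        long millis = Long.parseLong(
                props.getProperty("system.acl.maxPermissionCheckTimeMillis", "10000").trim());
        return new SearchLimits(checks, millis);
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of(args.length > 0 ? args[0] : "repository.properties");
        try (Reader in = Files.newBufferedReader(file)) {
            SearchLimits limits = read(in);
            System.out.println("maxPermissionChecks = " + limits.maxPermissionChecks);
            System.out.println("maxPermissionCheckTimeMillis = " + limits.maxPermissionCheckTimeMillis);
        }
    }
}
```

For example, run it with java SearchLimits.java /path/to/repository.properties after editing the file and before restarting the server.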

Hope this is useful for somebody!

fiferyan
Champ in-the-making
Thank you. This was useful for me!