cancel
Showing results for 
Search instead for 
Did you mean: 

Lucene Index issue

girishkkulkarni
Champ in-the-making
Champ in-the-making
Hello,

Is there any way to identify how long the Full reindex would take to index 3.4 million indices? Previously in repository.properties index.recovery.mode was set to VALIDATE but we got following error.

CONTENT INTEGRITY ERROR: Indexes not found for 1 stores

To solve this issue we changed recovery mode to Auto. But from last 10 hours, Alfresco is still not started?

Could you help me to understand what is happening here as logs are not getting updated from long time? and how much more time it will take to start Alfresco? Also can we change any parameters to make it faster?

We are using Alfresco community Labs - v3.0.0 (b 1164). Alfresco is running on SUN SunFire F15K with 24 CPUs.

Following is repository.propties -

# Repository configuration

repository.name=Main Repository

# Directory configuration

dir.root=./alf_data

dir.contentstore=${dir.root}/contentstore
dir.contentstore.deleted=${dir.root}/contentstore.deleted

dir.auditcontentstore=${dir.root}/audit.contentstore

# The location for lucene index files
dir.indexes=${dir.root}/lucene-indexes

# The location for index backups
dir.indexes.backup=${dir.root}/backup-lucene-indexes

# The location for lucene index locks
dir.indexes.lock=${dir.indexes}/locks

# ######################################### #
# Index Recovery and Tracking Configuration #
# ######################################### #
#
# Recovery types are:
#    NONE:     Ignore
#    VALIDATE: Checks that the first and last transaction for each store is represented in the indexes
#    AUTO:     Validates and auto-recovers if validation fails
#    FULL:     Full index rebuild, processing all transactions in order.  The server is temporarily suspended.
index.recovery.mode=AUTO
# FULL recovery continues when encountering errors
index.recovery.stopOnError=false
index.recovery.maximumPoolSize=5
# Set the frequency with which the index tracking is triggered.
# For more information on index tracking in a cluster:
#    http://wiki.alfresco.com/wiki/High_Availability_Configuration_V1.4_to_V2.1#Version_1.4.5.2C_2.1.1_an...
# By default, this is effectively never, but can be modified as required.
#    Examples:
#       Once every five seconds: 0/5 * * * * ?
#       Once every two seconds : 0/2 * * * * ?
#       See http://quartz.sourceforge.net/javadoc/org/quartz/CronTrigger.html
index.tracking.cronExpression=* * * * * ? 2099
index.tracking.adm.cronExpression=${index.tracking.cronExpression}
index.tracking.avm.cronExpression=${index.tracking.cronExpression}
# Other properties.
index.tracking.maxTxnDurationMinutes=10
index.tracking.reindexLagMs=1000
index.tracking.maxRecordSetSize=1000
index.tracking.maxTransactionsPerLuceneCommit=100
index.tracking.disableInTransactionIndexing=false

# Change the failure behaviour of the configuration checker
system.bootstrap.config_check.strict=true

# Server Single User Mode
# note:
#   only allow named user (note: if blank or not set then will allow all users)
#   assuming maxusers is not set to 0
#server.singleuseronly.name=admin

# Server Max Users - limit number of users with non-expired tickets
# note:
#   -1 allows any number of users, assuming not in single-user mode
#   0 prevents further logins, including the ability to enter single-user mode
server.maxusers=-1

# The Cron expression controlling the frequency with which the OpenOffice connection is tested
openOffice.test.cronExpression=0 * * * * ?

#
# Properties to limit resources spent on individual searches
#
# The maximum time spent pruning results
system.acl.maxPermissionCheckTimeMillis=10000
# The maximum number of results to perform permission checks against
system.acl.maxPermissionChecks=1000

#
# Manually control how the system handles maximum string lengths.
# Any zero or negative value is ignored.
# Only change this after consulting support or reading the appropriate Javadocs for
# org.alfresco.repo.domain.schema.SchemaBootstrap for V2.1.2
system.maximumStringLength=-1

#
# Limit hibernate session size by trying to amalgamate events for the L2 session invalidation
# - hibernate works as is up to this size
# - after the limit is hit events that can be grouped invalidate the L2 cache by type and not instance
# events may not group if there are post action listener registered (this is not the case with the default distribution)
system.hibernateMaxExecutions=20000

#
# Determine if document deletion and archival must cascade delete in the same
# transaction that triggers the operation.
system.cascadeDeleteInTransaction=true

# #################### #
# Lucene configuration #
# #################### #
#
# Millisecond threshold for text transformations
# Slower transformers will force the text extraction to be asynchronous
#
lucene.maxAtomicTransformationTime=20
#
# The maximum number of clauses that are allowed in a lucene query
#
lucene.query.maxClauses=10000
#
# The size of the queue of nodes waiting for index
# Events are generated as nodes are changed, this is the maximum size of the queue used to coalesce event
# When this size is reached the lists of nodes will be indexed
#
# http://issues.alfresco.com/browse/AR-1280:  Setting this high is the workaround as of 1.4.3.
#
lucene.indexer.batchSize=1000000
#
# Lucene index min merge docs - the in memory size of the index
#
lucene.indexer.minMergeDocs=1000
#
# When lucene index files are merged together - it will try to keep this number of segments/files in 
#
lucene.indexer.mergeFactor=10
#
# Roughly the maximum number of nodes indexed in one file/segment
#
lucene.indexer.maxMergeDocs=100000
#
# The number of terms from a document that will be indexed
#
lucene.indexer.maxFieldLength=10000

lucene.write.lock.timeout=10000
lucene.commit.lock.timeout=100000
lucene.lock.poll.interval=100

# Database configuration
db.schema.update=true
db.schema.update.lockRetryCount=24
db.schema.update.lockRetryWaitSeconds=5
db.driver=org.gjt.mm.mysql.Driver
db.name=alfresco
db.url=jdbc:mysql:///${db.name}
db.username=alfresco
db.password=alfresco
db.pool.initial=10
db.pool.max=40

# Email configuration
mail.host=
mail.port=25
mail.username=anonymous
mail.password=
# Set this value to UTF-8 or similar for encoding of email messages as required
mail.encoding=UTF-8
# Set this value to 7bit or similar for Asian encoding of email headers as required
mail.header=
mail.from.default=alfresco@alfresco.org

# System Configuration
system.store=system://system
system.descriptor.childname=sys:descriptor
system.descriptor.current.childname=sys:descriptor-current

# User config
alfresco_user_store.store=user://alfrescoUserStore
alfresco_user_store.system_container.childname=sys:system
alfresco_user_store.user_container.childname=sysSmiley Tongueeople
alfresco_user_store.authorities_container.childname=sys:authorities

# note: default admin username - should not be changed
alfresco_user_store.adminusername=admin

# note: default guest username - should not be changed
alfresco_user_store.guestusername=guest

# Spaces Archive Configuration
spaces.archive.store=archive://SpacesStore

# Spaces Configuration
spaces.store=workspace://SpacesStore
spaces.company_home.childname=app:company_home
spaces.guest_home.childname=app:guest_home
spaces.dictionary.childname=app:dictionary
spaces.templates.childname=app:space_templates
spaces.templates.content.childname=app:content_templates
spaces.templates.email.childname=app:email_templates
spaces.templates.rss.childname=app:rss_templates
spaces.savedsearches.childname=app:saved_searches
spaces.scripts.childname=app:scripts
spaces.wcm.childname=app:wcm
spaces.wcm_content_forms.childname=app:wcm_forms
spaces.content_forms.childname=app:forms
spaces.user_homes.childname=app:user_homes
spaces.sites.childname=st:sites
spaces.templates.email.invite.childname=cm:invite

# ADM VersionStore Configuration
version.store.deprecated.lightWeightVersionStore=workspace://lightWeightVersionStore
version.store.version2Store=workspace://version2Store
# WARNING: For non-production testing only !!! Do not change (to avoid version store issues, including possible mismatch). Should be false since lightWeightVersionStore is deprecated.
version.store.onlyUseDeprecatedV1=false

# Folders for storing people
system.system_container.childname=sys:system
system.people_container.childname=sysSmiley Tongueeople

# Folders for storing workflow related info
system.workflow_container.childname=sys:workflow

# Are user names case sensitive?
# ==============================
#
# NOTE: If you are using mysql you must have case sensitive collation
#
# You can do this when creating the alfresco database at the start
# CREATE DATABASE alfresco CHARACTER SET utf8 COLLATE utf8_bin;
# If you want to do this later this is a dump and load fix as it is done when the database, tables and columns are created.
#
# Must other databases are case sensitive by default.
#
user.name.caseSensitive=false

# AVM Specific properties.
avm.remote.idlestream.timeout=30000

# ECM content usages/quotas
system.usages.enabled=true

# Repository endpoint - used by Activity Service
repo.remote.endpoint.url=http://localhost:8080/alfresco/service

Thanks & Regards,
Girish
1 REPLY 1

andy
Champ on-the-rise
Champ on-the-rise
Hi

Can you post the reindex section from your log and the index threads from a stack trace.

I believe while you have a very large box the processors it contains are slow ….
Is it 10 years old or slightly more up to date?

Are you sure you are using all the processors as you can set up virtual machines etc etc (beyond my ken …)

Andy