cancel
Showing results for 
Search instead for 
Did you mean: 

HUGE database size

willisadams
Champ on-the-rise
Champ on-the-rise
We started with Alfresco Community Edition 5.0.d.  We later upgraded to the 201508 Early Access release of 5.1.  Later still, we upgraded to the 201509 and now, today, the 201510 EA.

The updates all went smoothly. I followed the directions found here: https://alpha-labs.net/2015/02/upgrading-alfresco/

During the latest update (from 201509 to 201510), the pg_dump step took a lot longer than it usually does.  When it finished, the dump file size was almost 16GB! Our entire alf_data directory only has around 65,000 files totaling 12.6GB.

Still, I loaded dump file into the new installation and everything seems to be working okay.  Using the pgAdmin tool I can connect the running alfresco database and see that it has a total size of 9275MB.  All of the tables in the database are pretty small (a few dozen KB or MB) with the exception of two tables: alf_prop_serializable_value is 8297MB and alf_prop_value is 872MB.

Is this normal?  If not, any suggestions on getting this problem under control?

Thanks in advance for your assistance.

EDIT: For reference, we are uploading most of our files to Alfresco using the built-in SMB server, but we're also using the Share web interface and the built-in IMAP server (clients use Thunderbird to connect). We aren't using any third-party add-ons. Our OS is CentOS 7 running in a VM on a Debian 8 KVM host. Let me know if you need any other info or want me to post logs.
11 REPLIES 11

afaust
Legendary Innovator
Legendary Innovator
Hello,

do you have auditing enabled in your environment? If so, do you have properties where you are storing text values with more than 1000 characters?
Due to database constraints on the different database systems Alfresco supports, there is a limit of 1000 characters that can effectively be handled for text values. Normally, if Alfresco encounters a text with more characters than the limit, it will store it as a serializable value. Serializable values have one main disadvantages: They are "black boxes" and can't be reused, i.e. if the identical value needs to be referenced from a new audit entry it can't just be linked to the same value. This may cause increasingly large alf_prop_value and alf_prop_serializable_value tables.

In Alfresco 4.2.x and 5.0, there was a change in place that cut off text values at 1000 characters to prevent this. Maybe this has been changed (again) in one of the 5.1 snapshots.
There also is a cleanup job that removes obsolete data from these tables. You might want to activate it to ensure you don't hold old data in the database. The property to set is "attributes.propcleaner.cronExpression". You can set it in the alfresco-global.properties and it could have a value of "* * 1 * * ? *" to have it run at 1 AM every day.

If that cleanup does not reduce the size substantially, it means all the values are actively referenced by audit or other data structures. It may be worth going through your audit configuration and any customizations to check what data is collected and maybe pinpoint the specific source of the data (either to remove it in the configuration or prepare a regression issue for Alfresco, depending on the result).

Regards
Axel

Axel,

Thank you for your quick reply and proposed solution.  I'll try running the clean-up later tonight and let you know whether it substantially reduced the database size.

Aside from enabling the SMB and IMAP servers via the alfresco-globap.properties file, I haven't touched the default Alfresco settings, so if auditing is enabled by default then it is on.  It looks like it is, based on the "control.json" file returned by http://[my alfresco address]:8080/alfresco/service/api/audit/control

<blockcode>{
   "enabled" : true,
   "applications":
   [
      {
         "name": "Alfresco Tagging Service",
         "path" : "/tagging",
         "enabled" : true
      }
        
   ]
}</blockcode>

Assuming the database clean-up script works for me, would you recommend disabling auditing via alfresco-global.properties to prevent the database from growing again?

afaust
Legendary Innovator
Legendary Innovator
If the audit control JSON returned only the Alfresco Tagging Service as an enabled audit application then it is likely not auditing that causes the issue. Tagging is (or better has been) a very good citizen concerning audit data and there is no reason why it should have changed in 5.1. As such, I don't see a benefit in disabling it and it would also impact the core tagging feature (specifically, the tag filter in the documentlibrary will not be updated anymore and may not work correctly as a result).

I am looking forward to the result of the cleanup and we can see about identifying the root cause depending on the results…

Unfortunately, the attributes.propcleaner.cronExpression script didn't work.  The size of the alf-prop-serializable-value and alf-prop-value tables are still 8297MB and 872MB, respectively.

I can tell the script did run because my alfresco.log file ballooned to over 500MB, filled with almost 3 million lines of script updates ("Processing from [x] to [y]…").  The final lines of the log indicate an error, though.  I'll copy-paste it here in a minute…

EDIT…  Here are the last few lines of the log:

<blockcode>2015-11-17 01:59:59,426 INFO  [org.alfresco.repo.domain.schema.script.ScriptExecutorImpl] [DefaultScheduler_Worker-1] Processing from 0 to 1 rows of 0 rows from table temp_del_double1.
2015-11-17 01:59:59,431 ERROR [org.alfresco.repo.domain.schema.script.ScriptExecutorImpl] [DefaultScheduler_Worker-4] Statement execution failed:
   SQL: create table temp_prop_root_ref
(
    id INT8 NOT NULL
)
   Error: ERROR: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
  Detail: Key (typname, typnamespace)=(temp_prop_root_ref, 2200) already exists.
   File: /home/wadams/alfresco-5.1ea-201510/tomcat/temp/Alfresco/AlfrescoSchema-PostgreSQLDialect-Update-8302102904952264845.sql
   Line: 20
2015-11-17 01:59:59,431 INFO  [org.alfresco.repo.domain.schema.script.ScriptExecutorImpl] [DefaultScheduler_Worker-1] Executing database script /home/wadams/alfresco-5.1ea-201510/tomcat/temp/Alfresco/AlfrescoSchema-PostgreSQLDialect-Update-4220164947366785582.sql (Copied from alfresco/dbscripts/utility/org.hibernate.dialect.PostgreSQLDialect/CleanAlfPropTablesPostExec.sql).
2015-11-17 01:59:59,431 INFO  [org.alfresco.repo.domain.schema.script.ScriptExecutorImpl] [DefaultScheduler_Worker-4] Executing database script /home/wadams/alfresco-5.1ea-201510/tomcat/temp/Alfresco/AlfrescoSchema-PostgreSQLDialect-Update-5537965044981921163.sql (Copied from alfresco/dbscripts/utility/org.hibernate.dialect.PostgreSQLDialect/CleanAlfPropTablesPostExec.sql).2015-11-17 01:59:59,426 INFO  [org.alfresco.repo.domain.schema.script.ScriptExecutorImpl] [DefaultScheduler_Worker-1] Processing from 0 to 1 rows of 0 rows from table temp_del_double1.
2015-11-17 01:59:59,431 ERROR [org.alfresco.repo.domain.schema.script.ScriptExecutorImpl] [DefaultScheduler_Worker-4] Statement execution failed:
   SQL: create table temp_prop_root_ref
(
    id INT8 NOT NULL
)
   Error: ERROR: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
  Detail: Key (typname, typnamespace)=(temp_prop_root_ref, 2200) already exists.
   File: /home/wadams/alfresco-5.1ea-201510/tomcat/temp/Alfresco/AlfrescoSchema-PostgreSQLDialect-Update-8302102904952264845.sql
   Line: 20
2015-11-17 01:59:59,431 INFO  [org.alfresco.repo.domain.schema.script.ScriptExecutorImpl] [DefaultScheduler_Worker-1] Executing database script /home/wadams/alfresco-5.1ea-201510/tomcat/temp/Alfresco/AlfrescoSchema-PostgreSQLDialect-Update-4220164947366785582.sql (Copied from alfresco/dbscripts/utility/org.hibernate.dialect.PostgreSQLDialect/CleanAlfPropTablesPostExec.sql).
2015-11-17 01:59:59,431 INFO  [org.alfresco.repo.domain.schema.script.ScriptExecutorImpl] [DefaultScheduler_Worker-4] Executing database script /home/wadams/alfresco-5.1ea-201510/tomcat/temp/Alfresco/AlfrescoSchema-PostgreSQLDialect-Update-5537965044981921163.sql (Copied from alfresco/dbscripts/utility/org.hibernate.dialect.PostgreSQLDialect/CleanAlfPropTablesPostExec.sql).</blockcode>

resplin
Elite Collaborator
Elite Collaborator
It looks like Axel ruled out the obvious problems, so I created an issue to investigate if this is a bug in 5.1:

https://issues.alfresco.com/jira/browse/ALF-21482

Please add yourself as a watcher so that you will see if engineering needs additional information.

Thank you for creating the issue.  I signed up and added myself as a "watcher," though I don't seem to be able to comment at all so I'm not sure how to add any info they might require.

I left clean-up the script enabled in the alfresco-global.properties and it ran again this morning at 1am.  The log file is well over 3-million lines this time (including many of the same error messages noted before) and there is still no change to the database size.

I notice that in the comments on the issue that you created Derek Hulley suggests "fix the cleaner job so that it can only run one instance at a time" and that would get rid of the errors messages. Unfortunately, I'm not sure how I would go about doing this.  Do you have any suggestions on how I might proceed with this?

Thanks again for your help!

afaust
Legendary Innovator
Legendary Innovator
Hello,

that comment from Derek Hulley is meant as a note to any Alfresco developer about what needs to be done. It is not a suggestion about what you can do. Derek just found at least one quality issue with the current implementation of the cleaner…

How large is the log file if you compress it? Since it might contain quite a lot of identical message, it should compress very well. It might be worth appending the complete log (for a single cleaner run) in the JIRA ticket if you can compress it to a decent size.

Regards
Axel

resplin
Elite Collaborator
Elite Collaborator
Axel is right.

When issues get moved from ALF to MNT, it means the team recognizes it is a real issue and have the information they need. It is nice when that goes quickly. Unfortunately, the public can't comment on MNT issues at the moment. I'm not sure when we will get that fixed.

In the meantime, let me know if there is anything I should add to the issue for you.

Thank you for trying Alfresco 5.1 and reporting the issue. Hopefully we can get this fixed for the December release of Community Edition.

I seem to have fixed the immediate problem this morning by using PgAdmin to run a "VACUUM" maintenance operation (with the "FULL" option selected) on the two tables.  The alf_prop_value and alf_prop_value_serializable tables are now both around 50k in size.  Total database size is only 125MB.

I don't know what caused the initial ballooning in size of those two tables in the first place, though, so it's probably still worth looking into as a possible bug for the Alfresco team.

I'm happy to provide any information I can about my installation that would help.  Let me know if you need anything from me.