
Binaries in contentstore but not in the database?

morganp1
Confirmed Champ
Hi everybody!

Version: Alfresco CE 4.2.c
Database: PostgreSQL 9.0.4
OS: OEL 6.4

A few days ago, I ran into a very strange issue with my Alfresco instance: it seems that some "ghost files" were created in the contentstore without being linked in the database. At 6 PM, I received an alert showing that the file system was full. After deleting a few GB under the contentstore.deleted directory, I noticed that some big files had been created between 5 PM and 6 PM (from 100MB to 1.6GB per file).

So I tried to find those documents from Alfresco Share using the Solr search, but nothing relevant came up… So I decided to take a look at the database, and here is the thing: those documents can't be found in the database.

I used the following query to find the relation between the path of a document on the file system and a node UUID:
select n.id, n.uuid, u.content_url
from alf_node n, alf_node_properties p, alf_namespace ns, alf_qname q, alf_content_data d, alf_content_url u
where n.id = p.node_id
and q.local_name = 'content'
and ns.uri = 'http://www.alfresco.org/model/content/1.0'
and ns.id = q.ns_id
and p.qname_id = q.id
and p.long_value = d.id
and d.content_url_id = u.id
and u.content_url = 'store://2014/6/5/17/43/6a1e9ffc-82c2-4c92-88fb-7b56407d6a8b.bin';


With that query, I'm able to find all documents that are stored in Alfresco, but not those created from 5 PM to 6 PM on that day: for them the query returns 0 rows…
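
To run the same lookup for many files, the content_url value can be derived from each file's path relative to the contentstore root. Below is a minimal Python sketch, assuming the year/month/day/hour/minute layout visible in the URL above; the function names and the root directory used in the usage comment are hypothetical and not taken from this installation.

```python
from pathlib import PurePosixPath

STORE_PREFIX = "store://"


def path_to_content_url(file_path, store_root):
    """Turn a file path under the contentstore root into a store:// URL.

    Assumes the layout <root>/<year>/<month>/<day>/<hour>/<minute>/<uuid>.bin.
    Raises ValueError if file_path is not located under store_root.
    """
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(store_root))
    return STORE_PREFIX + relative.as_posix()


def content_url_to_path(content_url, store_root):
    """Inverse mapping: turn a store:// URL back into a file system path."""
    if not content_url.startswith(STORE_PREFIX):
        raise ValueError("not a store:// URL: " + content_url)
    return str(PurePosixPath(store_root) / content_url[len(STORE_PREFIX):])


# Usage (the root directory below is a made-up example):
# path_to_content_url(
#     "/srv/alf_data/contentstore/2014/6/5/17/43/6a1e9ffc-82c2-4c92-88fb-7b56407d6a8b.bin",
#     "/srv/alf_data/contentstore",
# )
```

The resulting string can then be used as the u.content_url value in the query above.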

So, does anyone know what happened? Is it safe to remove those files from the file system? And finally, how can I prevent this from happening again?

Thanks for your help, it will be highly appreciated!
Morgan
9 REPLIES

mitpatoliya
Star Collaborator
Any pointers from the log files?

Hi Mitpatoliya,

Unfortunately no, there is absolutely nothing in the log files and of course I can't replicate this with Alfresco in debug mode because I don't even know what happened ^^.

romschn
Star Collaborator
What are those binary files? Do you have any custom implementation for uploading files to your system? For example, to handle large file uploads to Alfresco, some implementations first upload the file to a location on the file system, create a 0KB content node in Alfresco, and then link the file system location to that node.

There is no custom implementation to upload files in Alfresco. The Alfresco repository is only used to store M$ documents (docx, ppt, xlsx, …), pdf, scripts (sh, ksh, bat) and some images, but all these files are quite small (less than 10MB per file). That's why I don't understand why there are some files in the contentstore that are so big (1.6GB).

mrogers
Star Contributor
In most cases the repository will clean up after a failure.

My guess is that someone tried to do an export and ran out of disk space. Is your abnormally big file a zip file?

Just to illustrate the situation:


[alfresco /2014/6/4]# du -sh *
108K    11
264K    12
2.6M    14
444K    15
364K    16
4.1G    17
65M     18
14M     23
3.5M    9


As you can see above, this is quite a small repository without much activity (except for this day at 5 PM).


[alfresco /2014/6/4]# ll 17/**/
17/1:
total 123M
-rw-r--r--. 1 alfresco alfresco 128321901 Jun  4 17:01 182add02-a9b5-4528-ad10-b3787088774f.bin

17/7:
total 998M
-rw-r--r--. 1 alfresco alfresco 1045594205 Jun  4 17:08 fbf83d1e-2e41-4620-814d-942f0e1b0542.bin

17/21:
total 753M
-rw-r--r--. 1 alfresco alfresco 789059440 Jun  4 17:22 64254a43-b72e-437d-b87a-5d8838c2e796.bin

17/28:
total 1.6G
-rw-r--r--. 1 alfresco alfresco 1624551666 Jun  4 17:30 5b1deaf4-b933-41ae-8c7d-2d773869dba4.bin

17/35:
total 3.9M
-rw-r--r--. 1 alfresco alfresco 4034284 Jun  4 17:35 17a6c538-ff8b-4b98-95af-c56cbea97b77.bin

17/37:
total 58M
-rw-r--r--. 1 alfresco alfresco 60807238 Jun  4 17:37 891f20e6-d8c7-46db-b5e2-b9917a8dd1d7.bin

17/47:
total 57M
-rw-r--r--. 1 alfresco alfresco 59578013 Jun  4 17:47 305d2dfa-965a-4880-a207-8d57d54e02da.bin

17/51:
total 422M
-rw-r--r--. 1 alfresco alfresco 442258525 Jun  4 17:51 dbbbd66d-dc60-451b-aa6b-7e8c2d86561e.bin

17/54:
total 158M
-rw-r--r--. 1 alfresco alfresco 164688350 Jun  4 17:54 aa2340cc-9c9d-4857-9e09-c4affe86c3b6.bin


When I first read your answer, I was skeptical because, as you can see above, there are 9 files created from 5:01 PM to 5:54 PM, and the first file was created almost 1 hour before the file system became full. But I still downloaded the first document with WinSCP and changed its extension, and indeed this file (2014/6/4/17/1/182add02-a9b5-4528-ad10-b3787088774f.bin) is a zip file containing a part of our repository… So you were right!
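
The same check can also be done in place on the server, without downloading or renaming anything. Below is a minimal sketch using Python's standard zipfile module; the function name is hypothetical, and the path in the usage comment is the one from the listing, relative to the contentstore root.

```python
import zipfile


def list_zip_entries(path, max_names=10):
    """Return up to max_names entry names if the file is a zip archive.

    Returns an empty list when the file is not recognised as a zip archive,
    whatever its extension (.bin in the contentstore) happens to be.
    """
    if not zipfile.is_zipfile(path):
        return []
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()[:max_names]


# Usage, run from the contentstore root:
# for name in list_zip_entries("2014/6/4/17/1/182add02-a9b5-4528-ad10-b3787088774f.bin"):
#     print(name)
```

Only the archive's directory is read, so this stays cheap even for the 1.6GB file.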


Does Alfresco keep all zip files created for the action "Download as Zip" until the contentStoreCleaner job runs (14 days by default)? And then those files are moved to the contentstore.deleted directory? I hope that's not the case but if not, I wonder why Alfresco hasn't removed these files yet (5 days later).

Thanks for your help, I would never have thought of zip files created by Alfresco…

mrogers
Star Contributor
I, personally, don't have the foggiest how "download as zip" is implemented.  

It is an educated guess on my part. The content has to exist somewhere and it can't be in memory.

So there's probably some analysis needed and perhaps a bug raised.

Hi,

After 14 days, the zip file binaries (created by the "Download as Zip" action) were automatically moved from the contentstore to the contentstore.deleted directory.

So everything is fine!

romschn
Star Collaborator
The default value for system.content.orphanProtectDays is 14 in repository.properties. It means that any orphaned content is protected for 14 days in the content store before the content store cleanup cron job removes it.
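
As a rough illustration of that protection window, here is a small Python sketch that estimates the earliest moment a file could be picked up by the cleanup job. It assumes the window is counted from the file's creation time, which is a simplification: the actual starting point is when the content becomes orphaned, and the file only moves when the job next runs after that. The function name is hypothetical.

```python
from datetime import datetime, timedelta

# Default value of system.content.orphanProtectDays quoted above.
ORPHAN_PROTECT_DAYS = 14


def earliest_cleanup(created, protect_days=ORPHAN_PROTECT_DAYS):
    """Estimate the earliest time an orphaned file stops being protected.

    Assumption: protection is counted from `created`; the cleanup job's own
    schedule is not taken into account.
    """
    return created + timedelta(days=protect_days)


# Usage, with the first file from the listing (created Jun 4 at 17:01):
# earliest_cleanup(datetime(2014, 6, 4, 17, 1))
```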

Hope this helps.