Files Disappearing

mm_doug
Champ in-the-making
I have started using Alfresco experimentally with a few people at work. The features are great and exactly what we need, but files sometimes disappear. Some of them left references that let me find them in the contentstore.deleted folder; others left no references at all. We typically interact with Alfresco through the Samba share from Windows clients.

What log files can I look at to figure out what is happening?  I would like to figure out how to prevent this in the future.

Is there a way to figure out which files have gone missing and recover them?

Should we be using the WebDAV interface instead of the Samba interface? It seems to be faster.
5 REPLIES

_sax
Champ in-the-making
Any file put into Alfresco via CIFS can also be deleted via CIFS, but never without a trace: a deleted file ends up in the recycle bin or in contentstore.deleted.
It sounds as if the files that left no reference were never successfully transferred to Alfresco, and so were never present.
You'll find the logs in Alfresco/alfresco.log or Alfresco/tomcat/logs/catalina.out.
Make sure to enable the CIFS loggers and raise their levels to debug in /Alfresco/tomcat/webapps/alfresco/WEB-INF/classes/log4j.properties.

# CIFS server debugging
log4j.logger.org.alfresco.smb.protocol=info
#log4j.logger.org.alfresco.smb.protocol.auth=debug
#log4j.logger.org.alfresco.acegi=debug

Is there any pattern common to the files that cannot be seen via the web client (browser) or CIFS?

mm_doug
Champ in-the-making
I have enabled info output. I noticed a couple of things in the log files.


14:42:35,392 User:admin WARN  [alfresco.missingProperties] Failed to find property 'mimetype' for node: versionStore://version2Store/7f3570da-0c88-4f7d-8f28-b13f0d002c6f
14:42:35,422 User:admin WARN  [alfresco.missingProperties] Failed to find property 'size' for node: versionStore://version2Store/7f3570da-0c88-4f7d-8f28-b13f0d002c6f
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
        at org.alfresco.jlan.smb.server.nio.ChannelPacketHandler.readBytes(ChannelPacketHandler.java:114)
        at org.alfresco.jlan.smb.server.nio.TcpipSMBChannelHandler.readPacket(TcpipSMBChannelHandler.java:66)
        at org.alfresco.jlan.smb.server.nio.NIOCIFSThreadRequest.runRequest(NIOCIFSThreadRequest.java:80)
        at org.alfresco.jlan.server.thread.ThreadRequestPool$ThreadWorker.run(ThreadRequestPool.java:141)
        at java.lang.Thread.run(Thread.java:619)

Right now I'm working on a way to search the contents of the .bin files in the contentstore to see if the files are still in there somewhere.  I am also considering a script that will compare the database to the files in the contentstore.  It appears that the table alf_node_properties contains references to the contentstore.  Does anyone know if all files in contentstore and contentstore.deleted are referenced from this table?
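
A minimal sketch of the content search mentioned above, assuming the missing file contains some literal text you remember and that it is stored uncompressed (compressed formats will not match). The helper name find_bin_with_text and both arguments are hypothetical; nothing here is an Alfresco tool.

```shell
#!/bin/bash
# Sketch: list content-store files that contain a given literal string.
# find_bin_with_text is a hypothetical helper name, not part of Alfresco.
find_bin_with_text() {
    local store_dir=$1   # e.g. contentstore or contentstore.deleted
    local needle=$2      # literal text expected inside the missing file
    # grep -l prints only the names of matching files; -F treats the needle
    # as a fixed string rather than a regular expression.
    find "$store_dir" -type f -name '*.bin' -exec grep -lF -- "$needle" {} + | sort
}
```

Run from the alf_data directory, a call such as find_bin_with_text contentstore.deleted "some remembered phrase" would print the matching .bin paths, if any.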

_sax
Champ in-the-making
I cannot answer your last question, but since even the size of the file cannot be determined, I'd say this is a corrupted file or installation. It would be easier to reset Alfresco and start over with your tests. To do this, you just need to shut it down, empty alf_data, delete the database, and create a new one.

mm_doug
Champ in-the-making
I agree that something likely happened to corrupt the database. I will continue my analysis to try to figure out what happened so that I don't make the same mistake again. I created a script to help me find the mystery files. It gets all of the file references from the database, then gets all the files from the contentstore and contentstore.deleted, sorts both lists, and figures out which files have no database entry and which database entries have no file.

#!/bin/bash
# run from /Alfresco/alf_data/
# Content URLs known to the database, reduced to paths relative to the store root.
# -N suppresses the column header so it does not end up in the list.
mysql -u root -p alfresco -B -N -e "select string_value from alf_node_properties where string_value regexp 'contentUrl=store'" |\
sed 's/^contentUrl=store:\/\///;s/|.*$//' | sort -u > indexes.txt
# Files actually on disk, live and deleted.
(cd contentstore && find . -type f | sed 's/^\.\///') > files.txt
(cd contentstore.deleted && find . -type f | sed 's/^\.\///') > del.txt
sort -u files.txt del.txt > all.txt
# Files on disk that have no database entry.
comm -23 all.txt indexes.txt > missingindexes.txt
# Database entries whose file is not on disk.
comm -13 all.txt indexes.txt > missingfiles.txt
rm indexes.txt files.txt del.txt all.txt

Results: out of about 60,000 files, there are 10,000 files that have no database entry. About half are in the contentstore and half are in contentstore.deleted. Running the file command on them helped me figure out what type of files they were, then I started opening files until I recognized one that had gone missing.

Plans: I will be comparing md5sums of files from Alfresco to the set of files that were originally imported to Alfresco to see how many missing files are from the initial import, and how many have been lost since then.
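
One way the md5sum comparison could be sketched, assuming a copy of the originally imported files is still available in some directory and that GNU md5sum is installed. The helper name missing_by_md5 and both directory arguments are hypothetical.

```shell
#!/bin/bash
# Sketch: report original files whose MD5 checksum appears nowhere in a store.
# missing_by_md5 is a hypothetical helper; both directories are assumptions.
missing_by_md5() {
    local originals_dir=$1   # copy of the files that were first imported
    local store_dir=$2       # e.g. /Alfresco/alf_data/contentstore
    local store_hashes
    store_hashes=$(mktemp)
    # Hash every stored file and keep only the unique checksums.
    find "$store_dir" -type f -exec md5sum {} + | awk '{ print $1 }' | sort -u > "$store_hashes"
    # Load the store checksums, then print the path of each original whose
    # checksum is not among them (md5sum puts the path at column 35).
    find "$originals_dir" -type f -exec md5sum {} + |
        awk -v hashes="$store_hashes" '
            BEGIN { while ((getline h < hashes) > 0) seen[h] = 1 }
            !($1 in seen) { print substr($0, 35) }' |
        sort
    rm -f "$store_hashes"
}
```

Running it once against contentstore and once against contentstore.deleted would separate originals that are gone entirely from those that only survive as deleted content. File names containing backslashes or newlines are not handled.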

I have restarted Alfresco a number of times when people started complaining about it going slow.  Some of the times that I restarted it, it said the following:
Using CATALINA_BASE:   /opt/alfresco/tomcat
Using CATALINA_HOME:   /opt/alfresco/tomcat
Using CATALINA_TMPDIR: /opt/alfresco/tomcat/temp
Using JRE_HOME:       /usr/lib/jvm/java-6-sun/
CompilerOracle: exclude org/apache/lucene/index/IndexReader$1.doBody
CompilerOracle: exclude org/alfresco/repo/search/impl/lucene/index/IndexInfo$Merger.mergeIndexes
CompilerOracle: exclude org/alfresco/repo/search/impl/lucene/index/IndexInfo$Merger.mergeDeletions
I have speculated that restarting may have something to do with it even though I'm not very convinced.  I am also planning on seeing what kind of information I can gather around when I have restarted it, and when the files went missing.
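
To line up missing files with restart times, the orphaned paths in missingindexes.txt could be grouped by day. This is only a sketch: it assumes the store-relative paths follow a year/month/day/hour/minute/name.bin layout, which should be checked against the actual contentstore first, and orphans_per_day is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: count orphaned files per day, assuming one store-relative path per
# line in the form year/month/day/hour/minute/name.bin (an assumption).
# orphans_per_day is a hypothetical helper name.
orphans_per_day() {
    local list_file=$1   # e.g. missingindexes.txt from the script above
    # Keep only year/month/day, then count how many paths share each day.
    cut -d/ -f1-3 "$list_file" | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Days with unusually high counts can then be compared with the dates on which Alfresco was restarted.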

_sax
Champ in-the-making
This looks both very comprehensive and professional.
One idea that came to me: at some point during your evaluation, Alfresco may have been started a second time while another instance was already running.
That would make files go 'nowhere': they reach the contentstore but never the database, because the other Alfresco instance blocks that.
Unfortunately, Alfresco doesn't tell you in the log when you start it twice.
If it had been MySQL that was stopped during the tests, Alfresco would have stopped working rather than allowing more data to come in.

But maybe your analysis will point to another cause.