cancel
Showing results for 
Search instead for 
Did you mean: 

File names degenerate spontaneously after some time

michalwrobel
Champ on-the-rise
Champ on-the-rise
I'm dealing with a strange problem with one of alfresco instances I maintain.

Symptoms are always the same:

Alfresco 3.4e

1. Files with diacritical signs in names are uploaded/created (to be specific polish signs like ąęłóćż )
2. By this point everything looks fine names are perfectly all right, but after some lapse of time (few days, more than one day, but don't know exactly yet)…
3. All diacritical signs in names get changed to '?' question marks, well all except 'ó'
               It doesn't matter if these files are uploaded/created or whether they're text or binary, the problem occurs on ALL files and ALL regardless I'm viewing them using Share or Explorer.


4. Strange thing occurring in Share, but not in Explorer - after typing something in search box, and clicking search diacritical signs get substituted for other diacritical signs.

I've spent two days with this problems and I'm stuck. Some things I've already tried:
               - Nothing really alarming in logs

               - Encoding in MySql database seems ok (utf-8)

               - MySql is explicitly configured to use utf-8 in my.cnf

               - Database connection string is explicitly changed to use utf-8 too.

               - Think it may be related with Lucene indexes (because it's not instant error) but deleting indexes and rebuilding with FULL option didn't solve the problem, nor helped to reproduce it. (So it seems that indexing have no influence here).

Looks like some background periodic process malforms the file names (some strange transformation?) but I don't have any clue, cause I double checked the indexing.. Orphan repository purging doesn't make sense here either. Do you know other periodic processes in Alfresco?
I've seen problems with question marks on this forum but it was mainly bad database configuration and faulty letters where visible instantly - not after big lapse of time like in my case…
5 REPLIES 5

loftux
Star Contributor
Star Contributor
Thats an interesting problem…
If you do a select in the database, does it show correctly then? If so, then the actual data isn't changed.

Assuming you use tomcat, what is your setting in server.xml for
    <Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8"
               connectionTimeout="20000"
               redirectPort="8443" />
Do you have the UTF-8 declaration there?
Do you have any JAVA_OPT configurations that may interfere? Example of what I have
    export JAVA_OPTS="${JAVA_OPTS} -Duser.country=US -Duser.region=US -Duser.language=en -Duser.timezone="Europe/Stockholm"
    #File encoding may be correct, but specify them to be sure what is used.
    export JAVA_OPTS="${JAVA_OPTS} -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8"

michalwrobel
Champ on-the-rise
Champ on-the-rise
Hi, thanks for your response, I checked things you mentioned.

Tomcat settings were OK, and changing java_opts didn't help.

To my astonishment data in database IS malformed. There are '?' question marks stored there.
Both data entered some time earlier which displays incorrectly AND fresh data which displays all right in web browser is corrupted in MySql table.
I can't comprehend how it is possible, it's not browser caching or even Tomcat caching (clearing browser data, tomcat cache and restarting tomcat doesn't change anything).
Does alfresco has it's own managed cache?

Talking about the database:

- The alf_node_properties collation is: utf8_bin, so it seems to be OK

original intended value: "ęątóćkszzłooooęęęęźźź alfresco"
- Performing query like:
select * from alf_node_properties where node_id=99753;

gives:

node_id   actual_type_n   persisted_type_n   boolean_value   long_value   float_value   double_value   string_value   serializable_value   qname_id   list_index   locale_id
99753   6   6   \0   0   0   0   [b]??tó?kszz?oooo??????? alfresco[/b]   NULL   27   -1   2
99753   6   6   \0   0   0   0   ??tó?kszz?oooo??????? alfresco   NULL   28   -1   1
99753   6   6   \0   0   0   0   ??tó?kszz?oooo??????? alfresco   NULL   29   -1   1
99753   21   3   \0   89951   0   0   NULL   NULL   34   -1   2
99753   1   1       0   0   0   NULL   NULL   36   -1   2

I then manually altered string_value column to "from mysql aącćzżxźeęlłoó" to make sure if mysql can handle my diacriticals
update alf_node_properties set string_value="from mysql aącćzżxźeęlłoó" where node_id=99753;

selecting again:

node_id   actual_type_n   persisted_type_n   boolean_value   long_value   float_value   double_value   string_value   serializable_value   qname_id   list_index   locale_id
99753   6   6   \0   0   0   0   [b]from mysql aącćzżxźeęlłoó[/b]   NULL   27   -1   2
99753   6   6   \0   0   0   0   from mysql aącćzżxźeęlłoó   NULL   28   -1   1
99753   6   6   \0   0   0   0   from mysql aącćzżxźeęlłoó   NULL   29   -1   1
99753   21   3   \0   89951   0   0   from mysql aącćzżxźeęlłoó   NULL   34   -1   2
99753   1   1       0   0   0   from mysql aącćzżxźeęlłoó   NULL   36   -1   2

everything OK !

restarting tomcat and the result is… :
"from mysql aÄ…cćzA1xAoeÄ™lA‚oA3"
(both Explorer and Share)
:shock:  :shock:

REGARDLESS which coding in browser I use (default UTF-8/ISO-8859-2/Windows-1250)

So to sum all up, confusion is increasing now.

Looks like MySql does fine and that Alfresco malforms UTF-8 strings both ways (while inserting to MySql and after selecting persisted data)

Any new ideas?
Waiting for your feedback about this weird case.

mrogers
Star Contributor
Star Contributor
As you point out above these sort of issues are usually due to either misconfiguration of the database or the connection string.  And usually if misconfigured the the initial upload will not seem O.K.

Alfresco does not have anything that will update the metadata on a periodic basis.

So I think its likely that the metadata is getting corrupted as a side effect of something else.    Do you have any jobs or rules which update names and descriptions?  Or move content around?

michalwrobel
Champ on-the-rise
Champ on-the-rise
I didn't create any rules or jobs in this system on my own and haven't found any so far…

There are no beans using org.alfresco.util.CronTriggerBean or org.alfresco.util.TriggerBean apart from these standard declared in scheduled-jobs-context.xml.

Is there any possibility to easily list all rules present in Alfresco repository?
And it would have to be a really 'strong' rule affecting any content in any site, cause this error is happening in all Share sites.

As I mentioned in my post above the strings are damaged instantly on the road in and from the database, but for some reason their proper form is stored somewhere for some time. Can you think of any such temporary place in alfresco repository?

And also important question for me, can you give me a hint how to possibly easy debug the path through alfresco classes  from the user interface to database and vice versa? Maybe putting log4j = debug all along the route could help me to track down 'the evil place'.

What do you think about the proper string from database being read bad way by alfresco on instant (that "from mysql aÄ…cćzA1xAoeÄ™lA‚oA3" from the post above)? What may be the cause? Maybe some strange exceptional encoding/encoding conversion call inside alfresco classes? But where?)

EDIT:
I've found this : https://issues.alfresco.com/jira/browse/ALF-2363

May it be related? Maybe correct files are stored here and deleted after some time?

michalwrobel
Champ on-the-rise
Champ on-the-rise
Sorry for reposting, but changes are relevant.

I tampered a little with mysql console client. It turned out after that the data isn't corrupted at first in database, and letters get changed to '???' after indefinite time.
So there's no mysterious cache as I thought, data is correct for some time.

How I've figured it out:

Default settings:

mysql> show variables like "%char%";
+————————–+—————————-+
| Variable_name            | Value                      |
+————————–+—————————-+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+————————–+—————————-+

These settings produce bad results on screen, when UTF8 is interpreted as latin (results from my posts above)

select * from alf_node_properties where node_id=99872;
+———+—————+——————+—————+————+————-+————–+———————————-+——————–+———-+————+———–+
| node_id | actual_type_n | persisted_type_n | boolean_value | long_value | float_value | double_value | string_value                     | serializable_value | qname_id | list_index | locale_id |
+———+—————+——————+—————+————+————-+————–+———————————-+——————–+———-+————+———–+
|   99872 |             6 |                6 |               |          0 |           0 |            0 | [b]a?c?n??z-o?le?  16 sty 12 50.txt [/b]| NULL               |       27 |         -1 |         2 |
|   99872 |             6 |                6 |               |          0 |           0 |            0 | a?c?n??z-o?le?  16 sty 12 50.txt | NULL               |       28 |         -1 |         1 |
|   99872 |            21 |                3 |               |      90045 |           0 |            0 | NULL                             | NULL               |       34 |         -1 |         2 |
+———+—————+——————+—————+————+————-+————–+———————————-+——————–+———-+————+———–+


I changed character_set_results and here it is..

mysql> set character_set_results=utf8;

select * from alf_node_properties where node_id=99872;
+———+—————+——————+—————+————+————-+————–+—————————————–+——————–+———-+————+———–+
| node_id | actual_type_n | persisted_type_n | boolean_value | long_value | float_value | double_value | string_value                            | serializable_value | qname_id | list_index | locale_id |
+———+—————+——————+—————+————+————-+————–+—————————————–+——————–+———-+————+———–+
|   99872 |             6 |                6 |               |          0 |           0 |            0 | [b]aącćnńżzóołleę  16 sty 12 50.txt [/b]| NULL               |       27 |         -1 |         2 |
|   99872 |             6 |                6 |               |          0 |           0 |            0 | aącćnńżzóołleę  16 sty 12 50.txt | NULL               |       28 |         -1 |         1 |
|   99872 |            21 |                3 |               |      90045 |           0 |            0 | NULL                                    | NULL               |       34 |         -1 |         2 |
+———+—————+——————+—————+————+————-+————–+—————————————–+——————–+———-+————+———–+

This also explain why manually inserted values were interpreted bad by alfresco - they were in fact latin1.
Another philosophical question is: WHY someone decided to set default variables like that.


The data being incorrectly displayed in alfresco is indeed corrupted:

 select * from alf_node_properties where node_id=95813;
+———+—————+——————+—————+————+————-+————–+————————————-+——————–+———-+————+———–+
| node_id | actual_type_n | persisted_type_n | boolean_value | long_value | float_value | double_value | string_value                        | serializable_value | qname_id | list_index | locale_id |
+———+—————+——————+—————+————+————-+————–+————————————-+——————–+———-+————+———–+
|   95813 |             6 |                6 |               |          0 |           0 |            0 | [b]???????ód-2011-od prof.docx[/b] | NULL               |       27 |         -1 |         2 |
|   95813 |             6 |                6 |               |          0 |           0 |            0 | ???????ód-2011-od prof.docx | NULL               |       28 |         -1 |         1 |
|   95813 |            21 |                3 |               |      87172 |           0 |            0 | NULL                                | NULL               |       34 |         -1 |         2 |
|   95813 |             6 |                6 |               |          0 |           0 |            0 | user                                | NULL               |       35 |         -1 |         2 |
+———+—————+——————+—————+————+————-+————–+————————————-+——————–+———-+————+———–+




So still the question remains: what can corrupt data in this way?
Do you know is it possible to log operations precisely in mysql, let's say this particular record being changed?
Getting started

Tags


Find what you came for

We want to make your experience in Hyland Connect as valuable as possible, so we put together some helpful links.