Problem with POI on Linux

11-13-2007 12:16 PM
Hello, everyone!
I have encountered a problem with metadata extraction: the POI library extracts metadata from Office files as unreadable symbols.
In some files the metadata is written in English, and it is extracted correctly. However, when metadata (for example, the document title or description) is written in Russian, Alfresco fails to extract it correctly.
At first I had this problem on both Windows (I am using Windows 2000 Server) and Linux. I managed to solve it on Windows by setting the system locale to Russian, after which both English and Russian metadata were extracted correctly. Then I set the locale on Linux (/etc/sysconfig/i18n) to "ru_RU.UTF-8". However, the problem remains: metadata is not extracted correctly.
I would be grateful if you could help me to solve this problem.
Thanks in advance.
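For readers following along, the "unreadable symbols" symptom can be reproduced in isolation. The sketch below is only an illustration of a charset mismatch, not a description of what POI does internally: it assumes the title bytes are stored as windows-1251 (CP1251) and that the JVM provides that charset. The class name `MojibakeDemo` and the sample title are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Decode raw bytes with an explicitly chosen charset.
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        // Assumption: the runtime ships the windows-1251 charset.
        Charset cp1251 = Charset.forName("windows-1251");

        // Hypothetical Russian document title, written with Unicode escapes
        // so the source file encoding does not matter.
        String title = "\u041E\u0442\u0447\u0451\u0442";

        // Bytes as they would be stored in a CP1251-encoded field.
        byte[] raw = title.getBytes(cp1251);

        // Decoding with the matching charset restores the text.
        System.out.println(decode(raw, cp1251));

        // Decoding with a Latin charset yields unreadable symbols.
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1));
    }
}
```

If the decoding side falls back to a platform default that does not match the stored bytes, the result looks like the second line of output, which may be why changing the locale affects extraction.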
2 REPLIES

11-14-2007 03:35 AM
OS: Windows 2000 Server (English)
Installed Locales: English (US), Russian.
Found some interesting effects:
1) If the default locale is Russian, metadata is extracted correctly from MS Office documents, BUT CIFS does not allow entering any folder with Cyrillic characters in its name.
2) If the default locale is English (US), CIFS works, BUT metadata is not extracted correctly.
Did anyone observe such effects? Could anyone propose a solution or workaround?
Thanks in advance.

12-11-2007 04:19 AM
I have found a solution to this problem. Both POI and CIFS work correctly when the application language and file encoding are defined explicitly. So all we need is to add the following parameters to the java executable:
-Duser.language=ru -Duser.country=RU -Duser.region=RU -Duser.variant=RU -Dfile.encoding=CP1251
After this is done, metadata is extracted correctly from Office documents and CIFS works correctly with folders containing Cyrillic characters.
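To confirm that the flags were actually picked up, a small check like the one below can be run with the same parameters. This is a minimal sketch, not part of Alfresco or POI; the class name `LocaleCheck` is hypothetical. It only prints what the JVM reports, and how `file.encoding` is honored can differ between JVM versions.

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class LocaleCheck {
    // Summarize the locale and charset settings the running JVM reports.
    static String describe() {
        return "locale=" + Locale.getDefault()
                + " file.encoding=" + System.getProperty("file.encoding")
                + " defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it as `java -Duser.language=ru -Duser.country=RU -Dfile.encoding=CP1251 LocaleCheck` should show whether the locale and default charset match what was requested on the command line.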
