Files imported via CIFS not indexed correctly by Lucene
‎11-09-2007 04:13 AM
When I import the same file via the Alfresco Web interface, the file is indexed correctly and I can search over all the content.
Is this a known issue?
‎11-15-2007 06:39 AM
Perhaps I did not describe my problem correctly. I will try again:
When I add content to Alfresco using the Web-Client (add content), the document is indexed correctly by Lucene. I can search for any word in the document and I find my test document.
But when I import a copy of the same document into the same space in Alfresco using CIFS, the document is not indexed correctly by Lucene.
My test: when I search for the word 'Paris', I find only the document I imported via the Web-Client. But when I search for 'video', I find both.
If you want, I can send you my test PDFs so that you can test this yourself.
‎11-15-2007 06:41 AM
What is the size of your PDF document in Alfresco when you import it through CIFS? On some platforms there seems to be a bug that leaves the file size at zero, and as a consequence many operations fail on such files.
‎11-15-2007 09:37 AM
Both files have exactly the same size (144 KB), so that's not the problem.
Again: the file I import through CIFS is also indexed by Lucene, but not in the same way as the one I import through the Web-Client.
I have two files that are identical except for the file name (AlfrescoLuceneTestdocument.pdf / AlfrescoLuceneTestdocument - Kopie.pdf). The content is exactly the same.
Both documents contain the same words. But when I search for the word 'Paris', I find only the file imported through the Web-Client.
When I search for the word 'video', I find both.
‎11-16-2007 04:34 AM
when I search for the word 'Paris', I will find only the file imported through Web-Client.
When I search for the word 'video', I will find both.
Paris… video…
Hmm… Does your document contain a Paris Hilton video? Alfresco may filter those items :lol:
As nobody here can explain this behaviour, you may want to open a JIRA issue to raise the problem with the Alfresco engineers.

‎11-16-2007 05:58 AM
It could be that the files generate different tokens (by default only the first 10,000 are used). This limit can be increased in the config. If the document structure is different, it could be that the PDF-to-text transformation does not work, or that it takes a different route the second time and thus produces a different result.
Are you using Open Office? What difference does it make with and without it?
Are the files loaded in the same locale as the one you search in? Locale affects tokenisation for both indexing and search.
Add these docs in reverse order and see what happens.
Andy
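To illustrate the token-limit point, here is a minimal sketch in Python. It is not Alfresco or Lucene code: the tokenizer, the `build_index` helper and the two sample texts are assumptions made up for illustration, and only the 10,000 figure comes from the post above. It shows how two extractions of the same document can end up with different searchable terms if one of them carries a few extra tokens in front.

```python
import re

MAX_TOKENS = 10_000  # limit quoted in the post above; assumed to apply per document


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplification)."""
    return re.findall(r"\w+", text.lower())


def build_index(text, max_tokens=MAX_TOKENS):
    """Return the searchable terms, keeping only the first max_tokens tokens."""
    return set(tokenize(text)[:max_tokens])


# Two hypothetical extractions of the same document: the second one has
# extra leading tokens, e.g. because a different transformation route was used.
clean_text = "video " + "filler " * 9_990 + "Paris"
noisy_text = "junk " * 20 + clean_text

clean_index = build_index(clean_text)
noisy_index = build_index(noisy_text)

print("paris" in clean_index, "video" in clean_index)  # True True
print("paris" in noisy_index, "video" in noisy_index)  # False True
```

In this toy setup the word near the end of the text falls outside the limit in the second extraction, while the word near the start stays searchable in both. Whether this is what happens with the CIFS import would have to be checked against the actual extracted text.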
‎04-22-2008 08:09 AM
Let me describe the procedure once again:
I use a simple MS Word 2003 document: one page, about 200 words. As you can see, it is not really a complex document.
When I upload this document (I can give it to you if you want) with the web-client, all words in the document are indexed correctly.
When I upload the same document with CIFS, not all words get indexed.
I think this is a serious problem and should be investigated by Alfresco.

‎05-24-2008 11:53 AM
Please give pointers to solve this issue. Thanks in advance for your help.
Thanks,
Vinoda Kumar S.
‎05-26-2008 02:50 AM
In other words: if you use a different language on your Windows clients than the one you use with the Alfresco Webclient, then you will face this problem.
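A toy sketch in Python of why a language mismatch between upload and search could hide some words but not others. Everything here is an assumption for illustration: the `ANALYZERS` table, its stop words and suffix rules, and the `analyze` and `search` helpers are invented and far simpler than a real Lucene analyzer. The point is only that a term normalised one way at index time may not match the same term normalised another way at search time.

```python
import re

# Toy per-locale analysis rules; real analyzers are far more elaborate.
# The stop words and suffixes below are illustrative assumptions only.
ANALYZERS = {
    "en": {"stop_words": {"the", "a", "of"}, "suffixes": ("ing", "s")},
    "de": {"stop_words": {"der", "die", "das"}, "suffixes": ("en", "e")},
}


def analyze(text, locale):
    """Tokenize, drop the locale's stop words, strip one locale-specific suffix."""
    rules = ANALYZERS[locale]
    terms = []
    for token in re.findall(r"\w+", text.lower()):
        if token in rules["stop_words"]:
            continue
        for suffix in rules["suffixes"]:
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                token = token[: -len(suffix)]
                break
        terms.append(token)
    return terms


def search(index_terms, query, locale):
    """A query matches only if every analyzed query term is in the index."""
    return all(term in index_terms for term in analyze(query, locale))


document = "Paris video"

# Same document, indexed once under each locale.
index_en = set(analyze(document, "en"))  # e.g. uploaded with an English locale
index_de = set(analyze(document, "de"))  # e.g. uploaded with a German locale

# Both searches run under the English locale.
print(search(index_en, "Paris", "en"), search(index_de, "Paris", "en"))  # True False
print(search(index_en, "video", "en"), search(index_de, "video", "en"))  # True True
```

With these made-up rules, a word that the two locales normalise differently is found only in the index built under the search locale, while a word both locales leave unchanged is found in both.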

‎05-26-2008 07:13 AM
Thanks,
Vinod
