search inside pdf

jbrowne
Champ in-the-making

I've got a new install of Nuxeo (I downloaded the 5.7.1 VMware image). It seems to launch and work fine.

I uploaded two different PDF files, but the search does not find the text inside one of them. Why?

The PDF in question is not an image; its text is selectable and searchable in Adobe Reader.

1 ACCEPTED ANSWER

Florent_Guillau
World-Class Innovator

From your traces above, you're the victim of a PDFBox bug: PDFBOX-1512.

A workaround is to add this to JAVA_OPTS:

-Djava.util.Arrays.useLegacyMergeSort=true

10 REPLIES

Marwane_K_A_
Star Contributor

Hi, does this happen only with this particular PDF?

jbrowne
Champ in-the-making

After re-uploading the file (renamed to 'eli.pdf'), this is what is in the server.log (located here

Florent_Guillau
World-Class Innovator

If you activate the DEBUG level for org.nuxeo.ecm.core.storage.sql.FulltextExtractorWork in lib/log4j.xml then you'll get more info about the cause of the error.
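As a rough sketch of what such an entry might look like, assuming lib/log4j.xml uses the log4j 1.x XML format with `<category>` elements (compare with the entries already in your file and mirror their form), a block like the following could be added alongside the existing categories. The category name comes from the post above; the rest is an assumption about the file's layout.

```xml
<!-- Assumption: placed next to the other <category> entries in lib/log4j.xml -->
<category name="org.nuxeo.ecm.core.storage.sql.FulltextExtractorWork">
  <priority value="DEBUG" />
</category>
```

If an appender in the same file declares a threshold above DEBUG, these messages may still be filtered out until that threshold is lowered as well.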

jbrowne
Champ in-the-making

There is no entry for 'Fulltext' in that file.

Florent_Guillau
World-Class Innovator

This

jbrowne
Champ in-the-making

2013-08-05 14

Florent_Guillau
World-Class Innovator

From your traces above, you're the victim of a PDFBox bug: PDFBOX-1512.

A workaround is to add this to JAVA_OPTS:

-Djava.util.Arrays.useLegacyMergeSort=true

Excuse my lack of knowledge, but where do I add this?

Inside bin/nuxeo.conf. See the other places where JAVA_OPTS is used in this file.
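As a minimal sketch of that edit, assuming your nuxeo.conf appends options with the `JAVA_OPTS=$JAVA_OPTS ...` pattern (check the existing JAVA_OPTS lines in your own file and follow whatever form they use), the added line might look like this:

```
# bin/nuxeo.conf
# Assumption: added after the existing JAVA_OPTS lines, in the same style
JAVA_OPTS=$JAVA_OPTS -Djava.util.Arrays.useLegacyMergeSort=true
```

The server needs a restart for the JVM to pick up the option, and a PDF uploaded before the change may need to be uploaded again before its text becomes searchable.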