
XML metadata extractor doesn't work with DOCTYPE

kilo
Champ in-the-making
Hello,

I noticed that my XML metadata extractor doesn't work when the content (XML, of course) has a <!DOCTYPE > declaration. I'm using the built-in XML extractor, just like the sample. I get the following log message:


No working metadata extractor could be found:
   Document: ContentAccessor[ contentUrl=store://2010/1/29/17/54/eed0a6a1-49b3-4863-b3aa-4edfa0fef55d.bin, mimetype=text/xml, size=4371, encoding=UTF-8, locale=en_US]
17:54:10,870 INFO  [STDOUT] 17:54:10,870 User:admin DEBUG [metadata.xml.XPathMetadataExtracter]
XML metadata extractor redirected:
   Reader:    ContentAccessor[ contentUrl=store://2010/1/29/17/54/eed0a6a1-49b3-4863-b3aa-4edfa0fef55d.bin, mimetype=text/xml, size=4371, encoding=UTF-8, locale=en_US]
   Extracter: null

This seems strange, since everything works as expected without <!DOCTYPE > in the XML document. Is this because the built-in extractor is trying to validate the XML document?

Is there any option to disable <!DOCTYPE > interpretation in the built-in extractor?
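
To make the question concrete, here is a minimal sketch of what "disabling DOCTYPE interpretation" could look like in plain JAXP. It is not Alfresco's extractor code, and it assumes the failure comes from the parser trying to load the DTD named in the DOCTYPE. The class name `DoctypeTolerantParser`, the sample document and the DTD URL are hypothetical, chosen only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Alfresco.
public class DoctypeTolerantParser {

    /** Parses XML text without validating it and without fetching any external DTD. */
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // no DTD validation

        DocumentBuilder builder = factory.newDocumentBuilder();
        // Resolve every external entity, including the DTD named in the DOCTYPE,
        // to an empty stream so the parser never goes to the network or disk for it.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));

        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up document whose DTD cannot be reached.
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE topic SYSTEM \"http://example.invalid/missing.dtd\">\n"
                + "<topic><title>Hello</title></topic>";
        Document doc = parse(xml);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

If a document parses this way but fails with a default `DocumentBuilder`, DTD loading is the likely culprit. Note that a document relying on entities declared in its DTD would still fail with this resolver, because the DTD content is replaced by nothing.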

Thank you. I will appreciate your suggestions.
2 REPLIES

_valerio_
Champ in-the-making
Hi Kilo, I'm trying to extract metadata (through wcm-xml-metadata-extracter-context.xml) from an XML file that looks like this:
<documento>
  <destinatario>Pippo</destinatario>
  <tipo_documento>FATTURA</tipo_documento>
  <codice_articolo>Art1234</codice_articolo>
  <numero_fattura>31</numero_fattura>
</documento>
but my extractor doesn't work!
Could you please post the code of your extractor?
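
For reference, the XPath expressions for the document above can be checked outside Alfresco with the standard `javax.xml.xpath` API. The sketch below is not the Alfresco extracter configuration; the class name `DocumentoXPathCheck` is hypothetical, and it only confirms that paths such as `/documento/destinatario` select the expected values.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical stand-alone check, not part of Alfresco.
public class DocumentoXPathCheck {

    /** Evaluates one XPath per field of the sample document and returns the text values. */
    public static Map<String, String> extract(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String[] fields = {"destinatario", "tipo_documento", "codice_articolo", "numero_fattura"};

        Map<String, String> values = new LinkedHashMap<>();
        for (String field : fields) {
            // A fresh InputSource per evaluation, because the reader is consumed each time.
            InputSource source = new InputSource(new StringReader(xml));
            values.put(field, xpath.evaluate("/documento/" + field, source));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<documento>"
                + "<destinatario>Pippo</destinatario>"
                + "<tipo_documento>FATTURA</tipo_documento>"
                + "<codice_articolo>Art1234</codice_articolo>"
                + "<numero_fattura>31</numero_fattura>"
                + "</documento>";
        System.out.println(extract(xml));
    }
}
```

If the values come out correctly here, the XPath expressions themselves are fine and the problem is more likely in how the extracter is configured or selected.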

syspro
Champ in-the-making
Hi,

I'm having a similar issue extracting metadata from DITA XML topic and map files. The information is extracted fine when the DOCTYPE declaration is not present. Have you had any luck with this?