CDATA content skipped during XML metadata extraction

jsc
Champ in-the-making
Hi,

I'm extracting metadata from XML like this:

<root>
    <text><![CDATA[Date de l'événement : 01/07/2008]]></text>
</root>

When I try to extract /root/text/text() in Alfresco, I expect to get <![CDATA[Date de l'événement : 01/07/2008]]>, but I get nothing. The content of the text tag is skipped because of the CDATA section.

How can I get <![CDATA[Date de l'événement : 01/07/2008]]> for the /root/text/text() XPath?

Thanks.

pmonks
Star Contributor
Actually, that XPath expression should return "Date de l'événement : 01/07/2008" (the text inside the CDATA section, without the CDATA markers themselves). As described at http://www.w3.org/TR/2004/REC-xml-infoset-20040204/#omitted, CDATA section boundaries are not part of the XML Infoset, so they are "invisible" to XPath (although the text within them is visible and should be accessible).
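To see this behaviour outside Alfresco, here is a minimal sketch that uses only the JDK's built-in JAXP parser and XPath engine (not Alfresco's extracter, whose internals may differ). It evaluates the same expression against the document from the question; the class and method names are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CdataXPathDemo {
    // Expected text; Unicode escapes keep the source file ASCII-only.
    static final String EXPECTED = "Date de l'\u00e9v\u00e9nement : 01/07/2008";

    // The document from the question: a CDATA section is the only child of <text>.
    static final String SAMPLE =
            "<root>\n    <text><![CDATA[" + EXPECTED + "]]></text>\n</root>";

    // Parses the XML with JAXP defaults and returns the XPath result as a string.
    static String evaluate(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (String) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, doc, XPathConstants.STRING);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Prints only the character data; the CDATA markers are not part of it.
        System.out.println(evaluate(SAMPLE, "/root/text/text()"));
    }
}
```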

That said, it sounds like you're not getting the text inside the CDATA section either, which would be a bug. Are you able to reduce this to a small reproducible test case? If so, it'd be worth raising in JIRA (http://issues.alfresco.com/).

Cheers,
Peter

jsc
Champ in-the-making
You're right about CDATA sections.

I made a mistake: the problem occurs when there is a line break before the CDATA section. The example I provided works, whereas this one does not:

<root>
    <text>
<![CDATA[Date de l'événement : 01/07/2008]]></text>
</root>
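One way to sidestep the leading newline, assuming the XPath mapping used for extraction accepts arbitrary XPath 1.0 expressions (not verified against Alfresco's extracter), is to take the normalized string value of the element instead of its first text node. `normalize-space(/root/text)` concatenates all descendant character data, CDATA included, strips leading and trailing whitespace, and also collapses internal runs of whitespace to single spaces. A minimal JDK-only sketch; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NewlineCdataDemo {
    // Expected text; Unicode escapes keep the source file ASCII-only.
    static final String EXPECTED = "Date de l'\u00e9v\u00e9nement : 01/07/2008";

    // The follow-up document: a newline precedes the CDATA section inside <text>.
    static final String SAMPLE =
            "<root>\n    <text>\n<![CDATA[" + EXPECTED + "]]></text>\n</root>";

    // Parses the XML with JAXP defaults and returns the XPath result as a string.
    static String evaluate(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (String) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, doc, XPathConstants.STRING);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The string value of <text> is "\n" + EXPECTED; normalize-space() drops the newline.
        System.out.println(evaluate(SAMPLE, "normalize-space(/root/text)"));
    }
}
```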

pmonks
Star Contributor
Just to clarify, if there's a newline you don't get any text at all?  Or you get the text but without the leading newline (which is what I'd expect to happen)?

I'm not entirely sure what the Infoset is supposed to look like if there's a leading newline prior to a CDATA block - it would be worth verifying that that's well formed & valid XML (I assume it is, but don't know for sure).
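On the well-formedness question: XML 1.0 allows character data and CDATA sections to be mixed freely in element content, so the sample should parse. The sketch below (plain JDK, hypothetical class name) parses it and lists the children of `<text>`. With the default settings the DOM holds a whitespace-only text node followed by a separate CDATA node, so any code that reads only the first text child would see just the newline, which would be consistent with the symptom reported. With `setCoalescing(true)` the parser merges both into a single text node.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CdataNodeDump {
    static final String EXPECTED = "Date de l'\u00e9v\u00e9nement : 01/07/2008";
    static final String SAMPLE =
            "<root>\n    <text>\n<![CDATA[" + EXPECTED + "]]></text>\n</root>";

    // Parses xml (a well-formedness error would surface as a SAXParseException)
    // and lists the children of the first <text> element as "KIND:value".
    static List<String> describeChildren(String xml, boolean coalesce) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // false (the default): CDATA sections stay separate CDATASection nodes.
            // true: CDATA content becomes text and is merged with adjacent text.
            factory.setCoalescing(coalesce);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node text = doc.getElementsByTagName("text").item(0);
            List<String> result = new ArrayList<>();
            for (Node child = text.getFirstChild(); child != null; child = child.getNextSibling()) {
                String kind = child.getNodeType() == Node.CDATA_SECTION_NODE ? "CDATA"
                        : child.getNodeType() == Node.TEXT_NODE ? "TEXT"
                        : child.getNodeName();
                result.add(kind + ":" + child.getNodeValue());
            }
            return result;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeChildren(SAMPLE, false));
        System.out.println(describeChildren(SAMPLE, true));
    }
}
```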

Cheers,
Peter

jsc
Champ in-the-making
If there's a newline, I get only spaces and a newline character, nothing more.

pmonks
Star Contributor
OK, in that case I think the first step is to confirm that a leading newline is allowed before a CDATA section. If not, the XML is invalid; if so, it sounds like a bug and should be raised in JIRA (http://issues.alfresco.com/).

Cheers,
Peter

jsc
Champ in-the-making
OK, I raised a bug in JIRA.

Is there a workaround to reformat the uploaded XML and remove leading and trailing whitespace from its content? I know there is a content transformer, but I don't want to generate a new file; I just want to work on the uploaded file.
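If rewriting the uploaded content in place is acceptable, one option is to trim the text in memory and write the result back to the same content instead of creating a new file. The sketch below covers only the XML part, with plain JAXP: it trims every text and CDATA node, drops whitespace-only ones, and serializes the document. Writing the result back into the Alfresco node is not shown, and the class and method names are hypothetical. Note that this trims all text content, including whitespace that might be significant in other documents.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TrimXmlText {
    static final String EXPECTED = "Date de l'\u00e9v\u00e9nement : 01/07/2008";
    static final String SAMPLE =
            "<root>\n    <text>\n<![CDATA[" + EXPECTED + "]]></text>\n</root>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Recursively trims text and CDATA nodes; whitespace-only ones are removed.
    static void trimTextNodes(Node node) {
        // Snapshot the children first, because removeChild alters the sibling chain.
        List<Node> children = new ArrayList<>();
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            children.add(c);
        }
        for (Node child : children) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                String trimmed = child.getNodeValue().trim();
                if (trimmed.isEmpty()) {
                    node.removeChild(child);
                } else {
                    child.setNodeValue(trimmed);
                }
            } else if (type == Node.ELEMENT_NODE) {
                trimTextNodes(child);
            }
        }
    }

    // Serializes the document without an XML declaration or added indentation.
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static String normalize(String xml) {
        Document doc = parse(xml);
        trimTextNodes(doc.getDocumentElement());
        return serialize(doc);
    }

    public static void main(String[] args) {
        System.out.println(normalize(SAMPLE));
    }
}
```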