<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: CDATA on xml extraction are skipped in Alfresco Archive</title>
    <link>https://connect.hyland.com/t5/alfresco-archive/cdata-on-xml-extraction-are-skipped/m-p/172662#M125864</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Actually, that XPath expression should return "Date de l'événement : 01/07/2008" (the text inside the CDATA section, but without the CDATA markers themselves).&amp;nbsp; As described at &lt;/SPAN&gt;&lt;A href="http://www.w3.org/TR/2004/REC-xml-infoset-20040204/#omitted" rel="nofollow noopener noreferrer"&gt;http://www.w3.org/TR/2004/REC-xml-infoset-20040204/#omitted&lt;/A&gt;&lt;SPAN&gt;, CDATA sections are not part of the XML infoset so are "invisible" to XPath (although the text within them is visible and should be accessible).&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;That said, it sounds like you're not getting the text inside the CDATA section either, which sounds like a bug.&amp;nbsp; Are you able to reduce this to a small reproducible test case?&amp;nbsp; If so it'd be worth raising in JIRA (&lt;/SPAN&gt;&lt;A href="http://issues.alfresco.com/" rel="nofollow noopener noreferrer"&gt;http://issues.alfresco.com/&lt;/A&gt;&lt;SPAN&gt;).&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Cheers,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Peter&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 20 Aug 2008 05:55:55 GMT</pubDate>
    <dc:creator>pmonks</dc:creator>
    <dc:date>2008-08-20T05:55:55Z</dc:date>
    <item>
      <title>CDATA on xml extraction are skipped</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/cdata-on-xml-extraction-are-skipped/m-p/172661#M125863</link>
      <description>Hi, I extract metadata from XML like this: &amp;lt;root&amp;gt; &amp;lt;text&amp;gt;&amp;lt;![CDATA[Date de l'événement : 01/07/2008]]&amp;gt;&amp;lt;/text&amp;gt;&amp;lt;/root&amp;gt; When I try to extract /root/text/text() in Alfresco, I expect to get &amp;lt;![CDATA[Date de l'événement : 01/07/2008]]&amp;gt;, but I get nothing. The co</description>
      <pubDate>Tue, 19 Aug 2008 14:45:37 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/cdata-on-xml-extraction-are-skipped/m-p/172661#M125863</guid>
      <dc:creator>jsc</dc:creator>
      <dc:date>2008-08-19T14:45:37Z</dc:date>
    </item>
    <item>
      <title>Re: CDATA on xml extraction are skipped</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/cdata-on-xml-extraction-are-skipped/m-p/172662#M125864</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Actually, that XPath expression should return "Date de l'événement : 01/07/2008" (the text inside the CDATA section, but without the CDATA markers themselves).&amp;nbsp; As described at &lt;/SPAN&gt;&lt;A href="http://www.w3.org/TR/2004/REC-xml-infoset-20040204/#omitted" rel="nofollow noopener noreferrer"&gt;http://www.w3.org/TR/2004/REC-xml-infoset-20040204/#omitted&lt;/A&gt;&lt;SPAN&gt;, CDATA sections are not part of the XML infoset so are "invisible" to XPath (although the text within them is visible and should be accessible).&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;That said, it sounds like you're not getting the text inside the CDATA section either, which sounds like a bug.&amp;nbsp; Are you able to reduce this to a small reproducible test case?&amp;nbsp; If so it'd be worth raising in JIRA (&lt;/SPAN&gt;&lt;A href="http://issues.alfresco.com/" rel="nofollow noopener noreferrer"&gt;http://issues.alfresco.com/&lt;/A&gt;&lt;SPAN&gt;).&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Cheers,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Peter&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 20 Aug 2008 05:55:55 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/cdata-on-xml-extraction-are-skipped/m-p/172662#M125864</guid>
      <dc:creator>pmonks</dc:creator>
      <dc:date>2008-08-20T05:55:55Z</dc:date>
    </item>
    <item>
      <title>Re: CDATA on xml extraction are skipped</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/cdata-on-xml-extraction-are-skipped/m-p/172663#M125865</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;You're right about CDATA sections.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I made a mistake: the problem occurs when there is a line break before the CDATA section. The example I provided works, whereas this one does not:&lt;/SPAN&gt;&lt;BR /&gt;&lt;PRE class="language-none line-numbers"&gt;&lt;CODE&gt;&lt;BR /&gt;&amp;lt;root&amp;gt;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;text&amp;gt;&lt;BR /&gt;&amp;lt;![CDATA[Date de l'événement : 01/07/2008]]&amp;gt;&amp;lt;/text&amp;gt;&lt;BR /&gt;&amp;lt;/root&amp;gt;&lt;BR /&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 20 Aug 2008 07:53:54 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/cdata-on-xml-extraction-are-skipped/m-p/172663#M125865</guid>
      <dc:creator>jsc</dc:creator>
      <dc:date>2008-08-20T07:53:54Z</dc:date>
    </item>
    <item>
      <title>Re: CDATA on xml extraction are skipped</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/cdata-on-xml-extraction-are-skipped/m-p/172664#M125866</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Just to clarify, if there's a newline you don't get any text at all?&amp;nbsp; Or you get the text but without the leading newline (which is what I'd expect to happen)?&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I'm not entirely sure what the Infoset is supposed to look like if there's a leading newline prior to a CDATA block - it would be worth verifying that that's well formed &amp;amp; valid XML (I assume it is, but don't know for sure).&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Cheers,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Peter&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 20 Aug 2008 16:43:47 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/cdata-on-xml-extraction-are-skipped/m-p/172664#M125866</guid>
      <dc:creator>pmonks</dc:creator>
      <dc:date>2008-08-20T16:43:47Z</dc:date>
    </item>
    <item>
      <title>Re: CDATA on xml extraction are skipped</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/cdata-on-xml-extraction-are-skipped/m-p/172665#M125867</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;If there is a newline, I get only the spaces and the newline character, nothing more.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 21 Aug 2008 10:26:36 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/cdata-on-xml-extraction-are-skipped/m-p/172665#M125867</guid>
      <dc:creator>jsc</dc:creator>
      <dc:date>2008-08-21T10:26:36Z</dc:date>
    </item>
    <item>
      <title>Re: CDATA on xml extraction are skipped</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/cdata-on-xml-extraction-are-skipped/m-p/172666#M125868</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Ok in that case I think the first step is to confirm that a leading newline is allowed prior to a CDATA section.&amp;nbsp; If not the XML is invalid; if so then it sounds like a bug and should be raised in JIRA (&lt;/SPAN&gt;&lt;A href="http://issues.alfresco.com/" rel="nofollow noopener noreferrer"&gt;http://issues.alfresco.com/&lt;/A&gt;&lt;SPAN&gt;).&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Cheers,&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Peter&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 21 Aug 2008 16:23:22 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/cdata-on-xml-extraction-are-skipped/m-p/172666#M125868</guid>
      <dc:creator>pmonks</dc:creator>
      <dc:date>2008-08-21T16:23:22Z</dc:date>
    </item>
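    <!--
    Editorial sketch on the question raised above (whether a newline before a
    CDATA section is allowed, and what a parser then reports). Whitespace before
    a CDATA section is ordinary character data, so the document is well formed.
    The Python standard-library snippet below shows two views of the same
    element: ElementTree merges the newline and the CDATA text into one string,
    while the DOM keeps a whitespace Text node followed by a separate
    CDATASection node. Code that reads only the first child node would then see
    just the whitespace. That this is what happens inside Alfresco is an
    assumption; the thread does not confirm it. The names XML, merged and
    children are illustrative only.

```python
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET
from xml.dom import Node

# The failing sample from the thread: a newline precedes the CDATA section.
XML = "<root>\n    <text>\n<![CDATA[Date de l'événement : 01/07/2008]]></text>\n</root>"

# View 1: ElementTree exposes all character data of <text> as a single string.
root = ET.fromstring(XML)
merged = root.find("text").text

# View 2: the DOM keeps the whitespace and the CDATA section as separate nodes.
doc = minidom.parseString(XML)
text_el = doc.getElementsByTagName("text")[0]
children = [(n.nodeType, n.data) for n in text_el.childNodes]

# First child is whitespace only; the payload lives in the second child.
first_is_blank = children[0][0] == Node.TEXT_NODE and children[0][1].strip() == ""
payload = children[1][1]

print(repr(merged))
print(children)
```

    If an extractor behaves like the second view, concatenating every child
    text node (or requesting the string value of the element) would be one way
    to recover the full text.
    -->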
    <item>
      <title>Re: CDATA on xml extraction are skipped</title>
      <link>https://connect.hyland.com/t5/alfresco-archive/cdata-on-xml-extraction-are-skipped/m-p/172667#M125869</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;OK, I raised a bug in JIRA.&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Is there a workaround that reformats uploaded XML to remove leading and trailing whitespace in the content? I know there is a content transformer, but I do not want to generate a new file; I just want to work on the uploaded file.&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 22 Aug 2008 13:30:13 GMT</pubDate>
      <guid>https://connect.hyland.com/t5/alfresco-archive/cdata-on-xml-extraction-are-skipped/m-p/172667#M125869</guid>
      <dc:creator>jsc</dc:creator>
      <dc:date>2008-08-22T13:30:13Z</dc:date>
    </item>
  </channel>
</rss>

