Extracting metadata from XML

tytanix
Champ in-the-making
Good morning!
These days I've extended my Alfresco in various ways with great results. Now I have to look into metadata extraction.
In my system I have to store the CVs (curricula) of a company's employees. These CVs are created with https://europass.cedefop.europa.eu: each employee creates his own CV and then downloads it in PDF and XML format. Is there a way to extract metadata directly from the fields of the CV's XML?
I was thinking of an approach like the one used to access document properties (e.g. document.properties["cm:doc"]).

For example, here is one of the XML files I was talking about:

<blockcode>
<?xml version='1.0' encoding='UTF-8'?>
<SkillsPassport xmlns="http://europass.cedefop.europa.eu/Europass" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://europass.cedefop.europa.eu/Europass http://europass.cedefop.europa.eu/xml/EuropassSchema_V3.0.1.xsd" locale="it">
   <DocumentInfo>
      <DocumentType>ECV</DocumentType>
      <CreationDate>2013-11-27T08:33:27.750Z</CreationDate>
      <LastUpdateDate>2013-11-27T08:42:29.610Z</LastUpdateDate>
      <XSDVersion>V3.0</XSDVersion>
      <Generator>EWA</Generator>
      <Comment>Europass CV</Comment>
   </DocumentInfo>
   <PrintingPreferences>
      
      <Document type="ECV">
         <Field name="LearnerInfo" show="true" order="Identification Headline WorkExperience Education Skills Achievement ReferenceTo"/>
         <Field name="LearnerInfo.Identification" show="true"/>
         <Field name="LearnerInfo.Identification.PersonName" show="true" order="Surname FirstName"/>
         <Field name="LearnerInfo.Identification.ContactInfo.Address" show="true" format="s, z m (c)"/>
         <Field name="LearnerInfo.Identification.ContactInfo.Email" show="true"/>
         <Field name="LearnerInfo.Identification.ContactInfo.Telephone" show="true"/>
         <Field name="LearnerInfo.Identification.ContactInfo.Telephone[0]" show="true"/>
         <Field name="LearnerInfo.Identification.ContactInfo.Website" show="false"/>
         <Field name="LearnerInfo.Identification.ContactInfo.Website[0]" show="false"/>
         <Field name="LearnerInfo.Identification.ContactInfo.InstantMessaging" show="false"/>
         <Field name="LearnerInfo.Identification.ContactInfo.InstantMessaging[0]" show="false"/>
         <Field name="LearnerInfo.Identification.Demographics.Birthdate" show="true" format="numeric/long"/>
         <Field name="LearnerInfo.Identification.Demographics.Gender" show="true"/>
         <Field name="LearnerInfo.Identification.Demographics.Nationality" show="true"/>
         <Field name="LearnerInfo.Identification.Demographics.Nationality[0]" show="true"/>
         <Field name="LearnerInfo.Identification.Photo" show="false"/>
         <Field name="LearnerInfo.Headline" show="true"/>
         <Field name="LearnerInfo.WorkExperience" show="true"/>
         <Field name="LearnerInfo.WorkExperience[0]" show="true"/>
         <Field name="LearnerInfo.WorkExperience[0].Period" show="true" format="text/long"/>
         <Field name="LearnerInfo.WorkExperience[0].ReferenceTo" show="false"/>
         <Field name="LearnerInfo.WorkExperience[0].ReferenceTo[0]" show="false"/>
         <Field name="LearnerInfo.WorkExperience[0].Position" show="true"/>
         <Field name="LearnerInfo.WorkExperience[0].Activities" show="true"/>
         <Field name="LearnerInfo.WorkExperience[0].Employer" show="false"/>
         <Field name="LearnerInfo.WorkExperience[0].Employer.ContactInfo.Address" show="false" format="s, z m (c)"/>
         <Field name="LearnerInfo.WorkExperience[0].Employer.ContactInfo.Website" show="false"/>
         <Field name="LearnerInfo.WorkExperience[0].Employer.Sector" show="false"/>
         <Field name="LearnerInfo.Education" show="true"/>
         <Field name="LearnerInfo.Education[0]" show="true"/>
         <Field name="LearnerInfo.Education[0].Period" show="true" format="text/long"/>
         <Field name="LearnerInfo.Education[0].ReferenceTo" show="false"/>
         <Field name="LearnerInfo.Education[0].ReferenceTo[0]" show="false"/>
         <Field name="LearnerInfo.Education[0].Title" show="true"/>
         <Field name="LearnerInfo.Education[0].Activities" show="true"/>
         <Field name="LearnerInfo.Education[0].Organisation" show="false"/>
         <Field name="LearnerInfo.Education[0].Organisation.ContactInfo.Address" show="false" format="s, z m (c)"/>
         <Field name="LearnerInfo.Education[0].Organisation.ContactInfo.Website" show="false"/>
         <Field name="LearnerInfo.Education[0].Level" show="false"/>
         <Field name="LearnerInfo.Education[0].Field" show="false"/>
         <Field name="LearnerInfo.Skills" show="true"/>
         <Field name="LearnerInfo.Skills.Linguistic.MotherTongue" show="true"/>
         <Field name="LearnerInfo.Skills.Linguistic.MotherTongue[0]" show="true"/>
         <Field name="LearnerInfo.Skills.Linguistic.ForeignLanguage" show="true"/>
         <Field name="LearnerInfo.Skills.Linguistic.ForeignLanguage[0]" show="true"/>
         <Field name="LearnerInfo.Skills.Linguistic.ForeignLanguage[0].ReferenceTo" show="false"/>
         <Field name="LearnerInfo.Skills.Linguistic.ForeignLanguage[0].ReferenceTo[0]" show="false"/>
         <Field name="LearnerInfo.Skills.Linguistic.ForeignLanguage[0].Certificate" show="false"/>
         <Field name="LearnerInfo.Skills.Linguistic.ForeignLanguage[0].Certificate[0]" show="false"/>
         <Field name="LearnerInfo.Skills.Communication" show="true"/>
         <Field name="LearnerInfo.Skills.Communication.ReferenceTo" show="false"/>
         <Field name="LearnerInfo.Skills.Communication.ReferenceTo[0]" show="false"/>
         <Field name="LearnerInfo.Skills.Organisational" show="true"/>
         <Field name="LearnerInfo.Skills.Organisational.ReferenceTo" show="false"/>
         <Field name="LearnerInfo.Skills.Organisational.ReferenceTo[0]" show="false"/>
         <Field name="LearnerInfo.Skills.JobRelated" show="true"/>
         <Field name="LearnerInfo.Skills.JobRelated.ReferenceTo" show="false"/>
         <Field name="LearnerInfo.Skills.JobRelated.ReferenceTo[0]" show="false"/>
         <Field name="LearnerInfo.Skills.Computer" show="true"/>
         <Field name="LearnerInfo.Skills.Computer.ReferenceTo" show="false"/>
         <Field name="LearnerInfo.Skills.Computer.ReferenceTo[0]" show="false"/>
         <Field name="LearnerInfo.Skills.Driving" show="false"/>
         <Field name="LearnerInfo.Skills.Driving.ReferenceTo" show="false"/>
         <Field name="LearnerInfo.Skills.Driving.ReferenceTo[0]" show="false"/>
         <Field name="LearnerInfo.Skills.Other" show="false"/>
         <Field name="LearnerInfo.Skills.Other.ReferenceTo" show="false"/>
         <Field name="LearnerInfo.Skills.Other.ReferenceTo[0]" show="false"/>
         <Field name="LearnerInfo.Achievement" show="false"/>
         <Field name="LearnerInfo.Achievement[0]" show="false"/>
         <Field name="LearnerInfo.Achievement[0].ReferenceTo" show="false"/>
         <Field name="LearnerInfo.Achievement[0].ReferenceTo[0]" show="false"/>
         <Field name="LearnerInfo.ReferenceTo" show="false"/>
         <Field name="LearnerInfo.ReferenceTo[0]" show="false"/>
      </Document>
      
   </PrintingPreferences>
   
   <LearnerInfo>
      
      <Identification>
         
         <PersonName>
            <FirstName></FirstName>
            <Surname></Surname>
         </PersonName>
         
         <ContactInfo>
            <Address>
               <Contact>
                  <AddressLine></AddressLine>
                  <PostalCode></PostalCode>
                  <Municipality></Municipality>
                  <Country>
                     <Code></Code>
                     <Label></Label>
                  </Country>
               </Contact>
            </Address>
            
            <Email>
               <Contact></Contact>
            </Email>
            
            <TelephoneList>
               <Telephone>
                  <Contact></Contact>
                  <Use>
                     <Code></Code>
                     <Label></Label>
                  </Use>
               </Telephone>
            </TelephoneList>
         
         </ContactInfo>
         
         <Demographics>
            
            <Birthdate year="" month="" day=""/>
            
            <Gender>
               <Code></Code>
               <Label></Label>
            </Gender>
            
            <NationalityList>
               <Nationality>
                  <Code></Code>
                  <Label></Label>
               </Nationality>
            </NationalityList>
         
         </Demographics>
      
      </Identification>
      
      <Skills>
         
         <Linguistic>
            
            <ForeignLanguageList>
               
               <ForeignLanguage>
                  
                  <Description>
                     <Code></Code>
                     <Label></Label>
                  </Description>
                  
                  <ProficiencyLevel>
                     <Listening></Listening>
                     <Reading></Reading>
                     <SpokenInteraction></SpokenInteraction>
                     <SpokenProduction></SpokenProduction>
                     <Writing></Writing>
                  </ProficiencyLevel>
               
               </ForeignLanguage>
            
            </ForeignLanguageList>
         
         </Linguistic>
      
      </Skills>
   
   </LearnerInfo>

</SkillsPassport>
</blockcode>

So I would like to extract the data from the XML fields and, if possible, use this XML as a template in "Create document", edit the XML fields in a form, save the values, and then convert the populated XML to a PDF.
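For the "edit the fields and save the values" part, one possible approach is to treat the downloaded XML as a template, write the form values into it, and store the result. Below is a minimal sketch using only the JDK's DOM and Transformer APIs; the class name CvTemplateFiller and the choice of filling elements by local name are assumptions for illustration, and neither the Alfresco form nor the PDF conversion is covered.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: fills a Europass XML template with values from a form.
public class CvTemplateFiller {
    static final String NS = "http://europass.cedefop.europa.eu/Europass";

    /** Writes each value into the first Europass element with that local name; returns the new XML. */
    public static String fill(String templateXml, Map<String, String> values) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // template elements are in the Europass default namespace
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(templateXml)));

        for (Map.Entry<String, String> field : values.entrySet()) {
            // Only the first match is filled: enough for FirstName/Surname,
            // not for repeated names such as Code or Label.
            Node target = doc.getElementsByTagNameNS(NS, field.getKey()).item(0);
            if (target != null) {
                target.setTextContent(field.getValue());
            }
        }

        // Serialize the edited DOM back to a string (no PDF conversion here).
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String template = "<SkillsPassport xmlns=\"" + NS + "\"><LearnerInfo><Identification>"
            + "<PersonName><FirstName></FirstName><Surname></Surname></PersonName>"
            + "</Identification></LearnerInfo></SkillsPassport>";
        System.out.println(fill(template, Map.of("FirstName", "Mario", "Surname", "Rossi")));
    }
}
```

Filling by local name keeps the sketch short; for the repeated blocks of a real CV (work experience, languages) you would address elements by a full path instead.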
8 replies

mitpatoliya
Star Collaborator
The best way to achieve this is to create a custom metadata extractor that is called whenever one of your CVs is uploaded.
Your extractor parses the XML content with the XML parser provided by Java and extracts the metadata; those values can then be mapped to your properties.
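A minimal sketch of the parsing part of such an extractor, using the JDK's DOM parser and XPath. The property names (cv:firstName and so on) are hypothetical placeholders for whatever your custom content model defines, and wiring this into an actual Alfresco extractor class is not shown; check the metadata extractor base classes of your Alfresco version for the method to override. The XPaths need a prefix bound to the Europass namespace because the CV declares it as the default namespace.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class EuropassCvExtractor {
    static final String NS = "http://europass.cedefop.europa.eu/Europass";
    static final String ID = "/e:SkillsPassport/e:LearnerInfo/e:Identification";

    // Hypothetical property names (not from the thread) mapped to XPaths into the CV.
    static final Map<String, String> MAPPING = new LinkedHashMap<>();
    static {
        MAPPING.put("cv:firstName", ID + "/e:PersonName/e:FirstName");
        MAPPING.put("cv:surname", ID + "/e:PersonName/e:Surname");
        MAPPING.put("cv:email", ID + "/e:ContactInfo/e:Email/e:Contact");
        MAPPING.put("cv:birthYear", ID + "/e:Demographics/e:Birthdate/@year");
    }

    /** Parses a Europass CV and returns property name -> non-empty field value. */
    public static Map<String, String> extract(InputStream in) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // the CV uses a default namespace
        // Refuse DOCTYPEs so uploaded files cannot pull in external entities.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = dbf.newDocumentBuilder().parse(in);

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the prefix "e" to the Europass namespace so the XPaths above resolve.
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "e".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return NS.equals(uri) ? "e" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return NS.equals(uri) ? Collections.singletonList("e").iterator()
                                      : Collections.<String>emptyIterator();
            }
        });

        Map<String, String> props = new LinkedHashMap<>();
        for (Map.Entry<String, String> m : MAPPING.entrySet()) {
            String value = xpath.evaluate(m.getValue(), doc).trim();
            if (!value.isEmpty()) { // skip fields the employee left blank
                props.put(m.getKey(), value);
            }
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        String cv = "<SkillsPassport xmlns=\"" + NS + "\"><LearnerInfo><Identification>"
            + "<PersonName><FirstName>Mario</FirstName><Surname></Surname></PersonName>"
            + "<ContactInfo><Email><Contact>mario.rossi@example.com</Contact></Email></ContactInfo>"
            + "<Demographics><Birthdate year=\"1980\"/></Demographics>"
            + "</Identification></LearnerInfo></SkillsPassport>";
        InputStream in = new ByteArrayInputStream(cv.getBytes(StandardCharsets.UTF_8));
        System.out.println(extract(in));
    }
}
```

Keeping the field-to-XPath table in one map means adding a new CV field is a one-line change; empty values are dropped so blank template fields do not overwrite existing properties.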

tytanix
Champ in-the-making
Thank you! It's good to know that I can do what I want. I'll try to find out how to do this. If you have any hints to give me a starting point, or know some links that explain how to achieve this, that would be very helpful.

Just to make sure I understand: I will upload an XML CV as a normal document (with my custom type), the content of the document will be passed to the extractor, and the extractor will return the values specified in each tag of the parsed XML. Is that right?
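The flow described above can be sketched with the JDK alone: parse the CV, walk it, and collect the text of every leaf element keyed by its path. This is the raw material an extractor would then map onto properties. The class name CvLeafWalker and the dotted-path keys are illustrative assumptions, not an Alfresco API.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CvLeafWalker {
    /** Maps "Parent.Child.Leaf" paths to the text of every element with no child elements. */
    public static Map<String, String> leaves(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // so getLocalName() returns names without prefixes
        Element root = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        Map<String, String> out = new LinkedHashMap<>();
        walk(root, root.getLocalName(), out);
        return out;
    }

    private static void walk(Element el, String path, Map<String, String> out) {
        boolean hasChildElements = false;
        NodeList children = el.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElements = true;
                walk((Element) child, path + "." + child.getLocalName(), out);
            }
        }
        if (!hasChildElements) {
            // Repeated paths (e.g. several Telephone entries) keep only the last value here.
            out.put(path, el.getTextContent().trim());
        }
    }

    public static void main(String[] args) throws Exception {
        String cv = "<SkillsPassport xmlns=\"http://europass.cedefop.europa.eu/Europass\">"
            + "<LearnerInfo><Identification><PersonName>"
            + "<FirstName>Mario</FirstName><Surname>Rossi</Surname>"
            + "</PersonName></Identification></LearnerInfo></SkillsPassport>";
        leaves(cv).forEach((path, value) -> System.out.println(path + " = " + value));
    }
}
```

Run against the full Europass file, this would also pick up the empty Field elements under PrintingPreferences, so a real extractor would probably start the walk at LearnerInfo.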

mitpatoliya
Star Collaborator
Yes. You can change the extension of all your XML CV files to something like ".cvfile", register a mimetype for that extension, and then bind your custom metadata extractor to it.
That's the way to go.
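As an extra safeguard (an assumption on my part, not part of the suggestion above), the extractor can check that a file really is a Europass CV before mapping anything, by reading only up to the root element. A sketch with the JDK's StAX API; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class EuropassDetector {
    static final String NS = "http://europass.cedefop.europa.eu/Europass";

    /** True if the first element of the stream is SkillsPassport in the Europass namespace. */
    public static boolean isEuropassCv(InputStream in) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // don't process DTDs from uploads
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                // Parsing stops at the root element; nothing after it is examined.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return "SkillsPassport".equals(reader.getLocalName())
                        && NS.equals(reader.getNamespaceURI());
                }
            }
            return false;
        } finally {
            reader.close(); // does not close the underlying InputStream
        }
    }

    public static void main(String[] args) throws Exception {
        String cv = "<?xml version='1.0' encoding='UTF-8'?><SkillsPassport xmlns=\"" + NS + "\" locale=\"it\"/>";
        System.out.println(isEuropassCv(new ByteArrayInputStream(cv.getBytes(StandardCharsets.UTF_8))));
    }
}
```

Checking both the local name and the namespace means an unrelated XML file that happens to use a SkillsPassport element is not treated as a CV.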

tytanix
Champ in-the-making
Thank you very much for your help! One last question: I've found that I have to implement a Java class for my extractor, but I can't find where to deploy it. In which path should I put my Java extractor?

mitpatoliya
Star Collaborator
You can package your compiled Java classes into a JAR and put it in your Alfresco installation under:

<TOMCAT_HOME>/webapps/alfresco/WEB-INF/lib

tytanix
Champ in-the-making
This is my script to extract metadata from the XML above:

<blockcode>
function main()
{
   var fileContent = document.content;
   var xml = new XML(fileContent);
   logger.log("#### XML CONTENT  #####  " + xml);

   var firstName = xml.SkillsPassport.LearnerInfo.Identification.PersonName.FirstName;
   logger.log("#### FIRST NAME ####  " + firstName);
}

main();
</blockcode>

Unfortunately, firstName is empty, even though the variable xml contains the content of my XML CV. Is my approach wrong?

kaynezhang
World-Class Innovator
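In ECMAScript for XML (E4X), if you write

 var xml = new XML(fileContent);

the object xml stands for the SkillsPassport element itself, so to get the person's first name you navigate from its children (xml.LearnerInfo..., not xml.SkillsPassport.LearnerInfo...). The CV also declares a default namespace (xmlns="http://europass.cedefop.europa.eu/Europass"), and unqualified names such as LearnerInfo only match elements in that namespace if you set it as the default XML namespace first. Something like this:

<blockcode>
function main()
{
   // Every element of the Europass CV lives in this default namespace
   default xml namespace = "http://europass.cedefop.europa.eu/Europass";

   var fileContent = document.content;
   var xml = new XML(fileContent);   // xml is the SkillsPassport root element
   logger.log("#### XML CONTENT  #####  " + xml);
   var firstName = xml.LearnerInfo.Identification.PersonName.FirstName;
   logger.log("#### FIRST NAME ####  " + firstName);
}

main();
</blockcode>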

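Also make sure logging is enabled: logger.log only writes output when debug logging is turned on for org.alfresco.repo.jscript.ScriptLogger in log4j.properties. If it isn't, you can write straight to the console like this:

<blockcode>
function main()
{
   default xml namespace = "http://europass.cedefop.europa.eu/Europass";

   var fileContent = document.content;
   var xml = new XML(fileContent);
   logger.getSystem().out("#### XML CONTENT  #####  " + xml);
   var firstName = xml.LearnerInfo.Identification.PersonName.FirstName;
   logger.getSystem().out("#### FIRST NAME ####  " + firstName);
}

main();
</blockcode>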
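The same two points (the parsed root is SkillsPassport itself, and names must be matched in the Europass namespace) can be reproduced with Java's namespace-aware DOM, which is what a Java extractor would see. Because the CV declares xmlns="http://europass.cedefop.europa.eu/Europass", a lookup for FirstName in no namespace finds nothing. A small JDK-only sketch; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceLookup {
    static final String NS = "http://europass.cedefop.europa.eu/Europass";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Counts FirstName elements in the given namespace (null means "no namespace"). */
    static int countFirstNames(Document doc, String namespaceUri) {
        return doc.getElementsByTagNameNS(namespaceUri, "FirstName").getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<SkillsPassport xmlns=\"" + NS + "\"><LearnerInfo><Identification>"
            + "<PersonName><FirstName>Mario</FirstName></PersonName>"
            + "</Identification></LearnerInfo></SkillsPassport>");
        // The document element is SkillsPassport, so paths start below it.
        System.out.println(doc.getDocumentElement().getLocalName());
        // No namespace: nothing matches, like an unqualified E4X name without a default namespace.
        System.out.println(countFirstNames(doc, null));
        // Europass namespace: the element is found.
        System.out.println(countFirstNames(doc, NS));
    }
}
```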

vicmutish
Champ in-the-making
I am very new to Alfresco. I am scanning many documents with Kodak Capture Pro and I'm stuck on how to read and index the metadata of each image. Attached is a sample XML file. I have created custom aspects, but I don't know how to index this information.
Kindly give me suggestions on where to start.