
Regular Expression to extract alphanumeric (with spaces) data

Mike_Rogers
Champ in-the-making

What would be the RegEx to extract the data that's in column 4 from the screen shot below?  I'm using Grouped Line Item Extraction.

The screenshot isn't the best, but hopefully you can see that columns 3 & 4 (and others) are being OCR'd. In reality, the data is a 5-digit color number followed by a slash, followed by a color name of varying length (which could include spaces).

I seem to be getting the color number field OK.  (I do not have an Expression configured for the column.)

For the color name field, without a Regular Expression applied in the configuration, the columns are so tight (i.e., there are no spaces between them) that OCR extracts the slash as part of the color name.

Applying a Regular Expression of \w{1,20} captures only the first word from the column. That's fine if the color name is a single word, like 'Tupelo', but it doesn't capture names like 'Vital Gray' and 'Smart Blue'.
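A quick way to see why a word-character pattern such as \w{1,20} stops early, sketched with Python's `re` module as a stand-in for OnBase's regex engine (details may differ there). The column values and the helper name `word_only_capture` are hypothetical, for illustration only:

```python
import re

# \w matches letters, digits and underscore only, so a run of \w ends at the
# first space (and it can never include the slash).
WORD_ONLY = re.compile(r"\w{1,20}")


def word_only_capture(name_column: str) -> str:
    """Return what \\w{1,20} captures from a (hypothetical) color-name column."""
    match = WORD_ONLY.search(name_column)
    return match.group() if match else ""


# Hypothetical OCR'd column values, with the slash still attached.
for value in ["/Tupelo", "/Vital Gray", "/Smart Blue"]:
    print(word_only_capture(value))
```

The slash is skipped because it isn't a word character, but the space ends the match as well, which is why only 'Tupelo' comes through whole.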

Any suggestions on how to capture names with a space in them and omit the leading slash?  I'm assuming there's no hope for the color name of 'Bronze Seal' since it word wraps to a second line?

Thanks in advance...

Accepted Answer

Gilberto_Cortes
Star Contributor

Hello Mike,

Try the following...

As a tag in Grouped Line Item Extraction, this expression grabs the number:

(\d{5})\/[\sA-Z]+
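As a rough illustration of how the capture group isolates the number, here is the same expression run through Python's `re` module. This assumes the tag keeps only the first capture group, which is an assumption about OnBase's behavior rather than something stated in the thread; the helper `color_number` and the sample strings are hypothetical. The samples are uppercase because [\sA-Z] matches only capital letters unless case-insensitive matching is on.

```python
import re
from typing import Optional

# The "number" expression from the answer. Group 1 captures the 5-digit color
# number; the slash and at least one name character must follow, but they are
# not part of the captured value.
NUMBER_TAG = re.compile(r"(\d{5})\/[\sA-Z]+")


def color_number(line: str) -> Optional[str]:
    """Return the captured color number, or None if the line doesn't match."""
    match = NUMBER_TAG.search(line)
    return match.group(1) if match else None


# Hypothetical OCR output for one line item.
print(color_number("23456/VITAL GRAY"))  # 23456
```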

As a tag in Grouped Line Item Extraction, this expression grabs the color name:

\d{5}\/([\sA-Z]+)
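A sketch of the name expression in Python's `re`, again as a stand-in for OnBase's engine. Two points are assumptions rather than part of the answer: the trailing-whitespace strip, and the case-insensitive variant for OCR text in mixed case (as written, [\sA-Z] matches only capitals). The helper `color_name` and the sample strings are hypothetical.

```python
import re
from typing import Optional

# The "name" expression from the answer: group 1 captures the run of
# whitespace and uppercase letters that follows "NNNNN/".
NAME_TAG = re.compile(r"\d{5}\/([\sA-Z]+)")
# Assumption: a case-insensitive variant for mixed-case OCR text.
NAME_TAG_ANY_CASE = re.compile(r"\d{5}\/([\sA-Z]+)", re.IGNORECASE)


def color_name(line: str, ignore_case: bool = False) -> Optional[str]:
    """Return the captured color name (outer whitespace removed), or None."""
    pattern = NAME_TAG_ANY_CASE if ignore_case else NAME_TAG
    match = pattern.search(line)
    # \s also swallows spaces before the next column, so strip them off.
    return match.group(1).strip() if match else None


print(color_name("23456/VITAL GRAY   78"))               # VITAL GRAY
print(color_name("23456/Vital Gray"))                    # V
print(color_name("23456/Vital Gray", ignore_case=True))  # Vital Gray
```

In Python, \s also matches a line break, so a wrapped name like 'Bronze Seal' would be captured if it reached the expression as one string containing a newline. The thread doesn't confirm whether Grouped Line Item Extraction ever presents wrapped text that way.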

Also, OnBase 15 adds the ability to split a value captured in a column between two keyword types. This is especially useful when the values are this close together.
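Conceptually, splitting one column value between two keyword types amounts to one match with two capture groups. The following is a hypothetical Python sketch of that idea, not OnBase's configuration or API. The keyword type names "Color Number" and "Color Name" are placeholders, and case-insensitive matching is an added assumption.

```python
import re
from typing import Dict, Optional

# One expression, two capture groups: group 1 -> number, group 2 -> name.
# re.IGNORECASE is an assumption so mixed-case names are captured whole.
COMBINED = re.compile(r"(\d{5})\/([\sA-Z]+)", re.IGNORECASE)


def split_column(value: str) -> Optional[Dict[str, str]]:
    """Split one OCR'd column value into two hypothetical keyword values."""
    match = COMBINED.search(value)
    if match is None:
        return None
    number, name = match.groups()
    # Placeholder keyword type names, not real OnBase configuration.
    return {"Color Number": number, "Color Name": name.strip()}


print(split_column("23456/Vital Gray"))
```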
