
Help with Person Name Extraction When Name Wraps

Teri_Rowe
Confirmed Champ

I have a Drivers License MVR form where the person's name sometimes wraps.  I need to extract the first and last name and account for both scenarios below.  When using person name extraction, it's missing the last name when the name wraps even though the full name shows in the verification window.  I've had this problem before on other forms.  Any suggestions?

 

Without name wrap:

NAME            DOB        DLN       GENDER
John Doe Smith  1/22/1970  D0012345  Male

 

With name wrap:

NAME   DOB        DLN       GENDER
John   1/22/1970  D0012345  Male
Doe
Smith
1 ACCEPTED ANSWER

Hi Teri,

 

The default system regular expressions for person name extraction (there are four: First Middle Last and Last First Middle, each with and without mixed case) don't leave much wiggle room for the amount of whitespace (spaces, or possibly carriage returns and line feeds) that can appear between the name parts.  Basically, the defaults expect the name parts to be on the same logical line, or close to it.  In your case, I assume you're drawing the extraction box around the first column, under "NAME", and making it tall enough to encompass what could be three lines of text (as in the 'with name wrap' example above), correct?  And you've set it to a 'person name extraction' type zone and do not have 'Format is Last First [Middle] with or without comma' checked, also correct?  If so, you may need to go into the regular expression library and change the built-in system regular expression for 'Person Name - First Middle Last/Mixed Case', as this is the expression the engine uses to extract the individual name part values.  The default expression here has instances of:

 

[[:space:]][[:space:]]?

 

You'll see this between each potential name part.  Try changing all of these instances to allow more optional [[:space:]] characters, perhaps like this:

 

[[:space:]][[:space:]]?[[:space:]]?[[:space:]]?

 

So instead of the original "at least one whitespace character, maybe two", it now allows at least one and up to four.  This helps account for the OCR engine possibly reading a space or two as well as a line feed and a separate carriage return between each name part.  That is too many whitespace characters for the default expression, which is why it isn't playing nice when the name is broken across more than one line (with a little wiggle room, as I mentioned: it depends on how the OCR engine decides to interpret the whitespace between the text blocks and the line feeds).
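
To see the effect of the wider separator outside the product, here is a minimal Python sketch.  It is not the actual built-in 'Person Name - First Middle Last/Mixed Case' expression: the name-part pattern is a simplified, hypothetical stand-in, the sample OCR text (a space plus carriage return plus line feed between parts) is an assumption, and since Python's re module does not support the POSIX [[:space:]] class, an explicit character set is substituted for it.

```python
import re

# Stand-in for POSIX [[:space:]]; Python's re module does not support that class.
SPACE = r"[ \t\n\r\f\v]"

# Hypothetical, simplified name part: one capitalized word.
# The real built-in expression is more involved than this.
PART = r"([A-Z][a-z]+)"


def name_pattern(separator):
    """Build a First Middle Last pattern with the given separator between parts."""
    return re.compile(PART + separator + PART + separator + PART)


# Default separator: at least one whitespace character, maybe two.
DEFAULT = name_pattern(SPACE + SPACE + "?")

# Widened separator: at least one whitespace character, up to four.
WIDENED = name_pattern(SPACE + SPACE + "?" + SPACE + "?" + SPACE + "?")

same_line = "John Doe Smith"
# Assumed OCR output for a wrapped name: space + CR + LF between parts.
wrapped = "John \r\nDoe \r\nSmith"

for label, pattern in (("default", DEFAULT), ("widened", WIDENED)):
    for text in (same_line, wrapped):
        match = pattern.fullmatch(text)
        print(label, repr(text), match.groups() if match else None)
```

With this sample text, the default separator matches only the single-line name, because three whitespace characters sit between the wrapped parts, while the widened separator matches both.  If the OCR engine emits more than four whitespace characters between parts, the separator would need to be widened further.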

 


That worked!!  Thank you so much for the help!
