Hyland

Teri_Rowe · ‎11-13-2023

I have a driver's license MVR form where the person's name sometimes wraps onto more than one line. I need to extract the first and last name in both of the scenarios below. With person name extraction, the last name is missing whenever the name wraps, even though the full name shows in the verification window. I've had this problem before on other forms. Any suggestions?

Without name wrap:

NAME	DOB	DLN	GENDER
John Doe Smith	1/22/1970	D0012345	Male

With name wrap:

NAME	DOB	DLN	GENDER
John	1/22/1970	D0012345	Male
Doe
Smith

Steve_Reed · ‎11-13-2023

Hi Teri,

There are four default system regular expressions for person name extraction: one each for FML and LFM, with or without mixed case. None of them leave much wiggle room for the amount of whitespace (spaces, or possibly carriage return/line feed characters) that can appear between the name parts. Basically, the defaults expect the name parts to be on the same logical line, or close to it.

In your case, I assume you're drawing the extraction box around the first column, under "NAME", and making it tall enough to encompass what could be three lines of text (as in the 'with name wrap' example above), correct? And you've set it to a 'person name extraction' type zone without 'Format is Last First [Middle] with or without comma' checked, also correct?

If so, you may need to go into the regular expression library and make some changes to the built-in system regular expression for 'Person Name - First Middle Last/Mixed Case', since this is the expression the engine is using to extract the individual name part values. The default expression has instances of:

[[:space:]][[:space:]]?

You'll see this between each potential name part. Try changing all of these instances to allow for more optional [[:space:]] characters, perhaps this:

[[:space:]][[:space:]]?[[:space:]]?[[:space:]]?

So instead of the original "at least one whitespace character, maybe two", it will now be at least one and up to four. This helps because the OCR engine may be reading a space or two as well as a line feed and a separate carriage return between each name part, which is too many whitespace characters for the default expression to account for. That's why it isn't playing nice when the name is broken across more than one line. As I mentioned, there's a little wiggle room here: it depends on how the OCR engine decides if/when/how to interpret the whitespace between the text blocks and the line feeds.
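To see why the extra optional whitespace terms matter, here is a rough Python sketch of the idea. It is not the product's actual expression: the name-part pattern, the helper names, and the sample OCR strings are all made up for illustration, and Python's `re` module has no `[[:space:]]` class, so `\s` stands in for it.

```python
import re

# Hypothetical, simplified stand-in for one name part. The real built-in
# 'Person Name - First Middle Last/Mixed Case' expression is not shown here.
NAME_PART = r"[A-Z][a-z]+"

# \s is used in place of [[:space:]], which Python's re does not support.
ORIGINAL_GAP = r"\s\s?"        # one whitespace character, maybe two
WIDENED_GAP = r"\s\s?\s?\s?"   # one whitespace character, up to four


def build_pattern(gap: str) -> re.Pattern:
    """Build a First/Middle/Last pattern with the given gap between parts."""
    return re.compile(rf"({NAME_PART}){gap}({NAME_PART}){gap}({NAME_PART})")


def gap_lengths(text: str) -> list:
    """Count the whitespace characters in each gap, to judge how wide to go."""
    return [len(run) for run in re.findall(r"\s+", text.strip())]


original = build_pattern(ORIGINAL_GAP)
widened = build_pattern(WIDENED_GAP)

# Assumed OCR output: a trailing space plus CR and LF between wrapped parts.
same_line = "John Doe Smith"
wrapped = "John \r\nDoe \r\nSmith"

for label, text in [("same line", same_line), ("wrapped", wrapped)]:
    print(
        label,
        "| gaps:", gap_lengths(text),
        "| original matches:", bool(original.fullmatch(text)),
        "| widened matches:", bool(widened.fullmatch(text)),
    )
```

In this sketch the wrapped sample has three whitespace characters in each gap, so the original two-character allowance fails while the widened one still captures all three parts. If the OCR output on a given form shows even wider gaps, the same approach would call for more optional terms.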

Teri_Rowe · ‎11-13-2023

That worked!! Thank you so much for the help!

Help with Person Name Extraction When Name Wraps