We are using regex to normalize pharmaceutical data stored in a string field, and we need to distinguish very similar strings by means of an exclusion rule.
For example, in simplified form, we have the following records:
0,5 MG COM CT BL AL/AL X 30 ----> COM = Simple Pill
0,4 MG COM REV CT BL AL X 90 ----> COM REV = Coated Tablet
0,7 MG COM LIB PROL CT BL AL X 30 ----> COM LIB PROL = Extended-Release Tablet
To identify a coated tablet, we use the pattern: COM\sREV\s
To identify an extended-release tablet, we use the pattern: COM\sLIB\sPROL\s
In this simplified example, we need to identify a Simple Pill. For that, we need an expression that matches only when COM is present and the whole words REV and LIB are absent. Something like:
COM\s[^(REV|LIB)]
... but that syntax didn't work. Can someone help?
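To illustrate why a character class cannot express "not the word REV or LIB", here is a minimal Java sketch. It assumes the attempted pattern was `COM\s[^(REV|LIB)]`; the class name `CharClassPitfall` and the fourth sample string are hypothetical, added only for demonstration. Inside square brackets, `(`, `|` and `)` are literal characters, so the class simply means "one character that is none of `(`, `R`, `E`, `V`, `|`, `L`, `I`, `B`, `)`".

```java
import java.util.regex.Pattern;

class CharClassPitfall {
    // Assumed reconstruction of the attempted pattern.
    // [^(REV|LIB)] matches exactly ONE character that is not in the set
    // { ( R E V | L I B ) }; it does not exclude the words REV or LIB.
    static final Pattern ATTEMPT = Pattern.compile("COM\\s[^(REV|LIB)]");

    static boolean matches(String description) {
        return ATTEMPT.matcher(description).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "0,5 MG COM CT BL AL/AL X 30",     // simple pill
            "0,4 MG COM REV CT BL AL X 90",    // coated, REV right after COM
            "0,4 MG COM CT REV BL AL AL X 90", // coated, REV later in the string
            "0,5 MG COM BL AL X 30"            // hypothetical: word after COM starts with B
        };
        for (String s : samples) {
            System.out.println(matches(s) + "  " + s);
        }
    }
}
```

The third sample is accepted even though it contains REV, and the fourth is rejected only because the next word happens to start with a letter from the set, which is why this approach cannot be relied on.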
EDITED
REV will not always come immediately after COM. The string may be, for example:
0,4 MG COM CT REV BL AL AL X 90 ---> or with any other words in between.
The point is that REV must not appear anywhere in the string.
EDITED 27/07
The pattern \bCOM\b\s(?!.*REV|.*LIB) worked well for cases where REV and LIB come after COM. However, it does not handle the strings below correctly, because there REV and LIB come before COM:
0,4 MG REV COM CT BL AL X 90
0,7 MG LIB PROL COM CT BL AL X 30
So the pattern needs to be comprehensive enough to identify COM and discard any string containing REV or LIB, wherever they occur.
Something like: (?!.*REV|.*LIB)\bCOM\b\s(?!.*REV|.*LIB)
Is that possible?
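One possible approach, sketched here in Java under some assumptions, is to anchor a single negative lookahead at the start of the string. A lookahead placed just before COM only inspects the text from that position onward, so it cannot see a REV or LIB that comes earlier; anchored at `^`, the same lookahead scans the whole string. The class and method names (`SimplePillMatcher`, `isSimplePill`) are hypothetical, and the sketch assumes each description is a single line in upper case.

```java
import java.util.regex.Pattern;

class SimplePillMatcher {
    // ^(?!.*\b(?:REV|LIB)\b)  -> from the start, reject if the whole word
    //                            REV or LIB appears anywhere in the string
    // .*\bCOM\b               -> then require the whole word COM somewhere
    static final Pattern SIMPLE_PILL =
            Pattern.compile("^(?!.*\\b(?:REV|LIB)\\b).*\\bCOM\\b");

    static boolean isSimplePill(String description) {
        return SIMPLE_PILL.matcher(description).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "0,5 MG COM CT BL AL/AL X 30",
            "0,4 MG COM REV CT BL AL X 90",
            "0,4 MG COM CT REV BL AL AL X 90",
            "0,4 MG REV COM CT BL AL X 90",
            "0,7 MG LIB PROL COM CT BL AL X 30"
        };
        for (String s : samples) {
            System.out.println(isSimplePill(s) + "  " + s);
        }
    }
}
```

The word boundaries keep tokens that merely contain the letters REV or LIB from being rejected. If descriptions can arrive in mixed case, `Pattern.CASE_INSENSITIVE` could be passed to `Pattern.compile`.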
Can you give an example of the result you want to get? Do you want to organize it into an object for example? What language are you using?
– Sergio
@Sergio, we'll use Java to build the code. In this case, we need to scan an entire table and classify the records according to a description field, a string in which all the information is mixed together. I am responsible for writing the regex that identifies the records. So, for example, when reading the string field, if we find COM we know it's a simple pill, if we find COM REV we know it's a coated tablet, and so on.
– Denise DAmaro