I have the following input, which I process line by line.
D b1308 pspE; thiosulfate sulfurtransferase PspE K03972 pspE; phage shock protein E
B 09193 Unclassified: signaling and cellular processes-6
C 99977 Transport
D b2347 yfdC; inner membrane protein YfdC K21990 yfdC; formate-nitrite transporter family protein
D b3657 yicJ; putative xyloside transporter YicJ K03292 TC.GPH; glycoside/pentoside/hexuronide:cation symporter, GPH family
D b3876 yihO; putative sulfoquinovose transporter K03292 TC.GPH; glycoside/pentoside/hexuronide:cation symporter, GPH family
D b0361 insD-1; IS2 element protein K07497 K07497; putative transposase
D b1402 insD-2; IS2 insertion element protein InsB K07497 K07497; putative transposase
I am using the following regex on each line to extract the gene name (for example, b2347 yfdC):
[b]\d{4}\s[a-zA-z]{3,4}
But this regex does not extract the full name in cases like b1402 insD-2.
Is there a single regex that handles both cases?
Is the format always 3 or 4 letters, optionally followed by "hyphen + 1 number"?
– hkotsubo
It is either hyphenated or not: always the 3-4 letter format, and yes, a hyphen and a number
– FourZeroFive
I posted a "generic" answer, but if you are using a specific language/site/tool, you can edit the question and add this information, because each language implements regex in its own way and not everything works the same everywhere
– hkotsubo
I just tested it, it’s perfect, thank you
– FourZeroFive
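For illustration, here is a minimal Python sketch of one pattern that covers both cases, based on the format confirmed in the comments above (3 or 4 letters, optionally followed by a hyphen and a number). Python is an assumption, since the question does not say which language is in use, and the names `GENE_RE` and `extract_gene` are hypothetical. Note also that the `A-z` range in the question's pattern matches a few punctuation characters that sit between `Z` and `a` in ASCII (such as `[` and `_`), so the sketch uses `A-Z` instead.

```python
import re

# 'b' + 4 digits, one whitespace character, 3-4 letters,
# then an optional "-<digits>" suffix (e.g. "-2").
# Assumption: the suffix may have one or more digits; use -\d for exactly one.
GENE_RE = re.compile(r"\bb\d{4}\s[a-zA-Z]{3,4}(?:-\d+)?")


def extract_gene(line):
    """Return the 'bNNNN name' part of a line, or None if there is no match."""
    match = GENE_RE.search(line)
    return match.group(0) if match else None


lines = [
    "D b2347 yfdC; inner membrane protein YfdC K21990 yfdC; formate-nitrite transporter family protein",
    "D b1402 insD-2; IS2 insertion element protein InsB K07497 K07497; putative transposase",
    "C 99977 Transport",
]

for line in lines:
    print(extract_gene(line))
```

The optional non-capturing group `(?:-\d+)?` is what lets the same expression match both `b2347 yfdC` and `b1402 insD-2`; lines without a gene entry, such as the `C 99977 Transport` header, simply yield `None`.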