If the codes follow a fixed pattern like the one in your example, LL-NNN.NNN, where L is an uppercase letter and N is a digit, and the hyphen and the dot always appear at those exact positions, you can use the str.extract method with the regular expression ([A-Z]{2}-\d{3}\.\d{3})
Note: with regular expressions, there is almost always a more comprehensive pattern you could write.
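As one possible illustration of a more comprehensive pattern, the sketch below compares the expression from this answer with a stricter variant that refuses a code glued to extra letters or digits. The names BASIC, STRICT, first_code and the sample string are hypothetical, and it uses Python's re module directly. Since pandas builds str.extract on top of re, the same pattern should also work there.

```python
import re

# Pattern from the answer: two uppercase letters, a hyphen,
# three digits, a literal dot, three digits.
BASIC = re.compile(r"([A-Z]{2}-\d{3}\.\d{3})")

# Hypothetical stricter variant (an assumption, not part of the answer):
# the lookarounds reject a match that touches other letters or digits.
STRICT = re.compile(r"(?<![A-Za-z0-9])([A-Z]{2}-\d{3}\.\d{3})(?![A-Za-z0-9])")


def first_code(pattern, text):
    """Return the first captured code in text, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


# Made-up sample: the code-like part sits inside a longer token.
sample = "produto XAA-123.4567 em estoque"

basic_result = first_code(BASIC, sample)    # matches inside the longer token
strict_result = first_code(STRICT, sample)  # rejects it

print(basic_result, strict_result)
```

Whether the stricter behaviour is wanted depends on how clean the descriptions are. If codes are always separated by spaces, the simpler pattern is enough.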
See the example below:
Importing library
import pandas as pd
Creating Test Dataframe
df = pd.DataFrame({"cod": [1, 2, 3, 4], "descricao": ["Um texto qualquer AA-123.456 e segue com mais coisa", "Outro texto com o codigo BB-232.444 e vamos lá", "Por fim um produto CC-666.888 e fim", "um errado AA-12.3.456"]})
print(df)
   cod                                          descricao
0    1  Um texto qualquer AA-123.456 e segue com mais ...
1    2      Outro texto com o codigo BB-232.444 e vamos lá
2    3                 Por fim um produto CC-666.888 e fim
3    4                               um errado AA-12.3.456
Creating new column
df["novaColuna"] = df["descricao"].str.extract(r'([A-Z]{2}-\d{3}\.\d{3})')
print(df)
   cod                                          descricao  novaColuna
0    1  Um texto qualquer AA-123.456 e segue com mais ...  AA-123.456
1    2      Outro texto com o codigo BB-232.444 e vamos lá  BB-232.444
2    3                 Por fim um produto CC-666.888 e fim  CC-666.888
3    4                               um errado AA-12.3.456         NaN
Note that for the last item the code was not extracted, because it does not match the pattern.
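Since unmatched rows end up as NaN, one way to review them is to filter on that column. This is a minimal sketch that rebuilds the same test DataFrame. The variable name sem_codigo is hypothetical, and expand=False is an optional choice here that makes str.extract return a Series for a single capture group.

```python
import pandas as pd

# Same test data as in the answer.
df = pd.DataFrame({
    "cod": [1, 2, 3, 4],
    "descricao": [
        "Um texto qualquer AA-123.456 e segue com mais coisa",
        "Outro texto com o codigo BB-232.444 e vamos lá",
        "Por fim um produto CC-666.888 e fim",
        "um errado AA-12.3.456",
    ],
})

# expand=False returns a Series when the pattern has one capture group.
df["novaColuna"] = df["descricao"].str.extract(
    r"([A-Z]{2}-\d{3}\.\d{3})", expand=False
)

# Rows where no code was found hold NaN in the new column.
sem_codigo = df[df["novaColuna"].isna()]
print(sem_codigo[["cod", "descricao"]])
```

From there the rows could be fixed by hand or handled with a looser pattern, depending on what the malformed codes look like.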
Regular expression?
– FourZeroFive
@Scoring, try to make your question more specific; as it stands, no one will be able to help. It would help if you posted your code so we can see what is going on and what you need. From your description of the problem, it seems that regular expressions (regex) will help you.
– William