How to apply a regex to the contents of a PDF file using C#?

-1

    private static string BuscaComparacao(string url)
    {
        // Note: the trailing ")" inside the pattern has no matching "(".
        Regex r = new Regex(@"\s+\d{3}[.]\d{3}\s+\d)");
        var result = r.Matches(url);
        return result[0].ToString();
    }

The code does not find the data I am looking for.

  • What information are you searching for? You need to clarify your question.

  • The regular expression is right, but it reports Message = "parsing "\s+\d{3}[.]\d{3}\s+\d)" - Too many )'s."

  • Isn't a parenthesis missing at the beginning of the expression? new Regex(@"(\s+\d{3}[.]\d{3}\s+\d)")? Or is the parenthesis at the end one too many?

  • Yes, corrected, but it is not returning anything. Could the error be caused by passing the URL of a PDF?

  • What do you want to get with the regular expression?

  • 042.964 3 (concrete unit and verification code)

  • 1

    Let me get this straight, you’re passing a URL to the method. Right?

  • This one: C:\AnexosEmail\26342018013427.pdf

  • Now it fails with Message = "Specified argument was out of the range of valid values.\r\nParameter name: i". Could that be because it needs IIS to read the PDF?

  • Since this is a path to a file, what do you want from the regular expression? The file name? The folder?

  • To obtain data from inside the file.

  • That is impossible this way. What you are doing is applying the regex to the text C:\\AnexosEmail\\26342018013427.pdf

  • Would I have to convert the PDF to text first?

  • You will need to use an external library, such as Docotic.Pdf or iTextSharp. The way you are doing it only applies the regular expression to the URL itself, not to the contents of the file.

  • Use this library; it is easy to use, and then you can search the extracted text for what you need. https://www.codeproject.com/Articles/14170/Extract-Text-from-PDF-in-C-100-NET
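
To illustrate the point made in the comments, here is a minimal sketch using only the standard library. It applies the pattern from the thread first to the file path and then to a piece of text. The class name `RegexOnPathDemo`, the helper `FirstMatchOrNull`, and the sample text are made up for illustration; the sample text merely stands in for whatever a PDF library would extract. The path contains no whitespace, so the pattern cannot match it, and indexing `Matches(...)[0]` on the resulting empty collection is likely what raised the "Parameter name: i" error mentioned above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexOnPathDemo
{
    // Pattern from the thread: whitespace, "ddd.ddd", whitespace, one digit.
    private static readonly Regex Pattern = new Regex(@"\s+\d{3}[.]\d{3}\s+\d");

    // Hypothetical helper: returns the first match, or null when nothing
    // matches, instead of indexing Matches(...)[0] on an empty collection.
    public static string FirstMatchOrNull(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // The path is only a string; the regex never sees the file contents.
        string path = @"C:\AnexosEmail\26342018013427.pdf";
        Console.WriteLine(FirstMatchOrNull(path) == null);

        // Made-up text standing in for what a PDF library would extract.
        string extracted = "Unit 042.964 3 total";
        Console.WriteLine(FirstMatchOrNull(extracted) != null);
    }
}
```

Using `Regex.Match` and checking `Success` avoids the exception when there is no match, which makes the "nothing found" case easier to diagnose.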


1 answer

1


I solved it as follows:

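    // Requires iTextSharp: using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
    PdfReader reader = new PdfReader(arquivo);
    // Extract the text of page 1 and run the regex on it, not on the file path.
    var page = PdfTextExtractor.GetTextFromPage(reader, 1);
    var exp = @"\s+\d{3}[.]\d{3}\s+\d";
    Regex r = new Regex(exp);
    var result = r.Matches(page);
    return result[0].ToString();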

Thank you @joaomartins
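
As a possible refinement (a sketch, not part of the original solution), the search can be extended to every page and made tolerant of pages without a match. The code below is deliberately independent of any PDF library: `PageSearch` and `FindAll` are hypothetical names, and the page texts are made-up strings. With iTextSharp, the input could presumably be built by calling `PdfTextExtractor.GetTextFromPage(reader, i)` for each `i` from 1 to `reader.NumberOfPages`. A capture group is added to the answer's pattern so the leading whitespace is not part of the returned value.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PageSearch
{
    // Same shape as the answer's pattern, with a capture group so the
    // leading whitespace is excluded from the returned value.
    private static readonly Regex Pattern =
        new Regex(@"\s+(\d{3}[.]\d{3}\s+\d)");

    // pageTexts: one string per page, in page order (hypothetical input,
    // e.g. the text extracted from each page by a PDF library).
    public static List<string> FindAll(IEnumerable<string> pageTexts)
    {
        var found = new List<string>();
        foreach (string text in pageTexts)
        {
            foreach (Match m in Pattern.Matches(text))
            {
                found.Add(m.Groups[1].Value);
            }
        }
        // An empty list is returned when nothing matches; no exception.
        return found;
    }

    public static void Main()
    {
        // Made-up page texts for illustration only.
        var pages = new[] { "cover page", "Unit 042.964 3\nUnit 123.456 7" };
        foreach (string hit in FindAll(pages))
        {
            Console.WriteLine(hit);
        }
    }
}
```

Returning a list rather than `result[0]` lets the caller decide what to do when the value is missing or appears more than once.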
