PHP Text Interpreter

Question

Asked 10 years, 1 month ago

Viewed 547 times

I’m creating a text interpreter in PHP, based on DuckDuckGo’s GitHub repository.

This is one of the pieces of code I wrote:

if (strpos(strtolower($qt), "rand") !== FALSE){
  if (preg_match("/^rand$/", strtolower(removeAccents($qt)), $match)){
    $result = rand(1,9999);
    $sndline = "Random number";
  }
  elseif (preg_match("/^random$/", strtolower(removeAccents($qt)), $match)){
    $result = rand(1,9999);
    $sndline = "Random number";
  }
  elseif (preg_match("/^rand *\((?<min>[0-9]+),(?<max>[0-9]+)\)$/", strtolower(removeAccents($qt)), $match)){
    $result = rand($match['min'],$match['max']);
    $sndline = "Random number";
  }
  elseif (preg_match("/^random *\((?<min>[0-9]+),(?<max>[0-9]+)\)$/", strtolower(removeAccents($qt)), $match)){
    $result = rand($match['min'],$match['max']);
    $sndline = "Random number";
  }
}

As you can see, the script performs the action if the user types rand, random, rand (1,99) or random (1,99).

The whole interpreter was written with preg_match calls (that is, everything uses regex), but I found that they overload the system.

How can I create something that is fast (without overloading the system) and that also understands the different "commands" given by the user without using regex (keeping in mind that the user can type anything in the search box)?

2 answers

I see at least three points that can easily be optimized:

The outermost if seems unnecessary.
The strtolower(removeAccents($qt)) call may end up running more than once (4 times if you type random (1,99)).
The first two cases are equality comparisons, and you do not need regex for that.

Translating it into code:

$cleanQt = strtolower(removeAccents($qt));
if ($cleanQt == "rand"){
  $result = rand(1,9999);
  $sndline = "Random number";
}
elseif ($cleanQt == "random"){
  $result = rand(1,9999);
  $sndline = "Random number";
}
elseif (preg_match("/^rand *\((?<min>[0-9]+),(?<max>[0-9]+)\)$/", $cleanQt, $match)){
  $result = rand($match['min'],$match['max']);
  $sndline = "Random number";
}
elseif (preg_match("/^random *\((?<min>[0-9]+),(?<max>[0-9]+)\)$/", $cleanQt, $match)){
  $result = rand($match['min'],$match['max']);
  $sndline = "Random number";
}

And I think we can condense the last two tests into one:

$cleanQt = strtolower(removeAccents($qt));
if ($cleanQt == "rand"){
  $result = rand(1,9999);
  $sndline = "Random number";
}
elseif ($cleanQt == "random"){
  $result = rand(1,9999);
  $sndline = "Random number";
}
elseif (preg_match("/^(?:rand|random) *\((?<min>[0-9]+),(?<max>[0-9]+)\)$/", $cleanQt, $match)){
  $result = rand($match['min'],$match['max']);
  $sndline = "Random number";
}

And, as qmechanik shows, you can also merge the first two conditions:

$cleanQt = strtolower(removeAccents($qt));
if ($cleanQt == "rand" || $cleanQt == "random"){
  $result = rand(1,9999);
  $sndline = "Random number";
}
elseif (preg_match("/^(?:rand|random) *\((?<min>[0-9]+),(?<max>[0-9]+)\)$/", $cleanQt, $match)){
  $result = rand($match['min'],$match['max']);
  $sndline = "Random number";
}
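
One detail worth checking when merging the two patterns: in PCRE, | has the lowest precedence, so the two keywords have to be grouped for the ^ anchor and the argument list to apply to both of them. A small sketch of the difference (the test strings are made up for illustration):

```php
<?php
// Ungrouped: parsed as "^rand" OR "random *\((min),(max)\)$".
$ungrouped = "/^rand|random *\((?<min>[0-9]+),(?<max>[0-9]+)\)$/";
// Grouped: the anchors and the argument list apply to both keywords.
$grouped = "/^(?:rand|random) *\((?<min>[0-9]+),(?<max>[0-9]+)\)$/";

// "randomize" is not a valid command, yet the ungrouped pattern accepts it
// because the input merely starts with "rand".
var_dump(preg_match($ungrouped, "randomize")); // int(1)
var_dump(preg_match($grouped, "randomize"));   // int(0)

// A well-formed command matches the grouped pattern and fills both groups.
preg_match($grouped, "random (1,99)", $match);
var_dump($match['min'], $match['max']);        // string(1) "1", string(2) "99"
```

With the ungrouped pattern the match can succeed without min and max being captured at all, so reading $match['min'] afterwards would fail.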

Let’s say I have more than one of these "modules". Will I have to add more conditions (elseif)?

– hsbpedro

2015/05/16 at 03:26

That’s one way. But depending on the complexity of your rules, it’s worth studying a little about lexers and parsers and implementing something more generic.

– bfavaretto

2015/05/16 at 03:27
Complexity in what sense?

– hsbpedro

2015/05/16 at 03:30
For example, if you have many of these keywords to identify in the input, if they can appear in different positions, if they can be grouped into expressions, or if the order makes a difference, among other things. The closer your rules get to a "language", the more appropriate it is to describe them in BNF and write a parser for it.

– bfavaretto

2015/05/16 at 03:35
Without going in the direction of lexers and parsers, is there another way to add new modules without having to create new elseifs?

– hsbpedro

2015/05/16 at 03:38
No, it really comes down to if, elseif and else. In certain cases, where you compare a string for equality against a list of terms, you can opt for a switch and react accordingly, but it doesn’t make much difference.

– bfavaretto

2015/05/16 at 03:44
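
The switch alternative mentioned in this comment might look roughly like the sketch below. The function name interpretKeyword and the "coin"/"flip" module are hypothetical, added only to show how a second keyword group would fit; the input is assumed to be already lowercased and stripped of accents.

```php
<?php
// Hypothetical helper: maps a normalized query to a [result, label] pair,
// or returns null when the query is not one of the plain keywords.
function interpretKeyword(string $cleanQt): ?array
{
    switch ($cleanQt) {
        case 'rand':
        case 'random':
            return [rand(1, 9999), 'Random number'];
        case 'coin': // hypothetical second module, for illustration only
        case 'flip':
            return [rand(0, 1) === 1 ? 'heads' : 'tails', 'Coin flip'];
        default:
            return null; // not a plain keyword; fall back to other checks
    }
}
```
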
If I add too many elseifs, will the code still be fast?

– hsbpedro

2015/05/16 at 03:47
It depends on what you do in them; there is no absolute answer.

– bfavaretto

2015/05/16 at 03:48
A general approach (the one lexers take) is to take the entire string, break it into a list of "words" (tokens), and scan this list, comparing the terms with what you accept and interpreting the whole "phrase" linearly.

– bfavaretto

2015/05/16 at 03:51
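
A minimal sketch of the tokenizing idea from this comment, without regex. The names tokenize and interpretRandom are hypothetical, and the input is assumed to be already lowercased and stripped of accents; it only covers the rand/random command from the question.

```php
<?php
// Break the input into tokens: runs of letters, runs of digits,
// and single punctuation characters. Whitespace is skipped.
function tokenize(string $input): array
{
    $tokens = [];
    $length = strlen($input);
    $i = 0;
    while ($i < $length) {
        $char = $input[$i];
        if (ctype_space($char)) {
            $i++;
        } elseif (ctype_alpha($char) || ctype_digit($char)) {
            $isWord = ctype_alpha($char);
            $start = $i;
            // Consume the rest of the run of the same character class.
            while ($i < $length
                && ($isWord ? ctype_alpha($input[$i]) : ctype_digit($input[$i]))) {
                $i++;
            }
            $tokens[] = substr($input, $start, $i - $start);
        } else {
            $tokens[] = $char;
            $i++;
        }
    }
    return $tokens;
}

// Interpret the token list linearly: "rand" / "random" alone,
// or followed by "(", min, ",", max, ")". Returns null otherwise.
function interpretRandom(array $tokens): ?int
{
    if ($tokens === [] || !in_array($tokens[0], ['rand', 'random'], true)) {
        return null;
    }
    if (count($tokens) === 1) {
        return rand(1, 9999);
    }
    if (count($tokens) === 6
        && $tokens[1] === '(' && ctype_digit($tokens[2])
        && $tokens[3] === ',' && ctype_digit($tokens[4])
        && $tokens[5] === ')') {
        return rand((int) $tokens[2], (int) $tokens[4]);
    }
    return null;
}
```

For example, tokenize("random (1,99)") yields the six tokens random, (, 1, the comma, 99 and ), which interpretRandom then checks position by position. Whether this is faster than preg_match for a given workload would have to be measured.
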

Unfortunately, I don’t have the expertise or the time to go into detail right now. But perhaps a colleague here on the site will post an answer about that later.

– bfavaretto

2015/05/16 at 03:52
Let’s continue this discussion in chat.

– hsbpedro

2015/05/16 at 03:52

@hsbpedro Sorry, but I can’t chat right now, I’m wrapping up for the day. But keep trying small changes and measuring the difference in performance.

– bfavaretto

2015/05/16 at 03:55
Thank you very much for the clarification and the help!

– hsbpedro

2015/05/16 at 03:57

by stderr • **30,356** points · Answer 1 · 2015-05-16T03:09:28+00:00

As bfavaretto mentioned, the first if is unnecessary, because right below it there is another if that does practically the same thing, except that the latter uses the removeAccents function.

You could save some lines by declaring the variable sndline only once, outside the if blocks.
The code blocks of the first two ifs are identical, and so are the last two, so use the logical operator or to check both conditions on the same line.

Note: you can also use the || operator; see Qual a diferença entre “&&” e “||” e “and” e “or” em PHP? Qual usar? for more information.
The preg_match calls in the first two ifs can be replaced by the equality operator == (=== for a stricter comparison) or by strcmp or strcasecmp.
Another point that can be improved is to declare a variable that holds the value of strtolower(removeAccents($qt)); this way you avoid running those functions several times unnecessarily.
You use preg_match in the last two ifs to check whether the variable qt contains the word rand or random followed by numbers. You can replace it with filter_var, which filters the contents of a string; since you only want the numbers, use the FILTER_SANITIZE_NUMBER_INT filter.
- Combine filter_var with strstr to find the numbers that follow the word rand or random.

Your code with those suggestions would look like this:

function filtrarNumeros($texto){
    return filter_var($texto, FILTER_SANITIZE_NUMBER_INT);
}
$qt = strtolower(removeAccents($qt));
$sndline = "Random number";

if (($qt == "rand") or ($qt == "random")){
    $result = rand(1, 9999);
}
elseif (strstr($qt, 'rand') !== false){ // also true for "random", which contains "rand"
    // Split on the comma so that each bound is filtered separately.
    $tok = strtok($qt, ",");
    $min = (int) filtrarNumeros($tok);
    $max = (int) filtrarNumeros(strtok(','));
    $result = rand($min, $max);
}
else{
    // ....
}
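
To see why the string is split on the comma before filtering, here is a small sketch of what FILTER_SANITIZE_NUMBER_INT returns; the input string is just the example from the question.

```php
<?php
// FILTER_SANITIZE_NUMBER_INT strips everything except digits, "+" and "-",
// so filtering the whole query glues both numbers together.
var_dump(filter_var("rand (1,99)", FILTER_SANITIZE_NUMBER_INT)); // string(3) "199"

// Splitting on the comma first keeps the two bounds apart.
$min = filter_var(strtok("rand (1,99)", ","), FILTER_SANITIZE_NUMBER_INT);
$max = filter_var(strtok(","), FILTER_SANITIZE_NUMBER_INT);
var_dump($min, $max); // string(1) "1", string(2) "99"
```

Note that the filter returns strings and does not validate the shape of the command, so an input with no comma or no digits still needs to be handled by the caller.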
