Remove a part of the URL

Asked 5 years, 10 months ago

I have this URL:

https://teste.teste.pt/sites/teste/Normativo/NormasDeProcedimentos/Documents/Histórico/

How can I return only NormasDeProcedimentos/ from this URL with JavaScript and store it in a variable? The goal is to take what comes after Normativo/, up to the next /.

Do you need to capture the URL, or is it static?

– Marconi

2018/12/27 at 10:46

I'm using this code to get it: var str = window.location.href;

– KmDroid

2018/12/27 at 10:48
By "remove", do you mean removing that part from the string, or returning only that part?

– fernandosavio

2018/12/27 at 10:51
In this case it would take what comes after "Normativo/" up to the next "/", which here would be "NormasDeProcedimentos/"

– KmDroid

2018/12/27 at 10:54
@fernandosavio Return only "NormasDeProcedimentos/"

– KmDroid

2018/12/27 at 11:01

3 answers

You can use regular expressions to isolate part of the string using String.replace():

const URL = "https://teste.teste.pt/sites/teste/Normativo/NormasDeProcedimentos/Documents/Histórico/"
const REGEX = /.*\/Normativo\/(.+?)\/.*/

console.log(URL.replace(REGEX, "$1"))

Explanation:

I use a regular expression to create a group with the part of the string that comes after /Normativo/ and then replace the entire string with the group contents.

The regex:

.*\/Normativo\/(.+?)\/.*

.*: matches any character 0 to N times;
\/Normativo\/: matches the string /Normativo/;
(): creates a capture group;
.+: matches any character 1 to N times;
?: makes the quantifier + non-greedy (lazy), so it matches the smallest possible occurrence.
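To see what the ? actually changes, here is a small comparison, assuming the same URL as above: the same regex with a greedy group (.+) and with a lazy group (.+?). It is only an illustration of greedy vs. lazy matching, not part of the original answer.

```javascript
// Compare a greedy group (.+) with a lazy group (.+?) on the same URL.
const url = "https://teste.teste.pt/sites/teste/Normativo/NormasDeProcedimentos/Documents/Histórico/";

// Greedy: .+ takes as much as possible, so the group runs up to the LAST "/"
// that still lets the final \/.* match.
const greedy = url.replace(/.*\/Normativo\/(.+)\/.*/, "$1");

// Lazy: .+? takes as little as possible, so the group stops at the FIRST "/".
const lazy = url.replace(/.*\/Normativo\/(.+?)\/.*/, "$1");

console.log(greedy); // NormasDeProcedimentos/Documents/Histórico
console.log(lazy);   // NormasDeProcedimentos
```

Without the ?, the captured group would swallow the following path segments as well.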

That is, the URL:

https://teste.teste.pt/sites/teste/Normativo/NormasDeProcedimentos/Documents/Histórico/

is matched by the parts of the regex as follows:

.*: https://teste.teste.pt/sites/teste
\/Normativo\/: /Normativo/
(.+?): NormasDeProcedimentos (This value is captured in a group)
\/.*: /Documents/Histórico/

Once that is done, just use String.replace() to replace the entire string with only the captured group, using the notation $n, where n is the number of the captured group.

Since the regex captures only one group, its number is 1 (see the String.replace() documentation):

URL.replace(REGEX, "$1");
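One detail worth knowing about this approach: when the regex does not match, replace() returns the input string unchanged, so the "extracted" value would be the whole URL. The sketch below stores the result in a variable, as the question asks, and shows one way to detect the no-match case. The second URL is hypothetical, used only to trigger a non-match.

```javascript
const REGEX = /.*\/Normativo\/(.+?)\/.*/;

// Store the extracted segment in a variable, as the question asks.
const withSegment = "https://teste.teste.pt/sites/teste/Normativo/NormasDeProcedimentos/Documents/Histórico/";
const segment = withSegment.replace(REGEX, "$1");
console.log(segment); // NormasDeProcedimentos

// Hypothetical URL without "/Normativo/": replace() finds no match and
// returns the input unchanged instead of an empty value.
const withoutSegment = "https://teste.teste.pt/sites/teste/Documents/";
const notFound = withoutSegment.replace(REGEX, "$1");
console.log(notFound === withoutSegment); // true

// One way to detect that case: check with test() before replacing.
// (REGEX has no g flag, so test() does not leave lastIndex state behind.)
const safeSegment = REGEX.test(withoutSegment) ? withoutSegment.replace(REGEX, "$1") : null;
console.log(safeSegment); // null
```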

by Marconi • **17,287** points · Answer 1 · 2018-12-27T10:48:44+00:00

If you need to capture the URL, use window.location.href, as in your current code. If it is static, you can work with the string directly, as in the code below, with a simple replace.

console.log(window.location.href)

let url = 'https://teste.teste.pt/sites/teste/Normativo/NormasDeProcedimentos/Documents/Histórico/'

url = url.replace('NormasDeProcedimentos/', '')
console.log(url)

Update:

If you want to capture what comes after Normativo/ up to the next /, this regex is enough: (?<=Normativo\/)\w+\/

Explanation:

(?<=...): positive lookbehind, i.e. it looks backwards and only matches at positions preceded by the text Normativo\/
\w+\/: matches one or more word characters followed by a slash.

In plain words: find a run of word characters ending in a slash that is preceded by Normativo/.

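const expressao = /(?<=Normativo\/)\w+\//

let url = 'https://teste.teste.pt/sites/teste/Normativo/NormasDeProcedimentos/Documents/Histórico/'

// match() returns an array whose index 0 is the matched text, or null if nothing matched
let variavelCaptura = url.match(expressao)

if (variavelCaptura) {
  console.log(variavelCaptura[0]) // NormasDeProcedimentos/
  url = url.replace(variavelCaptura[0], '')
}

console.log(url)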
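A caveat about \w: it only matches [A-Za-z0-9_], so the lookbehind regex above will not find a segment containing characters such as "-" or accented letters. The sketch below compares it with a variant that uses [^\/]+ instead (any character other than a slash); the hyphenated URL is hypothetical and only serves to show the difference.

```javascript
// \w+ accepts only letters, digits and underscore; [^\/]+ accepts anything but "/".
const withWordChars = /(?<=Normativo\/)\w+\//;
const withNonSlash = /(?<=Normativo\/)[^\/]+\//;

// Hypothetical URL whose segment contains hyphens.
const hyphenated = "https://teste.teste.pt/sites/teste/Normativo/Normas-De-Procedimentos/Documents/";

const wordResult = hyphenated.match(withWordChars);
const nonSlashResult = hyphenated.match(withNonSlash);

console.log(wordResult);        // null
console.log(nonSlashResult[0]); // Normas-De-Procedimentos/
```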

Browser compatibility of positive lookbehind: http://kangax.github.io/compat-table/es2016plus/

by hkotsubo • **55,826** points · Answer 2 · 2018-12-27T14:39:35+00:00

Complementing the other answers, another alternative is:

const url = "https://teste.teste.pt/sites/teste/Normativo/NormasDeProcedimentos/Documents/Histórico/";
const regex = /\/Normativo\/([^\/]+)/;

console.log(regex.exec(url)[1]); // NormasDeProcedimentos

The part \/Normativo\/ checks that the word "Normativo" appears between two slashes.

Then I use [^\/]+:

[^\/]: a ^ at the start of the brackets means "any character that is not inside the brackets". Here the only character inside is the slash (escaped with \ so it is not confused with the regex delimiters), so this expression means "any character other than /";
+: the quantifier meaning "one or more occurrences".

This whole part is in parentheses to form a capture group. Since it is the first pair of parentheses, whatever it matches is captured in group 1. Then I use the exec method, which returns the match, and take position 1, which corresponds to the first capture group. The result is "NormasDeProcedimentos".
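Note that exec() returns null when nothing matches, so regex.exec(url)[1] throws a TypeError for a URL without "/Normativo/". A minimal guard, wrapped in a hypothetical helper called getSegment, could look like this:

```javascript
// getSegment is a hypothetical helper name, not part of any library.
// Returns the segment after "/Normativo/", or null when there is no match.
function getSegment(url) {
  const match = /\/Normativo\/([^\/]+)/.exec(url);
  return match ? match[1] : null; // exec() returns null on no match
}

const found = getSegment("https://teste.teste.pt/sites/teste/Normativo/NormasDeProcedimentos/Documents/Histórico/");
const missing = getSegment("https://teste.teste.pt/sites/teste/Documents/"); // hypothetical URL

console.log(found);   // NormasDeProcedimentos
console.log(missing); // null
```

In environments that support optional chaining, /\/Normativo\/([^\/]+)/.exec(url)?.[1] is a shorter variant, returning undefined instead of null.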

Going a little further...

Using .+? instead of [^\/]+ also works. The difference only starts to matter when the URL does not satisfy the expression.

For example, the question does not make clear whether the URL could also be just: https://teste.teste.pt/sites/teste/Normativo/NormasDeProcedimentos

The expression .*\/Normativo\/(.+?)\/.* requires a slash after NormasDeProcedimentos, so it fails for the URL above. But .+? means "one or more occurrences of any character", where the ? means "as few characters as possible that still satisfy the expression".

This means the regex will test many possibilities before failing (since . means "any character", there is an enormous number of possibilities to test).

I tested this regex on regex101.com, and if you open the debugger you will see that the regex goes back and forth over the string many times, checking many possibilities at many different positions. On that screen you can use the keyboard (the right and left arrows step forward and back) to see what the regex does at each step. When a red arrow pointing left appears, it represents backtracking, that is, the regex going back some positions in the string to test new possibilities.

On that same page, also note the left-hand panel: it shows that the regex took more than 4500 steps to conclude that the string does not satisfy the expression. This is due to .+?, and also to the .* at the beginning and end of the expression. Since the dot means "any character" and the quantifiers + and * have no upper limit, the regex tries every possibility (with 1, 2, 3... n characters) until it concludes that no match can be found.

On the other hand, let's see what happens with \/Normativo\/([^\/]+)\/ (note that I added a slash at the end only to make it mandatory, so that the regex fails for the URL https://teste.teste.pt/sites/teste/Normativo/NormasDeProcedimentos).

I also put it on regex101.com, and it needs far fewer steps (about 90) to conclude that there is no match. That's because I removed the .* at the beginning and end (I am only interested in what comes after "/Normativo/"), and I stated explicitly what I want ([^\/]+: anything but the character /).
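The step counts above come from regex101's debugger and are not something JavaScript reports. What can be checked in code is that both expressions agree on the result, assuming the two URLs discussed here: they capture the same segment when the trailing slash is present and both fail without it, so they differ only in how much work they do to get there.

```javascript
const lazyRegex = /.*\/Normativo\/(.+?)\/.*/;
const explicitRegex = /\/Normativo\/([^\/]+)\//;

const full = "https://teste.teste.pt/sites/teste/Normativo/NormasDeProcedimentos/Documents/Histórico/";
const short = "https://teste.teste.pt/sites/teste/Normativo/NormasDeProcedimentos";

// Same capture when the trailing slash is present...
console.log(lazyRegex.exec(full)[1]);     // NormasDeProcedimentos
console.log(explicitRegex.exec(full)[1]); // NormasDeProcedimentos

// ...and both return null without it (the lazy one just backtracks much more first).
console.log(lazyRegex.exec(short));     // null
console.log(explicitRegex.exec(short)); // null
```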

This difference happens because lazy quantifiers (such as .+?), although very useful in cases like this, have a cost. Using the dot (.) is also very tempting, but it is not always what you need. The dot means "any character", but you don't want any character, you want "any character other than /", so it is always best to state explicitly what you want and what you don't want.

Of course, in small programs where the regex runs only a few times, and especially when there is a match, the performance difference will be irrelevant. But it's important to keep these details in mind, because there are cases where they can make a difference.

Also, remember that the exact number of steps depends on the engine and on the input string, but the difference between the expressions stays roughly the same (the version with [^\/] will always be faster than the one with .+?).

Why not validate the URL?

Since the input is a URL, you could use the URL object and get only the pathname:

let url = 'https://teste.teste.pt/sites/teste/Normativo/NormasDeProcedimentos/Documents/Histórico/';
let path = new URL(url).pathname;

// pathname is percent-encoded, so "Histórico" appears as "Hist%C3%B3rico"
console.log(path); // "/sites/teste/Normativo/NormasDeProcedimentos/Documents/Hist%C3%B3rico/"

// use the regex on the path string (instead of the whole URL)
const regex = /\/Normativo\/([^\/]+)/;
console.log(regex.exec(path)[1]); // NormasDeProcedimentos

This way you validate that the input is a valid URL (new URL throws a TypeError otherwise), and you still get a smaller string for the regex to evaluate, which makes it run a little faster. Again, for a few executions the difference will be irrelevant, but the validation done by new URL may be worthwhile, because then you don't accept just any string. It is up to you whether to use it.
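Putting the validation and the extraction together, a sketch could look like the function below. The name segmentFromUrl is hypothetical. It assumes you prefer null over an exception for invalid input, and it decodes the segment because pathname is percent-encoded (decodeURIComponent itself throws a URIError on malformed escapes, which the same try/catch covers).

```javascript
// segmentFromUrl is a hypothetical helper name, not part of any library.
// Returns the segment after "/Normativo/", or null when the input is not a
// valid URL, the segment is missing, or its percent-encoding is malformed.
function segmentFromUrl(input) {
  try {
    const path = new URL(input).pathname; // throws TypeError for an invalid URL
    const match = /\/Normativo\/([^\/]+)/.exec(path);
    // pathname is percent-encoded, so decode the segment back to readable text
    return match ? decodeURIComponent(match[1]) : null;
  } catch (e) {
    return null; // invalid URL or malformed percent-encoding
  }
}

const a = segmentFromUrl("https://teste.teste.pt/sites/teste/Normativo/NormasDeProcedimentos/Documents/Histórico/");
const b = segmentFromUrl("https://teste.teste.pt/sites/teste/Normativo/Histórico/"); // hypothetical URL
const c = segmentFromUrl("not a url");

console.log(a); // NormasDeProcedimentos
console.log(b); // Histórico
console.log(c); // null
```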

In the question you said you want to return NormasDeProcedimentos/ (with the slash at the end), so you just need to include that slash in the regex, inside the parentheses, so that it is already part of the capture group.

const url = "https://teste.teste.pt/sites/teste/Normativo/NormasDeProcedimentos/Documents/Histórico/";
const regex = /\/Normativo\/([^\/]+\/)/;

console.log(regex.exec(url)[1]); // NormasDeProcedimentos/
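For comparison, the same result can be obtained without a regex, by splitting the string on "/" and taking the element after "Normativo". This is a different technique from the ones in the answers, shown only as a sketch; segmentAfter is a hypothetical helper name. It mirrors the regex above by requiring a slash after the segment.

```javascript
// segmentAfter is a hypothetical helper: returns the path segment that follows
// `folder`, with a trailing "/", or null if it is missing.
function segmentAfter(url, folder) {
  const parts = url.split("/");
  const index = parts.indexOf(folder);
  // Require a non-empty next segment that is itself followed by "/"
  // (i.e. there is at least one more element after it in the array).
  if (index === -1 || !parts[index + 1] || index + 2 >= parts.length) return null;
  return parts[index + 1] + "/";
}

const url = "https://teste.teste.pt/sites/teste/Normativo/NormasDeProcedimentos/Documents/Histórico/";
const segment = segmentAfter(url, "Normativo");
const noSlash = segmentAfter("https://teste.teste.pt/sites/teste/Normativo/NormasDeProcedimentos", "Normativo");

console.log(segment); // NormasDeProcedimentos/
console.log(noSlash); // null
```

Whether this reads better than the regex is mostly a matter of taste; the regex version is shorter, while the split version avoids escaping.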