Three different approaches to locating data
The idea of the file until it is good because the registration of cities changes very slowly, then the updates would occur in a very sporadic and casual way.
The reason is that with the database, the server has to:
- Receive the request.
- Connect to the database.
- Make an SQL query.
- Obtain the results of the consultation.
- Assemble an answer in HTML, XML, JSON format or the format you prefer and that will have an X size.
- Send this X-size response to the browser.
Without the database, it gets faster because the server will:
- Receive the request.
- Find the file already ready.
- Open the file and read it in full.
- Upload the file contents (X-size) to the browser.
It gets faster even if you keep it in memory like this:
- Receive the request.
- Send the contents of a global variable, string type constant, or fixed text from memory to browser.
One way to do this is with a <?php include
, since PHP will do keep in memory the content of all accessed PHP files (it will not keep the output they generate, but the PHP source code). By putting in this PHP code a fixed content, the source code will be the output itself, which will induce PHP to at the end of the accounts simply copy the contents of that memory directly to the request response without any or almost no further processing.
To avoid having to manually edit the XML (or the format it is) containing the data of the cities, you can write some procedure that reads the necessary information from the database and at the end generates the necessary content and saves everything in a PHP file already in the desired format. That way, if/when your database changes, you can just run the procedure again. You will not need to perform this procedure at any time other than when something changes or in the database, or in the data format to be served, or in the algorithm that generates it.
Caching
In addition to keeping everything in memory, you can take advantage of the browser’s skills in caching content, considering that it will change very rarely. To do this, you post the content somewhere with a URL called cidades-2016-08-14.php
or similar thing. There, you put the following headers in the request response:
Cache-control: max-age=31536000
ETag: "Cidades20160814"
And then, on the server, before sending the content in memory to the client, you check for a request header called If-None-Match
and if its value is "Cidades20160814"
(with quotes). If it is, you return the status 304 Not Modified
and the response of the request blank or with a very short content. If it is not, you return the contents of your data and add these two headers to the request response. Note that the contents of ETag
and of If-None-Match
should always come in quotes.
The field Cache-control
specifies the expiration date. Here I put in 31536000 seconds, which gives 365 days. The field ETag
serves to make conditional Gets. This value you pass to the browser when it first fetches the resource and the browser stores it. After the expiry of the max-age
, the browser sends the value of ETag
back in the header If-None-Match
to know if the content has changed or not, and when responding 304, you signal that it has not changed. If it had changed, you would return the new string with the ETag
different.
The ETag
should change when the content of your resource (in case the city list) changes. Let’s assume that on 01/10/2017 some municipality will be created somewhere. At this point, you could update the content of your resource in PHP and switch to ETag
for "Cidades20170110"
.
This scheme of ETag
still has a problem: when the database changes, the browser will still have the old database cached for a long time until the client clears the browser cache, or the max-age
expire, or something similar happens, which can be problematic. A possible solution would be to reduce the value of max-age
, but this is not necessary. In this case, you just change the URL to cidades-2017-01-10.php
, putting the new database there. Therefore, the day the database changes, as the URL will also change, automatically the cache of the previous content will no longer be used, as long as all references in the other PHP files and javascript point to the new URL, and that is exactly whyif surely a very high value is placed for the max-age
in that case.
Searches in the database
To do research in cities or something like that, the best thing would be to give the whole content to the browser and implement all the research through javascript. This way, the browser will be able to do it alone without having to ask the server for anything.
If a search on the server is necessary, to get the best performance, you search only in the memory of the server itself, and generates for each request a value of ETag
which can be easily determined based on the data provided in the survey (e.g., city name) and also based on the date the data was last updated. The exact details of the algorithm that generates the ETag
and the exact content of ETag
do not matter much, as long as these two important rules are respected:
The algorithm has to be deterministic. That is, for the same input, same version of your dataset and same query URL, it should generate the same output.
The algorithm has to be collision-free. That is, in the same query URL, two different entries or two identical entries in different versions of your dataset should generate ETag
s different.
It is easier to manage these two properties (mainly the second) when each version of your dataset has a different URL, because in this case the complexity is reduced only to generate the ETag
based on input only. An easy and simple way to generate the ETag
in this case, if the input is guaranteed to be a short string, is to make the contents of the ETag
be the actual content of the input (maybe with some Escaping/encoding only).
Additional reading
Worth a reading in the text I used as source, here: https://developers.google.com/web/fundamentals/performance/optimizing-content-efficiency/http-caching
Much more complex. XML is one of the worst things they invented, because it’s bad for everything. The format only came because the market is regulated by theorists with good positions in companies, and academics, who do not deal with problems in practice. It even has its niche of use, but almost always those who use only use it by ignorance.
– Bacco
Other examples of things often used by "fashion": Base64 (of every 10 uses I see, 9 are absurd), Regex, OO in language you don’t need, REST, SOAP, MVC, Patterns design in general.
– Bacco
If you want to simplify without installing Mysql, better use an Sqlite, which takes about 300k or even include with an array. If Mysql is already installed on the machine, it has no advantage not to use.
– Bacco
Obgd for the clarification.
– Luis
I just wanted to advance the subject, and take the opportunity to counter the fashion :) But let’s see if someone put a "official" answer with legal and interesting guidelines. As the format of the site allows multiple responses, it may be that different people present different alternatives.
– Bacco
@Bacco Glad I’m not the only one not to like XML. In the years between 2005 and 2010 mainly, it was a lot of boredom of these people who think(va) that XML was the silver bullet and the magic solution to all problems, where everything in the world had to be in XML and that not using XML was considered a crime. When I criticized this, it was called stupid and/ or crazy.
– Victor Stafusa