How to quickly recover a large PHP array?


8

In the example below, I have a PHP array with about 128,000 entries (Portuguese-language words) that I load from another file and use in some applications, comparing against its indexes as if it were a HashMap.

array(128521) {
  [0] =>
  string(3) "aba"
  [1] =>
  string(3) "aba"
  [2] =>
  string(4) "abá"
  [3] =>
  string(7) "ababás"
  [4] =>
  // ...
}

Question

How can I persist this array so that it can be loaded back quickly from disk without hurting the application's performance too much? The goal is to have the array ready for the classes that use it as soon as possible.

  • 5

    I was curious as to why this question was voted down. Could the person responsible explain?

  • 5

    I share @bfavaretto's curiosity; a downvote on a question should always come with a comment. Whoever voted it down presumably saw something wrong with the question, and in that case a comment would help with making the necessary improvements.

2 answers

6

In fact it’s completely unfeasible to do that.

128,000 entries in an array is a large amount to manipulate and/or compare.

The ideal is to keep this kind of information in a database and query it with a minimal set of parameters.
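
Just to illustrate that approach (the table palavras, its indexed column palavra and the connection details below are only assumptions), a lookup would hit an index instead of loading everything into memory:

<?php
// Hypothetical schema: a table "palavras" with an indexed column "palavra".
$pdo = new PDO('mysql:host=localhost;dbname=dicionario;charset=utf8', 'usuario', 'senha');

$stmt = $pdo->prepare('SELECT 1 FROM palavras WHERE palavra = :palavra LIMIT 1');
$stmt->execute([':palavra' => 'abá']);

$existe = (bool) $stmt->fetchColumn(); // true if the word is in the dictionary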

That said, I have perhaps two solutions:

Physical file

Keep this array as a physical file (whatever extension you prefer), in JSON format.
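
A minimal sketch of that idea, assuming the word list is already in a variable $palavras and using ptbr.json as the file name: generate the file once, then each application just reads and decodes it (and can flip it for hash-style lookups):

<?php
// One-off step: persist the array to disk (unescaped so the accents stay readable).
file_put_contents('ptbr.json', json_encode($palavras, JSON_UNESCAPED_UNICODE));

// In the applications: load and decode it again.
$palavras = json_decode(file_get_contents('ptbr.json'), true);

// Optional: flip values to keys so membership tests become hash lookups.
$hash = array_flip($palavras);
$existe = isset($hash['abá']);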

[EDITED] Idea

Just as a complement: in your case, I think it's best to create one file for each letter of the alphabet.

  • a.json
  • b.json
  • c.json

That way, when you compare or insert a value, for example "Ada", you only deal with the file for that letter; the files would hold entries such as:

  • Ada
  • adá
  • guava
  • giraffe (joke, haha)

With this I believe the lookup gets faster (each search reads a single, smaller file), but of course it can increase the number of requests on the server. To help with that, keep each file's content inline (minified, no whitespace) so its size is reduced as much as possible.
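
A rough sketch of that per-letter split, again assuming the full list is in $palavras (file and variable names are just examples):

<?php
// Group words by their first letter (mb_* so accented characters are handled).
$porLetra = [];
foreach ($palavras as $palavra) {
    $letra = mb_strtolower(mb_substr($palavra, 0, 1, 'UTF-8'), 'UTF-8');
    $porLetra[$letra][] = $palavra;
}

// Write one minified JSON file per letter: a.json, b.json, ...
foreach ($porLetra as $letra => $lista) {
    file_put_contents($letra . '.json', json_encode($lista, JSON_UNESCAPED_UNICODE));
}

// Lookup: only the file for the word's first letter is loaded.
$busca = 'abá';
$letra = mb_strtolower(mb_substr($busca, 0, 1, 'UTF-8'), 'UTF-8');
$lista = json_decode(file_get_contents($letra . '.json'), true);
$existe = in_array($busca, $lista, true);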


Cached page (json/xml)

I don't like or recommend XML, but feel free.

Have a page (route) in your system/site that returns this array, also in JSON format; that page is then cached for however long you decide.

That way, when other applications hit that specific page (for example: http://www.examplo.com.br/ptbr.json or http://www.examplo.com.br/ptbr.php - the extension itself doesn't matter, only how you handle the request), you return the page, which will now be served from cache instead of being rebuilt every time.
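
A minimal sketch of such a route, say ptbr.php, using only standard HTTP cache headers (the file name and the one-day max-age are assumptions; a framework cache would replace this):

<?php
// ptbr.php: serves the pre-generated JSON and tells clients/proxies to cache it.
$arquivo = 'ptbr.json';

header('Content-Type: application/json; charset=utf-8');
header('Cache-Control: public, max-age=86400'); // cache for one day
header('Last-Modified: ' . gmdate('D, d M Y H:i:s', filemtime($arquivo)) . ' GMT');

readfile($arquivo);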


Question

Do you use any framework? The vast majority of them already ship with a cache system and good handling of the extension/response/protocol type for each request/route.


I hope I’ve helped.

  • Yeah, I have a cache system. I believe hitting the database is much slower than using a physical file, since I never search by iteration, only by hash. I thought about serialize, since that process is faster than JSON parsing.

  • @Calebeoliveira I've heard that unserialize is slower than json_decode; it's worth benchmarking both (there's a rough sketch of such a test after these comments).

  • @Calebeoliveira serialize is certainly an option. The ideal is to test the performance, which in this case I can't tell you for sure (I'd need to test it, just like you). Once you've found the best approach, just create a process (cron, maybe) to update the file every x hours, and/or only when that table changes in the database.

  • @bfavaretto, I ran the tests; unserialize is faster. @Patrick, it's a static file, it doesn't change.

  • @Calebeoliveira I edited the post with a possible idea for your question. Question: how big is the file (KB)?

  • @Patrickmaciel, the file is about 5 MB.

  • 2

    You could use a NoSQL solution for this: Redis, MongoDB, something whose performance suits you.

  • Okay, I’ll look into these tools too.
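
On the unserialize vs json_decode point discussed above, a rough benchmark like the one below is enough to decide (a sketch only, assuming ptbr.json and ptbr.ser were generated beforehand with json_encode and serialize):

<?php
// Compare how long it takes to rebuild the array from each format.
$inicio = microtime(true);
$a = json_decode(file_get_contents('ptbr.json'), true);
echo 'json_decode: ', round(microtime(true) - $inicio, 4), " s\n";

$inicio = microtime(true);
$b = unserialize(file_get_contents('ptbr.ser'));
echo 'unserialize: ', round(microtime(true) - $inicio, 4), " s\n";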

3


In addition to the NoSQL options and a Memcached layer mentioned above, a second option would be search engines. This type of problem has characteristics I would treat with indexing. Do you need the entire array, or just to query certain words according to some search criteria?

If it's the latter, I would use a tool like Apache Solr or Elasticsearch to create a dedicated index (e.g., an index holding [hash / word] tuples). Well-tuned indexes are very fast and already come with smart internal cache policies, able to answer frequent queries "instantly" even on an index with millions of entries.
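
Just to make that concrete, querying such an index from PHP could look roughly like this (a sketch only: the palavras index, the palavra field mapped for exact matching, and a local Elasticsearch on port 9200 are all assumptions):

<?php
// Term query against a hypothetical "palavras" index in Elasticsearch.
$consulta = json_encode([
    'query' => ['term' => ['palavra' => 'abá']],
]);

$ch = curl_init('http://localhost:9200/palavras/_search');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, ['Content-Type: application/json']);
curl_setopt($ch, CURLOPT_POSTFIELDS, $consulta);

$resposta = json_decode(curl_exec($ch), true);
curl_close($ch);

$existe = !empty($resposta['hits']['hits']); // any hit means the word is indexed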

  • I didn't understand what you mean by a dedicated index. Could you explain?

  • Hello, Caleb, this is a rather complex sub-area of information retrieval, but I'll try to explain it as best I can. Take Google's case: it indexes zillions of documents (HTML, PDFs, pages, images, etc.). When a Google crawler finds a document, the document is pre-processed and the relevant information about it is stored in an index, so that Google users can search and get "instant results".

  • In that sense, when you run a query, the search platform is not hitting the data directly but an index optimized to answer that type of search (following ranking criteria that "sort" the results by relevance). Apache Solr and Elasticsearch are search platforms for exactly this purpose. In short, you build/model your own indexes, feed them the necessary information, and these tools use the Apache Lucene library to build efficient and scalable indices.

  • These tools can take far more of a beating than a database and implement more specialized algorithms than a typical NoSQL tool (for example, engines for applying ranking policies, as in the comment above). Also, for gigantic indexes (not yours), these tools are ready for partitioning, clustering, etc. I understand that your problem is currently nothing that big, but it has the color, sound and smell of information retrieval. A dedicated word index should meet your needs.

  • Okay, I get it, I’m going to research these tools, maybe they’ll solve my problem.
