I am looking for methods or ideas to solve a problem. My program generates a structure of folders and files, and that part works fine. The trouble is that this structure already contains more than 900 thousand files. Each file is very small, about 1 KB, and consists of a header and some text.
The search currently used by the software is basic: it opens each file in turn and looks for the word. The delay is considerable: on an SSD, searching for the word saude took more than 8 minutes.
I ran some tests to see whether reducing the number of files would help, but the gain was small; the search would still take minutes.
The current idea (without using any database) is to build an index manually: an external process would index, per folder, every word of 3 or more characters. That would cut the search space from millions to a few thousand, but a search could still take a few seconds in some cases.
I have also been wondering how Windows indexing works for file content.
I ran some small tests to see what takes the time in a search:
Time to open files + read: 465.948. Found: 2921
Time searching: 264.318. Found: 2921
Time to open files + read: 788.992. Found: 2921
Time searching: 599.093. Found: 2921
Time to open files + read: 834.300. Found: 2921
Time searching: 572.496. Found: 2921
Time to open files + read: 709.464. Found: 2921
Time searching: 539.053. Found: 2921
Time to open files + read: 857.443. Found: 2921
Time searching: 761.121. Found: 2921
Time to open files + read: 909.440. Found: 2921
Time searching: 602.000. Found: 2921
Time to open files + read: 865.306. Found: 2921
Time searching: 499.046. Found: 2921
The test was run on only 1000 files. In each pair, the first figure includes opening and reading the files, and the second is only the time spent searching (in my test I used strstr).
Is there any way to make this faster without using a database? I am not even sure a database would solve it, since each file has about 200 characters on average, which adds up to millions of characters to search. If it is not possible without a database, what would the general approach be? Can a database handle this volume of data well?
Where is the code?
– Leandro Angelo
Have you thought about using the Windows Search API?
– Leandro Angelo
@Leandroangelo Actually, no. Would it be a matter of enabling the option to index file contents and then querying it from code in the application?
– Kevin Kouketsu
Yes. If memory serves, you open an OLE DB connection to the search engine and write the query as SQL.
– Leandro Angelo
@Leandroangelo I'll look into this, thanks for the tip. A search that takes minutes is a bit of a problem for me!
– Kevin Kouketsu
https://docs.microsoft.com/en-us/windows/desktop/search/-search-3x-wds-overview
– Leandro Angelo