Some time ago (2013? 2012?) I developed a system very similar to the question. I had a few million objects and a dozen "tags", and each object could have 0 or more tags associated. I had to filter these objects based on searches for tag sets. Similar, no?
As the number of tags per object, in my case, was theoretically unlimited (since new tags could be added to the system after release), the solution proposed in the question did not suit me, i.e., I could not use a table with columns "tag1", "tag2"etc. Also, this scheme does not allow use of indexes (see my first comment in the @mgibsonbr reply).
As I needed a lot of performance (i.e., queries answered in "seconds"), at the time I made a comparison between several solutions, including the two proposals by @mgibsonbr in his reply.
Next, my results -- if my memory does not fail!
Trade offs...
To @mgibsonbr’s "1." solution has the disadvantage of possibly occupying too much disk space (since you will have the "characters" of the tags repeated numerous times in the whole table). This is a disadvantage as it forces your database to have to read many "pages" from your hard drive, so you have to turn the disk around a lot and move the reading head a lot, which can have great latency. The advantage is that you make only 1 select to get your result.
Already the @mgibsonbr’s "2." solution uses less disk space (because in the giant table only tags ids -- will be saved and if you use the numeric type of size appropriate to the maximum amount of tags, you can reduce to 4, 2 or even 1 byte per line). Thus, you can read more lines per "page" read from disk reducing latency. In contrast, your select would probably have a Join:
select from tags_objects, tags
where tags.id = tags_objects.tagId
and (tags.name = 'tag-buscado-1'
or tags.name = 'tag-buscado-2') -- etc...
This Join is to blame for performance problems in this solution.
Most efficient solution (in my case of specific use)
In the end, the most efficient solution I was able to use was the "2." solution with 2 different selects. The first select searches for the tag ids, and the second select uses the tag ids in the giant table. It’s like I did Join "manually".
This was advantageous for me because, in my case, I was able to cache, in my application, the tags ids. This cache was updated by a background thread (doing a "full scan" on the lowercase table that contains tags and their ids every "X" second). At the end of the day, in practical terms, the "synchronous" calculation was only a select in the giant table with the column "tagId" being some numerical type, therefore smaller than having to do joins.
Obviously, for performance reasons, it is necessary to place an index in the "tagId" column of the giant table.
Before implementing all this solution, my queries lasted ~1min or ~2min with, if I’m not mistaken, about 5 tags. After all this, I was able to reduce the time of the queries to something around ~10s!
Considerations
It is quite complicated to analyze beforehand which solution will perform best in this case, because it really depends on the characteristics of your project. I hope that this response will give some guidance in your search for the most efficient solution for your specific case.
Is the goal to query the X tag and return the Ids with it, or a query that returns the entire "index"? And do you want a solution to that specific table structure, or are you open to review the structure?
– bfavaretto
@bfavaretto edited to answer your questions
– Maniero