Probabilistic M2M Relationships Using Bloom Filters

“Using a junction table for a sparse or small data set (where there are not many associations between movies and people) gives acceptable space and time consumption properties. But for denser association matrices (which may grow over time), the upper bound on the size of the junction table is O(n(movies) * n(people)), and the upper bound on the time taken to join all three tables will be the square of that. So what optimizations and trade-offs can be made in such a situation? Well, we can use a bloom filter on each side of the M2M relationship and do away with the junction table altogether…”