Randy Au
1 min read · Mar 1, 2019

The point of the “Insane Query” wasn’t really to refute the Venn model, but to illustrate that you can put arbitrary logic into a join.

A stronger refutation of the Venn diagram model would probably be something like…

SELECT A.City_name, B.City_name, Godzilla_attacks
FROM A
LEFT JOIN B
  ON A.City_name = B.City_name
 AND random() > 0.6

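Here you’re turning a random ~60% of what would be joinable rows into NULLs on the B output, while still retaining the A side. The extra condition random() > 0.6 only passes about 40% of the time, assuming random() returns a uniform value in [0, 1) as it does in PostgreSQL.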
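If you want to poke at this yourself, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. A few things in it are my assumptions rather than part of the original query: the toy tables hold one row per city, the city names and attack counts are made up, and `rand01()` is a hypothetical helper registered from Python, because SQLite's built-in `random()` returns a 64-bit integer rather than a float in [0, 1). Since the predicate passes only about 40% of the time, roughly 60% of the rows should come back with NULLs on the B side, and every A row should still be there.

```python
import random
import sqlite3


def run_random_join(n_cities=1000, seed=42):
    """LEFT JOIN A to B with a random predicate in the ON clause."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    # Hypothetical helper: behaves like PostgreSQL's random(), i.e. a float in [0, 1).
    # SQLite's own random() returns an integer, so it cannot be compared to 0.6 directly.
    conn.create_function("rand01", 0, rng.random)

    # Toy data (an assumption for illustration): every city appears once in A and once in B.
    conn.execute("CREATE TABLE A (City_name TEXT)")
    conn.execute("CREATE TABLE B (City_name TEXT, Godzilla_attacks INTEGER)")
    cities = [f"city_{i}" for i in range(n_cities)]
    conn.executemany("INSERT INTO A VALUES (?)", [(c,) for c in cities])
    conn.executemany(
        "INSERT INTO B VALUES (?, ?)", [(c, i % 5) for i, c in enumerate(cities)]
    )

    rows = conn.execute(
        """
        SELECT A.City_name, B.City_name, Godzilla_attacks
        FROM A
        LEFT JOIN B
          ON A.City_name = B.City_name
         AND rand01() > 0.6
        """
    ).fetchall()
    conn.close()
    return rows


rows = run_random_join()
matched = sum(1 for _, b_city, _ in rows if b_city is not None)
print(f"rows={len(rows)} matched={matched} nulled={len(rows) - matched}")
```

The row count always equals the size of A, which is the part the Venn picture has no way to draw: the "overlap" changes from run to run while the left circle stays intact.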

You could use this to generate pathological cases, do targeted sampling, or perhaps join in a table C that uses the NULLs to decide whether to join in some other data (Mothra_attacks for example).
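One way to read the table C idea is a second LEFT JOIN whose ON clause checks whether the B side came back NULL. The sketch below is my interpretation, not a tested recipe from the original post: the table C, its columns, the constant attack counts, and the `rand01()` helper (a stand-in for a `random()` that returns a float in [0, 1)) are all assumptions for illustration.

```python
import random
import sqlite3


def godzilla_or_mothra(cities, seed=7):
    """Join C only for the A rows whose random join to B came back NULL."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    # Hypothetical helper standing in for a random() that returns a float in [0, 1).
    conn.create_function("rand01", 0, rng.random)
    conn.executescript(
        """
        CREATE TABLE A (City_name TEXT);
        CREATE TABLE B (City_name TEXT, Godzilla_attacks INTEGER);
        CREATE TABLE C (City_name TEXT, Mothra_attacks INTEGER);
        """
    )
    # Toy data (assumed): every city has one row in each table.
    conn.executemany("INSERT INTO A VALUES (?)", [(c,) for c in cities])
    conn.executemany("INSERT INTO B VALUES (?, 3)", [(c,) for c in cities])
    conn.executemany("INSERT INTO C VALUES (?, 1)", [(c,) for c in cities])

    rows = conn.execute(
        """
        SELECT A.City_name, B.Godzilla_attacks, C.Mothra_attacks
        FROM A
        LEFT JOIN B
          ON A.City_name = B.City_name
         AND rand01() > 0.6
        LEFT JOIN C
          ON A.City_name = C.City_name
         AND B.City_name IS NULL  -- only pull in C when the B join missed
        """
    ).fetchall()
    conn.close()
    return rows


for row in godzilla_or_mothra(["Tokyo", "Osaka", "Nagoya", "Sapporo"]):
    print(row)
```

With this toy data each city ends up with either a Godzilla count or a Mothra count, never both, and which one it gets depends on how the random predicate fell for that row.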
