redis - Tag huge list of elements with lat/long with large list of geolocation data
I have a huge list of geolocation events:
event (1 billion rows): id, datetime, lat, long
and a list of points of interest loaded from OpenStreetMap:
poi (1 million rows): id, tag (shop, restaurant, etc.), lat, long
I want to assign each event the tag of a point of interest. What is the best architecture to achieve this? I tried Google BigQuery, but I have to do a cross join, and that does not work. I am open to using any other big data system.
Using Dataflow you can do a cross join pretty easily with CoGroupByKey. With this approach the events and POIs you are joining do not need to fit in memory (Dataflow automatically spills to disk if the list of items for a given key is too large to fit in memory).
Here's more detail.
- Create a PCollection of events keyed by latitude and longitude.
- Create a PCollection of POIs keyed by latitude and longitude.
- Use CoGroupByKey to join the two PCollections.
- Write a DoFn that processes the CoGbkResult.
The DoFn would look something like:
    PCollection<T> finalResultCollection =
        coGbkResultCollection.apply(ParDo.of(
            new DoFn<KV<K, CoGbkResult>, T>() {
              @Override
              public void processElement(ProcessContext c) {
                KV<K, CoGbkResult> e = c.element();
                // All values from collection 1 (events) for this key
                Iterable<Event> eventVals = e.getValue().getAll(eventTag);
                // All values from collection 2 (POIs) for this key
                Iterable<Poi> poiVals = e.getValue().getAll(poiTag);
                // Pair every event with every POI that shares the key
                for (Event event : eventVals) {
                  for (Poi p : poiVals) {
                    ...
                    c.output(...tagged event...);
                  }
                }
              }
            }));
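To make the keyed join concrete, here is a minimal plain-Java sketch of the same idea with no Dataflow dependency. It is an illustration under assumptions, not the pipeline itself. The answer only says "keyed by latitude and longitude"; since exact coordinates rarely coincide, this sketch assumes both sides are keyed by a coarse grid cell. The names `CoGroupSketch`, `cellKey`, `tagEvents` and the cell size `CELL_DEGREES` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of a keyed join; not Dataflow code.
class CoGroupSketch {
    // Hypothetical cell size in degrees (an assumption, not from the answer).
    static final double CELL_DEGREES = 0.01;

    static final class Event {
        final long id;
        final double lat;
        final double lon;
        Event(long id, double lat, double lon) {
            this.id = id;
            this.lat = lat;
            this.lon = lon;
        }
    }

    static final class Poi {
        final long id;
        final String tag;
        final double lat;
        final double lon;
        Poi(long id, String tag, double lat, double lon) {
            this.id = id;
            this.tag = tag;
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Map a coordinate pair to a grid-cell key so that nearby points share a key.
    static String cellKey(double lat, double lon) {
        long row = (long) Math.floor(lat / CELL_DEGREES);
        long col = (long) Math.floor(lon / CELL_DEGREES);
        return row + ":" + col;
    }

    // Group both sides by key, then pair every event with every POI in its cell.
    // Each result is formatted as "eventId,tag".
    static List<String> tagEvents(List<Event> events, List<Poi> pois) {
        Map<String, List<Event>> eventsByKey = new HashMap<>();
        for (Event e : events) {
            eventsByKey.computeIfAbsent(cellKey(e.lat, e.lon), k -> new ArrayList<>()).add(e);
        }
        Map<String, List<Poi>> poisByKey = new HashMap<>();
        for (Poi p : pois) {
            poisByKey.computeIfAbsent(cellKey(p.lat, p.lon), k -> new ArrayList<>()).add(p);
        }
        List<String> tagged = new ArrayList<>();
        for (Map.Entry<String, List<Event>> group : eventsByKey.entrySet()) {
            List<Poi> sameCell = poisByKey.getOrDefault(group.getKey(), Collections.emptyList());
            for (Event e : group.getValue()) {
                for (Poi p : sameCell) {
                    tagged.add(e.id + "," + p.tag);
                }
            }
        }
        return tagged;
    }

    public static void main(String[] args) {
        List<Event> events = new ArrayList<>();
        events.add(new Event(1, 48.8566, 2.3522));   // shares a cell with the POI below
        events.add(new Event(2, 40.7128, -74.0060)); // no POI in its cell
        List<Poi> pois = new ArrayList<>();
        pois.add(new Poi(10, "restaurant", 48.8570, 2.3530));
        System.out.println(tagEvents(events, pois));
    }
}
```

One caveat of this keying choice: an event and a POI that sit on opposite sides of a cell border get different keys, so a real pipeline would likely also need to check neighbouring cells.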
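As discussed in another answer, you could also use a side input to pass a map whose keys are latitude and longitude and whose values are the details of the POI. This approach works only if the data fits in memory. If you have just 1 million POIs and store only the fields you listed, it should fit in memory.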
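The map-lookup idea behind the side-input approach can be sketched in plain Java as follows. This is only an illustration of the lookup itself, not the Dataflow side-input API. The class `SideInputSketch`, the helpers `buildTagMap` and `lookupTag`, the grid-cell key, and the `"unknown"` fallback are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration of joining against an in-memory map; not Dataflow code.
class SideInputSketch {
    // Hypothetical cell size in degrees (roughly 1 km north-south).
    static final double CELL_DEGREES = 0.01;

    // Same keying assumption as a coarse grid: nearby points share a key.
    static String cellKey(double lat, double lon) {
        long row = (long) Math.floor(lat / CELL_DEGREES);
        long col = (long) Math.floor(lon / CELL_DEGREES);
        return row + ":" + col;
    }

    // Build the small side of the join once. Each row is {lat, lon}; tags[i] is its tag.
    // If several POIs fall in one cell, the first one wins in this sketch.
    static Map<String, String> buildTagMap(double[][] poiCoords, String[] tags) {
        Map<String, String> tagByCell = new HashMap<>();
        for (int i = 0; i < poiCoords.length; i++) {
            tagByCell.putIfAbsent(cellKey(poiCoords[i][0], poiCoords[i][1]), tags[i]);
        }
        return tagByCell;
    }

    // Per-event work is a single hash lookup against the in-memory map.
    static String lookupTag(Map<String, String> tagByCell, double lat, double lon) {
        return tagByCell.getOrDefault(cellKey(lat, lon), "unknown");
    }

    public static void main(String[] args) {
        double[][] poiCoords = {{48.8570, 2.3530}, {51.5074, -0.1278}};
        String[] tags = {"restaurant", "shop"};
        Map<String, String> tagByCell = buildTagMap(poiCoords, tags);
        System.out.println(lookupTag(tagByCell, 48.8566, 2.3522));
        System.out.println(lookupTag(tagByCell, 40.7128, -74.0060));
    }
}
```

The design trade-off matches the text: the large side (events) streams through one element at a time, while the small side (POIs) must fit entirely in memory.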
Note: I'm on the Dataflow team.