redis - Tag huge list of elements with lat/long with large list of geolocation data


I have a huge list of geolocation events:

    event (1 billion)
    ------
    id
    datetime
    lat
    long

And a list of points of interest loaded from OpenStreetMap:

    poi (1 million)
    ------
    id
    tag (shop, restaurant, etc.)
    lat
    long

I want to assign to each event the tag of its point of interest. What is the best architecture to achieve this? I tried using Google BigQuery, but that requires a cross join, and it does not work. I am open to using any other big data system.

Using Dataflow, you can do a cross join pretty easily with CoGroupByKey. With this approach, the events and POIs you are joining do not need to fit in memory (Dataflow will automatically spill to disk if the list of items for a given key is too large to fit in memory).

Here's more detail.

  • Create a PCollection of events keyed by latitude and longitude.
  • Create a PCollection of POIs keyed by latitude and longitude.
  • Use CoGroupByKey to join the two PCollections.
  • Write a DoFn that processes the CoGbkResult.
  • The DoFn would look something like this:

     PCollection<T> finalResultCollection = coGbkResultCollection.apply(ParDo.of(
       new DoFn<KV<K, CoGbkResult>, T>() {
         @Override
         public void processElement(ProcessContext c) {
           KV<K, CoGbkResult> e = c.element();
           // Collection 1 values: all events that share this key
           Iterable<Event> eventVals = e.getValue().getAll(eventTag);
           // Collection 2 values: all POIs that share this key
           Iterable<Poi> poiVals = e.getValue().getAll(poiTag);
           // Pair every event with every POI under the same key
           for (Event event : eventVals) {
             for (Poi p : poiVals) {
               ...
               c.output(...tagged event...);
             }
           }
         }
       }));
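To make the shape of the join concrete, here is a minimal plain-Java sketch that mimics the steps above without any Dataflow classes. It assumes coordinates are snapped to a grid cell so that nearby events and POIs end up under the same key. The cell size, the `cellKey` helper, and the `Event` and `Poi` classes are hypothetical names chosen for illustration; none of them come from the Dataflow SDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration of a keyed join; no Dataflow classes are used. */
class KeyedJoinSketch {

    // Hypothetical grid cell size in degrees (an assumption, not from the answer).
    static final double CELL_DEGREES = 0.01;

    static final class Event {
        final long id;
        final double lat;
        final double lon;
        Event(long id, double lat, double lon) {
            this.id = id; this.lat = lat; this.lon = lon;
        }
    }

    static final class Poi {
        final String tag;
        final double lat;
        final double lon;
        Poi(String tag, double lat, double lon) {
            this.tag = tag; this.lat = lat; this.lon = lon;
        }
    }

    /** Snaps a coordinate pair to a grid cell and returns the cell as a string key. */
    static String cellKey(double lat, double lon) {
        long row = (long) Math.floor(lat / CELL_DEGREES);
        long col = (long) Math.floor(lon / CELL_DEGREES);
        return row + ":" + col;
    }

    /** Groups both inputs by cell key, then pairs events and POIs that share a key. */
    static List<String> tagEvents(List<Event> events, List<Poi> pois) {
        Map<String, List<Event>> eventsByKey = new TreeMap<>();
        for (Event e : events) {
            eventsByKey.computeIfAbsent(cellKey(e.lat, e.lon), k -> new ArrayList<>()).add(e);
        }
        Map<String, List<Poi>> poisByKey = new TreeMap<>();
        for (Poi p : pois) {
            poisByKey.computeIfAbsent(cellKey(p.lat, p.lon), k -> new ArrayList<>()).add(p);
        }
        List<String> tagged = new ArrayList<>();
        for (Map.Entry<String, List<Event>> entry : eventsByKey.entrySet()) {
            List<Poi> sameCell =
                poisByKey.getOrDefault(entry.getKey(), Collections.<Poi>emptyList());
            for (Event e : entry.getValue()) {
                for (Poi p : sameCell) {
                    tagged.add(e.id + "=" + p.tag); // one output per (event, POI) pair
                }
            }
        }
        return tagged;
    }
}
```

The nested loop only runs over items that share a key, which is why the work stays far below a full 1 billion by 1 million cross product. With grid keys like these, an event close to a cell border can miss a POI in the neighbouring cell, so a real job would probably also check adjacent cells.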

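As discussed in another answer, you could also use a side input to pass a map whose keys are latitude and longitude and whose values are the details of the POI. This approach works only if the data can fit in memory. If you have only 1 million POIs and store just the fields listed, they will probably fit in memory.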
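The lookup side of that idea can be sketched in plain Java as an in-memory map from a location key to POI tags. This is only an illustration of the data structure, not Dataflow code: `SideInputSketch`, `buildPoiIndex`, `tagsFor`, and the grid-cell key are all hypothetical, and in a real pipeline the map would be handed to the workers as a side input rather than built locally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java stand-in for the side-input approach; no Dataflow classes are used. */
class SideInputSketch {

    // Hypothetical grid cell size in degrees (an assumption for illustration).
    static final double CELL_DEGREES = 0.01;

    /** Snaps a coordinate pair to a grid cell and returns the cell as a string key. */
    static String cellKey(double lat, double lon) {
        long row = (long) Math.floor(lat / CELL_DEGREES);
        long col = (long) Math.floor(lon / CELL_DEGREES);
        return row + ":" + col;
    }

    /** Builds the lookup table: cell key -> tags of the POIs inside that cell. */
    static Map<String, List<String>> buildPoiIndex(double[] lats, double[] lons, String[] tags) {
        Map<String, List<String>> index = new HashMap<>();
        for (int i = 0; i < tags.length; i++) {
            index.computeIfAbsent(cellKey(lats[i], lons[i]), k -> new ArrayList<>()).add(tags[i]);
        }
        return index;
    }

    /** Returns the tags for one event, or an empty list when no POI shares its cell. */
    static List<String> tagsFor(Map<String, List<String>> poiIndex, double lat, double lon) {
        return poiIndex.getOrDefault(cellKey(lat, lon), Collections.<String>emptyList());
    }
}
```

Each event then costs one hash lookup instead of a scan over every POI. As a rough sizing check, at an assumed 100 bytes per entry, 1 million POIs would come to about 100 MB.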

Note: I'm on the Dataflow team.

