Invert map and reduceByKey in Spark-Scala
I have a CSV dataset that I want to process with Spark. The second column is a timestamp of the format:

yyyy-mm-dd hh:mm:ss

I want to group the rows by each mm-dd:
```scala
val days: RDD[String] = sc.textFile(<csv file>)
val partitioned = days.map(row => {
  row.split(",")(1).substring(5, 10)
}).invertTheMap.groupOrReduceByKey
```
The result of groupOrReduceByKey should be of the form:

```
("mm-dd" -> (row1, row2, row3, ..., row_n))
```

How should I implement invertTheMap and groupOrReduceByKey?
I saw this done in Python here, and wonder how it is done in Scala.
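As a side note, the substring indices above depend on whether the CSV writes a space after each comma: with no leading space, mm-dd sits at indices 5..9 of the timestamp; with a leading space, everything shifts by one. A quick check on plain strings (hypothetical sample values, not from the original dataset):

```scala
object SubstringDemo {
  def main(args: Array[String]): Unit = {
    // "yyyy-mm-dd hh:mm:ss" with no leading space: mm-dd is substring(5, 10)
    val clean = "2001-09-29 12:34:56"
    println(clean.substring(5, 10))   // prints 09-29

    // With a leading space (CSV written as "a, 2001-09-29 ..."),
    // the split leaves " 2001-09-29 ...", so use substring(6, 11)
    val padded = " 2001-09-29 12:34:56"
    println(padded.substring(6, 11))  // prints 09-29
  }
}
```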
This should do the trick:
```scala
val testData = List("a, 1987-09-30", "a, 2001-09-29", "b, 2002-09-30")
val input = sc.parallelize(testData)
val grouped = input.map { row =>
  val columns = row.split(",")
  (columns(1).substring(6, 11), row)
}.groupByKey()
grouped.foreach(println)
```
The output is:
```
(09-29,CompactBuffer(a, 2001-09-29))
(09-30,CompactBuffer(a, 1987-09-30, b, 2002-09-30))
```
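Since the question also mentions reduceByKey: on an RDD you could instead map each row to `(key, List(row))` and call `reduceByKey(_ ++ _)`, which merges the per-key lists as it shuffles. The merging logic itself can be sketched on plain Scala collections (no SparkContext needed; `groupBy` here stands in for the shuffle-by-key step):

```scala
object ReduceByKeyDemo {
  def main(args: Array[String]): Unit = {
    val testData = List("a, 1987-09-30", "a, 2001-09-29", "b, 2002-09-30")

    // Map each row to (mm-dd key, single-element list) —
    // on an RDD this would be the map before reduceByKey
    val pairs = testData.map { row =>
      val columns = row.split(",")
      (columns(1).substring(6, 11), List(row))
    }

    // Local equivalent of reduceByKey(_ ++ _):
    // collect values sharing a key, then concatenate their lists
    val grouped = pairs
      .groupBy(_._1)
      .map { case (key, kvs) => (key, kvs.map(_._2).reduce(_ ++ _)) }

    grouped.foreach(println)
  }
}
```

With Spark this would be `input.map { ... (key, List(row)) }.reduceByKey(_ ++ _)`; note that for plain grouping, groupByKey as in the answer above is simpler, while reduceByKey pays off when the per-key combine step actually shrinks the data.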