Invert map and reduceByKey in Spark-Scala
I have a CSV dataset that I want to process with Spark. The second column is a timestamp of the format:

yyyy-mm-dd hh:mm:ss

I want to group the rows by each mm-dd:
```scala
val days: RDD[String] = sc.textFile(<csv file>)
val partitioned = days.map(row => {
  row.split(",")(1).substring(5, 10)
}).invertTheMap.groupOrReduceByKey
```
The result of groupOrReduceByKey should be of the form:

```
("mm-dd" -> (row1, row2, row3, ..., row_n))
```

How should I implement invertTheMap and groupOrReduceByKey?
I saw this done in Python here, and wonder how it is done in Scala.
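As a side note, the substring indices above depend on whether the CSV writes a space after each comma: with no leading space, mm-dd sits at indices 5..9 of the timestamp; with a leading space, everything shifts by one. A quick check on plain strings (hypothetical sample values, not from the original dataset):

```scala
object SubstringDemo {
  def main(args: Array[String]): Unit = {
    // "yyyy-mm-dd hh:mm:ss" with no leading space: mm-dd is substring(5, 10)
    val clean = "2001-09-29 12:34:56"
    println(clean.substring(5, 10))   // prints 09-29

    // With a leading space (CSV written as "a, 2001-09-29 ..."),
    // the split leaves " 2001-09-29 ...", so use substring(6, 11)
    val padded = " 2001-09-29 12:34:56"
    println(padded.substring(6, 11))  // prints 09-29
  }
}
```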
This should do the trick:
```scala
val testData = List("a, 1987-09-30", "a, 2001-09-29", "b, 2002-09-30")
val input = sc.parallelize(testData)
val grouped = input.map { row =>
  val columns = row.split(",")
  (columns(1).substring(6, 11), row)
}.groupByKey()
grouped.foreach(println)
```
The output is:
```
(09-29,CompactBuffer(a, 2001-09-29))
(09-30,CompactBuffer(a, 1987-09-30, b, 2002-09-30))
```
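Since the question also mentions reduceByKey: on an RDD you could instead map each row to `(key, List(row))` and call `reduceByKey(_ ++ _)`, which merges the per-key lists as it shuffles. The merging logic itself can be sketched on plain Scala collections (no SparkContext needed; `groupBy` here stands in for the shuffle-by-key step):

```scala
object ReduceByKeyDemo {
  def main(args: Array[String]): Unit = {
    val testData = List("a, 1987-09-30", "a, 2001-09-29", "b, 2002-09-30")

    // Map each row to (mm-dd key, single-element list) —
    // on an RDD this would be the map before reduceByKey
    val pairs = testData.map { row =>
      val columns = row.split(",")
      (columns(1).substring(6, 11), List(row))
    }

    // Local equivalent of reduceByKey(_ ++ _):
    // collect values sharing a key, then concatenate their lists
    val grouped = pairs
      .groupBy(_._1)
      .map { case (key, kvs) => (key, kvs.map(_._2).reduce(_ ++ _)) }

    grouped.foreach(println)
  }
}
```

With Spark this would be `input.map { ... (key, List(row)) }.reduceByKey(_ ++ _)`; note that for plain grouping, groupByKey as in the answer above is simpler, while reduceByKey pays off when the per-key combine step actually shrinks the data.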