apache spark - Combine output of parallel operations using Scala
The following snippet of code processes the filters in parallel and writes out individual files into the output directory. Is there a way to produce one large output file instead?
Array(
  (filter1, outputPathBase + fileName1),
  (filter2, outputPathBase + fileName2),
  (filter3, outputPathBase + fileName3)
).par.foreach { case (extract, path) =>
  extract.coalesce(1).write.mode("append").csv(path)
}
Thank you.
You can reduce the array into a single RDD by unioning them, and leave it to Spark to parallelize the execution of each filter*:
val rdd = Array(filter1, filter2, filter3).reduce(_.union(_))
rdd.write.mode("append").csv(path)
There is no need in this case to convert the Array into a ParArray.
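To make the approach concrete, here is a minimal, self-contained sketch of the whole pipeline, assuming the filter* values are DataFrames built from a CSV input. The input/output paths, the status column, and the app name are made up for illustration; coalesce(1) on the combined result is what yields the single output file the question asks for.

import org.apache.spark.sql.{SaveMode, SparkSession}

object CombineFilterOutput {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("combine-filter-output") // hypothetical app name
      .master("local[*]")
      .getOrCreate()

    // Hypothetical input and filters standing in for filter1..filter3.
    val df = spark.read.option("header", "true").csv("/tmp/input")
    val filter1 = df.filter(df("status") === "A")
    val filter2 = df.filter(df("status") === "B")
    val filter3 = df.filter(df("status") === "C")

    // union is lazy: the single write action below triggers one Spark job
    // that evaluates all three filter branches, so no explicit .par is needed.
    val combined = Array(filter1, filter2, filter3).reduce(_.union(_))

    // coalesce(1) collapses the result to one partition, so the output
    // directory contains a single part file.
    combined.coalesce(1)
      .write.mode(SaveMode.Append)
      .csv("/tmp/output")

    spark.stop()
  }
}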
I am assuming filter1, filter2, and filter3 are of the same type, RDD[T]. (Strictly speaking, since the snippet uses the .write API, they would be Datasets or DataFrames, which support union in the same way; plain RDDs are written with saveAsTextFile instead.)
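If the filter* values really are RDD[T] rather than Datasets, the same idea applies via SparkContext.union, with saveAsTextFile as the writer. A hedged sketch, assuming an existing SparkContext and elements that are already CSV-formatted strings:

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Combine any number of RDDs and write them out as one file.
// Assumes the rows are already formatted as CSV lines.
def combineAndSave(sc: SparkContext, rdds: Seq[RDD[String]], path: String): Unit =
  sc.union(rdds)          // one union over the whole sequence
    .coalesce(1)          // single partition => single part file
    .saveAsTextFile(path) // RDDs have no .write API

SparkContext.union avoids building a deep chain of binary unions when the number of RDDs is large.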