apache spark - Combine output of parallel operations using Scala
The following snippet of code processes the filters in parallel and writes out individual files into the output directory. Is there a way to produce one large output file instead?
Array(
  (filter1, outputPathBase + fileName1),
  (filter2, outputPathBase + fileName2),
  (filter3, outputPathBase + fileName3)
).par.foreach { case (extract, path) =>
  extract.coalesce(1).write.mode("append").csv(path)
}
Thank you.
You can reduce the array into a single RDD by unioning them, and leave it to Spark to parallelize the execution of each filter*:
val rdd = Array(filter1, filter2, filter3).reduce(_.union(_))
rdd.write.mode("append").csv(path)
There is no need in this case to convert the Array into a ParArray.
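To make the approach concrete, here is a minimal, self-contained sketch of the whole pipeline, assuming the filter* values are DataFrames built from a CSV input. The input/output paths, the status column, and the app name are made up for illustration; coalesce(1) on the combined result is what yields the single output file the question asks for.

import org.apache.spark.sql.{SaveMode, SparkSession}

object CombineFilterOutput {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("combine-filter-output") // hypothetical app name
      .master("local[*]")
      .getOrCreate()

    // Hypothetical input and filters standing in for filter1..filter3.
    val df = spark.read.option("header", "true").csv("/tmp/input")
    val filter1 = df.filter(df("status") === "A")
    val filter2 = df.filter(df("status") === "B")
    val filter3 = df.filter(df("status") === "C")

    // union is lazy: the single write action below triggers one Spark job
    // that evaluates all three filter branches, so no explicit .par is needed.
    val combined = Array(filter1, filter2, filter3).reduce(_.union(_))

    // coalesce(1) collapses the result to one partition, so the output
    // directory contains a single part file.
    combined.coalesce(1)
      .write.mode(SaveMode.Append)
      .csv("/tmp/output")

    spark.stop()
  }
}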
I am assuming filter1, filter2, and filter3 are of the same type, RDD[T]. (Strictly speaking, since the snippet uses the .write API, they would be Datasets or DataFrames, which support union in the same way; plain RDDs are written with saveAsTextFile instead.)
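If the filter* values really are RDD[T] rather than Datasets, the same idea applies via SparkContext.union, with saveAsTextFile as the writer. A hedged sketch, assuming an existing SparkContext and elements that are already CSV-formatted strings:

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Combine any number of RDDs and write them out as one file.
// Assumes the rows are already formatted as CSV lines.
def combineAndSave(sc: SparkContext, rdds: Seq[RDD[String]], path: String): Unit =
  sc.union(rdds)          // one union over the whole sequence
    .coalesce(1)          // single partition => single part file
    .saveAsTextFile(path) // RDDs have no .write API

SparkContext.union avoids building a deep chain of binary unions when the number of RDDs is large.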