apache spark - Combine output of parallel operations using Scala


The following snippet of code processes the filters in parallel and writes out individual files to the output directory. Is there a way to get one large output file instead?

Array(
  (filter1, outputPathBase + fileName),
  (filter2, outputPathBase + fileName),
  (filter3, outputPathBase + fileName)
).par.foreach {
  case (extract, path) => extract.coalesce(1).write.mode("append").csv(path)
}

Thank you.

You can reduce the array to a single Dataset by unioning its elements, and let Spark parallelize the execution of each filter*:

val combined = Array(
  filter1,
  filter2,
  filter3
).reduce(_.union(_))

combined.write.mode("append").csv(path)
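
Note that the union still produces one part file per partition inside the output directory. If you need a single CSV file, as in the original snippet, coalesce to one partition before writing (combined and path refer to the code above):

combined.coalesce(1).write.mode("append").csv(path)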

In that case there is no need to convert the Array to a ParArray: Spark already parallelizes the work across partitions, so adding driver-side parallelism with .par buys you nothing here.

I'm assuming filter1, filter2 and filter3 are all of the same type, Dataset[T] (the .write API belongs to Datasets/DataFrames rather than RDDs).
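
For completeness, here is a self-contained sketch of the whole approach; the input path, column name, and filter predicates are hypothetical stand-ins, not from the original question:

import org.apache.spark.sql.SparkSession

object CombineFilters {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CombineFilters")
      .master("local[*]") // local master for the sketch only
      .getOrCreate()

    // Hypothetical source data; the original question starts from
    // already-existing filtered Datasets.
    val df = spark.read.option("header", "true").csv("input/events.csv")

    // Hypothetical stand-ins for filter1/filter2/filter3.
    val filter1 = df.filter(df("status") === "ok")
    val filter2 = df.filter(df("status") === "warn")
    val filter3 = df.filter(df("status") === "error")

    // Union the filtered Datasets and write a single CSV file.
    val combined = Array(filter1, filter2, filter3).reduce(_.union(_))
    combined.coalesce(1).write.mode("append").csv("output/combined")

    spark.stop()
  }
}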

