design patterns - Apache Spark distributed SQL


I use Spark's DataFrameReader to run SQL queries against a database. Each query requires a SparkSession. What I would like to do is this: for each of my JavaPairRDDs, perform a map that invokes an SQL query with parameters taken from that RDD. This means I need to pass the SparkSession into each lambda, which seems like bad design. What is the common approach to such problems?

It could look like this:

roots.map(r -> dbLoader.getData(sparkSession, r._1));

This is how I load the data now:

JavaRDD<Row> javaRDD = sparkSession.read().format("jdbc")
            .options(options)
            .load()
            .javaRDD();

The purpose of big data is to have data locality: to be able to execute your code where your data resides. It is OK to do one big load of a table into memory or onto local disk (cache/persist), but continuous remote JDBC queries will defeat the purpose.
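A minimal sketch of that idea, under a few assumptions: the table is read once over JDBC and cached, and the per-element lookup is replaced by a single join, so no SparkSession has to be captured in a lambda (a SparkSession lives on the driver and is not meant to be used inside code that runs on executors). The class name `BulkLoadJoin`, the method names, and the `keyColumn` parameter are hypothetical; `options` stands for the same JDBC options map as in the question, and the roots are assumed to be available as a `Dataset<Row>` that shares a key column with the table.

```java
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class BulkLoadJoin {

    // Runs on the driver: one JDBC read for the whole table. cache() marks the
    // result for reuse, so later actions do not go back to the database.
    static Dataset<Row> loadTable(SparkSession spark, Map<String, String> options) {
        return spark.read()
                .format("jdbc")
                .options(options)
                .load()
                .cache();
    }

    // Replaces the per-element lookup inside roots.map(...): all root keys are
    // matched against the loaded table in one distributed inner join on keyColumn.
    static Dataset<Row> dataForRoots(Dataset<Row> roots, Dataset<Row> table, String keyColumn) {
        return roots.join(table, keyColumn);
    }
}
```

If the roots only exist as a JavaPairRDD, their keys would first have to be turned into a Dataset with a column named like the table's key column. Whether the full table fits in memory or on local disk depends on its size, so this is only one possible shape of the "load once, then work locally" approach.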

