hadoop - Which version of Spark to download?


I understand that I can download the Spark source code (1.5.1), or prebuilt binaries for various versions of Hadoop. As of Oct 2015, the Spark webpage http://spark.apache.org/downloads.html has prebuilt binaries against Hadoop 2.6+, 2.4+, 2.3, and 1.x.

I'm not sure which version to download.

I want to run a Spark cluster in standalone mode using AWS machines.

<edit>

I am running a 24/7 streaming process. The data is coming from a Kafka stream. I thought about using spark-ec2, but since I already have persistent EC2 machines, I thought I might use them.

My understanding is that since the persistent workers need to perform checkpoint(), they need access to some kind of shared file system with the master node. S3 seems like the logical choice.
</edit>
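The shared-storage reasoning above can be made concrete with a small check. In PySpark the checkpoint location is set with `StreamingContext.checkpoint(directory)`, and the Spark Streaming guide recommends a fault-tolerant, reliable file system such as HDFS or S3 for it. The hypothetical helper `is_shared_checkpoint_dir` below is a minimal sketch, assuming that a bare path or a `file://` URI resolves to each machine's own local disk, and that the listed schemes (an illustrative, not exhaustive, set) name storage that the driver and every worker can reach.

```python
from urllib.parse import urlparse

# Hypothetical, illustrative set of schemes backed by storage that all
# nodes can reach; adjust for your own cluster.
SHARED_SCHEMES = {"s3n", "s3a", "s3", "hdfs"}


def is_shared_checkpoint_dir(uri: str) -> bool:
    """Return True if `uri` names a file system visible to every node.

    A bare path (no scheme) or a file:// URI refers to the local disk of
    whichever machine resolves it, so checkpoint data written there would
    not be readable by the rest of the cluster after a failure.
    """
    scheme = urlparse(uri).scheme.lower()
    return scheme in SHARED_SCHEMES
```

For example, `is_shared_checkpoint_dir("s3n://my-bucket/checkpoints")` is true while `is_shared_checkpoint_dir("/tmp/checkpoints")` is false; the bucket name is a placeholder.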

This means I need access to S3, but not HDFS. I do not have Hadoop installed.

I got the pre-built Spark for Hadoop 2.6. I can run it in local mode, for instance the wordcount example. However, whenever I start it up, I get this message:

WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Is this a problem? Do I need Hadoop?

<edit>

It's not a show stopper, but I want to make sure I understand the reason for the warning message. Under the assumption that Spark doesn't need Hadoop, why is it showing up? </edit>
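The message is logged by Hadoop's `org.apache.hadoop.util.NativeCodeLoader` class. Spark does not need a Hadoop installation, but it does depend on the Hadoop client libraries for file system access (including `s3n://`), and the pre-built packages bundle them; that is what "pre-built for Hadoop 2.6" refers to. At startup those libraries look for the native `libhadoop` library, and when it is missing they fall back to pure-Java implementations, as the warning itself says. If the goal is only to hide the line, one common approach is a per-logger override in Spark's `conf/log4j.properties` (Spark 1.x logs through log4j 1.2 and ships `conf/log4j.properties.template` as a starting point). The hypothetical helper below is a minimal sketch that returns properties text with that override appended exactly once.

```python
# Logger name of the Hadoop class that emits the native-library warning.
NATIVE_LOADER_LOGGER = "org.apache.hadoop.util.NativeCodeLoader"
# log4j 1.x per-logger level override: only ERROR and above are shown.
OVERRIDE_LINE = f"log4j.logger.{NATIVE_LOADER_LOGGER}=ERROR"


def with_native_warning_silenced(props_text: str) -> str:
    """Return log4j properties text that hides NativeCodeLoader warnings.

    Existing lines are kept as-is; the override is appended only if it is
    not already present, so calling this repeatedly is safe.
    """
    lines = props_text.splitlines()
    if OVERRIDE_LINE not in lines:
        lines.append(OVERRIDE_LINE)
    return "\n".join(lines) + "\n"
```

To apply it (assuming you work from SPARK_HOME), read `conf/log4j.properties` (or a copy of the template), pass its contents through `with_native_warning_silenced`, and write the result back. This only changes logging; the Java fallback is still used.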

We're running Spark on EC2 against S3 (via the s3n file system). We had an issue with the pre-built versions for Hadoop 2.x; regrettably, I don't remember what the issue was. In the end, we're running the pre-built Spark for Hadoop 1.x, and it works great.
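For a setup like the one in the question (a standalone cluster on persistent EC2 machines, reading from and checkpointing to S3 over s3n), the job is launched with `spark-submit` pointed at the standalone master. Spark copies any property prefixed with `spark.hadoop.` into the Hadoop configuration it uses for file systems, so the s3n credential keys `fs.s3n.awsAccessKeyId` and `fs.s3n.awsSecretAccessKey` can be supplied that way. The hypothetical helper below only builds the argument list; the host name, the application file, the extra `--packages` coordinates, and the convention of passing the checkpoint directory as the application's first argument are all assumptions for this sketch (7077 is the standalone master's default port).

```python
from typing import List, Optional


def build_spark_submit_args(
    master_host: str,
    app_file: str,
    checkpoint_dir: str,
    access_key: str,
    secret_key: str,
    master_port: int = 7077,
    packages: Optional[List[str]] = None,
) -> List[str]:
    """Return an argv list for spark-submit against a standalone master.

    S3 credentials are passed as spark.hadoop.* properties, which Spark
    copies into the Hadoop configuration used by the s3n:// file system.
    Passing the checkpoint directory as the app's first argument is an
    assumed convention of this sketch, not something Spark requires.
    """
    args = [
        "spark-submit",
        "--master", f"spark://{master_host}:{master_port}",
        "--conf", f"spark.hadoop.fs.s3n.awsAccessKeyId={access_key}",
        "--conf", f"spark.hadoop.fs.s3n.awsSecretAccessKey={secret_key}",
    ]
    if packages:
        # Maven coordinates (group:artifact:version), comma-separated.
        args += ["--packages", ",".join(packages)]
    # Application file first, then the application's own arguments.
    args += [app_file, checkpoint_dir]
    return args
```

The returned list can be passed to `subprocess.run(args, check=True)` on a machine that has `spark-submit` on its PATH. Credentials given on the command line are visible to other users through the process list, so keeping them in `conf/spark-defaults.conf` with restricted permissions may be preferable.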

