hadoop - Which version of Spark should I download?
I understand that I can download the Spark source code (1.5.1) or prebuilt binaries for various versions of Hadoop. As of Oct 2015, the Spark download page http://spark.apache.org/downloads.html offers prebuilt binaries against Hadoop 2.6+, 2.4+, 2.3, and 1.x.
I'm not sure which version to download.
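Each prebuilt option on that page is a separate archive whose name encodes the Hadoop line it was built against. The minimal sketch below maps the four options listed above to archive names and an archive URL. It assumes the `spark-<version>-bin-<profile>.tgz` naming scheme and the `archive.apache.org/dist/spark` layout; both are assumptions to check against the actual download page, and the helper names are hypothetical.

```python
# Sketch: map the Hadoop lines offered on the Spark download page to
# prebuilt archive names. ASSUMPTION: the "spark-<version>-bin-<profile>.tgz"
# naming scheme and the archive.apache.org layout; verify before use.

ARCHIVE_ROOT = "https://archive.apache.org/dist/spark"  # assumed archive root

# Hadoop line as listed on the download page -> assumed package suffix
PREBUILT_PROFILES = {
    "2.6+": "hadoop2.6",
    "2.4+": "hadoop2.4",
    "2.3": "hadoop2.3",
    "1.x": "hadoop1",
}


def prebuilt_package(spark_version: str, hadoop_line: str) -> str:
    """Return the archive file name of a prebuilt Spark binary."""
    try:
        profile = PREBUILT_PROFILES[hadoop_line]
    except KeyError:
        raise ValueError(f"no prebuilt package for Hadoop {hadoop_line!r}") from None
    return f"spark-{spark_version}-bin-{profile}.tgz"


def download_url(spark_version: str, hadoop_line: str) -> str:
    """Return an assumed archive URL for the prebuilt package."""
    name = prebuilt_package(spark_version, hadoop_line)
    return f"{ARCHIVE_ROOT}/spark-{spark_version}/{name}"
```

For example, `download_url("1.5.1", "2.6+")` yields `https://archive.apache.org/dist/spark/spark-1.5.1/spark-1.5.1-bin-hadoop2.6.tgz`. When there is no existing Hadoop cluster to match, the choice mainly decides which Hadoop client libraries (and therefore which file system implementations) are bundled with Spark.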
I want to run a Spark cluster in standalone mode using AWS machines.
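In Spark 1.x standalone mode, the master is started with `sbin/start-master.sh`, the workers listed one host per line in `conf/slaves` are started over SSH with `sbin/start-slaves.sh`, and applications connect to `spark://<master-host>:7077` (7077 is the default master port). The sketch below only renders those two pieces for a set of persistent EC2 hosts; the host names and helper names are hypothetical.

```python
# Sketch: render the standalone-mode master URL and the conf/slaves file
# for a fixed set of EC2 machines. Host names are hypothetical placeholders.
from typing import List

DEFAULT_MASTER_PORT = 7077  # default port of the Spark standalone master


def master_url(master_host: str, port: int = DEFAULT_MASTER_PORT) -> str:
    """URL that drivers pass as the master, e.g. to spark-submit --master."""
    return f"spark://{master_host}:{port}"


def slaves_file(worker_hosts: List[str]) -> str:
    """Contents for conf/slaves: one worker host per line, blanks dropped."""
    hosts = [h.strip() for h in worker_hosts if h.strip()]
    if not hosts:
        raise ValueError("at least one worker host is required")
    return "\n".join(hosts) + "\n"
```

Writing the result of `slaves_file([...])` to `conf/slaves` on the master and running the two `sbin` scripts there assumes passwordless SSH from the master to every worker.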
<edit>
I'm running a 24/7 streaming process, with data coming from a Kafka stream. I thought about using spark-ec2, but since I already have persistent EC2 machines, I thought I might use them.
My understanding is that, since I have persistent workers, I need to perform checkpoint(), which needs access to some kind of file system shared with the master node. S3 seems like the logical choice.
</edit>
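Checkpointing to S3 without installing Hadoop comes down to giving Spark an `s3n://` checkpoint URI and the credentials Hadoop's s3n file system reads (`fs.s3n.awsAccessKeyId` and `fs.s3n.awsSecretAccessKey`). Spark copies any `spark.hadoop.*` property into the Hadoop configuration it uses, so the keys can be passed as Spark settings. The minimal sketch below only builds the URI and the setting pairs; the bucket, prefix, keys and helper name are hypothetical placeholders.

```python
# Sketch: build an s3n checkpoint URI and the Spark settings that carry
# s3n credentials into Hadoop's configuration ("spark.hadoop.*" prefix).
# Bucket, prefix and keys are placeholders supplied by the caller.

S3N_KEY_ID = "spark.hadoop.fs.s3n.awsAccessKeyId"
S3N_SECRET = "spark.hadoop.fs.s3n.awsSecretAccessKey"


def s3n_checkpoint_settings(bucket, prefix, access_key, secret_key):
    """Return (checkpoint_uri, conf_pairs) for checkpointing via s3n."""
    if not bucket or "/" in bucket:
        raise ValueError("bucket must be a bare bucket name")
    path = prefix.strip("/")  # tolerate leading/trailing slashes
    uri = f"s3n://{bucket}/{path}" if path else f"s3n://{bucket}"
    conf_pairs = [(S3N_KEY_ID, access_key), (S3N_SECRET, secret_key)]
    return uri, conf_pairs
```

With PySpark, the pairs could be applied with `SparkConf().setAll(conf_pairs)` before creating the `SparkContext`, and the URI passed to `StreamingContext.checkpoint(uri)`.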
This means I need access to S3, but not HDFS. I do not have Hadoop installed.
I got the Spark pre-built for Hadoop 2.6. I can run it in local mode, for example the wordcount example. However, whenever I start it up, I get this message:
WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Is this a problem? Do I need Hadoop?
<edit>
It's not a show stopper, but I want to make sure I understand the reason for the warning message. Under the assumption that Spark doesn't need Hadoop, why is it showing up? </edit>
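The prebuilt packages bundle Hadoop's client libraries, which Spark uses for file system access (including S3), so Hadoop classes run even without a Hadoop installation. The warning is logged by Hadoop's `org.apache.hadoop.util.NativeCodeLoader` when it cannot find the native `libhadoop` library on `java.library.path`; Hadoop then falls back to its pure-Java implementations, so for this setup it is informational. If the message is unwanted, Spark 1.x reads log4j 1.x settings from `conf/log4j.properties` (a copy of `conf/log4j.properties.template`), and raising that logger's level hides it. The sketch below edits the properties text; the helper name is hypothetical.

```python
# Sketch: set the NativeCodeLoader logger to ERROR in log4j.properties text,
# replacing an existing setting for that logger or appending a new one.

NATIVE_LOADER_LOGGER = "log4j.logger.org.apache.hadoop.util.NativeCodeLoader"


def silence_native_loader_warning(properties_text: str, level: str = "ERROR") -> str:
    """Return properties text with the NativeCodeLoader logger set to `level`."""
    new_line = f"{NATIVE_LOADER_LOGGER}={level}"
    out = []
    replaced = False
    for line in properties_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key == NATIVE_LOADER_LOGGER:
            if not replaced:  # keep a single, updated setting
                out.append(new_line)
                replaced = True
            continue  # drop duplicate settings for the same logger
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"
```

Alternatively, making a `libhadoop` build for the platform visible on `java.library.path` (for example via `LD_LIBRARY_PATH` on Linux) removes the cause rather than hiding the message.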
We're running Spark on EC2 against S3 (via the s3n file system). We had issues with the pre-built versions for Hadoop 2.x; regrettably, I don't remember what the issue was. In the end we're running the Spark pre-built for Hadoop 1.x, and it works great.