java - Spark doesn't read/write information from S3 (ResponseCode=400, ResponseMessage=Bad Request)


I implemented a Spark application. I created the Spark context like this:

    private JavaSparkContext createJavaSparkContext() {
        SparkConf conf = new SparkConf();
        conf.setAppName("test");
        // Fall back to a local master when none was supplied externally
        if (conf.get("spark.master", null) == null) {
            conf.setMaster("local[4]");
        }
        // S3 credentials and endpoint
        conf.set("fs.s3a.awsAccessKeyId", getCredentialConfig().getS3Key());
        conf.set("fs.s3a.awsSecretAccessKey", getCredentialConfig().getS3Secret());
        conf.set("fs.s3a.endpoint", getCredentialConfig().getS3Endpoint());

        return new JavaSparkContext(conf);
    }

Then I try to read data from S3 via the Spark Dataset API (Spark SQL):

    String s = "s3a://" + getCredentialConfig().getS3Bucket();
    Dataset<Row> csv = getSparkSession()
            .read()
            .option("header", "true")
            .csv(s + "/dataset.csv");

    System.out.println("Read size :" + csv.count());

This fails with the following error:

    Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: 1a3e8cbd4959289d, AWS Error Code: null, AWS Error Message: Bad Request, S3 Extended Request ID: q1fv8snvcsowgbhjsu2d3nfgow00388ipxiihnkhz8vi/zysc8v8/yyq1ilvsm2gwqiyty1mijc=

Hadoop version: 2.7

AWS endpoint: s3.eu-central-1.amazonaws.com

(On Hadoop 2.8 it works fine.)

The problem is that Frankfurt doesn't support s3n, so you need to use s3a. In addition, this region supports only the V4 auth version: http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region

EU (Frankfurt) eu-central-1 Version 4 only

This means V4 signing has to be enabled on the AWS client. You need to add a system property:

    com.amazonaws.services.s3.enableV4 -> true

Setting it through the Spark configuration did not work for me:

    conf.set("com.amazonaws.services.s3.enableV4", "true"); // doesn't work for me

On my local machine I used:

    System.setProperty("com.amazonaws.services.s3.enableV4", "true");
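
As far as I can tell, the reason `conf.set(...)` has no effect is that the AWS SDK looks this flag up as a JVM system property, while entries put into the Spark configuration are not turned into system properties. A minimal sketch of setting the flag early, before the Spark context (and therefore any S3 client) is created. The class and method names (`SigV4Bootstrap`, `enableSigV4`, `isSigV4Enabled`) are hypothetical; only the property name comes from the text above.

```java
// Hypothetical helper: only the property name is taken from the answer above.
final class SigV4Bootstrap {
    static final String ENABLE_V4 = "com.amazonaws.services.s3.enableV4";

    private SigV4Bootstrap() {}

    /** Sets the JVM-wide flag and returns the previous value (null if it was unset). */
    static String enableSigV4() {
        return System.setProperty(ENABLE_V4, "true");
    }

    /** True when the system property exists and equals "true" (case-insensitive). */
    static boolean isSigV4Enabled() {
        return Boolean.getBoolean(ENABLE_V4);
    }

    public static void main(String[] args) {
        enableSigV4();
        // Create the JavaSparkContext / SparkSession only after this point.
        System.out.println("SigV4 enabled: " + isSigV4Enabled());
    }
}
```

This only affects the JVM it runs in, which is enough for `local[4]`. On a cluster the executors are separate JVMs and need the flag passed to them as well.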

To run on AWS EMR, you need to add these parameters to spark-submit:

    spark.executor.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true
    spark.driver.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true
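
On the command line, properties like these are normally passed to spark-submit as `--conf key=value` pairs. A small sketch that assembles those arguments, for example for a launcher script or a test. `SubmitArgs` and `sigV4ConfArgs` are hypothetical names, and the rest of the command (master, main class, application jar) is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that builds the two --conf arguments from the answer above.
final class SubmitArgs {
    private static final String V4_FLAG = "-Dcom.amazonaws.services.s3.enableV4=true";

    private SubmitArgs() {}

    /** Returns ["--conf", "<executor option>", "--conf", "<driver option>"]. */
    static List<String> sigV4ConfArgs() {
        String[] keys = {
            "spark.executor.extraJavaOptions",
            "spark.driver.extraJavaOptions"
        };
        List<String> args = new ArrayList<>();
        for (String key : keys) {
            args.add("--conf");
            args.add(key + "=" + V4_FLAG);
        }
        return args;
    }

    public static void main(String[] unused) {
        // The trailing "..." stands for the remaining spark-submit arguments.
        System.out.println("spark-submit " + String.join(" ", sigV4ConfArgs()) + " ...");
    }
}
```

The flag is passed for both the driver and the executors because they run in separate JVMs, and each one creates its own S3 client.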

Additionally, you should register the implementation classes for the file systems:

    conf.set("spark.hadoop.fs.s3a.impl", org.apache.hadoop.fs.s3a.S3AFileSystem.class.getName());
    conf.set("spark.hadoop.fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
    conf.set("spark.hadoop.fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
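
The `spark.hadoop.` prefix matters here: Spark copies properties with that prefix into the Hadoop configuration (with the prefix removed), which is how `fs.s3a.impl` and the others reach the file system layer. A sketch that collects the same three settings in a map, with the class names written as strings so it compiles without Hadoop on the classpath. `S3aSparkSettings` and `fileSystemImpls` are hypothetical names; the map is only an illustration and not part of the original answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for the three file system settings listed above.
final class S3aSparkSettings {
    private static final String PREFIX = "spark.hadoop.";

    private S3aSparkSettings() {}

    /** Maps Spark property names to file system implementation class names. */
    static Map<String, String> fileSystemImpls() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put(PREFIX + "fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
        settings.put(PREFIX + "fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
        settings.put(PREFIX + "fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem");
        return settings;
    }

    public static void main(String[] args) {
        // With Spark on the classpath, the same entries could be applied with:
        //   fileSystemImpls().forEach((k, v) -> conf.set(k, v));
        fileSystemImpls().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```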
