java - Spark doesn't read/write information from s3 (ResponseCode=400, ResponseMessage=Bad Request)
I implemented a Spark application. I create the Spark context like this:
    private JavaSparkContext createJavaSparkContext() {
        SparkConf conf = new SparkConf();
        conf.setAppName("test");
        if (conf.get("spark.master", null) == null) {
            conf.setMaster("local[4]");
        }
        conf.set("fs.s3a.awsAccessKeyId", getCredentialConfig().getS3Key());
        conf.set("fs.s3a.awsSecretAccessKey", getCredentialConfig().getS3Secret());
        conf.set("fs.s3a.endpoint", getCredentialConfig().getS3Endpoint());
        return new JavaSparkContext(conf);
    }

Then I try to read data from S3 via the Spark Dataset API (Spark SQL):
    String s = "s3a://" + getCredentialConfig().getS3Bucket();
    Dataset<Row> csv = getSparkSession()
            .read()
            .option("header", "true")
            .csv(s + "/dataset.csv");
    System.out.println("Read size :" + csv.count());

This fails with the following error:
    Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: 1a3e8cbd4959289d, AWS Error Code: null, AWS Error Message: Bad Request, S3 Extended Request ID: q1fv8snvcsowgbhjsu2d3nfgow00388ipxiihnkhz8vi/zysc8v8/yyq1ilvsm2gwqiyty1mijc=

Hadoop version: 2.7
AWS endpoint: s3.eu-central-1.amazonaws.com
(With Hadoop 2.8 it works fine.)
The problem is that the Frankfurt region does not support s3n, so you need to use s3a, and the region accepts only Signature Version 4 authentication. See http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region

    EU (Frankfurt)    eu-central-1    Version 4 only

This means V4 signing has to be enabled on the AWS client. To do that, set this system property:

    com.amazonaws.services.s3.enableV4 -> true
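Setting it on the SparkConf did not work for me, presumably because that only creates a Spark configuration entry rather than a JVM system property:

    conf.set("com.amazonaws.services.s3.enableV4", "true"); // doesn't work for me

On my local machine I used: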
    System.setProperty("com.amazonaws.services.s3.enableV4", "true");

To run on AWS EMR, you need to add these parameters to spark-submit:
    spark.executor.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true
    spark.driver.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true

Additionally, you should set the implementation classes for the file systems:
    conf.set("spark.hadoop.fs.s3a.impl", org.apache.hadoop.fs.s3a.S3AFileSystem.class.getName());
    conf.set("spark.hadoop.fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
    conf.set("spark.hadoop.fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
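
The two ways of enabling V4 signing described above (a JVM system property for local runs, extra Java options for spark-submit) can be kept in one place. The following is only a minimal sketch using the JDK alone: the class name S3V4Settings and its methods are hypothetical and are not part of Spark or the AWS SDK. It assumes the options are passed to spark-submit through its --conf flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: class and method names are illustrative only.
final class S3V4Settings {
    // System property named in the answer above for enabling Signature Version 4.
    static final String ENABLE_V4_PROPERTY = "com.amazonaws.services.s3.enableV4";

    private S3V4Settings() {}

    // Local run: set the JVM system property before the first S3 access.
    static void enableLocally() {
        System.setProperty(ENABLE_V4_PROPERTY, "true");
    }

    // Cluster run: build the --conf arguments that hand the same flag
    // to the driver JVM and to the executor JVMs.
    static List<String> sparkSubmitConfArgs() {
        String javaOption = "-D" + ENABLE_V4_PROPERTY + "=true";
        List<String> confArgs = new ArrayList<>();
        for (String role : new String[] {"driver", "executor"}) {
            confArgs.add("--conf");
            confArgs.add("spark." + role + ".extraJavaOptions=" + javaOption);
        }
        return confArgs;
    }

    public static void main(String[] unused) {
        enableLocally();
        // Prints the arguments to append to a spark-submit command line.
        System.out.println(String.join(" ", sparkSubmitConfArgs()));
    }
}
```

For a local run, enableLocally() would be called before the JavaSparkContext is created. For EMR, the printed arguments are appended to the spark-submit command, since a system property set in the driver's own code does not reach the executor JVMs.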