PySpark: is there any way to use Unix exported variables in Python code?
I am looking for a way to use Unix exported variables in PySpark code when running in cluster mode. I found a solution using os.getenv, but it does not work for me in cluster mode; in local mode it works fine.
Is there another way to pass the complete set of variables in one go? Passing n number of arguments and reading them individually is a bit of overhead.
For cluster mode you should use the Spark configuration properties named spark.yarn.appMasterEnv.[EnvironmentVariableName] to
pass variables to the driver. You can find more information in the documentation.
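For example, a minimal sketch of this approach (MY_ENV_VAR and your_script.py are hypothetical names; assuming YARN cluster mode):

# Forward the exported variable to the driver (YARN AM) and the executors at submit time:
# spark-submit --master yarn --deploy-mode cluster \
#     --conf spark.yarn.appMasterEnv.MY_ENV_VAR="$MY_ENV_VAR" \
#     --conf spark.executorEnv.MY_ENV_VAR="$MY_ENV_VAR" \
#     your_script.py

import os
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("env-var-example").getOrCreate()

# With spark.yarn.appMasterEnv set, the variable is visible to the driver process in cluster mode
my_value = os.environ.get("MY_ENV_VAR", "some-default")
print(my_value)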
That said, environment variables might not be the best way to deal with parameters. Have you heard of the optparse
library? It ships with standard Python distributions (2.x and 3.x) and allows you to parse parameters in quite an easy way, like:
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-i", "--input-file", default="hdfs:///some/input/file.csv")
parser.add_option("-o", "--output-dir", default="hdfs:///tmp/output")
parser.add_option("-l", "--log-level", default="INFO")
(options, args) = parser.parse_args()
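For example (hypothetical script name and values), you can then pass the arguments after the application file on spark-submit and read the parsed values in the job:

# spark-submit your_script.py --input-file hdfs:///data/in.csv --log-level DEBUG

# inside the script, after parse_args():
input_path = options.input_file   # "hdfs:///data/in.csv"
log_level = options.log_level     # "DEBUG"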