scala - Integrate BigQuery with Spark
How can I connect Spark to Google's BigQuery?

I imagine one could use Spark's JDBC functionality to communicate with BigQuery, but the only JDBC driver I found (from Starschema) is quite old.

If the answer involves JDBC, what should the url parameter look like?
From the Spark docs:

    rdd.toDF.write.format("jdbc").options(Map(
      "url" -> "jdbc:postgresql:dbserver",
      "dbtable" -> "schema.tablename"
    ))
You can use the BigQuery connector for Hadoop (which also works with Spark): https://cloud.google.com/hadoop/bigquery-connector
If you use Google Cloud Dataproc (https://cloud.google.com/dataproc/) to deploy your Spark cluster, the BigQuery connector (as well as the GCS connector) is deployed and configured automatically, out of the box.
But you can also add the connector to an existing Spark deployment, whether it runs on Google Cloud or anywhere else. If your cluster is not deployed on Google Cloud, you will have to set up authentication yourself (using service-account "keyfile" authentication). A minimal read sketch is shown below.
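To make the connector-based approach concrete, here is a minimal Scala sketch of reading a BigQuery table into an RDD through the Hadoop BigQuery connector's input format. It assumes the connector jar is on the Spark classpath; the project ID, staging bucket, sample table, and the keyfile property shown in the comment are placeholders/assumptions, not values from the question, and exact configuration key names can vary by connector version.

    import com.google.cloud.hadoop.io.bigquery.{BigQueryConfiguration, GsonBigQueryInputFormat}
    import com.google.gson.JsonObject
    import org.apache.hadoop.io.LongWritable
    import org.apache.spark.{SparkConf, SparkContext}

    object BigQueryReadExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("bigquery-read"))
        val conf = sc.hadoopConfiguration

        // Project billed for the job, and a GCS bucket the connector uses
        // for temporary export files (placeholder values).
        conf.set(BigQueryConfiguration.PROJECT_ID_KEY, "my-project-id")
        conf.set(BigQueryConfiguration.GCS_BUCKET_KEY, "my-staging-bucket")

        // Fully qualified input table: project:dataset.table
        BigQueryConfiguration.configureBigQueryInput(conf, "publicdata:samples.shakespeare")

        // Outside Google Cloud you also need to point the connector at a
        // service-account keyfile; the property name below is an assumption
        // and depends on the connector version, e.g.:
        // conf.set("mapred.bq.auth.service.account.json.keyfile", "/path/to/key.json")

        // Each record arrives as (row id, JSON object representing the row).
        val tableRdd = sc.newAPIHadoopRDD(
          conf,
          classOf[GsonBigQueryInputFormat],
          classOf[LongWritable],
          classOf[JsonObject])

        // Simple aggregation over the sample table's columns.
        val wordCounts = tableRdd
          .map { case (_, json) => (json.get("word").getAsString, json.get("word_count").getAsLong) }
          .reduceByKey(_ + _)

        wordCounts.take(10).foreach(println)
        sc.stop()
      }
    }

From there you can convert the RDD to a DataFrame (e.g. by mapping the JSON objects to a case class and calling toDF) and continue with the usual Spark APIs.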
[Added] My answer to another question (Dataproc + BigQuery examples - any available?) provides an example of using BigQuery with Spark.