apache spark - Create Hive table from parquet files and load the data
I am finding it difficult to load Parquet files into Hive tables. I am working on an Amazon EMR cluster with Spark for data processing, and I need to read the output Parquet files to validate my transformations. I have Parquet files with the following schema:
root
 |-- attr_year: long (nullable = true)
 |-- afil: struct (nullable = true)
 |    |-- clm: struct (nullable = true)
 |    |    |-- amb: struct (nullable = true)
 |    |    |    |-- l: string (nullable = true)
 |    |    |    |-- cdtransrsn: string (nullable = true)
 |    |    |    |-- dist: struct (nullable = true)
 |    |    |    |    |-- t: string (nullable = true)
 |    |    |    |    |-- content: double (nullable = true)
 |    |    |    |-- dscstrchpurp: string (nullable = true)
 |    |    |-- amt: struct (nullable = true)
 |    |    |    |-- l: string (nullable = true)
 |    |    |    |-- t: string (nullable = true)
 |    |    |    |-- content: double (nullable = true)
 |    |    |-- amttotchrg: double (nullable = true)
 |    |    |-- cdaccstate: string (nullable = true)
 |    |    |-- cdcause: string (nullable = true)
How can I create a Hive external table for this type of schema and load the Parquet files into that Hive table for analysis?
You can use Catalog.createExternalTable (Spark before 2.2) or Catalog.createTable (Spark 2.2 and later). The Catalog instance can be accessed through the SparkSession:

val spark: SparkSession = ???
spark.catalog.createTable(...)

The session should be initialized with Hive support enabled.