Apache Spark - Create Hive table from Parquet files and load the data


I am finding it difficult to load Parquet files into Hive tables. I am working on an Amazon EMR cluster and using Spark for data processing. I need to read the output Parquet files to validate the transformations. I have Parquet files with the following schema:

root
 |-- attr_year: long (nullable = true)
 |-- afil: struct (nullable = true)
 |    |-- clm: struct (nullable = true)
 |    |    |-- amb: struct (nullable = true)
 |    |    |    |-- l: string (nullable = true)
 |    |    |    |-- cdtransrsn: string (nullable = true)
 |    |    |    |-- dist: struct (nullable = true)
 |    |    |    |    |-- t: string (nullable = true)
 |    |    |    |    |-- content: double (nullable = true)
 |    |    |    |-- dscstrchpurp: string (nullable = true)
 |    |    |-- amt: struct (nullable = true)
 |    |    |    |-- l: string (nullable = true)
 |    |    |    |-- t: string (nullable = true)
 |    |    |    |-- content: double (nullable = true)
 |    |    |-- amttotchrg: double (nullable = true)
 |    |    |-- cdaccstate: string (nullable = true)
 |    |    |-- cdcause: string (nullable = true)

How can I create a Hive external table with this kind of schema and load the Parquet files into it for analysis?

You can use Catalog.createExternalTable (Spark before 2.2) or Catalog.createTable (Spark 2.2 and later).

The Catalog instance can be accessed through the SparkSession:

val spark: SparkSession = ...
spark.catalog.createTable(...)

The session should be initialized with Hive support enabled.
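If you would rather spell the schema out explicitly, for example so the same statement can also be run from the Hive CLI on the EMR cluster, one option is a HiveQL CREATE EXTERNAL TABLE ... STORED AS PARQUET statement whose nested STRUCT columns mirror the schema above. The sketch below builds that statement with plain Scala (no Spark dependency). The small type model (HiveType, StructT and so on), the table name claims_validation and the S3 location are hypothetical placeholders, not part of the original answer; the column names and nesting are assumed to match the Parquet files exactly.

```scala
// Minimal, hypothetical model of the column types used in the posted schema.
// This is not Spark's StructType; it only covers the leaf types that appear.
sealed trait HiveType
case object BigIntT extends HiveType // Spark long   -> Hive BIGINT
case object StringT extends HiveType // Spark string -> Hive STRING
case object DoubleT extends HiveType // Spark double -> Hive DOUBLE
final case class StructT(fields: List[(String, HiveType)]) extends HiveType

object HiveDdl {
  // Render a type in HiveQL syntax, recursing into nested structs.
  // Names are emitted unquoted, so they are assumed not to be reserved words.
  def render(t: HiveType): String = t match {
    case BigIntT => "BIGINT"
    case StringT => "STRING"
    case DoubleT => "DOUBLE"
    case StructT(fields) =>
      fields.map { case (name, ft) => s"$name:${render(ft)}" }.mkString("STRUCT<", ", ", ">")
  }

  // Build a CREATE EXTERNAL TABLE statement over an existing Parquet directory.
  def createExternalTable(table: String, columns: List[(String, HiveType)], location: String): String = {
    val cols = columns.map { case (name, t) => s"  $name ${render(t)}" }.mkString(",\n")
    s"CREATE EXTERNAL TABLE IF NOT EXISTS $table (\n$cols\n)\nSTORED AS PARQUET\nLOCATION '$location'"
  }
}

object ClaimsSchema {
  // The schema from the question, written out by hand, innermost structs first.
  val dist: StructT = StructT(List("t" -> StringT, "content" -> DoubleT))
  val amb: StructT = StructT(List(
    "l" -> StringT, "cdtransrsn" -> StringT, "dist" -> dist, "dscstrchpurp" -> StringT))
  val amt: StructT = StructT(List("l" -> StringT, "t" -> StringT, "content" -> DoubleT))
  val clm: StructT = StructT(List(
    "amb" -> amb, "amt" -> amt, "amttotchrg" -> DoubleT,
    "cdaccstate" -> StringT, "cdcause" -> StringT))
  val columns: List[(String, HiveType)] = List(
    "attr_year" -> BigIntT,
    "afil" -> StructT(List("clm" -> clm)))

  // Hypothetical table name and S3 location; replace them with your own.
  val ddl: String =
    HiveDdl.createExternalTable("claims_validation", columns, "s3://your-bucket/path/to/parquet/")

  def main(args: Array[String]): Unit = println(ddl)
}
```

On a session created with SparkSession.builder().enableHiveSupport().getOrCreate(), the generated string can be passed to spark.sql(ClaimsSchema.ddl), after which a query such as spark.sql("SELECT attr_year, afil.clm.amttotchrg FROM claims_validation LIMIT 10").show() is one way to spot-check the transformed output. Because the table is EXTERNAL, dropping it later leaves the Parquet files in place.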

