apache spark - Create Hive table from parquet files and load the data
I am finding it difficult to load Parquet files into Hive tables. I am working on an Amazon EMR cluster with Spark for data processing, and I need to read the output Parquet files to validate my transformations. I have Parquet files with the following schema:
root
 |-- attr_year: long (nullable = true)
 |-- afil: struct (nullable = true)
 |    |-- clm: struct (nullable = true)
 |    |    |-- amb: struct (nullable = true)
 |    |    |    |-- l: string (nullable = true)
 |    |    |    |-- cdtransrsn: string (nullable = true)
 |    |    |    |-- dist: struct (nullable = true)
 |    |    |    |    |-- t: string (nullable = true)
 |    |    |    |    |-- content: double (nullable = true)
 |    |    |    |-- dscstrchpurp: string (nullable = true)
 |    |    |-- amt: struct (nullable = true)
 |    |    |    |-- l: string (nullable = true)
 |    |    |    |-- t: string (nullable = true)
 |    |    |    |-- content: double (nullable = true)
 |    |    |-- amttotchrg: double (nullable = true)
 |    |    |-- cdaccstate: string (nullable = true)
 |    |    |-- cdcause: string (nullable = true)
How can I create a Hive external table for this type of schema and load the Parquet files into that Hive table for analysis?
You can use Catalog.createExternalTable (Spark before 2.2) or Catalog.createTable (Spark 2.2 and later). The Catalog instance can be accessed through the SparkSession:

val spark: SparkSession = ???
spark.catalog.createTable(...)

The session should be initialized with Hive support enabled.