hive - Spark HiveContext : Insert Overwrite the same table it is read from


I want to apply SCD1 and SCD2 using PySpark with a HiveContext. In my approach, I read both the incremental data and the target table. After reading, I join them for the upsert, and I call registerTempTable on the source DataFrames. When I try to write the final dataset back to the target table, I hit the error that an insert overwrite is not possible into a table that is also being read from.
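For context, here is a minimal sketch of the pattern described above. The HiveContext setup, the table names (target_db.dim_customer, staging.customer_incremental), and the join key customer_id are assumptions for illustration only:

```python
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="scd-upsert")
sqlContext = HiveContext(sc)

# Read the existing dimension (target) and the incremental feed.
# Table and column names are hypothetical.
target = sqlContext.table("target_db.dim_customer")
incremental = sqlContext.table("staging.customer_incremental")

target.registerTempTable("target")
incremental.registerTempTable("incremental")

# SCD1-style upsert: take every incremental row, plus all target rows
# that have no matching incremental row (an anti-join).
final = sqlContext.sql("""
    SELECT i.* FROM incremental i
    UNION ALL
    SELECT t.* FROM target t
    LEFT JOIN incremental i ON t.customer_id = i.customer_id
    WHERE i.customer_id IS NULL
""")

# Fails at analysis time: Spark rejects an insert overwrite into a table
# that also appears as a source in the same plan.
final.write.insertInto("target_db.dim_customer", overwrite=True)
```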

Please suggest a solution for this. I do not want to write the intermediate data to a physical table and read it back again.

Is there a property or a way to store the final data set without keeping the dependency on the table it is read from? That way, it might be possible to overwrite the table.
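One workaround that is often suggested for exactly this question is to truncate the lineage before writing, so that the final plan no longer references the target table. To be clear about the trade-off: this still materializes the data, just to the checkpoint directory instead of a managed table, and behaviour can vary by Spark version. A sketch, reusing the hypothetical names from above:

```python
# Break the lineage via an RDD checkpoint so the write plan no longer
# depends on target_db.dim_customer. The directory path is an assumption.
sc.setCheckpointDir("/tmp/spark-checkpoints")

rdd = final.rdd
rdd.checkpoint()
rdd.count()  # checkpointing is lazy; force materialization with an action

# Rebuild a DataFrame from the checkpointed RDD; its plan is now a plain
# scan of checkpointed data with no reference to the target table.
final_materialized = sqlContext.createDataFrame(rdd, final.schema)
final_materialized.write.insertInto("target_db.dim_customer", overwrite=True)

# On Spark >= 2.1 the same idea is available directly on DataFrames:
# final.checkpoint().write.insertInto("target_db.dim_customer", overwrite=True)
```

Note that this only removes the read dependency; if the overwrite itself fails partway through, the table can still be left in a bad state, which is the risk the answer below warns about.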

You should never overwrite a table while you are reading from it. It can result in anything between data corruption and complete data loss in case of failure.

It is also important to point out that a correctly implemented SCD2 should never overwrite the whole table and can be implemented as a (mostly) append-only operation. As far as I am aware, SCD1 cannot be efficiently implemented without mutable storage, and is therefore not a good fit for Spark.
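To illustrate the append-only SCD2 idea, here is a minimal sketch: new versions are appended to a history table with a load timestamp, and the "current" row per business key is resolved at read time with a window function instead of rewriting history. The history table name (target_db.dim_customer_history) and the columns customer_id and valid_from are assumptions, and `incremental` / `sqlContext` refer to the earlier sketch:

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Append the incremental rows as new versions, stamped with the load time.
# Assumes the history table already exists with a valid_from column.
(incremental
    .withColumn("valid_from", F.current_timestamp())
    .write
    .insertInto("target_db.dim_customer_history", overwrite=False))

# Resolve the current version per key at read time: the newest
# valid_from per customer_id wins.
history = sqlContext.table("target_db.dim_customer_history")
w = Window.partitionBy("customer_id").orderBy(F.col("valid_from").desc())
current = (history
    .withColumn("rn", F.row_number().over(w))
    .where(F.col("rn") == 1)
    .drop("rn"))
```

This keeps every write as an append, so a failed job never destroys previously written history, at the cost of a window computation (or a maintained view) when reading the current state.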

