hive - Spark HiveContext : Insert Overwrite the same table it is read from


I want to apply SCD1 and SCD2 using PySpark with a HiveContext. In my approach, I read both the incremental data and the target table. After reading, I join them for the upsert, and I call registerTempTable on the source DataFrames. When I try to write the final dataset back to the target table, I hit the error that an insert overwrite is not possible into a table that is also being read from.
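For context, here is a minimal sketch of the pattern described above. The HiveContext setup, the table names (target_db.dim_customer, staging.customer_incremental), and the join key customer_id are assumptions for illustration only:

```python
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="scd-upsert")
sqlContext = HiveContext(sc)

# Read the existing dimension (target) and the incremental feed.
# Table and column names are hypothetical.
target = sqlContext.table("target_db.dim_customer")
incremental = sqlContext.table("staging.customer_incremental")

target.registerTempTable("target")
incremental.registerTempTable("incremental")

# SCD1-style upsert: take every incremental row, plus all target rows
# that have no matching incremental row (an anti-join).
final = sqlContext.sql("""
    SELECT i.* FROM incremental i
    UNION ALL
    SELECT t.* FROM target t
    LEFT JOIN incremental i ON t.customer_id = i.customer_id
    WHERE i.customer_id IS NULL
""")

# Fails at analysis time: Spark rejects an insert overwrite into a table
# that also appears as a source in the same plan.
final.write.insertInto("target_db.dim_customer", overwrite=True)
```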

Please suggest a solution for this. I do not want to write the intermediate data to a physical table and read it back again.

Is there a property or a way to store the final data set without keeping the dependency on the table it is read from? That way, it might be possible to overwrite the table.
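One workaround that is often suggested for exactly this question is to truncate the lineage before writing, so that the final plan no longer references the target table. To be clear about the trade-off: this still materializes the data, just to the checkpoint directory instead of a managed table, and behaviour can vary by Spark version. A sketch, reusing the hypothetical names from above:

```python
# Break the lineage via an RDD checkpoint so the write plan no longer
# depends on target_db.dim_customer. The directory path is an assumption.
sc.setCheckpointDir("/tmp/spark-checkpoints")

rdd = final.rdd
rdd.checkpoint()
rdd.count()  # checkpointing is lazy; force materialization with an action

# Rebuild a DataFrame from the checkpointed RDD; its plan is now a plain
# scan of checkpointed data with no reference to the target table.
final_materialized = sqlContext.createDataFrame(rdd, final.schema)
final_materialized.write.insertInto("target_db.dim_customer", overwrite=True)

# On Spark >= 2.1 the same idea is available directly on DataFrames:
# final.checkpoint().write.insertInto("target_db.dim_customer", overwrite=True)
```

Note that this only removes the read dependency; if the overwrite itself fails partway through, the table can still be left in a bad state, which is the risk the answer below warns about.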

You should never overwrite a table while you are reading from it. It can result in anything between data corruption and complete data loss in case of failure.

It is also important to point out that a correctly implemented SCD2 should never overwrite the whole table and can be implemented as a (mostly) append-only operation. As far as I am aware, SCD1 cannot be efficiently implemented without mutable storage, and is therefore not a good fit for Spark.
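To illustrate the append-only SCD2 idea, here is a minimal sketch: new versions are appended to a history table with a load timestamp, and the "current" row per business key is resolved at read time with a window function instead of rewriting history. The history table name (target_db.dim_customer_history) and the columns customer_id and valid_from are assumptions, and `incremental` / `sqlContext` refer to the earlier sketch:

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Append the incremental rows as new versions, stamped with the load time.
# Assumes the history table already exists with a valid_from column.
(incremental
    .withColumn("valid_from", F.current_timestamp())
    .write
    .insertInto("target_db.dim_customer_history", overwrite=False))

# Resolve the current version per key at read time: the newest
# valid_from per customer_id wins.
history = sqlContext.table("target_db.dim_customer_history")
w = Window.partitionBy("customer_id").orderBy(F.col("valid_from").desc())
current = (history
    .withColumn("rn", F.row_number().over(w))
    .where(F.col("rn") == 1)
    .drop("rn"))
```

This keeps every write as an append, so a failed job never destroys previously written history, at the cost of a window computation (or a maintained view) when reading the current state.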

