scala - update dataframe if not equal spark dataframe -


i have 2 dataframes , in want know whether different based on column key other wisei update 1 different become equal

      val tmp_site = spark.load("jdbc", map("url" -> "jdbc:oracle:thin:system/maher@//localhost:1521/xe", "dbtable" -> "iptech.tmp_site"))       .withcolumn("site",'site.cast(longtype))        val local_pos = spark.load("jdbc", map("url" -> url, "dbtable" -> "pos")).select("id","name")      tmp_site.printschema()     local_pos.printschema()        val join = tmp_site.join(local_pos, 'site === 'id, "inner")   root  |-- site: long (nullable = true)  |-- libelle: string (nullable = false)  root  |-- id: long (nullable = false)  |-- name: string (nullable = true) 

the result of joining

  |id |name                  |site|libelle               | +---+----------------------+----+----------------------+ |51 |ezzahra               |51  |ezzahra               | |7  |benikhalled           |7   |benikhalled           | |15 |kram                  |15  |kram                  | |54 |el mourouj            |54  |el mourouj            | |11 |le bardo              |11  |le bardo              | |29 |mini m ksar said      |29  |mini m ksar said      | |69 |zaghouan              |69  |zaghouan              | |42 |beb el khadhra        |42  |beb el khadhra        | |73 |zaouit kontech        |73  |zaouit kontech        | |87 |aouina                |87  |aouina                | |64 |sousse i            |64  |sousse i            | |3  |sahra confort : korba |3   |sahra confort : korba | |34 |soukra square         |34  |soukra square         | |59 |sahra confort : zarzis|59  |sahra confort : zarzis| |8  |jerba                 |8   |jerba                 | |22 |moknine               |22  |moknine               | |28 |rdayef                |28  |rdayef                | |85 |monastir absorba      |85  |monastir absorba      | |16 |bardo hanaya          |16  |bardo hanaya          | |35 |mini m agba           |35  |mini m agba           | +---+----------------------+----+----------------------+ 

i did

val temp = join.withcolumn("changes", when($"libelle" === $"name", lit("nothing")).otherwise("need update")) got

|id |name                  |site|libelle               |changes       | +---+----------------------+----+----------------------+--------------+ |51 |ezzahra               |51  |ezzahra               |nothing       | |7  |benikhalled           |7   |benikhalled           |nothing       | |15 |kram                  |15  |kram                  |nothing       | |54 |el mourouj            |54  |el mourouj            |nothing       | |11 |le bardo              |11  |le bardo              |nothing       | |29 |mini m ksar said      |29  |mini m ksar said      |nothing       | |69 |zaghouan              |69  |zaghouan              |nothing       | |42 |beb el khadhra        |42  |beb el khadhra        |nothing       | |73 |zaouit kontech        |73  |zaouit kontech        |need update| |87 |aouina                |87  |aouina                |nothing       | |64 |sousse i            |64  |sousse i            |nothing       | |3  |sahra confort : korba |3   |sahra confort : korba |nothing       | |34 |soukra square         |34  |soukra square         |nothing       | |59 |sahra confort : zarzis|59  |sahra confort : zarzis|nothing       | |8  |jerba                 |8   |jerba                 |nothing       | |22 |moknine               |22  |moknine               |need update| |28 |rdayef                |28  |rdayef                |nothing       | |85 |monastir absorba      |85  |monastir absorba      |nothing       | |16 |bardo hanaya          |16  |bardo hanaya          |nothing       | |35 |mini m agba           |35  |mini m agba           |nothing       | +---+----------------------+----+----------------------+--------------+ 

i not why said need updated because same. altough should nothing of them , beause equal . should appreciated

once have dataframe, quite easy play columns , rows.

so have following dataframe after join

+----+---------------------+----+---------------------+ |site|libelle              |id  |name                 | +----+---------------------+----+---------------------+ |48  |mini m boumhel       |48  |mini m boumhel       | |67  |lac                  |67  |lac                  | |992 |test2                |992 |test                 | |44  |kairouan             |44  |kairouan             | |61  |tunis                |61  |tunis                | |9001|monoprix             |9001|monoprix             | |3   |sahra confort : korba|3   |sahra confort : korba| |37  |mini m borj lozir    |37  |mini m borj lozir    | |83  |jendouba             |83  |jendouba             | |12  |bigro                |12  |bigro                | +----+---------------------+----+---------------------+ 

you can create column logic have written using when function

import org.apache.spark.sql.functions._ val temp = join.withcolumn("changes", when($"libelle" === $"name", lit("nothing")).otherwise("need update")) 

temp dataframe be

+----+---------------------+----+---------------------+--------------+ |site|libelle              |id  |name                 |changes       | +----+---------------------+----+---------------------+--------------+ |48  |mini m boumhel       |48  |mini m boumhel       |nothing       | |67  |lac                  |67  |lac                  |nothing       | |992 |test2                |992 |test                 |need update| |44  |kairouan             |44  |kairouan             |nothing       | |61  |tunis                |61  |tunis                |nothing       | |9001|monoprix             |9001|monoprix             |nothing       | |3   |sahra confort : korba|3   |sahra confort : korba|nothing       | |37  |mini m borj lozir    |37  |mini m borj lozir    |nothing       | |83  |jendouba             |83  |jendouba             |nothing       | |12  |bigro                |12  |bigro                |nothing       | +----+---------------------+----+---------------------+--------------+ 

now can use filter method on dataframe

temp.filter($"changes" === "need update").show(false) 

which should give

+----+-------+---+----+--------------+ |site|libelle|id |name|changes       | +----+-------+---+----+--------------+ |992 |test2  |992|test|need update| +----+-------+---+----+--------------+ 

you need play columns using select, groupby, aggregations, filters , other inbuilt functions or using udf functions etc etc. can convert rdd , tuples did in example.


Comments

Popular posts from this blog

resizing Telegram inline keyboard -

command line - How can a Python program background itself? -

php - "cURL error 28: Resolving timed out" on Wordpress on Azure App Service on Linux -