scala - How to count the number of occurrences of each distinct element in a column of a Spark DataFrame
Suppose I have a DataFrame in the following format:

-------------------------------
col1    | col2    | col3
-------------------------------
value11 | value21 | value31
value12 | value22 | value32
value11 | value22 | value33
value12 | value21 | value33
-------------------------------

Here, column col1 has two distinct values, value11 and value12. I want the total number of occurrences of each distinct value (value11, value12) in column col1.
You can group by col1 and aggregate with count:
import org.apache.spark.sql.functions.count

df.groupBy("col1").agg(count("col1")).show

+-------+-----------+
|   col1|count(col1)|
+-------+-----------+
|value12|          2|
|value11|          2|
+-------+-----------+

In case you want to know how many distinct values there are in col1, you can use countDistinct:
import org.apache.spark.sql.functions.countDistinct

df.agg(countDistinct("col1").as("n_distinct")).show

+----------+
|n_distinct|
+----------+
|         2|
+----------+
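For intuition, the same counting logic can be sketched on a plain Scala collection, with no Spark session required. This is only an analogy, not the Spark API: the list below mirrors the col1 values from the example DataFrame, and the names counts and nDistinct are made up for illustration.

```scala
// The values of col1 from the example DataFrame, as a local collection
val col1 = List("value11", "value12", "value11", "value12")

// Occurrences of each distinct value: bucket by the value itself,
// then take each bucket's size (analogous to groupBy("col1") + count)
val counts: Map[String, Int] =
  col1.groupBy(identity).map { case (v, vs) => v -> vs.size }

// Number of distinct values (analogous to countDistinct("col1"))
val nDistinct: Int = col1.distinct.size
```

Spark's groupBy/count does the same bucketing, but distributed across partitions and with a shuffle on the grouping key.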