scala - How to count the number of occurrences of each distinct element in a column of a Spark DataFrame
Suppose I have a DataFrame in the following format:

-------------------------------
col1    | col2    | col3
-------------------------------
value11 | value21 | value31
value12 | value22 | value32
value11 | value22 | value33
value12 | value21 | value33
-------------------------------

Here, column col1 has value11 and value12 as its distinct values. I want the total number of occurrences of each distinct value (value11, value12) in column col1.
You can groupBy col1 and count:

import org.apache.spark.sql.functions.count

df.groupBy("col1").agg(count("col1")).show
+-------+-----------+
|   col1|count(col1)|
+-------+-----------+
|value12|          2|
|value11|          2|
+-------+-----------+
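Equivalently, groupBy followed by count() is a shorthand for agg(count(...)). A minimal, self-contained sketch, assuming a hypothetical local SparkSession and sample data matching the question:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local session for illustration only
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("count-distinct-occurrences")
  .getOrCreate()
import spark.implicits._

// Sample data matching the question's table
val df = Seq(
  ("value11", "value21", "value31"),
  ("value12", "value22", "value32"),
  ("value11", "value22", "value33"),
  ("value12", "value21", "value33")
).toDF("col1", "col2", "col3")

// groupBy(...).count() adds a "count" column per group,
// equivalent to agg(count("col1"))
df.groupBy("col1").count().show()
```

Note that the output row order of groupBy is not guaranteed; add an orderBy if you need a stable ordering.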
In case you want to know how many distinct values there are in col1, you can use countDistinct:

import org.apache.spark.sql.functions.countDistinct

df.agg(countDistinct("col1").as("n_distinct")).show
+----------+
|n_distinct|
+----------+
|         2|
+----------+
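On very large data, an exact distinct count can be expensive because it requires a shuffle of all distinct keys. As a sketch of one alternative (assuming the same df as above), Spark also ships an approximate, HyperLogLog-based variant:

```scala
import org.apache.spark.sql.functions.approx_count_distinct

// approx_count_distinct trades exactness for speed; rsd is the
// maximum allowed relative standard deviation of the estimate
df.agg(approx_count_distinct("col1", rsd = 0.05).as("approx_n_distinct")).show
```

For a small toy DataFrame like this one the approximate and exact counts coincide; the difference only matters at scale.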