scala - How to count the number of occurrences of each distinct element in a column of a Spark DataFrame
Suppose I have a DataFrame in the following format:

-------------------------------
col1    | col2    | col3
-------------------------------
value11 | value21 | value31
value12 | value22 | value32
value11 | value22 | value33
value12 | value21 | value33
-------------------------------

Here, column col1 has two distinct values, value11 and value12. I want the total number of occurrences of each distinct value (value11, value12) in column col1.
You can group by col1 and aggregate with count:
import org.apache.spark.sql.functions.count

df.groupBy("col1").agg(count("col1")).show

+-------+-----------+
|   col1|count(col1)|
+-------+-----------+
|value12|          2|
|value11|          2|
+-------+-----------+

In case you want to know how many distinct values there are in col1, you can use countDistinct:
import org.apache.spark.sql.functions.countDistinct

df.agg(countDistinct("col1").as("n_distinct")).show

+----------+
|n_distinct|
+----------+
|         2|
+----------+
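For intuition, the same counting logic can be sketched on a plain Scala collection, with no Spark session required. This is only an analogy, not the Spark API: the list below mirrors the col1 values from the example DataFrame, and the names counts and nDistinct are made up for illustration.

```scala
// The values of col1 from the example DataFrame, as a local collection
val col1 = List("value11", "value12", "value11", "value12")

// Occurrences of each distinct value: bucket by the value itself,
// then take each bucket's size (analogous to groupBy("col1") + count)
val counts: Map[String, Int] =
  col1.groupBy(identity).map { case (v, vs) => v -> vs.size }

// Number of distinct values (analogous to countDistinct("col1"))
val nDistinct: Int = col1.distinct.size
```

Spark's groupBy/count does the same bucketing, but distributed across partitions and with a shuffle on the grouping key.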