google cloud dataflow - How to sample per key in a PCollection using different sampling rates per key?


I'm in the process of switching a few Spark jobs over to Cloud Dataflow / Apache Beam 2.0.

One of these jobs uses PairRDD.sampleByKey(sampleRates), where sampleRates is a map whose keys match the keys in the PairRDD and whose values are the rates at which each key should be sampled.

I've found that Beam has Sample.fixedSizePerKey(sampleCount), which seems to be the closest equivalent. However, as the method name implies, it samples a fixed number of elements for every key.

I've dug into the Sample class a bit to see if it could be modified to accept a map and use a different count per key, but I can't find a way to access the key inside the PCollection<KV<K,V>>.

how can access key inside pcollection in ptransform in order this?

