Google Cloud Dataflow - How to sample per key in a PCollection using different sampling rates per key?
I am going through the process of switching a few Spark jobs over to Cloud Dataflow / Apache Beam 2.0.

One of these jobs uses PairRDD.sampleByKey(sampleRates), where sampleRates is a map whose keys match the keys in the PairRDD and whose values are the rates at which each key should be sampled.

I've found that Beam has Sample.fixedSizePerKey(sampleCount), which seems to be the closest equivalent. However, it samples a fixed amount (as the method name implies), and that amount is the same for every key.

I've dug into the Sample class a bit to see if it could be modified to accept a map and sample a different count per key, but I can't find a way to access the key inside a PCollection<KV<K, V>>.

How can I access the key inside a PCollection within a PTransform in order to do this?