scala - ReadFile and keep order consistent -


I am reading log files, and every line provides a timestamp. In the log files the order is correct, but there can be multiple instances of the same timestamp. In the little example below, I want to make sure that whenever I work with the dataset, start is looked at before recstart. (This does not reflect the actual data / status possibilities.)

system1;2017-07-04T08:33:26;start;start execution
system1;2017-07-04T08:33:26;recstart;start recorder

I read the files with spark.read.textFile(file.path).

I read that monotonically_increasing_id does not guarantee that the IDs are sequential.
I also read that groupBy is not bound to keep the order. Is the same true for partitionBy? If partitionBy keeps the order correct, the following command should solve my problems:
val ds2 = ds.withColumn("rn", row_number().over(Window.partitionBy($"systemname").orderBy($"timestamp")))

I thought about adding the milliseconds of the current time to the timestamp while reading the file, but that doesn't seem safe because of parallelism.

Is there a way to make sure the order stays the same as in the text files?
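
To make the tie-breaker idea concrete, here is a minimal sketch of one possible direction, not a confirmed solution. It reads the file through the RDD API and uses zipWithIndex, which numbers elements by partition index and then by position within each partition, so that each line carries its position in the file. That index is then used as a second sort key in the window, so two lines with the same timestamp keep their file order. The object name OrderedLogRead, the case class LogLine, the column lineNo and the function numbered are hypothetical names introduced for illustration. The sketch assumes a single input file, four semicolon-separated fields per line, and ISO-style timestamps that sort correctly as strings.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

object OrderedLogRead {
  // Hypothetical record type; field names are assumptions based on the sample lines.
  final case class LogLine(systemname: String, timestamp: String,
                           status: String, message: String, lineNo: Long)

  // Reads one log file and adds a per-system row number "rn" that
  // respects the original line order when timestamps are equal.
  def numbered(spark: SparkSession, path: String): DataFrame = {
    import spark.implicits._

    val ds = spark.sparkContext
      .textFile(path)
      // Index follows partition order, then order within each partition.
      // Assumption: for a single text file this matches the line order.
      .zipWithIndex()
      .map { case (line, idx) =>
        // Assumes exactly four ';'-separated fields; malformed lines would fail here.
        val f = line.split(";", 4)
        LogLine(f(0), f(1), f(2), f(3), idx)
      }
      .toDS()

    // Timestamp first, original line number as the tie-breaker.
    val w = Window.partitionBy($"systemname").orderBy($"timestamp", $"lineNo")
    ds.withColumn("rn", row_number().over(w))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ordered-log-read")
      .getOrCreate()
    import spark.implicits._

    numbered(spark, args(0)).orderBy($"systemname", $"rn").show(false)
    spark.stop()
  }
}
```

Note that zipWithIndex runs an extra Spark job when the RDD has more than one partition, and that with several input files the index only reflects the order in which Spark lists them, so a file name column would probably be needed as an additional sort key in that case.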

