scala - Read a file and keep the line order consistent
I am reading log files in which every line provides a timestamp. Within the log files the order is correct, but there can be multiple instances of the same timestamp. In the little example below, I want to make sure that whenever I work with the dataset, start is looked at before recstart. (This does not reflect the actual data / status possibilities.)
system1;2017-07-04T08:33:26;start;start execution
system1;2017-07-04T08:33:26;recstart;start recorder
I read the files with spark.read.textFile(file.path)
I have read that monotonically_increasing_id does not guarantee that the IDs are sequential.
I have also read that groupBy is not bound to keep the order. Is the same true for partitionBy? If partitionBy keeps the order, the following command should solve my problems:
val ds2 = ds.withColumn("rn", row_number().over(Window.partitionBy($"systemname").orderBy($"timestamp")))
I thought about adding the milliseconds of the current time to the timestamp while reading the file, but that doesn't seem safe because of parallelism.
Is there a way to make sure the order stays the same as in the text files?
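
One possible approach, sketched here under assumptions rather than offered as a definitive answer: tag every line with its position using RDD.zipWithIndex before any shuffle happens, then use that index as a tie-breaker behind the timestamp. The object name OrderedLogRead, the helper names, the lineNo column, and the four-field layout (systemname;timestamp;status;message) are hypothetical and inferred from the sample lines above. The sketch also assumes a single input file and ISO-style timestamps, which sort correctly as plain strings.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

object OrderedLogRead {

  // Hypothetical helper: reads the file as an RDD, attaches each line's
  // index (ordered by partition index, then by position inside the
  // partition), and splits the line into the assumed four fields.
  def readWithLineNo(spark: SparkSession, path: String): DataFrame = {
    import spark.implicits._
    spark.sparkContext
      .textFile(path)
      .zipWithIndex() // yields (line, index) pairs
      .map { case (line, idx) =>
        // Limit of 4 keeps any ';' inside the message; padTo guards short lines.
        val p = line.split(";", 4).padTo(4, "")
        (p(0), p(1), p(2), p(3), idx)
      }
      .toDF("systemname", "timestamp", "status", "message", "lineNo")
  }

  // Numbers the rows per system, ordered by timestamp first and by the
  // original line index second, so equal timestamps keep their file order.
  def withRowNumber(df: DataFrame): DataFrame = {
    val w = Window.partitionBy("systemname").orderBy("timestamp", "lineNo")
    df.withColumn("rn", row_number().over(w))
  }
}
```

With the two sample lines, the start row should receive rn = 1 and the recstart row rn = 2, because lineNo decides between equal timestamps. Two caveats: zipWithIndex triggers an extra Spark job when the RDD has more than one partition, and when several files are read at once the index follows partition order, so it may be safer to index each file separately.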