Having multiple Regex in Java 8 Stream to read text from Line
I want to use more than one regex, as shown below. How can I add them to the flatMap so that the matching values of each line are put into the list during a single stream read?
    // PatternStreamer is the helper class from the linked question's answer.
    static String retimestamp = "((?:2|1)\\d{3}(?:-|\\/)(?:(?:0[1-9])|(?:1[0-2]))(?:-|\\/)(?:(?:0[1-9])|(?:[1-2][0-9])|(?:3[0-1]))(?:T|\\s)(?:(?:[0-1][0-9])|(?:2[0-3])):(?:[0-5][0-9]):(?:[0-5][0-9]))";
    static String rehostname = "host=(\\\")((?:[a-z][a-z\\.\\d\\-]+)\\.(?:[a-z][a-z\\-]+))(?![\\w\\.])(\\\")";
    static String reservicetime = "service=(\\d+)ms";

    private static final PatternStreamer quoteregex1 = new PatternStreamer(retimestamp);
    private static final PatternStreamer quoteregex2 = new PatternStreamer(rehostname);
    private static final PatternStreamer quoteregex3 = new PatternStreamer(reservicetime);

    public static void main(String[] args) throws Exception {
        String infilename = "sample.log";
        String outfilename = "sample_output.log";

        try (Stream<String> stream = Files.lines(Paths.get(infilename))) {
            //stream.forEach(System.out::println);
            List<String> timestamp = stream.flatMap(quoteregex1::results)
                    .map(r -> r.group(1))
                    .collect(Collectors.toList());
            timestamp.forEach(System.out::println);
            //Files.write(Paths.get(outfilename), dataset);
        }
    }
This question is an extension of "Match a pattern and write the stream to a file using Java 8 Stream".
You can concatenate the streams:
    String infilename = "sample.log";
    String outfilename = "sample_output.log";

    try (Stream<String> stream = Files.lines(Paths.get(infilename))) {
        // Run all three patterns over each line and chain their match streams.
        List<String> timestamp = stream
                .flatMap(s -> Stream.concat(quoteregex1.results(s),
                        Stream.concat(quoteregex2.results(s), quoteregex3.results(s))))
                .map(r -> r.group(1))
                .collect(Collectors.toList());
        timestamp.forEach(System.out::println);
        //Files.write(Paths.get(outfilename), dataset);
    }
But note that this performs three individual searches through each line, which might imply not only lower performance but also that the order of the matches within one line will not reflect their actual occurrence. This doesn't seem to be an issue with your patterns, but individual searches also imply possible overlapping matches.
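To make the two caveats concrete, here is a minimal sketch. The class ConcatOrderDemo and its helper methods are made up for illustration, and the helper simply collects all matches of one pattern so that the example also runs on Java 8.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ConcatOrderDemo {
    // Hypothetical helper: collects all matches of a single pattern in one string.
    static Stream<String> find(Pattern p, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found.stream();
    }

    // Searches with each pattern separately and concatenates the results.
    static List<String> concatMatches(String input, Pattern first, Pattern second) {
        return Stream.concat(find(first, input), find(second, input))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Results are grouped by pattern, not ordered by position in the line.
        System.out.println(concatMatches("aa bb",
                Pattern.compile("b+"), Pattern.compile("a+"))); // [bb, aa]
        // Independent searches can report overlapping matches.
        System.out.println(concatMatches("hello",
                Pattern.compile("hell"), Pattern.compile("llo"))); // [hell, llo]
    }
}
```

In the first call the match for the first pattern is reported first although it occurs later in the input. In the second call both results cover the characters "ll".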
The PatternStreamer of that linked answer also greedily collects the matches of one string into an ArrayList before creating a stream. A Spliterator based solution like the one in this answer is preferable.
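As a rough idea of what a Spliterator based approach can look like for a single pattern, here is a minimal sketch. It is not the code of the linked answer, and the class name LazyMatches and its method of are made up for illustration. On Java 9 and newer, Matcher.results() already provides a stream of matches.

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class LazyMatches {
    // Hypothetical helper: streams the matches of one pattern lazily (Java 8 compatible).
    static Stream<MatchResult> of(Pattern pattern, CharSequence input) {
        Matcher m = pattern.matcher(input);
        Spliterator<MatchResult> sp = new Spliterators.AbstractSpliterator<MatchResult>(
                Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
            @Override
            public boolean tryAdvance(Consumer<? super MatchResult> action) {
                if (!m.find()) {
                    return false; // no further match: the stream ends here
                }
                // Hand out an immutable snapshot of the current match.
                action.accept(m.toMatchResult());
                return true;
            }
        };
        return StreamSupport.stream(sp, false);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("service=(\\d+)ms");
        System.out.println(of(p, "service=12ms service=7ms")
                .map(r -> r.group(1))
                .collect(Collectors.toList())); // [12, 7]
    }
}
```

Each match is only searched for when the stream asks for the next element, so a short-circuiting operation such as findFirst() stops the search early.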
Since numerical group references preclude simply combining the patterns in a (pattern1|pattern2|pattern3) manner, true streaming over the matches of multiple different patterns is a bit more elaborate:
    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.PriorityQueue;
    import java.util.Spliterators;
    import java.util.function.Consumer;
    import java.util.regex.MatchResult;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import java.util.stream.Stream;
    import java.util.stream.StreamSupport;

    public final class MultiPatternSpliterator
            extends Spliterators.AbstractSpliterator<MatchResult> {
        public static Stream<MatchResult> matches(String input, String... patterns) {
            return matches(input, Arrays.stream(patterns)
                    .map(Pattern::compile).toArray(Pattern[]::new));
        }
        public static Stream<MatchResult> matches(String input, Pattern... patterns) {
            return StreamSupport.stream(new MultiPatternSpliterator(patterns, input), false);
        }
        private Pattern[] pattern;
        private String input;
        private int pos;
        private PriorityQueue<Matcher> pendingmatches;

        MultiPatternSpliterator(Pattern[] p, String inputstring) {
            super(inputstring.length(), ORDERED|NONNULL);
            pattern = p;
            input = inputstring;
        }

        @Override
        public boolean tryAdvance(Consumer<? super MatchResult> action) {
            if(pendingmatches == null) {
                // First call: queue one matcher per pattern, ordered by match start.
                pendingmatches = new PriorityQueue<>(
                    pattern.length, Comparator.comparingInt(MatchResult::start));
                for(Pattern p: pattern) {
                    Matcher m = p.matcher(input);
                    if(m.find()) pendingmatches.add(m);
                }
            }
            MatchResult mr = null;
            do {
                Matcher m = pendingmatches.poll();
                if(m == null) return false;
                // Accept the match only if it does not overlap the previous one.
                if(m.start() >= pos) {
                    mr = m.toMatchResult();
                    pos = mr.end();
                }
                // Continue this pattern's search after the last accepted match.
                if(m.region(pos, m.regionEnd()).find()) pendingmatches.add(m);
            } while(mr == null);
            action.accept(mr);
            return true;
        }
    }
This facility allows you to match multiple patterns in a (pattern1|pattern2|pattern3) fashion while still having the original groups of each pattern. So when searching for hell and llo in hello, it will find hell and not llo. One difference is that there is no guaranteed order if more than one pattern matches at the same position.
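To illustrate why joining the patterns with | interferes with numerical group references, consider this small sketch. The class name CombinedGroupsDemo and the status=(\d+) pattern are hypothetical and only serve the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CombinedGroupsDemo {
    // Returns group(1) and group(2) of the first match, formatted as "g1/g2".
    static String firstTwoGroups(Pattern p, String input) {
        Matcher m = p.matcher(input);
        if (!m.find()) {
            return "no match";
        }
        return m.group(1) + "/" + m.group(2);
    }

    public static void main(String[] args) {
        // Used alone, each alternative would expose its value as group 1.
        Pattern combined = Pattern.compile("service=(\\d+)ms|status=(\\d+)");
        System.out.println(firstTwoGroups(combined, "service=12ms")); // 12/null
        System.out.println(firstTwoGroups(combined, "status=200"));   // null/200
    }
}
```

In the combined pattern the second alternative's value moves to group 2, so a uniform r.group(1) would return null for its matches. Keeping one Matcher per pattern avoids this renumbering.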
This can be used like this:
    // Compile the patterns once, outside the stream operation.
    Pattern[] p = Stream.of(retimestamp, rehostname, reservicetime)
            .map(Pattern::compile)
            .toArray(Pattern[]::new);

    try (Stream<String> stream = Files.lines(Paths.get(infilename))) {
        List<String> timestamp = stream
                .flatMap(s -> MultiPatternSpliterator.matches(s, p))
                .map(r -> r.group(1))
                .collect(Collectors.toList());
        timestamp.forEach(System.out::println);
        //Files.write(Paths.get(outfilename), dataset);
    }
While the overloaded method would allow you to use MultiPatternSpliterator.matches(s, retimestamp, rehostname, reservicetime), passing the pattern strings to create a stream, this should be avoided within a flatMap operation, as it would recompile every regex for every input line. That's why the code above compiles all patterns into an array first. Your original code does the same by instantiating the PatternStreamers outside the stream operation.