Having multiple regexes in a Java 8 Stream to read text from a line


I want to have more than one regex, as below. How can I add them to the flatMap iterator so that the matching values of each line are put into a list during a single stream read?

    static String retimestamp = "((?:2|1)\\d{3}(?:-|\\/)(?:(?:0[1-9])|(?:1[0-2]))(?:-|\\/)(?:(?:0[1-9])|(?:[1-2][0-9])|(?:3[0-1]))(?:t|\\s)(?:(?:[0-1][0-9])|(?:2[0-3])):(?:[0-5][0-9]):(?:[0-5][0-9]))";
    static String rehostname = "host=(\\\")((?:[a-z][a-z\\.\\d\\-]+)\\.(?:[a-z][a-z\\-]+))(?![\\w\\.])(\\\")";
    static String reservicetime = "service=(\\d+)ms";

    private static final patternstreamer quoteregex1 = new patternstreamer(retimestamp);
    private static final patternstreamer quoteregex2 = new patternstreamer(rehostname);
    private static final patternstreamer quoteregex3 = new patternstreamer(reservicetime);

    public static void main(String[] args) throws Exception {
        String infilename = "sample.log";
        String outfilename = "sample_output.log";
        try (Stream<String> stream = Files.lines(Paths.get(infilename))) {
            //stream.forEach(System.out::println);
            List<String> timestamp = stream.flatMap(quoteregex1::results)
                                           .map(r -> r.group(1))
                                           .collect(Collectors.toList());

            timestamp.forEach(System.out::println);
            //Files.write(Paths.get(outfilename), dataset);
        }
    }

This question is an extension of "Match pattern and write stream to a file using Java 8 Stream".

You can concatenate the streams:

    String infilename = "sample.log";
    String outfilename = "sample_output.log";
    try (Stream<String> stream = Files.lines(Paths.get(infilename))) {
        List<String> timestamp = stream
            .flatMap(s -> Stream.concat(quoteregex1.results(s),
                            Stream.concat(quoteregex2.results(s), quoteregex3.results(s))))
            .map(r -> r.group(1))
            .collect(Collectors.toList());

        timestamp.forEach(System.out::println);
        //Files.write(Paths.get(outfilename), dataset);
    }

But note that this performs three individual searches through each line. That might not only imply lower performance, but also that the order of the matches within one line will not reflect their actual occurrence. That doesn't seem to be an issue with your patterns, but individual searches also imply possible overlapping matches.
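To make the ordering and overlap caveats concrete, here is a small Java 8 sketch. `ConcatDemo.results` is a hypothetical stand-in for the linked answer's patternstreamer (it eagerly collects one line's matches into a list), the varargs loop is equivalent to the nested `Stream.concat` calls above, and the simplified patterns used below are assumptions, not the question's regexes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ConcatDemo {
    // Hypothetical stand-in for patternstreamer.results(s): collects all
    // matches of one pattern in one line, then streams them.
    static Stream<MatchResult> results(Pattern p, String line) {
        List<MatchResult> found = new ArrayList<>();
        Matcher m = p.matcher(line);
        while (m.find()) found.add(m.toMatchResult());
        return found.stream();
    }

    // One independent search per pattern, concatenated: results come out
    // grouped by pattern, not in the order they appear in the line.
    static List<String> concatGroup1(String line, Pattern... patterns) {
        Stream<MatchResult> all = Stream.empty();
        for (Pattern p : patterns) all = Stream.concat(all, results(p, line));
        return all.map(r -> r.group(1)).collect(Collectors.toList());
    }
}
```

With the patterns `(\d{4}-\d{2}-\d{2})`, `host="([^"]+)"` and `service=(\d+)ms` (in that order), the line `service=42ms host="a.example.com" at 2017-01-01` yields `[2017-01-01, a.example.com, 42]`, the reverse of the order in the line. Searching `hello` for `(hell)` and `(llo)` reports both matches, even though they overlap.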

The patternstreamer of the linked answer greedily collects the matches of one string into an ArrayList before creating a stream. A Spliterator-based solution, like the one in this answer, is preferable.
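A minimal sketch of what "Spliterator-based" means for a single pattern on Java 8: each `tryAdvance` call performs one `Matcher.find()`, so matches are produced on demand instead of being buffered in an ArrayList first. The class name `LazyMatches` is hypothetical.

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Lazy, single-pattern match stream: no intermediate list is built.
final class LazyMatches extends Spliterators.AbstractSpliterator<MatchResult> {
    private final Matcher matcher;

    private LazyMatches(Matcher matcher) {
        // Size is unknown up front, so report Long.MAX_VALUE as the estimate.
        super(Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL);
        this.matcher = matcher;
    }

    static Stream<MatchResult> results(Pattern pattern, String input) {
        return StreamSupport.stream(new LazyMatches(pattern.matcher(input)), false);
    }

    @Override
    public boolean tryAdvance(Consumer<? super MatchResult> action) {
        if (!matcher.find()) return false;
        // toMatchResult() takes a snapshot that stays valid after further find() calls
        action.accept(matcher.toMatchResult());
        return true;
    }
}
```

On Java 9 and later, `Matcher.results()` provides a `Stream<MatchResult>` out of the box, so a helper like this is mainly useful on Java 8.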

Since numerical group references preclude combining the patterns in a (pattern1|pattern2|pattern3) manner, true streaming over the matches of multiple different patterns is a bit more elaborate:
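The group-numbering problem can be seen in a small sketch. Joining two simplified stand-ins for reservicetime and rehostname (assumptions, not the question's exact regexes) into one alternation makes the host's capturing group number 2, so a uniform `r.group(1)` yields `null` whenever the second alternative matched. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CombinedGroups {
    // Simplified stand-ins joined with "|": in the combined pattern,
    // the service time is group 1 and the host is group 2.
    static final Pattern COMBINED =
        Pattern.compile("service=(\\d+)ms|host=\"([^\"]+)\"");

    // Returns group(1) of every match; null where the host alternative matched.
    static List<String> group1OfEachMatch(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = COMBINED.matcher(line);
        while (m.find()) out.add(m.group(1));
        return out;
    }
}
```

For `host="a.example.com" service=42ms` this returns `[null, 42]`: the hostname is only reachable as group 2.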

    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.PriorityQueue;
    import java.util.Spliterators;
    import java.util.function.Consumer;
    import java.util.regex.MatchResult;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import java.util.stream.Stream;
    import java.util.stream.StreamSupport;

    public final class multipatternspliterator extends Spliterators.AbstractSpliterator<MatchResult> {
        public static Stream<MatchResult> matches(String input, String... patterns) {
            return matches(input, Arrays.stream(patterns)
                    .map(Pattern::compile).toArray(Pattern[]::new));
        }
        public static Stream<MatchResult> matches(String input, Pattern... patterns) {
            return StreamSupport.stream(new multipatternspliterator(patterns, input), false);
        }
        private Pattern[] pattern;
        private String input;
        private int pos;
        private PriorityQueue<Matcher> pendingmatches;

        multipatternspliterator(Pattern[] p, String inputstring) {
            super(inputstring.length(), ORDERED|NONNULL);
            pattern = p;
            input = inputstring;
        }

        @Override
        public boolean tryAdvance(Consumer<? super MatchResult> action) {
            if(pendingmatches == null) {
                // first call: one matcher per pattern, queued by the start of its next match
                pendingmatches = new PriorityQueue<>(
                    pattern.length, Comparator.comparingInt(MatchResult::start));
                for(Pattern p: pattern) {
                    Matcher m = p.matcher(input);
                    if(m.find()) pendingmatches.add(m);
                }
            }
            MatchResult mr = null;
            do {
                Matcher m = pendingmatches.poll(); // leftmost pending match of any pattern
                if(m == null) return false;
                if(m.start() >= pos) { // skip matches overlapping the last reported one
                    mr = m.toMatchResult();
                    pos = mr.end();
                }
                // continue searching this pattern from the current position
                if(m.region(pos, m.regionEnd()).find()) pendingmatches.add(m);
            } while(mr == null);
            action.accept(mr);
            return true;
        }
    }

This facility allows matching multiple patterns in a (pattern1|pattern2|pattern3) fashion while still having the original groups of each pattern. So when searching for hell and llo in hello, it will find hell and not llo. A difference is that there is no guaranteed order if more than one pattern matches at the same position.
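For comparison, this is what a single alternation does with the hell/llo example: it scans left to right, never reports overlapping matches, and when two alternatives match at the same position it always prefers the one listed first. That last point is the guarantee the spliterator does not give. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AlternationSemantics {
    // Collects the full text of every match of one (possibly combined) regex.
    static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) out.add(m.group());
        return out;
    }
}
```

`allMatches("hell|llo", "hello")` returns `[hell]`, and `allMatches("hel|hello", "hello")` returns `[hel]` because the first listed alternative wins at position 0.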

This can be used like this:

    Pattern[] p = Stream.of(retimestamp, rehostname, reservicetime)
            .map(Pattern::compile)
            .toArray(Pattern[]::new);
    try (Stream<String> stream = Files.lines(Paths.get(infilename))) {
        List<String> timestamp = stream
            .flatMap(s -> multipatternspliterator.matches(s, p))
            .map(r -> r.group(1))
            .collect(Collectors.toList());

        timestamp.forEach(System.out::println);
        //Files.write(Paths.get(outfilename), dataset);
    }

While the overloaded method would allow using multipatternspliterator.matches(s, retimestamp, rehostname, reservicetime) with the pattern strings to create the stream, this should be avoided within a flatMap operation, as it would recompile every regex for every input line. That's why the code above compiles the patterns into an array first. Your original code does the same by instantiating the patternstreamers outside the stream operation.
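One detail affects the `.map(r -> r.group(1))` step in both versions above: in the question's rehostname, the quotes are themselves capturing groups, so for host matches group 1 is the opening `"` and the hostname is group 2. A minimal check, assuming a log line in the format shown below and a variant pattern with non-capturing quotes (introduced here for illustration, not taken from the question):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HostGroupCheck {
    // The question's hostname pattern as given: quotes are capturing groups 1 and 3.
    static final String REHOSTNAME_ORIGINAL =
        "host=(\\\")((?:[a-z][a-z\\.\\d\\-]+)\\.(?:[a-z][a-z\\-]+))(?![\\w\\.])(\\\")";
    // Illustrative variant (assumption): quotes not captured, so the hostname is group 1.
    static final String REHOSTNAME_GROUP1 =
        "host=\\\"((?:[a-z][a-z\\.\\d\\-]+)\\.(?:[a-z][a-z\\-]+))(?![\\w\\.])\\\"";

    // Returns group 1 of the first match, or null if the pattern does not match.
    static String group1(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group(1) : null;
    }
}
```

For `2017-01-01 10:00:00 host="web01.example.com" service=42ms`, the original pattern's group 1 is `"`, while the variant's group 1 is `web01.example.com`.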

