regex - What is the search flow of a pattern match?
My string is:
$s = "aaataatagcav";
Pattern 1:
$s =~ m/at?.?a/g;
Here t? fails to match, and .? matches the second character of the string, so a matches and the result is (aaa).
Pattern 2:
$s =~ m/a.?t?a/g;
Here .? matches the second character and t? fails to match, so the result is the same.
Here is my doubt:
$s =~ m/a.?t?aa/;
From the beginning, a matches the first character of the string. .? matches one character or none; it matched the second character in pattern 1 and pattern 2. t? matches one t or none. aa matches the characters aa in the string.
Why won't the above pattern match aataa or ataa? How does the search engine work? Why is the result aaa?
There's a nice, easy way to see what a regex is doing:
use re 'debug';
E.g.:
#!/usr/bin/env perl
use strict;
use warnings;
use re 'debug';

my $str = 'aaataatagcav';
$str =~ m/a.?t?aa/;
This will print:
Compiling REx "a.?t?aa"
Final program:
   1: EXACT <a> (3)
   3: CURLY {0,1} (6)
   5:   REG_ANY (0)
   6: CURLY {0,1} (10)
   8:   EXACT <t> (0)
  10: EXACT <aa> (12)
  12: END (0)
anchored "a" at 0 floating "aa" at 1..3 (checking floating) minlen 3
Matching REx "a.?t?aa" against "aaataatagcav"
Intuit: trying to determine minimum start position...
  doing 'check' fbm scan, [1..12] gave 1
  Found floating substr "aa" at offset 1 (rx_origin 0)...
  doing 'other' fbm scan, [0..1] gave 0
  Found anchored substr "a" at offset 0 (rx_origin 0)...
  (multiline anchor test skipped)
Intuit: Successfully guessed: match at offset 0
   0 <> <aaataatagc>         |  1:EXACT <a>(3)
   1 <a> <aataatagca>        |  3:CURLY {0,1}(6)
                                  REG_ANY can match 1 times out of 1...
   2 <aa> <ataatagcav>       |  6:  CURLY {0,1}(10)
                                    EXACT <t> can match 0 times out of 1...
   2 <aa> <ataatagcav>       | 10:    EXACT <aa>(12)
                                      failed...
                                    failed...
   1 <a> <aataatagca>        |  6:  CURLY {0,1}(10)
                                  EXACT <t> can match 0 times out of 1...
   1 <a> <aataatagca>        | 10:    EXACT <aa>(12)
   3 <aaa> <taatagcav>       | 12:    END(0)
Match successful!
Freeing REx: "a.?t?aa"
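So, to answer your question: it matched against aaa at the start, because you have an optional .?, which means zero characters is valid, and an optional t?, which means none is valid there too. The engine tries start positions from left to right and stops at the first one where the whole pattern can succeed, so it never gets as far as aataa (which starts at offset 1) or ataa (which starts at offset 2).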
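To check where the match starts, here is a minimal sketch. It assumes the same string as the question; the variable names ($string, $first, @forced) are mine, not from the original post. It first runs the unanchored match, then uses pos() together with \G to force the attempt to begin at offsets 1 and 2 instead.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $string = 'aaataatagcav';

# Unanchored: the engine starts at offset 0 and stops at the first success.
my ($first) = $string =~ m/(a.?t?aa)/;
my $first_offset = $-[0];    # @- holds the start offset of the last match
print "unanchored: '$first' at offset $first_offset\n";

# Force the attempt to begin at offsets 1 and 2 instead.
my @forced;
for my $start ( 1, 2 ) {
    pos($string) = $start;               # \G anchors at this position
    if ( $string =~ m/\G(a.?t?aa)/g ) {
        push @forced, $1;
        print "forced start $start: '$1'\n";
    }
}
```

The unanchored match reports aaa at offset 0, while the forced starts give aataa and ataa. So the pattern can match those substrings; the engine just finds an earlier match first.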
It is therefore taking the first substring that matches the target pattern. Doing so (by default!) consumes that piece of the string, so the pattern cannot match against it again. Take your first example:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use re 'debug';

my $s = 'aaataatagcav';
my @matches = $s =~ m/(at?.?a)/g;
print Dumper \@matches;
Gives:
$VAR1 = [
          'aaa',
          'aa'
        ];
This puts the first match at aaa and the second at aa in your string: (aaa)t(aa)tagcav. Because a substring has matched, it's no longer 'available' for the pattern matching engine to use. You can match it again if you need to, but if you're going down that line I'd suggest thinking hard about what you're trying to do with the regex. You can use "look around" matching for this.
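Both points can be sketched in a few lines. This is an illustration under my own assumptions, not code from the original answer: the variable names are mine, and the lookahead is applied to the a.?t?aa pattern from the question. The first loop prints where each /g match starts and where the next search resumes. The second match wraps the pattern in a zero-width lookahead with a capture group, so nothing is consumed and overlapping candidates are reported.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $string = 'aaataatagcav';

# 1. Plain /g: each search resumes where the previous match ended.
my ( @consumed, @offsets );
while ( $string =~ m/(at?.?a)/g ) {
    push @consumed, $1;
    push @offsets,  $-[0];
    printf "matched '%s' at offset %d, next search starts at %d\n",
        $1, $-[0], pos($string);
}

# 2. Lookahead: the match itself is zero-width, so the engine moves on
#    one character at a time and the capture group can overlap earlier ones.
my @overlapping = $string =~ m/(?=(a.?t?aa))/g;
print "overlapping: @overlapping\n";
```

The loop reports aaa at offset 0 (next search from 3) and aa at offset 4 (next search from 6). The lookahead version returns aaa, aataa and ataa, one per start position where the pattern can succeed.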
Also: it's generally bad form to use single-character variable names. Call it something descriptive.