regex - What is the search flow of a pattern match?


My string is:

$s = "aaataatagcav"; 

Pattern 1:

$s =~m/at?.?a/g;  

Here t? matches nothing, .? matches the second character of the string (a[a]a), and the final a matches the third character (aa[a]).

Pattern 2:

$s =~m/a.?t?a/g 

Here .? matches the second character and t? matches nothing. The result is the same.

Here is my doubt:

$s =~m/a.?t?aa/ 

From the beginning, a matches the first character of the string.

.? matches one character of the string, or none. It matched the second character in pattern 1 and pattern 2.

t? matches one t, or none.

aa matches the characters aa in the string.

Why doesn't the above pattern match aataa or ataa? How does the search engine work, and why is the result aaa?

There's a nice easy way to see what the regex is doing:

 use re 'debug'; 

e.g.:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    # Print the compiled program and the matching trace to STDERR.
    use re 'debug';

    my $str = 'aaataatagcav';

    $str =~ m/a.?t?aa/;

This will print:

    Compiling REx "a.?t?aa"
    Final program:
       1: EXACT <a> (3)
       3: CURLY {0,1} (6)
       5:   REG_ANY (0)
       6: CURLY {0,1} (10)
       8:   EXACT <t> (0)
      10: EXACT <aa> (12)
      12: END (0)
    anchored "a" at 0 floating "aa" at 1..3 (checking floating) minlen 3
    Matching REx "a.?t?aa" against "aaataatagcav"
    Intuit: trying to determine minimum start position...
      doing 'check' fbm scan, [1..12] gave 1
      Found floating substr "aa" at offset 1 (rx_origin 0)...
      doing 'other' fbm scan, [0..1] gave 0
      Found anchored substr "a" at offset 0 (rx_origin 0)...
      (multiline anchor test skipped)
    Intuit: Successfully guessed: match at offset 0
       0 <> <aaataatagc>         |  1:EXACT <a>(3)
       1 <a> <aataatagca>        |  3:CURLY {0,1}(6)
                                      REG_ANY can match 1 times out of 1...
       2 <aa> <ataatagcav>       |  6:  CURLY {0,1}(10)
                                        EXACT <t> can match 0 times out of 1...
       2 <aa> <ataatagcav>       | 10:    EXACT <aa>(12)
                                          failed...
                                        failed...
       1 <a> <aataatagca>        |  6:  CURLY {0,1}(10)
                                        EXACT <t> can match 0 times out of 1...
       1 <a> <aataatagca>        | 10:    EXACT <aa>(12)
       3 <aaa> <taatagcav>       | 12:    END(0)
    Match successful!
    Freeing REx: "a.?t?aa"

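So, to answer your question: it matched against aaa at the start, because you have an optional character (.?), which means zero characters is valid, and an optional t?, which means none is valid there too. The trace shows the engine first letting .? take the second a, failing to find aa at offset 2, and then backtracking so that .? matches nothing, after which aa matches at offset 1.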
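To see that the leftmost start position wins, here is a minimal sketch that runs the same pattern twice: once normally, and once with the search forced to begin at offset 1 via pos(). The variable names ($string, @report) and the added capture group are only for illustration and are not from the question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $string = 'aaataatagcav';
my @report;

# Default search: the engine tries start offset 0 first, and succeeds there.
if ( $string =~ m/(a.?t?aa)/ ) {
    push @report, sprintf( "default search: '%s' at %d..%d", $1, $-[0], $+[0] );
}

# Force the next /g search to begin at offset 1 instead of 0.
pos($string) = 1;
if ( $string =~ m/(a.?t?aa)/g ) {
    push @report, sprintf( "from offset 1: '%s' at %d..%d", $1, $-[0], $+[0] );
}

print "$_\n" for @report;
# default search: 'aaa' at 0..3
# from offset 1: 'aataa' at 1..6
```

So aataa is a perfectly valid match for the pattern, but the engine never gets as far as trying offset 1, because offset 0 already succeeded after backtracking.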

It is therefore taking the first substring that matches the target pattern. Doing so (by default!) consumes that piece of the string, so it cannot be matched against again. Take your first example:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Data::Dumper;

    use re 'debug';

    my $s = 'aaataatagcav';

    # In list context with /g, the capture group returns every match.
    my @matches = $s =~ m/(at?.?a)/g;

    print Dumper \@matches;

This gives:

    $VAR1 = [
              'aaa',
              'aa'
            ];

This puts the first match as aaa and the second as aa in your string: (aaa)t(aa)tagcav. Because a substring has already matched, it's no longer 'available' for the pattern matching engine to use. You can make it available if you need to, but if you're going down that line I'd suggest thinking hard about what you're trying to do with the regex. You can use "look around" matching.
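As a rough sketch of the "look around" approach, wrapping the pattern in a zero-width lookahead with a capture group inside lets the engine report what at?.?a would match from every start offset, without consuming anything. The names $string and @overlapping are illustrative only, and whether overlapping matches are what you actually want is an assumption on my part.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $string = 'aaataatagcav';

# The lookahead (?=...) is zero-width, so no characters are consumed.
# The capture group inside it records what at?.?a matches at each offset
# where the pattern can match at all.
my @overlapping = $string =~ m/(?=(at?.?a))/g;

print join( ' ', @overlapping ), "\n";
# aaa aa ataa aa ata
```

Compare this with the two non-overlapping results above: the extra entries come from start offsets that the plain /g search skipped because they fell inside an earlier match.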

Also: it's generally bad form to use single-character variable names. Call them something descriptive.

