Restructure CSV data with Notepad++, Regex -
i have csv file following headers , (sample) data:
stopname,routename,travel_direction,latitude,longitude streeta @ streetb,1 namea,directiona,lat,long streetc @ streetd,1 namea,directiona,lat,long ... streete @ streetf,1 namea,directionb,lat,long streetg @ streeth,1 namea,directionb,lat,long ... streeti @ streetj,2 nameb,directionc,lat,long streetk @ streetl,2 nameb,directionc,lat,long ... streetm @ streetn,2 nameb,directiond,lat,long streeto @ streetp,2 nameb,directiond,lat,long . . .
i wanting use regex (currently in notepad++) following results:
1 namea - directiona=[[streeta @ streetb,[lat,long]], [streetc @ streetd,[lat,long]], ...] 1 namea - directionb=[[streetd @ streete,[lat,long]], [streetf @ streetg,[lat,long]], ...] 2 nameb - directionc=[[streeth @ streeti,[lat,long]], [streetj @ streetk,[lat,long]], ...] 2 nameb - directiond=[[streetl @ streetm,[lat,long]], [streetn @ streeto,[lat,long]], ...] . . .
with regex , substitution,
rgx: ^([^,]*),([^,]*),([^,]*),(.*) sub: $2 - $3=[$1,[\4]] demo: https://regex101.com/r/gs9hd6/1
i have gotten far:
1 namea - directiona=[streeta @ streetb,[lat,long]] 1 namea - directiona=[streetc @ streetd,[lat,long]] 1 namea - directionb=[streete @ streetf,[lat,long]] 1 namea - directionb=[streetg @ streeth,[lat,long]] 2 nameb - directionc=[streeti @ streetj,[lat,long]] 2 nameb - directionc=[streetk @ streetl,[lat,long]] 2 nameb - directiond=[streetm @ streetn,[lat,long]] 2 nameb - directiond=[streeto @ streetp,[lat,long]]
in new regex, tried splitting above result on "=", didn't know go there.
i think 1 way desired results keep first unique instance of what's before "=", replace new line "," , enclose [..] make array form.
edit: there 10k stops (total), 100 unique routes.
edit 2: (maybe asking many changes now)
for first regex:
- what if want use "\n" instead of "="?
at beginning of 2nd regex replacement,
- what if have routename , stopname columns, this:
1 namea - directiona=[streeta @ streetb, ...]
? - similarly, if have routename , coordinates, this:
1 namea - directiona=[[lat,long]]
?
steps
1. first replacement:
- find what:
^([^,]*),([^,]*),([^,]*),(.*)
- replace with:
\2 - \3=[[\1,[\4]]]
- replace all
2. second replacement:
- find what:
^[\s\s]*?^([^][]*=)\[\[.*\]\]\k\]\r\1\[(.*)\]$
- replace with:
, \2]
- replace all
3. repeat step 2 until there no more occurences.
- this means if there 100 instances (stops) same key (route - direction pair), have click replace all 7 times (
ceiling(log2(n))
).
description
i modified regex in step 1 add pair of brackets enclose whole set.
for step 2, finds pair of lines same direction, appending last previous one.
^[\s\s]*?^([^][]*=) #group 1: captures "1 namea - dira=" \[\[.*\]\] #matches set of stops - "[[sta @ stb,[lat,long]], ..." \k #keeps text matched far out of match \]\r #closing "]" , newline \1 #match next line (if same route) \[(.*)\]$ #and capture stop (group 2)
Comments
Post a Comment