python - Regular Expression to Find C-Style Comments


I am trying to write a regular expression to find C-style headers in Java source files. At the present time I am experimenting with Python.

Here is the source code:

```python
import re

text = """/*
       * copyright blah blah blha blah
       * blah blah blah blah
       * 2008 blah blah blah @ org
       */"""

print()
print("I guess the program printed the correct thing.")

pattern = re.compile("^/.+/$")

print("-----------")
print(pattern)

# Search repeatedly, starting each search where the previous match ended.
pos = 0
while True:
    match = pattern.search(text, pos)
    if not match:
        break
    s = match.start()
    e = match.end()
    # Start offset, offset of the last matched character, and the match itself.
    print('   %2d : %2d = "%s"' % (s, e - 1, text[s:e]))
    pos = e
```
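The manual `search`/`pos` loop can also be written with `re.finditer`, which walks the string left to right and also copes with zero-width matches (a hand-written loop that sets `pos = e` would spin forever on an empty match, because `e` never advances). This is a minimal sketch; `find_spans` is a hypothetical helper name, and the text is a shortened copy of the question's:

```python
import re


def find_spans(pattern, text):
    """Return (start, last_index, matched_text) for each non-overlapping match.

    Mirrors what the loop prints: the start offset, the offset of the last
    matched character (end - 1), and the matched substring.
    """
    return [(m.start(), m.end() - 1, m.group()) for m in re.finditer(pattern, text)]


text = """/*
       * copyright blah blah blha blah
       */"""

# The question's pattern finds nothing in the multi-line text...
print(find_spans(r"^/.+/$", text))  # []
# ...but does match when the whole comment sits on a single line.
print(find_spans(r"^/.+/$", "/* one line */"))  # [(0, 13, '/* one line */')]
```

That the same pattern succeeds on a one-line comment suggests the newlines, not the slashes, are what trips it up.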

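I am trying to write a simple expression that matches everything between a forward slash and a forward slash. I can make the regular expression more complicated later.

Does anyone know where I am going wrong? I am using the caret for the start, a forward slash, the dot meta-character, the plus symbol for one or more things, another forward slash, and the dollar symbol for the end.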

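I don't think you should anchor the match (using '^' and '$'). Without re.MULTILINE, '^' and '$' match only at the very start and end of the whole string, so a comment with any other text before or after it can never be found.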
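To see which part of the original pattern is to blame, the sketch below tries three variants on a comment spread over several lines (the sample text is modeled on the question's). In Python, `.` does not match a newline unless `re.DOTALL` is given, and that matters as much as the anchors here:

```python
import re

text = """/*
       * copyright blah blah blha blah
       * 2008 blah blah blah @ org
       */"""

# 1. The question's pattern: anchored, and '.' cannot cross the newlines.
anchored = re.search(r"^/.+/$", text)

# 2. Anchors removed: still no match, because no single line holds two slashes.
unanchored = re.search(r"/.+/", text)

# 3. Anchors removed and DOTALL on: '.' now spans lines. '.+' is greedy,
#    so the match runs to the LAST slash in the string.
dotall = re.search(r"/.+/", text, re.DOTALL)

print(anchored)                # None
print(unanchored)              # None
print(dotall.group() == text)  # True
```

With `re.DOTALL` even the anchored pattern matches this particular string, because it happens to begin and end with a slash; the anchors only hurt once there is other text around the comment, as in the session below.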

Secondly, I think the regex should be r"/[^/]*/", which matches a (portion of a) string that starts with a slash, is followed by zero or more non-slash characters, and terminates with a slash.

To wit:

```
>>> import re
>>> text = """foo bar baz
... /*
...        * copyright blah blah blha blah 
...        * blah blah blah blah 
...        * 2008 blah blah blah @ org
...        */"""
>>> rx = re.compile(r"/[^/]*/", re.DOTALL)
>>> mo = rx.search(text)
>>> text[mo.start(): mo.end()]
'/*\n       * copyright blah blah blha blah \n       * blah blah blah blah \n       * 2008 blah blah blah @ org\n       */'
```

Note that the comment does not start at the beginning of the string, yet the regex finds it nicely.
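A few observations on the session above, as a Python 3 sketch with made-up sample strings. `re.DOTALL` only changes what `.` matches, and `/[^/]*/` contains no `.`; a negated class such as `[^/]` already matches newlines, so the flag is harmless but unnecessary. `finditer` picks up every match rather than just the first. And because `[^/]*` stops at the first slash it meets, the pattern really matches the text between any two slashes, so a division operator before a comment or a URL inside one throws it off. For comparison, the block also tries the commonly used non-greedy pattern `/\*.*?\*/` with `re.DOTALL`, which keys on the `/*` and `*/` delimiters; that pattern is a swapped-in alternative, not part of this answer.

```python
import re

slash_pair = re.compile(r"/[^/]*/")  # the answer's pattern, without DOTALL
# Swapped-in alternative: from "/*" to the nearest "*/"; DOTALL lets '.' cross newlines.
c_comment = re.compile(r"/\*.*?\*/", re.DOTALL)

source = """foo bar baz
/*
 * first comment
 */
int x = 1;
/* second comment */"""

# DOTALL makes no difference here: [^/] already matches '\n'.
same = (slash_pair.search(source).group()
        == re.compile(r"/[^/]*/", re.DOTALL).search(source).group())
print(same)  # True

# finditer returns every non-overlapping match, not just the first.
print([m.group() for m in slash_pair.finditer(source)])

# Where slash-pair matching goes wrong: the first two slashes are not a comment.
division = "x = a / b; /* half */"
url = "/* see http://example.com */"
print(slash_pair.search(division).group())  # / b; /
print(c_comment.search(division).group())   # /* half */
print(slash_pair.search(url).group())       # /* see http:/
print(c_comment.search(url).group())        # /* see http://example.com */
```

Neither regex understands Java string literals, so a `"/*"` inside a string will still be taken as the start of a comment.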
