html - RegEx match open tags except XHTML self-contained tags


I need to match all of these opening tags:

<p> <a href="foo"> 

But not these:

<br /> <hr class="foo" /> 

I came up with this and wanted to make sure I've got it right. I am only capturing the a-z.

<([a-z]+) *[^/]*?> 

I believe it says:

  • Find a less-than, then
  • Find (and capture) a-z one or more times, then
  • Find zero or more spaces, then
  • Find any character zero or more times, greedy, except /, then
  • Find a greater-than
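To check that reading, here is a minimal sketch in Python that runs the pattern from the question against the sample tags. The helper name `tag_names` and the last two inputs (a URL containing a slash, and a tag inside a comment) are illustrative assumptions, not part of the question. Note that `*?` is a lazy quantifier in Python's `re`, not a greedy one as the list above describes; for this pattern it makes no difference to which tags match.

```python
import re

# Pattern copied verbatim from the question.
OPEN_TAG = re.compile(r'<([a-z]+) *[^/]*?>')


def tag_names(text):
    """Return the captured tag name for every match of OPEN_TAG in text."""
    return OPEN_TAG.findall(text)


# The tags that should match.
print(tag_names('<p> <a href="foo">'))

# The self-closing tags that should not match.
print(tag_names('<br /> <hr class="foo" />'))

# Hypothetical edge case: a slash inside an attribute value blocks the match.
print(tag_names('<a href="http://example.com/">'))

# Hypothetical edge case: a tag inside a comment is still matched.
print(tag_names('<!-- <p> -->'))
```

The pattern behaves as intended on the four sample tags. The two extra inputs show where it goes wrong: it misses an opening tag whose attribute contains a slash, and it reports a tag that is only comment text.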

Do I have that right? And more importantly, what do you think?

You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML. As I have answered in HTML-and-regex questions here so many times before, the use of regex will not allow you to consume HTML. Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML. HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts. So many times, but it is not getting to me. Even enhanced irregular regular expressions as used by Perl are not up to the task of parsing HTML. You will never make me crack. HTML is a language of sufficient complexity that it cannot be parsed by regular expressions. Even Jon Skeet cannot parse HTML using regular expressions. Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp. Parsing HTML with regex summons tainted souls into the realm of the living. HTML and regex go together like love, marriage, and ritual infanticide. The <center> cannot hold, it is too late. The force of regex and HTML together in the same conceptual space will destroy your mind like so much watery putty. If you parse HTML with regex you are giving in to Them and their blasphemous ways, which doom us all to inhuman toil for the One whose Name cannot be expressed in the Basic Multilingual Plane, he comes. HTML-plus-regexp will liquify the nerves of the sentient whilst you observe, your psyche withering in the onslaught of horror.
rege̿̔̉x-based html parsers cancer killing stackoverflow it late late cannot saved trangession of chi͡ld ensures regex consume living tissue (except html cannot, prophesied) dear lord how can survive scourge using regex parse html has doomed humanity eternity of dread torture , security holes using regex tool process html establishes breach between world , dread realm of c͒ͪo͛ͫrrupt entities (like sgml entities, more corrupt) mere glimpse of world of reg​ex parsers html ins​tantly transport programmer's consciousness into world of ceaseless screaming, comes, pestilent slithy regex-infection wil​l devour ht​ml parser, application , existence time visual basic worse he comes comes do not fi​ght he com̡e̶s, ̕h̵i​s un̨ho͞ly radiańcé destro҉ying enli̍̈́̂̈́ghtenment, html tags lea͠ki̧n͘g fr̶ǫm ̡yo​͟ur eye͢s̸ ̛l̕ik͏e liq​uid pain, song of re̸gular exp​ression parsing will exti​nguish voices of mor​tal man sp​here can see can see ̲͚̖͔̙î̩́t̲͎̩̱͔́̋̀ beautiful t​he final snuffing of lie​s of man loś͖̩͇̗̪̏̈́t all i​s lost the pon̷y comes c̶̮omes comes the ich​or permeates all face face ᵒh god no no noo̼o​o nΘ stop the an​*̶͑̾̾​̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e n​ot rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ za̡͊͠͝lgΌ isͮ̂҉̯͈͕̹̘̱ to͇̹̺ͅƝ̴ȳ̳ th̘ë͖́̉ ͠p̯͍̭o̚​n̐y̡ h̸̡̪̯ͨ͊̽̅̾̎ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬc̷̙̲̝͖ͭ̏ͥͮ͟oͮ͏̮̪̝͍m̲̖͊̒ͪͩͬ̚̚͜ȇ̴̟̟͙̞ͩ͌͝s̨̥̫͎̭ͯ̿̔̀ͅ


Have you tried using an XML parser instead?
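As a rough sketch of the parser-based approach, the snippet below collects opening tags while skipping self-closed ones. It swaps in Python's standard-library `html.parser.HTMLParser` rather than a strict XML parser, because the sample fragment is not well-formed XML; that substitution, the class name `OpenTagCollector`, and the helper `open_tag_names` are assumptions made for illustration, not something the answer specifies.

```python
from html.parser import HTMLParser


class OpenTagCollector(HTMLParser):
    """Collect names of opening tags, ignoring XHTML-style self-closed tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # Called for ordinary opening tags such as <p> or <a href="foo">.
        self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Called for self-closed tags such as <br />. The default
        # implementation forwards to handle_starttag, so override it
        # to ignore these tags instead.
        pass


def open_tag_names(markup):
    """Return the names of all non-self-closed opening tags in markup."""
    collector = OpenTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.open_tags


print(open_tag_names('<p> <a href="foo"> <br /> <hr class="foo" />'))
print(open_tag_names('<a href="http://example.com/"> <!-- <p> -->'))
```

Because the parser tracks attribute values and comments itself, a slash inside an attribute does not hide the tag and a tag inside a comment is not reported. A bare `<br>` without the trailing slash would still be reported as an opening tag in this sketch.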


Moderator's Note

This post is locked to prevent inappropriate edits to its content. The post looks exactly as it is supposed to look - there are no problems with its content. Please do not flag it for our attention.

