java - Regex to remove email address from html


I have HTML input from which I need to remove an email address. The problem is that the email address does not sit inside a single div; it is split across multiple divs. Please find sample input below.

div  class="p" id="p9" style="top:89.17999pt;left:430.7740pt;font-family:times new roman;font-size:1.0pt;">hello</div> div class="p" id="p10" style="top:89.17999pt;left:484.100pt;font-family:times new roman;font-size:1.0pt;">.</div> div class="p" id="p11" style="top:89.17999pt;left:487.100pt;font-family:times new roman;font-size:1.0pt;">p</div> <div class="p" id="p1" style="top:89.17999pt;left:493.9300pt;font-family:times new roman;font-size:1.0pt;">@</div> div class="p" id="p13" style="top:89.17999pt;left:0.09003pt;font-family:times new roman;font-size:1.0pt;">gmail</div> div class="p" id="p" style="top:89.17999pt;left:33.18pt;font-family:times new roman;font-size:1.0pt;">.</div> <div class="r" style="left:79.84pt;bottom:9.pt;width:479.98004pt;height:1.71997pt;background-color:#d9d9d9;">&nbsp;</div> div class="p" id="p1" style="top:89.17999pt;left:3.18pt;font-family:times new roman;font-size:1.0pt;">com</div>" 

The regex I am using, [a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,6}, only matches an email address in the standard, contiguous format, so it finds nothing here. Any help is appreciated.

Edit: I removed the opening "<" from some of the div start tags because the page was parsing them as markup instead of showing them as text.

This worked for me.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String text = "div  class=\"p\" id=\"p9\" style=\"top:89.17999pt;left:430.7740pt;font-family:times new roman;font-size:1.0pt;\">hello</div>\n"
                + "div class=\"p\" id=\"p10\" style=\"top:89.17999pt;left:484.100pt;font-family:times new roman;font-size:1.0pt;\">.</div>\n"
                + "div class=\"p\" id=\"p11\" style=\"top:89.17999pt;left:487.100pt;font-family:times new roman;font-size:1.0pt;\">p</div>\n"
                + "<div class=\"p\" id=\"p1\" style=\"top:89.17999pt;left:493.9300pt;font-family:times new roman;font-size:1.0pt;\">@</div>\n"
                + "div class=\"p\" id=\"p13\" style=\"top:89.17999pt;left:0.09003pt;font-family:times new roman;font-size:1.0pt;\">gmail</div>\n"
                + "div class=\"p\" id=\"p\" style=\"top:89.17999pt;left:33.18pt;font-family:times new roman;font-size:1.0pt;\">.</div>\n"
                + "<div class=\"r\" style=\"left:79.84pt;bottom:9.pt;width:479.98004pt;height:1.71997pt;background-color:#d9d9d9;\">&nbsp;</div>\n"
                + "div class=\"p\" id=\"p1\" style=\"top:89.17999pt;left:3.18pt;font-family:times new roman;font-size:1.0pt;\">com</div>\"";

        StringBuilder sb = new StringBuilder();
        // One div per line, so process the input line by line.
        String[] tokens = text.split("\n");

        // Capture whatever sits between the end of the start tag and "</div".
        Pattern p = Pattern.compile(".*>(.*)</div.*");

        for (String line : tokens) {
            Matcher m = p.matcher(line);
            if (m.matches()) {
                sb.append(m.group(1));
            }
        }

        // Note: the class="r" div matches too, so its "&nbsp;" ends up in the output.
        System.out.println(sb.toString());
    }
}

Edit: you may need to adjust the pattern if there are more divs, so that it matches only the divs that make up the email.
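
One way to make that adjustment, and to go from extracting the address to removing it, might look like the sketch below. It is not the original answerer's code. It assumes that only divs with class="p" carry text (so the class="r" spacer is skipped), that div text contains no nested tags, and that the address is lowercase or uppercase ASCII as in the question's regex. The class name SplitEmailRemover and the methods joinText and removeEmails are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitEmailRemover {
    // Assumption: text fragments live only in class="p" divs. The leading '<'
    // is not required, because the sample input has it stripped on some tags.
    private static final Pattern TEXT_DIV =
            Pattern.compile("div\\s+class=\"p\"[^>]*>([^<]*)</div>");

    // The regex from the question, applied to the joined text instead of the raw HTML.
    private static final Pattern EMAIL =
            Pattern.compile("[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,6}", Pattern.CASE_INSENSITIVE);

    /** Concatenates the text of every class="p" div, in document order. */
    public static String joinText(String html) {
        StringBuilder joined = new StringBuilder();
        Matcher div = TEXT_DIV.matcher(html);
        while (div.find()) {
            joined.append(div.group(1));
        }
        return joined.toString();
    }

    /** Empties the text of every class="p" div that overlaps an email match. */
    public static String removeEmails(String html) {
        // Each entry: {htmlStart, htmlEnd, joinedStart, joinedEnd} of one text fragment.
        List<int[]> spans = new ArrayList<>();
        StringBuilder joined = new StringBuilder();
        Matcher div = TEXT_DIV.matcher(html);
        while (div.find()) {
            int joinedStart = joined.length();
            joined.append(div.group(1));
            spans.add(new int[] {div.start(1), div.end(1), joinedStart, joined.length()});
        }

        // Mark fragments whose range in the joined text overlaps an email match.
        boolean[] drop = new boolean[spans.size()];
        Matcher email = EMAIL.matcher(joined);
        while (email.find()) {
            for (int i = 0; i < spans.size(); i++) {
                int[] s = spans.get(i);
                if (s[2] < email.end() && s[3] > email.start()) {
                    drop[i] = true;
                }
            }
        }

        // Delete from back to front so earlier HTML offsets stay valid.
        StringBuilder out = new StringBuilder(html);
        for (int i = spans.size() - 1; i >= 0; i--) {
            if (drop[i]) {
                out.delete(spans.get(i)[0], spans.get(i)[1]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "div class=\"p\" id=\"p9\">hello</div> div class=\"p\" id=\"p10\">.</div> "
                + "div class=\"p\" id=\"p11\">p</div> <div class=\"p\" id=\"p1\">@</div> "
                + "div class=\"p\" id=\"p13\">gmail</div> div class=\"p\" id=\"p\">.</div> "
                + "<div class=\"r\" style=\"x\">&nbsp;</div> div class=\"p\" id=\"p1\">com</div>";
        System.out.println(joinText(html));      // hello.p@gmail.com
        System.out.println(removeEmails(html));  // same markup, with the class="p" divs emptied
    }
}
```

A fragment is emptied as a whole, so a div that holds both ordinary text and part of the address loses all of its text. Likewise, if the div just before the address has no trailing whitespace, its text is treated as part of the local part. For anything beyond this fixed, generated markup, an HTML parser is likely a safer choice than regular expressions.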

