Java - Regex to remove an email address from HTML
I have an HTML input from which I need to remove an email address. The problem is that the email address does not come inside a single div; it is split across multiple divs. Please find the sample input below.
div  class="p" id="p9" style="top:89.17999pt;left:430.7740pt;font-family:times new roman;font-size:1.0pt;">hello</div>
div class="p" id="p10" style="top:89.17999pt;left:484.100pt;font-family:times new roman;font-size:1.0pt;">.</div>
div class="p" id="p11" style="top:89.17999pt;left:487.100pt;font-family:times new roman;font-size:1.0pt;">p</div>
<div class="p" id="p1" style="top:89.17999pt;left:493.9300pt;font-family:times new roman;font-size:1.0pt;">@</div>
div class="p" id="p13" style="top:89.17999pt;left:0.09003pt;font-family:times new roman;font-size:1.0pt;">gmail</div>
div class="p" id="p" style="top:89.17999pt;left:33.18pt;font-family:times new roman;font-size:1.0pt;">.</div>
<div class="r" style="left:79.84pt;bottom:9.pt;width:479.98004pt;height:1.71997pt;background-color:#d9d9d9;"> </div>
div class="p" id="p1" style="top:89.17999pt;left:3.18pt;font-family:times new roman;font-size:1.0pt;">com</div>"

The regex I am using is [a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,6}, which only matches an email address written contiguously in the standard format. Any help is appreciated.
Edit: I removed the div start tag because the page was parsing it.
This worked for me.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String text = "div  class=\"p\" id=\"p9\" style=\"top:89.17999pt;left:430.7740pt;font-family:times new roman;font-size:1.0pt;\">hello</div>\n"
                + "div class=\"p\" id=\"p10\" style=\"top:89.17999pt;left:484.100pt;font-family:times new roman;font-size:1.0pt;\">.</div>\n"
                + "div class=\"p\" id=\"p11\" style=\"top:89.17999pt;left:487.100pt;font-family:times new roman;font-size:1.0pt;\">p</div>\n"
                + "<div class=\"p\" id=\"p1\" style=\"top:89.17999pt;left:493.9300pt;font-family:times new roman;font-size:1.0pt;\">@</div>\n"
                + "div class=\"p\" id=\"p13\" style=\"top:89.17999pt;left:0.09003pt;font-family:times new roman;font-size:1.0pt;\">gmail</div>\n"
                + "div class=\"p\" id=\"p\" style=\"top:89.17999pt;left:33.18pt;font-family:times new roman;font-size:1.0pt;\">.</div>\n"
                + "<div class=\"r\" style=\"left:79.84pt;bottom:9.pt;width:479.98004pt;height:1.71997pt;background-color:#d9d9d9;\"> </div>\n"
                + "div class=\"p\" id=\"p1\" style=\"top:89.17999pt;left:3.18pt;font-family:times new roman;font-size:1.0pt;\">com</div>\"";

        StringBuilder sb = new StringBuilder();
        // One div per line, so split the input on newlines.
        String[] tokens = text.split("\n");

        // Capture the text between the end of the opening tag and "</div".
        Pattern p = Pattern.compile(".*>(.*)</div.*");

        for (String line : tokens) {
            Matcher m = p.matcher(line);
            if (m.matches()) {
                // Append this div's text so the pieces form one string.
                // The class="r" div also matches and contributes a single space.
                sb.append(m.group(1));
            }
        }

        System.out.println(sb.toString());
    }
}

Edit: You may need to adjust the pattern if there are more divs, so that it matches only the divs that make up the email.
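The extraction above only prints the joined text, while the question asks for the address to be removed from the HTML. One possible way to bridge that gap is sketched below: remember which slice of the joined text each line contributed, run the question's email regex over the joined text, and drop every line whose slice overlaps a match. Everything here is an assumption for illustration, not part of the original answer: the class name EmailDivRemover, the method removeEmailDivs, the restriction to class="p" divs, the case-insensitive flag, and the shortened sample input in main.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original answer.
final class EmailDivRemover {

    // Assumption: only class="p" divs carry text; other divs are layout only.
    private static final Pattern TEXT_DIV =
            Pattern.compile(".*class=\"p\"[^>]*>(.*)</div.*");

    // The regex from the question, matched case-insensitively.
    private static final Pattern EMAIL = Pattern.compile(
            "[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,6}", Pattern.CASE_INSENSITIVE);

    static String removeEmailDivs(String html) {
        String[] lines = html.split("\n");
        int[] start = new int[lines.length];
        int[] end = new int[lines.length];
        StringBuilder joined = new StringBuilder();

        // Join the div texts and record the [start, end) slice of each line.
        for (int i = 0; i < lines.length; i++) {
            start[i] = joined.length();
            Matcher m = TEXT_DIV.matcher(lines[i]);
            if (m.matches()) {
                joined.append(m.group(1));
            }
            end[i] = joined.length();
        }

        // Mark every line whose non-empty slice overlaps an email match.
        boolean[] drop = new boolean[lines.length];
        Matcher email = EMAIL.matcher(joined);
        while (email.find()) {
            for (int i = 0; i < lines.length; i++) {
                boolean hasText = end[i] > start[i];
                if (hasText && start[i] < email.end() && end[i] > email.start()) {
                    drop[i] = true;
                }
            }
        }

        // Rebuild the input from the lines that were not marked.
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < lines.length; i++) {
            if (!drop[i]) {
                kept.add(lines[i]);
            }
        }
        return String.join("\n", kept);
    }

    public static void main(String[] args) {
        // Shortened stand-in for the sample input (style attributes omitted).
        String html = "<div class=\"p\">Contact:</div>\n"
                + "<div class=\"p\">hello</div>\n"
                + "<div class=\"p\">.</div>\n"
                + "<div class=\"p\">p</div>\n"
                + "<div class=\"p\">@</div>\n"
                + "<div class=\"p\">gmail</div>\n"
                + "<div class=\"p\">.</div>\n"
                + "<div class=\"r\"> </div>\n"
                + "<div class=\"p\">com</div>";
        System.out.println(removeEmailDivs(html));
    }
}
```

Two limitations of this sketch are worth keeping in mind. A div that holds part of the address plus other text is removed whole. And because the texts are joined without separators, letters in the div that directly follows the address can be absorbed into the match (for example "com" followed by "Regards"), so that div would be dropped too. For anything beyond input shaped like this sample, an HTML parser is likely a safer basis than line-by-line regexes.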