java - Regex to remove email address from HTML
I have HTML input from which I need to remove an email address. The problem is that the email address does not come inside a single div; it is split across multiple divs. Please find a sample input below.
div class="p" id="p9" style="top:89.17999pt;left:430.7740pt;font-family:times new roman;font-size:1.0pt;">hello</div> div class="p" id="p10" style="top:89.17999pt;left:484.100pt;font-family:times new roman;font-size:1.0pt;">.</div> div class="p" id="p11" style="top:89.17999pt;left:487.100pt;font-family:times new roman;font-size:1.0pt;">p</div> <div class="p" id="p1" style="top:89.17999pt;left:493.9300pt;font-family:times new roman;font-size:1.0pt;">@</div> div class="p" id="p13" style="top:89.17999pt;left:0.09003pt;font-family:times new roman;font-size:1.0pt;">gmail</div> div class="p" id="p" style="top:89.17999pt;left:33.18pt;font-family:times new roman;font-size:1.0pt;">.</div> <div class="r" style="left:79.84pt;bottom:9.pt;width:479.98004pt;height:1.71997pt;background-color:#d9d9d9;"> </div> div class="p" id="p1" style="top:89.17999pt;left:3.18pt;font-family:times new roman;font-size:1.0pt;">com</div>"
And the regex I am using is [a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,6}, which matches the standard email format. Any help is appreciated.
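To make the problem concrete, here is a minimal sketch of why the regex alone finds nothing in this kind of markup: the tags sit between the pieces of the address, so no contiguous run of characters looks like an email. The class name EmailRegexDemo, the helper containsEmail, and the shortened test strings are assumptions made for illustration, and the CASE_INSENSITIVE flag is an addition that the question does not mention.

```java
import java.util.regex.Pattern;

class EmailRegexDemo {
    // The regex from the question. CASE_INSENSITIVE is an assumption added here
    // so that upper-case addresses would match as well.
    static final Pattern EMAIL =
            Pattern.compile("[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,6}", Pattern.CASE_INSENSITIVE);

    // Returns true if the regex finds an address anywhere in the given text.
    static boolean containsEmail(String s) {
        return EMAIL.matcher(s).find();
    }

    public static void main(String[] args) {
        // Address in one contiguous piece: the regex finds it.
        String plain = "contact hello.p@gmail.com for details";
        // Same address split across divs: '>' directly precedes '@', so nothing matches.
        String split = "<div>hello.p</div><div>@</div><div>gmail.com</div>";
        System.out.println(containsEmail(plain)); // true
        System.out.println(containsEmail(split)); // false
    }
}
```

So the text fragments have to be stitched back together before the email regex can be applied.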
Edit: I removed the div start tag because the page was parsing it rather than showing it as text.
This worked for me.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // Sample input from the question, one div per line.
        String text = "div class=\"p\" id=\"p9\" style=\"top:89.17999pt;left:430.7740pt;font-family:times new roman;font-size:1.0pt;\">hello</div>\n"
                + "div class=\"p\" id=\"p10\" style=\"top:89.17999pt;left:484.100pt;font-family:times new roman;font-size:1.0pt;\">.</div>\n"
                + "div class=\"p\" id=\"p11\" style=\"top:89.17999pt;left:487.100pt;font-family:times new roman;font-size:1.0pt;\">p</div>\n"
                + "<div class=\"p\" id=\"p1\" style=\"top:89.17999pt;left:493.9300pt;font-family:times new roman;font-size:1.0pt;\">@</div>\n"
                + "div class=\"p\" id=\"p13\" style=\"top:89.17999pt;left:0.09003pt;font-family:times new roman;font-size:1.0pt;\">gmail</div>\n"
                + "div class=\"p\" id=\"p\" style=\"top:89.17999pt;left:33.18pt;font-family:times new roman;font-size:1.0pt;\">.</div>\n"
                + "<div class=\"r\" style=\"left:79.84pt;bottom:9.pt;width:479.98004pt;height:1.71997pt;background-color:#d9d9d9;\"> </div>\n"
                + "div class=\"p\" id=\"p1\" style=\"top:89.17999pt;left:3.18pt;font-family:times new roman;font-size:1.0pt;\">com</div>\"";

        StringBuilder sb = new StringBuilder();
        String[] tokens = text.split("\n");

        // Capture whatever sits between the opening tag's '>' and the closing </div.
        // The class="r" spacer div also matches, so its single space ends up in the output.
        Pattern p = Pattern.compile(".*>(.*)</div.*");

        for (String line : tokens) {
            Matcher m = p.matcher(line);
            if (m.matches()) {
                sb.append(m.group(1));
            }
        }

        // Prints the text of all divs joined together.
        System.out.println(sb.toString());
    }
}
Edit: You may need to adjust the pattern if there are more divs, so that it matches only the divs that make up the email.
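One possible way to make that adjustment, and to go on to actually remove the address, is sketched below. It is not the original answer's code. It assumes the text fragments always live in class="p" divs, one div per line, as in the sample. The names SplitEmailRemover, removeEmails, TEXT_DIV and the shortened input in main are hypothetical. Fragments are glued together with no separator, so letters in a neighbouring div can be absorbed into a match, and any div that contributes even one character to a match is dropped whole.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SplitEmailRemover {
    // Only class="p" divs carry text. The leading '<' is optional because
    // some lines in the sample input have lost it.
    static final Pattern TEXT_DIV =
            Pattern.compile("<?div class=\"p\"[^>]*>([^<]*)</div>");
    // Email regex from the question; CASE_INSENSITIVE is an added assumption.
    static final Pattern EMAIL =
            Pattern.compile("[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,6}", Pattern.CASE_INSENSITIVE);

    // Returns the input with every line removed whose text fragment is part of an email.
    static String removeEmails(String html) {
        String[] lines = html.split("\n");
        StringBuilder joined = new StringBuilder();
        // owner.get(i) is the index of the line that supplied character i of 'joined'.
        List<Integer> owner = new ArrayList<>();
        for (int i = 0; i < lines.length; i++) {
            Matcher m = TEXT_DIV.matcher(lines[i]);
            if (m.find()) {
                String fragment = m.group(1);
                joined.append(fragment);
                for (int k = 0; k < fragment.length(); k++) {
                    owner.add(i);
                }
            }
        }
        // Mark every line that contributed at least one character to a match.
        boolean[] drop = new boolean[lines.length];
        Matcher e = EMAIL.matcher(joined);
        while (e.find()) {
            for (int c = e.start(); c < e.end(); c++) {
                drop[owner.get(c)] = true;
            }
        }
        // Rebuild the document from the lines that were not marked.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (!drop[i]) {
                out.append(lines[i]).append("\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "div class=\"p\" id=\"a\">hello</div>\n"
                + "<div class=\"p\" id=\"b\">@</div>\n"
                + "<div class=\"r\" style=\"height:1pt;\"> </div>\n"
                + "div class=\"p\" id=\"c\">gmail.com</div>";
        // Only the class="r" spacer line should remain.
        System.out.print(removeEmails(html));
    }
}
```

If the markup is less regular than the sample, an HTML parser is likely a safer choice than line-based regex matching.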