PHP HTML parsing: I want to save parsed elements into an array
I'm trying to parse an HTML page and access its tags. After parsing the tags, I display the result indented according to the level of the tags, e.g. the header tags h1, h2, h3, etc. Now I want to save the resulting data (the indented table of contents) in an array along with the names of the tags. Kindly help me sort out this problem.
Here is my PHP code. I'm using the HTML DOM parser.
<?php
include("simple_html_dom.php");
session_start();
error_reporting(0);

$string = file_get_contents('test.php');
$tags = array(0 => '<h1', 1 => '<h2', 2 => '<h3', 3 => '<h4', 4 => '<h5', 5 => '<h6');

function parser($html, $needles = array()) {
    $positions = array();
    // Collect the offsets of the first heading tag that occurs in $html.
    foreach ($needles as $needle) {
        $lastpos = 0;
        while (($lastpos = strpos($html, $needle, $lastpos)) !== false) {
            $positions[] = $lastpos;
            $lastpos = $lastpos + strlen($needle);
        }
        unset($needles[0]);
        if (count($positions) > 0) {
            break;
        }
    }
    if (count($positions) > 0) {
        for ($i = 0; $i < count($positions); $i++) {
            ?>
            <div class="<?php echo $i; ?>" style="padding-left: 20px; font-size: 14px;">
            <?php
            if ($i < count($positions) - 1) {
                $temp = explode('</', substr($html, $positions[$i] + 4));
                $pos = strpos($temp[0], '>');
                echo substr($temp[0], $pos);
                // Recurse into the text between this heading and the next one.
                parser(substr($html, $positions[$i] + 4, $positions[$i + 1] - $positions[$i] - 4), $needles);
            } else {
                $temp = explode('</', substr($html, $positions[$i] + 4));
                $pos = strpos($temp[0], '>');
                echo substr($temp[0], $pos + 1);
                // Last heading at this level: recurse into the rest of the text.
                parser(substr($html, $positions[$i] + 4), $needles);
            }
            ?>
            </div>
            <?php
        }
    } else {
        // not found position of tag
    }
}

parser($string, $tags);
If you wanted to do this using SimpleXML and XPath, there is a shorter and more readable version you could try...
$xml = new SimpleXMLElement($string);
$tags = $xml->xpath("//h1 | //h2 | //h3 | //h4");
$data = [];
foreach ($tags as $tag) {
    $elementdata['name'] = $tag->getName();
    $elementdata['content'] = (string)$tag;
    $data[] = $elementdata;
}
print_r($data);

You can see the pattern in the XPath: it combines all of the elements you need. The use of // means find at any level, followed by the name of the element you want to find. These are combined using |, which is the 'or' (union) operator. This can be expanded using the same type of expression to build the full set of tags you need.
The program then loops over the elements found and builds an array for each element, one at a time, taking the name and content and adding them to the $data array.
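As a rough, self-contained sketch of the same idea, the XPath expression can be assembled from a list of tag names instead of being typed out by hand. The sample markup and the $names and $query variables are assumptions made up for illustration; they are not part of the original question.

```php
<?php
// Hypothetical sample input; SimpleXMLElement needs well-formed XML.
$string = '<html><body>'
        . '<h1>Intro</h1><p>text</p>'
        . '<h2>Background</h2>'
        . '<h3>Details</h3>'
        . '<h2>Summary</h2>'
        . '</body></html>';

// Build "//h1 | //h2 | ... | //h6" from a list of tag names.
$names = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6'];
$query = implode(' | ', array_map(function ($name) {
    return '//' . $name;
}, $names));

$xml  = new SimpleXMLElement($string);
$data = [];
foreach ($xml->xpath($query) as $tag) {
    // One entry per heading: tag name plus its text content.
    $data[] = [
        'name'    => $tag->getName(),
        'content' => (string) $tag,
    ];
}

print_r($data);
```

Building a fresh array literal for each heading avoids carrying values over from one iteration to the next, which can happen when a single $elementdata variable is reused.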
Update: if the file isn't well-formed XML, you may have to use DOMDocument and loadHTML. The slight difference is that it is more tolerant of errors...
$string = file_get_contents("links.html");
$xml = new DOMDocument();
// Pass true so libxml collects parse warnings instead of printing them.
libxml_use_internal_errors(true);
$xml->loadHTML($string);
$xp = new DOMXPath($xml);
$tags = $xp->query("//h1 | //h2 | //h3 | //h4");
$data = [];
foreach ($tags as $tag) {
    // Property names are case-sensitive: tagName and nodeValue.
    $elementdata['name'] = $tag->tagName;
    $elementdata['content'] = $tag->nodeValue;
    $data[] = $elementdata;
}
print_r($data);
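Since the question asks for an indented table of contents, one possible extension is to store the heading level next to the name and content. The sketch below assumes a made-up, deliberately sloppy HTML string, and the 'level' key is a hypothetical addition derived from the digit in the tag name; it is not something DOMDocument provides.

```php
<?php
// Hypothetical sample input: sloppy HTML (unclosed <p>, no <html> or <body>).
$string = '<h1>Guide</h1><p>intro<h2>Setup</h2><h3>Install</h3><h2>Usage</h2>';

// Collect parse warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($string);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xp   = new DOMXPath($doc);
$data = [];
foreach ($xp->query('//h1 | //h2 | //h3 | //h4 | //h5 | //h6') as $tag) {
    $data[] = [
        'name'    => $tag->tagName,
        'level'   => (int) substr($tag->tagName, 1), // assumed helper field: "h2" -> 2
        'content' => trim($tag->nodeValue),
    ];
}

// Print an indented table of contents from the array, two spaces per level.
foreach ($data as $row) {
    echo str_repeat('  ', $row['level'] - 1), $row['name'], ': ', $row['content'], PHP_EOL;
}
```

Keeping the level as an integer in the array means the indentation can be applied later, for example as padding in HTML output, without parsing the page again.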