How to Modify HTML in a PHP WordPress Plugin Using The New Tag Processor API

Posted by Adam Zieliński
Adjusting the HTML markup in PHP has always been a struggle, but WordPress 6.2 makes it a breeze with the WP_HTML_Tag_Processor API.

For example, here’s how you can add an alt attribute to an <img /> tag:


<?php
$html = '<img src="/husky.jpg">';
$p = new WP_HTML_Tag_Processor( $html );
if ( $p->next_tag() ) {
    $p->set_attribute( 'alt', 'Husky in the snow' );
}
echo $p->get_updated_html();

If you’ve ever struggled to add an HTML attribute using regular expressions, you know how big of an improvement this is! In fact, Tag Processor was born out of this exact struggle.
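The regex struggle is easy to reproduce. The sketch below is plain PHP with no WordPress APIs, and naive_add_alt() is a hypothetical helper written only for illustration. A regular expression has no idea what an HTML comment is, so it adds the attribute to an image that exists only inside a comment and leaves the real image untouched.

```php
<?php
// Hypothetical helper: a naive, regex-based attempt to add alt=""
// to the first <img> tag. It knows nothing about HTML comments.
function naive_add_alt( string $html ): string {
    // Replace only the first occurrence of "<img" (limit = 1).
    return preg_replace( '/<img\b/i', '<img alt=""', $html, 1 );
}

$html = '<!-- <img src="old.jpg"> --><img src="/husky.jpg">';

// The commented-out image gets the attribute; the real one does not.
echo naive_add_alt( $html ) . "\n";
// Prints: <!-- <img alt="" src="old.jpg"> --><img src="/husky.jpg">
```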

Last year, Dennis Snell and I tried to add a CSS class to the first <h1>, <h2>, … tag in every WordPress heading block. However, the hours we spent crafting the perfect regular expression were largely wasted. At that point, I really wanted to use an HTML parser. No existing library was suitable, so we rolled up our sleeves and started building a new one.

Today, WP_HTML_Tag_Processor is a part of the upcoming WordPress 6.2 release, and this post will show you how to use it. Enjoy!

Tag Processor is linear and reads one tag at a time

Tag Processor sees HTML as a list of tags, not as a document tree. It does not understand what a child or a parent is. It only understands the next tag as read from left to right:

<?php
$html = '<h1><p></p></h1><div></div>';
$p = new WP_HTML_Tag_Processor( $html );

while ( $p->next_tag() ) {
    echo $p->get_tag() . "\n";
}

While this is limiting, it also makes Tag Processor extremely fast and memory-efficient. There is no virtual document tree, no preemptive parsing, and no backtracking. Tag Processor has a light footprint because it does nothing you don’t specifically request.
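Closing tags are skipped by default. The sketch below visits them as well. It assumes the `tag_closers` query option and the `is_tag_closer()` method that WordPress 6.2's Tag Processor provides for stopping on tags such as `</p>`. The list_tags() helper is a hypothetical name. Even with closers visible, the processor never pairs them with their openers, so you still get a flat list rather than a tree.

```php
<?php
// Assumes WordPress 6.2+ is loaded (e.g. inside a plugin), so
// WP_HTML_Tag_Processor is available.

// Hypothetical helper: returns every opening and closing tag in
// document order, prefixing closing tags with "/".
function list_tags( string $html ): array {
    $p    = new WP_HTML_Tag_Processor( $html );
    $tags = array();
    // 'tag_closers' => 'visit' makes next_tag() also stop on closers.
    while ( $p->next_tag( array( 'tag_closers' => 'visit' ) ) ) {
        $tags[] = ( $p->is_tag_closer() ? '/' : '' ) . $p->get_tag();
    }
    return $tags;
}

echo implode( ' ', list_tags( '<h1><p></p></h1><div></div>' ) ) . "\n";
```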

HTML operations only affect the selected tag

Before you can use methods like get_tag() or set_attribute($name, $value), you first need to select a target tag. No tag is selected at first; Tag Processor only reads the first tag once you call $p->next_tag():

<?php
$html = '<h1></h1><p></p>';
$p = new WP_HTML_Tag_Processor( $html );

// No tag is selected until the
// first $p->next_tag() call:
var_dump( $p->get_tag() );

$p->next_tag();
echo $p->get_tag() . "\n";

To select the p tag, you’ll need to call $p->next_tag() again. Append these lines to the end of the snippet above:

$p->next_tag();
echo $p->get_tag()."\n";

So far, so good!

Checking whether the next tag exists

Suppose you call $p->next_tag() too many times and go past the final <p></p>. There will be no errors, but the selected tag will be null:

<?php
$html = '<h1></h1><p></p>';
$p = new WP_HTML_Tag_Processor( $html );

$p->next_tag();
$p->next_tag();
$p->next_tag();

// No tag is selected once we go past
// the last tag:
var_dump( $p->get_tag() );

To make sure your assumptions about the processed HTML hold, consult the return value of $p->next_tag(). It returns true when it finds a tag, and false when it goes past the last tag in the document:

<?php
$html = '<h1></h1><p></p>';
$p = new WP_HTML_Tag_Processor( $html );

if ( $p->next_tag() ) {
    var_dump( $p->get_tag() );
}
if ( $p->next_tag() ) {
    var_dump( $p->get_tag() );
}
if ( $p->next_tag() ) {
    // There is no third tag so this will not run:
    var_dump( $p->get_tag() );
}
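Because next_tag() returns false once it runs out of tags, it also works as a loop condition. As a sketch, here is a hypothetical count_tags() helper that combines that return value with a tag-name query. It assumes WordPress 6.2+ is loaded.

```php
<?php
// Assumes WordPress 6.2+ is loaded so WP_HTML_Tag_Processor exists.

// Hypothetical helper: counts the opening tags with a given name.
function count_tags( string $html, string $tag_name ): int {
    $p     = new WP_HTML_Tag_Processor( $html );
    $count = 0;
    // next_tag() returns false after the last matching tag,
    // which ends the loop without any extra bookkeeping.
    while ( $p->next_tag( array( 'tag_name' => $tag_name ) ) ) {
        $count++;
    }
    return $count;
}

echo count_tags( '<p></p><div><p></p></div>', 'p' ) . "\n"; // 2
echo count_tags( 'Just text, no tags.', 'p' ) . "\n";       // 0
```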

Finding the right tag

The next_tag() method moves one tag at a time, but it can also perform lookups if you pass a $query argument. It supports matching a specific tag name, a CSS class, or both:

<?php
$html = '<div><div class="block-group"></div></div>';
$p = new WP_HTML_Tag_Processor( $html );
// Tag and attribute name lookup is case-insensitive
// according to the HTML specification
$query = array(
    'tag_name'   => 'DIV',
    'class_name' => 'block-group',
);
if ( $p->next_tag( $query ) ) {
    $p->remove_class( 'block-group' );
    $p->add_class( 'wp-block-group' );
}
echo $p->get_updated_html();
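The query can also skip ahead to the Nth match. The sketch below assumes the `match_offset` query key that WordPress 6.2's next_tag() accepts, where 1 selects the first match, 2 the second, and so on. The add_class_to_nth() helper is hypothetical.

```php
<?php
// Assumes WordPress 6.2+ is loaded so WP_HTML_Tag_Processor exists.

// Hypothetical helper: adds $class_name to the $n-th $tag_name tag.
// Returns the HTML unchanged if there are fewer than $n matches.
function add_class_to_nth( string $html, string $tag_name, int $n, string $class_name ): string {
    $p     = new WP_HTML_Tag_Processor( $html );
    $query = array(
        'tag_name'     => $tag_name,
        'match_offset' => $n, // 1-based: 1 is the first match.
    );
    if ( $p->next_tag( $query ) ) {
        $p->add_class( $class_name );
    }
    return $p->get_updated_html();
}

echo add_class_to_nth( '<ul><li>One</li><li>Two</li><li>Three</li></ul>', 'li', 2, 'is-second' );
```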

Reading HTML attributes

You can read the selected tag’s name and attributes using $p->get_tag() and $p->get_attribute($name). Notice how the HTML entities are automatically decoded:

<?php
$html = '<h1 title="Tag Processor Tutorial &lt;3"></h1>';
$p = new WP_HTML_Tag_Processor( $html );

// Select h1:
$p->next_tag();

// Echo the details:
echo $p->get_tag() . PHP_EOL;
echo $p->get_attribute( 'title' );

Unfortunately, reading the text or HTML contents of a tag is not supported yet.
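Note that get_attribute() does not always return a string. The sketch below assumes the WordPress 6.2 behavior. Attributes with a value come back as a decoded string. Boolean attributes written without a value, like `checked`, come back as true. Missing attributes, or calls made while no tag is selected, return null.

```php
<?php
// Assumes WordPress 6.2+ is loaded so WP_HTML_Tag_Processor exists.
$p = new WP_HTML_Tag_Processor( '<input type="checkbox" checked>' );
$p->next_tag();

var_dump( $p->get_attribute( 'type' ) );    // string(8) "checkbox"
var_dump( $p->get_attribute( 'checked' ) ); // bool(true): no value given
var_dump( $p->get_attribute( 'name' ) );    // NULL: attribute is missing

// Check the type before treating the value as a string:
$name = $p->get_attribute( 'name' );
echo ( is_string( $name ) ? $name : '(no name)' ) . "\n";
```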

Modifying HTML attributes

You can update the tag’s attributes using the $p->set_attribute($name, $value) and $p->remove_attribute($name) methods. Just like in the previous example, the HTML entities are handled automatically:

<?php
$html = '<h1 id="main">
    Site title
</h1>
<p>Content</p>';
$p = new WP_HTML_Tag_Processor( $html );

$p->next_tag();
$p->remove_attribute( 'id' );

// There is no class attribute, but that's okay –
// there won't be any errors:
$p->remove_attribute( 'class' );

// The escaping is handled automatically:
$p->set_attribute( 'title', 'Using <html> "tags"' );
echo $p->get_updated_html();
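The same methods work when you need to update every matching tag rather than just the first one. One example is adding a query parameter to all internal links. The sketch below loops with next_tag( 'a' ) and rewrites each href. The function name, the $site_host parameter, and the rule that host-less URLs count as internal are assumptions for illustration. add_query_arg() is an existing WordPress function and parse_url() is core PHP.

```php
<?php
// Assumes WordPress 6.2+ is loaded, which provides both
// WP_HTML_Tag_Processor and add_query_arg().

// Hypothetical helper: appends $param=$value to every link whose host
// is $site_host. Links without a host (relative URLs, but also e.g.
// mailto: links) are treated as internal; refine this for real use.
function add_param_to_internal_links( string $html, string $site_host, string $param, string $value ): string {
    $p = new WP_HTML_Tag_Processor( $html );
    while ( $p->next_tag( 'a' ) ) {
        $href = $p->get_attribute( 'href' ); // Already entity-decoded.
        if ( ! is_string( $href ) ) {
            continue; // Missing href, or a valueless boolean href.
        }
        $host = parse_url( $href, PHP_URL_HOST ); // null when there is no host.
        if ( null !== $host && $site_host !== $host ) {
            continue; // External (or unparseable) link: leave it alone.
        }
        // set_attribute() takes care of escaping the new value.
        $p->set_attribute( 'href', add_query_arg( $param, $value, $href ) );
    }
    return $p->get_updated_html();
}

echo add_param_to_internal_links(
    '<a href="/about/">About</a> <a href="https://example.org/">Elsewhere</a>',
    'example.com',
    'theme',
    'dark'
);
```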

Working with CSS classes

Tag Processor can adjust CSS classes via the add_class( $class ) and remove_class( $class ) methods. This is how Dennis and I ended up adding the wp-block-heading class to the first <h1>, <h2>, … tag in every WordPress heading block:

<?php
$html = '<h2 class="bold">This is a heading</h2>';
$p = new WP_HTML_Tag_Processor( $html );

$header_tags = array( 'H1', 'H2', 'H3', 'H4', 'H5', 'H6' );
while ( $p->next_tag() ) {
    if ( in_array( $p->get_tag(), $header_tags, true ) ) {
        $p->add_class( 'wp-block-heading' );
        break;
    }
}

echo $p->get_updated_html();
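Both methods also work inside a loop. Here is a sketch that renames a class on every tag that has it, assuming WordPress 6.2+ is loaded. rename_class() is a hypothetical helper. The class_name query matches whole class names only, so a tag with the class "older" is left alone when renaming "old".

```php
<?php
// Assumes WordPress 6.2+ is loaded so WP_HTML_Tag_Processor exists.

// Hypothetical helper: replaces class $from with class $to on every tag.
function rename_class( string $html, string $from, string $to ): string {
    $p = new WP_HTML_Tag_Processor( $html );
    // The class_name query only stops on tags that have $from as a
    // whole class name, so "older" does not match "old".
    while ( $p->next_tag( array( 'class_name' => $from ) ) ) {
        $p->remove_class( $from ); // Other classes on the tag are kept.
        $p->add_class( $to );
    }
    return $p->get_updated_html();
}

echo rename_class(
    '<p class="lead old"></p><div class="old"></div><span class="older"></span>',
    'old',
    'new'
);
```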

Handling tricky HTML inputs

Tag Processor follows the tokenization rules from the WHATWG HTML specification, which means it can safely process HTML markup that would derail most regular expressions and even DOMDocument:

<?php
$tricky_html = <<<HTML
<textarea src="These <p>'s are not actual HTML elements">
    <p><p<!--<p>-->="</p>"</p>
</textarea>
<p></p>
HTML;
$p = new WP_HTML_Tag_Processor( $tricky_html );
$p->next_tag( 'p' );
$p->add_class( 'bold' );

echo $p->get_updated_html();

In contrast, DOMDocument finds three <p> tags and emits a bunch of warnings:

<?php
$tricky_html = <<<HTML
<textarea src="These <p>'s are not actual HTML elements">
    <p><p<!--<p>-->="</p>"</p>
</textarea>
<p></p>
HTML;
$d = new DOMDocument();
$d->loadHTML( $tricky_html );
var_dump( $d->getElementsByTagName( 'p' ) );
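HTML comments are handled the same way. In this sketch, the first <img> in the markup sits inside a comment, so next_tag( 'img' ) skips it and selects the real image. It assumes WordPress 6.2+ is loaded, and the add_empty_alt() helper is hypothetical.

```php
<?php
// Assumes WordPress 6.2+ is loaded so WP_HTML_Tag_Processor exists.

// Hypothetical helper: adds alt="" to the first real <img> tag.
function add_empty_alt( string $html ): string {
    $p = new WP_HTML_Tag_Processor( $html );
    // Markup inside <!-- ... --> is comment text, not tags, so it is skipped.
    if ( $p->next_tag( 'img' ) ) {
        $p->set_attribute( 'alt', '' );
    }
    return $p->get_updated_html();
}

echo add_empty_alt( '<!-- <img src="old.jpg"> --><img src="/husky.jpg">' );
```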

More HTML APIs are coming in the future

This is just the first HTML API in WordPress. In the future you’ll be able to find tags by CSS selectors, update the inner HTML, and construct new HTML trees from scratch. Stay tuned!

Follow me on Twitter for more tutorials like this one. I can also send new articles directly to your inbox – just sign up for my Substack.

5 responses to “How to Modify HTML in a PHP WordPress Plugin Using The New Tag Processor API”

  1. Awesome! Finally it’s shipped with 6.2, and I save so much code now. Thank you for the review.

  2. Is it possible to use the WP_HTML_Tag_Processor to modify a complete page?
    For example, adding/modifying an attribute on all anchors?
    I expected this to be possible, but I wouldn’t know which hook to use.

    1. I don’t see why not!

      I don’t think there’s a filter for full HTML output, but I found this hack on StackOverflow: https://stackoverflow.com/questions/772510/wordpress-filter-to-modify-final-html-output

      But perhaps you could do what you need with a hook to modify a post or page content. Would you please share a bit more?

      1. Hi Adam,

        Excuse me for the late response. For a specific website, I need to add parameters to every internal URL on the page. This goes beyond the content to menu links and other sections of the page.

        The parameter is used on the next page to modify the color theme of the website by adding a body class or something like that. I wanted to figure out a way to do this with PHP instead of JavaScript because it will be there on page load instead of when JavaScript executes.

  3. Finally, simple!


Discover more from Adam's Perspective
