CLI-based WordPress imports

I frequently need to import client content, and I usually do it a couple of times, so a CLI version of the importer is much more useful to me than the web interface.

I guess many of you have the same issue, so I published my current work in progress, a CLI wrapper around the wordpress-importer, at https://github.com/tott/WordPress-CLI-Importer.

As mentioned – this is a very raw, very undocumented work in progress. Feel free to check it out and let me know if you run into any trouble with it or have any suggestions. All feedback welcome!

Caching WordPress navigation menus – wp_nav_menu() wrapper

Navigation menus in WordPress are a great thing. Sadly, they usually don’t perform very well, so you might want to cache them. Here’s a small set of functions that lets you cache the output of the navigation menus. Please note the inline comments in case your navigation menus have different layouts depending on the posts or categories.

Once you throw this in your theme’s functions.php, or in a file included from there, you can use hh_cached_nav_menu() as a replacement for wp_nav_menu().

<?php
/**
 * Wrapper function around wp_nav_menu() that will cache the wp_nav_menu for all tag/category
 * pages used in the nav menus
 * @see http://lookup.hitchhackerguide.com/wp_nav_menu for $args
 * @author tott
 */ 
function hh_cached_nav_menu( $args = array(), $prime_cache = false ) {
	global $wp_query;
	
	$queried_object_id = empty( $wp_query->queried_object_id ) ? 0 : (int) $wp_query->queried_object_id;
	
	// If design of navigation menus differs per queried object use the key below
	// $nav_menu_key = md5( serialize( $args ) . '-' . serialize( get_queried_object() ) );
	
	// Otherwise
	$nav_menu_key = md5( serialize( $args ) );
	
	$my_args = wp_parse_args( $args );
	$my_args = apply_filters( 'wp_nav_menu_args', $my_args );
	$my_args = (object) $my_args;
	
	if ( ( isset( $my_args->echo ) && true === $my_args->echo ) || !isset( $my_args->echo ) ) {
		$echo = true;
	} else {
		$echo = false;
	}
	
	$skip_cache = false;
	$use_cache = ( true === $prime_cache ) ? false : true;
	
	// If design of navigation menus differs per queried object comment out this section
	//*
	if ( is_singular() ) {
		$skip_cache = true;
	} else if ( !in_array( $queried_object_id, hh_get_nav_menu_cache_objects( $use_cache ) ) ) {
		$skip_cache = true;
	}
	//*/
	
	if ( true === $skip_cache || true === $prime_cache || false === ( $nav_menu = get_transient( $nav_menu_key ) ) ) {
		if ( false === $echo ) {
			$nav_menu = wp_nav_menu( $args );
		} else {
			ob_start();
			wp_nav_menu( $args );
			$nav_menu = ob_get_clean();
		}
		if ( false === $skip_cache )
			set_transient( $nav_menu_key, $nav_menu );
	} 
	if ( true === $echo )
		echo $nav_menu;
	else
		return $nav_menu;
}

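/**
 * Refresh the list of cached object ids and prime the cache when a menu is updated.
 * Note: the cache key is built from $args, so only the entry for the $args passed
 * below is rebuilt. Pass the same $args your theme uses for its menus.
 */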
function hh_update_nav_menu_objects( $menu_id = null, $menu_data = null ) {
	hh_cached_nav_menu( array( 'echo' => false ), $prime_cache = true );
}
add_action( 'wp_update_nav_menu', 'hh_update_nav_menu_objects' );

/** 
 * Helper function that returns the object_ids we'd like to cache
 */
function hh_get_nav_menu_cache_objects( $use_cache = true ) {
	$object_ids = get_transient( 'hh_nav_menu_cache_object_ids' );
	if ( true === $use_cache && !empty( $object_ids ) ) {
		return $object_ids;
	}

	$object_ids = $objects = array();
	
	$menus = wp_get_nav_menus();
	foreach ( $menus as $menu_maybe ) {
		if ( $menu_items = wp_get_nav_menu_items( $menu_maybe->term_id ) ) {
			foreach( $menu_items as $menu_item ) {
				if ( preg_match( "#.*/category/([^/]+)/?$#", $menu_item->url, $match ) )
					$objects['category'][] = $match[1];
				if ( preg_match( "#.*/tag/([^/]+)/?$#", $menu_item->url, $match ) )
					$objects['post_tag'][] = $match[1];
			}
		}
	}
	if ( !empty( $objects ) ) {
		foreach( $objects as $taxonomy => $term_names ) {
			foreach( $term_names as $term_name ) {
				$term = get_term_by( 'slug', $term_name, $taxonomy );
				if ( $term )
					$object_ids[] = $term->term_id;
			}
		}
	}
	
	$object_ids[] = 0; // that's for the homepage
	
	set_transient( 'hh_nav_menu_cache_object_ids', $object_ids );
	return $object_ids;
}
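
To see what the two regular expressions in hh_get_nav_menu_cache_objects() actually match, here is a minimal standalone sketch that runs them against a few made-up menu URLs. The function name hh_demo_extract_term_slugs() and the example.com URLs are assumptions for illustration only; the patterns are the ones used above and assume the default /category/ and /tag/ permalink bases.

```php
<?php
/**
 * Hypothetical helper (not part of the snippet above): collect term slugs
 * from menu item URLs, grouped by taxonomy, using the same two patterns.
 */
function hh_demo_extract_term_slugs( array $urls ) {
	$patterns = array(
		'category' => '#.*/category/([^/]+)/?$#',
		'post_tag' => '#.*/tag/([^/]+)/?$#',
	);
	$objects = array();
	foreach ( $urls as $url ) {
		foreach ( $patterns as $taxonomy => $pattern ) {
			if ( preg_match( $pattern, $url, $match ) ) {
				$objects[ $taxonomy ][] = $match[1];
			}
		}
	}
	return $objects;
}

// Made-up URLs for illustration.
$slugs = hh_demo_extract_term_slugs( array(
	'http://example.com/category/news/',
	'http://example.com/tag/php',
	'http://example.com/about/',
	'http://example.com/category/parent/child/',
) );
print_r( $slugs );
```

Only the first two URLs produce a slug (news and php). The nested URL /category/parent/child/ does not match, because the pattern allows just one path segment after the base, so menu items pointing to child categories under such URLs would not end up in the list of cached object ids.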

Caching get_posts()

get_posts() is a nice little function and is widely used within themes and plugins, but what most people don’t keep in mind is that there is at least one database query behind each of the calls. With taxonomy, category or meta queries it’s even more, so caching the result of this function can have quite a performance impact.

Here is my approach:

/**
 * Wrapper around get_posts that utilizes object caching
 * 
 * @access public
 * @param mixed $args (default: NULL)
 * @param bool $force_refresh (default: false)
 * @return array Posts as returned by get_posts()
 */
function get_posts_cached( $args = NULL, $force_refresh = false ) {
	$cache_incrementor = wp_cache_get( 'get_posts_cached', 'cache_incrementors' );
	if ( !is_numeric( $cache_incrementor ) || true === $force_refresh ) {
		$now = time();
		wp_cache_set( 'get_posts_cached', $now, 'cache_incrementors' );
		$cache_incrementor = $now;
	}
	
	$cache_key = 'get_posts_cached_' . $cache_incrementor . '_' . md5( serialize( $args ) );
	$cache_group = 'get_posts_cached';
	
	$posts = wp_cache_get( $cache_key, $cache_group );
	if ( false === $posts || true === $force_refresh ) {
		$posts = get_posts( $args );
		if ( count( $posts ) < 11 ) // we don't want to cache too much
			wp_cache_set( $cache_key, $posts, $cache_group );
	}
	return $posts;
}


/**
 * Invalidate get_posts_cached stored values.
 * 
 * @access public
 * @param int $post_id ID of the post that was saved
 * @return void
 */
function invalidate_get_posts_cache( $post_id ) {
	$args = array( 'include' => array( $post_id ) );
	get_posts_cached( $args, $force_refresh=true );
}
add_action( 'save_post', 'invalidate_get_posts_cache', 1 );

As you can see, this is mainly a wrapper around get_posts(), but it adds some magic. It utilizes the object cache to store the results of the get_posts() call. If there is already a valid cache object it will return that; if not, it will rebuild the cache from the db by calling get_posts().

Note that there is no timeout on the object. We simply enforce a refresh of all objects each time a post is saved. This is done by adding a counter ($cache_incrementor) to the cache key. Once we want to invalidate all cache objects in the group we simply change this counter; in the code above it is set to the current timestamp. Kudos for this method go to advanced-caching.
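
The incrementor idea can be tried outside WordPress with a tiny in-memory stand-in for the object cache. This is only a sketch: Demo_Cache, demo_fetch_cached(), demo_invalidate_all() and the loader callback are hypothetical names, and a plain integer counter replaces the timestamp used above.

```php
<?php
// Hypothetical in-memory stand-in for the object cache (illustration only).
class Demo_Cache {
	private $data = array();

	public function get( $key ) {
		return array_key_exists( $key, $this->data ) ? $this->data[ $key ] : false;
	}

	public function set( $key, $value ) {
		$this->data[ $key ] = $value;
	}
}

// Return a cached value whose key is namespaced by the current incrementor.
function demo_fetch_cached( Demo_Cache $cache, $args, $loader ) {
	$incrementor = $cache->get( 'incrementor' );
	if ( false === $incrementor ) {
		$incrementor = 1;
		$cache->set( 'incrementor', $incrementor );
	}
	$key   = 'demo_' . $incrementor . '_' . md5( serialize( $args ) );
	$value = $cache->get( $key );
	if ( false === $value ) {
		$value = call_user_func( $loader, $args ); // cache miss: do the expensive work
		$cache->set( $key, $value );
	}
	return $value;
}

// Bumping the counter changes every key, so all old entries become unreachable.
function demo_invalidate_all( Demo_Cache $cache ) {
	$cache->set( 'incrementor', (int) $cache->get( 'incrementor' ) + 1 );
}

$cache  = new Demo_Cache();
$calls  = 0;
$loader = function ( $args ) use ( &$calls ) {
	$calls++; // count how often the "database" is hit
	return 'result for ' . $args['category'];
};

demo_fetch_cached( $cache, array( 'category' => 'news' ), $loader ); // miss
demo_fetch_cached( $cache, array( 'category' => 'news' ), $loader ); // hit
demo_invalidate_all( $cache );
demo_fetch_cached( $cache, array( 'category' => 'news' ), $loader ); // miss again
echo $calls, "\n";
```

The loader runs twice: once for the first lookup and once after the invalidation. The stale entries are never deleted explicitly; they simply stop being requested, which works best with a cache backend that evicts unused entries on its own.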

If you don’t have an object cache you could also replace the wp_cache_get and wp_cache_set functions with their get_transient and set_transient counterparts.

Snippet – Spot bad queries in WordPress the quick & dirty way

Here’s a little piece of code I use for debugging when I evaluate sites and themes which I don’t know in detail.
I simply drop this in a file and either include that file from the theme’s functions.php or place it in wp-content/mu-plugins as something like 0-my-debugging.php.
Then all you need to do is add your remote IP address to the $my_debug_ips array and you’ll receive a log in /tmp/debug.log the next time you reload the page.
This piece of code gives you a log of the queries that are run, each with a timestamp and the backtrace showing where it was called from.
This makes it relatively easy to identify slow running queries and/or code segments that could benefit from caching.
When define( 'EXPLAIN_QUERIES', true ) is set in the script it will also run a MySQL EXPLAIN on each of the queries that are executed and dump the result to the logfile as well.
It also hooks into http_request_args to log any calls that are made using the HTTP API.
As a bonus you can also use my_var_log( $msg, $trace=true ) somewhere in your code to dump the content of other variables.

$my_start = microtime( true );
$my_debug_ips = array(
	'111.222.333.444', // your remote IP address
);

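// REMOTE_ADDR is not set when PHP runs on the command line, so check for it first.
if ( !isset( $_SERVER['REMOTE_ADDR'] ) || !in_array( $_SERVER['REMOTE_ADDR'], $my_debug_ips ) )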
	return;

define( 'EXPLAIN_QUERIES', true );

add_filter( 'query', 'my_query_log' );
function my_query_log( $q ) {
	my_var_log( $q );
	if ( defined( 'EXPLAIN_QUERIES' ) && EXPLAIN_QUERIES ) {
		global $wpdb;
		remove_filter( 'query', 'my_query_log' );
		// $q is already a complete SQL string without placeholders, so it is not passed through $wpdb->prepare().
		$res = $wpdb->get_results( 'EXPLAIN ' . $q );
		my_var_log( $res, false );
		add_filter( 'query', 'my_query_log' );
	}
	return $q;
}

function my_var_log( $msg, $trace=true ) {
	global $my_start;
	$time = microtime( true ) - $my_start;
	error_log( sprintf( "%s - %.2fs - %s - %s\n", date( 'Y-m-d H:i:s' ), $time, var_export( $msg, true ), $_SERVER['REQUEST_URI'] ), 3, '/tmp/debug.log' );
	if ( true === $trace ) {
		$trax = array();
		$trace = debug_backtrace();
		foreach ( $trace as $key => $trc ) {
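			// 'file' and 'line' can be missing for frames invoked from internal functions (callbacks).
			$trax[$key] = ( isset( $trc['file'] ) ? $trc['file'] : '[internal]' ) . '::' . ( isset( $trc['line'] ) ? $trc['line'] : '?' ) . ' - ' . $trc['function'];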
		}
		error_log( var_export( $trax, true ) . "\n", 3, '/tmp/debug.log' );
	}
}

add_filter( 'http_request_args', 'my_httpapidebug', 10, 2 );
function my_httpapidebug( $r, $url ) {
	$r['_url'] = $url;
	if ( function_exists( 'my_var_log' ) )
		my_var_log( $r );
	return $r;
}
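
The backtrace formatting in my_var_log() can be tried on its own. Here is a minimal sketch that needs nothing beyond core PHP; my_demo_format_trace() and the two wrapper functions are hypothetical names used only for this illustration.

```php
<?php
// Hypothetical helper: turn debug_backtrace() output into "file::line - function" strings.
function my_demo_format_trace( array $trace ) {
	$lines = array();
	foreach ( $trace as $key => $frame ) {
		// 'file' and 'line' can be absent for frames invoked from internal functions.
		$file = isset( $frame['file'] ) ? $frame['file'] : '[internal]';
		$line = isset( $frame['line'] ) ? $frame['line'] : '?';
		$lines[ $key ] = $file . '::' . $line . ' - ' . $frame['function'];
	}
	return $lines;
}

function my_demo_inner() {
	return my_demo_format_trace( debug_backtrace() );
}

function my_demo_outer() {
	return my_demo_inner();
}

$trace_lines = my_demo_outer();
print_r( $trace_lines );
```

The first entry describes the call to my_demo_inner() and the second the call to my_demo_outer(), each prefixed with the file and line the call was made from. Reading the list from bottom to top gives the path the code took to reach the logged query.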