Publishing to Google News with WordPress


Google News is a coveted platform for publishers. To aggregate your content, however, Google requires a special format: the News Sitemap.

With WordPress you can basically create this format in two ways. Both are presented here, but I will cover the second in more detail, because I think it shows very nicely how to use content from WordPress outside your blog.

  1. The first way is to create the sitemap like a feed in WordPress. This has several advantages for administration within WordPress.
    I showed how to create a feed in the tutorial "WordPress Feed for Drafts". You can download that solution as a plugin and simply use it for the Google News Sitemap.
  2. The second way is to create a PHP file in the root directory that writes the latest posts in the appropriate format.

Include WordPress

To get data from WordPress, the file needs access to wp-load.php. I include it and can then use WordPress's global variables, for example the database object $wpdb.
This lets you retrieve all the data from the database that is relevant to the XML format of the Google News Sitemap.
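
If the sitemap file does not sit directly next to wp-load.php, the include path has to be worked out first. The following is only a minimal sketch: `find_wp_load()` is a hypothetical helper name, not a WordPress function, and it assumes wp-load.php lives in the start directory or one of its parents.

```php
<?php
// Hypothetical helper (not part of WordPress): search the start directory
// and its parents for wp-load.php and return the full path, or null.
function find_wp_load($startDir)
{
    $dir = realpath($startDir);
    while ($dir !== false) {
        $candidate = $dir . DIRECTORY_SEPARATOR . 'wp-load.php';
        if (is_file($candidate)) {
            return $candidate;
        }
        $parent = dirname($dir);
        if ($parent === $dir) {
            // reached the filesystem root without a match
            return null;
        }
        $dir = $parent;
    }
    return null;
}

// Possible usage at the top of the sitemap file:
// $wpLoad = find_wp_load(__DIR__);
// if ($wpLoad !== null) {
//     require $wpLoad; // afterwards $wpdb is available
// }
```

For a file placed in the root directory, as in this tutorial, the plain require of wp-load.php is enough and no search is needed.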

The format

Google specifies the following XML structure. I build this structure in the file and fill it with only the latest 20 news items.
Background and tips from Google can be found in their documentation.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
	<url>
		<loc>http://www.domain.de/news/news1.html</loc>
		<news:news>
			<news:publication_date>2008-01-22T00:29:19+01:00</news:publication_date>
			<news:keywords>key1, key2, key3</news:keywords>
		</news:news>
	</url>
</urlset>
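
To illustrate how a single entry of this structure can be assembled, here is a minimal sketch in plain PHP. The function name `news_sitemap_entry()` and the sample URL with its query string are assumptions made for the example; the escaping step is my addition so that characters such as & cannot break the XML.

```php
<?php
// Hypothetical helper: build one <url> entry of a News Sitemap.
// $loc is the article URL, $date an already formatted W3C date,
// $keywords a comma-separated keyword string.
function news_sitemap_entry($loc, $date, $keywords)
{
    // Escape &, <, > and quotes so the values cannot break the XML.
    $loc      = htmlspecialchars($loc, ENT_QUOTES, 'UTF-8');
    $date     = htmlspecialchars($date, ENT_QUOTES, 'UTF-8');
    $keywords = htmlspecialchars($keywords, ENT_QUOTES, 'UTF-8');

    return "\t<url>\n"
        . "\t\t<loc>" . $loc . "</loc>\n"
        . "\t\t<news:news>\n"
        . "\t\t\t<news:publication_date>" . $date . "</news:publication_date>\n"
        . "\t\t\t<news:keywords>" . $keywords . "</news:keywords>\n"
        . "\t\t</news:news>\n"
        . "\t</url>\n";
}

// Example call with made-up values
echo news_sitemap_entry(
    'http://www.domain.de/news/news1.html?a=1&b=2',
    '2008-01-22T00:29:19+01:00',
    'key1, key2, key3'
);
```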

The file

Below you'll find a simple solution that you can certainly expand. As an example, the SQL query is restricted to one category: the category ID is simply compared in the condition ending in term_id = 7. If you want to include posts from all categories, it suffices to delete that line.

<?php
require('wp-load.php');

// Content type and XML header
header('Content-Type: application/xml; charset=utf-8');
echo '<?xml version="1.0" encoding="utf-8"?>' . "\n";

// urlset
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
				xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">' . "\n";

// Select posts; set limit 20
// Table names are taken from $wpdb so that a custom table prefix also works
$rows = $wpdb->get_results("SELECT DISTINCT ID, post_date_gmt
                            FROM {$wpdb->posts}, {$wpdb->term_relationships}, {$wpdb->term_taxonomy}
                            WHERE {$wpdb->term_relationships}.object_id = {$wpdb->posts}.ID
                            AND post_status = 'publish'
                            AND post_type = 'post'
                            AND {$wpdb->term_taxonomy}.term_taxonomy_id = {$wpdb->term_relationships}.term_taxonomy_id
                            AND {$wpdb->term_taxonomy}.taxonomy = 'category'
                            AND {$wpdb->term_taxonomy}.term_id = 7
                            ORDER BY {$wpdb->posts}.post_date_gmt DESC
                            LIMIT 0, 20");

// sitemap data
// set keywords !

foreach ($rows as $row) {
	echo "\t" . '<url>' . "\n";

	echo "\t\t" . '<loc>';
	echo esc_url( get_permalink($row->ID) ); // escape & and other special characters
	echo '</loc>' . "\n";
	echo "\t\t" . '<news:news>' . "\n";
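	echo "\t\t" . '<news:publication_date>';
	// post_date_gmt has the form "YYYY-MM-DD HH:MM:SS"
	$thedate = substr($row->post_date_gmt, 0, 10); // date part
	$thetime = substr($row->post_date_gmt, 11, 8); // time part (HH:MM:SS)
	echo $thedate . 'T' . $thetime . 'Z'; // "Z" marks the time as UTC
	echo '</news:publication_date>' . "\n";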
	echo "\t\t" . '<news:keywords>online, news</news:keywords>' . "\n"; // change keywords
	echo "\t\t" . '</news:news>' . "\n";
	echo "\t" . '</url>' . "\n";
}

// End urlset
echo '</urlset>';
?>
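
The date handling in the file relies on cutting the string at fixed positions. As an alternative, here is a minimal sketch that uses PHP's DateTime class. The function name `gmt_to_w3c()` is hypothetical, and the sketch assumes the input has the form "YYYY-MM-DD HH:MM:SS" in GMT, as the substr() calls above do.

```php
<?php
// Hypothetical helper: turn a MySQL DATETIME in GMT, e.g. the value of
// post_date_gmt, into the W3C format used in <news:publication_date>.
function gmt_to_w3c($mysqlGmt)
{
    $date = DateTime::createFromFormat(
        'Y-m-d H:i:s',
        $mysqlGmt,
        new DateTimeZone('UTC')
    );
    if ($date === false) {
        return null; // input did not match the expected format
    }
    // \T and \Z are literal characters in the format string
    return $date->format('Y-m-d\TH:i:s\Z');
}

echo gmt_to_w3c('2008-01-22 00:29:19'), "\n"; // prints 2008-01-22T00:29:19Z
```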

In the code above, the keywords are assigned statically. If you assign tags to your blog posts, it makes sense to use them as keywords instead. The following addition will help.

	$tags     = wp_get_post_tags( $row->ID, array('fields' => 'all') );
	$tagcount = count($tags);
	echo "\t\t" . '<news:keywords>';
	for ($i = 0; $i < $tagcount; $i++) { // start at 0 so the first tag is included
		// strip single and double quotes from the tag name
		echo str_replace( array("'", '"'), '', urldecode($tags[$i]->name) );
		if ( $i != $tagcount-1 )
			echo ', ';
	}
	echo '</news:keywords>' . "\n";

Use it in place of this line

echo "\t\t" . '<news:keywords>online, news</news:keywords>' . "\n"; // change keywords

and the tags will be assigned automatically, separated by commas.
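
The comma handling can also be left to implode(), which avoids counting indexes by hand. This is a minimal sketch in plain PHP: the function name `news_keywords()` and the sample tag names are made up for illustration.

```php
<?php
// Hypothetical helper: build the keyword string from a list of tag names.
function news_keywords(array $tagNames)
{
    $clean = array();
    foreach ($tagNames as $name) {
        // Remove single and double quotes, as the loop above does.
        $name = trim(str_replace(array("'", '"'), '', $name));
        if ($name !== '') {
            $clean[] = $name;
        }
    }
    // implode() puts the separator only between items.
    return implode(', ', $clean);
}

// Example call with made-up tag names
echo news_keywords(array('WordPress', 'Google "News"', "Editor's pick"));
```

Inside the loop of the sitemap file, the names could presumably be fetched with wp_get_post_tags( $row->ID, array('fields' => 'names') ) and passed to this helper.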

Inclusion in Google

Once you have placed the above code as a file in the root of your installation and tested it successfully, all that remains is to ask for inclusion in Google News. There is a form available for this. Then just wait for an answer from Google.
To see whether you are indexed, you can simply search Google News for site:domain.com.


15 Comments
  1. DD32 says:

    It's best to include wp-load.php instead of wp-config.php, for a few reasons, one of which is that wp-config.php can exist one level up from WP.

    See the ticket here: http://trac.wordpress.org/ticket/6933

    (Note: wp-load.php is WP 2.6+)

  2. Michael says:

    Thanks DD32, you are right. I use wp-load.php for the help files of my upcoming theme too. Frank should fix that after the holidays.

  3. Blog Expert says:

    This was an awesome post. I definitely agree with you and I am looking forward to reading more of your blog.

  4. Alex says:

    Hey Blog Expert, thanks for the compliment and happy new year!

  5. Greg says:

    Great post! Does anyone know how I can get the unique 3 digits into my URLs?

  6. Alex says:

    Hey Greg, I see you already have 3 digits into your URLs, looks like you worked it out.

  7. Greg says:

    Hi Alex, yes thanks a friend on DP told me how to do this :) Imagine yesterday I got an email from Google... My site has been approved :)

    Do you know how I can submit the sitemap to them?

  8. nemoprincess says:

    Hello Alex, great post for me too.
    I follow it but running my php file I have only three old posts of my blog.
    What's wrong?
    Thank you very much

  9. @Greg
    How did you get the 3 digit unique code in to your urls without updating your permalink structure? Or did you?

    @Alex,
    From what I can see in the webmaster tools, you can simply submit an RSS feed instead of using a sitemap.xml. Why go through the trouble of creating this if their own feed from FeedBurner would work?

  10. Greg says:

    Hi Siriusbuzz to get the 3 digit id I updated my permalinks structure. I couldn't find any other way to do this.

  11. Is there any reason we would want to limit the sitemap to 20 items? I have looked around but I can't find a suggested/optimal number to feed Google with.

  12. Greg says:

    According to Google:

    A News sitemap can contain no more than 1,000 URLs. If you want to include more, you can either break these into multiple sitemaps or create a Sitemap index file to manage them. Use the XML format provided in the Sitemap protocol. Your sitemap index file shouldn't list more than 1,000 sitemaps. These limits help ensure that your web server isn't overloaded by serving large files to Google News.

    http://www.google.com/support/webmasters/bin/answer.py?answer=74288&topic=10078

    Hope that answers your question...

  13. I also noticed that they said you shouldn't have any news items older than 72 hours. Actually, they said 3 days, but you get the idea. I don't know who the heck has the ability to post 1k items in 3 days.

  14. Greg says:

    Where did you see this?

  15. Simon says:

    Once you create the php file, how do you then access it as an XML file?
