Filed Under (pixzone) by Federico Feroldi on 19-04-2007

While looking for my lost posts, I found that you can get back the last copy of a feed fetched by Google Reader from the URL below:

http://www.google.com/reader/atom/feed/FEED_URL?n=MAX_ITEMS

That got me back about 50 posts from my lost pixzone.com blog, in Atom format.
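
A minimal sketch of building that URL in Perl, using only core features. The reader_atom_url helper and the example feed address are made up for illustration, and percent-encoding the feed address before appending it is an assumption, not something the URL pattern above states:

```perl
use strict;
use warnings;

# Build the Google Reader URL shown above for a given feed.
# Assumption: the feed address is percent-encoded before being appended.
sub reader_atom_url {
    my ($feed_url, $max_items) = @_;

    # Escape every character outside the RFC 3986 "unreserved" set
    # (ASCII input assumed).
    (my $escaped = $feed_url) =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;

    return "http://www.google.com/reader/atom/feed/$escaped?n=$max_items";
}

# Hypothetical feed address, used only for illustration.
print reader_atom_url('http://www.example.com/blog/feed/', 50), "\n";
```

Saving the response from that URL to a file such as feed.xml gives the input a converter can work from.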

My next step is to write a small converter from Atom to the WordPress Extended RSS (WXR) format, so that I can import the posts back into WordPress. I have already found some Ruby code for reference.

Update: I’ve written a small Perl script that converts the Atom feed generated by Google Reader to the WordPress Extended RSS (WXR) format; you’ll find the source below. If your permalinks include the post ID, you’ll also need to patch the wp_insert_post() function in wp-includes/post.php to keep the post IDs unchanged, otherwise you’ll end up with a lot of 404s in your logs! :)

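#!/usr/bin/perl

use strict;
use warnings;

use XML::Smart;                          # parses the Atom feed
use Date::Manip qw(ParseDate UnixDate);  # for optional date reformatting; not called by default
use HTML::Entities;                      # encode_entities / decode_entities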

my $xml = XML::Smart->new('feed.xml');

print <<HERE;
<?xml version="1.0" encoding="ISO-8859-1"?>
<rss version="2.0"
        xmlns:content="http://purl.org/rss/1.0/modules/content/"
        xmlns:wfw="http://wellformedweb.org/CommentAPI/"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:wp="http://wordpress.org/export/1.0/"
>

<channel>
        <title>the pix zone</title>
        <link>http://www.pixzone.com/blog</link>
        <description>Federico 'pix' Feroldi weblog</description>
        <pubDate>Thu, 19 Apr 2007 18:29:58 +0000</pubDate>
        <generator>http://wordpress.org/?v=2.1.3</generator>
        <language>en</language>
HERE

foreach my $entry (@{$xml->{'feed'}->{'entry'}}) {
    my $e_title = encode_entities($entry->{'title'}, '<>&"');
    my $e_link = $entry->{'link'}->{'href'};

    # Permalinks look like .../blog/<post id>/<slug>. Skip entries that do not
    # match instead of reusing the captures from the previous entry.
    my ($e_id, $e_slug) = $e_link =~ m#blog/(\d+)/([^/]+)# or next;

    my $e_content = decode_entities($entry->{'content'});

    # Turn the Atom timestamp (e.g. 2007-04-19T18:29:58Z) into "2007-04-19 18:29:58".
    my $e_updated = $entry->{'updated'};
    $e_updated =~ s/T/ /;
    $e_updated =~ s/Z//;

    # Used as-is for every date field; UnixDate() could reformat it for <pubDate>.
    my $e_updated_rfc = $e_updated;

    print <<HERE;
<item>
<title>$e_title</title>
<link>$e_link</link>
<pubDate>$e_updated_rfc</pubDate>
<dc:creator>admin</dc:creator>
<category><![CDATA[pixzone]]></category>
<guid isPermaLink="false">http://www.pixzone.com/blog/?p=$e_id</guid>
<description></description>
<content:encoded><![CDATA[$e_content]]></content:encoded>
<wp:post_id>$e_id</wp:post_id>
<wp:post_date>$e_updated_rfc</wp:post_date>
<wp:post_date_gmt>$e_updated_rfc</wp:post_date_gmt>
<wp:comment_status>open</wp:comment_status>
<wp:ping_status>open</wp:ping_status>
<wp:post_name>$e_slug</wp:post_name>
<wp:status>publish</wp:status>
<wp:post_parent>0</wp:post_parent>
<wp:post_type>post</wp:post_type>
</item>
HERE
}

print <<HERE;
</channel>
</rss>
HERE
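
The script above copies the same "YYYY-MM-DD HH:MM:SS" string into <pubDate>, while RSS 2.0 expects that field in RFC 822 form. A minimal sketch of that conversion, using the core Time::Local module rather than Date::Manip. The atom_to_rfc822 name is made up for this example, and it assumes UTC timestamps ending in Z, as in the feed handled above:

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Convert an Atom timestamp such as 2007-04-19T18:29:58Z to an RFC 822 date.
# Returns nothing if the input does not have that exact shape.
sub atom_to_rfc822 {
    my ($atom) = @_;

    my ($y, $mo, $d, $h, $mi, $s) =
        $atom =~ /^(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)Z$/
        or return;

    # timegm() is only used to find the weekday; it croaks on impossible dates.
    my $epoch = timegm($s, $mi, $h, $d, $mo - 1, $y);
    my $wday  = (gmtime($epoch))[6];

    # English names are hard-coded so the output does not depend on the locale.
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d +0000',
        $DAYS[$wday], $d, $MONTHS[$mo - 1], $y, $h, $mi, $s;
}

print atom_to_rfc822('2007-04-19T18:29:58Z'), "\n";
```

In the loop, the result would go into <pubDate> only; the wp:post_date fields keep the plain "YYYY-MM-DD HH:MM:SS" form.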