Python script to post RSS (or atom) feeds to WordPress

Posted on 2009/05/16


This blog used the “Blog Posting” functionality of the del.icio.us service to post a daily round-up of my del.icio.us bookmarks. I wanted to do something similar for links that I did not want to bookmark (news, tweets).
I tried Digg, but:

  1. Submitting a story is a long process.
  2. Digg's blog posting is buggy.
  3. It would create another “today’s links” post.

I eventually wrote my own blog-posting implementation, using the XML-RPC capabilities of wordpress.com. I present to you a Python script that retrieves feed entries, builds a simple HTML post from them and posts it on WordPress.

To make it work, the “feedparser” module must be installed; instructions can be found on its site, Universal Feed Parser (in Linux it just involves downloading the zipped module, extracting it and running “python setup.py install”).

The script can be run as a scheduled task (using cron/anacron in Linux or Scheduled Tasks in Windows) at the beginning of the day, and it automatically creates a post containing yesterday’s items from one or more feeds of your choice. You only need to change the lines in the “Settings” section to make it work for you. With the settings shown here, I create a draft in this blog from my Google Reader shared items, Diggs and del.icio.us bookmarks.
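The “yesterday’s items” selection can be tried in isolation. Here is a minimal sketch of it, written for Python 3 rather than the Python 2 of the script; the helper names `target_day` and `is_from_day` are made up for this illustration, and the timestamps are hand-made instead of coming from a real feed. As far as I know, feedparser reports its parsed times in UTC, so “yesterday” should be read as a UTC day unless you convert the times yourself.

```python
import datetime


def target_day(days_offset=-1, today=None):
    """Return the date whose items should be posted (yesterday by default)."""
    if today is None:
        today = datetime.date.today()
    return today + datetime.timedelta(days=days_offset)


def is_from_day(entry_time, day):
    """True when a time.struct_time (such as an entry's parsed date) falls on `day`."""
    wanted = day.timetuple()
    return (entry_time.tm_year, entry_time.tm_yday) == (wanted.tm_year, wanted.tm_yday)


# Hand-made timestamps standing in for feed entries
day = target_day(today=datetime.date(2009, 5, 16))
late_entry = datetime.datetime(2009, 5, 15, 23, 50).timetuple()
new_entry = datetime.datetime(2009, 5, 16, 0, 10).timetuple()
print(day, is_from_day(late_entry, day), is_from_day(new_entry, day))
```

Comparing year and day-of-year, as the script does, avoids any dependence on the time of day at which the scheduled job runs.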

Security issues:

  1. If you want to schedule an automatic job, the file will contain your WordPress password in clear text. You can create your own version that prompts the user for the password.
  2. This script currently copies the HTML portion of the feed entries' descriptions into your WordPress post. This is both powerful and problematic: the HTML could contain unwanted elements.
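For the first issue, one possible approach is to keep the password out of the file altogether: read it from an environment variable when the job is scheduled, and fall back to an interactive prompt otherwise. This is only a sketch, written for Python 3; the variable name `WP_PASSWORD` and the function `read_password` are hypothetical, not something the script or WordPress defines.

```python
import getpass
import os


def read_password(env_var='WP_PASSWORD', prompt=getpass.getpass):
    """Return the password from the environment, or ask the user for it."""
    password = os.environ.get(env_var)
    if password:
        return password
    # No variable set: ask interactively without echoing the input.
    return prompt('WordPress password: ')
```

In the script, `user_pass = read_password()` would then replace the hard-coded value. A scheduled job cannot answer a prompt, so in that case the environment variable has to be set by whatever launches the job.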

I am a novice in Python scripting, so this script was also an opportunity to learn something about the language; I find it really easy to learn! If you have some advice, feel free to improve this post with your comments.

#!/usr/bin/python
# -*- coding: utf-8 -*-

#    Copyright 2009 Francesco Balducci
#
#    This program is free software: you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation, either version 3 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#    See <http://www.gnu.org/licenses/> for the GNU General Public
#    License details.

###########################
# Settings
###########################

blog_name = 'https://balau82.wordpress.com/xmlrpc.php'
user_name = 'balau82'
user_pass = '***'
draft = 1
days_offset = -1 #Yesterday's items
simulation = 0
MyFeeds = [
    'http://www.google.com/reader/public/atom/user%2F00036527680217814867%2Fstate%2Fcom.google%2Fbroadcast',
    'http://feeds.delicious.com/v2/rss/balau?count=15',
    'http://digg.com/users/balau/history.rss'
    ]

###########################

# Python 2 script: xmlrpclib was renamed xmlrpc.client in Python 3.
import datetime
import xml.etree.ElementTree as ET
import xml.sax.saxutils
import xmlrpclib

#http://feedparser.org/
import feedparser
class MyLink:
    link = ''
    title = ''
    comment = ''

MyLinks = []

#Get links from feeds:

# Note: despite its name, 'today' holds the target day (today + days_offset).
today = (datetime.date.today() + datetime.timedelta(days=days_offset)).timetuple()
for feed_url in MyFeeds:
    f = feedparser.parse(feed_url)
    print 'feed: ' + f.feed.title
    for entry in f.entries:
        # Some feeds omit the update date; skip entries without one.
        te = entry.get('updated_parsed')
        if te and te.tm_year == today.tm_year and te.tm_yday == today.tm_yday:
            print entry.title
            m = MyLink()
            m.link = entry.link
            m.title = entry.title
            try:
                m.comment = xml.sax.saxutils.unescape(entry.description)
            except AttributeError:
                # The entry has no description
                m.comment = ''
            MyLinks.append(m)
    print

#Post on wordpress:
#  title of post
title = 'Links for {0:04}-{1:02}-{2:02}'.format(today.tm_year, today.tm_mon, today.tm_mday)
#  create content:
# http://effbot.org/zone/element-index.htm
content = ET.Element('ul')

#    for each link:
for link in MyLinks:
    #      add bullet list item
    link_li = ET.SubElement(content, 'li')
    #      add link with title
    link_title_div = ET.SubElement(link_li, 'div')
    link_title = ET.SubElement(link_title_div, 'a')
    link_title.set('href', link.link)
    link_title.text = link.title
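    #      add comment (trick to include html)
    #      Note: ET.XML only accepts well-formed XML, so a description
    #      containing sloppy HTML (e.g. an unclosed tag) raises an error here.
    s = u'<div>' + link.comment + u'</div>'
    link_comment = ET.XML(s.encode('UTF-8'))
    link_li.append(link_comment)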

if not simulation:
    blog_id = 0
    post_content = ET.tostring(content)
    blog_content = { 'title' : title, 'description' : post_content }
    categories = [{'categoryId' : 'Links', 'isPrimary' : 1}]
    sp = xmlrpclib.ServerProxy(blog_name)
    post_id = int(sp.metaWeblog.newPost(blog_id, user_name, user_pass, blog_content, not draft))
    sp.mt.setPostCategories(post_id, user_name, user_pass, categories)
    if not draft:
        sp.mt.publishPost(post_id, user_name, user_pass)

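The script above embeds each entry description as-is, which fails on HTML that is not well-formed XML and lets through any element the feed contains. One way to address both points is to rebuild the description from a small whitelist of tags. The following is a rough sketch for Python 3 using only the standard library; the class `CommentBuilder`, the function `comment_element` and the choice of allowed tags are assumptions made for this example, and it should not be taken as a complete HTML sanitizer.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Assumption: a deliberately small whitelist; extend it to taste.
ALLOWED_TAGS = {'a', 'b', 'i', 'em', 'strong', 'p', 'br', 'ul', 'ol', 'li', 'blockquote'}
VOID_TAGS = {'br'}                        # tags that never have a closing tag
DROPPED_WITH_CONTENT = {'script', 'style'}


class CommentBuilder(HTMLParser):
    """Build a well-formed <div> element from possibly messy HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element('div')
        self.stack = [self.root]          # currently open elements
        self.dropping = 0                 # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED_WITH_CONTENT:
            self.dropping += 1
        elif not self.dropping and tag in ALLOWED_TAGS:
            element = ET.SubElement(self.stack[-1], tag)
            href = dict(attrs).get('href')
            # Keep only plain web links; every other attribute is discarded.
            if tag == 'a' and href and href.startswith(('http://', 'https://')):
                element.set('href', href)
            if tag not in VOID_TAGS:
                self.stack.append(element)

    def handle_endtag(self, tag):
        if tag in DROPPED_WITH_CONTENT:
            self.dropping = max(0, self.dropping - 1)
            return
        # Close the nearest matching open element, if there is one.
        for index in range(len(self.stack) - 1, 0, -1):
            if self.stack[index].tag == tag:
                del self.stack[index:]
                break

    def handle_data(self, data):
        if self.dropping:
            return
        parent = self.stack[-1]
        if len(parent):                   # text after a child goes in its tail
            parent[-1].tail = (parent[-1].tail or '') + data
        else:
            parent.text = (parent.text or '') + data


def comment_element(html_text):
    """Return a <div> element holding only whitelisted markup and text."""
    builder = CommentBuilder()
    builder.feed(html_text)
    builder.close()
    return builder.root
```

In a Python 3 port of the script, `link_li.append(comment_element(link.comment))` could stand in for the `ET.XML` step. Tags outside the whitelist are dropped but their text is kept, while the content of `script` and `style` elements is discarded entirely.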
Posted in: Software