(no subject)
Being the master of data that I am, I have decided to embark on a project.

My project: programmatically transfer the entire contents of my journal (including comments) to Wordpress.

I'll be honest - the weekly e-mails from LJ chronicling the escapades of "Frank and Meme" have genuinely persuaded me that LJ and I are no longer compatible. I can't explain it in any more detail than to say that I find the whole vibe weird and alienating, and I object to having my content stored in a place where I no longer feel comfortable.

So the mission is to grab all 7,000 entries, and 20,000 or so comments, and forcibly inject them into a Wordpress blog.

So the wp_posts table is easy enough. Each row needs an ID (which auto-increments anyway), some date bits (which I can easily grab from each LJ entry), content, a title, an appropriate status, a name (which I can randomly generate), a modification timestamp, a permalink URL (which can be derived from the autonumber ID), and a type, plus a few other bits and bobs.
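A minimal sketch of turning one scraped entry into a wp_posts row, covering only the columns listed above. The LJEntry shape, random_post_name and build_wp_post are hypothetical helpers, not the author's code; the permalink uses WordPress's plain `?p=<ID>` form, and treating LJ timestamps as GMT is an assumption.

```python
import random
import string
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LJEntry:
    # Hypothetical shape of one scraped LJ entry; these field names are assumptions.
    lj_id: int
    subject: str
    body: str
    posted_at: datetime


def random_post_name(length: int = 12) -> str:
    """Random lowercase slug for post_name."""
    return "".join(random.choices(string.ascii_lowercase + string.digits, k=length))


def build_wp_post(entry: LJEntry, wp_id: int, site_url: str) -> dict:
    """Map one LJ entry onto a subset of wp_posts columns.

    wp_id is the auto-increment ID the row receives; site_url is a
    placeholder for the blog's address.
    """
    stamp = entry.posted_at.strftime("%Y-%m-%d %H:%M:%S")
    return {
        "ID": wp_id,
        "post_date": stamp,
        "post_date_gmt": stamp,       # assumption: LJ times treated as already in GMT
        "post_content": entry.body,
        "post_title": entry.subject,
        "post_status": "publish",
        "post_name": random_post_name(),
        "post_modified": stamp,       # assumption: no edit history, so modified = posted
        "post_modified_gmt": stamp,
        "guid": f"{site_url.rstrip('/')}/?p={wp_id}",  # plain-permalink form
        "post_type": "post",
    }


entry = LJEntry(lj_id=1234, subject="(no subject)", body="Hello", posted_at=datetime(2010, 5, 1, 12, 30))
row = build_wp_post(entry, wp_id=42, site_url="https://example.com/")
print(row["guid"])  # → https://example.com/?p=42
```

In a real import the ID would come back from the INSERT itself, and the guid would be filled in afterwards.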

Then I'll chuck a load of stuff into a reference data table that maps from the autonumber ID to the LJ ID, which makes it easier to track everything across the two systems. Simple...
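A minimal sketch of such a reference table. SQLite stands in for WordPress's MySQL database so the example runs on its own, and the table name lj_post_map and its two columns are hypothetical.

```python
import sqlite3


def create_map_table(conn: sqlite3.Connection) -> None:
    # Hypothetical reference table: one row per migrated entry.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS lj_post_map (
               wp_id INTEGER PRIMARY KEY,      -- wp_posts.ID
               lj_id INTEGER NOT NULL UNIQUE   -- LJ entry ID
           )"""
    )


def record_mapping(conn: sqlite3.Connection, wp_id: int, lj_id: int) -> None:
    conn.execute("INSERT INTO lj_post_map (wp_id, lj_id) VALUES (?, ?)", (wp_id, lj_id))


def wp_id_for(conn: sqlite3.Connection, lj_id: int):
    """Return the WordPress ID for an LJ entry ID, or None if not migrated yet."""
    row = conn.execute("SELECT wp_id FROM lj_post_map WHERE lj_id = ?", (lj_id,)).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
create_map_table(conn)
record_mapping(conn, wp_id=1, lj_id=98765)
print(wp_id_for(conn, 98765))  # → 1
```

Because wp_id is the primary key and lj_id is UNIQUE, lookups in either direction are indexed, and an accidental second import of the same LJ entry fails loudly instead of creating a duplicate.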

Tags can be brought across using the wp_terms table and its associated tables (wp_term_taxonomy and wp_term_relationships), as far as I can tell...
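A minimal sketch of attaching one LJ tag to an imported post across those three tables, using WordPress's built-in `post_tag` taxonomy. The schemas are cut down to the relevant columns, SQLite stands in for MySQL, and slugify is a rough, hypothetical stand-in for WordPress's own `sanitize_title`.

```python
import re
import sqlite3

SCHEMA = """
CREATE TABLE wp_terms (term_id INTEGER PRIMARY KEY, name TEXT, slug TEXT UNIQUE,
                       term_group INTEGER DEFAULT 0);
CREATE TABLE wp_term_taxonomy (term_taxonomy_id INTEGER PRIMARY KEY, term_id INTEGER,
                               taxonomy TEXT, description TEXT DEFAULT '',
                               parent INTEGER DEFAULT 0, "count" INTEGER DEFAULT 0);
CREATE TABLE wp_term_relationships (object_id INTEGER, term_taxonomy_id INTEGER,
                                    term_order INTEGER DEFAULT 0,
                                    PRIMARY KEY (object_id, term_taxonomy_id));
"""


def slugify(name: str) -> str:
    # Simplified: lowercase, collapse anything non-alphanumeric into hyphens.
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def attach_tag(conn: sqlite3.Connection, post_id: int, tag: str) -> int:
    """Find or create the term and its post_tag taxonomy row, then link the post."""
    slug = slugify(tag)
    row = conn.execute("SELECT term_id FROM wp_terms WHERE slug = ?", (slug,)).fetchone()
    term_id = row[0] if row else conn.execute(
        "INSERT INTO wp_terms (name, slug) VALUES (?, ?)", (tag, slug)).lastrowid
    row = conn.execute("SELECT term_taxonomy_id FROM wp_term_taxonomy "
                       "WHERE term_id = ? AND taxonomy = 'post_tag'", (term_id,)).fetchone()
    tt_id = row[0] if row else conn.execute(
        "INSERT INTO wp_term_taxonomy (term_id, taxonomy) VALUES (?, 'post_tag')",
        (term_id,)).lastrowid
    cur = conn.execute("INSERT OR IGNORE INTO wp_term_relationships "
                       "(object_id, term_taxonomy_id) VALUES (?, ?)", (post_id, tt_id))
    if cur.rowcount == 1:  # only bump the usage count for a new link
        conn.execute('UPDATE wp_term_taxonomy SET "count" = "count" + 1 '
                     "WHERE term_taxonomy_id = ?", (tt_id,))
    return tt_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
attach_tag(conn, 1, "Work Stuff")
attach_tag(conn, 2, "Work Stuff")
attach_tag(conn, 2, "Work Stuff")  # duplicate link, ignored
```

The real WordPress tables carry extra columns and indexes; this only shows how a tag fans out into a term, a taxonomy row and one relationship per post.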

Then to comments, where I can map names, URLs back to each commenter's LJ, their IP, timestamp, content, parent IDs, and so on. Again, it's going to take reference data tables to map everything in properly, but it's simple enough once those are in place.
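The fiddly part is the parent IDs: a reply has to point at its parent's new WordPress ID, not its LJ ID. A minimal sketch, assuming a hypothetical LJComment shape where a parent ID of 0 marks a top-level comment; first_wp_id stands in for the auto-increment counter, and the rows carry only a few wp_comments columns.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class LJComment:
    # Hypothetical scraped comment; lj_parent_id is 0 for a top-level comment.
    lj_id: int
    lj_parent_id: int
    author: str
    body: str


def map_comment_tree(comments, wp_post_id, first_wp_id):
    """Assign WordPress comment IDs and rewrite parents to those IDs.

    Parents are numbered before their replies (breadth-first), so a reply's
    parent always has a WordPress ID by the time the reply is reached.
    A comment whose parent is missing from the list (e.g. deleted) is
    treated as top-level.
    """
    known = {c.lj_id for c in comments}
    children = {}
    for c in comments:
        children.setdefault(c.lj_parent_id, []).append(c)
    queue = deque(c for c in comments if c.lj_parent_id not in known)
    lj_to_wp, rows = {}, []
    next_id = first_wp_id
    while queue:
        c = queue.popleft()
        lj_to_wp[c.lj_id] = next_id
        rows.append({
            "comment_ID": next_id,
            "comment_post_ID": wp_post_id,
            "comment_author": c.author,
            "comment_content": c.body,
            "comment_parent": lj_to_wp.get(c.lj_parent_id, 0),
        })
        next_id += 1
        queue.extend(children.get(c.lj_id, []))
    return rows, lj_to_wp


comments = [
    LJComment(10, 0, "alice", "Top-level comment"),
    LJComment(12, 10, "bob", "Reply"),
    LJComment(15, 12, "alice", "Reply to the reply"),
    LJComment(20, 99, "carol", "Reply to a deleted comment"),
]
rows, lj_to_wp = map_comment_tree(comments, wp_post_id=7, first_wp_id=100)
```

The returned lj_to_wp dictionary is exactly the kind of reference data the post mentions, and could be persisted alongside the entry mapping.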

I understand that something sort of like this exists already, taking LJ export XML files and doing this, but when last I checked (half a decade ago probably), that involved exporting stuff month by month. And I've been here for 128 months or thereabouts, so that's impractical. It also wouldn't cover comments, so why bother with that? More fun to build something myself :o)

(Deleted comment)
Yeah, it's quite inflexible. The advantage of Wordpress is that if you run it yourself, you control the entire database structure behind it, so you can really do whatever you want. You can also edit the text of comments people have left... ;o)

I'm building all this from scratch. The stuff in AMA was designed in a pretty different way. The goal in AMA was that, as soon as possible after a comment was posted, that fact would be recorded and a load of data would be downloaded. The emphasis was very much on working out how best to get hold of that data ASAP.

So in that case, my database knew an entry had been posted because AMA was structured to notify it. It was all very reactive. It was crucial to the system that if I went back and posted a comment on a two-year-old post, that comment would be recorded.

In the case of what I'm doing here, I have to start from scratch and work out how best to get hold of all the data in one go, without worrying too much about capturing live data. So it's going in steps:
  1. Interface with the Month view to get hold of a list of all entries ever posted
  2. Interface with the Entry view for each entry to get hold of its content and metadata
  3. Further interface with the Entry view to get a list of every comment on every post
  4. Further interface with the Entry view to download the content of each comment
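The four steps above can be sketched as one loop over every month of the journal. This is a minimal sketch: the scraping itself is left to hypothetical callbacks (list_entry_ids, fetch_entry, list_comment_ids, fetch_comment), since the post doesn't say how the pages are parsed, friends-only access isn't covered, and the month-view URL pattern `https://<user>.livejournal.com/YYYY/MM/` is an assumption.

```python
from datetime import date
from typing import Iterator, Tuple


def months_between(start: date, end: date) -> Iterator[Tuple[int, int]]:
    """Yield (year, month) for every month from start to end inclusive."""
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        yield y, m
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)


def month_view_url(user: str, year: int, month: int) -> str:
    # Assumption: LJ month views live at https://<user>.livejournal.com/YYYY/MM/
    return f"https://{user}.livejournal.com/{year:04d}/{month:02d}/"


def run_pipeline(user, start, end, list_entry_ids, fetch_entry,
                 list_comment_ids, fetch_comment):
    results = {}
    for year, month in months_between(start, end):
        for entry_id in list_entry_ids(month_view_url(user, year, month)):  # step 1
            entry = fetch_entry(entry_id)                                  # step 2
            entry["comments"] = [
                fetch_comment(entry_id, cid)                               # step 4
                for cid in list_comment_ids(entry_id)                      # step 3
            ]
            results[entry_id] = entry
    return results


# Fake callbacks standing in for the real page scrapers.
out = run_pipeline(
    "example", date(2001, 12, 1), date(2002, 2, 1),
    list_entry_ids=lambda url: [1, 2] if url.endswith("/2002/01/") else [],
    fetch_entry=lambda eid: {"id": eid},
    list_comment_ids=lambda eid: [eid * 10],
    fetch_comment=lambda eid, cid: {"cid": cid},
)
```

Passing the fetchers in keeps the month/entry/comment traversal separate from the HTML parsing, so each step can be tested against saved pages before touching the live site.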
Another reason why it needs building from scratch is that it was only a couple of weeks before I left AMA that I even found a way to access friends-only content, so that has to be integrated into everything from the start.

So I'm now at a point where I have the IDs of every entry I've ever posted in my journal. The process that grabs the data will also post it to the backend Wordpress tables entry by entry, so in the next couple of days, it should sort itself out.
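Writing entry by entry works best if each entry lands in its own transaction and the run can be restarted without duplicating anything. A minimal sketch, assuming a hypothetical lj_post_map reference table and a hypothetical insert_entry callback that writes one entry's rows and returns its new ID; SQLite again stands in for MySQL.

```python
import sqlite3


def migrate_entries(conn, entries, insert_entry):
    """Write each entry in its own transaction, skipping ones already migrated.

    entries: iterable of (lj_id, payload) pairs.
    insert_entry(conn, payload) -> wp_id is a hypothetical callback that
    writes the entry's rows and returns the new ID.
    Returns how many entries were migrated on this run.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS lj_post_map "
                 "(wp_id INTEGER PRIMARY KEY, lj_id INTEGER NOT NULL UNIQUE)")
    migrated = 0
    for lj_id, payload in entries:
        if conn.execute("SELECT 1 FROM lj_post_map WHERE lj_id = ?", (lj_id,)).fetchone():
            continue  # done on an earlier run
        with conn:  # commits on success, rolls back if anything inside raises
            wp_id = insert_entry(conn, payload)
            conn.execute("INSERT INTO lj_post_map (wp_id, lj_id) VALUES (?, ?)",
                         (wp_id, lj_id))
        migrated += 1
    return migrated


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")


def fake_insert(c, body):
    return c.execute("INSERT INTO posts (body) VALUES (?)", (body,)).lastrowid


entries = [(101, "first entry"), (102, "second entry")]
print(migrate_entries(conn, entries, fake_insert))  # → 2
print(migrate_entries(conn, entries, fake_insert))  # → 0
```

If the scraper dies halfway through 7,000 entries, the half-written entry is rolled back and a rerun picks up where it stopped.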

As for tagging - if it all goes well, there won't be too much content here for much longer, so you won't have much to read :o)

(Deleted comment)
It was something that would have been really handy in AMA, and its absence stood out at the time.

Sadly, the way to do it involves using a very poorly documented feature, which made it kind of tricky. 'Twas a shame; knowing what I do now, I could have made the whole thing much better.

Do you expect any weird formatting issues to come of this? Or are your entries generally clean enough to make an elegant transfer?

I don't expect any formatting issues whatsoever. The great thing about LJ is that the use of friends pages etc. means that only utter cretins put fancy formatting into their posts, and I'm not one of those. So I don't have any entries where I hardcoded a pink font colour into the entry itself, on account of how anybody whose friends page has a pink background would therefore be unable to see it. The format promotes clean formatting, so it should all be fine. Probably.

Well, I can subscribe to your blog via RSS, yeah?

I'll probably have something that crossposts entries or similar - I wouldn't want to abandon my audience after all ;o)

As for monetising the whole thing, nah. That would require me to build it in a way that others could use, and I haven't the patience for that... :o)

It would be easier to move it to Dreamwidth and then get emails about their busted-ass coding efforts.

Very enlightening and beneficial to someone who's been out of the circuit for a long time.
