Glow's branches

Ramblings of a resto druid

Visualising search terms

I’ve been playing with some of the data that WordPress supplies. I’m mildly disappointed with the stats package that you get in WordPress, and the inability to use Google Analytics with my blog, but I digress.

Over the past couple of months, the main thing I’ve been watching is the search terms people have used when they land on the site. It’s self-filtering, as the things they search for are the things I have on my site, or they wouldn’t end up here… but nonetheless it’s interesting to see what folks are looking for, and how that matches the content I have created thus far.

I naturally started off playing with generating a graph* from the search terms and phrases. A bit of a population map showing the frequency of the terms and the number of connections between them. I tried using yEd and Gephi (both great graphing tools), and using TGF and DOT formats.

I had to spend a lot of time cleaning up this data before manipulating it. There are *many* ways to refer to the Twin Emperors for example, so I tried to unify as many synonyms and remove as many typos in the search terms as I could.
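A minimal sketch of that clean-up step in Python, assuming a hand-built lookup table. The variant spellings in `SYNONYMS` are invented for illustration; only the canonical name `twin_emperors` comes from the data in this post.

```python
import re

# Hypothetical synonym table: the variants on the left are made up for
# illustration; only the canonical twin_emperors appears in the post.
SYNONYMS = {
    "twin emperors": "twin_emperors",
    "twin emps": "twin_emperors",
    "twin emporers": "twin_emperors",  # invented typo
}


def normalise(query: str) -> str:
    """Lower-case a raw search phrase and collapse known variants."""
    cleaned = re.sub(r"\s+", " ", query.strip().lower())
    # Try longer variants first so a short one never clips a longer one.
    for variant in sorted(SYNONYMS, key=len, reverse=True):
        pattern = rf"\b{re.escape(variant)}\b"
        cleaned = re.sub(pattern, SYNONYMS[variant], cleaned)
    return cleaned


print(normalise("Twin  Emps 85"))
```

A table like this still has to be built by hand, which is where most of the time goes; the code only applies it consistently.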

My clean data looked something like this (just a fragment, of course):

twin_emperors 85
druid healing magmaw
healing maloriak
twin_emperors 85
druid healing 4.0.6
healing magmaw druid
solo aq40 85

This falls almost naturally into a graph, but I then had to convert the terms into a DOT file – which would be simple if either yEd or Gephi fully supported DOT, but they don’t. So I ended up setting up adjacencies in the data for connected terms, which isn’t ideal, but at least preserves the relationships between terms. So the data I ended up importing into Gephi looked like this:

twin_emperors -- 85;
druid -- {healing ; magmaw}
healing -- maloriak;
twin_emperors -- 85;
druid -- {healing ; 4.0.6}
healing -- {magmaw ; druid}
solo -- {aq40 ; 85}
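The conversion from a cleaned search phrase to one of these adjacency lines can be sketched in a few lines of Python. This is only a guess at the rule, read off the fragment above: the first term becomes the hub, every other term hangs off it, and the edge is written with DOT's undirected `--` operator. The function name `to_adjacency` is hypothetical.

```python
def to_adjacency(line: str) -> str:
    """Turn 'a b c' into 'a -- {b ; c}', mirroring the fragment above."""
    terms = line.split()
    if not terms:
        return ""
    head, rest = terms[0], terms[1:]
    if not rest:
        # A single term has no neighbours, so emit it as a bare node.
        return head + ";"
    if len(rest) == 1:
        return head + " -- " + rest[0] + ";"
    return head + " -- {" + " ; ".join(rest) + "}"


for phrase in ["twin_emperors 85", "druid healing magmaw"]:
    print(to_adjacency(phrase))
```

Treating the first term as the hub loses the links between the remaining terms (healing and magmaw are not joined by "druid healing magmaw"), which is part of why the approach isn't ideal.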

And the output of this in Gephi, after some further massaging and playing with different layouts:

Graph of search terms using Gephi

This is only the ‘main body’ of the graph. There are quite a few ‘islands’ of terms that aren’t connected to this, but the scale starts to get a bit crazy, so I’ve just pasted in this main connected ‘continental landmass’.

There’s a lot to be said for just removing the nodes that only appear once, to give a better feel for the connections between the terms that occur most often.
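One way to do that pruning before the data ever reaches the graphing tool is sketched below, assuming "appears once" means a term that occurs only once across all the cleaned search phrases. The function name `drop_singletons` is hypothetical.

```python
from collections import Counter


def drop_singletons(lines):
    """Remove terms seen only once, then drop phrases left with no edge."""
    counts = Counter(term for line in lines for term in line.split())
    kept = []
    for line in lines:
        terms = [t for t in line.split() if counts[t] > 1]
        if len(terms) >= 2:  # a lone term has no connection left to draw
            kept.append(" ".join(terms))
    return kept


fragment = [
    "twin_emperors 85",
    "druid healing magmaw",
    "healing maloriak",
    "twin_emperors 85",
    "druid healing 4.0.6",
    "healing magmaw druid",
    "solo aq40 85",
]
for line in drop_singletons(fragment):
    print(line)
```

On the fragment shown earlier this discards maloriak, 4.0.6, solo and aq40, which leaves five phrases.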

Then of course I tried not displaying the edges, and scaling the labels according to their connectedness. But of course if I’m going to do that, I may as well use a tool like Wordle:

Wordle word cloud of the search terms

But then I lose the actual connections between the terms, and some of those are really interesting. Still, Wordle is fast, and it’s pretty.
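For the label scaling mentioned above, a rough sketch follows. It assumes "connectedness" means degree (the number of distinct neighbours a term has) and that the first term of each phrase is the hub, as in the adjacency data. The names `degrees` and `label_sizes` and the 10 to 40 point range are arbitrary choices for illustration.

```python
from collections import defaultdict


def degrees(lines):
    """Count distinct neighbours per term, first term of a phrase as hub."""
    neighbours = defaultdict(set)
    for line in lines:
        terms = line.split()
        if not terms:
            continue
        head, rest = terms[0], terms[1:]
        for term in rest:
            neighbours[head].add(term)
            neighbours[term].add(head)
    return {term: len(adj) for term, adj in neighbours.items()}


def label_sizes(deg, smallest=10, largest=40):
    """Map each degree linearly onto a font size between the two bounds."""
    lo, hi = min(deg.values()), max(deg.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all degrees match
    scale = (largest - smallest) / span
    return {term: smallest + (d - lo) * scale for term, d in deg.items()}


fragment = ["twin_emperors 85", "druid healing magmaw", "healing maloriak"]
print(label_sizes(degrees(fragment)))
```

Unlike a word cloud, the sizes here come from how many other terms a word is linked to, not from how often it was typed.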

So. I will persevere with my graphs. Maybe in another couple of months when I have more terms under my belt I’ll have another play. And who knows, maybe Gephi will fully support DOT so I can skip a few steps in transforming my data from search terms to a graph of connected relationships.

* [I like graphs. And when I say graph, I mean one of these, not one of these. I was actually going to do a postgrad in graph theory, but ended up along an entirely different path. Alas.]