Sunday, 16 August 2009

Word clouds for textual analysis


Word clouds, including the pretty ones produced by Wordle (see right), can be a useful first step in analysing textual data. I regularly trawl through free-text responses to surveys, looking for themes and patterns. By first creating a word cloud, which shows the most frequently used words in a larger size, I get an idea of what to look for and how to categorise responses.
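To make the idea concrete, here is a minimal Python sketch of the counting behind a word cloud: count how often each word appears and scale a font size to that count. It is not how Wordle or any other tool actually works; the stop-word list, the size range, the function name word_sizes and the sample responses are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, very short stop-word list; real tools use a much longer one.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "it",
              "for", "with", "was", "from", "i", "we"}


def word_sizes(responses, min_size=10, max_size=48):
    """Map each non-stop word to a font size that grows with its frequency."""
    words = []
    for response in responses:
        # Lower-case the text and pull out runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", response.lower())
        words.extend(t for t in tokens if t not in STOP_WORDS)

    counts = Counter(words)
    if not counts:
        return {}

    # The most frequent word gets max_size; the rest scale linearly from min_size.
    top = max(counts.values())
    return {word: min_size + (max_size - min_size) * count / top
            for word, count in counts.items()}


# Invented survey responses, purely for illustration.
responses = [
    "Easy access to notes",
    "Access from home was useful",
    "Helped with organisation of group work",
]

sizes = word_sizes(responses)
for word, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{word}: {size:.0f}pt")
```

Sorting by size puts the likely themes at the top, which is the same prompt for categorising that the picture gives at a glance.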

For example, the word cloud in this post is from text in response to a question asking students about the benefits of using wikis. From the cloud, I was able to see some themes instantly, and I could then go through and categorise the comments a little more methodically with the help of the keywords suggested by the cloud. The word clouds can also liven up any report or presentation made using the data.


Word clouds are not perfect, or even very scientific, for this kind of text analysis: they do not show the context in which the words were used (it could have been 'poor' or 'good' in front of the word 'access' in the above example) and they don't account for synonyms (there may only be one word that corresponds to access, but several that represent organisation). A semantic word cloud that recognised context and synonyms would be very powerful. There is a very nice prototype called 'Concordle' (produced by Ladislav Kocbach) that shows one way that context can be accounted for in a word cloud to produce a useful concordancing tool (very useful for textual analysis).
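The missing-context problem is what a concordance addresses: for a chosen keyword, list every occurrence together with the words around it. The sketch below is a bare-bones keyword-in-context listing in Python. It is only an illustration of the general idea, not a description of how Concordle is implemented; the function name keyword_in_context, the window size and the sample responses are assumptions of mine.

```python
import re


def keyword_in_context(responses, keyword, window=2):
    """Return each occurrence of keyword with up to `window` words either side."""
    hits = []
    for response in responses:
        tokens = re.findall(r"[a-z']+", response.lower())
        for i, token in enumerate(tokens):
            if token == keyword:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                # Upper-case the keyword so it stands out in the listing.
                hits.append(" ".join(left + [token.upper()] + right))
    return hits


# Invented responses: the same keyword used with opposite meanings.
responses = ["Poor access from home", "Good access to lecture notes"]

for line in keyword_in_context(responses, "access"):
    print(line)
```

A plain word cloud would count 'access' twice here and say nothing more; the listing shows that one respondent was complaining and the other was pleased.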

In fact, the survey software I use (Bristol Online Survey) produces word clouds, but not as prettily as Wordle does. If you use Wordle for this kind of analysis, beware that any data you save is no longer yours. For this reason, when I use Wordle, I paste the data, take a screenshot of the word cloud and then close without saving it.

Google Docs also produces word clouds, and the spreadsheet forms can be used as a survey tool. There are lots of possibilities for gathering data and publishing clouds online through Google Docs. I plan to experiment with the possibilities this offers.

A former colleague, Andy Ramsden at Bath, uses word clouds as a way of aggregating textual responses gathered during a live presentation: he gets the audience to text him from their phones, then cuts and pastes the results into a word cloud generator. Word clouds are great for quick and dirty presentation of texts, and they have lots of uses beyond making pretty pictures or tag clouds at the side of a blog.
