A Tale of Two Word Clouds: The Perils of Outsourcing Decision-making to AI

In my last article, I compared views about ecological restoration presented by some ecologists to those presented by authors from an indigenous background. As I was writing, I used Claude to generate word clouds of the two scientific articles. My idea was to use the word clouds to help evaluate and explain the different worldviews.

But something about the word clouds didn’t seem right to me. Digging into these differences would have taken some time, and getting the article published in time was a higher priority for me. So I dropped the word clouds from the article.

But after publishing that article, I continued to feel a nagging curiosity. So for this article, I decided to investigate, to learn something about word clouds, and to produce something I’m confident in.

Original word clouds: something was off

Not knowing much about the different parameters that go into generating a word cloud, I asked Claude to make one for each of the two papers. At first glance, I was excited to have created such a cool data visualization with little effort. But on closer inspection, I noticed something. The word “restoration”–one of the most common words in Long et al., 2003 was absent from that paper’s word cloud but was one of the most prominent words in the Perring et al., 2015 word cloud. Looking at them now, I see that “ecological”–another of the Long paper’s most common words–is also missing.

Early iteration of a word cloud for Long et al., 2003. Note absence of “ecological” and “restoration”, two of the most common words in this paper.

Early iteration of a word cloud for Perring et al., 2015. Both “ecological” and “restoration” are rightly prominent.

Clearly, these two papers were being analyzed differently under the hood, and I didn’t understand why. I wasn’t comfortable publishing something I didn’t sufficiently understand.

What I learned

As I dug into how word clouds are generated, I learned that one of the most important parameters (i.e., choices) is the stopword list. A word cloud is a visual representation of word frequency. Words that appear more often are displayed in larger font and vice versa. But not all words convey useful information.

Enter the stopword list: a list of words to ignore. By default, universally common and non-informative words (e.g., “the”, “and”, “a”) are included in a standard English stopword list. Beyond this, the analyst can add additional words.

Apparently Claude decided that “ecological” and “restoration”–words directly describing the main topic of both papers–were noise rather than signal. While that could be a reasonable justification for adding them to a stopword list, it did not apply this rule uniformly to both papers. That’s a problem.

There are a few other parameters the analyst can control when making a word cloud, but deciding the stopword list is one of the most consequential decisions. And when outsourcing an analytical task to an AI (or person, for that matter), it’s still important for the analyst in charge to understand the choices being made and the possible consequences.

This time, instead of outsourcing most of the analyst’s decisions to Claude, I worked through the process step by step, iterating to make sure I had a better grasp of the decisions being made and the corresponding results of those decisions. These steps included deciding which parts of each paper to include in the analysis, which words to include in the stopword list, reviewing word frequency outputs, and adjusting the color schemes to make the graphics easier to understand. (The full frequency list for each paper and the stopword list are available for download at the end of this article.)

Some limitations

Before I present the revised word clouds, a few words about their limitations, both in general and in this specific case.

A word cloud is a fairly crude method of textual analysis that can point to vocabulary differences. My intention in the last article was to use them as a tool to prompt my thinking and aid my discussion. But the observations I made about different worldviews fundamentally came from understanding how words were being used in context rather than word frequency. The specific words that suggested certain views were often not among the most common.

This comparison of two papers also has some structural limitations. One paper is a literature review (Perring et al., 2015), while the other is a perspectives paper (Long et al., 2003). They were published over a decade apart. And the sample size is small (n=2).

Revised word clouds: more directly comparable

After several iterations working with Claude, I settled on these two word clouds.

I decided to leave both “ecological” and “restoration” in. Not surprisingly, they are the two most prominent words in each cloud. Apart from these two words, I do notice distinct differences in vocabularies between the two papers.

To illustrate these differences, I had Claude make a single, comparison word cloud. The size of words in this comparison cloud are based on differences in relative word frequencies between the two papers. And the color indicates the paper in which the word is more frequent.

This makes the vocabulary differences clearer to me. And while it doesn’t explain different worldviews by itself, it does hint at or reinforce certain things I noticed in my earlier comparisons. For example, the Long paper’s vocabulary suggests a more relational focus, with frequent use of words like “cultural”, “advisors”, and “tribal”. On the other hand, the Perring paper frequently uses more technical, mechanical vocabulary, like “scale”, “functional”, and “ecosystem”.

It’s also interesting that “restoration” appears to have a substantially higher relative frequency in Perring, while “ecological” is absent from the comparison cloud. Looking at the word frequency tables, the relative frequencies of “ecological” are similar, but not identical, in both papers. The relative frequency differential for “ecological” between the two papers is 0.00121, while that of “channel” (one of the smallest words in the comparison cloud) is greater [0.00166]. In other words, the difference for “ecological” was too small to make the cut.

Closing thoughts

Outcomes of analysis depend not only on input data–including textual data–but also on analysis and visualization parameters. Traditionally, these parameters are chosen by the analyst, even if those choices involve using software defaults. I believe that a good analyst understands how these choices affect the outcomes.

In recent times, the emergence of generative AI models has made this type of analysis faster and more accessible. As a data analyst, this is exciting. But my experience with these word clouds reminds me that understanding what’s going on under the hood still matters. Choices about input data and parameters are still being made by the AI, whether one sees them or not.

[supplemental data files]

stopwords_v1 Download

frequency_long_v2 Download

frequency_perring_v2 Download

AI RESPONSIBILITY RUBRIC
This rubric shows human vs AI contribution across stages of developing the article. It was generated by AI and reviewed by human, making adjustments as needed.
━━━━━━━━━━━━━━━━━━━━
RESEARCH/LEARNING
Human 60% | AI 40%
[============........]
AI: Claude
Taylan drove the inquiry — identifying the problem with the original word clouds, deciding what questions to ask, and determining when he understood enough to proceed. Claude explained word cloud mechanics, parameters, and methodology. The learning was Taylan's; the instruction was Claude's.
━━━━━━━━━━━━━━━━━━━━
DATA ANALYSIS/VISUALIZATION
Human 50% | AI 50%
[==========..........]
AI: Claude
Taylan made all consequential analytical decisions: which paper sections to include, which words to add to the stopword list, color scheme choices, and when the output was acceptable. Claude executed the code, generated frequency tables, and produced the visuals. Neither could have produced the final output without the other.
━━━━━━━━━━━━━━━━━━━━
WRITING
Human 85% | AI 15%
[=================...]
AI: Claude
Taylan wrote the article. Claude contributed the outline structure and occasional suggested language, and performed fact-checking and logical consistency review. All prose decisions were Taylan's.
━━━━━━━━━━━━━━━━━━━━
EDITING/REFINEMENT
Human 60% | AI 40%
[============........]
AI: Claude
Taylan made all final decisions about what to keep, cut, and revise. Claude identified logical inconsistencies, factual issues, and proofreading errors across multiple passes. The judgment calls were Taylan's; the flagging was Claude's.
━━━━━━━━━━━━━━━━━━━━

AI Tools: Claude (Sonnet 4.6)