Xiang Fu*, Shangdi Yu*, Austin R Benson
Large Question-and-Answer (Q&A) platforms support diverse knowledge curation on the Web. While researchers have studied user behaviour on such platforms in a variety of contexts, there is relatively little insight into important by-products of user behaviour that also encode knowledge. Here, we analyse and model the macroscopic structure of tags applied by users to annotate and catalogue questions, using a collection of 168 Stack Exchange websites that span a diversity of sizes and topics. We study the distribution of tag frequencies and also the structure of ‘co-tagging’ networks, where nodes are tags and links connect tags that have been applied to the same question. We find striking similarity in tagging structure across Stack Exchange communities, even though each community evolves independently (albeit under similar guidelines). Our findings thus provide evidence that social tagging behaviour is largely driven by the Stack Exchange platform itself and not by the individual Stack Exchange communities. We also develop a simple generative model that creates random bipartite graphs of tags and questions. Our model accounts for the tag frequency distribution but does not explicitly account for co-tagging correlations. Despite this simplification, we demonstrate empirically and theoretically that our model can reproduce a number of the statistical properties that characterize co-tagging networks.
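To make the co-tagging construction concrete, the following minimal Python sketch builds the network from a list of per-question tag sets. The function name `co_tagging_network` and the toy questions are hypothetical. Weighting each link by the number of questions that share both tags is also an assumption: the text above only states that links connect tags applied to the same question. This is an illustration of the network definition, not the authors' generative model.

```python
from collections import Counter
from itertools import combinations


def co_tagging_network(questions):
    """Build a weighted co-tagging network (hypothetical helper).

    questions: iterable of tag collections, one per question.
    Returns (tag_freq, edges):
      tag_freq counts how many questions carry each tag;
      edges maps an unordered tag pair (stored as a sorted tuple) to the
      number of questions carrying both tags (assumed edge weighting).
    """
    tag_freq = Counter()
    edges = Counter()
    for tags in questions:
        unique = sorted(set(tags))  # ignore duplicate tags on one question
        tag_freq.update(unique)
        # Every pair of tags on the same question becomes (or reinforces) a link.
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return tag_freq, edges


questions = [
    {"python", "pandas"},
    {"python", "numpy", "pandas"},
    {"r"},
]
freq, edges = co_tagging_network(questions)
print(freq["python"])                # 2
print(edges[("pandas", "python")])   # 2
print(len(edges))                    # 3
```

A question with a single tag (here `r`) contributes to the tag frequency distribution but adds no links, so tag frequencies and co-tagging structure are related but distinct quantities.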