Peeping at the corpus – What is really going on behind the equality and welfare items of the Manifesto project?

The Comparative Manifestos Project (CMP) data set quantifies how much parties emphasize certain topics and positions and is widely used in the study of political parties. The data set is also increasingly applied in comparative political economy and welfare state studies, which use the welfare-specific items rather than the CMP’s left–right scale to test hypotheses on the impact of political parties on social policies, (in)equality and the welfare state. But do these items provide a valid basis for descriptive and causal inferences? What precisely do the items capture? To answer these questions of concept validity, we use the new manifesto corpus data for German parties 2002–2013 and, as a further test, for US parties 2004–2012. Corpus data are the digitized texts of electoral programmes, originally annotated and coded by hand. We assess the validity of the codings directly at the level of quasi-sentences by re-categorizing and subcategorizing the originally coded statements on equality, social justice and welfare state expansion. Although concerns about the concept validity of the data seem exaggerated, we find that theoretically relevant and meaningful variation is ‘hidden’ behind the original categories. Hence, our approach allows researchers to assess the substantive meaning of the CMP data directly, and we offer an efficient new strategy for testing more specific hypotheses on the impact of political parties on policy.