Published online by Cambridge University Press: 12 February 2009
A widely publicized 1983 Chinese survey found 43 per cent of all “specialized households” in a Shanxi county were households of cadres or former cadres. In what sense, if any, is this finding significant? More generally, what can be concluded about Chinese society, politics and the economy based on findings from survey research conducted there? This article sets out what can (and what cannot) be inferred from the unrepresentative samples of the Chinese population that are the basis for most survey research conducted in mainland China.
1. For example, there is the whole set of problems of questionnaire design and response bias to consider. “Correct” responses exist in many more and different categories, respondents may not believe their anonymity can be protected, the content and consequences of “incorrect” thinking change frequently, and the experience of most respondents with surveys is with official Party or government investigations. These are only some of the problems associated with the political context in the PRC. There are also problems associated with the cultural context and with the fairly low level of economic development. For some practical advice on some of these, see Henderson, Gail, “Survival guide to survey research in China,” China Exchange News, Vol. 21, No. 1 (1993), pp. 23–25, 33. For a more general discussion, see the Appendix on Survey Methods in my Retirement of Revolutionaries in China: Public Policies, Social Norms, Private Interests (Princeton, N.J.: Princeton University Press, 1993).
2. So far as I know, only two social scientists based in the West have been successful in drawing a probability sample of the whole population in the PRC: political scientist Tianjian Shi and sociologist Victor Nee. For work based on Shi's nation-wide survey, see Nathan, Andrew J. and Shi, Tianjian, “Cultural requisites for democracy in China: some findings from a survey,” Daedalus, Vol. 122, No. 2 (1993), pp. 95–123. For findings from Nee's nation-wide survey, see “The emergence of a market society: changing mechanisms of stratification in China,” Cornell University Working Papers on Transitions from State Socialism, No. 93.4. Nee has also published excellent analyses of findings from local survey work. See n. 5 below.
3. “Population” refers here to a specified aggregate of individuals (or other units) that defines the scope of supposed relevance of a survey, that is, the aggregate about which sample findings are intended to generalize. In the samples discussed here, for example, the population could be Chinese peasants or rural Chinese. A population may also be defined as the Chinese population as a whole.
4. Stanley Rosen and David Chu provide an excellent overview and assessment of survey research conducted by the Chinese in “Survey research in the People's Republic of China” (Washington, D.C.: United States Information Agency, Office of Research, 1987). Rosen introduces and discusses findings of a wide range of Chinese political surveys in “Political education and student response: some background factors behind the 1989 Beijing demonstrations,” Issues and Studies, Vol. 25, No. 10 (1989), pp. 12–39, and “The Chinese Communist Party and Chinese society: popular attitudes toward Party membership and the Party's Australian Journal of Chinese Affairs, No. 24 (1990), pp. 51–92. See also his edited volume of translations from the Chinese, “Youth socialization and political recruitment in post-Mao China,” Chinese Law and Government, Vol. 20, No. 2 (1987). Rosen describes the institutional and political framework for Chinese public opinion polling in “Public opinion and reform in the People's Republic of China,” Studies in Comparative Communism, Vol. 22, Nos. 2 and 3 (1989), pp. 153–170, and “The rise (and fall) of public opinion in post-Mao China,” in Baum, Richard (ed.), Reform and Reaction in Post-Mao China: The Road to Tiananmen (New York: Routledge, 1991), pp. 60–83.
5. Victor Nee, Andrew Walder and Martin King Whyte are a few of the sociologists in the China field who have in recent years published excellent work based on survey research in the PRC. For examples of Nee's findings based on local survey work, see “A theory of market transition: from redistribution to markets in state socialism,” American Sociological Review, Vol. 54, No. 5 (1989), pp. 663–681, and “Social inequalities in reforming state socialism: between redistribution and markets in China,” American Sociological Review, Vol. 56, No. 3 (1991), pp. 267–282. For Walder's work, see “Economic reform and income distribution in Tianjin, 1976–1986,” in Davis, Deborah and Vogel, Ezra F. (eds.), Chinese Society on the Eve of Tiananmen: The Impact of Reform (Cambridge, MA: Harvard University Council on East Asian Studies, 1990), pp. 135–156, and “Property rights and stratification in socialist redistributive economies,” American Sociological Review, Vol. 57, No. 4 (1992), pp. 524–539. For Whyte's work, see “Changes in mate choice in Chengdu,” in Chinese Society on the Eve of Tiananmen, pp. 181–213, and Xiaohe, Xu and Whyte, Martin King, “Love matches and arranged marriages: a Chinese replication,” Journal of Marriage and the Family, Vol. 52, No. 3 (1990), pp. 709–722. For a description of some recent collaborative survey projects in the PRC, see “Surveying the field: a sampling of collaborative survey projects,” in China Exchange News, Vol. 21, No. 1 (1993), pp. 17–22.
6. See the discussion in Rosen, “The rise (and fall) of public opinion in post-Mao China.”
7. For example, the Baoding survey project on mate choice and marriage, in which Martin King Whyte is involved, was revived in December 1992 and a second survey, this one on aging and inter-generational relations, was undertaken in summer 1994. A University of Michigan and Peking University collaborative survey of local government and political economy in four counties was reinstated and data released for coding and analysis in summer 1993. On the latter project, see Marshall, Eliot, “U.S. may renew collaboration after China relents on data,” Science, 6 August 1993, p. 677.
8. Some good introductions to sampling theory are: Kish, Leslie, Survey Sampling (New York: John Wiley and Sons, 1965), and “Selection of the sample,” in Festinger, Leon and Katz, Daniel (eds.), Research Methods in the Behavioral Sciences (New York: Holt, Rinehart and Winston, 1953), pp. 175–239; Warwick, Donald P. and Lininger, Charles A., The Sample Survey: Theory and Practice (New York: McGraw-Hill, 1975).
9. This does not require that the dimension of interest be distributed normally in the population. The population can be highly skewed along the measured dimension; means from the repeated samples will nevertheless distribute themselves normally.
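The point in n. 9 can be illustrated with a small simulation. What follows is only a sketch under stated assumptions: the exponential population, the sample size of 200, the 1,000 repetitions and the function names are hypothetical choices made for illustration, and sampling is done with replacement for simplicity.

```python
import random
import statistics


def skewness(values):
    """Population skewness: the mean cubed standardized deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


def sample_means(population, sample_size, n_samples, seed=0):
    """Means of repeated simple random samples (drawn with replacement)."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]


# Hypothetical, highly skewed population (exponentially distributed values).
pop_rng = random.Random(1)
population = [pop_rng.expovariate(1.0) for _ in range(20000)]

means = sample_means(population, sample_size=200, n_samples=1000)

print("skewness of population:  ", round(skewness(population), 2))
print("skewness of sample means:", round(skewness(means), 2))
```

Under these assumptions the population is strongly skewed, while the sample means cluster almost symmetrically around the population mean.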
10. But see n. 2 above.
11. Some findings from Shi's nation-wide survey can be found in Nathan and Shi, “Cultural requisites for democracy in China: some findings from a survey.”
12. See especially the discussion in Kish, Survey Sampling, pp. 148–216, 301–439. See also Warwick and Lininger, The Sample Survey, pp. 111–125, for a good introductory illustration of a multi-stage area sample.
13. Probabilities of selection for each unit (e.g. county) in a cluster can be adjusted to take into account widely different sizes of units. See especially the discussion of selection of units with probabilities proportionate to size in Kish, Survey Sampling, pp. 217–253.
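The adjustment described in n. 13 can be sketched in a few lines. This is a simplified illustration, not Kish's procedure in full: the county names, county sizes and the number of interviews per county are hypothetical, and only a single county is drawn.

```python
from fractions import Fraction

# Hypothetical county populations (made-up figures for illustration).
county_sizes = {"A": 20_000, "B": 60_000, "C": 120_000}
interviews_per_county = 50  # assumed fixed take within the selected county
total = sum(county_sizes.values())

overall = {}
for county, size in county_sizes.items():
    # Stage 1: the county is drawn with probability proportionate to its size.
    p_county = Fraction(size, total)
    # Stage 2: a fixed number of individuals is drawn within the county.
    p_within = Fraction(interviews_per_county, size)
    # Overall probability that a given individual enters the sample.
    overall[county] = p_county * p_within

for county, p in overall.items():
    print(county, p)
```

Because the size term cancels across the two stages, every individual has the same overall probability of selection, whichever county he or she lives in.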
14. For example, a sample of size 2,000 yields about as precise estimates for a population of 40,000 as a sample of the same size does for a population of 200 million (assuming variability in the two populations is the same). See Kalton, Graham, Introduction to Survey Sampling, Sage University Paper, Quantitative Applications in the Social Sciences, No. 35 (Beverly Hills: Sage Publications, 1983). The size of Shi's probability sample of the adult population of the PRC is only 2,800 individuals, selected from 200 townships or villages in 50 counties.
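The comparison in n. 14 can be checked with the textbook formula for the standard error of a sample proportion under simple random sampling without replacement. The sketch below assumes a proportion of 0.5 (the case of maximum variability); the function name is hypothetical.

```python
import math


def standard_error(p, n, population_size):
    """Standard error of a sample proportion under simple random sampling
    without replacement, including the finite population correction."""
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return math.sqrt(p * (1 - p) / n) * fpc


se_small = standard_error(0.5, 2000, 40_000)
se_large = standard_error(0.5, 2000, 200_000_000)

print(f"population 40,000:      {se_small:.4f}")
print(f"population 200 million: {se_large:.4f}")
```

Both standard errors come to roughly 0.011, that is, about one percentage point; the finite population correction makes the estimate for the smaller population only marginally more precise.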
15. Moreover, once the sampler obtains the lists from which households or individuals are selected, other surveys can be conducted using the same lists, by drawing a new probability sample of elements from the lists. Shi is using his sampling frame (i.e. lists) again in the PRC component of a comparative survey of political culture and participation in the PRC, Taiwan, and Hong Kong currently under way.
16. For a generally accessible introduction to linear regression analysis, see Achen, Christopher H., Interpreting and Using Regression, Sage University Papers, Quantitative Applications in the Social Sciences, No. 29 (Beverly Hills: Sage Publications, 1982). For a more thorough treatment, see especially Hanushek, Eric A. and Jackson, John E., Statistical Methods for Social Scientists (New York: Academic Press, 1977).
17. We do not expect the line to run through the origin. That is, we expect that people with no education beyond primary school, for example, will earn some income. On a graph, that income is the point at which the line crosses the axis representing income. In a regression equation, that income is represented by the constant.
18. For a concise discussion of assumptions about the error term and how data collection procedures can affect estimation in analyses of relationships between variables, see Dubin, Jeffrey A. and Rivers, Douglas, “Selection bias in linear regression, logit and probit models,” in Fox, John and Long, J. Scott (eds.), Modern Methods of Data Analysis (Newbury Park, CA: Sage Publications, 1990), pp. 410–442.
19. It is important to point out, however, that sampling procedures are not the only source of potential violation and that probability sampling is by no means an absolute guarantee against violation. On the first point, model misspecification (e.g. omitting an explanatory variable) is the most obvious source of violation not related to data collection. Regarding sampling, the literature on sample selection bias generally assumes probability sampling methods are used.
20. The survey drew the attention of Western scholars soon after the Chinese press began to publicize it, although none assumed the findings were necessarily representative. It is cited, for example, in Burns, John P., “Local cadre accommodation to the ‘responsibility system’ in rural China,” Pacific Affairs, Vol. 58, No. 4 (1985–86), pp. 607–625; Oi, Jean C., “Commercializing China's rural cadres,” Problems of Communism, Vol. 35, No. 5 (1986), pp. 1–15; Zweig, David, “Prosperity and conflict in post-Mao rural China,” The China Quarterly, No. 105 (1986), pp. 1–18. Burns also cites other Chinese surveys reporting similar findings.
21. See especially Oi, Jean C., “Peasant households between plan and market: cadre control over agricultural inputs,” Modern China, Vol. 12, No. 2 (1986), pp. 230–251, State and Peasant in Contemporary China: The Political Economy of Village Government (Berkeley & Los Angeles: University of California Press, 1989), pp. 155–226, and “Commercializing China's rural cadres.” David Zweig advances the third argument in “Prosperity and conflict in post-Mao rural China.” And Richard J. Latham describes a policy in some localities of distributing better land to cadres to encourage them to remain, in face of widespread abandonment of positions by local cadres. See “The implications of rural reforms for grass-roots cadres,” in Perry, Elizabeth J. and Wong, Christine (eds.), The Political Economy of Reform in Post-Mao China (Cambridge, MA: Harvard University Press, 1985), pp. 157–173.
22. This is not to say that scholars dismiss the notion that correlates may additionally account for cadre success in the reformed rural order.
23. See especially Nee's “A theory of market transition,” and “Social inequalities in reforming state socialism.”
24. In terms of statistical analysis, there is no difference between these “control variables” and the cadre position variable. The term implies something about our motivation in attempting to explain change in household income: all the variables are theorized causes (explanatory variables), but the theoretical relationship we are mainly interested in is that between cadre position and change in household income. That is, interest in that relationship is the impetus for our survey. If the survey is prompted by the more general question “what explains change in household income?” then the term “control variables” is inappropriate: the variables are simply explanatory variables, with the same theoretical status as cadre position.
25. Obviously, if there are no individual-level measures on the relevant dimension (and there may not be, as the variable is excluded from the original model), the generalizability of findings will have to be argued on purely theoretical grounds until new data are collected. That is, aggregate data cannot be used to solve this problem.
26. The cadre position variable illustrated in the Figures is an “indicator variable.” The values 1 and 2 are arbitrary and do not reflect a quantitatively interpretable relationship.
27. The classic discussion of the former problem and its solution is Tobin, James, “Estimation of relationships for limited dependent variables,” Econometrica, Vol. 26, No. 1 (1958), pp. 24–36. For the latter, see especially Heckman, James J., “Sample selection bias as a specification error,” Econometrica, Vol. 47, No. 1 (1979), pp. 153–161.
28. Dubin and Rivers, “Selection bias in linear regression, logit and probit models,” p. 413.
29. Tobin, “Estimation of relationships for limited dependent variables.”
30. For an excellent illustration of this problem, see Geddes, Barbara, “How the cases you choose affect the answers you get: selection bias in comparative politics,” in Stimson, James A. (ed.), Political Analysis, Vol. 2 (Ann Arbor: University of Michigan Press, 1990), pp. 131–150.
31. See n. 20, above.
32. Note that the essence of the problem is censoring of values on the dependent variable, not that the sample has few observations at the low values of the independent variable (education). This is not the problem of unrepresentative fractions discussed above.
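The distinction drawn in n. 32 can be illustrated with simulated data. The sketch below is hypothetical throughout: the income equation, its coefficients, the cut-off values and the function name are invented for illustration, and the sample is truncated (cases dropped) rather than censored in Tobin's strict sense.

```python
import random


def ols_slope(xs, ys):
    """Slope from a least-squares regression of ys on xs (with a constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(42)
TRUE_SLOPE = 2.0  # assumed effect of a year of education on income

# Hypothetical population: income = 10 + 2 * education + random error.
education = [rng.uniform(0, 12) for _ in range(5000)]
income = [10.0 + TRUE_SLOPE * e + rng.gauss(0, 8) for e in education]

full_slope = ols_slope(education, income)

# Selection on the dependent variable: only incomes above 22 are observed.
kept_y = [(e, y) for e, y in zip(education, income) if y > 22.0]
slope_selected_on_y = ols_slope([e for e, _ in kept_y], [y for _, y in kept_y])

# Selection on the independent variable: only the better educated are observed.
kept_x = [(e, y) for e, y in zip(education, income) if e > 6.0]
slope_selected_on_x = ols_slope([e for e, _ in kept_x], [y for _, y in kept_x])

print("all cases:              ", round(full_slope, 2))
print("selected on income:     ", round(slope_selected_on_y, 2))
print("selected on education:  ", round(slope_selected_on_x, 2))
```

In this simulation, dropping the low-education cases leaves the estimated slope close to its true value, whereas dropping the low-income cases pulls it well below that value.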
33. The opportunities for manipulation by lower-level cadres seem to be limited. In interviews with 28 Chinese from the countryside, Jonathan Unger found that county-level leaders typically imposed on lower levels strict rules governing division of collective resources. In a village where village cadres controlled the process, blatant favouritism did take place. See “The decollectivization of the Chinese countryside: a survey of twenty-eight villages,” Pacific Affairs, Vol. 58, No. 4 (1985–86), pp. 585–625.
34. Accessible introductions are: Achen, Christopher H., The Statistical Analysis of Quasi-Experiments (Berkeley & Los Angeles: University of California Press, 1986), pp. 73–161; Berk, Richard A., “An introduction to sample selection bias in sociological data,” American Sociological Review, Vol. 48, No. 3 (1983), pp. 386–398.