Journal: Sociology: Methodology, Methods, Mathematical Modeling (Sociology: 4M)

Byzov A. A.
Text Mining in the Social Sciences


Byzov Alexander Alexandrovich
National Research University Higher School of Economics.


 

Citation:

Byzov A. A. Text Mining in the Social Sciences // Sociology: Methodology, Methods, Mathematical Modeling (Sociology: 4M). 2019. Vol. 0. No. 49. P. 131-160.

Section:

ONLINE RESEARCH

Abstract:

For almost the entire history of sociology, researchers have sought to study unstructured, naturally occurring ("organic") texts: newspaper materials, diaries, memoirs, letters, documents and, more recently, posts, publications, and other texts on online platforms. This article discusses how modern text mining techniques can improve classical sociological approaches to analyzing data of this kind. The article is organized as follows. First, it discusses examples of classical quantitative content analysis and the limitations of that method that text mining helps to overcome. It then discusses how text mining is applied in contemporary social science research. One example, a study that uses structural topic modeling, shows how text mining can be applied to the abstracts of scholarly articles to identify the topics they contain, the prevalence of those topics in different years, and the relationships among the topics. A second example, a study that classified messages on the social network Twitter, shows how this type of nonreactive text data compares with the results of online and telephone surveys. Finally, the conclusion discusses several current approaches to text analysis based on deep learning.

About the author


Byzov Alexander Alexandrovich
National Research University Higher School of Economics.
Analyst at the Institute of Education and doctoral student at the School of Sociological Sciences, National Research University Higher School of Economics
