Detecting sentiment in Twitter data – challenges and implementation

Joanna Michalak


Twitter is one of the most popular micro-blogging platforms. Its users publish their thoughts and opinions, and considerable attention is paid to exploring the sentiment of these opinions. This paper focuses on the characteristics of Twitter and of tweets, and on a supervised machine-learning method for Twitter Sentiment Analysis (TSA). The discussion covers the following issues: accessing tweets and creating a database, cleaning the database, and classifying tweets into positive and negative groups. The TSA process is presented as a simplified architecture implemented in Python.
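The abstract names three stages: collecting tweets into a database, cleaning them, and classifying them into positive and negative groups. It does not name the classifier. The code below is a hypothetical, simplified Python sketch of such a pipeline and is not the author's implementation. It rests on several assumptions. Tweet collection through the Twitter API is replaced by a hard-coded toy list. An in-memory SQLite table (Python's standard `sqlite3` module) stands in for the database. The cleaning rules and the small stopword list are illustrative choices. Multinomial Naive Bayes with add-one smoothing is swapped in as one common supervised classifier. The names `RAW_TWEETS`, `clean_tweet`, `build_database`, `train_naive_bayes` and `classify` are hypothetical.

```python
import math
import re
import sqlite3
from collections import Counter

# Hypothetical toy corpus (assumption): a real pipeline would collect tweets
# through the Twitter API and label a much larger training set.
RAW_TWEETS = [
    ("I love this phone, the camera is great! :) http://t.co/abc", "positive"),
    ("@friend what a great day #happy", "positive"),
    ("This update is terrible, I hate it http://t.co/xyz", "negative"),
    ("Worst service ever, so disappointed #fail", "negative"),
]

# Tiny illustrative stopword list (assumption, not a standard list).
STOPWORDS = {"i", "the", "is", "a", "this", "it", "so", "what", "ever"}


def clean_tweet(text):
    """Lowercase, drop URLs and @mentions, keep hashtag words, remove stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = text.replace("#", " ")              # keep the hashtag word, drop '#'
    tokens = re.findall(r"[a-z']+", text)      # alphabetic tokens only (emoticons are lost)
    return [t for t in tokens if t not in STOPWORDS]


def build_database(rows):
    """Store raw and cleaned tweets with their labels in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tweets (raw TEXT, clean TEXT, label TEXT)")
    conn.executemany(
        "INSERT INTO tweets VALUES (?, ?, ?)",
        [(raw, " ".join(clean_tweet(raw)), label) for raw, label in rows],
    )
    conn.commit()
    return conn


def train_naive_bayes(conn):
    """Count words per class for a multinomial Naive Bayes model."""
    word_counts, doc_counts, vocab = {}, Counter(), set()
    for clean, label in conn.execute("SELECT clean, label FROM tweets"):
        tokens = clean.split()
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the class with the highest log-probability (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for token in clean_tweet(text):
            if token in vocab:  # ignore words never seen in training
                score += math.log(
                    (word_counts[label][token] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


model = train_naive_bayes(build_database(RAW_TWEETS))
print(classify("I love the great camera", model))      # -> positive
print(classify("terrible service, I hate it", model))  # -> negative
```

Naive Bayes is used here because it is short enough to write with the standard library alone. A library such as scikit-learn, or a different classifier, could replace it without changing the collect, clean and store stages.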








This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Print ISSN: 1643-8175, Online ISSN: 2451-0955, DOI prefix: 10.19197