PROBABILISTIC TOPIC MODEL FOR ONTOLOGY-BASED ASPECT DETECTION AND EXTRACTION IN OPINION MINING

ABSTRACT

The ease of accessing documents on the web has made it one of the major sources of individual opinions. As a result, an enormous amount of opinion data is generated on this dynamically expanding web every day. This development is of great importance to individuals, businesses, and government organizations. However, the sheer volume of opinion data is overwhelming and makes it difficult to manually extract the critical concepts and knowledge inherent in it. Hence, there is a need for automatic methods and algorithms that can effectively process such unstructured textual data. Aspect-based opinion mining involves determining the sentiment orientation of the entities identified in an opinion text and of their respective aspects. Most prominent works in this area have relied on conventional supervised machine learning and lexicon-based approaches, which depend mainly on existing resources and tools such as SenticNet, SentiWordNet, and other semantic networks. However, a significant proportion of users’ vocabulary may not be mapped to its respective semantic types in these networks. This work proposes an ontology-based aspect and entity detection model that employs the Latent Dirichlet Allocation (LDA) algorithm and natural language processing techniques to analyze user-generated texts and extract relevant information for conceptualizing the ontology. The usefulness of these hybrid techniques is demonstrated through experiments on real-world datasets consisting of user-generated opinions on COVID-19. The algorithm uses the unique tokens and the bag-of-words representation generated from the corpus to extract its main topics. The final output comprises 100 topics, each represented by its top keywords and their associated weights, and some patterns were observed and inferred from the generated topics. The system was evaluated using a coherence score, whose highest value was 0.36204.

Keywords: Aspect-based Sentiment Analysis, Data Analytics, Latent Dirichlet Allocation Algorithm, Natural Language Processing, Opinion Mining, Ontology, Text Mining, Topic Models.