The information is prepared on the basis of data from the RSF information-analytical system; the descriptive part is presented in the authors' wording. All rights belong to the authors; use or reprinting of the materials is permitted only with the prior consent of the authors.

Project title: Research and development of multimodal unstructured data processing and analysis technology from various sources and their application for solving economic and social problems

Research area: 01 - MATHEMATICS, INFORMATICS, AND SYSTEM SCIENCES; 01-201 - Artificial intelligence and decision-making

Annotation
In the era of the digital revolution (the emergence of the Internet, social networks, smartphones and wearable devices), a huge amount of data, including data that carry information about a person's personality, has become available in digital form. Social networks penetrate ever deeper into people's daily lives. According to 2015 research by the leading marketing company GWI, a person uses on average about three different social networks in daily life. The number of communities and user accounts on social networks is constantly growing. Social networks are a unique source of massive primary data. At the same time, users themselves make their personal data publicly available in order to be more socially engaged and more visible to their friends. These data can be used to identify early signs of disease, analyze consumer demand, and prevent terrorist threats and antisocial behaviour, among other uses. Such information from the Internet, including user-generated data from social networks, can be used to solve a wide range of important tasks listed in the "Strategy of the Scientific and Technological Development of the Russian Federation" (hereinafter, the Strategy). The so-called "big challenges", which are often related to significant risks for society, the economy and the public administration system, frequently require an immediate reaction, in particular the threats associated with extremism or antisocial behaviour (i.e. those posing internal threats to national security). Multimodal data can be processed efficiently at large scale using big data analysis methods, machine learning and artificial intelligence, as earlier work has shown. In view of the above, the development and use of advanced methods for analyzing these data to solve economic and social problems are highly relevant.
In addition, it is important to have a clear list of the economic and social tasks of the scientific and technological development of the Russian Federation that can be solved using big data technologies, machine learning and artificial intelligence, together with the methods and approaches applicable to each task and the order of their application. This list will be needed to manage the corresponding technology development strategy. This project aims to solve the aforementioned problems using big data technologies, machine learning and artificial intelligence. In particular, it is planned to use multimodal unstructured data from social networks to solve a range of social and economic problems in the framework of the scientific and technological development of the Russian Federation. The project also aims to increase the efficiency of processing large volumes of multimodal data from various sources, including social networks. To bring the project results into real-world settings, it is planned to develop a prototype of an information system for the analysis of multimodal data from various sources, including social networks. As a result, the project will contribute significantly to the development of domestic tools for the analysis of socially and economically significant information in the Russian-speaking segment of the Internet. In particular, the need to create domestic systems for processing large amounts of multimodal data, machine learning and artificial intelligence is explicitly stated in one of the directions of the Strategy of Scientific and Technological Development of the Russian Federation.
In addition, given the possible use cases for the planned project results, the project complies with such directions of the Strategy as "transition to personalized medicine, high-tech healthcare" and "counteraction to technogenic, biogenic, socio-cultural threats, terrorism and ideological extremism, as well as cyber threats and other sources of danger for society, the economy and the state." The scientific problem that the project aims to solve is significant for these directions of the Strategy of Scientific and Technological Development of the Russian Federation. Based on our review of patent and non-patent information sources, there is a limited number of scientific, technical and applied solutions for creating descriptive and predictive models of a person's personality that can use these data adequately and with acceptable accuracy. Moreover, very few of these solutions use several multimodal unstructured data sources. Most social network data analysis solutions do not take advantage of user data from multiple sources. The novelty of the research lies in the development of new methods for analyzing multimodal data from various sources and for profiling users of social networks; of descriptive and predictive user models reconstructed from various data sources; and of new models, algorithms for identifying communities, components of information and recommender systems, algorithms for finding a set of similar users, and the corresponding support component.

Expected results
The main result of the project will be scientific and technical groundwork in the field of advanced models, methods, algorithms and tools for processing and analyzing multimodal unstructured data from various sources, including social networks, to solve a range of social and economic problems within the framework of scientific and technological development of the Russian Federation. ... In the course of the project, the following work is planned: 1) analysis of existing advanced solutions in the field of intelligent analysis of multimodal unstructured data from various sources, including social networks; 2) analysis of practical applications of intelligent analysis methods for such data to social and economic problems that ensure the sustainable development of the Russian Federation; 3) formation of a list of social and economic problems that can be solved using the available methods of intelligent analysis of multimodal unstructured data; 4) research into trends and directions in the development of models, methods and algorithms for intelligent analysis of multimodal unstructured data; 5) development of new effective methods and algorithms for intelligent analysis of multimodal unstructured data, including data from social networks; 6) software implementation of the developed algorithms as components and a prototype of the information system for analyzing multimodal unstructured data from various sources, including social networks. The adequacy of the developed technical solutions will be checked on the data sets available to the contractor, as well as on data collected using the developed models, methods and algorithms.
Taking into account the existing scientific and technical groundwork, the following results will be obtained during the project (either newly developed or adapted from existing ones to solve specific problems): 1) methods for analyzing various types of data (images, geolocation, data from wearable devices) from several sources (such as VKontakte, Twitter, Instagram, etc.), ensuring the effectiveness of profiling users of social networks; 2) a model combining heterogeneous data types, taking into account spatial and temporal aspects; 3) a method of group user profiling based on multimodal unstructured data obtained from various sources; 4) a method of projecting multimodal unstructured data into latent space and/or merging multimodal unstructured data; 5) a method for profiling user characteristics, including an approach that minimizes the impact of "data gaps" on the quality of the processing results; 6) an algorithm for determining the similarity of sequences within a hierarchical graph-based model for measuring the similarity of users; 7) an algorithm that ensures the use of the developed methods (with further software implementation in the form of components of information systems for analyzing multimodal unstructured data from various sources); 9) scientific and technical proposals for the application of the developed models, methods and algorithms; 10) the concept of the architecture of information systems for processing multimodal unstructured data from different sources. The scientific and applied significance of the project is determined by the need of the socio-economic system of the Russian Federation for appropriate tools, including descriptive and predictive models that work on the basis of various sources of multimodal unstructured data.
The results of analyzing large volumes of multimodal unstructured data from various sources, including social networks, can be used in the interests of the Russian Federation for: 1) monitoring the psychoemotional state and everyday needs of the population of Russia, in particular in stressful situations in society (for example, in a difficult epidemiological situation); 2) effective detection of, and restriction of access to, unwanted, questionable and harmful content in public sources of unstructured multimodal data (for example, social networks), including content containing information about extremism and terrorism; 3) identification of risk factors for the safety of users of social networks (identification of persons prone, for example, to suicide or to committing an illegal act); 4) obtaining information about the consumer preferences of Internet users (social networks). Note: a more detailed analysis of the options and the procedure for applying the technical solutions for processing multimodal unstructured data from various sources that the project is expected to produce will be carried out during the implementation of the project. It should be noted that most of the patented technical solutions in the field of advanced models, methods and algorithms for intelligent analysis of multimodal unstructured data belong to foreign authors (copyright holders), which is confirmed by the results of preliminary patent research. In the course of the project, it is also planned to conduct patent research whose results will allow an assessment of the patent landscape in the relevant field, in order to determine the technical level of the planned developments and the novelty, competitiveness and patent clearance of the proposed technical solutions.
The planned results of the project will be at world level, which is indirectly ensured by the general relevance of the topic, the level of the existing scientific groundwork, and the planned participation in international collaborations that took shape long before the application was filed. The compliance of the planned project results with the world level will also be ensured by the project manager's extensive experience and numerous practical research results recognized by the world scientific community through publications and key talks at A*-level venues such as ACM MM, SIGIR, AAAI, CIKM, WSDM, ICMR, TOIS, TIST, WWW, etc. (see https://s.somin.ai/farseev-scholar). The planned results of the project will be original, based on the scientific and technical achievements and results of the project team (obtained earlier and currently available), as well as on modern achievements in the field of approaches, methods and algorithms for intelligent analysis of multimodal unstructured data, graph theory, modelling, etc. The main results of the project are planned to be published in several articles in journals indexed in the Web of Science or Scopus databases.

Annotation of the results obtained in 2022
The project is devoted to identifying breakthrough areas of research in the development and analysis of efficient technologies for processing and analyzing multimodal unstructured data collected from different sources to solve both economic and social problems. During the first year of the project, important results were obtained in all planned lines of research. First, we have prepared a comprehensive review of contemporary research and regulatory literature, covering the research and technical problems to be addressed in the project. The results of this review have confirmed the relevance of our plan to develop advanced technologies for processing and analyzing multimodal unstructured data, which can be collected in huge amounts from the Internet, already used by a majority of people in the world. We have also conducted patent studies on the project topics, a study on the evaluation and selection of models, methods and algorithms for mining multimodal unstructured data, and an analysis of possible applications of such mining to social and economic problems. In addition, the results of these analytical studies have allowed us to narrow down the choice of research areas: we plan to develop a prototype of a recommender system, one of the currently most popular applications of the increasingly common analysis of unstructured multimodal data. The prototype will contain new results and technologies developed during the project. At the same time, the models, methods and algorithms developed during the project for the collection and analysis of multimodal unstructured data can be used either directly or as the basis for further research on other social and economic problems. The effectiveness of systems for analyzing multimodal unstructured data depends on the choice of a certain composition (ensemble) of appropriate approaches, models, methods and algorithms.
This choice should take into account the corresponding metrics and quality indicators, including the availability of software and hardware. To extend user profiling in recommender systems with features based on personality traits, we have developed a new user personality profiling system that can combine and jointly use data from different sources (for example, social networks) and from different modalities: texts, images, etc. Based on these data types, the developed PERS model predicts Myers-Briggs Type Indicator (MBTI) personality types by solving four parallel classification tasks, one for each of the four binary traits of a user. Further, based on the PERS model, we have developed a new recommender system that uses the user's MBTI personality type as one of the features to help profile users and improve recommendations. Since ground-truth personality types will not be available in a real application, we use the PERS model to predict personality types and use its predictions as features; even predicted personality types significantly improve the quality of recommendations. Under a contract with a third party, datasets were collected from a variety of social networks in compliance with the technical requirements for this data collection work; these requirements were also developed as part of the project. The collected dataset has already been used for experimental studies of the developed models and methods, and this experimental evaluation has validated the effectiveness of the proposed approaches. As a result of the project, two research papers were published in 2022: "Do we behave differently on Twitter and Facebook: Multi-view social network user personality profiling for content recommendation" in the leading international journal Frontiers in Big Data, and "Personality-Driven Social Multimedia Content Recommendation" in the proceedings of the leading international conference ACM International Conference on Multimedia (MM 2022).
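The framing of MBTI prediction as four parallel binary tasks can be illustrated with a minimal sketch. It is not the PERS model itself: the function names, the 0.5 threshold and the toy per-axis scorers are assumptions made only for illustration, and real per-axis models would be trained on fused multi-source, multimodal features.

```python
from typing import Callable, List, Sequence

# The four MBTI dichotomies; each one is treated as an independent binary task.
MBTI_AXES = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]


def encode_mbti(mbti_type: str) -> List[int]:
    """Turn a type such as 'INTJ' into four binary labels (1 = first letter of the axis)."""
    mbti_type = mbti_type.upper()
    if len(mbti_type) != len(MBTI_AXES):
        raise ValueError("an MBTI type has exactly four letters")
    labels = []
    for letter, (first, second) in zip(mbti_type, MBTI_AXES):
        if letter not in (first, second):
            raise ValueError(f"unexpected letter {letter!r}")
        labels.append(1 if letter == first else 0)
    return labels


def decode_mbti(probabilities: Sequence[float], threshold: float = 0.5) -> str:
    """Combine four per-axis probabilities of the first letter into a type string."""
    if len(probabilities) != len(MBTI_AXES):
        raise ValueError("expected one probability per axis")
    return "".join(
        first if p >= threshold else second
        for p, (first, second) in zip(probabilities, MBTI_AXES)
    )


def predict_type(features: Sequence[float],
                 axis_models: Sequence[Callable[[Sequence[float]], float]]) -> str:
    """Run four independent binary scorers on the same feature vector."""
    return decode_mbti([model(features) for model in axis_models])


# Hypothetical stand-ins for trained per-axis classifiers.
toy_models = [
    lambda x: 0.2,            # P(E): low, so the axis resolves to I
    lambda x: 0.1,            # P(S): low, so the axis resolves to N
    lambda x: 0.9,            # P(T): high, so the axis resolves to T
    lambda x: min(1.0, x[0])  # P(J): depends on the first feature
]
predicted = predict_type([0.7], toy_models)
```

In a recommender, the four predicted labels (or the four probabilities) would then be appended to the user feature vector, which matches the use of predicted rather than ground-truth types described above.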

Publications

1. Qi Yang, Aleksandr Farseev, Sergey Nikolenko, Andrey Filchenkov. Do we behave differently on Twitter and Facebook: Multi-view social network user personality profiling for content recommendation. Frontiers in Big Data 5 (2022). https://doi.org/10.3389/fdata.2022.931206

2. Qi Yang, Sergey Nikolenko, Alfred Huang, Aleksandr Farseev. Personality-Driven Social Multimedia Content Recommendation. Proceedings of the 30th ACM International Conference on Multimedia, pp. 7290-7299. Association for Computing Machinery, New York, NY, United States (2022). https://doi.org/10.1145/3503161.3548769

Annotation of the results obtained in 2023
The project “Research and development of multimodal unstructured data processing and analysis technology from various sources and their application for solving economic and social problems” is dedicated to several important areas of research in the field of development and training of machine learning models for analysis and generation of advertising content. During the second year of the project, we have obtained important results in all these areas. First, we have developed new machine learning models to evaluate rich media advertising content. This means that we have developed models that, based on the appearance and content of an advertisement (for example, an ad on a social network), can estimate how successful this advertisement will be for a specific user or for users from the target audience of this ad in general. We have developed SoCraft, an advertiser-level multimedia ad content measurement platform that uses a multimodal deep neural network to predict click-through rate (CTR), a key metric for measuring the success of content used in advertising. For advertisers (brands, marketing agencies, or SMEs), CTR measurement is very important to evaluate the effectiveness of advertising campaigns for different creatives and targeting settings. CTR is an important indicator of how effective an advertising campaign is in relation to its objective and is often used when allocating advertising budgets to maximize impact. However, many existing works on CTR estimation are performed at the user level and use user-specific features in their models. Much of this research has been published by teams from large advertising platforms, where it is indeed possible to predict user-level CTR based on huge datasets, but it would be much more useful for advertisers to measure performance before an ad is released, or at least while the advertising campaign is running. 
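As a point of reference, CTR is the ratio of clicks to impressions, and an advertiser can compute it per creative and audience segment without any user-level data. The sketch below shows only this metric and its audience-level aggregation, not a predictive model; the `AdStats` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class AdStats:
    """One reporting row for a creative shown to an audience segment (hypothetical schema)."""
    creative_id: str
    audience: str
    impressions: int
    clicks: int


def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: clicks divided by impressions (0.0 when nothing was shown)."""
    return clicks / impressions if impressions > 0 else 0.0


def audience_level_ctr(rows: Iterable[AdStats]) -> Dict[Tuple[str, str], float]:
    """Sum clicks and impressions per (creative, audience) pair, then compute CTR."""
    totals: Dict[Tuple[str, str], Tuple[int, int]] = {}
    for row in rows:
        key = (row.creative_id, row.audience)
        impressions, clicks = totals.get(key, (0, 0))
        totals[key] = (impressions + row.impressions, clicks + row.clicks)
    return {key: ctr(clicks, impressions)
            for key, (impressions, clicks) in totals.items()}


# Made-up numbers, for illustration only.
report = [
    AdStats("banner_a", "students", impressions=1000, clicks=20),
    AdStats("banner_a", "students", impressions=1000, clicks=30),
    AdStats("banner_b", "students", impressions=0, clicks=0),
]
rates = audience_level_ctr(report)
```

Summing before dividing weights each day or placement by its impressions, which is usually what is wanted when comparing creatives across campaigns of different sizes.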
Since a user-level approach is not possible for advertisers due to lack of user data and privacy regulations, audience-level approaches are needed. The SoCraft content rating system uses two multimodal attention-based content rating models: SoDeep and SoWide. Both models predict the quality of multimodal content and provide attention maps to qualitatively visualize what had the greatest impact on the prediction. As a result, the system is able to identify potentially high-performing ads and provide high-quality visual analysis of why certain content will be liked or disliked by customers. Second, we have developed new machine learning models for virtual assistants. In the last decade, due to the rapid development of Internet access in the world and due to the impact of the pandemic, more and more people maintain work and personal relationships through online communication channels, and electronic assistance systems increasingly take the form of virtual concierges or avatars. Existing virtual assistance systems do not provide the required level of engagement and personalization because they are primarily text-based and therefore lack the real-life multimedia interaction aspects that are important to our daily communication. For face-to-face virtual communication in real time, virtual assistants need to be designed to match human behavioral characteristics to make virtual interaction scalable, efficient, and engaging. We have developed a new voice-controlled virtual avatar system called SMILEY. The system improves speaker engagement and overall customer experience through real-time voice control of facial expressions. To the best of our knowledge, this demonstration is one of the first attempts to synthesize facial expressions with voice control. 
The general structure of SMILEY is as follows: the user's speech is converted into text (SMILEY uses modern standard speech recognition models) to understand the user's input and to analyze the user's mood through sentiment analysis; the system then applies the ContraCLIP approach to change the facial expression of the “virtual assistant” accordingly. ContraCLIP enables finer-grained facial expression control based on generative adversarial networks (GANs). Evaluations of the resulting system show that SMILEY, based on ContraCLIP and sentiment analysis of recognized text, leads to a higher level of user satisfaction after interacting with the system. Third, we have started a promising new line of research related to large language models. The real work of marketers consists largely of analyzing the content of advertisements and entire campaigns. This is becoming increasingly difficult in today's online campaigns, which use large volumes of images and text ads that are almost impossible to process manually. In this part of the project, we are trying to anticipate the next revolution in digital advertising and content marketing, which we believe will occur when both advertising itself and the results of opaque recommendation models can be explained in ways that are understandable to people and actionable in terms of business results. We have been able to show that large language models, such as ChatGPT and GPT-4, are already largely capable of providing the rationale for predictions of recommender systems, aggregating information about advertising campaigns consisting of hundreds of individual ads, and drawing important and practically applicable conclusions. To this end, we have developed a system of special prompts for large language models, based on few-shot learning and intended for the analysis of advertisements.
In particular, from each ad and from entire groups of ads we have been able to extract information such as topics, thematic categories, types of content, tone (e.g., “friendly”), archetypes of advertising characters, the main features of the brand, human needs that the product is designed to fill, the product itself, and various characteristics of its target audience. In the next step, we use the ads and extracted features as input to a series of newly designed prompts that ask language models to summarize this information in a variety of formats commonly used in content marketing. As a result, we have seen successful generalization across the board, with important findings identified by large language models and presented in a clear and easily actionable format. In addition, work continued on the development of new machine learning models for evaluating multimedia advertising content. In particular, a new variant of the previously developed SoWide model was proposed, called SoWide-v2, which uses the Vision Transformer for image analysis and a different feature fusion architecture. SoWide-v2 significantly improves on previous results in predicting click-through rates (CTR) of advertisements. In 2023, three papers with project results were published in the proceedings of leading international conferences: “SoCraft: Advertiser-level Predictive Scoring for Creative Performance on Meta” (WSDM 2023, rank A by the CORE rating), “Just To See You Smile: SMILEY, a Voice-Guided GAN” (WSDM 2023, rank A by the CORE rating), and “Against Opacity: Explainable AI and Large Language Models for Effective Digital Advertising” (ACM MM 2023, rank A* by the CORE rating). In addition, we have developed a software prototype of an information system for analyzing multimodal unstructured data from various sources, including social networks.
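The few-shot prompting idea for ad analysis described above can be sketched as follows. This only assembles a prompt string and does not call any language model; the field names, the instruction wording and the example ads are all made up for illustration and are not the prompts used in the project.

```python
import json
from typing import Dict, List, Tuple

# A small, hypothetical subset of the attributes mentioned in the text.
FIELDS = ["topics", "tone", "product", "target_audience"]


def build_few_shot_prompt(examples: List[Tuple[str, Dict[str, str]]],
                          new_ad: str) -> str:
    """Build a prompt: instruction, annotated example ads, then the ad to analyze."""
    lines = [
        "Extract the following fields from the advertisement and answer as JSON: "
        + ", ".join(FIELDS) + ".",
        "",
    ]
    for ad_text, annotation in examples:
        lines.append(f"Ad: {ad_text}")
        lines.append("Answer: " + json.dumps(annotation, sort_keys=True))
        lines.append("")
    # The new ad comes last, with the answer left open for the model to complete.
    lines.append(f"Ad: {new_ad}")
    lines.append("Answer:")
    return "\n".join(lines)


# Invented examples, for illustration only.
examples = [
    ("Fresh coffee delivered to your desk every morning.",
     {"topics": "coffee delivery", "tone": "friendly",
      "product": "coffee subscription", "target_audience": "office workers"}),
    ("Run farther with our lightest trail shoe yet.",
     {"topics": "running", "tone": "energetic",
      "product": "trail shoes", "target_audience": "amateur runners"}),
]
prompt = build_few_shot_prompt(examples, "Learn a language in ten minutes a day.")
```

The resulting string would be sent to a language model of choice; the structured answers for many ads could then be fed into a second, summarizing prompt, which corresponds to the two-step procedure described above.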

Publications

1. A. Huang, Q. Yang, S.I. Nikolenko, M. Ongpin, I. Gossoudarev, N.Y. Duong, K. Lepikhin, S. Vishnyakov, Y. Chu-Farseeva, A. Farseev. SoCraft: Advertiser-level Predictive Scoring for Creative Performance on Meta. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining (WSDM 2023), pp. 1132-1135 (2023). https://doi.org/10.1145/3539597.3573032

2. Q. Yang, C. Tzelepis, S.I. Nikolenko, I. Patras, A. Farseev. “Just To See You Smile”: SMILEY, a Voice-Guided GAN. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining (WSDM 2023), pp. 1196-1199 (2023). https://doi.org/10.1145/3539597.3573031

3. Q. Yang, M. Ongpin, S.I. Nikolenko, A. Huang, A. Farseev. Against Opacity: Explainable AI and Large Language Models for Effective Digital Advertising. Proceedings of the 31st ACM International Conference on Multimedia (MM 2023), pp. 9299-9305 (2023). https://doi.org/10.1145/3581783.3612817