With the spread of smartphones and social media, anyone can now share media such as images, videos, sounds, and text. Artificial intelligence (AI) technologies greatly enhance the convenience and value of social media: they make it easy to analyze user preferences from large amounts of data and to perform media processing such as machine translation and speech-to-text conversion in accordance with user needs. However, as AI technology evolves, computing resources grow, and large amounts of human-related information (fingerprints, faces, voices, bodies, and natural language) become obtainable, malicious actors can generate fake media (FM) such as fake images, fake voice data, and fake documentation that can pass for the real thing. The generation of FM has become a serious social problem.
The emergence and spread of the novel coronavirus disease (COVID-19) led to the creation and spread on social media of fake news about prevention and treatment methods that have no scientific basis, and of photographs of city scenes taken from a specific direction with a telephoto lens that gave the impression of a crowded area. This "infodemic" of uncertain information can cause anxiety and confusion in society. We can easily envision an enterprising criminal group with a clear intention using AI to readily generate fake images, fake voice data, and fake documentation that can pass for the real thing and then spreading them on social media to create an infodemic. Moreover, repeated exposure to specific untrue information may facilitate thinking guidance and public opinion manipulation. To achieve a healthy human-centered cyber society, it is essential to improve the reliability of information by appropriately dealing with such threats and, at the same time, to support diverse communication and decision-making.
The purpose of this research project is to deal appropriately with the potential threats posed by FM generated by AI and, at the same time, to establish social information technologies that support diverse means of communication and decision-making. These technologies should be able to detect and prevent advanced attacks based on FM of various modalities, such as fake video, fake voice data, and fake documentation generated by AI, as well as detect various types of highly reliable media. Incorporating these technologies into a cyber society will promote human decision-making and consensus building and lead to the establishment of social information infrastructure technologies that enhance cyberspace security.
In this research project, we will focus on three types of FM generated by AI: media clone (MC) FM, which is as close to being real as possible without being real; propaganda (PG) FM, which is created by intentionally editing source media for purposes such as manipulating public opinion; and adversarial example (AE) FM, which is generated for the purpose of causing AI technology to malfunction and make incorrect judgments. We will focus on establishing technologies for generating and detecting these types of FM. In addition, we will establish technologies for "detoxification," in which FM is processed so that it can be used as normal media, in order to counter the thinking guidance, malfunctions, and misjudgments caused by FM. Using these technologies, we will build an experimental social media platform that presents auxiliary information for decision-making and evaluate its performance through behavioral experiments on the scale of 1,000 participants.
Research action items
These objectives will be pursued in three areas: the security (SEC) area (Echizen Group, National Institute of Informatics), the multimedia (MM) area (Babaguchi Group, Osaka University), and the computational social science (CSS) area (Sasahara Group, Tokyo Institute of Technology). Making full use of the knowledge in each area, we will work on four research action items in a complementary manner while coordinating between areas.
- Advanced FM generation technologies for various modalities (mainly MM area): Establish FM generation technologies aimed at deceiving people and/or AI technology across modalities such as video (face, body, etc.), voice, and documentation. Legal aspects will also be considered.
- FM detection technologies (mainly SEC area): Establish advanced detection technologies corresponding to the FM generation technologies in the first item. The aim is to provide information to users in a format that explains not only that FM was detected but also the intended target of the deception (i.e., persons or AI technology).
- FM detoxification technologies (mainly MM and SEC areas): Establish technologies that detoxify FM so that thinking guidance, malfunctions, and misjudgments are prevented, and that allow the detoxified FM to be used as normal media, for example as training data for machine learning models.
- Information technologies that counter infodemics and support diverse decision-making (mainly CSS area): Establish principles and technologies for social systems that make the most of the elemental technologies related to FM generation/detection and detoxification developed in the SEC and MM areas in order to enhance the reliability of information.