How Respeecher enables creative uses of its voice-cloning technology while preventing misuse

Can a voice-cloning startup successfully prevent its product from being misused?

  • Respeecher, in developing its voice cloning technology, sought to prevent misuse by obtaining consent and implementing content moderation.
  • Respeecher’s greatest obstacle was providing disclosure for synthetic voice in a creative context. How could the company provide direct disclosure to users without taking away from the immersive experience of the overall media?
  • While the Framework provides clear guidelines for how to responsibly provide disclosure, the current version does not contain guidance on how to do so while balancing user experience, thus raising the question of what the best practice in a creative context would be.

This is Respeecher’s case submission as a supporter of PAI’s Synthetic Media Framework.

1. Organizational Background

A contextual introduction to the case study.

Respeecher’s Response

Respeecher is a voice cloning company founded in 2019 and based in Ukraine. As a pioneer in the field, we categorize ourselves as Builders and Creators, per the definitions in Partnership on AI’s (PAI) Synthetic Media Framework. We craft advanced AI voice cloning models using sophisticated machine learning and deep learning techniques. We also operate the Voice Marketplace platform, enabling users to easily utilize custom synthetic voices. We aim to democratize access to high-quality synthetic voice technology for creators and developers globally.

Our services cater to a diverse range of clients across numerous countries. We focus on two primary service areas:

  • B2B (Voice Cloning on Demand): We provide voice cloning services to businesses, especially in the media and entertainment sectors. Our B2B services are tailored to the unique needs of each organization, ranging from movie dubbing to video game character voices. The ethical dimension in B2B is critical, as it involves ensuring proper consent from original voice owners and preventing potential misuse in deceptive or harmful scenarios.
  • B2C (Voice Marketplace): Our Voice Marketplace is designed for individual consumers and independent creators. This platform offers a selection of pre-made synthetic voices, making advanced voice cloning technology accessible to a broader audience. The ethical concerns here revolve around maintaining transparency in voice usage and safeguarding against fraudulent or harmful applications.

This case study centers on our policies and practices surrounding the consent requirements and transparent communication of our voice cloning technology and platform. The main challenges we aim to address surround ensuring our technology is used responsibly, mitigating potential misuse, and educating users on synthetic voice ethics.

Our key customers include independent creators and developers seeking affordable access to high-quality synthetic voices, as well as commercial organizations looking to incorporate our technology into their products and services. Our approach to ethics is rooted in the principles of consent, misuse prevention, and transparency – themes that overlap with PAI’s Framework. Specifically, we align with the following practices for developers of tools:

  1. Be transparent to users about tools and technologies’ capabilities, functionality, limitations, and the potential risks of synthetic media.
  2. Take steps to provide disclosure mechanisms for those creating and distributing synthetic media.

We strive to balance creative freedom with ethical responsibility, particularly in cases where the overt labeling of synthetic voices might impact creative expression.

Respeecher’s global reach, combined with our commitment to ethical practices, positions us as a frontrunner in the responsible application of AI in voice cloning. Our dual role as a Builder and Creator reflects our comprehensive approach to meeting the diverse needs of our clients, from large-scale commercial projects to individual creative endeavors.

2. Challenge

Elaborate on the challenge being addressed in the case study, i.e. the issue to which your organization is applying the Framework.

Respeecher’s Response

Respeecher, in developing its voice cloning technology, identified the potential for misuse early. This risk manifested primarily in the form of creating synthetic voices without consent, a concern amplified as we noticed the ease with which our initial models could clone a voice from just a few samples of someone’s real voice. This capability raised the alarming possibility that bad actors could exploit public voice recordings for unauthorized impersonation. The potential harms, many of which are identified in Appendix B of PAI’s Framework, include:

  • Stealing someone’s likeness: The unethical use of someone’s voice or likeness without their permission.
  • Personal data protection: Voice data is a form of biometric data that can potentially identify people, and it therefore requires stringent protection measures.
  • Creation of misleading information: The risk of synthetic voices being used to generate false or misleading content.
  • Discrediting generative AI and voice cloning: Misuse of the technology could tarnish the public perception and trust in AI and voice cloning, which can also be used responsibly.

Recognizing these challenges, Respeecher has proactively worked to mitigate risks. This commitment extends to addressing concerns brought up by the recent SAG-AFTRA strike, which highlighted the importance of protecting actors’ rights and their likenesses. Respeecher ensures that explicit consent is obtained from individuals whose voices are cloned, a critical practice for safeguarding digital identities and rights.

Implementation of Mitigation Measures
After rolling out our voice cloning tool and the Voice Marketplace platform, Respeecher implemented several mitigation measures that link to the PAI Framework. These measures include:

  • Consent for voice/likeness use: Consent in our context means obtaining written permission from the individual whose voice or likeness is being cloned. This is essential to protect individuals’ rights and to prevent unauthorized use of their digital identities.
  • Adherence to the C2PA Standard: If someone uses our platform to pretend to be someone or an organization they are not, such as a fake news channel with synthetic characters, it might be challenging for the audience to realize this because of how realistic our synthetic voices are. As a leader in this space, we want to ensure that all generated content from our marketplace adopts the standards set forth by the Coalition for Content Provenance and Authenticity (C2PA), a standards body working to produce industry-wide metadata techniques for identifying AI-generated content and Content Credentials to communicate such signals to audiences. In the future, if all browsers and content distribution platforms support such metadata and signals, it will be easy for consumers to verify how the audio in a video was produced. If something lacks cryptographic Content Credentials, it will automatically raise suspicion about authenticity.

    Our community story about our involvement in the C2PA community, the Content Authenticity Initiative, can be found here.

  • Content moderation: We moderate synthetic voice content to prevent misuse. Moderation is done in two ways:
    • Automated filters and algorithms: We employ advanced algorithms and automated filters to scan and identify potential misuses of voice cloning, such as impersonation, hate speech, or content that could be used for fraudulent purposes. This automated layer of moderation helps in managing the vast amount of content efficiently.
    • Manual review by human moderators: In addition to automated systems, we utilize human moderation to manually review content flagged by our algorithms. This team comprises individuals with expertise in AI, ethics, and legal aspects of digital content, ensuring a thorough and nuanced review process.
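
The two-stage moderation flow described above (an automated filter first, then human review of whatever it flags) can be sketched roughly as follows. This is an illustrative sketch only, not Respeecher’s actual system: the phrase list, the `Submission` and `ModerationResult` types, and the `moderate` function are all hypothetical names, and a production filter would rely on far more than keyword matching.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical phrases an automated filter might look for (assumption, not a real policy list).
FLAGGED_PHRASES = ("wire the money", "this is your bank", "verify your password")


@dataclass
class Submission:
    submission_id: str
    transcript: str  # text the user wants rendered in a synthetic voice


@dataclass
class ModerationResult:
    approved: List[Submission] = field(default_factory=list)
    needs_human_review: List[Submission] = field(default_factory=list)


def automated_filter(submission: Submission) -> bool:
    """Return True when the transcript contains any flagged phrase."""
    text = submission.transcript.lower()
    return any(phrase in text for phrase in FLAGGED_PHRASES)


def moderate(submissions: List[Submission]) -> ModerationResult:
    """Stage 1: automated screening. Flagged items are queued for stage 2 (human review)."""
    result = ModerationResult()
    for submission in submissions:
        if automated_filter(submission):
            result.needs_human_review.append(submission)
        else:
            result.approved.append(submission)
    return result
```

In this sketch the automated layer never rejects content on its own; it only decides what a human moderator needs to look at, which mirrors the division of labor described in the list above.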

These were challenging gray areas to navigate as voice cloning has many positive use cases as well – such as accessibility, which is missing from the responsible use cases in the Framework’s opening sections. However, we recognized the technology’s potential for misuse and knew it was essential to implement policies to mitigate this risk proactively.

3. Objective

Describe what your organization is attempting to accomplish by addressing this challenge and/or furthering the opportunities.

Respeecher’s Response

In this case study, we try to address some of the harms that can result from creating synthetic media without consent. Protection of personal likeness and consent are crucial ethical considerations for synthetic media, and they should be further emphasized in the PAI Framework.

Our objectives include building trust in voice cloning, developing standards and policies to protect and empower voice actors and users, and promoting ethical best practices widely.

Comparative Analysis with Other Synthetic Media

In terms of differentiating voice from other forms of synthetic media, the issue of watermarking, commonly used in synthetic images, arises. While watermarks for audio exist, they are not as straightforward, and audio watermarking is less useful than watermarking of static images. Watermarking audio would involve embedding an imperceptible mark within the sound waves to identify it as synthetic. This method should be explored further to determine its feasibility and effectiveness in maintaining the balance between authenticity and transparency in synthetic voice media.
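
To make the idea of embedding a mark within the sound waves concrete, here is a minimal toy sketch of one well-known approach, spread-spectrum watermarking: a low-amplitude pseudo-random sequence derived from a secret key is added to the samples, and detection correlates the audio against that same sequence. It is an illustration under stated assumptions, not a description of any method Respeecher uses. The function names, the key value, and the `strength` parameter are hypothetical, and a scheme this simple would not survive compression, resampling, or editing.

```python
import math
import random


def pn_sequence(key: int, length: int) -> list:
    """Pseudo-random +/-1 sequence that can be regenerated from the secret key."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed_watermark(samples: list, key: int, strength: float = 0.02) -> list:
    """Add a low-amplitude copy of the keyed sequence to every sample."""
    pn = pn_sequence(key, len(samples))
    return [s + strength * p for s, p in zip(samples, pn)]


def watermark_score(samples: list, key: int) -> float:
    """Average correlation between the audio and the keyed sequence."""
    pn = pn_sequence(key, len(samples))
    return sum(s * p for s, p in zip(samples, pn)) / len(samples)


def is_watermarked(samples: list, key: int, strength: float = 0.02) -> bool:
    """Declare a watermark when the correlation exceeds half the embed strength."""
    return watermark_score(samples, key) > strength / 2


# One second of a 220 Hz tone at 48 kHz stands in for real speech (assumption).
SAMPLE_RATE = 48_000
clean = [0.5 * math.sin(2 * math.pi * 220 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
marked = embed_watermark(clean, key=2019)
```

Calling `is_watermarked(marked, key=2019)` checks for the mark with the right key, while `is_watermarked(clean, key=2019)` or a call with a different key should find nothing. The sketch also hints at why audio is harder than images: the mark has to stay quiet enough not to be heard, yet strong enough to be detected after the audio is mixed and re-encoded in post-production.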

By addressing these challenges through the PAI Framework’s guidance, we aim to unlock opportunities for inclusive access to synthetic voice technology, while fostering a culture of responsible and ethical AI development. Our continuous effort to refine our policies and practices reflects our commitment to lead in the ethical application of voice cloning technology.

Challenges in PAI Framework Implementation

One of the main challenges was around the disclosure and labeling of synthetic voice content. The PAI Framework recommends clear labeling, which is straightforward for informational content. However, for creative uses such as in entertainment, art, or fiction, overt labeling might detract from the user experience. In our B2B stream, we deliver audio files to customers for their post-production work without added disclosure mechanisms. An example would be a movie production where voice cloning was used. In several cases, Respeecher and its Synthetic Speech Artists are credited. In many other cases, the use of voice cloning is not disclosed. We apply the C2PA standard for provenance in our B2C stream, where users add audio files to digital content distributed on social platforms for further monetization. Balancing transparency with creative freedom remains an ongoing challenge, and we continue to explore the best approaches to navigate these gray areas.

4. Framework Scope and Application

Identify which Framework principle was used to help address the challenge/opportunity, how it was chosen and implemented, and describe how it was applied.

Respeecher’s Response

Implementing the PAI Framework has improved our internal policies, procedures, and processes by centering ethical considerations, thereby leading to more robust consent procedures (only applicable in B2B use cases as we use pre-made synthetic voices in B2C use cases), transparency measures, and accountability systems.

Our approach in the B2C service includes:

  • Content moderation: We actively moderate Voice Marketplace content to ensure we remain accountable in preventing misuse, and to align with our content policies. This includes preventing harms identified in Appendix B of the PAI Framework, such as unauthorized impersonation and fraud. Our moderation process involves screening submissions, using automated algorithms, and manual reviews by experts to ensure compliance with our ethical standards.
  • Ethics and policies investment: As an emerging tech company, we heavily invest in ethics and policies. For example, our Department of Ethics/Trust Safety, which includes the Head of Ethics and Partnerships as well as positions in data governance and content moderation, plays a significant role in product development and business processes. The PAI Framework helps us assess our mitigations and policies against globally agreed-upon values and practices, enhancing our commitment to responsible and ethical AI practices for synthetic voice technology.
  • Awareness building: Significant resources are dedicated to educating and being transparent with users on the responsible use of generative AI when it comes to voice cloning. This includes developing consent flows, accountability systems, and educational content tailored to synthetic voice.

Our approach in B2B:

  • Consent for voice replication: We require written permission to replicate voices. This is secured through a mutually signed agreement, ensuring that all parties understand and consent to the use of the voice in question. If the person is deceased, we guide our customers on how to get consent from family members, estate holders, or copyright owners. To support this, we provide a sample agreement.
  • Use of personal data: The personal data of our clients is utilized solely for the purpose of training an AI voice model. We adhere to data privacy principles, ensuring that this information is not used for any other purposes.
  • Ethics and compliance: In line with our commitment to ethical practices, we have a dedicated page on our website outlining our ethics policies (Respeecher Ethics Policy). This includes guidelines on the responsible use of AI in voice cloning, emphasizing the importance of transparency, consent, and respect for individual rights.
  • Liability and infringement: Our liability in cases of copyright or personal rights infringement is defined within each client-specific agreement. These agreements detail our responsibilities for service provision and data security, adhering to relevant laws. A cap on liability, if applicable, is explicitly stated in these individual agreements.
  • User responsibility: We also stress that users of our technology bear responsibility for its application. This includes ensuring that the use of cloned voices does not infringe on the rights of others and adheres to legal and ethical standards.

By applying these principles, Respeecher aims to address challenges around consent and ethical use of voice cloning technology, with the ultimate goal of building trust, developing industry standards, and promoting responsible AI practices. Our work in promoting ethical consent practices is vital for setting industry standards in responsible AI use. By prioritizing consent, we contribute to the global conversation on digital rights and the ethical application of AI technology.

Our commitment to these principles is demonstrated in our proactive approach to labeling, metadata, moderation, and awareness-building, ensuring that our technology is used in ways that respect individual rights and promote positive outcomes.

5. Obstacles

Elaborate on any internal or external obstacles intrinsic to the Framework that were overcome.

Respeecher’s Response

One of the key obstacles Respeecher faces, as noted in the PAI Framework, is around the disclosure and labeling of synthetic voice content. This challenge is particularly pronounced in creative contexts, such as entertainment, art, or fiction, where overt labeling of a character’s voice as synthetic may detract from the user experience. Creators have expressed concerns that such labels could disrupt narrative immersion or artistic expression, suggesting a tension between transparency and the creative integrity of a project. In the case of a movie production, that can include big studios or indie productions using the B2C tool where we apply the C2PA standard. The use of synthetic voices is often not credited or mentioned in disclaimers because content creators are still in the early days of developing standards on the use of synthetic media. We have limited data on whether content creators keep the metadata files attached in post-production, and we encourage the field to collect more data on this and on creator attitudes towards metadata and labeling.

Challenges in B2B and B2C Services:

B2B (Voice Cloning on Demand): In the B2B context, where voice cloning is often used for media production, the need for labeling must be balanced against the artistic goals of a project. For instance, in movie production, overtly labeling a character’s synthetic voice might detract from the storytelling and viewers’ suspension of disbelief – elements that are vital to the creative act.

B2C (Voice Marketplace): For B2C services, such as the Voice Marketplace, the challenge is in ensuring that users understand when they are interacting with synthetic voices, while also preserving the creative potential of these tools for individual creators and small-scale projects.

PAI Framework Implementation Challenges:

Balancing ethical standards and user experience: The main challenge in implementing the PAI Framework has been balancing our commitment to ethical standards, particularly around consent and transparency, with the diverse needs and expectations of our users for creative expression.

Respeecher continues to explore the best practices for disclosure and labeling, acknowledging that this is a complex and evolving area, especially as synthetic media becomes more prevalent. By acknowledging and addressing these challenges, Respeecher aims to foster an environment where voice cloning technology is used responsibly and ethically, while also supporting the creative aspirations of our users. Our ongoing efforts to refine our disclosure and labeling practices demonstrate our commitment to navigating these gray areas effectively.

6. Benefits

Identify the opportunities created for your organization by utilizing the Framework to address the challenge.

Respeecher’s Response

The adoption of PAI’s Framework at Respeecher has brought forth significant opportunities and improvements in our approach to voice cloning technology, particularly in addressing consent and ethical use challenges.

Advancements in Ethical Practices and Industry Leadership

Establishing best practices: One of the foremost benefits has been the ability to establish and refine best practices around ethics and safety policies within the emerging voice cloning industry. Our proactive stance on ethical concerns has not only elevated our standards, but also contributed to building greater trust in the responsible use of voice cloning technology. Respeecher is the first voice cloning technology to require permission to clone someone’s voice. We have been advocating this standard and gaining trust in the entertainment industry.

Developing industry standards: PAI’s Framework has guided us in developing comprehensive standards and policies that prioritize the protection of voice actors, individual users, and consumers in the realm of synthetic media.

7. Conclusion/Key Takeaways

A description of how implementing the Framework ended for your organization, including any lessons learned.

Respeecher’s Response

The implementation of PAI’s Framework at Respeecher culminated in a strategic enhancement of our approach towards ethical AI practices, particularly in the realm of voice cloning. The final outcome was the adoption of comprehensive labeling and metadata strategies to ensure transparency and consent in our voice cloning services. These measures were integrated into both our B2B and B2C platforms, helping to alleviate the ethical concerns raised during the development and deployment of our technology.

Key Lessons Learned

Importance of disclosure and transparency: One of the primary lessons learned was the critical role of disclosure and transparency in maintaining the integrity of synthetic media. This includes the need for clear labeling and providing detailed metadata about the origin and creation of synthetic voices.

Challenges balancing ethical obligations and creative freedom: We recognized the delicate balance between fulfilling ethical obligations and respecting the creative freedom of users, especially in artistic and entertainment contexts where overt labeling might impact user experience.

Balancing disclosure with creative content: A major open question is how to effectively balance the need for disclosure of synthetic media with the preservation of artistic integrity in creative content. We are exploring various approaches but have yet to find an optimal solution.

Gap in PAI Framework for creative content: We identified a gap in PAI’s Framework regarding guidance for the use of synthetic media in creative content. The Framework could be more beneficial if it offered more nuanced suggestions for handling such situations. This would aid organizations like ours in making informed decisions that respect both ethical standards and artistic expression.