Safety, Privacy, and Security in Generative AI

Yidi Sprei
⸱
Mar 6, 2025
Understanding Proprietary vs Open-source AI Models
Understanding the differences between proprietary and open-source models is crucial for making informed AI decisions. When properly self-hosted, open-source models can offer significantly stronger security and privacy than closed-source models: by maintaining full control over the deployment environment, users can ensure that no external entity has access to their data, eliminating the risks associated with provider data collection, breaches, or misuse.
Selecting the right model ultimately depends on privacy needs, cost considerations, and trust in providers. Open-source models offer notable advantages in security, cost, and flexibility, whereas proprietary models often provide more robust out-of-the-box moderation.
At Infuzu, we are committed to educating both ourselves and the public on these issues, promoting informed and responsible AI adoption.
Defining AI Models
Generative AI presents immense opportunities for individuals and businesses. However, concerns regarding safety, privacy, and security remain critical.
Making informed decisions is key to using generative AI responsibly, and that starts with understanding the differences between proprietary (closed-source) and open-source models, since misconceptions about these definitions are common.
Proprietary (Closed-source) Models: These models have restricted access to their internal components, including weights and biases. Examples include OpenAI’s GPT series, Anthropic’s Claude series, Google’s Gemini series, and some Qwen models from Alibaba.
Open-source Models: These models have publicly available weights and biases for free use and modification. Examples include Meta’s Llama series, Mistral’s Mistral and Mixtral series, Google’s Gemma and BERT, Microsoft’s Phi, certain Qwen models from Alibaba, and DeepSeek’s V and R models.
Accuracy and Moderation Concerns
Large language models (LLMs) are trained on diverse datasets, and while they often generate helpful responses, inaccuracies and misleading information—commonly referred to as hallucinations—can occur. It is critical to verify LLM-generated information before relying on it for important decisions.
Modern mechanisms help mitigate hallucinations and enhance response accuracy. Some models feature built-in moderation layers that review prompts and responses for policy violations, while others implement ethical and legal guidelines to restrict certain queries. Proprietary models typically offer stronger built-in moderation, whereas open-source models allow greater flexibility to customize safeguards.
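To make this concrete, a moderation layer for a self-hosted model can be as simple as a wrapper that screens both the prompt and the response before anything is returned. The Python sketch below is purely illustrative: generate stands in for any local text-generation call, and the keyword check is a deliberately naive placeholder for a real policy classifier.

    # Minimal sketch of a custom moderation layer around a local model.
    # `generate` is a placeholder for any text-generation callable; the
    # keyword filter is a naive stand-in for a real policy classifier.
    BLOCKED_PHRASES = {"stolen credit card", "weaponized exploit"}  # illustrative only

    def violates_policy(text: str) -> bool:
        # Flag text containing any blocked phrase.
        lowered = text.lower()
        return any(phrase in lowered for phrase in BLOCKED_PHRASES)

    def moderated_generate(prompt: str, generate) -> str:
        # Screen the prompt, call the model, then screen the response.
        if violates_policy(prompt):
            return "Request declined by moderation policy."
        response = generate(prompt)
        if violates_policy(response):
            return "Response withheld by moderation policy."
        return response

    # Usage with a stub model:
    # moderated_generate("Summarize our privacy policy.", lambda p: "Sure: ...")

Because the wrapper lives in the user’s own code, the policy can be as strict or as permissive as the deployment requires, which is exactly the flexibility that self-hosted open-source models provide.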
Privacy Concerns
Closed-source Models
LLM applications like ChatGPT and Claude personalize responses over time by storing user interactions to ‘learn’ user preferences and to refine future models. Unless this is manually disabled, most closed-source models collect user data, and even when collection is disabled, providers may retain access to past conversations to support service functionality. This stored personal information poses a risk if a data breach occurs or if providers use the data for analytics and business decisions beyond model training.
While some API providers claim not to store or use submitted data, verifying such claims can be difficult. Contractual zero-data-retention policies are preferable, though compromised server infrastructure could still expose data while it is being processed.
Third-party applications that integrate with closed-source APIs introduce additional privacy risks, as user data is accessed by the application provider before being passed to the API. This means data privacy depends on the security practices of both entities. Users must understand the storage and analytics policies and trust their data will not be misused by either entity. Additionally, whenever data is transmitted to a remote API provider, encryption during transit (e.g., HTTPS) is critical to prevent interception and exposure.
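As a minimal illustration of encryption in transit, the Python sketch below sends a prompt to a remote API over HTTPS using the requests library, which verifies the server’s TLS certificate by default. The endpoint URL and response fields are hypothetical stand-ins, not any particular provider’s API.

    import requests

    API_URL = "https://api.example-provider.com/v1/generate"  # hypothetical endpoint

    def send_prompt(prompt: str, api_key: str) -> str:
        # HTTPS encrypts the prompt in transit; `requests` verifies the
        # server's TLS certificate by default (verify=True).
        resp = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            json={"prompt": prompt},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json().get("output", "")  # response schema is assumed

Note that encryption in transit only protects against interception; it does nothing about what the provider stores or analyzes once the data arrives, which is why the retention policies discussed above still matter.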
Open-source Models
Open-source models can offer superior security and privacy to closed-source models. Because they do not automatically send user inputs back to a centralized server, they increase privacy when used properly. Self-hosting even makes it possible to run a model without an internet connection, strengthening privacy and security by eliminating external data-access risks entirely. Users retain full control over their data, since these models must be explicitly configured before they can share anything with an external entity, and they gain direct visibility into how their data is handled.
These models operate as large sets of mathematical weights and biases that transform input text into output text through predefined computations rather than by communicating with external servers. They do not send data back to their creators, making them fundamentally different from typical cloud-connected software.
LLMs process text-based input through pre-trained parameters, with no capability to access files, initiate network connections, or store persistent memory. Security risks stem from how an open-source model is integrated into an application or deployment pipeline, not from the model itself: when locally hosted without internet-connected integrations, open-source models have no inherent ability to transmit data externally and cannot conceal malicious networking code.
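For example, a model whose weights have already been downloaded can be run entirely offline with the Hugging Face transformers library. In the sketch below, the model directory is a hypothetical local path; the HF_HUB_OFFLINE and TRANSFORMERS_OFFLINE environment variables and the local_files_only flag tell the libraries to refuse all network access, so inference is a purely local computation.

    import os
    # Refuse all network access before loading anything.
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"

    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_DIR = "/models/my-local-model"  # hypothetical path to downloaded weights

    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR, local_files_only=True)
    model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, local_files_only=True)

    # Inference is a pure computation over local weights: text in, text out.
    inputs = tokenizer("What data leaves this machine?", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))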
The misconception that open-source models leak inputs when connected to the internet arises from agentic AI applications, where external software components enable models to interact with internet-connected tools such as search engines or APIs. While these integrations facilitate query generation and data retrieval, the model itself remains passive in the process and does not inherently access the internet.
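A simplified agentic loop makes this division of responsibility clear: the model only turns text into text, and it is the surrounding application code that performs any network request. In the sketch below, the SEARCH: convention, the search URL, and the generate callable are all illustrative assumptions rather than a real framework’s API.

    import requests

    def web_search(query: str) -> str:
        # The *application* makes this HTTP request; the model never does.
        resp = requests.get(
            "https://search.example.com",  # hypothetical search endpoint
            params={"q": query},
            timeout=10,
        )
        return resp.text[:500]

    def run_agent(prompt: str, generate) -> str:
        # `generate` is any text-in, text-out model call.
        output = generate(prompt)
        if output.startswith("SEARCH:"):
            results = web_search(output.removeprefix("SEARCH:").strip())
            output = generate(f"{prompt}\n\nSearch results:\n{results}")
        return output

Removing web_search removes the model’s only route to the internet; nothing in the weights themselves can open a connection.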
Realistic Deployment Considerations
Since many users and businesses lack the resources to self-host large models, they often rely on cloud providers or third-party application hosts for open-source models. This introduces risks similar to those of closed-source models, as users must trust the provider’s security and privacy policies. However, the availability of multiple independent hosting providers allows users to select one with the best privacy and security practices.
When evaluating a provider, users should consider the following:
Clear data retention policies
Encryption for data in transit and at rest
User-configurable privacy settings
Third-party audits, security certifications, and transparency reports
Unlike closed-source options, open-source models allow users to verify or modify how data is processed, providing an additional layer of security and trust.
Conclusion
At Infuzu, we are committed to advancing the field of generative AI while prioritizing safety, privacy, and security. As both an application provider and an API provider, we recognize the importance of educating ourselves and the public about the differences between proprietary and open-source models. By addressing misconceptions and promoting responsible AI use, we aim to empower users to make informed decisions.