Imagine a typical marketing agency or law firm in the center of Warsaw. Time pressure, looming deadlines. A junior specialist, let's call him Michał, is given a task: analyze a complex NDA with a new key client. The document is 15 pages of dense legal text.
Michał thinks: "Why waste an hour? AI will do it in 30 seconds."
This is the moment – in our scenario – when the nightmare for the company's security begins.
"I'll just quickly check..."
In our example, the employee copies the entire content of the confidential agreement into a public chat with the question: "Are there any unusual clauses in this NDA that I should pay attention to?"
AI responds instantly, pointing out risks. The employee is satisfied – they saved time.
However, they are unaware of the key issue: the full content of the agreement has just been sent to external servers and, depending on the service's terms, may have been added to the model's training data.
What did the pasted document contain?
In this type of document, you usually find:
- Names of both parties – revealing who the company is collaborating with
- Project details – e.g., the name of a new drug or product before its launch
- Personal data – names of executives, email addresses
- Financial clauses – rates and contractual penalties
All this information, according to the confidentiality clause, should never leave the company's secure infrastructure.
Leak mechanism: How AI "learns" secrets
When using free or standard versions of public AI models, you often accept terms that allow the provider to use your conversations to "improve services".
In our hypothetical scenario, after some time, the AI model – "trained" on data from Michał's agreement – may start reproducing this information. Another user asking, for example, about "standard contractual penalty rates in industry X" could receive an answer based on the confidential data from that very agreement.
Consequences: Catastrophic scenario
If such a leak were exposed, the company would face serious problems:
1. Breach of NDA
The client could demand substantial damages for breach of confidentiality. A single detail about the collaboration reaching a competitor would be enough.
2. Supervisory authority proceedings
Pasting personal data (signatures, names) into a tool without a data processing agreement is a straightforward path to a fine for violating GDPR.
3. Reputation loss
In industries based on trust (law, finance, medicine), information that the company "feeds" public AI with client data could mean the end of the business.
How to avoid this scenario?
Most employees do not have bad intentions – they simply want to work faster. Michał's mistake was using the wrong tool.
❌ Incorrect approach:
Pasting documents with sensitive data into public, free chatbots.
✅ Correct approach:
The company should provide a secure work environment:
- Implement a private model such as aikeep.io, which runs locally or in a private cloud. In this setup, data is analyzed but is never used for training and never leaves your infrastructure (see the first sketch after this list).
- Anonymization – if you must use a public tool, always remove company names, amounts, and personal data first (see the second sketch after this list).
- Education – employees must understand that a public chat window is not a private notepad but an external cloud service.
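To make the first point concrete, here is a minimal sketch of the pattern: the document travels only to a model served inside your own network, never to a public endpoint. Everything here is an illustrative assumption: the internal hostname is hypothetical, and the request format follows Ollama, a popular tool for self-hosting models, not aikeep.io's actual interface.

```python
import requests

# Hypothetical internal host -- the point is that the URL resolves inside
# your own network, so the document never leaves your infrastructure.
INTERNAL_LLM_URL = "http://llm.internal.example:11434/api/generate"

def review_nda(document_text: str) -> str:
    """Ask a locally hosted model to flag unusual NDA clauses."""
    response = requests.post(
        INTERNAL_LLM_URL,
        json={
            "model": "llama3",  # any model pulled onto the internal server
            "prompt": f"List any unusual clauses in this NDA:\n\n{document_text}",
            "stream": False,
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]
```

The design choice that matters is not the library or the model but the network boundary: the endpoint is reachable only from inside the company, so there are no third-party terms of service in the loop.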
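And a minimal sketch of the anonymization step, assuming text is redacted in Python before it reaches any external tool. The patterns and party names below are illustrative placeholders; regexes alone will miss person names and context-dependent identifiers, so a real deployment should add NER or a dedicated DLP tool.

```python
import re

# Illustrative patterns only -- extend for your document types.
REDACTION_PATTERNS = {
    "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "[PHONE]": re.compile(r"\+?\d[\d \-()]{7,}\d"),
    "[AMOUNT]": re.compile(r"\b\d{1,3}(?:[ ,.]\d{3})*(?:[.,]\d{2})?\s?(?:PLN|EUR|USD)\b"),
}

# Party names must be supplied explicitly -- no regex can guess them.
KNOWN_PARTIES = ["Example Sp. z o.o.", "ACME Corp"]  # hypothetical names

def anonymize(text: str) -> str:
    """Replace common sensitive fragments with placeholders."""
    for placeholder, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(placeholder, text)
    for name in KNOWN_PARTIES:
        text = text.replace(name, "[PARTY]")
    return text

sample = "Contact jan.kowalski@example.com. Penalty: 50 000,00 PLN payable by ACME Corp."
print(anonymize(sample))
# Contact [EMAIL]. Penalty: [AMOUNT] payable by [PARTY].
```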
Warning signs: What NOT to paste into public AI?
Never process in the public cloud:
🚨 Documents with confidentiality clauses (NDA)
🚨 Customer databases and personal data (GDPR)
🚨 Business strategies and pre-launch marketing plans
🚨 Financial results before their publication
🚨 Source code
Summary
The described story is a hypothetical scenario, but the risk is very real. Companies like Samsung, Apple, and Amazon have long restricted their employees' access to public AI tools for this reason.
Don't wait until this scenario happens in your company.
Secure your data by implementing the aikeep.io solution – a system that provides the power of artificial intelligence but keeps your data under your full control, on Polish servers.
Check how to securely implement AI in your company
Note: The above article is a case study illustrating potential risks associated with the improper use of public language models. All names and situations are examples.