
A cybersecurity research firm claims that DeepSeek's database may have been publicly exposed. According to the firm's report, it found a publicly accessible ClickHouse database belonging to DeepSeek that allowed full control over database operations. The exposure reportedly included a large volume of sensitive information, such as chat history, secret keys, log streams and backend details. It is not clear whether the firm reported the incident to the Chinese AI company, or whether the exposed database has since been taken offline.
DeepSeek's database may have been exposed
In a blog post, cybersecurity firm Wiz Research said it had discovered a publicly accessible, unauthenticated database containing highly sensitive information about the DeepSeek platform. The exposed information is said to pose a risk both to the AI company and to its end users.
The firm says that, given the platform's popularity, it set out to assess DeepSeek's external security posture and identify any potential vulnerabilities. The researchers first mapped DeepSeek's internet-facing subdomains but found nothing that suggested a high-risk exposure.
However, after broadening their scan beyond the standard HTTP ports, the researchers detected two open ports (8123 and 9000) on multiple public hosts. Wiz Research says these ports led to a publicly exposed ClickHouse database that could be accessed without any authentication.
ClickHouse is an open-source, columnar database management system developed by Yandex. It is designed for fast analytical queries on large datasets and is widely used for real-time data processing, log storage and big data analytics.
A table named log_stream in the database is said to contain more than one million log entries. These entries include timestamps dating from January 6 and references to multiple internal DeepSeek application programming interface (API) endpoints. They also include plain-text logs containing chat history, API keys, backend details and operational metadata.
The researchers claim that, with this level of access, bad actors could retrieve plaintext passwords, local files and proprietary information directly from the server. At the time of writing, there was no update on whether this data had been accessed by anyone else or whether the database had been taken offline.