
DeepSeek is looking to press home its advantage. The Chinese startup sparked a sell-off of more than $1 trillion (about Rs 872,003 billion) in global stock markets last month with an AI reasoning model that outperforms many Western competitors.
Now, people familiar with the company say the Hangzhou-based firm is accelerating the launch of the successor to its January R1 model.
DeepSeek had planned to release R2 in early May but now wants it out as early as possible, two of them said, without providing specifics.
The company hopes the new model will produce better code and be able to reason in languages other than English. Details of the accelerated timeline for R2’s release have not been previously reported.
DeepSeek did not respond to a request for comment about this story.
Competitors are still digesting the implications of R1, which was built with less-powerful NVIDIA chips yet is competitive with models that U.S. tech giants have developed at a cost of hundreds of billions of dollars.
“The launch of the DeepSeek R2 model could be a pivotal moment,” said Vijayasimha Alilughatta, chief operating officer of Indian tech service provider Zensar. He said DeepSeek’s success in creating cost-effective AI models “will likely prompt global companies to accelerate their efforts … break the shackles of a few major players in the field.”
The release of R2 is likely to worry the U.S. government, which has identified AI leadership as a national priority. It could further galvanize Chinese authorities and companies, dozens of which say they have begun integrating DeepSeek models into their products.
Little is known about DeepSeek or its founder, Liang Wenfeng, who became a billionaire through his quantitative hedge fund, High-Flyer. Described by former employees as “low-key and introverted,” Liang has not spoken to any media since July 2024.
Reuters spoke with more than a dozen former employees, as well as quant fund professionals with knowledge of the operations of DeepSeek and its parent company, High-Flyer. It also reviewed state media articles, company social media posts and research papers dating back to 2019.
They paint a picture of a company that operates more like a research lab than a for-profit business, unencumbered by the hierarchical traditions of China’s high-pressure tech industry, even as it has become responsible for what many investors consider the latest breakthrough in AI.
Different paths
Liang was born in 1985 in a rural village in the southern province of Guangdong. He went on to earn a degree in communication engineering at the elite Zhejiang University.
One of his first jobs was running the research department at a smart imaging company in Shanghai. His boss there at the time, Zhou Chaoen, told state media on February 9 that Liang had hired prize-winning algorithm engineers and operated with a “flat management style.”
At DeepSeek and High-Flyer, Liang has similarly shunned the practices of Chinese tech giants known for rigid top-down management, low pay for young employees and “996” – working from 9 a.m. to 9 p.m. six days a week.
Liang opened his Beijing office within walking distance of Tsinghua University and Peking University, two of China’s most prestigious educational institutions. According to two former employees, he often engaged in technical details and was happy to work alongside the Gen-Z interns and recent graduates who make up the bulk of the workforce. They also described a collaborative atmosphere with workdays of typically eight hours.
“Liang gave us control and treated us as experts. He kept asking questions and learning with us,” said Benjamin Liu, a 26-year-old researcher. “DeepSeek allowed me to take ownership of critical parts of the pipeline, which was very exciting.”
Liang did not respond to questions sent via DeepSeek.
While Baidu and other Chinese tech giants were racing in 2023 to build consumer-facing versions of ChatGPT and profit from the global AI boom, Liang told the Chinese media outlet Waves last year that he deliberately avoided spending heavily on app development, focusing instead on refining the quality of the AI model.
DeepSeek and High-Flyer are both known for paying generously, according to three people familiar with their compensation practices. A quant fund manager who knows Liang said it is not unusual for a senior data scientist there to earn 1.5 million yuan (about Rs 1.8 crore) a year.
The startup is funded by High-Flyer, one of China’s most successful quantitative funds, which still manages tens of billions of yuan even after the government cracked down on the industry, according to two people in the sector.
Computational capability
DeepSeek’s success with low-cost AI models builds on High-Flyer’s decade of heavy investment in research and computing power, the three people said.
The quant fund was an early pioneer in AI trading, and a top executive said in 2020 that High-Flyer was going “all in” by reinvesting 70% of its revenue, mostly into AI research.
High-Flyer spent 1.2 billion yuan (approximately Rs 1,441 crore) on two supercomputing AI clusters in 2020 and 2021. The second cluster, Fire-Flyer II, consists of approximately 10,000 NVIDIA A100 chips used for training AI models.
At the time, before DeepSeek was established, the accumulation of so much computing power drew the attention of Chinese securities regulators, one person with knowledge of the matter said.
“Regulators wanted to know why they needed so many chips,” the person said. “How would they use them? What impact would that have on the market?”
Authorities decided not to intervene, a move that proved pivotal to DeepSeek’s fortunes: The United States banned the export of A100 chips to China in 2022, by which time Fire-Flyer II was already in operation.
Beijing now celebrates DeepSeek, but it has instructed the company not to engage with the media, according to a person familiar with Chinese official thinking.
The person said the authorities asked Liang to keep a low profile because they feared that too much media hype would attract unnecessary attention.
The Chinese cabinet, the Ministry of Commerce and China’s securities regulator did not respond to requests for comment.
Among the few companies with large A100 clusters, High-Flyer and DeepSeek have been able to attract some of China’s best research talent, two former employees said.
“The main advantage of broad (computing) resources is that it allows large-scale experiments,” said former employee Liu.
Some Western AI entrepreneurs, such as Scale AI CEO Alexandr Wang, have claimed that DeepSeek has as many as 50,000 high-end NVIDIA chips that are banned from export to China. Wang has not provided evidence for the claim, nor did he respond to a Reuters request for comment.
DeepSeek has not responded to Wang’s claim yet. Two former employees attributed the company’s success to Liang’s focus on more cost-effective AI architectures.
The company’s research papers show that the startup uses techniques such as Mixture-of-Experts (MoE) and Multi-head Latent Attention (MLA), which are far cheaper to compute.
The MoE technique divides an AI model into different areas of expertise and activates only those relevant to a given query, rather than running the entire model as more common architectures do.
The MLA architecture allows the model to process different aspects of a piece of information simultaneously, helping it detect critical details more effectively.
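To make the activation-sparsity idea concrete, below is a minimal, hypothetical sketch in Python (using NumPy); the layer sizes, router and expert weights are invented for illustration and do not reflect DeepSeek’s actual models or code.

```python
# A minimal, hypothetical sketch of sparse Mixture-of-Experts routing (not DeepSeek's code).
# A router scores every expert for each input, but only the top-k experts are actually run,
# so most of the model's parameters stay idle for any single query.
import numpy as np

rng = np.random.default_rng(0)

HIDDEN, NUM_EXPERTS, TOP_K = 16, 8, 2                       # illustrative sizes only
router_w = rng.normal(size=(HIDDEN, NUM_EXPERTS))           # router: scores each expert
experts_w = rng.normal(size=(NUM_EXPERTS, HIDDEN, HIDDEN))  # one weight matrix per expert

def moe_layer(token: np.ndarray) -> np.ndarray:
    """Route one token through only its top-k experts and mix their outputs."""
    scores = token @ router_w                                # one score per expert
    top = np.argsort(scores)[-TOP_K:]                        # indices of the k best experts
    gate = np.exp(scores[top]) / np.exp(scores[top]).sum()   # softmax over the chosen experts
    # Only TOP_K of the NUM_EXPERTS matrices are multiplied; the rest are skipped entirely.
    return sum(g * (token @ experts_w[i]) for g, i in zip(gate, top))

output = moe_layer(rng.normal(size=HIDDEN))
print(output.shape)  # (16,) – computed using 2 of the 8 experts
```

Because only the top-k experts run for each token, per-query compute scales with k rather than with the total number of experts, which is the source of the savings described above.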
Although competitors such as France’s Mistral have developed MoE-based models, DeepSeek was the first company to rely so heavily on the architecture while achieving parity with more expensively built models.
Analysts at brokerage Bernstein estimate that DeepSeek’s pricing is 20 to 40 times cheaper than equivalent models from OpenAI.
So far, Western and Chinese tech giants have signaled plans to continue heavy spending on AI, but DeepSeek’s success with R1 and its earlier V3 model has prompted some to change strategy.
OpenAI cut prices this month, while Google’s Gemini has introduced discounted tiers of access. Since the release of R1, OpenAI has also released its o3-mini model, which relies on less computing power.
Adnan Masood of UST, a U.S. technology services provider, told Reuters that benchmarks run by his lab found that R1 often uses three times as many tokens, or units of data processed by the AI model, for reasoning as OpenAI’s scaled-down model.
National embrace
Even before R1 attracted global attention, there were signs that DeepSeek had won favor in Beijing. In January, state media reported that Liang attended a meeting with Chinese Premier Li Qiang in Beijing as the designated representative of the AI sector, ahead of the leaders of better-known companies.
The subsequent fanfare over the cost competitiveness of its models has galvanized belief in Beijing that China can outcompete the United States, with Chinese companies and government bodies embracing DeepSeek models at a pace not afforded to other firms.
At least 13 Chinese city governments and 10 state-owned energy companies say they have deployed DeepSeek in their systems, while tech giants Lenovo, Baidu and Tencent, owner of China’s largest social media app, have incorporated DeepSeek’s models into their products.
Chinese leaders Xi Jinping and Li “showed that they endorse DeepSeek,” said Alfred Wu, an expert on Chinese policymaking at the Lee Kuan Yew School of Public Policy in Singapore. “Now everyone just recognizes it.”
China’s embrace comes as governments from South Korea to Italy remove DeepSeek from national app stores, citing privacy concerns.
“If DeepSeek becomes the preferred AI model for Chinese national entities, Western regulators may see this as another reason for restrictions on AI chips or software cooperation,” said Stephen Wu, an AI expert and founder of Carthage Capital.
Further restrictions on advanced AI chips are a challenge Liang has acknowledged.
“Our problem has never been funding,” he told Waves in July. “It is the embargo on high-end chips.”
©Tech Word News
(This story has not been edited by Tech Word News’s staff and is automatically generated from a syndicated feed.)