
NVIDIA B200 vs H200: When to choose which?
We've deployed NVIDIA Blackwell GPU servers on the HPC-AI Cloud Platform with optimized AI stacks for real-world AI projects. Start now to enjoy the performance gains!
Latest insights, updates, and stories on AI & compute.
HPC-AI Tech, the company behind the large-model development and deployment platform HPC-AI.COM, has been named to Forbes Asia's "100 to Watch 2025". The list honors fast-growing companies across the region that are making a strong impact through technological innovation and business success.
At the HPC-AI Research Team, we often explore ways to make deep learning models more efficient. One fundamental insight is that deep learning models are inherently sparse: many weights can be safely zeroed out without significant accuracy loss. This idea, known as model pruning, was first introduced by Yann LeCun in the 1980s through the pioneering work Optimal Brain Damage.
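The pruning idea above can be sketched in a few lines. This is a minimal, illustrative example of magnitude pruning (not the Optimal Brain Damage criterion, which uses second-order information): zero out the fraction of weights with the smallest absolute values.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction `sparsity` of weights."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # Threshold = k-th smallest absolute value across the whole tensor.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned = magnitude_prune(w, sparsity=0.5)
print(float((pruned == 0.0).mean()))  # half the weights are zeroed
```

In practice, pruned models are usually fine-tuned afterwards to recover any lost accuracy.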
OpenAI has launched gpt-oss-120b and gpt-oss-20b: powerful open-weight language models optimized for reasoning, tool use, and efficient deployment on consumer hardware. These models are released under the Apache 2.0 license, one of the most flexible open-source licenses available, so you can integrate and scale your projects freely.
Although current pre-trained large language models (LLMs) have demonstrated strong generalizability across various tasks, they often underperform on downstream natural language processing (NLP) tasks due to a lack of domain-specific knowledge. Retrieval-augmented generation (RAG) [1] addresses this challenge by retrieving relevant data from a knowledge base to augment the input prompts of LLMs, thereby enhancing their performance on specific tasks.
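The retrieve-then-augment loop described above can be sketched as follows. The knowledge base, the word-overlap scoring function, and the prompt template are all illustrative assumptions (a real system would use embedding-based or BM25 retrieval), not any specific framework's API.

```python
# Toy knowledge base; real systems use a vector store over many documents.
knowledge_base = [
    "HPC-AI.COM offers on-demand GPU instances for model training.",
    "Speculative decoding accelerates LLM inference.",
    "Model pruning zeroes out unimportant weights.",
]

def retrieve(query: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query (toy stand-in for BM25)."""
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:top_k]

def build_prompt(query: str) -> str:
    """Augment the LLM prompt with the retrieved context."""
    context = "\n".join(retrieve(query, knowledge_base))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What does model pruning do?"))
```

The augmented prompt is then sent to the LLM, which answers using the injected domain context rather than relying on its pre-training alone.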
Reinforcement Learning (RL) was originally developed for sequential decision-making tasks such as control systems and game strategies, where agents learn by interacting with their environment to maximize long-term rewards. Large Language Models (LLMs) have transformed natural language understanding and generation, yet they still struggle with complex reasoning and multi-step thought processes.
SGLang is one of the fastest inference engines available today. It supports speculative decoding, a technique introduced by Google Research in 2023 and further optimized by frameworks such as SpecInfer, Medusa, and EAGLE. This method accelerates model inference without any loss in output quality.
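The core mechanic, in simplified form: a cheap draft model proposes a block of tokens, the expensive target model verifies them in a single pass, and generation falls back to the target model at the first disagreement. The toy "models" below are assumptions for illustration (real speculative decoding verifies probabilistically over logits), but they show why the output matches plain greedy decoding with the target model alone.

```python
def speculative_decode(target, draft, prompt, max_new, block=4):
    """Greedy speculative decoding sketch: draft proposes, target verifies."""
    tokens = list(prompt)
    while len(tokens) < len(prompt) + max_new:
        # Draft model proposes `block` tokens autoregressively (cheap).
        proposal = list(tokens)
        for _ in range(block):
            proposal.append(draft(proposal))
        # Target model checks each proposed token (done in one pass in practice).
        for i in range(len(tokens), len(proposal)):
            expected = target(proposal[:i])
            if proposal[i] == expected:
                tokens.append(proposal[i])   # accepted: keep the draft token
            else:
                tokens.append(expected)      # rejected: substitute and restart
                break
            if len(tokens) == len(prompt) + max_new:
                break
    return tokens

# Toy next-token functions: target counts up; draft drifts after 5.
target = lambda seq: seq[-1] + 1
draft = lambda seq: seq[-1] + 1 if seq[-1] < 5 else 0

out = speculative_decode(target, draft, [1], max_new=6)
print(out)  # identical to greedy decoding with the target alone
```

Every emitted token is one the target model would have chosen itself, which is why the technique is lossless; the speedup comes from accepting multiple draft tokens per target pass.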
Reinforcement learning fine-tuning (RFT) is powerful, but let's face it: it used to be a pain to run. Dual networks, huge memory needs, tons of config files... That's why we built RUNRL JOB, the easiest way to run RFT workloads like GRPO directly on HPC-AI.COM. No complicated setup. Just pick your model, launch your job, and go.
Reinforcement learning (RL) has transformed how we fine-tune language models. Traditional approaches like Proximal Policy Optimization (PPO) and Trust Region Policy Optimization (TRPO) use a "critic" value network, doubling model size, memory requirements, and complexity. Meanwhile, human-alignment methods like Direct Preference Optimization (DPO) optimize for preference, not reasoning.
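The critic-free alternative alluded to above (the idea behind GRPO) replaces the learned value network with a simple statistic: sample a group of responses per prompt, and use each response's reward normalized against the group as its advantage. A minimal sketch of that computation, purely illustrative and not any particular framework's implementation:

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Advantage of each sampled response relative to its own group:
    (reward - group mean) / group std. No critic network needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

# Rewards for four responses sampled from the same prompt.
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print(advs)  # above-average responses get positive advantages
```

These advantages then weight the policy-gradient update exactly where PPO would have used critic estimates, which is why the critic (and its memory footprint) can be dropped.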
In 2025, open-source AI is exploding. Meta's LLaMA 4, the latest in the LLaMA series, is setting new benchmarks for reasoning, multilingual fluency, and tool use. From chatbots to copilots, it's already powering the next wave of AI apps. However, running LLaMA 4, or any large model at scale, often requires time-consuming setup, infrastructure engineering, and DevOps.
We're thrilled to introduce Open-Sora 2.0, a cutting-edge open-source video generation model trained for just $200,000: an 11B-parameter model delivering performance on par with leading closed-source models like HunyuanVideo and Step-Video (30B). And now you can fine-tune or run inference with Open-Sora 2.0 instantly on the HPC-AI.COM GPU cloud, with no contracts, global coverage, and prices starting at just $1.99/GPU hour.
DeepSeek V3/R1 is a hit around the world, with solutions and API services based on the original model becoming widely available, leading to a race to the bottom in pricing and free offerings. How can we stand on the shoulders of giants and leverage post-training with domain-specific data to build high-quality private models at low cost, enhancing business competitiveness and value?
DeepSeek-R1 is one of the most popular AI models today, attracting global attention for its impressive reasoning capabilities. It is an open-source LLM featuring a full CoT (Chain-of-Thought) approach for human-like inference and an MoE design that enables dynamic resource allocation to optimize efficiency. It substantially outperforms many closed-source models across a wide range of tasks, including coding, creative writing, and mathematics.
We're thrilled to announce that HPC-AI.COM will be at NeurIPS 2024, and we can't wait to meet you at Booth 55! Whether you're a researcher, developer, or AI enthusiast, we have something exciting for everyone, including cutting-edge GPU solutions, exclusive promotions, and insightful demos.
Recently, the free video generation platform Video Ocean went live, attracting widespread attention and praise. It supports generating videos with any character, in any style, from text, images, or roles. How did Video Ocean achieve rapid updates at low cost? What cutting-edge technologies are behind it?
Singapore: HPC-AI Tech, a startup specializing in AI software infrastructure and video generation AI, has announced the successful closure of a US$50 million Series A funding round. The investors include Singtel Innov8, Sinovation Ventures, Capstone Capital, Greater Bay Area Homeland, Lingfeng Capital, and Stony Creek Capital.
Large AI models have received unprecedented attention in recent years and have had a profound impact on various application scenarios. Correspondingly, the demand for efficient, highly available large-model inference systems is growing steadily, becoming a core challenge for many enterprises.
We are thrilled to announce that we have been selected for the AWS Activate and Google for Startups Cloud programs, receiving support including cloud computing resources, access to the AWS and Google Cloud communities, and co-marketing opportunities. This recognition is a major milestone for us and will be invaluable to our continued growth and success.
HPC-AI Tech today announced it has joined NVIDIA Inception, a program designed to nurture startups revolutionizing industries with technology advancements. HPC-AI Tech is focused on increasing AI productivity and building a world-class distributed AI development and deployment platform that enables supercomputers and cloud platforms to serve AI at a much lower cost.