TL;DR

Norway’s National Library is training a Norwegian-language large language model (LLM), using 2 PB of Huawei flash storage in its data pipeline. The project aims to build a sovereign AI model that reflects Norwegian language and culture, and it faces both technical and governance challenges.

Norway’s National Library is using 2 petabytes of Huawei OceanStor Dorado flash storage to develop a sovereign large language model (LLM) that understands Norwegian, marking a significant step in local AI development.

The project was discussed by Marius Husnes, Head of IT Platform at the Norwegian National Library, at Huawei’s ID Forum 2026 in Paris. The library aims to create a Norwegian-specific LLM because no commercial provider offers a local-language model, which Husnes said puts Norway at a disadvantage in AI applications related to its culture and history.

The library’s digital collection, accumulated since 2005, holds 20 PB of unique data, including books, newspapers, web content, and multimedia, stored across a 60 PB preservation system. The challenge lies in moving this data efficiently through the AI training pipeline. Data cleaning, deduplication, and normalization run on an Nvidia DGX H200 system paired with Huawei all-flash arrays totaling 2 PB of storage.
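To make the cleaning, normalization, and deduplication steps concrete, here is a minimal Python sketch. It is not the library's actual pipeline, which the source does not describe in detail. It assumes documents arrive as plain strings, uses Unicode NFC normalization plus whitespace collapsing as the normalization step, and uses exact-match hashing for deduplication. The function names `normalize` and `deduplicate` are hypothetical.

```python
import hashlib
import re
import unicodedata

_WHITESPACE = re.compile(r"\s+")


def normalize(text: str) -> str:
    """Illustrative normalization: NFC composition and whitespace collapsing."""
    # NFC turns "a" + combining ring (U+030A) into the single code point "å",
    # so visually identical Norwegian text compares as equal.
    composed = unicodedata.normalize("NFC", text)
    return _WHITESPACE.sub(" ", composed).strip()


def deduplicate(documents):
    """Drop empty documents and exact duplicates (after normalization)."""
    seen = set()
    unique = []
    for doc in documents:
        cleaned = normalize(doc)
        if not cleaned:
            continue  # cleaning step: skip documents that are empty
        digest = hashlib.sha256(cleaned.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(cleaned)
    return unique


if __name__ == "__main__":
    sample = [
        "Blåbær  og\nfløte",         # extra whitespace and a line break
        "Bla\u030abær og fløte",     # same text, decomposed "å"
        "",                          # empty document
        "Nasjonalbiblioteket",
    ]
    print(deduplicate(sample))
```

Exact hashing only catches documents that are identical after normalization. Near-duplicate detection, which large web and newspaper corpora usually need, would require a different technique such as MinHash and is outside this sketch.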

The training itself runs on Norway’s Sigma2 Olivia supercomputer, equipped with 448 GPUs and a 5.3 PB Cray ClusterStor storage system. Husnes said the main bottleneck is not compute power but data quality and pipeline throughput, especially when transferring large datasets from archival storage to the training systems. The project must also address low-latency data access, data governance, and evaluation tools suited to Norwegian, which has multiple dialects and historical forms.
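A rough back-of-the-envelope calculation shows why throughput matters at this scale. The sketch below estimates how long it takes to move a dataset at a given sustained rate. The 20 PB figure comes from the article, but the throughput values are purely hypothetical assumptions; the source gives no transfer rates. The helper name `transfer_days` is also hypothetical, and decimal units (1 PB = 1,000,000 GB) are assumed.

```python
SECONDS_PER_DAY = 86_400
GB_PER_PB = 1_000_000  # decimal units: 1 PB = 10^6 GB


def transfer_days(size_pb: float, throughput_gb_per_s: float) -> float:
    """Days needed to move size_pb petabytes at a sustained rate in GB/s."""
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = size_pb * GB_PER_PB / throughput_gb_per_s
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical sustained rates, chosen only to illustrate the scaling.
    for rate in (1, 10, 100):
        days = transfer_days(20, rate)
        print(f"20 PB at {rate} GB/s: about {days:.1f} days")
```

Under these assumptions, a tenfold change in sustained throughput changes the transfer time tenfold, which is consistent with the point that the pipeline, not compute, can set the pace.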

Why It Matters

This development underscores the strategic importance of sovereign AI capabilities for nations, especially in non-English languages. It shows how local data and infrastructure, here supported by Huawei’s storage, are critical to creating culturally relevant AI models. The project also highlights the technical challenges of managing PB-scale datasets for AI training, which are relevant globally as countries seek to develop their own AI ecosystems.

Furthermore, Norway’s initiative signals a broader trend of smaller nations aiming for AI independence, emphasizing data sovereignty, cultural preservation, and governance issues. The involvement of Huawei’s storage technology indicates its growing role in the European AI infrastructure landscape, raising questions about supply chain dependencies and international tech alliances.

Background

Norway’s National Library has been digitizing its collection since 2005, creating one of the largest digital archives of Norwegian cultural content. The project reflects a global push for sovereign AI, driven by concerns over data control, bias, and cultural representation. Similar efforts are underway in other countries, but Norway’s approach is notable for its scale and integration of high-performance computing and advanced storage solutions.

Previous developments include the library’s legal agreements to use copyrighted content for training, and the technical groundwork of digitization and metadata management. The ongoing challenge is translating this vast, complex dataset into an effective LLM that accurately reflects Norway’s language and culture.

“No private company has this.”

— Marius Husnes

“The bottleneck was not compute; it was data quality, cleaning and pipeline throughput.”

— Marius Husnes

What Remains Unclear

It is not yet clear how the Norwegian LLM will perform in practice, how evaluation metrics will be standardized, or how governance and access control will be managed long-term. The project is still in progress, and many technical and policy issues remain unresolved.

What’s Next

The next steps involve completing the data pipeline optimization, refining evaluation tools tailored for Norwegian, and addressing governance questions. The project aims to finalize the LLM training and assess its capabilities, with broader deployment and policy discussions likely to follow.

Key Questions

Why is Norway developing its own LLM?

Norway aims to create a sovereign AI that understands Norwegian language, culture, and history, addressing the limitations of foreign, English-centric models and ensuring cultural preservation.

What role does Huawei storage play in this project?

Huawei’s OceanStor Dorado flash storage provides the high-capacity, low-latency data infrastructure necessary for processing and training the large datasets involved in the LLM development.

What challenges are involved in this project?

Major challenges include managing PB-scale datasets, ensuring data quality, sustaining pipeline throughput, and developing evaluation and governance frameworks suited to the Norwegian language and cultural context.

Will this model be available publicly?

It is not yet clear whether the Norwegian LLM will be publicly released or restricted to governmental and research use, as governance and policy questions are still under discussion.

Source: Hacker News
