SK Telecom announced the official release of its latest vision-language model (VLM) and versatile document interpretation technology on July 29. The models, based on its proprietary large language model (LLM) called ‘A.X’, were made available through the open-source platform Hugging Face.
The newly released models, ‘A.X Encoder’ and ‘A.X 4.0 VL Light’, are freely available for both research and commercial use. The launch marks another step in SK Telecom’s ongoing effort to advance its AI technology and broaden its industrial applications. Following the July releases of the standard and lightweight versions of A.X 4.0 and two from-scratch versions of A.X 3.1, the company now offers six models in total. SK Telecom plans to continue refining its A.X 4.0-based inference models and to further expand the practical scope of its LLM technology.
The ‘A.X Encoder’ is a natural language processing (NLP) encoder model optimized for large-scale LLM training. It can process up to 16,384 tokens and offers up to three times faster inference and twice the training speed of its predecessors, allowing it to handle longer documents and more complex contexts. With approximately 149 million parameters, it achieved an average score of 85.47 on natural language understanding benchmarks, surpassing the 80.19 posted by the open-source RoBERTa-base model on the KLUE benchmark, a top-tier result.
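To make the 16,384-token context window concrete, here is a minimal, hypothetical sketch of how a larger window reduces the number of passes an encoder needs over a long document. Only the two window sizes come from the article; the document length and the `chunks_needed` helper are illustrative assumptions:

```python
import math

def chunks_needed(doc_tokens: int, max_len: int, overlap: int = 0) -> int:
    """Number of windows needed to cover a document of doc_tokens tokens
    with an encoder whose context window is max_len tokens."""
    stride = max_len - overlap
    return max(1, math.ceil((doc_tokens - overlap) / stride))

# A hypothetical 20,000-token report: a 512-token encoder must be run over
# many separate windows, while a 16,384-token window needs only two passes.
print(chunks_needed(20_000, 512))     # classic BERT/RoBERTa-style limit → 40
print(chunks_needed(20_000, 16_384))  # A.X Encoder's stated limit → 2
```

Fewer windows means fewer forward passes and less context lost at window boundaries, which is where the claimed gains on long documents come from.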
Lightweight yet Powerful: ‘A.X 4.0 VL Light’
‘A.X 4.0 VL Light’ is a multimodal Korean vision-language model trained on a diverse dataset combining visual and textual data. It performs strongly on complex visual inputs such as tables, graphs, and manufacturing diagrams, making it well suited to corporate environments.
Despite its lightweight 7-billion-parameter architecture, the model delivers performance comparable to medium-sized models. It averaged 79.4 on Korean visual benchmarks and 60.2 on textual benchmarks, placing it among Korea’s top lightweight models. On the K-Viscuit benchmark, which measures multimodal understanding of Korean culture and context, it scored 80.2, and on KoBizDoc (document and chart comprehension) it scored 89.8. Notably, it uses approximately 41% fewer text tokens than the Qwen2.5-VL-32B model for the same input, enhancing cost efficiency for enterprise users.
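The 41% token reduction translates directly into inference cost. A back-of-the-envelope sketch, where the per-token price and document volume are purely hypothetical and only the 41% figure comes from the benchmark above:

```python
def monthly_token_cost(tokens_per_doc: float, docs_per_month: int,
                       price_per_1k_tokens: float) -> float:
    """Total monthly spend for processing docs_per_month documents."""
    return tokens_per_doc * docs_per_month / 1_000 * price_per_1k_tokens

# Hypothetical workload: 10,000 documents/month at 2,000 text tokens each,
# billed at $0.50 per 1,000 tokens.
baseline = monthly_token_cost(2_000, 10_000, 0.50)
# The same workload with ~41% fewer text tokens per document:
efficient = monthly_token_cost(2_000 * (1 - 0.41), 10_000, 0.50)
print(f"${baseline:,.0f} vs ${efficient:,.0f}")  # → $10,000 vs $5,900
```

Because token-metered billing is linear in token count, the 41% reduction carries through to the bill unchanged regardless of the actual price or volume.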
Kim Taeyoon, head of SK Telecom’s foundation models division, emphasized the importance of proprietary technology in realizing AI sovereignty. He stated, “Securing independent technological capabilities is at the core of AI sovereignty. We will continue to enhance our technology and strengthen collaboration within our consortium to increase our global AI competitiveness.”

