Bringing AI to UK Languages With NVIDIA Nemotron


Celtic languages — together with Cornish, Irish, Scottish Gaelic and Welsh — are the U.Okay.’s oldest residing languages. To empower their audio system, the UK-LLM sovereign AI initiative is constructing an AI mannequin primarily based on NVIDIA Nemotron that may cause in each English and Welsh, a language spoken by about 850,000 folks in Wales at present.

Enabling high-quality AI reasoning in Welsh will help the supply of public providers together with healthcare, training and authorized assets within the language.

“I would like each nook of the U.Okay. to have the ability to harness the advantages of synthetic intelligence. By enabling AI to cause in Welsh, we’re ensuring that public providers — from healthcare to training — are accessible to everybody, within the language they stay by,” mentioned U.Okay. Prime Minister Keir Starmer. “This can be a highly effective instance of how the newest AI know-how, educated on the U.Okay.’s most superior AI supercomputer in Bristol, can serve the general public good, shield cultural heritage and unlock alternative throughout the nation.”

The UK-LLM mission, established in 2023 as BritLLM and led by College School London, has beforehand launched two fashions for U.Okay. languages. Its new mannequin for Welsh, developed in collaboration with Wales’ Bangor College and NVIDIA, aligns with Welsh authorities efforts to spice up the lively use of the language, with the objective of reaching one million audio system by 2050 — an initiative often known as Cymraeg 2050.

U.Okay.-based AI cloud supplier Nscale will make the brand new mannequin accessible to builders by means of its utility programming interface.

“The purpose is to make sure that Welsh stays a residing, respiration language that continues to develop with the instances,” mentioned Gruffudd Prys, senior terminologist and head of the Language Applied sciences Unit at Canolfan Bedwyr, the college’s middle for Welsh language providers, analysis and know-how. “AI reveals monumental potential to assist with second-language acquisition of Welsh in addition to for enabling native audio system to enhance their language abilities.”

This new mannequin might additionally enhance the accessibility of Welsh assets by enabling public establishments and companies working in Wales to translate content material or present bilingual chatbot providers. This may help teams together with healthcare suppliers, educators, broadcasters, retailers and restaurant homeowners guarantee their written content material is as available in Welsh as they’re in English.

Past Welsh, the UK-LLM crew goals to use the identical methodology used for its new mannequin to develop AI fashions for different languages spoken throughout the U.Okay. akin to Cornish, Irish, Scots and Scottish Gaelic — in addition to work with worldwide collaborators to construct fashions for languages from Africa and Southeast Asia.

“This collaboration with NVIDIA and Bangor College enabled us to create new coaching knowledge and prepare a brand new mannequin in file time, accelerating our objective to construct the all-time language mannequin for Welsh,” mentioned Pontus Stenetorp, professor of pure language processing and deputy director for the Centre of Synthetic Intelligence at College School London. “Our purpose is to take the insights gained from the Welsh mannequin and apply them to different minority languages, within the U.Okay. and throughout the globe.”

Tapping Sovereign AI Infrastructure for Mannequin Growth 

The brand new mannequin for Welsh relies on NVIDIA Nemotron, a household of open-source fashions that options open weights, datasets and recipes. The UK-LLM improvement crew has tapped the 49-billion-parameter Llama Nemotron Tremendous mannequin and 9-billion-parameter Nemotron Nano mannequin, post-training them on Welsh-language knowledge.

In contrast with languages like English or Spanish, there’s much less accessible supply knowledge in Welsh for AI coaching. So to create a sufficiently massive Welsh coaching dataset, the crew used NVIDIA NIM microservices for gpt-oss-120b and DeepSeek-R1 to translate NVIDIA Nemotron open datasets with over 30 million entries from English to Welsh.

They used a GPU cluster by means of the NVIDIA DGX Cloud Lepton platform and are harnessing lots of of NVIDIA GH200 Grace Hopper Superchips on Isambard-AI — the U.Okay.’s strongest supercomputer, backed by £225 million in authorities funding and primarily based at College of Bristol — to speed up their translation and coaching workloads.

This new dataset dietary supplements present Welsh knowledge from the crew’s earlier efforts.

Capturing Linguistic Nuances With Cautious Analysis

Bangor College, positioned in Gwynedd — the county with the highest share of Welsh audio system — is supporting the brand new mannequin’s improvement with linguistic and cultural experience.

Welsh translation of: “The purpose is to make sure that Welsh stays a residing, respiration language that continues to develop with the instances.” — Gruffudd Prys, Bangor College

Prys, from the college’s Welsh-language middle, brings to the collaboration about twenty years of expertise with language know-how for Welsh. He and his crew are serving to to confirm the accuracy of machine-translated coaching knowledge and manually translated analysis knowledge, in addition to assess how the mannequin handles nuances of Welsh that AI sometimes struggles with — akin to the way in which consonants at first of Welsh phrases change primarily based on neighboring phrases.

The mannequin, in addition to the Welsh coaching and analysis datasets, are anticipated to be made accessible for enterprise and public sector use, supporting further analysis, mannequin coaching and utility improvement.

“It’s one factor to have this AI functionality exist in Welsh, nevertheless it’s one other to make it open and accessible for everybody,” Prys mentioned. “That delicate distinction might be the distinction between this know-how getting used or not getting used.”

Deploy Sovereign AI Fashions With NVIDIA Nemotron, NIM Microservices

The framework used to develop UK-LLM’s mannequin for Welsh can function a basis for multilingual AI improvement around the globe.

Benchmark-topping Nemotron fashions, knowledge and recipes are publicly accessible for builders to construct reasoning fashions tailor-made to nearly any language, area and workflow. Packaged as NVIDIA NIM microservices, Nemotron fashions are optimized for cost-effective compute and run anyplace, from laptop computer to cloud.

Europe’s enterprises will be capable of run open, sovereign fashions on the Perplexity AI-powered search engine.

Get began with NVIDIA Nemotron.


Welsh translation: 

Ymestyn Ar Attracts yr Ynysoedd: Mae DU-LLM yn Dod â Deallusrwydd Artiffisial i Ieithoedd y DU Gyda NVIDIA Nemotron

Wedi’i hyfforddi ar yr uwch gyfrifiadur Isambard-AI, mae mannequin newydd a ddatblygwyd gan College School London, NVIDIA a Phrifysgol Bangor yn manteisio ar dechnegau a setiau knowledge ffynhonnell agored NVIDIA Nemotron i alluogi rhesymu Deallusrwydd Artiffisial ar gyfer y Gymraeg ac ieithoedd eraill y DU ar gyfer gwasanaethau cyhoeddus gan gynnwys gofal iechyd, addysg ac adnoddau cyfreithiol.

Ieithoedd Celtaidd — gan gynnwys Cernyweg, Gwyddeleg, Gaeleg yr Alban a Chymraeg — yw ieithoedd byw hynaf y DU. Er mwyn grymuso eu siaradwyr, mae menter Deallusrwydd Artiffisial sofran y DU-LLM yn adeiladu mannequin Deallusrwydd Artiffisial yn seiliedig ar NVIDIA Nemotron a all resymu yn Saesneg a Chymraeg hefyd, iaith a siaredir gan tua 850,000 o bobl yng Nghymru heddiw.

Bydd galluogi rhesymu Deallusrwydd Artiffisial o ansawdd uchel yn y Gymraeg yn cefnogi’r ddarpariaeth o wasanaethau cyhoeddus gan gynnwys gofal iechyd, addysg ac adnoddau cyfreithiol yn yr iaith.

“Rwyf am i bob cwr o’r DU allu harneisio manteision deallusrwydd artiffisial. Drwy alluogi deallusrwydd artiffisial i resymu yn y Gymraeg, rydym yn sicrhau bod gwasanaethau cyhoeddus — o ofal iechyd i addysg — yn hygyrch i bawb, yn yr iaith maen nhw’n byw ynddi,” meddai Prif Weinidog y DU, Keir Starmer. “Mae hon yn enghraifft bwerus o sut y gall y dechnoleg dddiweddaraf, wedi’i hyfforddi ar uwch gyfrifiadur deallusrwydd artiffisial mwyaf datblygedig y DU ym Mryste, wasanaethu lles y cyhoedd, amddiffyn treftadaeth ddiwylliannol a datgloi cyfleoedd ledled y wlad.”

Mae prosiect DU-LLM, a sefydlwyd yn 2023 fel BritLLM ac a arweinir gan College School London, wedi rhyddhau dau fodel ar gyfer ieithoedd y DU yn flaenorol. Mae ei fodel newydd ar gyfer y Gymraeg, a ddatblygwyd mewn cydweithrediad â Phrifysgol Bangor Cymru ac NVIDIA, yn cyd-fynd ag ymdrechion llywodraeth Cymru i hybu defnydd gweithredol o’r iaith, gyda’r nod o gyflawni miliwn o siaradwyr erbyn 2050 — menter o’r enw Cymraeg 2050.

Bydd darparwr cwmwl Deallusrwydd Artiffisial yn y DU, Nscale, yn sicrhau bod y mannequin newydd ar gael i ddatblygwyr trwy ei ryngwyneb rhaglennu rhaglenni (API).

“Y nod yw sicrhau bod y Gymraeg yn parhau i fod yn iaith fyw, sy’n anadlu ac sy’n parhau i ddatblygu gyda’r oes,” meddai Gruffudd Prys, uwch derminolegydd a phennaeth yr Uned Technolegau Iaith yng Nghanolfan Bedwyr, canolfan y brifysgol ar gyfer gwasanaethau, ymchwil a thechnoleg y Gymraeg. “Mae deallusrwydd artiffisial yn dangos potensial aruthrol i helpu gyda chaffael y Gymraeg fel ail iaith yn ogystal â galluogi siaradwyr brodorol i wella eu sgiliau iaith.”

Gallai’r mannequin newydd hwn hefyd roi hwb i hygyrchedd adnoddau Cymraeg drwy alluogi sefydliadau cyhoeddus a busnesau sy’n gweithredu yng Nghymru i gyfieithu cynnwys neu ddarparu gwasanaethau sgwrsfot dwyieithog. Gall hyn helpu grwpiau gan gynnwys darparwyr gofal iechyd, addysgwyr, darlledwyr, manwerthwyr a pherchnogion bwytai i sicrhau bod eu cynnwys ysgrifenedig yr un mor hawdd ar gael yn y Gymraeg ag y mae yn Saesneg.

Y tu hwnt i’r Gymraeg, mae tîm y DU-LLM yn anelu at gymhwyso’r un fethodoleg a ddefnyddiwyd ar gyfer ei fodel newydd i ddatblygu modelau Deallusrwydd Artiffisial ar gyfer ieithoedd eraill a siaredir ledled y DU fel Cernyweg, Gwyddeleg, Sgoteg a Gaeleg yr Alban — yn ogystal â gweithio gyda chydweithwyr rhyngwladol i adeiladu modelau ar gyfer ieithoedd o Affrica a De-ddwyrain Asia.

“Mae’r cydweithrediad hwn gydag NVIDIA a Phrifysgol Bangor wedi ein galluogi i greu knowledge hyfforddi newydd a hyfforddi mannequin newydd mewn amser file, gan gyflymu ein nod o adeiladu’r mannequin iaith gorau erioed ar gyfer y Gymraeg,” meddai Pontus Stenetorp, yr athro prosesu iaith naturiol a dirprwy gyfarwyddwr y Ganolfan Deallusrwydd Artiffisial yn College School London. “Ein nod yw cymryd y mewnwelediadau a gafwyd o’r mannequin Cymraeg a’u cymhwyso i ieithoedd lleiafrifol eraill, yn y DU ac ar attracts y byd.

Manteisio ar Seilwaith Deallusrwydd Artiffisial Sofran ar gyfer Datblygu Mannequin 

Mae’r mannequin newydd ar gyfer y Gymraeg yn seiliedig ar NVIDIA Nemotron, teulu o fodelau ffynhonnell agored sy’n cynnwys pwysau, setiau knowledge a ryseitiau agored.Mae’r tîm datblygu DU-LLM wedi manteisio ar fodel 49-biliwn-paramedr Llama Nemotron Tremendous a mannequin 9-biliwn-paramedr Nemotron Nano, gan eu hôl hyfforddi ar ddata iaith Gymraeg.

O’i gymharu ag ieithoedd fel Saesneg neu Sbaeneg, mae llai o ddata ffynhonnell ar gael yn y Gymraeg ar gyfer hyfforddiant Deallusrwydd Artiffisial. Felly, er mwyn creu set ddata hyfforddi Cymraeg ddigon mawr, defnyddiodd y tîm ficrowasanaethau NVIDIA NIM ar gyfer gpt-oss-120b a DeepSeek-R1 i gyfieithu setiau knowledge agored NVIDIA gyda dros 30 miliwn o gofnodion o’r Saesneg i’r Gymraeg.

Defnyddion nhw glwstwr GPU drwy blatfform NVIDIA DGX Cloud Lepton ac yn harneisio cannoedd o Uwchsglodion NVIDIA GH200 Grace Hopper ar Isambard-AI — uwchgyfrifiadur mwyaf pwerus y DU, gyda chefnogaeth £225 miliwn o fuddsoddiad gan y llywodraeth ac wedi’i leoli ym Mhrifysgol Bryste — i gyflymu eu llwythi gwaith cyfieithu a hyfforddi.

Mae’r set ddata newydd hon yn ategu knowledge presennol yr iaith Gymraeg o ymdrechion blaenorol y tîm.

Cipio Naws Ieithyddol Gyda Gwerthusiad Gofalus

Mae Prifysgol Bangor, sydd wedi’i lleoli yng Ngwynedd — y sir gyda’r ganran uchaf o siaradwyr Cymraegs — yn cefnogi datblygiad y mannequin newydd gydag arbenigedd ieithyddol a diwylliannol.

Mae Prys, o ganolfan Gymraeg y brifysgol, yn dod â thua dau ddegawd o brofiad gyda thechnoleg iaith ar gyfer y Gymraeg i’r cydweithrediad. Mae ef a’i dîm yn helpu i wirio cywirdeb knowledge hyfforddi a gyfieithir gan beiriannau a knowledge gwerthuso a gyfieithir â llaw, yn ogystal ag asesu sut mae’r mannequin yn ymdrin â naws Gymraeg y mae Deallusrwydd Artiffisial fel arfer yn cael trafferth â nhw — megis y ffordd y mae cytseiniaid ar ddechrau geiriau Cymraeg yn newid yn seiliedig ar eiriau cyfagos.

Disgwylir i’r mannequin, yn ogystal â’r setiau knowledge hyfforddiant a gwerthuso’r Gymraeg, fod ar gael i fentrau a’r sector cyhoeddus eu defnyddio, gan gefnogi ymchwil ychwanegol, hyfforddiant modelu a datblygu rhaglenni.

“Mae’n un peth cael y gallu Deallusrwydd Artiffisial hwn yn bodoli yn y Gymraeg, ond mae’n beth arall ei wneud yn agored ac yn hygyrch i bawb,” meddai Prys. “Gall y gwahaniaeth cynnil hwnnw fod y gwahaniaeth rhwng y dechnoleg hon yn cael ei defnyddio ai peidio.”

Defnyddio Modelau Deallusrwydd Artiffisial Sofran Gyda NVIDIA Nemotron, Microwasanaethau NIM

Gall y fframwaith a ddefnyddiwyd i ddatblygu mannequin DU-LLM ar gyfer y Gymraeg fod yn sylfaen ar gyfer datblygu Deallusrwydd Artiffisial amlieithog ledled y byd.

Mae modelau, knowledge a ryseitiau Nemotron, sy’n cyrraedd y brig, ar gael yn gyhoeddus i ddatblygwyr er mwyn iddynt adeiladu modelau rhesymu sydd wedi’u teilwra i bron unrhyw iaith, parth a llif gwaith. Wedi’u pecynnu fel microgwasanaethau NVIDIA NIM, mae modelau Nemotron wedi’u hoptimeiddio ar gyfer cyfrifiadura cost-effeithiol a rhedeg yn unrhyw le, o liniadur i’r cwmwl.

Bydd mentrau Ewrop yn gallu rhedeg modelau agored, sofran ar y peiriant chwilio Perplexity wedi’i bweru gan Ddeallusrwydd Artiffisial.

Dewch i ddechrau arni gyda NVIDIA Nemotron.



Supply hyperlink

Leave a Reply

Your email address will not be published. Required fields are marked *

news-1701

sabung ayam online

yakinjp

yakinjp

rtp yakinjp

slot thailand

yakinjp

yakinjp

yakin jp

yakinjp id

maujp

maujp

maujp

maujp

sabung ayam online

sabung ayam online

judi bola online

sabung ayam online

judi bola online

slot mahjong ways

slot mahjong

sabung ayam online

judi bola

live casino

sabung ayam online

judi bola

live casino

SGP Pools

slot mahjong

sabung ayam online

slot mahjong

SLOT THAILAND

berita 128000696

berita 128000697

berita 128000698

berita 128000699

berita 128000700

berita 128000701

berita 128000702

berita 128000703

berita 128000704

berita 128000705

berita 128000706

berita 128000707

berita 128000708

berita 128000709

berita 128000710

berita 128000711

berita 128000712

berita 128000713

berita 128000714

berita 128000715

berita 128000716

berita 128000717

berita 128000718

berita 128000719

berita 128000720

berita 128000721

berita 128000722

berita 128000723

berita 128000724

berita 128000725

artikel-128000751

artikel-128000752

artikel-128000753

artikel-128000754

artikel-128000755

artikel-128000756

artikel-128000757

artikel-128000758

artikel-128000759

artikel-128000760

artikel-128000761

artikel-128000762

artikel-128000763

artikel-128000764

artikel-128000765

artikel-128000766

artikel-128000767

artikel-128000768

artikel-128000769

artikel-128000770

artikel-128000771

artikel-128000772

artikel-128000773

artikel-128000774

artikel-128000775

artikel-128000776

artikel-128000777

artikel-128000778

artikel-128000779

artikel-128000780

artikel-128000781

artikel-128000782

artikel-128000783

artikel-128000784

artikel-128000785

artikel-128000786

artikel-128000787

artikel-128000788

artikel-128000789

artikel-128000790

artikel 128000791

artikel 128000792

artikel 128000793

artikel 128000794

artikel 128000795

artikel 128000796

artikel 128000797

artikel 128000798

artikel 128000799

artikel 128000800

artikel 128000801

artikel 128000802

artikel 128000803

artikel 128000804

artikel 128000805

artikel 128000806

artikel 128000807

artikel 128000808

artikel 128000809

artikel 128000810

artikel 128000811

artikel 128000812

artikel 128000813

artikel 128000814

artikel 128000815

artikel 128000816

artikel 128000817

artikel 128000818

artikel 128000819

artikel 128000820

article 138000756

article 138000757

article 138000758

article 138000759

article 138000760

article 138000761

article 138000762

article 138000763

article 138000764

article 138000765

article 138000766

article 138000767

article 138000768

article 138000769

article 138000770

article 138000771

article 138000772

article 138000773

article 138000774

article 138000775

article 138000776

article 138000777

article 138000778

article 138000779

article 138000780

article 138000781

article 138000782

article 138000783

article 138000784

article 138000785

article 138000786

article 138000787

article 138000788

article 138000789

article 138000790

article 138000791

article 138000792

article 138000793

article 138000794

article 138000795

article 138000796

article 138000797

article 138000798

article 138000799

article 138000800

article 138000801

article 138000802

article 138000803

article 138000804

article 138000805

article 138000806

article 138000807

article 138000808

article 138000809

article 138000810

article 138000811

article 138000812

article 138000813

article 138000814

article 138000815

article 138000716

article 138000717

article 138000718

article 138000719

article 138000720

article 138000721

article 138000722

article 138000723

article 138000724

article 138000725

article 138000726

article 138000727

article 138000728

article 138000729

article 138000730

article 138000731

article 138000732

article 138000733

article 138000734

article 138000735

article 138000736

article 138000737

article 138000738

article 138000739

article 138000740

article 138000741

article 138000742

article 138000743

article 138000744

article 138000745

article 228000341

article 228000342

article 228000343

article 228000344

article 228000345

article 228000346

article 228000347

article 228000348

article 228000349

article 228000350

article 228000351

article 228000352

article 228000353

article 228000354

article 228000355

article 228000356

article 228000357

article 228000358

article 228000359

article 228000360

article 228000361

article 228000362

article 228000363

article 228000364

article 228000365

article 228000366

article 228000367

article 228000368

article 228000369

article 228000370

article 228000371

article 228000372

article 228000373

article 228000374

article 228000375

article 238000461

article 238000462

article 238000463

article 238000464

article 238000465

article 238000466

article 238000467

article 238000468

article 238000469

article 238000470

article 238000471

article 238000472

article 238000473

article 238000474

article 238000475

article 238000476

article 238000477

article 238000478

article 238000479

article 238000480

article 238000481

article 238000482

article 238000483

article 238000484

article 238000485

article 238000486

article 238000487

article 238000488

article 238000489

article 238000490

article 228000376

article 228000377

article 228000378

article 228000379

article 228000380

article 228000381

article 228000382

article 228000383

article 228000384

article 228000385

article 228000386

article 228000387

article 228000388

article 228000389

article 228000390

article 228000391

article 228000392

article 228000393

article 228000394

article 228000395

article 228000396

article 228000397

article 228000398

article 228000399

article 228000400

article 228000401

article 228000402

article 228000403

article 228000404

article 228000405

article 238000492

article 238000493

article 238000494

article 238000495

article 238000496

article 238000497

article 238000498

article 238000499

article 238000500

article 238000501

article 238000502

article 238000503

article 238000504

article 238000505

article 238000506

article 238000507

article 238000508

article 238000509

article 238000510

article 238000511

article 238000512

article 238000513

article 238000514

article 238000515

article 238000516

article 238000517

article 238000518

article 238000519

article 238000520

article 238000521

sumbar-238000381

sumbar-238000382

sumbar-238000383

sumbar-238000384

sumbar-238000385

sumbar-238000386

sumbar-238000387

sumbar-238000388

sumbar-238000389

sumbar-238000390

sumbar-238000391

sumbar-238000392

sumbar-238000393

sumbar-238000394

sumbar-238000395

sumbar-238000396

sumbar-238000397

sumbar-238000398

sumbar-238000399

sumbar-238000400

sumbar-238000401

sumbar-238000402

sumbar-238000403

sumbar-238000404

sumbar-238000405

sumbar-238000406

sumbar-238000407

sumbar-238000408

sumbar-238000409

sumbar-238000410

news-1701