OpenML Wiktionary Data

Wiktionary Extractions for NLP & Linguistics

Welcome! As part of OpenML's passion for data, NLP, and machine learning, we've processed English Wiktionary data into a structured, machine-readable JSONL format. This project makes multilingual dictionary data accessible for linguists and ML practitioners alike.

Deutsch (German)

A West Germanic language spoken by over 130 million people, known for its complex grammar and compound nouns.

Download

Latinum (Latin)

The classical language of ancient Rome, which formed the foundation for all Romance languages spoken today.

Download

Ἑλληνική (Ancient Greek)

The language of Homer and classical Athenian philosophers, crucial to the development of Western civilization.

Download

한국어 (Korean)

Known for its unique alphabet, Hangul, considered one of the most scientific writing systems in the world.

Download

𐎠𐎼𐎹 (Old Persian)

One of the two oldest attested Old Iranian languages, preserved in cuneiform inscriptions from the Achaemenid Empire.

Download

𒀝𒅗𒁺𒌑(𒌝) (Akkadian)

An extinct East Semitic language that was spoken in ancient Mesopotamia, one of the earliest known written languages.

Download

Elamite

A language isolate spoken by the ancient Elamites in modern-day Iran. Its script remains largely undeciphered.

Download

संस्कृतम् (Sanskrit)

A classical language of South Asia that is the liturgical language of Hinduism, Buddhism, and Jainism.

Download