Wiktionary Extractions for NLP & Linguistics
Welcome! As part of OpenML's passion for data, NLP, and machine learning, we've processed English Wiktionary data into a structured, machine-readable JSONL format. This project makes multilingual dictionary data accessible for linguists and ML practitioners alike.
Deutsch (German)
A West Germanic language spoken by over 130 million people, known for its complex grammar and compound nouns.
Latinum (Latin)
The classical language of ancient Rome, which formed the foundation for all Romance languages spoken today.
Ἑλληνική (Ancient Greek)
The language of Homer and classical Athenian philosophers, crucial to the development of Western civilization.
한국어 (Korean)
Known for its unique alphabet, Hangul, considered one of the most scientific writing systems in the world.
𐎠𐎼𐎹 (Old Persian)
One of the two oldest attested Old Iranian languages, preserved in cuneiform inscriptions from the Achaemenid Empire.
𒀝𒅗𒁺𒌑(𒌝) (Akkadian)
An extinct East Semitic language that was spoken in ancient Mesopotamia, one of the earliest known written languages.
Elamite
A language isolate spoken by the ancient Elamites in modern-day Iran. Its script remains largely undeciphered.
संस्कृतम् (Sanskrit)
A classical language of South Asia that is the liturgical language of Hinduism, Buddhism, and Jainism.