We aim to build a vocabulary module that helps students memorize and expand their word bank. The module will let users collect words, and our personalized question bank can generate questions tailored to the vocabulary areas where each user is weakest.
Sunak had previously helped set up a basic Word module, which prompted us to think about where the dictionary data should come from.
Since dictionary data needs to be both extensive and authoritative, I looked for a suitable source and found that Wiktionary publishes its data as a free dump: https://dumps.wikimedia.org/enwiktionary/20231101/
The link points to the dump for that date; removing the date segment from the URL shows the listing of available dumps, which may include more recent ones. The file we need is “enwiktionary-20231101-pages-articles.xml.bz2”.
However, the file is huge: my download came to about 8 GB, far too large to open in an editor, let alone parse with a program.
I then came across a free API (https://api.dictionaryapi.dev/api/v2/entries/en/) that serves processed entries derived from that XML, which saves us the XML-parsing work. I therefore modified the MNWord structure to match the data format returned by this endpoint.
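To make the mapping concrete, here is a rough Swift sketch of how MNWord could mirror the JSON that endpoint returns (an array of entries per word, each with phonetics and meanings). The field names and optionality are assumptions based on the API's typical response; the real MNWord in our codebase may differ.

```swift
import Foundation

// Sketch of MNWord aligned with the dictionaryapi.dev response shape.
// Field names follow the JSON keys; optionality is a guess, since not
// every entry carries every field.
struct MNWord: Codable {
    struct Phonetic: Codable {
        let text: String?    // IPA transcription
        let audio: String?   // URL of a pronunciation clip, sometimes empty
    }

    struct Definition: Codable {
        let definition: String
        let example: String?
        let synonyms: [String]?
        let antonyms: [String]?
    }

    struct Meaning: Codable {
        let partOfSpeech: String
        let definitions: [Definition]
    }

    let word: String
    let phonetic: String?
    let phonetics: [Phonetic]
    let meanings: [Meaning]
}
```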
The next challenge was how to populate our word data, since we cannot import the entire dictionary up front. My solution is to fetch it on demand: each time a user saves a question, the system calls this API, converts the response into our MNWord format, and stores both the entry and its audio in the database.
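A hypothetical sketch of that save-time flow, reusing the MNWord sketch above and a placeholder WordStore protocol standing in for our real database layer:

```swift
import Foundation

// Placeholder for whatever persistence layer we actually use.
protocol WordStore {
    func save(entry: MNWord, audio: Data?) throws
}

enum WordFetchError: Error { case notFound }

// On-demand lookup: fetch the entry for `word`, decode it into MNWord,
// download the first available audio clip, and hand both to the store.
func fetchAndStore(word: String, into store: WordStore) async throws {
    let url = URL(string: "https://api.dictionaryapi.dev/api/v2/entries/en/")!
        .appendingPathComponent(word)
    let (data, _) = try await URLSession.shared.data(from: url)

    // The endpoint returns an array of entries; keep the first one.
    guard let entry = try JSONDecoder().decode([MNWord].self, from: data).first else {
        throw WordFetchError.notFound
    }

    // Download the first non-empty audio URL, if there is one.
    var audioData: Data?
    if let audioURLString = entry.phonetics.compactMap(\.audio).first(where: { !$0.isEmpty }),
       let audioURL = URL(string: audioURLString) {
        let (clip, _) = try await URLSession.shared.data(from: audioURL)
        audioData = clip
    }

    try store.save(entry: entry, audio: audioData)
}
```

Caching the entry and its audio at save time means we never need the 8 GB dump locally, and repeated reviews of the same word read from our own database instead of hitting the API again.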