Learning Goals
Based on the experience from earlier runs of the course, I have designed this year's edition to serve two different goals:
- impart fundamental concepts of contemporary Natural Language Processing tasks, tools and applications and
- train elementary techniques of scientific work, i.e.
  - formulating research questions,
  - conducting literature studies,
  - presenting scientific results and
  - writing a scientific text in its different phases: drafting, revising, reviewing.
In particular, the course aims to develop your ability to carry out these steps actively. It therefore places the emphasis on self-study rather than lectures, and on defining your own research interests rather than repeating the ones from the literature.
Content
According to the two goals mentioned above, the course will consist of two parts: a reading club and a writing club. To learn about different approaches to NLP, the reading club will study essential chapters from the draft version of the 3rd edition of
Daniel Jurafsky and James H. Martin (forthcoming): Speech and Language Processing, 3rd edition, version of Jan. 7th, 2023.
If you are interested in additional information about neural network approaches in general, I recommend
Ian Goodfellow, Yoshua Bengio, Aaron Courville: Deep Learning. MIT Press, 2016,
available at https://www.deeplearningbook.org/.
A good source of information about neural networks in NLP is Chris Manning's lecture
CS224N: NLP with Deep Learning at Stanford University, available on YouTube.
To prepare our reading club sessions, you will be given reading assignments. You are expected to study the corresponding chapters before our session starts and to post your questions about them on our shared blackboard. During the session we will try to answer these questions in a collaborative effort. Collecting the questions before the online meetings should help us to organize the discussions in a structured manner.
NLP has grown considerably in terms of methods, tasks and application areas, and this development is closely mirrored in the book, which has itself grown at an increasing rate from edition to edition. I am fully aware that the reading load I impose on you is quite heavy, but I doubt that it can be reduced significantly without missing important aspects and ideas from the field. I have deliberately scheduled the meetings of our reading club so that there is always a full day between two consecutive sessions, allowing you to prepare the upcoming session properly. In addition, I recommend that you start reading as early as possible, in particular
- to get a first impression of the broad spectrum of research goals and approaches of NLP,
- to be able to understand the many cross-relationships between the different areas,
- to identify the areas which seem most interesting to you and/or which are most relevant for your PhD project,
- to get familiar with the state of the art in your favourite areas as a good starting point for your talk and the writing assignments.
Of course, the selection of topics I have proposed is a biased one. I have chosen the chapters according to my individual preferences and my personal areas of expertise. If you feel that I have left out an important issue, please tell me. We will certainly find a way to better meet your specific interests. Similarly, the dates for the meetings can be subject to discussion and modification, although I have to admit that the leeway for adjustments is much more restricted in this case. We will discuss these questions at our first meeting.
In the second phase of the course, the writing club, you will be asked to formulate your individual (research) questions about your favourite topic. Such questions can address, for example,
- selected aspects which you found particularly difficult to comprehend or highly surprising,
- the reasons why one method performs better (or worse) than another one,
- the most important design decisions which made an approach feasible,
- the limitations of an approach and possible ways to overcome them,
- the applicability of an approach under unfavourable conditions (e.g. limited computational resources, limited or low-quality data sources, or the necessity for online processing).
The relationship of the topic to the treatment of one (or several) Ethiopian languages will always be of great importance.
Eventually, you will present your questions and your answers to them both orally as a talk and in written form as an essay. Good talks and essays always result from repeated rounds of drafting and revising. We will try to perform at least one full cycle of this process. To help you improve your skills, you will receive feedback from both your peers and me, specific to the current phase of the drafting and writing process. In particular, you will be asked to review
- the draft of an essay written by one of your peers (in order to help her/him to improve it) and
- one final version (in order to inform a fictional program committee of a fictional conference or journal about the strengths and weaknesses of the contribution).
Grading
The overall grade for the course will be a weighted sum of the grades for your different contributions in three areas:
- the active participation in our sessions (30%), i.e.
  - asking and answering questions about the reading assignments,
  - your talk,
  - your participation in the discussion of other talks and
  - the feedback you give on the other talks,
- your two peer reviews of other participants' essays, one of a draft and one of a final version (20%) and
- the final version of your essay (50%).
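To illustrate how the three weights combine, here is a minimal sketch of the weighted sum. Only the weights (30%, 20%, 50%) are taken from the scheme above. The 0-100 grade scale, the area names and the function `overall_grade` are assumptions made for this example only; they say nothing about the scale actually used or about how the individual contributions within one area are combined.

```python
# Weights as stated in the grading scheme above.
WEIGHTS = {"participation": 0.30, "peer_reviews": 0.20, "essay": 0.50}


def overall_grade(grades):
    """Return the weighted sum of the grades for the three areas.

    `grades` maps each area name in WEIGHTS to a numeric grade.
    The scale is an assumption of this sketch, not part of the course rules.
    """
    missing = WEIGHTS.keys() - grades.keys()
    if missing:
        raise ValueError(f"missing grades for: {sorted(missing)}")
    return sum(WEIGHTS[area] * grades[area] for area in WEIGHTS)


# Hypothetical grades on an assumed 0-100 scale.
example = {"participation": 80, "peer_reviews": 70, "essay": 90}
print(round(overall_grade(example), 1))
```

With these hypothetical grades the result is 0.30 * 80 + 0.20 * 70 + 0.50 * 90 = 24 + 14 + 45 = 83. Because the essay carries half of the total weight, a change in the essay grade affects the overall result more than the same change in either of the other two areas.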
Blackboard access
Technically, our blackboard is an Etherpad, an interactive, text-based tool for group collaboration. It allows all participants to actively contribute to, access and revise its content. In contrast to the chat channel of a video conferencing system, an Etherpad is a persistent means of information distribution, ideally tailored to the needs of a collaborative learning group.
To be able to access the private Etherpad for our course, you have to register first. This is a four-step procedure:
- I will invite you to register as a guest on the server (mafiasi.de) where the Etherpad is hosted and maintained by our Computer Science students. You will receive a mail with the invitation (unfortunately in German only).
- You have to accept the invitation by choosing an individual password and clicking on the registration link in the mail.
- Afterwards I will invite you (again via the mafiasi.de server) to become a member of the Etherpad group Addis2023, which is the group for the interaction within our course.
- Again you have to accept the invitation.
After accepting both invitations, you should be able to access our (private) group pad blackboard.
If you have difficulties registering, or if you are in doubt whether the links in the mail are trustworthy, you can send me a copy of the invitation mail and I will most likely be able to give you further advice. The mails are system-generated and unfortunately I do not know exactly what the system is sending to you. We can also try to solve possible problems during our first session. Alternatively, you can ask the participants of last year's round for help.
Try writing to the Etherpad and revising existing content, but respect other people's entries.
--
WolfgangMenzel - 02 Feb 2023