Prof Aly Fahmy, Dr Wael Gomaa

THE 9th INTERNATIONAL CONFERENCE ON INFORMATICS
AND SYSTEMS
(INFOS2014)
Tutorial
Wednesday, 17th December, 2014, 12:40pm-2:40pm
Cairo University Conference Center
(At Student City of Cairo University, Ahmed Zewail Street, Orman, Dokki)
Tapping into the Power of Text Similarity
Prof. Aly Fahmy
Dr. Wael Gomaa
Abstract
Measures of text similarity have been used for a long time in applications in natural language processing,
including Information Retrieval, Document Clustering, Word Sense Disambiguation, Machine Translation,
Text Summarization and Short Answer Grading. This tutorial will introduce the theoretical and practical
aspects of String-based, Knowledge-based and Corpus-based similarity measures. It will also describe how
these measures can be used in the Short Answer Grading task.
This two-hour tutorial will be divided into four sections.
The First Section
• String-Based similarity measures that depend on string metrics will be introduced.
• The practical part of this section will cover SimMetrics, an open-source library of
similarity measures.
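As a rough illustration of what a string metric computes, the sketch below implements Levenshtein edit distance and a normalized similarity derived from it. It is plain Python written for this outline and does not use or reproduce the SimMetrics API; the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Scale the distance into [0, 1], where 1.0 means identical strings.
    Dividing by the longer length is one common normalization (an assumption here)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

Character-based metrics like this one capture spelling-level overlap only; two words with the same meaning but different spelling will still score low.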
The Second Section
• Knowledge-Based Similarity will be introduced. It is a kind of semantic similarity measure that identifies the
degree of similarity between words using information derived from semantic networks.
• The practical part of this section will cover the WordNet::Similarity package. It covers a broad range of
relationships between concepts, such as is-a-type-of, is-a-specific-example-of, is-a-part-of, and is-the-opposite-of.
• WordNet::Similarity implements six measures of semantic similarity. Three of these measures are based
on information content: Res, Lin and Jcn. The other three measures are based on path length: Lch,
Wup and Path. It also implements three measures of semantic relatedness: Hso, Lesk and Vector.
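To give a feel for the path-length family, the sketch below computes Path-style and Wup-style scores over a tiny hand-made is-a hierarchy. The taxonomy and all function names are hypothetical, and the code does not call WordNet or WordNet::Similarity. It assumes the usual definitions: Path as the inverse of the number of nodes on the shortest path, and Wup as twice the depth of the lowest common subsumer divided by the sum of the two concept depths.

```python
# Hypothetical is-a taxonomy: child -> parent (the root has parent None).
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def ancestors(concept):
    """Path from the concept up to the root, starting with the concept itself."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Number of nodes from the root down to the concept (the root has depth 1)."""
    return len(ancestors(concept))


def lowest_common_subsumer(a, b):
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    b_ancestors = set(ancestors(b))
    for node in ancestors(a):  # walks upward, so the first hit is the deepest
        if node in b_ancestors:
            return node
    raise ValueError("concepts share no ancestor")


def path_similarity(a, b):
    """Inverse of the number of nodes on the shortest is-a path between a and b."""
    lcs = lowest_common_subsumer(a, b)
    nodes_on_path = (depth(a) - depth(lcs)) + (depth(b) - depth(lcs)) + 1
    return 1.0 / nodes_on_path


def wup_similarity(a, b):
    """Wu-Palmer style score: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))
```

In this toy hierarchy "dog" and "cat" meet at "animal", while "dog" and "car" meet only at the root, so the first pair scores higher under both measures.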
The Third Section
• Corpus-Based Similarity that detects the similarity between words according to the information gained
from large corpora will be introduced.
• There are many corpus-based similarity techniques, such as Latent Semantic Analysis (LSA), Explicit
Semantic Analysis (ESA), Pointwise Mutual Information-Information Retrieval (PMI-IR), and Extracting
DIStributionally similar words using CO-occurrences (DISCO).
• The practical part of this section will cover the Gensim, SEMILAR, and DISCO packages.
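The idea behind pointwise mutual information can be sketched with a few lines of Python. The example below estimates probabilities from document-level co-occurrence in a small made-up corpus; PMI-IR proper obtains its counts from search engine queries, so this is a simplified stand-in. The corpus and the function name are hypothetical, and none of the packages named above is used.

```python
import math

# Hypothetical toy corpus; each string is treated as one document.
CORPUS = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the dog barked",
    "stocks fell on monday",
]


def pmi(word_a, word_b, documents):
    """PMI = log2( p(a, b) / (p(a) * p(b)) ), where each probability is the
    fraction of documents containing the word (or both words)."""
    token_sets = [set(doc.split()) for doc in documents]
    n = len(token_sets)
    count_a = sum(word_a in tokens for tokens in token_sets)
    count_b = sum(word_b in tokens for tokens in token_sets)
    count_ab = sum(word_a in tokens and word_b in tokens for tokens in token_sets)
    if count_ab == 0:
        # The words never co-occur (this also covers words absent from the corpus).
        return float("-inf")
    return math.log2((count_ab / n) / ((count_a / n) * (count_b / n)))
```

A positive score means the two words appear together more often than chance would predict, zero means they look independent, and a negative score means they tend to avoid each other. With a corpus this small the estimates are only illustrative.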
The Fourth Section

• The short answer grading task will be introduced as an application of text similarity.
• From a learning point of view, short-answer tests are more effective than multiple-choice tests. An
automatic short answer grading system automatically assigns a grade to a student's
answer by comparing it with one or more correct answers.
• The practical part of this section will briefly cover some experiments on three data sets that handle Data
Structure, Environmental Science and Philosophy courses, and their results for both the Arabic and English
languages.
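The comparison step of a grading system can be sketched as follows: score the student answer against each correct answer, keep the best match, and scale it to the grade range. This is a minimal illustration, not the method used in the experiments; the token-overlap (Jaccard) similarity, the function names, and the default maximum grade of 5 are all assumptions, and any of the measures from the earlier sections could be plugged in instead.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]; a simple stand-in for any text similarity measure."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def grade(student_answer, model_answers, max_grade=5.0, similarity=jaccard):
    """Grade = best similarity against any model answer, scaled to max_grade."""
    if not model_answers:
        raise ValueError("at least one model answer is required")
    best = max(similarity(student_answer, model) for model in model_answers)
    return round(best * max_grade, 2)
```

Taking the maximum over several correct answers lets a student be credited for matching any acceptable phrasing rather than one fixed reference.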
This tutorial presumes no prior knowledge of text similarity measures, and so should be accessible to
anyone with an interest in the topic.
Prof. Aly Fahmy
Prof. Aly Aly Fahmy is the former Dean of the Faculty of Computers & Information, Cairo University. His research
interests are in Artificial Intelligence topics such as natural language processing, data and text mining, and
information retrieval. Prof. Aly Fahmy has a number of publications. He obtained his B.Sc. in June 1972 from the
Computer Engineering Department, Military Technical College (M.T.C), with an Excellent with Honor grade;
a Diploma (DPL) in General Purpose Simulation in June 1973 from the Computer Department, Military Technical College
(M.T.C); and an M.Sc. in Logical Database Systems in 1976 from the Computer Department, E.N.S.A.E, Toulouse, France.
He received his Ph.D. in Artificial Intelligence, Control of Automatic Deductions for Logic Based Systems, in 1979 from the Computer
Department, The Centre of Research and Studies (C.E.R.T), Toulouse, France, under the supervision of H.
Gallaire (former Vice President and Chief Technical Officer of Xerox Corporation) and J.M. Nicolas.
Dr. Wael Hassan Gomaa
Dr. Gomaa currently works as a lecturer in the Computer Science Department, Modern Academy for Computer Science
& Management Technology, Cairo, Egypt. He recently obtained his PhD degree from the Faculty of Computers
and Information, Cairo University, Egypt, in the field of Automatic Assessment under the supervision of Prof.
Aly Aly Fahmy. He received his BSc and Master's degrees from the Faculty of Computers and Information,
Helwan University, Egypt. His master's thesis was entitled “Text Mining Hybrid Approach for Clustered
Semantic Analysis”. His research interests include Natural Language Processing, Artificial Intelligence,
Data Mining and Text Mining.