

New corpus for automatic summarisation

It is nice to see that people still create resources for automatic summarisation. A few days ago, the Essex Arabic Summaries Corpus (EASC) was announced on the Corpora list. The corpus contains 153 Arabic articles and 765 human-generated extractive summaries produced using Mechanical Turk. The authors state that the corpus contains copyrighted material and that it is the responsibility of users to make sure they comply with the legislation in their country. Unfortunately, no further information is given about what material was used. The annotation is freely available for research purposes and is distributed under the Creative Commons Attribution-ShareAlike license. As a bonus, the corpus comes with an Arabic version of ROUGE.

The corpus cannot be downloaded directly. Instead, the author needs to be contacted.

Posted in resource.



One Response


  1. Wael Sharaf says

    This is really good, given the lack of Arabic resources, especially for Arabic summarization.
    I contacted the author and the reply was really quick. I exchanged some emails with him, as I asked for the word count of the corpus, which is more than 18,000 distinct words.
    The corpus is simple: the articles and summaries are divided into topics.
    I still haven’t tried the Arabic version of ROUGE, but it’s rather interesting to have it running and available for Arabic, as he’s also providing the XML configuration file.

    Best,
    W S






This work is licenced under a Creative Commons Licence.