Poster IDIPS 2014
Document semantic hashing for hybrid security
Sébastien Eskenazi, Petra Gomez-Krämer, Jean-Marc Ogier
Document authentication and forgery prevention are easily achieved for digital documents thanks to hashing technologies. However, once a document is printed and scanned, this remains an open problem. My thesis aims to solve it by creating a hash algorithm for document images. Many challenges must be solved to meet the requirements of a proper hash algorithm. We are currently working on hashing images of text-only documents. The next step will be document layout hashing, and finally we will include image and signature hashing.
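As a minimal illustration of why hashing works well for purely digital documents, the sketch below hashes the raw bytes of a document with SHA-256 from Python's standard library. The sample byte strings and the helper name `document_digest` are assumptions made for illustration; they are not part of the thesis work. Because any change to the bytes changes the digest, a print-and-scan cycle (which alters every pixel) defeats this kind of hash, which is the problem the abstract describes.

```python
import hashlib


def document_digest(data: bytes) -> str:
    """Return the SHA-256 digest of a digital document as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical document contents, for illustration only.
original = b"Invoice total: 100 EUR"
tampered = b"Invoice total: 900 EUR"

# An identical copy yields the same digest; a one-character edit does not.
same_copy_matches = document_digest(original) == document_digest(bytes(original))
tamper_detected = document_digest(original) != document_digest(tampered)
print(same_copy_matches, tamper_detected)
```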
Shape-based Analysis for Segmentation of Arabic Handwritten Text
Amani T. Jamal, Ching Y. Suen
A system for camera-based complex map image retrieval using a multi-layer approach
Q.B. Dang, M.M. Luqman, M. Coustaty, N. Nayef, C.D. Tran, J.M. Ogier
Seam Carving for Text Line Extraction on Grayscale Historical Manuscripts
Nikolaos Arvanitopoulos, Sabine Süsstrunk
Binarization-free text line extraction for Historical Manuscripts. This is an accepted paper at the Digital Humanities conference in 2014. An extended version has been submitted to another conference. In the above work we present an approach to extract text lines from a historical manuscript without any need for prior binarization. Our approach is based on seam carving, which has been used in Computer Vision for image resizing. We use seam carving to compute seams that pass through parchment background between two consecutive text lines. We are able to successfully separate these lines without cutting through letter components. Our algorithm is robust to diverse manuscript pages.
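The core idea of a seam passing through the background between two text lines can be sketched with the classic dynamic-programming formulation of seam carving. This is a simplified sketch, not the authors' algorithm: it assumes the energy map is given directly (here, a toy grid where high values stand for ink), it finds a single left-to-right seam, and it does not constrain the seam to lie between two specific lines. The function name and the toy `page` are hypothetical.

```python
def min_energy_horizontal_seam(energy):
    """Return, for each column, the row index of the cheapest left-to-right seam.

    `energy` is a list of rows; a seam may move at most one row up or down
    between neighbouring columns.
    """
    rows, cols = len(energy), len(energy[0])
    # cost[r][c]: cheapest cumulative energy of any seam ending at (r, c)
    cost = [[0.0] * cols for _ in range(rows)]
    # back[r][c]: row of the predecessor pixel in column c - 1
    back = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        cost[r][0] = energy[r][0]
    for c in range(1, cols):
        for r in range(rows):
            candidates = [p for p in (r - 1, r, r + 1) if 0 <= p < rows]
            best = min(candidates, key=lambda p: cost[p][c - 1])
            cost[r][c] = energy[r][c] + cost[best][c - 1]
            back[r][c] = best
    # Start from the cheapest end point and walk back to the first column.
    r = min(range(rows), key=lambda i: cost[i][cols - 1])
    seam = [0] * cols
    for c in range(cols - 1, -1, -1):
        seam[c] = r
        r = back[r][c]
    return seam


# Toy energy map (assumption): rows 0 and 3 are text lines, the zeros are
# parchment background, and the 9s are ink intruding into the gap.
page = [
    [5, 5, 5, 5, 5],
    [0, 0, 9, 9, 9],
    [9, 0, 0, 0, 0],
    [5, 5, 5, 5, 5],
]
seam = min_energy_horizontal_seam(page)
print(seam)
```

On this toy page the seam follows the zero-energy background and steps down one row to avoid the ink, so the two text lines are separated without cutting through a letter component.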
Feature Selection for Historical Document Layout Analysis
Hao Wei, Kai Chen, Anguelos Nicolaou, Angelika Garz, Marcus Liwicki, Rolf Ingold
In this poster we propose a novel hybrid feature selection method for historical document layout analysis. An adapted greedy forward selection and a genetic selection are used in a cascade. We apply the proposed method to the task of historical document layout analysis on three handwritten datasets. A naive Bayes classifier is used to segment each page into four areas: periphery, background, text block, and decoration. Compared with several conventional feature selection methods, the proposed method is competitive with respect to the number of selected features and the resulting error rates.
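A cascade of greedy forward selection followed by a genetic selection could be sketched as below. This is only an illustration under several assumptions: the abstract does not describe how the greedy step is adapted or how the two stages are connected, so here the genetic stage simply searches subsets of the features kept by the greedy stage. The feature names, `toy_score` (a stand-in for classifier accuracy), and all parameters are hypothetical.

```python
import random

# Hypothetical features: name -> (information group, usefulness).
FEATURES = {
    "color_mean": ("color", 0.30),
    "color_var": ("color", 0.25),
    "gradient": ("edge", 0.20),
    "lbp": ("texture", 0.15),
    "noise": ("none", 0.0),
}


def toy_score(subset):
    """Stand-in for classifier accuracy: reward distinct groups, penalize size."""
    best_per_group = {}
    for name in subset:
        group, value = FEATURES[name]
        best_per_group[group] = max(best_per_group.get(group, 0.0), value)
    return sum(best_per_group.values()) - 0.05 * len(subset)


def greedy_forward(features, score):
    """Repeatedly add the single feature that improves the score the most."""
    selected, best = [], score([])
    improved = True
    while improved:
        improved = False
        for f in features:
            if f in selected:
                continue
            s = score(selected + [f])
            if s > best:
                best, best_f, improved = s, f, True
        if improved:
            selected.append(best_f)
    return selected


def genetic_refine(candidates, score, generations=20, pop_size=8, seed=0):
    """Search subsets of `candidates` with a small elitist genetic algorithm."""
    n = len(candidates)
    if n == 0:
        return []
    rng = random.Random(seed)

    def decode(mask):
        return [f for f, keep in zip(candidates, mask) if keep]

    def fitness(mask):
        return score(decode(mask))

    # The first individual keeps every candidate, i.e. the greedy solution.
    population = [tuple([True] * n)]
    population += [tuple(rng.random() < 0.5 for _ in range(n))
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]  # elitism: best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = list(a[:cut] + b[cut:])   # one-point crossover
            i = rng.randrange(n)
            child[i] = not child[i]           # single-bit mutation
            children.append(tuple(child))
        population = parents + children
    return decode(max(population, key=fitness))


greedy = greedy_forward(list(FEATURES), toy_score)
refined = genetic_refine(greedy, toy_score)
print(greedy, refined)
```

Because the greedy solution is part of the initial population and the best individuals always survive, the refined subset never scores worse than the greedy one under the same scoring function.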
Page Segmentation for Historical Document Images
Kai Chen, Hao Wei, Marcus Liwicki, Jean Hennebert, Rolf Ingold
In this poster, we present DIVADIA, a novel Document Image Analysis (DIA) framework which is part of an ongoing research project at the DIVA (Document, Image and Voice Analysis) research group. It aims at the development of a tool for semi-automatic analysis, in particular layout analysis, of historical documents. DIVADIA assists users in labeling parts of documents, such as text, images, and initials, by learning from their input: based on a few document images manually annotated by a user, the system learns a model of the document(s), which enables it to predict the labeling of an unseen page. It suggests a solution to the user, who validates it. Validation in this context refers to accepting or modifying the predicted solution. The validated data is then used to improve the prediction model in order to generate better solutions for further pages.
Automatic Comics Analysis, Annotation, and Indexing Using Graph-based Approach
Thanh-Nam Le, Jean-Marc Ogier, Jean-Christophe Burie, Muhammad Muzzamil Luqman
My current approach to comics indexing and retrieval is to represent characters and objects with graphs or hypergraphs, then use graph or hypergraph mining techniques to spot frequent patterns. Each panel or page is represented by a hypergraph; the most frequent patterns are then listed from the graph repository, under the hypothesis that frequently appearing characters correspond to frequent graph patterns. I hope to have prepared enough experimental data by then.
Automatic Modeling and Recognition of Heterogeneous Logical Structures from Digitized Business Documents
Louisa KESSI, Frank LEBOURGEOIS, Christophe GARCIA
My thesis is about automatic modeling and recognition of heterogeneous logical structures from digitized business documents. I started by establishing a complete state of the art of the domain (732 references and 250 pages), which will be the subject of a book in English to be submitted before the end of the month; a 27-page summary was also submitted to an international journal [IJDAR]. My second work is about exact image registration for form analysis; it will be submitted to ICIP and IEEE Transactions on Image Processing. Finally, I am working to develop an automatic modeling and pattern recognition approach that adapts to all types of documents without an explicit model. I chose to invest entirely in numerical approaches, and more specifically probabilistic methods, to describe the structures. I therefore have two difficult problems to overcome: pattern recognition without an explicit model by numerical methods, and the measurement of scaling. This is completely exploratory research whose success would have a significant impact on research as well as on the automation of business document processing.