Objective: A challenging R&D job in big data.
French private university. Master of Science Degree in Computer Science and Engineering. Specialization in Scientific Computing and Image Processing.
EPITA, 94270 Le Kremlin-Bicêtre, France.
Student Researcher: about a dozen students (from the top decile of their class) take part in research projects alongside their studies, supervised by researchers. This implies:
I worked for 6 months on Decision Diagram distribution and on the design of a Generic Decision Diagram Library.
EPITA (French Mathématiques Supérieures, Mathématiques Spéciales).
(French Baccalauréat S), major in Mathematics, with honors.
Data Publica is a company focused on building datasets. It creates datasets by identifying sources, extracting data from these sources, transforming unstructured data into structured data, and finally delivering the data to the end customer.
Plizy is an application for discovering, enjoying and sharing videos. An efficient discovery system requires a good understanding of both users and videos. Video metadata are extracted by dedicated scrapers. The challenging part is understanding users, which relies on data gathered from users watching videos on the platform, but also on data from other sources such as Facebook. Data acquired from Facebook gives insight into users' interests and their relationships with other users, which helps users discover new videos based on what they like. Data from our own platform is the best way to understand what kind of videos a user actually watches, and thus to recommend new videos based on viewing history.
Twenga is a price comparison site. To compare similar offers, the product reference and the product price must be extracted efficiently. The product picture and category must also be extracted to present the results to the end user. Using structural and semantic analysis, we were able to present a set of candidate values to an operator, who would then select the correct ones within seconds.
To process documents correctly, an OCR engine must apply language-specific data files to a correctly oriented image (rotation of 0, 90, 180 or 270 degrees). The goal of the internship was to develop a tool that detects the orientation, script and language of an image. The tool had to be extensible: any script or language could be added to the training data while keeping accuracy near 100%. The work relied on some of the high-level components of Tesseract, an open-source OCR engine developed by Google, together with clustering and energy-minimization techniques.
Bouygues Telecom needed a tool to represent its large Information System, which is stored in a CMDB and modeled by an M1 meta-model. The goal of the internship was to develop a platform to represent the Information System under two constraints: genericity with respect to models and meta-models, and extensibility through the Eclipse Rich Client Platform and its plug-in system.
Decision Diagrams are now widely used in model checking as extremely compact representations of state spaces. Many Decision Diagram categories have been developed over the past twenty years based on the same principles. Each one targets a specific domain with its own characteristics, and each one provides its own definition. This prevents sharing concepts and techniques between these structures. This paper proposes a basis for a common Framework for Decision Diagrams. It should help users of this technology define new Decision Diagram categories through a simple specification mechanism called Controller. This enables the building of efficient Decision Diagrams dedicated to a given problem. http://www.computer.org/portal/web/csdl/doi/10.1109/ACSD.2010.17
Decision diagrams are structures used to represent large data sets. Data common to several elements of the set are shared, which makes the representation highly compact in memory. Various types of Decision Diagrams exist, each with its own implementation. In this report, the Decision Diagram Library is presented. This library generalizes the concept of Decision Diagram in order to implement every possible type of Decision Diagram. Because algorithms are hard to define on Decision Diagrams, this report also presents work on adding high-level dynamic structures on top of Decision Diagrams, along with algorithms frequently used on these structures. http://lrde.epita.fr/~charron/index.php?id=seminar08
Decision diagrams are structures used in several domains where memory usage is critical. Data Decision Diagrams (DDD), for example, are a kind of decision diagram used in model checking. However, the solution they bring to the memory problem is not always sufficient. One way to overcome the memory limit is to distribute memory. Some implementations exist for BDDs (Binary Decision Diagrams), but they are neither very efficient nor maintained. In this report, new distribution algorithms for decision diagrams are presented, based on DDD properties. An Erlang implementation of a distributed DDD package is described, and results on distribution obtained with this implementation are given and discussed. http://lrde.epita.fr/~charron/index.php?id=seminar06
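The sharing principle mentioned in these abstracts (sub-structures common to several elements of a set are stored only once) can be illustrated with a minimal Python sketch. It assumes a toy decision diagram over equal-length integer sequences, made canonical by a hash-consing unique table; the names `DecisionDiagram`, `build`, `count`, `ONE` and `ZERO` are hypothetical, and this is not the design of the Decision Diagram Library or the Erlang DDD package described above.

```python
# Hypothetical toy example: not the API of any library described in this CV.
ONE = ("ONE",)    # terminal: the set containing only the empty sequence
ZERO = ("ZERO",)  # terminal: the empty set


class DecisionDiagram:
    """Toy hash-consed decision diagram over equal-length integer sequences.

    Each internal node is a tuple (level, ((value, child), ...)). The unique
    table guarantees that structurally equal sub-diagrams are stored once.
    """

    def __init__(self):
        self.unique = {}  # unique table: node key -> canonical node

    def node(self, level, arcs):
        # Drop arcs leading to the empty set and sort the rest by value,
        # so that two equal nodes always produce the same key.
        arcs = tuple(sorted((v, c) for v, c in arcs.items() if c != ZERO))
        if not arcs:
            return ZERO
        key = (level, arcs)
        # setdefault returns the already stored node when one exists.
        return self.unique.setdefault(key, key)

    def build(self, seqs):
        """Build the diagram for a set of sequences of identical length."""
        seqs = set(seqs)
        if not seqs:
            return ZERO
        if seqs == {()}:
            return ONE
        groups = {}
        for s in seqs:
            # Group sequences by first value; keep the remaining suffix.
            groups.setdefault(s[0], set()).add(s[1:])
        level = len(next(iter(seqs)))  # remaining sequence length
        return self.node(level, {v: self.build(rest) for v, rest in groups.items()})


def count(node):
    """Number of sequences encoded by a node (no memoization, for brevity)."""
    if node == ONE:
        return 1
    if node == ZERO:
        return 0
    return sum(count(child) for _, child in node[1])


if __name__ == "__main__":
    dd = DecisionDiagram()
    root = dd.build([(0, 0), (0, 1), (1, 0), (1, 1)])
    print(len(dd.unique), count(root))  # prints "2 4"
```

Building the four sequences (0, 0), (0, 1), (1, 0) and (1, 1) stores only two internal nodes, because both branches of the root point to the same shared child, whereas a plain trie would store three. Real decision diagram packages add operation caches and garbage collection on top of this idea, which this sketch deliberately omits.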