Interview with Vincent Falconieri, trainee at the Computer Incident Response Center Luxembourg (CIRCL) and co-creator of the Total Recall program.
_How did you come to be interested in information security?
Vincent Falconieri: “At the National Institute of Applied Sciences of Lyon (INSA) there is a computer security club called INSECURITY. I joined it as soon as I arrived at the school.”
_Did you already know this area?
“Yes, but I had a rather utopian vision of it, which has since changed: a field that required being an expert “in everything”. It was this initial vision, together with the variety of topics covered in the sector, that led me to join the IT security club. For me, security was the “last rampart” to reach in IT. My vision has since changed: I have understood that it is important to have broad general knowledge and to be competent in every area, without necessarily being an expert in each.”
_Why did you join CIRCL?
“Within INSECURITY, a teacher, François Lesueur, is in charge of overseeing the activities of the club. He also acts as a mentor for students, especially to advise them on their career choices. Since he was already in contact with CIRCL, he suggested that I apply there.”
_How did your integration go?
“In a rather unexpected way! I was immediately asked to work on a MISP module, to test my skills before joining a larger project. This first period lasted between two weeks and a month. It was a fairly easy, well-scoped project and a very good start. I was able to find out who the go-to people were in each area, how the tools worked and what the underlying technologies were.”
_Where did you get the idea of Total Recall?
“On the one hand, I had previous experience before the Total Recall project. On the other hand, the team developing MISP at CIRCL had run into a problem with the correlation of images, which requires a much more complex technique than the correlation of textual elements. The closest thing to the tool they needed is Google’s reverse image search, but because of technical or legal constraints there was no solution that respected the free and open source approach. Using “fuzzy hashing” was considered first. However, I attended a talk at INSA by Pierre Letessier, who worked at the French National Audiovisual Institute (INA). His team had encountered the same problem, for example when measuring advertising time during football matches. They had chosen to rely on an older but effective technique, SIFT points. So I volunteered to work on the Total Recall project. The first step was to contact Pierre Letessier and ask him point-blank: “I would like to recreate a tool close to yours, but free. Do you have any advice?” He said yes almost instantly!”
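Editor's note: to give an idea of what a “fuzzy hash” of an image can look like, here is a minimal sketch of one well-known perceptual hash, the average hash. It is an illustration only, not the method used by Carl Hauser or Douglas Quaid, and the function names and the tiny 4x4 grayscale “images” are assumptions made for the example.

```python
def average_hash(pixels):
    """Return one bit per pixel: 1 if the pixel is brighter than the image mean.

    `pixels` is a small grayscale image given as a list of rows of ints (0-255).
    Real tools first shrink the picture to a fixed small size (for example 8x8).
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming_distance(hash_a, hash_b):
    """Count the positions at which two hashes of equal length differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


# Hypothetical 4x4 checkerboard used as the reference picture.
original = [
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
]
# Same picture, slightly brighter: the hash should not change.
brighter = [[min(255, value + 20) for value in row] for row in original]
# Inverted picture: every bit of the hash should flip.
inverted = [[255 - value for value in row] for row in original]

distance_brighter = hamming_distance(average_hash(original), average_hash(brighter))
distance_inverted = hamming_distance(average_hash(original), average_hash(inverted))
```

A small Hamming distance suggests two pictures are visually close. A hash of this kind tolerates brightness changes but not cropping or rotation, which is one reason a local-feature technique such as SIFT can be preferred.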
_How does it work?
“There are three elements: the State of the Art, Carl Hauser and Douglas Quaid. The State of the Art brings together the theories that seem to work. Carl Hauser provides the methods that Douglas Quaid relies on. And Douglas Quaid says whether two images are identical.
Starting from the theory, the State of the Art, we move to a sandbox phase, Carl Hauser, which contains the first implementations of the algorithms. Then, if a method works, we transfer it to Douglas Quaid, which will eventually be used in MISP. We decided to keep a human in the decision process in order to avoid creating too large a “gray zone” in which the algorithms would generate many false positives, … This coordination between the analyst and the algorithm (a kind of augmented intelligence) is necessary because, in many machine learning cases, it is possible to obtain a result without knowing the path that led to it, which can produce aberrant and yet invisible results.”
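Editor's note: the idea of keeping the analyst in the loop for a “gray zone” can be sketched as a three-way decision on a distance score. This is a hypothetical illustration and not Douglas Quaid's actual code: the names `Decision` and `decide`, the normalised distance in [0, 1] and both threshold values are assumptions.

```python
from enum import Enum


class Decision(Enum):
    MATCH = "match"
    NO_MATCH = "no match"
    ASK_ANALYST = "ask analyst"


def decide(distance, match_below=0.2, differ_above=0.6):
    """Classify a pair of images from a normalised distance in [0, 1].

    Scores between the two (hypothetical) thresholds fall in the gray zone
    and are handed to a human instead of being decided automatically.
    """
    if not 0.0 <= distance <= 1.0:
        raise ValueError("distance must be between 0 and 1")
    if match_below > differ_above:
        raise ValueError("match_below must not exceed differ_above")
    if distance <= match_below:
        return Decision.MATCH
    if distance >= differ_above:
        return Decision.NO_MATCH
    return Decision.ASK_ANALYST


# Hypothetical scores for three pairs of pictures.
scores = {"pair-a": 0.05, "pair-b": 0.40, "pair-c": 0.90}
decisions = {name: decide(score) for name, score in scores.items()}
for_review = [name for name, d in decisions.items() if d is Decision.ASK_ANALYST]
```

Moving the two thresholds closer together shrinks the gray zone but lets the algorithm decide more borderline cases alone; moving them apart sends more pairs to the analyst.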
Figure: General overview of how the different projects fit together
_Does the open source aspect solve this issue?
“There are several levels of opacity: for a public system, which gives results, we can have a first level of “legal” opacity hiding the techniques used to obtain the results. The system is a black box because you do not have the right to open it.
A second layer could be an opacity of knowledge: we have the right to open the box, but we cannot read what is inside. In this case, as we are the ones who design the product, we do not have this problem.
A third layer corresponds to technical or structural opacity. If the parameters or the functioning of the algorithm are opaque, even when the code is available, then we can only believe or disbelieve the result; we cannot discuss it. We can open the black box, read what is inside and understand it, but still not know how each decision was made. The algorithm itself is opaque during its execution.
Conversely, we wanted Douglas Quaid to make the whole decision-making process available. Analysts’ feedback could also be integrated into the algorithm. However, this feature is not yet present.”
_Is it possible to automate the improvement of the algorithm thanks to the MISP community?
“Indeed. If I do not have the skills or the time to improve the algorithm myself, members of the community will be able to improve it and make it evolve in one direction or another.”
_Why did you use this cinematographic reference?
“It should be seen as a joke that reflects how we progressed on this project. We started with a library named Carl Hauser, which was followed by a second one, Douglas Quaid. CIRCL also wanted to avoid the acronym problems encountered with MISP and AIL by giving projects more generic names.”
_What do you take away from your experience?
“One of the main benefits of my experience at CIRCL has been to see how a CERT works. I had already seen information security professionals working in companies. However, I had not yet seen such a synergy between individuals around their missions. Every day, we discuss sensitive topics, we deal with information that is equally important, which implies a heavy workload and mutual trust.”
_Have you found your way in the field of information security?
“It turns out that although I spent six months working in a CSIRT, I did not really do cyber security per se. My mission was to create software that aims to enhance security, which is an important nuance. It proves to me that you can do things that make as much sense without technically doing cyber security or incident response. Tool development, while not at the heart of the business, can improve the efficiency of analysts in their missions. This experience has allowed me to gain confidence in my software engineering capabilities while expanding my understanding of the field.”
_Do you wish to pursue your career in this area?
“As I said, what motivated me when I joined the INSA security club was the image of these experts who excelled in all areas. I have since learned that this is a mistaken view: what you need above all is the ability to adapt and to learn in order to deal with varied issues. Thanks to CIRCL, I now take a new element into account, which is the purpose of my work, its impact, its meaning. Much more than the technical challenge or the knowledge, it is the meaning and the usefulness for the community that matter to me. For six months I never asked myself why I was going to work.”
=> Presentation of Total Recall by Alexandre Dulaunoy : https://2019.pass-the-salt.org/files/slides/RUMP-ALEXANDRE-rumps-total-recall.pdf
This presentation of the project was made during the Pass the Salt conference
=> SOTA/State of the art : https://github.com/CIRCL/carl-hauser/blob/master/SOTA/SOTA.pdf
It includes information on the different algorithms, tests of their implementations, limits found, advantages, disadvantages, evolutions, ideas for improvements, etc.
=> VisJS-Classificator : https://github.com/Vincent-CIRCL/visjs_classificator
=> Carl-Hauser : https://github.com/CIRCL/carl-hauser
=> Douglas-Quaid : https://github.com/CIRCL/douglas-quaid
=> Douglas-Quaid documentation: https://github.com/CIRCL/douglas-quaid/blob/master/docs/code_doc/core_doc.pdf
Covers the implementation details of Douglas-Quaid precisely and in depth
=> Douglas-Quaid results: https://github.com/Vincent-CIRCL/douglas-quaid-results
Samples of the results obtained with Douglas Quaid and Carl Hauser. Also contains diagrams, explanations, etc.
=> Open-Data webpage of CIRCL : https://circl.lu/opendata/
=> Dataset “circl-ail-dataset-01” : https://www.circl.lu/opendata/circl-ail-dataset-01/
=> Dataset “circl-phishing-dataset-01” : https://circl.lu/opendata/datasets/circl-phishing-dataset-01/
=> PDF export module for MISP: https://github.com/MISP/misp-modules/blob/master/misp_modules/modules/export_mod/pdfexport.py and https://github.com/MISP/PyMISP/blob/master/pymisp/tools/reportlab_generator.py
=> Taxonomy MISP-darkweb : https://github.com/MISP/misp-taxonomies/blob/master/dark-web/machinetag.json
To extract the tags from a machinetag.json and make it a “readable” list, this script is available: https://gist.github.com/Vincent-CIRCL/fc80ccbe7ec408b54666fcc1ad6352e1
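For readers who want to see the idea without opening the gist, here is a minimal sketch of such an extraction. It is not the linked script: it assumes the usual MISP taxonomy layout (`namespace`, `predicates`, `values` with `entry` lists), and the sample namespace and entries below are invented for illustration.

```python
import json

# Hypothetical, shortened taxonomy in the usual machinetag.json layout.
SAMPLE = """
{
  "namespace": "example-ns",
  "predicates": [
    {"value": "topic", "expanded": "Topic"},
    {"value": "reviewed", "expanded": "Reviewed"}
  ],
  "values": [
    {
      "predicate": "topic",
      "entry": [
        {"value": "forum", "expanded": "Forum"},
        {"value": "marketplace", "expanded": "Marketplace"}
      ]
    }
  ]
}
"""


def list_tags(taxonomy):
    """Return the machine tags of a taxonomy as a flat, readable list."""
    namespace = taxonomy["namespace"]
    # Map each predicate name to its list of allowed entries, if any.
    entries_by_predicate = {
        block["predicate"]: block.get("entry", [])
        for block in taxonomy.get("values", [])
    }
    tags = []
    for predicate in taxonomy.get("predicates", []):
        name = predicate["value"]
        entries = entries_by_predicate.get(name, [])
        if not entries:
            # Predicate without values: the tag is just namespace:predicate.
            tags.append(f"{namespace}:{name}")
        for entry in entries:
            tags.append(f'{namespace}:{name}="{entry["value"]}"')
    return tags


tags = list_tags(json.loads(SAMPLE))
print("\n".join(tags))
```

To run it on a real taxonomy, replace `json.loads(SAMPLE)` with the parsed content of a downloaded `machinetag.json` file.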
=> Dataset paper : https://arxiv.org/pdf/1908.02449.pdf
=> VisJS-Classificator paper: https://arxiv.org/pdf/1908.02941.pdf
=> Carl-Hauser paper: https://arxiv.org/abs/1908.03449
=> Douglas-Quaid paper: https://arxiv.org/abs/1908.04014