PhD Project in Information and Communication Technology
|Thesis title:||Generalized Association Rule Mining – Improving Measures|
|PhD Programme:||Information and Communication Technology|
|Supervisor:||Dirk Draheim, Professor|
|Co-Supervisor:||Sadok Ben Yahia, Professor|
|Offered by:||School of Information Technologies, Department of Software Science|
|Industrial Partner:||Elme Messer Gaas|
PhD Scholarship and Early Stage Researcher Position
The successful candidate will receive a tax-free PhD scholarship of 660 EUR and will be employed as an early stage researcher (for 4 years) at the Department of Software Science with a gross salary of 400-1000 EUR per month (depending on negotiations).
In today’s practice, major chunks of data analytics are still done rather interactively (OLAP, MOLAP with tools such as Cognos, SAP-BW; and related techniques such as conjoint analysis); i.e., they do not exploit emerging machine learning (AI) approaches. Association rule mining takes a middle position in this gap between interactive and automatic data analytics, which explains the current huge success of tools such as RapidMiner (association rule mining is a highly active data science and AI research field¹). Unfortunately, current association rule mining suffers from two categories of shortcomings (both theoretically and practically, i.e., in the tool landscape). First, association rule mining works only for discrete-valued columns, i.e., numerical-valued columns cannot be handled, which is a painful shortcoming in practical scenarios. Second, and equally relevant, there is still a lack of adequate measures to cut down the size of the reports stemming from association rule mining. Exhaustive reports that encompass drill-ins of all possible combinations in a multi-factor analysis are simply too large to be practically useful for data analysts. Even long-established theoretical measures [TKS04] have not been taken up by practitioners and do not show up in existing data analytics tools.
The objective is to overcome the two shortcomings of association rule mining described above. First, association rule mining will be generalized to numerical data and their aggregates (conditional expected values); second, adequate measures to practically cut down the complexity of multi-factor analysis reports will be developed. For these endeavors, advanced concepts in partial conditionalization are essential. The necessary know-how exists in the Taltech Information Systems group [Dra17; Dra18]. Deep expertise in association rule mining exists in the Taltech Data Science Group; compare [BSYN14; SLY14].
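As a rough illustration of the first objective, the following minimal sketch treats the confidence of a classical association rule as the conditional mean of a 0/1 indicator column and applies the same computation to a numerical column. The toy rows, the column names (`region`, `segment`, `bought_argon`, `revenue`) and the helper functions are hypothetical and only serve the illustration; they are not taken from the project or from any existing tool.

```python
# Hypothetical toy data: each row is one customer record.
rows = [
    {"region": "north", "segment": "medical", "bought_argon": 1, "revenue": 100.0},
    {"region": "north", "segment": "medical", "bought_argon": 0, "revenue": 200.0},
    {"region": "north", "segment": "food", "bought_argon": 1, "revenue": 300.0},
    {"region": "south", "segment": "medical", "bought_argon": 1, "revenue": 400.0},
]


def matches(row, condition):
    """True if the row satisfies every column == value pair in the condition."""
    return all(row[column] == value for column, value in condition.items())


def support(rows, condition):
    """Fraction of all rows that satisfy the condition."""
    return sum(matches(row, condition) for row in rows) / len(rows)


def confidence(rows, antecedent, consequent):
    """Classical confidence: share of antecedent rows that also satisfy the consequent."""
    covered = [row for row in rows if matches(row, antecedent)]
    if not covered:
        return None  # the rule covers no rows
    return sum(matches(row, consequent) for row in covered) / len(covered)


def conditional_mean(rows, antecedent, target):
    """Mean of a numerical target column over the rows satisfying the antecedent."""
    covered = [row for row in rows if matches(row, antecedent)]
    if not covered:
        return None  # the rule covers no rows
    return sum(row[target] for row in covered) / len(covered)


north = {"region": "north"}
print(support(rows, north))                           # share of rows in the north
print(confidence(rows, north, {"bought_argon": 1}))   # discrete consequent
print(conditional_mean(rows, north, "revenue"))       # numerical consequent
```

In this sketch, the confidence of `region = north → bought_argon = 1` coincides with the conditional mean of the 0/1 column `bought_argon`, which is one way to read the generalization to numerical columns such as `revenue`.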
- How to integrate conditional expected values into existing association rule mining tools (so that the resulting tool is practically useful)?
- Why (in what respect) do visualization approaches (such as Grand Tour [HK02]) fail in dealing with the large-scale reporting problem?
- Which measures are relevant and effective to practically cut down the complexity of large-scale multi-factor reports? And: how to integrate such measures into existing association rule mining tools (so that the resulting tool is practically useful)?
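To make the idea of cutting down a report concrete, the following minimal sketch filters candidate rules by lift, a classical interestingness measure, and keeps only rules whose lift deviates noticeably from 1 (the value expected under independence). It is merely one possible stand-in for the measures the project aims to develop; the toy rows, the column names, the helper functions and the threshold `min_deviation` are assumptions made for the illustration.

```python
# Hypothetical toy data: eight customer records.
rows = [
    {"segment": "medical", "region": "north", "bought_argon": 1},
    {"segment": "medical", "region": "north", "bought_argon": 1},
    {"segment": "medical", "region": "south", "bought_argon": 1},
    {"segment": "medical", "region": "south", "bought_argon": 0},
    {"segment": "food", "region": "south", "bought_argon": 1},
    {"segment": "food", "region": "north", "bought_argon": 0},
    {"segment": "food", "region": "north", "bought_argon": 0},
    {"segment": "food", "region": "south", "bought_argon": 0},
]


def matches(row, condition):
    """True if the row satisfies every column == value pair in the condition."""
    return all(row[column] == value for column, value in condition.items())


def lift(rows, antecedent, consequent):
    """Lift = P(antecedent and consequent) / (P(antecedent) * P(consequent))."""
    n = len(rows)
    n_a = sum(matches(row, antecedent) for row in rows)
    n_c = sum(matches(row, consequent) for row in rows)
    n_ac = sum(matches(row, antecedent) and matches(row, consequent) for row in rows)
    if n_a == 0 or n_c == 0:
        return None  # lift is undefined if either side never occurs
    return (n_ac * n) / (n_a * n_c)


def prune_rules(rows, candidate_rules, min_deviation=0.2):
    """Keep only rules whose lift differs from 1 by at least min_deviation."""
    kept = []
    for antecedent, consequent in candidate_rules:
        value = lift(rows, antecedent, consequent)
        if value is not None and abs(value - 1.0) >= min_deviation:
            kept.append((antecedent, consequent, value))
    return kept


argon = {"bought_argon": 1}
candidates = [
    ({"segment": "medical"}, argon),
    ({"segment": "food"}, argon),
    ({"region": "north"}, argon),
]
for antecedent, consequent, value in prune_rules(rows, candidates):
    print(antecedent, "->", consequent, "lift:", value)
```

With this toy data, the rule with antecedent `region = north` has a lift of exactly 1 and is dropped from the report, while the two `segment` rules are kept. A real multi-factor report would enumerate far more candidate rules, which is exactly where the choice of measure and threshold matters.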
The successful candidate will have:
- strong attitude towards research
- absolute team spirit
- strong experience in at least one of the following areas: data science, AI, mathematical statistics
- strong experience in at least one of Python (Anaconda), R, Matlab, SPSS, etc.
- excellent English language proficiency, both spoken and written (or at least the absolute will to gain it in a short time)
- (ideally) first experience in publishing papers for peer-reviewed conference proceedings or journals
- experience in one of the following areas is a plus: business process management, e-governance, e-government
- Inception phase: Systematic literature review (Kitchenham) with respect to the above research questions
- Contribution phase: Hybrid design science and technology acceptance model
Target scientific channels for the research outcomes are:
- Data & Knowledge Engineering, Elsevier
- Information Systems, Elsevier
- Information Sciences, Elsevier
- CAiSE (Conference on Advanced Information Systems Engineering), Springer
Exploitation of the Results by the Industrial Partner
The industrial partner has an urgent need to analyze its extended CRM (customer relationship management) data, including sales figures and surrounding market figures, in order to proactively manage its product portfolio. This is a classical problem that data analysts deal with. Still, the existing techniques and tool landscape are not satisfactory, as outlined above. The industrial partner is keen on using the techniques and tools expected from this PhD endeavor to gain deeper insights into (and significantly better predictions from) its extended CRM data.
About Elme Messer Gaas
Elme Messer Gaas AS is the leading gas company in the Baltic region. Elme Messer Gaas was founded in 1999 as a joint venture of BLRT Grupp AS (Estonia) and Messer Group (Germany), and the company operates in the field of industrial gas in the territory of the Baltic States as well as Russia and Ukraine. Elme Messer Gaas primarily produces and sells industrial, medical, food and specialty gases, as well as various gas equipment, for the following industries: metallurgy, chemical, pharmaceutical, electronics, automotive, food, and various scientific and medical institutions. Major products include oxygen, nitrogen, argon, carbon dioxide, acetylene, hydrogen, propane and helium. Today, the company employs more than 200 people. Using their scientific and technological capabilities, the company’s employees have been able to realise more than 220 investment projects that are still operational to this day. The company’s staff, drawing on more than a century of Messer Group’s experience and basic know-how, is continuously developing innovative solutions for the application of gas in metalworking, the oil and chemical industries, the food industry and medicine.
BLRT Grupp is one of the biggest industrial holdings in the Baltic Sea region. Messer Group is one of the leading industrial gas companies. Messer Group operates in more than 30 countries in Europe and Asia, as well as in Peru, and comprises more than 60 operating companies. The parent company that manages the group as a whole is based in Frankfurt am Main, whilst the technical functions (logistics, development, production and application technology) are managed in Krefeld.
¹ A SCOPUS title/abstract/keywords search with the search string "association rule" OR "association rules" returns over 1000 documents for the year 2018 alone.