Bachelor Project in Information Systems: Data Science for Web-Applications

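Type and Credit Eligibility:

Compulsory module in the Bachelor's program Informatik (Computer Science): module InfB-Proj
Compulsory module in the Bachelor's program Software-System-Entwicklung (Software System Development): module InfB-Proj
Compulsory module in the Bachelor's program Wirtschaftsinformatik (Information Systems): module InfB-WI-Proj
Compulsory module in the Bachelor's program Mensch-Computer-Interaktion (Human-Computer Interaction): module InfB-Proj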

Requirements

The number of participants is limited to 24.

Course Language

German

Credits and Workload

9 LP (credit points)
6 SWS (weekly hours per semester)

Lecturers

Dr. Kai Brüssau, Dr. Robert Stahlbock

Course Dates

Wed: 9:00 am - 2:00 pm, WiWi 1077

Registration

To participate in this course, you must register in STiNE during one of the STiNE registration periods.

Course Evaluation

Type of evaluation: presentations, a written project report, and practical work

Data Science for Web Applications

The aim of the project is for students to work in groups of three to four on a current problem from the fields of machine learning (ML) and artificial intelligence (AI) in a practice-oriented manner.

At the beginning of the project, all students jointly familiarize themselves with data science, in particular with various machine learning methods, and with Python (other programming languages are possible). Groups are then formed, and each group chooses a problem to work on. Together with their supervisors, the students identify topics and focus areas. Each group carries out its investigations according to the CRISP-DM process model, arriving at a solution model that is evaluated and can be implemented in a web application.

The main focus of the project is on practice-oriented tasks. The data comes from various sources on the internet (e.g. Kaggle, social media, etc.) or directly from cooperation with companies that are currently dealing with such problems.

Possible problems are, for example:

Possible uses of Large Language Models (LLMs) in e-commerce, recommender systems, web scraping or other applications
Classification of customers to create customer profiles and predict their behavior
Named entity recognition in document management to identify specific content (e.g. addresses, customer numbers, invoice data, etc.)
Natural Language Processing (NLP) for the automated analysis of texts, such as tweets, customer reviews, etc. (sentiment analysis)
Topic modeling for clustering texts, e.g. tweets
Computer vision
Fraud detection (e.g. for transactions, applications, self-service checkouts, etc.)
Estimation of real-estate prices or leasing repurchase values
Your own suggestions are welcome

Depending on the subject area, different machine learning methods and technologies are used to solve classification or regression problems, individually or in ensembles, e.g.:

Regression: artificial neural networks and other regression methods
Classification: decision trees/random forests, logistic regression or support vector machines
Ensemble learning methods
Deep learning: Generative Pre-trained Transformer (GPT), LLMs, Convolutional Neural Network (CNN)
Topic modeling methods, Top2Vec
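
As a rough illustration of using methods "individually or in ensembles", the following Python sketch combines three simple classifiers by majority vote. The three rule functions, their thresholds, and the feature layout (amount, hour of day, country risk score) are hypothetical and chosen only for illustration; in a project, trained models from a machine learning library would usually take their place.

```python
from collections import Counter
from typing import Callable, Sequence

# A classifier maps a feature vector to a class label.
Classifier = Callable[[Sequence[float]], str]


def majority_vote(classifiers: Sequence[Classifier], x: Sequence[float]) -> str:
    """Return the label predicted by the largest number of classifiers."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical hand-written rules standing in for trained models.
# Assumed feature layout: x = [amount, hour_of_day, country_risk_score]
def by_amount(x: Sequence[float]) -> str:
    return "fraud" if x[0] > 1000 else "ok"


def by_hour(x: Sequence[float]) -> str:
    return "fraud" if x[1] < 6 else "ok"


def by_country_risk(x: Sequence[float]) -> str:
    return "fraud" if x[2] > 0.8 else "ok"


ENSEMBLE = [by_amount, by_hour, by_country_risk]

# Two of the three rules flag this transaction, so the ensemble does too.
print(majority_vote(ENSEMBLE, [2500.0, 3, 0.1]))
```

With an odd number of two-class members, a majority always exists, so no tie-breaking rule is needed in this sketch.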

Depending on the task, typical problems may also arise that need to be considered in the solution, such as:

Particularly small or large volumes of data (attributes and/or samples)
Data pre-processing: missing values, contradictory values...
Target function, in particular asymmetric evaluation of different results
Asymmetric distributions in the data
Evaluation of results in unsupervised learning, in particular evaluation of results when using GPTs and LLMs
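
To make the "asymmetric evaluation of different results" point more concrete, here is a minimal Python sketch of cost-sensitive threshold selection on a small imbalanced toy data set. The labels, scores, candidate thresholds, and cost values are all invented for illustration (for example, a missed fraud case is assumed to cost ten times as much as a false alarm); they are not taken from any course material.

```python
from typing import Sequence


def total_cost(y_true: Sequence[int], scores: Sequence[float], threshold: float,
               cost_fn: float = 10.0, cost_fp: float = 1.0) -> float:
    """Sum misclassification costs when predicting 1 for score >= threshold."""
    cost = 0.0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 0:
            cost += cost_fn  # false negative, e.g. a missed fraud case
        elif label == 0 and predicted == 1:
            cost += cost_fp  # false positive, e.g. a false alarm
    return cost


def best_threshold(y_true: Sequence[int], scores: Sequence[float],
                   candidates: Sequence[float], **costs: float) -> float:
    """Pick the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: total_cost(y_true, scores, t, **costs))


# Imbalanced toy data: only 2 of 8 samples belong to the positive class.
Y_TRUE = [0, 0, 0, 0, 0, 0, 1, 1]
SCORES = [0.1, 0.2, 0.3, 0.35, 0.45, 0.6, 0.4, 0.8]
CANDIDATES = [0.3, 0.4, 0.5, 0.7]

# Asymmetric costs (defaults): false negatives are 10x as expensive.
print(best_threshold(Y_TRUE, SCORES, CANDIDATES))
# Symmetric costs: equivalent to minimizing the number of errors.
print(best_threshold(Y_TRUE, SCORES, CANDIDATES, cost_fn=1.0, cost_fp=1.0))
```

On this toy data the two cost settings select different thresholds, which is the reason the target function has to be fixed before models are compared.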

At the end of the course, students will prepare a written paper in which they describe the chosen problem and its solution approach. This paper should include both theoretical and practice-oriented perspectives.

Literature

Davenport, T. H., P. Barth, R. Bean (2013). How "big data" is different. MIT Sloan Management Review 54(1).

Gantz, J., D. Reinsel (2011). Extracting value from chaos. http://germany.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf. Last accessed: 04.01.2016.

Jacobs, A. (2009). The pathologies of big data. Communications of the ACM 52(8), 36-44.

Additional Literature

Brown, B., M. Chui, J. Manyika (2011). Are you ready for the era of big data. McKinsey Quarterly 4, 24-35.

Krishnan, K. (2013). Data Warehousing in the Age of Big Data. Morgan Kaufmann, Waltham, MA, USA.

Owen, S., R. Anil, T. Dunning, E. Friedman (2012). Mahout in Action. Manning, Shelter Island, NY, USA.

White, T. (2012). Hadoop: The Definitive Guide. 3rd ed., O’Reilly, Sebastopol, CA, USA.

Chen, H., R. H. L. Chiang, V. C. Storey (2012). Business intelligence and analytics: From big data to big impact. MIS Quarterly 36(4), 1-24.