Bachelor Project in Information Systems: Data Science for Web-Applications
Type and Credit:
- Compulsory module in the Bachelor of Computer Science: module InfB-Proj
- Compulsory module in the Bachelor of Software Systems Development: module InfB-Proj
- Compulsory module in the Bachelor of Information Systems: module InfB-WI-Proj
- Compulsory module in the Bachelor of Human-Computer Interaction: module InfB-Proj
Requirements
The number of participants is limited to 24.
Course Language
German
Credit Scope
9 LP (credit points)
6 SWS (weekly contact hours per semester)
Lecturers
Course Dates
Wed: 9:00 am - 2:00 pm, WiWi 1077
Registration
To participate in this course, you must register in STiNE during the STiNE registration periods.
Course Evaluation
Type of evaluation: presentations, a written project paper, and practical work
Data Science for Web Applications
In this project, students work in groups of about 3 to 4 people on a current, practice-oriented problem from the field of Machine Learning (ML) and Artificial Intelligence (AI).
At the beginning of the project, all students jointly familiarize themselves with data science, in particular with various machine learning methods, and with Python (other programming languages are possible). Groups are then formed, and each group chooses a problem to work on. Following the CRISP-DM process model, the groups carry out a series of investigations that lead to a solution model, which is evaluated and can be implemented in a web application. Students identify their topics and focus areas together with their supervisors.
The main focus of the project is on practice-oriented tasks. The data come from various sources on the internet (e.g. Kaggle, social media, etc.) or directly from cooperation with companies that are currently dealing with such problems.
Possible problems are, for example:
- Possible uses of Large Language Models (LLMs) in e-commerce, recommender systems, web scraping or other applications
- Classification of customers to create customer profiles and predict their behavior
- Named entity recognition in document management to identify specific content (e.g. addresses, customer numbers, invoice data, etc.)
- Natural Language Processing (NLP) for the automated analysis of texts, such as tweets, customer reviews, etc. (sentiment analysis)
- Topic modeling for clustering texts, e.g. tweets
- Computer vision
- Fraud detection (e.g. for transactions, applications, self-service checkouts, etc.)
- Estimation of real-estate prices or leasing repurchase values
- Your own suggestions are welcome
Depending on the subject area, different machine learning methods and technologies are used to solve classification or regression problems, individually or in ensembles, e.g.:
- Regression: artificial neural networks and other regression methods
- Classification: decision trees/random forests, logistic regression or support vector machines
- Ensemble learning methods
- Deep learning: Generative Pre-trained Transformers (GPTs), LLMs, Convolutional Neural Networks (CNNs)
- Topic modeling methods, e.g. Top2Vec
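As a rough illustration of what using classifiers "in ensembles" can mean, the following minimal Python sketch combines several simple classifiers by majority vote. It is only one of many ensemble techniques and is not prescribed by the course; the function name majority_vote and the three threshold rules are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from typing import Callable, Sequence

# A classifier here is any function mapping a feature vector to a class label.
Classifier = Callable[[Sequence[float]], int]


def majority_vote(classifiers: Sequence[Classifier], x: Sequence[float]) -> int:
    """Return the label predicted by the largest number of classifiers."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Three hypothetical threshold rules on a two-feature input (assumed for
# illustration; real projects would train models on data instead).
stumps = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(x[0] + x[1] > 1.2),
]

print(majority_vote(stumps, [0.9, 0.2]))  # two of three rules vote for class 0
print(majority_vote(stumps, [0.9, 0.9]))  # all three rules vote for class 1
```

Using an odd number of binary classifiers, as here, avoids tied votes.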
Depending on the task, typical problems may also arise that need to be considered in the solution, such as:
- Particularly small or large volumes of data (attributes and/or samples)
- Data pre-processing: missing values, contradictory values, etc.
- Objective function, in particular asymmetric evaluation of different results (e.g. false positives vs. false negatives)
- Asymmetric (skewed or imbalanced) distributions in the data
- Evaluation of results in unsupervised learning, in particular evaluation of results when using GPTs and LLMs.
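To make the asymmetric evaluation of results more concrete, here is a minimal Python sketch that picks a decision threshold for a binary classifier by minimizing total misclassification cost. The labels, scores, and cost values are made-up illustration data (for example, a fraud-detection setting where a missed fraud case is assumed to cost ten times as much as a false alarm), and the function names are hypothetical.

```python
from typing import Sequence


def total_cost(labels: Sequence[int], scores: Sequence[float],
               threshold: float, cost_fp: float, cost_fn: float) -> float:
    """Sum the costs of false positives and false negatives at a threshold."""
    cost = 0.0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 0:
            cost += cost_fp  # false alarm
        elif predicted == 0 and y == 1:
            cost += cost_fn  # missed positive
    return cost


def best_threshold(labels: Sequence[int], scores: Sequence[float],
                   cost_fp: float, cost_fn: float) -> float:
    """Try every observed score (plus 'predict nothing') as a threshold."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates,
               key=lambda t: total_cost(labels, scores, t, cost_fp, cost_fn))


# Made-up, imbalanced example: 6 negatives and 2 positives.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.40, 0.80]

print(best_threshold(labels, scores, cost_fp=1, cost_fn=1))   # symmetric costs
print(best_threshold(labels, scores, cost_fp=1, cost_fn=10))  # misses are expensive
```

With symmetric costs the sketch selects the high threshold 0.80, whereas with expensive false negatives it selects the lower threshold 0.40, accepting more false alarms in order to catch both positive cases.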
At the end of the course, students will prepare a written paper in which they describe the chosen problem and its solution approach. This paper should include both theoretical and practice-oriented perspectives.
Literature:
Davenport, T. H., P. Barth, R. Bean (2013). How "big data" is different. MIT Sloan Management Review 54(1).
Gantz, J., D. Reinsel (2011). Extracting value from chaos. http://germany.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf. Last accessed: 04.01.2016.
Jacobs, A. (2009). The pathologies of big data. Communications of the ACM 52(8), 36-44.
Additional literature:
Brown, B., M. Chui, J. Manyika (2011). Are you ready for the era of big data? McKinsey Quarterly 4, 24-35.
Krishnan, K. (2013). Data Warehousing in the Age of Big Data. Morgan Kaufmann, Waltham, MA, USA.
Owen, S., R. Anil, T. Dunning, E. Friedman (2012). Mahout in Action. Manning, Shelter Island, NY, USA.
White, T. (2012). Hadoop: The Definitive Guide. 3rd ed., O'Reilly, Sebastopol, CA, USA.
Chen, H., R.H.L. Chiang, V.C. Storey (2012). Business intelligence and analytics: From big data to big impact. MIS Quarterly 36(4), 1-24.