Bachelor Project in Information Systems: Data Science for Web Applications
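Type and Credit:
- Compulsory module in the Bachelor's program Computer Science (Informatik): module InfB-Proj
- Compulsory module in the Bachelor's program Software System Development (Software-System-Entwicklung): module InfB-Proj
- Compulsory module in the Bachelor's program Information Systems (Wirtschaftsinformatik): module InfB-WI-Proj
- Compulsory module in the Bachelor's program Human-Computer Interaction (Mensch-Computer-Interaktion): module InfB-Proj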
Requirements
The number of participants is limited to 24.
Course Language
German
Credit Scope
9 LP
6 SWS
Lecturers
Course Dates
Tue: 9:00 am - 2:00 pm, WiWi 2043/47
Registration
To participate in this course, you must register in STiNE during the STiNE registration periods.
Course Evaluation
type of evaluation: presentations, written project thesis and practical work
Data Science for Web Applications
In this project, students form groups of about 3 to 4 people, work on a practical problem in the field of data science, and then prepare a project report that presents the solution to the chosen problem from both a theoretical and a practical point of view.
At the beginning of the project, all students work together to familiarize themselves with the field of data science, in particular data mining methods, and gain experience with programming languages such as Python and R. Subsequently, groups are formed, each of which selects a problem and works on it for the remainder of the course. During this process, various investigations are carried out according to the KDD process (Knowledge Discovery in Databases), which ultimately lead to a solution model.
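As a rough illustration of the KDD process, the following minimal Python sketch walks through the commonly cited phases (selection, preprocessing, transformation, data mining, evaluation) on a tiny data set. The records, the attribute names and the choice of a 1-nearest-neighbour classifier are assumptions made purely for illustration; they are not prescribed by the course.

```python
from statistics import mean

# Hypothetical toy data: customer records, one with a missing value.
RAW = [
    {"id": 1, "age": 23, "spend": 20.0, "churned": 1},
    {"id": 2, "age": 25, "spend": None, "churned": 1},
    {"id": 3, "age": 31, "spend": 35.0, "churned": 1},
    {"id": 4, "age": 45, "spend": 80.0, "churned": 0},
    {"id": 5, "age": 52, "spend": 95.0, "churned": 0},
    {"id": 6, "age": 58, "spend": 110.0, "churned": 0},
    {"id": 7, "age": 27, "spend": 30.0, "churned": 1},
    {"id": 8, "age": 49, "spend": 90.0, "churned": 0},
]
FEATURES = ["age", "spend"]
TARGET = "churned"


def select(records):
    """Selection: keep only the attributes relevant to the task."""
    return [{k: r[k] for k in FEATURES + [TARGET]} for r in records]


def preprocess(records):
    """Preprocessing: replace missing feature values with the column mean."""
    cleaned = [dict(r) for r in records]
    for f in FEATURES:
        col_mean = mean(r[f] for r in records if r[f] is not None)
        for r in cleaned:
            if r[f] is None:
                r[f] = col_mean
    return cleaned


def transform(records):
    """Transformation: min-max scale each feature to [0, 1].

    For brevity the scaling uses all rows; a real project would fit it
    on the training split only to avoid information leakage.
    """
    scaled = [dict(r) for r in records]
    for f in FEATURES:
        lo = min(r[f] for r in records)
        hi = max(r[f] for r in records)
        for r in scaled:
            r[f] = (r[f] - lo) / (hi - lo) if hi > lo else 0.0
    return scaled


def predict_1nn(train, row):
    """Data mining: classify a row by its nearest training neighbour."""
    def dist(a, b):
        return sum((a[f] - b[f]) ** 2 for f in FEATURES)
    return min(train, key=lambda t: dist(t, row))[TARGET]


def evaluate(train, test):
    """Evaluation: share of held-out rows classified correctly."""
    hits = sum(predict_1nn(train, r) == r[TARGET] for r in test)
    return hits / len(test)


data = transform(preprocess(select(RAW)))
train, test = data[:6], data[6:]
accuracy = evaluate(train, test)
print(f"held-out accuracy: {accuracy:.2f}")
```

In a real project each phase is revisited several times, since findings from the evaluation typically feed back into selection and preprocessing.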
The project focuses on concrete, practical tasks. The data comes from various sources on the Internet or directly from cooperation with companies that are currently dealing with such problems.
Possible problems include, for example:
- Large Language Models (LLMs) in e-commerce, recommender systems, web scraping or other applications
- Classification of customers in order to draft customer profiles and forecast their behavior
- Recommender systems for recommending music, movies or other products (as used by Spotify, Netflix or Amazon)
- Text mining within document management for identifying specific content (e.g., addresses, customer numbers, billing information, etc.)
- Natural Language Processing (NLP) for an automated analysis of texts, such as tweets, customer reviews or others (Sentiment Mining)
- Topic Modeling for clustering texts, e.g. tweets
- Analysis of image/video/voice data
- Fraud detection (e.g. for transactions, applications, self-service checkouts, etc.)
- Estimation of real-estate prices or even lease buyback values
- Time series analysis/forecasting, e.g. for financial/sales time series
- Forecasting the success of advertising campaigns
- Further examples: see kaggle.com or kdnuggets.com
- Your own topic ideas are welcome
Depending on the topic, different machine learning methods and technologies are used to solve classification or regression problems, individually or in ensembles, such as:
- Artificial Neural Networks and other regression methods
- Decision Trees/Random Forests and Logistic Regression
- Support Vector Machines
- Boosting methods
- Topic modeling methods (e.g., LDA or BERT-based approaches)
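To illustrate what using methods "in ensembles" can mean, the following minimal Python sketch combines three very simple classifiers by majority vote. The one-rule classifiers, the threshold of 0.5 and the toy samples are assumptions for illustration only; the data is deliberately constructed so that each rule errs on different samples, and such a gain is not guaranteed on real data.

```python
from collections import Counter

# Hypothetical toy samples: (feature vector, binary label).
SAMPLES = [
    ((0.9, 0.8, 0.2), 1),
    ((0.8, 0.3, 0.7), 1),
    ((0.4, 0.9, 0.8), 1),
    ((0.1, 0.2, 0.6), 0),
    ((0.2, 0.7, 0.1), 0),
    ((0.6, 0.3, 0.4), 0),
]


def stump(feature_index, threshold):
    """Build a one-rule classifier: predict 1 if the feature exceeds the threshold."""
    def classify(x):
        return int(x[feature_index] > threshold)
    return classify


def majority_vote(classifiers, x):
    """Ensemble prediction: the label chosen by most classifiers."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


def accuracy(predict, samples):
    """Share of samples for which predict(x) equals the true label."""
    return sum(predict(x) == y for x, y in samples) / len(samples)


# Three weak classifiers, each looking at a different feature.
# An odd number of binary voters rules out ties.
classifiers = [stump(i, 0.5) for i in range(3)]

individual = [accuracy(clf, SAMPLES) for clf in classifiers]
ensemble = accuracy(lambda x: majority_vote(classifiers, x), SAMPLES)

print("individual accuracies:", [round(a, 2) for a in individual])
print("ensemble accuracy:", round(ensemble, 2))
```

The same voting idea underlies more elaborate ensembles such as Random Forests, where the individual voters are decision trees trained on resampled data.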
Depending on the problem, typical issues arise that must be taken into account in the solution process, such as:
- Particularly small or large data volumes (attributes and/or samples)
- Data preprocessing: missing values, contradictory values, ...
- Objective function, especially asymmetric evaluation of different results (e.g., unequal costs for different types of errors)
- Asymmetric distributions in the data (e.g., imbalanced classes)
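The last two issues can be made concrete with a small Python sketch: on imbalanced data, a classifier that never predicts the rare class still reaches high accuracy, and an asymmetric cost function changes which decision threshold is preferable. The scores, the labels and the cost values (a missed positive case being ten times as expensive as a false alarm) are hypothetical assumptions chosen for illustration.

```python
# Hypothetical classifier scores (higher = more likely positive) and true labels.
# Only 2 of 12 samples are positive, mimicking an imbalanced data set.
SCORES = [0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.35, 0.40, 0.55, 0.60, 0.45, 0.90]
LABELS = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

COST_FP = 1.0   # assumed cost of a false alarm
COST_FN = 10.0  # assumed cost of a missed positive case


def confusion(labels, preds):
    """Return (tp, fp, fn, tn) for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(labels, preds):
    tp, _, _, tn = confusion(labels, preds)
    return (tp + tn) / len(labels)


def recall(labels, preds):
    tp, _, fn, _ = confusion(labels, preds)
    return tp / (tp + fn) if (tp + fn) else 0.0


def total_cost(labels, preds):
    """Asymmetric objective: false negatives weigh more than false positives."""
    _, fp, fn, _ = confusion(labels, preds)
    return fp * COST_FP + fn * COST_FN


def predict(scores, threshold):
    return [int(s >= threshold) for s in scores]


def best_threshold(scores, labels):
    """Pick the candidate threshold with the lowest total cost.

    For brevity the threshold is tuned on the same data it is evaluated on;
    a real project would use a separate validation set.
    """
    candidates = sorted(set(scores))
    return min(candidates, key=lambda t: total_cost(labels, predict(scores, t)))


always_negative = [0] * len(LABELS)
print("always negative:",
      "accuracy", round(accuracy(LABELS, always_negative), 2),
      "recall", recall(LABELS, always_negative),
      "cost", total_cost(LABELS, always_negative))

t = best_threshold(SCORES, LABELS)
tuned = predict(SCORES, t)
print("cost-tuned threshold", t,
      "accuracy", round(accuracy(LABELS, tuned), 2),
      "recall", recall(LABELS, tuned),
      "cost", total_cost(LABELS, tuned))
```

Accuracy alone hides the difference between the two classifiers here, which is why measures such as recall or an explicit cost function are usually reported alongside it for imbalanced problems.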
Literature:
Davenport, T. H., P. Barth, R. Bean (2013). How "big data" is different. MIT Sloan Management Review 54(1).
Gantz, J., D. Reinsel (2011). Extracting value from chaos. http://germany.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf. Last accessed: 04.01.2016.
Jacobs, A. (2009). The pathologies of big data. Communications of the ACM 52(8), 36-44.
Additional literature:
Brown, B., M. Chui, J. Manyika (2011). Are you ready for the era of big data? McKinsey Quarterly 4, 24-35.
Krishnan, K. (2013). Data Warehousing in the Age of Big Data. Morgan Kaufmann, Waltham, MA, USA.
Owen, S., R. Anil, T. Dunning, E. Friedman (2012). Mahout in Action. Manning, Shelter Island, NY, USA.
White, T. (2012). Hadoop: The Definitive Guide. 3rd ed., O'Reilly, Sebastopol, CA, USA.
Chen, H., R.H.L. Chiang, V.C. Storey (2012). Business intelligence and analytics: From big data to big impact. MIS Quarterly 36(4), 1-24.