BIFOLD DAY

BIFOLD DAY – Spring Reception 2025

Open space for scientific exchange and networking

The BIFOLD Day 2025, April 30th, consists of a scientific part and the Spring Reception.

The goal of the scientific part is to promote knowledge transfer and exchange among BIFOLD scientists & partners and to share best practice examples. To this end, several parallel tutorials will take place. The exact titles and tutorial contents can be found below. Registration is mandatory.

The Spring Reception brings together BIFOLD researchers, friends and external guests, including scientists, policymakers, and industry partners. Celebrate another year of top AI research in Berlin: You can expect an exciting keynote by Prof. Dr. Matthias Bethge, Co-director at Tübingen AI Center, a review of BIFOLD activities in 2024, a look ahead to exciting new research, a poster session and plenty of time to make new contacts in the Berlin AI scene or maintain old ones over a drink and a snack.

Please register for the scientific part and/or the Spring Reception by April 6th, 2025.

Registration

Register here
for the scientific part & Spring Reception

Register here
for the Spring Reception only

Time	Program
13:30	Coffee | Registration
14:00-15:00	Tutorials Block I
	• Stefan Halfpap: Data Visualization
	• Haralampos Gavriilidis: Efficient Data Loading in Python: Trade-offs and Best Practices
	• Lukas Pirch: Machine Learning on Structured Data with PyTorch-Geometric
	• Stephanie Brandl: Introduction to Natural Language Processing
15:10-16:10	Tutorials Block II
	• Stefan Halfpap: Data Visualization
	• Haralampos Gavriilidis: Efficient Data Loading in Python: Trade-offs and Best Practices
	• Lukas Pirch: Machine Learning on Structured Data with PyTorch-Geometric
	• Stephanie Brandl: Introduction to Natural Language Processing
16:00	Registration Spring Reception
16:30	Spring Reception: Welcome | Prof. Dr. Volker Markl, Co-Director BIFOLD
16:35	Welcome address | Dr. Ina Czyborra, Berlin Senator for Higher Education and Research, Health and Long-Term Care
16:40	Welcome address | Dr. Tina Klüwer (BMBF), Head of Department for Research on Technological Sovereignty and Innovation
16:45	Keynote | Prof. Dr. Matthias Bethge, Director Tübingen AI Center: “Cosmopolitan AI”
17:15	Presentation “Best of BIFOLD” | Fatemeh Ahmadi: “MaTElDa – Data Cleaning” | Stefan Grafberger: “Making ML Pipelines More Fair, Robust, and Reliable”
17:35	Review and outlook | Prof. Dr. Volker Markl, Co-Director BIFOLD
17:45	Reception and get-together | Poster session

Tutorials

Data Visualization Management in Research

Coach: Dr. Stefan Halfpap

Title: Data Visualization Management in Research

Abstract: Humans can more easily perceive differences in line lengths, shapes, and colors (hues) than process sequences of text or numbers. This ability makes data visualization a powerful tool for exploring, understanding, and communicating data. Creating effective data visualizations is a crucial research skill. Visualizations support research at various stages, from early prototypes through systematic experiments to final publications in papers/theses, presentations, or applications/demonstrations. This process often involves presenting the same data at different levels of detail or adapting styles for different audiences. Fortunately, many free tools are available for different types of visualizations, allowing us to choose what best fits our needs. This tutorial provides an overview of common visualization types and tools. We will discuss strategies for managing data visualizations in research, including automation, customization, and best practices. Through hands-on examples, we will explore the plotting library Matplotlib for static graphs and Chart.js for interactive web-based visualizations. Participants will be encouraged to share their experiences and insights throughout the session.

Efficient Data Loading in Python

Coach: Haralampos Gavriilidis

Title: Efficient Data Loading in Python: Trade-offs and Best Practices

Abstract: Loading data is a critical step in any data-driven workflow, forming the foundation for tasks like machine learning (ML), exploratory data analysis (EDA), and Extract, Transform, Load (ETL) pipelines. Yet, inefficient methods can waste time and resources. In this hands-on tutorial, we will explore how different file formats—such as CSV, JSON, and binary formats like Parquet—affect performance, including loading speed and memory usage. We will also compare file-based storage with local databases like SQLite and PostgreSQL, discussing their trade-offs for structured data retrieval. The session will include an overview of key Python libraries for data loading and how to use them effectively. Along the way, we’ll identify key bottlenecks, from slow queries to file parsing overhead, and examine optimization techniques like chunking and indexing. Through interactive Jupyter Notebook exercises, participants will explore the trade-offs between file formats and databases and learn how to optimize their data-driven workflows.
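To give a flavor of the chunking and indexing techniques mentioned in the abstract, here is a minimal sketch using only the Python standard library. It is not taken from the tutorial material: the table name `readings`, its columns, the helper names, and the in-memory sample data are all hypothetical and chosen for illustration only.

```python
import csv
import io
import sqlite3
from itertools import islice


def iter_chunks(reader, chunk_size):
    """Yield lists of at most chunk_size rows from a csv reader."""
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def load_csv_in_chunks(csv_file, conn, chunk_size=1000):
    """Stream rows into SQLite chunk by chunk instead of reading the whole file at once."""
    # Hypothetical schema, assumed for this illustration only.
    conn.execute("CREATE TABLE IF NOT EXISTS readings (sensor TEXT, value REAL)")
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row, if there is one
    total = 0
    for chunk in iter_chunks(reader, chunk_size):
        conn.executemany("INSERT INTO readings VALUES (?, ?)", chunk)
        total += len(chunk)
    # Index the lookup column so later queries can avoid a full table scan.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sensor ON readings (sensor)")
    conn.commit()
    return total


# Hypothetical in-memory data standing in for a large CSV file on disk.
sample = io.StringIO(
    "sensor,value\n" + "".join(f"s{i % 3},{i}\n" for i in range(10))
)
conn = sqlite3.connect(":memory:")
rows_loaded = load_csv_in_chunks(sample, conn, chunk_size=4)
s0_values = [
    value
    for (value,) in conn.execute(
        "SELECT value FROM readings WHERE sensor = ? ORDER BY value", ("s0",)
    )
]
```

The chunk size trades memory for the number of insert calls: larger chunks mean fewer round trips but more rows held in memory at once. Whether the index pays off depends on how often the data is queried by that column after loading.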

Introduction to NLP

Coach: Dr. Stephanie Brandl

Title: Introduction to Natural Language Processing

Abstract: The Transformer architecture [Vaswani et al., 2017] has revolutionized the field of NLP, as demonstrated by the wide public reception of large language models (LLMs) like ChatGPT. LLMs have demonstrated remarkable zero-shot capabilities, solving tasks without further training or fine-tuning. Together with ready-to-use interfaces, they are accessible to (almost) everyone with a smartphone. In this tutorial, we dive into the Python version of those interfaces. We will experiment with the “transformers” library and run small-scale experiments with (large) language models in Python. We will discuss basic tools used in NLP research so participants can learn how to run their own experiments. No prior knowledge of NLP is required; basic Python skills will be helpful.

Machine Learning on Structured Data

Coach: Lukas Pirch

Title: Machine Learning on Structured Data with PyTorch-Geometric

Abstract: Many real-world machine learning problems involve structured data that require specialized techniques beyond standard approaches. In this hands-on workshop, we explore code clone detection as a case study to understand how to extract meaningful features from structured data and apply graph-based learning. Step by step, we will cover preprocessing, feature extraction, and model training using PyTorch-Geometric, a powerful framework for graph-based machine learning. Bring your laptop and have Docker installed if you’d like to follow along with the practical exercises.

Venue

BIFOLD Day 2025 will take place at the Forum Digitale Technologien, Salzufer 15/16, 10587 Berlin.

Contact

Further information:
Berlin Institute for the Foundations of Learning and Data (BIFOLD)

Katharina Jung
Head of Communications
katharina.jung@tu-berlin.de
