Texera: An Open-Source System for Cloud-Based Collaborative Data Science and AI/ML Using Workflows

Abstract: Since 2016, our team at UC Irvine has been developing the Texera open-source system (texera.io), with the goal of providing a cloud-based platform that supports collaborative data science, AI, and ML. It allows users with various backgrounds, including those with limited coding skills, domain scientists, and ML experts, to conduct AI-centric data science with a collaboration experience similar to Google Docs. After eight years of development, the system has a rich set of features, such as shared editing, shared execution, version control, commenting, debugging, user-defined functions in multiple languages (e.g., Python, R, Java), and support for state-of-the-art AI/ML techniques. Its parallel backend engine enables scalable computation on large datasets using computing clusters. It allows bioinformaticians to elastically request resources from AWS to form a cluster for running computationally intensive jobs. It also supports community-based sharing of resources, including datasets and workflows. In this talk, we will give an overview of the system and focus on the research challenges encountered during its development and our solutions to them. We will show use cases in both educational and scientific communities.

Bio: Chen Li is a professor in the Department of Computer Science at UC Irvine. He received his Ph.D. degree in Computer Science from Stanford University, and his M.S. and B.S. in Computer Science from Tsinghua University, China. His research interests are in the fields of data management, data science, AI/ML, databases, data-intensive computing, search, and visualization. He was a co-founder and CTO of a startup to commercialize his research. He is a recipient of an NSF CAREER award, an ACM Distinguished Member, and an IEEE Fellow.