
Robert Aufschläger, M.Sc.

Research Project jbDATA

Academic Staff


Doctoral Student

Lecture: „KI Projekt“ / “AI Project” (Bachelor's program in Artificial Intelligence), summer semester


Journal article
  • Robert Aufschläger
  • Jakob Folz
  • E. März
  • J. Guggumos
  • Michael Heigl
  • B. Buchner
  • Martin Schramm

Anonymization Procedures for Tabular Data: An Explanatory Technical and Legal Synthesis.

In: Information vol. 14 pg. 487

  • (2023)

DOI: 10.3390/info14090487

In the European Union, Data Controllers and Data Processors who work with personal data have to comply with the General Data Protection Regulation (GDPR) and other applicable laws. This affects the storing and processing of personal data. However, some data processing in data mining or statistical analyses does not require any personal reference in the data, so the personal context can be removed. For these use cases, applicable laws require that any existing personal information be removed by applying so-called anonymization. At the same time, anonymization should maintain data utility. The concept of anonymization is therefore a double-edged sword with an intrinsic trade-off: privacy enforcement vs. utility preservation. The former might not be entirely guaranteed when anonymized data are published as Open Data. In theory and practice, there exist diverse approaches to conduct and score anonymization. This explanatory synthesis discusses the technical perspectives on the anonymization of tabular data with a special emphasis on the European Union’s legal base. The studied methods for conducting anonymization and for scoring both the anonymization procedure and the resulting anonymity are explained in unifying terminology. The examined methods and scores cover both categorical and numerical data. The examined scores involve data utility, information preservation, and privacy models. In practice-relevant examples, methods and scores are experimentally tested on records from the UCI Machine Learning Repository’s “Census Income (Adult)” dataset.
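One privacy model commonly used to score the anonymity of tabular data is k-anonymity: k is the size of the smallest group of records that share the same quasi-identifier values. The sketch below is a minimal illustration under stated assumptions, not the paper's code: records are plain Python dicts, the column names only mimic the Adult dataset, the four rows are made up, and `generalize_age` is a hypothetical helper showing how generalizing ages into ranges raises k at the cost of precision (the privacy vs. utility trade-off described above).

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest equivalence class over the quasi-identifiers."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def generalize_age(age, width=10):
    """Hypothetical helper: replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Made-up toy records; column names only resemble the Adult dataset.
records = [
    {"age": 34, "sex": "Male", "income": ">50K"},
    {"age": 37, "sex": "Male", "income": "<=50K"},
    {"age": 52, "sex": "Female", "income": "<=50K"},
    {"age": 58, "sex": "Female", "income": ">50K"},
]
qis = ["age", "sex"]

print(k_anonymity(records, qis))  # every record is unique → 1

# Generalize the age quasi-identifier and re-score.
generalized = [{**r, "age": generalize_age(r["age"])} for r in records]
print(k_anonymity(generalized, qis))  # two classes of two records → 2
```

Generalizing exact ages to decades merges records into larger equivalence classes, which raises k but discards information an analyst might have used.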
Lecture
  • Robert Aufschläger

Intelligente Methoden zur Anonymisierung personenbezogener Daten und deren Bewertung. [Intelligent Methods for the Anonymization of Personal Data and Their Evaluation.]

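In: Konferenz des berufsbegleitenden Masterstudiengangs Cybersecurity: Ausgewählte Themen der Security Forschung [Conference of the part-time Master's program in Cybersecurity: Selected Topics in Security Research]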

Technische Hochschule Deggendorf (THD)/Technologie Campus Vilshofen Online

  • 19.01.2023 (2023)
Lecture
  • Jakob Folz
  • Robert Aufschläger
  • Michael Heigl

PRIVATE OPEN DATA?! - EAsyAnon TRUSTSERVICE. Poster presentation.

In: Nationale Konferenz IT-Sicherheitsforschung 2023 - Die digital vernetzte Gesellschaft stärken [National Conference on IT Security Research 2023: Strengthening the Digitally Networked Society]

Bundesministerium für Bildung und Forschung [Federal Ministry of Education and Research], Berlin

  • 13.-15.03.2023 (2023)
Unpublished work
  • Jakob Folz
  • Robert Aufschläger
  • Manjitha Vidanalage
  • E. März
  • J. Guggumos
  • Md Moin Uddin
  • Sebastian Wilhelm

Software Requirements Specification: EAsyAnon Audit System.

Deggendorf Institute of Technology via Zenodo

  • (2024)

DOI: 10.5281/zenodo.13734418

Unpublished work
  • Jakob Folz
  • Robert Aufschläger
  • Manjitha Vidanalage
  • E. März
  • J. Guggumos
  • Md Moin Uddin
  • Sebastian Wilhelm

Software Requirements Specification: EAsyAnon Recommender System.

Deggendorf Institute of Technology via Zenodo

  • (2024)

DOI: 10.5281/zenodo.13318624

Contribution
  • Robert Aufschläger
  • Sebastian Wilhelm
  • Michael Heigl
  • Martin Schramm

ClustEm4Ano: Clustering Text Embeddings of Nominal Textual Attributes for Microdata Anonymization.

In: Database Engineered Applications. (Lecture Notes in Computer Science) pg. 122-137

  • Eds.:
  • P. Revesz
  • R. Chbeir
  • Y. Manolopoulos
  • S. Ilarri
  • C. Leung
  • J. Bernardino

Springer Nature Switzerland Cham

  • (2025)

DOI: 10.1007/978-3-031-83472-1_9

This work introduces ClustEm4Ano, an anonymization pipeline that can be used for generalization- and suppression-based anonymization of nominal textual tabular data. It automatically generates value generalization hierarchies (VGHs) that, in turn, can be used to generalize attributes in quasi-identifiers. The pipeline leverages embeddings to generate semantically close value generalizations through iterative clustering. We applied KMeans and Hierarchical Agglomerative Clustering on 13 different predefined text embeddings (both open-source and closed-source, the latter accessed via APIs). Our approach is experimentally tested on a well-known benchmark dataset for anonymization: the UCI Machine Learning Repository’s Adult dataset. ClustEm4Ano supports anonymization procedures by offering more possibilities than arbitrarily chosen VGHs. Experiments demonstrate that these VGHs can outperform manually constructed ones in terms of downstream efficacy (especially for small values of k in k-anonymity) and can therefore foster the quality of anonymized datasets. Our implementation is publicly available.
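The idea of deriving a value generalization hierarchy by iteratively clustering value embeddings can be sketched as follows. This is a minimal illustration of the general technique, not the ClustEm4Ano implementation: the 2-D vectors are hand-made stand-ins for real text embeddings, centroid-distance agglomerative merging stands in for the KMeans and Hierarchical Agglomerative Clustering variants evaluated in the paper, and `build_vgh` and `level_sizes` are hypothetical names. Each entry of `level_sizes` is the number of clusters at the next, coarser generalization level; a single remaining cluster is labeled `*` (full suppression).

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(c) / len(vectors) for c in zip(*vectors)]


def agglomerate(clusters, embeddings, target):
    """Merge the two clusters with the closest centroids until `target` clusters remain."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > target:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Recomputing centroids here is inefficient but keeps the sketch short.
                ci = centroid([embeddings[v] for v in clusters[i]])
                cj = centroid([embeddings[v] for v in clusters[j]])
                d = math.dist(ci, cj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def build_vgh(embeddings, level_sizes):
    """Return {value: [level-1 label, level-2 label, ...]}, coarsening at each level."""
    clusters = [[v] for v in embeddings]
    vgh = {v: [] for v in embeddings}
    for size in level_sizes:
        clusters = agglomerate(clusters, embeddings, size)
        for cluster in clusters:
            # A single remaining cluster means the value is fully suppressed.
            label = "{" + ", ".join(sorted(cluster)) + "}" if len(clusters) > 1 else "*"
            for v in cluster:
                vgh[v].append(label)
    return vgh


# Hypothetical toy embeddings for values of the Adult "relationship" attribute.
embeddings = {
    "Husband": (0.9, 0.1),
    "Wife": (0.8, 0.2),
    "Own-child": (0.1, 0.9),
    "Other-relative": (0.2, 0.8),
}
vgh = build_vgh(embeddings, level_sizes=[2, 1])
for value, levels in vgh.items():
    print(value, "->", " -> ".join(levels))
```

Because values are grouped by embedding proximity rather than by hand, semantically similar values end up sharing a generalization, which is the property the abstract attributes to embedding-based VGHs.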
Journal article
  • Jakob Folz
  • Manjitha Vidanalage
  • Robert Aufschläger
  • Amar Almaini
  • Michael Heigl
  • D. Fiala
  • Martin Schramm

Scoring System for Quantifying the Privacy in Re-Identification of Tabular Datasets.

In: IEEE Access pg. 1-17

  • (2025)

DOI: 10.1109/ACCESS.2025.3563309

This study introduces the System for Calculating Open Data Re-identification Risk (SCORR), a framework for quantifying privacy risks in tabular datasets. SCORR extends conventional metrics such as k-anonymity, l-diversity, and t-closeness with novel extended metrics, including uniqueness-only risk, uniformity-only risk, correlation-only risk, and Markov Model risk, to identify a broader range of re-identification threats. It efficiently analyses event-level and person-level datasets with categorical and numerical attributes. Experimental evaluations were conducted on three publicly available datasets (OULAD, HID, and Adult) across multiple anonymisation levels. The results indicate that higher anonymisation levels do not always proportionally enhance privacy. While stronger generalisation improves k-anonymity, l-diversity and t-closeness vary significantly across datasets. Uniqueness-only and uniformity-only risk decreased with anonymisation, whereas correlation-only risk remained high. Markov Model risk also remained consistently high, showing little to no improvement regardless of the anonymisation level. Scalability analysis revealed that conventional metrics and uniqueness-only risk incurred minimal computational overhead and were independent of dataset size. However, correlation-only and uniformity-only risk required significantly more processing time, and Markov Model risk incurred the highest computational cost. All metrics except t-closeness remained unaffected by the number of quasi-identifiers; t-closeness scaled linearly beyond a certain threshold. A usability evaluation comparing SCORR with the freely available ARX Tool showed that SCORR reduced the number of user interactions required for risk analysis by 59.38%, offering a more streamlined and efficient process. These results confirm SCORR’s effectiveness in helping data custodians balance privacy protection and data utility, advancing privacy risk assessment beyond existing tools.
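Two of the simpler ideas behind such risk scores can be sketched on toy data: distinct l-diversity (the smallest number of distinct sensitive values in any quasi-identifier group) and a plain uniqueness ratio (the share of records whose quasi-identifier combination occurs only once). This is a minimal sketch under stated assumptions, not SCORR's implementation: the records are made up, and `uniqueness_ratio` is a hypothetical, simplified proxy that is not the paper's definition of uniqueness-only risk.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records by their combination of quasi-identifier values."""
    classes = defaultdict(list)
    for r in records:
        classes[tuple(r[q] for q in quasi_identifiers)].append(r)
    return classes


def distinct_l_diversity(records, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values within any equivalence class."""
    classes = equivalence_classes(records, quasi_identifiers)
    return min(len({r[sensitive] for r in group}) for group in classes.values())


def uniqueness_ratio(records, quasi_identifiers):
    """Hypothetical proxy: share of records whose quasi-identifier combination is unique."""
    classes = equivalence_classes(records, quasi_identifiers)
    unique = sum(1 for group in classes.values() if len(group) == 1)
    return unique / len(records)


# Made-up, already generalized toy records.
records = [
    {"age": "30-39", "sex": "Male", "income": ">50K"},
    {"age": "30-39", "sex": "Male", "income": "<=50K"},
    {"age": "50-59", "sex": "Female", "income": "<=50K"},
    {"age": "50-59", "sex": "Female", "income": "<=50K"},
    {"age": "20-29", "sex": "Female", "income": "<=50K"},
]
qis = ["age", "sex"]

print(distinct_l_diversity(records, qis, "income"))  # → 1
print(uniqueness_ratio(records, qis))  # → 0.2
```

The ("50-59", "Female") group has two records but only one income value, so it reveals the sensitive attribute despite hiding identity; this is the kind of gap that motivates looking beyond k-anonymity alone.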
Archive material
  • Robert Aufschläger
  • Y. Schoeb
  • A. Nowzad
  • Michael Heigl
  • Fabian Bally
  • A. Schramm

Following the Clues: Experiments on Person Re-ID using Cross-Modal Intelligence.

arXiv

  • (2025)

DOI: 10.48550/arXiv.2507.01504

The collection and release of street-level recordings as Open Data play a vital role in advancing autonomous driving systems and AI research. However, these datasets pose significant privacy risks, particularly for pedestrians, due to the presence of Personally Identifiable Information (PII) that extends beyond biometric traits such as faces. In this paper, we present cRID, a novel cross-modal framework combining Large Vision-Language Models, Graph Attention Networks, and representation learning to detect textually describable clues of PII and enhance person re-identification (Re-ID). Our approach focuses on identifying and leveraging interpretable features, enabling the detection of semantically meaningful PII beyond low-level appearance cues. We conduct a systematic evaluation of PII presence in person image datasets. Our experiments show improved performance in practical cross-dataset Re-ID scenarios, notably from Market-1501 to CUHK03-np (detected), highlighting the framework's practical utility. Code is available at https://github.com/RAufschlaeger/cRID.

Accepted for publication at the 2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC 2025), November 18-21, 2025, Gold Coast, Australia.
Lecture
  • Robert Aufschläger

On the Threat of Pedestrian Re-Identification Using Vision-Language Models.

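In: Konferenz des berufsbegleitenden Masterstudiengangs Cybersecurity: Ausgewählte Themen der Security Forschung [Conference of the part-time Master's program in Cybersecurity: Selected Topics in Security Research]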

Technische Hochschule Deggendorf (THD)/Technologie Campus Vilshofen Online

  • 15.05.2025 (2025)