publications | Tianze Luo

arXiv

From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding

Heqing Zou*, Tianze Luo*, Guiyang Xie, Victor (Xiao Jie)Zhang, Fengmao Lv, Guangcong Wang, Juanyang Chen, Zhuochen Wang, Hansheng Zhang, Huaijian Zhang

arXiv preprint , 2024

Abs PDF

The integration of Large Language Models (LLMs) with visual encoders has recently shown promising performance in visual understanding tasks, leveraging their inherent capability to comprehend and generate human-like text for visual reasoning. Given the diverse nature of visual data, MultiModal Large Language Models (MM-LLMs) exhibit variations in model designing and training for understanding images, short videos, and long videos. Our paper focuses on the substantial differences and unique challenges posed by long video understanding compared to static image and short video understanding. Unlike static images, short videos encompass sequential frames with both spatial and within-event temporal information, while long videos consist of multiple events with between-event and long-term temporal information. In this survey, we aim to trace and summarize the advancements of MM-LLMs from image understanding to long video understanding. We review the differences among various visual understanding tasks and highlight the challenges in long video understanding, including more fine-grained spatiotemporal details, dynamic events, and long-term dependencies. We then provide a detailed summary of the advancements in MM-LLMs in terms of model design and training methodologies for understanding long videos. Finally, we compare the performance of existing MM-LLMs on video understanding benchmarks of various lengths and discuss potential future directions for MM-LLMs in long video understanding.

CIKM

Progressive multimodal pivot learning: towards semantic discordance understanding as humans

Junlin Fang, Wenya Wang, Tianze Luo, Yanyong Huang, Fengmao Lv

In ACM International Conference on Information and Knowledge Management (CIKM-24) , 2024

Abs PDF

ACM TOIS

Collaborative Sequential Recommendations via Multi-view GNN-Transformers

Tianze Luo, Yong Liu, and Sinno Jialin Pan

Accepted by ACM Transactions on Information Systems (ACM TOIS) , 2024

WebConf

Graph Principal Flow Network for Conditional Graph Generation

Zhanfeng Mo*, Tianze Luo*, and Sinno Jialin Pan

The Web Conference (WWW) , 2024

Abs PDF

Conditional graph generation is crucial and challenging since the conditional distribution of graph topology and feature is complicated and the semantic feature is hard to capture by the generative model. In this work, we propose a novel graph conditional generative model, termed Graph Principal Flow Network (GPrinFlowNet), which enables us to progressively generate graphs from low- to high-frequency components. Our GPrinFlowNet effectively captures the subtle yet essential semantic features of graph topology, resulting in high-quality generated graph data given a required condition. Extensive experiments and ablation studies showcase that our model achieves state-of-the-art performance compared to existing conditional graph generation models.

ICLR

Learning Adaptive Multiresolution Transforms via Meta-Framelet-based Graph Convolutional Network

Tianze Luo, Zhanfeng Mo, and Sinno Jialin Pan

International Conference on Learning Representations (ICLR) , 2024

Abs PDF

Graph Neural Networks are popular tools in graph representation learning that capture the graph structural properties. However, most GNNs employ single-resolution graph feature extraction, thereby failing to capture micro-level local patterns (high resolution) and macro-level graph cluster and community patterns (low resolution) simultaneously. Many multiresolution methods have been developed to capture graph patterns at multiple scales, but most of them depend on predefined and handcrafted multiresolution transforms that remain fixed throughout the training process once formulated. Due to variations in graph instances and distributions, fixed handcrafted transforms can not effectively tailor multiresolution representations to each graph instance. To acquire multiresolution representation suited to different graph instances and distributions, we introduce the Multiresolution Meta-Framelet-based Graph Convolutional Network (MM-FGCN), facilitating comprehensive and adaptive multiresolution analysis across diverse graphs. Extensive experiments demonstrate that our MM-FGCN achieves SOTA performance on various graph learning tasks.

ACL

Data Augmentation using LLMs: Data Perspectives, Learning Paradigms and Challenges

Bosheng Ding, Chengwei Qin, Ruochen Zhao, Tianze Luo, Xinze Li, Guizhen Chen, Wenhan Xia, Junjie Hu, Anh Tuan Luu, Shafiq Joty

In Findings of the 62th Annual Meeting of the Association for Computational Linguistics (ACL-24) , 2024

Abs PDF

In the rapidly evolving field of machine learning (ML), data augmentation (DA) has emerged as a pivotal technique for enhancing model performance by diversifying training examples without the need for additional data collection. This survey explores the transformative impact of Large Language Models (LLMs) on DA, particularly addressing the unique challenges and opportunities they present in the context of natural language processing (NLP) and beyond. From a data perspective and a learning perspective, we examine various strategies that utilize Large Language Models for data augmentation, including a novel exploration of learning paradigms where LLM-generated data is used for further training. Additionally, this paper delineates the primary challenges faced in this domain, ranging from controllable data augmentation to multi modal data augmentation. This survey highlights the paradigm shift introduced by LLMs in DA, aims to serve as a foundational guide for researchers and practitioners in this field.

IEEE TPAMI

Fast Graph Generation via Spectral Diffusion

Tianze Luo, Zhanfeng Mo, and Sinno Jialin Pan

IEEE Transactions on Pattern Analysis and Machine Intelligence , 2023

(Impact factor: 23.6)

Abs PDF

Generating graph-structured data is a challenging problem, which requires learning the underlying distribution of graphs. Various models such as graph VAE, graph GANs, and graph diffusion models have been proposed to generate meaningful and reliable graphs, among which the diffusion models have achieved state-of-the-art performance. In this paper, we argue that running full-rank diffusion SDEs on the whole graph adjacency matrix space hinders diffusion models from learning graph topology generation, and hence significantly deteriorates the quality of generated graph data. To address this limitation, we propose an efficient yet effective Graph Spectral Diffusion Model (GSDM), which is driven by low-rank diffusion SDEs on the graph spectrum space. Our spectral diffusion model is further proven to enjoy a substantially stronger theoretical guarantee than standard diffusion models. Extensive experiments across various datasets demonstrate that our proposed GSDM turns out to be the SOTA model, by exhibiting both significantly higher generation quality and much less computational consumption than the baselines.

ICML workshop

Conditional Graph Generation with Graph Principal Flow Network

Tianze Luo, Zhanfeng Mo, and Sinno Jialin Pan

ICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling , 2023

Abs PDF

Conditional graph generation is crucial and challenging since the conditional distribution of graph topology and feature is complicated and the semantic feature is hard to be captured by the generative model. In this work, we propose a novel graph conditional generative model, termed Graph Principal Flow Network (GPrinFlowNet), which enables us to progressively generate graphs from low- to high-frequency components. Our GPrinFlowNet effectively captures the subtle yet essential semantic features of graph topology, resulting in high-quality generated graph data.

arXiv

Panda LLM: Training Data and Evaluation for Open-Sourced Chinese Instruction-Following Large Language Models

Fangkai Jiao*, Bosheng Ding* and Tianze Luo* and Zhanfeng Mo*

arXiv preprint , 2023

Abs PDF

This project focuses on enhancing open-source large language models through instruction-tuning and providing comprehensive evaluations of their performance. We explore how various training data factors, such as quantity, quality, and linguistic distribution, influence the performance of instruction-tuned models trained on publicly accessible high-quality instruction datasets for both English and Chinese languages. Our goal is to supplement evaluation with quantitative analyses, providing valuable insights for the continued advancement of open-source chat models. Our model, data, and code are publicly available for others to use and build upon.

NAACL

Domain confused contrastive learning for unsupervised domain adaptation

Quanyu Long, Tianze Luo, Wenya Wang, and Sinno Jialin Pan

2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) , 2022

Abs PDF

In this work, we study Unsupervised Domain Adaptation (UDA) in a challenging self-supervised approach. One of the difficulties is how to learn task discrimination in the absence of target labels. Unlike previous literature which directly aligns cross-domain distributions or leverages reverse gradient, we propose Domain Confused Contrastive Learning (DCCL), which can bridge the source and target domains via domain puzzles, and retain discriminative representations after adaptation. Technically, DCCL searches for a most domain-challenging direction and exquisitely crafts domain confused augmentations as positive pairs, then it contrastively encourages the model to pull representations towards the other domain, thus learning more stable and effective domain invariances. We also investigate whether contrastive learning necessarily helps with UDA when performing other data augmentations. Extensive experiments demonstrate that DCCL significantly outperforms baselines, further ablation study and analysis also show the effectiveness and availability of DCCL.

arXiv

Domain-Augmented Domain Adaptation

Qiuhao Zeng*, Tianze Luo* and Boyu Wang

arXiv preprint , 2022

Abs PDF

Unsupervised domain adaptation (UDA) enables knowledge transfer from the labelled source domain to the unlabeled target domain by reducing the cross-domain discrepancy. However, most of the studies were based on direct adaptation from the source domain to the target domain and have suffered from large domain discrepancies. To overcome this challenge, in this paper, we propose the domain-augmented domain adaptation (DADA) to generate pseudo domains that have smaller discrepancies with the target domain, to enhance the knowledge transfer process by minimizing the discrepancy between the target domain and pseudo domains. Furthermore, we design a pseudo-labeling method for DADA by projecting representations from the target domain to multiple pseudo domains and taking the averaged predictions on the classification from the pseudo domains as the pseudo labels. We conduct extensive experiments with the state-of-the-art domain adaptation methods on four benchmark datasets: Office Home, Office-31, VisDA2017, and Digital datasets. The results demonstrate the superiority of our model.

IEEE

Real-Time Hierarchical Map Segmentation for Coordinating Multirobot Exploration

Tianze Luo, Zichen Chen, Budhitama Subagdja, and Ah-Hwee Tan

IEEE Access, 11, pp.15680-15692. , 2022

Abs PDF

Coordinating a team of autonomous agents to explore an environment can be done by partitioning the map of the environment into segments and allocating the segments as targets for the individual agents to visit. However, given an unknown environment, map segmentation must be conducted in a continuous and incremental manner. In this paper, we propose a novel real-time hierarchical map segmentation method for supporting multi-agent exploration of indoor environments, wherein clusters of regions of segments are formed hierarchically from randomly sampled points in the environment. Each cluster is then assigned with a cost-utility value based on the minimum cost possible for the agents to visit. In this way, map segmentation and target allocation can be performed continually in real-time to efficiently explore the environment. To evaluate our proposed model, we conduct extensive experiments on map segmentation and multi-agent exploration. The results show that the proposed method can produce more accurate and meaningful segments leading to a higher level of efficiency in exploring the environment. Furthermore, the robustness tests by adding noises to the environments were conducted to simulate the performance of our model in the real-world environment. The results demonstrate the robustness of our model in map segmentation and multi-agent environment exploration.

ACM SIGKDD

Mitigating Performance Saturation in Neural Marked Point Processes: Architectures and Loss Functions

Tianbo Li*, Tianze Luo*, Yiping Ke, and Sinno Jialin Pan

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021

Abs PDF

Attributed event sequences are commonly encountered in practice. A recent research line focuses on incorporating neural networks with the statistical model -- marked point processes, which is the conventional tool for dealing with attributed event sequences. Neural marked point processes possess good interpretability of probabilistic models as well as the representational power of neural networks. However, we find that performance of neural marked point processes is not always increasing as the network architecture becomes more complicated and larger, which is what we call the performance saturation phenomenon. This is due to the fact that the generalization error of neural marked point processes is determined by both the network representational ability and the model specification at the same time. Therefore we can draw two major conclusions: first, simple network structures can perform no worse than complicated ones for some cases; second, using a proper probabilistic assumption is as equally, if not more, important as improving the complexity of the network. Based on this observation, we propose a simple graph-based network structure called GCHP, which utilizes only graph convolutional layers, thus it can be easily accelerated by the parallel mechanism. We directly consider the distribution of interarrival times instead of imposing a specific assumption on the conditional intensity function, and propose to use a likelihood ratio loss with a moment matching mechanism for optimization and model selection. Experimental results show that GCHP can significantly reduce training time and the likelihood ratio loss with interarrival time probability assumptions can greatly improve the model performance.

arXiv

Re-ranking with constraints on diversified exposures for homepage recommender system

Qi Hao, Tianze Luo, and Guangda Huzhang

arXiv preprint, 2021

Abs PDF

The homepage recommendation on most E-commerce applications places items in a hierarchical manner, where different channels display items in different styles. Existing algorithms usually optimize the performance of a single channel. So designing the model to achieve the optimal recommendation list which maximize the Click-Through Rate (CTR) of whole homepage is a challenge problem. Other than the accuracy objective, display diversity on the homepage is also important since homogeneous display usually hurts user experience. In this paper, we propose a two-stage architecture of the homepage recommendation system. In the first stage, we develop efficient algorithms for recommending items to proper channels while maintaining diversity. The two methods can be combined: user-channel-item predictive model with diversity constraint. In the second stage, we provide an ordered list of items in each channel. Existing re-ranking models are hard to describe the mutual influence between items in both intra-channel and inter-channel. Therefore, we propose a Deep & Hierarchical Attention Network Re-ranking (DHANR) model for homepage recommender systems. The Hierarchical Attention Network consists of an item encoder, an item-level attention layer, a channel encoder and a channel-level attention layer. Our method achieves a significant improvement in terms of precision, intra-list average distance(ILAD) and channel-wise Precision@k in offline experiments and in terms of CTR and ILAD in our online systems.

IEEE

Multi-agent collaborative exploration through graph-based deep reinforcement learning

Tianze Luo, Budhitama Subagdja, Di Wang, and Ah-Hwee Tan

Proceedings of the 2019 IEEE International Conference on Agents, 2019

Won the Best Paper Award.

Abs PDF

Autonomous exploration by a single or multiple agents in an unknown environment leads to various applications in automation, such as cleaning, search and rescue, etc. Traditional methods normally take frontier locations and segmented regions of the environment into account to efficiently allocate target locations to different agents to visit. They may employ ad hoc solutions to allocate the task to the agents, but the allocation may not be efficient. In the literature, few studies focused on enhancing the traditional methods by applying machine learning models for agent performance improvement. In this paper, we propose a graph-based deep reinforcement learning approach to effectively perform multi-agent exploration. Specifically, we first design a hierarchical map segmentation method to transform the environment exploration problem to the graph domain, wherein each node of the graph corresponds to a segmented region in the environment and each edge indicates the distance between two nodes. Subsequently, based on the graph structure, we apply a Graph Convolutional Network (GCN) to allocate the exploration target to each agent. Our experiments show that our proposed model significantly improves the efficiency of map explorations across varying sizes of collaborative agents over the traditional methods.