User Cold-Start Recommendation via Inductive Heterogeneous Graph Neural Network

DESHENG CAI, Hefei University of Technology (HFUT), China
SHENGSHENG QIAN, National Lab of Pattern Recognition, Institute of Automation, CAS, China
QUAN FANG, National Lab of Pattern Recognition, Institute of Automation, CAS, China
JUN HU, National Lab of Pattern Recognition, Institute of Automation, CAS, China
CHANGSHENG XU, National Lab of Pattern Recognition, Institute of Automation, CAS, China

In recent years, user cold-start recommendation has attracted much attention from industry and academia. In user cold-start recommendation systems, existing approaches often use user attribute information to learn user preferences, because user action data is unavailable. However, most existing recommendation methods ignore the sparsity of user attributes in cold-start recommendation systems. To tackle this limitation, this paper proposes a novel Inductive Heterogeneous Graph Neural Network (IHGNN) model, which utilizes the relational information in user cold-start recommendation systems to alleviate the sparsity of user attributes. Our model converts new users, items, and the associated multimodal information into a Modality-aware Heterogeneous Graph (M-HG), which preserves the rich and heterogeneous relationship information among them. Specifically, to utilize the rich and heterogeneous relational information in an M-HG for enriching the sparse attribute information of new users, we design a strategy based on random walk operations to collect the associated neighbors of new users by sampling multiple times. Then, a well-designed multiple hierarchical attention aggregation model consisting of intra- and inter-type attention aggregating modules is proposed, which focuses on useful connected neighbors and neglects meaningless and noisy connected neighbors, to generate high-quality representations for user cold-start recommendation. Experimental results on three real data sets demonstrate that IHGNN outperforms the state-of-the-art baselines.

CCS Concepts: • Information systems → Recommender systems.

Additional Key Words and Phrases: Multi-Modal, Heterogeneous Graph, User Cold-start Recommendation.

Authors' addresses: Desheng Cai, Hefei University of Technology, HFUT, Hefei, China, caidsml@gmail.com; Shengsheng Qian, National Lab of Pattern Recognition, Institute of Automation, CAS, Beijing, China, shengsheng.qian@nlpr.ia.ac.cn; Quan Fang, National Lab of Pattern Recognition, Institute of Automation, CAS, Beijing, China, qfang@nlpr.ia.ac.cn; Jun Hu, National Lab of Pattern Recognition, Institute of Automation, CAS, Beijing, China, hujunxianligong@gmail.com; Changsheng Xu, National Lab of Pattern Recognition, Institute of Automation, CAS, Beijing, China, csxu@nlpr.ia.ac.cn.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2022 Association for Computing Machinery. 1046-8188/2022/9-ART $15.00 https://doi.org/10.1145/3560487

1 INTRODUCTION

Personalized recommendation is an important and challenging task that has been receiving substantial attention from the academic community. For example, traditional collaborative filtering based recommendation algorithms, such as matrix factorization (MF) techniques [17], are developed to learn expressive representations (lookup tables) for users and items, and use past user-item ratings to predict future ratings for personalized recommendation.
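To make the lookup-table view concrete, the following is a minimal matrix-factorization sketch; all names, sizes, and the squared-error training step are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

# Minimal matrix-factorization sketch: users and items are rows of lookup
# tables, and a predicted rating is the inner product of the two latent
# vectors. All names and sizes here are illustrative assumptions.
n_users, n_items, dim = 100, 50, 16
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(n_users, dim))  # user lookup table
Q = rng.normal(scale=0.1, size=(n_items, dim))  # item lookup table

def predict(u, v):
    """Predicted rating of user u for item v."""
    return float(P[u] @ Q[v])

def sgd_step(u, v, r_uv, lr=0.01, reg=1e-4):
    """One SGD step on an observed rating r_uv (squared-error loss)."""
    err = r_uv - predict(u, v)
    p_u = P[u].copy()                       # keep the old user vector
    P[u] += lr * (err * Q[v] - reg * P[u])
    Q[v] += lr * (err * p_u - reg * Q[v])
```

A brand-new user has no trained row in P, which is precisely the cold-start failure mode discussed next.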
Fig. 1. (a) Real-world heterogeneous relationships among users and items; (b) A related heterogeneous graph with new users.

Although much research has been devoted to developing reliable and efficient algorithms for the personalized recommendation task, existing work still suffers from the cold-start problem, i.e., the scenario of dealing with new users or items whose vector embeddings have not been learned because of a lack of preference information. In cold-start recommendation, there exist two information spaces: an attribute information space and a behavior information space. The attribute space describes a user's or item's preferences (e.g., a user's personal information, an item's content information), and the behavior space is used to represent user interactions (e.g., purchase behavior and past interactions). Most existing cold-start recommendation methods assume that there are no behavior interactions but abundant attribute information for new users or new items. Existing methods for the cold-start recommendation task can be roughly categorized into three research lines: (1) Content-based recommendation methods [6, 26, 43] utilize simple feature information of items (e.g., categories, textual content, related images, review information) and users (e.g., locations, devices, apps, gender) to learn their respective representations for cold-start recommendation. (2) Hybrid methods [8, 18, 34] extend MF [17] (traditional and probabilistic) so that user- and item-related information can be learned in their respective representations. (3) Deep learning based hybrid approaches [1, 3, 10, 20, 34, 41] employ deep neural networks to obtain feature representations from user- and item-related attribute information and further incorporate these attributes into a collaborative filtering model for cold-start recommendation. In this paper, we are committed to solving the user cold-start recommendation problem.
Although these existing models have shown effectiveness in the user cold-start recommendation task, most of them rely heavily on user attribute information and usually ignore the rich, heterogeneous relationship information between new users (new users and their corresponding attributes) and the existing historical information (existing users, items and their related attribute information), such as Interaction relationships between users and items, Co-occurrence relationships among users, items and attributes, and Inclusion relationships between tags, as shown in Figure 1(a).

With the rapid development of deep learning in various fields recently [2, 13, 22, 28, 30, 37, 47], Graph Neural Networks (GNNs) [9, 16, 38] have also attracted increasing attention for their excellent ability to model data consisting of elements and their dependencies, and they have brought significant improvements to recommendation systems [25, 36, 45]. This motivates us to leverage the capacity of GNN models to exploit such useful relationship information between users and existing historical information, and thereby obtain a superior recommendation model. However, most existing GNN-based recommendation methods aim to explicitly encode the crucial collaborative signal of user-item interactions to enhance user/item representations through a propagation process over user-item bipartite graphs. For example, STAR-GCN [48] leverages a stacked and reconstructed GCN encoder-decoder on the user-item bipartite interaction graph to obtain representations of users and items for cold-start recommendation. These methods regard side information or attributes of users and items merely as initial features, and also ignore the rich, heterogeneous relationship information among users, items and their corresponding attributes. In fact, such rich relationship information reflects hidden inter-dependencies among data and can be used to gather more relevant information for users. Therefore, we face Challenge 1: How to effectively model rich, heterogeneous and hidden relationship information, and further enrich user attributes based on it.

Another drawback of most existing user cold-start recommendation models is that they infer the representations of new users from their related content information but usually do not take into account the heterogeneity and the different impacts of this information. For instance, in Figure 1(b), each new user is associated with heterogeneous attribute information (e.g., locations, mobile phone types, installed apps). Among this heterogeneous attribute information, the installed apps should have more influence on the embedding of each new user, since app features are more representative than locations and mobile phone types for new users. Actually, there exist some heterogeneous GNN-based recommendation methods which leverage meta-paths to consider the heterogeneity of graphs.
For example, HeRec [31] is a heterogeneous graph representation learning based recommendation method, which can effectively extract different kinds of representation information in terms of different pre-designed meta-paths in heterogeneous graphs, and further combines these representations with extended matrix factorization models to improve personalized recommendation performance. Meta-path based models heavily depend on the selection of meta-paths, and cannot take advantage of the influence of different types of nodes on the current node when aggregating the content of neighboring nodes. Recently, some heterogeneous GNN-based recommendation methods have emerged that rely on heterogeneous aggregation to consider heterogeneous neighboring nodes. For instance, HetGNN [46], which mainly consists of a node-type based neighbor aggregating module and a heterogeneous node-type information combining module, is proposed for learning heterogeneous node representations by incorporating their heterogeneous content information. However, HetGNN only utilizes relatively simple sampling operations to obtain neighboring nodes on user-item bipartite graphs, and ignores the rich relationships among multi-modal attribute information and the impacts of these relationships on representation learning. Therefore, we need to address Challenge 2: How to consider the heterogeneity of nodes and the different impacts of multi-modal attributes and of the rich relationships among these multi-modal attributes when generating high-quality representations.

Moreover, conventional GNN-based recommendation methods straightforwardly produce the relational matrix from existing static relationship knowledge, which may not precisely match our objective and can cause undesirable model performance. For instance, the graph consisting of users and their related attributes is predefined and prebuilt based on their co-occurrence relationships, which is not suitable for new user representation learning in user cold-start recommendation systems, because the co-occurrence relationships of new users are unseen. Thus, Challenge 3 should be addressed: How to consider unseen, non-static relationships associated with new users to obtain comprehensive and high-quality representations for them.

To handle these challenges, we present a novel user cold-start recommendation model, namely the Inductive Heterogeneous Graph Neural Network (IHGNN). For Challenge 1, we first create a modality-aware heterogeneous graph (M-HG) to model the existing hidden heterogeneous relationships associated with users and items. Then, we design a sampling strategy based on random walk operations to sample heterogeneous neighbors for each node, which can be regarded as the node's enriched information. For each node, multiple samples are taken to generate multiple sets of sampled neighbors, since a single sample may miss important nodes. For Challenge 2, we design a new kind of hierarchical attention aggregation network that comprises two distinct levels of modules, an intra-type self-attention aggregating module and an inter-type attention aggregating module, to aggregate the node features of the sampled heterogeneous neighboring nodes. For each set of sampled neighbors, we first group the sampled neighboring nodes by their node types. Then, we apply an intra-type self-attention aggregating module to each generated neighboring group, which is designed to obtain meaningful
attention weights among homogeneous nodes to aggregate homogeneous node feature information. Based on the generated representations of all the different neighboring groups, we further use an inter-type attention aggregating module, designed to obtain significant attention weights over all neighboring groups, to generate a useful vector representation for each group of sampled neighbors. Finally, we fuse all group representations to produce the final representation of each node. For Challenge 3, we infer new users' representations based upon the inductive capacity of our proposed model. For all unseen new users, our model first builds connections with the constructed heterogeneous graph based on their sparse attributes. Then, we apply the multiple sampling strategy to each new user to generate its corresponding multiple sets of sampled neighbors. Finally, we generate the new users' representations with the well-trained hierarchical attention aggregation network. In addition, our proposed model predicts new users' preferences by calculating matching scores between users and items with their obtained embeddings. In summary, our contributions are listed as follows:

• We propose a novel Inductive Heterogeneous Graph Neural Network (IHGNN) for user cold-start recommendation. IHGNN uses a heterogeneous graph which takes the rich and heterogeneous relationship information among users, items and their corresponding attributes into consideration. Further, a sampling strategy based on random walk operations is proposed to sample correlated heterogeneous neighbors to produce better representations of new users.
• IHGNN utilizes a multiple hierarchical attention mechanism on the sampled related neighbors and considers the impacts of homogeneous and heterogeneous multimodal attributes and of the various neighboring node groups in order to obtain node embeddings, including those of new users.
• We show experimental results on three real data sets (the Kwai dataset, the Tiktok dataset, and the MovieLens dataset). Compared with the state-of-the-art models, our proposed model performs better on the user cold-start recommendation task.

2 RELATED WORK

This work is associated with cold-start recommendation tasks, graph neural networks, and heterogeneous graph neural networks, which are briefly reviewed below.

2.1 Cold-start Recommendation Tasks

While collaborative filtering [7, 17, 43, 44] has achieved impressive success in recommendation frameworks, difficulty often arises in dealing with new users or new items with sparse historical interaction information, commonly known as the cold-start recommendation problem. In such cases, the only way a personalized recommendation can be generated is to incorporate additional attribute information. Existing cold-start methods can be coarsely classified into content-based recommendation methods [6, 8, 26], MF-based hybrid methods [8, 18, 34], and deep learning based hybrid approaches [1, 3, 10, 20, 34, 41]. For example, SVDFeature [6], a feature-based collaborative filtering model, is proposed to handle matrix factorization with pre-trained features. The capacity to leverage pre-trained features permits us to construct factorization frameworks that incorporate side information such as neighbor relationships, temporal dynamics, and hierarchical information, which can be used to improve cold-start models' performance effectively.
A hybrid model [34] is designed to gather the leading factors of autoencoder frameworks in order to standardize the use of autoencoders for CF-based recommendation systems. It presents an appropriate training process for autoencoder models on deficient data, and further integrates both ratings and side data into a single autoencoder framework to handle the cold-start problem. Neural Matrix Factorization (NCF) [10] is a well-known collaborative filtering model that can effectively capture the critical signals in collaborative filtering information (interactions between users and items); it adopts matrix factorization and further applies an inner product operation to the learned representations of users and items. Existing neural matrix factorization methods can easily utilize the content information of users and items as pre-trained vectors to address the cold-start problem. Existing algorithms show promising performance in the area of cold-start recommendation. However, most of these methods heavily depend on new user attribute information and usually ignore the sparsity problem of new user attributes.

2.2 Graph Neural Networks

The main idea of graph neural networks (GNNs) is to employ neural networks to aggregate content information for nodes from their local neighbors. For example, the graph convolutional network [16], GraphSAGE [9] and the graph attention network [38] leverage a convolutional operation, an LSTM aggregator and a self-attention module, respectively, to gather neighboring information, and have been applied in many fields [12, 13, 27, 29]. Inspired by the advantage of GNN techniques in simulating the information diffusion process, recent efforts have studied the design of GNN-based recommendation methods, which mostly apply the GNN directly on the original user-item bipartite graph to learn more expressive representations of users and items [45]. For example, Multi-GCCF [35] explicitly incorporates multiple graphs in the embedding learning process. Multi-Graph Convolution Collaborative Filtering (Multi-GCCF) not only expressively models the high-order information via a bipartite user-item interaction graph, but also integrates short-range information by building user-user and item-item graphs, adding edges between two-hop neighbors on the original graph. In this way, the proximity information among users and items can be explicitly incorporated into user-item interactions. Multi-Component graph convolutional Collaborative Filtering (MCCF) [40] is designed to distinguish the latent purchasing motivations underneath the observed explicit user-item interactions. Specifically, MCCF uses a decomposer module to decompose the edges in the user-item graph to identify the latent components that may cause the purchasing relationship, and further recombines these latent components automatically to obtain unified embeddings for prediction. DGCF [14] tries to model the importance of diverse user-item relationships in collaborative filtering to obtain better interpretability, and considers user-item relationships at the finer granularity of user intents to generate disentangled representations.
Dual channel hypergraph collaborative filtering (DHCF) [39] leverages a divide-and-conquer strategy with collaborative filtering (CF) to integrate users and items together for recommendation while still maintaining their specific properties, and further employs a hypergraph structure to model users and items with explicit hybrid high-order correlations. Hierarchical bipartite Graph Neural Network (HiGNN) [21] alternately stacks multiple GNN modules and a deterministic clustering algorithm to effectively and efficiently address the problem of utilizing high-order connections and non-linear interactions through hierarchical representation learning on bipartite graphs, predicting user preferences at a larger scale. Meanwhile, some sampling strategies have been proposed to make GNNs efficient and scalable for large-scale graph-based recommendation tasks. For instance, PinSage [45] incorporates graph structure information and node content information (e.g., visual content, textual content) and uses a novel training method that depends on harder training samples to obtain useful representations of users and items for higher-quality recommendations at Pinterest. Moreover, GNNs can be applied to train and obtain more expressive representations for users and items in cold-start recommendation [25, 36, 42, 45, 48]. For example, RMGCNN [25] combines a graph convolutional network for multiple graphs, which can capture stationary patterns for users and items, with an RNN module that leverages a learnable diffusion process with non-linear operations to generate the known ratings. In detail, RMGCNN can extract graph-local statistical structure patterns for users and items, including cold-start users and cold-start items, in terms of their high-dimensional feature spaces, and further apply these learned expressive embeddings to predict interaction ratings. GCMC [36] is an autoencoder framework based on user-item bipartite graphs for the matrix completion problem. The autoencoder framework generates embeddings for users and items through a message-passing operation on the user-item bipartite interaction graph, and these representations are further leveraged to reproduce the links through a bilinear operation. Ying et al. [45] develop the highly-scalable GCN framework PinSage mentioned above, which combines a random walk operation and multiple graph convolution modules to generate node representations. Chen et al. [5] propose a general bipartite embedding method, FBNE (short for Folded Bipartite Network Embedding), for social recommendation, which explores the higher-order relationships among users and items by folding the user-item bipartite graphs, together with a sequence-based self-attention module that learns node representations from node sequences sampled from the graphs. FBNE aims to leverage implicit social relations in social graphs and higher-order implicit relations to enhance user/item representations and boost the performance of current social recommendation, including cold-start recommendation.
However, FBNE may ignore the rich and heterogeneous relationship information among users, items and their corresponding attributes, as shown in Fig. 1, and the heterogeneity of nodes and the different impacts of multi-modal attributes are also not taken into consideration when learning representations. STAR-GCN [48] leverages a stacked and reconstructed GCN encoder-decoder on the user-item bipartite interaction graph with intermediate supervision information to achieve better prediction performance. STAR-GCN masks a portion of, or all, user and item representations and reconstructs these masked representations with a graph encoder-decoder block during the training stage, which can make the learned embeddings more expressive and generalizes the method to obtain useful representations of unseen nodes for cold-start recommendation. However, the above GNN models are designed to embed homogeneous graphs or bipartite graphs and therefore cannot take advantage of the rich relationships connecting different types of heterogeneous data, and may not be applicable to actual cold-start scenarios.

2.3 Heterogeneous Graph Neural Networks

Heterogeneous graphs [30] can be used to model multiple complex object types and to capture the plentiful correlations between these types effectively. Many existing heterogeneous graph based recommendation models leverage various hand-designed semantic meta-paths combined with a matrix factorization framework for recommendation tasks [11, 32, 33, 49]. For example, Shi et al. [33] design a semantic meta-path based recommendation approach, termed SemRec, to flexibly calculate the ratings between users and items for personalized recommendation. In detail, SemRec introduces the weighted meta-path concept, which can be used to subtly portray various meta-path semantics by distinguishing diverse properties. In addition, grounded on well pre-designed meta-paths, SemRec can calculate prioritized and personalized attention values representing user preferences on various meta-paths while flexibly incorporating heterogeneous information. HeRec [31] is a heterogeneous graph representation learning based recommendation method, which can effectively extract different kinds of representation information in terms of different pre-designed meta-paths in heterogeneous graphs, and further combines these representations with extended matrix factorization models to improve personalized recommendation performance. Although the above approaches are designed to embed heterogeneous graphs, these models heavily rely on the meta-path design process and may not effectively capture high-order structural data for cold-start recommendation tasks. Recently, HetGNN [46], which mainly consists of a node-type based neighbor aggregating module and a heterogeneous node-type information combining module, was proposed for learning heterogeneous node representations by incorporating their heterogeneous content information. Although heterogeneous node-type information in various heterogeneous graphs can be effectively used by these two modules, we argue that HetGNN may not effectively incorporate the heterogeneous contents of nodes, due to the lack of plentiful relationship information between the multimodal contents of nodes. HHFAN [4] is also a heterogeneous representation learning based model which utilizes heterogeneous relationships to enhance representations of users and items for micro-video recommendation tasks, and it is not suitable for cold-start recommendation tasks.
Table 1. The main notations of our proposed model.

Notation: Description
$Item$: the item set
$User$: the user set
$Attr_{user}$: the attribute set of users
$Attr_{item}$: the attribute set of items
$T$: the tag trees
$T^i$: the $i$-th tag tree
$R$: the user-item interaction matrix
$r_{uv}$: the implicit feedback value of user $u$ and item $v$
$G = (V, E, OV, RE)$: a modality-aware heterogeneous graph
$V$: the node set
$E$: the edge set
$E_{tt}$: the inclusion relationships between tags
$OV$: the set of node types
$RE$: the set of edge types
$X$: the feature matrix
$x_v$: the feature of node $v$
$rel_v$: a related node of node $v$
$SN_v$: the sampled neighbor set of node $v$
$SN_v^t$: the $t$-type sampled neighbor set of node $v$
$aggregator^t$: the $t$-type neighbors aggregator function
$\alpha^{v,t}$: the attention weight of the $t$-type sampled neighbor set of node $v$
$\mathbf{E}_v^i$: the combined embedding of the $i$-th sampled neighbor set of node $v$
$\mathbf{UE}_v$: the final representation of node $v$
$\hat{r}_{uv}$: the predicted preference probability value of user $u$ and item $v$
$\lambda$: the regularization weight
$\Theta$: the parameters of the model
$d$: the aggregated embedding dimension
$k$: the ranking position
$top\ k$: the top k results

To utilize the highly complex relationships between users (including cold-start users), items, and their corresponding associated multimodal attributes, IHGNN leverages a modality-aware heterogeneous graph to learn their expressive representations for user cold-start recommendation tasks. Furthermore, a novel hierarchical feature aggregation network, which mainly comprises intra- and inter-type feature aggregating modules, is designed to incorporate the intricate graph structure information and the abundant node content semantic information contained in M-HGs, obtaining expressive representations of nodes, including cold-start nodes.

3 THE PROPOSED ALGORITHM

Fig. 2. IHGNN: Inductive Heterogeneous Graph Neural Network Architecture.

3.1 Problem Statement

In this paper, we focus on the user cold-start recommendation task, in which all items are denoted as $Item = \{item_1, item_2, \dots, item_{|Item|}\}$ and all new users are denoted as $User = \{user_1, user_2, \dots, user_{|User|}\}$. Each new user is related to sparse attributes, denoted as $Attr_{user}$ (e.g., locations, phone information, apps installed on their phones): $Attr_{user} = \{attr_1^u, attr_2^u, \dots, attr_{|Attr_{user}|}^u\}$. Similarly, the set of multimodal attributes of each item is defined as $Attr_{item}$ (e.g., locations, related tags, visual content, audio content, textual descriptions of items): $Attr_{item} = \{attr_1^i, attr_2^i, \dots, attr_{|Attr_{item}|}^i\}$. Since many of the used attributes of users (such as apps) and items are related to tags, we adopt pre-designed tag trees to illustrate the associations among the various tags. In this paper, we define the set of pre-designed tag trees as $T = \{T^1, T^2, \dots, T^{|T|}\}$ and a tag tree as $T^i$, $1 \le i \le |T|$. We build a user-item bipartite interaction matrix $R \in \mathbb{R}^{|User| \times |Item|}$, in which each entry $r_{uv}$ is derived from the user's implicit feedback data: $r_{uv} = 1$ indicates that an interaction exists between new user $u$ and item $v$, and $r_{uv} = 0$ indicates that no interaction exists between new user $u$ and item $v$. Note that only new users in the training set have interaction data, while new users in the testing set have no interaction data.
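As a concrete illustration of this setup, the following sketch assembles the implicit-feedback matrix $R$ from raw interaction pairs; the function and variable names are illustrative assumptions, not the paper's code.

```python
import numpy as np

# Sketch of the implicit-feedback matrix R from Section 3.1.
# r[u, v] = 1 for an observed interaction and 0 otherwise.
def build_interaction_matrix(interactions, n_users, n_items):
    """interactions: iterable of (user_index, item_index) pairs."""
    R = np.zeros((n_users, n_items), dtype=np.int8)
    for u, v in interactions:
        R[u, v] = 1
    return R

# Training users have interactions; testing (cold-start) users keep
# all-zero rows, matching the note above.
R = build_interaction_matrix([(0, 2), (0, 5), (1, 2)], n_users=3, n_items=6)
assert R[0, 2] == 1 and R[2].sum() == 0
```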
In the training phase, we strive to acquire appropriate high-quality representations for new users and items from their attribute information and the correlations between them, which can preserve their historical interaction data effectively. In the testing phase, we focus on inferring the representations of the new users in the test group and further computing rating scores for preference estimation, predicting these new users' preferences for cold-start user recommendation. Table 1 lists the key notations of the IHGNN model.

3.2 Overall Framework

To handle the challenges raised in Section 1, we propose a well-designed framework, the Inductive Heterogeneous Graph Neural Network (IHGNN), for user cold-start recommendation. Our framework, as shown in Figure 2, comprises four key components:

• Heterogeneous Graph Construction: A modality-aware heterogeneous graph (M-HG) is built to model users (including new users), items, and the associated attribute data, based on rich relationship information such as the relational data among users, items and their attributes, the user-item bipartite graph information, and the tag trees.
• Multiple Hierarchical Attention Aggregation Networks: For each node in the constructed heterogeneous graph, we design a sampling strategy based on random walk operations and apply it to sample a fixed-size set of associated heterogeneous neighboring nodes. Then, multiple samples are taken to create multiple groups of sampled neighbors and thereby capture more critical neighbors. To gather the node feature information of all sets of sampled heterogeneous neighboring nodes for each node, we propose an innovative hierarchical attention aggregation network applied to each set of sampled neighbors in three steps: (1) Grouping operation: we group the sampled neighbors in each set according to the types of the neighbor nodes. (2) Intra-type feature aggregating module: for each node group, we leverage an intra-type self-attention aggregating module to aggregate the content features of all neighboring nodes and generate each group's aggregated type-based representation. (3) Inter-type feature aggregating module: after generating the representation of each heterogeneous type-based neighbor group, an inter-type attention aggregating module is designed to quantify the different influences of the heterogeneous groups and to calculate the embedding of each set of sampled neighbors taking these learned influences into account. We further design a fusion module to merge all the embeddings of the sampled heterogeneous neighbor sets into the final representation of each node in the heterogeneous graph.
• Model Optimization: To learn representations of the constructed heterogeneous graph that preserve the implicit structure information and to generate high-quality representations of new users, we define the KL-divergence between the hypothetical distribution and the empirical distribution as the loss function, optimized through backpropagation and mini-batch Adaptive Moment Estimation (Adam) [15].
• Inductive Representation Learning for User Cold-start Recommendation: We infer new users' representations based upon the inductive capacity of our proposed model. For all unseen new users, our model first builds connections with the constructed heterogeneous graph.
Then, we apply the multiple sampling strategy to each new user to generate its corresponding multiple sets of sampled neighbors. Finally, we generate the new users' representations with the well-trained hierarchical attention aggregation network. After optimization, we can infer embeddings for new users in the testing sets using our proposed model. An inner product operation is utilized to integrate the inferred embeddings of new users and candidate items and to predict the preference likelihood, which indicates the level of preference of a new user for a candidate item.

4 METHODOLOGY

This section presents our Inductive Heterogeneous Graph Neural Network (IHGNN) model for user cold-start recommendation.

4.1 Heterogeneous Graph Construction

To consistently and effectively consider users, items, their respective attributes, and the different associated relationship information, we build a modality-aware heterogeneous graph (M-HG), denoted as $G = (V, E, OV, RE)$. In an M-HG, $V$ represents the various kinds of nodes and $E$ represents the various relationships between nodes, where $V = User \cup Item \cup Attr_{user} \cup Attr_{item} \cup T$ and $E = R \cup E_{ua} \cup E_{ia} \cup E_{ut} \cup E_{it} \cup E_{tt}$. $E_{ua}$ represents the connections between users and their corresponding sparse attributes; $E_{ia}$ represents the connections between micro-videos and their corresponding multi-modal attributes; $E_{ut}$ represents the connections between users and their corresponding tags; $E_{it}$ represents the connections between micro-videos and their corresponding tags; and $E_{tt}$ denotes the inclusion connections between tags within the tag trees. $OV$ denotes the node type set, which comprises the user node type, the item node type and the attribute node types: $OV = \{ov^{user}, ov^{item}, ov^{attr_{user}}, ov^{attr_{item}}, ov^{tag}\}$. $RE$ is the relationship type set, which includes the relation types of $R$, $E_{ua}$, $E_{ia}$, $E_{ut}$, $E_{it}$ and $E_{tt}$: $RE = \{re^{R}, re^{ua}, re^{ia}, re^{ut}, re^{it}, re^{tt}\}$.

Fig. 3. Hierarchical Attention Aggregation Networks.

4.2 Multiple Hierarchical Attention Aggregation Networks

For each node $v$, $\forall v \in V$, we aim to enrich its node feature representation by aggregating its related node information in the constructed graph $G$, which can be formulated as follows:

$$Emb(v) = \sum_{rel_v} p \cdot Emb(rel_v) \quad (1)$$

where the function $Emb(\cdot)$ is interpreted as the embedding of a node, $rel_v$ is a random variable representing a related node of node $v$, and the variable $p$ ($p = importance(rel_v)$) can be interpreted as the importance or impact of the node $rel_v$. However, there are plenty of related nodes for each node $v$, so it is computationally expensive to gather information from all of its related nodes. To deal with this problem, we use a sampling strategy, named the Random Walk Sampling Strategy ($RWSS$), to reduce the computational cost, so that Equation 1 can be reformulated as follows:

$$Emb(v) = \frac{1}{|SN_v|} \sum_{rel_v \in SN_v} Emb(rel_v) \quad (2)$$

where $SN_v$ represents the sampled neighbors of node $v$. Specifically, $RWSS$ is designed in two steps:

• Beginning a random walk of fixed length from node $v$, $\forall v \in V$. The walk travels iteratively to the current node's neighbors, or restarts from the beginning node with a certain probability $p$. The walk operation runs until a fixed number of nodes has been collected, referred to as $RW(v)$.
Note that the number of nodes of each type in $RW(v)$ is fixed, to guarantee that every node type is sampled for $v$.
• Selecting several types of neighboring nodes. For node $v$, we find the top $k_t$ nodes of node type $t$ from $RW(v)$ based on their frequencies, and denote these selected nodes as the set of $t$-type associated neighbors of node $v$: $SN_v^t = \{node_1, node_2, \dots, node_{|SN_v^t|}\}$.

Compared with the general random walk method, $RWSS$ has two advantages that are important for enriching the representation of each node, especially cold-start nodes: (1) For each node in the constructed graph M-HG, $RWSS$ ensures that each type of node, whether a first-order or a high-order neighbor, can be evenly sampled, by constraining the number of each type of node in the sampling result. (2) For each node $v$, $RWSS$ selects the top $k_t$ nodes of node type $t$ from $RW(v)$ according to their frequencies, which better reduces the negative impact of noisy nodes in the constructed graph M-HG on the performance of our model. Furthermore, to ensure the effect of the sampling strategy, $RWSS$ is conducted multiple times, and the result is denoted as $MSN_v = \{SN_v^1, SN_v^2, \dots, SN_v^{|MSN_v|}\}$. Note that the multiple sampling operation may also be time-consuming, so for each node we perform the sampling operation in advance, during the pre-processing phase of the experiments. At the same time, we analyze the relationship between the number of samplings and the effect of the model in Fig. 8 in the experimental section, to find sampling numbers that offer a good trade-off.

In an M-HG $G$, the representation of a node $v \in V$ is denoted as $x_v \in \mathbb{R}^{d_f \times 1}$, where $d_f$ represents the dimension of the representations. Note that we can leverage CNN-based methods [23] to pre-train nodes whose type is image, or the doc2vec model [19] to pre-train nodes that are textual; nodes can also be initialized based on their types. Here we adopt a content vector transformer module, $FC$, to map the embeddings of the various node types into a unified space. Formally, the transferred representation of node $v$ is calculated as follows:

$$f(v) = FC(x_v) \quad (3)$$

where $f(v) \in \mathbb{R}^{d \times 1}$, and $d$ is the transferred representation dimension. To aggregate the node representations transferred by $FC$ of all sampled neighboring nodes of node $v$, we propose an innovative hierarchical attention aggregation network, as shown in Figure 3, applied to each set of sampled neighbors in three steps: (1) a grouping module; (2) an intra-type feature aggregating module; (3) an inter-type feature aggregating module.

4.2.1 Grouping module. After applying the $RWSS$ described in the previous section, the multiple sampling results $MSN_v$ are obtained. For each set of sampled neighbors $SN_v^i$, we first group the nodes based on their node types. These groups fall into three categories: several multimodal attribute neighbor node groups, a user neighbor node group, and an item neighbor node group. The multimodal attribute neighbor node groups are further divided into four subcategories (an image neighbor node group, an audio neighbor node group, a tag neighbor node group, and a text neighbor node group) and represent the related multimodal attribute information of users (including new users) and items. Here, we define the $t$-type sampled neighboring node group in $SN_v^i$ as $SN_v^{i,t}$. A code sketch of the sampling and grouping steps is given below.
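The sketch below illustrates the RWSS sampling and grouping steps under simplifying assumptions: the M-HG is reduced to an adjacency dictionary plus a node-type map, the per-type budget is a single top_k, and all names are hypothetical rather than the paper's implementation.

```python
import random
from collections import Counter, defaultdict

def rwss(adj, node_type, start, walk_len=100, restart_p=0.5, top_k=5, seed=0):
    """Random walk with restart from `start`; keep the top-k most
    frequently visited nodes per node type (Section 4.2)."""
    rng = random.Random(seed)
    visited, cur = Counter(), start
    for _ in range(walk_len):
        if rng.random() < restart_p or not adj[cur]:
            cur = start                    # restart from the beginning node
        else:
            cur = rng.choice(adj[cur])     # travel to a neighbor
        if cur != start:
            visited[cur] += 1
    groups = defaultdict(list)             # grouping module: split by type
    for node, _freq in visited.most_common():
        t = node_type[node]
        if len(groups[t]) < top_k:
            groups[t].append(node)
    return dict(groups)                    # {type: [top-k frequent nodes]}

def multi_sample(adj, node_type, start, n_samples=3):
    """Multiple sampling runs, since one run may miss important nodes."""
    return [rwss(adj, node_type, start, seed=s) for s in range(n_samples)]

# Tiny illustrative graph
adj = {"u1": ["v1", "a1"], "v1": ["u1", "a1"], "a1": ["u1", "v1"]}
types = {"u1": "user", "v1": "item", "a1": "attr"}
print(multi_sample(adj, types, "u1"))
```

Because most_common() orders visited nodes by frequency, each per-type group keeps the most frequently reached neighbors, mirroring the noise-reduction argument made above.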
4.2.2 Intra-type feature aggregating module. For the $t$-type group $SN_v^{i,t}$, we use a neural network to generate the group representation from the nodes $v' \in SN_v^{i,t}$. Formally, the aggregated $t$-type neighboring node group representation for $v$ can be formulated as follows:

$$f^t(v) = aggregator^t_{\,v' \in SN_v^{i,t}}\{f(v')\} \quad (4)$$

where $f^t(v) \in \mathbb{R}^{d \times 1}$, $d$ represents the dimension of the aggregated $t$-type neighboring node group representation, $f(v')$ is the transferred node representation of node $v'$, and $aggregator^t$ is the $t$-type neighbors aggregator function. A self-attention technique can be applied as the $aggregator^t$ function to obtain attention values between the homogeneous nodes in each group. Following the Transformer [37], which is composed of a stack of multi-head self-attention layers and point-wise, fully connected layers for both the encoder and the decoder, we define our self-attention intra-type feature aggregation module.

Given a set of input features, denoted as $X \in \mathbb{R}^{L \times d}$, self-attention transforms them into the matrices of queries $Q \in \mathbb{R}^{L \times d}$, keys $K \in \mathbb{R}^{L \times d}$ and values $V \in \mathbb{R}^{L \times d}$, given by:

$$Q = (X + PE)W_Q, \quad K = (X + PE)W_K, \quad V = XW_V \quad (5)$$

where $W_Q$, $W_K$ and $W_V \in \mathbb{R}^{d \times d}$ are learnable projection weights, $PE$ is the absolute positional embedding of the features, $L$ is the number of rows of the input features $X$, and $d$ is the dimension of the input features $X$. Note that we use $PE$ to represent the relative position sorted by frequency within each type-based neighboring node group. The attention weights $A$ are calculated as follows:

$$A = softmax\left(\frac{QK^{\top}}{\sqrt{d}}\right) \quad (6)$$

where the $softmax$ function is applied to obtain the weights of the values $V$, and $\sqrt{d}$ is a scaling factor. Further, the output weighted average vectors $\hat{X}$, combined with the residual operation, can be formulated as follows:

$$\hat{X} = AV + X \quad (7)$$

To improve the capacity, our self-attention module can be extended to the multi-head version, whose output weighted average vectors $\hat{X}_{mul}$ are calculated as follows:

$$\hat{X}_{mul} = concat\{\hat{X}_{head\_1}, \dots, \hat{X}_{head\_h}\} \quad (8)$$

where $concat$ is a feature concatenation function, and $\hat{X}_{head\_i}$ is the output of the $i$-th self-attention head module. Moreover, as is well known, the atomic operation of the self-attention mechanism, namely the canonical dot-product, causes the time complexity and memory usage per layer to be $O(L^2)$. A stack of $J$ encoder layers makes the total memory usage $O(J \cdot L^2)$, which limits the model's scalability on long and large inputs. So, in order to reduce the space-time complexity and improve the efficiency of the self-attention aggregation module, we randomly sample keys to calculate the attention weights. Controlled by a constant sampling factor $c$, we set the number of sampled keys to $c \cdot \ln L$ for each query, which makes our self-attention aggregation module only need to calculate $O(\ln L)$ dot-products for each query-key lookup, while the layer memory usage stays at $O(L \ln L)$. Then the output vectors $\hat{X}_{mul}$ of our self-attention module are fed to a position-wise feed-forward function (FFN) with a single non-linearity, applied independently to each element of the set, given by:

$$FFN(\hat{X}_{mul}) = W_2\, \sigma(W_1 \hat{X}_{mul} + b_1) + b_2 \quad (9)$$

where $\sigma(\cdot)$ is the ReLU activation function, $W_1$ and $W_2$ are learnable weights, and $b_1$ and $b_2$ are bias terms.
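A single-head NumPy sketch of Eqs. (5)-(9) follows; multi-head stacking (Eq. 8), the sampled-key approximation, and the layer normalization introduced next are omitted for brevity, and all parameter names are illustrative assumptions.

```python
import numpy as np

# Single-head sketch of the intra-type self-attention aggregator
# (Eqs. 5-9). Multi-head stacking, sampled keys, and layer norm are
# omitted; every parameter name here is an illustrative assumption.
def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def intra_type_aggregate(X, PE, Wq, Wk, Wv, W1, b1, W2, b2):
    """X: (L, d) features of one type group; PE: (L, d) positional
    embeddings sorted by frequency. Returns the (d,) group vector."""
    Q = (X + PE) @ Wq                            # Eq. 5
    K = (X + PE) @ Wk
    V = X @ Wv
    A = softmax(Q @ K.T / np.sqrt(X.shape[1]))   # Eq. 6, scaled dot-product
    X_hat = A @ V + X                            # Eq. 7, residual connection
    H = np.maximum(X_hat @ W1 + b1, 0.0)         # ReLU non-linearity
    out = H @ W2 + b2                            # Eq. 9, position-wise FFN
    return out.mean(axis=0)                      # group average (cf. Eq. 11)
```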
Finally, for ease of discussion, we denote the above process together with a layer normalization function $LN$ as $SALN$:

$$SALN(\hat{X}_{mul}) = LN(FFN(\hat{X}_{mul})) \quad (10)$$

Based on the above self-attention module, we reformulate $f^t(v)$ as follows:

$$f^t(v) = \frac{\sum_{v' \in SN_v^{i,t}} SALN\{f(v')\}}{|SN_v^{i,t}|} \quad (11)$$

where we leverage the above self-attention module to aggregate the transferred node representations of the $t$-type neighbors, and perform an averaging operation to generate the representation of the aggregated $t$-type neighboring node group.

4.2.3 Inter-type feature aggregating module. After the above step, $|OV|$ aggregated representations are obtained for $SN_v^i$ of node $v$, defined as $\mathbf{E} \in \mathbb{R}^{|OV| \times d}$. There are $|OV|$ node types in the M-HG $G$, and different types of neighboring nodes may contribute differently to the generation of node representations. To fuse these aggregated neighbor representations into the representation $\mathbf{E}_v^i$ for $SN_v^i$ of node $v$, while considering their influences on node $v$, we employ the attention method, which can be formulated as follows:

$$\mathbf{E}_v^i = \sum_t \alpha^{v,t}\, \mathbf{E}^t \quad (12)$$

$$\alpha^{v,t} = \frac{\exp(LeakyReLU(a^{\top}[f(v) \oplus \mathbf{E}^t]))}{\sum_{t'} \exp(LeakyReLU(a^{\top}[f(v) \oplus \mathbf{E}^{t'}]))} \quad (13)$$

where $\mathbf{E}_v^i \in \mathbb{R}^{d \times 1}$ is the combined representation for $SN_v^i$ of node $v$, $\oplus$ denotes the concatenation operation, $\alpha^{v,t}$ demonstrates the importance of the different neighboring groups' embeddings, and $a \in \mathbb{R}^{2d \times 1}$ is the trainable attention parameter. After the hierarchical attention aggregation operation for each set of sampled neighbors $SN_v^i$ of node $v$, we obtain $|MSN_v|$ node embeddings for $MSN_v$, denoted as $MES_v = \{\mathbf{E}_v^1, \mathbf{E}_v^2, \dots, \mathbf{E}_v^{|MSN_v|}\}$. To fuse these representations into the ultimate representation $\mathbf{UE}_v$ of node $v$, we design a fusion module formulated as follows:

$$\mathbf{UE}_v = FC\{concat\{\mathbf{E}_v^1; \mathbf{E}_v^2; \dots; \mathbf{E}_v^{|MSN_v|}\}\} \quad (14)$$

where $\mathbf{UE}_v \in \mathbb{R}^{d \times 1}$, the function $FC(\cdot)$ is a fully connected layer, and the function $concat(\cdot)$ is a concatenation function that concatenates all the representations in $MES_v$. Note that we apply the same dimension $d$ to the transferred node representation, the aggregated $t$-type neighboring node group representation, and the concatenated ultimate representation of node $v$, to make IHGNN easier to adjust in this paper.

4.3 Model Optimization

To learn a representation $\mathbf{UE}_v \in \mathbb{R}^{d \times 1}$ of each node $v$ that preserves the implicit structure information of the M-HG, the loss $\mathcal{L}$ is defined as the KL-divergence between the hypothetical distribution $p(v_j|v_i)$ and the empirical distribution $\hat{p}(v_j|v_i)$ over $N(v_i)$, which is the set of direct neighbors of node $v_i$:

$$p(v_j|v_i) = \frac{\exp(\mathbf{UE}_{v_j}^{\top} \mathbf{UE}_{v_i})}{\sum_{v_k \in N(v_i)} \exp(\mathbf{UE}_{v_k}^{\top} \mathbf{UE}_{v_i})} \quad (15)$$

$$\mathcal{L} = KL(p(\cdot|\cdot), \hat{p}(\cdot|\cdot)) \quad (16)$$

where $KL(\cdot, \cdot)$ denotes the KL-divergence. The empirical probability $\hat{p}(v_j|v_i)$ is set to 1 if $v_j \in N(v_i)$ and to 0 otherwise. With $KL(\cdot, \cdot)$ expanded and a few constants dropped, the loss $\mathcal{L}$ can be transformed as follows:

$$\mathcal{L} = -\sum_{v_j \in N(v_i)} \log p(v_j|v_i) \quad (17)$$

Optimizing the loss $\mathcal{L}$ demands a full scan of the neighbors $N(v_i)$ of each node $v_i$, which incurs much computational cost. Thus, we leverage negative sampling [24] and transform the loss $\mathcal{L}$ as follows:

$$\mathcal{L} = -\sum_{(i,j,k) \in TS} \ln \sigma(\mathbf{UE}_{v_i}^{\top} \mathbf{UE}_{v_j} - \mathbf{UE}_{v_i}^{\top} \mathbf{UE}_{v_k}) + \lambda \|\Theta\|^2 \quad (18)$$

where $\sigma(\cdot)$ is the sigmoid function and $TS$ is a set of sampled node triples, in which $i$, $j$ and $k$ index the $i$-th node $v_i$, the $j$-th node $v_j$ and the $k$-th node $v_k$, respectively. Each sampled node triple satisfies $v_j \in N(v_i)$ and $v_k \notin N(v_i)$. $\Theta$ and $\lambda$ are the parameters of our IHGNN and the regularization weight, respectively. The loss $\mathcal{L}$ can be optimized by the back-propagation method and the Adam optimizer [15].
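The following sketch strings together the inter-type attention (Eqs. 12-13), the fusion layer (Eq. 14), and the negative-sampling objective (Eq. 18); the LeakyReLU slope, all shapes, and all names are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

# Hedged sketch of inter-type attention (Eqs. 12-13), fusion (Eq. 14),
# and the negative-sampling loss (Eq. 18). All names are illustrative.
def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def inter_type_attention(f_v, group_embs, a):
    """f_v: (d,) node feature; group_embs: (T, d) per-type aggregates;
    a: (2d,) trainable attention vector. Returns the (d,) combination."""
    scores = np.array([max(s, 0.01 * s)          # LeakyReLU, slope assumed
                       for s in (np.concatenate([f_v, g]) @ a
                                 for g in group_embs)])
    alpha = softmax(scores)                      # Eq. 13
    return alpha @ group_embs                    # Eq. 12

def fuse(sample_embs, W, b):
    """Eq. 14: concatenate embeddings from all sampling rounds, then FC."""
    return W @ np.concatenate(sample_embs) + b

def loss(UE, triples, lam, theta_l2):
    """Eq. 18: (i, j, k) triples with v_j in N(v_i) and v_k not in N(v_i)."""
    total = 0.0
    for i, j, k in triples:
        x = UE[i] @ UE[j] - UE[i] @ UE[k]
        total += np.log1p(np.exp(-x))            # -ln sigmoid(x)
    return total + lam * theta_l2                # L2 regularization term
```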
4.4 Inductive Representation Learning for User Cold-start Recommendation

After optimization, we can obtain the learned representation $\mathbf{UE}_v \in \mathbb{R}^{d \times 1}$ of each node $v$ in the M-HG $G$. Given a new user $u$ in the testing set, with sparse attributes, we can infer the representation of the new user $u$ based upon the learned framework and the learned representations of the nodes in the M-HG $G$ in three steps, as shown in Fig. 4:

Fig. 4. Inductive Representation Learning for New Users.

Algorithm 1 Training IHGNN Algorithm
Input: The heterogeneous graph M-HG G; the training user-item pairs P; batch size B;
Output: Multiple hierarchical attention aggregation networks;
1: Initialize epoch t = 0; randomly initialize the parameters Θ of our model
2: repeat
3:   t = t + 1;
4:   for ⌊|P|/B⌋ iterations do
5:     Obtain batch-size-B user-item pairs BP.
6:     for each user-item pair (u, v) in BP do
7:       for each node v in (u, v) do
8:         Perform the RWSS operation multiple times and obtain the result MSN_v for v.
9:         for each sampled neighbor set SN_v^i in MSN_v do
10:          Group these sampled neighboring nodes in terms of their types for SN_v^i and obtain the type-based sampled neighbor groups SN_v^{i,t}.
11:          for each t-type based sampled neighbor group SN_v^{i,t} in SN_v^i do
12:            Obtain the aggregated t-type neighbors' embedding for v through Eq. 4, Eq. 7 and Eq. 11;
13:          end for
14:          Obtain the aggregated neighbor embeddings E^t for SN_v^i of node v;
15:        end for
16:        Combine the embeddings E^t of SN_v^i into the representation of node v by taking the impacts of these representations of node v into consideration through Eq. 12 and Eq. 13, and obtain the final embedding E of node v;
17:      end for
18:      Obtain the final embeddings E_u and E_v of user u and item v.
19:    end for
20:    Calculate the objective function by Eq. 18, backpropagate gradients and update the network parameters;
21:  end for
22: until convergence;

• Firstly, given a new user $u$ with corresponding sparse attribute information (e.g., phone information, locations, phone apps), we consider $u$ as a graph node and connect the new user node to the input heterogeneous graph M-HG $G$ of the training process, based on the sparse attributes of the new user. This operation is the inductive new-user representation learning of our IHGNN model.
• Secondly, we regard the new user $u$ as a target user and infer the ultimate embedding of the new user $u$ with the well-trained multiple hierarchical attention aggregation networks, which include the sampling operation with RWSS, the multiple hierarchical attention aggregations, and the fusion operation.
• Finally, we calculate a preference score $\hat{r}_{uv}$, which indicates how much the user $u$ prefers the candidate item $v$, from their final learned representations.
Formally, we give the definition of the preference score calculation function as follows:

$$\hat{r}_{uv} = \sigma(\mathbf{UE}_u^{\top} \mathbf{UE}_v) \quad (19)$$

where $\sigma(x) = \frac{\exp(x)}{1 + \exp(x)}$ is the sigmoid function, $\mathbf{UE}_u$ and $\mathbf{UE}_v$ are the learned final representations of user $u$ and candidate item $v$ respectively, and $\hat{r}_{uv}$ is the predicted preference score for user $u$ and candidate item $v$. For the new users in the testing set, although they have no historical interaction data and do not appear in the training data, the attribute data (e.g., phone information, locations, phone apps) of new users and existing users are shared, so we can leverage these shared attributes to infer the representations of new users. Note that the representations of these shared attributes are learned in the training process, and we initialize the representations of new users by summing the learned representations of their sparse attributes.

4.5 Algorithm Description

Our algorithm is illustrated in Algorithm 1. Given the heterogeneous graph M-HG G, the training user-item pairs P and the batch size B, our goal is to learn a multiple hierarchical attention aggregation network that can leverage the shared attributes of new users to infer their representations for predicting their preferences for micro-videos. The process of our algorithm is described in Algorithm 1, and the core is to train a high-quality multiple hierarchical attention aggregation network through back-propagation and mini-batch Adaptive Moment Estimation (Adam).

5 EXPERIMENTAL RESULTS

5.1 Dataset

We use Kwai, Tiktok and MovieLens for evaluation. Their statistics are summarized in Table 2, and we briefly describe them as follows:

• Tiktok dataset (http://ai-lab-challenge.bytedance.com/tce/vc/): This dataset is published by the popular micro-video platform Tiktok. It contains micro-videos created by users registered on the platform and user-video interactions (e.g., click, like). We use the multi-modal micro-video features from the original dataset, which omits the raw data.
• Kwai dataset (https://www.kwai.com/): This dataset is extracted from the real-world micro-video sharing platform Kwai, and contains users associated with attributes, micro-videos associated with attributes, and relationship information including user-video interactions.
• MovieLens (MLs) dataset (http://grouplens.org/datasets/movielens/1m/): MovieLens is a movie rating dataset which has been extensively applied to CF recommendation algorithms. We use the one-million-rating version, which removes users with fewer than 20 rating records. To count the numbers, we build a vector for each user in which each entry is 0 or 1, indicating whether the user has a record for the movie.

Table 2. The statistics of the three real-world data sets.

Dataset     Users     Micro-videos   Interactions    Density
Tiktok      3,656     7,085          1,253,112       4.49%
Kwai        169,878   310,681        775,834,643     1.47%
MovieLens   6,040     3,706          1,000,209       4.47%
5.2 Baselines

To evaluate the performance of IHGNN, we consider several state-of-the-art approaches as baselines, including traditional methods and graph-based methods. Note that, for all baselines, we conduct experiments in the user cold-start scenario.

• FM: FM effectively combines factorization machine (FM) based frameworks with side content feature information (e.g., locations) beyond the user and the item for recommendation tasks. In this work, we feed heterogeneous information as side features into the FM model for user cold-start recommendation tasks.
• NCF: Neural Matrix Factorization (NCF) [10] fuses matrix factorization methods and neural networks to train on and predict user-item bipartite interaction information for recommendation tasks.
• GraphSAGE: GraphSAGE [9] is an unsupervised inductive graph representation learning framework for large graphs. GraphSAGE can be utilized to obtain expressive low-dimensional representations for graph nodes, including nodes unseen in the training stage, thanks to its inductive learning capacity. In particular, it is very useful for incorporating graph structure information and rich node attribute information from neighboring nodes.
• STAR-GCN: STAR-GCN [48] designs a stacked and reconstructed GCN framework on user-item bipartite interaction graphs. STAR-GCN requires obtaining several rated edges connected with new nodes in the testing graph and further leverages these edges to make predictions, which may be suitable for cold-start problems.
• HeRec: HeRec [31] is a heterogeneous graph representation learning based recommendation method, which can effectively extract different kinds of representation information in terms of different pre-designed meta-paths in heterogeneous graphs, and further combines these representations with extended matrix factorization models to improve personalized recommendation performance.
• HetGNN: HetGNN [46] is a graph representation learning method for learning heterogeneous node representations by incorporating their heterogeneous content information. HetGNN mainly consists of a node-type based neighbor aggregating module and a heterogeneous node-type information combining module to consider the heterogeneity of graphs.
• IHGNN: IHGNN is our proposed recommendation model, which leverages Modality-aware Heterogeneous Graphs (M-HGs) to preserve the rich and heterogeneous relationships among users, items and their relevant attribute information. Furthermore, IHGNN utilizes a well-designed hierarchical attentive aggregation module to learn the representations of nodes, including new users, considering the heterogeneity of M-HGs for user cold-start recommendation tasks.

5.3 Experimental Settings

For each dataset, we randomly hold out 80% and 60% of the users for training, and the remaining users are treated as the testing sets, respectively. To evaluate our approach and the compared baselines on user cold-start recommendation, we utilize four widely used evaluation metrics [42]: Normalized Discounted Cumulative Gain at top k (NDCG@k), Recall at top k (R@k), Precision at top k (P@k), and Area under the ROC Curve (AUC), as sketched below.
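The sketch below computes P@k, R@k and NDCG@k for one user with binary relevance; AUC would typically come from a standard library. All names are illustrative.

```python
import numpy as np

# Top-k metric sketch with binary relevance (illustrative names only).
def metrics_at_k(ranked_items, relevant, k=10):
    topk = ranked_items[:k]
    hits = [1.0 if it in relevant else 0.0 for it in topk]
    precision = sum(hits) / k
    recall = sum(hits) / max(len(relevant), 1)
    dcg = sum(h / np.log2(i + 2) for i, h in enumerate(hits))
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    ndcg = dcg / ideal if ideal > 0 else 0.0
    return precision, recall, ndcg

# Example: a user's ranked list vs. ground-truth interactions
print(metrics_at_k([3, 7, 1, 9], relevant={1, 3}, k=4))
```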
In practice, following the experimental settings of the recommendation model NeuCF [10] and the micro-video recommendation model MMGCN [42], both popular recommendation methods, we set top k = 10 and report the average scores over the testing set. In the training phase, we tune the hyper-parameters of our IHGNN model using cross-validation and search the hyper-parameters with a popular grid search method. Specifically, we first randomly initialize the parameters of our model from a Gaussian distribution with mean 0 and standard deviation 0.02. To optimize our IHGNN model, we adopt the widely used Adaptive Moment Estimation (Adam) optimizer [15] in a mini-batch manner. The batch size is selected from {128, 256, 512}, the learning rate is searched in {1e-4, 1e-3, 1e-2, 5e-4, 5e-3, 5e-2}, and the regularizer is selected from {1e-5, 1e-4, 5e-2, 1e-2, 1e-3}. Because we find the experimental conclusions are consistent when varying the dimension of the embeddings, unless otherwise specified we report our results with d = 200, which achieves relatively good performance.

Table 3. Experimental results of IHGNN and baselines on all datasets (training_ratio = 0.6, k = 10).

Dataset   Metric   FM       NCF      GraphSAGE   STAR-GCN   HeRec    HetGNN   IHGNN
Kwai      Pre      0.1901   0.2245   0.2891      0.2901     0.3789   0.3835   0.3931
Kwai      Rec      0.1773   0.2017   0.2991      0.3011     0.3858   0.3713   0.4011
Kwai      NDCG     0.2011   0.2441   0.2997      0.3012     0.3591   0.3812   0.3901
Kwai      AUC      0.6003   0.6521   0.7001      0.6881     0.7111   0.7311   0.7402
Tiktok    Pre      0.2012   0.1991   0.3015      0.2817     0.3601   0.3679   0.3721
Tiktok    Rec      0.2101   0.2048   0.3301      0.2918     0.3901   0.3939   0.4005
Tiktok    NDCG     0.2011   0.1811   0.3339      0.2912     0.3802   0.3811   0.4043
Tiktok    AUC      0.6218   0.5991   0.6981      0.6725     0.7317   0.7411   0.7512
MLs       Pre      0.2442   0.2015   0.3129      0.2719     0.3598   0.3611   0.3701
MLs       Rec      0.1911   0.1999   0.3195      0.2991     0.3759   0.3753   0.3871
MLs       NDCG     0.2312   0.2331   0.3194      0.2849     0.3598   0.3522   0.3701
MLs       AUC      0.6001   0.6419   0.6884      0.6512     0.7101   0.7159   0.7233

Table 4. Experimental results of IHGNN and baselines on all datasets (training_ratio = 0.8, k = 10).

Dataset   Metric   FM       NCF      GraphSAGE   STAR-GCN   HeRec    HetGNN   IHGNN
Kwai      Pre      0.1034   0.1133   0.2242      0.2014     0.2729   0.3299   0.3535
Kwai      Rec      0.2034   0.2111   0.2954      0.2661     0.3219   0.3501   0.3712
Kwai      NDCG     0.3305   0.3327   0.3401      0.3391     0.3401   0.3631   0.3825
Kwai      AUC      0.5940   0.6881   0.7172      0.6901     0.7209   0.7525   0.7791
Tiktok    Pre      0.1141   0.1211   0.2523      0.2481     0.2781   0.3129   0.3321
Tiktok    Rec      0.1121   0.2141   0.2943      0.2809     0.3001   0.3214   0.3505
Tiktok    NDCG     0.3035   0.3112   0.3415      0.3101     0.3505   0.3501   0.3843
Tiktok    AUC      0.5155   0.6501   0.6421      0.6911     0.7098   0.7278   0.7419
MLs       Pre      0.1449   0.1564   0.2921      0.2519     0.3013   0.3129   0.3201
MLs       Rec      0.2019   0.2102   0.2731      0.2811     0.2939   0.3113   0.3471
MLs       NDCG     0.2402   0.2435   0.3015      0.2901     0.3019   0.3412   0.3601
MLs       AUC      0.6214   0.6555   0.6883      0.6672     0.7121   0.7311   0.7561

5.4 Quantitative Results

The experimental results of the baselines and the IHGNN model are shown in Table 3 and Table 4, with 80% and 60% of the users used for training, respectively. From the results, we can make several conclusions and observations. Firstly, our proposed IHGNN model consistently beats all baseline models on all three datasets for all four metrics, confirming the usefulness and superiority of IHGNN for user cold-start recommendation tasks. As expected, the traditional methods consistently yield the worst performance on all four metrics.
5.4 Quantitative Results
The experimental results of the baselines and the IHGNN model are shown in Table 3 and Table 4, with 60% and 80% of users used for training, respectively. From the results, we can make several observations. Firstly, our proposed IHGNN model consistently beats all baseline models on all three datasets and all four metrics, confirming the usefulness and superiority of IHGNN for user cold-start recommendation tasks. As expected, the traditional methods consistently yield the worst performance on all four metrics: learning representations of users and items, especially new users, simply by incorporating related content information into a factorization machine is not adequate, since it ignores rich and expressive relationship information. Compared with the traditional approaches, the graph-based methods achieve significant performance improvements. These results demonstrate that graph convolutional networks can learn better representations of graph nodes, especially when inferring new users' representations, which further improves the quality of representations for user cold-start recommendation. In our experiments, GraphSAGE and STAR-GCN suffer a performance decline compared with the heterogeneous graph based representation learning models HeRec and HetGNN. This might be because GraphSAGE only handles homogeneous graphs and ignores the heterogeneity of the data, while STAR-GCN operates on the bipartite user-item interaction graph and cannot take heterogeneous content feature information into consideration. These comparisons further indicate that modeling heterogeneous information, which mainly includes heterogeneous content information and various types of relationships, is vital for generating new users' representations in user cold-start recommendation. The HeRec model performs worse than HetGNN, probably because its performance heavily depends on the different kinds of representation information extracted along different pre-designed meta-paths in heterogeneous graphs. Moreover, HeRec cannot exploit the influence of different types of nodes on the current node when aggregating the content of neighboring nodes, and may not effectively capture high-order structural information for cold-start recommendation tasks. Compared with the heterogeneous aggregation based method HetGNN, IHGNN achieves better performance and has the following advantages for the user cold-start recommendation task: (1) The advantage in HIN construction. Our model absorbs multi-modal attribute data into heterogeneous graph nodes, instead of just treating users and micro-videos as nodes, and further utilizes the rich, heterogeneous relationships among these multi-modal attributes as edges in the heterogeneous graph for cold-start recommendation tasks. In contrast, HetGNN only utilizes user-item interactions to construct graphs. (2) The advantage in generating neighbors. The sampling and grouping module of our model searches and samples relevant heterogeneous neighboring nodes of each node based on the rich relationships among multi-modal attributes, which allows our model to generate a more robust and comprehensive representation of each user and each micro-video. To sample more related neighboring nodes of the current node, we design the Random Walk Sampling strategy RWS(p), a random walk based sampling strategy with restart probability p, for sampling related heterogeneous neighbors of each node in the complex graph M-HG. Compared with existing sampling strategies, RWS does not require any prior knowledge, such as meta-paths, to sample heterogeneous neighboring nodes, and is not sensitive to interference from noisy nodes. Furthermore, RWS is conducted multiple times to ensure the effectiveness of the sampling strategy, because a single sampling operation may miss some important nodes (an illustrative sketch of this multi-round sampling is given in Section 5.6). In contrast, HetGNN only utilizes relatively simple sampling operations to obtain neighboring nodes on user-item bipartite graphs.
Moreover, HetGNN heavily depends on user-item interactions and ignores the relationships among multi-modal attribute data. (3) The advantage in the feature aggregation module. Our feature aggregation module uses a novel hierarchical attention network, which consists of attribute-aware self-attention and neighbor-aware attention, to take the importance of multi-modal attributes and of different neighboring node types into consideration simultaneously when inferring the representation of each user and each item, including unseen new nodes. The hierarchical design accounts for the heterogeneity of the constructed graphs. In contrast, HetGNN does not consider the relative importance of different multi-modal attributes when learning the representation of each node. (4) The advantage in inferring representations of new users. In this work, we treat a new user as a graph node and connect the new user node to the input heterogeneous graph M-HG during training based on the sparse attributes of the new user. This operation constitutes the inductive new-user representation learning of our model. In contrast, HetGNN relies only on sparse attributes to generate embeddings of new users, ignoring the hidden relationships among users, items, and their attributes. In conclusion, the experimental results demonstrate that the proposed IHGNN model has the potential to generate better user cold-start recommendation performance.

5.5 Analysis of IHGNN Components
Since our proposed IHGNN consists of multiple key components, we demonstrate the effectiveness of the proposed model by comparing the following variants of IHGNN:
• IHGNN¬S: a variant of IHGNN that removes the multiple sampling operation and samples the related neighbors of each node only once; the single sampling result of each node is then copied multiple times.
• IHGNN¬A: a variant of IHGNN that removes the self-attention component of the intra-type feature aggregating module and assigns the same importance value to all multimodal attribute neighboring nodes.
• IHGNN¬B: a variant of IHGNN that removes the attention module in the inter-type feature aggregating module and assigns the same importance value to all neighboring node groups.

The ablation study results for Precision@10, Recall@10, NDCG@10 and AUC on the three datasets are reported in Table 5 and Table 6, with 60% and 80% of users used for training, respectively.

Table 5. Experimental results of IHGNN and its key components on all datasets (training_ratio=0.6, k=10).

Datasets  Metrics  IHGNN¬S  IHGNN¬A  IHGNN¬B  IHGNN
Kwai      Pre      0.3901   0.3889   0.3511   0.3931
          Rec      0.3711   0.3710   0.3916   0.4011
          NDCG     0.3311   0.3761   0.3811   0.3901
          AUC      0.7011   0.7001   0.7219   0.7402
Tiktok    Pre      0.3112   0.3412   0.3331   0.3721
          Rec      0.3911   0.3914   0.3818   0.4005
          NDCG     0.3499   0.3812   0.3881   0.4043
          AUC      0.7391   0.7319   0.7411   0.7512
MLs       Pre      0.3599   0.3429   0.3311   0.3701
          Rec      0.3417   0.3312   0.3711   0.3871
          NDCG     0.3621   0.3519   0.3599   0.3701
          AUC      0.7158   0.7119   0.6911   0.7233

From the results, we can conclude that:
• IHGNN achieves better performance than IHGNN¬S on all three datasets, which demonstrates that the multiple sampling operation captures important neighboring nodes more precisely and effectively.
• IHGNN outperforms IHGNN¬A, which shows that the importance values of same-type nodes (such as users, items, visual content, textual content, and acoustic content) are better calculated by the intra-type self-attention component.
• The results of IHGNN are superior to those of IHGNN¬B, which indicates that the inter-type attention component effectively estimates the impact of the various neighbor node groups (e.g., users, items, attributes) on the final node embeddings.

Table 6. Experimental results of IHGNN and its key components on all datasets (training_ratio=0.8, k=10).

Datasets  Metrics  IHGNN¬S  IHGNN¬A  IHGNN¬B  IHGNN
Kwai      Pre      0.3312   0.3381   0.3227   0.3535
          Rec      0.3345   0.3501   0.3616   0.3712
          NDCG     0.3443   0.3502   0.3719   0.3825
          AUC      0.7601   0.7402   0.7581   0.7791
Tiktok    Pre      0.3033   0.3112   0.3129   0.3321
          Rec      0.3101   0.3016   0.3121   0.3505
          NDCG     0.3704   0.3652   0.3513   0.3843
          AUC      0.7302   0.7319   0.7359   0.7419
MLs       Pre      0.3301   0.3051   0.3121   0.3201
          Rec      0.3099   0.3298   0.3132   0.3471
          NDCG     0.3201   0.3019   0.3402   0.3601
          AUC      0.7339   0.7101   0.7219   0.7561

5.6 Hyper-parameter Sensitivity
We present extended experimental results to analyze the influence of the key parameters of the IHGNN model, namely the number of sampling operations, the ranking position K, the depth of the sampling operation, and the aggregated representation dimension d for users and items, on the three datasets.

Impact of the Ranking Position K: From Fig. 5, Fig. 6 and Fig. 7, it can be observed that IHGNN shows consistent performance improvements over the graph-based methods across all position parameters, indicating the need to model heterogeneous information (heterogeneous content information and various types of relationships), as well as the superior graph representation learning capability of our IHGNN framework.

Fig. 5. Experimental results (NDCG@K and AUC) of Top-K item recommendation when K varies from 2 to 20 on the Kwai dataset (training_ratio=0.8).
Fig. 6. Experimental results (NDCG@K and AUC) of Top-K item recommendation when K varies from 2 to 20 on the Tiktok dataset (training_ratio=0.8).
Fig. 7. Experimental results (NDCG@K and AUC) of Top-K item recommendation when K varies from 2 to 20 on the MovieLens dataset (training_ratio=0.8).

Impact of the Number of Sampling Operations: From Fig. 8, as the number of sampling operations for graph nodes varies from 2 to 10, the AUC and NDCG@10 results increase slowly until they reach an essentially stable value, which illustrates the importance of multiple sampling operations for each node. Furthermore, our model is robust: even a large number of sampling operations per node has little impact on the overall performance of our model.

Fig. 8. Experimental results of AUC and NDCG@10 of IHGNN for different numbers of sampling operations on all datasets (training_ratio=0.8, k=10).
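As a concrete illustration of the multi-round sampling analyzed above, the following sketch implements a random-walk-with-restart neighbor sampler that is run for several independent rounds. It is a simplified re-implementation under our own assumptions (the M-HG stored as a plain adjacency dictionary; restart probability, walk length, and neighbor budget as free parameters), not the authors' released code.

```python
import random
from collections import Counter

def rwr_sample(graph, start, restart_p=0.5, walk_len=100, top_n=10):
    """One random walk with restart from `start`; returns the top_n most
    frequently visited nodes as the sampled neighbor set of `start`.

    graph: dict mapping each node id to a list of adjacent nodes in the M-HG.
    """
    visits = Counter()
    current = start
    for _ in range(walk_len):
        if random.random() < restart_p or not graph.get(current):
            current = start            # restart at the root node
        else:
            current = random.choice(graph[current])
        if current != start:
            visits[current] += 1
    return [node for node, _ in visits.most_common(top_n)]

def multi_sample(graph, start, rounds=5, **kwargs):
    """Repeat the sampling several times so that a single unlucky walk
    does not miss important heterogeneous neighbors."""
    return [rwr_sample(graph, start, **kwargs) for _ in range(rounds)]

# Toy M-HG: a new user connected to attribute nodes, which connect to items.
g = {"u_new": ["app_nba", "loc_bj"],
     "app_nba": ["u_new", "video_dunk", "video_finals"],
     "loc_bj": ["u_new", "video_food"],
     "video_dunk": ["app_nba"], "video_finals": ["app_nba"],
     "video_food": ["loc_bj"]}
print(multi_sample(g, "u_new", rounds=3, walk_len=50, top_n=4))
```

A larger restart probability keeps the walk close to the root node (fewer, more local neighbors), while a smaller one lets it explore deeper into the graph, which mirrors the depth trade-off discussed next.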
Impact of the Depth of the Sampling Operation: From Fig. 9, as the depth of the sampling operation varies from 2 to 8 for each node, the AUC and NDCG@10 results slowly increase. Nevertheless, as the depth of the sampling operation increases further, the performance deteriorates. The cause might be that too many noisy neighboring nodes are included.

Fig. 9. Experimental results of AUC and NDCG@10 of IHGNN for different sampling depths on all datasets (training_ratio=0.8, k=10).

Impact of the Aggregated Embedding Dimension: From Fig. 10, when the aggregated embedding dimension d of each graph node varies between 50 and 350, the AUC and NDCG@10 generally increase. Nevertheless, as d is increased further, performance slowly decreases, possibly owing to overfitting.

Fig. 10. Experimental results of AUC and NDCG@10 of IHGNN for different embedding dimensions on all datasets (training_ratio=0.8, k=10).

5.7 Qualitative Results
To intuitively illustrate the effectiveness of IHGNN in inferring new users' representations from highly complex and rich relationships, such as the relational information among the related multimodal attributes of users and items, we visualize a new user with his related attributes, the sampling results from the M-HG, and the attention values of some related micro-videos, as shown in Figure 11. For each new user, based on his attributes, we use the sampling operation to sample heterogeneous neighbors in the constructed heterogeneous graph, which can be regarded as his enriched information. In Figure 11, the list of mobile apps of the new user includes apps related to basketball and Amazon, which indicates the new user's interests. Furthermore, this information can be used to sample related micro-videos, including basketball-related and Amazon-related micro-videos. After the aggregation operation of the hierarchical attention aggregation network, attention values of the different micro-videos are learned to generate the new user's representation, as shown in the attention values box. From Figure 11, we can observe that the basketball-related and Amazon-related micro-videos are more important than the other types of videos in terms of their learned attention values. Therefore, the IHGNN model can effectively infer new users' representations from relevant data, which helps improve the performance of the user cold-start recommendation task.

Fig. 11. Visualization of the sampling operation and the attention values of some micro-videos learned by the IHGNN model. Note that, in the attention values box, the blue horizontal arrow shows the magnitude of the learned attention values of nodes, and the vertical arrow shows the node numbers of the graph M-HG.
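To make the two-level aggregation behind these attention values concrete, here is a minimal numpy sketch of intra-type self-attention followed by inter-type attention. The actual model uses learned projection matrices trained end-to-end, which this illustration omits, and all names are our own.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def intra_type_aggregate(group_feats):
    """Self-attention within one neighbor group (nodes of the same type).
    group_feats: (n, d) features of n same-type sampled neighbors."""
    scores = group_feats @ group_feats.T / np.sqrt(group_feats.shape[1])
    attn = np.apply_along_axis(softmax, 1, scores)    # (n, n) weights
    return (attn @ group_feats).mean(axis=0)          # (d,) group embedding

def inter_type_aggregate(node_feat, group_embs):
    """Attention over the per-type group embeddings, queried by the target
    node's own feature, producing its final representation."""
    scores = np.array([node_feat @ g for g in group_embs])
    weights = softmax(scores)                         # importance per node type
    return sum(w * g for w, g in zip(weights, group_embs))

# Toy example: a user node with two neighbor groups (items, app attributes).
rng = np.random.default_rng(0)
user = rng.normal(size=8)
item_group = intra_type_aggregate(rng.normal(size=(5, 8)))
app_group = intra_type_aggregate(rng.normal(size=(3, 8)))
user_repr = inter_type_aggregate(user, [item_group, app_group])
print(user_repr.shape)    # (8,)
```

The per-neighbor weights in `attn` and the per-type weights in `weights` play the role of the attention values visualized in Figure 11: informative neighbors such as basketball-related micro-videos receive larger weights in the final representation.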
6 CONCLUSIONS
In this work, we are committed to solving the problem of user cold-start recommendation. We argue that most existing GNN-based cold-start recommendation methods learn models on homogeneous graphs and ignore the rich and heterogeneous relationships among the different kinds of heterogeneous information in the user cold-start recommendation scenario. We propose a novel Inductive Heterogeneous Graph Neural Network (IHGNN) model, which takes advantage of the rich and heterogeneous relational information to alleviate the sparsity of user attributes. Our model converts new users, items, and their associated multimodal information into a Modality-aware Heterogeneous Graph (M-HG), which preserves the rich and heterogeneous relationship information among them. In addition, a well-designed multiple hierarchical attention aggregation model consisting of intra- and inter-type attention aggregating modules is proposed, which focuses on useful connected neighbors and neglects meaningless and noisy connected neighbors to learn more expressive representations. We evaluate our IHGNN method on three real data sets, and the experimental results on all four metrics show that our proposed IHGNN model outperforms existing baselines in user cold-start recommendation tasks. In the future, we will focus on extending the heterogeneous graph with knowledge graphs in GNN models.

7 ACKNOWLEDGMENTS
This work was supported in part by the National Natural Science Foundation of China (No. 62036012, 61936005, 61832002, 61721004, 62072456, 62106262, 61872199), the Key Research Program of Frontier Sciences, CAS (Grant No. QYZDJSSWJSC039), and the Open Research Projects of Zhejiang Lab (No. 2021KE0AB05). This work was also sponsored by the Tencent WeChat Rhino-Bird Focused Research Program.

REFERENCES
[1] Gediminas Adomavicius, Jesse C. Bockstedt, Shawn P. Curley, and Jingjing Zhang. 2021. Effects of Personalized and Aggregate Top-N Recommendation Lists on User Preference Ratings. ACM Trans. Inf. Syst. 39, 2 (2021), 13:1–13:38.
[2] Mohammad Aliannejadi and Fabio Crestani. 2018. Personalized Context-Aware Point of Interest Recommendation. ACM Trans. Inf. Syst. 36, 4 (2018), 45:1–45:28.
[3] Mohammad Aliannejadi, Hamed Zamani, Fabio Crestani, and W. Bruce Croft. 2021. Context-aware Target Apps Selection and Recommendation for Enhancing Personal Mobile Assistants. ACM Trans. Inf. Syst. 39, 3 (2021), 29:1–29:30.
[4] Desheng Cai, Shengsheng Qian, Quan Fang, and Changsheng Xu. 2021. Heterogeneous Hierarchical Feature Aggregation Network for Personalized Micro-video Recommendation. IEEE Transactions on Multimedia (2021), 1–1. https://doi.org/10.1109/TMM.2021.3059508
[5] Hongxu Chen, Hongzhi Yin, Tong Chen, Weiqing Wang, Xue Li, and Xia Hu. 2022. Social Boosted Recommendation With Folded Bipartite Network Embedding. IEEE Trans. Knowl. Data Eng. 34, 2 (2022), 914–926.
[6] Tianqi Chen, Weinan Zhang, Qiuxia Lu, Kailong Chen, Zhao Zheng, and Yong Yu. 2012. SVDFeature: a toolkit for feature-based collaborative filtering. J. Mach. Learn. Res. 13 (2012), 3619–3622.
[7] Wanyu Chen, Fei Cai, Honghui Chen, and Maarten de Rijke. 2019. Joint Neural Collaborative Filtering for Recommender Systems. ACM Trans. Inf. Syst. 37, 4 (2019), 39:1–39:30.
[8] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, and Hemal Shah. 2016. Wide & Deep Learning for Recommender Systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, RecSys 2016, Boston, MA, USA, September 15, 2016. 7–10.
[9] William L. Hamilton, Zhitao Ying, and Jure Leskovec. 2017.
Inductive Representation Learning on Large Graphs. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA. 1024–1034.
[10] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In Proceedings of the 26th International Conference on World Wide Web, WWW 2017, Perth, Australia, April 3-7, 2017. 173–182.
[11] Binbin Hu, Chuan Shi, Wayne Xin Zhao, and Philip S. Yu. 2018. Leveraging Meta-path based Context for Top-N Recommendation with A Neural Co-Attention Model. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, London, UK, August 19-23, 2018. 1531–1540.
[12] Jun Hu, Shengsheng Qian, Quan Fang, Youze Wang, Quan Zhao, Huaiwen Zhang, and Changsheng Xu. 2021. Efficient Graph Deep Learning in TensorFlow with tf_geometric. In MM '21: ACM Multimedia Conference, Virtual Event, China, October 20-24, 2021. ACM, 3775–3778.
[13] Jun Hu, Shengsheng Qian, Quan Fang, and Changsheng Xu. 2019. Hierarchical Graph Semantic Pooling Network for Multi-modal Community Question Answer Matching. In Proceedings of the 27th ACM International Conference on Multimedia, MM 2019, Nice, France, October 21-25, 2019. ACM, 1157–1165.
[14] Shuyi Ji, Yifan Feng, Rongrong Ji, Xibin Zhao, Wanwan Tang, and Yue Gao. 2020. Dual Channel Hypergraph Collaborative Filtering. In KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, August 23-27, 2020. 2020–2029.
[15] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
[16] Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings.
[17] Yehuda Koren, Robert M. Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. Computer 42, 8 (2009), 30–37.
[18] Pigi Kouki, Shobeir Fakhraei, James R. Foulds, Magdalini Eirinaki, and Lise Getoor. 2015. HyPER: A Flexible and Extensible Probabilistic Framework for Hybrid Recommender Systems. In Proceedings of the 9th ACM Conference on Recommender Systems, RecSys 2015, Vienna, Austria, September 16-20, 2015. 99–106.
[19] Quoc V. Le and Tomas Mikolov. 2014. Distributed Representations of Sentences and Documents. In Proceedings of the 31th International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014 (JMLR Workshop and Conference Proceedings, Vol. 32). 1188–1196.
[20] Xiaopeng Li and James She. 2017. Collaborative Variational Autoencoder for Recommender Systems. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13-17, 2017. 305–314.
[21] Zhao Li, Xin Shen, Yuhang Jiao, Xuming Pan, Pengcheng Zou, Xianling Meng, Chengwei Yao, and Jiajun Bu. 2020.
Hierarchical Bipartite Graph Neural Networks: Towards Large-Scale E-commerce Applications. In 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, TX, USA, April 20-24, 2020. 1677–1688.
[22] Song Liu, Haoqi Fan, Shengsheng Qian, Yiru Chen, Wenkui Ding, and Zhongyuan Wang. 2021. HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021. 11895–11905.
[23] Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015. 3431–3440.
[24] Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States. 3111–3119.
[25] Federico Monti, Michael M. Bronstein, and Xavier Bresson. 2017. Geometric Matrix Completion with Recurrent Multi-Graph Neural Networks. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA. 3697–3707.
[26] Fedelucio Narducci, Pierpaolo Basile, Cataldo Musto, Pasquale Lops, Annalina Caputo, Marco de Gemmis, Leo Iaquinta, and Giovanni Semeraro. 2016. Concept-based item representations for a cross-lingual content-based recommendation process. Inf. Sci. 374 (2016), 15–31.
[27] Shengsheng Qian, Dizhan Xue, Huaiwen Zhang, Quan Fang, and Changsheng Xu. 2021. Dual Adversarial Graph Neural Networks for Multi-label Cross-modal Retrieval. In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021. 2440–2448.
[28] Shengsheng Qian, Tianzhu Zhang, Changsheng Xu, and Jie Shao. 2016. Multi-Modal Event Topic Model for Social Event Analysis. IEEE Trans. Multim. 18, 2 (2016), 233–246.
[29] Lei Sang, Min Xu, Shengsheng Qian, Matt Martin, Peter Li, and Xindong Wu. 2021. Context-Dependent Propagating-Based Video Recommendation in Multimodal Heterogeneous Information Networks. IEEE Trans. Multim. 23 (2021), 2019–2032.
[30] Chuan Shi, Binbin Hu, Wayne Xin Zhao, and Philip S. Yu. 2019. Heterogeneous Information Network Embedding for Recommendation. IEEE Trans. Knowl. Data Eng. 31, 2 (2019), 357–370.
[31] Chuan Shi, Binbin Hu, Wayne Xin Zhao, and Philip S. Yu. 2019. Heterogeneous Information Network Embedding for Recommendation. IEEE Trans. Knowl. Data Eng. 31, 2 (2019), 357–370.
[32] Chuan Shi, Jian Liu, Fuzhen Zhuang, Philip S. Yu, and Bin Wu. 2016. Integrating heterogeneous information via flexible regularization framework for recommendation. Knowl. Inf. Syst. 49, 3 (2016), 835–859.
[33] Chuan Shi, Zhiqiang Zhang, Yugang Ji, Weipeng Wang, Philip S. Yu, and Zhiping Shi. 2019. SemRec: a personalized semantic recommendation method based on weighted heterogeneous information networks. World Wide Web 22, 1 (2019), 153–184.
[34] Florian Strub, Romaric Gaudel, and Jérémie Mary. 2016. Hybrid Recommender System based on Autoencoders.
In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, DLRS@RecSys 2016, Boston, MA, USA, September 15, 2016. 11–16.
[35] Jianing Sun, Yingxue Zhang, Chen Ma, Mark Coates, Huifeng Guo, Ruiming Tang, and Xiuqiang He. 2019. Multi-graph Convolution Collaborative Filtering. In 2019 IEEE International Conference on Data Mining, ICDM 2019, Beijing, China, November 8-11, 2019. 1306–1311.
[36] Rianne van den Berg, Thomas N. Kipf, and Max Welling. 2017. Graph Convolutional Matrix Completion. CoRR abs/1706.02263 (2017).
[37] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA. 5998–6008.
[38] Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings.
[39] Xiang Wang, Hongye Jin, An Zhang, Xiangnan He, Tong Xu, and Tat-Seng Chua. 2020. Disentangled Graph Collaborative Filtering. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020, Virtual Event, China, July 25-30, 2020. 1001–1010.
[40] Xiao Wang, Ruijia Wang, Chuan Shi, Guojie Song, and Qingyong Li. 2020. Multi-Component Graph Convolutional Collaborative Filtering. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020. 6267–6274.
[41] Jian Wei, Jianhua He, Kai Chen, Yi Zhou, and Zuoyin Tang. 2017. Collaborative filtering and deep learning based recommendation system for cold start items. Expert Syst. Appl. 69 (2017), 29–39.
[42] Yinwei Wei, Xiang Wang, Liqiang Nie, Xiangnan He, Richang Hong, and Tat-Seng Chua. 2019. MMGCN: Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video. In Proceedings of the 27th ACM International Conference on Multimedia, MM 2019, Nice, France, October 21-25, 2019. 1437–1445.
[43] Libing Wu, Cong Quan, Chenliang Li, Qian Wang, Bolong Zheng, and Xiangyang Luo. 2019. A Context-Aware User-Item Representation Learning for Item Recommendation. ACM Trans. Inf. Syst. 37, 2 (2019), 22:1–22:29.
[44] Hongzhi Yin, Bin Cui, Xiaofang Zhou, Weiqing Wang, Zi Huang, and Shazia W. Sadiq. 2016. Joint Modeling of User Check-in Behaviors for Real-time Point-of-Interest Recommendation. ACM Trans. Inf. Syst. 35, 2 (2016), 11:1–11:44.
[45] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L. Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, London, UK, August 19-23, 2018. 974–983.
[46] Chuxu Zhang, Dongjin Song, Chao Huang, Ananthram Swami, and Nitesh V. Chawla. 2019. Heterogeneous Graph Neural Network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, August 4-8, 2019. 793–803.
[47] Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Zhi Zhang, Haibin Lin, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, Mu Li, and Alexander J. Smola. 2020. ResNeSt: Split-Attention Networks. CoRR abs/2004.08955 (2020).
[48] Jiani Zhang, Xingjian Shi, Shenglin Zhao, and Irwin King. 2019. STAR-GCN: Stacked and Reconstructed Graph Convolutional Networks for Recommender Systems. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019. 4264–4270.
[49] Jing Zheng, Jian Liu, Chuan Shi, Fuzhen Zhuang, Jingzhi Li, and Bin Wu. 2017. Recommendation in heterogeneous information network via dual similarity regularization. Int. J. Data Sci. Anal. 3, 1 (2017), 35–48.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for proit or commercial advantage and that copies bear this notice and the full citation on the irst page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior speciic permission and/or a fee. Request permissions from permissions@acm.org. © 2022 Association for Computing Machinery. 1046-8188/2022/9-ART $15.00 https://doi.org/10.1145/3560487 ACM Trans. Inf. Syst. 2 • Desheng Cai, et al. (a) (b) Fig. 1. (a) Real-world heterogeneous relationships among users and items; (b) A related heterogeneous graph with new users. needs to deal with new users or items while their vector embeddings have not been learned because of lacking preference information. In cold-start recommendation, there exist two information spaces, an attribute information space and a behavior information space. The attribute space describes user’s or item’s preferences (e.g., user’s personal information, item’s content information), and the behavior space is used to represent user interactions (e.g., purchase behavior and past interactions). Most of the existing cold start recommendations assume that there is no behavior interactions but abundant attribute information for new users or new items. Existing methods for the cold-start recommendation task can be roughly categorized as three research lines: (1) Content-based recommendation methods6[, 26, 43] utilize simple feature information of items (e.g. categories, textual content, related images, reviews information ) and users (e.g. locations, devices, apps, gender) to learn their respective representations for cold-start recommendation. (2) Some hybrid metho 8,ds 18[, 34] have been proposed to extend MF [17] (traditional and probabilistic) so that user- and item-related information can be learned in their respective representations. (3) Deep learning based hybrid approaches 1, 3, 10 [ , 20, 34, 41] aim to employ deep neural networks to obtain feature representations from user- and item-related attribute information and further incorporate these attributes into a collaborative iltering model for cold-start recommendation. In this paper, we are committed to solving user cold-start recommendation problems. Although these existing models have shown efectiveness in user cold-start recommendation task, most of them heavily rely on user attribute information and usually ignore existing rich, heterogeneous relationship information among new users (new users and their corresponding attributes), and existing historical information (existing users, items and their related attribute information), Interaction such as relationships (e.g., � −� , � −� , 1 2 1 3 � −� , � −� ), Co-occurrence relationships(e.g., � −� , � −� , � −� , � −� , � −� ) and Inclusion relationships 2 3 2 1 4 1 3 2 4 2 3 4 3 3 (e.g., � − � , � − � , � − � , � − � , � − � ), as shown in Figure 1(a). 5 8 6 9 6 8 7 9 7 8 With the rapid development of deep learning in various ields2r,e13 cently , 22, 28 [ , 30, 37, 47], Graph Neural Networks (GNNs) [9, 16, 38] also have pulled in expanding consideration with their fabulous ability in modeling information comprising of components and their dependency, which have gotten extraordinary improvements for recommendation systems25[, 36, 45]. 
It persuades us to leverage the predominant capacity of GNN models for exploiting existing such useful relationship information between users and existing historical information and further obtaining the superior recommendation model. However, most existing GNN-based recommendations aim to explicitly encode the crucial collaborative signal of user-item interactions to enhance the user/item representations through the propagation process based on user-item bipartite graphs. For example, STAR-GCN [48] leverages a stacked and reconstructed GCN encoder-decoder on the user-item bipartite interaction graph ACM Trans. Inf. Syst. User Cold-start Recommendation via Inductive Heterogeneous Graph Neural Network • 3 to obtain representations of users and items for cold-stat recommendation. These methods just regard side information or attributes of users and items as initial features, and also ignore rich, heterogeneous relationship information among users, items and their corresponding attribute. In fact, such rich relationship information relects hidden inter-dependencies among data and can be used to get more relevant information for users. Therefore, we have to face theChallenge 1: How to efectively model rich, heterogeneous and hidden relationship information and then further enrich user attributes based on these relationship information. Another drawback of most existing user cold-start recommendation models is that they infer the representation of new users based on their related content information but usually do not take into account the heterogeneity and diferent impacts of these information. For instance, in Figure 1(b), each new user is always related to some heterogeneous property information (e.g. locations, mobile phone types, using apps). Among the heterogeneous attribute information, related information using apps should have more inluence on the embedding of each new user since using apps features is more representative than locations and mobile phone types for new users. Actually, there exits some heterogeneous GNN-based recommendation methods which leverage meta-paths to consider heterogeneity of graphs. For example, HeRe 31c][is a heterogeneous graph representation learning based recommendation method, which can efectively extract diferent kinds of representation information in terms of diferent pre-designed meta-paths in heterogeneous graphs, and further combine these representations with extended matrix factorization models for improving personalized recommendation performances. Meta path based models heavily depends on the selection of meta paths, and can not take advantage of the inluence of diferent types of nodes on the current node when aggregating content of neighboring nodes. Recently, some heterogeneous GNN-based recommendation methods, which rely on heterogeneous aggregations to consider heterogeneous neighboring nodes, have emerged. For instances, HetGNN 46],[which mainly consists of the node type based neighboring aggregating module and the heterogeneous node type information combining module, is proposed for learning heterogeneous node representations by incorporating their heterogeneous content information. However, HetGNN just utilizes relative simple sampling operations to obtain neighboring nodes on user-item bipartite graphs, and ignores rich relationships among multi-modal attribute information and impacts of these relationships on representation learning. 
Therefore, we need to addr Challenge ess the 2: How to consider the heterogeneity of nodes and diferent impacts of multi-modal attributes and rich relationships among these multi-modal attributes for generating high-quality representations. Moreover, conventional GNN-based recommendation methods straightforwardly produce the relational matrix heavily relying on existing inactive relationship knowledge, which may not precisely coordinate our objective and cause undesirable model performance. For instance, the graph consisting of users and their related attributes is predeined and prebuilt based on their co-occurrence relationships, which is not suitable for new user represen- tation learning in the user cold-start recommendation systems because of the new users’ unseen co-occurrence relationships. Thus, the Challenge 3 should be concerned: How to consider unseen non-static relationships associated with new users to obtain comprehensive and high-quality representations for them. To handle these challenges, we present a novel user cold-start recommendation modelÐnamely, Inductive Heterogeneous Graph Neural Network (IHGNN). For the Challenge 1, we irst create a heterogeneous graph (M- HG) to model existing hidden heterogeneous relationships associated with users and items. Then, we architecture a sampling strategy, which is basically based on random walk operations, to sample heterogeneous neighbors for each node, which can be regarded as nodes’ enriched information. For each node, multiple samples are taken to generate multiple sets of sampled neighbors if a single sample may miss important noChallenge des. For the 2, we design a new kind of hierarchical attention aggregation network that comprises two distinctive levels of module, intra-type self-attention aggregating module, and inter-type attention aggregating module, to aggregate nodal features of the sampled heterogeneous neighboring nodes. For each set of scanned neighbors, we irst group those scanned neighboring nodes in terms of their corresponding node types. Then, we leverage an intra-type self-attention aggregating module for each generated neighboring group, which is designed to obtain meaningful ACM Trans. Inf. Syst. 4 • Desheng Cai, et al. attention weights among homogeneous nodes to aggregate homogeneous node feature information. Based on these generated representations of all diferent neighboring groups, we further use an inter-type attention aggregating module that is designed to obtain signiicant attention weights for all neighboring groups to generate a useful vector representation for each group of scanned neighbors. Finally, we fuse all group representations to produce the inal representation for each node. For the Challenge 3, we infer new users’ representations based upon the inductive capacity of our proposed model. For all of unseen new users, our model irstly build connections, which are based on their sparse attributes, with constructed heterogeneous graph. Then, we take multiple sampling strategy for each new user to generate its corresponding related multiple sets of sampled neighbors. Finally, we generate new users’ representations by the well-trained hierarchical attention aggregation network. In addition, our proposed model forecasts new users’ inclinations by calculating matching scores between users and items with their obtained embeddings. In summary, our contributions are listed as follows: • We propose a novel Inductive Heterogeneous Graph Neural Network (IHGNN) for user cold-start rec- ommendation. 
IHGNN uses a heterogeneous graph which can take rich and heterogeneous relationship information among users, items and their corresponding attributes into consideration. Further, a sam- pling strategy, which is basically based on random walk operations, is proposed to sample correlated heterogeneous neighbors to produce better representations of new users. • IHGNN utilizes a multiple hierarchical attention mechanism on those sampled related neighbors and consider the impacts of homogeneous and heterogeneous multimodal attributes and various neighboring node groups in order to obtain nodes’ embeddings, including new users. • We show experimental results on three real data sets (the Kwai dataset, the Tiktok dataset, and the movieLens dataset). Compared with the state-of-the-art models, our proposed model performs better in user cold-start recommendation task. 2 RELATED WORK This work is associated with cold-start recommendation tasks, graph neural networks, and heterogeneous graph neural networks, which are briely reviewed. 2.1 Cold-start Recommendation Tasks Whereas collaborative iltering 7, 17[, 43, 44] has accomplished impressive victory in proposal recommendation frameworks, the trouble regularly emerges in managing new users or new items with their sparse historic interaction information, commonly known as the cold-start recommendation problems. In such cases, the only way that a personalized recommendation can be generated is to incorporate additional attributes information. Existing cold-start methods can be coarsely classiied into a Content-based recommendation metho 6, 8, 26ds], [ MF-based hybrid methods8[, 18, 34], and Deep leaning based hybird approaches 1, 3[, 10, 20, 34, 41]. For example, SVDFeature [6], a feature-based collaborative iltering model, is proposed to handle matrix factorization with pre- trained features. The capacity of leveraging pre-trained features permits us to construct factorization frameworks to incorporate side information such as neighbor relationship, temporal dynamics, and hierarchical information so that it can be used to improve the cold-start models’ performances efectively. A hybrid34 mo ] del is designe [ d to gather the leading factors of autoencoder frameworks in arrange to standardize the utilize of autoencoders for CF-based recommendation systems. It presents a appropriate training process of autoencoder models for deicient data and further coordinates both ratings and side data into a single autoencoder framework to handle the cold-start problem. Neural Matrix Factorization (NCF) 10] is [ a well-known collaborative iltering model that can capture the critical points in collaborative iltering information(interactions between users and items) efectively, which adopts matrix factorization and further applies an inner product operation to the learned ACM Trans. Inf. Syst. User Cold-start Recommendation via Inductive Heterogeneous Graph Neural Network • 5 representations of users and items. The existing neural matrix factorization methods can easily utilize content information of users and items as pre-trained vectors to address the cold-start problem. Existing algorithms show promising performance in the area of cold start recommendations. However, most of these methods heavily depend on new user attribute information and usually ignore the sparsity problem of new user attributes. 
2.2 Graph Neural Networks The main factor of graph neural networks (GNNs) is to employ neural networks for aggregating content informa- tion for nodes from their local neighbors. For example, graph convolutional 16 netw ], GraphSA ork [ GE [9] and graph attention network38 [ ] leverage convolutional operation, LSTM aggregator and self-attention module to gather neighboring information respectively, which are applied in many 12, 13,ields 27, 29].[Inspired by the advantage of GNN techniques in simulating the information difusion process, recent eforts have studied the design of GNN-based recommendation methods which mostly apply the GNN on the original user-item bipartite graph directly to learn more expressive representations of users and items 45]. For[ example, Multi-GCCF 35[] explicitly incorporates multiple graphs in the embedding learning process. Multi-Graph Convolution Collabora- tive Filtering (Multi-GCCF) not only expressively models the high-order information via a bipartite user-item interaction graph, but also integrates the short-range information by building user-user and item-item graphs by adding edges between two-hop neighbors on the original graph to obtain the user-user and item-item graph In this way, the proximity information among users and items can be explicitly incorporated into user-item interactions. Multi-Component graph convolutional Collaborative Filtering 40] is (MCCF) designe [ d to distinguish the latent purchasing motivations underneath the observed explicit user-item interactions. Speciically, MCCF uses a decomposer module to decompos the edges in user-item graph to identify the latent components that may cause the purchasing relationship, and further recombines these latent components automatically to obtain uniied embeddings for prediction. DGCF 14] try[ to model the importance of diverse user-item relationships in collaborative iltering for obtaining better interpretability, and considers user-item relationships at the iner granularity of user intents for generating disentangled representations. And Dual channel hypergraph collab- orative iltering (DHCF) 39][leverages the divide-and-conquer strategy with Collaborative iltering (CF) to integrate users and items together for recommendation while still maintaining their speciic properties and further employees the hypergraph structure for modeling users and items with explicit hybrid high-order corre- lations. Hierarchical bipartite Graph Neural Network (HiGNN) 21] utilizes [ stacking multiple GNN modules and a deterministic clustering algorithm alternately to efectively and eiciently addresses the problem of utilizing high-order connections and non-linear interactions through hierarchical representation learning on bi-partite graphs for predicting user preferences on a larger scale. Meanwhile, there are some sampling strategies proposed to make GNN eicient and scalable to large-scale graph-based recommendation tasks. For instance, Pinsage 45] [ incorporates graph structure information and node content information (e.g. visual content, textual content) and uses a novel training method that depends on harder training samples to obtain useful representations of users and items for higher-quality recommendations at Pinterest. Moreover, GNNs can be applied to train and obtain more expressive representations for users and items in cold-start recommendations25[, 36, 42, 45, 48]. 
For example, RMGCNN [25] combines a graph convolutional network for multiple graphs which can capture stationary patterns for users and items, and a RNN module which leverage a learnable difusion process module with non-linear operation to generate the known ratings. In detail, RMGCNN can extract graph local statistical structure patterns for users and items, including cold-start users and cold-start items, in terms of their high dimensional feature spaces, and further apply these learned expressive embeddings to predict interaction ratings. GCMC 36] is [a autoencoder framework based on user-item bipartite graphs and used for the matrix completion problem. The autoencoder framework generates embeddings for users and items through message passing operation on the user-item bipartite interaction graph and these ACM Trans. Inf. Syst. 6 • Desheng Cai, et al. representations are further leveraged to reproduce the links through a bilinear operation. Ying 45] deetvelop al [ a highly-scalable GCN framework, PinSage, that combines random walk operation and multiple graph convolution modules to generate nodes’ representations. Pinsage incorporates graph structure information and node content information (e.g. visual content, textual content) and uses a novel training method that depends on harder training samples to obtain useful representations of users and items for higher-quality recommendations at Pinterest. Chen et al [5] propose a general bipartite embedding method FBNE (short for Folded Bipartite Network Embedding) for social recommendation, which explores the higher-order relationships among users and items by folding the user-item bipartite graphs, and a sequence-based self-attention module that learn node representations via node sequences sampled from graphs. FBNE aims to leverage implicit social relations in social graphs and higher- order implicit relations to enhance the user/item representations and boost the performance of current social recommendation, including cold-start recommendation. However, FBNE may ignore rich and heterogeneous relationship information among users, items and their corresponding attributes as shown in Fig.1 And the heterogeneity of nodes and diferent impacts of multi-modal attributes are not also taken into consideration for learning representations. STAR-GCN 48][ leverages a stacked and reconstructed GCN encoder-decoder on the user-item bipartite interaction graph with intermediate supervision information to achieve better prediction performance. STAR-GCN masks a portion of or the complete users and items representations and remakes these concealed representations with a block of graph encoder-decoder within the training stage, which can make learned embeddings more expressive and generalize the method to obtain useful representations of unseen nodes for the cold-start recommendation. However, the above GNN models are designed to embed homogeneous graphs or bipartite graphs and therefore cannot take beneit of the rich relationships connecting diferent types of heterogeneous data and may not be applicable to the actual cold start scenario. 2.3 Heterogeneous Graph Neural Networks Heterogeneous graphs [30] can be used to model multiple complex object types , and capture the plentiful correlations between these types efectively. Many existing heterogeneous graph based recommendation models leverage various hand-designed semantic meta-paths combined with a matrix factorization framework for recom- mendation tasks11 [ , 32, 33, 49]. 
For example, Shi et al.33[] design a semantic meta-path based recommendation approach, termed SemRec, to calculate the ratings between users and items for the personalized recommenda- tion lexibly. In detail, SemRec contains the weighted meta-path concept which can be used to subtly portray various meta-path semantics by distinguishing diverse properties. In addition, ground on the well pre-designed meta-paths, SemRec can calculate prioritized and personalized attention values representing user preferences on various meta-paths with incorporating heterogeneous information lexibly 31.]HeRe is acheter [ ogeneous graph representation learning based recommendation method, which can efectively extract diferent kinds of representation information in terms of diferent pre-designed meta-paths in heterogeneous graphs, and further combines these representations with extended matrix factorization models for improving personalized recom- mendation performances. Although the above approaches are designed to embed heterogeneous graphs, these models heavily rely on the meta-path design process and may not efectively capture high-order structural data for cold-start recommendation tasks. Recently, HetGNN 46],[ which mainly consists of the node type based neighboring aggregating module and the heterogeneous node type information combining module, is proposed for learning heterogeneous node representations by incorporating their heterogeneous content information. Although heterogeneous node type information in various heterogeneous graphs can be efectively used by the node type based neighboring aggregating module and the heterogeneous node type combining module, we argue that HetGNN may not efectively incorporate heterogeneous contents of nodes due to the lack of plentiful relationship information between the multimodal contents of nodes. 4] is HHF also ANa[heterogeneous representation learning based model which utilizes heterogeneous relationships to enhance representations ACM Trans. Inf. Syst. User Cold-start Recommendation via Inductive Heterogeneous Graph Neural Network • 7 Table 1. The main notations of our proposed model. Notation Description ���� the item set ���� the user set ���� the attribute set of users ����� ���� the attribute set of items ����� � the tag trees � the i-th tag tree � the user-item interaction matrix � the implicit feedback value of� and useritem� �� � = (�,�,�� ,�� ) a modality-aware heterogeneous graph � � � the node set R the edge set � the inclusion relationships between tags �� �� the set of node types �� the set of edge types � the feature matrix � the feature of node� ��� the related node of node� �� the sampled neighbors set of node � �� the t-type sampled neighbors set of node � �� ���������� the t-type neighbors aggregator fucntion �,� � the attention weight of the �-type sampled neighbor set of node � E the combined embedding of the �-th sampling neighbors of no�de UE the inal representation of no�de � the predicted preference probability value of� and useritem� �� � the regularization weight � the parameters of the model � the aggregated embedding dimension � the ranking position � the top k results of users and items for micro-video recommendation tasks and is not suitable for cold-start recommendation tasks. 
To utilize the highly complex relationships between users (including cold-start users), items, and their corresponding associated multimodal attributes, IHGNN leverages a modality-aware heterogeneous graph to learn their expressive representations for user cold-start recommendation tasks. Furthermore, a novel hierarchical feature aggregation network, which mainly comprises intra- and inter-type feature aggregating modules, is designed to incorporate the intricate graph structure information and abundant node content semantics contained in M-HGs in order to obtain expressive node representations, including those of cold-start nodes.

3 THE PROPOSED ALGORITHM

3.1 Problem Statement

In this paper, we focus on the user cold-start recommendation task, in which all items are denoted as $U^{it} = \{u^{it}_1, u^{it}_2, ..., u^{it}_{|U^{it}|}\}$ and all new users are denoted as $U^{us} = \{u^{us}_1, u^{us}_2, ..., u^{us}_{|U^{us}|}\}$. Each new user is related to sparse attributes (e.g., locations, phone information, apps installed on their phones), denoted as $A^{us}_{attr} = \{a^{us}_1, a^{us}_2, ..., a^{us}_{|A^{us}_{attr}|}\}$. Similarly, the set of multimodal attributes of each item (e.g., locations, related tags, visual content, audio content, textual descriptions of items) is defined as $A^{it}_{attr} = \{a^{it}_1, a^{it}_2, ..., a^{it}_{|A^{it}_{attr}|}\}$. Since many of the used attributes of users, such as apps, and of items are related to tags, we adopt pre-designed tag trees to illustrate the associations among the various tags. In this paper, we define the set of pre-designed tag trees as $T = \{T^1, T^2, ..., T^{|T|}\}$, where $T^i$, $1 \le i \le |T|$, is a tag tree. We build a user-item bipartite interaction matrix $R \in \mathbb{R}^{|U^{us}| \times |U^{it}|}$, in which each entry $r_{uv}$ is derived from the user's implicit feedback data: $r_{uv} = 1$ indicates that there exists an interaction between new user $u$ and item $v$, and $r_{uv} = 0$ indicates that no interaction exists between new user $u$ and item $v$. Note that only new users in the training set have interaction data, while new users in the testing set have none. In the training phase, we strive to acquire high-quality representations for new users and items from their attribute information and the correlations between them, representations that preserve their historical interaction data effectively. In the testing phase, we focus on inferring the representations of the new users in the test group and further computing rating scores for preference estimation, so as to predict these new users' preferences for cold-start user recommendation. Table 1 lists the key notations of the IHGNN model.

Table 1. The main notations of our proposed model.

Notation                  Description
$U^{it}$                  the item set
$U^{us}$                  the user set
$A^{us}_{attr}$           the attribute set of users
$A^{it}_{attr}$           the attribute set of items
$T$                       the set of tag trees
$T^i$                     the i-th tag tree
$R$                       the user-item interaction matrix
$r_{uv}$                  the implicit feedback value of user $u$ and item $v$
$G=(V,E,OV,OE)$           a modality-aware heterogeneous graph
$V$                       the node set
$E$                       the edge set
$E_{tt}$                  the inclusion relationships between tags
$OV$                      the set of node types
$OE$                      the set of edge types
$X$                       the feature matrix
$x_v$                     the feature of node $v$
$v_r$                     a related node of node $v$
$SN_v$                    the sampled neighbor set of node $v$
$SN^t_v$                  the t-type sampled neighbor set of node $v$
$\mathrm{Aggregator}_t$   the t-type neighbor aggregator function
$\alpha^{v,t}$            the attention weight of the t-type sampled neighbor set of node $v$
$E^i_v$                   the combined embedding of the i-th sampled neighbor set of node $v$
$UE_v$                    the final representation of node $v$
$p_{uv}$                  the predicted preference probability of user $u$ for item $v$
$\lambda$                 the regularization weight
$\Theta$                  the parameters of the model
$d$                       the aggregated embedding dimension
$K$                       the ranking position
$top\text{-}K$            the top K results

Fig. 2. IHGNN: Inductive Heterogeneous Graph Neural Network Architecture.

3.2 Overall Framework

In order to handle the challenges raised in Section 1, we propose a well-designed framework, the Inductive Heterogeneous Graph Neural Network (IHGNN), for user cold-start recommendation. Our framework, as illustrated in Figure 2, comprises four key components:

• Heterogeneous Graph Construction: A modality-aware heterogeneous graph (M-HG) is built to model users (including new users), items, and the associated attribute data, based on rich relationship information such as the relational data among users, items and their properties, the user-item bipartite graph, and the tag trees.
• Multiple Hierarchical Attention Aggregation Networks: For each node in the constructed heterogeneous graph, we design a sampling strategy, which is basically based on random walk operations, and apply it to sample a fixed-size set of associated heterogeneous neighboring nodes. Then, multiple samplings are taken to create multiple groups of sampled neighbors in order to capture more of the critical neighbors. To gather the node feature information of all sets of sampled heterogeneous neighboring nodes for each node, we propose an innovative hierarchical attention aggregation network that processes each set of sampled neighbors in three steps: (1) Grouping operation: we group the sampled neighbors in a set according to their node types. (2) Intra-type feature aggregating module: for each node group, we leverage an intra-type self-attention aggregating module to aggregate the content features of all neighboring nodes and generate each group's aggregated type-based representation. (3) Inter-type feature aggregating module: after the representation of each heterogeneous type-based neighbor group is generated, an inter-type attention aggregating module is designed to quantify the various impacts of the heterogeneous groups and to further calculate the embedding of each set of sampled neighbors, taking these learned impacts into account. Finally, we design a fusion module to merge all embeddings of the sampled heterogeneous neighbor sets and obtain the final representation of each node in the heterogeneous graph.
• Model Optimization: To learn representations of the constructed heterogeneous graph that preserve the implicit structure information, and to generate high-quality representations of new users, we define the KL-divergence between the hypothetical distribution and the empirical distribution as the loss function, optimized through backpropagation and mini-batch Adaptive Moment Estimation (Adam) [15].
• Inductive Representation Learning for User Cold-start Recommendation: We infer new users' representations based upon the inductive capacity of our proposed model. For each unseen new user, our model first builds connections with the constructed heterogeneous graph. Then, we apply the multiple sampling strategy to the new user to generate the corresponding multiple sets of sampled neighbors. Finally, we generate the new user's representation with the well-trained hierarchical attention aggregation network.

After optimization, we can infer embeddings for the new users in the testing sets using our proposed model. An inner product operation is utilized to integrate the inferred embeddings of new users and candidate items and further predict the preference likelihood, which indicates the level of preference of a new user for a candidate item.

4 METHODOLOGY

This section presents our Inductive Heterogeneous Graph Neural Network (IHGNN) model for user cold-start recommendation.

4.1 Heterogeneous Graph Construction

To consistently and effectively consider users, items, their respective attributes, and the different associated relationship information, we build a modality-aware heterogeneous graph (M-HG), denoted as $G = (V, E, OV, OE)$. In an M-HG, $V$ represents the various kinds of nodes and $E$ represents the various relationships between nodes, where $V = U^{us} \cup U^{it} \cup A^{us}_{attr} \cup A^{it}_{attr} \cup T$ and $E = E_{uv} \cup E_{ua} \cup E_{ia} \cup E_{ut} \cup E_{it} \cup E_{tt}$. Here, $E_{uv}$ represents the interactions between users and items, $E_{ua}$ represents the relationships/connections between users and their corresponding sparse attributes, $E_{ia}$ represents the relationships/connections between micro-videos and their corresponding multi-modal attributes, $E_{ut}$ represents the relationships/connections between users and their corresponding tags, $E_{it}$ represents the relationships/connections between micro-videos and their corresponding tags, and $E_{tt}$ denotes the inclusion connections between tags within the tag trees. $OV$ denotes the node type set, which comprises the user node type, the item node type, and the attribute node types: $OV = \{ov_{user}, ov_{item}, ov_{image}, ov_{audio}, ov_{text}, ov_{tag}\}$. $OE$ is the edge type set, which includes the relation types of $E_{uv}$, $E_{ua}$, $E_{ia}$, $E_{ut}$, $E_{it}$ and $E_{tt}$: $OE = \{oe_{uv}, oe_{ua}, oe_{ia}, oe_{ut}, oe_{it}, oe_{tt}\}$.
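To make the construction concrete, the following minimal Python sketch stores an M-HG as typed adjacency lists. The class and method names (MHG, add_node, add_edge) and the string labels are illustrative assumptions rather than the authors' released code; only the node/edge type vocabulary follows the definitions above.

```python
# A minimal sketch of M-HG storage, assuming the OV/OE vocabulary above.
from collections import defaultdict

class MHG:
    """Modality-aware heterogeneous graph G = (V, E, OV, OE)."""
    def __init__(self):
        self.node_type = {}            # node id -> type in OV
        self.adj = defaultdict(list)   # node id -> [(neighbor, edge type in OE)]

    def add_node(self, nid, ntype):
        self.node_type[nid] = ntype

    def add_edge(self, src, dst, etype):
        # All relations in E are stored symmetrically.
        self.adj[src].append((dst, etype))
        self.adj[dst].append((src, etype))

g = MHG()
g.add_node("user_1", "ov_user")
g.add_node("video_9", "ov_item")
g.add_node("app_nba", "ov_tag")
g.add_edge("user_1", "video_9", "oe_uv")   # user-item interaction
g.add_edge("user_1", "app_nba", "oe_ut")   # user-tag relation
g.add_edge("video_9", "app_nba", "oe_it")  # item-tag relation
```

Storing edges symmetrically keeps the random-walk sampling introduced next simple, since a walk can travel in either direction over any relation type.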
Fig. 3. Hierarchical Attention Aggregation Networks.

4.2 Multiple Hierarchical Attention Aggregation Networks

For each node $v \in V$, we aim to enrich its feature representation by aggregating its related node information in the constructed graph $G$, which can be formulated as follows:

$$\mathrm{Embed}(v) = \mathbb{E}_{v_r}\left[\alpha_{v_r}\,\mathrm{Embed}(v_r)\right] \qquad (1)$$

where the function $\mathrm{Embed}(\cdot)$ denotes the embedding of a node, $v_r$ is a random variable that represents a related node of node $v$, and the variable $\alpha_{v_r}$ ($\alpha_{v_r} = \mathrm{importance}(v_r)$) can be interpreted as the importance or impact of the node $v_r$. However, there are plenty of related nodes for each node $v$, so it is computationally expensive to gather information from all of them. To deal with this problem, we use a sampling strategy, named the Random Walk Sampling Strategy (RWSS), to reduce the computational cost, so that Equation (1) can be reformulated as follows:

$$\mathrm{Embed}(v) = \frac{1}{|SN_v|}\sum_{v_r \in SN_v} \mathrm{Embed}(v_r) \qquad (2)$$

where $SN_v$ represents the sampled neighbors of node $v$. Specifically, RWSS consists of two steps:

• Beginning a random walk of fixed length from node $v$, $\forall v \in V$. The walk iteratively travels to a neighbor of the current node or restarts from the beginning node with a certain probability $p$. The walk runs until a fixed number of nodes has been collected, referred to as $RW(v)$. Note that the number of nodes of each type in $RW(v)$ is fixed, to guarantee that every type of node is sampled for $v$.
• Selecting several types of neighboring nodes. For node $v$, we find the top-$k$ nodes of node type $t$ from $RW(v)$ based on their frequencies, and denote these selected nodes as the set of $t$-type associated neighbors of node $v$: $SN^t_v = \{v_{t_1}, v_{t_2}, ..., v_{t_{|SN^t_v|}}\}$.

Compared with the general random walk method, RWSS has two advantages, which are important for enriching the representation of each node, especially of cold-start nodes: (1) For each node in the constructed M-HG, RWSS ensures that every type of node, whether a first-order or high-order neighbor, can be evenly sampled, by constraining the number of nodes of each type in the sampling result. (2) For each node $v$, RWSS selects the top-$k$ nodes of each node type $t$ from $RW(v)$ according to frequency, which reduces the negative impact of noisy nodes in the constructed M-HG on the performance of our model. Furthermore, to strengthen the effect of the sampling strategy, RWSS is conducted multiple times, and the result is denoted as $SNS_v = \{SN_v^1, SN_v^2, ..., SN_v^{|SNS_v|}\}$. Note that the multiple sampling operation may also be time-consuming, so for each node we perform the sampling in advance, during the pre-processing phase of the experiments.
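The sketch below illustrates the two-step RWSS under the MHG interface assumed above: a restart-based walk collects $RW(v)$ with a per-type budget, and the top-$k$ most frequent nodes of each type form $SN^t_v$. The parameter names (restart_p, budget, top_k) and the step cap are assumptions for illustration, not values from the paper.

```python
# A sketch of RWSS under the MHG interface assumed above.
import random
from collections import Counter

def rwss(g, start, restart_p=0.5, budget=20, top_k=10, max_steps=100_000):
    visited = Counter()
    per_type = Counter()
    types = set(g.node_type.values())
    cur = start
    # Step 1: walk with restarts until a fixed budget per node type is met.
    for _ in range(max_steps):
        if all(per_type[t] >= budget for t in types):
            break
        if random.random() < restart_p or not g.adj[cur]:
            cur = start                       # restart from the beginning node
        else:
            cur = random.choice(g.adj[cur])[0]
        t = g.node_type[cur]
        if cur != start and per_type[t] < budget:
            visited[cur] += 1
            per_type[t] += 1
    # Step 2: keep the top-k most frequent nodes of each type as SN_v^t.
    return {t: [n for n, _ in visited.most_common() if g.node_type[n] == t][:top_k]
            for t in types}

# Multiple samplings give SNS_v; in the paper this is precomputed offline.
sns_v = [rwss(g, "user_1") for _ in range(5)]
```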
At the same time, we analyze the relationship between the number of samplings and the performance of the model (Fig. 8 in the experimental section) to find values for the number of samplings that offer a good trade-off.

In an M-HG $G$, the representation of node $v \in V$ is denoted as $x_v \in \mathbb{R}^{d_v \times 1}$, where $d_v$ represents the dimension of the representation. Note that we can leverage CNN-based methods [23] to pre-train nodes whose type is image, or the doc2vec model [19] to pre-train nodes whose content is textual; in general, nodes can be initialized according to their types. We then adopt a content vector transformer module, $FC$, to turn the embeddings of the various node types into a unified space. Formally, the transferred representation of node $v$ is calculated as follows:

$$f(v) = FC(x_v) \qquad (3)$$

where $f(v) \in \mathbb{R}^{d \times 1}$ and $d$ is the transferred representation dimension. To aggregate the node representations, transferred by $FC$, of all sampled neighboring nodes of node $v$, we propose an innovative hierarchical attention aggregation network, as shown in Figure 3, applied to each set of sampled neighbors in three steps: (1) grouping module; (2) intra-type feature aggregating module; (3) inter-type feature aggregating module.

4.2.1 Grouping module. After applying the RWSS described in the previous section, multiple sampling results $SNS_v$ are obtained. For each set of sampled neighbors $SN^i_v$, we first group the nodes based on their node types. These groups fall into three categories: several multimodal attribute neighbor node groups, a user neighbor node group, and an item neighbor node group. The multimodal attribute neighbor node groups are further divided into four subcategories (an image neighbor node group, an audio neighbor node group, a tag neighbor node group, and a text neighbor node group), which represent the related multimodal attribute information of users (including new users) and items. Here, we define the $t$-type sampled neighboring node group in $SN^i_v$ as $SN^t_{v,i}$. A sketch of the content transformer and the grouping step is given below.
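As a concrete reading of Eq. (3) and the grouping step, the PyTorch sketch below projects each node type's pre-trained feature into a shared $d$-dimensional space with a per-type linear layer and groups sampled neighbors by type. All module and function names here are hypothetical illustrations, not the authors' implementation.

```python
# A sketch of the content transformer FC (Eq. 3) and the grouping step.
import torch
import torch.nn as nn
from collections import defaultdict

class ContentTransformer(nn.Module):
    def __init__(self, dims_per_type, d=200):
        super().__init__()
        # One projection per node type (image/audio/text/tag/user/item),
        # mapping a d_v-dimensional pre-trained feature to the shared space.
        self.proj = nn.ModuleDict({t: nn.Linear(dv, d)
                                   for t, dv in dims_per_type.items()})

    def forward(self, x, ntype):
        return self.proj[ntype](x)          # f(v) = FC(x_v)

def group_by_type(sampled, node_type):
    groups = defaultdict(list)              # type t -> SN_{v,i}^t
    for n in sampled:
        groups[node_type[n]].append(n)
    return groups

# Example dimensions are assumptions (e.g., 2048-d CNN image features).
fc = ContentTransformer({"ov_image": 2048, "ov_text": 300, "ov_user": 64}, d=200)
```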
4.2.2 Intra-type feature aggregating module. For the $t$-type group $SN^t_{v,i}$, we use a neural network to generate a node representation from the nodes $v_{t'} \in SN^t_{v,i}$. Formally, the aggregated $t$-type neighboring node group representation for $v$ can be formulated as follows:

$$f^t_{agg}(v) = \mathrm{Aggregator}_t\left(\{f(v_{t'})\}_{v_{t'} \in SN^t_{v,i}}\right) \qquad (4)$$

where $f^t_{agg}(v) \in \mathbb{R}^{d \times 1}$, $d$ is the aggregated representation dimension, $f(v_{t'})$ is the transferred representation of node $v_{t'}$, and $\mathrm{Aggregator}_t$ is the $t$-type neighbor aggregator function. A self-attention technique can be applied as the $\mathrm{Aggregator}_t$ function to obtain attention values between the homogeneous nodes within each group. Following the Transformer [37], which is composed of a stack of multi-head self-attention layers and point-wise, fully connected layers for both the encoder and the decoder, we define our self-attention intra-type feature aggregation module as follows.

Given a set of input features $F \in \mathbb{R}^{n \times d}$, self-attention transforms them into matrices of queries $Q \in \mathbb{R}^{n \times d}$, keys $K \in \mathbb{R}^{n \times d}$, and values $V \in \mathbb{R}^{n \times d}$:

$$Q = (F + PE)W_Q,\quad K = (F + PE)W_K,\quad V = F W_V \qquad (5)$$

where $W_Q$, $W_K$, and $W_V \in \mathbb{R}^{d \times d}$ are learnable projection weights, $PE$ is the absolute positional embedding of the features, $n$ is the number of rows of the input features $F$, and $d$ is the dimension of the input features. Note that we use $PE$ to represent the relative position sorted by frequency within each type-based neighboring node group. The attention weights $A$ are calculated as follows:

$$A = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right) \qquad (6)$$

where the $\mathrm{softmax}$ function is applied to obtain the weights of the values $V$, and $\sqrt{d}$ is a scaling factor. Further, the output weighted average vectors $\hat{F}$, combined with the residual operation, can be formulated as follows:

$$\hat{F} = AV + F \qquad (7)$$

To improve capacity, our self-attention module can be extended to the multi-head version, whose output $\hat{F}_{mh}$ is calculated as follows:

$$\hat{F}_{mh} = \mathrm{Concat}\{\hat{F}_{head\_1}, ..., \hat{F}_{head\_h}\} \qquad (8)$$

where $\mathrm{Concat}$ is a feature concatenation function and $\hat{F}_{head\_i}$ is the output of the $i$-th self-attention head. Moreover, as is well known, the atomic operation of the self-attention mechanism, namely the canonical dot-product, causes the time complexity and memory usage per layer to be $O(n^2)$. A stack of $J$ encoder layers makes the total memory usage $O(J \cdot n^2)$, which limits the model's scalability on long and large inputs. So, to reduce the space-time complexity and improve the efficiency of the self-attention aggregation module, we randomly sample $\ln n$ keys to calculate the attention weights. Controlled by a constant sampling factor $c$, we set the number of sampled keys to $c \cdot \ln n$ for each query, so that our self-attention aggregation module only needs to calculate $O(\ln n)$ dot-products for each query-key lookup, and the layer memory usage stays at $O(n \ln n)$. The output vectors $\hat{F}_{mh}$ of our self-attention module are then passed to a position-wise feed-forward network (FFN) with a single non-linearity, applied independently to each element of the set:

$$FFN(\hat{F}_{mh}) = \sigma_r(\hat{F}_{mh} W_1 + b_1)W_2 + b_2 \qquad (9)$$

where $\sigma_r(\cdot)$ is the ReLU activation function, $W_1$ and $W_2$ are learnable weights, and $b_1$ and $b_2$ are bias terms. Finally, for ease of discussion, we denote the above process, together with a layer normalization function $\mathrm{Norm}$, as $\mathrm{SelfAtt}$:

$$\mathrm{SelfAtt}(F) = \mathrm{Norm}(FFN(\hat{F}_{mh})) \qquad (10)$$

Based on the above self-attention module, we reformulate $f^t_{agg}(v)$ as follows:

$$f^t_{agg}(v) = \frac{\sum_{v_{t'} \in SN^t_{v,i}} \mathrm{SelfAtt}\{f(v_{t'})\}}{|SN^t_{v,i}|} \qquad (11)$$

where we leverage the self-attention module to aggregate the transferred node representations of the $t$-type neighbors and perform an averaging operation to generate the aggregated $t$-type neighboring node group representation.

4.2.3 Inter-type feature aggregating module. After the above step, $|OV|$ aggregated representations are obtained for $SN^i_v$ of node $v$, defined as $\mathbf{E}^i_v \in \mathbb{R}^{|OV| \times d}$. There are $|OV|$ node types in the M-HG $G$, and different types of neighboring nodes may contribute differently to the generation of node representations. To fuse these aggregated neighbor representations into the representation $E^i_v$ of $SN^i_v$ for node $v$, while considering their influences on node $v$, we employ the attention method, which can be formulated as follows:

$$E^i_v = \sum_{t} \alpha^{v,t} E^{i,t}_v \qquad (12)$$

$$\alpha^{v,t} = \frac{\exp\left(\mathrm{LeakyReLU}\left(w^{\top}[f(v) \oplus E^{i,t}_v]\right)\right)}{\sum_{t'} \exp\left(\mathrm{LeakyReLU}\left(w^{\top}[f(v) \oplus E^{i,t'}_v]\right)\right)} \qquad (13)$$

where $E^i_v \in \mathbb{R}^{d \times 1}$ is the combined representation of $SN^i_v$ for node $v$, $\oplus$ denotes the concatenation operation, $\alpha^{v,t}$ measures the importance of the different neighboring node groups' embeddings, and $w \in \mathbb{R}^{2d \times 1}$ is the trainable attention parameter. After the hierarchical attention aggregation operation has been applied to each set of sampled neighbors $SN^i_v$ of node $v$, we obtain $|SNS_v|$ node embeddings for $SNS_v$, denoted as $NE_v = \{E^1_v, E^2_v, ..., E^{|SNS_v|}_v\}$. A condensed sketch of the intra- and inter-type aggregation is given below.
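The PyTorch sketch below condenses Eqs. (5)-(13) into two modules, one per level of the hierarchy. For brevity it uses a single attention head and omits the $\ln n$ key-sampling trick, so it is an assumption-level illustration rather than the full module; all class and variable names are hypothetical.

```python
# A condensed sketch of the hierarchical aggregation (Eqs. 5-13).
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntraTypeAgg(nn.Module):
    """Self-attention over one t-type neighbor group (Eqs. 5-11), single head."""
    def __init__(self, d):
        super().__init__()
        self.wq, self.wk, self.wv = (nn.Linear(d, d, bias=False) for _ in range(3))
        self.ffn = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
        self.norm = nn.LayerNorm(d)

    def forward(self, feats, pe):                    # feats, pe: (n, d)
        q, k, v = self.wq(feats + pe), self.wk(feats + pe), self.wv(feats)
        att = F.softmax(q @ k.t() / feats.size(1) ** 0.5, dim=-1)   # Eq. 6
        out = self.norm(self.ffn(att @ v + feats))                  # Eqs. 7, 9, 10
        return out.mean(dim=0)                                      # Eq. 11

class InterTypeAgg(nn.Module):
    """Attention over the |OV| type-level embeddings (Eqs. 12-13)."""
    def __init__(self, d):
        super().__init__()
        self.w = nn.Linear(2 * d, 1, bias=False)     # w in R^{2d x 1}

    def forward(self, f_v, type_embs):               # f_v: (d,), type_embs: (T, d)
        scores = self.w(torch.cat([f_v.expand_as(type_embs), type_embs], dim=-1))
        alpha = F.softmax(F.leaky_relu(scores), dim=0)              # Eq. 13
        return (alpha * type_embs).sum(dim=0)                       # Eq. 12
```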
To fuse these representations into the ultimate representation $UE_v$ of node $v$, we design a fusion module formulated as follows:

$$UE_v = FC\left(\mathrm{Concat}\{E^1_v; E^2_v; ...; E^{|SNS_v|}_v\}\right) \qquad (14)$$

where $UE_v \in \mathbb{R}^{d \times 1}$, the function $FC(\cdot)$ is a fully connected layer, and the function $\mathrm{Concat}(\cdot)$ concatenates all the representations in $NE_v$. Note that we apply the same dimension $d$ to the transferred node representation, the aggregated $t$-type neighboring node group representation, and the concatenated ultimate representation of node $v$, to make tuning IHGNN easier in this paper.

4.3 Model Optimization

To learn a representation $UE_v \in \mathbb{R}^{d \times 1}$ of each node $v$ that preserves the implicit structure information of the M-HG, the loss $L$ is defined as the KL-divergence between the hypothetical distribution $p(v_j | v_i)$ and the empirical distribution $\hat{p}(v_j | v_i)$ over $N(v_i)$, the set of direct neighbors of node $v_i$:

$$p(v_j | v_i) = \frac{\exp(UE_{v_j}^{\top} UE_{v_i})}{\sum_{v_k \in N(v_i)} \exp(UE_{v_k}^{\top} UE_{v_i})} \qquad (15)$$

$$L = KL\left(p(\cdot|\cdot),\, \hat{p}(\cdot|\cdot)\right) \qquad (16)$$

where $KL(\cdot, \cdot)$ denotes the KL-divergence. The empirical probability $\hat{p}(v_j | v_i)$ is set to 1 if $v_j \in N(v_i)$ and 0 otherwise. With $KL(\cdot, \cdot)$ expanded and a few constants dropped, the loss $L$ can be transformed as follows:

$$L = -\sum_{v_j \in N(v_i)} \log p(v_j | v_i) \qquad (17)$$

Optimizing the loss $L$ demands a full scan of the neighbors $N(v_i)$ for each node $v_i$, which incurs a high computational cost. Thus, we leverage negative sampling [24] and transform the loss $L$ as follows:

$$L = \sum_{(i,j,k) \in TS} -\ln \sigma\left(UE_{v_i}^{\top} UE_{v_j} - UE_{v_i}^{\top} UE_{v_k}\right) + \lambda \|\Theta\|^2 \qquad (18)$$

where $\sigma(\cdot)$ is the sigmoid function and $TS$ is a set of sampled node triples $(i, j, k)$, in which $i$, $j$, and $k$ index the nodes $v_i$, $v_j$, and $v_k$, respectively. Each sampled node triple satisfies $v_j \in N(v_i)$ and $v_k \notin N(v_i)$. $\Theta$ and $\lambda$ are the parameters of our IHGNN and the regularization weight, respectively. The loss $L$ can be optimized with the back-propagation method and the Adam optimizer [15]. A sketch of this objective is given below.
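The following minimal sketch computes the triple-based objective in Eq. (18), assuming the node embeddings are stacked in a tensor `ue` and the triples $(i, j, k)$ have been pre-sampled; regularizing the embeddings here stands in for $\lambda\|\Theta\|^2$ over all model parameters. Tensor and function names are illustrative assumptions.

```python
# A sketch of the negative-sampling objective in Eq. (18).
import torch

def graph_loss(ue, triples, lam=1e-4):
    """ue: (N, d) node embeddings; triples: (B, 3) long tensor of (i, j, k)."""
    vi, vj, vk = ue[triples[:, 0]], ue[triples[:, 1]], ue[triples[:, 2]]
    pos = (vi * vj).sum(-1)                  # UE_{v_i}^T UE_{v_j}, v_j a neighbor
    neg = (vi * vk).sum(-1)                  # UE_{v_i}^T UE_{v_k}, v_k a non-neighbor
    loss = -torch.log(torch.sigmoid(pos - neg)).sum()
    return loss + lam * ue.pow(2).sum()      # stand-in for lambda * ||Theta||^2

# In training, this loss would be minimized with mini-batch Adam, e.g.:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```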
4.4 Inductive Representation Learning for User Cold-start Recommendation

After optimization, we obtain the learned representation $UE_v \in \mathbb{R}^{d \times 1}$ of each node $v$ in the M-HG $G$. Given a new user $u$ in the testing set with sparse attributes, we can infer the representation of $u$ based upon the learned framework and the learned representations of the nodes in the M-HG $G$ in three steps, as shown in Fig. 4:

• Firstly, given a new user $u$ with his corresponding sparse attribute information (e.g., phone information, locations, phone apps), we regard $u$ as a graph node and connect the new user node to the heterogeneous graph M-HG $G$ used in the training process, based on the sparse attributes of the new user. This operation is the inductive new-user representation learning of our IHGNN model.
• Secondly, we regard the new user $u$ as a target user and infer the ultimate embedding of $u$ with the well-trained multiple hierarchical attention aggregation networks, which include the RWSS sampling operation, the multiple hierarchical attention aggregations, and the fusion operation.
• Finally, we calculate a preference score $p_{uv}$, which indicates how much the user $u$ prefers a candidate item $v$, from their final learned representations. Formally, the preference score calculation function is defined as follows:

$$p_{uv} = \sigma\left(UE_u^{\top} UE_v\right) \qquad (19)$$

where $\sigma(x) = \frac{\exp(x)}{1 + \exp(x)}$ is the sigmoid function, $UE_u$ and $UE_v$ are the learned final representations of user $u$ and candidate item $v$, respectively, and $p_{uv}$ is the predicted preference score for user $u$ and candidate item $v$.

Fig. 4. Inductive Representation Learning for New Users: a target new user is connected to the trained heterogeneous graph through his attribute nodes (phone information, location, phone apps), his embedding is inferred by the well-trained multiple hierarchical attention aggregation networks, and preference probabilities over candidate videos are calculated.

For the new users in the testing set, although they have no historical interaction data and do not appear in the training data, their attribute data (e.g., phone information, locations, phone apps) are shared with existing users, so we can leverage these shared attributes to infer the representations of new users. Note that the representations of these shared attributes are learned in the training process, and we initialize the representation of a new user by summing the learned representations of his sparse attributes.

Algorithm 1 Training IHGNN Algorithm
Input: The heterogeneous graph M-HG G; the training user-item pairs P; batch size B;
Output: Multiple hierarchical attention aggregation networks;
1:  Initialize epoch t = 0; randomly initialize the parameters Θ of our model
2:  repeat
3:    t = t + 1;
4:    for ⌊|P|/B⌋ iterations do
5:      Obtain batch-size-B user-item pairs BP.
6:      for each user-item pair (u, v) in BP do
7:        for each node v in (u, v) do
8:          Perform the RWSS operation multiple times and obtain the result SNS_v for v.
9:          for each set of sampled neighbors SN_v^i in SNS_v do
10:           Group the sampled neighboring nodes by their types and obtain the type-based sampled neighbor groups SN_{v,i}^t.
11:           for each t-type sampled neighbor group SN_{v,i}^t in SN_v^i do
12:             Obtain the aggregated t-type neighbor embedding for v through Eq. 4, Eq. 7 and Eq. 11;
13:           end for
14:           Obtain the |OV| aggregated neighbor embeddings E_v^{i,t} for SN_v^i of node v;
15:         end for
16:         Combine the embeddings E_v^{i,t} into the representation E_v^i of SN_v^i for node v, taking the impacts of these representations into consideration through Eq. 12 and Eq. 13, and obtain the final embedding UE_v of node v;
17:       end for
18:       Obtain the final embeddings UE_u and UE_v of user u and item v.
19:     end for
20:     Calculate the objective function by Eq. 18, backpropagate gradients, and update the network parameters;
21:   end for
22: until convergence;

4.5 Algorithm Description

Our training procedure is illustrated in Algorithm 1. Given the heterogeneous graph M-HG G, the training user-item pairs P, and the batch size B, our goal is to learn a multiple hierarchical attention aggregation network that can leverage the shared attributes of new users to infer their representations for predicting their preferences for micro-videos. The core of the procedure is to train a high-quality multiple hierarchical attention aggregation network through back-propagation and mini-batch Adaptive Moment Estimation (Adam). A sketch of the inference-time scoring step follows.
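The following minimal sketch covers the scoring step of Eq. (19), assuming the new user's embedding has already been inferred by the trained aggregation network after connecting him to the graph through his attributes; function and tensor names are illustrative.

```python
# A sketch of the inference-time scoring step (Eq. 19).
import torch

def recommend(ue_user, ue_items, top_k=10):
    """ue_user: (d,) inferred new-user embedding; ue_items: (M, d) item embeddings."""
    scores = torch.sigmoid(ue_items @ ue_user)   # p_uv = sigma(UE_u^T UE_v)
    return torch.topk(scores, k=top_k).indices   # indices of the recommended items
```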
5 EXPERIMENTAL RESULTS

5.1 Dataset

We use Kwai, Tiktok and MovieLens for evaluation. Their statistics are summarized in Table 2, and we briefly describe them as follows:

• Tiktok dataset (http://ai-lab-challenge.bytedance.com/tce/vc/): This dataset is published by the popular micro-video platform Tiktok. It contains micro-videos created by users registered on the platform and user-video interactions (e.g., click, like). We use the multi-modal micro-video features provided in the original dataset, which omits the raw data.
• Kwai dataset (https://www.kwai.com/): This dataset is extracted from Kwai, a real-world micro-video sharing platform. It contains users associated with attributes, micro-videos associated with attributes, and relationship information including user-video interactions.
• MovieLens (MLs) dataset (http://grouplens.org/datasets/movielens/1m/): MovieLens is a movie rating dataset that has been extensively applied to CF recommendation algorithms. We use the one-million-rating version, which removes users with fewer than 20 rating records. To count the numbers, we build a vector for each user in which each entry is 0 or 1, indicating whether the user has a record for the movie.

Table 2. The statistics of the three real-world data sets.

Dataset     Users     Micro-videos   Interactions    Density
Tiktok      3,656     7,085          1,253,112       4.49%
Kwai        169,878   310,681        775,834,643     1.47%
MovieLens   6,040     3,706          1,000,209       4.47%

5.2 Baselines

To evaluate the performance of IHGNN, we consider several state-of-the-art approaches as baselines, including traditional methods and graph-based methods. Note that, for all baselines, we conduct experiments in the user cold-start scenario.

• FM: FM effectively combines factorization machine (FM) based frameworks with any other side content feature information (e.g., locations), besides the user and the item, for recommendation tasks. In this work, we feed heterogeneous information as side features into the FM model for user cold-start recommendation tasks.
• NCF: Neural Matrix Factorization (NCF) [10] fuses matrix factorization methods and neural networks to train on and predict user-item bipartite interaction information for recommendation tasks.
• GraphSAGE: GraphSAGE [9] is an unsupervised inductive graph representation learning framework for large graphs. GraphSAGE can be utilized to obtain expressive low-dimensional representations for graph nodes, including nodes unseen in the training stage, thanks to its inductive learning capacity. It is especially useful for incorporating graph structure information and rich node attribute information from neighboring nodes.
• STAR-GCN: STAR-GCN [48] designs a stacked and reconstructed GCN framework on user-item bipartite interaction graphs. STAR-GCN requires several rated edges connected with new nodes in the testing graph and leverages these edges to make predictions, which may be suitable for cold-start problems.
• HeRec: HeRec [31] is a heterogeneous graph representation learning based recommendation method, which can effectively extract different kinds of representation information in terms of different pre-designed meta-paths in heterogeneous graphs, and further combines these representations with extended matrix factorization models for improving personalized recommendation performance.
• HetGNN: HetGNN [46] is a graph representation learning method for learning heterogeneous node representations by incorporating their heterogeneous content information. HetGNN mainly consists of a node-type based neighbor aggregating module and a heterogeneous node-type information combining module, so as to account for the heterogeneity of graphs.
• IHGNN: IHGNN is our proposed recommendation model, which leverages a Modality-aware Heterogeneous Graph (M-HG) to preserve the rich and heterogeneous relationships among users, items, and their relevant attribute information. Furthermore, IHGNN utilizes a well-designed hierarchical attentive aggregation module to learn the representations of nodes, including new users, accounting for the heterogeneity of M-HGs in user cold-start recommendation tasks.

5.3 Experimental Settings

For each dataset, we randomly hold out 80% and 60% of users for training, respectively, and treat the remaining users as the testing sets. To evaluate our approach and the compared baselines on user cold-start recommendation, we utilize four widely used evaluation metrics [42]: Normalized Discounted Cumulative Gain at top k (NDCG@k), Recall at top k (R@k), Precision at top k (P@k), and Area under the ROC Curve (AUC). In practice, following the experimental settings of the recommendation model NeuCF [10] and the micro-video recommendation model MMGCN [42], both of which are popular recommendation methods, we set top k = 10 and report the average scores over the testing set. A sketch of these ranking metrics follows.
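For completeness, the sketch below computes P@k, R@k, and NDCG@k for a single user with binary relevance (AUC is omitted); it is an assumption-level illustration of the standard definitions, not the authors' evaluation script.

```python
# A minimal sketch of the ranking metrics P@k, R@k, and NDCG@k.
import math

def metrics_at_k(ranked, relevant, k=10):
    """ranked: ordered list of recommended item ids; relevant: set of ground-truth ids."""
    hits = [1.0 if item in relevant else 0.0 for item in ranked[:k]]
    precision = sum(hits) / k
    recall = sum(hits) / max(len(relevant), 1)
    dcg = sum(h / math.log2(i + 2) for i, h in enumerate(hits))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return precision, recall, ndcg
```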
In the training phase, we tune the hyper-parameters of our IHGNN model by cross-validation and search them with a standard grid search. Specifically, we first randomly initialize the parameters of our model from a Gaussian distribution with mean 0 and standard deviation 0.02. To optimize our IHGNN model, we adopt the widely used Adaptive Moment Estimation (Adam) optimizer [15] in a mini-batch way. The batch size is selected from {128, 256, 512}, the learning rate is searched in {1e-4, 1e-3, 1e-2, 5e-4, 5e-3, 5e-2}, and the regularization weight is selected from {1e-5, 1e-4, 1e-3, 1e-2, 5e-2}. Because we find the relative results are consistent when varying the dimension of the embeddings, unless otherwise stated we report our results with d = 200, which achieves relatively good performance.

Table 3. Experimental results of IHGNN and baselines on all datasets (training_ratio = 0.6, k = 10).

Datasets  Metrics  FM      NCF     GraphSage  STAR-GCN  HeRec   HetGNN  IHGNN
Kwai      Pre      0.1901  0.2245  0.2891     0.2901    0.3789  0.3835  0.3931
          Rec      0.1773  0.2017  0.2991     0.3011    0.3858  0.3713  0.4011
          NDCG     0.2011  0.2441  0.2997     0.3012    0.3591  0.3812  0.3901
          AUC      0.6003  0.6521  0.7001     0.6881    0.7111  0.7311  0.7402
Tiktok    Pre      0.2012  0.1991  0.3015     0.2817    0.3601  0.3679  0.3721
          Rec      0.2101  0.2048  0.3301     0.2918    0.3901  0.3939  0.4005
          NDCG     0.2011  0.1811  0.3339     0.2912    0.3802  0.3811  0.4043
          AUC      0.6218  0.5991  0.6981     0.6725    0.7317  0.7411  0.7512
MLs       Pre      0.2442  0.2015  0.3129     0.2719    0.3598  0.3611  0.3701
          Rec      0.1911  0.1999  0.3195     0.2991    0.3759  0.3753  0.3871
          NDCG     0.2312  0.2331  0.3194     0.2849    0.3598  0.3522  0.3701
          AUC      0.6001  0.6419  0.6884     0.6512    0.7101  0.7159  0.7233

Table 4. Experimental results of IHGNN and baselines on all datasets (training_ratio = 0.8, k = 10).

Datasets  Metrics  FM      NCF     GraphSage  STAR-GCN  HeRec   HetGNN  IHGNN
Kwai      Pre      0.1034  0.1133  0.2242     0.2014    0.2729  0.3299  0.3535
          Rec      0.2034  0.2111  0.2954     0.2661    0.3219  0.3501  0.3712
          NDCG     0.3305  0.3327  0.3401     0.3391    0.3401  0.3631  0.3825
          AUC      0.5940  0.6881  0.7172     0.6901    0.7209  0.7525  0.7791
Tiktok    Pre      0.1141  0.1211  0.2523     0.2481    0.2781  0.3129  0.3321
          Rec      0.1121  0.2141  0.2943     0.2809    0.3001  0.3214  0.3505
          NDCG     0.3035  0.3112  0.3415     0.3101    0.3505  0.3501  0.3843
          AUC      0.5155  0.6501  0.6421     0.6911    0.7098  0.7278  0.7419
MLs       Pre      0.1449  0.1564  0.2921     0.2519    0.3013  0.3129  0.3201
          Rec      0.2019  0.2102  0.2731     0.2811    0.2939  0.3113  0.3471
          NDCG     0.2402  0.2435  0.3015     0.2901    0.3019  0.3412  0.3601
          AUC      0.6214  0.6555  0.6883     0.6672    0.7121  0.7311  0.7561

5.4 Quantitative Results

The experimental results of the baselines and the IHGNN model are shown in Table 3 and Table 4, with 80% and 60% of users used for training, respectively. From the results, we can draw several observations. Firstly, our proposed IHGNN model reliably beats all baseline models on all three datasets on all four metrics, confirming the usefulness and superiority of IHGNN for user cold-start recommendation tasks. Predictably, the traditional methods consistently yield the worst performance on all four metrics: learning representations of users and items, especially of new users, simply by incorporating related content information into a factorization machine is not adequate, as it ignores rich and expressive relationship information. Compared with the traditional approaches, the graph-based methods achieve significant performance improvements. These results demonstrate that graph convolutional networks can learn better representations of nodes in graphs, especially when inferring new users' representations, which further improves the quality of representations for user cold-start recommendation. For GraphSAGE and STAR-GCN, in our experiments their performance declines compared with the heterogeneous graph based representation learning models, HeRec and HetGNN. This might be because GraphSAGE only focuses on homogeneous graphs and ignores the heterogeneity of the data. Furthermore, STAR-GCN is based on the bipartite user-item interaction graph and cannot take heterogeneous content feature information into consideration. These comparisons further indicate that considering heterogeneous information, which mainly includes heterogeneous content information and various types of relationships, is vital for generating new users' representations for user cold-start recommendation. The HeRec model is worse than HetGNN, probably because its performance heavily depends on the different kinds of representation information induced by the pre-designed meta-paths in heterogeneous graphs. Also, HeRec cannot take advantage of the influence of different node types on the current node when aggregating the content of neighboring nodes, and it may not effectively capture high-order structural data for cold-start recommendation tasks. Compared with the heterogeneous aggregation based method HetGNN, IHGNN achieves better performance and has the following advantages for the user cold-start recommendation task:

(1) The advantage in HIN construction. Our model absorbs multi-modal attribute data into heterogeneous graph nodes, instead of just considering users and micro-videos as nodes, and further utilizes the heterogeneous and rich relationships among these multi-modal attributes as edges in the heterogeneous graph for cold-start recommendation tasks. In contrast, HetGNN only utilizes user-item interactions when constructing graphs.
(2) The advantage in generating neighbors.
The sampling and grouping module of our model searches and samples the relevant heterogeneous neighboring nodes of each node based on the rich relationships among multi-modal attributes, which lets our model generate a more robust and comprehensive representation of each user and each micro-video. To sample more related neighboring nodes of the current node, we design the Random Walk Sampling Strategy (RWSS), a random walk based sampling strategy with a restart probability $p$ for sampling the related heterogeneous neighbors of each node in the complex graph M-HG. Compared with existing sampling strategies, RWSS does not require any prior knowledge, such as meta-paths, to sample heterogeneous neighboring nodes, and it is not sensitive to interference from noisy nodes. Furthermore, RWSS is conducted multiple times to ensure the effectiveness of the sampling strategy, because a single sampling operation may miss some important nodes. In contrast, HetGNN only utilizes relatively simple sampling operations to obtain neighboring nodes on user-item bipartite graphs. Moreover, HetGNN depends heavily on user-item interactions and ignores the relationships among multi-modal attribute data.
(3) The advantage in the feature aggregation module. Our feature aggregation module uses a novel hierarchical attention network, which consists of attribute-aware self-attention and neighbor-aware attention, to take the importance of multi-modal attributes and of the different neighboring node types into consideration simultaneously when inferring the representation of each user and each item, including unseen new nodes. The hierarchical design accounts for the heterogeneity of the constructed graphs. In contrast, HetGNN does not consider the importance among different multi-modal attributes when learning the representation of each node.
(4) The advantage in inferring representations of new users. In this work, we consider a new user as a graph node and further connect the new user node to the input heterogeneous graph M-HG $G$ from the training process, based on the sparse attributes of the new user. This operation is the inductive new-user representation learning of our model. In contrast, HetGNN relies only on the sparse attributes to generate embeddings of new users and ignores the hidden relationships among users, items, and their attributes.

In conclusion, the experimental results demonstrate that the proposed IHGNN model can deliver better user cold-start recommendation performance.
5.5 Analysis of IHGNN Components

Since our proposed IHGNN consists of multiple key components, we demonstrate the effectiveness of the proposed model by comparing the following variants of IHGNN (the variant labels ¬M, ¬S, and ¬A stand for the removed multiple-sampling, self-attention, and inter-type attention components, respectively):

• IHGNN¬M: a variant of IHGNN which removes the multiple sampling operation and samples the related neighbors of each node only once; the sampling result of each node is then copied multiple times.
• IHGNN¬S: a variant of IHGNN that removes the self-attention component of the intra-type feature aggregating module and assigns the same importance value to all multimodal attribute neighboring nodes.
• IHGNN¬A: a variant of IHGNN which removes the attention module of the inter-type feature aggregating module and assigns the same importance value to all neighboring node groups.

The ablation study results for Precision@10, Recall@10, NDCG@10, and AUC on the three datasets are reported in Tables 5 and 6, with 60% and 80% of users used for training, respectively.

Table 5. Experimental results of IHGNN and its key components on all datasets (training_ratio = 0.6, k = 10).

Datasets  Metrics  IHGNN¬M  IHGNN¬S  IHGNN¬A  IHGNN
Kwai      Pre      0.3901   0.3889   0.3511   0.3931
          Rec      0.3711   0.3710   0.3916   0.4011
          NDCG     0.3311   0.3761   0.3811   0.3901
          AUC      0.7011   0.7001   0.7219   0.7402
Tiktok    Pre      0.3112   0.3412   0.3331   0.3721
          Rec      0.3911   0.3914   0.3818   0.4005
          NDCG     0.3499   0.3812   0.3881   0.4043
          AUC      0.7391   0.7319   0.7411   0.7512
MLs       Pre      0.3599   0.3429   0.3311   0.3701
          Rec      0.3417   0.3312   0.3711   0.3871
          NDCG     0.3621   0.3519   0.3599   0.3701
          AUC      0.7158   0.7119   0.6911   0.7233

Table 6. Experimental results of IHGNN and its key components on all datasets (training_ratio = 0.8, k = 10).

Datasets  Metrics  IHGNN¬M  IHGNN¬S  IHGNN¬A  IHGNN
Kwai      Pre      0.3312   0.3381   0.3227   0.3535
          Rec      0.3345   0.3501   0.3616   0.3712
          NDCG     0.3443   0.3502   0.3719   0.3825
          AUC      0.7601   0.7402   0.7581   0.7791
Tiktok    Pre      0.3033   0.3112   0.3129   0.3321
          Rec      0.3101   0.3016   0.3121   0.3505
          NDCG     0.3704   0.3652   0.3513   0.3843
          AUC      0.7302   0.7319   0.7359   0.7419
MLs       Pre      0.3301   0.3051   0.3121   0.3201
          Rec      0.3099   0.3298   0.3132   0.3471
          NDCG     0.3201   0.3019   0.3402   0.3601
          AUC      0.7339   0.7101   0.7219   0.7561

From the results, we can conclude that:

• IHGNN achieves better performance than IHGNN¬M on the three datasets, which demonstrates that the multiple sampling operation can capture important neighboring nodes more precisely and effectively.
• IHGNN outperforms IHGNN¬S, which shows that the importance values of same-modality nodes (such as users, items, visual content, textual content, and acoustic content) can be better calculated by the intra-type self-attention component.
• The results of IHGNN are superior to those of IHGNN¬A, which indicates that the inter-type attention component can effectively estimate the impact of the various neighbor node groups (e.g., users, items, attributes) when producing the final node embeddings.

Fig. 5. Experimental results of Top-K item recommendation when K varies from 2 to 20 on the Kwai dataset (training_ratio = 0.8); panels: (a) NDCG@K, (b) AUC.

5.6 Hyper-parameters Sensitivity

We present extended experimental results to analyze the influence of all key parameters of the IHGNN model, which include the number of sampling operations, the ranking position K, the depth of the sampling operation, and the aggregated representation dimension d for users and items, on the three datasets.

Impact of the Ranking Position K: From Fig. 5, Fig. 6 and Fig. 7, it can be observed that IHGNN shows consistent performance improvements over the graph-based methods across all position parameters, indicating the need to model heterogeneous information (heterogeneous content information and various types of relationships), as well as the superior graph representation learning capability of our IHGNN framework.

Fig. 6. Experimental results of Top-K item recommendation when K varies from 2 to 20 on the Tiktok dataset (training_ratio = 0.8); panels: (a) NDCG@K, (b) AUC.
Fig. 7. Experimental results of Top-K item recommendation when K varies from 2 to 20 on the MovieLens dataset (training_ratio = 0.8); panels: (a) NDCG@K, (b) AUC.

Impact of the Number of Sampling Operations: From Fig. 8, with the number of sampling operations per graph node varying from 2 to 10, the AUC and NDCG@10 results increase slowly until reaching an essentially stable value, which illustrates the importance of multiple sampling operations for each node. Furthermore, our model shows good robustness: even a large number of sampling operations per node has little impact on the overall performance of our model.

Fig. 8. Experimental results of AUC and NDCG@10 of IHGNN for different numbers of sampling operations on all datasets (training_ratio = 0.8, k = 10).

Impact of the Depth of the Sampling Operation: From Fig. 9, as the depth of the sampling operation varies from 2 to 8 for each node, the AUC and NDCG@10 results slowly increase. Nevertheless, as the depth of the sampling operation increases further, the performance deteriorates. The cause might be that too many noisy neighboring nodes are included.

Fig. 9. Experimental results of AUC and NDCG@10 of IHGNN for different depths of sampling on all datasets (training_ratio = 0.8, k = 10).

Impact of the Aggregated Embedding Dimension: From Fig. 10, when the aggregated embedding dimension d of each graph node varies between 50 and 350, the AUC and NDCG@10 generally increase. Nevertheless, as d is increased further, the performance slowly decreases, possibly owing to overfitting.

Fig. 10. Experimental results of AUC and NDCG@10 of IHGNN for different embedding dimensions on all datasets (training_ratio = 0.8, k = 10).

5.7 Qualitative Results

To intuitively illustrate the effectiveness of IHGNN in inferring new users' representations by utilizing the highly complex and rich relationships, such as the relational information among the related multimodal attributes of users and items, we visualize a new user with his related attributes, the sampling results from the M-HG, and the attention values of some related micro-videos, as shown in Figure 11. For each new user, based on his attributes, we use the sampling operation to sample heterogeneous neighbors in the constructed heterogeneous graph, which can be regarded as his enriched information. In Figure 11, the list of mobile apps of the new user includes apps related to basketball and Amazon, which indicates the new user's interests. Furthermore, this information can be utilized to sample related micro-videos, including basketball-related and Amazon-related micro-videos.
After the aggregation operation of the hierarchical attention aggregation network, the attention values of the different micro-videos are learned while generating the new user's representation, as shown in the attention-values box. Note that, in the attention-values box, the blue horizontal arrow shows the size of the learned attention values of the nodes, and the vertical arrow shows the node numbers in the graph M-HG. From Figure 11, we can observe that the basketball-related and Amazon-related micro-videos are more important than the other types of videos in terms of their learned attention values. Therefore, the IHGNN model can effectively infer new users' representations based on relevant data, which helps improve the performance of the user cold-start recommendation task.

Fig. 11. Visualization of the sampling operation and the attention values of some micro-videos learned by the IHGNN model. In the attention-values box, the blue horizontal arrow shows the size of the learned attention values of nodes, and the vertical arrow shows the node numbers of the graph M-HG.

6 CONCLUSIONS

In this work, we are committed to solving the problem of user cold-start recommendation. We argue that most existing GNN-based cold-start recommendation methods learn models based on homogeneous graphs and ignore the rich and various (heterogeneous) relationships among different kinds of heterogeneous information in the user cold-start recommendation scenario. We propose a novel Inductive Heterogeneous Graph Neural Network (IHGNN) model, which takes advantage of the rich and heterogeneous relational information to alleviate the sparsity of user attributes. Our model converts new users, items, and the associated multimodal information into a Modality-aware Heterogeneous Graph (M-HG), which preserves the rich and heterogeneous relationship information among them. In addition, a well-designed multiple hierarchical attention aggregation model consisting of intra- and inter-type attention aggregating modules is proposed, focusing on useful connected neighbors and neglecting meaningless and noisy connected neighbors to learn more expressive representations. We evaluate our IHGNN method on three real data sets, and the experimental results on all four metrics show that our proposed IHGNN model outperforms the existing baselines on user cold-start recommendation tasks. In the future, we will focus on extending the heterogeneous graph with knowledge graphs in GNN models.

7 ACKNOWLEDGMENTS

This work was supported in part by the National Natural Science Foundation of China (No. 62036012, 61936005, 61832002, 61721004, 62072456, 62106262, 61872199), the Key Research Program of Frontier Sciences, CAS (Grant No. QYZDJSSWJSC039), and the Open Research Projects of Zhejiang Lab (No. 2021KE0AB05). This work was also sponsored by the Tencent WeChat Rhino-Bird Focused Research Program.

REFERENCES

[1] Gediminas Adomavicius, Jesse C. Bockstedt, Shawn P. Curley, and Jingjing Zhang. 2021. Effects of Personalized and Aggregate Top-N Recommendation Lists on User Preference Ratings. ACM Trans. Inf. Syst. 39, 2 (2021), 13:1-13:38.
[2] Mohammad Aliannejadi and Fabio Crestani. 2018. Personalized Context-Aware Point of Interest Recommendation. ACM Trans. Inf. Syst. 36, 4 (2018), 45:1-45:28.
[3] Mohammad Aliannejadi, Hamed Zamani, Fabio Crestani, and W. Bruce Croft. 2021. Context-aware Target Apps Selection and Recommendation for Enhancing Personal Mobile Assistants.
ACM Trans. Inf. Syst. 39, 3 (2021), 29:1-29:30.
[4] Desheng Cai, Shengsheng Qian, Quan Fang, and Changsheng Xu. 2021. Heterogeneous Hierarchical Feature Aggregation Network for Personalized Micro-video Recommendation. IEEE Transactions on Multimedia (2021), 1-1. https://doi.org/10.1109/TMM.2021.3059508
[5] Hongxu Chen, Hongzhi Yin, Tong Chen, Weiqing Wang, Xue Li, and Xia Hu. 2022. Social Boosted Recommendation With Folded Bipartite Network Embedding. IEEE Trans. Knowl. Data Eng. 34, 2 (2022), 914-926.
[6] Tianqi Chen, Weinan Zhang, Qiuxia Lu, Kailong Chen, Zhao Zheng, and Yong Yu. 2012. SVDFeature: a toolkit for feature-based collaborative filtering. J. Mach. Learn. Res. 13 (2012), 3619-3622.
[7] Wanyu Chen, Fei Cai, Honghui Chen, and Maarten de Rijke. 2019. Joint Neural Collaborative Filtering for Recommender Systems. ACM Trans. Inf. Syst. 37, 4 (2019), 39:1-39:30.
[8] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, and Hemal Shah. 2016. Wide & Deep Learning for Recommender Systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, RecSys 2016, Boston, MA, USA, September 15, 2016. 7-10.
[9] William L. Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA. 1024-1034.
[10] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In Proceedings of the 26th International Conference on World Wide Web, WWW 2017, Perth, Australia, April 3-7, 2017. 173-182.
[11] Binbin Hu, Chuan Shi, Wayne Xin Zhao, and Philip S. Yu. 2018. Leveraging Meta-path based Context for Top-N Recommendation with A Neural Co-Attention Model. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, London, UK, August 19-23, 2018. 1531-1540.
[12] Jun Hu, Shengsheng Qian, Quan Fang, Youze Wang, Quan Zhao, Huaiwen Zhang, and Changsheng Xu. 2021. Efficient Graph Deep Learning in TensorFlow with tf_geometric. In MM '21: ACM Multimedia Conference, Virtual Event, China, October 20-24, 2021, Heng Tao Shen, Yueting Zhuang, John R. Smith, Yang Yang, Pablo Cesar, Florian Metze, and Balakrishnan Prabhakaran (Eds.). ACM, 3775-3778.
[13] Jun Hu, Shengsheng Qian, Quan Fang, and Changsheng Xu. 2019. Hierarchical Graph Semantic Pooling Network for Multi-modal Community Question Answer Matching. In Proceedings of the 27th ACM International Conference on Multimedia, MM 2019, Nice, France, October 21-25, 2019, Laurent Amsaleg, Benoit Huet, Martha A. Larson, Guillaume Gravier, Hayley Hung, Chong-Wah Ngo, and Wei Tsang Ooi (Eds.). ACM, 1157-1165.
[14] Shuyi Ji, Yifan Feng, Rongrong Ji, Xibin Zhao, Wanwan Tang, and Yue Gao. 2020. Dual Channel Hypergraph Collaborative Filtering. In KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, August 23-27, 2020. 2020-2029.
[15] Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
[16] Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings.
[17] Yehuda Koren, Robert M. Bell, and Chris Volinsky. 2009. Matrix Factorization Techniques for Recommender Systems. Computer 42, 8 (2009), 30-37.
[18] Pigi Kouki, Shobeir Fakhraei, James R. Foulds, Magdalini Eirinaki, and Lise Getoor. 2015. HyPER: A Flexible and Extensible Probabilistic Framework for Hybrid Recommender Systems. In Proceedings of the 9th ACM Conference on Recommender Systems, RecSys 2015, Vienna, Austria, September 16-20, 2015. 99-106.
[19] Quoc V. Le and Tomas Mikolov. 2014. Distributed Representations of Sentences and Documents. In Proceedings of the 31th International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014 (JMLR Workshop and Conference Proceedings, Vol. 32). 1188-1196.
[20] Xiaopeng Li and James She. 2017. Collaborative Variational Autoencoder for Recommender Systems. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13-17, 2017. 305-314.
[21] Zhao Li, Xin Shen, Yuhang Jiao, Xuming Pan, Pengcheng Zou, Xianling Meng, Chengwei Yao, and Jiajun Bu. 2020. Hierarchical Bipartite Graph Neural Networks: Towards Large-Scale E-commerce Applications. In 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, TX, USA, April 20-24, 2020. 1677-1688.
[22] Song Liu, Haoqi Fan, Shengsheng Qian, Yiru Chen, Wenkui Ding, and Zhongyuan Wang. 2021. HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021. 11895-11905.
[23] Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015. 3431-3440.
[24] Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States. 3111-3119.
[25] Federico Monti, Michael M. Bronstein, and Xavier Bresson. 2017. Geometric Matrix Completion with Recurrent Multi-Graph Neural Networks. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA. 3697-3707.
[26] Fedelucio Narducci, Pierpaolo Basile, Cataldo Musto, Pasquale Lops, Annalina Caputo, Marco de Gemmis, Leo Iaquinta, and Giovanni Semeraro. 2016. Concept-based item representations for a cross-lingual content-based recommendation process. Inf. Sci. 374 (2016), 15-31.
[27] Shengsheng Qian, Dizhan Xue, Huaiwen Zhang, Quan Fang, and Changsheng Xu. 2021. Dual Adversarial Graph Neural Networks for Multi-label Cross-modal Retrieval. In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021. 2440-2448.
[28] Shengsheng Qian, Tianzhu Zhang, Changsheng Xu, and Jie Shao. 2016. Multi-Modal Event Topic Model for Social Event Analysis. IEEE Trans. Multim. 18, 2 (2016), 233-246.
[29] Lei Sang, Min Xu, Shengsheng Qian, Matt Martin, Peter Li, and Xindong Wu. 2021. Context-Dependent Propagating-Based Video Recommendation in Multimodal Heterogeneous Information Networks. IEEE Trans. Multim. 23 (2021), 2019-2032.
[30] Chuan Shi, Binbin Hu, Wayne Xin Zhao, and Philip S. Yu. 2019. Heterogeneous Information Network Embedding for Recommendation. IEEE Trans. Knowl. Data Eng. 31, 2 (2019), 357-370.
[31] Chuan Shi, Binbin Hu, Wayne Xin Zhao, and Philip S. Yu. 2019. Heterogeneous Information Network Embedding for Recommendation. IEEE Trans. Knowl. Data Eng. 31, 2 (2019), 357-370.
[32] Chuan Shi, Jian Liu, Fuzhen Zhuang, Philip S. Yu, and Bin Wu. 2016. Integrating heterogeneous information via flexible regularization framework for recommendation. Knowl. Inf. Syst. 49, 3 (2016), 835-859.
[33] Chuan Shi, Zhiqiang Zhang, Yugang Ji, Weipeng Wang, Philip S. Yu, and Zhiping Shi. 2019. SemRec: a personalized semantic recommendation method based on weighted heterogeneous information networks. World Wide Web 22, 1 (2019), 153-184.
[34] Florian Strub, Romaric Gaudel, and Jérémie Mary. 2016. Hybrid Recommender System based on Autoencoders. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, DLRS@RecSys 2016, Boston, MA, USA, September 15, 2016. 11-16.
[35] Jianing Sun, Yingxue Zhang, Chen Ma, Mark Coates, Huifeng Guo, Ruiming Tang, and Xiuqiang He. 2019. Multi-graph Convolution Collaborative Filtering. In 2019 IEEE International Conference on Data Mining, ICDM 2019, Beijing, China, November 8-11, 2019. 1306-1311.
[36] Rianne van den Berg, Thomas N. Kipf, and Max Welling. 2017. Graph Convolutional Matrix Completion. CoRR abs/1706.02263 (2017).
[37] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA. 5998-6008.
[38] Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings.
[39] Xiang Wang, Hongye Jin, An Zhang, Xiangnan He, Tong Xu, and Tat-Seng Chua. 2020. Disentangled Graph Collaborative Filtering. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020, Virtual Event, China, July 25-30, 2020. 1001-1010.
[40] Xiao Wang, Ruijia Wang, Chuan Shi, Guojie Song, and Qingyong Li. 2020. Multi-Component Graph Convolutional Collaborative Filtering. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020. 6267-6274.
[41] Jian Wei, Jianhua He, Kai Chen, Yi Zhou, and Zuoyin Tang. 2017. Collaborative filtering and deep learning based recommendation system for cold start items. Expert Syst. Appl. 69 (2017), 29-39.
[42] Yinwei Wei, Xiang Wang, Liqiang Nie, Xiangnan He, Richang Hong, and Tat-Seng Chua. 2019. MMGCN: Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video. In Proceedings of the 27th ACM International Conference on Multimedia, MM 2019, Nice, France, October 21-25, 2019. 1437-1445.
[43] Libing Wu, Cong Quan, Chenliang Li, Qian Wang, Bolong Zheng, and Xiangyang Luo. 2019. A Context-Aware User-Item Representation Learning for Item Recommendation. ACM Trans. Inf. Syst. 37, 2 (2019), 22:1-22:29.
[44] Hongzhi Yin, Bin Cui, Xiaofang Zhou, Weiqing Wang, Zi Huang, and Shazia W. Sadiq. 2016. Joint Modeling of User Check-in Behaviors for Real-time Point-of-Interest Recommendation. ACM Trans. Inf. Syst. 35, 2 (2016), 11:1-11:44.
[45] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L. Hamilton, and Jure Leskovec. 2018. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, London, UK, August 19-23, 2018. 974-983.
[46] Chuxu Zhang, Dongjin Song, Chao Huang, Ananthram Swami, and Nitesh V. Chawla. 2019. Heterogeneous Graph Neural Network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2019, Anchorage, AK, USA, August 4-8, 2019. 793-803.
[47] Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Zhi Zhang, Haibin Lin, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, Mu Li, and Alexander J. Smola. 2020. ResNeSt: Split-Attention Networks. CoRR abs/2004.08955 (2020).
[48] Jiani Zhang, Xingjian Shi, Shenglin Zhao, and Irwin King. 2019. STAR-GCN: Stacked and Reconstructed Graph Convolutional Networks for Recommender Systems. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019. 4264-4270.
[49] Jing Zheng, Jian Liu, Chuan Shi, Fuzhen Zhuang, Jingzhi Li, and Bin Wu. 2017. Recommendation in heterogeneous information network via dual similarity regularization. Int. J. Data Sci. Anal. 3, 1 (2017), 35-48.
