This bilingual snapshot of https://cmt3.research.microsoft.com/ECMLPKDD2024/Submission/Reviews/1100 was saved by the user on 2024-5-27 19:53; bilingual support was provided by Immersive Translate.

View Reviews

Paper ID
1100
Paper Title
Harnessing Deep Neighborhood in Heterogeneous Graphs for Node Representation
Track Name
Research Track

Reviewer #1

Questions

  • 1. Does the paper fit into the scope of the ECML-PKDD research track? (See https://2024.ecmlpkdd.org/submissions/research-track)
    • Yes
  • 2. Please briefly describe what you consider the paper's main contribution to be.
    • The authors propose HDNM (Heterogeneous Deep Neighborhood Mining) to capture multi-hop neighborhood information effectively in heterogeneous graphs to enhance the quality of node representations while avoiding over-smoothing. More specifically, they aim to overcome 3 main challenges: (1) to address the explosive growth of metapaths, they introduce an atomic path discovery mechanism to identify key path segments; (2) to address the heterogeneous semantic fusion problem, they introduce a heterogeneous spectral convolution module to achieve deep spectral convolutions across a variety of relationships and node types; (3) to address the over-smoothing problem, they introduce a transformer-based module to aggregate and integrate neighborhood information from different depths and complex relationships.
  • 3. Please rate the technical soundness of the paper: Are the claims well supported by theoretical analysis or experimental results?
    • Serious issues with soundness
  • 4. Please rate the contribution of the paper: Does the paper create new knowledge? Does it introduce a new method or task? Does it provide interesting insights into an existing approach?
    • Moderate contribution; medium interest
  • 5. Please rate the quality of the presentation: Is the paper clearly written? Is it well structured? Is it easy to understand?
    • Average
  • 6. Please rate the replicability: For example, is it possible to reimplement the described algorithm or experiments solely based on the information in the paper?
    • Medium (e.g., some information is missing, but that could be fixed)
  • 7. Please rate the reproducibility: Are code and data open source?
    • Medium (some code, data or documentation is missing)
  • 8. What are the main strengths of the paper? Please give at least 3.
    • S1: The idea of using atomic (key) metapaths as semantically rich propagation path segments and as fundamental units of the deep structure and relationships within the graph, thereby avoiding metapath explosion and/or the involvement of domain experts.


      S2: The experiments are extensive, including the comparison with several baselines belonging to different categories (metapath based and metapath free models) on 4 real-world heterogeneous datasets of different sizes. Nevertheless, the explanation is poor and tables 2-3 are unclear.

      The results seem remarkable, but without information on running time and/or complexity, they may simply be due to the complex architectures employed and/or the embedding dimension.


      S3: The ablation study is appreciated, as is the comparison based on "tricks," highlighting the authors' knowledge of the state of the art. However, the description is shallow and incomplete.

  • 9. What opportunities are there to improve the paper? Please give at least 3.
    • W1: In discovering atomic metapaths, the authors exclude metapaths in which the same node type repeats (except at the endpoints), but this assumption precludes edges between two nodes of the same type, which are realistic in many heterogeneous graphs (e.g., paper cites paper).


      W2: Superficiality in definitions and explanations, inaccuracies, lack of rigor and justification. The preliminaries section is coarse: the authors use only the pair of node and edge sets to define a heterogeneous graph, and several formal and informal definitions of terms used in the text are missing (such as network schema, terminal nodes of metapaths, shallow vs. deep levels, etc.). In addition, vector dimensions are almost never given; it is not clear what the attributes of the node types are; neither the choice of architectures (MLP and transformer-based) nor the depth of 10 is motivated; the tricks are not explained; the captions of Tables 2 and 3 are not self-explanatory; and the downstream task (multi-class classification, presumably) can only be inferred from Table 1, which gives the number of classes for each dataset, and from the paragraph "Results on OGB-MAG", which states that "many researchers have tried graph representation learning algorithms and various tricks to enhance node classification".


      W3: The authors repeatedly claim that the proposed approach is efficient, especially in the discovery of atomic metapaths, but this claim is not theoretically intuitive and is supported neither by reported running times nor by an analysis of the computational aspects.

  • 10. Please enter a detailed review of the paper that explains the answers to the questions you have given above. Please explain your ratings of technical soundness, novelty, quality of presentation, reproducibility, strong and weak points. Please be specific and constructive, e.g., for weak points that can be improved, add suggestions how they can be improved. If e.g., relevant related work has not been considered, please include the references to this work.
    • The approach proposed to overcome some common challenges in heterogeneous graphs is original and distinguishes itself from the state of the art, but it relies on unjustified assumptions. The idea sounds good and the experiments are extensive and look promising, but much work remains to be done. The approach is potentially reasonable, but most core claims lack justification, including the efficiency, the choice of architectures, and the considered depth. The efficiency claim requires reporting running times and analyzing the computational aspects of the entire model. The implementation choices need to be justified (e.g., why an MLP? Why a transformer-based architecture? Why a depth of 10?). Even minor claims, such as "the natural adaptation to many deep neighborhood mining methods for homogeneous graphs", must be adequately motivated.


      The paper is well structured, but the discussion takes some effort to follow. For example, to understand the difference between single atomic paths and multiple atomic paths, the reader has to wait until page 9; the term metapath refers to both a type and an instance; node and node type are sometimes confused (e.g., Algorithm 1 should read: output […] and starting from any node TYPE); the downstream task must be guessed; and the captions of Tables 2 and 3 are not self-explanatory, since it is not clear what the groupings of rows correspond to (it would suffice to summarize in the caption the distinction between metapath-based and metapath-free methods explained in Section 5.3). In Table 2 there is possibly a mistake, as the second-best result is -APPNP in all but one case.


      The datasets used in the experiments are open source and properly referenced, but the authors have not shared the code or stated that they will make it available upon publication. The Experiment Setup Detail section lacks some details (e.g., if a maximum number of epochs is set, does that mean early stopping is used? If so, with what patience? Do the different runs correspond to different random seeds?).


      The authors are encouraged to reconsider their algorithm for the case of edges between two nodes of the same type, which is realistic in many heterogeneous graphs (e.g., paper cites paper), and to be more accurate and rigorous in their explanations.

      Concerning the preliminaries section, the heterogeneous graph should be defined as G = (V, E, C, R, τ, ϕ, X), clarifying the node type attributes and the node attributes, and formal and informal definitions of the technical terms used in the text (such as network schema, terminal (endpoint) and intermediate nodes of metapaths, shallow vs. deep levels, etc.) should be added. It is also suggested to specify the downstream task and the tricks used to enhance it.


      Overall, it is an original work with the potential to open up new research directions, but it is not yet ready.
  • 11. Please give an overall rating to the paper into the following categories.
    • Reject: Below the acceptance bar
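
W1 above describes the atomic-path rule only informally. As one way to check that objection, the Python sketch below enumerates schema-level paths from every node type and keeps those in which no node type repeats, except that the two endpoints may share a type. The schema (A = author, P = paper, V = venue, with a P-P citation relation) and all function names are hypothetical, and the rule is the reviewer's paraphrase, not the paper's Algorithm 1.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical network schema (not from the paper): A = author, P = paper,
# V = venue. The ("P", "P") relation models "paper cites paper".
SCHEMA: List[Tuple[str, str]] = [("A", "P"), ("P", "V"), ("P", "P")]


def build_adjacency(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency between node types of the schema."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_allowed(path: List[str]) -> bool:
    """Rule as paraphrased in W1: no node type repeats, except that the
    first and last node types may be equal."""
    body = path[:-1] if len(path) > 1 and path[0] == path[-1] else path
    return len(set(body)) == len(body)


def enumerate_paths(edges: List[Tuple[str, str]], max_hops: int) -> List[List[str]]:
    """Depth-first enumeration of allowed paths (at least one hop) starting
    from every node type. A prefix that breaks the rule cannot be repaired
    by extending it, so such prefixes are pruned."""
    adj = build_adjacency(edges)
    found: List[List[str]] = []

    def dfs(path: List[str]) -> None:
        if len(path) > 1:
            found.append(path)
        if len(path) - 1 == max_hops:
            return
        for nxt in sorted(adj[path[-1]]):
            candidate = path + [nxt]
            if is_allowed(candidate):
                dfs(candidate)

    for start in sorted(adj):
        dfs([start])
    return found


if __name__ == "__main__":
    paths = enumerate_paths(SCHEMA, max_hops=4)
    # Paths that traverse the paper-cites-paper relation at least once.
    via_citation = [p for p in paths if any(a == b == "P" for a, b in zip(p, p[1:]))]
    print(len(paths), via_citation)
```

On this toy schema the sketch returns 11 paths, and the citation relation survives only as the one-hop path P-P: any longer path through it places a second P in an interior position and is discarded, which is the limitation W1 points out.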

Reviewer #3

Questions

  • 1. Does the paper fit into the scope of the ECML-PKDD research track? (See https://2024.ecmlpkdd.org/submissions/research-track)
    • Yes
  • 2. Please briefly describe what you consider the paper's main contribution to be.
    • This paper proposes exploiting deep neighborhood information in heterogeneous graphs to improve the quality of node representations.
  • 3. Please rate the technical soundness of the paper: Are the claims well supported by theoretical analysis or experimental results?
    • Minor, fixable issues
  • 4. Please rate the contribution of the paper: Does the paper create new knowledge? Does it introduce a new method or task? Does it provide interesting insights into an existing approach?
    • Moderate contribution; medium interest
  • 5. Please rate the quality of the presentation: Is the paper clearly written? Is it well structured? Is it easy to understand?
    • Average
  • 6. Please rate the replicability: For example, is it possible to reimplement the described algorithm or experiments solely based on the information in the paper?
    • Low (important information, e.g., parameter settings missing)
  • 7. Please rate the reproducibility: Are code and data open source?
    • Low (no code, data, documentation given)
  • 8. What are the main strengths of the paper? Please give at least 3.
    • 1. The paper is written well and easy to follow.

      2. The claimed challenge of "explosive growth of metapaths" does exist in many existing representation learning methods on HINs, and solving it could improve representation quality.
  • 9. What opportunities are there to improve the paper? Please give at least 3.
    • 1. The proposed "atomic path" seems to be just a "shorter metapath" and does not solve the challenge of "explosive growth of metapaths". For example, in the simple example graph in Fig. 1, there are 6 types of atomic paths with P as the target node type. The authors should report the number of atomic paths versus regular metapaths on the datasets used.

      2. Furthermore, the method of finding neighboring nodes through atomic paths in HSGC is similar to approaches in many existing works [1], and the subsequent convolution operation is also from existing work. Therefore, the technical contribution of HSGC is limited.

      3. The experiment section should provide more analysis on the semantics of atomic paths and their differences from metapaths used in the baseline methods. It should also provide information about the runtime efficiency of the proposed method.

      4. Among the baseline methods, the metapath-based part lacks the inclusion of state-of-the-art methods from the past three years, such as HGT [2], GTN [3], DiffMG [4], PMMM [5].


      [1] Wang, Xiao, et al. "Heterogeneous graph attention network." The World Wide Web Conference, 2019.

      [2] Hu, Ziniu, et al. "Heterogeneous graph transformer." Proceedings of The Web Conference 2020, 2020.

      [3] Yun, Seongjun, et al. "Graph transformer networks." Advances in Neural Information Processing Systems 32, 2019.

      [4] Ding, Yuhui, et al. "DiffMG: Differentiable meta graph search for heterogeneous graph neural networks." Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021.

      [5] Li, Chao, Hao Xu, and Kun He. "Differentiable meta multigraph search with partial message propagation on heterogeneous information networks." Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37, No. 7, 2023.
  • 10. Please enter a detailed review of the paper that explains the answers to the questions you have given above. Please explain your ratings of technical soundness, novelty, quality of presentation, reproducibility, strong and weak points. Please be specific and constructive, e.g., for weak points that can be improved, add suggestions how they can be improved. If e.g., relevant related work has not been considered, please include the references to this work.
    • See strengths and opportunities to improve.
  • 11. Please give an overall rating to the paper into the following categories.
    • Reject: Below the acceptance bar
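
Point 1 of Reviewer #3 asks for a comparison between the number of atomic paths and the number of regular metapaths. The Python sketch below shows one way such a count could be computed on a network schema: regular metapaths ending at a target type are counted by dynamic programming over schema walks (node types may repeat), and atomic paths are counted under the rule paraphrased in W1 of Reviewer #1 (no repeated node type except at the endpoints). The schema with types A, P, V, T and a P-P citation relation is invented for illustration; it is not Fig. 1 of the paper, and the rule may differ from the paper's exact definition.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical schema (not the paper's Fig. 1): A = author, P = paper,
# V = venue, T = term; ("P", "P") models a citation relation.
SCHEMA: List[Tuple[str, str]] = [("A", "P"), ("P", "V"), ("P", "T"), ("P", "P")]


def adjacency(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency between node types of the schema."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_metapaths(adj: Dict[str, Set[str]], target: str, hops: int) -> int:
    """Metapaths with exactly `hops` relations ending at `target`; node types
    may repeat. ways[t] = number of such type sequences ending at t."""
    ways = {t: 1 for t in adj}  # zero-hop paths: one per node type
    for _ in range(hops):
        ways = {t: sum(ways[s] for s in adj[t]) for t in adj}
    return ways[target]


def is_atomic(path: List[str]) -> bool:
    """Assumed rule: no repeated node type, except equal endpoints."""
    body = path[:-1] if len(path) > 1 and path[0] == path[-1] else path
    return len(set(body)) == len(body)


def count_atomic(adj: Dict[str, Set[str]], target: str, hops: int) -> int:
    """Atomic paths with exactly `hops` relations ending at `target`."""
    count = 0

    def dfs(path: List[str]) -> None:
        nonlocal count
        if len(path) - 1 == hops:
            if path[-1] == target:
                count += 1
            return
        for nxt in adj[path[-1]]:
            candidate = path + [nxt]
            if is_atomic(candidate):  # a violating prefix never recovers
                dfs(candidate)

    for start in adj:
        dfs([start])
    return count


if __name__ == "__main__":
    adj = adjacency(SCHEMA)
    for k in range(1, 5):
        print(k, count_metapaths(adj, "P", k), count_atomic(adj, "P", k))
```

On this toy schema, the number of metapaths ending at P grows as 4, 7, 19, 40 for one to four hops, while the assumed rule admits 4, 3, 0, 0 atomic paths, because a path that never repeats a type cannot have more hops than there are node types. On schemas with many node types, however, the number of such simple paths can itself grow combinatorially, so only the counts on the actual datasets would settle the question raised above.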

Reviewer #5

Questions

  • 1. Does the paper fit into the scope of the ECML-PKDD research track? (See https://2024.ecmlpkdd.org/submissions/research-track)
    • Yes
  • 2. Please briefly describe what you consider the paper's main contribution to be.
    • The paper presents a mining algorithm and a learning architecture for meta-paths in heterogeneous graphs. The approach is evaluated on several datasets and compared against a large number of related approaches.
  • 3. Please rate the technical soundness of the paper: Are the claims well supported by theoretical analysis or experimental results?
    • Minor, fixable issues
  • 4. Please rate the contribution of the paper: Does the paper create new knowledge? Does it introduce a new method or task? Does it provide interesting insights into an existing approach?
    • Moderate contribution; medium interest
  • 5. Please rate the quality of the presentation: Is the paper clearly written? Is it well structured? Is it easy to understand?
    • Average
  • 6. Please rate the replicability: For example, is it possible to reimplement the described algorithm or experiments solely based on the information in the paper?
    • Medium (e.g., some information is missing, but that could be fixed)
  • 7. Please rate the reproducibility: Are code and data open source?
    • Low (no code, data, documentation given)
  • 8. What are the main strengths of the paper? Please give at least 3.
    • S1. Treats an interesting problem setting.

      S2. The evaluation uses several datasets and competing approaches.
  • 9. What opportunities are there to improve the paper? Please give at least 3.
    • I1. The problem should be formalized.

      I2. The pre-processing algorithm should be discussed from a computational point of view.

      I3. No reproducibility.
  • 10. Please enter a detailed review of the paper that explains the answers to the questions you have given above. Please explain your ratings of technical soundness, novelty, quality of presentation, reproducibility, strong and weak points. Please be specific and constructive, e.g., for weak points that can be improved, add suggestions how they can be improved. If e.g., relevant related work has not been considered, please include the references to this work.
    • The first issue with the paper is that it is hard to see what problem it solves. Section 3 is weak: it formalizes neither the problem nor the objectives, and instead goes directly to the solution.


      Secondly, the whole approach seems to rely on Algorithm 1, which, at first sight, can be quite costly if the recursion is not implemented correctly. At the very least, its complexity should be discussed. This is important, as the number of meta-paths fed to the subsequent steps should directly impact the performance.


      In this regard, two evaluations should be added to the experiments: one comparing the running time-performance trade-off, and one measuring the impact of the output of Algorithm 1 on the performance.


      Importantly, the paper provides few details for reproducibility: there is no code and no supplementary material.
  • 11. Please give an overall rating to the paper into the following categories.
    • Reject: Below the acceptance bar
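
Reviewer #5 notes that a recursive path-discovery step can become costly if the recursion is implemented naively, and asks for a running-time analysis. The Python sketch below illustrates that concern under stated assumptions: it uses a hypothetical schema and the no-repeated-type rule paraphrased in W1 of Reviewer #1 (not the paper's Algorithm 1), and compares a naive recursion that walks the schema up to `max_hops` and filters at the end with one that prunes a prefix as soon as it breaks the rule. It counts recursive calls and measures wall-clock time with `time.perf_counter`.

```python
import time
from typing import Dict, List, Set, Tuple

# Hypothetical schema (not from the paper); ("P", "P") is a same-type relation.
SCHEMA: List[Tuple[str, str]] = [("A", "P"), ("P", "V"), ("P", "T"), ("P", "P")]


def adjacency(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency between node types of the schema."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def allowed(path: List[str]) -> bool:
    """Assumed rule: no repeated node type, except equal endpoints."""
    body = path[:-1] if len(path) > 1 and path[0] == path[-1] else path
    return len(set(body)) == len(body)


def enumerate_paths(adj: Dict[str, Set[str]], max_hops: int,
                    prune: bool) -> Tuple[List[Tuple[str, ...]], int]:
    """Return the allowed paths (>= 1 hop) and the number of recursive calls."""
    results: List[Tuple[str, ...]] = []
    calls = 0

    def dfs(path: List[str]) -> None:
        nonlocal calls
        calls += 1
        if len(path) > 1 and allowed(path):
            results.append(tuple(path))
        if len(path) - 1 == max_hops:
            return
        for nxt in sorted(adj[path[-1]]):
            candidate = path + [nxt]
            # Pruning is safe: a prefix that breaks the rule cannot be fixed
            # by appending more node types.
            if prune and not allowed(candidate):
                continue
            dfs(candidate)

    for start in sorted(adj):
        dfs([start])
    return results, calls


if __name__ == "__main__":
    adj = adjacency(SCHEMA)
    for max_hops in (2, 4, 6, 8):
        for prune in (False, True):
            t0 = time.perf_counter()
            paths, calls = enumerate_paths(adj, max_hops, prune)
            elapsed = time.perf_counter() - t0
            print(f"hops={max_hops} prune={prune} paths={len(paths)} "
                  f"calls={calls} time={elapsed:.6f}s")
```

On this schema both variants return the same 19 paths, but the naive recursion makes 167 calls at `max_hops=4` and keeps growing with the depth, while the pruned one stays at 23 calls. This does not bound the cost of the paper's algorithm; it only shows why the complexity and running times requested in the reviews depend on how the recursion is organized.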