- NCE'AISTATS2010
- word2vec
- Attention Is All You Need'NIPS2017
- BERT, which deserves reading more times!
- How to represent part-whole hierarchies in a neural network'Arxiv2021, which was proposed by Hinton. I think it is an interesting new trial in Neural Symbolic Learning (using neural networks to convert an image into a parse tree). The whole technique combines transformers, neural fields, contrastive representation learning, distillation and capsules.
- Convolutional Neural Networks for Sentence Classification'EMNLP2014
- PROBABILITY CALIBRATION FOR KNOWLEDGE GRAPH EMBEDDING MODELS'ICLR2020
- Counterfactual Vision-and-Language Navigation via Adversarial Path Sampler'ECCV2020
- Logic Constrained Pointer Networks for Interpretable Textual Similarity'IJCAI2020
- Graph Structure of Neural Networks'ICML2020, proposed by Jiaxuan You. It represents neural networks as graphs of connections between neurons, and investigates how the graph structure of a neural network affects its predictive performance.
- Regularizing Recurrent Neural Networks via Sequence Mixup, which adapts several regularization techniques from feed-forward networks to RNNs. I don't totally understand the paper (maybe re-read it in the future).
- Self-Attention with Relative Position Representations'NAACL-HLT2018, which replaces the absolute position embeddings of the original Transformer with relative position representations added inside the attention computation.
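A minimal single-head sketch of the relative-position idea, assuming a learned table indexed by clipped relative distance; variable names, shapes and the clipping distance `k` are mine, not the paper's exact formulation:

```python
import numpy as np

def attention_with_relative_positions(Q, K, V, rel_emb, k=4):
    """Single-head attention with relative position embeddings on the key side.
    Q, K, V: (n, d) arrays; rel_emb: (2*k+1, d) table of learned embeddings
    for clipped relative distances in [-k, k]. Illustrative only."""
    n, d = Q.shape
    # relative distance j - i, clipped to [-k, k], shifted to index range [0, 2k]
    idx = np.clip(np.arange(n)[None, :] - np.arange(n)[:, None], -k, k) + k
    rel = rel_emb[idx]                                    # (n, n, d)
    # content-content term plus content-position term
    logits = (Q @ K.T + np.einsum('id,ijd->ij', Q, rel)) / np.sqrt(d)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```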
- Enhancing the Transformer With Explicit Relational Encoding for Math Problem Solving'Arxiv2020, which proposes a change to the attention mechanism of the Transformer. I think it deserves re-reading.
- RealFormer: Transformer Likes Residual Attention'Arxiv2020, which is proposed by Google and focuses on adding a residual connection on the attention scores (so simple!!!). The paper also states that Post-LN usually performs better than Pre-LN, but Post-LN needs a warm-up strategy while Pre-LN does not (we can set a large learning rate with Pre-LN).
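My understanding of the residual-attention trick, as a rough single-head sketch (the real model applies this per head inside BERT-style layers; names are mine):

```python
import numpy as np

def residual_attention_layer(Q, K, V, prev_scores=None):
    """One attention layer where the raw (pre-softmax) attention scores from
    the previous layer are added to the current layer's scores."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    if prev_scores is not None:
        scores = scores + prev_scores      # the residual edge on attention scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, scores             # pass `scores` on to the next layer
```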
- RETHINKING POSITIONAL ENCODING IN LANGUAGE PRE-TRAINING'Arxiv2020, this work is well-written!! It argues that directly adding the position embedding to the input embedding is not suitable, since the two kinds of information are heterogeneous. It also proposes a novel way to handle the [CLS] token (I did not read this part very carefully). Anyway, it deserves re-reading!
- Transformer in Transformer'Arxiv2021, which is a new trial of Transformer in CV. The main idea: patch-level transformer + pixel-level transformer. I think the complexity analysis of the Transformer architecture (e.g., FLOPs) is very professional; section 2.3 deserves re-reading.
- Knowledge Graph Convolutional Networks for Recommender Systems'WWW2019
- KGAT'KDD2019
- Meta-Weight-Net'NIPS2019
- MetaKGR'EMNLP2019
- Bridging Machine Learning and Logical Reasoning by Abductive Learning'NeurIPS2019
- Abductive learning: towards bridging machine learning and logical reasoning'ScienceChina2019
- Abduction and Argumentation for Explainable Machine Learning: A Position Survey'Arxiv2020
- NodeAug'KDD2020, which performs data augmentation (changing attributes of related nodes and changing the graph structure by adding or removing edges) on each node separately, and uses subgraph mini-batch training (a subgraph can be seen as a receptive field); this work focuses on the semi-supervised node classification task.
- Learning beyond datasets: Knowledge Graph Augmented Neural Networks for Natural language Processing'NAACL2018, which uses KG to do data augmentation for NLP.
- Iterative Paraphrastic Augmentation with Discriminative Span Alignment'Arxiv2020
- Can We Achieve More with Less? Exploring Data Augmentation for Toxic Comment Classification'Arxiv2020
- DeepWalk'KDD2014, random walk + language model (why: the frequency distributions of vertices in random walks on a social network and of words in a language both follow a power law).
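A tiny sketch of the random-walk corpus generation; the walks are then treated as sentences for a skip-gram/word2vec model. Function and parameter names are illustrative:

```python
import random

def deepwalk_corpus(adj, walks_per_node=10, walk_length=40, seed=0):
    """Generate truncated random walks over a graph given as an adjacency
    dict {node: [neighbors]}. The returned walks can be fed to a skip-gram
    model (e.g. gensim's Word2Vec) as sentences of node-id tokens."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adj)
        rng.shuffle(nodes)                 # a different node order on each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append([str(v) for v in walk])
    return walks
```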
- LINE'WWW2015, which designs an objective function that preserves both the first-order and second-order proximities. It also proposes an edge-sampling algorithm for optimizing the objective to improve effectiveness and efficiency.
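For reference, the two objectives as I remember them (notation mine; $\vec{u}_i$ are vertex vectors, $\vec{u}'_k$ are context vectors, both optimized with the edge-sampling trick):

```latex
% first-order proximity over observed edges (i, j) with weight w_{ij}
p_1(v_i, v_j) = \frac{1}{1 + \exp(-\vec{u}_i^{\top}\vec{u}_j)}, \qquad
O_1 = -\sum_{(i,j) \in E} w_{ij} \log p_1(v_i, v_j)

% second-order proximity: neighbors act as "contexts"
p_2(v_j \mid v_i) = \frac{\exp(\vec{u}'^{\top}_j \vec{u}_i)}{\sum_{k=1}^{|V|} \exp(\vec{u}'^{\top}_k \vec{u}_i)}, \qquad
O_2 = -\sum_{(i,j) \in E} w_{ij} \log p_2(v_j \mid v_i)
```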
- HAN'WWW2019, which focuses on heterogeneous graphs. It uses hierarchical attention, including node-level and semantic-level attentions.
- TransE'NIPS2013
- TransR'AAAI2015
- ANALOGY'ICML2017, which can degenerate into DistMult, ComplEx and HolE. It focuses on analogical structures in KGs, such as "man is to king as woman is to queen", where man/woman and king/queen form an analogy. ANALOGY places special constraints on the relation matrices so that analogical structures can be preserved in the model.
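As far as I remember, the constraints are normality and commutativity of the relation matrices (so they can be simultaneously block-diagonalized):

```latex
W_r W_r^{\top} = W_r^{\top} W_r \quad \text{(normality)}, \qquad
W_r W_{r'} = W_{r'} W_r \quad \forall\, r, r' \quad \text{(commutativity)}
```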
- R-GCN'ESWC2018
- ConvKB'NAACL-HLT2018, which takes translational characteristics into account while using a CNN (similar to ConvE, both use CNNs, but ConvE doesn't keep the translational characteristics while ConvKB does). Different from ConvE, which uses a CNN to obtain features from the head and relation, ConvKB obtains features from the head, relation and tail simultaneously. Although ConvKB gets competitive results on KGC, some doubts have been raised about the improvement, e.g. A Re-evaluation of Knowledge Graph Completion Methods'ACL2020.
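A rough sketch of the ConvKB-style score as I understand it: stack h, r, t column-wise and slide 1x3 filters over the rows, then take a dot product with a weight vector. Shapes and names are illustrative:

```python
import numpy as np

def convkb_score(h, r, t, filters, w):
    """h, r, t: (d,) embeddings; filters: (num_filters, 3) 1x3 conv filters;
    w: (num_filters * d,) weight vector. Higher score = more plausible."""
    A = np.stack([h, r, t], axis=1)                  # (d, 3): one row per dimension
    feature_maps = np.maximum(A @ filters.T, 0.0)    # (d, num_filters), ReLU
    return float(feature_maps.T.reshape(-1) @ w)     # concatenate maps, dot with w
```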
- QuatE'NIPS2019
- SACN'AAAI2019, SACN = WGCN (weighted GCN) + Conv-TransE. It takes advantage of knowledge graph node connectivity (GCN), node attributes (added as attribute nodes) and relation types (WGCN). Conv-TransE keeps the translational property between entities and relations to learn node embeddings for link prediction (similar to ConvE, both use 2D convolutions, but ConvE doesn't keep the translational property while Conv-TransE does).
- VR-GCN'IJCAI2019, which generates both entity embeddings and relation embeddings simultaneously. VR-GCN is capable of learning the vectorized embedding of relations, in comparison with existing GCNs.
- QUATRE'Arxiv2020, I think it combines QuatE and TransR together.
- HAKE'AAAI2020, which combines modulus (encoding different categories) and phase (encoding distinctions within the same category) information. This method only uses triples, yet it can capture the semantic hierarchy. I think this work is solid.
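The scoring idea, as I understand it (the modulus part captures the hierarchy level, the phase part separates entities at the same level; notation mine):

```latex
d_{r}(h, t) = \underbrace{\lVert \mathbf{h}_m \circ \mathbf{r}_m - \mathbf{t}_m \rVert_2}_{\text{modulus part}}
\; + \; \lambda \, \underbrace{\lVert \sin\!\big((\mathbf{h}_p + \mathbf{r}_p - \mathbf{t}_p)/2\big) \rVert_1}_{\text{phase part}},
\qquad f_r(h, t) = -\, d_r(h, t)
```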
- TransE-RW'EKAW2018
- IterE'WWW2019
- Quantum Embedding of Knowledge for Reasoning'NeurIPS2019, E2R, which encodes logical structures (T-box and A-box) into a vector space with quantum logic. I think the idea behind the model is similar to many works (like Query2box) which encode some concept information (T-box) into the embedding. However, I think this model (E2R) is more general: it can encode much more logical information into the embedding (compared with existing work). I like this work.
- RPJE'AAAI2020
- UniKER'ICML-Workshop2020
- A Hybrid Model for Learning Embeddings and Logical Rules Simultaneously from Knowledge Graphs'Arxiv2020, which is similar to my first work. It iteratively learns rules and embeddings. At each iteration, the learned embeddings help to prune the rule search space (a special filter function using embeddings), and the rules help to infer new facts (importance sampling is used to sample from the inferred facts, which are then added to the training set). It is interesting that the experimental results are really good (compared to the SOTA). So why does my method fail??? :(
- PTransE'EMNLP2015. If a path is h -r1-> e1 -r2-> t, then the objective is h + r1 = e1, e1 + r2 = t and h + (r1*r2) = t, where * denotes a composition of the relations along the path. Since there are many paths and some of them are noisy, the paper proposes a reliability metric to filter paths and uses it to aggregate them.
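A toy sketch of the additive (ADD) composition; the paper also considers multiplicative and RNN compositions, and the path-reliability weighting is omitted here:

```python
import numpy as np

def path_score_add(h, t, relations_on_path):
    """Compose a relation path by addition and score it TransE-style
    (lower is better). Inputs are (d,) numpy arrays; a toy illustration."""
    composed = np.sum(relations_on_path, axis=0)   # r1 + r2 + ... ~ the path relation
    return float(np.linalg.norm(h + composed - t, ord=1))
```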
- RSNs'ICML2019
- PPKE: Knowledge Representation Learning by Path-based Pre-training'Arxiv2020, which follows CoKE and does path-based pre-training (sampling lots of paths and using transformers to capture the path contexts), then fine-tunes on specific downstream tasks (e.g. link prediction or relation prediction).
- RuLES'ISWC2018
- RLvLR'IJCAI2018, which is comparable to Neural-LP. It uses KG embeddings to accelerate rule search (and uses sampling to make the embedding model scalable to large KGs), and uses matrix multiplication to accelerate rule filtering (more efficient computation of standard confidence).
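A small sketch of the matrix-multiplication trick for scoring a length-2 rule body r1(x, z) ∧ r2(z, y) ⇒ r(x, y); this is the generic idea, not necessarily the paper's exact procedure:

```python
import numpy as np

def rule_support_and_confidence(A_r1, A_r2, A_r):
    """A_r1, A_r2, A_r: (n, n) 0/1 adjacency matrices, one per relation.
    Support = body instances that are also head facts; standard confidence
    = support / number of body instances. Illustrative only."""
    body = (A_r1.astype(int) @ A_r2.astype(int)) > 0   # pairs reachable via r1 then r2
    support = int(np.logical_and(body, A_r).sum())
    body_count = int(body.sum())
    confidence = support / body_count if body_count else 0.0
    return support, confidence
```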
- KG-BERT'AAAI2020, which treats triples in knowledge graphs as textual sequences and uses BERT to model these triples.
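A minimal sketch of how a triple becomes a text sequence for plausibility classification; the entity/relation name lookup and the BERT classifier itself are assumed and not shown, and the token layout is only illustrative of the idea:

```python
def triple_to_text(head_name, relation_name, tail_name):
    """Serialize a triple into one sequence; a BERT-style sequence classifier
    then scores whether the triple is plausible (binary classification)."""
    return f"[CLS] {head_name} [SEP] {relation_name} [SEP] {tail_name} [SEP]"

# e.g. triple_to_text("Steve Jobs", "founded", "Apple Inc.")
```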
- Entity Context and Relational Paths for Knowledge Graph Completion'Arxiv2020
- CoKE: Contextualized Knowledge Graph Embedding'Arxiv2020, which employs a transformer encoder to obtain contextualized representations (two types of graph contexts are studied: edges and paths).
- HittER'Arxiv2020, which uses transformers to capture both the entity-relation and entity-context interactions. The interesting thing is that we can make an analogy between HittER and CompGCN (more generally, between transformers and GCNs).
- Multi-Task Learning for Knowledge Graph Completion with Pre-trained Language Models'COLING2020, which follows KG-BERT, and uses multi-task learning (three tasks: link prediction, relation prediction and relevance ranking) to combine the linguistic information of pre-trained models with the structural information of triples.
- RETRA: Recurrent Transformers for Learning Temporally Contextualized Knowledge Graph Embeddings'ESWC2021-UnderReview, which focuses on temporally contextualized KGE by combining transformer and RNN.
- PRA'EMNLP2011
- Traversing Knowledge Graphs in Vector Space'EMNLP2015, which traverses the KG in vector space to answer path queries, and shows that compositional training (modeling path queries with length more than three) can improve knowledge base completion (sounds amazing!).
- Neural-LP'NIPS2017
- Chains of Reasoning over Entities, Relations, and Text using Recurrent Neural Networks'EACL2017. Given a query, it answers the relation between two entities. The whole task is similar to PRA; the difference is that this paper uses RNNs and each path contains not only relations but also entities.
- Multi-Hop'EMNLP2018
- Embedding Logical Queries on Knowledge Graphs'NIPS2018, which is followed by Query2box.
- MetaKGR'EMNLP2019, which uses reinforcement learning to do KG reasoning (given a query (h, r, ?), return a path as an explanation for (h, r, t)), and combines meta-learning to alleviate the problem of few-shot relations.
- Embed2Reason'NIPS2019, which embeds a symbolic KB into a vector space in a logical structure preserving manner (inspired by the theory of Quantum Logic).
- Query2box'ICLR2020, which uses box embeddings to reason over KGs in vector space.
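As I recall the formulation, an entity is scored against a query box (center, offset) by an "outside" distance plus a down-weighted "inside" distance; a minimal sketch with illustrative names and an assumed value of alpha:

```python
import numpy as np

def box_distance(v, center, offset, alpha=0.2):
    """Distance from an entity embedding v to a query box; all inputs are
    (d,) arrays. Smaller distance means the entity better answers the query."""
    q_max, q_min = center + offset, center - offset
    outside = np.maximum(v - q_max, 0) + np.maximum(q_min - v, 0)
    inside = center - np.minimum(q_max, np.maximum(q_min, v))   # v clamped into the box
    return float(np.abs(outside).sum() + alpha * np.abs(inside).sum())
```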
- CBR'AKBC2020
- BetaE'NIPS2020, which is the first embedding-based method that could handle arbitrary FOL queries on KGs (Beta distributions + probabilistic logical operators). The paper is another work by the author of Query2box.
- EM-RBR'Under_Review_ICLR2021, which utilizes the relational background knowledge contained in rules to conduct multi-relation reasoning for link prediction, rather than the superficial vector triangle linkage of embedding models. It solves completion through real rule-based reasoning (rather than using rules to obtain better embeddings), which sounds exciting!
- MTransE'IJCAI2017
- BootEA'IJCAI2018
- RSNs'ICML2019
- Visual Pivoting for (Unsupervised) Entity Alignment'Arxiv2020 (accepted by AAAI'2020), which focuses on multi-modal embedding learning, and considers auxiliary information including images, relations and attributes (mostly focusing on images).
- Learning Entity Type Embeddings for Knowledge Graph Completion'CIKM2017, proposes a new task to predict the missing entity types.
- Embedding OWL Ontologies with OWL2Vec*'2019
- OWL2Vec∗: Embedding of OWL Ontologies'2020, there are two paradigms of embedding, one is semantic embedding (e.g. TransE), the other is to first explicitly explore the neighborhoods of entities and relations in the graph, and then learn the embeddings using a language model (e.g. node2vec, rdf2vec). This paper belongs to the language model paradigm, but preserves the semantics not only of the graph structure, but also of the lexical information and the logical constructors. Note that the graph of an ontology, which includes hierarchical categorization structure, differs from the multi-relation graph composed of role assertions of a typical KG; furthermore the ontology’s lexical information and logical constructors can not be successfully exploited by the aforementioned KG embedding methods.
- ExCut: Explainable Embedding-based Clustering over Knowledge Graphs'ISWC2020, which iteratively clusters entities using embeddings and learns rules as explanations for the clusters.
- Knowledge Base Completion: Baselines Strike Back'ACL2017
- A Re-evaluation of Knowledge Graph Completion Methods'ACL2020
- On the Ambiguity of Rank-Based Evaluation of Entity Alignment or Link Prediction Methods'Arxiv2020, which proposes a new rank function that handles the situation where many candidates receive the same score, and a new metric, the adjusted mean rank (compared with the plain mean rank).
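My note on the adjusted metric, as I understand it: the mean rank divided by its expectation under random scoring, with $n_i$ candidates in the $i$-th ranking task (notation mine):

```latex
\mathrm{AMR} \;=\; \frac{\mathrm{MR}}{\mathbb{E}[\mathrm{MR}_{\text{random}}]}
\;=\; \frac{\sum_i \mathrm{rank}_i}{\sum_i \tfrac{1}{2}(n_i + 1)} \;\in\; (0, 2),
\qquad \text{smaller is better, } \approx 1 \text{ means random scoring}
```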
- Inductively Representing Out-of-Knowledge-Graph Entities by Optimal Estimation Under Translational Assumptions'Arxiv2020, simple and straightforward.
- GMatching'EMNLP2018, which is the first research on few-shot learning for knowledge graphs. Techniques: a neighbor encoder (encodes an entity h with its neighbors (r, t)) and a matching processor (an LSTM module for multi-step matching; given two entity pairs, it outputs a similarity score).
- MetaR'EMNLP2019, which follows the setting of GMatching. I think the idea belongs to meta learning. It extracts relation meta information from the few-shot training instances (easy), and calculates gradient meta information based on the support set. It updates the relation meta on the query set based on the gradient meta from the support set, and the final training objective is based on the loss on the query set (this part sounds confusing, but it is really similar to MAML (a meta-learning algorithm), both a bit confusing. Oh, this part actually seems really reasonable, I got it!).
- Adaptive Attentional Network for Few-Shot Knowledge Graph Completion'EMNLP2020. It learns dynamic/adaptive entity embeddings (entities exhibit diverse roles within task relations), and dynamic/adaptive reference embeddings (references make different contributions to queries). And it uses transformer encoder for entity pairs (reference / query).
- Sparsity and Noise: Where Knowledge Graph Embeddings Fall Short'ACL2017, which talks about the influence of sparsity and noise for KGE.
- KBGAN'NAACL2018, which uses GAN to do negative sampling.
- Open-World Knowledge Graph Completion'AAAI2018, which does KG completion in the open-world setting (new entities and relations emerge).
- CKRL'AAAI2018, which assumes that triples in KGs are not always correct (there may be some noise) and that triples should be treated differently (each triple has a distinct confidence). I think this work is similar to TransE-RW'EKAW2018; the difference is that TransE-RW uses rules to calculate the confidence, while CKRL models the confidence in a more sophisticated way.
- TransC'EMNLP2018, which distinguishes concepts from instances in KGs and embeds each concept as a sphere.
- Fact Validation with KG embeddings'2020, which uses KG embeddings as features and trains a random forest on these features to do fact validation.
- OpenKE'EMNLP2018, which separates a large-scale KG into several parts and adapts KE models for parallel training (thus capable of embedding large-scale KGs). It also proposes a novel negative sampling strategy (an offset-based negative sampling algorithm, which I don't understand) for further acceleration.
- AmpliGraph'2019, it has no paper.
- Pykg2vec'Arxiv2019
- LibKGE'ICLR2020, which indicates that training strategies (loss function, negative sampling, etc.) have a significant impact on model performance and may account for a substantial fraction of the progress made in recent years (rather than the models themselves). Interesting and inspiring!
- TorchKGE'IWKG-KDD2020, whose evaluation is much faster than OpenKE's.
- PyKEEN'Arxiv2020
- GraphVite'WWW2019, which greatly accelerates node embedding (it can process very large-scale graphs) by designing a CPU-GPU hybrid system, focused on a single machine with multiple CPU cores and multiple GPUs (one machine, multi-GPU).
- PBG'SysML2019, distributed training (multi-machines, multi-GPUs).
- DGL-KE'SIGIR2020, distributed training (multi-machines, multi-GPUs).
- Knowledge Graph Embedding: A Survey of Approaches and Applications'TKDE2017, which gives a very comprehensive summary and deserves re-reading (I have only read up to section 3.5).