equation_gpt_train_machine_learning.jsonl
{"completion": " The related materials of Domain-Adversarial Neural Networks DANN is: 1.Domain-Adversarial Neural Networks:<url>https://arxiv.org/pdf/1412.4446v2.pdf</url>;", "tag": "machine learning,transfer learning", "prompt": "Can you help me find related materials of Domain-Adversarial Neural Networks DANN?"}
{"completion": " The equation for Bregman Divergences is: <equation>d_{\\\\phi}(z,z^{'})=\\\\phi(z) - \\\\phi(z^{'})-(z-z^{'})^{T} \\\\nabla \\\\phi(z^{'})</equation>. Here is the brief introduction of Bregman Divergences:\n The prototypi- cal networks algorithm is equivalent to performing mixture density estimation on the support set with an exponential family density. A regular Bregman divergence d_{\\phi} is defined as above. \\phi is a differentiable, strictly convex function of the Legendre type. Examples of Bregman divergences include squared Euclidean distance and Mahalanobis distance.\n ", "tag": "machine learning", "prompt": "Can you explain Bregman Divergences?"}
{"completion": " The Latex Code for equation Conditional Random Field CRFis <equation>P(y|x)=\\\\frac{1}{Z(x)}\\\\exp(\\\\sum_{i,k}\\\\lambda_{k}t_{k}(y_{i-1},y_{i},x,i))+\\\\sum_{i,l}\\\\mu_{l}s_{l}(y_{i},x,i)) \\\\\\\\\\ Z(x)=\\\\sum_{y}\\\\exp(\\\\sum_{i,k}\\\\lambda_{k}t_{k}(y_{i-1},y_{i},x,i))+\\\\sum_{i,l}\\\\mu_{l}s_{l}(y_{i},x,i))</equation>", "tag": "machine learning,nlp", "prompt": "Please give me some hints on equation Conditional Random Field CRF?"}
{"completion": " The equation for Kullback-Leibler Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>KL(P||Q)</equation> in Kullback-Leibler Divergence is: KL Divergence between P and Q.", "tag": "machine learning,kl divergence", "prompt": "Help me find the meaning of symbol <equation>KL(P||Q)</equation> in Kullback-Leibler Divergence?"}
{"completion": " The Latex Code for equation Language Modellingis <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>", "tag": "machine learning", "prompt": "Can you write equation Language Modelling?"}
{"completion": " The equation for Support Vector Machine SVM is <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. The meaning of symbol <equation>w</equation> in Support Vector Machine SVM is: Weight of Linear Classifier.", "tag": "machine learning,svm", "prompt": "What is the meaning of <equation>w</equation> in Support Vector Machine SVM?"}
{"completion": " The Latex Code for equation Prototypical Networks Protonetsis <equation>c_{k}=\\\\frac{1}{|S_{k}|}\\\\sum_{(x_{i},y_{i}) \\\\in S_{k}} f_{\\\\phi}(x) \\\\\\\\ p_{\\\\phi}(y=k|x)=\\\\frac{\\\\exp(-d(f_{\\\\phi}(x), c_{k}))}{\\\\sum_{k^{'}} \\\\exp(-d(f_{\\\\phi}(x), c_{k^{'}})} \\\\\\\\\\\\min J(\\\\phi)=-\\\\log p_{\\\\phi}(y=k|x)</equation>", "tag": "machine learning,meta learning", "prompt": "Can you write equation Prototypical Networks Protonets?"}
{"completion": " The equation for Support Vector Machine SVM is <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. The meaning of symbol <equation>w</equation> in Support Vector Machine SVM is: Weight of Linear Classifier.", "tag": "machine learning,svm", "prompt": "Can you explain the symbol <equation>w</equation> in Support Vector Machine SVM?"}
{"completion": " The Latex Code for equation Maximum Mean Discrepancy MMDis <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>", "tag": "machine learning,mmd", "prompt": "Please give me some hints on equation Maximum Mean Discrepancy MMD?"}
{"completion": " The equation for TransE is: <equation>\\\\mathcal{L}=\\\\sum_{(h,r,t) \\\\in S} \\\\sum_{(h^{'},r^{'},t^{'}) \\\\in S^{'}_{(h,r,t)}} \\\\[ \\\\gamma + d(h + r, t) - d(h^{'} + r^{'}, t^{'}) \\\\]_{+} \\\\\\\\ S^{'}_{(h,r,t)}=\\\\{(h^{'},r,t)|h^{'} \\\\in E\\\\} \\\\cup \\\\{(h,r,t^{'})|t^{'} \\\\in E\\\\} \\\\\\\\ d(h + r, t)=||h + r - t||^{2}_{2}</equation>. Here is the brief introduction of TransE:\n Given a training set S of triplets (h, l, t) composed of two entities h, t ? E (the set of entities) and a relationship l ? L (the set of relationships), our model learns vector embeddings of the entities and the relationships. The embeddings take values in Rk (k is a model hyperparameter) and are denoted with the same letters, in boldface characters. The basic idea behind our model is that the functional relation induced by the l-labeled edges corresponds to a translation of the embeddings, i.e. we want that h + l ? t when (h, l, t) holds (t should be a nearest neighbor of h + l), while h + l should be far away from t otherwise. Following an energy-based framework, the energy of a triplet is equal to d(h + l, t) for some dissimilarity measure d, which we take to be either the L1 or the L2 -norm. To learn such embeddings, we minimize a margin-based ranking criterion over the training set. See paper Translating Embeddings for Modeling Multi-relational Data for more details.\n\n Related Documents\n\nTranslating Embeddings for Modeling Multi-relational Data\n\nRelated Videos\n\n\n", "tag": "machine learning", "prompt": "What's the meaning of TransE?"}
{"completion": " The Latex Code for equation KL-Divergenceis <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>", "tag": "machine learning", "prompt": "How to write equation KL-Divergence?"}
{"completion": " The related materials of Hidden Markov Model is: ", "tag": "machine learning,nlp", "prompt": "Can you help me find related materials of Hidden Markov Model?"}
{"completion": " The Latex Code for equation Domain-Adversarial Neural Networks DANNis <equation>\\\\min [\\\\frac{1}{m}\\\\sum^{m}_{1}\\\\mathcal{L}(f(\\\\textbf{x}^{s}_{i}),y_{i})+\\\\lambda \\\\max(-\\\\frac{1}{m}\\\\sum^{m}_{i=1}\\\\mathcal{L}^{d}(o(\\\\textbf{x}^{s}_{i}),1)-\\\\frac{1}{m^{'}}\\\\sum^{m^{'}}_{i=1}\\\\mathcal{L}^{d}(o(\\\\textbf{x}^{t}_{i}),0))]</equation>", "tag": "machine learning,transfer learning", "prompt": "Can you write equation Domain-Adversarial Neural Networks DANN?"}
{"completion": " The Latex Code for equation Diffusion Model Variational Lower Bound Lossis <equation>\\\\begin{aligned}\\ L_\\\\text{VLB} &= L_T + L_{T-1} + \\\\dots + L_0 \\\\\\\\\\ \\\\text{where } L_T &= D_\\\\text{KL}(q(\\\\mathbf{x}_T \\\\vert \\\\mathbf{x}_0) \\\\parallel p_\\\\theta(\\\\mathbf{x}_T)) \\\\\\\\\\ L_t &= D_\\\\text{KL}(q(\\\\mathbf{x}_t \\\\vert \\\\mathbf{x}_{t+1}, \\\\mathbf{x}_0) \\\\parallel p_\\\\theta(\\\\mathbf{x}_t \\\\vert\\\\mathbf{x}_{t+1})) \\\\text{ for }1 \\\\leq t \\\\leq T-1 \\\\\\\\\\ L_0 &= - \\\\log p_\\\\theta(\\\\mathbf{x}_0 \\\\vert \\\\mathbf{x}_1)\\ \\\\end{aligned}</equation>", "tag": "machine learning,diffusion", "prompt": "Can you write equation Diffusion Model Variational Lower Bound Loss?"}
{"completion": " The equation for Bound on Target Domain Error is: <equation>\\\\epsilon_{T}(h) \\\\le \\\\hat{\\\\epsilon}_{S}(h) + \\\\sqrt{\\\\frac{4}{m}(d \\\\log \\\\frac{2em}{d} + \\\\log \\\\frac{4}{\\\\delta })} + d_{\\\\mathcal{H}}(\\\\tilde{\\\\mathcal{D}}_{S}, \\\\tilde{\\\\mathcal{D}}_{T}) + \\\\lambda \\\\\\\\\\ \\\\lambda = \\\\lambda_{S} + \\\\lambda_{T}</equation>. Here is the brief introduction of Bound on Target Domain Error:\n\n I will explain this equation in more details. Domain adaptation literatures prove that the test error on target domain \\epsilon_{T}(h) is bounded by three terms: 1. the empirical estimate of training errors on the source domain \\hat{\\epsilon}_{S}(h); 2. the distance divergence between source domain and target domain d(Ds, Dt), 3. Fixed term of VC-Dimension(d), sample size of source domain m, e as the natural logarithm. \\lambda denotes a fixed term as the sum of \\lambda_{S} and \\lambda_{T}, which represent the errors of models training on Ds and Dt respectively. From the above analysis, we can see that if data source Ds and Dt are similar(the divergence between source and target domain distribution Ds and Dt is small), the error on target domain will also be bounded, that's how models trained on source domain will perform better on similar distributed target domains. \n You can check more detailed information in this NIPS 2006 paper by Shai Ben-David for more details.\n \nRelated Documents\n\nAnalysis of Representations for Domain Adaptation\n\nRelated Videos\n\n\n", "tag": "machine learning,transfer learning", "prompt": "Can you give me a brief introduction to Bound on Target Domain Error?"}
{"completion": " The equation for S-Learner is: <equation>\\\\mu(x,w)=\\\\mathbb{E}[Y_{i}|X=x_{i},W=w] \\\\\\\\\\ \\\\hat{\\\\tau}(x)=\\\\hat{\\\\mu}(x,1)-\\\\hat{\\\\mu}(x,0)</equation>. Here is the brief introduction of S-Learner:\n S-Learner use a single machine learning estimator \\mu(x,w) to estimate outcome Y directly. And the treatment assigment variable W=0,1 is treated as features of S-learner models. The CATE estimation is calculated as the difference between two outputs given the same model \\mu and different inputs features of W, namely w=1 and w=0.\n ", "tag": "machine learning,causual inference", "prompt": "Please give me detailed tutorial of S-Learner?"}
{"completion": " The Latex Code for equation Domain Adaptation H-Divergenceis <equation>d_{\\\\mathcal{H}}(\\\\mathcal{D},\\\\mathcal{D}^{'})=2\\\\sup_{h \\\\in \\\\mathcal{H}}|\\\\Pr_{\\\\mathcal{D}}[I(h)]-\\\\Pr_{\\\\mathcal{D}^{'}}[I(h)]|</equation>", "tag": "machine learning,transfer learning", "prompt": "Show me the latex code of equation Domain Adaptation H-Divergence?"}
{"completion": " The Latex Code for equation Deep Kernel Learningis <equation>k(x_{i},x_{j}|\\\\phi)=k(h(x_i,w_k),h(x_j,w_k)|w_k,\\\\phi)</equation>", "tag": "machine learning,Deep Kernel Learning", "prompt": "How to write equation Deep Kernel Learning?"}
{"completion": " The equation for Progressive Layered Extraction PLE is: <equation>g^{k}(x)=w^{k}(x)S^{k}(x) \\\\\\\\\\ w^{k}(x)=\\\\text{softmax}(W^{k}_{g}x) \\\\\\\\\\ S^{k}(x)=\\\\[E^{T}_{(k,1)},E^{T}_{(k,2)},...,E^{T}_{(k,m_{k})},E^{T}_{(s,1)},E^{T}_{(s,2)},...,E^{T}_{(s,m_{s})}\\\\]^{T} \\\\\\\\\\ y^{k}(x)=t^{k}(g^{k}(x)) \\\\\\\\\\ g^{k,j}(x)=w^{k,j}(g^{k,j-1}(x))S^{k,j}(x) </equation>. Here is the brief introduction of Progressive Layered Extraction PLE:\n\n Progressive Layered Extraction(PLE) model slightly modifies the original structure of MMoE models and explicitly separate the experts into shared experts and task-specific experts. Let's assume there are m_{s} shared experts and m_{t} tasks-specific experts. S^{k}(x) is a selected matrix composed of (m_{s} + m_{t}) D-dimensional vectors, with dimension as (m_{s} + m_{t}) \\times D. w^{k}(x) denotes the gating function with size (m_{s} + m_{t}) and W^{k}_{g} is a trainable parameters with dimension as (m_{s} + m_{t}) \\times D. t^{k} denotes the task-specific tower paratmeters. The progressive extraction layer means that the gating network g^{k,j}(x) of j-th extraction layer takes the output of previous gating layers g^{k,j-1}(x) as inputs.\n \nRelated Documents\n\nSee below link of paper Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations for more details. \n\nRelated Videos\n\n\n", "tag": "machine learning,multi task", "prompt": "Please give me detailed tutorial of Progressive Layered Extraction PLE?"}
{"completion": " The Latex Code for equation Model-Agnostic Meta-Learning MAMLis <equation>\\\\min_{\\\\theta} \\\\sum_{\\\\mathcal{T}_{i} \\\\sim p(\\\\mathcal{T})} \\\\mathcal{L}_{\\\\mathcal{T}_{i}}(f_{\\\\theta^{'}_{i}}) = \\\\sum_{\\\\mathcal{T}_{i} \\\\sim p(\\\\mathcal{T})} \\\\mathcal{L}_{\\\\mathcal{T}_{i}}(f_{\\\\theta_{i} - \\\\alpha \\\\nabla_{\\\\theta} \\\\mathcal{L}_{\\\\mathcal{T}_{i}} (f_{\\\\theta}) })</equation>", "tag": "machine learning,meta learning", "prompt": "What is the equation of Model-Agnostic Meta-Learning MAML?"}
{"completion": " The Latex Code for equation TransHis <equation>f_{r}(h,t) =||h_{\\\\perp} + d_{r} - t_{\\\\perp} ||^{2}_{2}=||(h - w_{r}hw_{r}) + d_{r} - (t - w_{r}tw_{r}) ||^{2}_{2}</equation>", "tag": "machine learning,KG", "prompt": "What is the latex code of equation TransH?"}
{"completion": " The Latex Code for equation Kullback-Leibler Divergenceis <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>", "tag": "machine learning,kl divergence", "prompt": "Show me the latex code of equation Kullback-Leibler Divergence?"}
{"completion": " The Latex Code for equation Propensity Scoreis <equation>e := p(W=1|X=x)</equation>", "tag": "machine learning,causual inference", "prompt": "How to write equation Propensity Score?"}
{"completion": " The equation for Kullback-Leibler Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>KL(P||Q)</equation> in Kullback-Leibler Divergence is: KL Divergence between P and Q.", "tag": "machine learning,kl divergence", "prompt": "What does the symbol mean <equation>KL(P||Q)</equation> in Kullback-Leibler Divergence?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. The meaning of symbol <equation>P(x)</equation> in Jensen-Shannon Divergence JS-Divergence is: Distribution of P(x) over x.", "tag": "machine learning", "prompt": "Help me find the meaning of symbol <equation>P(x)</equation> in Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The equation for Conditional Average Treatment Effect CATE is: <equation>\\\\tau(x):=\\\\mathbb{E}[Y(1)-Y(0)|X=x]</equation>. Here is the brief introduction of Conditional Average Treatment Effect CATE:\n Since we can't observe ITE of item i directly, most causal inference models estimate the conditional average treatment effect(CATE) conditioned on item i (X=x_{i}).\n ", "tag": "machine learning,causual inference", "prompt": "I am interested in Conditional Average Treatment Effect CATE?"}
{"completion": " The equation for KL-Divergence is: <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. Here is the brief introduction of KL-Divergence:\n Latex code for the Kullback-Leibler Divergence. I will briefly introduce the notations in this formulation. \n \n: KL Divergence between P and Q\n: Distribution of P(x) over x\n: Distribution of Q(x) over x\n\n\nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning", "prompt": "Can you explain KL-Divergence?"}
{"completion": " The Latex Code for equation TransEis <equation>\\\\mathcal{L}=\\\\sum_{(h,r,t) \\\\in S} \\\\sum_{(h^{'},r^{'},t^{'}) \\\\in S^{'}_{(h,r,t)}} \\\\[ \\\\gamma + d(h + r, t) - d(h^{'} + r^{'}, t^{'}) \\\\]_{+} \\\\\\\\ S^{'}_{(h,r,t)}=\\\\{(h^{'},r,t)|h^{'} \\\\in E\\\\} \\\\cup \\\\{(h,r,t^{'})|t^{'} \\\\in E\\\\} \\\\\\\\ d(h + r, t)=||h + r - t||^{2}_{2}</equation>", "tag": "machine learning", "prompt": "Please give me some hints on equation TransE?"}
{"completion": " The equation for Progressive Layered Extraction PLE is: <equation>g^{k}(x)=w^{k}(x)S^{k}(x) \\\\\\\\\\ w^{k}(x)=\\\\text{softmax}(W^{k}_{g}x) \\\\\\\\\\ S^{k}(x)=\\\\[E^{T}_{(k,1)},E^{T}_{(k,2)},...,E^{T}_{(k,m_{k})},E^{T}_{(s,1)},E^{T}_{(s,2)},...,E^{T}_{(s,m_{s})}\\\\]^{T} \\\\\\\\\\ y^{k}(x)=t^{k}(g^{k}(x)) \\\\\\\\\\ g^{k,j}(x)=w^{k,j}(g^{k,j-1}(x))S^{k,j}(x) </equation>. Here is the brief introduction of Progressive Layered Extraction PLE:\n\n Progressive Layered Extraction(PLE) model slightly modifies the original structure of MMoE models and explicitly separate the experts into shared experts and task-specific experts. Let's assume there are m_{s} shared experts and m_{t} tasks-specific experts. S^{k}(x) is a selected matrix composed of (m_{s} + m_{t}) D-dimensional vectors, with dimension as (m_{s} + m_{t}) \\times D. w^{k}(x) denotes the gating function with size (m_{s} + m_{t}) and W^{k}_{g} is a trainable parameters with dimension as (m_{s} + m_{t}) \\times D. t^{k} denotes the task-specific tower paratmeters. The progressive extraction layer means that the gating network g^{k,j}(x) of j-th extraction layer takes the output of previous gating layers g^{k,j-1}(x) as inputs.\n \nRelated Documents\n\nSee below link of paper Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations for more details. \n\nRelated Videos\n\n\n", "tag": "machine learning,multi task", "prompt": "I am interested in Progressive Layered Extraction PLE?"}
{"completion": " The Latex Code for equation Jensen-Shannon Divergence JS-Divergenceis <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>", "tag": "machine learning", "prompt": "How to write equation Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The equation for Maximum Mean Discrepancy MMD is <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>. The meaning of symbol <equation>\\frac{1}{m}\\sum_{i=1}^{m}f(x_{i})</equation> in Maximum Mean Discrepancy MMD is: Mean of probability distribution X with m data points..", "tag": "machine learning,mmd", "prompt": "Can you explain the symbol <equation>\\frac{1}{m}\\sum_{i=1}^{m}f(x_{i})</equation> in Maximum Mean Discrepancy MMD?"}
{"completion": " The equation for KL-Divergence is: <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. Here is the brief introduction of KL-Divergence:\n Latex code for the Kullback-Leibler Divergence. I will briefly introduce the notations in this formulation. \n \n: KL Divergence between P and Q\n: Distribution of P(x) over x\n: Distribution of Q(x) over x\n\n\nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning", "prompt": "What's the meaning of KL-Divergence?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>\\pi(a|s)</equation> in Bellman Equation is: Probability of choosing action a given state s.", "tag": "machine learning", "prompt": "What does the symbol mean <equation>\\pi(a|s)</equation> in Bellman Equation?"}
{"completion": " The Latex Code for equation Conditional Average Treatment Effect CATEis <equation>\\\\tau(x):=\\\\mathbb{E}[Y(1)-Y(0)|X=x]</equation>", "tag": "machine learning,causual inference", "prompt": "Can you write equation Conditional Average Treatment Effect CATE?"}
{"completion": " The equation for Domain Adaptation H-Divergence is: <equation>d_{\\\\mathcal{H}}(\\\\mathcal{D},\\\\mathcal{D}^{'})=2\\\\sup_{h \\\\in \\\\mathcal{H}}|\\\\Pr_{\\\\mathcal{D}}[I(h)]-\\\\Pr_{\\\\mathcal{D}^{'}}[I(h)]|</equation>. Here is the brief introduction of Domain Adaptation H-Divergence:\n\n The H-Divergence is defined as the superior of divengence between two probability Pr(D) and Pr(D^{'}) for any hypothesis h in all hypotheses class H. In this formulation, given domain X with two data distribution D and D^{'} over X, I(h) denotes the characteristic function(indicator function) on X, which means that for subset of x in I(h), h(x) = 1. You can check more detailed information of domain adaptation and H-divergence in this paper by Shai Ben-David, A theory of learning from different domains for more details.\n \nRelated Documents\n\nA theory of learning from different domains\n\nRelated Videos\n\n\n", "tag": "machine learning,transfer learning", "prompt": "I am interested in Domain Adaptation H-Divergence?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>\\pi(a|s)</equation> in Bellman Equation is: Probability of choosing action a given state s.", "tag": "machine learning", "prompt": "Help me find the meaning of symbol <equation>\\pi(a|s)</equation> in Bellman Equation?"}
{"completion": " The equation for TransH is: <equation>f_{r}(h,t) =||h_{\\\\perp} + d_{r} - t_{\\\\perp} ||^{2}_{2}=||(h - w_{r}hw_{r}) + d_{r} - (t - w_{r}tw_{r}) ||^{2}_{2}</equation>. Here is the brief introduction of TransH:\n TransH model learns low-dimensional representations of knowledge graphs triples on the hyperplane of the entities and relations. See paper Knowledge Graph Embedding by Translating on Hyperplanes for more details.\n Related Documents\n\nKnowledge Graph Embedding by Translating on Hyperplanes\n\nRelated Videos\n\n\n", "tag": "machine learning,KG", "prompt": "I am interested in TransH?"}
{"completion": " The Latex Code for equation TransRis <equation>h_{r}=hM_{r}, t_{r}=tM_{r} \\\\\\\\f_{r}(h, t) = ||h_{r} + r - t_{r}||^{2}_{2}=||hM_{r}+r-tM_{r}||^{2}_{2}</equation>", "tag": "machine learning,KG", "prompt": "Show me the latex code of equation TransR?"}
{"completion": " The Latex Code for equation SimplEis <equation>s(e_{i}, r, e_{j}) = \\\\frac{1}{2}(<h_{e_{i}}, v_{r}, t_{e_{j}}> + <h_{e_{j}}, v_{r^{-}}, t_{e_{i}}>)</equation>", "tag": "machine learning,KG", "prompt": "What is the equation of SimplE?"}
{"completion": " The Latex Code for equation RotatEis <equation>f_{r}(h, t) = ||h \\\\circ r - t||^{2}_{2}</equation>", "tag": "machine learning,KG", "prompt": "Please give me some hints on equation RotatE?"}
{"completion": " The equation for Deep Kernel Learning is: <equation>k(x_{i},x_{j}|\\\\phi)=k(h(x_i,w_k),h(x_j,w_k)|w_k,\\\\phi)</equation>. Here is the brief introduction of Deep Kernel Learning:\n The original data instance x_{i} is first mapped to latent space by a non-linear transformation h(x_{i}, w_{k}), usually a deep neural network with parameter w_{k}, and then passed to a kernel function k(x_{i},x_{j}|\\phi).\n See below link Deep Kernel Learning for more details. \n ", "tag": "machine learning,Deep Kernel Learning", "prompt": "Can you give me a brief introduction to Deep Kernel Learning?"}
{"completion": " The equation for Language Modelling is <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>. The meaning of symbol <equation>(s_{1}, s_{2}, ..., s_{n})</equation> in Language Modelling is: Variable length sequences of symbols.", "tag": "machine learning", "prompt": "What is the meaning of the symbol <equation>(s_{1}, s_{2}, ..., s_{n})</equation> in Language Modelling?"}
{"completion": " The Latex Code for equation Diffusion Model Forward Process Reparameterizationis <equation>x_{t}=\\\\sqrt{\\\\alpha_{t}}x_{t-1}+\\\\sqrt{1-\\\\alpha_{t}} \\\\epsilon_{t-1}\\\\\\\\=\\\\sqrt{\\\\alpha_{t}\\\\alpha_{t-1}}x_{t-2} + \\\\sqrt{1-\\\\alpha_{t}\\\\alpha_{t-1}} \\\\bar{\\\\epsilon}_{t-2}\\\\\\\\=\\\\text{...}\\\\\\\\=\\\\sqrt{\\\\bar{\\\\alpha}_{t}}x_{0}+\\\\sqrt{1-\\\\bar{\\\\alpha}_{t}}\\\\epsilon \\\\\\\\\\\\alpha_{t}=1-\\\\beta_{t}, \\\\bar{\\\\alpha}_{t}=\\\\prod_{t=1}^{T}\\\\alpha_{t}</equation>", "tag": "machine learning,diffusion", "prompt": "Show me the latex code of equation Diffusion Model Forward Process Reparameterization?"}
{"completion": " The equation for KL-Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>Q(x)</equation> in KL-Divergence is: Distribution of Q(x) over x.", "tag": "machine learning", "prompt": "What is the meaning of <equation>Q(x)</equation> in KL-Divergence?"}
{"completion": " The Latex Code for equation Kullback-Leibler Divergenceis <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>", "tag": "machine learning,kl divergence", "prompt": "Please give me some hints on equation Kullback-Leibler Divergence?"}
{"completion": " The equation for Maximum Mean Discrepancy MMD is <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>. The meaning of symbol <equation>\\textup{MMD}(\\mathbb{F},X,Y)</equation> in Maximum Mean Discrepancy MMD is: Superior of the discrepancy measure between two distribution..", "tag": "machine learning,mmd", "prompt": "Can you explain the symbol <equation>\\textup{MMD}(\\mathbb{F},X,Y)</equation> in Maximum Mean Discrepancy MMD?"}
{"completion": " The equation for TransE is: <equation>\\\\mathcal{L}=\\\\sum_{(h,r,t) \\\\in S} \\\\sum_{(h^{'},r^{'},t^{'}) \\\\in S^{'}_{(h,r,t)}} \\\\[ \\\\gamma + d(h + r, t) - d(h^{'} + r^{'}, t^{'}) \\\\]_{+} \\\\\\\\ S^{'}_{(h,r,t)}=\\\\{(h^{'},r,t)|h^{'} \\\\in E\\\\} \\\\cup \\\\{(h,r,t^{'})|t^{'} \\\\in E\\\\} \\\\\\\\ d(h + r, t)=||h + r - t||^{2}_{2}</equation>. Here is the brief introduction of TransE:\n Given a training set S of triplets (h, l, t) composed of two entities h, t ? E (the set of entities) and a relationship l ? L (the set of relationships), our model learns vector embeddings of the entities and the relationships. The embeddings take values in Rk (k is a model hyperparameter) and are denoted with the same letters, in boldface characters. The basic idea behind our model is that the functional relation induced by the l-labeled edges corresponds to a translation of the embeddings, i.e. we want that h + l ? t when (h, l, t) holds (t should be a nearest neighbor of h + l), while h + l should be far away from t otherwise. Following an energy-based framework, the energy of a triplet is equal to d(h + l, t) for some dissimilarity measure d, which we take to be either the L1 or the L2 -norm. To learn such embeddings, we minimize a margin-based ranking criterion over the training set. See paper Translating Embeddings for Modeling Multi-relational Data for more details.\n\n Related Documents\n\nTranslating Embeddings for Modeling Multi-relational Data\n\nRelated Videos\n\n\n", "tag": "machine learning", "prompt": "Please give me detailed tutorial of TransE?"}
{"completion": " The Latex Code for equation GraphSageis <equation>h^{0}_{v} \\\\leftarrow x_{v} \\\\\\\\\\ \\\\textbf{for} k \\\\in \\\\{1,2,...,K\\\\} \\\\text{do}\\\\\\\\\\ \\\\textbf{for} v \\\\in V \\\\text{do} \\\\\\\\\\ h^{k}_{N_{v}} \\\\leftarrow \\\\textbf{AGGREGATE}_{k}(h^{k-1}_{u}, u \\\\in N(v)); \\\\\\\\\\ h^{k}_{v} \\\\leftarrow \\\\sigma (W^{k} \\\\textbf{concat}(h^{k-1}_{v},h^{k}_{N_{v}})) \\\\\\\\\\ \\\\textbf{end} \\\\\\\\\\ h^{k}_{v}=h^{k}_{v}/||h^{k}_{v}||_{2},\\\\forall v \\\\in V \\\\\\\\\\ \\\\textbf{end} \\\\\\\\\\ z_{v} \\\\leftarrow h^{k}_{v} \\\\\\\\\\ J_{\\\\textbf{z}_{u}}=-\\\\log (\\\\sigma (\\\\textbf{z}_{u}^{T}\\\\textbf{z}_{v})) - Q \\\\mathbb{E}_{v_{n} \\\\sim p_n(v)} \\\\log(\\\\sigma (-\\\\textbf{z}_{u}^{T}\\\\textbf{z}_{v_{n}}))</equation>", "tag": "machine learning,graph,GNN", "prompt": "Can you write equation GraphSage?"}
{"completion": " The related materials of Multi-Gate Mixture of Experts MMoE is: 1.Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts:<url>https://dl.acm.org/doi/pdf/10.1145/3219819.3220007</url>;", "tag": "machine learning,multi task", "prompt": "Can you help me find related materials of Multi-Gate Mixture of Experts MMoE?"}
{"completion": " The related materials of Model-Agnostic Meta-Learning MAML is: ", "tag": "machine learning,meta learning", "prompt": "Can you help me find related materials of Model-Agnostic Meta-Learning MAML?"}
{"completion": " The Latex Code for equation Domain Adaptation H-Divergenceis <equation>d_{\\\\mathcal{H}}(\\\\mathcal{D},\\\\mathcal{D}^{'})=2\\\\sup_{h \\\\in \\\\mathcal{H}}|\\\\Pr_{\\\\mathcal{D}}[I(h)]-\\\\Pr_{\\\\mathcal{D}^{'}}[I(h)]|</equation>", "tag": "machine learning,transfer learning", "prompt": "What is the latex code of equation Domain Adaptation H-Divergence?"}
{"completion": " The Latex Code for equation Support Vector Machine SVMis <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>", "tag": "machine learning,svm", "prompt": "What is the equation of Support Vector Machine SVM?"}
{"completion": " The equation for Diffusion Model Variational Lower Bound Loss is: <equation>\\\\begin{aligned}\\ L_\\\\text{VLB} &= L_T + L_{T-1} + \\\\dots + L_0 \\\\\\\\\\ \\\\text{where } L_T &= D_\\\\text{KL}(q(\\\\mathbf{x}_T \\\\vert \\\\mathbf{x}_0) \\\\parallel p_\\\\theta(\\\\mathbf{x}_T)) \\\\\\\\\\ L_t &= D_\\\\text{KL}(q(\\\\mathbf{x}_t \\\\vert \\\\mathbf{x}_{t+1}, \\\\mathbf{x}_0) \\\\parallel p_\\\\theta(\\\\mathbf{x}_t \\\\vert\\\\mathbf{x}_{t+1})) \\\\text{ for }1 \\\\leq t \\\\leq T-1 \\\\\\\\\\ L_0 &= - \\\\log p_\\\\theta(\\\\mathbf{x}_0 \\\\vert \\\\mathbf{x}_1)\\ \\\\end{aligned}</equation>. Here is the brief introduction of Diffusion Model Variational Lower Bound Loss:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "Please give me detailed tutorial of Diffusion Model Variational Lower Bound Loss?"}
{"completion": " The equation for Area Under Uplift Curve AUUC is: <equation>f(t)=(\\\\frac{Y^{T}_{t}}{N^{T}_{t}} - \\\\frac{Y^{C}_{t}}{N^{C}_{t}})(N^{T}_{t}+N^{C}_{t})</equation>. Here is the brief introduction of Area Under Uplift Curve AUUC:\n Authors in this paper Causal Inference and Uplift Modeling A review of the literature define AUUC coefficient as the area under the uplift curve.\n ", "tag": "machine learning,causual inference", "prompt": "Can you give me a brief introduction to Area Under Uplift Curve AUUC?"}
{"completion": " The equation for Deep Kernel Learning is: <equation>k(x_{i},x_{j}|\\\\phi)=k(h(x_i,w_k),h(x_j,w_k)|w_k,\\\\phi)</equation>. Here is the brief introduction of Deep Kernel Learning:\n The original data instance x_{i} is first mapped to latent space by a non-linear transformation h(x_{i}, w_{k}), usually a deep neural network with parameter w_{k}, and then passed to a kernel function k(x_{i},x_{j}|\\phi).\n See below link Deep Kernel Learning for more details. \n ", "tag": "machine learning,Deep Kernel Learning", "prompt": "Please give me detailed tutorial of Deep Kernel Learning?"}
{"completion": " The equation for Domain-Adversarial Neural Networks DANN is: <equation>\\\\min [\\\\frac{1}{m}\\\\sum^{m}_{1}\\\\mathcal{L}(f(\\\\textbf{x}^{s}_{i}),y_{i})+\\\\lambda \\\\max(-\\\\frac{1}{m}\\\\sum^{m}_{i=1}\\\\mathcal{L}^{d}(o(\\\\textbf{x}^{s}_{i}),1)-\\\\frac{1}{m^{'}}\\\\sum^{m^{'}}_{i=1}\\\\mathcal{L}^{d}(o(\\\\textbf{x}^{t}_{i}),0))]</equation>. Here is the brief introduction of Domain-Adversarial Neural Networks DANN:\n\n In this formulation of Domain-Adversarial Neural Networks(DANN), authors add a domain adaptation regularizer term to the original loss function of source domain. The domain adaptation regularizer term are calculated based on the H-divergence of two distributions h(X_{S}) and h(X_{T}). The adversial network aims to maximize the likelihood that the domain classifier are unable to distingush a data point belongs to source domain S or target domain T. Function o(.) is the domain regressor which learns high level representation o(X) given input X. You can check more detailed information in this paper by Hana Ajakan, Pascal Germain, et al., Domain-Adversarial Neural Networks for more details.\n \nRelated Documents\n\nDomain-Adversarial Neural Networks\n\nRelated Videos\n\n\n", "tag": "machine learning,transfer learning", "prompt": "What's the meaning of Domain-Adversarial Neural Networks DANN?"}
{"completion": " The equation for Support Vector Machine SVM is <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. The meaning of symbol <equation>L(w,b,\\alpha)</equation> in Support Vector Machine SVM is: Dual problem Lagrangian Relaxation.", "tag": "machine learning,svm", "prompt": "Help me find the meaning of symbol <equation>L(w,b,\\alpha)</equation> in Support Vector Machine SVM?"}
{"completion": " The Latex Code for equation Multi-Gate Mixture of Experts MMoEis <equation>g^{k}(x)=\\\\text{softmax}(W_{gk}x) \\\\\\\\\\ f^{k}(x)=\\\\sum^{n}_{i=1}g^{k}(x)_{i}f_{i}(x) \\\\\\\\\\ y_{k}=h^{k}(f^{k}(x))</equation>", "tag": "machine learning,multi task", "prompt": "Can you write equation Multi-Gate Mixture of Experts MMoE?"}
{"completion": " The Latex Code for equation T-Learneris <equation>\\\\mu_{0}(x)=\\\\mathbb{E}[Y(0)|X=x],\\\\mu_{1}(x)=\\\\mathbb{E}[Y(1)|X=x],\\\\\\\\\\ \\\\hat{\\\\tau}(x)=\\\\hat{\\\\mu}_{1}(x)-\\\\hat{\\\\mu}_{0}(x)</equation>", "tag": "machine learning,causual inference", "prompt": "Can you write equation T-Learner?"}
{"completion": " The equation for SME Linear is: <equation>\\\\epsilon(lhs,rel,rhs)=E_{lhs(rel)}^{T}E_{rhs(rel)} \\\\\\\\=(W_{l1}E_{lhs}^{T} + W_{l2}E_{rel}^{T} + b_{l})^{T}(W_{r1}E_{rhs}^{T} + W_{r2}E_{rel}^{T} + b_{r})</equation>. Here is the brief introduction of SME Linear:\n\n The energy function E (denoted SME) is encoded using a neural network, whose architecture first processes each entity in parallel, like in siamese networks. The intuition is that the relation type should first be used to extract relevant components from each argument\u00e2\u20ac\u2122s embedding, and put them in a space where they can then be compared. See paper A Semantic Matching Energy Function for Learning with Multi-relational Data for more details.\n \nRelated Documents\n\n\">A Semantic Matching Energy Function for Learning with Multi-relational Data\n\nRelated Videos\n\n\n", "tag": "machine learning,KG", "prompt": "I am interested in SME Linear?"}
{"completion": " The equation for Maximum Mean Discrepancy MMD is: <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>. Here is the brief introduction of Maximum Mean Discrepancy MMD:\n Latex code for the Maximum Mean Discrepancy MMD. I will briefly introduce the notations in this formulation. \n \n: Superior of the discrepancy measure between two distribution.\n: Mean of probability distribution X with m data points.\n: Mean of probability distribution Y with n data points.\n\n\nRelated Documents\n\nA Kernel Two-Sample Test (MIT courses)\n\nRelated Videos\n\n\n", "tag": "machine learning,mmd", "prompt": "What's the meaning of Maximum Mean Discrepancy MMD?"}
{"completion": " The Latex Code for equation Graph Attention Network GATis <equation>h=\\\\{\\\\vec{h_{1}},\\\\vec{h_{2}},...,\\\\vec{h_{N}}\\\\}, \\\\\\\\\\ \\\\vec{h_{i}} \\\\in \\\\mathbb{R}^{F} \\\\\\\\\\ W \\\\in \\\\mathbb{R}^{F \\\\times F^{'}} \\\\\\\\\\ e_{ij}=a(Wh_{i},Wh_{j}) \\\\\\\\\\ k \\\\in \\\\mathcal{N}_{i},\\\\text{ neighbourhood nodes}\\\\\\\\\\ a_{ij}=\\\\text{softmax}_{j}(e_{ij})=\\\\frac{\\\\exp(e_{ij})}{\\\\sum_{k \\\\in \\\\mathcal{N}_{i}} \\\\exp(e_{ik})}</equation>", "tag": "machine learning,graph,GNN", "prompt": "What is the equation of Graph Attention Network GAT?"}
{"completion": " The Latex Code for equation Variational AutoEncoder VAEis <equation>\\\\log p_{\\\\theta}(x)=\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log p_{\\\\theta}(x)] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log \\\\frac{p_{\\\\theta}(x,z)}{p_{\\\\theta}(z|x)}] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log [\\\\frac{p_{\\\\theta}(x,z)}{q_{\\\\phi}(z|x)} \\\\times \\\\frac{q_{\\\\phi}(z|x)}{p_{\\\\theta}(z|x)}]] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log [\\\\frac{p_{\\\\theta}(x,z)}{q_{\\\\phi}(z|x)} ]] +D_{KL}(q_{\\\\phi}(z|x) || p_{\\\\theta}(z|x))\\\\\\\\</equation>", "tag": "machine learning,VAE", "prompt": "What is the latex code of equation Variational AutoEncoder VAE?"}
{"completion": " The Latex Code for equation Diffusion Model Variational Lower Bound Lossis <equation>\\\\begin{aligned}\\ L_\\\\text{VLB} &= L_T + L_{T-1} + \\\\dots + L_0 \\\\\\\\\\ \\\\text{where } L_T &= D_\\\\text{KL}(q(\\\\mathbf{x}_T \\\\vert \\\\mathbf{x}_0) \\\\parallel p_\\\\theta(\\\\mathbf{x}_T)) \\\\\\\\\\ L_t &= D_\\\\text{KL}(q(\\\\mathbf{x}_t \\\\vert \\\\mathbf{x}_{t+1}, \\\\mathbf{x}_0) \\\\parallel p_\\\\theta(\\\\mathbf{x}_t \\\\vert\\\\mathbf{x}_{t+1})) \\\\text{ for }1 \\\\leq t \\\\leq T-1 \\\\\\\\\\ L_0 &= - \\\\log p_\\\\theta(\\\\mathbf{x}_0 \\\\vert \\\\mathbf{x}_1)\\ \\\\end{aligned}</equation>", "tag": "machine learning,diffusion", "prompt": "Show me the latex code of equation Diffusion Model Variational Lower Bound Loss?"}
{"completion": " The Latex Code for equation Conditional Random Field CRFis <equation>P(y|x)=\\\\frac{1}{Z(x)}\\\\exp(\\\\sum_{i,k}\\\\lambda_{k}t_{k}(y_{i-1},y_{i},x,i))+\\\\sum_{i,l}\\\\mu_{l}s_{l}(y_{i},x,i)) \\\\\\\\\\ Z(x)=\\\\sum_{y}\\\\exp(\\\\sum_{i,k}\\\\lambda_{k}t_{k}(y_{i-1},y_{i},x,i))+\\\\sum_{i,l}\\\\mu_{l}s_{l}(y_{i},x,i))</equation>", "tag": "machine learning,nlp", "prompt": "Show me the latex code of equation Conditional Random Field CRF?"}
{"completion": " The equation for KL-Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>P(x)</equation> in KL-Divergence is: Distribution of P(x) over x.", "tag": "machine learning", "prompt": "What does the symbol mean <equation>P(x)</equation> in KL-Divergence?"}
{"completion": " The equation for Variational AutoEncoder VAE is: <equation>\\\\log p_{\\\\theta}(x)=\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log p_{\\\\theta}(x)] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log \\\\frac{p_{\\\\theta}(x,z)}{p_{\\\\theta}(z|x)}] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log [\\\\frac{p_{\\\\theta}(x,z)}{q_{\\\\phi}(z|x)} \\\\times \\\\frac{q_{\\\\phi}(z|x)}{p_{\\\\theta}(z|x)}]] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log [\\\\frac{p_{\\\\theta}(x,z)}{q_{\\\\phi}(z|x)} ]] +D_{KL}(q_{\\\\phi}(z|x) || p_{\\\\theta}(z|x))\\\\\\\\</equation>. Here is the brief introduction of Variational AutoEncoder VAE:", "tag": "machine learning,VAE", "prompt": "Please give me detailed tutorial of Variational AutoEncoder VAE?"}
{"completion": " The related materials of Mahalanobis Distance is: 1.Mahalanobis Distance:<url>https://www.sciencedirect.com/topics/engineering/mahalanobis-distance</url>;", "tag": "machine learning,mahalanobis", "prompt": "Can you help me find related materials of Mahalanobis Distance?"}
{"completion": " The Latex Code for equation Proximal Policy Optimization PPOis <equation>L^{CLIP}(\\\\theta)=E_{t}[\\\\min(r_{t}(\\\\theta))A_{t}, \\\\text{clip}(r_{t}(\\\\theta), 1-\\\\epsilon,1+\\\\epsilon)A_{t}]</equation>", "tag": "machine learning", "prompt": "What is the latex code of equation Proximal Policy Optimization PPO?"}
{"completion": " The Latex Code for equation Unconfoundedness Assumptionis <equation>\\\\{Y_{i}(0),Y_{i}(1)\\\\}\\\\perp W_{i}|X_{i}</equation>", "tag": "machine learning,causual inference", "prompt": "How to write equation Unconfoundedness Assumption?"}
{"completion": " The equation for Unconfoundedness Assumption is: <equation>\\\\{Y_{i}(0),Y_{i}(1)\\\\}\\\\perp W_{i}|X_{i}</equation>. Here is the brief introduction of Unconfoundedness Assumption:\n The unconfoundedness assumption or CIA (Conditional Independence assumption) assume that there are no hidden confounders between (Y(0),Y(1)) vector and treatment assignment vector W, conditioned on input X.\n ", "tag": "machine learning,causual inference", "prompt": "Can you give me a brief introduction to Unconfoundedness Assumption?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>\\gamma</equation> in Bellman Equation is: Reward discount factor \\gamma.", "tag": "machine learning", "prompt": "What does the symbol mean <equation>\\gamma</equation> in Bellman Equation?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. The meaning of symbol <equation>JS(P||Q)</equation> in Jensen-Shannon Divergence JS-Divergence is: JS Divergence between P and Q, which is the symmetric divergence metric between distribution P and Q.", "tag": "machine learning", "prompt": "What is the meaning of <equation>JS(P||Q)</equation> in Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The equation for Support Vector Machine SVM is <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. The meaning of symbol <equation>y=wx+b</equation> in Support Vector Machine SVM is: Classifier.", "tag": "machine learning,svm", "prompt": "What does the symbol mean <equation>y=wx+b</equation> in Support Vector Machine SVM?"}
{"completion": " The Latex Code for equation Entire Space Multi-Task Model ESSMis <equation>L(\\\\theta_{cvr},\\\\theta_{ctr})=\\\\sum^{N}_{i=1}l(y_{i},f(x_{i};\\\\theta_{ctr}))+\\\\sum^{N}_{i=1}l(y_{i}\\\\&z_{i},f(x_{i};\\\\theta_{ctr}) \\\\times f(x_{i};\\\\theta_{cvr}))\\ </equation>", "tag": "machine learning,multi task", "prompt": "What is the latex code of equation Entire Space Multi-Task Model ESSM?"}
{"completion": " The equation for Graph Attention Network GAT is: <equation>h=\\\\{\\\\vec{h_{1}},\\\\vec{h_{2}},...,\\\\vec{h_{N}}\\\\}, \\\\\\\\\\ \\\\vec{h_{i}} \\\\in \\\\mathbb{R}^{F} \\\\\\\\\\ W \\\\in \\\\mathbb{R}^{F \\\\times F^{'}} \\\\\\\\\\ e_{ij}=a(Wh_{i},Wh_{j}) \\\\\\\\\\ k \\\\in \\\\mathcal{N}_{i},\\\\text{ neighbourhood nodes}\\\\\\\\\\ a_{ij}=\\\\text{softmax}_{j}(e_{ij})=\\\\frac{\\\\exp(e_{ij})}{\\\\sum_{k \\\\in \\\\mathcal{N}_{i}} \\\\exp(e_{ik})}</equation>. Here is the brief introduction of Graph Attention Network GAT:\n\n GAT applies graph attentional layer to model the graph propagation. In each layer, the node i has attention on all the other nodes j. And the attention coefficient is calculated. For the attention calculation, only the set of neighbours nodes N_{i} of each node i contributes to the final softmax attention calculation. You can check more detailed information in this paper, GRAPH ATTENTION NETWORKS for more details.\n \nRelated Documents\n\nGRAPH ATTENTION NETWORKS\n\nRelated Videos\n\n\n", "tag": "machine learning,graph,GNN", "prompt": "Please give me detailed tutorial of Graph Attention Network GAT?"}
{"completion": " The equation for Maximum Mean Discrepancy MMD is <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>. The meaning of symbol <equation>\\frac{1}{n}\\sum_{j=1}^{n}f(y_{j})</equation> in Maximum Mean Discrepancy MMD is: Mean of probability distribution Y with n data points..", "tag": "machine learning,mmd", "prompt": "Help me find the meaning of symbol <equation>\\frac{1}{n}\\sum_{j=1}^{n}f(y_{j})</equation> in Maximum Mean Discrepancy MMD?"}
{"completion": " The related materials of Support Vector Machine SVM is: ", "tag": "machine learning,svm", "prompt": "Can you help me find related materials of Support Vector Machine SVM?"}
{"completion": " The Latex Code for equation Graph Laplacianis <equation>L=I_{N}-D^{-\\\\frac{1}{2}}AD^{-\\\\frac{1}{2}} \\\\\\\\\\ L=U\\\\Lambda U^{T}</equation>", "tag": "machine learning,graph,GNN", "prompt": "Please give me some hints on equation Graph Laplacian?"}
{"completion": " The equation for Support Vector Machine SVM is <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. The meaning of symbol <equation>L(w,b,\\alpha)</equation> in Support Vector Machine SVM is: Dual problem Lagrangian Relaxation.", "tag": "machine learning,svm", "prompt": "Can you explain the symbol <equation>L(w,b,\\alpha)</equation> in Support Vector Machine SVM?"}
{"completion": " The equation for Diffusion Model Forward Process is: <equation>q(x_{t}|x_{t-1})=\\\\mathcal{N}(x_{t};\\\\sqrt{1-\\\\beta_{t}}x_{t-1},\\\\beta_{t}I) \\\\\\\\q(x_{1:T}|x_{0})=\\\\prod_{t=1}^{T}q(x_{t}|x_{t-1})</equation>. Here is the brief introduction of Diffusion Model Forward Process:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "What's the meaning of Diffusion Model Forward Process?"}
{"completion": " The Latex Code for equation Domain-Adversarial Neural Networks DANNis <equation>\\\\min [\\\\frac{1}{m}\\\\sum^{m}_{1}\\\\mathcal{L}(f(\\\\textbf{x}^{s}_{i}),y_{i})+\\\\lambda \\\\max(-\\\\frac{1}{m}\\\\sum^{m}_{i=1}\\\\mathcal{L}^{d}(o(\\\\textbf{x}^{s}_{i}),1)-\\\\frac{1}{m^{'}}\\\\sum^{m^{'}}_{i=1}\\\\mathcal{L}^{d}(o(\\\\textbf{x}^{t}_{i}),0))]</equation>", "tag": "machine learning,transfer learning", "prompt": "What is the equation of Domain-Adversarial Neural Networks DANN?"}
{"completion": " The equation for Diffusion Model Variational Lower Bound Loss is: <equation>\\\\begin{aligned}\\ L_\\\\text{VLB} &= L_T + L_{T-1} + \\\\dots + L_0 \\\\\\\\\\ \\\\text{where } L_T &= D_\\\\text{KL}(q(\\\\mathbf{x}_T \\\\vert \\\\mathbf{x}_0) \\\\parallel p_\\\\theta(\\\\mathbf{x}_T)) \\\\\\\\\\ L_t &= D_\\\\text{KL}(q(\\\\mathbf{x}_t \\\\vert \\\\mathbf{x}_{t+1}, \\\\mathbf{x}_0) \\\\parallel p_\\\\theta(\\\\mathbf{x}_t \\\\vert\\\\mathbf{x}_{t+1})) \\\\text{ for }1 \\\\leq t \\\\leq T-1 \\\\\\\\\\ L_0 &= - \\\\log p_\\\\theta(\\\\mathbf{x}_0 \\\\vert \\\\mathbf{x}_1)\\ \\\\end{aligned}</equation>. Here is the brief introduction of Diffusion Model Variational Lower Bound Loss:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "Can you explain Diffusion Model Variational Lower Bound Loss?"}
{"completion": " The Latex Code for equation GraphSageis <equation>h^{0}_{v} \\\\leftarrow x_{v} \\\\\\\\\\ \\\\textbf{for} k \\\\in \\\\{1,2,...,K\\\\} \\\\text{do}\\\\\\\\\\ \\\\textbf{for} v \\\\in V \\\\text{do} \\\\\\\\\\ h^{k}_{N_{v}} \\\\leftarrow \\\\textbf{AGGREGATE}_{k}(h^{k-1}_{u}, u \\\\in N(v)); \\\\\\\\\\ h^{k}_{v} \\\\leftarrow \\\\sigma (W^{k} \\\\textbf{concat}(h^{k-1}_{v},h^{k}_{N_{v}})) \\\\\\\\\\ \\\\textbf{end} \\\\\\\\\\ h^{k}_{v}=h^{k}_{v}/||h^{k}_{v}||_{2},\\\\forall v \\\\in V \\\\\\\\\\ \\\\textbf{end} \\\\\\\\\\ z_{v} \\\\leftarrow h^{k}_{v} \\\\\\\\\\ J_{\\\\textbf{z}_{u}}=-\\\\log (\\\\sigma (\\\\textbf{z}_{u}^{T}\\\\textbf{z}_{v})) - Q \\\\mathbb{E}_{v_{n} \\\\sim p_n(v)} \\\\log(\\\\sigma (-\\\\textbf{z}_{u}^{T}\\\\textbf{z}_{v_{n}}))</equation>", "tag": "machine learning,graph,GNN", "prompt": "What is the latex code of equation GraphSage?"}
{"completion": " The Latex Code for equation Variational AutoEncoder VAEis <equation>\\\\log p_{\\\\theta}(x)=\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log p_{\\\\theta}(x)] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log \\\\frac{p_{\\\\theta}(x,z)}{p_{\\\\theta}(z|x)}] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log [\\\\frac{p_{\\\\theta}(x,z)}{q_{\\\\phi}(z|x)} \\\\times \\\\frac{q_{\\\\phi}(z|x)}{p_{\\\\theta}(z|x)}]] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log [\\\\frac{p_{\\\\theta}(x,z)}{q_{\\\\phi}(z|x)} ]] +D_{KL}(q_{\\\\phi}(z|x) || p_{\\\\theta}(z|x))\\\\\\\\</equation>", "tag": "machine learning,VAE", "prompt": "Can you write equation Variational AutoEncoder VAE?"}
{"completion": " The equation for Diffusion Model Variational Lower Bound is: <equation>\\\\begin{aligned}\\ - \\\\log p_\\\\theta(\\\\mathbf{x}_0) \\ &\\\\leq - \\\\log p_\\\\theta(\\\\mathbf{x}_0) + D_\\\\text{KL}(q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0) \\\\| p_\\\\theta(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0) ) \\\\\\\\\\ &= -\\\\log p_\\\\theta(\\\\mathbf{x}_0) + \\\\mathbb{E}_{\\\\mathbf{x}_{1:T}\\\\sim q(\\\\mathbf{x}_{1:T} \\\\vert \\\\mathbf{x}_0)} \\\\Big[ \\\\log\\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T}) / p_\\\\theta(\\\\mathbf{x}_0)} \\\\Big] \\\\\\\\\\ &= -\\\\log p_\\\\theta(\\\\mathbf{x}_0) + \\\\mathbb{E}_q \\\\Big[ \\\\log\\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} + \\\\log p_\\\\theta(\\\\mathbf{x}_0) \\\\Big] \\\\\\\\\\ &= \\\\mathbb{E}_q \\\\Big[ \\\\log \\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} \\\\Big] \\\\\\\\\\ \\\\text{Let }L_\\\\text{VLB} \\ &= \\\\mathbb{E}_{q(\\\\mathbf{x}_{0:T})} \\\\Big[ \\\\log \\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} \\\\Big] \\\\geq - \\\\mathbb{E}_{q(\\\\mathbf{x}_0)} \\\\log p_\\\\theta(\\\\mathbf{x}_0)\\ \\\\end{aligned}</equation>. Here is the brief introduction of Diffusion Model Variational Lower Bound:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "Please give me detailed tutorial of Diffusion Model Variational Lower Bound?"}
{"completion": " The related materials of TransR is: 1.\">Learning Entity and Relation Embeddings for Knowledge Graph Completion:<url>https://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/viewFile/9571/9523/</url>;", "tag": "machine learning,KG", "prompt": "Can you help me find related materials of TransR?"}
{"completion": " The equation for Language Modelling is <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>. The meaning of symbol <equation>p(x)</equation> in Language Modelling is: \n : Factorize the joint probabilities over symbols p(x) as the product of conditional probabilities .", "tag": "machine learning", "prompt": "What is the meaning of the symbol <equation>p(x)</equation> in Language Modelling?"}
{"completion": " The Latex Code for equation Generative Adversarial Networks GANis <equation>\\\\min_{G} \\\\max_{D} V(D,G)=\\\\mathbb{E}_{x \\\\sim p_{data}(x)}[\\\\log D(x)]+\\\\mathbb{E}_{z \\\\sim p_{z}(z)}[\\\\log(1-D(G(z)))]</equation>", "tag": "machine learning,gan", "prompt": "Please give me some hints on equation Generative Adversarial Networks GAN?"}
{"completion": " The equation for KL-Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>P(x)</equation> in KL-Divergence is: Distribution of P(x) over x.", "tag": "machine learning", "prompt": "What is the meaning of <equation>P(x)</equation> in KL-Divergence?"}
{"completion": " The equation for Conditional Average Treatment Effect CATE is: <equation>\\\\tau(x):=\\\\mathbb{E}[Y(1)-Y(0)|X=x]</equation>. Here is the brief introduction of Conditional Average Treatment Effect CATE:\n Since we can't observe ITE of item i directly, most causal inference models estimate the conditional average treatment effect(CATE) conditioned on item i (X=x_{i}).\n ", "tag": "machine learning,causual inference", "prompt": "What's the meaning of Conditional Average Treatment Effect CATE?"}
{"completion": " The equation for Area Under Uplift Curve AUUC is: <equation>f(t)=(\\\\frac{Y^{T}_{t}}{N^{T}_{t}} - \\\\frac{Y^{C}_{t}}{N^{C}_{t}})(N^{T}_{t}+N^{C}_{t})</equation>. Here is the brief introduction of Area Under Uplift Curve AUUC:\n Authors in this paper Causal Inference and Uplift Modeling A review of the literature define AUUC coefficient as the area under the uplift curve.\n ", "tag": "machine learning,causual inference", "prompt": "I am interested in Area Under Uplift Curve AUUC?"}
{"completion": " The Latex Code for equation Bound on Target Domain Erroris <equation>\\\\epsilon_{T}(h) \\\\le \\\\hat{\\\\epsilon}_{S}(h) + \\\\sqrt{\\\\frac{4}{m}(d \\\\log \\\\frac{2em}{d} + \\\\log \\\\frac{4}{\\\\delta })} + d_{\\\\mathcal{H}}(\\\\tilde{\\\\mathcal{D}}_{S}, \\\\tilde{\\\\mathcal{D}}_{T}) + \\\\lambda \\\\\\\\\\ \\\\lambda = \\\\lambda_{S} + \\\\lambda_{T}</equation>", "tag": "machine learning,transfer learning", "prompt": "What is the equation of Bound on Target Domain Error?"}
{"completion": " The Latex Code for equation T-Learneris <equation>\\\\mu_{0}(x)=\\\\mathbb{E}[Y(0)|X=x],\\\\mu_{1}(x)=\\\\mathbb{E}[Y(1)|X=x],\\\\\\\\\\ \\\\hat{\\\\tau}(x)=\\\\hat{\\\\mu}_{1}(x)-\\\\hat{\\\\mu}_{0}(x)</equation>", "tag": "machine learning,causual inference", "prompt": "What is the latex code of equation T-Learner?"}
{"completion": " The related materials of Graph Attention Network GAT is: 1.GRAPH ATTENTION NETWORKS:<url>https://arxiv.org/pdf/1710.10903.pdf</url>;", "tag": "machine learning,graph,GNN", "prompt": "Can you help me find related materials of Graph Attention Network GAT?"}
{"completion": " The Latex Code for equation Maximum Mean Discrepancy MMDis <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>", "tag": "machine learning,mmd", "prompt": "Can you write equation Maximum Mean Discrepancy MMD?"}
{"completion": " The equation for Maximum Mean Discrepancy MMD is <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>. The meaning of symbol <equation>\\textup{MMD}(\\mathbb{F},X,Y)</equation> in Maximum Mean Discrepancy MMD is: Superior of the discrepancy measure between two distribution..", "tag": "machine learning,mmd", "prompt": "Help me find the meaning of symbol <equation>\\textup{MMD}(\\mathbb{F},X,Y)</equation> in Maximum Mean Discrepancy MMD?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>v_{\\pi}(s^{'})</equation> in Bellman Equation is: Value at state s^{'} in policy \\pi.", "tag": "machine learning", "prompt": "What is the meaning of <equation>v_{\\pi}(s^{'})</equation> in Bellman Equation?"}
{"completion": " The equation for SimplE is: <equation>s(e_{i}, r, e_{j}) = \\\\frac{1}{2}(<h_{e_{i}}, v_{r}, t_{e_{j}}> + <h_{e_{j}}, v_{r^{-}}, t_{e_{i}}>)</equation>. Here is the brief introduction of SimplE:\n The similarity function for a triple (e1 , r , e2 ) is ?he1 , vr , te2 ?.\n SimplE considers two vectors he,te ? Rd as the embedding of each entity (similar to CP), and two vectors vr , vr?1 ? Rd for each relation r. The similarity function of SimplE foratriple(ei,r,ej)isdefinedas 1(?hei,vr,tej?+?hej,vr?1,tei?),i.e. theaverageoftheCP 2 scoresfor(ei,r,ej)and(ej,r?1,ei). See paper SimplE Embedding for Link Prediction in Knowledge Graphs for more details.\n ", "tag": "machine learning,KG", "prompt": "What's the meaning of SimplE?"}
{"completion": " The equation for Bregman Divergences is: <equation>d_{\\\\phi}(z,z^{'})=\\\\phi(z) - \\\\phi(z^{'})-(z-z^{'})^{T} \\\\nabla \\\\phi(z^{'})</equation>. Here is the brief introduction of Bregman Divergences:\n The prototypi- cal networks algorithm is equivalent to performing mixture density estimation on the support set with an exponential family density. A regular Bregman divergence d_{\\phi} is defined as above. \\phi is a differentiable, strictly convex function of the Legendre type. Examples of Bregman divergences include squared Euclidean distance and Mahalanobis distance.\n ", "tag": "machine learning", "prompt": "Can you give me a brief introduction to Bregman Divergences?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is: <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. Here is the brief introduction of Jensen-Shannon Divergence JS-Divergence:\n Latex code for the Jensen-Shannon Divergence(JS-Divergence). I will briefly introduce the notations in this formulation. \n \n: KL Divergence between P and Q\n: JS Divergence between P and Q, which is the symmetric divergence metric between distribution P and Q\n: Distribution of P(x) over x\n: Distribution of Q(x) over x\n\n\nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning", "prompt": "Can you explain Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>\\gamma</equation> in Bellman Equation is: Reward discount factor \\gamma.", "tag": "machine learning", "prompt": "Help me find the meaning of symbol <equation>\\gamma</equation> in Bellman Equation?"}
{"completion": " The equation for Conditional Random Field CRF is: <equation>P(y|x)=\\\\frac{1}{Z(x)}\\\\exp(\\\\sum_{i,k}\\\\lambda_{k}t_{k}(y_{i-1},y_{i},x,i))+\\\\sum_{i,l}\\\\mu_{l}s_{l}(y_{i},x,i)) \\\\\\\\\\ Z(x)=\\\\sum_{y}\\\\exp(\\\\sum_{i,k}\\\\lambda_{k}t_{k}(y_{i-1},y_{i},x,i))+\\\\sum_{i,l}\\\\mu_{l}s_{l}(y_{i},x,i))</equation>. Here is the brief introduction of Conditional Random Field CRF:\n p(Y|x) denotes the linear chain Conditional Random Field(CRF). t_k denotes the function on the transition, s_l denote function on the node.\n lambda_k and mu_l denotes the weight coefficient.\n ", "tag": "machine learning,nlp", "prompt": "Please give me detailed tutorial of Conditional Random Field CRF?"}
{"completion": " The equation for Conditional Random Field CRF is: <equation>P(y|x)=\\\\frac{1}{Z(x)}\\\\exp(\\\\sum_{i,k}\\\\lambda_{k}t_{k}(y_{i-1},y_{i},x,i))+\\\\sum_{i,l}\\\\mu_{l}s_{l}(y_{i},x,i)) \\\\\\\\\\ Z(x)=\\\\sum_{y}\\\\exp(\\\\sum_{i,k}\\\\lambda_{k}t_{k}(y_{i-1},y_{i},x,i))+\\\\sum_{i,l}\\\\mu_{l}s_{l}(y_{i},x,i))</equation>. Here is the brief introduction of Conditional Random Field CRF:\n p(Y|x) denotes the linear chain Conditional Random Field(CRF). t_k denotes the function on the transition, s_l denote function on the node.\n lambda_k and mu_l denotes the weight coefficient.\n ", "tag": "machine learning,nlp", "prompt": "What's the meaning of Conditional Random Field CRF?"}
{"completion": " The equation for Domain Adaptation H-Divergence is: <equation>d_{\\\\mathcal{H}}(\\\\mathcal{D},\\\\mathcal{D}^{'})=2\\\\sup_{h \\\\in \\\\mathcal{H}}|\\\\Pr_{\\\\mathcal{D}}[I(h)]-\\\\Pr_{\\\\mathcal{D}^{'}}[I(h)]|</equation>. Here is the brief introduction of Domain Adaptation H-Divergence:\n\n The H-Divergence is defined as the superior of divengence between two probability Pr(D) and Pr(D^{'}) for any hypothesis h in all hypotheses class H. In this formulation, given domain X with two data distribution D and D^{'} over X, I(h) denotes the characteristic function(indicator function) on X, which means that for subset of x in I(h), h(x) = 1. You can check more detailed information of domain adaptation and H-divergence in this paper by Shai Ben-David, A theory of learning from different domains for more details.\n \nRelated Documents\n\nA theory of learning from different domains\n\nRelated Videos\n\n\n", "tag": "machine learning,transfer learning", "prompt": "Can you explain Domain Adaptation H-Divergence?"}
{"completion": " The Latex Code for equation Conditional Random Field CRFis <equation>P(y|x)=\\\\frac{1}{Z(x)}\\\\exp(\\\\sum_{i,k}\\\\lambda_{k}t_{k}(y_{i-1},y_{i},x,i))+\\\\sum_{i,l}\\\\mu_{l}s_{l}(y_{i},x,i)) \\\\\\\\\\ Z(x)=\\\\sum_{y}\\\\exp(\\\\sum_{i,k}\\\\lambda_{k}t_{k}(y_{i-1},y_{i},x,i))+\\\\sum_{i,l}\\\\mu_{l}s_{l}(y_{i},x,i))</equation>", "tag": "machine learning,nlp", "prompt": "Can you write equation Conditional Random Field CRF?"}
{"completion": " The equation for Support Vector Machine SVM is <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. The meaning of symbol <equation>wx+b=0</equation> in Support Vector Machine SVM is: Decision Boundary.", "tag": "machine learning,svm", "prompt": "Help me find the meaning of symbol <equation>wx+b=0</equation> in Support Vector Machine SVM?"}
{"completion": " The equation for Cross-Stitch Network is: <equation>\\\\begin{bmatrix} \\\\tilde{x}^{ij}_{A}\\\\\\\\\\\\tilde{x}^{ij}_{B}\\\\end{bmatrix}=\\\\begin{bmatrix} a_{AA} & a_{AB}\\\\\\\\ a_{BA} & a_{BB} \\\\end{bmatrix}\\\\begin{bmatrix} x^{ij}_{A}\\\\\\\\ x^{ij}_{B} \\\\end{bmatrix}</equation>. Here is the brief introduction of Cross-Stitch Network:\n The cross-stitch unit takes two activation maps xA and xB from previous layer and learns a linear combination of two inputs from previous tasks and combine them into two new representation. The linear combination is controlled by parameter \\alpha. \n\n Related Documents\n\nSee below link of paper Cross-stitch Networks for Multi-task Learning for more details. \n\nRelated Videos\n\n\n", "tag": "machine learning,multi task", "prompt": "Can you explain Cross-Stitch Network?"}
{"completion": " The Latex Code for equation Model-Agnostic Meta-Learning MAMLis <equation>\\\\min_{\\\\theta} \\\\sum_{\\\\mathcal{T}_{i} \\\\sim p(\\\\mathcal{T})} \\\\mathcal{L}_{\\\\mathcal{T}_{i}}(f_{\\\\theta^{'}_{i}}) = \\\\sum_{\\\\mathcal{T}_{i} \\\\sim p(\\\\mathcal{T})} \\\\mathcal{L}_{\\\\mathcal{T}_{i}}(f_{\\\\theta_{i} - \\\\alpha \\\\nabla_{\\\\theta} \\\\mathcal{L}_{\\\\mathcal{T}_{i}} (f_{\\\\theta}) })</equation>", "tag": "machine learning,meta learning", "prompt": "Please give me some hints on equation Model-Agnostic Meta-Learning MAML?"}
{"completion": " The Latex Code for equation X-Learneris <equation>\\\\tilde{D}^{1}_{i}:=Y^{1}_{i}-\\\\hat{\\\\mu}_{0}(X^{1}_{i}),\\\\tilde{D}^{0}_{i}:=\\\\hat{\\\\mu}_{1}(X^{0}_{i})-Y^{0}_{i}\\\\\\\\\\ \\\\hat{\\\\tau}(x)=g(x)\\\\hat{\\\\tau}_{0}(x) + (1-g(x))\\\\hat{\\\\tau}_{1}(x)</equation>", "tag": "machine learning,causual inference", "prompt": "Show me the latex code of equation X-Learner?"}
{"completion": " The Latex Code for equation Area Under Uplift Curve AUUCis <equation>f(t)=(\\\\frac{Y^{T}_{t}}{N^{T}_{t}} - \\\\frac{Y^{C}_{t}}{N^{C}_{t}})(N^{T}_{t}+N^{C}_{t})</equation>", "tag": "machine learning,causual inference", "prompt": "Show me the latex code of equation Area Under Uplift Curve AUUC?"}
{"completion": " The Latex Code for equation SimplEis <equation>s(e_{i}, r, e_{j}) = \\\\frac{1}{2}(<h_{e_{i}}, v_{r}, t_{e_{j}}> + <h_{e_{j}}, v_{r^{-}}, t_{e_{i}}>)</equation>", "tag": "machine learning,KG", "prompt": "Show me the latex code of equation SimplE?"}
{"completion": " The Latex Code for equation Jensen-Shannon Divergence JS-Divergenceis <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>", "tag": "machine learning", "prompt": "Can you write equation Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The Latex Code for equation Unconfoundedness Assumptionis <equation>\\\\{Y_{i}(0),Y_{i}(1)\\\\}\\\\perp W_{i}|X_{i}</equation>", "tag": "machine learning,causual inference", "prompt": "Show me the latex code of equation Unconfoundedness Assumption?"}
{"completion": " The equation for Language Modelling is <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>. The meaning of symbol <equation>(s_{1}, s_{2}, ..., s_{n})</equation> in Language Modelling is: Variable length sequences of symbols.", "tag": "machine learning", "prompt": "What does the symbol mean <equation>(s_{1}, s_{2}, ..., s_{n})</equation> in Language Modelling?"}
{"completion": " The equation for Wasserstein Distance Optimal Transport is <equation>W_{p}(P,Q)=(\\\\inf_{J \\\\in J(P,Q)} \\\\int{||x-y||^{p}dJ(X,Y)})^\\\\frac{1}{p}</equation>. The meaning of symbol <equation>W_{p}(P||Q)</equation> in Wasserstein Distance Optimal Transport is: Wasserstein distance p power between two distributions P and Q.", "tag": "machine learning,wasserstein", "prompt": "What is the meaning of <equation>W_{p}(P||Q)</equation> in Wasserstein Distance Optimal Transport?"}
{"completion": " The Latex Code for equation X-Learneris <equation>\\\\tilde{D}^{1}_{i}:=Y^{1}_{i}-\\\\hat{\\\\mu}_{0}(X^{1}_{i}),\\\\tilde{D}^{0}_{i}:=\\\\hat{\\\\mu}_{1}(X^{0}_{i})-Y^{0}_{i}\\\\\\\\\\ \\\\hat{\\\\tau}(x)=g(x)\\\\hat{\\\\tau}_{0}(x) + (1-g(x))\\\\hat{\\\\tau}_{1}(x)</equation>", "tag": "machine learning,causual inference", "prompt": "How to write equation X-Learner?"}
{"completion": " The equation for Kullback-Leibler Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>KL(P||Q)</equation> in Kullback-Leibler Divergence is: KL Divergence between P and Q.", "tag": "machine learning,kl divergence", "prompt": "What is the meaning of the symbol <equation>KL(P||Q)</equation> in Kullback-Leibler Divergence?"}
{"completion": " The Latex Code for equation Wasserstein Distance Optimal Transportis <equation>W_{p}(P,Q)=(\\\\inf_{J \\\\in J(P,Q)} \\\\int{||x-y||^{p}dJ(X,Y)})^\\\\frac{1}{p}</equation>", "tag": "machine learning,wasserstein", "prompt": "What is the latex code of equation Wasserstein Distance Optimal Transport?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. The meaning of symbol <equation>JS(P||Q)</equation> in Jensen-Shannon Divergence JS-Divergence is: JS Divergence between P and Q, which is the symmetric divergence metric between distribution P and Q.", "tag": "machine learning", "prompt": "Help me find the meaning of symbol <equation>JS(P||Q)</equation> in Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The Latex Code for equation Entire Space Multi-Task Model ESSMis <equation>L(\\\\theta_{cvr},\\\\theta_{ctr})=\\\\sum^{N}_{i=1}l(y_{i},f(x_{i};\\\\theta_{ctr}))+\\\\sum^{N}_{i=1}l(y_{i}\\\\&z_{i},f(x_{i};\\\\theta_{ctr}) \\\\times f(x_{i};\\\\theta_{cvr}))\\ </equation>", "tag": "machine learning,multi task", "prompt": "Can you write equation Entire Space Multi-Task Model ESSM?"}
{"completion": " The related materials of Conditional Random Field CRF is: ", "tag": "machine learning,nlp", "prompt": "Can you help me find related materials of Conditional Random Field CRF?"}
{"completion": " The equation for Area Under Uplift Curve AUUC is: <equation>f(t)=(\\\\frac{Y^{T}_{t}}{N^{T}_{t}} - \\\\frac{Y^{C}_{t}}{N^{C}_{t}})(N^{T}_{t}+N^{C}_{t})</equation>. Here is the brief introduction of Area Under Uplift Curve AUUC:\n Authors in this paper Causal Inference and Uplift Modeling A review of the literature define AUUC coefficient as the area under the uplift curve.\n ", "tag": "machine learning,causual inference", "prompt": "What's the meaning of Area Under Uplift Curve AUUC?"}
{"completion": " The Latex Code for equation Transformeris <equation>\\\\text{Attention}(Q, K, V) = \\\\text{softmax}(\\\\frac{QK^T}{\\\\sqrt{d_k}})V</equation>", "tag": "machine learning,nlp,gpt", "prompt": "How to write equation Transformer?"}
{"completion": " The equation for TransR is: <equation>h_{r}=hM_{r}, t_{r}=tM_{r} \\\\\\\\f_{r}(h, t) = ||h_{r} + r - t_{r}||^{2}_{2}=||hM_{r}+r-tM_{r}||^{2}_{2}</equation>. Here is the brief introduction of TransR:\n\n TransR model learns low-dimensional representations of entities and relations to relation space r, and multiple original entity embedding to the mapping matrix M. See paper Learning Entity and Relation Embeddings for Knowledge Graph Completion for more details.\n \nRelated Documents\n\n\">Learning Entity and Relation Embeddings for Knowledge Graph Completion\n\nRelated Videos\n\n\n", "tag": "machine learning,KG", "prompt": "Please give me detailed tutorial of TransR?"}
{"completion": " The Latex Code for equation Entire Space Multi-Task Model ESSMis <equation>L(\\\\theta_{cvr},\\\\theta_{ctr})=\\\\sum^{N}_{i=1}l(y_{i},f(x_{i};\\\\theta_{ctr}))+\\\\sum^{N}_{i=1}l(y_{i}\\\\&z_{i},f(x_{i};\\\\theta_{ctr}) \\\\times f(x_{i};\\\\theta_{cvr}))\\ </equation>", "tag": "machine learning,multi task", "prompt": "Please give me some hints on equation Entire Space Multi-Task Model ESSM?"}
{"completion": " The equation for Propensity Score is: <equation>e := p(W=1|X=x)</equation>. Here is the brief introduction of Propensity Score:\n The propensity score is defined as the degree of propensity or likelihood that instance i is assigned to treatment group W=1.\n ", "tag": "machine learning,causual inference", "prompt": "What's the meaning of Propensity Score?"}
{"completion": " The equation for KL-Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>Q(x)</equation> in KL-Divergence is: Distribution of Q(x) over x.", "tag": "machine learning", "prompt": "Can you explain the symbol <equation>Q(x)</equation> in KL-Divergence?"}
{"completion": " The equation for Kullback-Leibler Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>Q(x)</equation> in Kullback-Leibler Divergence is: Distribution of Q(x) over x.", "tag": "machine learning,kl divergence", "prompt": "Can you explain the symbol <equation>Q(x)</equation> in Kullback-Leibler Divergence?"}
{"completion": " The Latex Code for equation Graph Attention Network GATis <equation>h=\\\\{\\\\vec{h_{1}},\\\\vec{h_{2}},...,\\\\vec{h_{N}}\\\\}, \\\\\\\\\\ \\\\vec{h_{i}} \\\\in \\\\mathbb{R}^{F} \\\\\\\\\\ W \\\\in \\\\mathbb{R}^{F \\\\times F^{'}} \\\\\\\\\\ e_{ij}=a(Wh_{i},Wh_{j}) \\\\\\\\\\ k \\\\in \\\\mathcal{N}_{i},\\\\text{ neighbourhood nodes}\\\\\\\\\\ a_{ij}=\\\\text{softmax}_{j}(e_{ij})=\\\\frac{\\\\exp(e_{ij})}{\\\\sum_{k \\\\in \\\\mathcal{N}_{i}} \\\\exp(e_{ik})}</equation>", "tag": "machine learning,graph,GNN", "prompt": "Can you write equation Graph Attention Network GAT?"}
{"completion": " The equation for KL-Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>Q(x)</equation> in KL-Divergence is: Distribution of Q(x) over x.", "tag": "machine learning", "prompt": "What does the symbol mean <equation>Q(x)</equation> in KL-Divergence?"}
{"completion": " The equation for Model-Agnostic Meta-Learning MAML is: <equation>\\\\min_{\\\\theta} \\\\sum_{\\\\mathcal{T}_{i} \\\\sim p(\\\\mathcal{T})} \\\\mathcal{L}_{\\\\mathcal{T}_{i}}(f_{\\\\theta^{'}_{i}}) = \\\\sum_{\\\\mathcal{T}_{i} \\\\sim p(\\\\mathcal{T})} \\\\mathcal{L}_{\\\\mathcal{T}_{i}}(f_{\\\\theta_{i} - \\\\alpha \\\\nabla_{\\\\theta} \\\\mathcal{L}_{\\\\mathcal{T}_{i}} (f_{\\\\theta}) })</equation>. Here is the brief introduction of Model-Agnostic Meta-Learning MAML:\n\n\n\n \n\n\n", "tag": "machine learning,meta learning", "prompt": "Can you explain Model-Agnostic Meta-Learning MAML?"}
{"completion": " The equation for RotatE is: <equation>f_{r}(h, t) = ||h \\\\circ r - t||^{2}_{2}</equation>. Here is the brief introduction of RotatE:\n RotatE learns low-dimensional representations of entities and relations to relation space r, and multiple original entity embedding to the mapping matrix M. See paper RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space for more details.\n\n Related Documents\n\n\">RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space\n\nRelated Videos\n\n\n", "tag": "machine learning,KG", "prompt": "I am interested in RotatE?"}
{"completion": " The Latex Code for equation TransHis <equation>f_{r}(h,t) =||h_{\\\\perp} + d_{r} - t_{\\\\perp} ||^{2}_{2}=||(h - w_{r}hw_{r}) + d_{r} - (t - w_{r}tw_{r}) ||^{2}_{2}</equation>", "tag": "machine learning,KG", "prompt": "How to write equation TransH?"}
{"completion": " The related materials of Bellman Equation is: 1.:<url></url>;", "tag": "machine learning", "prompt": "Can you help me find related materials of Bellman Equation?"}
{"completion": " The equation for X-Learner is: <equation>\\\\tilde{D}^{1}_{i}:=Y^{1}_{i}-\\\\hat{\\\\mu}_{0}(X^{1}_{i}),\\\\tilde{D}^{0}_{i}:=\\\\hat{\\\\mu}_{1}(X^{0}_{i})-Y^{0}_{i}\\\\\\\\\\ \\\\hat{\\\\tau}(x)=g(x)\\\\hat{\\\\tau}_{0}(x) + (1-g(x))\\\\hat{\\\\tau}_{1}(x)</equation>. Here is the brief introduction of X-Learner:\n See this paper for more details of X-learner Metalearners for estimating heterogeneous treatment effects using machine learning\n", "tag": "machine learning,causual inference", "prompt": "What's the meaning of X-Learner?"}
{"completion": " The equation for Maximum Mean Discrepancy MMD is <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>. The meaning of symbol <equation>\\frac{1}{n}\\sum_{j=1}^{n}f(y_{j})</equation> in Maximum Mean Discrepancy MMD is: Mean of probability distribution Y with n data points..", "tag": "machine learning,mmd", "prompt": "What is the meaning of <equation>\\frac{1}{n}\\sum_{j=1}^{n}f(y_{j})</equation> in Maximum Mean Discrepancy MMD?"}
{"completion": " The Latex Code for equation Jensen-Shannon Divergence JS-Divergenceis <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>", "tag": "machine learning", "prompt": "What is the latex code of equation Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The equation for Language Modelling is <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>. The meaning of symbol <equation>(x_{1}, x_{2}, ..., x_{n})</equation> in Language Modelling is: Language modeling is usually framed as unsupervised distribution estimation from a set of examples of.", "tag": "machine learning", "prompt": "What is the meaning of the symbol <equation>(x_{1}, x_{2}, ..., x_{n})</equation> in Language Modelling?"}
{"completion": " The equation for Conditional Random Field CRF is: <equation>P(y|x)=\\\\frac{1}{Z(x)}\\\\exp(\\\\sum_{i,k}\\\\lambda_{k}t_{k}(y_{i-1},y_{i},x,i))+\\\\sum_{i,l}\\\\mu_{l}s_{l}(y_{i},x,i)) \\\\\\\\\\ Z(x)=\\\\sum_{y}\\\\exp(\\\\sum_{i,k}\\\\lambda_{k}t_{k}(y_{i-1},y_{i},x,i))+\\\\sum_{i,l}\\\\mu_{l}s_{l}(y_{i},x,i))</equation>. Here is the brief introduction of Conditional Random Field CRF:\n p(Y|x) denotes the linear chain Conditional Random Field(CRF). t_k denotes the function on the transition, s_l denote function on the node.\n lambda_k and mu_l denotes the weight coefficient.\n ", "tag": "machine learning,nlp", "prompt": "Can you give me a brief introduction to Conditional Random Field CRF?"}
{"completion": " The equation for QINI is: <equation>g(t)=Y^{T}_{t}-\\\\frac{Y^{C}_{t}N^{T}_{t}}{N^{C}_{t}},\\\\\\\\\\ f(t)=g(t) \\\\times \\\\frac{N^{T}_{t}+N^{C}_{t}}{N^{T}_{t}}</equation>. Here is the brief introduction of QINI:\n Author in this paper Using control groups to target on predicted lift: Building and assessing uplift model defines Qini coefficint as the area under the QINI curve, which is more suitable for the unbalanced samples size of control group and treatment group.\n ", "tag": "machine learning,causual inference", "prompt": "What's the meaning of QINI?"}
{"completion": " The equation for Transformer is: <equation>\\\\text{Attention}(Q, K, V) = \\\\text{softmax}(\\\\frac{QK^T}{\\\\sqrt{d_k}})V</equation>. Here is the brief introduction of Transformer:\nRelated Documents\n\nAttention Is All You Need\n\nRelated Videos\n\n\n", "tag": "machine learning,nlp,gpt", "prompt": "I am interested in Transformer?"}
{"completion": " The Latex Code for equation Wasserstein Distance Optimal Transportis <equation>W_{p}(P,Q)=(\\\\inf_{J \\\\in J(P,Q)} \\\\int{||x-y||^{p}dJ(X,Y)})^\\\\frac{1}{p}</equation>", "tag": "machine learning,wasserstein", "prompt": "Please give me some hints on equation Wasserstein Distance Optimal Transport?"}
{"completion": " The Latex Code for equation Diffusion Model Reverse Processis <equation>p_\\\\theta(\\\\mathbf{x}_{0:T}) = p(\\\\mathbf{x}_T) \\\\prod^T_{t=1} p_\\\\theta(\\\\mathbf{x}_{t-1} \\\\vert \\\\mathbf{x}_t) \\\\\\\\\\ p_\\\\theta(\\\\mathbf{x}_{t-1} \\\\vert \\\\mathbf{x}_t) = \\\\mathcal{N}(\\\\mathbf{x}_{t-1}; \\\\boldsymbol{\\\\mu}_\\\\theta(\\\\mathbf{x}_t, t), \\\\boldsymbol{\\\\Sigma}_\\\\theta(\\\\mathbf{x}_t, t))</equation>", "tag": "machine learning,diffusion", "prompt": "Can you write equation Diffusion Model Reverse Process?"}
{"completion": " The Latex Code for equation Model-Agnostic Meta-Learning MAMLis <equation>\\\\min_{\\\\theta} \\\\sum_{\\\\mathcal{T}_{i} \\\\sim p(\\\\mathcal{T})} \\\\mathcal{L}_{\\\\mathcal{T}_{i}}(f_{\\\\theta^{'}_{i}}) = \\\\sum_{\\\\mathcal{T}_{i} \\\\sim p(\\\\mathcal{T})} \\\\mathcal{L}_{\\\\mathcal{T}_{i}}(f_{\\\\theta_{i} - \\\\alpha \\\\nabla_{\\\\theta} \\\\mathcal{L}_{\\\\mathcal{T}_{i}} (f_{\\\\theta}) })</equation>", "tag": "machine learning,meta learning", "prompt": "Show me the latex code of equation Model-Agnostic Meta-Learning MAML?"}
{"completion": " The Latex Code for equation Prototypical Networks Protonetsis <equation>c_{k}=\\\\frac{1}{|S_{k}|}\\\\sum_{(x_{i},y_{i}) \\\\in S_{k}} f_{\\\\phi}(x) \\\\\\\\ p_{\\\\phi}(y=k|x)=\\\\frac{\\\\exp(-d(f_{\\\\phi}(x), c_{k}))}{\\\\sum_{k^{'}} \\\\exp(-d(f_{\\\\phi}(x), c_{k^{'}})} \\\\\\\\\\\\min J(\\\\phi)=-\\\\log p_{\\\\phi}(y=k|x)</equation>", "tag": "machine learning,meta learning", "prompt": "Please give me some hints on equation Prototypical Networks Protonets?"}
{"completion": " The equation for Hidden Markov Model is: <equation>Q=\\\\{q_{1},q_{2},...,q_{N}\\\\}, V=\\\\{v_{1},v_{2},...,v_{M}\\\\} \\\\\\\\\\ I=\\\\{i_{1},i_{2},...,i_{T}\\\\},O=\\\\{o_{1},o_{2},...,o_{T}\\\\} \\\\\\\\\\ A=[a_{ij}]_{N \\\\times N}, a_{ij}=P(i_{t+1}=q_{j}|i_{t}=q_{i}) \\\\\\\\\\ B=[b_{j}(k)]_{N \\\\times M},b_{j}(k)=P(o_{t}=v_{k}|i_{t}=q_{j})</equation>. Here is the brief introduction of Hidden Markov Model:\n Q denotes the set of states and V denotes the set of obvervations. Let's assume we have state sequence I of length T, and observation sequence O of length T, Hidden Markov Model(HMM) use transition matrix A to denote the transition probability a_{ij} and matrix B to denote observation probability matrix b_jk.\n ", "tag": "machine learning,nlp", "prompt": "What's the meaning of Hidden Markov Model?"}
{"completion": " The equation for Progressive Layered Extraction PLE is: <equation>g^{k}(x)=w^{k}(x)S^{k}(x) \\\\\\\\\\ w^{k}(x)=\\\\text{softmax}(W^{k}_{g}x) \\\\\\\\\\ S^{k}(x)=\\\\[E^{T}_{(k,1)},E^{T}_{(k,2)},...,E^{T}_{(k,m_{k})},E^{T}_{(s,1)},E^{T}_{(s,2)},...,E^{T}_{(s,m_{s})}\\\\]^{T} \\\\\\\\\\ y^{k}(x)=t^{k}(g^{k}(x)) \\\\\\\\\\ g^{k,j}(x)=w^{k,j}(g^{k,j-1}(x))S^{k,j}(x) </equation>. Here is the brief introduction of Progressive Layered Extraction PLE:\n\n Progressive Layered Extraction(PLE) model slightly modifies the original structure of MMoE models and explicitly separate the experts into shared experts and task-specific experts. Let's assume there are m_{s} shared experts and m_{t} tasks-specific experts. S^{k}(x) is a selected matrix composed of (m_{s} + m_{t}) D-dimensional vectors, with dimension as (m_{s} + m_{t}) \\times D. w^{k}(x) denotes the gating function with size (m_{s} + m_{t}) and W^{k}_{g} is a trainable parameters with dimension as (m_{s} + m_{t}) \\times D. t^{k} denotes the task-specific tower paratmeters. The progressive extraction layer means that the gating network g^{k,j}(x) of j-th extraction layer takes the output of previous gating layers g^{k,j-1}(x) as inputs.\n \nRelated Documents\n\nSee below link of paper Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations for more details. \n\nRelated Videos\n\n\n", "tag": "machine learning,multi task", "prompt": "Can you give me a brief introduction to Progressive Layered Extraction PLE?"}
{"completion": " The Latex Code for equation Diffusion Model Forward Processis <equation>q(x_{t}|x_{t-1})=\\\\mathcal{N}(x_{t};\\\\sqrt{1-\\\\beta_{t}}x_{t-1},\\\\beta_{t}I) \\\\\\\\q(x_{1:T}|x_{0})=\\\\prod_{t=1}^{T}q(x_{t}|x_{t-1})</equation>", "tag": "machine learning,diffusion", "prompt": "Can you write equation Diffusion Model Forward Process?"}
{"completion": " The equation for Support Vector Machine SVM is <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. The meaning of symbol <equation>wx+b=0</equation> in Support Vector Machine SVM is: Decision Boundary.", "tag": "machine learning,svm", "prompt": "What is the meaning of <equation>wx+b=0</equation> in Support Vector Machine SVM?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. The meaning of symbol <equation>Q(x)</equation> in Jensen-Shannon Divergence JS-Divergence is: Distribution of Q(x) over x.", "tag": "machine learning", "prompt": "What does the symbol mean <equation>Q(x)</equation> in Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The equation for SME Linear is: <equation>\\\\epsilon(lhs,rel,rhs)=E_{lhs(rel)}^{T}E_{rhs(rel)} \\\\\\\\=(W_{l1}E_{lhs}^{T} + W_{l2}E_{rel}^{T} + b_{l})^{T}(W_{r1}E_{rhs}^{T} + W_{r2}E_{rel}^{T} + b_{r})</equation>. Here is the brief introduction of SME Linear:\n\n The energy function E (denoted SME) is encoded using a neural network, whose architecture first processes each entity in parallel, like in siamese networks. The intuition is that the relation type should first be used to extract relevant components from each argument\u00e2\u20ac\u2122s embedding, and put them in a space where they can then be compared. See paper A Semantic Matching Energy Function for Learning with Multi-relational Data for more details.\n \nRelated Documents\n\n\">A Semantic Matching Energy Function for Learning with Multi-relational Data\n\nRelated Videos\n\n\n", "tag": "machine learning,KG", "prompt": "Can you give me a brief introduction to SME Linear?"}
{"completion": " The Latex Code for equation Language Modellingis <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>", "tag": "machine learning", "prompt": "What is the latex code of equation Language Modelling?"}
{"completion": " The equation for TransR is: <equation>h_{r}=hM_{r}, t_{r}=tM_{r} \\\\\\\\f_{r}(h, t) = ||h_{r} + r - t_{r}||^{2}_{2}=||hM_{r}+r-tM_{r}||^{2}_{2}</equation>. Here is the brief introduction of TransR:\n\n TransR model learns low-dimensional representations of entities and relations to relation space r, and multiple original entity embedding to the mapping matrix M. See paper Learning Entity and Relation Embeddings for Knowledge Graph Completion for more details.\n \nRelated Documents\n\n\">Learning Entity and Relation Embeddings for Knowledge Graph Completion\n\nRelated Videos\n\n\n", "tag": "machine learning,KG", "prompt": "Can you explain TransR?"}
{"completion": " The Latex Code for equation Conditional Average Treatment Effect CATEis <equation>\\\\tau(x):=\\\\mathbb{E}[Y(1)-Y(0)|X=x]</equation>", "tag": "machine learning,causual inference", "prompt": "What is the latex code of equation Conditional Average Treatment Effect CATE?"}
{"completion": " The Latex Code for equation Average Treatment Effect ATEis <equation>\\\\text{ATE}:=\\\\mathbb{E}[Y(1)-Y(0)]</equation>", "tag": "machine learning,causual inference", "prompt": "Can you write equation Average Treatment Effect ATE?"}
{"completion": " The equation for Conditional Average Treatment Effect CATE is: <equation>\\\\tau(x):=\\\\mathbb{E}[Y(1)-Y(0)|X=x]</equation>. Here is the brief introduction of Conditional Average Treatment Effect CATE:\n Since we can't observe ITE of item i directly, most causal inference models estimate the conditional average treatment effect(CATE) conditioned on item i (X=x_{i}).\n ", "tag": "machine learning,causual inference", "prompt": "Please give me detailed tutorial of Conditional Average Treatment Effect CATE?"}
{"completion": " The Latex Code for equation Diffusion Model Forward Processis <equation>q(x_{t}|x_{t-1})=\\\\mathcal{N}(x_{t};\\\\sqrt{1-\\\\beta_{t}}x_{t-1},\\\\beta_{t}I) \\\\\\\\q(x_{1:T}|x_{0})=\\\\prod_{t=1}^{T}q(x_{t}|x_{t-1})</equation>", "tag": "machine learning,diffusion", "prompt": "What is the equation of Diffusion Model Forward Process?"}
{"completion": " The equation for Entire Space Multi-Task Model ESSM is: <equation>L(\\\\theta_{cvr},\\\\theta_{ctr})=\\\\sum^{N}_{i=1}l(y_{i},f(x_{i};\\\\theta_{ctr}))+\\\\sum^{N}_{i=1}l(y_{i}\\\\&z_{i},f(x_{i};\\\\theta_{ctr}) \\\\times f(x_{i};\\\\theta_{cvr}))\\ </equation>. Here is the brief introduction of Entire Space Multi-Task Model ESSM:\n ESSM model uses two separate towers to model pCTR prediction task and pCTCVR prediction task simultaneously. \n Related Documents\n\nSee below link of paper Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate for more details. \n\nRelated Videos\n\n\n", "tag": "machine learning,multi task", "prompt": "Can you explain Entire Space Multi-Task Model ESSM?"}
{"completion": " The equation for TransH is: <equation>f_{r}(h,t) =||h_{\\\\perp} + d_{r} - t_{\\\\perp} ||^{2}_{2}=||(h - w_{r}hw_{r}) + d_{r} - (t - w_{r}tw_{r}) ||^{2}_{2}</equation>. Here is the brief introduction of TransH:\n TransH model learns low-dimensional representations of knowledge graphs triples on the hyperplane of the entities and relations. See paper Knowledge Graph Embedding by Translating on Hyperplanes for more details.\n Related Documents\n\nKnowledge Graph Embedding by Translating on Hyperplanes\n\nRelated Videos\n\n\n", "tag": "machine learning,KG", "prompt": "Can you give me a brief introduction to TransH?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>v_{\\pi}(s^{'})</equation> in Bellman Equation is: Value at state s^{'} in policy \\pi.", "tag": "machine learning", "prompt": "What is the meaning of the symbol <equation>v_{\\pi}(s^{'})</equation> in Bellman Equation?"}
{"completion": " The equation for Graph Laplacian is: <equation>L=I_{N}-D^{-\\\\frac{1}{2}}AD^{-\\\\frac{1}{2}} \\\\\\\\\\ L=U\\\\Lambda U^{T}</equation>. Here is the brief introduction of Graph Laplacian:\n\n Graph Laplacian matrix equals to the identity matrix I minus the matrix multiplication of three parts, the (-1/2) power of the degree matrix D, the adjacency matrix A, and (-1/2) power of degree matrix D. U is the eigenvectors of the normalized graph Laplacian L. The graph laplacian come from the graph Fourier transform F. The original signal x is first transformed to domain F(X) and inverse resulted signal is transformed back using the inverse graph Fourier transform F^{-1}.\n \nRelated Documents\n\n\">Graphs and Graph Laplacians- Upenn Tutorial\n\">Graph Laplacian matrix- MathWorks\n\nRelated Videos\n\n\n", "tag": "machine learning,graph,GNN", "prompt": "Can you explain Graph Laplacian?"}
{"completion": " The equation for TransR is: <equation>h_{r}=hM_{r}, t_{r}=tM_{r} \\\\\\\\f_{r}(h, t) = ||h_{r} + r - t_{r}||^{2}_{2}=||hM_{r}+r-tM_{r}||^{2}_{2}</equation>. Here is the brief introduction of TransR:\n\n TransR model learns low-dimensional representations of entities and relations to relation space r, and multiple original entity embedding to the mapping matrix M. See paper Learning Entity and Relation Embeddings for Knowledge Graph Completion for more details.\n \nRelated Documents\n\n\">Learning Entity and Relation Embeddings for Knowledge Graph Completion\n\nRelated Videos\n\n\n", "tag": "machine learning,KG", "prompt": "I am interested in TransR?"}
{"completion": " The Latex Code for equation Individual Treatment Effect ITEis <equation>\\\\text{ITE}_{i}:=Y_{i}(1)-Y_{i}(0)</equation>", "tag": "machine learning,causual inference", "prompt": "What is the latex code of equation Individual Treatment Effect ITE?"}
{"completion": " The Latex Code for equation Conditional Average Treatment Effect CATEis <equation>\\\\tau(x):=\\\\mathbb{E}[Y(1)-Y(0)|X=x]</equation>", "tag": "machine learning,causual inference", "prompt": "Show me the latex code of equation Conditional Average Treatment Effect CATE?"}
{"completion": " The Latex Code for equation Unconfoundedness Assumptionis <equation>\\\\{Y_{i}(0),Y_{i}(1)\\\\}\\\\perp W_{i}|X_{i}</equation>", "tag": "machine learning,causual inference", "prompt": "Please give me some hints on equation Unconfoundedness Assumption?"}
{"completion": " The Latex Code for equation Deep Kernel Learningis <equation>k(x_{i},x_{j}|\\\\phi)=k(h(x_i,w_k),h(x_j,w_k)|w_k,\\\\phi)</equation>", "tag": "machine learning,Deep Kernel Learning", "prompt": "Please give me some hints on equation Deep Kernel Learning?"}
{"completion": " The Latex Code for equation Domain-Adversarial Neural Networks DANNis <equation>\\\\min [\\\\frac{1}{m}\\\\sum^{m}_{1}\\\\mathcal{L}(f(\\\\textbf{x}^{s}_{i}),y_{i})+\\\\lambda \\\\max(-\\\\frac{1}{m}\\\\sum^{m}_{i=1}\\\\mathcal{L}^{d}(o(\\\\textbf{x}^{s}_{i}),1)-\\\\frac{1}{m^{'}}\\\\sum^{m^{'}}_{i=1}\\\\mathcal{L}^{d}(o(\\\\textbf{x}^{t}_{i}),0))]</equation>", "tag": "machine learning,transfer learning", "prompt": "What is the latex code of equation Domain-Adversarial Neural Networks DANN?"}
{"completion": " The related materials of Transformer is: 1.Attention Is All You Need:<url>https://arxiv.org/abs/1706.03762</url>;", "tag": "machine learning,nlp,gpt", "prompt": "Can you help me find related materials of Transformer?"}
{"completion": " The equation for Support Vector Machine SVM is <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. The meaning of symbol <equation>y=wx+b</equation> in Support Vector Machine SVM is: Classifier.", "tag": "machine learning,svm", "prompt": "What is the meaning of the symbol <equation>y=wx+b</equation> in Support Vector Machine SVM?"}
{"completion": " The Latex Code for equation Mahalanobis Distanceis <equation>D_{M}(x,y)=\\\\sqrt{(x-y)^{T}\\\\Sigma^{-1}(x-y)}</equation>", "tag": "machine learning,mahalanobis", "prompt": "Show me the latex code of equation Mahalanobis Distance?"}
{"completion": " The Latex Code for equation X-Learneris <equation>\\\\tilde{D}^{1}_{i}:=Y^{1}_{i}-\\\\hat{\\\\mu}_{0}(X^{1}_{i}),\\\\tilde{D}^{0}_{i}:=\\\\hat{\\\\mu}_{1}(X^{0}_{i})-Y^{0}_{i}\\\\\\\\\\ \\\\hat{\\\\tau}(x)=g(x)\\\\hat{\\\\tau}_{0}(x) + (1-g(x))\\\\hat{\\\\tau}_{1}(x)</equation>", "tag": "machine learning,causual inference", "prompt": "Can you write equation X-Learner?"}
{"completion": " The equation for Area Under Uplift Curve AUUC is: <equation>f(t)=(\\\\frac{Y^{T}_{t}}{N^{T}_{t}} - \\\\frac{Y^{C}_{t}}{N^{C}_{t}})(N^{T}_{t}+N^{C}_{t})</equation>. Here is the brief introduction of Area Under Uplift Curve AUUC:\n Authors in this paper Causal Inference and Uplift Modeling A review of the literature define AUUC coefficient as the area under the uplift curve.\n ", "tag": "machine learning,causual inference", "prompt": "Can you explain Area Under Uplift Curve AUUC?"}
{"completion": " The Latex Code for equation Multi-Gate Mixture of Experts MMoEis <equation>g^{k}(x)=\\\\text{softmax}(W_{gk}x) \\\\\\\\\\ f^{k}(x)=\\\\sum^{n}_{i=1}g^{k}(x)_{i}f_{i}(x) \\\\\\\\\\ y_{k}=h^{k}(f^{k}(x))</equation>", "tag": "machine learning,multi task", "prompt": "Show me the latex code of equation Multi-Gate Mixture of Experts MMoE?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is: <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. Here is the brief introduction of Jensen-Shannon Divergence JS-Divergence:\n Latex code for the Jensen-Shannon Divergence(JS-Divergence). I will briefly introduce the notations in this formulation. \n \n: KL Divergence between P and Q\n: JS Divergence between P and Q, which is the symmetric divergence metric between distribution P and Q\n: Distribution of P(x) over x\n: Distribution of Q(x) over x\n\n\nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning", "prompt": "Can you give me a brief introduction to Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The Latex Code for equation RotatEis <equation>f_{r}(h, t) = ||h \\\\circ r - t||^{2}_{2}</equation>", "tag": "machine learning,KG", "prompt": "How to write equation RotatE?"}
{"completion": " The related materials of TransE is: 1.Translating Embeddings for Modeling Multi-relational Data:<url>https://dl.acm.org/doi/10.5555/2999792.2999923</url>;", "tag": "machine learning", "prompt": "Can you help me find related materials of TransE?"}
{"completion": " The Latex Code for equation Cross-Stitch Networkis <equation>\\\\begin{bmatrix} \\\\tilde{x}^{ij}_{A}\\\\\\\\\\\\tilde{x}^{ij}_{B}\\\\end{bmatrix}=\\\\begin{bmatrix} a_{AA} & a_{AB}\\\\\\\\ a_{BA} & a_{BB} \\\\end{bmatrix}\\\\begin{bmatrix} x^{ij}_{A}\\\\\\\\ x^{ij}_{B} \\\\end{bmatrix}</equation>", "tag": "machine learning,multi task", "prompt": "Please give me some hints on equation Cross-Stitch Network?"}
{"completion": " The equation for Support Vector Machine SVM is: <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. Here is the brief introduction of Support Vector Machine SVM:\n Latex code for Support Vector Machine (SVM).\n \n: Dual problem Lagrangian Relaxation\n: Weight of Linear Classifier\n: Classifier\n: Decision Boundary\n\nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning,svm", "prompt": "I am interested in Support Vector Machine SVM?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. The meaning of symbol <equation>P(x)</equation> in Jensen-Shannon Divergence JS-Divergence is: Distribution of P(x) over x.", "tag": "machine learning", "prompt": "What is the meaning of the symbol <equation>P(x)</equation> in Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The equation for QINI is: <equation>g(t)=Y^{T}_{t}-\\\\frac{Y^{C}_{t}N^{T}_{t}}{N^{C}_{t}},\\\\\\\\\\ f(t)=g(t) \\\\times \\\\frac{N^{T}_{t}+N^{C}_{t}}{N^{T}_{t}}</equation>. Here is the brief introduction of QINI:\n Author in this paper Using control groups to target on predicted lift: Building and assessing uplift model defines Qini coefficint as the area under the QINI curve, which is more suitable for the unbalanced samples size of control group and treatment group.\n ", "tag": "machine learning,causual inference", "prompt": "Please give me detailed tutorial of QINI?"}
{"completion": " The equation for KL-Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>P(x)</equation> in KL-Divergence is: Distribution of P(x) over x.", "tag": "machine learning", "prompt": "Can you explain the symbol <equation>P(x)</equation> in KL-Divergence?"}
{"completion": " The equation for Model-Agnostic Meta-Learning MAML is: <equation>\\\\min_{\\\\theta} \\\\sum_{\\\\mathcal{T}_{i} \\\\sim p(\\\\mathcal{T})} \\\\mathcal{L}_{\\\\mathcal{T}_{i}}(f_{\\\\theta^{'}_{i}}) = \\\\sum_{\\\\mathcal{T}_{i} \\\\sim p(\\\\mathcal{T})} \\\\mathcal{L}_{\\\\mathcal{T}_{i}}(f_{\\\\theta_{i} - \\\\alpha \\\\nabla_{\\\\theta} \\\\mathcal{L}_{\\\\mathcal{T}_{i}} (f_{\\\\theta}) })</equation>. Here is the brief introduction of Model-Agnostic Meta-Learning MAML:\n\n\n\n \n\n\n", "tag": "machine learning,meta learning", "prompt": "I am interested in Model-Agnostic Meta-Learning MAML?"}
{"completion": " The related materials of GraphSage is: 1.\">Inductive Representation Learning on Large Graphs:<url>https://arxiv.org/pdf/1706.02216.pdf</url>;", "tag": "machine learning,graph,GNN", "prompt": "Can you help me find related materials of GraphSage?"}
{"completion": " The equation for Maximum Mean Discrepancy MMD is <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>. The meaning of symbol <equation>\\textup{MMD}(\\mathbb{F},X,Y)</equation> in Maximum Mean Discrepancy MMD is: Superior of the discrepancy measure between two distribution..", "tag": "machine learning,mmd", "prompt": "What is the meaning of <equation>\\textup{MMD}(\\mathbb{F},X,Y)</equation> in Maximum Mean Discrepancy MMD?"}
{"completion": " The Latex Code for equation GraphSageis <equation>h^{0}_{v} \\\\leftarrow x_{v} \\\\\\\\\\ \\\\textbf{for} k \\\\in \\\\{1,2,...,K\\\\} \\\\text{do}\\\\\\\\\\ \\\\textbf{for} v \\\\in V \\\\text{do} \\\\\\\\\\ h^{k}_{N_{v}} \\\\leftarrow \\\\textbf{AGGREGATE}_{k}(h^{k-1}_{u}, u \\\\in N(v)); \\\\\\\\\\ h^{k}_{v} \\\\leftarrow \\\\sigma (W^{k} \\\\textbf{concat}(h^{k-1}_{v},h^{k}_{N_{v}})) \\\\\\\\\\ \\\\textbf{end} \\\\\\\\\\ h^{k}_{v}=h^{k}_{v}/||h^{k}_{v}||_{2},\\\\forall v \\\\in V \\\\\\\\\\ \\\\textbf{end} \\\\\\\\\\ z_{v} \\\\leftarrow h^{k}_{v} \\\\\\\\\\ J_{\\\\textbf{z}_{u}}=-\\\\log (\\\\sigma (\\\\textbf{z}_{u}^{T}\\\\textbf{z}_{v})) - Q \\\\mathbb{E}_{v_{n} \\\\sim p_n(v)} \\\\log(\\\\sigma (-\\\\textbf{z}_{u}^{T}\\\\textbf{z}_{v_{n}}))</equation>", "tag": "machine learning,graph,GNN", "prompt": "Show me the latex code of equation GraphSage?"}
{"completion": " The equation for Kullback-Leibler Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>P(x)</equation> in Kullback-Leibler Divergence is: Distribution of P(x) over x.", "tag": "machine learning,kl divergence", "prompt": "What does the symbol mean <equation>P(x)</equation> in Kullback-Leibler Divergence?"}
{"completion": " The Latex Code for equation RotatEis <equation>f_{r}(h, t) = ||h \\\\circ r - t||^{2}_{2}</equation>", "tag": "machine learning,KG", "prompt": "Show me the latex code of equation RotatE?"}
{"completion": " The equation for Domain Adaptation H-Divergence is: <equation>d_{\\\\mathcal{H}}(\\\\mathcal{D},\\\\mathcal{D}^{'})=2\\\\sup_{h \\\\in \\\\mathcal{H}}|\\\\Pr_{\\\\mathcal{D}}[I(h)]-\\\\Pr_{\\\\mathcal{D}^{'}}[I(h)]|</equation>. Here is the brief introduction of Domain Adaptation H-Divergence:\n\n The H-Divergence is defined as the superior of divengence between two probability Pr(D) and Pr(D^{'}) for any hypothesis h in all hypotheses class H. In this formulation, given domain X with two data distribution D and D^{'} over X, I(h) denotes the characteristic function(indicator function) on X, which means that for subset of x in I(h), h(x) = 1. You can check more detailed information of domain adaptation and H-divergence in this paper by Shai Ben-David, A theory of learning from different domains for more details.\n \nRelated Documents\n\nA theory of learning from different domains\n\nRelated Videos\n\n\n", "tag": "machine learning,transfer learning", "prompt": "What's the meaning of Domain Adaptation H-Divergence?"}
{"completion": " The related materials of Progressive Layered Extraction PLE is: 1.Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations:<url>https://dl.acm.org/doi/pdf/10.1145/3383313.3412236</url>;", "tag": "machine learning,multi task", "prompt": "Can you help me find related materials of Progressive Layered Extraction PLE?"}
{"completion": " The equation for Propensity Score is: <equation>e := p(W=1|X=x)</equation>. Here is the brief introduction of Propensity Score:\n The propensity score is defined as the degree of propensity or likelihood that instance i is assigned to treatment group W=1.\n ", "tag": "machine learning,causual inference", "prompt": "Please give me detailed tutorial of Propensity Score?"}
{"completion": " The Latex Code for equation TransEis <equation>\\\\mathcal{L}=\\\\sum_{(h,r,t) \\\\in S} \\\\sum_{(h^{'},r^{'},t^{'}) \\\\in S^{'}_{(h,r,t)}} \\\\[ \\\\gamma + d(h + r, t) - d(h^{'} + r^{'}, t^{'}) \\\\]_{+} \\\\\\\\ S^{'}_{(h,r,t)}=\\\\{(h^{'},r,t)|h^{'} \\\\in E\\\\} \\\\cup \\\\{(h,r,t^{'})|t^{'} \\\\in E\\\\} \\\\\\\\ d(h + r, t)=||h + r - t||^{2}_{2}</equation>", "tag": "machine learning", "prompt": "Show me the latex code of equation TransE?"}
{"completion": " The Latex Code for equation Graph Laplacianis <equation>L=I_{N}-D^{-\\\\frac{1}{2}}AD^{-\\\\frac{1}{2}} \\\\\\\\\\ L=U\\\\Lambda U^{T}</equation>", "tag": "machine learning,graph,GNN", "prompt": "What is the latex code of equation Graph Laplacian?"}
{"completion": " The equation for Diffusion Model Forward Process is: <equation>q(x_{t}|x_{t-1})=\\\\mathcal{N}(x_{t};\\\\sqrt{1-\\\\beta_{t}}x_{t-1},\\\\beta_{t}I) \\\\\\\\q(x_{1:T}|x_{0})=\\\\prod_{t=1}^{T}q(x_{t}|x_{t-1})</equation>. Here is the brief introduction of Diffusion Model Forward Process:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "Can you explain Diffusion Model Forward Process?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>r</equation> in Bellman Equation is: Reward at state s.", "tag": "machine learning", "prompt": "Can you explain the symbol <equation>r</equation> in Bellman Equation?"}
{"completion": " The Latex Code for equation Kullback-Leibler Divergenceis <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>", "tag": "machine learning,kl divergence", "prompt": "What is the latex code of equation Kullback-Leibler Divergence?"}
{"completion": " The equation for Entire Space Multi-Task Model ESSM is: <equation>L(\\\\theta_{cvr},\\\\theta_{ctr})=\\\\sum^{N}_{i=1}l(y_{i},f(x_{i};\\\\theta_{ctr}))+\\\\sum^{N}_{i=1}l(y_{i}\\\\&z_{i},f(x_{i};\\\\theta_{ctr}) \\\\times f(x_{i};\\\\theta_{cvr}))\\ </equation>. Here is the brief introduction of Entire Space Multi-Task Model ESSM:\n ESSM model uses two separate towers to model pCTR prediction task and pCTCVR prediction task simultaneously. \n Related Documents\n\nSee below link of paper Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate for more details. \n\nRelated Videos\n\n\n", "tag": "machine learning,multi task", "prompt": "Can you give me a brief introduction to Entire Space Multi-Task Model ESSM?"}
{"completion": " The Latex Code for equation Graph Laplacianis <equation>L=I_{N}-D^{-\\\\frac{1}{2}}AD^{-\\\\frac{1}{2}} \\\\\\\\\\ L=U\\\\Lambda U^{T}</equation>", "tag": "machine learning,graph,GNN", "prompt": "Can you write equation Graph Laplacian?"}
{"completion": " The Latex Code for equation Domain-Adversarial Neural Networks DANNis <equation>\\\\min [\\\\frac{1}{m}\\\\sum^{m}_{1}\\\\mathcal{L}(f(\\\\textbf{x}^{s}_{i}),y_{i})+\\\\lambda \\\\max(-\\\\frac{1}{m}\\\\sum^{m}_{i=1}\\\\mathcal{L}^{d}(o(\\\\textbf{x}^{s}_{i}),1)-\\\\frac{1}{m^{'}}\\\\sum^{m^{'}}_{i=1}\\\\mathcal{L}^{d}(o(\\\\textbf{x}^{t}_{i}),0))]</equation>", "tag": "machine learning,transfer learning", "prompt": "Show me the latex code of equation Domain-Adversarial Neural Networks DANN?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>v_{\\pi}(s^{'})</equation> in Bellman Equation is: Value at state s^{'} in policy \\pi.", "tag": "machine learning", "prompt": "Help me find the meaning of symbol <equation>v_{\\pi}(s^{'})</equation> in Bellman Equation?"}
{"completion": " The Latex Code for equation Graph Laplacianis <equation>L=I_{N}-D^{-\\\\frac{1}{2}}AD^{-\\\\frac{1}{2}} \\\\\\\\\\ L=U\\\\Lambda U^{T}</equation>", "tag": "machine learning,graph,GNN", "prompt": "How to write equation Graph Laplacian?"}
{"completion": " The Latex Code for equation Maximum Mean Discrepancy MMDis <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>", "tag": "machine learning,mmd", "prompt": "What is the equation of Maximum Mean Discrepancy MMD?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. The meaning of symbol <equation>Q(x)</equation> in Jensen-Shannon Divergence JS-Divergence is: Distribution of Q(x) over x.", "tag": "machine learning", "prompt": "What is the meaning of the symbol <equation>Q(x)</equation> in Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The equation for Support Vector Machine SVM is <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. The meaning of symbol <equation>L(w,b,\\alpha)</equation> in Support Vector Machine SVM is: Dual problem Lagrangian Relaxation.", "tag": "machine learning,svm", "prompt": "What is the meaning of <equation>L(w,b,\\alpha)</equation> in Support Vector Machine SVM?"}
{"completion": " The Latex Code for equation Entire Space Multi-Task Model ESSMis <equation>L(\\\\theta_{cvr},\\\\theta_{ctr})=\\\\sum^{N}_{i=1}l(y_{i},f(x_{i};\\\\theta_{ctr}))+\\\\sum^{N}_{i=1}l(y_{i}\\\\&z_{i},f(x_{i};\\\\theta_{ctr}) \\\\times f(x_{i};\\\\theta_{cvr}))\\ </equation>", "tag": "machine learning,multi task", "prompt": "Show me the latex code of equation Entire Space Multi-Task Model ESSM?"}
{"completion": " The Latex Code for equation Domain Adaptation H-Divergenceis <equation>d_{\\\\mathcal{H}}(\\\\mathcal{D},\\\\mathcal{D}^{'})=2\\\\sup_{h \\\\in \\\\mathcal{H}}|\\\\Pr_{\\\\mathcal{D}}[I(h)]-\\\\Pr_{\\\\mathcal{D}^{'}}[I(h)]|</equation>", "tag": "machine learning,transfer learning", "prompt": "Can you write equation Domain Adaptation H-Divergence?"}
{"completion": " The Latex Code for equation Progressive Layered Extraction PLEis <equation>g^{k}(x)=w^{k}(x)S^{k}(x) \\\\\\\\\\ w^{k}(x)=\\\\text{softmax}(W^{k}_{g}x) \\\\\\\\\\ S^{k}(x)=\\\\[E^{T}_{(k,1)},E^{T}_{(k,2)},...,E^{T}_{(k,m_{k})},E^{T}_{(s,1)},E^{T}_{(s,2)},...,E^{T}_{(s,m_{s})}\\\\]^{T} \\\\\\\\\\ y^{k}(x)=t^{k}(g^{k}(x)) \\\\\\\\\\ g^{k,j}(x)=w^{k,j}(g^{k,j-1}(x))S^{k,j}(x) </equation>", "tag": "machine learning,multi task", "prompt": "Please give me some hints on equation Progressive Layered Extraction PLE?"}
{"completion": " The equation for Language Modelling is <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>. The meaning of symbol <equation>p(x)</equation> in Language Modelling is: \n : Factorize the joint probabilities over symbols p(x) as the product of conditional probabilities .", "tag": "machine learning", "prompt": "What does the symbol mean <equation>p(x)</equation> in Language Modelling?"}
{"completion": " The equation for Transformer is: <equation>\\\\text{Attention}(Q, K, V) = \\\\text{softmax}(\\\\frac{QK^T}{\\\\sqrt{d_k}})V</equation>. Here is the brief introduction of Transformer:\nRelated Documents\n\nAttention Is All You Need\n\nRelated Videos\n\n\n", "tag": "machine learning,nlp,gpt", "prompt": "Can you explain Transformer?"}
{"completion": " The equation for Kullback-Leibler Divergence is: <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. Here is the brief introduction of Kullback-Leibler Divergence:\n Latex code for the Kullback-Leibler Divergence. I will briefly introduce the notations in this formulation. \n \n: KL Divergence between P and Q\n: Distribution of P(x) over x\n: Distribution of Q(x) over x\n\n\nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning,kl divergence", "prompt": "Please give me detailed tutorial of Kullback-Leibler Divergence?"}
{"completion": " The Latex Code for equation TransHis <equation>f_{r}(h,t) =||h_{\\\\perp} + d_{r} - t_{\\\\perp} ||^{2}_{2}=||(h - w_{r}hw_{r}) + d_{r} - (t - w_{r}tw_{r}) ||^{2}_{2}</equation>", "tag": "machine learning,KG", "prompt": "Can you write equation TransH?"}
{"completion": " The Latex Code for equation Area Under Uplift Curve AUUCis <equation>f(t)=(\\\\frac{Y^{T}_{t}}{N^{T}_{t}} - \\\\frac{Y^{C}_{t}}{N^{C}_{t}})(N^{T}_{t}+N^{C}_{t})</equation>", "tag": "machine learning,causual inference", "prompt": "Can you write equation Area Under Uplift Curve AUUC?"}
{"completion": " The Latex Code for equation SimplEis <equation>s(e_{i}, r, e_{j}) = \\\\frac{1}{2}(<h_{e_{i}}, v_{r}, t_{e_{j}}> + <h_{e_{j}}, v_{r^{-}}, t_{e_{i}}>)</equation>", "tag": "machine learning,KG", "prompt": "What is the latex code of equation SimplE?"}
{"completion": " The equation for Maximum Mean Discrepancy MMD is <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>. The meaning of symbol <equation>\\frac{1}{m}\\sum_{i=1}^{m}f(x_{i})</equation> in Maximum Mean Discrepancy MMD is: Mean of probability distribution X with m data points..", "tag": "machine learning,mmd", "prompt": "Help me find the meaning of symbol <equation>\\frac{1}{m}\\sum_{i=1}^{m}f(x_{i})</equation> in Maximum Mean Discrepancy MMD?"}
{"completion": " The Latex Code for equation Bellman Equationis <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>", "tag": "machine learning", "prompt": "Please give me some hints on equation Bellman Equation?"}
{"completion": " The Latex Code for equation TransHis <equation>f_{r}(h,t) =||h_{\\\\perp} + d_{r} - t_{\\\\perp} ||^{2}_{2}=||(h - w_{r}hw_{r}) + d_{r} - (t - w_{r}tw_{r}) ||^{2}_{2}</equation>", "tag": "machine learning,KG", "prompt": "Please give me some hints on equation TransH?"}
{"completion": " The related materials of Kullback-Leibler Divergence is: ", "tag": "machine learning,kl divergence", "prompt": "Can you help me find related materials of Kullback-Leibler Divergence?"}
{"completion": " The Latex Code for equation Individual Treatment Effect ITEis <equation>\\\\text{ITE}_{i}:=Y_{i}(1)-Y_{i}(0)</equation>", "tag": "machine learning,causual inference", "prompt": "Please give me some hints on equation Individual Treatment Effect ITE?"}
{"completion": " The equation for Support Vector Machine SVM is <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. The meaning of symbol <equation>L(w,b,\\alpha)</equation> in Support Vector Machine SVM is: Dual problem Lagrangian Relaxation.", "tag": "machine learning,svm", "prompt": "What does the symbol mean <equation>L(w,b,\\alpha)</equation> in Support Vector Machine SVM?"}
{"completion": " The equation for Domain-Adversarial Neural Networks DANN is: <equation>\\\\min [\\\\frac{1}{m}\\\\sum^{m}_{1}\\\\mathcal{L}(f(\\\\textbf{x}^{s}_{i}),y_{i})+\\\\lambda \\\\max(-\\\\frac{1}{m}\\\\sum^{m}_{i=1}\\\\mathcal{L}^{d}(o(\\\\textbf{x}^{s}_{i}),1)-\\\\frac{1}{m^{'}}\\\\sum^{m^{'}}_{i=1}\\\\mathcal{L}^{d}(o(\\\\textbf{x}^{t}_{i}),0))]</equation>. Here is the brief introduction of Domain-Adversarial Neural Networks DANN:\n\n In this formulation of Domain-Adversarial Neural Networks(DANN), authors add a domain adaptation regularizer term to the original loss function of source domain. The domain adaptation regularizer term are calculated based on the H-divergence of two distributions h(X_{S}) and h(X_{T}). The adversial network aims to maximize the likelihood that the domain classifier are unable to distingush a data point belongs to source domain S or target domain T. Function o(.) is the domain regressor which learns high level representation o(X) given input X. You can check more detailed information in this paper by Hana Ajakan, Pascal Germain, et al., Domain-Adversarial Neural Networks for more details.\n \nRelated Documents\n\nDomain-Adversarial Neural Networks\n\nRelated Videos\n\n\n", "tag": "machine learning,transfer learning", "prompt": "Can you give me a brief introduction to Domain-Adversarial Neural Networks DANN?"}
{"completion": " The Latex Code for equation T-Learneris <equation>\\\\mu_{0}(x)=\\\\mathbb{E}[Y(0)|X=x],\\\\mu_{1}(x)=\\\\mathbb{E}[Y(1)|X=x],\\\\\\\\\\ \\\\hat{\\\\tau}(x)=\\\\hat{\\\\mu}_{1}(x)-\\\\hat{\\\\mu}_{0}(x)</equation>", "tag": "machine learning,causual inference", "prompt": "Show me the latex code of equation T-Learner?"}
{"completion": " The Latex Code for equation Average Treatment Effect ATEis <equation>\\\\text{ATE}:=\\\\mathbb{E}[Y(1)-Y(0)]</equation>", "tag": "machine learning,causual inference", "prompt": "What is the latex code of equation Average Treatment Effect ATE?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>v_{\\pi}(s)</equation> in Bellman Equation is: Value at state s in policy \\pi.", "tag": "machine learning", "prompt": "What does the symbol mean <equation>v_{\\pi}(s)</equation> in Bellman Equation?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. The meaning of symbol <equation>Q(x)</equation> in Jensen-Shannon Divergence JS-Divergence is: Distribution of Q(x) over x.", "tag": "machine learning", "prompt": "Can you explain the symbol <equation>Q(x)</equation> in Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The Latex Code for equation Individual Treatment Effect ITEis <equation>\\\\text{ITE}_{i}:=Y_{i}(1)-Y_{i}(0)</equation>", "tag": "machine learning,causual inference", "prompt": "How to write equation Individual Treatment Effect ITE?"}
{"completion": " The Latex Code for equation Wasserstein Distance Optimal Transportis <equation>W_{p}(P,Q)=(\\\\inf_{J \\\\in J(P,Q)} \\\\int{||x-y||^{p}dJ(X,Y)})^\\\\frac{1}{p}</equation>", "tag": "machine learning,wasserstein", "prompt": "What is the equation of Wasserstein Distance Optimal Transport?"}
{"completion": " The related materials of QINI is: ", "tag": "machine learning,causual inference", "prompt": "Can you help me find related materials of QINI?"}
{"completion": " The Latex Code for equation Prototypical Networks Protonetsis <equation>c_{k}=\\\\frac{1}{|S_{k}|}\\\\sum_{(x_{i},y_{i}) \\\\in S_{k}} f_{\\\\phi}(x) \\\\\\\\ p_{\\\\phi}(y=k|x)=\\\\frac{\\\\exp(-d(f_{\\\\phi}(x), c_{k}))}{\\\\sum_{k^{'}} \\\\exp(-d(f_{\\\\phi}(x), c_{k^{'}})} \\\\\\\\\\\\min J(\\\\phi)=-\\\\log p_{\\\\phi}(y=k|x)</equation>", "tag": "machine learning,meta learning", "prompt": "Show me the latex code of equation Prototypical Networks Protonets?"}
{"completion": " The equation for Hidden Markov Model is: <equation>Q=\\\\{q_{1},q_{2},...,q_{N}\\\\}, V=\\\\{v_{1},v_{2},...,v_{M}\\\\} \\\\\\\\\\ I=\\\\{i_{1},i_{2},...,i_{T}\\\\},O=\\\\{o_{1},o_{2},...,o_{T}\\\\} \\\\\\\\\\ A=[a_{ij}]_{N \\\\times N}, a_{ij}=P(i_{t+1}=q_{j}|i_{t}=q_{i}) \\\\\\\\\\ B=[b_{j}(k)]_{N \\\\times M},b_{j}(k)=P(o_{t}=v_{k}|i_{t}=q_{j})</equation>. Here is the brief introduction of Hidden Markov Model:\n Q denotes the set of states and V denotes the set of obvervations. Let's assume we have state sequence I of length T, and observation sequence O of length T, Hidden Markov Model(HMM) use transition matrix A to denote the transition probability a_{ij} and matrix B to denote observation probability matrix b_jk.\n ", "tag": "machine learning,nlp", "prompt": "I am interested in Hidden Markov Model?"}
{"completion": " The related materials of Generative Adversarial Networks GAN is: 1.Generative Adversarial Networks:<url>https://arxiv.org/abs/1406.2661</url>;", "tag": "machine learning,gan", "prompt": "Can you help me find related materials of Generative Adversarial Networks GAN?"}
{"completion": " The equation for Graph Laplacian is: <equation>L=I_{N}-D^{-\\\\frac{1}{2}}AD^{-\\\\frac{1}{2}} \\\\\\\\\\ L=U\\\\Lambda U^{T}</equation>. Here is the brief introduction of Graph Laplacian:\n\n Graph Laplacian matrix equals to the identity matrix I minus the matrix multiplication of three parts, the (-1/2) power of the degree matrix D, the adjacency matrix A, and (-1/2) power of degree matrix D. U is the eigenvectors of the normalized graph Laplacian L. The graph laplacian come from the graph Fourier transform F. The original signal x is first transformed to domain F(X) and inverse resulted signal is transformed back using the inverse graph Fourier transform F^{-1}.\n \nRelated Documents\n\n\">Graphs and Graph Laplacians- Upenn Tutorial\n\">Graph Laplacian matrix- MathWorks\n\nRelated Videos\n\n\n", "tag": "machine learning,graph,GNN", "prompt": "Can you give me a brief introduction to Graph Laplacian?"}
{"completion": " The Latex Code for equation Proximal Policy Optimization PPOis <equation>L^{CLIP}(\\\\theta)=E_{t}[\\\\min(r_{t}(\\\\theta))A_{t}, \\\\text{clip}(r_{t}(\\\\theta), 1-\\\\epsilon,1+\\\\epsilon)A_{t}]</equation>", "tag": "machine learning", "prompt": "Can you write equation Proximal Policy Optimization PPO?"}
{"completion": " The Latex Code for equation Transformeris <equation>\\\\text{Attention}(Q, K, V) = \\\\text{softmax}(\\\\frac{QK^T}{\\\\sqrt{d_k}})V</equation>", "tag": "machine learning,nlp,gpt", "prompt": "What is the equation of Transformer?"}
{"completion": " The equation for Maximum Mean Discrepancy MMD is <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>. The meaning of symbol <equation>\\frac{1}{m}\\sum_{i=1}^{m}f(x_{i})</equation> in Maximum Mean Discrepancy MMD is: Mean of probability distribution X with m data points..", "tag": "machine learning,mmd", "prompt": "What does the symbol mean <equation>\\frac{1}{m}\\sum_{i=1}^{m}f(x_{i})</equation> in Maximum Mean Discrepancy MMD?"}
{"completion": " The equation for KL-Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>P(x)</equation> in KL-Divergence is: Distribution of P(x) over x.", "tag": "machine learning", "prompt": "What is the meaning of the symbol <equation>P(x)</equation> in KL-Divergence?"}
{"completion": " The Latex Code for equation Bound on Target Domain Erroris <equation>\\\\epsilon_{T}(h) \\\\le \\\\hat{\\\\epsilon}_{S}(h) + \\\\sqrt{\\\\frac{4}{m}(d \\\\log \\\\frac{2em}{d} + \\\\log \\\\frac{4}{\\\\delta })} + d_{\\\\mathcal{H}}(\\\\tilde{\\\\mathcal{D}}_{S}, \\\\tilde{\\\\mathcal{D}}_{T}) + \\\\lambda \\\\\\\\\\ \\\\lambda = \\\\lambda_{S} + \\\\lambda_{T}</equation>", "tag": "machine learning,transfer learning", "prompt": "What is the latex code of equation Bound on Target Domain Error?"}
{"completion": " The Latex Code for equation Diffusion Model Reverse Processis <equation>p_\\\\theta(\\\\mathbf{x}_{0:T}) = p(\\\\mathbf{x}_T) \\\\prod^T_{t=1} p_\\\\theta(\\\\mathbf{x}_{t-1} \\\\vert \\\\mathbf{x}_t) \\\\\\\\\\ p_\\\\theta(\\\\mathbf{x}_{t-1} \\\\vert \\\\mathbf{x}_t) = \\\\mathcal{N}(\\\\mathbf{x}_{t-1}; \\\\boldsymbol{\\\\mu}_\\\\theta(\\\\mathbf{x}_t, t), \\\\boldsymbol{\\\\Sigma}_\\\\theta(\\\\mathbf{x}_t, t))</equation>", "tag": "machine learning,diffusion", "prompt": "Show me the latex code of equation Diffusion Model Reverse Process?"}
{"completion": " The related materials of SME Linear is: 1.\">A Semantic Matching Energy Function for Learning with Multi-relational Data:<url>https://arxiv.org/pdf/1301.3485.pdf</url>;", "tag": "machine learning,KG", "prompt": "Can you help me find related materials of SME Linear?"}
{"completion": " The Latex Code for equation Wasserstein Distance Optimal Transportis <equation>W_{p}(P,Q)=(\\\\inf_{J \\\\in J(P,Q)} \\\\int{||x-y||^{p}dJ(X,Y)})^\\\\frac{1}{p}</equation>", "tag": "machine learning,wasserstein", "prompt": "How to write equation Wasserstein Distance Optimal Transport?"}
{"completion": " The Latex Code for equation Area Under Uplift Curve AUUCis <equation>f(t)=(\\\\frac{Y^{T}_{t}}{N^{T}_{t}} - \\\\frac{Y^{C}_{t}}{N^{C}_{t}})(N^{T}_{t}+N^{C}_{t})</equation>", "tag": "machine learning,causual inference", "prompt": "How to write equation Area Under Uplift Curve AUUC?"}
{"completion": " The equation for KL-Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>KL(P||Q)</equation> in KL-Divergence is: KL Divergence between P and Q.", "tag": "machine learning", "prompt": "Can you explain the symbol <equation>KL(P||Q)</equation> in KL-Divergence?"}
{"completion": " The Latex Code for equation TransHis <equation>f_{r}(h,t) =||h_{\\\\perp} + d_{r} - t_{\\\\perp} ||^{2}_{2}=||(h - w_{r}hw_{r}) + d_{r} - (t - w_{r}tw_{r}) ||^{2}_{2}</equation>", "tag": "machine learning,KG", "prompt": "Show me the latex code of equation TransH?"}
{"completion": " The equation for Multi-Gate Mixture of Experts MMoE is: <equation>g^{k}(x)=\\\\text{softmax}(W_{gk}x) \\\\\\\\\\ f^{k}(x)=\\\\sum^{n}_{i=1}g^{k}(x)_{i}f_{i}(x) \\\\\\\\\\ y_{k}=h^{k}(f^{k}(x))</equation>. Here is the brief introduction of Multi-Gate Mixture of Experts MMoE:\n\n Multi-Gate Mixture of Experts (MMoE) model is firstly introduced in KDD2018 paper Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts. The model introduce a MMoE layer to model the relationship of K multiple tasks using N experts. Let's assume input feature X has dimension D. There are K output tasks and N experts networks. The gating network is calculated as, g^{k}(x) is a N-dimensional vector indicating the softmax result of relative weights, W_{gk} is a trainable matrix with size R^{ND}. And f^{k}(x) is the weghted sum representation of output of N experts for task k. f_{i}(x) is the output of the i-th expert, and f^{k}(x) indicates the representation of k-th tasks as the summation of N experts.\n \nRelated Documents\n\nSee paper Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts for details.\n\nRelated Videos\n\n\n", "tag": "machine learning,multi task", "prompt": "Please give me detailed tutorial of Multi-Gate Mixture of Experts MMoE?"}
{"completion": " The Latex Code for equation X-Learneris <equation>\\\\tilde{D}^{1}_{i}:=Y^{1}_{i}-\\\\hat{\\\\mu}_{0}(X^{1}_{i}),\\\\tilde{D}^{0}_{i}:=\\\\hat{\\\\mu}_{1}(X^{0}_{i})-Y^{0}_{i}\\\\\\\\\\ \\\\hat{\\\\tau}(x)=g(x)\\\\hat{\\\\tau}_{0}(x) + (1-g(x))\\\\hat{\\\\tau}_{1}(x)</equation>", "tag": "machine learning,causual inference", "prompt": "What is the equation of X-Learner?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>r</equation> in Bellman Equation is: Reward at state s.", "tag": "machine learning", "prompt": "What does the symbol mean <equation>r</equation> in Bellman Equation?"}
{"completion": " The Latex Code for equation Language Modellingis <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>", "tag": "machine learning", "prompt": "What is the equation of Language Modelling?"}
{"completion": " The Latex Code for equation Deep Kernel Learningis <equation>k(x_{i},x_{j}|\\\\phi)=k(h(x_i,w_k),h(x_j,w_k)|w_k,\\\\phi)</equation>", "tag": "machine learning,Deep Kernel Learning", "prompt": "What is the latex code of equation Deep Kernel Learning?"}
{"completion": " The equation for Wasserstein Distance Optimal Transport is: <equation>W_{p}(P,Q)=(\\\\inf_{J \\\\in J(P,Q)} \\\\int{||x-y||^{p}dJ(X,Y)})^\\\\frac{1}{p}</equation>. Here is the brief introduction of Wasserstein Distance Optimal Transport:\n Latex code for the Wasserstein Distance (Optimal Transport Distance). I will briefly introduce the notations in this formulation. \n \n: Wasserstein distance p power between two distributions P and Q\n: Power p of distance moving distributions P towards Q\n\n\nRelated Documents\n\nOptimal Transport and Wasserstein Distance (CMU lecture pdf)\n\nRelated Videos\n\n\n", "tag": "machine learning,wasserstein", "prompt": "Can you explain Wasserstein Distance Optimal Transport?"}
{"completion": " The equation for Language Modelling is <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>. The meaning of symbol <equation>p(x)</equation> in Language Modelling is: \n : Factorize the joint probabilities over symbols p(x) as the product of conditional probabilities .", "tag": "machine learning", "prompt": "Help me find the meaning of symbol <equation>p(x)</equation> in Language Modelling?"}
{"completion": " The equation for Diffusion Model Reverse Process is: <equation>p_\\\\theta(\\\\mathbf{x}_{0:T}) = p(\\\\mathbf{x}_T) \\\\prod^T_{t=1} p_\\\\theta(\\\\mathbf{x}_{t-1} \\\\vert \\\\mathbf{x}_t) \\\\\\\\\\ p_\\\\theta(\\\\mathbf{x}_{t-1} \\\\vert \\\\mathbf{x}_t) = \\\\mathcal{N}(\\\\mathbf{x}_{t-1}; \\\\boldsymbol{\\\\mu}_\\\\theta(\\\\mathbf{x}_t, t), \\\\boldsymbol{\\\\Sigma}_\\\\theta(\\\\mathbf{x}_t, t))</equation>. Here is the brief introduction of Diffusion Model Reverse Process:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "Can you give me a brief introduction to Diffusion Model Reverse Process?"}
{"completion": " The equation for X-Learner is: <equation>\\\\tilde{D}^{1}_{i}:=Y^{1}_{i}-\\\\hat{\\\\mu}_{0}(X^{1}_{i}),\\\\tilde{D}^{0}_{i}:=\\\\hat{\\\\mu}_{1}(X^{0}_{i})-Y^{0}_{i}\\\\\\\\\\ \\\\hat{\\\\tau}(x)=g(x)\\\\hat{\\\\tau}_{0}(x) + (1-g(x))\\\\hat{\\\\tau}_{1}(x)</equation>. Here is the brief introduction of X-Learner:\n See this paper for more details of X-learner Metalearners for estimating heterogeneous treatment effects using machine learning\n", "tag": "machine learning,causual inference", "prompt": "Please give me detailed tutorial of X-Learner?"}
{"completion": " The equation for Individual Treatment Effect ITE is: <equation>\\\\text{ITE}_{i}:=Y_{i}(1)-Y_{i}(0)</equation>. Here is the brief introduction of Individual Treatment Effect ITE:\n Individual Treatment Effect(ITE) is defined as the difference between the outcome of treatment group Y_i(1) over the outcome of control group Y_i(0) of the same instance i. There exists a fundamental problem that we can't observe Y_i(1) and Y_i(0) at the same time because each instance item i can only be assigned to one experiment of control group or treatment group, but never both. So we can't observe the individual treatment effect(ITE) directly for each instance i. \n ", "tag": "machine learning,causual inference", "prompt": "Can you give me a brief introduction to Individual Treatment Effect ITE?"}
{"completion": " The Latex Code for equation Bregman Divergencesis <equation>d_{\\\\phi}(z,z^{'})=\\\\phi(z) - \\\\phi(z^{'})-(z-z^{'})^{T} \\\\nabla \\\\phi(z^{'})</equation>", "tag": "machine learning", "prompt": "Show me the latex code of equation Bregman Divergences?"}
{"completion": " The Latex Code for equation Domain-Adversarial Neural Networks DANNis <equation>\\\\min [\\\\frac{1}{m}\\\\sum^{m}_{1}\\\\mathcal{L}(f(\\\\textbf{x}^{s}_{i}),y_{i})+\\\\lambda \\\\max(-\\\\frac{1}{m}\\\\sum^{m}_{i=1}\\\\mathcal{L}^{d}(o(\\\\textbf{x}^{s}_{i}),1)-\\\\frac{1}{m^{'}}\\\\sum^{m^{'}}_{i=1}\\\\mathcal{L}^{d}(o(\\\\textbf{x}^{t}_{i}),0))]</equation>", "tag": "machine learning,transfer learning", "prompt": "How to write equation Domain-Adversarial Neural Networks DANN?"}
{"completion": " The Latex Code for equation Graph Laplacianis <equation>L=I_{N}-D^{-\\\\frac{1}{2}}AD^{-\\\\frac{1}{2}} \\\\\\\\\\ L=U\\\\Lambda U^{T}</equation>", "tag": "machine learning,graph,GNN", "prompt": "What is the equation of Graph Laplacian?"}
{"completion": " The Latex Code for equation Entire Space Multi-Task Model ESSMis <equation>L(\\\\theta_{cvr},\\\\theta_{ctr})=\\\\sum^{N}_{i=1}l(y_{i},f(x_{i};\\\\theta_{ctr}))+\\\\sum^{N}_{i=1}l(y_{i}\\\\&z_{i},f(x_{i};\\\\theta_{ctr}) \\\\times f(x_{i};\\\\theta_{cvr}))\\ </equation>", "tag": "machine learning,multi task", "prompt": "How to write equation Entire Space Multi-Task Model ESSM?"}
{"completion": " The Latex Code for equation Language Modellingis <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>", "tag": "machine learning", "prompt": "Please give me some hints on equation Language Modelling?"}
{"completion": " The Latex Code for equation X-Learneris <equation>\\\\tilde{D}^{1}_{i}:=Y^{1}_{i}-\\\\hat{\\\\mu}_{0}(X^{1}_{i}),\\\\tilde{D}^{0}_{i}:=\\\\hat{\\\\mu}_{1}(X^{0}_{i})-Y^{0}_{i}\\\\\\\\\\ \\\\hat{\\\\tau}(x)=g(x)\\\\hat{\\\\tau}_{0}(x) + (1-g(x))\\\\hat{\\\\tau}_{1}(x)</equation>", "tag": "machine learning,causual inference", "prompt": "Please give me some hints on equation X-Learner?"}
{"completion": " The equation for Model-Agnostic Meta-Learning MAML is: <equation>\\\\min_{\\\\theta} \\\\sum_{\\\\mathcal{T}_{i} \\\\sim p(\\\\mathcal{T})} \\\\mathcal{L}_{\\\\mathcal{T}_{i}}(f_{\\\\theta^{'}_{i}}) = \\\\sum_{\\\\mathcal{T}_{i} \\\\sim p(\\\\mathcal{T})} \\\\mathcal{L}_{\\\\mathcal{T}_{i}}(f_{\\\\theta_{i} - \\\\alpha \\\\nabla_{\\\\theta} \\\\mathcal{L}_{\\\\mathcal{T}_{i}} (f_{\\\\theta}) })</equation>. Here is the brief introduction of Model-Agnostic Meta-Learning MAML:\n\n\n\n \n\n\n", "tag": "machine learning,meta learning", "prompt": "Please give me detailed tutorial of Model-Agnostic Meta-Learning MAML?"}
{"completion": " The equation for Generative Adversarial Networks GAN is: <equation>\\\\min_{G} \\\\max_{D} V(D,G)=\\\\mathbb{E}_{x \\\\sim p_{data}(x)}[\\\\log D(x)]+\\\\mathbb{E}_{z \\\\sim p_{z}(z)}[\\\\log(1-D(G(z)))]</equation>. Here is the brief introduction of Generative Adversarial Networks GAN:\n\n GAN latex code is illustrated above. See paper for more details Generative Adversarial Networks\n\nRelated Documents\n\nGenerative Adversarial Networks\n\nRelated Videos\n\n\n", "tag": "machine learning,gan", "prompt": "Please give me detailed tutorial of Generative Adversarial Networks GAN?"}
{"completion": " The equation for Bregman Divergences is: <equation>d_{\\\\phi}(z,z^{'})=\\\\phi(z) - \\\\phi(z^{'})-(z-z^{'})^{T} \\\\nabla \\\\phi(z^{'})</equation>. Here is the brief introduction of Bregman Divergences:\n The prototypi- cal networks algorithm is equivalent to performing mixture density estimation on the support set with an exponential family density. A regular Bregman divergence d_{\\phi} is defined as above. \\phi is a differentiable, strictly convex function of the Legendre type. Examples of Bregman divergences include squared Euclidean distance and Mahalanobis distance.\n ", "tag": "machine learning", "prompt": "What's the meaning of Bregman Divergences?"}
{"completion": " The Latex Code for equation Diffusion Model Variational Lower Bound Lossis <equation>\\\\begin{aligned}\\ L_\\\\text{VLB} &= L_T + L_{T-1} + \\\\dots + L_0 \\\\\\\\\\ \\\\text{where } L_T &= D_\\\\text{KL}(q(\\\\mathbf{x}_T \\\\vert \\\\mathbf{x}_0) \\\\parallel p_\\\\theta(\\\\mathbf{x}_T)) \\\\\\\\\\ L_t &= D_\\\\text{KL}(q(\\\\mathbf{x}_t \\\\vert \\\\mathbf{x}_{t+1}, \\\\mathbf{x}_0) \\\\parallel p_\\\\theta(\\\\mathbf{x}_t \\\\vert\\\\mathbf{x}_{t+1})) \\\\text{ for }1 \\\\leq t \\\\leq T-1 \\\\\\\\\\ L_0 &= - \\\\log p_\\\\theta(\\\\mathbf{x}_0 \\\\vert \\\\mathbf{x}_1)\\ \\\\end{aligned}</equation>", "tag": "machine learning,diffusion", "prompt": "Please give me some hints on equation Diffusion Model Variational Lower Bound Loss?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. The meaning of symbol <equation>KL(P||Q)</equation> in Jensen-Shannon Divergence JS-Divergence is: KL Divergence between P and Q.", "tag": "machine learning", "prompt": "Can you explain the symbol <equation>KL(P||Q)</equation> in Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The Latex Code for equation Jensen-Shannon Divergence JS-Divergenceis <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>", "tag": "machine learning", "prompt": "Show me the latex code of equation Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The Latex Code for equation Kullback-Leibler Divergenceis <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>", "tag": "machine learning,kl divergence", "prompt": "What is the equation of Kullback-Leibler Divergence?"}
{"completion": " The equation for S-Learner is: <equation>\\\\mu(x,w)=\\\\mathbb{E}[Y_{i}|X=x_{i},W=w] \\\\\\\\\\ \\\\hat{\\\\tau}(x)=\\\\hat{\\\\mu}(x,1)-\\\\hat{\\\\mu}(x,0)</equation>. Here is the brief introduction of S-Learner:\n S-Learner use a single machine learning estimator \\mu(x,w) to estimate outcome Y directly. And the treatment assigment variable W=0,1 is treated as features of S-learner models. The CATE estimation is calculated as the difference between two outputs given the same model \\mu and different inputs features of W, namely w=1 and w=0.\n ", "tag": "machine learning,causual inference", "prompt": "Can you give me a brief introduction to S-Learner?"}
{"completion": " The equation for Graph Attention Network GAT is: <equation>h=\\\\{\\\\vec{h_{1}},\\\\vec{h_{2}},...,\\\\vec{h_{N}}\\\\}, \\\\\\\\\\ \\\\vec{h_{i}} \\\\in \\\\mathbb{R}^{F} \\\\\\\\\\ W \\\\in \\\\mathbb{R}^{F \\\\times F^{'}} \\\\\\\\\\ e_{ij}=a(Wh_{i},Wh_{j}) \\\\\\\\\\ k \\\\in \\\\mathcal{N}_{i},\\\\text{ neighbourhood nodes}\\\\\\\\\\ a_{ij}=\\\\text{softmax}_{j}(e_{ij})=\\\\frac{\\\\exp(e_{ij})}{\\\\sum_{k \\\\in \\\\mathcal{N}_{i}} \\\\exp(e_{ik})}</equation>. Here is the brief introduction of Graph Attention Network GAT:\n\n GAT applies graph attentional layer to model the graph propagation. In each layer, the node i has attention on all the other nodes j. And the attention coefficient is calculated. For the attention calculation, only the set of neighbours nodes N_{i} of each node i contributes to the final softmax attention calculation. You can check more detailed information in this paper, GRAPH ATTENTION NETWORKS for more details.\n \nRelated Documents\n\nGRAPH ATTENTION NETWORKS\n\nRelated Videos\n\n\n", "tag": "machine learning,graph,GNN", "prompt": "Can you give me a brief introduction to Graph Attention Network GAT?"}
{"completion": " The equation for Language Modelling is <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>. The meaning of symbol <equation>(s_{1}, s_{2}, ..., s_{n})</equation> in Language Modelling is: Variable length sequences of symbols.", "tag": "machine learning", "prompt": "Can you explain the symbol <equation>(s_{1}, s_{2}, ..., s_{n})</equation> in Language Modelling?"}
{"completion": " The Latex Code for equation Hidden Markov Modelis <equation>Q=\\\\{q_{1},q_{2},...,q_{N}\\\\}, V=\\\\{v_{1},v_{2},...,v_{M}\\\\} \\\\\\\\\\ I=\\\\{i_{1},i_{2},...,i_{T}\\\\},O=\\\\{o_{1},o_{2},...,o_{T}\\\\} \\\\\\\\\\ A=[a_{ij}]_{N \\\\times N}, a_{ij}=P(i_{t+1}=q_{j}|i_{t}=q_{i}) \\\\\\\\\\ B=[b_{j}(k)]_{N \\\\times M},b_{j}(k)=P(o_{t}=v_{k}|i_{t}=q_{j})</equation>", "tag": "machine learning,nlp", "prompt": "Show me the latex code of equation Hidden Markov Model?"}
{"completion": " The equation for Variational AutoEncoder VAE is: <equation>\\\\log p_{\\\\theta}(x)=\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log p_{\\\\theta}(x)] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log \\\\frac{p_{\\\\theta}(x,z)}{p_{\\\\theta}(z|x)}] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log [\\\\frac{p_{\\\\theta}(x,z)}{q_{\\\\phi}(z|x)} \\\\times \\\\frac{q_{\\\\phi}(z|x)}{p_{\\\\theta}(z|x)}]] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log [\\\\frac{p_{\\\\theta}(x,z)}{q_{\\\\phi}(z|x)} ]] +D_{KL}(q_{\\\\phi}(z|x) || p_{\\\\theta}(z|x))\\\\\\\\</equation>. Here is the brief introduction of Variational AutoEncoder VAE:", "tag": "machine learning,VAE", "prompt": "I am interested in Variational AutoEncoder VAE?"}
{"completion": " The Latex Code for equation Transformeris <equation>\\\\text{Attention}(Q, K, V) = \\\\text{softmax}(\\\\frac{QK^T}{\\\\sqrt{d_k}})V</equation>", "tag": "machine learning,nlp,gpt", "prompt": "Please give me some hints on equation Transformer?"}
{"completion": " The equation for Diffusion Model Variational Lower Bound is: <equation>\\\\begin{aligned}\\ - \\\\log p_\\\\theta(\\\\mathbf{x}_0) \\ &\\\\leq - \\\\log p_\\\\theta(\\\\mathbf{x}_0) + D_\\\\text{KL}(q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0) \\\\| p_\\\\theta(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0) ) \\\\\\\\\\ &= -\\\\log p_\\\\theta(\\\\mathbf{x}_0) + \\\\mathbb{E}_{\\\\mathbf{x}_{1:T}\\\\sim q(\\\\mathbf{x}_{1:T} \\\\vert \\\\mathbf{x}_0)} \\\\Big[ \\\\log\\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T}) / p_\\\\theta(\\\\mathbf{x}_0)} \\\\Big] \\\\\\\\\\ &= -\\\\log p_\\\\theta(\\\\mathbf{x}_0) + \\\\mathbb{E}_q \\\\Big[ \\\\log\\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} + \\\\log p_\\\\theta(\\\\mathbf{x}_0) \\\\Big] \\\\\\\\\\ &= \\\\mathbb{E}_q \\\\Big[ \\\\log \\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} \\\\Big] \\\\\\\\\\ \\\\text{Let }L_\\\\text{VLB} \\ &= \\\\mathbb{E}_{q(\\\\mathbf{x}_{0:T})} \\\\Big[ \\\\log \\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} \\\\Big] \\\\geq - \\\\mathbb{E}_{q(\\\\mathbf{x}_0)} \\\\log p_\\\\theta(\\\\mathbf{x}_0)\\ \\\\end{aligned}</equation>. Here is the brief introduction of Diffusion Model Variational Lower Bound:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "Can you give me a brief introduction to Diffusion Model Variational Lower Bound?"}
{"completion": " The equation for RotatE is: <equation>f_{r}(h, t) = ||h \\\\circ r - t||^{2}_{2}</equation>. Here is the brief introduction of RotatE:\n RotatE learns low-dimensional representations of entities and relations to relation space r, and multiple original entity embedding to the mapping matrix M. See paper RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space for more details.\n\n Related Documents\n\n\">RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space\n\nRelated Videos\n\n\n", "tag": "machine learning,KG", "prompt": "Can you explain RotatE?"}
{"completion": " The equation for Diffusion Model Variational Lower Bound Loss is: <equation>\\\\begin{aligned}\\ L_\\\\text{VLB} &= L_T + L_{T-1} + \\\\dots + L_0 \\\\\\\\\\ \\\\text{where } L_T &= D_\\\\text{KL}(q(\\\\mathbf{x}_T \\\\vert \\\\mathbf{x}_0) \\\\parallel p_\\\\theta(\\\\mathbf{x}_T)) \\\\\\\\\\ L_t &= D_\\\\text{KL}(q(\\\\mathbf{x}_t \\\\vert \\\\mathbf{x}_{t+1}, \\\\mathbf{x}_0) \\\\parallel p_\\\\theta(\\\\mathbf{x}_t \\\\vert\\\\mathbf{x}_{t+1})) \\\\text{ for }1 \\\\leq t \\\\leq T-1 \\\\\\\\\\ L_0 &= - \\\\log p_\\\\theta(\\\\mathbf{x}_0 \\\\vert \\\\mathbf{x}_1)\\ \\\\end{aligned}</equation>. Here is the brief introduction of Diffusion Model Variational Lower Bound Loss:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "Can you give me a brief introduction to Diffusion Model Variational Lower Bound Loss?"}
{"completion": " The Latex Code for equation Bellman Equationis <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>", "tag": "machine learning", "prompt": "Show me the latex code of equation Bellman Equation?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. The meaning of symbol <equation>P(x)</equation> in Jensen-Shannon Divergence JS-Divergence is: Distribution of P(x) over x.", "tag": "machine learning", "prompt": "What is the meaning of <equation>P(x)</equation> in Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The equation for SME Linear is: <equation>\\\\epsilon(lhs,rel,rhs)=E_{lhs(rel)}^{T}E_{rhs(rel)} \\\\\\\\=(W_{l1}E_{lhs}^{T} + W_{l2}E_{rel}^{T} + b_{l})^{T}(W_{r1}E_{rhs}^{T} + W_{r2}E_{rel}^{T} + b_{r})</equation>. Here is the brief introduction of SME Linear:\n\n The energy function E (denoted SME) is encoded using a neural network, whose architecture first processes each entity in parallel, like in siamese networks. The intuition is that the relation type should first be used to extract relevant components from each argument\u00e2\u20ac\u2122s embedding, and put them in a space where they can then be compared. See paper A Semantic Matching Energy Function for Learning with Multi-relational Data for more details.\n \nRelated Documents\n\n\">A Semantic Matching Energy Function for Learning with Multi-relational Data\n\nRelated Videos\n\n\n", "tag": "machine learning,KG", "prompt": "Please give me detailed tutorial of SME Linear?"}
{"completion": " The equation for Diffusion Model Forward Process Reparameterization is: <equation>x_{t}=\\\\sqrt{\\\\alpha_{t}}x_{t-1}+\\\\sqrt{1-\\\\alpha_{t}} \\\\epsilon_{t-1}\\\\\\\\=\\\\sqrt{\\\\alpha_{t}\\\\alpha_{t-1}}x_{t-2} + \\\\sqrt{1-\\\\alpha_{t}\\\\alpha_{t-1}} \\\\bar{\\\\epsilon}_{t-2}\\\\\\\\=\\\\text{...}\\\\\\\\=\\\\sqrt{\\\\bar{\\\\alpha}_{t}}x_{0}+\\\\sqrt{1-\\\\bar{\\\\alpha}_{t}}\\\\epsilon \\\\\\\\\\\\alpha_{t}=1-\\\\beta_{t}, \\\\bar{\\\\alpha}_{t}=\\\\prod_{t=1}^{T}\\\\alpha_{t}</equation>. Here is the brief introduction of Diffusion Model Forward Process Reparameterization:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "I am interested in Diffusion Model Forward Process Reparameterization?"}
{"completion": " The equation for Entire Space Multi-Task Model ESSM is: <equation>L(\\\\theta_{cvr},\\\\theta_{ctr})=\\\\sum^{N}_{i=1}l(y_{i},f(x_{i};\\\\theta_{ctr}))+\\\\sum^{N}_{i=1}l(y_{i}\\\\&z_{i},f(x_{i};\\\\theta_{ctr}) \\\\times f(x_{i};\\\\theta_{cvr}))\\ </equation>. Here is the brief introduction of Entire Space Multi-Task Model ESSM:\n ESSM model uses two separate towers to model pCTR prediction task and pCTCVR prediction task simultaneously. \n Related Documents\n\nSee below link of paper Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate for more details. \n\nRelated Videos\n\n\n", "tag": "machine learning,multi task", "prompt": "Please give me detailed tutorial of Entire Space Multi-Task Model ESSM?"}
{"completion": " The Latex Code for equation Support Vector Machine SVMis <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>", "tag": "machine learning,svm", "prompt": "Show me the latex code of equation Support Vector Machine SVM?"}
{"completion": " The Latex Code for equation Domain-Adversarial Neural Networks DANNis <equation>\\\\min [\\\\frac{1}{m}\\\\sum^{m}_{1}\\\\mathcal{L}(f(\\\\textbf{x}^{s}_{i}),y_{i})+\\\\lambda \\\\max(-\\\\frac{1}{m}\\\\sum^{m}_{i=1}\\\\mathcal{L}^{d}(o(\\\\textbf{x}^{s}_{i}),1)-\\\\frac{1}{m^{'}}\\\\sum^{m^{'}}_{i=1}\\\\mathcal{L}^{d}(o(\\\\textbf{x}^{t}_{i}),0))]</equation>", "tag": "machine learning,transfer learning", "prompt": "Please give me some hints on equation Domain-Adversarial Neural Networks DANN?"}
{"completion": " The equation for Diffusion Model Forward Process Reparameterization is: <equation>x_{t}=\\\\sqrt{\\\\alpha_{t}}x_{t-1}+\\\\sqrt{1-\\\\alpha_{t}} \\\\epsilon_{t-1}\\\\\\\\=\\\\sqrt{\\\\alpha_{t}\\\\alpha_{t-1}}x_{t-2} + \\\\sqrt{1-\\\\alpha_{t}\\\\alpha_{t-1}} \\\\bar{\\\\epsilon}_{t-2}\\\\\\\\=\\\\text{...}\\\\\\\\=\\\\sqrt{\\\\bar{\\\\alpha}_{t}}x_{0}+\\\\sqrt{1-\\\\bar{\\\\alpha}_{t}}\\\\epsilon \\\\\\\\\\\\alpha_{t}=1-\\\\beta_{t}, \\\\bar{\\\\alpha}_{t}=\\\\prod_{t=1}^{T}\\\\alpha_{t}</equation>. Here is the brief introduction of Diffusion Model Forward Process Reparameterization:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "Can you explain Diffusion Model Forward Process Reparameterization?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is: <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. Here is the brief introduction of Jensen-Shannon Divergence JS-Divergence:\n Latex code for the Jensen-Shannon Divergence(JS-Divergence). I will briefly introduce the notations in this formulation. \n \n: KL Divergence between P and Q\n: JS Divergence between P and Q, which is the symmetric divergence metric between distribution P and Q\n: Distribution of P(x) over x\n: Distribution of Q(x) over x\n\n\nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning", "prompt": "Please give me detailed tutorial of Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The equation for Support Vector Machine SVM is <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. The meaning of symbol <equation>w</equation> in Support Vector Machine SVM is: Weight of Linear Classifier.", "tag": "machine learning,svm", "prompt": "What does the symbol mean <equation>w</equation> in Support Vector Machine SVM?"}
{"completion": " The Latex Code for equation Hidden Markov Modelis <equation>Q=\\\\{q_{1},q_{2},...,q_{N}\\\\}, V=\\\\{v_{1},v_{2},...,v_{M}\\\\} \\\\\\\\\\ I=\\\\{i_{1},i_{2},...,i_{T}\\\\},O=\\\\{o_{1},o_{2},...,o_{T}\\\\} \\\\\\\\\\ A=[a_{ij}]_{N \\\\times N}, a_{ij}=P(i_{t+1}=q_{j}|i_{t}=q_{i}) \\\\\\\\\\ B=[b_{j}(k)]_{N \\\\times M},b_{j}(k)=P(o_{t}=v_{k}|i_{t}=q_{j})</equation>", "tag": "machine learning,nlp", "prompt": "What is the latex code of equation Hidden Markov Model?"}
{"completion": " The Latex Code for equation Bound on Target Domain Erroris <equation>\\\\epsilon_{T}(h) \\\\le \\\\hat{\\\\epsilon}_{S}(h) + \\\\sqrt{\\\\frac{4}{m}(d \\\\log \\\\frac{2em}{d} + \\\\log \\\\frac{4}{\\\\delta })} + d_{\\\\mathcal{H}}(\\\\tilde{\\\\mathcal{D}}_{S}, \\\\tilde{\\\\mathcal{D}}_{T}) + \\\\lambda \\\\\\\\\\ \\\\lambda = \\\\lambda_{S} + \\\\lambda_{T}</equation>", "tag": "machine learning,transfer learning", "prompt": "Show me the latex code of equation Bound on Target Domain Error?"}
{"completion": " The equation for Proximal Policy Optimization PPO is: <equation>L^{CLIP}(\\\\theta)=E_{t}[\\\\min(r_{t}(\\\\theta))A_{t}, \\\\text{clip}(r_{t}(\\\\theta), 1-\\\\epsilon,1+\\\\epsilon)A_{t}]</equation>. Here is the brief introduction of Proximal Policy Optimization PPO:\n With supervised learning, we can easily implement the cost function, run gradient descent on it, and be very confident that we\u2019ll get excellent results with relatively little hyperparameter tuning. The route to success in reinforcement learning isn\u2019t as obvious\u2014the algorithms have many moving parts that are hard to debug, and they require substantial effort in tuning in order to get good results. PPO strikes a balance between ease of implementation, sample complexity, and ease of tuning, trying to compute an update at each step that minimizes the cost function while ensuring the deviation from the previous policy is relatively small. https://openai.com/research/openai-baselines-ppo\n", "tag": "machine learning", "prompt": "Can you explain Proximal Policy Optimization PPO?"}
{"completion": " The Latex Code for equation Diffusion Model Variational Lower Boundis <equation>\\\\begin{aligned}\\ - \\\\log p_\\\\theta(\\\\mathbf{x}_0) \\ &\\\\leq - \\\\log p_\\\\theta(\\\\mathbf{x}_0) + D_\\\\text{KL}(q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0) \\\\| p_\\\\theta(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0) ) \\\\\\\\\\ &= -\\\\log p_\\\\theta(\\\\mathbf{x}_0) + \\\\mathbb{E}_{\\\\mathbf{x}_{1:T}\\\\sim q(\\\\mathbf{x}_{1:T} \\\\vert \\\\mathbf{x}_0)} \\\\Big[ \\\\log\\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T}) / p_\\\\theta(\\\\mathbf{x}_0)} \\\\Big] \\\\\\\\\\ &= -\\\\log p_\\\\theta(\\\\mathbf{x}_0) + \\\\mathbb{E}_q \\\\Big[ \\\\log\\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} + \\\\log p_\\\\theta(\\\\mathbf{x}_0) \\\\Big] \\\\\\\\\\ &= \\\\mathbb{E}_q \\\\Big[ \\\\log \\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} \\\\Big] \\\\\\\\\\ \\\\text{Let }L_\\\\text{VLB} \\ &= \\\\mathbb{E}_{q(\\\\mathbf{x}_{0:T})} \\\\Big[ \\\\log \\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} \\\\Big] \\\\geq - \\\\mathbb{E}_{q(\\\\mathbf{x}_0)} \\\\log p_\\\\theta(\\\\mathbf{x}_0)\\ \\\\end{aligned}</equation>", "tag": "machine learning,diffusion", "prompt": "How to write equation Diffusion Model Variational Lower Bound?"}
{"completion": " The Latex Code for equation Graph Convolutional Networks GCNis <equation>H^{(l+1)}=\\\\sigma(\\\\tilde{D}^{-\\\\frac{1}{2}}\\\\tilde{A}\\\\tilde{D}^{-\\\\frac{1}{2}}H^{l}W^{l})\\\\\\\\\\ \\\\tilde{A}=A+I_{N}\\\\\\\\\\ \\\\tilde{D}_{ii}=\\\\sum_{j}\\\\tilde{A}_{ij} \\\\\\\\\\ H^{0}=X \\\\\\\\ \\ \\\\mathcal{L}=-\\\\sum_{l \\\\in Y}\\\\sum^{F}_{f=1} Y_{lf} \\\\ln Z_{lf}</equation>", "tag": "machine learning,graph,GNN", "prompt": "Please give me some hints on equation Graph Convolutional Networks GCN?"}
{"completion": " The Latex Code for equation Transformeris <equation>\\\\text{Attention}(Q, K, V) = \\\\text{softmax}(\\\\frac{QK^T}{\\\\sqrt{d_k}})V</equation>", "tag": "machine learning,nlp,gpt", "prompt": "What is the latex code of equation Transformer?"}
{"completion": " The Latex Code for equation Wasserstein Distance Optimal Transportis <equation>W_{p}(P,Q)=(\\\\inf_{J \\\\in J(P,Q)} \\\\int{||x-y||^{p}dJ(X,Y)})^\\\\frac{1}{p}</equation>", "tag": "machine learning,wasserstein", "prompt": "Can you write equation Wasserstein Distance Optimal Transport?"}
{"completion": " The related materials of Area Under Uplift Curve AUUC is: ", "tag": "machine learning,causual inference", "prompt": "Can you help me find related materials of Area Under Uplift Curve AUUC?"}
{"completion": " The Latex Code for equation Domain Adaptation H-Divergenceis <equation>d_{\\\\mathcal{H}}(\\\\mathcal{D},\\\\mathcal{D}^{'})=2\\\\sup_{h \\\\in \\\\mathcal{H}}|\\\\Pr_{\\\\mathcal{D}}[I(h)]-\\\\Pr_{\\\\mathcal{D}^{'}}[I(h)]|</equation>", "tag": "machine learning,transfer learning", "prompt": "How to write equation Domain Adaptation H-Divergence?"}
{"completion": " The equation for Support Vector Machine SVM is: <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. Here is the brief introduction of Support Vector Machine SVM:\n Latex code for Support Vector Machine (SVM).\n \n: Dual problem Lagrangian Relaxation\n: Weight of Linear Classifier\n: Classifier\n: Decision Boundary\n\nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning,svm", "prompt": "Can you give me a brief introduction to Support Vector Machine SVM?"}
{"completion": " The equation for Maximum Mean Discrepancy MMD is: <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>. Here is the brief introduction of Maximum Mean Discrepancy MMD:\n Latex code for the Maximum Mean Discrepancy MMD. I will briefly introduce the notations in this formulation. \n \n: Superior of the discrepancy measure between two distribution.\n: Mean of probability distribution X with m data points.\n: Mean of probability distribution Y with n data points.\n\n\nRelated Documents\n\nA Kernel Two-Sample Test (MIT courses)\n\nRelated Videos\n\n\n", "tag": "machine learning,mmd", "prompt": "Can you explain Maximum Mean Discrepancy MMD?"}
{"completion": " The equation for RotatE is: <equation>f_{r}(h, t) = ||h \\\\circ r - t||^{2}_{2}</equation>. Here is the brief introduction of RotatE:\n RotatE learns low-dimensional representations of entities and relations to relation space r, and multiple original entity embedding to the mapping matrix M. See paper RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space for more details.\n\n Related Documents\n\n\">RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space\n\nRelated Videos\n\n\n", "tag": "machine learning,KG", "prompt": "What's the meaning of RotatE?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>\\pi(a|s)</equation> in Bellman Equation is: Probability of choosing action a given state s.", "tag": "machine learning", "prompt": "What is the meaning of <equation>\\pi(a|s)</equation> in Bellman Equation?"}
{"completion": " The Latex Code for equation Hidden Markov Modelis <equation>Q=\\\\{q_{1},q_{2},...,q_{N}\\\\}, V=\\\\{v_{1},v_{2},...,v_{M}\\\\} \\\\\\\\\\ I=\\\\{i_{1},i_{2},...,i_{T}\\\\},O=\\\\{o_{1},o_{2},...,o_{T}\\\\} \\\\\\\\\\ A=[a_{ij}]_{N \\\\times N}, a_{ij}=P(i_{t+1}=q_{j}|i_{t}=q_{i}) \\\\\\\\\\ B=[b_{j}(k)]_{N \\\\times M},b_{j}(k)=P(o_{t}=v_{k}|i_{t}=q_{j})</equation>", "tag": "machine learning,nlp", "prompt": "Please give me some hints on equation Hidden Markov Model?"}
{"completion": " The Latex Code for equation Individual Treatment Effect ITEis <equation>\\\\text{ITE}_{i}:=Y_{i}(1)-Y_{i}(0)</equation>", "tag": "machine learning,causual inference", "prompt": "Show me the latex code of equation Individual Treatment Effect ITE?"}
{"completion": " The Latex Code for equation S-Learneris <equation>\\\\mu(x,w)=\\\\mathbb{E}[Y_{i}|X=x_{i},W=w] \\\\\\\\\\ \\\\hat{\\\\tau}(x)=\\\\hat{\\\\mu}(x,1)-\\\\hat{\\\\mu}(x,0)</equation>", "tag": "machine learning,causual inference", "prompt": "How to write equation S-Learner?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>v_{\\pi}(s^{'})</equation> in Bellman Equation is: Value at state s^{'} in policy \\pi.", "tag": "machine learning", "prompt": "Can you explain the symbol <equation>v_{\\pi}(s^{'})</equation> in Bellman Equation?"}
{"completion": " The equation for Language Modelling is <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>. The meaning of symbol <equation>(x_{1}, x_{2}, ..., x_{n})</equation> in Language Modelling is: Language modeling is usually framed as unsupervised distribution estimation from a set of examples of.", "tag": "machine learning", "prompt": "Help me find the meaning of symbol <equation>(x_{1}, x_{2}, ..., x_{n})</equation> in Language Modelling?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>v_{\\pi}(s^{'})</equation> in Bellman Equation is: Value at state s^{'} in policy \\pi.", "tag": "machine learning", "prompt": "What does the symbol mean <equation>v_{\\pi}(s^{'})</equation> in Bellman Equation?"}
{"completion": " The equation for S-Learner is: <equation>\\\\mu(x,w)=\\\\mathbb{E}[Y_{i}|X=x_{i},W=w] \\\\\\\\\\ \\\\hat{\\\\tau}(x)=\\\\hat{\\\\mu}(x,1)-\\\\hat{\\\\mu}(x,0)</equation>. Here is the brief introduction of S-Learner:\n S-Learner use a single machine learning estimator \\mu(x,w) to estimate outcome Y directly. And the treatment assigment variable W=0,1 is treated as features of S-learner models. The CATE estimation is calculated as the difference between two outputs given the same model \\mu and different inputs features of W, namely w=1 and w=0.\n ", "tag": "machine learning,causual inference", "prompt": "Can you explain S-Learner?"}
{"completion": " The equation for Support Vector Machine SVM is: <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. Here is the brief introduction of Support Vector Machine SVM:\n Latex code for Support Vector Machine (SVM).\n \n: Dual problem Lagrangian Relaxation\n: Weight of Linear Classifier\n: Classifier\n: Decision Boundary\n\nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning,svm", "prompt": "Please give me detailed tutorial of Support Vector Machine SVM?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. The meaning of symbol <equation>JS(P||Q)</equation> in Jensen-Shannon Divergence JS-Divergence is: JS Divergence between P and Q, which is the symmetric divergence metric between distribution P and Q.", "tag": "machine learning", "prompt": "Can you explain the symbol <equation>JS(P||Q)</equation> in Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The equation for Entire Space Multi-Task Model ESSM is: <equation>L(\\\\theta_{cvr},\\\\theta_{ctr})=\\\\sum^{N}_{i=1}l(y_{i},f(x_{i};\\\\theta_{ctr}))+\\\\sum^{N}_{i=1}l(y_{i}\\\\&z_{i},f(x_{i};\\\\theta_{ctr}) \\\\times f(x_{i};\\\\theta_{cvr}))\\ </equation>. Here is the brief introduction of Entire Space Multi-Task Model ESSM:\n ESSM model uses two separate towers to model pCTR prediction task and pCTCVR prediction task simultaneously. \n Related Documents\n\nSee below link of paper Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate for more details. \n\nRelated Videos\n\n\n", "tag": "machine learning,multi task", "prompt": "I am interested in Entire Space Multi-Task Model ESSM?"}
{"completion": " The Latex Code for equation Diffusion Model Forward Processis <equation>q(x_{t}|x_{t-1})=\\\\mathcal{N}(x_{t};\\\\sqrt{1-\\\\beta_{t}}x_{t-1},\\\\beta_{t}I) \\\\\\\\q(x_{1:T}|x_{0})=\\\\prod_{t=1}^{T}q(x_{t}|x_{t-1})</equation>", "tag": "machine learning,diffusion", "prompt": "How to write equation Diffusion Model Forward Process?"}
{"completion": " The related materials of Proximal Policy Optimization PPO is: ", "tag": "machine learning", "prompt": "Can you help me find related materials of Proximal Policy Optimization PPO?"}
{"completion": " The Latex Code for equation QINIis <equation>g(t)=Y^{T}_{t}-\\\\frac{Y^{C}_{t}N^{T}_{t}}{N^{C}_{t}},\\\\\\\\\\ f(t)=g(t) \\\\times \\\\frac{N^{T}_{t}+N^{C}_{t}}{N^{T}_{t}}</equation>", "tag": "machine learning,causual inference", "prompt": "Show me the latex code of equation QINI?"}
{"completion": " The Latex Code for equation Jensen-Shannon Divergence JS-Divergenceis <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>", "tag": "machine learning", "prompt": "What is the equation of Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The equation for Average Treatment Effect ATE is: <equation>\\\\text{ATE}:=\\\\mathbb{E}[Y(1)-Y(0)]</equation>. Here is the brief introduction of Average Treatment Effect ATE:\n Average Treatment Effect(ATE) is defined as the expectation of the difference between the treatment group Y(1) and control group Y(0)\n ", "tag": "machine learning,causual inference", "prompt": "What's the meaning of Average Treatment Effect ATE?"}
{"completion": " The Latex Code for equation Diffusion Model Variational Lower Bound Lossis <equation>\\\\begin{aligned}\\ L_\\\\text{VLB} &= L_T + L_{T-1} + \\\\dots + L_0 \\\\\\\\\\ \\\\text{where } L_T &= D_\\\\text{KL}(q(\\\\mathbf{x}_T \\\\vert \\\\mathbf{x}_0) \\\\parallel p_\\\\theta(\\\\mathbf{x}_T)) \\\\\\\\\\ L_t &= D_\\\\text{KL}(q(\\\\mathbf{x}_t \\\\vert \\\\mathbf{x}_{t+1}, \\\\mathbf{x}_0) \\\\parallel p_\\\\theta(\\\\mathbf{x}_t \\\\vert\\\\mathbf{x}_{t+1})) \\\\text{ for }1 \\\\leq t \\\\leq T-1 \\\\\\\\\\ L_0 &= - \\\\log p_\\\\theta(\\\\mathbf{x}_0 \\\\vert \\\\mathbf{x}_1)\\ \\\\end{aligned}</equation>", "tag": "machine learning,diffusion", "prompt": "What is the latex code of equation Diffusion Model Variational Lower Bound Loss?"}
{"completion": " The equation for Kullback-Leibler Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>P(x)</equation> in Kullback-Leibler Divergence is: Distribution of P(x) over x.", "tag": "machine learning,kl divergence", "prompt": "Help me find the meaning of symbol <equation>P(x)</equation> in Kullback-Leibler Divergence?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. The meaning of symbol <equation>Q(x)</equation> in Jensen-Shannon Divergence JS-Divergence is: Distribution of Q(x) over x.", "tag": "machine learning", "prompt": "Help me find the meaning of symbol <equation>Q(x)</equation> in Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The Latex Code for equation S-Learneris <equation>\\\\mu(x,w)=\\\\mathbb{E}[Y_{i}|X=x_{i},W=w] \\\\\\\\\\ \\\\hat{\\\\tau}(x)=\\\\hat{\\\\mu}(x,1)-\\\\hat{\\\\mu}(x,0)</equation>", "tag": "machine learning,causual inference", "prompt": "Please give me some hints on equation S-Learner?"}
{"completion": " The equation for Entire Space Multi-Task Model ESSM is: <equation>L(\\\\theta_{cvr},\\\\theta_{ctr})=\\\\sum^{N}_{i=1}l(y_{i},f(x_{i};\\\\theta_{ctr}))+\\\\sum^{N}_{i=1}l(y_{i}\\\\&z_{i},f(x_{i};\\\\theta_{ctr}) \\\\times f(x_{i};\\\\theta_{cvr}))\\ </equation>. Here is the brief introduction of Entire Space Multi-Task Model ESSM:\n ESSM model uses two separate towers to model pCTR prediction task and pCTCVR prediction task simultaneously. \n Related Documents\n\nSee below link of paper Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate for more details. \n\nRelated Videos\n\n\n", "tag": "machine learning,multi task", "prompt": "What's the meaning of Entire Space Multi-Task Model ESSM?"}
{"completion": " The Latex Code for equation QINIis <equation>g(t)=Y^{T}_{t}-\\\\frac{Y^{C}_{t}N^{T}_{t}}{N^{C}_{t}},\\\\\\\\\\ f(t)=g(t) \\\\times \\\\frac{N^{T}_{t}+N^{C}_{t}}{N^{T}_{t}}</equation>", "tag": "machine learning,causual inference", "prompt": "Can you write equation QINI?"}
{"completion": " The Latex Code for equation Deep Kernel Learningis <equation>k(x_{i},x_{j}|\\\\phi)=k(h(x_i,w_k),h(x_j,w_k)|w_k,\\\\phi)</equation>", "tag": "machine learning,Deep Kernel Learning", "prompt": "What is the equation of Deep Kernel Learning?"}
{"completion": " The equation for Support Vector Machine SVM is <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. The meaning of symbol <equation>L(w,b,\\alpha)</equation> in Support Vector Machine SVM is: Dual problem Lagrangian Relaxation.", "tag": "machine learning,svm", "prompt": "What is the meaning of the symbol <equation>L(w,b,\\alpha)</equation> in Support Vector Machine SVM?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is: <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. Here is the brief introduction of Jensen-Shannon Divergence JS-Divergence:\n Latex code for the Jensen-Shannon Divergence(JS-Divergence). I will briefly introduce the notations in this formulation. \n \n: KL Divergence between P and Q\n: JS Divergence between P and Q, which is the symmetric divergence metric between distribution P and Q\n: Distribution of P(x) over x\n: Distribution of Q(x) over x\n\n\nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning", "prompt": "I am interested in Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The related materials of RotatE is: 1.\">RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space:<url>https://openreview.net/forum?id=HkgEQnRqYQ</url>;", "tag": "machine learning,KG", "prompt": "Can you help me find related materials of RotatE?"}
{"completion": " The equation for KL-Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>KL(P||Q)</equation> in KL-Divergence is: KL Divergence between P and Q.", "tag": "machine learning", "prompt": "Help me find the meaning of symbol <equation>KL(P||Q)</equation> in KL-Divergence?"}
{"completion": " The equation for Language Modelling is: <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>. Here is the brief introduction of Language Modelling:\n\n: Language modeling is usually framed as unsupervised distribution estimation from a set of examples of \n\n: Variable length sequences of symbols\n\n : Factorize the joint probabilities over symbols p(x) as the product of conditional probabilities \n\n", "tag": "machine learning", "prompt": "What's the meaning of Language Modelling?"}
{"completion": " The related materials of Diffusion Model Forward Process Reparameterization is: 1.Weng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.:<url>https://lilianweng.github.io/posts/2021-07-11-diffusion-models/</url>;2.Latex Code for Diffusion Models Equations:<url>http://deepnlp.org/blog/latex-code-for-diffusion-models-equations</url>;", "tag": "machine learning,diffusion", "prompt": "Can you help me find related materials of Diffusion Model Forward Process Reparameterization?"}
{"completion": " The equation for Graph Laplacian is: <equation>L=I_{N}-D^{-\\\\frac{1}{2}}AD^{-\\\\frac{1}{2}} \\\\\\\\\\ L=U\\\\Lambda U^{T}</equation>. Here is the brief introduction of Graph Laplacian:\n\n Graph Laplacian matrix equals to the identity matrix I minus the matrix multiplication of three parts, the (-1/2) power of the degree matrix D, the adjacency matrix A, and (-1/2) power of degree matrix D. U is the eigenvectors of the normalized graph Laplacian L. The graph laplacian come from the graph Fourier transform F. The original signal x is first transformed to domain F(X) and inverse resulted signal is transformed back using the inverse graph Fourier transform F^{-1}.\n \nRelated Documents\n\n\">Graphs and Graph Laplacians- Upenn Tutorial\n\">Graph Laplacian matrix- MathWorks\n\nRelated Videos\n\n\n", "tag": "machine learning,graph,GNN", "prompt": "I am interested in Graph Laplacian?"}
{"completion": " The Latex Code for equation Diffusion Model Variational Lower Bound Lossis <equation>\\\\begin{aligned}\\ L_\\\\text{VLB} &= L_T + L_{T-1} + \\\\dots + L_0 \\\\\\\\\\ \\\\text{where } L_T &= D_\\\\text{KL}(q(\\\\mathbf{x}_T \\\\vert \\\\mathbf{x}_0) \\\\parallel p_\\\\theta(\\\\mathbf{x}_T)) \\\\\\\\\\ L_t &= D_\\\\text{KL}(q(\\\\mathbf{x}_t \\\\vert \\\\mathbf{x}_{t+1}, \\\\mathbf{x}_0) \\\\parallel p_\\\\theta(\\\\mathbf{x}_t \\\\vert\\\\mathbf{x}_{t+1})) \\\\text{ for }1 \\\\leq t \\\\leq T-1 \\\\\\\\\\ L_0 &= - \\\\log p_\\\\theta(\\\\mathbf{x}_0 \\\\vert \\\\mathbf{x}_1)\\ \\\\end{aligned}</equation>", "tag": "machine learning,diffusion", "prompt": "How to write equation Diffusion Model Variational Lower Bound Loss?"}
{"completion": " The Latex Code for equation TransHis <equation>f_{r}(h,t) =||h_{\\\\perp} + d_{r} - t_{\\\\perp} ||^{2}_{2}=||(h - w_{r}hw_{r}) + d_{r} - (t - w_{r}tw_{r}) ||^{2}_{2}</equation>", "tag": "machine learning,KG", "prompt": "What is the equation of TransH?"}
{"completion": " The Latex Code for equation Generative Adversarial Networks GANis <equation>\\\\min_{G} \\\\max_{D} V(D,G)=\\\\mathbb{E}_{x \\\\sim p_{data}(x)}[\\\\log D(x)]+\\\\mathbb{E}_{z \\\\sim p_{z}(z)}[\\\\log(1-D(G(z)))]</equation>", "tag": "machine learning,gan", "prompt": "What is the latex code of equation Generative Adversarial Networks GAN?"}
{"completion": " The related materials of Graph Convolutional Networks GCN is: 1.\">Semi-supervised classification with graph convolutional\nnetworks:<url>https://arxiv.org/abs/1609.02907v3</url>;", "tag": "machine learning,graph,GNN", "prompt": "Can you help me find related materials of Graph Convolutional Networks GCN?"}
{"completion": " The Latex Code for equation Average Treatment Effect ATEis <equation>\\\\text{ATE}:=\\\\mathbb{E}[Y(1)-Y(0)]</equation>", "tag": "machine learning,causual inference", "prompt": "What is the equation of Average Treatment Effect ATE?"}
{"completion": " The equation for Diffusion Model Variational Lower Bound is: <equation>\\\\begin{aligned}\\ - \\\\log p_\\\\theta(\\\\mathbf{x}_0) \\ &\\\\leq - \\\\log p_\\\\theta(\\\\mathbf{x}_0) + D_\\\\text{KL}(q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0) \\\\| p_\\\\theta(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0) ) \\\\\\\\\\ &= -\\\\log p_\\\\theta(\\\\mathbf{x}_0) + \\\\mathbb{E}_{\\\\mathbf{x}_{1:T}\\\\sim q(\\\\mathbf{x}_{1:T} \\\\vert \\\\mathbf{x}_0)} \\\\Big[ \\\\log\\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T}) / p_\\\\theta(\\\\mathbf{x}_0)} \\\\Big] \\\\\\\\\\ &= -\\\\log p_\\\\theta(\\\\mathbf{x}_0) + \\\\mathbb{E}_q \\\\Big[ \\\\log\\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} + \\\\log p_\\\\theta(\\\\mathbf{x}_0) \\\\Big] \\\\\\\\\\ &= \\\\mathbb{E}_q \\\\Big[ \\\\log \\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} \\\\Big] \\\\\\\\\\ \\\\text{Let }L_\\\\text{VLB} \\ &= \\\\mathbb{E}_{q(\\\\mathbf{x}_{0:T})} \\\\Big[ \\\\log \\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} \\\\Big] \\\\geq - \\\\mathbb{E}_{q(\\\\mathbf{x}_0)} \\\\log p_\\\\theta(\\\\mathbf{x}_0)\\ \\\\end{aligned}</equation>. Here is the brief introduction of Diffusion Model Variational Lower Bound:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "What's the meaning of Diffusion Model Variational Lower Bound?"}
{"completion": " The equation for Wasserstein Distance Optimal Transport is <equation>W_{p}(P,Q)=(\\\\inf_{J \\\\in J(P,Q)} \\\\int{||x-y||^{p}dJ(X,Y)})^\\\\frac{1}{p}</equation>. The meaning of symbol <equation>||x-y||^{p}</equation> in Wasserstein Distance Optimal Transport is: Power p of distance moving distributions P towards Q.", "tag": "machine learning,wasserstein", "prompt": "What does the symbol mean <equation>||x-y||^{p}</equation> in Wasserstein Distance Optimal Transport?"}
{"completion": " The equation for X-Learner is: <equation>\\\\tilde{D}^{1}_{i}:=Y^{1}_{i}-\\\\hat{\\\\mu}_{0}(X^{1}_{i}),\\\\tilde{D}^{0}_{i}:=\\\\hat{\\\\mu}_{1}(X^{0}_{i})-Y^{0}_{i}\\\\\\\\\\ \\\\hat{\\\\tau}(x)=g(x)\\\\hat{\\\\tau}_{0}(x) + (1-g(x))\\\\hat{\\\\tau}_{1}(x)</equation>. Here is the brief introduction of X-Learner:\n See this paper for more details of X-learner Metalearners for estimating heterogeneous treatment effects using machine learning\n", "tag": "machine learning,causual inference", "prompt": "I am interested in X-Learner?"}
{"completion": " The Latex Code for equation Unconfoundedness Assumptionis <equation>\\\\{Y_{i}(0),Y_{i}(1)\\\\}\\\\perp W_{i}|X_{i}</equation>", "tag": "machine learning,causual inference", "prompt": "What is the latex code of equation Unconfoundedness Assumption?"}
{"completion": " The equation for Bound on Target Domain Error is: <equation>\\\\epsilon_{T}(h) \\\\le \\\\hat{\\\\epsilon}_{S}(h) + \\\\sqrt{\\\\frac{4}{m}(d \\\\log \\\\frac{2em}{d} + \\\\log \\\\frac{4}{\\\\delta })} + d_{\\\\mathcal{H}}(\\\\tilde{\\\\mathcal{D}}_{S}, \\\\tilde{\\\\mathcal{D}}_{T}) + \\\\lambda \\\\\\\\\\ \\\\lambda = \\\\lambda_{S} + \\\\lambda_{T}</equation>. Here is the brief introduction of Bound on Target Domain Error:\n\n I will explain this equation in more details. Domain adaptation literatures prove that the test error on target domain \\epsilon_{T}(h) is bounded by three terms: 1. the empirical estimate of training errors on the source domain \\hat{\\epsilon}_{S}(h); 2. the distance divergence between source domain and target domain d(Ds, Dt), 3. Fixed term of VC-Dimension(d), sample size of source domain m, e as the natural logarithm. \\lambda denotes a fixed term as the sum of \\lambda_{S} and \\lambda_{T}, which represent the errors of models training on Ds and Dt respectively. From the above analysis, we can see that if data source Ds and Dt are similar(the divergence between source and target domain distribution Ds and Dt is small), the error on target domain will also be bounded, that's how models trained on source domain will perform better on similar distributed target domains. \n You can check more detailed information in this NIPS 2006 paper by Shai Ben-David for more details.\n \nRelated Documents\n\nAnalysis of Representations for Domain Adaptation\n\nRelated Videos\n\n\n", "tag": "machine learning,transfer learning", "prompt": "Please give me detailed tutorial of Bound on Target Domain Error?"}
{"completion": " The equation for Hidden Markov Model is: <equation>Q=\\\\{q_{1},q_{2},...,q_{N}\\\\}, V=\\\\{v_{1},v_{2},...,v_{M}\\\\} \\\\\\\\\\ I=\\\\{i_{1},i_{2},...,i_{T}\\\\},O=\\\\{o_{1},o_{2},...,o_{T}\\\\} \\\\\\\\\\ A=[a_{ij}]_{N \\\\times N}, a_{ij}=P(i_{t+1}=q_{j}|i_{t}=q_{i}) \\\\\\\\\\ B=[b_{j}(k)]_{N \\\\times M},b_{j}(k)=P(o_{t}=v_{k}|i_{t}=q_{j})</equation>. Here is the brief introduction of Hidden Markov Model:\n Q denotes the set of states and V denotes the set of obvervations. Let's assume we have state sequence I of length T, and observation sequence O of length T, Hidden Markov Model(HMM) use transition matrix A to denote the transition probability a_{ij} and matrix B to denote observation probability matrix b_jk.\n ", "tag": "machine learning,nlp", "prompt": "Can you give me a brief introduction to Hidden Markov Model?"}
{"completion": " The Latex Code for equation Diffusion Model Forward Process Reparameterizationis <equation>x_{t}=\\\\sqrt{\\\\alpha_{t}}x_{t-1}+\\\\sqrt{1-\\\\alpha_{t}} \\\\epsilon_{t-1}\\\\\\\\=\\\\sqrt{\\\\alpha_{t}\\\\alpha_{t-1}}x_{t-2} + \\\\sqrt{1-\\\\alpha_{t}\\\\alpha_{t-1}} \\\\bar{\\\\epsilon}_{t-2}\\\\\\\\=\\\\text{...}\\\\\\\\=\\\\sqrt{\\\\bar{\\\\alpha}_{t}}x_{0}+\\\\sqrt{1-\\\\bar{\\\\alpha}_{t}}\\\\epsilon \\\\\\\\\\\\alpha_{t}=1-\\\\beta_{t}, \\\\bar{\\\\alpha}_{t}=\\\\prod_{t=1}^{T}\\\\alpha_{t}</equation>", "tag": "machine learning,diffusion", "prompt": "Can you write equation Diffusion Model Forward Process Reparameterization?"}
{"completion": " The Latex Code for equation Bregman Divergencesis <equation>d_{\\\\phi}(z,z^{'})=\\\\phi(z) - \\\\phi(z^{'})-(z-z^{'})^{T} \\\\nabla \\\\phi(z^{'})</equation>", "tag": "machine learning", "prompt": "How to write equation Bregman Divergences?"}
{"completion": " The Latex Code for equation Mahalanobis Distanceis <equation>D_{M}(x,y)=\\\\sqrt{(x-y)^{T}\\\\Sigma^{-1}(x-y)}</equation>", "tag": "machine learning,mahalanobis", "prompt": "What is the equation of Mahalanobis Distance?"}
{"completion": " The Latex Code for equation Maximum Mean Discrepancy MMDis <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>", "tag": "machine learning,mmd", "prompt": "What is the latex code of equation Maximum Mean Discrepancy MMD?"}
{"completion": " The Latex Code for equation TransEis <equation>\\\\mathcal{L}=\\\\sum_{(h,r,t) \\\\in S} \\\\sum_{(h^{'},r^{'},t^{'}) \\\\in S^{'}_{(h,r,t)}} \\\\[ \\\\gamma + d(h + r, t) - d(h^{'} + r^{'}, t^{'}) \\\\]_{+} \\\\\\\\ S^{'}_{(h,r,t)}=\\\\{(h^{'},r,t)|h^{'} \\\\in E\\\\} \\\\cup \\\\{(h,r,t^{'})|t^{'} \\\\in E\\\\} \\\\\\\\ d(h + r, t)=||h + r - t||^{2}_{2}</equation>", "tag": "machine learning", "prompt": "How to write equation TransE?"}
{"completion": " The equation for Kullback-Leibler Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>Q(x)</equation> in Kullback-Leibler Divergence is: Distribution of Q(x) over x.", "tag": "machine learning,kl divergence", "prompt": "What is the meaning of the symbol <equation>Q(x)</equation> in Kullback-Leibler Divergence?"}
{"completion": " The Latex Code for equation Kullback-Leibler Divergenceis <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>", "tag": "machine learning,kl divergence", "prompt": "How to write equation Kullback-Leibler Divergence?"}
{"completion": " The equation for Propensity Score is: <equation>e := p(W=1|X=x)</equation>. Here is the brief introduction of Propensity Score:\n The propensity score is defined as the degree of propensity or likelihood that instance i is assigned to treatment group W=1.\n ", "tag": "machine learning,causual inference", "prompt": "Can you give me a brief introduction to Propensity Score?"}
{"completion": " The Latex Code for equation Support Vector Machine SVMis <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>", "tag": "machine learning,svm", "prompt": "Can you write equation Support Vector Machine SVM?"}
{"completion": " The Latex Code for equation Cross-Stitch Networkis <equation>\\\\begin{bmatrix} \\\\tilde{x}^{ij}_{A}\\\\\\\\\\\\tilde{x}^{ij}_{B}\\\\end{bmatrix}=\\\\begin{bmatrix} a_{AA} & a_{AB}\\\\\\\\ a_{BA} & a_{BB} \\\\end{bmatrix}\\\\begin{bmatrix} x^{ij}_{A}\\\\\\\\ x^{ij}_{B} \\\\end{bmatrix}</equation>", "tag": "machine learning,multi task", "prompt": "What is the latex code of equation Cross-Stitch Network?"}
{"completion": " The equation for Individual Treatment Effect ITE is: <equation>\\\\text{ITE}_{i}:=Y_{i}(1)-Y_{i}(0)</equation>. Here is the brief introduction of Individual Treatment Effect ITE:\n Individual Treatment Effect(ITE) is defined as the difference between the outcome of treatment group Y_i(1) over the outcome of control group Y_i(0) of the same instance i. There exists a fundamental problem that we can't observe Y_i(1) and Y_i(0) at the same time because each instance item i can only be assigned to one experiment of control group or treatment group, but never both. So we can't observe the individual treatment effect(ITE) directly for each instance i. \n ", "tag": "machine learning,causual inference", "prompt": "Can you explain Individual Treatment Effect ITE?"}
{"completion": " The Latex Code for equation Support Vector Machine SVMis <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>", "tag": "machine learning,svm", "prompt": "Please give me some hints on equation Support Vector Machine SVM?"}
{"completion": " The equation for Support Vector Machine SVM is <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. The meaning of symbol <equation>w</equation> in Support Vector Machine SVM is: Weight of Linear Classifier.", "tag": "machine learning,svm", "prompt": "Help me find the meaning of symbol <equation>w</equation> in Support Vector Machine SVM?"}
{"completion": " The equation for Kullback-Leibler Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>Q(x)</equation> in Kullback-Leibler Divergence is: Distribution of Q(x) over x.", "tag": "machine learning,kl divergence", "prompt": "Help me find the meaning of symbol <equation>Q(x)</equation> in Kullback-Leibler Divergence?"}
{"completion": " The Latex Code for equation Hidden Markov Modelis <equation>Q=\\\\{q_{1},q_{2},...,q_{N}\\\\}, V=\\\\{v_{1},v_{2},...,v_{M}\\\\} \\\\\\\\\\ I=\\\\{i_{1},i_{2},...,i_{T}\\\\},O=\\\\{o_{1},o_{2},...,o_{T}\\\\} \\\\\\\\\\ A=[a_{ij}]_{N \\\\times N}, a_{ij}=P(i_{t+1}=q_{j}|i_{t}=q_{i}) \\\\\\\\\\ B=[b_{j}(k)]_{N \\\\times M},b_{j}(k)=P(o_{t}=v_{k}|i_{t}=q_{j})</equation>", "tag": "machine learning,nlp", "prompt": "What is the equation of Hidden Markov Model?"}
{"completion": " The equation for Language Modelling is <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>. The meaning of symbol <equation>(x_{1}, x_{2}, ..., x_{n})</equation> in Language Modelling is: Language modeling is usually framed as unsupervised distribution estimation from a set of examples of.", "tag": "machine learning", "prompt": "What does the symbol mean <equation>(x_{1}, x_{2}, ..., x_{n})</equation> in Language Modelling?"}
{"completion": " The Latex Code for equation Multi-Gate Mixture of Experts MMoEis <equation>g^{k}(x)=\\\\text{softmax}(W_{gk}x) \\\\\\\\\\ f^{k}(x)=\\\\sum^{n}_{i=1}g^{k}(x)_{i}f_{i}(x) \\\\\\\\\\ y_{k}=h^{k}(f^{k}(x))</equation>", "tag": "machine learning,multi task", "prompt": "What is the latex code of equation Multi-Gate Mixture of Experts MMoE?"}
{"completion": " The equation for Individual Treatment Effect ITE is: <equation>\\\\text{ITE}_{i}:=Y_{i}(1)-Y_{i}(0)</equation>. Here is the brief introduction of Individual Treatment Effect ITE:\n Individual Treatment Effect(ITE) is defined as the difference between the outcome of treatment group Y_i(1) over the outcome of control group Y_i(0) of the same instance i. There exists a fundamental problem that we can't observe Y_i(1) and Y_i(0) at the same time because each instance item i can only be assigned to one experiment of control group or treatment group, but never both. So we can't observe the individual treatment effect(ITE) directly for each instance i. \n ", "tag": "machine learning,causual inference", "prompt": "I am interested in Individual Treatment Effect ITE?"}
{"completion": " The equation for Kullback-Leibler Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>KL(P||Q)</equation> in Kullback-Leibler Divergence is: KL Divergence between P and Q.", "tag": "machine learning,kl divergence", "prompt": "Can you explain the symbol <equation>KL(P||Q)</equation> in Kullback-Leibler Divergence?"}
{"completion": " The Latex Code for equation Support Vector Machine SVMis <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>", "tag": "machine learning,svm", "prompt": "What is the latex code of equation Support Vector Machine SVM?"}
{"completion": " The equation for Individual Treatment Effect ITE is: <equation>\\\\text{ITE}_{i}:=Y_{i}(1)-Y_{i}(0)</equation>. Here is the brief introduction of Individual Treatment Effect ITE:\n Individual Treatment Effect(ITE) is defined as the difference between the outcome of treatment group Y_i(1) over the outcome of control group Y_i(0) of the same instance i. There exists a fundamental problem that we can't observe Y_i(1) and Y_i(0) at the same time because each instance item i can only be assigned to one experiment of control group or treatment group, but never both. So we can't observe the individual treatment effect(ITE) directly for each instance i. \n ", "tag": "machine learning,causual inference", "prompt": "What's the meaning of Individual Treatment Effect ITE?"}
{"completion": " The equation for Average Treatment Effect ATE is: <equation>\\\\text{ATE}:=\\\\mathbb{E}[Y(1)-Y(0)]</equation>. Here is the brief introduction of Average Treatment Effect ATE:\n Average Treatment Effect(ATE) is defined as the expectation of the difference between the treatment group Y(1) and control group Y(0)\n ", "tag": "machine learning,causual inference", "prompt": "Can you explain Average Treatment Effect ATE?"}
{"completion": " The equation for Wasserstein Distance Optimal Transport is <equation>W_{p}(P,Q)=(\\\\inf_{J \\\\in J(P,Q)} \\\\int{||x-y||^{p}dJ(X,Y)})^\\\\frac{1}{p}</equation>. The meaning of symbol <equation>||x-y||^{p}</equation> in Wasserstein Distance Optimal Transport is: Power p of distance moving distributions P towards Q.", "tag": "machine learning,wasserstein", "prompt": "Help me find the meaning of symbol <equation>||x-y||^{p}</equation> in Wasserstein Distance Optimal Transport?"}
{"completion": " The Latex Code for equation Maximum Mean Discrepancy MMDis <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>", "tag": "machine learning,mmd", "prompt": "How to write equation Maximum Mean Discrepancy MMD?"}
{"completion": " The related materials of Language Modelling is: 1.:<url></url>;", "tag": "machine learning", "prompt": "Can you help me find related materials of Language Modelling?"}
{"completion": " The equation for Diffusion Model Forward Process Reparameterization is: <equation>x_{t}=\\\\sqrt{\\\\alpha_{t}}x_{t-1}+\\\\sqrt{1-\\\\alpha_{t}} \\\\epsilon_{t-1}\\\\\\\\=\\\\sqrt{\\\\alpha_{t}\\\\alpha_{t-1}}x_{t-2} + \\\\sqrt{1-\\\\alpha_{t}\\\\alpha_{t-1}} \\\\bar{\\\\epsilon}_{t-2}\\\\\\\\=\\\\text{...}\\\\\\\\=\\\\sqrt{\\\\bar{\\\\alpha}_{t}}x_{0}+\\\\sqrt{1-\\\\bar{\\\\alpha}_{t}}\\\\epsilon \\\\\\\\\\\\alpha_{t}=1-\\\\beta_{t}, \\\\bar{\\\\alpha}_{t}=\\\\prod_{t=1}^{T}\\\\alpha_{t}</equation>. Here is the brief introduction of Diffusion Model Forward Process Reparameterization:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "What's the meaning of Diffusion Model Forward Process Reparameterization?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>\\pi(a|s)</equation> in Bellman Equation is: Probability of choosing action a given state s.", "tag": "machine learning", "prompt": "What is the meaning of the symbol <equation>\\pi(a|s)</equation> in Bellman Equation?"}
{"completion": " The Latex Code for equation Progressive Layered Extraction PLEis <equation>g^{k}(x)=w^{k}(x)S^{k}(x) \\\\\\\\\\ w^{k}(x)=\\\\text{softmax}(W^{k}_{g}x) \\\\\\\\\\ S^{k}(x)=\\\\[E^{T}_{(k,1)},E^{T}_{(k,2)},...,E^{T}_{(k,m_{k})},E^{T}_{(s,1)},E^{T}_{(s,2)},...,E^{T}_{(s,m_{s})}\\\\]^{T} \\\\\\\\\\ y^{k}(x)=t^{k}(g^{k}(x)) \\\\\\\\\\ g^{k,j}(x)=w^{k,j}(g^{k,j-1}(x))S^{k,j}(x) </equation>", "tag": "machine learning,multi task", "prompt": "Can you write equation Progressive Layered Extraction PLE?"}
{"completion": " The Latex Code for equation KL-Divergenceis <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>", "tag": "machine learning", "prompt": "Can you write equation KL-Divergence?"}
{"completion": " The Latex Code for equation Bellman Equationis <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>", "tag": "machine learning", "prompt": "How to write equation Bellman Equation?"}
{"completion": " The equation for Multi-Gate Mixture of Experts MMoE is: <equation>g^{k}(x)=\\\\text{softmax}(W_{gk}x) \\\\\\\\\\ f^{k}(x)=\\\\sum^{n}_{i=1}g^{k}(x)_{i}f_{i}(x) \\\\\\\\\\ y_{k}=h^{k}(f^{k}(x))</equation>. Here is the brief introduction of Multi-Gate Mixture of Experts MMoE:\n\n Multi-Gate Mixture of Experts (MMoE) model is firstly introduced in KDD2018 paper Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts. The model introduce a MMoE layer to model the relationship of K multiple tasks using N experts. Let's assume input feature X has dimension D. There are K output tasks and N experts networks. The gating network is calculated as, g^{k}(x) is a N-dimensional vector indicating the softmax result of relative weights, W_{gk} is a trainable matrix with size R^{ND}. And f^{k}(x) is the weghted sum representation of output of N experts for task k. f_{i}(x) is the output of the i-th expert, and f^{k}(x) indicates the representation of k-th tasks as the summation of N experts.\n \nRelated Documents\n\nSee paper Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts for details.\n\nRelated Videos\n\n\n", "tag": "machine learning,multi task", "prompt": "Can you give me a brief introduction to Multi-Gate Mixture of Experts MMoE?"}
{"completion": " The equation for Language Modelling is: <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>. Here is the brief introduction of Language Modelling:\n\n: Language modeling is usually framed as unsupervised distribution estimation from a set of examples of \n\n: Variable length sequences of symbols\n\n : Factorize the joint probabilities over symbols p(x) as the product of conditional probabilities \n\n", "tag": "machine learning", "prompt": "Please give me detailed tutorial of Language Modelling?"}
{"completion": " The equation for Hidden Markov Model is: <equation>Q=\\\\{q_{1},q_{2},...,q_{N}\\\\}, V=\\\\{v_{1},v_{2},...,v_{M}\\\\} \\\\\\\\\\ I=\\\\{i_{1},i_{2},...,i_{T}\\\\},O=\\\\{o_{1},o_{2},...,o_{T}\\\\} \\\\\\\\\\ A=[a_{ij}]_{N \\\\times N}, a_{ij}=P(i_{t+1}=q_{j}|i_{t}=q_{i}) \\\\\\\\\\ B=[b_{j}(k)]_{N \\\\times M},b_{j}(k)=P(o_{t}=v_{k}|i_{t}=q_{j})</equation>. Here is the brief introduction of Hidden Markov Model:\n Q denotes the set of states and V denotes the set of obvervations. Let's assume we have state sequence I of length T, and observation sequence O of length T, Hidden Markov Model(HMM) use transition matrix A to denote the transition probability a_{ij} and matrix B to denote observation probability matrix b_jk.\n ", "tag": "machine learning,nlp", "prompt": "Please give me detailed tutorial of Hidden Markov Model?"}
{"completion": " The related materials of Conditional Average Treatment Effect CATE is: ", "tag": "machine learning,causual inference", "prompt": "Can you help me find related materials of Conditional Average Treatment Effect CATE?"}
{"completion": " The Latex Code for equation Average Treatment Effect ATEis <equation>\\\\text{ATE}:=\\\\mathbb{E}[Y(1)-Y(0)]</equation>", "tag": "machine learning,causual inference", "prompt": "Show me the latex code of equation Average Treatment Effect ATE?"}
{"completion": " The equation for Mahalanobis Distance is: <equation>D_{M}(x,y)=\\\\sqrt{(x-y)^{T}\\\\Sigma^{-1}(x-y)}</equation>. Here is the brief introduction of Mahalanobis Distance:\n Mahalanobis Distance is a distance measure between a data point and dataset of a distribution. See website for more details Mahalanobis Distance\n\nRelated Documents\n\nMahalanobis Distance\n\nRelated Videos\n\n\n", "tag": "machine learning,mahalanobis", "prompt": "Can you explain Mahalanobis Distance?"}
{"completion": " The equation for Support Vector Machine SVM is <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. The meaning of symbol <equation>wx+b=0</equation> in Support Vector Machine SVM is: Decision Boundary.", "tag": "machine learning,svm", "prompt": "Can you explain the symbol <equation>wx+b=0</equation> in Support Vector Machine SVM?"}
{"completion": " The equation for Proximal Policy Optimization PPO is: <equation>L^{CLIP}(\\\\theta)=E_{t}[\\\\min(r_{t}(\\\\theta))A_{t}, \\\\text{clip}(r_{t}(\\\\theta), 1-\\\\epsilon,1+\\\\epsilon)A_{t}]</equation>. Here is the brief introduction of Proximal Policy Optimization PPO:\n With supervised learning, we can easily implement the cost function, run gradient descent on it, and be very confident that we\u2019ll get excellent results with relatively little hyperparameter tuning. The route to success in reinforcement learning isn\u2019t as obvious\u2014the algorithms have many moving parts that are hard to debug, and they require substantial effort in tuning in order to get good results. PPO strikes a balance between ease of implementation, sample complexity, and ease of tuning, trying to compute an update at each step that minimizes the cost function while ensuring the deviation from the previous policy is relatively small. https://openai.com/research/openai-baselines-ppo\n", "tag": "machine learning", "prompt": "Can you give me a brief introduction to Proximal Policy Optimization PPO?"}
{"completion": " The equation for Diffusion Model Forward Process is: <equation>q(x_{t}|x_{t-1})=\\\\mathcal{N}(x_{t};\\\\sqrt{1-\\\\beta_{t}}x_{t-1},\\\\beta_{t}I) \\\\\\\\q(x_{1:T}|x_{0})=\\\\prod_{t=1}^{T}q(x_{t}|x_{t-1})</equation>. Here is the brief introduction of Diffusion Model Forward Process:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "Can you give me a brief introduction to Diffusion Model Forward Process?"}
{"completion": " The Latex Code for equation Generative Adversarial Networks GANis <equation>\\\\min_{G} \\\\max_{D} V(D,G)=\\\\mathbb{E}_{x \\\\sim p_{data}(x)}[\\\\log D(x)]+\\\\mathbb{E}_{z \\\\sim p_{z}(z)}[\\\\log(1-D(G(z)))]</equation>", "tag": "machine learning,gan", "prompt": "How to write equation Generative Adversarial Networks GAN?"}
{"completion": " The Latex Code for equation TransRis <equation>h_{r}=hM_{r}, t_{r}=tM_{r} \\\\\\\\f_{r}(h, t) = ||h_{r} + r - t_{r}||^{2}_{2}=||hM_{r}+r-tM_{r}||^{2}_{2}</equation>", "tag": "machine learning,KG", "prompt": "Can you write equation TransR?"}
{"completion": " The equation for Multi-Gate Mixture of Experts MMoE is: <equation>g^{k}(x)=\\\\text{softmax}(W_{gk}x) \\\\\\\\\\ f^{k}(x)=\\\\sum^{n}_{i=1}g^{k}(x)_{i}f_{i}(x) \\\\\\\\\\ y_{k}=h^{k}(f^{k}(x))</equation>. Here is the brief introduction of Multi-Gate Mixture of Experts MMoE:\n\n Multi-Gate Mixture of Experts (MMoE) model is firstly introduced in KDD2018 paper Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts. The model introduce a MMoE layer to model the relationship of K multiple tasks using N experts. Let's assume input feature X has dimension D. There are K output tasks and N experts networks. The gating network is calculated as, g^{k}(x) is a N-dimensional vector indicating the softmax result of relative weights, W_{gk} is a trainable matrix with size R^{ND}. And f^{k}(x) is the weghted sum representation of output of N experts for task k. f_{i}(x) is the output of the i-th expert, and f^{k}(x) indicates the representation of k-th tasks as the summation of N experts.\n \nRelated Documents\n\nSee paper Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts for details.\n\nRelated Videos\n\n\n", "tag": "machine learning,multi task", "prompt": "I am interested in Multi-Gate Mixture of Experts MMoE?"}
{"completion": " The related materials of Prototypical Networks Protonets is: 1.Prototypical Networks for Few-shot Learning:<url>https://arxiv.org/abs/1703.05175</url>;", "tag": "machine learning,meta learning", "prompt": "Can you help me find related materials of Prototypical Networks Protonets?"}
{"completion": " The Latex Code for equation Proximal Policy Optimization PPOis <equation>L^{CLIP}(\\\\theta)=E_{t}[\\\\min(r_{t}(\\\\theta))A_{t}, \\\\text{clip}(r_{t}(\\\\theta), 1-\\\\epsilon,1+\\\\epsilon)A_{t}]</equation>", "tag": "machine learning", "prompt": "Show me the latex code of equation Proximal Policy Optimization PPO?"}
{"completion": " The Latex Code for equation Graph Attention Network GATis <equation>h=\\\\{\\\\vec{h_{1}},\\\\vec{h_{2}},...,\\\\vec{h_{N}}\\\\}, \\\\\\\\\\ \\\\vec{h_{i}} \\\\in \\\\mathbb{R}^{F} \\\\\\\\\\ W \\\\in \\\\mathbb{R}^{F \\\\times F^{'}} \\\\\\\\\\ e_{ij}=a(Wh_{i},Wh_{j}) \\\\\\\\\\ k \\\\in \\\\mathcal{N}_{i},\\\\text{ neighbourhood nodes}\\\\\\\\\\ a_{ij}=\\\\text{softmax}_{j}(e_{ij})=\\\\frac{\\\\exp(e_{ij})}{\\\\sum_{k \\\\in \\\\mathcal{N}_{i}} \\\\exp(e_{ik})}</equation>", "tag": "machine learning,graph,GNN", "prompt": "How to write equation Graph Attention Network GAT?"}
{"completion": " The Latex Code for equation Graph Attention Network GATis <equation>h=\\\\{\\\\vec{h_{1}},\\\\vec{h_{2}},...,\\\\vec{h_{N}}\\\\}, \\\\\\\\\\ \\\\vec{h_{i}} \\\\in \\\\mathbb{R}^{F} \\\\\\\\\\ W \\\\in \\\\mathbb{R}^{F \\\\times F^{'}} \\\\\\\\\\ e_{ij}=a(Wh_{i},Wh_{j}) \\\\\\\\\\ k \\\\in \\\\mathcal{N}_{i},\\\\text{ neighbourhood nodes}\\\\\\\\\\ a_{ij}=\\\\text{softmax}_{j}(e_{ij})=\\\\frac{\\\\exp(e_{ij})}{\\\\sum_{k \\\\in \\\\mathcal{N}_{i}} \\\\exp(e_{ik})}</equation>", "tag": "machine learning,graph,GNN", "prompt": "Show me the latex code of equation Graph Attention Network GAT?"}
{"completion": " The equation for KL-Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>Q(x)</equation> in KL-Divergence is: Distribution of Q(x) over x.", "tag": "machine learning", "prompt": "What is the meaning of the symbol <equation>Q(x)</equation> in KL-Divergence?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>\\pi(a|s)</equation> in Bellman Equation is: Probability of choosing action a given state s.", "tag": "machine learning", "prompt": "Can you explain the symbol <equation>\\pi(a|s)</equation> in Bellman Equation?"}
{"completion": " The equation for RotatE is: <equation>f_{r}(h, t) = ||h \\\\circ r - t||^{2}_{2}</equation>. Here is the brief introduction of RotatE:\n RotatE learns low-dimensional representations of entities and relations to relation space r, and multiple original entity embedding to the mapping matrix M. See paper RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space for more details.\n\n Related Documents\n\n\">RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space\n\nRelated Videos\n\n\n", "tag": "machine learning,KG", "prompt": "Can you give me a brief introduction to RotatE?"}
{"completion": " The Latex Code for equation Mahalanobis Distanceis <equation>D_{M}(x,y)=\\\\sqrt{(x-y)^{T}\\\\Sigma^{-1}(x-y)}</equation>", "tag": "machine learning,mahalanobis", "prompt": "What is the latex code of equation Mahalanobis Distance?"}
{"completion": " The equation for Diffusion Model Reverse Process is: <equation>p_\\\\theta(\\\\mathbf{x}_{0:T}) = p(\\\\mathbf{x}_T) \\\\prod^T_{t=1} p_\\\\theta(\\\\mathbf{x}_{t-1} \\\\vert \\\\mathbf{x}_t) \\\\\\\\\\ p_\\\\theta(\\\\mathbf{x}_{t-1} \\\\vert \\\\mathbf{x}_t) = \\\\mathcal{N}(\\\\mathbf{x}_{t-1}; \\\\boldsymbol{\\\\mu}_\\\\theta(\\\\mathbf{x}_t, t), \\\\boldsymbol{\\\\Sigma}_\\\\theta(\\\\mathbf{x}_t, t))</equation>. Here is the brief introduction of Diffusion Model Reverse Process:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "What's the meaning of Diffusion Model Reverse Process?"}
{"completion": " The Latex Code for equation SME Linearis <equation>\\\\epsilon(lhs,rel,rhs)=E_{lhs(rel)}^{T}E_{rhs(rel)} \\\\\\\\=(W_{l1}E_{lhs}^{T} + W_{l2}E_{rel}^{T} + b_{l})^{T}(W_{r1}E_{rhs}^{T} + W_{r2}E_{rel}^{T} + b_{r})</equation>", "tag": "machine learning,KG", "prompt": "Show me the latex code of equation SME Linear?"}
{"completion": " The Latex Code for equation SME Linearis <equation>\\\\epsilon(lhs,rel,rhs)=E_{lhs(rel)}^{T}E_{rhs(rel)} \\\\\\\\=(W_{l1}E_{lhs}^{T} + W_{l2}E_{rel}^{T} + b_{l})^{T}(W_{r1}E_{rhs}^{T} + W_{r2}E_{rel}^{T} + b_{r})</equation>", "tag": "machine learning,KG", "prompt": "Please give me some hints on equation SME Linear?"}
{"completion": " The equation for Graph Attention Network GAT is: <equation>h=\\\\{\\\\vec{h_{1}},\\\\vec{h_{2}},...,\\\\vec{h_{N}}\\\\}, \\\\\\\\\\ \\\\vec{h_{i}} \\\\in \\\\mathbb{R}^{F} \\\\\\\\\\ W \\\\in \\\\mathbb{R}^{F \\\\times F^{'}} \\\\\\\\\\ e_{ij}=a(Wh_{i},Wh_{j}) \\\\\\\\\\ k \\\\in \\\\mathcal{N}_{i},\\\\text{ neighbourhood nodes}\\\\\\\\\\ a_{ij}=\\\\text{softmax}_{j}(e_{ij})=\\\\frac{\\\\exp(e_{ij})}{\\\\sum_{k \\\\in \\\\mathcal{N}_{i}} \\\\exp(e_{ik})}</equation>. Here is the brief introduction of Graph Attention Network GAT:\n\n GAT applies graph attentional layer to model the graph propagation. In each layer, the node i has attention on all the other nodes j. And the attention coefficient is calculated. For the attention calculation, only the set of neighbours nodes N_{i} of each node i contributes to the final softmax attention calculation. You can check more detailed information in this paper, GRAPH ATTENTION NETWORKS for more details.\n \nRelated Documents\n\nGRAPH ATTENTION NETWORKS\n\nRelated Videos\n\n\n", "tag": "machine learning,graph,GNN", "prompt": "Can you explain Graph Attention Network GAT?"}
{"completion": " The Latex Code for equation Conditional Average Treatment Effect CATEis <equation>\\\\tau(x):=\\\\mathbb{E}[Y(1)-Y(0)|X=x]</equation>", "tag": "machine learning,causual inference", "prompt": "What is the equation of Conditional Average Treatment Effect CATE?"}
{"completion": " The Latex Code for equation Proximal Policy Optimization PPOis <equation>L^{CLIP}(\\\\theta)=E_{t}[\\\\min(r_{t}(\\\\theta))A_{t}, \\\\text{clip}(r_{t}(\\\\theta), 1-\\\\epsilon,1+\\\\epsilon)A_{t}]</equation>", "tag": "machine learning", "prompt": "How to write equation Proximal Policy Optimization PPO?"}
{"completion": " The Latex Code for equation SME Linearis <equation>\\\\epsilon(lhs,rel,rhs)=E_{lhs(rel)}^{T}E_{rhs(rel)} \\\\\\\\=(W_{l1}E_{lhs}^{T} + W_{l2}E_{rel}^{T} + b_{l})^{T}(W_{r1}E_{rhs}^{T} + W_{r2}E_{rel}^{T} + b_{r})</equation>", "tag": "machine learning,KG", "prompt": "Can you write equation SME Linear?"}
{"completion": " The equation for SimplE is: <equation>s(e_{i}, r, e_{j}) = \\\\frac{1}{2}(<h_{e_{i}}, v_{r}, t_{e_{j}}> + <h_{e_{j}}, v_{r^{-}}, t_{e_{i}}>)</equation>. Here is the brief introduction of SimplE:\n The similarity function for a triple (e1 , r , e2 ) is ?he1 , vr , te2 ?.\n SimplE considers two vectors he,te ? Rd as the embedding of each entity (similar to CP), and two vectors vr , vr?1 ? Rd for each relation r. The similarity function of SimplE foratriple(ei,r,ej)isdefinedas 1(?hei,vr,tej?+?hej,vr?1,tei?),i.e. theaverageoftheCP 2 scoresfor(ei,r,ej)and(ej,r?1,ei). See paper SimplE Embedding for Link Prediction in Knowledge Graphs for more details.\n ", "tag": "machine learning,KG", "prompt": "Can you give me a brief introduction to SimplE?"}
{"completion": " The Latex Code for equation Diffusion Model Forward Processis <equation>q(x_{t}|x_{t-1})=\\\\mathcal{N}(x_{t};\\\\sqrt{1-\\\\beta_{t}}x_{t-1},\\\\beta_{t}I) \\\\\\\\q(x_{1:T}|x_{0})=\\\\prod_{t=1}^{T}q(x_{t}|x_{t-1})</equation>", "tag": "machine learning,diffusion", "prompt": "Show me the latex code of equation Diffusion Model Forward Process?"}
{"completion": " The Latex Code for equation Bellman Equationis <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>", "tag": "machine learning", "prompt": "What is the latex code of equation Bellman Equation?"}
{"completion": " The equation for Conditional Average Treatment Effect CATE is: <equation>\\\\tau(x):=\\\\mathbb{E}[Y(1)-Y(0)|X=x]</equation>. Here is the brief introduction of Conditional Average Treatment Effect CATE:\n Since we can't observe ITE of item i directly, most causal inference models estimate the conditional average treatment effect(CATE) conditioned on item i (X=x_{i}).\n ", "tag": "machine learning,causual inference", "prompt": "Can you give me a brief introduction to Conditional Average Treatment Effect CATE?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>\\gamma</equation> in Bellman Equation is: Reward discount factor \\gamma.", "tag": "machine learning", "prompt": "What is the meaning of <equation>\\gamma</equation> in Bellman Equation?"}
{"completion": " The equation for Proximal Policy Optimization PPO is: <equation>L^{CLIP}(\\\\theta)=E_{t}[\\\\min(r_{t}(\\\\theta))A_{t}, \\\\text{clip}(r_{t}(\\\\theta), 1-\\\\epsilon,1+\\\\epsilon)A_{t}]</equation>. Here is the brief introduction of Proximal Policy Optimization PPO:\n With supervised learning, we can easily implement the cost function, run gradient descent on it, and be very confident that we\u2019ll get excellent results with relatively little hyperparameter tuning. The route to success in reinforcement learning isn\u2019t as obvious\u2014the algorithms have many moving parts that are hard to debug, and they require substantial effort in tuning in order to get good results. PPO strikes a balance between ease of implementation, sample complexity, and ease of tuning, trying to compute an update at each step that minimizes the cost function while ensuring the deviation from the previous policy is relatively small. https://openai.com/research/openai-baselines-ppo\n", "tag": "machine learning", "prompt": "I am interested in Proximal Policy Optimization PPO?"}
{"completion": " The Latex Code for equation Cross-Stitch Networkis <equation>\\\\begin{bmatrix} \\\\tilde{x}^{ij}_{A}\\\\\\\\\\\\tilde{x}^{ij}_{B}\\\\end{bmatrix}=\\\\begin{bmatrix} a_{AA} & a_{AB}\\\\\\\\ a_{BA} & a_{BB} \\\\end{bmatrix}\\\\begin{bmatrix} x^{ij}_{A}\\\\\\\\ x^{ij}_{B} \\\\end{bmatrix}</equation>", "tag": "machine learning,multi task", "prompt": "What is the equation of Cross-Stitch Network?"}
{"completion": " The Latex Code for equation KL-Divergenceis <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>", "tag": "machine learning", "prompt": "What is the latex code of equation KL-Divergence?"}
{"completion": " The Latex Code for equation S-Learneris <equation>\\\\mu(x,w)=\\\\mathbb{E}[Y_{i}|X=x_{i},W=w] \\\\\\\\\\ \\\\hat{\\\\tau}(x)=\\\\hat{\\\\mu}(x,1)-\\\\hat{\\\\mu}(x,0)</equation>", "tag": "machine learning,causual inference", "prompt": "What is the latex code of equation S-Learner?"}
{"completion": " The Latex Code for equation Conditional Average Treatment Effect CATEis <equation>\\\\tau(x):=\\\\mathbb{E}[Y(1)-Y(0)|X=x]</equation>", "tag": "machine learning,causual inference", "prompt": "How to write equation Conditional Average Treatment Effect CATE?"}
{"completion": " The equation for TransE is: <equation>\\\\mathcal{L}=\\\\sum_{(h,r,t) \\\\in S} \\\\sum_{(h^{'},r^{'},t^{'}) \\\\in S^{'}_{(h,r,t)}} \\\\[ \\\\gamma + d(h + r, t) - d(h^{'} + r^{'}, t^{'}) \\\\]_{+} \\\\\\\\ S^{'}_{(h,r,t)}=\\\\{(h^{'},r,t)|h^{'} \\\\in E\\\\} \\\\cup \\\\{(h,r,t^{'})|t^{'} \\\\in E\\\\} \\\\\\\\ d(h + r, t)=||h + r - t||^{2}_{2}</equation>. Here is the brief introduction of TransE:\n Given a training set S of triplets (h, l, t) composed of two entities h, t ? E (the set of entities) and a relationship l ? L (the set of relationships), our model learns vector embeddings of the entities and the relationships. The embeddings take values in Rk (k is a model hyperparameter) and are denoted with the same letters, in boldface characters. The basic idea behind our model is that the functional relation induced by the l-labeled edges corresponds to a translation of the embeddings, i.e. we want that h + l ? t when (h, l, t) holds (t should be a nearest neighbor of h + l), while h + l should be far away from t otherwise. Following an energy-based framework, the energy of a triplet is equal to d(h + l, t) for some dissimilarity measure d, which we take to be either the L1 or the L2 -norm. To learn such embeddings, we minimize a margin-based ranking criterion over the training set. See paper Translating Embeddings for Modeling Multi-relational Data for more details.\n\n Related Documents\n\nTranslating Embeddings for Modeling Multi-relational Data\n\nRelated Videos\n\n\n", "tag": "machine learning", "prompt": "I am interested in TransE?"}
{"completion": " The Latex Code for equation Area Under Uplift Curve AUUCis <equation>f(t)=(\\\\frac{Y^{T}_{t}}{N^{T}_{t}} - \\\\frac{Y^{C}_{t}}{N^{C}_{t}})(N^{T}_{t}+N^{C}_{t})</equation>", "tag": "machine learning,causual inference", "prompt": "Please give me some hints on equation Area Under Uplift Curve AUUC?"}
{"completion": " The equation for Language Modelling is <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>. The meaning of symbol <equation>p(x)</equation> in Language Modelling is: \n : Factorize the joint probabilities over symbols p(x) as the product of conditional probabilities .", "tag": "machine learning", "prompt": "What is the meaning of <equation>p(x)</equation> in Language Modelling?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>v_{\\pi}(s)</equation> in Bellman Equation is: Value at state s in policy \\pi.", "tag": "machine learning", "prompt": "What is the meaning of the symbol <equation>v_{\\pi}(s)</equation> in Bellman Equation?"}
{"completion": " The Latex Code for equation QINIis <equation>g(t)=Y^{T}_{t}-\\\\frac{Y^{C}_{t}N^{T}_{t}}{N^{C}_{t}},\\\\\\\\\\ f(t)=g(t) \\\\times \\\\frac{N^{T}_{t}+N^{C}_{t}}{N^{T}_{t}}</equation>", "tag": "machine learning,causual inference", "prompt": "Please give me some hints on equation QINI?"}
{"completion": " The related materials of Variational AutoEncoder VAE is: ", "tag": "machine learning,VAE", "prompt": "Can you help me find related materials of Variational AutoEncoder VAE?"}
{"completion": " The Latex Code for equation Graph Convolutional Networks GCNis <equation>H^{(l+1)}=\\\\sigma(\\\\tilde{D}^{-\\\\frac{1}{2}}\\\\tilde{A}\\\\tilde{D}^{-\\\\frac{1}{2}}H^{l}W^{l})\\\\\\\\\\ \\\\tilde{A}=A+I_{N}\\\\\\\\\\ \\\\tilde{D}_{ii}=\\\\sum_{j}\\\\tilde{A}_{ij} \\\\\\\\\\ H^{0}=X \\\\\\\\ \\ \\\\mathcal{L}=-\\\\sum_{l \\\\in Y}\\\\sum^{F}_{f=1} Y_{lf} \\\\ln Z_{lf}</equation>", "tag": "machine learning,graph,GNN", "prompt": "Show me the latex code of equation Graph Convolutional Networks GCN?"}
{"completion": " The Latex Code for equation TransRis <equation>h_{r}=hM_{r}, t_{r}=tM_{r} \\\\\\\\f_{r}(h, t) = ||h_{r} + r - t_{r}||^{2}_{2}=||hM_{r}+r-tM_{r}||^{2}_{2}</equation>", "tag": "machine learning,KG", "prompt": "What is the latex code of equation TransR?"}
{"completion": " The equation for Support Vector Machine SVM is <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. The meaning of symbol <equation>y=wx+b</equation> in Support Vector Machine SVM is: Classifier.", "tag": "machine learning,svm", "prompt": "Can you explain the symbol <equation>y=wx+b</equation> in Support Vector Machine SVM?"}
{"completion": " The Latex Code for equation Propensity Scoreis <equation>e := p(W=1|X=x)</equation>", "tag": "machine learning,causual inference", "prompt": "What is the equation of Propensity Score?"}
{"completion": " The Latex Code for equation Diffusion Model Forward Process Reparameterizationis <equation>x_{t}=\\\\sqrt{\\\\alpha_{t}}x_{t-1}+\\\\sqrt{1-\\\\alpha_{t}} \\\\epsilon_{t-1}\\\\\\\\=\\\\sqrt{\\\\alpha_{t}\\\\alpha_{t-1}}x_{t-2} + \\\\sqrt{1-\\\\alpha_{t}\\\\alpha_{t-1}} \\\\bar{\\\\epsilon}_{t-2}\\\\\\\\=\\\\text{...}\\\\\\\\=\\\\sqrt{\\\\bar{\\\\alpha}_{t}}x_{0}+\\\\sqrt{1-\\\\bar{\\\\alpha}_{t}}\\\\epsilon \\\\\\\\\\\\alpha_{t}=1-\\\\beta_{t}, \\\\bar{\\\\alpha}_{t}=\\\\prod_{t=1}^{T}\\\\alpha_{t}</equation>", "tag": "machine learning,diffusion", "prompt": "What is the latex code of equation Diffusion Model Forward Process Reparameterization?"}
{"completion": " The equation for Bellman Equation is: <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. Here is the brief introduction of Bellman Equation:\n\n: Value at state s in policy \\pi\n: Value at state s^{'} in policy \\pi\n: Probability of choosing action a given state s\n: Reward at state s\n: Reward discount factor \\gamma\n\n\n You can check more detailed information of Bellman Equation in this tutorial Introduction to Reinforcement Learning for more details.\n \nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning", "prompt": "What's the meaning of Bellman Equation?"}
{"completion": " The equation for Mahalanobis Distance is: <equation>D_{M}(x,y)=\\\\sqrt{(x-y)^{T}\\\\Sigma^{-1}(x-y)}</equation>. Here is the brief introduction of Mahalanobis Distance:\n Mahalanobis Distance is a distance measure between a data point and dataset of a distribution. See website for more details Mahalanobis Distance\n\nRelated Documents\n\nMahalanobis Distance\n\nRelated Videos\n\n\n", "tag": "machine learning,mahalanobis", "prompt": "Can you give me a brief introduction to Mahalanobis Distance?"}
{"completion": " The equation for Model-Agnostic Meta-Learning MAML is: <equation>\\\\min_{\\\\theta} \\\\sum_{\\\\mathcal{T}_{i} \\\\sim p(\\\\mathcal{T})} \\\\mathcal{L}_{\\\\mathcal{T}_{i}}(f_{\\\\theta^{'}_{i}}) = \\\\sum_{\\\\mathcal{T}_{i} \\\\sim p(\\\\mathcal{T})} \\\\mathcal{L}_{\\\\mathcal{T}_{i}}(f_{\\\\theta_{i} - \\\\alpha \\\\nabla_{\\\\theta} \\\\mathcal{L}_{\\\\mathcal{T}_{i}} (f_{\\\\theta}) })</equation>. Here is the brief introduction of Model-Agnostic Meta-Learning MAML:\n\n\n\n \n\n\n", "tag": "machine learning,meta learning", "prompt": "Can you give me a brief introduction to Model-Agnostic Meta-Learning MAML?"}
{"completion": " The equation for TransE is: <equation>\\\\mathcal{L}=\\\\sum_{(h,r,t) \\\\in S} \\\\sum_{(h^{'},r^{'},t^{'}) \\\\in S^{'}_{(h,r,t)}} \\\\[ \\\\gamma + d(h + r, t) - d(h^{'} + r^{'}, t^{'}) \\\\]_{+} \\\\\\\\ S^{'}_{(h,r,t)}=\\\\{(h^{'},r,t)|h^{'} \\\\in E\\\\} \\\\cup \\\\{(h,r,t^{'})|t^{'} \\\\in E\\\\} \\\\\\\\ d(h + r, t)=||h + r - t||^{2}_{2}</equation>. Here is the brief introduction of TransE:\n Given a training set S of triplets (h, l, t) composed of two entities h, t ? E (the set of entities) and a relationship l ? L (the set of relationships), our model learns vector embeddings of the entities and the relationships. The embeddings take values in Rk (k is a model hyperparameter) and are denoted with the same letters, in boldface characters. The basic idea behind our model is that the functional relation induced by the l-labeled edges corresponds to a translation of the embeddings, i.e. we want that h + l ? t when (h, l, t) holds (t should be a nearest neighbor of h + l), while h + l should be far away from t otherwise. Following an energy-based framework, the energy of a triplet is equal to d(h + l, t) for some dissimilarity measure d, which we take to be either the L1 or the L2 -norm. To learn such embeddings, we minimize a margin-based ranking criterion over the training set. See paper Translating Embeddings for Modeling Multi-relational Data for more details.\n\n Related Documents\n\nTranslating Embeddings for Modeling Multi-relational Data\n\nRelated Videos\n\n\n", "tag": "machine learning", "prompt": "Can you explain TransE?"}
{"completion": " The equation for Maximum Mean Discrepancy MMD is: <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>. Here is the brief introduction of Maximum Mean Discrepancy MMD:\n Latex code for the Maximum Mean Discrepancy MMD. I will briefly introduce the notations in this formulation. \n \n: Superior of the discrepancy measure between two distribution.\n: Mean of probability distribution X with m data points.\n: Mean of probability distribution Y with n data points.\n\n\nRelated Documents\n\nA Kernel Two-Sample Test (MIT courses)\n\nRelated Videos\n\n\n", "tag": "machine learning,mmd", "prompt": "Please give me detailed tutorial of Maximum Mean Discrepancy MMD?"}
{"completion": " The Latex Code for equation Model-Agnostic Meta-Learning MAMLis <equation>\\\\min_{\\\\theta} \\\\sum_{\\\\mathcal{T}_{i} \\\\sim p(\\\\mathcal{T})} \\\\mathcal{L}_{\\\\mathcal{T}_{i}}(f_{\\\\theta^{'}_{i}}) = \\\\sum_{\\\\mathcal{T}_{i} \\\\sim p(\\\\mathcal{T})} \\\\mathcal{L}_{\\\\mathcal{T}_{i}}(f_{\\\\theta_{i} - \\\\alpha \\\\nabla_{\\\\theta} \\\\mathcal{L}_{\\\\mathcal{T}_{i}} (f_{\\\\theta}) })</equation>", "tag": "machine learning,meta learning", "prompt": "How to write equation Model-Agnostic Meta-Learning MAML?"}
{"completion": " The equation for Generative Adversarial Networks GAN is: <equation>\\\\min_{G} \\\\max_{D} V(D,G)=\\\\mathbb{E}_{x \\\\sim p_{data}(x)}[\\\\log D(x)]+\\\\mathbb{E}_{z \\\\sim p_{z}(z)}[\\\\log(1-D(G(z)))]</equation>. Here is the brief introduction of Generative Adversarial Networks GAN:\n\n GAN latex code is illustrated above. See paper for more details Generative Adversarial Networks\n\nRelated Documents\n\nGenerative Adversarial Networks\n\nRelated Videos\n\n\n", "tag": "machine learning,gan", "prompt": "Can you explain Generative Adversarial Networks GAN?"}
{"completion": " The equation for Kullback-Leibler Divergence is: <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. Here is the brief introduction of Kullback-Leibler Divergence:\n Latex code for the Kullback-Leibler Divergence. I will briefly introduce the notations in this formulation. \n \n: KL Divergence between P and Q\n: Distribution of P(x) over x\n: Distribution of Q(x) over x\n\n\nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning,kl divergence", "prompt": "Can you give me a brief introduction to Kullback-Leibler Divergence?"}
{"completion": " The Latex Code for equation Diffusion Model Reverse Processis <equation>p_\\\\theta(\\\\mathbf{x}_{0:T}) = p(\\\\mathbf{x}_T) \\\\prod^T_{t=1} p_\\\\theta(\\\\mathbf{x}_{t-1} \\\\vert \\\\mathbf{x}_t) \\\\\\\\\\ p_\\\\theta(\\\\mathbf{x}_{t-1} \\\\vert \\\\mathbf{x}_t) = \\\\mathcal{N}(\\\\mathbf{x}_{t-1}; \\\\boldsymbol{\\\\mu}_\\\\theta(\\\\mathbf{x}_t, t), \\\\boldsymbol{\\\\Sigma}_\\\\theta(\\\\mathbf{x}_t, t))</equation>", "tag": "machine learning,diffusion", "prompt": "How to write equation Diffusion Model Reverse Process?"}
{"completion": " The Latex Code for equation TransEis <equation>\\\\mathcal{L}=\\\\sum_{(h,r,t) \\\\in S} \\\\sum_{(h^{'},r^{'},t^{'}) \\\\in S^{'}_{(h,r,t)}} \\\\[ \\\\gamma + d(h + r, t) - d(h^{'} + r^{'}, t^{'}) \\\\]_{+} \\\\\\\\ S^{'}_{(h,r,t)}=\\\\{(h^{'},r,t)|h^{'} \\\\in E\\\\} \\\\cup \\\\{(h,r,t^{'})|t^{'} \\\\in E\\\\} \\\\\\\\ d(h + r, t)=||h + r - t||^{2}_{2}</equation>", "tag": "machine learning", "prompt": "Can you write equation TransE?"}
{"completion": " The related materials of Bregman Divergences is: ", "tag": "machine learning", "prompt": "Can you help me find related materials of Bregman Divergences?"}
{"completion": " The related materials of Diffusion Model Variational Lower Bound Loss is: 1.Weng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.:<url>https://lilianweng.github.io/posts/2021-07-11-diffusion-models/</url>;2.Latex Code for Diffusion Models Equations:<url>http://deepnlp.org/blog/latex-code-for-diffusion-models-equations</url>;", "tag": "machine learning,diffusion", "prompt": "Can you help me find related materials of Diffusion Model Variational Lower Bound Loss?"}
{"completion": " The Latex Code for equation Conditional Random Field CRFis <equation>P(y|x)=\\\\frac{1}{Z(x)}\\\\exp(\\\\sum_{i,k}\\\\lambda_{k}t_{k}(y_{i-1},y_{i},x,i))+\\\\sum_{i,l}\\\\mu_{l}s_{l}(y_{i},x,i)) \\\\\\\\\\ Z(x)=\\\\sum_{y}\\\\exp(\\\\sum_{i,k}\\\\lambda_{k}t_{k}(y_{i-1},y_{i},x,i))+\\\\sum_{i,l}\\\\mu_{l}s_{l}(y_{i},x,i))</equation>", "tag": "machine learning,nlp", "prompt": "What is the latex code of equation Conditional Random Field CRF?"}
{"completion": " The equation for Wasserstein Distance Optimal Transport is: <equation>W_{p}(P,Q)=(\\\\inf_{J \\\\in J(P,Q)} \\\\int{||x-y||^{p}dJ(X,Y)})^\\\\frac{1}{p}</equation>. Here is the brief introduction of Wasserstein Distance Optimal Transport:\n Latex code for the Wasserstein Distance (Optimal Transport Distance). I will briefly introduce the notations in this formulation. \n \n: Wasserstein distance p power between two distributions P and Q\n: Power p of distance moving distributions P towards Q\n\n\nRelated Documents\n\nOptimal Transport and Wasserstein Distance (CMU lecture pdf)\n\nRelated Videos\n\n\n", "tag": "machine learning,wasserstein", "prompt": "What's the meaning of Wasserstein Distance Optimal Transport?"}
{"completion": " The equation for Deep Kernel Learning is: <equation>k(x_{i},x_{j}|\\\\phi)=k(h(x_i,w_k),h(x_j,w_k)|w_k,\\\\phi)</equation>. Here is the brief introduction of Deep Kernel Learning:\n The original data instance x_{i} is first mapped to latent space by a non-linear transformation h(x_{i}, w_{k}), usually a deep neural network with parameter w_{k}, and then passed to a kernel function k(x_{i},x_{j}|\\phi).\n See below link Deep Kernel Learning for more details. \n ", "tag": "machine learning,Deep Kernel Learning", "prompt": "What's the meaning of Deep Kernel Learning?"}
{"completion": " The equation for Bound on Target Domain Error is: <equation>\\\\epsilon_{T}(h) \\\\le \\\\hat{\\\\epsilon}_{S}(h) + \\\\sqrt{\\\\frac{4}{m}(d \\\\log \\\\frac{2em}{d} + \\\\log \\\\frac{4}{\\\\delta })} + d_{\\\\mathcal{H}}(\\\\tilde{\\\\mathcal{D}}_{S}, \\\\tilde{\\\\mathcal{D}}_{T}) + \\\\lambda \\\\\\\\\\ \\\\lambda = \\\\lambda_{S} + \\\\lambda_{T}</equation>. Here is the brief introduction of Bound on Target Domain Error:\n\n I will explain this equation in more details. Domain adaptation literatures prove that the test error on target domain \\epsilon_{T}(h) is bounded by three terms: 1. the empirical estimate of training errors on the source domain \\hat{\\epsilon}_{S}(h); 2. the distance divergence between source domain and target domain d(Ds, Dt), 3. Fixed term of VC-Dimension(d), sample size of source domain m, e as the natural logarithm. \\lambda denotes a fixed term as the sum of \\lambda_{S} and \\lambda_{T}, which represent the errors of models training on Ds and Dt respectively. From the above analysis, we can see that if data source Ds and Dt are similar(the divergence between source and target domain distribution Ds and Dt is small), the error on target domain will also be bounded, that's how models trained on source domain will perform better on similar distributed target domains. \n You can check more detailed information in this NIPS 2006 paper by Shai Ben-David for more details.\n \nRelated Documents\n\nAnalysis of Representations for Domain Adaptation\n\nRelated Videos\n\n\n", "tag": "machine learning,transfer learning", "prompt": "Can you explain Bound on Target Domain Error?"}
{"completion": " The equation for Language Modelling is: <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>. Here is the brief introduction of Language Modelling:\n\n: Language modeling is usually framed as unsupervised distribution estimation from a set of examples of \n\n: Variable length sequences of symbols\n\n : Factorize the joint probabilities over symbols p(x) as the product of conditional probabilities \n\n", "tag": "machine learning", "prompt": "I am interested in Language Modelling?"}
{"completion": " The equation for Unconfoundedness Assumption is: <equation>\\\\{Y_{i}(0),Y_{i}(1)\\\\}\\\\perp W_{i}|X_{i}</equation>. Here is the brief introduction of Unconfoundedness Assumption:\n The unconfoundedness assumption or CIA (Conditional Independence assumption) assume that there are no hidden confounders between (Y(0),Y(1)) vector and treatment assignment vector W, conditioned on input X.\n ", "tag": "machine learning,causual inference", "prompt": "What's the meaning of Unconfoundedness Assumption?"}
{"completion": " The Latex Code for equation Kullback-Leibler Divergenceis <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>", "tag": "machine learning,kl divergence", "prompt": "Can you write equation Kullback-Leibler Divergence?"}
{"completion": " The equation for Proximal Policy Optimization PPO is: <equation>L^{CLIP}(\\\\theta)=E_{t}[\\\\min(r_{t}(\\\\theta))A_{t}, \\\\text{clip}(r_{t}(\\\\theta), 1-\\\\epsilon,1+\\\\epsilon)A_{t}]</equation>. Here is the brief introduction of Proximal Policy Optimization PPO:\n With supervised learning, we can easily implement the cost function, run gradient descent on it, and be very confident that we\u2019ll get excellent results with relatively little hyperparameter tuning. The route to success in reinforcement learning isn\u2019t as obvious\u2014the algorithms have many moving parts that are hard to debug, and they require substantial effort in tuning in order to get good results. PPO strikes a balance between ease of implementation, sample complexity, and ease of tuning, trying to compute an update at each step that minimizes the cost function while ensuring the deviation from the previous policy is relatively small. https://openai.com/research/openai-baselines-ppo\n", "tag": "machine learning", "prompt": "Please give me detailed tutorial of Proximal Policy Optimization PPO?"}
{"completion": " The equation for Prototypical Networks Protonets is: <equation>c_{k}=\\\\frac{1}{|S_{k}|}\\\\sum_{(x_{i},y_{i}) \\\\in S_{k}} f_{\\\\phi}(x) \\\\\\\\ p_{\\\\phi}(y=k|x)=\\\\frac{\\\\exp(-d(f_{\\\\phi}(x), c_{k}))}{\\\\sum_{k^{'}} \\\\exp(-d(f_{\\\\phi}(x), c_{k^{'}})} \\\\\\\\\\\\min J(\\\\phi)=-\\\\log p_{\\\\phi}(y=k|x)</equation>. Here is the brief introduction of Prototypical Networks Protonets:\n Prototypical networks compute an M-dimensional representation c_{k} or prototype, of each class through an embedding f_{\\phi}(.) with parameters \\phi. Each prototype is the mean vector of the embedded support points belonging to its class k. Prototypical networks then produce a distribution over classes for a query point x based on a softmax over distances to the prototypes in the embedding space as p(y=k|x). Then the negative log-likelihood of J(\\theta) is calculated over query set.\n Related Documents\n\nSee paper Prototypical Networks for Few-shot Learning for more detail.\n\nRelated Videos\n\n\n", "tag": "machine learning,meta learning", "prompt": "Can you give me a brief introduction to Prototypical Networks Protonets?"}
{"completion": " The equation for Graph Laplacian is: <equation>L=I_{N}-D^{-\\\\frac{1}{2}}AD^{-\\\\frac{1}{2}} \\\\\\\\\\ L=U\\\\Lambda U^{T}</equation>. Here is the brief introduction of Graph Laplacian:\n\n Graph Laplacian matrix equals to the identity matrix I minus the matrix multiplication of three parts, the (-1/2) power of the degree matrix D, the adjacency matrix A, and (-1/2) power of degree matrix D. U is the eigenvectors of the normalized graph Laplacian L. The graph laplacian come from the graph Fourier transform F. The original signal x is first transformed to domain F(X) and inverse resulted signal is transformed back using the inverse graph Fourier transform F^{-1}.\n \nRelated Documents\n\n\">Graphs and Graph Laplacians- Upenn Tutorial\n\">Graph Laplacian matrix- MathWorks\n\nRelated Videos\n\n\n", "tag": "machine learning,graph,GNN", "prompt": "Please give me detailed tutorial of Graph Laplacian?"}
{"completion": " The equation for Diffusion Model Reverse Process is: <equation>p_\\\\theta(\\\\mathbf{x}_{0:T}) = p(\\\\mathbf{x}_T) \\\\prod^T_{t=1} p_\\\\theta(\\\\mathbf{x}_{t-1} \\\\vert \\\\mathbf{x}_t) \\\\\\\\\\ p_\\\\theta(\\\\mathbf{x}_{t-1} \\\\vert \\\\mathbf{x}_t) = \\\\mathcal{N}(\\\\mathbf{x}_{t-1}; \\\\boldsymbol{\\\\mu}_\\\\theta(\\\\mathbf{x}_t, t), \\\\boldsymbol{\\\\Sigma}_\\\\theta(\\\\mathbf{x}_t, t))</equation>. Here is the brief introduction of Diffusion Model Reverse Process:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "Can you explain Diffusion Model Reverse Process?"}
{"completion": " The Latex Code for equation Conditional Random Field CRFis <equation>P(y|x)=\\\\frac{1}{Z(x)}\\\\exp(\\\\sum_{i,k}\\\\lambda_{k}t_{k}(y_{i-1},y_{i},x,i))+\\\\sum_{i,l}\\\\mu_{l}s_{l}(y_{i},x,i)) \\\\\\\\\\ Z(x)=\\\\sum_{y}\\\\exp(\\\\sum_{i,k}\\\\lambda_{k}t_{k}(y_{i-1},y_{i},x,i))+\\\\sum_{i,l}\\\\mu_{l}s_{l}(y_{i},x,i))</equation>", "tag": "machine learning,nlp", "prompt": "How to write equation Conditional Random Field CRF?"}
{"completion": " The equation for Average Treatment Effect ATE is: <equation>\\\\text{ATE}:=\\\\mathbb{E}[Y(1)-Y(0)]</equation>. Here is the brief introduction of Average Treatment Effect ATE:\n Average Treatment Effect(ATE) is defined as the expectation of the difference between the treatment group Y(1) and control group Y(0)\n ", "tag": "machine learning,causual inference", "prompt": "Can you give me a brief introduction to Average Treatment Effect ATE?"}
{"completion": " The Latex Code for equation Unconfoundedness Assumptionis <equation>\\\\{Y_{i}(0),Y_{i}(1)\\\\}\\\\perp W_{i}|X_{i}</equation>", "tag": "machine learning,causual inference", "prompt": "What is the equation of Unconfoundedness Assumption?"}
{"completion": " The Latex Code for equation Entire Space Multi-Task Model ESSMis <equation>L(\\\\theta_{cvr},\\\\theta_{ctr})=\\\\sum^{N}_{i=1}l(y_{i},f(x_{i};\\\\theta_{ctr}))+\\\\sum^{N}_{i=1}l(y_{i}\\\\&z_{i},f(x_{i};\\\\theta_{ctr}) \\\\times f(x_{i};\\\\theta_{cvr}))\\ </equation>", "tag": "machine learning,multi task", "prompt": "What is the equation of Entire Space Multi-Task Model ESSM?"}
{"completion": " The equation for Language Modelling is <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>. The meaning of symbol <equation>(s_{1}, s_{2}, ..., s_{n})</equation> in Language Modelling is: Variable length sequences of symbols.", "tag": "machine learning", "prompt": "Help me find the meaning of symbol <equation>(s_{1}, s_{2}, ..., s_{n})</equation> in Language Modelling?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. The meaning of symbol <equation>KL(P||Q)</equation> in Jensen-Shannon Divergence JS-Divergence is: KL Divergence between P and Q.", "tag": "machine learning", "prompt": "What does the symbol mean <equation>KL(P||Q)</equation> in Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The Latex Code for equation RotatEis <equation>f_{r}(h, t) = ||h \\\\circ r - t||^{2}_{2}</equation>", "tag": "machine learning,KG", "prompt": "Can you write equation RotatE?"}
{"completion": " The equation for TransR is: <equation>h_{r}=hM_{r}, t_{r}=tM_{r} \\\\\\\\f_{r}(h, t) = ||h_{r} + r - t_{r}||^{2}_{2}=||hM_{r}+r-tM_{r}||^{2}_{2}</equation>. Here is the brief introduction of TransR:\n\n TransR model learns low-dimensional representations of entities and relations to relation space r, and multiple original entity embedding to the mapping matrix M. See paper Learning Entity and Relation Embeddings for Knowledge Graph Completion for more details.\n \nRelated Documents\n\n\">Learning Entity and Relation Embeddings for Knowledge Graph Completion\n\nRelated Videos\n\n\n", "tag": "machine learning,KG", "prompt": "What's the meaning of TransR?"}
{"completion": " The Latex Code for equation Generative Adversarial Networks GANis <equation>\\\\min_{G} \\\\max_{D} V(D,G)=\\\\mathbb{E}_{x \\\\sim p_{data}(x)}[\\\\log D(x)]+\\\\mathbb{E}_{z \\\\sim p_{z}(z)}[\\\\log(1-D(G(z)))]</equation>", "tag": "machine learning,gan", "prompt": "Can you write equation Generative Adversarial Networks GAN?"}
{"completion": " The Latex Code for equation Multi-Gate Mixture of Experts MMoEis <equation>g^{k}(x)=\\\\text{softmax}(W_{gk}x) \\\\\\\\\\ f^{k}(x)=\\\\sum^{n}_{i=1}g^{k}(x)_{i}f_{i}(x) \\\\\\\\\\ y_{k}=h^{k}(f^{k}(x))</equation>", "tag": "machine learning,multi task", "prompt": "Please give me some hints on equation Multi-Gate Mixture of Experts MMoE?"}
{"completion": " The equation for Variational AutoEncoder VAE is: <equation>\\\\log p_{\\\\theta}(x)=\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log p_{\\\\theta}(x)] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log \\\\frac{p_{\\\\theta}(x,z)}{p_{\\\\theta}(z|x)}] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log [\\\\frac{p_{\\\\theta}(x,z)}{q_{\\\\phi}(z|x)} \\\\times \\\\frac{q_{\\\\phi}(z|x)}{p_{\\\\theta}(z|x)}]] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log [\\\\frac{p_{\\\\theta}(x,z)}{q_{\\\\phi}(z|x)} ]] +D_{KL}(q_{\\\\phi}(z|x) || p_{\\\\theta}(z|x))\\\\\\\\</equation>. Here is the brief introduction of Variational AutoEncoder VAE:", "tag": "machine learning,VAE", "prompt": "What's the meaning of Variational AutoEncoder VAE?"}
{"completion": " The Latex Code for equation Bregman Divergencesis <equation>d_{\\\\phi}(z,z^{'})=\\\\phi(z) - \\\\phi(z^{'})-(z-z^{'})^{T} \\\\nabla \\\\phi(z^{'})</equation>", "tag": "machine learning", "prompt": "What is the latex code of equation Bregman Divergences?"}
{"completion": " The Latex Code for equation Graph Laplacianis <equation>L=I_{N}-D^{-\\\\frac{1}{2}}AD^{-\\\\frac{1}{2}} \\\\\\\\\\ L=U\\\\Lambda U^{T}</equation>", "tag": "machine learning,graph,GNN", "prompt": "Show me the latex code of equation Graph Laplacian?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. The meaning of symbol <equation>P(x)</equation> in Jensen-Shannon Divergence JS-Divergence is: Distribution of P(x) over x.", "tag": "machine learning", "prompt": "What does the symbol mean <equation>P(x)</equation> in Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>r</equation> in Bellman Equation is: Reward at state s.", "tag": "machine learning", "prompt": "What is the meaning of <equation>r</equation> in Bellman Equation?"}
{"completion": " The equation for Domain-Adversarial Neural Networks DANN is: <equation>\\\\min [\\\\frac{1}{m}\\\\sum^{m}_{1}\\\\mathcal{L}(f(\\\\textbf{x}^{s}_{i}),y_{i})+\\\\lambda \\\\max(-\\\\frac{1}{m}\\\\sum^{m}_{i=1}\\\\mathcal{L}^{d}(o(\\\\textbf{x}^{s}_{i}),1)-\\\\frac{1}{m^{'}}\\\\sum^{m^{'}}_{i=1}\\\\mathcal{L}^{d}(o(\\\\textbf{x}^{t}_{i}),0))]</equation>. Here is the brief introduction of Domain-Adversarial Neural Networks DANN:\n\n In this formulation of Domain-Adversarial Neural Networks(DANN), authors add a domain adaptation regularizer term to the original loss function of source domain. The domain adaptation regularizer term are calculated based on the H-divergence of two distributions h(X_{S}) and h(X_{T}). The adversial network aims to maximize the likelihood that the domain classifier are unable to distingush a data point belongs to source domain S or target domain T. Function o(.) is the domain regressor which learns high level representation o(X) given input X. You can check more detailed information in this paper by Hana Ajakan, Pascal Germain, et al., Domain-Adversarial Neural Networks for more details.\n \nRelated Documents\n\nDomain-Adversarial Neural Networks\n\nRelated Videos\n\n\n", "tag": "machine learning,transfer learning", "prompt": "Please give me detailed tutorial of Domain-Adversarial Neural Networks DANN?"}
{"completion": " The equation for Domain Adaptation H-Divergence is: <equation>d_{\\\\mathcal{H}}(\\\\mathcal{D},\\\\mathcal{D}^{'})=2\\\\sup_{h \\\\in \\\\mathcal{H}}|\\\\Pr_{\\\\mathcal{D}}[I(h)]-\\\\Pr_{\\\\mathcal{D}^{'}}[I(h)]|</equation>. Here is the brief introduction of Domain Adaptation H-Divergence:\n\n The H-Divergence is defined as the superior of divengence between two probability Pr(D) and Pr(D^{'}) for any hypothesis h in all hypotheses class H. In this formulation, given domain X with two data distribution D and D^{'} over X, I(h) denotes the characteristic function(indicator function) on X, which means that for subset of x in I(h), h(x) = 1. You can check more detailed information of domain adaptation and H-divergence in this paper by Shai Ben-David, A theory of learning from different domains for more details.\n \nRelated Documents\n\nA theory of learning from different domains\n\nRelated Videos\n\n\n", "tag": "machine learning,transfer learning", "prompt": "Can you give me a brief introduction to Domain Adaptation H-Divergence?"}
{"completion": " The Latex Code for equation GraphSageis <equation>h^{0}_{v} \\\\leftarrow x_{v} \\\\\\\\\\ \\\\textbf{for} k \\\\in \\\\{1,2,...,K\\\\} \\\\text{do}\\\\\\\\\\ \\\\textbf{for} v \\\\in V \\\\text{do} \\\\\\\\\\ h^{k}_{N_{v}} \\\\leftarrow \\\\textbf{AGGREGATE}_{k}(h^{k-1}_{u}, u \\\\in N(v)); \\\\\\\\\\ h^{k}_{v} \\\\leftarrow \\\\sigma (W^{k} \\\\textbf{concat}(h^{k-1}_{v},h^{k}_{N_{v}})) \\\\\\\\\\ \\\\textbf{end} \\\\\\\\\\ h^{k}_{v}=h^{k}_{v}/||h^{k}_{v}||_{2},\\\\forall v \\\\in V \\\\\\\\\\ \\\\textbf{end} \\\\\\\\\\ z_{v} \\\\leftarrow h^{k}_{v} \\\\\\\\\\ J_{\\\\textbf{z}_{u}}=-\\\\log (\\\\sigma (\\\\textbf{z}_{u}^{T}\\\\textbf{z}_{v})) - Q \\\\mathbb{E}_{v_{n} \\\\sim p_n(v)} \\\\log(\\\\sigma (-\\\\textbf{z}_{u}^{T}\\\\textbf{z}_{v_{n}}))</equation>", "tag": "machine learning,graph,GNN", "prompt": "Please give me some hints on equation GraphSage?"}
{"completion": " The equation for Domain-Adversarial Neural Networks DANN is: <equation>\\\\min [\\\\frac{1}{m}\\\\sum^{m}_{1}\\\\mathcal{L}(f(\\\\textbf{x}^{s}_{i}),y_{i})+\\\\lambda \\\\max(-\\\\frac{1}{m}\\\\sum^{m}_{i=1}\\\\mathcal{L}^{d}(o(\\\\textbf{x}^{s}_{i}),1)-\\\\frac{1}{m^{'}}\\\\sum^{m^{'}}_{i=1}\\\\mathcal{L}^{d}(o(\\\\textbf{x}^{t}_{i}),0))]</equation>. Here is the brief introduction of Domain-Adversarial Neural Networks DANN:\n\n In this formulation of Domain-Adversarial Neural Networks(DANN), authors add a domain adaptation regularizer term to the original loss function of source domain. The domain adaptation regularizer term are calculated based on the H-divergence of two distributions h(X_{S}) and h(X_{T}). The adversial network aims to maximize the likelihood that the domain classifier are unable to distingush a data point belongs to source domain S or target domain T. Function o(.) is the domain regressor which learns high level representation o(X) given input X. You can check more detailed information in this paper by Hana Ajakan, Pascal Germain, et al., Domain-Adversarial Neural Networks for more details.\n \nRelated Documents\n\nDomain-Adversarial Neural Networks\n\nRelated Videos\n\n\n", "tag": "machine learning,transfer learning", "prompt": "Can you explain Domain-Adversarial Neural Networks DANN?"}
{"completion": " The Latex Code for equation Cross-Stitch Networkis <equation>\\\\begin{bmatrix} \\\\tilde{x}^{ij}_{A}\\\\\\\\\\\\tilde{x}^{ij}_{B}\\\\end{bmatrix}=\\\\begin{bmatrix} a_{AA} & a_{AB}\\\\\\\\ a_{BA} & a_{BB} \\\\end{bmatrix}\\\\begin{bmatrix} x^{ij}_{A}\\\\\\\\ x^{ij}_{B} \\\\end{bmatrix}</equation>", "tag": "machine learning,multi task", "prompt": "Can you write equation Cross-Stitch Network?"}
{"completion": " The Latex Code for equation KL-Divergenceis <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>", "tag": "machine learning", "prompt": "What is the equation of KL-Divergence?"}
{"completion": " The equation for Graph Convolutional Networks GCN is: <equation>H^{(l+1)}=\\\\sigma(\\\\tilde{D}^{-\\\\frac{1}{2}}\\\\tilde{A}\\\\tilde{D}^{-\\\\frac{1}{2}}H^{l}W^{l})\\\\\\\\\\ \\\\tilde{A}=A+I_{N}\\\\\\\\\\ \\\\tilde{D}_{ii}=\\\\sum_{j}\\\\tilde{A}_{ij} \\\\\\\\\\ H^{0}=X \\\\\\\\ \\ \\\\mathcal{L}=-\\\\sum_{l \\\\in Y}\\\\sum^{F}_{f=1} Y_{lf} \\\\ln Z_{lf}</equation>. Here is the brief introduction of Graph Convolutional Networks GCN:\n\n In this formulation, W indicates layer-specific trainable weight matrix. H^{0} is the original inputs feature matrix X as H^{0}=X, with dimension as N * D, and H^{l} indicates the l-th layer hidden representation of graph. The model is trained with semi-supervised classification labels and the loss function L is defined above. You can check more detailed information in this ICLR paper, Semi-supervised classification with graph convolutional networks for more details.\n \nRelated Documents\n\n\">Semi-supervised classification with graph convolutional\nnetworks\n\nRelated Videos\n\n\n", "tag": "machine learning,graph,GNN", "prompt": "Can you give me a brief introduction to Graph Convolutional Networks GCN?"}
{"completion": " The equation for TransE is: <equation>\\\\mathcal{L}=\\\\sum_{(h,r,t) \\\\in S} \\\\sum_{(h^{'},r^{'},t^{'}) \\\\in S^{'}_{(h,r,t)}} \\\\[ \\\\gamma + d(h + r, t) - d(h^{'} + r^{'}, t^{'}) \\\\]_{+} \\\\\\\\ S^{'}_{(h,r,t)}=\\\\{(h^{'},r,t)|h^{'} \\\\in E\\\\} \\\\cup \\\\{(h,r,t^{'})|t^{'} \\\\in E\\\\} \\\\\\\\ d(h + r, t)=||h + r - t||^{2}_{2}</equation>. Here is the brief introduction of TransE:\n Given a training set S of triplets (h, l, t) composed of two entities h, t ? E (the set of entities) and a relationship l ? L (the set of relationships), our model learns vector embeddings of the entities and the relationships. The embeddings take values in Rk (k is a model hyperparameter) and are denoted with the same letters, in boldface characters. The basic idea behind our model is that the functional relation induced by the l-labeled edges corresponds to a translation of the embeddings, i.e. we want that h + l ? t when (h, l, t) holds (t should be a nearest neighbor of h + l), while h + l should be far away from t otherwise. Following an energy-based framework, the energy of a triplet is equal to d(h + l, t) for some dissimilarity measure d, which we take to be either the L1 or the L2 -norm. To learn such embeddings, we minimize a margin-based ranking criterion over the training set. See paper Translating Embeddings for Modeling Multi-relational Data for more details.\n\n Related Documents\n\nTranslating Embeddings for Modeling Multi-relational Data\n\nRelated Videos\n\n\n", "tag": "machine learning", "prompt": "Can you give me a brief introduction to TransE?"}
{"completion": " The equation for Mahalanobis Distance is: <equation>D_{M}(x,y)=\\\\sqrt{(x-y)^{T}\\\\Sigma^{-1}(x-y)}</equation>. Here is the brief introduction of Mahalanobis Distance:\n Mahalanobis Distance is a distance measure between a data point and dataset of a distribution. See website for more details Mahalanobis Distance\n\nRelated Documents\n\nMahalanobis Distance\n\nRelated Videos\n\n\n", "tag": "machine learning,mahalanobis", "prompt": "I am interested in Mahalanobis Distance?"}
{"completion": " The equation for Support Vector Machine SVM is <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. The meaning of symbol <equation>y=wx+b</equation> in Support Vector Machine SVM is: Classifier.", "tag": "machine learning,svm", "prompt": "What is the meaning of <equation>y=wx+b</equation> in Support Vector Machine SVM?"}
{"completion": " The Latex Code for equation Graph Convolutional Networks GCNis <equation>H^{(l+1)}=\\\\sigma(\\\\tilde{D}^{-\\\\frac{1}{2}}\\\\tilde{A}\\\\tilde{D}^{-\\\\frac{1}{2}}H^{l}W^{l})\\\\\\\\\\ \\\\tilde{A}=A+I_{N}\\\\\\\\\\ \\\\tilde{D}_{ii}=\\\\sum_{j}\\\\tilde{A}_{ij} \\\\\\\\\\ H^{0}=X \\\\\\\\ \\ \\\\mathcal{L}=-\\\\sum_{l \\\\in Y}\\\\sum^{F}_{f=1} Y_{lf} \\\\ln Z_{lf}</equation>", "tag": "machine learning,graph,GNN", "prompt": "What is the equation of Graph Convolutional Networks GCN?"}
{"completion": " The Latex Code for equation Generative Adversarial Networks GANis <equation>\\\\min_{G} \\\\max_{D} V(D,G)=\\\\mathbb{E}_{x \\\\sim p_{data}(x)}[\\\\log D(x)]+\\\\mathbb{E}_{z \\\\sim p_{z}(z)}[\\\\log(1-D(G(z)))]</equation>", "tag": "machine learning,gan", "prompt": "What is the equation of Generative Adversarial Networks GAN?"}
{"completion": " The equation for Kullback-Leibler Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>KL(P||Q)</equation> in Kullback-Leibler Divergence is: KL Divergence between P and Q.", "tag": "machine learning,kl divergence", "prompt": "What is the meaning of <equation>KL(P||Q)</equation> in Kullback-Leibler Divergence?"}
{"completion": " The Latex Code for equation Diffusion Model Reverse Processis <equation>p_\\\\theta(\\\\mathbf{x}_{0:T}) = p(\\\\mathbf{x}_T) \\\\prod^T_{t=1} p_\\\\theta(\\\\mathbf{x}_{t-1} \\\\vert \\\\mathbf{x}_t) \\\\\\\\\\ p_\\\\theta(\\\\mathbf{x}_{t-1} \\\\vert \\\\mathbf{x}_t) = \\\\mathcal{N}(\\\\mathbf{x}_{t-1}; \\\\boldsymbol{\\\\mu}_\\\\theta(\\\\mathbf{x}_t, t), \\\\boldsymbol{\\\\Sigma}_\\\\theta(\\\\mathbf{x}_t, t))</equation>", "tag": "machine learning,diffusion", "prompt": "What is the latex code of equation Diffusion Model Reverse Process?"}
{"completion": " The equation for Proximal Policy Optimization PPO is: <equation>L^{CLIP}(\\\\theta)=E_{t}[\\\\min(r_{t}(\\\\theta))A_{t}, \\\\text{clip}(r_{t}(\\\\theta), 1-\\\\epsilon,1+\\\\epsilon)A_{t}]</equation>. Here is the brief introduction of Proximal Policy Optimization PPO:\n With supervised learning, we can easily implement the cost function, run gradient descent on it, and be very confident that we\u2019ll get excellent results with relatively little hyperparameter tuning. The route to success in reinforcement learning isn\u2019t as obvious\u2014the algorithms have many moving parts that are hard to debug, and they require substantial effort in tuning in order to get good results. PPO strikes a balance between ease of implementation, sample complexity, and ease of tuning, trying to compute an update at each step that minimizes the cost function while ensuring the deviation from the previous policy is relatively small. https://openai.com/research/openai-baselines-ppo\n", "tag": "machine learning", "prompt": "What's the meaning of Proximal Policy Optimization PPO?"}
{"completion": " The equation for Bellman Equation is: <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. Here is the brief introduction of Bellman Equation:\n\n: Value at state s in policy \\pi\n: Value at state s^{'} in policy \\pi\n: Probability of choosing action a given state s\n: Reward at state s\n: Reward discount factor \\gamma\n\n\n You can check more detailed information of Bellman Equation in this tutorial Introduction to Reinforcement Learning for more details.\n \nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning", "prompt": "I am interested in Bellman Equation?"}
{"completion": " The Latex Code for equation Multi-Gate Mixture of Experts MMoEis <equation>g^{k}(x)=\\\\text{softmax}(W_{gk}x) \\\\\\\\\\ f^{k}(x)=\\\\sum^{n}_{i=1}g^{k}(x)_{i}f_{i}(x) \\\\\\\\\\ y_{k}=h^{k}(f^{k}(x))</equation>", "tag": "machine learning,multi task", "prompt": "What is the equation of Multi-Gate Mixture of Experts MMoE?"}
{"completion": " The equation for Maximum Mean Discrepancy MMD is <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>. The meaning of symbol <equation>\\textup{MMD}(\\mathbb{F},X,Y)</equation> in Maximum Mean Discrepancy MMD is: Superior of the discrepancy measure between two distribution..", "tag": "machine learning,mmd", "prompt": "What is the meaning of the symbol <equation>\\textup{MMD}(\\mathbb{F},X,Y)</equation> in Maximum Mean Discrepancy MMD?"}
{"completion": " The Latex Code for equation Mahalanobis Distanceis <equation>D_{M}(x,y)=\\\\sqrt{(x-y)^{T}\\\\Sigma^{-1}(x-y)}</equation>", "tag": "machine learning,mahalanobis", "prompt": "How to write equation Mahalanobis Distance?"}
{"completion": " The Latex Code for equation Individual Treatment Effect ITEis <equation>\\\\text{ITE}_{i}:=Y_{i}(1)-Y_{i}(0)</equation>", "tag": "machine learning,causual inference", "prompt": "What is the equation of Individual Treatment Effect ITE?"}
{"completion": " The equation for Transformer is: <equation>\\\\text{Attention}(Q, K, V) = \\\\text{softmax}(\\\\frac{QK^T}{\\\\sqrt{d_k}})V</equation>. Here is the brief introduction of Transformer:\nRelated Documents\n\nAttention Is All You Need\n\nRelated Videos\n\n\n", "tag": "machine learning,nlp,gpt", "prompt": "Please give me detailed tutorial of Transformer?"}
{"completion": " The Latex Code for equation Prototypical Networks Protonetsis <equation>c_{k}=\\\\frac{1}{|S_{k}|}\\\\sum_{(x_{i},y_{i}) \\\\in S_{k}} f_{\\\\phi}(x) \\\\\\\\ p_{\\\\phi}(y=k|x)=\\\\frac{\\\\exp(-d(f_{\\\\phi}(x), c_{k}))}{\\\\sum_{k^{'}} \\\\exp(-d(f_{\\\\phi}(x), c_{k^{'}})} \\\\\\\\\\\\min J(\\\\phi)=-\\\\log p_{\\\\phi}(y=k|x)</equation>", "tag": "machine learning,meta learning", "prompt": "What is the latex code of equation Prototypical Networks Protonets?"}
{"completion": " The equation for Wasserstein Distance Optimal Transport is <equation>W_{p}(P,Q)=(\\\\inf_{J \\\\in J(P,Q)} \\\\int{||x-y||^{p}dJ(X,Y)})^\\\\frac{1}{p}</equation>. The meaning of symbol <equation>||x-y||^{p}</equation> in Wasserstein Distance Optimal Transport is: Power p of distance moving distributions P towards Q.", "tag": "machine learning,wasserstein", "prompt": "What is the meaning of <equation>||x-y||^{p}</equation> in Wasserstein Distance Optimal Transport?"}
{"completion": " The Latex Code for equation Progressive Layered Extraction PLEis <equation>g^{k}(x)=w^{k}(x)S^{k}(x) \\\\\\\\\\ w^{k}(x)=\\\\text{softmax}(W^{k}_{g}x) \\\\\\\\\\ S^{k}(x)=\\\\[E^{T}_{(k,1)},E^{T}_{(k,2)},...,E^{T}_{(k,m_{k})},E^{T}_{(s,1)},E^{T}_{(s,2)},...,E^{T}_{(s,m_{s})}\\\\]^{T} \\\\\\\\\\ y^{k}(x)=t^{k}(g^{k}(x)) \\\\\\\\\\ g^{k,j}(x)=w^{k,j}(g^{k,j-1}(x))S^{k,j}(x) </equation>", "tag": "machine learning,multi task", "prompt": "Show me the latex code of equation Progressive Layered Extraction PLE?"}
{"completion": " The Latex Code for equation Bregman Divergencesis <equation>d_{\\\\phi}(z,z^{'})=\\\\phi(z) - \\\\phi(z^{'})-(z-z^{'})^{T} \\\\nabla \\\\phi(z^{'})</equation>", "tag": "machine learning", "prompt": "Can you write equation Bregman Divergences?"}
{"completion": " The equation for SME Linear is: <equation>\\\\epsilon(lhs,rel,rhs)=E_{lhs(rel)}^{T}E_{rhs(rel)} \\\\\\\\=(W_{l1}E_{lhs}^{T} + W_{l2}E_{rel}^{T} + b_{l})^{T}(W_{r1}E_{rhs}^{T} + W_{r2}E_{rel}^{T} + b_{r})</equation>. Here is the brief introduction of SME Linear:\n\n The energy function E (denoted SME) is encoded using a neural network, whose architecture first processes each entity in parallel, like in siamese networks. The intuition is that the relation type should first be used to extract relevant components from each argument\u00e2\u20ac\u2122s embedding, and put them in a space where they can then be compared. See paper A Semantic Matching Energy Function for Learning with Multi-relational Data for more details.\n \nRelated Documents\n\n\">A Semantic Matching Energy Function for Learning with Multi-relational Data\n\nRelated Videos\n\n\n", "tag": "machine learning,KG", "prompt": "What's the meaning of SME Linear?"}
{"completion": " The Latex Code for equation T-Learneris <equation>\\\\mu_{0}(x)=\\\\mathbb{E}[Y(0)|X=x],\\\\mu_{1}(x)=\\\\mathbb{E}[Y(1)|X=x],\\\\\\\\\\ \\\\hat{\\\\tau}(x)=\\\\hat{\\\\mu}_{1}(x)-\\\\hat{\\\\mu}_{0}(x)</equation>", "tag": "machine learning,causual inference", "prompt": "Please give me some hints on equation T-Learner?"}
{"completion": " The equation for RotatE is: <equation>f_{r}(h, t) = ||h \\\\circ r - t||^{2}_{2}</equation>. Here is the brief introduction of RotatE:\n RotatE learns low-dimensional representations of entities and relations to relation space r, and multiple original entity embedding to the mapping matrix M. See paper RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space for more details.\n\n Related Documents\n\n\">RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space\n\nRelated Videos\n\n\n", "tag": "machine learning,KG", "prompt": "Please give me detailed tutorial of RotatE?"}
{"completion": " The Latex Code for equation Diffusion Model Reverse Processis <equation>p_\\\\theta(\\\\mathbf{x}_{0:T}) = p(\\\\mathbf{x}_T) \\\\prod^T_{t=1} p_\\\\theta(\\\\mathbf{x}_{t-1} \\\\vert \\\\mathbf{x}_t) \\\\\\\\\\ p_\\\\theta(\\\\mathbf{x}_{t-1} \\\\vert \\\\mathbf{x}_t) = \\\\mathcal{N}(\\\\mathbf{x}_{t-1}; \\\\boldsymbol{\\\\mu}_\\\\theta(\\\\mathbf{x}_t, t), \\\\boldsymbol{\\\\Sigma}_\\\\theta(\\\\mathbf{x}_t, t))</equation>", "tag": "machine learning,diffusion", "prompt": "What is the equation of Diffusion Model Reverse Process?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. The meaning of symbol <equation>KL(P||Q)</equation> in Jensen-Shannon Divergence JS-Divergence is: KL Divergence between P and Q.", "tag": "machine learning", "prompt": "What is the meaning of <equation>KL(P||Q)</equation> in Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The equation for KL-Divergence is: <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. Here is the brief introduction of KL-Divergence:\n Latex code for the Kullback-Leibler Divergence. I will briefly introduce the notations in this formulation. \n \n: KL Divergence between P and Q\n: Distribution of P(x) over x\n: Distribution of Q(x) over x\n\n\nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning", "prompt": "Can you give me a brief introduction to KL-Divergence?"}
{"completion": " The Latex Code for equation Area Under Uplift Curve AUUCis <equation>f(t)=(\\\\frac{Y^{T}_{t}}{N^{T}_{t}} - \\\\frac{Y^{C}_{t}}{N^{C}_{t}})(N^{T}_{t}+N^{C}_{t})</equation>", "tag": "machine learning,causual inference", "prompt": "What is the equation of Area Under Uplift Curve AUUC?"}
{"completion": " The Latex Code for equation Diffusion Model Variational Lower Boundis <equation>\\\\begin{aligned}\\ - \\\\log p_\\\\theta(\\\\mathbf{x}_0) \\ &\\\\leq - \\\\log p_\\\\theta(\\\\mathbf{x}_0) + D_\\\\text{KL}(q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0) \\\\| p_\\\\theta(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0) ) \\\\\\\\\\ &= -\\\\log p_\\\\theta(\\\\mathbf{x}_0) + \\\\mathbb{E}_{\\\\mathbf{x}_{1:T}\\\\sim q(\\\\mathbf{x}_{1:T} \\\\vert \\\\mathbf{x}_0)} \\\\Big[ \\\\log\\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T}) / p_\\\\theta(\\\\mathbf{x}_0)} \\\\Big] \\\\\\\\\\ &= -\\\\log p_\\\\theta(\\\\mathbf{x}_0) + \\\\mathbb{E}_q \\\\Big[ \\\\log\\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} + \\\\log p_\\\\theta(\\\\mathbf{x}_0) \\\\Big] \\\\\\\\\\ &= \\\\mathbb{E}_q \\\\Big[ \\\\log \\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} \\\\Big] \\\\\\\\\\ \\\\text{Let }L_\\\\text{VLB} \\ &= \\\\mathbb{E}_{q(\\\\mathbf{x}_{0:T})} \\\\Big[ \\\\log \\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} \\\\Big] \\\\geq - \\\\mathbb{E}_{q(\\\\mathbf{x}_0)} \\\\log p_\\\\theta(\\\\mathbf{x}_0)\\ \\\\end{aligned}</equation>", "tag": "machine learning,diffusion", "prompt": "Show me the latex code of equation Diffusion Model Variational Lower Bound?"}
{"completion": " The equation for Conditional Average Treatment Effect CATE is: <equation>\\\\tau(x):=\\\\mathbb{E}[Y(1)-Y(0)|X=x]</equation>. Here is the brief introduction of Conditional Average Treatment Effect CATE:\n Since we can't observe ITE of item i directly, most causal inference models estimate the conditional average treatment effect(CATE) conditioned on item i (X=x_{i}).\n ", "tag": "machine learning,causual inference", "prompt": "Can you explain Conditional Average Treatment Effect CATE?"}
{"completion": " The equation for KL-Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>KL(P||Q)</equation> in KL-Divergence is: KL Divergence between P and Q.", "tag": "machine learning", "prompt": "What is the meaning of <equation>KL(P||Q)</equation> in KL-Divergence?"}
{"completion": " The related materials of Maximum Mean Discrepancy MMD is: 1.A Kernel Two-Sample Test (MIT courses):<url>https://jmlr.csail.mit.edu/papers/volume13/gretton12a/gretton12a.pdf</url>;", "tag": "machine learning,mmd", "prompt": "Can you help me find related materials of Maximum Mean Discrepancy MMD?"}
{"completion": " The equation for Prototypical Networks Protonets is: <equation>c_{k}=\\\\frac{1}{|S_{k}|}\\\\sum_{(x_{i},y_{i}) \\\\in S_{k}} f_{\\\\phi}(x) \\\\\\\\ p_{\\\\phi}(y=k|x)=\\\\frac{\\\\exp(-d(f_{\\\\phi}(x), c_{k}))}{\\\\sum_{k^{'}} \\\\exp(-d(f_{\\\\phi}(x), c_{k^{'}})} \\\\\\\\\\\\min J(\\\\phi)=-\\\\log p_{\\\\phi}(y=k|x)</equation>. Here is the brief introduction of Prototypical Networks Protonets:\n Prototypical networks compute an M-dimensional representation c_{k} or prototype, of each class through an embedding f_{\\phi}(.) with parameters \\phi. Each prototype is the mean vector of the embedded support points belonging to its class k. Prototypical networks then produce a distribution over classes for a query point x based on a softmax over distances to the prototypes in the embedding space as p(y=k|x). Then the negative log-likelihood of J(\\theta) is calculated over query set.\n Related Documents\n\nSee paper Prototypical Networks for Few-shot Learning for more detail.\n\nRelated Videos\n\n\n", "tag": "machine learning,meta learning", "prompt": "Can you explain Prototypical Networks Protonets?"}
{"completion": " The equation for KL-Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>KL(P||Q)</equation> in KL-Divergence is: KL Divergence between P and Q.", "tag": "machine learning", "prompt": "What is the meaning of the symbol <equation>KL(P||Q)</equation> in KL-Divergence?"}
{"completion": " The equation for Graph Convolutional Networks GCN is: <equation>H^{(l+1)}=\\\\sigma(\\\\tilde{D}^{-\\\\frac{1}{2}}\\\\tilde{A}\\\\tilde{D}^{-\\\\frac{1}{2}}H^{l}W^{l})\\\\\\\\\\ \\\\tilde{A}=A+I_{N}\\\\\\\\\\ \\\\tilde{D}_{ii}=\\\\sum_{j}\\\\tilde{A}_{ij} \\\\\\\\\\ H^{0}=X \\\\\\\\ \\ \\\\mathcal{L}=-\\\\sum_{l \\\\in Y}\\\\sum^{F}_{f=1} Y_{lf} \\\\ln Z_{lf}</equation>. Here is the brief introduction of Graph Convolutional Networks GCN:\n\n In this formulation, W indicates layer-specific trainable weight matrix. H^{0} is the original inputs feature matrix X as H^{0}=X, with dimension as N * D, and H^{l} indicates the l-th layer hidden representation of graph. The model is trained with semi-supervised classification labels and the loss function L is defined above. You can check more detailed information in this ICLR paper, Semi-supervised classification with graph convolutional networks for more details.\n \nRelated Documents\n\n\">Semi-supervised classification with graph convolutional\nnetworks\n\nRelated Videos\n\n\n", "tag": "machine learning,graph,GNN", "prompt": "Please give me detailed tutorial of Graph Convolutional Networks GCN?"}
{"completion": " The equation for TransH is: <equation>f_{r}(h,t) =||h_{\\\\perp} + d_{r} - t_{\\\\perp} ||^{2}_{2}=||(h - w_{r}hw_{r}) + d_{r} - (t - w_{r}tw_{r}) ||^{2}_{2}</equation>. Here is the brief introduction of TransH:\n TransH model learns low-dimensional representations of knowledge graphs triples on the hyperplane of the entities and relations. See paper Knowledge Graph Embedding by Translating on Hyperplanes for more details.\n Related Documents\n\nKnowledge Graph Embedding by Translating on Hyperplanes\n\nRelated Videos\n\n\n", "tag": "machine learning,KG", "prompt": "Can you explain TransH?"}
{"completion": " The equation for Transformer is: <equation>\\\\text{Attention}(Q, K, V) = \\\\text{softmax}(\\\\frac{QK^T}{\\\\sqrt{d_k}})V</equation>. Here is the brief introduction of Transformer:\nRelated Documents\n\nAttention Is All You Need\n\nRelated Videos\n\n\n", "tag": "machine learning,nlp,gpt", "prompt": "What's the meaning of Transformer?"}
{"completion": " The equation for GraphSage is: <equation>h^{0}_{v} \\\\leftarrow x_{v} \\\\\\\\\\ \\\\textbf{for} k \\\\in \\\\{1,2,...,K\\\\} \\\\text{do}\\\\\\\\\\ \\\\textbf{for} v \\\\in V \\\\text{do} \\\\\\\\\\ h^{k}_{N_{v}} \\\\leftarrow \\\\textbf{AGGREGATE}_{k}(h^{k-1}_{u}, u \\\\in N(v)); \\\\\\\\\\ h^{k}_{v} \\\\leftarrow \\\\sigma (W^{k} \\\\textbf{concat}(h^{k-1}_{v},h^{k}_{N_{v}})) \\\\\\\\\\ \\\\textbf{end} \\\\\\\\\\ h^{k}_{v}=h^{k}_{v}/||h^{k}_{v}||_{2},\\\\forall v \\\\in V \\\\\\\\\\ \\\\textbf{end} \\\\\\\\\\ z_{v} \\\\leftarrow h^{k}_{v} \\\\\\\\\\ J_{\\\\textbf{z}_{u}}=-\\\\log (\\\\sigma (\\\\textbf{z}_{u}^{T}\\\\textbf{z}_{v})) - Q \\\\mathbb{E}_{v_{n} \\\\sim p_n(v)} \\\\log(\\\\sigma (-\\\\textbf{z}_{u}^{T}\\\\textbf{z}_{v_{n}}))</equation>. Here is the brief introduction of GraphSage:\n\n AGGREGATE function must operate on a set of unordered neighbour node vectors of each node v. Common choices includes Mean aggregator, Pooling aggregator, LSTM aggregator (random permutation of neighbours). The final loss functions is calculated in a unsupervised settings. Positive neighhour v is the node that co-occurs within fixed-length random walk of each node v. Negative neighhour is sampled from distribution of p_n(v). The final loss function of GraphSage is calculated as J_{\\textbf{z}_{u}}, which is similar to NCE noise contrastive loss, where similar items pairs have higher values while unrelated items pairs have lower values. You can check more detailed information in this paper, Inductive Representation Learning on Large Graphs for more details.\n \nRelated Documents\n\n\">Inductive Representation Learning on Large Graphs\n\nRelated Videos\n\n\n", "tag": "machine learning,graph,GNN", "prompt": "Can you give me a brief introduction to GraphSage?"}
{"completion": " The equation for KL-Divergence is: <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. Here is the brief introduction of KL-Divergence:\n Latex code for the Kullback-Leibler Divergence. I will briefly introduce the notations in this formulation. \n \n: KL Divergence between P and Q\n: Distribution of P(x) over x\n: Distribution of Q(x) over x\n\n\nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning", "prompt": "I am interested in KL-Divergence?"}
{"completion": " The Latex Code for equation Bregman Divergencesis <equation>d_{\\\\phi}(z,z^{'})=\\\\phi(z) - \\\\phi(z^{'})-(z-z^{'})^{T} \\\\nabla \\\\phi(z^{'})</equation>", "tag": "machine learning", "prompt": "Please give me some hints on equation Bregman Divergences?"}
{"completion": " The Latex Code for equation Diffusion Model Variational Lower Boundis <equation>\\\\begin{aligned}\\ - \\\\log p_\\\\theta(\\\\mathbf{x}_0) \\ &\\\\leq - \\\\log p_\\\\theta(\\\\mathbf{x}_0) + D_\\\\text{KL}(q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0) \\\\| p_\\\\theta(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0) ) \\\\\\\\\\ &= -\\\\log p_\\\\theta(\\\\mathbf{x}_0) + \\\\mathbb{E}_{\\\\mathbf{x}_{1:T}\\\\sim q(\\\\mathbf{x}_{1:T} \\\\vert \\\\mathbf{x}_0)} \\\\Big[ \\\\log\\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T}) / p_\\\\theta(\\\\mathbf{x}_0)} \\\\Big] \\\\\\\\\\ &= -\\\\log p_\\\\theta(\\\\mathbf{x}_0) + \\\\mathbb{E}_q \\\\Big[ \\\\log\\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} + \\\\log p_\\\\theta(\\\\mathbf{x}_0) \\\\Big] \\\\\\\\\\ &= \\\\mathbb{E}_q \\\\Big[ \\\\log \\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} \\\\Big] \\\\\\\\\\ \\\\text{Let }L_\\\\text{VLB} \\ &= \\\\mathbb{E}_{q(\\\\mathbf{x}_{0:T})} \\\\Big[ \\\\log \\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} \\\\Big] \\\\geq - \\\\mathbb{E}_{q(\\\\mathbf{x}_0)} \\\\log p_\\\\theta(\\\\mathbf{x}_0)\\ \\\\end{aligned}</equation>", "tag": "machine learning,diffusion", "prompt": "Can you write equation Diffusion Model Variational Lower Bound?"}
{"completion": " The equation for Graph Convolutional Networks GCN is: <equation>H^{(l+1)}=\\\\sigma(\\\\tilde{D}^{-\\\\frac{1}{2}}\\\\tilde{A}\\\\tilde{D}^{-\\\\frac{1}{2}}H^{l}W^{l})\\\\\\\\\\ \\\\tilde{A}=A+I_{N}\\\\\\\\\\ \\\\tilde{D}_{ii}=\\\\sum_{j}\\\\tilde{A}_{ij} \\\\\\\\\\ H^{0}=X \\\\\\\\ \\ \\\\mathcal{L}=-\\\\sum_{l \\\\in Y}\\\\sum^{F}_{f=1} Y_{lf} \\\\ln Z_{lf}</equation>. Here is the brief introduction of Graph Convolutional Networks GCN:\n\n In this formulation, W indicates layer-specific trainable weight matrix. H^{0} is the original inputs feature matrix X as H^{0}=X, with dimension as N * D, and H^{l} indicates the l-th layer hidden representation of graph. The model is trained with semi-supervised classification labels and the loss function L is defined above. You can check more detailed information in this ICLR paper, Semi-supervised classification with graph convolutional networks for more details.\n \nRelated Documents\n\n\">Semi-supervised classification with graph convolutional\nnetworks\n\nRelated Videos\n\n\n", "tag": "machine learning,graph,GNN", "prompt": "What's the meaning of Graph Convolutional Networks GCN?"}
{"completion": " The equation for Variational AutoEncoder VAE is: <equation>\\\\log p_{\\\\theta}(x)=\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log p_{\\\\theta}(x)] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log \\\\frac{p_{\\\\theta}(x,z)}{p_{\\\\theta}(z|x)}] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log [\\\\frac{p_{\\\\theta}(x,z)}{q_{\\\\phi}(z|x)} \\\\times \\\\frac{q_{\\\\phi}(z|x)}{p_{\\\\theta}(z|x)}]] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log [\\\\frac{p_{\\\\theta}(x,z)}{q_{\\\\phi}(z|x)} ]] +D_{KL}(q_{\\\\phi}(z|x) || p_{\\\\theta}(z|x))\\\\\\\\</equation>. Here is the brief introduction of Variational AutoEncoder VAE:", "tag": "machine learning,VAE", "prompt": "Can you explain Variational AutoEncoder VAE?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>r</equation> in Bellman Equation is: Reward at state s.", "tag": "machine learning", "prompt": "Help me find the meaning of symbol <equation>r</equation> in Bellman Equation?"}
{"completion": " The Latex Code for equation Multi-Gate Mixture of Experts MMoEis <equation>g^{k}(x)=\\\\text{softmax}(W_{gk}x) \\\\\\\\\\ f^{k}(x)=\\\\sum^{n}_{i=1}g^{k}(x)_{i}f_{i}(x) \\\\\\\\\\ y_{k}=h^{k}(f^{k}(x))</equation>", "tag": "machine learning,multi task", "prompt": "How to write equation Multi-Gate Mixture of Experts MMoE?"}
{"completion": " The equation for Bregman Divergences is: <equation>d_{\\\\phi}(z,z^{'})=\\\\phi(z) - \\\\phi(z^{'})-(z-z^{'})^{T} \\\\nabla \\\\phi(z^{'})</equation>. Here is the brief introduction of Bregman Divergences:\n The prototypi- cal networks algorithm is equivalent to performing mixture density estimation on the support set with an exponential family density. A regular Bregman divergence d_{\\phi} is defined as above. \\phi is a differentiable, strictly convex function of the Legendre type. Examples of Bregman divergences include squared Euclidean distance and Mahalanobis distance.\n ", "tag": "machine learning", "prompt": "Please give me detailed tutorial of Bregman Divergences?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. The meaning of symbol <equation>P(x)</equation> in Jensen-Shannon Divergence JS-Divergence is: Distribution of P(x) over x.", "tag": "machine learning", "prompt": "Can you explain the symbol <equation>P(x)</equation> in Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The Latex Code for equation Model-Agnostic Meta-Learning MAMLis <equation>\\\\min_{\\\\theta} \\\\sum_{\\\\mathcal{T}_{i} \\\\sim p(\\\\mathcal{T})} \\\\mathcal{L}_{\\\\mathcal{T}_{i}}(f_{\\\\theta^{'}_{i}}) = \\\\sum_{\\\\mathcal{T}_{i} \\\\sim p(\\\\mathcal{T})} \\\\mathcal{L}_{\\\\mathcal{T}_{i}}(f_{\\\\theta_{i} - \\\\alpha \\\\nabla_{\\\\theta} \\\\mathcal{L}_{\\\\mathcal{T}_{i}} (f_{\\\\theta}) })</equation>", "tag": "machine learning,meta learning", "prompt": "What is the latex code of equation Model-Agnostic Meta-Learning MAML?"}
{"completion": " The equation for Average Treatment Effect ATE is: <equation>\\\\text{ATE}:=\\\\mathbb{E}[Y(1)-Y(0)]</equation>. Here is the brief introduction of Average Treatment Effect ATE:\n Average Treatment Effect(ATE) is defined as the expectation of the difference between the treatment group Y(1) and control group Y(0)\n ", "tag": "machine learning,causual inference", "prompt": "Please give me detailed tutorial of Average Treatment Effect ATE?"}
{"completion": " The equation for Support Vector Machine SVM is: <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. Here is the brief introduction of Support Vector Machine SVM:\n Latex code for Support Vector Machine (SVM).\n \n: Dual problem Lagrangian Relaxation\n: Weight of Linear Classifier\n: Classifier\n: Decision Boundary\n\nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning,svm", "prompt": "What's the meaning of Support Vector Machine SVM?"}
{"completion": " The Latex Code for equation Average Treatment Effect ATEis <equation>\\\\text{ATE}:=\\\\mathbb{E}[Y(1)-Y(0)]</equation>", "tag": "machine learning,causual inference", "prompt": "Please give me some hints on equation Average Treatment Effect ATE?"}
{"completion": " The Latex Code for equation T-Learneris <equation>\\\\mu_{0}(x)=\\\\mathbb{E}[Y(0)|X=x],\\\\mu_{1}(x)=\\\\mathbb{E}[Y(1)|X=x],\\\\\\\\\\ \\\\hat{\\\\tau}(x)=\\\\hat{\\\\mu}_{1}(x)-\\\\hat{\\\\mu}_{0}(x)</equation>", "tag": "machine learning,causual inference", "prompt": "What is the equation of T-Learner?"}
{"completion": " The equation for Maximum Mean Discrepancy MMD is <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>. The meaning of symbol <equation>\\frac{1}{m}\\sum_{i=1}^{m}f(x_{i})</equation> in Maximum Mean Discrepancy MMD is: Mean of probability distribution X with m data points..", "tag": "machine learning,mmd", "prompt": "What is the meaning of the symbol <equation>\\frac{1}{m}\\sum_{i=1}^{m}f(x_{i})</equation> in Maximum Mean Discrepancy MMD?"}
{"completion": " The Latex Code for equation Proximal Policy Optimization PPOis <equation>L^{CLIP}(\\\\theta)=E_{t}[\\\\min(r_{t}(\\\\theta))A_{t}, \\\\text{clip}(r_{t}(\\\\theta), 1-\\\\epsilon,1+\\\\epsilon)A_{t}]</equation>", "tag": "machine learning", "prompt": "What is the equation of Proximal Policy Optimization PPO?"}
{"completion": " The Latex Code for equation Graph Convolutional Networks GCNis <equation>H^{(l+1)}=\\\\sigma(\\\\tilde{D}^{-\\\\frac{1}{2}}\\\\tilde{A}\\\\tilde{D}^{-\\\\frac{1}{2}}H^{l}W^{l})\\\\\\\\\\ \\\\tilde{A}=A+I_{N}\\\\\\\\\\ \\\\tilde{D}_{ii}=\\\\sum_{j}\\\\tilde{A}_{ij} \\\\\\\\\\ H^{0}=X \\\\\\\\ \\ \\\\mathcal{L}=-\\\\sum_{l \\\\in Y}\\\\sum^{F}_{f=1} Y_{lf} \\\\ln Z_{lf}</equation>", "tag": "machine learning,graph,GNN", "prompt": "What is the latex code of equation Graph Convolutional Networks GCN?"}
{"completion": " The related materials of Deep Kernel Learning is: ", "tag": "machine learning,Deep Kernel Learning", "prompt": "Can you help me find related materials of Deep Kernel Learning?"}
{"completion": " The Latex Code for equation Proximal Policy Optimization PPOis <equation>L^{CLIP}(\\\\theta)=E_{t}[\\\\min(r_{t}(\\\\theta))A_{t}, \\\\text{clip}(r_{t}(\\\\theta), 1-\\\\epsilon,1+\\\\epsilon)A_{t}]</equation>", "tag": "machine learning", "prompt": "Please give me some hints on equation Proximal Policy Optimization PPO?"}
{"completion": " The equation for Language Modelling is <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>. The meaning of symbol <equation>(s_{1}, s_{2}, ..., s_{n})</equation> in Language Modelling is: Variable length sequences of symbols.", "tag": "machine learning", "prompt": "What is the meaning of <equation>(s_{1}, s_{2}, ..., s_{n})</equation> in Language Modelling?"}
{"completion": " The Latex Code for equation Diffusion Model Forward Processis <equation>q(x_{t}|x_{t-1})=\\\\mathcal{N}(x_{t};\\\\sqrt{1-\\\\beta_{t}}x_{t-1},\\\\beta_{t}I) \\\\\\\\q(x_{1:T}|x_{0})=\\\\prod_{t=1}^{T}q(x_{t}|x_{t-1})</equation>", "tag": "machine learning,diffusion", "prompt": "Please give me some hints on equation Diffusion Model Forward Process?"}
{"completion": " The equation for KL-Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>P(x)</equation> in KL-Divergence is: Distribution of P(x) over x.", "tag": "machine learning", "prompt": "Help me find the meaning of symbol <equation>P(x)</equation> in KL-Divergence?"}
{"completion": " The Latex Code for equation KL-Divergenceis <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>", "tag": "machine learning", "prompt": "Please give me some hints on equation KL-Divergence?"}
{"completion": " The related materials of Jensen-Shannon Divergence JS-Divergence is: ", "tag": "machine learning", "prompt": "Can you help me find related materials of Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The Latex Code for equation Mahalanobis Distanceis <equation>D_{M}(x,y)=\\\\sqrt{(x-y)^{T}\\\\Sigma^{-1}(x-y)}</equation>", "tag": "machine learning,mahalanobis", "prompt": "Please give me some hints on equation Mahalanobis Distance?"}
{"completion": " The Latex Code for equation SME Linearis <equation>\\\\epsilon(lhs,rel,rhs)=E_{lhs(rel)}^{T}E_{rhs(rel)} \\\\\\\\=(W_{l1}E_{lhs}^{T} + W_{l2}E_{rel}^{T} + b_{l})^{T}(W_{r1}E_{rhs}^{T} + W_{r2}E_{rel}^{T} + b_{r})</equation>", "tag": "machine learning,KG", "prompt": "What is the latex code of equation SME Linear?"}
{"completion": " The equation for Progressive Layered Extraction PLE is: <equation>g^{k}(x)=w^{k}(x)S^{k}(x) \\\\\\\\\\ w^{k}(x)=\\\\text{softmax}(W^{k}_{g}x) \\\\\\\\\\ S^{k}(x)=\\\\[E^{T}_{(k,1)},E^{T}_{(k,2)},...,E^{T}_{(k,m_{k})},E^{T}_{(s,1)},E^{T}_{(s,2)},...,E^{T}_{(s,m_{s})}\\\\]^{T} \\\\\\\\\\ y^{k}(x)=t^{k}(g^{k}(x)) \\\\\\\\\\ g^{k,j}(x)=w^{k,j}(g^{k,j-1}(x))S^{k,j}(x) </equation>. Here is the brief introduction of Progressive Layered Extraction PLE:\n\n Progressive Layered Extraction(PLE) model slightly modifies the original structure of MMoE models and explicitly separate the experts into shared experts and task-specific experts. Let's assume there are m_{s} shared experts and m_{t} tasks-specific experts. S^{k}(x) is a selected matrix composed of (m_{s} + m_{t}) D-dimensional vectors, with dimension as (m_{s} + m_{t}) \\times D. w^{k}(x) denotes the gating function with size (m_{s} + m_{t}) and W^{k}_{g} is a trainable parameters with dimension as (m_{s} + m_{t}) \\times D. t^{k} denotes the task-specific tower paratmeters. The progressive extraction layer means that the gating network g^{k,j}(x) of j-th extraction layer takes the output of previous gating layers g^{k,j-1}(x) as inputs.\n \nRelated Documents\n\nSee below link of paper Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations for more details. \n\nRelated Videos\n\n\n", "tag": "machine learning,multi task", "prompt": "What's the meaning of Progressive Layered Extraction PLE?"}
{"completion": " The Latex Code for equation Model-Agnostic Meta-Learning MAMLis <equation>\\\\min_{\\\\theta} \\\\sum_{\\\\mathcal{T}_{i} \\\\sim p(\\\\mathcal{T})} \\\\mathcal{L}_{\\\\mathcal{T}_{i}}(f_{\\\\theta^{'}_{i}}) = \\\\sum_{\\\\mathcal{T}_{i} \\\\sim p(\\\\mathcal{T})} \\\\mathcal{L}_{\\\\mathcal{T}_{i}}(f_{\\\\theta_{i} - \\\\alpha \\\\nabla_{\\\\theta} \\\\mathcal{L}_{\\\\mathcal{T}_{i}} (f_{\\\\theta}) })</equation>", "tag": "machine learning,meta learning", "prompt": "Can you write equation Model-Agnostic Meta-Learning MAML?"}
{"completion": " The equation for Diffusion Model Forward Process Reparameterization is: <equation>x_{t}=\\\\sqrt{\\\\alpha_{t}}x_{t-1}+\\\\sqrt{1-\\\\alpha_{t}} \\\\epsilon_{t-1}\\\\\\\\=\\\\sqrt{\\\\alpha_{t}\\\\alpha_{t-1}}x_{t-2} + \\\\sqrt{1-\\\\alpha_{t}\\\\alpha_{t-1}} \\\\bar{\\\\epsilon}_{t-2}\\\\\\\\=\\\\text{...}\\\\\\\\=\\\\sqrt{\\\\bar{\\\\alpha}_{t}}x_{0}+\\\\sqrt{1-\\\\bar{\\\\alpha}_{t}}\\\\epsilon \\\\\\\\\\\\alpha_{t}=1-\\\\beta_{t}, \\\\bar{\\\\alpha}_{t}=\\\\prod_{t=1}^{T}\\\\alpha_{t}</equation>. Here is the brief introduction of Diffusion Model Forward Process Reparameterization:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "Can you give me a brief introduction to Diffusion Model Forward Process Reparameterization?"}
{"completion": " The Latex Code for equation Diffusion Model Forward Process Reparameterizationis <equation>x_{t}=\\\\sqrt{\\\\alpha_{t}}x_{t-1}+\\\\sqrt{1-\\\\alpha_{t}} \\\\epsilon_{t-1}\\\\\\\\=\\\\sqrt{\\\\alpha_{t}\\\\alpha_{t-1}}x_{t-2} + \\\\sqrt{1-\\\\alpha_{t}\\\\alpha_{t-1}} \\\\bar{\\\\epsilon}_{t-2}\\\\\\\\=\\\\text{...}\\\\\\\\=\\\\sqrt{\\\\bar{\\\\alpha}_{t}}x_{0}+\\\\sqrt{1-\\\\bar{\\\\alpha}_{t}}\\\\epsilon \\\\\\\\\\\\alpha_{t}=1-\\\\beta_{t}, \\\\bar{\\\\alpha}_{t}=\\\\prod_{t=1}^{T}\\\\alpha_{t}</equation>", "tag": "machine learning,diffusion", "prompt": "What is the equation of Diffusion Model Forward Process Reparameterization?"}
{"completion": " The Latex Code for equation Variational AutoEncoder VAEis <equation>\\\\log p_{\\\\theta}(x)=\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log p_{\\\\theta}(x)] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log \\\\frac{p_{\\\\theta}(x,z)}{p_{\\\\theta}(z|x)}] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log [\\\\frac{p_{\\\\theta}(x,z)}{q_{\\\\phi}(z|x)} \\\\times \\\\frac{q_{\\\\phi}(z|x)}{p_{\\\\theta}(z|x)}]] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log [\\\\frac{p_{\\\\theta}(x,z)}{q_{\\\\phi}(z|x)} ]] +D_{KL}(q_{\\\\phi}(z|x) || p_{\\\\theta}(z|x))\\\\\\\\</equation>", "tag": "machine learning,VAE", "prompt": "Please give me some hints on equation Variational AutoEncoder VAE?"}
{"completion": " The related materials of Diffusion Model Reverse Process is: 1.Weng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.:<url>https://lilianweng.github.io/posts/2021-07-11-diffusion-models/</url>;2.Latex Code for Diffusion Models Equations:<url>http://deepnlp.org/blog/latex-code-for-diffusion-models-equations</url>;", "tag": "machine learning,diffusion", "prompt": "Can you help me find related materials of Diffusion Model Reverse Process?"}
{"completion": " The related materials of Unconfoundedness Assumption is: ", "tag": "machine learning,causual inference", "prompt": "Can you help me find related materials of Unconfoundedness Assumption?"}
{"completion": " The equation for SimplE is: <equation>s(e_{i}, r, e_{j}) = \\\\frac{1}{2}(<h_{e_{i}}, v_{r}, t_{e_{j}}> + <h_{e_{j}}, v_{r^{-}}, t_{e_{i}}>)</equation>. Here is the brief introduction of SimplE:\n The similarity function for a triple (e1 , r , e2 ) is ?he1 , vr , te2 ?.\n SimplE considers two vectors he,te ? Rd as the embedding of each entity (similar to CP), and two vectors vr , vr?1 ? Rd for each relation r. The similarity function of SimplE foratriple(ei,r,ej)isdefinedas 1(?hei,vr,tej?+?hej,vr?1,tei?),i.e. theaverageoftheCP 2 scoresfor(ei,r,ej)and(ej,r?1,ei). See paper SimplE Embedding for Link Prediction in Knowledge Graphs for more details.\n ", "tag": "machine learning,KG", "prompt": "Please give me detailed tutorial of SimplE?"}
{"completion": " The Latex Code for equation Variational AutoEncoder VAEis <equation>\\\\log p_{\\\\theta}(x)=\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log p_{\\\\theta}(x)] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log \\\\frac{p_{\\\\theta}(x,z)}{p_{\\\\theta}(z|x)}] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log [\\\\frac{p_{\\\\theta}(x,z)}{q_{\\\\phi}(z|x)} \\\\times \\\\frac{q_{\\\\phi}(z|x)}{p_{\\\\theta}(z|x)}]] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log [\\\\frac{p_{\\\\theta}(x,z)}{q_{\\\\phi}(z|x)} ]] +D_{KL}(q_{\\\\phi}(z|x) || p_{\\\\theta}(z|x))\\\\\\\\</equation>", "tag": "machine learning,VAE", "prompt": "Show me the latex code of equation Variational AutoEncoder VAE?"}
{"completion": " The Latex Code for equation QINIis <equation>g(t)=Y^{T}_{t}-\\\\frac{Y^{C}_{t}N^{T}_{t}}{N^{C}_{t}},\\\\\\\\\\ f(t)=g(t) \\\\times \\\\frac{N^{T}_{t}+N^{C}_{t}}{N^{T}_{t}}</equation>", "tag": "machine learning,causual inference", "prompt": "What is the equation of QINI?"}
{"completion": " The equation for Maximum Mean Discrepancy MMD is <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>. The meaning of symbol <equation>\\textup{MMD}(\\mathbb{F},X,Y)</equation> in Maximum Mean Discrepancy MMD is: Superior of the discrepancy measure between two distribution..", "tag": "machine learning,mmd", "prompt": "What does the symbol mean <equation>\\textup{MMD}(\\mathbb{F},X,Y)</equation> in Maximum Mean Discrepancy MMD?"}
{"completion": " The Latex Code for equation Bellman Equationis <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>", "tag": "machine learning", "prompt": "Can you write equation Bellman Equation?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>v_{\\pi}(s)</equation> in Bellman Equation is: Value at state s in policy \\pi.", "tag": "machine learning", "prompt": "Help me find the meaning of symbol <equation>v_{\\pi}(s)</equation> in Bellman Equation?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>r</equation> in Bellman Equation is: Reward at state s.", "tag": "machine learning", "prompt": "What is the meaning of the symbol <equation>r</equation> in Bellman Equation?"}
{"completion": " The equation for Language Modelling is: <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>. Here is the brief introduction of Language Modelling:\n\n: Language modeling is usually framed as unsupervised distribution estimation from a set of examples of \n\n: Variable length sequences of symbols\n\n : Factorize the joint probabilities over symbols p(x) as the product of conditional probabilities \n\n", "tag": "machine learning", "prompt": "Can you explain Language Modelling?"}
{"completion": " The equation for Cross-Stitch Network is: <equation>\\\\begin{bmatrix} \\\\tilde{x}^{ij}_{A}\\\\\\\\\\\\tilde{x}^{ij}_{B}\\\\end{bmatrix}=\\\\begin{bmatrix} a_{AA} & a_{AB}\\\\\\\\ a_{BA} & a_{BB} \\\\end{bmatrix}\\\\begin{bmatrix} x^{ij}_{A}\\\\\\\\ x^{ij}_{B} \\\\end{bmatrix}</equation>. Here is the brief introduction of Cross-Stitch Network:\n The cross-stitch unit takes two activation maps xA and xB from previous layer and learns a linear combination of two inputs from previous tasks and combine them into two new representation. The linear combination is controlled by parameter \\alpha. \n\n Related Documents\n\nSee below link of paper Cross-stitch Networks for Multi-task Learning for more details. \n\nRelated Videos\n\n\n", "tag": "machine learning,multi task", "prompt": "Can you give me a brief introduction to Cross-Stitch Network?"}
{"completion": " The Latex Code for equation QINIis <equation>g(t)=Y^{T}_{t}-\\\\frac{Y^{C}_{t}N^{T}_{t}}{N^{C}_{t}},\\\\\\\\\\ f(t)=g(t) \\\\times \\\\frac{N^{T}_{t}+N^{C}_{t}}{N^{T}_{t}}</equation>", "tag": "machine learning,causual inference", "prompt": "What is the latex code of equation QINI?"}
{"completion": " The equation for Maximum Mean Discrepancy MMD is: <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>. Here is the brief introduction of Maximum Mean Discrepancy MMD:\n Latex code for the Maximum Mean Discrepancy MMD. I will briefly introduce the notations in this formulation. \n \n: Superior of the discrepancy measure between two distribution.\n: Mean of probability distribution X with m data points.\n: Mean of probability distribution Y with n data points.\n\n\nRelated Documents\n\nA Kernel Two-Sample Test (MIT courses)\n\nRelated Videos\n\n\n", "tag": "machine learning,mmd", "prompt": "Can you give me a brief introduction to Maximum Mean Discrepancy MMD?"}
{"completion": " The Latex Code for equation RotatEis <equation>f_{r}(h, t) = ||h \\\\circ r - t||^{2}_{2}</equation>", "tag": "machine learning,KG", "prompt": "What is the latex code of equation RotatE?"}
{"completion": " The equation for Wasserstein Distance Optimal Transport is: <equation>W_{p}(P,Q)=(\\\\inf_{J \\\\in J(P,Q)} \\\\int{||x-y||^{p}dJ(X,Y)})^\\\\frac{1}{p}</equation>. Here is the brief introduction of Wasserstein Distance Optimal Transport:\n Latex code for the Wasserstein Distance (Optimal Transport Distance). I will briefly introduce the notations in this formulation. \n \n: Wasserstein distance p power between two distributions P and Q\n: Power p of distance moving distributions P towards Q\n\n\nRelated Documents\n\nOptimal Transport and Wasserstein Distance (CMU lecture pdf)\n\nRelated Videos\n\n\n", "tag": "machine learning,wasserstein", "prompt": "Can you give me a brief introduction to Wasserstein Distance Optimal Transport?"}
{"completion": " The Latex Code for equation Diffusion Model Variational Lower Boundis <equation>\\\\begin{aligned}\\ - \\\\log p_\\\\theta(\\\\mathbf{x}_0) \\ &\\\\leq - \\\\log p_\\\\theta(\\\\mathbf{x}_0) + D_\\\\text{KL}(q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0) \\\\| p_\\\\theta(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0) ) \\\\\\\\\\ &= -\\\\log p_\\\\theta(\\\\mathbf{x}_0) + \\\\mathbb{E}_{\\\\mathbf{x}_{1:T}\\\\sim q(\\\\mathbf{x}_{1:T} \\\\vert \\\\mathbf{x}_0)} \\\\Big[ \\\\log\\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T}) / p_\\\\theta(\\\\mathbf{x}_0)} \\\\Big] \\\\\\\\\\ &= -\\\\log p_\\\\theta(\\\\mathbf{x}_0) + \\\\mathbb{E}_q \\\\Big[ \\\\log\\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} + \\\\log p_\\\\theta(\\\\mathbf{x}_0) \\\\Big] \\\\\\\\\\ &= \\\\mathbb{E}_q \\\\Big[ \\\\log \\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} \\\\Big] \\\\\\\\\\ \\\\text{Let }L_\\\\text{VLB} \\ &= \\\\mathbb{E}_{q(\\\\mathbf{x}_{0:T})} \\\\Big[ \\\\log \\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} \\\\Big] \\\\geq - \\\\mathbb{E}_{q(\\\\mathbf{x}_0)} \\\\log p_\\\\theta(\\\\mathbf{x}_0)\\ \\\\end{aligned}</equation>", "tag": "machine learning,diffusion", "prompt": "What is the latex code of equation Diffusion Model Variational Lower Bound?"}
{"completion": " The equation for Conditional Random Field CRF is: <equation>P(y|x)=\\\\frac{1}{Z(x)}\\\\exp(\\\\sum_{i,k}\\\\lambda_{k}t_{k}(y_{i-1},y_{i},x,i))+\\\\sum_{i,l}\\\\mu_{l}s_{l}(y_{i},x,i)) \\\\\\\\\\ Z(x)=\\\\sum_{y}\\\\exp(\\\\sum_{i,k}\\\\lambda_{k}t_{k}(y_{i-1},y_{i},x,i))+\\\\sum_{i,l}\\\\mu_{l}s_{l}(y_{i},x,i))</equation>. Here is the brief introduction of Conditional Random Field CRF:\n p(Y|x) denotes the linear chain Conditional Random Field(CRF). t_k denotes the function on the transition, s_l denote function on the node.\n lambda_k and mu_l denotes the weight coefficient.\n ", "tag": "machine learning,nlp", "prompt": "I am interested in Conditional Random Field CRF?"}
{"completion": " The related materials of SimplE is: ", "tag": "machine learning,KG", "prompt": "Can you help me find related materials of SimplE?"}
{"completion": " The equation for GraphSage is: <equation>h^{0}_{v} \\\\leftarrow x_{v} \\\\\\\\\\ \\\\textbf{for} k \\\\in \\\\{1,2,...,K\\\\} \\\\text{do}\\\\\\\\\\ \\\\textbf{for} v \\\\in V \\\\text{do} \\\\\\\\\\ h^{k}_{N_{v}} \\\\leftarrow \\\\textbf{AGGREGATE}_{k}(h^{k-1}_{u}, u \\\\in N(v)); \\\\\\\\\\ h^{k}_{v} \\\\leftarrow \\\\sigma (W^{k} \\\\textbf{concat}(h^{k-1}_{v},h^{k}_{N_{v}})) \\\\\\\\\\ \\\\textbf{end} \\\\\\\\\\ h^{k}_{v}=h^{k}_{v}/||h^{k}_{v}||_{2},\\\\forall v \\\\in V \\\\\\\\\\ \\\\textbf{end} \\\\\\\\\\ z_{v} \\\\leftarrow h^{k}_{v} \\\\\\\\\\ J_{\\\\textbf{z}_{u}}=-\\\\log (\\\\sigma (\\\\textbf{z}_{u}^{T}\\\\textbf{z}_{v})) - Q \\\\mathbb{E}_{v_{n} \\\\sim p_n(v)} \\\\log(\\\\sigma (-\\\\textbf{z}_{u}^{T}\\\\textbf{z}_{v_{n}}))</equation>. Here is the brief introduction of GraphSage:\n\n AGGREGATE function must operate on a set of unordered neighbour node vectors of each node v. Common choices includes Mean aggregator, Pooling aggregator, LSTM aggregator (random permutation of neighbours). The final loss functions is calculated in a unsupervised settings. Positive neighhour v is the node that co-occurs within fixed-length random walk of each node v. Negative neighhour is sampled from distribution of p_n(v). The final loss function of GraphSage is calculated as J_{\\textbf{z}_{u}}, which is similar to NCE noise contrastive loss, where similar items pairs have higher values while unrelated items pairs have lower values. You can check more detailed information in this paper, Inductive Representation Learning on Large Graphs for more details.\n \nRelated Documents\n\n\">Inductive Representation Learning on Large Graphs\n\nRelated Videos\n\n\n", "tag": "machine learning,graph,GNN", "prompt": "What's the meaning of GraphSage?"}
{"completion": " The equation for Language Modelling is: <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>. Here is the brief introduction of Language Modelling:\n\n: Language modeling is usually framed as unsupervised distribution estimation from a set of examples of \n\n: Variable length sequences of symbols\n\n : Factorize the joint probabilities over symbols p(x) as the product of conditional probabilities \n\n", "tag": "machine learning", "prompt": "Can you give me a brief introduction to Language Modelling?"}
{"completion": " The equation for Kullback-Leibler Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>P(x)</equation> in Kullback-Leibler Divergence is: Distribution of P(x) over x.", "tag": "machine learning,kl divergence", "prompt": "What is the meaning of <equation>P(x)</equation> in Kullback-Leibler Divergence?"}
{"completion": " The Latex Code for equation S-Learneris <equation>\\\\mu(x,w)=\\\\mathbb{E}[Y_{i}|X=x_{i},W=w] \\\\\\\\\\ \\\\hat{\\\\tau}(x)=\\\\hat{\\\\mu}(x,1)-\\\\hat{\\\\mu}(x,0)</equation>", "tag": "machine learning,causual inference", "prompt": "What is the equation of S-Learner?"}
{"completion": " The equation for Wasserstein Distance Optimal Transport is <equation>W_{p}(P,Q)=(\\\\inf_{J \\\\in J(P,Q)} \\\\int{||x-y||^{p}dJ(X,Y)})^\\\\frac{1}{p}</equation>. The meaning of symbol <equation>W_{p}(P||Q)</equation> in Wasserstein Distance Optimal Transport is: Wasserstein distance p power between two distributions P and Q.", "tag": "machine learning,wasserstein", "prompt": "What is the meaning of the symbol <equation>W_{p}(P||Q)</equation> in Wasserstein Distance Optimal Transport?"}
{"completion": " The equation for Graph Attention Network GAT is: <equation>h=\\\\{\\\\vec{h_{1}},\\\\vec{h_{2}},...,\\\\vec{h_{N}}\\\\}, \\\\\\\\\\ \\\\vec{h_{i}} \\\\in \\\\mathbb{R}^{F} \\\\\\\\\\ W \\\\in \\\\mathbb{R}^{F \\\\times F^{'}} \\\\\\\\\\ e_{ij}=a(Wh_{i},Wh_{j}) \\\\\\\\\\ k \\\\in \\\\mathcal{N}_{i},\\\\text{ neighbourhood nodes}\\\\\\\\\\ a_{ij}=\\\\text{softmax}_{j}(e_{ij})=\\\\frac{\\\\exp(e_{ij})}{\\\\sum_{k \\\\in \\\\mathcal{N}_{i}} \\\\exp(e_{ik})}</equation>. Here is the brief introduction of Graph Attention Network GAT:\n\n GAT applies graph attentional layer to model the graph propagation. In each layer, the node i has attention on all the other nodes j. And the attention coefficient is calculated. For the attention calculation, only the set of neighbours nodes N_{i} of each node i contributes to the final softmax attention calculation. You can check more detailed information in this paper, GRAPH ATTENTION NETWORKS for more details.\n \nRelated Documents\n\nGRAPH ATTENTION NETWORKS\n\nRelated Videos\n\n\n", "tag": "machine learning,graph,GNN", "prompt": "I am interested in Graph Attention Network GAT?"}
{"completion": " The equation for Propensity Score is: <equation>e := p(W=1|X=x)</equation>. Here is the brief introduction of Propensity Score:\n The propensity score is defined as the degree of propensity or likelihood that instance i is assigned to treatment group W=1.\n ", "tag": "machine learning,causual inference", "prompt": "Can you explain Propensity Score?"}
{"completion": " The Latex Code for equation Individual Treatment Effect ITEis <equation>\\\\text{ITE}_{i}:=Y_{i}(1)-Y_{i}(0)</equation>", "tag": "machine learning,causual inference", "prompt": "Can you write equation Individual Treatment Effect ITE?"}
{"completion": " The Latex Code for equation T-Learneris <equation>\\\\mu_{0}(x)=\\\\mathbb{E}[Y(0)|X=x],\\\\mu_{1}(x)=\\\\mathbb{E}[Y(1)|X=x],\\\\\\\\\\ \\\\hat{\\\\tau}(x)=\\\\hat{\\\\mu}_{1}(x)-\\\\hat{\\\\mu}_{0}(x)</equation>", "tag": "machine learning,causual inference", "prompt": "How to write equation T-Learner?"}
{"completion": " The Latex Code for equation S-Learneris <equation>\\\\mu(x,w)=\\\\mathbb{E}[Y_{i}|X=x_{i},W=w] \\\\\\\\\\ \\\\hat{\\\\tau}(x)=\\\\hat{\\\\mu}(x,1)-\\\\hat{\\\\mu}(x,0)</equation>", "tag": "machine learning,causual inference", "prompt": "Show me the latex code of equation S-Learner?"}
{"completion": " The Latex Code for equation Progressive Layered Extraction PLEis <equation>g^{k}(x)=w^{k}(x)S^{k}(x) \\\\\\\\\\ w^{k}(x)=\\\\text{softmax}(W^{k}_{g}x) \\\\\\\\\\ S^{k}(x)=\\\\[E^{T}_{(k,1)},E^{T}_{(k,2)},...,E^{T}_{(k,m_{k})},E^{T}_{(s,1)},E^{T}_{(s,2)},...,E^{T}_{(s,m_{s})}\\\\]^{T} \\\\\\\\\\ y^{k}(x)=t^{k}(g^{k}(x)) \\\\\\\\\\ g^{k,j}(x)=w^{k,j}(g^{k,j-1}(x))S^{k,j}(x) </equation>", "tag": "machine learning,multi task", "prompt": "How to write equation Progressive Layered Extraction PLE?"}
{"completion": " The related materials of X-Learner is: ", "tag": "machine learning,causual inference", "prompt": "Can you help me find related materials of X-Learner?"}
{"completion": " The equation for Deep Kernel Learning is: <equation>k(x_{i},x_{j}|\\\\phi)=k(h(x_i,w_k),h(x_j,w_k)|w_k,\\\\phi)</equation>. Here is the brief introduction of Deep Kernel Learning:\n The original data instance x_{i} is first mapped to latent space by a non-linear transformation h(x_{i}, w_{k}), usually a deep neural network with parameter w_{k}, and then passed to a kernel function k(x_{i},x_{j}|\\phi).\n See below link Deep Kernel Learning for more details. \n ", "tag": "machine learning,Deep Kernel Learning", "prompt": "Can you explain Deep Kernel Learning?"}
{"completion": " The Latex Code for equation Diffusion Model Variational Lower Bound Lossis <equation>\\\\begin{aligned}\\ L_\\\\text{VLB} &= L_T + L_{T-1} + \\\\dots + L_0 \\\\\\\\\\ \\\\text{where } L_T &= D_\\\\text{KL}(q(\\\\mathbf{x}_T \\\\vert \\\\mathbf{x}_0) \\\\parallel p_\\\\theta(\\\\mathbf{x}_T)) \\\\\\\\\\ L_t &= D_\\\\text{KL}(q(\\\\mathbf{x}_t \\\\vert \\\\mathbf{x}_{t+1}, \\\\mathbf{x}_0) \\\\parallel p_\\\\theta(\\\\mathbf{x}_t \\\\vert\\\\mathbf{x}_{t+1})) \\\\text{ for }1 \\\\leq t \\\\leq T-1 \\\\\\\\\\ L_0 &= - \\\\log p_\\\\theta(\\\\mathbf{x}_0 \\\\vert \\\\mathbf{x}_1)\\ \\\\end{aligned}</equation>", "tag": "machine learning,diffusion", "prompt": "What is the equation of Diffusion Model Variational Lower Bound Loss?"}
{"completion": " The Latex Code for equation Diffusion Model Variational Lower Boundis <equation>\\\\begin{aligned}\\ - \\\\log p_\\\\theta(\\\\mathbf{x}_0) \\ &\\\\leq - \\\\log p_\\\\theta(\\\\mathbf{x}_0) + D_\\\\text{KL}(q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0) \\\\| p_\\\\theta(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0) ) \\\\\\\\\\ &= -\\\\log p_\\\\theta(\\\\mathbf{x}_0) + \\\\mathbb{E}_{\\\\mathbf{x}_{1:T}\\\\sim q(\\\\mathbf{x}_{1:T} \\\\vert \\\\mathbf{x}_0)} \\\\Big[ \\\\log\\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T}) / p_\\\\theta(\\\\mathbf{x}_0)} \\\\Big] \\\\\\\\\\ &= -\\\\log p_\\\\theta(\\\\mathbf{x}_0) + \\\\mathbb{E}_q \\\\Big[ \\\\log\\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} + \\\\log p_\\\\theta(\\\\mathbf{x}_0) \\\\Big] \\\\\\\\\\ &= \\\\mathbb{E}_q \\\\Big[ \\\\log \\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} \\\\Big] \\\\\\\\\\ \\\\text{Let }L_\\\\text{VLB} \\ &= \\\\mathbb{E}_{q(\\\\mathbf{x}_{0:T})} \\\\Big[ \\\\log \\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} \\\\Big] \\\\geq - \\\\mathbb{E}_{q(\\\\mathbf{x}_0)} \\\\log p_\\\\theta(\\\\mathbf{x}_0)\\ \\\\end{aligned}</equation>", "tag": "machine learning,diffusion", "prompt": "What is the equation of Diffusion Model Variational Lower Bound?"}
{"completion": " The equation for Kullback-Leibler Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>Q(x)</equation> in Kullback-Leibler Divergence is: Distribution of Q(x) over x.", "tag": "machine learning,kl divergence", "prompt": "What is the meaning of <equation>Q(x)</equation> in Kullback-Leibler Divergence?"}
{"completion": " The equation for Diffusion Model Variational Lower Bound Loss is: <equation>\\\\begin{aligned}\\ L_\\\\text{VLB} &= L_T + L_{T-1} + \\\\dots + L_0 \\\\\\\\\\ \\\\text{where } L_T &= D_\\\\text{KL}(q(\\\\mathbf{x}_T \\\\vert \\\\mathbf{x}_0) \\\\parallel p_\\\\theta(\\\\mathbf{x}_T)) \\\\\\\\\\ L_t &= D_\\\\text{KL}(q(\\\\mathbf{x}_t \\\\vert \\\\mathbf{x}_{t+1}, \\\\mathbf{x}_0) \\\\parallel p_\\\\theta(\\\\mathbf{x}_t \\\\vert\\\\mathbf{x}_{t+1})) \\\\text{ for }1 \\\\leq t \\\\leq T-1 \\\\\\\\\\ L_0 &= - \\\\log p_\\\\theta(\\\\mathbf{x}_0 \\\\vert \\\\mathbf{x}_1)\\ \\\\end{aligned}</equation>. Here is the brief introduction of Diffusion Model Variational Lower Bound Loss:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "What's the meaning of Diffusion Model Variational Lower Bound Loss?"}
{"completion": " The Latex Code for equation Generative Adversarial Networks GANis <equation>\\\\min_{G} \\\\max_{D} V(D,G)=\\\\mathbb{E}_{x \\\\sim p_{data}(x)}[\\\\log D(x)]+\\\\mathbb{E}_{z \\\\sim p_{z}(z)}[\\\\log(1-D(G(z)))]</equation>", "tag": "machine learning,gan", "prompt": "Show me the latex code of equation Generative Adversarial Networks GAN?"}
{"completion": " The equation for Support Vector Machine SVM is <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. The meaning of symbol <equation>y=wx+b</equation> in Support Vector Machine SVM is: Classifier.", "tag": "machine learning,svm", "prompt": "Help me find the meaning of symbol <equation>y=wx+b</equation> in Support Vector Machine SVM?"}
{"completion": " The equation for Kullback-Leibler Divergence is: <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. Here is the brief introduction of Kullback-Leibler Divergence:\n Latex code for the Kullback-Leibler Divergence. I will briefly introduce the notations in this formulation. \n \n: KL Divergence between P and Q\n: Distribution of P(x) over x\n: Distribution of Q(x) over x\n\n\nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning,kl divergence", "prompt": "Can you explain Kullback-Leibler Divergence?"}
{"completion": " The equation for Mahalanobis Distance is: <equation>D_{M}(x,y)=\\\\sqrt{(x-y)^{T}\\\\Sigma^{-1}(x-y)}</equation>. Here is the brief introduction of Mahalanobis Distance:\n Mahalanobis Distance is a distance measure between a data point and dataset of a distribution. See website for more details Mahalanobis Distance\n\nRelated Documents\n\nMahalanobis Distance\n\nRelated Videos\n\n\n", "tag": "machine learning,mahalanobis", "prompt": "What's the meaning of Mahalanobis Distance?"}
{"completion": " The equation for Diffusion Model Variational Lower Bound is: <equation>\\\\begin{aligned}\\ - \\\\log p_\\\\theta(\\\\mathbf{x}_0) \\ &\\\\leq - \\\\log p_\\\\theta(\\\\mathbf{x}_0) + D_\\\\text{KL}(q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0) \\\\| p_\\\\theta(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0) ) \\\\\\\\\\ &= -\\\\log p_\\\\theta(\\\\mathbf{x}_0) + \\\\mathbb{E}_{\\\\mathbf{x}_{1:T}\\\\sim q(\\\\mathbf{x}_{1:T} \\\\vert \\\\mathbf{x}_0)} \\\\Big[ \\\\log\\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T}) / p_\\\\theta(\\\\mathbf{x}_0)} \\\\Big] \\\\\\\\\\ &= -\\\\log p_\\\\theta(\\\\mathbf{x}_0) + \\\\mathbb{E}_q \\\\Big[ \\\\log\\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} + \\\\log p_\\\\theta(\\\\mathbf{x}_0) \\\\Big] \\\\\\\\\\ &= \\\\mathbb{E}_q \\\\Big[ \\\\log \\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} \\\\Big] \\\\\\\\\\ \\\\text{Let }L_\\\\text{VLB} \\ &= \\\\mathbb{E}_{q(\\\\mathbf{x}_{0:T})} \\\\Big[ \\\\log \\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} \\\\Big] \\\\geq - \\\\mathbb{E}_{q(\\\\mathbf{x}_0)} \\\\log p_\\\\theta(\\\\mathbf{x}_0)\\ \\\\end{aligned}</equation>. Here is the brief introduction of Diffusion Model Variational Lower Bound:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "Can you explain Diffusion Model Variational Lower Bound?"}
{"completion": " The equation for Kullback-Leibler Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>P(x)</equation> in Kullback-Leibler Divergence is: Distribution of P(x) over x.", "tag": "machine learning,kl divergence", "prompt": "What is the meaning of the symbol <equation>P(x)</equation> in Kullback-Leibler Divergence?"}
{"completion": " The equation for S-Learner is: <equation>\\\\mu(x,w)=\\\\mathbb{E}[Y_{i}|X=x_{i},W=w] \\\\\\\\\\ \\\\hat{\\\\tau}(x)=\\\\hat{\\\\mu}(x,1)-\\\\hat{\\\\mu}(x,0)</equation>. Here is the brief introduction of S-Learner:\n S-Learner use a single machine learning estimator \\mu(x,w) to estimate outcome Y directly. And the treatment assigment variable W=0,1 is treated as features of S-learner models. The CATE estimation is calculated as the difference between two outputs given the same model \\mu and different inputs features of W, namely w=1 and w=0.\n ", "tag": "machine learning,causual inference", "prompt": "What's the meaning of S-Learner?"}
{"completion": " The equation for TransH is: <equation>f_{r}(h,t) =||h_{\\\\perp} + d_{r} - t_{\\\\perp} ||^{2}_{2}=||(h - w_{r}hw_{r}) + d_{r} - (t - w_{r}tw_{r}) ||^{2}_{2}</equation>. Here is the brief introduction of TransH:\n TransH model learns low-dimensional representations of knowledge graphs triples on the hyperplane of the entities and relations. See paper Knowledge Graph Embedding by Translating on Hyperplanes for more details.\n Related Documents\n\nKnowledge Graph Embedding by Translating on Hyperplanes\n\nRelated Videos\n\n\n", "tag": "machine learning,KG", "prompt": "Please give me detailed tutorial of TransH?"}
{"completion": " The Latex Code for equation TransEis <equation>\\\\mathcal{L}=\\\\sum_{(h,r,t) \\\\in S} \\\\sum_{(h^{'},r^{'},t^{'}) \\\\in S^{'}_{(h,r,t)}} \\\\[ \\\\gamma + d(h + r, t) - d(h^{'} + r^{'}, t^{'}) \\\\]_{+} \\\\\\\\ S^{'}_{(h,r,t)}=\\\\{(h^{'},r,t)|h^{'} \\\\in E\\\\} \\\\cup \\\\{(h,r,t^{'})|t^{'} \\\\in E\\\\} \\\\\\\\ d(h + r, t)=||h + r - t||^{2}_{2}</equation>", "tag": "machine learning", "prompt": "What is the equation of TransE?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>\\gamma</equation> in Bellman Equation is: Reward discount factor \\gamma.", "tag": "machine learning", "prompt": "Can you explain the symbol <equation>\\gamma</equation> in Bellman Equation?"}
{"completion": " The equation for Wasserstein Distance Optimal Transport is <equation>W_{p}(P,Q)=(\\\\inf_{J \\\\in J(P,Q)} \\\\int{||x-y||^{p}dJ(X,Y)})^\\\\frac{1}{p}</equation>. The meaning of symbol <equation>W_{p}(P||Q)</equation> in Wasserstein Distance Optimal Transport is: Wasserstein distance p power between two distributions P and Q.", "tag": "machine learning,wasserstein", "prompt": "Can you explain the symbol <equation>W_{p}(P||Q)</equation> in Wasserstein Distance Optimal Transport?"}
{"completion": " The equation for Diffusion Model Reverse Process is: <equation>p_\\\\theta(\\\\mathbf{x}_{0:T}) = p(\\\\mathbf{x}_T) \\\\prod^T_{t=1} p_\\\\theta(\\\\mathbf{x}_{t-1} \\\\vert \\\\mathbf{x}_t) \\\\\\\\\\ p_\\\\theta(\\\\mathbf{x}_{t-1} \\\\vert \\\\mathbf{x}_t) = \\\\mathcal{N}(\\\\mathbf{x}_{t-1}; \\\\boldsymbol{\\\\mu}_\\\\theta(\\\\mathbf{x}_t, t), \\\\boldsymbol{\\\\Sigma}_\\\\theta(\\\\mathbf{x}_t, t))</equation>. Here is the brief introduction of Diffusion Model Reverse Process:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "Please give me detailed tutorial of Diffusion Model Reverse Process?"}
{"completion": " The Latex Code for equation GraphSageis <equation>h^{0}_{v} \\\\leftarrow x_{v} \\\\\\\\\\ \\\\textbf{for} k \\\\in \\\\{1,2,...,K\\\\} \\\\text{do}\\\\\\\\\\ \\\\textbf{for} v \\\\in V \\\\text{do} \\\\\\\\\\ h^{k}_{N_{v}} \\\\leftarrow \\\\textbf{AGGREGATE}_{k}(h^{k-1}_{u}, u \\\\in N(v)); \\\\\\\\\\ h^{k}_{v} \\\\leftarrow \\\\sigma (W^{k} \\\\textbf{concat}(h^{k-1}_{v},h^{k}_{N_{v}})) \\\\\\\\\\ \\\\textbf{end} \\\\\\\\\\ h^{k}_{v}=h^{k}_{v}/||h^{k}_{v}||_{2},\\\\forall v \\\\in V \\\\\\\\\\ \\\\textbf{end} \\\\\\\\\\ z_{v} \\\\leftarrow h^{k}_{v} \\\\\\\\\\ J_{\\\\textbf{z}_{u}}=-\\\\log (\\\\sigma (\\\\textbf{z}_{u}^{T}\\\\textbf{z}_{v})) - Q \\\\mathbb{E}_{v_{n} \\\\sim p_n(v)} \\\\log(\\\\sigma (-\\\\textbf{z}_{u}^{T}\\\\textbf{z}_{v_{n}}))</equation>", "tag": "machine learning,graph,GNN", "prompt": "What is the equation of GraphSage?"}
{"completion": " The related materials of Propensity Score is: ", "tag": "machine learning,causual inference", "prompt": "Can you help me find related materials of Propensity Score?"}
{"completion": " The equation for Multi-Gate Mixture of Experts MMoE is: <equation>g^{k}(x)=\\\\text{softmax}(W_{gk}x) \\\\\\\\\\ f^{k}(x)=\\\\sum^{n}_{i=1}g^{k}(x)_{i}f_{i}(x) \\\\\\\\\\ y_{k}=h^{k}(f^{k}(x))</equation>. Here is the brief introduction of Multi-Gate Mixture of Experts MMoE:\n\n Multi-Gate Mixture of Experts (MMoE) model is firstly introduced in KDD2018 paper Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts. The model introduce a MMoE layer to model the relationship of K multiple tasks using N experts. Let's assume input feature X has dimension D. There are K output tasks and N experts networks. The gating network is calculated as, g^{k}(x) is a N-dimensional vector indicating the softmax result of relative weights, W_{gk} is a trainable matrix with size R^{ND}. And f^{k}(x) is the weghted sum representation of output of N experts for task k. f_{i}(x) is the output of the i-th expert, and f^{k}(x) indicates the representation of k-th tasks as the summation of N experts.\n \nRelated Documents\n\nSee paper Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts for details.\n\nRelated Videos\n\n\n", "tag": "machine learning,multi task", "prompt": "Can you explain Multi-Gate Mixture of Experts MMoE?"}
{"completion": " The equation for Unconfoundedness Assumption is: <equation>\\\\{Y_{i}(0),Y_{i}(1)\\\\}\\\\perp W_{i}|X_{i}</equation>. Here is the brief introduction of Unconfoundedness Assumption:\n The unconfoundedness assumption or CIA (Conditional Independence assumption) assume that there are no hidden confounders between (Y(0),Y(1)) vector and treatment assignment vector W, conditioned on input X.\n ", "tag": "machine learning,causual inference", "prompt": "Can you explain Unconfoundedness Assumption?"}
{"completion": " The related materials of Entire Space Multi-Task Model ESSM is: 1.Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate:<url>https://arxiv.org/pdf/1804.07931.pdf</url>;", "tag": "machine learning,multi task", "prompt": "Can you help me find related materials of Entire Space Multi-Task Model ESSM?"}
{"completion": " The equation for Model-Agnostic Meta-Learning MAML is: <equation>\\\\min_{\\\\theta} \\\\sum_{\\\\mathcal{T}_{i} \\\\sim p(\\\\mathcal{T})} \\\\mathcal{L}_{\\\\mathcal{T}_{i}}(f_{\\\\theta^{'}_{i}}) = \\\\sum_{\\\\mathcal{T}_{i} \\\\sim p(\\\\mathcal{T})} \\\\mathcal{L}_{\\\\mathcal{T}_{i}}(f_{\\\\theta_{i} - \\\\alpha \\\\nabla_{\\\\theta} \\\\mathcal{L}_{\\\\mathcal{T}_{i}} (f_{\\\\theta}) })</equation>. Here is the brief introduction of Model-Agnostic Meta-Learning MAML:\n\n\n\n \n\n\n", "tag": "machine learning,meta learning", "prompt": "What's the meaning of Model-Agnostic Meta-Learning MAML?"}
{"completion": " The Latex Code for equation Diffusion Model Forward Process Reparameterizationis <equation>x_{t}=\\\\sqrt{\\\\alpha_{t}}x_{t-1}+\\\\sqrt{1-\\\\alpha_{t}} \\\\epsilon_{t-1}\\\\\\\\=\\\\sqrt{\\\\alpha_{t}\\\\alpha_{t-1}}x_{t-2} + \\\\sqrt{1-\\\\alpha_{t}\\\\alpha_{t-1}} \\\\bar{\\\\epsilon}_{t-2}\\\\\\\\=\\\\text{...}\\\\\\\\=\\\\sqrt{\\\\bar{\\\\alpha}_{t}}x_{0}+\\\\sqrt{1-\\\\bar{\\\\alpha}_{t}}\\\\epsilon \\\\\\\\\\\\alpha_{t}=1-\\\\beta_{t}, \\\\bar{\\\\alpha}_{t}=\\\\prod_{t=1}^{T}\\\\alpha_{t}</equation>", "tag": "machine learning,diffusion", "prompt": "How to write equation Diffusion Model Forward Process Reparameterization?"}
{"completion": " The Latex Code for equation SimplEis <equation>s(e_{i}, r, e_{j}) = \\\\frac{1}{2}(<h_{e_{i}}, v_{r}, t_{e_{j}}> + <h_{e_{j}}, v_{r^{-}}, t_{e_{i}}>)</equation>", "tag": "machine learning,KG", "prompt": "Can you write equation SimplE?"}
{"completion": " The equation for Domain Adaptation H-Divergence is: <equation>d_{\\\\mathcal{H}}(\\\\mathcal{D},\\\\mathcal{D}^{'})=2\\\\sup_{h \\\\in \\\\mathcal{H}}|\\\\Pr_{\\\\mathcal{D}}[I(h)]-\\\\Pr_{\\\\mathcal{D}^{'}}[I(h)]|</equation>. Here is the brief introduction of Domain Adaptation H-Divergence:\n\n The H-Divergence is defined as the superior of divengence between two probability Pr(D) and Pr(D^{'}) for any hypothesis h in all hypotheses class H. In this formulation, given domain X with two data distribution D and D^{'} over X, I(h) denotes the characteristic function(indicator function) on X, which means that for subset of x in I(h), h(x) = 1. You can check more detailed information of domain adaptation and H-divergence in this paper by Shai Ben-David, A theory of learning from different domains for more details.\n \nRelated Documents\n\nA theory of learning from different domains\n\nRelated Videos\n\n\n", "tag": "machine learning,transfer learning", "prompt": "Please give me detailed tutorial of Domain Adaptation H-Divergence?"}
{"completion": " The equation for Propensity Score is: <equation>e := p(W=1|X=x)</equation>. Here is the brief introduction of Propensity Score:\n The propensity score is defined as the degree of propensity or likelihood that instance i is assigned to treatment group W=1.\n ", "tag": "machine learning,causual inference", "prompt": "I am interested in Propensity Score?"}
{"completion": " The Latex Code for equation Mahalanobis Distanceis <equation>D_{M}(x,y)=\\\\sqrt{(x-y)^{T}\\\\Sigma^{-1}(x-y)}</equation>", "tag": "machine learning,mahalanobis", "prompt": "Can you write equation Mahalanobis Distance?"}
{"completion": " The equation for Diffusion Model Forward Process is: <equation>q(x_{t}|x_{t-1})=\\\\mathcal{N}(x_{t};\\\\sqrt{1-\\\\beta_{t}}x_{t-1},\\\\beta_{t}I) \\\\\\\\q(x_{1:T}|x_{0})=\\\\prod_{t=1}^{T}q(x_{t}|x_{t-1})</equation>. Here is the brief introduction of Diffusion Model Forward Process:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "I am interested in Diffusion Model Forward Process?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. The meaning of symbol <equation>KL(P||Q)</equation> in Jensen-Shannon Divergence JS-Divergence is: KL Divergence between P and Q.", "tag": "machine learning", "prompt": "Help me find the meaning of symbol <equation>KL(P||Q)</equation> in Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The equation for Diffusion Model Reverse Process is: <equation>p_\\\\theta(\\\\mathbf{x}_{0:T}) = p(\\\\mathbf{x}_T) \\\\prod^T_{t=1} p_\\\\theta(\\\\mathbf{x}_{t-1} \\\\vert \\\\mathbf{x}_t) \\\\\\\\\\ p_\\\\theta(\\\\mathbf{x}_{t-1} \\\\vert \\\\mathbf{x}_t) = \\\\mathcal{N}(\\\\mathbf{x}_{t-1}; \\\\boldsymbol{\\\\mu}_\\\\theta(\\\\mathbf{x}_t, t), \\\\boldsymbol{\\\\Sigma}_\\\\theta(\\\\mathbf{x}_t, t))</equation>. Here is the brief introduction of Diffusion Model Reverse Process:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "I am interested in Diffusion Model Reverse Process?"}
{"completion": " The equation for Graph Convolutional Networks GCN is: <equation>H^{(l+1)}=\\\\sigma(\\\\tilde{D}^{-\\\\frac{1}{2}}\\\\tilde{A}\\\\tilde{D}^{-\\\\frac{1}{2}}H^{l}W^{l})\\\\\\\\\\ \\\\tilde{A}=A+I_{N}\\\\\\\\\\ \\\\tilde{D}_{ii}=\\\\sum_{j}\\\\tilde{A}_{ij} \\\\\\\\\\ H^{0}=X \\\\\\\\ \\ \\\\mathcal{L}=-\\\\sum_{l \\\\in Y}\\\\sum^{F}_{f=1} Y_{lf} \\\\ln Z_{lf}</equation>. Here is the brief introduction of Graph Convolutional Networks GCN:\n\n In this formulation, W indicates layer-specific trainable weight matrix. H^{0} is the original inputs feature matrix X as H^{0}=X, with dimension as N * D, and H^{l} indicates the l-th layer hidden representation of graph. The model is trained with semi-supervised classification labels and the loss function L is defined above. You can check more detailed information in this ICLR paper, Semi-supervised classification with graph convolutional networks for more details.\n \nRelated Documents\n\n\">Semi-supervised classification with graph convolutional\nnetworks\n\nRelated Videos\n\n\n", "tag": "machine learning,graph,GNN", "prompt": "Can you explain Graph Convolutional Networks GCN?"}
{"completion": " The equation for Wasserstein Distance Optimal Transport is <equation>W_{p}(P,Q)=(\\\\inf_{J \\\\in J(P,Q)} \\\\int{||x-y||^{p}dJ(X,Y)})^\\\\frac{1}{p}</equation>. The meaning of symbol <equation>W_{p}(P||Q)</equation> in Wasserstein Distance Optimal Transport is: Wasserstein distance p power between two distributions P and Q.", "tag": "machine learning,wasserstein", "prompt": "Help me find the meaning of symbol <equation>W_{p}(P||Q)</equation> in Wasserstein Distance Optimal Transport?"}
{"completion": " The Latex Code for equation Language Modellingis <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>", "tag": "machine learning", "prompt": "Show me the latex code of equation Language Modelling?"}
{"completion": " The equation for GraphSage is: <equation>h^{0}_{v} \\\\leftarrow x_{v} \\\\\\\\\\ \\\\textbf{for} k \\\\in \\\\{1,2,...,K\\\\} \\\\text{do}\\\\\\\\\\ \\\\textbf{for} v \\\\in V \\\\text{do} \\\\\\\\\\ h^{k}_{N_{v}} \\\\leftarrow \\\\textbf{AGGREGATE}_{k}(h^{k-1}_{u}, u \\\\in N(v)); \\\\\\\\\\ h^{k}_{v} \\\\leftarrow \\\\sigma (W^{k} \\\\textbf{concat}(h^{k-1}_{v},h^{k}_{N_{v}})) \\\\\\\\\\ \\\\textbf{end} \\\\\\\\\\ h^{k}_{v}=h^{k}_{v}/||h^{k}_{v}||_{2},\\\\forall v \\\\in V \\\\\\\\\\ \\\\textbf{end} \\\\\\\\\\ z_{v} \\\\leftarrow h^{k}_{v} \\\\\\\\\\ J_{\\\\textbf{z}_{u}}=-\\\\log (\\\\sigma (\\\\textbf{z}_{u}^{T}\\\\textbf{z}_{v})) - Q \\\\mathbb{E}_{v_{n} \\\\sim p_n(v)} \\\\log(\\\\sigma (-\\\\textbf{z}_{u}^{T}\\\\textbf{z}_{v_{n}}))</equation>. Here is the brief introduction of GraphSage:\n\n AGGREGATE function must operate on a set of unordered neighbour node vectors of each node v. Common choices includes Mean aggregator, Pooling aggregator, LSTM aggregator (random permutation of neighbours). The final loss functions is calculated in a unsupervised settings. Positive neighhour v is the node that co-occurs within fixed-length random walk of each node v. Negative neighhour is sampled from distribution of p_n(v). The final loss function of GraphSage is calculated as J_{\\textbf{z}_{u}}, which is similar to NCE noise contrastive loss, where similar items pairs have higher values while unrelated items pairs have lower values. You can check more detailed information in this paper, Inductive Representation Learning on Large Graphs for more details.\n \nRelated Documents\n\n\">Inductive Representation Learning on Large Graphs\n\nRelated Videos\n\n\n", "tag": "machine learning,graph,GNN", "prompt": "Can you explain GraphSage?"}
{"completion": " The equation for Prototypical Networks Protonets is: <equation>c_{k}=\\\\frac{1}{|S_{k}|}\\\\sum_{(x_{i},y_{i}) \\\\in S_{k}} f_{\\\\phi}(x) \\\\\\\\ p_{\\\\phi}(y=k|x)=\\\\frac{\\\\exp(-d(f_{\\\\phi}(x), c_{k}))}{\\\\sum_{k^{'}} \\\\exp(-d(f_{\\\\phi}(x), c_{k^{'}})} \\\\\\\\\\\\min J(\\\\phi)=-\\\\log p_{\\\\phi}(y=k|x)</equation>. Here is the brief introduction of Prototypical Networks Protonets:\n Prototypical networks compute an M-dimensional representation c_{k} or prototype, of each class through an embedding f_{\\phi}(.) with parameters \\phi. Each prototype is the mean vector of the embedded support points belonging to its class k. Prototypical networks then produce a distribution over classes for a query point x based on a softmax over distances to the prototypes in the embedding space as p(y=k|x). Then the negative log-likelihood of J(\\theta) is calculated over query set.\n Related Documents\n\nSee paper Prototypical Networks for Few-shot Learning for more detail.\n\nRelated Videos\n\n\n", "tag": "machine learning,meta learning", "prompt": "Please give me detailed tutorial of Prototypical Networks Protonets?"}
{"completion": " The equation for Bound on Target Domain Error is: <equation>\\\\epsilon_{T}(h) \\\\le \\\\hat{\\\\epsilon}_{S}(h) + \\\\sqrt{\\\\frac{4}{m}(d \\\\log \\\\frac{2em}{d} + \\\\log \\\\frac{4}{\\\\delta })} + d_{\\\\mathcal{H}}(\\\\tilde{\\\\mathcal{D}}_{S}, \\\\tilde{\\\\mathcal{D}}_{T}) + \\\\lambda \\\\\\\\\\ \\\\lambda = \\\\lambda_{S} + \\\\lambda_{T}</equation>. Here is the brief introduction of Bound on Target Domain Error:\n\n I will explain this equation in more details. Domain adaptation literatures prove that the test error on target domain \\epsilon_{T}(h) is bounded by three terms: 1. the empirical estimate of training errors on the source domain \\hat{\\epsilon}_{S}(h); 2. the distance divergence between source domain and target domain d(Ds, Dt), 3. Fixed term of VC-Dimension(d), sample size of source domain m, e as the natural logarithm. \\lambda denotes a fixed term as the sum of \\lambda_{S} and \\lambda_{T}, which represent the errors of models training on Ds and Dt respectively. From the above analysis, we can see that if data source Ds and Dt are similar(the divergence between source and target domain distribution Ds and Dt is small), the error on target domain will also be bounded, that's how models trained on source domain will perform better on similar distributed target domains. \n You can check more detailed information in this NIPS 2006 paper by Shai Ben-David for more details.\n \nRelated Documents\n\nAnalysis of Representations for Domain Adaptation\n\nRelated Videos\n\n\n", "tag": "machine learning,transfer learning", "prompt": "I am interested in Bound on Target Domain Error?"}
{"completion": " The equation for Kullback-Leibler Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>Q(x)</equation> in Kullback-Leibler Divergence is: Distribution of Q(x) over x.", "tag": "machine learning,kl divergence", "prompt": "What does the symbol mean <equation>Q(x)</equation> in Kullback-Leibler Divergence?"}
{"completion": " The equation for SimplE is: <equation>s(e_{i}, r, e_{j}) = \\\\frac{1}{2}(<h_{e_{i}}, v_{r}, t_{e_{j}}> + <h_{e_{j}}, v_{r^{-}}, t_{e_{i}}>)</equation>. Here is the brief introduction of SimplE:\n The similarity function for a triple (e1 , r , e2 ) is ?he1 , vr , te2 ?.\n SimplE considers two vectors he,te ? Rd as the embedding of each entity (similar to CP), and two vectors vr , vr?1 ? Rd for each relation r. The similarity function of SimplE foratriple(ei,r,ej)isdefinedas 1(?hei,vr,tej?+?hej,vr?1,tei?),i.e. theaverageoftheCP 2 scoresfor(ei,r,ej)and(ej,r?1,ei). See paper SimplE Embedding for Link Prediction in Knowledge Graphs for more details.\n ", "tag": "machine learning,KG", "prompt": "Can you explain SimplE?"}
{"completion": " The Latex Code for equation Transformeris <equation>\\\\text{Attention}(Q, K, V) = \\\\text{softmax}(\\\\frac{QK^T}{\\\\sqrt{d_k}})V</equation>", "tag": "machine learning,nlp,gpt", "prompt": "Can you write equation Transformer?"}
{"completion": " The equation for Hidden Markov Model is: <equation>Q=\\\\{q_{1},q_{2},...,q_{N}\\\\}, V=\\\\{v_{1},v_{2},...,v_{M}\\\\} \\\\\\\\\\ I=\\\\{i_{1},i_{2},...,i_{T}\\\\},O=\\\\{o_{1},o_{2},...,o_{T}\\\\} \\\\\\\\\\ A=[a_{ij}]_{N \\\\times N}, a_{ij}=P(i_{t+1}=q_{j}|i_{t}=q_{i}) \\\\\\\\\\ B=[b_{j}(k)]_{N \\\\times M},b_{j}(k)=P(o_{t}=v_{k}|i_{t}=q_{j})</equation>. Here is the brief introduction of Hidden Markov Model:\n Q denotes the set of states and V denotes the set of obvervations. Let's assume we have state sequence I of length T, and observation sequence O of length T, Hidden Markov Model(HMM) use transition matrix A to denote the transition probability a_{ij} and matrix B to denote observation probability matrix b_jk.\n ", "tag": "machine learning,nlp", "prompt": "Can you explain Hidden Markov Model?"}
{"completion": " The Latex Code for equation TransRis <equation>h_{r}=hM_{r}, t_{r}=tM_{r} \\\\\\\\f_{r}(h, t) = ||h_{r} + r - t_{r}||^{2}_{2}=||hM_{r}+r-tM_{r}||^{2}_{2}</equation>", "tag": "machine learning,KG", "prompt": "Please give me some hints on equation TransR?"}
{"completion": " The equation for Graph Convolutional Networks GCN is: <equation>H^{(l+1)}=\\\\sigma(\\\\tilde{D}^{-\\\\frac{1}{2}}\\\\tilde{A}\\\\tilde{D}^{-\\\\frac{1}{2}}H^{l}W^{l})\\\\\\\\\\ \\\\tilde{A}=A+I_{N}\\\\\\\\\\ \\\\tilde{D}_{ii}=\\\\sum_{j}\\\\tilde{A}_{ij} \\\\\\\\\\ H^{0}=X \\\\\\\\ \\ \\\\mathcal{L}=-\\\\sum_{l \\\\in Y}\\\\sum^{F}_{f=1} Y_{lf} \\\\ln Z_{lf}</equation>. Here is the brief introduction of Graph Convolutional Networks GCN:\n\n In this formulation, W indicates layer-specific trainable weight matrix. H^{0} is the original inputs feature matrix X as H^{0}=X, with dimension as N * D, and H^{l} indicates the l-th layer hidden representation of graph. The model is trained with semi-supervised classification labels and the loss function L is defined above. You can check more detailed information in this ICLR paper, Semi-supervised classification with graph convolutional networks for more details.\n \nRelated Documents\n\n\">Semi-supervised classification with graph convolutional\nnetworks\n\nRelated Videos\n\n\n", "tag": "machine learning,graph,GNN", "prompt": "I am interested in Graph Convolutional Networks GCN?"}
{"completion": " The Latex Code for equation Variational AutoEncoder VAEis <equation>\\\\log p_{\\\\theta}(x)=\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log p_{\\\\theta}(x)] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log \\\\frac{p_{\\\\theta}(x,z)}{p_{\\\\theta}(z|x)}] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log [\\\\frac{p_{\\\\theta}(x,z)}{q_{\\\\phi}(z|x)} \\\\times \\\\frac{q_{\\\\phi}(z|x)}{p_{\\\\theta}(z|x)}]] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log [\\\\frac{p_{\\\\theta}(x,z)}{q_{\\\\phi}(z|x)} ]] +D_{KL}(q_{\\\\phi}(z|x) || p_{\\\\theta}(z|x))\\\\\\\\</equation>", "tag": "machine learning,VAE", "prompt": "What is the equation of Variational AutoEncoder VAE?"}
{"completion": " The equation for Cross-Stitch Network is: <equation>\\\\begin{bmatrix} \\\\tilde{x}^{ij}_{A}\\\\\\\\\\\\tilde{x}^{ij}_{B}\\\\end{bmatrix}=\\\\begin{bmatrix} a_{AA} & a_{AB}\\\\\\\\ a_{BA} & a_{BB} \\\\end{bmatrix}\\\\begin{bmatrix} x^{ij}_{A}\\\\\\\\ x^{ij}_{B} \\\\end{bmatrix}</equation>. Here is the brief introduction of Cross-Stitch Network:\n The cross-stitch unit takes two activation maps xA and xB from previous layer and learns a linear combination of two inputs from previous tasks and combine them into two new representation. The linear combination is controlled by parameter \\alpha. \n\n Related Documents\n\nSee below link of paper Cross-stitch Networks for Multi-task Learning for more details. \n\nRelated Videos\n\n\n", "tag": "machine learning,multi task", "prompt": "What's the meaning of Cross-Stitch Network?"}
{"completion": " The equation for KL-Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>Q(x)</equation> in KL-Divergence is: Distribution of Q(x) over x.", "tag": "machine learning", "prompt": "Help me find the meaning of symbol <equation>Q(x)</equation> in KL-Divergence?"}
{"completion": " The related materials of Average Treatment Effect ATE is: ", "tag": "machine learning,causual inference", "prompt": "Can you help me find related materials of Average Treatment Effect ATE?"}
{"completion": " The equation for Support Vector Machine SVM is <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. The meaning of symbol <equation>wx+b=0</equation> in Support Vector Machine SVM is: Decision Boundary.", "tag": "machine learning,svm", "prompt": "What does the symbol mean <equation>wx+b=0</equation> in Support Vector Machine SVM?"}
{"completion": " The related materials of TransH is: 1.Knowledge Graph Embedding by Translating on Hyperplanes:<url>https://www.microsoft.com/en-us/research/publication/knowledge-graph-embedding-by-translating-on-hyperplanes/</url>;", "tag": "machine learning,KG", "prompt": "Can you help me find related materials of TransH?"}
{"completion": " The equation for Support Vector Machine SVM is <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. The meaning of symbol <equation>wx+b=0</equation> in Support Vector Machine SVM is: Decision Boundary.", "tag": "machine learning,svm", "prompt": "What is the meaning of the symbol <equation>wx+b=0</equation> in Support Vector Machine SVM?"}
{"completion": " The Latex Code for equation S-Learneris <equation>\\\\mu(x,w)=\\\\mathbb{E}[Y_{i}|X=x_{i},W=w] \\\\\\\\\\ \\\\hat{\\\\tau}(x)=\\\\hat{\\\\mu}(x,1)-\\\\hat{\\\\mu}(x,0)</equation>", "tag": "machine learning,causual inference", "prompt": "Can you write equation S-Learner?"}
{"completion": " The Latex Code for equation QINIis <equation>g(t)=Y^{T}_{t}-\\\\frac{Y^{C}_{t}N^{T}_{t}}{N^{C}_{t}},\\\\\\\\\\ f(t)=g(t) \\\\times \\\\frac{N^{T}_{t}+N^{C}_{t}}{N^{T}_{t}}</equation>", "tag": "machine learning,causual inference", "prompt": "How to write equation QINI?"}
{"completion": " The Latex Code for equation Diffusion Model Forward Processis <equation>q(x_{t}|x_{t-1})=\\\\mathcal{N}(x_{t};\\\\sqrt{1-\\\\beta_{t}}x_{t-1},\\\\beta_{t}I) \\\\\\\\q(x_{1:T}|x_{0})=\\\\prod_{t=1}^{T}q(x_{t}|x_{t-1})</equation>", "tag": "machine learning,diffusion", "prompt": "What is the latex code of equation Diffusion Model Forward Process?"}
{"completion": " The related materials of KL-Divergence is: ", "tag": "machine learning", "prompt": "Can you help me find related materials of KL-Divergence?"}
{"completion": " The equation for Cross-Stitch Network is: <equation>\\\\begin{bmatrix} \\\\tilde{x}^{ij}_{A}\\\\\\\\\\\\tilde{x}^{ij}_{B}\\\\end{bmatrix}=\\\\begin{bmatrix} a_{AA} & a_{AB}\\\\\\\\ a_{BA} & a_{BB} \\\\end{bmatrix}\\\\begin{bmatrix} x^{ij}_{A}\\\\\\\\ x^{ij}_{B} \\\\end{bmatrix}</equation>. Here is the brief introduction of Cross-Stitch Network:\n The cross-stitch unit takes two activation maps xA and xB from previous layer and learns a linear combination of two inputs from previous tasks and combine them into two new representation. The linear combination is controlled by parameter \\alpha. \n\n Related Documents\n\nSee below link of paper Cross-stitch Networks for Multi-task Learning for more details. \n\nRelated Videos\n\n\n", "tag": "machine learning,multi task", "prompt": "I am interested in Cross-Stitch Network?"}
{"completion": " The Latex Code for equation SimplEis <equation>s(e_{i}, r, e_{j}) = \\\\frac{1}{2}(<h_{e_{i}}, v_{r}, t_{e_{j}}> + <h_{e_{j}}, v_{r^{-}}, t_{e_{i}}>)</equation>", "tag": "machine learning,KG", "prompt": "How to write equation SimplE?"}
{"completion": " The related materials of Individual Treatment Effect ITE is: ", "tag": "machine learning,causual inference", "prompt": "Can you help me find related materials of Individual Treatment Effect ITE?"}
{"completion": " The equation for T-Learner is: <equation>\\\\mu_{0}(x)=\\\\mathbb{E}[Y(0)|X=x],\\\\mu_{1}(x)=\\\\mathbb{E}[Y(1)|X=x],\\\\\\\\\\ \\\\hat{\\\\tau}(x)=\\\\hat{\\\\mu}_{1}(x)-\\\\hat{\\\\mu}_{0}(x)</equation>. Here is the brief introduction of T-Learner:\n T-Learner models use two separate models to fit the dataset of control group W=0 and dateset of treatment group W=1. The CATE estimation is calculated as the difference between two outputs given same input x and different models \\mu_0 and \\mu_1.\n ", "tag": "machine learning,causual inference", "prompt": "Can you explain T-Learner?"}
{"completion": " The equation for Unconfoundedness Assumption is: <equation>\\\\{Y_{i}(0),Y_{i}(1)\\\\}\\\\perp W_{i}|X_{i}</equation>. Here is the brief introduction of Unconfoundedness Assumption:\n The unconfoundedness assumption or CIA (Conditional Independence assumption) assume that there are no hidden confounders between (Y(0),Y(1)) vector and treatment assignment vector W, conditioned on input X.\n ", "tag": "machine learning,causual inference", "prompt": "I am interested in Unconfoundedness Assumption?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is: <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. Here is the brief introduction of Jensen-Shannon Divergence JS-Divergence:\n Latex code for the Jensen-Shannon Divergence(JS-Divergence). I will briefly introduce the notations in this formulation. \n \n: KL Divergence between P and Q\n: JS Divergence between P and Q, which is the symmetric divergence metric between distribution P and Q\n: Distribution of P(x) over x\n: Distribution of Q(x) over x\n\n\nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning", "prompt": "What's the meaning of Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The equation for Kullback-Leibler Divergence is: <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. Here is the brief introduction of Kullback-Leibler Divergence:\n Latex code for the Kullback-Leibler Divergence. I will briefly introduce the notations in this formulation. \n \n: KL Divergence between P and Q\n: Distribution of P(x) over x\n: Distribution of Q(x) over x\n\n\nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning,kl divergence", "prompt": "What's the meaning of Kullback-Leibler Divergence?"}
{"completion": " The Latex Code for equation Conditional Average Treatment Effect CATEis <equation>\\\\tau(x):=\\\\mathbb{E}[Y(1)-Y(0)|X=x]</equation>", "tag": "machine learning,causual inference", "prompt": "Please give me some hints on equation Conditional Average Treatment Effect CATE?"}
{"completion": " The Latex Code for equation Hidden Markov Modelis <equation>Q=\\\\{q_{1},q_{2},...,q_{N}\\\\}, V=\\\\{v_{1},v_{2},...,v_{M}\\\\} \\\\\\\\\\ I=\\\\{i_{1},i_{2},...,i_{T}\\\\},O=\\\\{o_{1},o_{2},...,o_{T}\\\\} \\\\\\\\\\ A=[a_{ij}]_{N \\\\times N}, a_{ij}=P(i_{t+1}=q_{j}|i_{t}=q_{i}) \\\\\\\\\\ B=[b_{j}(k)]_{N \\\\times M},b_{j}(k)=P(o_{t}=v_{k}|i_{t}=q_{j})</equation>", "tag": "machine learning,nlp", "prompt": "Can you write equation Hidden Markov Model?"}
{"completion": " The Latex Code for equation Deep Kernel Learningis <equation>k(x_{i},x_{j}|\\\\phi)=k(h(x_i,w_k),h(x_j,w_k)|w_k,\\\\phi)</equation>", "tag": "machine learning,Deep Kernel Learning", "prompt": "Can you write equation Deep Kernel Learning?"}
{"completion": " The equation for T-Learner is: <equation>\\\\mu_{0}(x)=\\\\mathbb{E}[Y(0)|X=x],\\\\mu_{1}(x)=\\\\mathbb{E}[Y(1)|X=x],\\\\\\\\\\ \\\\hat{\\\\tau}(x)=\\\\hat{\\\\mu}_{1}(x)-\\\\hat{\\\\mu}_{0}(x)</equation>. Here is the brief introduction of T-Learner:\n T-Learner models use two separate models to fit the dataset of control group W=0 and dateset of treatment group W=1. The CATE estimation is calculated as the difference between two outputs given same input x and different models \\mu_0 and \\mu_1.\n ", "tag": "machine learning,causual inference", "prompt": "Please give me detailed tutorial of T-Learner?"}
{"completion": " The equation for TransR is: <equation>h_{r}=hM_{r}, t_{r}=tM_{r} \\\\\\\\f_{r}(h, t) = ||h_{r} + r - t_{r}||^{2}_{2}=||hM_{r}+r-tM_{r}||^{2}_{2}</equation>. Here is the brief introduction of TransR:\n\n TransR model learns low-dimensional representations of entities and relations to relation space r, and multiple original entity embedding to the mapping matrix M. See paper Learning Entity and Relation Embeddings for Knowledge Graph Completion for more details.\n \nRelated Documents\n\n\">Learning Entity and Relation Embeddings for Knowledge Graph Completion\n\nRelated Videos\n\n\n", "tag": "machine learning,KG", "prompt": "Can you give me a brief introduction to TransR?"}
{"completion": " The Latex Code for equation Wasserstein Distance Optimal Transportis <equation>W_{p}(P,Q)=(\\\\inf_{J \\\\in J(P,Q)} \\\\int{||x-y||^{p}dJ(X,Y)})^\\\\frac{1}{p}</equation>", "tag": "machine learning,wasserstein", "prompt": "Show me the latex code of equation Wasserstein Distance Optimal Transport?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>v_{\\pi}(s)</equation> in Bellman Equation is: Value at state s in policy \\pi.", "tag": "machine learning", "prompt": "Can you explain the symbol <equation>v_{\\pi}(s)</equation> in Bellman Equation?"}
{"completion": " The Latex Code for equation Cross-Stitch Networkis <equation>\\\\begin{bmatrix} \\\\tilde{x}^{ij}_{A}\\\\\\\\\\\\tilde{x}^{ij}_{B}\\\\end{bmatrix}=\\\\begin{bmatrix} a_{AA} & a_{AB}\\\\\\\\ a_{BA} & a_{BB} \\\\end{bmatrix}\\\\begin{bmatrix} x^{ij}_{A}\\\\\\\\ x^{ij}_{B} \\\\end{bmatrix}</equation>", "tag": "machine learning,multi task", "prompt": "How to write equation Cross-Stitch Network?"}
{"completion": " The Latex Code for equation X-Learneris <equation>\\\\tilde{D}^{1}_{i}:=Y^{1}_{i}-\\\\hat{\\\\mu}_{0}(X^{1}_{i}),\\\\tilde{D}^{0}_{i}:=\\\\hat{\\\\mu}_{1}(X^{0}_{i})-Y^{0}_{i}\\\\\\\\\\ \\\\hat{\\\\tau}(x)=g(x)\\\\hat{\\\\tau}_{0}(x) + (1-g(x))\\\\hat{\\\\tau}_{1}(x)</equation>", "tag": "machine learning,causual inference", "prompt": "What is the latex code of equation X-Learner?"}
{"completion": " The equation for Language Modelling is <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>. The meaning of symbol <equation>(x_{1}, x_{2}, ..., x_{n})</equation> in Language Modelling is: Language modeling is usually framed as unsupervised distribution estimation from a set of examples of.", "tag": "machine learning", "prompt": "Can you explain the symbol <equation>(x_{1}, x_{2}, ..., x_{n})</equation> in Language Modelling?"}
{"completion": " The equation for Bellman Equation is: <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. Here is the brief introduction of Bellman Equation:\n\n: Value at state s in policy \\pi\n: Value at state s^{'} in policy \\pi\n: Probability of choosing action a given state s\n: Reward at state s\n: Reward discount factor \\gamma\n\n\n You can check more detailed information of Bellman Equation in this tutorial Introduction to Reinforcement Learning for more details.\n \nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning", "prompt": "Please give me detailed tutorial of Bellman Equation?"}
{"completion": " The Latex Code for equation Support Vector Machine SVMis <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>", "tag": "machine learning,svm", "prompt": "How to write equation Support Vector Machine SVM?"}
{"completion": " The Latex Code for equation Graph Attention Network GATis <equation>h=\\\\{\\\\vec{h_{1}},\\\\vec{h_{2}},...,\\\\vec{h_{N}}\\\\}, \\\\\\\\\\ \\\\vec{h_{i}} \\\\in \\\\mathbb{R}^{F} \\\\\\\\\\ W \\\\in \\\\mathbb{R}^{F \\\\times F^{'}} \\\\\\\\\\ e_{ij}=a(Wh_{i},Wh_{j}) \\\\\\\\\\ k \\\\in \\\\mathcal{N}_{i},\\\\text{ neighbourhood nodes}\\\\\\\\\\ a_{ij}=\\\\text{softmax}_{j}(e_{ij})=\\\\frac{\\\\exp(e_{ij})}{\\\\sum_{k \\\\in \\\\mathcal{N}_{i}} \\\\exp(e_{ik})}</equation>", "tag": "machine learning,graph,GNN", "prompt": "What is the latex code of equation Graph Attention Network GAT?"}
{"completion": " The equation for Graph Attention Network GAT is: <equation>h=\\\\{\\\\vec{h_{1}},\\\\vec{h_{2}},...,\\\\vec{h_{N}}\\\\}, \\\\\\\\\\ \\\\vec{h_{i}} \\\\in \\\\mathbb{R}^{F} \\\\\\\\\\ W \\\\in \\\\mathbb{R}^{F \\\\times F^{'}} \\\\\\\\\\ e_{ij}=a(Wh_{i},Wh_{j}) \\\\\\\\\\ k \\\\in \\\\mathcal{N}_{i},\\\\text{ neighbourhood nodes}\\\\\\\\\\ a_{ij}=\\\\text{softmax}_{j}(e_{ij})=\\\\frac{\\\\exp(e_{ij})}{\\\\sum_{k \\\\in \\\\mathcal{N}_{i}} \\\\exp(e_{ik})}</equation>. Here is the brief introduction of Graph Attention Network GAT:\n\n GAT applies graph attentional layer to model the graph propagation. In each layer, the node i has attention on all the other nodes j. And the attention coefficient is calculated. For the attention calculation, only the set of neighbours nodes N_{i} of each node i contributes to the final softmax attention calculation. You can check more detailed information in this paper, GRAPH ATTENTION NETWORKS for more details.\n \nRelated Documents\n\nGRAPH ATTENTION NETWORKS\n\nRelated Videos\n\n\n", "tag": "machine learning,graph,GNN", "prompt": "What's the meaning of Graph Attention Network GAT?"}
{"completion": " The Latex Code for equation Deep Kernel Learningis <equation>k(x_{i},x_{j}|\\\\phi)=k(h(x_i,w_k),h(x_j,w_k)|w_k,\\\\phi)</equation>", "tag": "machine learning,Deep Kernel Learning", "prompt": "Show me the latex code of equation Deep Kernel Learning?"}
{"completion": " The Latex Code for equation Variational AutoEncoder VAEis <equation>\\\\log p_{\\\\theta}(x)=\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log p_{\\\\theta}(x)] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log \\\\frac{p_{\\\\theta}(x,z)}{p_{\\\\theta}(z|x)}] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log [\\\\frac{p_{\\\\theta}(x,z)}{q_{\\\\phi}(z|x)} \\\\times \\\\frac{q_{\\\\phi}(z|x)}{p_{\\\\theta}(z|x)}]] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log [\\\\frac{p_{\\\\theta}(x,z)}{q_{\\\\phi}(z|x)} ]] +D_{KL}(q_{\\\\phi}(z|x) || p_{\\\\theta}(z|x))\\\\\\\\</equation>", "tag": "machine learning,VAE", "prompt": "How to write equation Variational AutoEncoder VAE?"}
{"completion": " The equation for Multi-Gate Mixture of Experts MMoE is: <equation>g^{k}(x)=\\\\text{softmax}(W_{gk}x) \\\\\\\\\\ f^{k}(x)=\\\\sum^{n}_{i=1}g^{k}(x)_{i}f_{i}(x) \\\\\\\\\\ y_{k}=h^{k}(f^{k}(x))</equation>. Here is the brief introduction of Multi-Gate Mixture of Experts MMoE:\n\n Multi-Gate Mixture of Experts (MMoE) model is firstly introduced in KDD2018 paper Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts. The model introduce a MMoE layer to model the relationship of K multiple tasks using N experts. Let's assume input feature X has dimension D. There are K output tasks and N experts networks. The gating network is calculated as, g^{k}(x) is a N-dimensional vector indicating the softmax result of relative weights, W_{gk} is a trainable matrix with size R^{ND}. And f^{k}(x) is the weghted sum representation of output of N experts for task k. f_{i}(x) is the output of the i-th expert, and f^{k}(x) indicates the representation of k-th tasks as the summation of N experts.\n \nRelated Documents\n\nSee paper Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts for details.\n\nRelated Videos\n\n\n", "tag": "machine learning,multi task", "prompt": "What's the meaning of Multi-Gate Mixture of Experts MMoE?"}
{"completion": " The equation for QINI is: <equation>g(t)=Y^{T}_{t}-\\\\frac{Y^{C}_{t}N^{T}_{t}}{N^{C}_{t}},\\\\\\\\\\ f(t)=g(t) \\\\times \\\\frac{N^{T}_{t}+N^{C}_{t}}{N^{T}_{t}}</equation>. Here is the brief introduction of QINI:\n Author in this paper Using control groups to target on predicted lift: Building and assessing uplift model defines Qini coefficint as the area under the QINI curve, which is more suitable for the unbalanced samples size of control group and treatment group.\n ", "tag": "machine learning,causual inference", "prompt": "Can you explain QINI?"}
{"completion": " The equation for Maximum Mean Discrepancy MMD is <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>. The meaning of symbol <equation>\\frac{1}{m}\\sum_{i=1}^{m}f(x_{i})</equation> in Maximum Mean Discrepancy MMD is: Mean of probability distribution X with m data points..", "tag": "machine learning,mmd", "prompt": "What is the meaning of <equation>\\frac{1}{m}\\sum_{i=1}^{m}f(x_{i})</equation> in Maximum Mean Discrepancy MMD?"}
{"completion": " The Latex Code for equation Maximum Mean Discrepancy MMDis <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>", "tag": "machine learning,mmd", "prompt": "Show me the latex code of equation Maximum Mean Discrepancy MMD?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. The meaning of symbol <equation>KL(P||Q)</equation> in Jensen-Shannon Divergence JS-Divergence is: KL Divergence between P and Q.", "tag": "machine learning", "prompt": "What is the meaning of the symbol <equation>KL(P||Q)</equation> in Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The equation for Generative Adversarial Networks GAN is: <equation>\\\\min_{G} \\\\max_{D} V(D,G)=\\\\mathbb{E}_{x \\\\sim p_{data}(x)}[\\\\log D(x)]+\\\\mathbb{E}_{z \\\\sim p_{z}(z)}[\\\\log(1-D(G(z)))]</equation>. Here is the brief introduction of Generative Adversarial Networks GAN:\n\n GAN latex code is illustrated above. See paper for more details Generative Adversarial Networks\n\nRelated Documents\n\nGenerative Adversarial Networks\n\nRelated Videos\n\n\n", "tag": "machine learning,gan", "prompt": "Can you give me a brief introduction to Generative Adversarial Networks GAN?"}
{"completion": " The Latex Code for equation Unconfoundedness Assumptionis <equation>\\\\{Y_{i}(0),Y_{i}(1)\\\\}\\\\perp W_{i}|X_{i}</equation>", "tag": "machine learning,causual inference", "prompt": "Can you write equation Unconfoundedness Assumption?"}
{"completion": " The equation for Cross-Stitch Network is: <equation>\\\\begin{bmatrix} \\\\tilde{x}^{ij}_{A}\\\\\\\\\\\\tilde{x}^{ij}_{B}\\\\end{bmatrix}=\\\\begin{bmatrix} a_{AA} & a_{AB}\\\\\\\\ a_{BA} & a_{BB} \\\\end{bmatrix}\\\\begin{bmatrix} x^{ij}_{A}\\\\\\\\ x^{ij}_{B} \\\\end{bmatrix}</equation>. Here is the brief introduction of Cross-Stitch Network:\n The cross-stitch unit takes two activation maps xA and xB from previous layer and learns a linear combination of two inputs from previous tasks and combine them into two new representation. The linear combination is controlled by parameter \\alpha. \n\n Related Documents\n\nSee below link of paper Cross-stitch Networks for Multi-task Learning for more details. \n\nRelated Videos\n\n\n", "tag": "machine learning,multi task", "prompt": "Please give me detailed tutorial of Cross-Stitch Network?"}
{"completion": " The related materials of Bound on Target Domain Error is: 1.Analysis of Representations for Domain Adaptation:<url>https://papers.nips.cc/paper/2006/file/b1b0432ceafb0ce714426e9114852ac7-Paper.pdf</url>;", "tag": "machine learning,transfer learning", "prompt": "Can you help me find related materials of Bound on Target Domain Error?"}
{"completion": " The equation for QINI is: <equation>g(t)=Y^{T}_{t}-\\\\frac{Y^{C}_{t}N^{T}_{t}}{N^{C}_{t}},\\\\\\\\\\ f(t)=g(t) \\\\times \\\\frac{N^{T}_{t}+N^{C}_{t}}{N^{T}_{t}}</equation>. Here is the brief introduction of QINI:\n Author in this paper Using control groups to target on predicted lift: Building and assessing uplift model defines Qini coefficint as the area under the QINI curve, which is more suitable for the unbalanced samples size of control group and treatment group.\n ", "tag": "machine learning,causual inference", "prompt": "Can you give me a brief introduction to QINI?"}
{"completion": " The Latex Code for equation Bellman Equationis <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>", "tag": "machine learning", "prompt": "What is the equation of Bellman Equation?"}
{"completion": " The equation for Variational AutoEncoder VAE is: <equation>\\\\log p_{\\\\theta}(x)=\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log p_{\\\\theta}(x)] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log \\\\frac{p_{\\\\theta}(x,z)}{p_{\\\\theta}(z|x)}] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log [\\\\frac{p_{\\\\theta}(x,z)}{q_{\\\\phi}(z|x)} \\\\times \\\\frac{q_{\\\\phi}(z|x)}{p_{\\\\theta}(z|x)}]] \\\\\\\\\\ =\\\\mathbb{E}_{q_{\\\\phi}(z|x)}[\\\\log [\\\\frac{p_{\\\\theta}(x,z)}{q_{\\\\phi}(z|x)} ]] +D_{KL}(q_{\\\\phi}(z|x) || p_{\\\\theta}(z|x))\\\\\\\\</equation>. Here is the brief introduction of Variational AutoEncoder VAE:", "tag": "machine learning,VAE", "prompt": "Can you give me a brief introduction to Variational AutoEncoder VAE?"}
{"completion": " The equation for TransH is: <equation>f_{r}(h,t) =||h_{\\\\perp} + d_{r} - t_{\\\\perp} ||^{2}_{2}=||(h - w_{r}hw_{r}) + d_{r} - (t - w_{r}tw_{r}) ||^{2}_{2}</equation>. Here is the brief introduction of TransH:\n TransH model learns low-dimensional representations of knowledge graphs triples on the hyperplane of the entities and relations. See paper Knowledge Graph Embedding by Translating on Hyperplanes for more details.\n Related Documents\n\nKnowledge Graph Embedding by Translating on Hyperplanes\n\nRelated Videos\n\n\n", "tag": "machine learning,KG", "prompt": "What's the meaning of TransH?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>v_{\\pi}(s)</equation> in Bellman Equation is: Value at state s in policy \\pi.", "tag": "machine learning", "prompt": "What is the meaning of <equation>v_{\\pi}(s)</equation> in Bellman Equation?"}
{"completion": " The equation for Progressive Layered Extraction PLE is: <equation>g^{k}(x)=w^{k}(x)S^{k}(x) \\\\\\\\\\ w^{k}(x)=\\\\text{softmax}(W^{k}_{g}x) \\\\\\\\\\ S^{k}(x)=\\\\[E^{T}_{(k,1)},E^{T}_{(k,2)},...,E^{T}_{(k,m_{k})},E^{T}_{(s,1)},E^{T}_{(s,2)},...,E^{T}_{(s,m_{s})}\\\\]^{T} \\\\\\\\\\ y^{k}(x)=t^{k}(g^{k}(x)) \\\\\\\\\\ g^{k,j}(x)=w^{k,j}(g^{k,j-1}(x))S^{k,j}(x) </equation>. Here is the brief introduction of Progressive Layered Extraction PLE:\n\n Progressive Layered Extraction(PLE) model slightly modifies the original structure of MMoE models and explicitly separate the experts into shared experts and task-specific experts. Let's assume there are m_{s} shared experts and m_{t} tasks-specific experts. S^{k}(x) is a selected matrix composed of (m_{s} + m_{t}) D-dimensional vectors, with dimension as (m_{s} + m_{t}) \\times D. w^{k}(x) denotes the gating function with size (m_{s} + m_{t}) and W^{k}_{g} is a trainable parameters with dimension as (m_{s} + m_{t}) \\times D. t^{k} denotes the task-specific tower paratmeters. The progressive extraction layer means that the gating network g^{k,j}(x) of j-th extraction layer takes the output of previous gating layers g^{k,j-1}(x) as inputs.\n \nRelated Documents\n\nSee below link of paper Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations for more details. \n\nRelated Videos\n\n\n", "tag": "machine learning,multi task", "prompt": "Can you explain Progressive Layered Extraction PLE?"}
{"completion": " The Latex Code for equation KL-Divergenceis <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>", "tag": "machine learning", "prompt": "Show me the latex code of equation KL-Divergence?"}
{"completion": " The equation for T-Learner is: <equation>\\\\mu_{0}(x)=\\\\mathbb{E}[Y(0)|X=x],\\\\mu_{1}(x)=\\\\mathbb{E}[Y(1)|X=x],\\\\\\\\\\ \\\\hat{\\\\tau}(x)=\\\\hat{\\\\mu}_{1}(x)-\\\\hat{\\\\mu}_{0}(x)</equation>. Here is the brief introduction of T-Learner:\n T-Learner models use two separate models to fit the dataset of control group W=0 and dateset of treatment group W=1. The CATE estimation is calculated as the difference between two outputs given same input x and different models \\mu_0 and \\mu_1.\n ", "tag": "machine learning,causual inference", "prompt": "I am interested in T-Learner?"}
{"completion": " The equation for Mahalanobis Distance is: <equation>D_{M}(x,y)=\\\\sqrt{(x-y)^{T}\\\\Sigma^{-1}(x-y)}</equation>. Here is the brief introduction of Mahalanobis Distance:\n Mahalanobis Distance is a distance measure between a data point and dataset of a distribution. See website for more details Mahalanobis Distance\n\nRelated Documents\n\nMahalanobis Distance\n\nRelated Videos\n\n\n", "tag": "machine learning,mahalanobis", "prompt": "Please give me detailed tutorial of Mahalanobis Distance?"}
{"completion": " The Latex Code for equation Graph Attention Network GATis <equation>h=\\\\{\\\\vec{h_{1}},\\\\vec{h_{2}},...,\\\\vec{h_{N}}\\\\}, \\\\\\\\\\ \\\\vec{h_{i}} \\\\in \\\\mathbb{R}^{F} \\\\\\\\\\ W \\\\in \\\\mathbb{R}^{F \\\\times F^{'}} \\\\\\\\\\ e_{ij}=a(Wh_{i},Wh_{j}) \\\\\\\\\\ k \\\\in \\\\mathcal{N}_{i},\\\\text{ neighbourhood nodes}\\\\\\\\\\ a_{ij}=\\\\text{softmax}_{j}(e_{ij})=\\\\frac{\\\\exp(e_{ij})}{\\\\sum_{k \\\\in \\\\mathcal{N}_{i}} \\\\exp(e_{ik})}</equation>", "tag": "machine learning,graph,GNN", "prompt": "Please give me some hints on equation Graph Attention Network GAT?"}
{"completion": " The equation for Area Under Uplift Curve AUUC is: <equation>f(t)=(\\\\frac{Y^{T}_{t}}{N^{T}_{t}} - \\\\frac{Y^{C}_{t}}{N^{C}_{t}})(N^{T}_{t}+N^{C}_{t})</equation>. Here is the brief introduction of Area Under Uplift Curve AUUC:\n Authors in this paper Causal Inference and Uplift Modeling A review of the literature define AUUC coefficient as the area under the uplift curve.\n ", "tag": "machine learning,causual inference", "prompt": "Please give me detailed tutorial of Area Under Uplift Curve AUUC?"}
{"completion": " The Latex Code for equation TransEis <equation>\\\\mathcal{L}=\\\\sum_{(h,r,t) \\\\in S} \\\\sum_{(h^{'},r^{'},t^{'}) \\\\in S^{'}_{(h,r,t)}} \\\\[ \\\\gamma + d(h + r, t) - d(h^{'} + r^{'}, t^{'}) \\\\]_{+} \\\\\\\\ S^{'}_{(h,r,t)}=\\\\{(h^{'},r,t)|h^{'} \\\\in E\\\\} \\\\cup \\\\{(h,r,t^{'})|t^{'} \\\\in E\\\\} \\\\\\\\ d(h + r, t)=||h + r - t||^{2}_{2}</equation>", "tag": "machine learning", "prompt": "What is the latex code of equation TransE?"}
{"completion": " The Latex Code for equation Bregman Divergencesis <equation>d_{\\\\phi}(z,z^{'})=\\\\phi(z) - \\\\phi(z^{'})-(z-z^{'})^{T} \\\\nabla \\\\phi(z^{'})</equation>", "tag": "machine learning", "prompt": "What is the equation of Bregman Divergences?"}
{"completion": " The Latex Code for equation Prototypical Networks Protonetsis <equation>c_{k}=\\\\frac{1}{|S_{k}|}\\\\sum_{(x_{i},y_{i}) \\\\in S_{k}} f_{\\\\phi}(x) \\\\\\\\ p_{\\\\phi}(y=k|x)=\\\\frac{\\\\exp(-d(f_{\\\\phi}(x), c_{k}))}{\\\\sum_{k^{'}} \\\\exp(-d(f_{\\\\phi}(x), c_{k^{'}})} \\\\\\\\\\\\min J(\\\\phi)=-\\\\log p_{\\\\phi}(y=k|x)</equation>", "tag": "machine learning,meta learning", "prompt": "What is the equation of Prototypical Networks Protonets?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. The meaning of symbol <equation>Q(x)</equation> in Jensen-Shannon Divergence JS-Divergence is: Distribution of Q(x) over x.", "tag": "machine learning", "prompt": "What is the meaning of <equation>Q(x)</equation> in Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The related materials of Graph Laplacian is: 1.\">Graph Laplacian matrix- MathWorks:<url>https://www.mathworks.com/help/matlab/ref/graph.laplacian.html</url>;2.\">Graphs and Graph Laplacians- Upenn Tutorial:<url>https://www.cis.upenn.edu/~cis5150/cis515-14-graphlap.pdf</url>;", "tag": "machine learning,graph,GNN", "prompt": "Can you help me find related materials of Graph Laplacian?"}
{"completion": " The Latex Code for equation Graph Convolutional Networks GCNis <equation>H^{(l+1)}=\\\\sigma(\\\\tilde{D}^{-\\\\frac{1}{2}}\\\\tilde{A}\\\\tilde{D}^{-\\\\frac{1}{2}}H^{l}W^{l})\\\\\\\\\\ \\\\tilde{A}=A+I_{N}\\\\\\\\\\ \\\\tilde{D}_{ii}=\\\\sum_{j}\\\\tilde{A}_{ij} \\\\\\\\\\ H^{0}=X \\\\\\\\ \\ \\\\mathcal{L}=-\\\\sum_{l \\\\in Y}\\\\sum^{F}_{f=1} Y_{lf} \\\\ln Z_{lf}</equation>", "tag": "machine learning,graph,GNN", "prompt": "How to write equation Graph Convolutional Networks GCN?"}
{"completion": " The Latex Code for equation Prototypical Networks Protonetsis <equation>c_{k}=\\\\frac{1}{|S_{k}|}\\\\sum_{(x_{i},y_{i}) \\\\in S_{k}} f_{\\\\phi}(x) \\\\\\\\ p_{\\\\phi}(y=k|x)=\\\\frac{\\\\exp(-d(f_{\\\\phi}(x), c_{k}))}{\\\\sum_{k^{'}} \\\\exp(-d(f_{\\\\phi}(x), c_{k^{'}})} \\\\\\\\\\\\min J(\\\\phi)=-\\\\log p_{\\\\phi}(y=k|x)</equation>", "tag": "machine learning,meta learning", "prompt": "How to write equation Prototypical Networks Protonets?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. The meaning of symbol <equation>JS(P||Q)</equation> in Jensen-Shannon Divergence JS-Divergence is: JS Divergence between P and Q, which is the symmetric divergence metric between distribution P and Q.", "tag": "machine learning", "prompt": "What is the meaning of the symbol <equation>JS(P||Q)</equation> in Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The equation for S-Learner is: <equation>\\\\mu(x,w)=\\\\mathbb{E}[Y_{i}|X=x_{i},W=w] \\\\\\\\\\ \\\\hat{\\\\tau}(x)=\\\\hat{\\\\mu}(x,1)-\\\\hat{\\\\mu}(x,0)</equation>. Here is the brief introduction of S-Learner:\n S-Learner use a single machine learning estimator \\mu(x,w) to estimate outcome Y directly. And the treatment assigment variable W=0,1 is treated as features of S-learner models. The CATE estimation is calculated as the difference between two outputs given the same model \\mu and different inputs features of W, namely w=1 and w=0.\n ", "tag": "machine learning,causual inference", "prompt": "I am interested in S-Learner?"}
{"completion": " The equation for Language Modelling is <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>. The meaning of symbol <equation>p(x)</equation> in Language Modelling is: \n : Factorize the joint probabilities over symbols p(x) as the product of conditional probabilities .", "tag": "machine learning", "prompt": "Can you explain the symbol <equation>p(x)</equation> in Language Modelling?"}
{"completion": " The equation for KL-Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>KL(P||Q)</equation> in KL-Divergence is: KL Divergence between P and Q.", "tag": "machine learning", "prompt": "What does the symbol mean <equation>KL(P||Q)</equation> in KL-Divergence?"}
{"completion": " The equation for GraphSage is: <equation>h^{0}_{v} \\\\leftarrow x_{v} \\\\\\\\\\ \\\\textbf{for} k \\\\in \\\\{1,2,...,K\\\\} \\\\text{do}\\\\\\\\\\ \\\\textbf{for} v \\\\in V \\\\text{do} \\\\\\\\\\ h^{k}_{N_{v}} \\\\leftarrow \\\\textbf{AGGREGATE}_{k}(h^{k-1}_{u}, u \\\\in N(v)); \\\\\\\\\\ h^{k}_{v} \\\\leftarrow \\\\sigma (W^{k} \\\\textbf{concat}(h^{k-1}_{v},h^{k}_{N_{v}})) \\\\\\\\\\ \\\\textbf{end} \\\\\\\\\\ h^{k}_{v}=h^{k}_{v}/||h^{k}_{v}||_{2},\\\\forall v \\\\in V \\\\\\\\\\ \\\\textbf{end} \\\\\\\\\\ z_{v} \\\\leftarrow h^{k}_{v} \\\\\\\\\\ J_{\\\\textbf{z}_{u}}=-\\\\log (\\\\sigma (\\\\textbf{z}_{u}^{T}\\\\textbf{z}_{v})) - Q \\\\mathbb{E}_{v_{n} \\\\sim p_n(v)} \\\\log(\\\\sigma (-\\\\textbf{z}_{u}^{T}\\\\textbf{z}_{v_{n}}))</equation>. Here is the brief introduction of GraphSage:\n\n AGGREGATE function must operate on a set of unordered neighbour node vectors of each node v. Common choices includes Mean aggregator, Pooling aggregator, LSTM aggregator (random permutation of neighbours). The final loss functions is calculated in a unsupervised settings. Positive neighhour v is the node that co-occurs within fixed-length random walk of each node v. Negative neighhour is sampled from distribution of p_n(v). The final loss function of GraphSage is calculated as J_{\\textbf{z}_{u}}, which is similar to NCE noise contrastive loss, where similar items pairs have higher values while unrelated items pairs have lower values. You can check more detailed information in this paper, Inductive Representation Learning on Large Graphs for more details.\n \nRelated Documents\n\n\">Inductive Representation Learning on Large Graphs\n\nRelated Videos\n\n\n", "tag": "machine learning,graph,GNN", "prompt": "I am interested in GraphSage?"}
{"completion": " The related materials of Diffusion Model Forward Process is: 1.Weng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.:<url>https://lilianweng.github.io/posts/2021-07-11-diffusion-models/</url>;2.Latex Code for Diffusion Models Equations:<url>http://deepnlp.org/blog/latex-code-for-diffusion-models-equations</url>;", "tag": "machine learning,diffusion", "prompt": "Can you help me find related materials of Diffusion Model Forward Process?"}
{"completion": " The equation for Kullback-Leibler Divergence is <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. The meaning of symbol <equation>P(x)</equation> in Kullback-Leibler Divergence is: Distribution of P(x) over x.", "tag": "machine learning,kl divergence", "prompt": "Can you explain the symbol <equation>P(x)</equation> in Kullback-Leibler Divergence?"}
{"completion": " The equation for Bellman Equation is: <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. Here is the brief introduction of Bellman Equation:\n\n: Value at state s in policy \\pi\n: Value at state s^{'} in policy \\pi\n: Probability of choosing action a given state s\n: Reward at state s\n: Reward discount factor \\gamma\n\n\n You can check more detailed information of Bellman Equation in this tutorial Introduction to Reinforcement Learning for more details.\n \nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning", "prompt": "Can you explain Bellman Equation?"}
{"completion": " The equation for Wasserstein Distance Optimal Transport is <equation>W_{p}(P,Q)=(\\\\inf_{J \\\\in J(P,Q)} \\\\int{||x-y||^{p}dJ(X,Y)})^\\\\frac{1}{p}</equation>. The meaning of symbol <equation>||x-y||^{p}</equation> in Wasserstein Distance Optimal Transport is: Power p of distance moving distributions P towards Q.", "tag": "machine learning,wasserstein", "prompt": "What is the meaning of the symbol <equation>||x-y||^{p}</equation> in Wasserstein Distance Optimal Transport?"}
{"completion": " The equation for T-Learner is: <equation>\\\\mu_{0}(x)=\\\\mathbb{E}[Y(0)|X=x],\\\\mu_{1}(x)=\\\\mathbb{E}[Y(1)|X=x],\\\\\\\\\\ \\\\hat{\\\\tau}(x)=\\\\hat{\\\\mu}_{1}(x)-\\\\hat{\\\\mu}_{0}(x)</equation>. Here is the brief introduction of T-Learner:\n T-Learner models use two separate models to fit the dataset of control group W=0 and dateset of treatment group W=1. The CATE estimation is calculated as the difference between two outputs given same input x and different models \\mu_0 and \\mu_1.\n ", "tag": "machine learning,causual inference", "prompt": "Can you give me a brief introduction to T-Learner?"}
{"completion": " The equation for Diffusion Model Variational Lower Bound is: <equation>\\\\begin{aligned}\\ - \\\\log p_\\\\theta(\\\\mathbf{x}_0) \\ &\\\\leq - \\\\log p_\\\\theta(\\\\mathbf{x}_0) + D_\\\\text{KL}(q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0) \\\\| p_\\\\theta(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0) ) \\\\\\\\\\ &= -\\\\log p_\\\\theta(\\\\mathbf{x}_0) + \\\\mathbb{E}_{\\\\mathbf{x}_{1:T}\\\\sim q(\\\\mathbf{x}_{1:T} \\\\vert \\\\mathbf{x}_0)} \\\\Big[ \\\\log\\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T}) / p_\\\\theta(\\\\mathbf{x}_0)} \\\\Big] \\\\\\\\\\ &= -\\\\log p_\\\\theta(\\\\mathbf{x}_0) + \\\\mathbb{E}_q \\\\Big[ \\\\log\\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} + \\\\log p_\\\\theta(\\\\mathbf{x}_0) \\\\Big] \\\\\\\\\\ &= \\\\mathbb{E}_q \\\\Big[ \\\\log \\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} \\\\Big] \\\\\\\\\\ \\\\text{Let }L_\\\\text{VLB} \\ &= \\\\mathbb{E}_{q(\\\\mathbf{x}_{0:T})} \\\\Big[ \\\\log \\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} \\\\Big] \\\\geq - \\\\mathbb{E}_{q(\\\\mathbf{x}_0)} \\\\log p_\\\\theta(\\\\mathbf{x}_0)\\ \\\\end{aligned}</equation>. Here is the brief introduction of Diffusion Model Variational Lower Bound:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "I am interested in Diffusion Model Variational Lower Bound?"}
{"completion": " The Latex Code for equation TransRis <equation>h_{r}=hM_{r}, t_{r}=tM_{r} \\\\\\\\f_{r}(h, t) = ||h_{r} + r - t_{r}||^{2}_{2}=||hM_{r}+r-tM_{r}||^{2}_{2}</equation>", "tag": "machine learning,KG", "prompt": "What is the equation of TransR?"}
{"completion": " The equation for Bellman Equation is: <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. Here is the brief introduction of Bellman Equation:\n\n: Value at state s in policy \\pi\n: Value at state s^{'} in policy \\pi\n: Probability of choosing action a given state s\n: Reward at state s\n: Reward discount factor \\gamma\n\n\n You can check more detailed information of Bellman Equation in this tutorial Introduction to Reinforcement Learning for more details.\n \nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning", "prompt": "Can you give me a brief introduction to Bellman Equation?"}
{"completion": " The Latex Code for equation Jensen-Shannon Divergence JS-Divergenceis <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>", "tag": "machine learning", "prompt": "Please give me some hints on equation Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The equation for T-Learner is: <equation>\\\\mu_{0}(x)=\\\\mathbb{E}[Y(0)|X=x],\\\\mu_{1}(x)=\\\\mathbb{E}[Y(1)|X=x],\\\\\\\\\\ \\\\hat{\\\\tau}(x)=\\\\hat{\\\\mu}_{1}(x)-\\\\hat{\\\\mu}_{0}(x)</equation>. Here is the brief introduction of T-Learner:\n T-Learner models use two separate models to fit the dataset of control group W=0 and dateset of treatment group W=1. The CATE estimation is calculated as the difference between two outputs given same input x and different models \\mu_0 and \\mu_1.\n ", "tag": "machine learning,causual inference", "prompt": "What's the meaning of T-Learner?"}
{"completion": " The Latex Code for equation Conditional Random Field CRFis <equation>P(y|x)=\\\\frac{1}{Z(x)}\\\\exp(\\\\sum_{i,k}\\\\lambda_{k}t_{k}(y_{i-1},y_{i},x,i))+\\\\sum_{i,l}\\\\mu_{l}s_{l}(y_{i},x,i)) \\\\\\\\\\ Z(x)=\\\\sum_{y}\\\\exp(\\\\sum_{i,k}\\\\lambda_{k}t_{k}(y_{i-1},y_{i},x,i))+\\\\sum_{i,l}\\\\mu_{l}s_{l}(y_{i},x,i))</equation>", "tag": "machine learning,nlp", "prompt": "What is the equation of Conditional Random Field CRF?"}
{"completion": " The equation for Prototypical Networks Protonets is: <equation>c_{k}=\\\\frac{1}{|S_{k}|}\\\\sum_{(x_{i},y_{i}) \\\\in S_{k}} f_{\\\\phi}(x) \\\\\\\\ p_{\\\\phi}(y=k|x)=\\\\frac{\\\\exp(-d(f_{\\\\phi}(x), c_{k}))}{\\\\sum_{k^{'}} \\\\exp(-d(f_{\\\\phi}(x), c_{k^{'}})} \\\\\\\\\\\\min J(\\\\phi)=-\\\\log p_{\\\\phi}(y=k|x)</equation>. Here is the brief introduction of Prototypical Networks Protonets:\n Prototypical networks compute an M-dimensional representation c_{k} or prototype, of each class through an embedding f_{\\phi}(.) with parameters \\phi. Each prototype is the mean vector of the embedded support points belonging to its class k. Prototypical networks then produce a distribution over classes for a query point x based on a softmax over distances to the prototypes in the embedding space as p(y=k|x). Then the negative log-likelihood of J(\\theta) is calculated over query set.\n Related Documents\n\nSee paper Prototypical Networks for Few-shot Learning for more detail.\n\nRelated Videos\n\n\n", "tag": "machine learning,meta learning", "prompt": "What's the meaning of Prototypical Networks Protonets?"}
{"completion": " The equation for Wasserstein Distance Optimal Transport is: <equation>W_{p}(P,Q)=(\\\\inf_{J \\\\in J(P,Q)} \\\\int{||x-y||^{p}dJ(X,Y)})^\\\\frac{1}{p}</equation>. Here is the brief introduction of Wasserstein Distance Optimal Transport:\n Latex code for the Wasserstein Distance (Optimal Transport Distance). I will briefly introduce the notations in this formulation. \n \n: Wasserstein distance p power between two distributions P and Q\n: Power p of distance moving distributions P towards Q\n\n\nRelated Documents\n\nOptimal Transport and Wasserstein Distance (CMU lecture pdf)\n\nRelated Videos\n\n\n", "tag": "machine learning,wasserstein", "prompt": "I am interested in Wasserstein Distance Optimal Transport?"}
{"completion": " The related materials of Domain Adaptation H-Divergence is: 1.A theory of learning from different domains:<url>https://www.researchgate.net/publication/220343859_A_theory_of_learning_from_different_domains</url>;", "tag": "machine learning,transfer learning", "prompt": "Can you help me find related materials of Domain Adaptation H-Divergence?"}
{"completion": " The equation for Jensen-Shannon Divergence JS-Divergence is <equation>JS(P||Q)=\\\\frac{1}{2}KL(P||\\\\frac{(P+Q)}{2})+\\\\frac{1}{2}KL(Q||\\\\frac{(P+Q)}{2})</equation>. The meaning of symbol <equation>JS(P||Q)</equation> in Jensen-Shannon Divergence JS-Divergence is: JS Divergence between P and Q, which is the symmetric divergence metric between distribution P and Q.", "tag": "machine learning", "prompt": "What does the symbol mean <equation>JS(P||Q)</equation> in Jensen-Shannon Divergence JS-Divergence?"}
{"completion": " The Latex Code for equation Bound on Target Domain Erroris <equation>\\\\epsilon_{T}(h) \\\\le \\\\hat{\\\\epsilon}_{S}(h) + \\\\sqrt{\\\\frac{4}{m}(d \\\\log \\\\frac{2em}{d} + \\\\log \\\\frac{4}{\\\\delta })} + d_{\\\\mathcal{H}}(\\\\tilde{\\\\mathcal{D}}_{S}, \\\\tilde{\\\\mathcal{D}}_{T}) + \\\\lambda \\\\\\\\\\ \\\\lambda = \\\\lambda_{S} + \\\\lambda_{T}</equation>", "tag": "machine learning,transfer learning", "prompt": "How to write equation Bound on Target Domain Error?"}
{"completion": " The Latex Code for equation Diffusion Model Reverse Processis <equation>p_\\\\theta(\\\\mathbf{x}_{0:T}) = p(\\\\mathbf{x}_T) \\\\prod^T_{t=1} p_\\\\theta(\\\\mathbf{x}_{t-1} \\\\vert \\\\mathbf{x}_t) \\\\\\\\\\ p_\\\\theta(\\\\mathbf{x}_{t-1} \\\\vert \\\\mathbf{x}_t) = \\\\mathcal{N}(\\\\mathbf{x}_{t-1}; \\\\boldsymbol{\\\\mu}_\\\\theta(\\\\mathbf{x}_t, t), \\\\boldsymbol{\\\\Sigma}_\\\\theta(\\\\mathbf{x}_t, t))</equation>", "tag": "machine learning,diffusion", "prompt": "Please give me some hints on equation Diffusion Model Reverse Process?"}
{"completion": " The equation for Diffusion Model Variational Lower Bound Loss is: <equation>\\\\begin{aligned}\\ L_\\\\text{VLB} &= L_T + L_{T-1} + \\\\dots + L_0 \\\\\\\\\\ \\\\text{where } L_T &= D_\\\\text{KL}(q(\\\\mathbf{x}_T \\\\vert \\\\mathbf{x}_0) \\\\parallel p_\\\\theta(\\\\mathbf{x}_T)) \\\\\\\\\\ L_t &= D_\\\\text{KL}(q(\\\\mathbf{x}_t \\\\vert \\\\mathbf{x}_{t+1}, \\\\mathbf{x}_0) \\\\parallel p_\\\\theta(\\\\mathbf{x}_t \\\\vert\\\\mathbf{x}_{t+1})) \\\\text{ for }1 \\\\leq t \\\\leq T-1 \\\\\\\\\\ L_0 &= - \\\\log p_\\\\theta(\\\\mathbf{x}_0 \\\\vert \\\\mathbf{x}_1)\\ \\\\end{aligned}</equation>. Here is the brief introduction of Diffusion Model Variational Lower Bound Loss:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "I am interested in Diffusion Model Variational Lower Bound Loss?"}
{"completion": " The Latex Code for equation Cross-Stitch Networkis <equation>\\\\begin{bmatrix} \\\\tilde{x}^{ij}_{A}\\\\\\\\\\\\tilde{x}^{ij}_{B}\\\\end{bmatrix}=\\\\begin{bmatrix} a_{AA} & a_{AB}\\\\\\\\ a_{BA} & a_{BB} \\\\end{bmatrix}\\\\begin{bmatrix} x^{ij}_{A}\\\\\\\\ x^{ij}_{B} \\\\end{bmatrix}</equation>", "tag": "machine learning,multi task", "prompt": "Show me the latex code of equation Cross-Stitch Network?"}
{"completion": " The equation for Individual Treatment Effect ITE is: <equation>\\\\text{ITE}_{i}:=Y_{i}(1)-Y_{i}(0)</equation>. Here is the brief introduction of Individual Treatment Effect ITE:\n Individual Treatment Effect(ITE) is defined as the difference between the outcome of treatment group Y_i(1) over the outcome of control group Y_i(0) of the same instance i. There exists a fundamental problem that we can't observe Y_i(1) and Y_i(0) at the same time because each instance item i can only be assigned to one experiment of control group or treatment group, but never both. So we can't observe the individual treatment effect(ITE) directly for each instance i. \n ", "tag": "machine learning,causual inference", "prompt": "Please give me detailed tutorial of Individual Treatment Effect ITE?"}
{"completion": " The equation for Bellman Equation is <equation>v_{\\\\pi}(s)=\\\\sum_{a}\\\\pi(a|s)\\\\sum_{s^{'},r}p(s^{'},r|s,a)[r+\\\\gamma v_{\\\\pi}(s^{'})]</equation>. The meaning of symbol <equation>\\gamma</equation> in Bellman Equation is: Reward discount factor \\gamma.", "tag": "machine learning", "prompt": "What is the meaning of the symbol <equation>\\gamma</equation> in Bellman Equation?"}
{"completion": " The Latex Code for equation Transformeris <equation>\\\\text{Attention}(Q, K, V) = \\\\text{softmax}(\\\\frac{QK^T}{\\\\sqrt{d_k}})V</equation>", "tag": "machine learning,nlp,gpt", "prompt": "Show me the latex code of equation Transformer?"}
{"completion": " The Latex Code for equation Diffusion Model Forward Process Reparameterizationis <equation>x_{t}=\\\\sqrt{\\\\alpha_{t}}x_{t-1}+\\\\sqrt{1-\\\\alpha_{t}} \\\\epsilon_{t-1}\\\\\\\\=\\\\sqrt{\\\\alpha_{t}\\\\alpha_{t-1}}x_{t-2} + \\\\sqrt{1-\\\\alpha_{t}\\\\alpha_{t-1}} \\\\bar{\\\\epsilon}_{t-2}\\\\\\\\=\\\\text{...}\\\\\\\\=\\\\sqrt{\\\\bar{\\\\alpha}_{t}}x_{0}+\\\\sqrt{1-\\\\bar{\\\\alpha}_{t}}\\\\epsilon \\\\\\\\\\\\alpha_{t}=1-\\\\beta_{t}, \\\\bar{\\\\alpha}_{t}=\\\\prod_{t=1}^{T}\\\\alpha_{t}</equation>", "tag": "machine learning,diffusion", "prompt": "Please give me some hints on equation Diffusion Model Forward Process Reparameterization?"}
{"completion": " The Latex Code for equation Progressive Layered Extraction PLEis <equation>g^{k}(x)=w^{k}(x)S^{k}(x) \\\\\\\\\\ w^{k}(x)=\\\\text{softmax}(W^{k}_{g}x) \\\\\\\\\\ S^{k}(x)=\\\\[E^{T}_{(k,1)},E^{T}_{(k,2)},...,E^{T}_{(k,m_{k})},E^{T}_{(s,1)},E^{T}_{(s,2)},...,E^{T}_{(s,m_{s})}\\\\]^{T} \\\\\\\\\\ y^{k}(x)=t^{k}(g^{k}(x)) \\\\\\\\\\ g^{k,j}(x)=w^{k,j}(g^{k,j-1}(x))S^{k,j}(x) </equation>", "tag": "machine learning,multi task", "prompt": "What is the equation of Progressive Layered Extraction PLE?"}
{"completion": " The equation for Kullback-Leibler Divergence is: <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. Here is the brief introduction of Kullback-Leibler Divergence:\n Latex code for the Kullback-Leibler Divergence. I will briefly introduce the notations in this formulation. \n \n: KL Divergence between P and Q\n: Distribution of P(x) over x\n: Distribution of Q(x) over x\n\n\nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning,kl divergence", "prompt": "I am interested in Kullback-Leibler Divergence?"}
{"completion": " The Latex Code for equation Bound on Target Domain Erroris <equation>\\\\epsilon_{T}(h) \\\\le \\\\hat{\\\\epsilon}_{S}(h) + \\\\sqrt{\\\\frac{4}{m}(d \\\\log \\\\frac{2em}{d} + \\\\log \\\\frac{4}{\\\\delta })} + d_{\\\\mathcal{H}}(\\\\tilde{\\\\mathcal{D}}_{S}, \\\\tilde{\\\\mathcal{D}}_{T}) + \\\\lambda \\\\\\\\\\ \\\\lambda = \\\\lambda_{S} + \\\\lambda_{T}</equation>", "tag": "machine learning,transfer learning", "prompt": "Can you write equation Bound on Target Domain Error?"}
{"completion": " The Latex Code for equation Average Treatment Effect ATEis <equation>\\\\text{ATE}:=\\\\mathbb{E}[Y(1)-Y(0)]</equation>", "tag": "machine learning,causual inference", "prompt": "How to write equation Average Treatment Effect ATE?"}
{"completion": " The Latex Code for equation Propensity Scoreis <equation>e := p(W=1|X=x)</equation>", "tag": "machine learning,causual inference", "prompt": "What is the latex code of equation Propensity Score?"}
{"completion": " The Latex Code for equation Graph Convolutional Networks GCNis <equation>H^{(l+1)}=\\\\sigma(\\\\tilde{D}^{-\\\\frac{1}{2}}\\\\tilde{A}\\\\tilde{D}^{-\\\\frac{1}{2}}H^{l}W^{l})\\\\\\\\\\ \\\\tilde{A}=A+I_{N}\\\\\\\\\\ \\\\tilde{D}_{ii}=\\\\sum_{j}\\\\tilde{A}_{ij} \\\\\\\\\\ H^{0}=X \\\\\\\\ \\ \\\\mathcal{L}=-\\\\sum_{l \\\\in Y}\\\\sum^{F}_{f=1} Y_{lf} \\\\ln Z_{lf}</equation>", "tag": "machine learning,graph,GNN", "prompt": "Can you write equation Graph Convolutional Networks GCN?"}
{"completion": " The equation for Maximum Mean Discrepancy MMD is: <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>. Here is the brief introduction of Maximum Mean Discrepancy MMD:\n Latex code for the Maximum Mean Discrepancy MMD. I will briefly introduce the notations in this formulation. \n \n: Superior of the discrepancy measure between two distribution.\n: Mean of probability distribution X with m data points.\n: Mean of probability distribution Y with n data points.\n\n\nRelated Documents\n\nA Kernel Two-Sample Test (MIT courses)\n\nRelated Videos\n\n\n", "tag": "machine learning,mmd", "prompt": "I am interested in Maximum Mean Discrepancy MMD?"}
{"completion": " The equation for SME Linear is: <equation>\\\\epsilon(lhs,rel,rhs)=E_{lhs(rel)}^{T}E_{rhs(rel)} \\\\\\\\=(W_{l1}E_{lhs}^{T} + W_{l2}E_{rel}^{T} + b_{l})^{T}(W_{r1}E_{rhs}^{T} + W_{r2}E_{rel}^{T} + b_{r})</equation>. Here is the brief introduction of SME Linear:\n\n The energy function E (denoted SME) is encoded using a neural network, whose architecture first processes each entity in parallel, like in siamese networks. The intuition is that the relation type should first be used to extract relevant components from each argument\u00e2\u20ac\u2122s embedding, and put them in a space where they can then be compared. See paper A Semantic Matching Energy Function for Learning with Multi-relational Data for more details.\n \nRelated Documents\n\n\">A Semantic Matching Energy Function for Learning with Multi-relational Data\n\nRelated Videos\n\n\n", "tag": "machine learning,KG", "prompt": "Can you explain SME Linear?"}
{"completion": " The Latex Code for equation Propensity Scoreis <equation>e := p(W=1|X=x)</equation>", "tag": "machine learning,causual inference", "prompt": "Show me the latex code of equation Propensity Score?"}
{"completion": " The equation for Generative Adversarial Networks GAN is: <equation>\\\\min_{G} \\\\max_{D} V(D,G)=\\\\mathbb{E}_{x \\\\sim p_{data}(x)}[\\\\log D(x)]+\\\\mathbb{E}_{z \\\\sim p_{z}(z)}[\\\\log(1-D(G(z)))]</equation>. Here is the brief introduction of Generative Adversarial Networks GAN:\n\n GAN latex code is illustrated above. See paper for more details Generative Adversarial Networks\n\nRelated Documents\n\nGenerative Adversarial Networks\n\nRelated Videos\n\n\n", "tag": "machine learning,gan", "prompt": "I am interested in Generative Adversarial Networks GAN?"}
{"completion": " The Latex Code for equation Bound on Target Domain Erroris <equation>\\\\epsilon_{T}(h) \\\\le \\\\hat{\\\\epsilon}_{S}(h) + \\\\sqrt{\\\\frac{4}{m}(d \\\\log \\\\frac{2em}{d} + \\\\log \\\\frac{4}{\\\\delta })} + d_{\\\\mathcal{H}}(\\\\tilde{\\\\mathcal{D}}_{S}, \\\\tilde{\\\\mathcal{D}}_{T}) + \\\\lambda \\\\\\\\\\ \\\\lambda = \\\\lambda_{S} + \\\\lambda_{T}</equation>", "tag": "machine learning,transfer learning", "prompt": "Please give me some hints on equation Bound on Target Domain Error?"}
{"completion": " The Latex Code for equation Area Under Uplift Curve AUUCis <equation>f(t)=(\\\\frac{Y^{T}_{t}}{N^{T}_{t}} - \\\\frac{Y^{C}_{t}}{N^{C}_{t}})(N^{T}_{t}+N^{C}_{t})</equation>", "tag": "machine learning,causual inference", "prompt": "What is the latex code of equation Area Under Uplift Curve AUUC?"}
{"completion": " The Latex Code for equation SME Linearis <equation>\\\\epsilon(lhs,rel,rhs)=E_{lhs(rel)}^{T}E_{rhs(rel)} \\\\\\\\=(W_{l1}E_{lhs}^{T} + W_{l2}E_{rel}^{T} + b_{l})^{T}(W_{r1}E_{rhs}^{T} + W_{r2}E_{rel}^{T} + b_{r})</equation>", "tag": "machine learning,KG", "prompt": "What is the equation of SME Linear?"}
{"completion": " The equation for Domain-Adversarial Neural Networks DANN is: <equation>\\\\min [\\\\frac{1}{m}\\\\sum^{m}_{1}\\\\mathcal{L}(f(\\\\textbf{x}^{s}_{i}),y_{i})+\\\\lambda \\\\max(-\\\\frac{1}{m}\\\\sum^{m}_{i=1}\\\\mathcal{L}^{d}(o(\\\\textbf{x}^{s}_{i}),1)-\\\\frac{1}{m^{'}}\\\\sum^{m^{'}}_{i=1}\\\\mathcal{L}^{d}(o(\\\\textbf{x}^{t}_{i}),0))]</equation>. Here is the brief introduction of Domain-Adversarial Neural Networks DANN:\n\n In this formulation of Domain-Adversarial Neural Networks(DANN), authors add a domain adaptation regularizer term to the original loss function of source domain. The domain adaptation regularizer term are calculated based on the H-divergence of two distributions h(X_{S}) and h(X_{T}). The adversial network aims to maximize the likelihood that the domain classifier are unable to distingush a data point belongs to source domain S or target domain T. Function o(.) is the domain regressor which learns high level representation o(X) given input X. You can check more detailed information in this paper by Hana Ajakan, Pascal Germain, et al., Domain-Adversarial Neural Networks for more details.\n \nRelated Documents\n\nDomain-Adversarial Neural Networks\n\nRelated Videos\n\n\n", "tag": "machine learning,transfer learning", "prompt": "I am interested in Domain-Adversarial Neural Networks DANN?"}
{"completion": " The equation for X-Learner is: <equation>\\\\tilde{D}^{1}_{i}:=Y^{1}_{i}-\\\\hat{\\\\mu}_{0}(X^{1}_{i}),\\\\tilde{D}^{0}_{i}:=\\\\hat{\\\\mu}_{1}(X^{0}_{i})-Y^{0}_{i}\\\\\\\\\\ \\\\hat{\\\\tau}(x)=g(x)\\\\hat{\\\\tau}_{0}(x) + (1-g(x))\\\\hat{\\\\tau}_{1}(x)</equation>. Here is the brief introduction of X-Learner:\n See this paper for more details of X-learner Metalearners for estimating heterogeneous treatment effects using machine learning\n", "tag": "machine learning,causual inference", "prompt": "Can you give me a brief introduction to X-Learner?"}
{"completion": " The equation for KL-Divergence is: <equation>KL(P||Q)=\\\\sum_{x}P(x)\\\\log(\\\\frac{P(x)}{Q(x)})</equation>. Here is the brief introduction of KL-Divergence:\n Latex code for the Kullback-Leibler Divergence. I will briefly introduce the notations in this formulation. \n \n: KL Divergence between P and Q\n: Distribution of P(x) over x\n: Distribution of Q(x) over x\n\n\nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning", "prompt": "Please give me detailed tutorial of KL-Divergence?"}
{"completion": " The Latex Code for equation GraphSageis <equation>h^{0}_{v} \\\\leftarrow x_{v} \\\\\\\\\\ \\\\textbf{for} k \\\\in \\\\{1,2,...,K\\\\} \\\\text{do}\\\\\\\\\\ \\\\textbf{for} v \\\\in V \\\\text{do} \\\\\\\\\\ h^{k}_{N_{v}} \\\\leftarrow \\\\textbf{AGGREGATE}_{k}(h^{k-1}_{u}, u \\\\in N(v)); \\\\\\\\\\ h^{k}_{v} \\\\leftarrow \\\\sigma (W^{k} \\\\textbf{concat}(h^{k-1}_{v},h^{k}_{N_{v}})) \\\\\\\\\\ \\\\textbf{end} \\\\\\\\\\ h^{k}_{v}=h^{k}_{v}/||h^{k}_{v}||_{2},\\\\forall v \\\\in V \\\\\\\\\\ \\\\textbf{end} \\\\\\\\\\ z_{v} \\\\leftarrow h^{k}_{v} \\\\\\\\\\ J_{\\\\textbf{z}_{u}}=-\\\\log (\\\\sigma (\\\\textbf{z}_{u}^{T}\\\\textbf{z}_{v})) - Q \\\\mathbb{E}_{v_{n} \\\\sim p_n(v)} \\\\log(\\\\sigma (-\\\\textbf{z}_{u}^{T}\\\\textbf{z}_{v_{n}}))</equation>", "tag": "machine learning,graph,GNN", "prompt": "How to write equation GraphSage?"}
{"completion": " The equation for Generative Adversarial Networks GAN is: <equation>\\\\min_{G} \\\\max_{D} V(D,G)=\\\\mathbb{E}_{x \\\\sim p_{data}(x)}[\\\\log D(x)]+\\\\mathbb{E}_{z \\\\sim p_{z}(z)}[\\\\log(1-D(G(z)))]</equation>. Here is the brief introduction of Generative Adversarial Networks GAN:\n\n GAN latex code is illustrated above. See paper for more details Generative Adversarial Networks\n\nRelated Documents\n\nGenerative Adversarial Networks\n\nRelated Videos\n\n\n", "tag": "machine learning,gan", "prompt": "What's the meaning of Generative Adversarial Networks GAN?"}
{"completion": " The Latex Code for equation Language Modellingis <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>", "tag": "machine learning", "prompt": "How to write equation Language Modelling?"}
{"completion": " The Latex Code for equation Propensity Scoreis <equation>e := p(W=1|X=x)</equation>", "tag": "machine learning,causual inference", "prompt": "Can you write equation Propensity Score?"}
{"completion": " The related materials of Cross-Stitch Network is: 1.Cross-stitch Networks for Multi-task Learning:<url>https://openaccess.thecvf.com/content_cvpr_2016/papers/Misra_Cross-Stitch_Networks_for_CVPR_2016_paper.pdf</url>;", "tag": "machine learning,multi task", "prompt": "Can you help me find related materials of Cross-Stitch Network?"}
{"completion": " The equation for Graph Laplacian is: <equation>L=I_{N}-D^{-\\\\frac{1}{2}}AD^{-\\\\frac{1}{2}} \\\\\\\\\\ L=U\\\\Lambda U^{T}</equation>. Here is the brief introduction of Graph Laplacian:\n\n Graph Laplacian matrix equals to the identity matrix I minus the matrix multiplication of three parts, the (-1/2) power of the degree matrix D, the adjacency matrix A, and (-1/2) power of degree matrix D. U is the eigenvectors of the normalized graph Laplacian L. The graph laplacian come from the graph Fourier transform F. The original signal x is first transformed to domain F(X) and inverse resulted signal is transformed back using the inverse graph Fourier transform F^{-1}.\n \nRelated Documents\n\n\">Graphs and Graph Laplacians- Upenn Tutorial\n\">Graph Laplacian matrix- MathWorks\n\nRelated Videos\n\n\n", "tag": "machine learning,graph,GNN", "prompt": "What's the meaning of Graph Laplacian?"}
{"completion": " The equation for QINI is: <equation>g(t)=Y^{T}_{t}-\\\\frac{Y^{C}_{t}N^{T}_{t}}{N^{C}_{t}},\\\\\\\\\\ f(t)=g(t) \\\\times \\\\frac{N^{T}_{t}+N^{C}_{t}}{N^{T}_{t}}</equation>. Here is the brief introduction of QINI:\n Author in this paper Using control groups to target on predicted lift: Building and assessing uplift model defines Qini coefficint as the area under the QINI curve, which is more suitable for the unbalanced samples size of control group and treatment group.\n ", "tag": "machine learning,causual inference", "prompt": "I am interested in QINI?"}
{"completion": " The equation for SimplE is: <equation>s(e_{i}, r, e_{j}) = \\\\frac{1}{2}(<h_{e_{i}}, v_{r}, t_{e_{j}}> + <h_{e_{j}}, v_{r^{-}}, t_{e_{i}}>)</equation>. Here is the brief introduction of SimplE:\n The similarity function for a triple (e1 , r , e2 ) is ?he1 , vr , te2 ?.\n SimplE considers two vectors he,te ? Rd as the embedding of each entity (similar to CP), and two vectors vr , vr?1 ? Rd for each relation r. The similarity function of SimplE foratriple(ei,r,ej)isdefinedas 1(?hei,vr,tej?+?hej,vr?1,tei?),i.e. theaverageoftheCP 2 scoresfor(ei,r,ej)and(ej,r?1,ei). See paper SimplE Embedding for Link Prediction in Knowledge Graphs for more details.\n ", "tag": "machine learning,KG", "prompt": "I am interested in SimplE?"}
{"completion": " The equation for Wasserstein Distance Optimal Transport is <equation>W_{p}(P,Q)=(\\\\inf_{J \\\\in J(P,Q)} \\\\int{||x-y||^{p}dJ(X,Y)})^\\\\frac{1}{p}</equation>. The meaning of symbol <equation>W_{p}(P||Q)</equation> in Wasserstein Distance Optimal Transport is: Wasserstein distance p power between two distributions P and Q.", "tag": "machine learning,wasserstein", "prompt": "What does the symbol mean <equation>W_{p}(P||Q)</equation> in Wasserstein Distance Optimal Transport?"}
{"completion": " The equation for Prototypical Networks Protonets is: <equation>c_{k}=\\\\frac{1}{|S_{k}|}\\\\sum_{(x_{i},y_{i}) \\\\in S_{k}} f_{\\\\phi}(x) \\\\\\\\ p_{\\\\phi}(y=k|x)=\\\\frac{\\\\exp(-d(f_{\\\\phi}(x), c_{k}))}{\\\\sum_{k^{'}} \\\\exp(-d(f_{\\\\phi}(x), c_{k^{'}})} \\\\\\\\\\\\min J(\\\\phi)=-\\\\log p_{\\\\phi}(y=k|x)</equation>. Here is the brief introduction of Prototypical Networks Protonets:\n Prototypical networks compute an M-dimensional representation c_{k} or prototype, of each class through an embedding f_{\\phi}(.) with parameters \\phi. Each prototype is the mean vector of the embedded support points belonging to its class k. Prototypical networks then produce a distribution over classes for a query point x based on a softmax over distances to the prototypes in the embedding space as p(y=k|x). Then the negative log-likelihood of J(\\theta) is calculated over query set.\n Related Documents\n\nSee paper Prototypical Networks for Few-shot Learning for more detail.\n\nRelated Videos\n\n\n", "tag": "machine learning,meta learning", "prompt": "I am interested in Prototypical Networks Protonets?"}
{"completion": " The equation for Maximum Mean Discrepancy MMD is <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>. The meaning of symbol <equation>\\frac{1}{n}\\sum_{j=1}^{n}f(y_{j})</equation> in Maximum Mean Discrepancy MMD is: Mean of probability distribution Y with n data points..", "tag": "machine learning,mmd", "prompt": "What does the symbol mean <equation>\\frac{1}{n}\\sum_{j=1}^{n}f(y_{j})</equation> in Maximum Mean Discrepancy MMD?"}
{"completion": " The equation for Maximum Mean Discrepancy MMD is <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>. The meaning of symbol <equation>\\frac{1}{n}\\sum_{j=1}^{n}f(y_{j})</equation> in Maximum Mean Discrepancy MMD is: Mean of probability distribution Y with n data points..", "tag": "machine learning,mmd", "prompt": "What is the meaning of the symbol <equation>\\frac{1}{n}\\sum_{j=1}^{n}f(y_{j})</equation> in Maximum Mean Discrepancy MMD?"}
{"completion": " The related materials of T-Learner is: ", "tag": "machine learning,causual inference", "prompt": "Can you help me find related materials of T-Learner?"}
{"completion": " The equation for Diffusion Model Forward Process is: <equation>q(x_{t}|x_{t-1})=\\\\mathcal{N}(x_{t};\\\\sqrt{1-\\\\beta_{t}}x_{t-1},\\\\beta_{t}I) \\\\\\\\q(x_{1:T}|x_{0})=\\\\prod_{t=1}^{T}q(x_{t}|x_{t-1})</equation>. Here is the brief introduction of Diffusion Model Forward Process:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "Please give me detailed tutorial of Diffusion Model Forward Process?"}
{"completion": " The equation for Deep Kernel Learning is: <equation>k(x_{i},x_{j}|\\\\phi)=k(h(x_i,w_k),h(x_j,w_k)|w_k,\\\\phi)</equation>. Here is the brief introduction of Deep Kernel Learning:\n The original data instance x_{i} is first mapped to latent space by a non-linear transformation h(x_{i}, w_{k}), usually a deep neural network with parameter w_{k}, and then passed to a kernel function k(x_{i},x_{j}|\\phi).\n See below link Deep Kernel Learning for more details. \n ", "tag": "machine learning,Deep Kernel Learning", "prompt": "I am interested in Deep Kernel Learning?"}
{"completion": " The related materials of S-Learner is: ", "tag": "machine learning,causual inference", "prompt": "Can you help me find related materials of S-Learner?"}
{"completion": " The Latex Code for equation TransRis <equation>h_{r}=hM_{r}, t_{r}=tM_{r} \\\\\\\\f_{r}(h, t) = ||h_{r} + r - t_{r}||^{2}_{2}=||hM_{r}+r-tM_{r}||^{2}_{2}</equation>", "tag": "machine learning,KG", "prompt": "How to write equation TransR?"}
{"completion": " The equation for Wasserstein Distance Optimal Transport is: <equation>W_{p}(P,Q)=(\\\\inf_{J \\\\in J(P,Q)} \\\\int{||x-y||^{p}dJ(X,Y)})^\\\\frac{1}{p}</equation>. Here is the brief introduction of Wasserstein Distance Optimal Transport:\n Latex code for the Wasserstein Distance (Optimal Transport Distance). I will briefly introduce the notations in this formulation. \n \n: Wasserstein distance p power between two distributions P and Q\n: Power p of distance moving distributions P towards Q\n\n\nRelated Documents\n\nOptimal Transport and Wasserstein Distance (CMU lecture pdf)\n\nRelated Videos\n\n\n", "tag": "machine learning,wasserstein", "prompt": "Please give me detailed tutorial of Wasserstein Distance Optimal Transport?"}
{"completion": " The equation for Bound on Target Domain Error is: <equation>\\\\epsilon_{T}(h) \\\\le \\\\hat{\\\\epsilon}_{S}(h) + \\\\sqrt{\\\\frac{4}{m}(d \\\\log \\\\frac{2em}{d} + \\\\log \\\\frac{4}{\\\\delta })} + d_{\\\\mathcal{H}}(\\\\tilde{\\\\mathcal{D}}_{S}, \\\\tilde{\\\\mathcal{D}}_{T}) + \\\\lambda \\\\\\\\\\ \\\\lambda = \\\\lambda_{S} + \\\\lambda_{T}</equation>. Here is the brief introduction of Bound on Target Domain Error:\n\n I will explain this equation in more details. Domain adaptation literatures prove that the test error on target domain \\epsilon_{T}(h) is bounded by three terms: 1. the empirical estimate of training errors on the source domain \\hat{\\epsilon}_{S}(h); 2. the distance divergence between source domain and target domain d(Ds, Dt), 3. Fixed term of VC-Dimension(d), sample size of source domain m, e as the natural logarithm. \\lambda denotes a fixed term as the sum of \\lambda_{S} and \\lambda_{T}, which represent the errors of models training on Ds and Dt respectively. From the above analysis, we can see that if data source Ds and Dt are similar(the divergence between source and target domain distribution Ds and Dt is small), the error on target domain will also be bounded, that's how models trained on source domain will perform better on similar distributed target domains. \n You can check more detailed information in this NIPS 2006 paper by Shai Ben-David for more details.\n \nRelated Documents\n\nAnalysis of Representations for Domain Adaptation\n\nRelated Videos\n\n\n", "tag": "machine learning,transfer learning", "prompt": "What's the meaning of Bound on Target Domain Error?"}
{"completion": " The equation for Transformer is: <equation>\\\\text{Attention}(Q, K, V) = \\\\text{softmax}(\\\\frac{QK^T}{\\\\sqrt{d_k}})V</equation>. Here is the brief introduction of Transformer:\nRelated Documents\n\nAttention Is All You Need\n\nRelated Videos\n\n\n", "tag": "machine learning,nlp,gpt", "prompt": "Can you give me a brief introduction to Transformer?"}
{"completion": " The Latex Code for equation SME Linearis <equation>\\\\epsilon(lhs,rel,rhs)=E_{lhs(rel)}^{T}E_{rhs(rel)} \\\\\\\\=(W_{l1}E_{lhs}^{T} + W_{l2}E_{rel}^{T} + b_{l})^{T}(W_{r1}E_{rhs}^{T} + W_{r2}E_{rel}^{T} + b_{r})</equation>", "tag": "machine learning,KG", "prompt": "How to write equation SME Linear?"}
{"completion": " The equation for Unconfoundedness Assumption is: <equation>\\\\{Y_{i}(0),Y_{i}(1)\\\\}\\\\perp W_{i}|X_{i}</equation>. Here is the brief introduction of Unconfoundedness Assumption:\n The unconfoundedness assumption or CIA (Conditional Independence assumption) assume that there are no hidden confounders between (Y(0),Y(1)) vector and treatment assignment vector W, conditioned on input X.\n ", "tag": "machine learning,causual inference", "prompt": "Please give me detailed tutorial of Unconfoundedness Assumption?"}
{"completion": " The equation for Bregman Divergences is: <equation>d_{\\\\phi}(z,z^{'})=\\\\phi(z) - \\\\phi(z^{'})-(z-z^{'})^{T} \\\\nabla \\\\phi(z^{'})</equation>. Here is the brief introduction of Bregman Divergences:\n The prototypi- cal networks algorithm is equivalent to performing mixture density estimation on the support set with an exponential family density. A regular Bregman divergence d_{\\phi} is defined as above. \\phi is a differentiable, strictly convex function of the Legendre type. Examples of Bregman divergences include squared Euclidean distance and Mahalanobis distance.\n ", "tag": "machine learning", "prompt": "I am interested in Bregman Divergences?"}
{"completion": " The Latex Code for equation Domain Adaptation H-Divergenceis <equation>d_{\\\\mathcal{H}}(\\\\mathcal{D},\\\\mathcal{D}^{'})=2\\\\sup_{h \\\\in \\\\mathcal{H}}|\\\\Pr_{\\\\mathcal{D}}[I(h)]-\\\\Pr_{\\\\mathcal{D}^{'}}[I(h)]|</equation>", "tag": "machine learning,transfer learning", "prompt": "Please give me some hints on equation Domain Adaptation H-Divergence?"}
{"completion": " The Latex Code for equation SimplEis <equation>s(e_{i}, r, e_{j}) = \\\\frac{1}{2}(<h_{e_{i}}, v_{r}, t_{e_{j}}> + <h_{e_{j}}, v_{r^{-}}, t_{e_{i}}>)</equation>", "tag": "machine learning,KG", "prompt": "Please give me some hints on equation SimplE?"}
{"completion": " The Latex Code for equation Hidden Markov Modelis <equation>Q=\\\\{q_{1},q_{2},...,q_{N}\\\\}, V=\\\\{v_{1},v_{2},...,v_{M}\\\\} \\\\\\\\\\ I=\\\\{i_{1},i_{2},...,i_{T}\\\\},O=\\\\{o_{1},o_{2},...,o_{T}\\\\} \\\\\\\\\\ A=[a_{ij}]_{N \\\\times N}, a_{ij}=P(i_{t+1}=q_{j}|i_{t}=q_{i}) \\\\\\\\\\ B=[b_{j}(k)]_{N \\\\times M},b_{j}(k)=P(o_{t}=v_{k}|i_{t}=q_{j})</equation>", "tag": "machine learning,nlp", "prompt": "How to write equation Hidden Markov Model?"}
{"completion": " The Latex Code for equation RotatEis <equation>f_{r}(h, t) = ||h \\\\circ r - t||^{2}_{2}</equation>", "tag": "machine learning,KG", "prompt": "What is the equation of RotatE?"}
{"completion": " The Latex Code for equation Propensity Scoreis <equation>e := p(W=1|X=x)</equation>", "tag": "machine learning,causual inference", "prompt": "Please give me some hints on equation Propensity Score?"}
{"completion": " The equation for Support Vector Machine SVM is: <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. Here is the brief introduction of Support Vector Machine SVM:\n Latex code for Support Vector Machine (SVM).\n \n: Dual problem Lagrangian Relaxation\n: Weight of Linear Classifier\n: Classifier\n: Decision Boundary\n\nRelated Documents\n\n\n\nRelated Videos\n\n\n", "tag": "machine learning,svm", "prompt": "Can you explain Support Vector Machine SVM?"}
{"completion": " The equation for Language Modelling is <equation>p(x)=\\\\prod^{n}_{i=1} p(s_{n}|s_{1},...,s_{n-1})</equation>. The meaning of symbol <equation>(x_{1}, x_{2}, ..., x_{n})</equation> in Language Modelling is: Language modeling is usually framed as unsupervised distribution estimation from a set of examples of.", "tag": "machine learning", "prompt": "What is the meaning of <equation>(x_{1}, x_{2}, ..., x_{n})</equation> in Language Modelling?"}
{"completion": " The equation for Diffusion Model Forward Process Reparameterization is: <equation>x_{t}=\\\\sqrt{\\\\alpha_{t}}x_{t-1}+\\\\sqrt{1-\\\\alpha_{t}} \\\\epsilon_{t-1}\\\\\\\\=\\\\sqrt{\\\\alpha_{t}\\\\alpha_{t-1}}x_{t-2} + \\\\sqrt{1-\\\\alpha_{t}\\\\alpha_{t-1}} \\\\bar{\\\\epsilon}_{t-2}\\\\\\\\=\\\\text{...}\\\\\\\\=\\\\sqrt{\\\\bar{\\\\alpha}_{t}}x_{0}+\\\\sqrt{1-\\\\bar{\\\\alpha}_{t}}\\\\epsilon \\\\\\\\\\\\alpha_{t}=1-\\\\beta_{t}, \\\\bar{\\\\alpha}_{t}=\\\\prod_{t=1}^{T}\\\\alpha_{t}</equation>. Here is the brief introduction of Diffusion Model Forward Process Reparameterization:\nRelated Documents\n\nLatex Code for Diffusion Models Equations\nWeng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.\n\nRelated Videos\n\n\n", "tag": "machine learning,diffusion", "prompt": "Please give me detailed tutorial of Diffusion Model Forward Process Reparameterization?"}
{"completion": " The equation for X-Learner is: <equation>\\\\tilde{D}^{1}_{i}:=Y^{1}_{i}-\\\\hat{\\\\mu}_{0}(X^{1}_{i}),\\\\tilde{D}^{0}_{i}:=\\\\hat{\\\\mu}_{1}(X^{0}_{i})-Y^{0}_{i}\\\\\\\\\\ \\\\hat{\\\\tau}(x)=g(x)\\\\hat{\\\\tau}_{0}(x) + (1-g(x))\\\\hat{\\\\tau}_{1}(x)</equation>. Here is the brief introduction of X-Learner:\n See this paper for more details of X-learner Metalearners for estimating heterogeneous treatment effects using machine learning\n", "tag": "machine learning,causual inference", "prompt": "Can you explain X-Learner?"}
{"completion": " The equation for Wasserstein Distance Optimal Transport is <equation>W_{p}(P,Q)=(\\\\inf_{J \\\\in J(P,Q)} \\\\int{||x-y||^{p}dJ(X,Y)})^\\\\frac{1}{p}</equation>. The meaning of symbol <equation>||x-y||^{p}</equation> in Wasserstein Distance Optimal Transport is: Power p of distance moving distributions P towards Q.", "tag": "machine learning,wasserstein", "prompt": "Can you explain the symbol <equation>||x-y||^{p}</equation> in Wasserstein Distance Optimal Transport?"}
{"completion": " The equation for Maximum Mean Discrepancy MMD is <equation>\\\\textup{MMD}(\\\\mathbb{F},X,Y):=\\\\sup_{f \\\\in\\\\mathbb{F}}(\\\\frac{1}{m}\\\\sum_{i=1}^{m}f(x_{i}) -\\\\frac{1}{n}\\\\sum_{j=1}^{n}f(y_{j}))</equation>. The meaning of symbol <equation>\\frac{1}{n}\\sum_{j=1}^{n}f(y_{j})</equation> in Maximum Mean Discrepancy MMD is: Mean of probability distribution Y with n data points..", "tag": "machine learning,mmd", "prompt": "Can you explain the symbol <equation>\\frac{1}{n}\\sum_{j=1}^{n}f(y_{j})</equation> in Maximum Mean Discrepancy MMD?"}
{"completion": " The equation for GraphSage is: <equation>h^{0}_{v} \\\\leftarrow x_{v} \\\\\\\\\\ \\\\textbf{for} k \\\\in \\\\{1,2,...,K\\\\} \\\\text{do}\\\\\\\\\\ \\\\textbf{for} v \\\\in V \\\\text{do} \\\\\\\\\\ h^{k}_{N_{v}} \\\\leftarrow \\\\textbf{AGGREGATE}_{k}(h^{k-1}_{u}, u \\\\in N(v)); \\\\\\\\\\ h^{k}_{v} \\\\leftarrow \\\\sigma (W^{k} \\\\textbf{concat}(h^{k-1}_{v},h^{k}_{N_{v}})) \\\\\\\\\\ \\\\textbf{end} \\\\\\\\\\ h^{k}_{v}=h^{k}_{v}/||h^{k}_{v}||_{2},\\\\forall v \\\\in V \\\\\\\\\\ \\\\textbf{end} \\\\\\\\\\ z_{v} \\\\leftarrow h^{k}_{v} \\\\\\\\\\ J_{\\\\textbf{z}_{u}}=-\\\\log (\\\\sigma (\\\\textbf{z}_{u}^{T}\\\\textbf{z}_{v})) - Q \\\\mathbb{E}_{v_{n} \\\\sim p_n(v)} \\\\log(\\\\sigma (-\\\\textbf{z}_{u}^{T}\\\\textbf{z}_{v_{n}}))</equation>. Here is the brief introduction of GraphSage:\n\n AGGREGATE function must operate on a set of unordered neighbour node vectors of each node v. Common choices includes Mean aggregator, Pooling aggregator, LSTM aggregator (random permutation of neighbours). The final loss functions is calculated in a unsupervised settings. Positive neighhour v is the node that co-occurs within fixed-length random walk of each node v. Negative neighhour is sampled from distribution of p_n(v). The final loss function of GraphSage is calculated as J_{\\textbf{z}_{u}}, which is similar to NCE noise contrastive loss, where similar items pairs have higher values while unrelated items pairs have lower values. You can check more detailed information in this paper, Inductive Representation Learning on Large Graphs for more details.\n \nRelated Documents\n\n\">Inductive Representation Learning on Large Graphs\n\nRelated Videos\n\n\n", "tag": "machine learning,graph,GNN", "prompt": "Please give me detailed tutorial of GraphSage?"}
{"completion": " The related materials of Wasserstein Distance Optimal Transport is: 1.Optimal Transport and Wasserstein Distance (CMU lecture pdf):<url>https://www.stat.cmu.edu/~larry/=sml/Opt.pdf</url>;", "tag": "machine learning,wasserstein", "prompt": "Can you help me find related materials of Wasserstein Distance Optimal Transport?"}
{"completion": " The equation for Conditional Random Field CRF is: <equation>P(y|x)=\\\\frac{1}{Z(x)}\\\\exp(\\\\sum_{i,k}\\\\lambda_{k}t_{k}(y_{i-1},y_{i},x,i))+\\\\sum_{i,l}\\\\mu_{l}s_{l}(y_{i},x,i)) \\\\\\\\\\ Z(x)=\\\\sum_{y}\\\\exp(\\\\sum_{i,k}\\\\lambda_{k}t_{k}(y_{i-1},y_{i},x,i))+\\\\sum_{i,l}\\\\mu_{l}s_{l}(y_{i},x,i))</equation>. Here is the brief introduction of Conditional Random Field CRF:\n p(Y|x) denotes the linear chain Conditional Random Field(CRF). t_k denotes the function on the transition, s_l denote function on the node.\n lambda_k and mu_l denotes the weight coefficient.\n ", "tag": "machine learning,nlp", "prompt": "Can you explain Conditional Random Field CRF?"}
{"completion": " The Latex Code for equation Domain Adaptation H-Divergenceis <equation>d_{\\\\mathcal{H}}(\\\\mathcal{D},\\\\mathcal{D}^{'})=2\\\\sup_{h \\\\in \\\\mathcal{H}}|\\\\Pr_{\\\\mathcal{D}}[I(h)]-\\\\Pr_{\\\\mathcal{D}^{'}}[I(h)]|</equation>", "tag": "machine learning,transfer learning", "prompt": "What is the equation of Domain Adaptation H-Divergence?"}
{"completion": " The equation for Support Vector Machine SVM is <equation>\\\\max_{w,b} \\\\frac{2}{||w||} \\\\\\\\\\ s.t.\\\\ y_{i}(w^{T}x_{i} + b) \\\\geq 1, i=1,2,...,m \\\\\\\\ \\ L(w,b,\\\\alpha)=\\\\frac{1}{2}||w||^2 + \\\\sum^{m}_{i=1}a_{i}(1-y_{i}(w^{T}x_{i} + b))</equation>. The meaning of symbol <equation>w</equation> in Support Vector Machine SVM is: Weight of Linear Classifier.", "tag": "machine learning,svm", "prompt": "What is the meaning of the symbol <equation>w</equation> in Support Vector Machine SVM?"}
{"completion": " The related materials of Diffusion Model Variational Lower Bound is: 1.Weng, Lilian. (Jul 2021). What are diffusion models? Lil'Log.:<url>https://lilianweng.github.io/posts/2021-07-11-diffusion-models/</url>;2.Latex Code for Diffusion Models Equations:<url>http://deepnlp.org/blog/latex-code-for-diffusion-models-equations</url>;", "tag": "machine learning,diffusion", "prompt": "Can you help me find related materials of Diffusion Model Variational Lower Bound?"}
{"completion": " The equation for Average Treatment Effect ATE is: <equation>\\\\text{ATE}:=\\\\mathbb{E}[Y(1)-Y(0)]</equation>. Here is the brief introduction of Average Treatment Effect ATE:\n Average Treatment Effect(ATE) is defined as the expectation of the difference between the treatment group Y(1) and control group Y(0)\n ", "tag": "machine learning,causual inference", "prompt": "I am interested in Average Treatment Effect ATE?"}
{"completion": " The Latex Code for equation Diffusion Model Variational Lower Boundis <equation>\\\\begin{aligned}\\ - \\\\log p_\\\\theta(\\\\mathbf{x}_0) \\ &\\\\leq - \\\\log p_\\\\theta(\\\\mathbf{x}_0) + D_\\\\text{KL}(q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0) \\\\| p_\\\\theta(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0) ) \\\\\\\\\\ &= -\\\\log p_\\\\theta(\\\\mathbf{x}_0) + \\\\mathbb{E}_{\\\\mathbf{x}_{1:T}\\\\sim q(\\\\mathbf{x}_{1:T} \\\\vert \\\\mathbf{x}_0)} \\\\Big[ \\\\log\\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T}) / p_\\\\theta(\\\\mathbf{x}_0)} \\\\Big] \\\\\\\\\\ &= -\\\\log p_\\\\theta(\\\\mathbf{x}_0) + \\\\mathbb{E}_q \\\\Big[ \\\\log\\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} + \\\\log p_\\\\theta(\\\\mathbf{x}_0) \\\\Big] \\\\\\\\\\ &= \\\\mathbb{E}_q \\\\Big[ \\\\log \\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} \\\\Big] \\\\\\\\\\ \\\\text{Let }L_\\\\text{VLB} \\ &= \\\\mathbb{E}_{q(\\\\mathbf{x}_{0:T})} \\\\Big[ \\\\log \\\\frac{q(\\\\mathbf{x}_{1:T}\\\\vert\\\\mathbf{x}_0)}{p_\\\\theta(\\\\mathbf{x}_{0:T})} \\\\Big] \\\\geq - \\\\mathbb{E}_{q(\\\\mathbf{x}_0)} \\\\log p_\\\\theta(\\\\mathbf{x}_0)\\ \\\\end{aligned}</equation>", "tag": "machine learning,diffusion", "prompt": "Please give me some hints on equation Diffusion Model Variational Lower Bound?"}
{"completion": " The Latex Code for equation Progressive Layered Extraction PLEis <equation>g^{k}(x)=w^{k}(x)S^{k}(x) \\\\\\\\\\ w^{k}(x)=\\\\text{softmax}(W^{k}_{g}x) \\\\\\\\\\ S^{k}(x)=\\\\[E^{T}_{(k,1)},E^{T}_{(k,2)},...,E^{T}_{(k,m_{k})},E^{T}_{(s,1)},E^{T}_{(s,2)},...,E^{T}_{(s,m_{s})}\\\\]^{T} \\\\\\\\\\ y^{k}(x)=t^{k}(g^{k}(x)) \\\\\\\\\\ g^{k,j}(x)=w^{k,j}(g^{k,j-1}(x))S^{k,j}(x) </equation>", "tag": "machine learning,multi task", "prompt": "What is the latex code of equation Progressive Layered Extraction PLE?"}