<!DOCTYPE html>
<html lang="en-us">
	<head>
		<meta charset="UTF-8">
		<title>InfoSync</title>
		<meta name="viewport" content="width=device-width, initial-scale=1">
		<meta name="theme-color" content="#157879">
		<link rel="stylesheet" href="css/normalize.css">
		<link href='https://fonts.googleapis.com/css?family=Open+Sans:400,700' rel='stylesheet' type='text/css'>
		<link rel="stylesheet" href="css/cayman.css">
	</head>
	<body>
		<section class="page-header">
			<h1><img src="figures/logo.jpg" style="max-width:40%;"></h1> 
			<a href="https://vgupta123.github.io/docs/infosync_paper.pdf" class="btn">Paper</a>
			<a href="https://github.com/Info-Sync/InfoSync" class="btn">Dataset</a>
			<a href="explore.html" class="btn">Explore</a>
			<a href="https://github.com/Info-Sync/InfoSync" class="btn">Code</a> 
			<a href="https://youtu.be/aHpvWraGVwM" class="btn">Video</a>
			<a href="https://docs.google.com/presentation/d/1lPm7c8hubwADNpWHfcqNCRX-KlChh8gEzWPBLIFP79Y/edit?usp=sharing" class="btn">PPT</a><br>
			<a href="https://vgupta123.github.io/docs/infosync_poster.pdf" class="btn">Poster</a><br>
		</section>
		<section class="main-content">
			<h1>Information Synchronization Across Multilingual Semi-Structured Tables</h1>
			<h2><a id="user-content-header-2" class="anchor" href="#header-2" aria-hidden="true"><span class="octicon octicon-link"></span></a>About</h2><p style="text-align: justify;"> The representation of information across languages poses significant challenges, particularly when it comes to the synchronization of semi-structured data. Wikipedia is a notable example, with the English version comprising only 11.68% of all pages despite having the greatest number of editors (75). 94% of the global population does not have access to comprehensive information in their native language. The majority of non-English Wikipedia pages are outdated and inadequately maintained. Moreover, Wikipedia translations are frequently inaccurate. An illustration of the issue is provided below.</p>
			<p style="margin-left:10%; margin-right:10%;"><img src="figures/Slide-1.png" style="max-width:95%;"></p>
			<p style="text-align: justify;">  Promoting inclusivity and facilitating global knowledge sharing requires ensuring accurate representation and bridging the language gap. Maintaining the consistency and integrity of Wikipedia tables across multiple languages requires meticulous attention to detail. This will pave the way for the creation of a trustworthy, comprehensive, and language-inclusive knowledge source.</p>
			<p style="text-align: justify;"> To address this problem, we introduce the InfoSync dataset and a two-step method for tabular synchronization.</p>

			<h2><a id="user-content-header-2" class="anchor" href="#header-2" aria-hidden="true"><span class="octicon octicon-link"></span></a>Dataset Details</h2>
			<p style="text-align: justify;">To systematically assess the challenge of information synchronization and evaluate the methodologies, we build a large-scale table synchronization dataset InfoSync based on entity-centric Wikipedia Infoboxes.</p>
			<p style="text-align: justify;">We collected a dataset comprising <strong>99,440 infoboxes and 1,078,717 rows</strong>. The dataset covers 14 languages, namely <em>English, German, French, Spanish, Dutch, Arabic, Hindi, Chinese, Korean, Afrikaans, Cebuano, Swedish, Russian, and Turkish</em>. The infoboxes span categories such as <em>Airport, Album, Animal, Athlete, Book, City, College, Company, Country, Diseases, Food, Medicine, Monument, Movie, Musician, Nobel, Painting, Person, Planet, Shows, and Stadium</em>. This diverse dataset serves as the foundation for our analysis and experiments.</p>

			<div>
                             <table style="margin-left:15%;margin-right:15%;text-align: center">
                               <col>
                               <colgroup span="2"></colgroup>
                               <colgroup span="2"></colgroup>
                               <tr>
                                 <td rowspan="2"></td>
                                 <th colspan="2" scope="colgroup">Average Table Transfer %</th>
                                 <th colspan="2" scope="colgroup">Language Statistics</th>
                               </tr>
                               <tr>
                                 <th scope="col">C1 -> <span>&Sigma;</span> ln </th>
                                 <th scope="col"><span>&Sigma;</span> ln -> C1</th>
                                 <th scope="col"># Table</th>
                                 <th scope="col">Average Rows</th>
                               </tr>
                               <tr>
                                 <th scope="row">af</th>
                                 <td>17.46</td>
                                 <td>400.5</td>
                                 <td>1575</td>
                                 <td>9.91</td>
                               </tr>
                               <tr>
                                 <th scope="row">ar</th>
                                 <td>34.02</td>
                                 <td>27.38</td>
                                 <td>7648</td>
                                 <td>13.01</td>
                               </tr>
                               <tr>
                                 <th scope="row">ceb</th>
                                 <td>42.87</td>
                                 <td>134.88</td>
                                 <td>3870</td>
                                 <td>7.82</td>
                               </tr>
                               <tr>
                                 <th scope="row">de</th>
                                 <td>40.73</td>
                                 <td>27.12</td>
                                 <td>8215</td>
                                 <td>7.88</td>
                               </tr>
                               <tr>
                                 <th scope="row">en</th>
                                 <td>45.85</td>
                                 <td>0.32</td>
                                 <td>12431</td>
                                 <td>12.60</td>
                               </tr>
                               <tr>
                                 <th scope="row">es</th>
                                 <td>38.78</td>
                                 <td>9.0</td>
                                 <td>9950</td>
                                 <td>12.59</td>
                               </tr>
                               <tr>
                                 <th scope="row">fr</th>
                                 <td>41.25</td>
                                 <td>4.73</td>
                                 <td>10858</td>
                                 <td>10.30</td>
                               </tr>
                               <tr>
                                 <th scope="row">hi</th>
                                 <td>18.39</td>
                                 <td>358.97</td>
                                 <td>1724</td>
                                 <td>10.91</td>
                               </tr>
                               <tr>
                                 <th scope="row">ko</th>
                                 <td>31.13</td>
                                 <td>40.51</td>
                                 <td>6601</td>
                                 <td>9.35</td>
                               </tr>
                               <tr>
                                 <th scope="row">nl</th>
                                 <td>33.69</td>
                                 <td>24.6</td>
                                 <td>7837</td>
                                 <td>10.46</td>
                               </tr>
                               <tr>
                                 <th scope="row">ru</th>
                                 <td>36.98</td>
                                 <td>14.54</td>
                                 <td>9066</td>
                                 <td>11.41</td>
                               </tr>
                               <tr>
                                 <th scope="row">sv</th>
                                 <td>35.53</td>
                                 <td>24.62</td>
                                 <td>7985</td>
                                 <td>9.89</td>
                               </tr>
                               <tr>
                                 <th scope="row">tr</th>
                                 <td>28.99</td>
                                 <td>59.33</td>
                                 <td>5599</td>
                                 <td>10.14</td>
                               </tr>
                               <tr>
                                 <th scope="row">zh</th>
                                 <td>36.16</td>
                                 <td>32.71</td>
                                 <td>7140</td>
                                 <td>12.43</td>
                               </tr>
                             </table>
			     <caption>Table: <strong>Average Table Transfer</strong>: Column 2 shows the average number of tables missing in other languages that can be transferred from C1. Column 3 shows the average number of tables missing in C1 that can be transferred to C1 from all other languages. Here L is the set of all languages (ln) except the source or transfer language. <strong>Language Statistics</strong>: The number of tables and average rows (AR) per table across different categories for each language.</caption>
			</div>
			<div>
				<table  style="margin-left:15%;margin-right:15%;text-align: center">
					<thead>
						<tr>
							<th>Topic</th>
							<th># Table</th>
							<th>Average Rows</th>
						</tr>
					</thead>
					<tbody align="center">
						<tr>
							<td>Airport</td>
							<td>18512</td>
							<td>9.66</td>
						</tr>
						<tr>
							<td>Food</td>
							<td>6184</td>
							<td>7.93</td>
						</tr>
						<tr>
							<td>Album</td>
							<td>5833</td>
							<td>7.58</td>
						</tr>
						<tr>
							<td>Animal</td>
							<td>3209</td>
							<td>8.27</td>
						</tr>
					</tbody>
					<br>
					<!-- <caption>Number of tables and premise-hypothesis
					pairs for each data split</caption> -->
				</table>
                                <caption><strong>Category Statistics: </strong>Number of tables in each category and average number of rows (AR) across different languages. Statistics for all categories are presented in the paper linked above.</caption>
			</div>

			<h3><a id="user-content-header-3" class="anchor" href="#header-3" aria-hidden="true"><span class="octicon octicon-link"></span></a>Test Sets</h3>
			<p style="text-align: justify;">We created several test sets to evaluate the alignment accuracy of our pipeline for different configurations.</p>

                        <ul>
                        <li>
			<h4><a id="user-content-header-4" class="anchor" href="#header-4" aria-hidden="true"><span class="octicon octicon-link"></span></a>Translations Based Test Set:</h4>
			<p style="text-align: justify;">For the translation-based test sets, we employed translations (Google or cutting-edge translation models) and covered approximately 1500 tables for both <strong>English to X</strong> and <strong>X to Y</strong> alignments. Here, <strong>X</strong> and <strong>Y</strong> represent non-English languages. Annotators started from preliminary alignments produced by our alignment pipeline. Their task was to verify these alignments, correct any errors, and add any missing ones.</p>
                        </li> 
                        <li>
			<h4><a id="user-content-header-4" class="anchor" href="#header-4" aria-hidden="true"><span class="octicon octicon-link"></span></a>Native Speaker Annotated True Test Set:</h4>
			<p style="text-align: justify;">Similarly, we created a second test set without using translations; instead, native speakers of the language completed alignment annotations (<strong>English and Hindi</strong>, <strong>English and Chinese</strong>, roughly 200 tables in each language pair).
</p>

			<!--<p style="text-align: justify;">Through these test sets, we aimed to assess the accuracy and reliability of the alignment pipeline by comparing the alignments generated against the expected alignments. This process helped us validate the effectiveness of our alignment techniques and improve the quality of the generated alignments.</p> -->

                        </li> 
                        <li>
			<h3><a id="user-content-header-3" class="anchor" href="#header-3" aria-hidden="true"><span class="octicon octicon-link"></span></a>Metadata</h3>
			<p style="text-align: justify;">Human annotators also classify each error present in the test data into one of five categories: 1) Disambiguation, 2) Multiple alignments, 3) Partial or incorrect extraction, 4) Wrong_translations, and 5) Key Paraphrasing. This evaluation helps standardize and compare update methods against each other.</p>

                        </li> 
                        </ul>

			
			<h2><a id="user-content-header-2" class="anchor" href="#header-2" aria-hidden="true"><span class="octicon octicon-link"></span></a> Synchronization Methodology </h2>
			<p style="text-align: justify;">Our proposed approach for table synchronization involves two steps:</p>
<ul> 
<li><strong>Information Alignment</strong>, which focuses on aligning table rows. We utilize corpus statistics from Wikipedia, considering both key and value-based similarities to align rows in multilingual tables. 
</li> <li> <strong>Information Update</strong>, which aims to update missing or outdated rows across different language pairs to ensure consistency. <!--We employ a rule-based approach that consists of nine curated rules. These rules include row transfer, time-based updates, value trends, multikey matching, appending values, prioritizing high to low resource information, handling differences in the number of rows, and dealing with rare keys. --></li></ul>
                        <p style="text-align: justify;">  We evaluate the effectiveness of both tasks using the InfoSync dataset. Additionally, we conduct an online experiment adhering to Wikipedia editing guidelines, where we submit detected mismatches for review by Wikipedia editors. We track the number of edits accepted or rejected by the editors.</p>
			<h3><a id="user-content-header-3" class="anchor" href="#header-3" aria-hidden="true"><span class="octicon octicon-link"></span></a>Information Alignment</h3>
			<p style="text-align: justify;">The proposed method consists of five modules applied sequentially. Each module generates additional alignments by matching table rows under progressively relaxed requirements.</p>
			<ol>
			<li>
			<p style="text-align: justify;"><b> Corpus-based:</b> Aligns rows based on the cosine similarity of their English translations, taking multiple translations into account using majority voting. Accurate key translations take into account additional context, such as key values and categories.</p>
			</li>
			<li>
			<p style="text-align: justify;"><b> Key-only:</b> Attempts to align unaligned pairs from the previous module by computing cosine similarity of their English translations, with a threshold for selecting mutually most similar keys.</p>
			</li>
			<li>
			<p style="text-align: justify;"><b> Key value bidirectional:</b> Similar to the previous step, but computes similarities using the entire row (key + value) and applies a threshold for alignment.</p>
			</li>
			<li>
			<p style="text-align: justify;"><b> Key value unidirectional:</b> Relaxes the bidirectional mapping constraint by considering the highest similarity in either direction, using a higher threshold to avoid spurious alignments.</p>
			</li>
			<li>
			<p style="text-align: justify;"><b> Multi-key:</b> Allows for the selection of multiple keys (up to two) based on a threshold, with a soft constraint for value-combination alignment. A multi-key alignment is valid when the similarity score of the merged value combination exceeds that of the most similar single key.</p>
			</li>
			</ol>
			
			
			
			
			
			<p style="text-align: justify;">In summary, these five modules progressively relax the matching requirements, incorporating different aspects of the table rows, to generate alignments based on cosine similarity scores.</p> 
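			<p style="text-align: justify;">As a rough illustration of the mutual-best-match idea used by the key-only module, the sketch below aligns two lists of already-translated keys. It is a minimal sketch under stated assumptions, not the project's implementation: the bag-of-words cosine similarity stands in for whatever representation the pipeline actually uses, and the names <code>cosine</code>, <code>mutual_best_alignments</code>, and the threshold value of 0.5 are hypothetical.</p>

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors (an assumed stand-in)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mutual_best_alignments(keys_x, keys_y, threshold=0.5):
    """Pair keys that are each other's most similar key and score at or above the threshold.

    The threshold of 0.5 is a hypothetical value chosen for this example.
    """
    if not keys_x or not keys_y:
        return []
    sim = [[cosine(kx, ky) for ky in keys_y] for kx in keys_x]
    # Index of the most similar key in the other table, computed in both directions.
    best_for_x = [max(range(len(keys_y)), key=lambda j: sim[i][j]) for i in range(len(keys_x))]
    best_for_y = [max(range(len(keys_x)), key=lambda i: sim[i][j]) for j in range(len(keys_y))]
    pairs = []
    for i, j in enumerate(best_for_x):
        if best_for_y[j] == i and sim[i][j] >= threshold:
            pairs.append((keys_x[i], keys_y[j]))
    return pairs


example = mutual_best_alignments(
    ["date of birth", "place of birth", "occupation"],
    ["birth date", "birth place", "spouse"],
)
```

			<p style="text-align: justify;">In this toy input, "occupation" and "spouse" stay unaligned because neither is the other's best match above the threshold. Later, more relaxed modules would be the ones to attempt such rows.</p>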
			<!-- <p style="margin-left:10%; margin-right:10%;"><img src="figures/Figure-3.png" style="max-width:95%;"></p> -->
			<h4><a id="user-content-header-4" class="anchor" href="#header-4" aria-hidden="true"><span class="octicon octicon-link"></span></a><b>Alignment Example And Evaluation</b></h4>
			<!-- <p style="text-align: justify; display:inline;">Below is an update example. The infobox for Shirley Strickland de la Hunty has been updated to include information in both English and Spanish. It shows rows transfer for missing information, value substitution because "Aged 78" is absent in Died. Additionally, one medal
				infomation (Bronze,1952, 100m) is added in to medal tally.</p> -->
			
            <figure> <p style="margin-left:10%; margin-right:10%;"><img src="figures/Slide5.png" style="max-width:95%;"></p> <figcaption><b>Explanation of Alignment Performance Metrics</b>: <b>T</b><sub>en</sub> and <b>T</b><sub>hi</sub> are the collections of all rows in the English and Hindi tables, respectively. <b>R</b><sub>x</sub><sup>n</sup> represents the <em>n</em><sup>th</sup> row of the table in language <em>x</em>. <b>R</b><sub>x</sub>(<b>X</b>) retrieves all rows in the language <em>x</em> using mapping  <b>X</b>. |.| represents the set's cardinality. Every alignment is saved as a tuple of the form (<b>R</b><sub>x</sub><sup>m</sup>, <b>R</b><sub>y</sub><sup>n</sup>). <b>G</b> is the collection of all gold (human) alignments. <b>P</b> is the collection of predicted alignments (note that it contains some mistaken alignments).</figcaption>
</figure>
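			<p style="text-align: justify;">To make the notation concrete, the sketch below scores a set of predicted alignments <b>P</b> against gold alignments <b>G</b>, with each alignment stored as a tuple of row indices. It assumes the standard set-based definitions of precision, recall, and F1. The exact formulas in the figure may differ in detail, and the function name <code>alignment_scores</code> is hypothetical.</p>

```python
def alignment_scores(gold, predicted):
    """Set-based precision, recall, and F1 over alignment tuples (assumed standard definitions)."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)  # alignments present in both G and P
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: (row index in table x, row index in table y).
G = {(1, 1), (2, 3), (3, 2), (4, 4)}
P = {(1, 1), (2, 3), (3, 4)}
precision, recall, f1 = alignment_scores(G, P)
```

			<p style="text-align: justify;">Here two of the three predicted tuples are in the gold set, so precision is 2/3, recall is 2/4, and F1 is their harmonic mean.</p>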
 
			<h3><a id="user-content-header-3" class="anchor" href="#header-3" aria-hidden="true"><span class="octicon octicon-link"></span></a>Information Update</h3>
			<p style="text-align: justify;"> We propose a rule-based heuristic approach for information updates. The rules are applied
				sequentially according to their priority rank (P.R.).</p>
            <ol>
			<li>
			<p style="text-align: justify;"><b> Row Transfer (R1): </b> Unaligned rows are transferred from one table to another.</p>
			</li>
			<li>
			<p style="text-align: justify;"><b> Multi-Match (R2): </b> Updating the table by handling multi-alignments and merging information to address cases with multiple key alignments.</p>
			</li>
			<li>
			<p style="text-align: justify;"><b> Time Based (R3): </b> Updating aligned values using the latest timestamp to ensure the information reflects the most current data.</p>
			</li>
			<li>
			<p style="text-align: justify;"><b> Trends positive/negative (R4):</b> Updating values based on identified monotonic patterns (increasing or decreasing) over time, particularly applicable to athlete career statistics.</p>
			</li>
			<li>
			<p style="text-align: justify;"><b> Append Values (R5):</b> Appending additional value information from an up-to-date row to update outdated rows.</p>
			</li>
			<li>
			<p style="text-align: justify;"><b> HR to LR (R6):</b> Transferring information from a high resource language to a low resource language to update outdated information.</p>
			</li>
			<li>
			<p style="text-align: justify;"><b> #Rows (R7):</b> Transferring information from a table with a greater number of rows to a table with fewer rows.</p>
			</li>
			<li>
			<p style="text-align: justify;"><b> Non Popular Keys (R8):</b> Updating information from a table where recently added non-popular keys are likely to exist in order to update outdated tables.</p> 
			</li>
			</ol>
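			<p style="text-align: justify;">The sketch below shows one way rules could be chained in priority order. It is illustrative only: it implements simplified stand-ins for just two of the rules (row transfer and append values), assumes tables are dictionaries mapping an already-aligned key to a list of values, and uses hypothetical names (<code>row_transfer</code>, <code>append_values</code>, <code>apply_rules</code>) and toy data rather than the project's code or dataset.</p>

```python
def row_transfer(source, target):
    """R1-style stand-in: copy rows whose key is missing from the target table."""
    updated = dict(target)
    for key, values in source.items():
        updated.setdefault(key, values)
    return updated


def append_values(source, target):
    """R5-style stand-in: append values that the target row lacks."""
    updated = dict(target)
    for key, values in source.items():
        if key in updated:
            missing = [v for v in values if v not in updated[key]]
            updated[key] = updated[key] + missing
    return updated


# (priority rank, rule) pairs; lower rank runs first. Only two rules are sketched.
RULES = [(5, append_values), (1, row_transfer)]


def apply_rules(source, target, rules=RULES):
    """Apply each rule in ascending priority rank and return the updated target."""
    for _, rule in sorted(rules, key=lambda pair: pair[0]):
        target = rule(source, target)
    return target


# Toy tables, not taken from the dataset.
english = {"Country": ["Australia"], "Medals": ["Gold (toy entry)", "Bronze, 1952, 100m"]}
spanish = {"Medals": ["Gold (toy entry)"]}
updated_spanish = apply_rules(english, spanish)
```

			<p style="text-align: justify;">The missing "Country" row is transferred first, then the medal row gains the entry it lacked. Neither input table is modified.</p>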
			
			<h4><a id="user-content-header-4" class="anchor" href="#header-4" aria-hidden="true"><span class="octicon octicon-link"></span></a><b>Update Example</b></h4>
			<p style="text-align: justify; display:inline;">Below is an update example. The infobox for Shirley Strickland de la Hunty is updated in both English and Spanish. It shows row transfer for missing information and value substitution because "Aged 78" is absent from the Died row. Additionally, one medal
				entry (Bronze, 1952, 100m) is added to the medal tally.</p>
			<p style="margin-left:10%; margin-right:10%;"><img src="figures/Figure-3.png" style="max-width:95%;"></p>

			<!-- <h2><a id="user-content-header-2" class="anchor" href="#header-2" aria-hidden="true"><span class="octicon octicon-link"></span></a>Reasoning</h2>
			<p style="text-align: justify;">To study the nature of reasoning that is involved in deciding the relationship between a table and a hypothesis, we adapted the set of reasoning categories from <a href="https://gluebenchmark.com">GLUE Benchmark</a> to table premises. All definitions and their boundaries were verified with several rounds of discussions. Following this, three graduate students (authors of the paper) independently annotated 160 pairs from the dev and alpha 3 test sets each, and edge cases were adjudicated to arrive at consensus labels.</p>
			<figure>
				<img src="figures/reasoning.png" style="max-width:100%;">
				<figcaption>Type and counts of reasoning in the Development and test alpha3 data splits. OOT and KCS are short forms of out-of-table and Knowledge & Common Sense, respectively.
				</figcaption>
			</figure> -->
			<!-- <div>
				<table  style="margin-left:15%;
					margin-right:15%;">
					<thead>
						<tr>
							<th>Data Split</th>
							<th>Number of Tables</th>
							<th>Number of Pairs</th>
						</tr>
					</thead>
					<tbody align="center">
						<tr>
							<td>Train</td>
							<td>1740</td>
							<td>16538</td>
						</tr>
						<tr>
							<td>Dev</td>
							<td>200</td>
							<td>1800</td>
						</tr>
						<tr>
							<td>alpha 1</td>
							<td>200</td>
							<td>1800</td>
						</tr>
						<tr>
							<td>alpha 2</td>
							<td>200</td>
							<td>1800</td>
						</tr>
						<tr>
							<td>alpha 3</td>
							<td>200</td>
							<td>1800</td>
						</tr>
					</tbody>
					<caption>Number of tables and premise-hypothesis
					pairs for each data split</caption>
				</table>
			</div> -->
			<br>
			<!-- <div style="text-align:center;">
				<table>
					<thead>
						<tr>
							<th>Data Split</th>
							<th>Cohen's Kappa</th>
							<th>Human Performance</th>
							<th>Majority Agreeement</th>
						</tr>
					</thead>
					<tbody align="center">
						<tr>
							<td>Dev</td>
							<td>0.78</td>
							<td>79.78</td>
							<td>93.53</td>
						</tr>
						<tr>
							<td>alpha 1</td>
							<td>0.80</td>
							<td>84.04</td>
							<td>97.48</td>
						</tr>
						<tr>
							<td>alpha 2</td>
							<td>0.80</td>
							<td>83.88</td>
							<td>96.77</td>
						</tr>
						<tr>
							<td>alpha 3</td>
							<td>0.74</td>
							<td>79.33</td>
							<td>95.58</td>
						</tr>
					</tbody>
					<caption>Cohen's Kappa, human baseline and inter-annotator agreement scores</caption>
				</table>
			</div> -->

			<h2><a id="user-content-header-2" class="anchor" href="#header-2" aria-hidden="true"><span class="octicon octicon-link"></span></a>Human Assisted Wikipedia Updates</h2>
			<p style="text-align: justify;">Information update results are given to human editors, who use them to update Wikipedia infoboxes. Following Wikipedia's guidelines, rule set, and policies, update requests were submitted together with supporting evidence. This evidence consists of the up-to-date entity page URL in the source language, the specific table rows along with the source language, details of the proposed changes, and an additional citation provided by the editor for further validation.</p>
			<div>
				<table  style="margin-left:15%;
					margin-right:15%;">
					<thead>
						<tr>
							<th></th>
							<th>Accepted</th>
							<th>Rejected</th>
							<th>Total</th>
						</tr>
					</thead>
					<tbody align="center">
						<tr>
							<td>English => X</td>
							<td>161</td>
							<td>43</td>
							<td>204</td>
						</tr>
						<tr>
							<td>X => Y</td>
							<td>169</td>
							<td>47</td>
							<td>216</td>
						</tr>
						<tr>
							<td>X => English</td>
							<td>136</td>
							<td>47</td>
							<td>183</td>
						</tr>
						<tr>
							<td>Total</td>
							<td>466</td>
							<td>137</td>
							<td>603</td>
						</tr>
					</tbody>
					<br>
					<!-- <caption>Number of tables and premise-hypothesis
					pairs for each data split</caption> -->
				</table>
                                <caption><strong>Table: </strong>Human-assisted Wikipedia infobox updates: number of accepted and rejected edits for different flows of information.</caption>
			</div>
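			<p style="text-align: justify;">The acceptance rate for each flow follows directly from the counts in the table. The short sketch below reproduces that arithmetic, with the counts copied from the table and the helper names <code>acceptance_rate</code> and <code>overall_rate</code> chosen here for illustration.</p>

```python
# Accepted and rejected edit counts, copied from the table above.
EDITS = {
    "English to X": (161, 43),
    "X to Y": (169, 47),
    "X to English": (136, 47),
}


def acceptance_rate(accepted: int, rejected: int) -> float:
    """Percentage of submitted edits that were accepted, rounded to two decimals."""
    total = accepted + rejected
    return round(100 * accepted / total, 2) if total else 0.0


def overall_rate(edits: dict) -> float:
    """Acceptance rate pooled over all information flows."""
    accepted = sum(a for a, _ in edits.values())
    rejected = sum(r for _, r in edits.values())
    return acceptance_rate(accepted, rejected)


rates = {flow: acceptance_rate(*counts) for flow, counts in EDITS.items()}
overall = overall_rate(EDITS)
```

			<p style="text-align: justify;">The pooled value, 466 accepted out of 603 submitted edits, gives 77.28%, which matches the acceptance rate quoted in the paper's abstract.</p>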
			<!-- <h2><a id="user-content-header-2" class="anchor" href="#header-2" aria-hidden="true"><span class="octicon octicon-link"></span></a>Knowledge + InfoTabS</h2>
			<p style="text-align: justify;"> You should check our <a href="https://2021.naacl.org/">NAACL 2021</a> paper which <a href="https://knowledge-infotabs.github.io">enhance InfoTabS</a> with extra Knowledge.</p>
			<h2><a id="user-content-header-2" class="anchor" href="#header-2" aria-hidden="true"><span class="octicon octicon-link"></span></a>TabPert</h2>
			<p style="text-align: justify;"> You should check our <a href="https://2021.emnlp.org">EMNLP 2021</a> paper which is a <a href="https://tabpert.github.io">tabular perturbation platform</a> to generate counterfactual examples.</p> -->

			<h2><a id="user-content-header-2" class="anchor" href="#header-2" aria-hidden="true"><span class="octicon octicon-link"></span></a>People</h2>
			<p style="text-align: justify;"> The InfoSync dataset was prepared through a collaboration across multiple institutions, <a href="https://www.cs.utah.edu/">University of Utah</a>, <a href="https://www.iitg.ac.in/cse/"> IIT Guwahati</a>, <a href="https://www.ctae.ac.in/"> CTAE</a>, and <a href="https://www.bloomberg.com/company/"> Bloomberg LP</a>, by the following people: </p>
			<figure>
				<img src="figures/siddarth.JPG" width="140" height="120">
				<img src="figures/chelsi.jpg" width="140" height="120">
				<img src="figures/vivekg.jpg" width="140" height="120">
				<img src="figures/tushar.png" width="140" height="120">
				<img src="figures/shou.jpg" width="140" height="120">
				<figcaption>From left to right: <a href="https://www.linkedin.com/in/siddharth-khincha-644a70203">Siddharth Khincha</a>, <a href="https://www.linkedin.com/in/chelsi-jain-7b0734192">Chelsi Jain</a>, <a href="https://vgupta123.github.io">Vivek Gupta*</a>, <a href="https://tushaarkataria.github.io/">Tushar Kataria*</a>, and <a href="https://imsure318.github.io/">Shuo Zhang</a>. </figcaption>
			</figure>
			<h2><a id="user-content-header-2" class="anchor" href="#header-2" aria-hidden="true"><span class="octicon octicon-link"></span></a>Citation</h2>
			<p style="text-align: justify;"> Please cite our paper as below if you use the InfoSync dataset.</p>
			<pre><code> @inproceedings{khincha-etal-2023-infosync,
    title = "{I}nfo{S}ync: Information Synchronization across Multilingual Semi-structured Tables",
    author = "Khincha, Siddharth  and
      Jain, Chelsi  and
      Gupta, Vivek  and
      Kataria, Tushar  and
      Zhang, Shuo",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.159",
    pages = "2536--2559",
    abstract = "Information Synchronization of semi-structured data across languages is challenging. For example, Wikipedia tables in one language need to be synchronized with others. To address this problem, we introduce a new dataset InfoSync and a two-step method for tabular synchronization. InfoSync contains 100K entity-centric tables (Wikipedia Infoboxes) across 14 languages, of which a subset ({\textasciitilde}3.5K pairs) are manually annotated. The proposed method includes 1) Information Alignment to map rows and 2) Information Update for updating missing/outdated information for aligned tables across multilingual tables. When evaluated on InfoSync, information alignment achieves an F1 score of 87.91 (en {\textless}-{\textgreater} non-en). To evaluate information updation, we perform human-assisted Wikipedia edits on Infoboxes for 532 table pairs. Our approach obtains an acceptance rate of 77.28{\%} on Wikipedia, showing the effectiveness of the proposed method.",
} 
</code></pre>
			<h2><a id="user-content-header-2" class="anchor" href="#header-2" aria-hidden="true"><span class="octicon octicon-link"></span></a>Acknowledgement</h2>
			<p style="text-align: justify;">The authors thank members of the <a href="https://svivek.com/">Utah NLP group</a> for their valuable insights and
			suggestions at various stages of the project, and the <a href="https://2023.aclweb.org/">ACL 2023</a> reviewers for pointers to
			related work, corrections, and helpful comments. The authors also thank <a href="https://en.wikipedia.org/wiki/Main_Page"> Wikipedia</a>, the largest free resource, for the InfoSync tables.</p>
			<footer class="site-footer">
				<span class="site-footer-owner"><a href="https://github.com/Info-Sync/InfoSync">InfoSync</a> is maintained by <a href="https://vgupta123.github.io">Vivek Gupta</a>.</span>
				<span class="site-footer-credits">This page was generated by <a href="https://pages.github.com">GitHub Pages</a> using the <a href="https://github.com/jasonlong/cayman-theme">Cayman</a> theme by <a href="https://github.com/jasonlong">jasonlong</a>.</span>
			</footer>
		</section>
	</body>
</html>