
<!DOCTYPE HTML>
<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  <title>Siddhant Garg</title>
  
  <!-- <meta name="author" content="Siddhant Garg"> -->
  <meta name="google-site-verification" content="1uePDgjfB9UA1EgFH950G2a12eSMpHDWHSolx5ho5z0" />
  <meta name="viewport" content="width=device-width, initial-scale=1">
  
  <link rel="stylesheet" type="text/css" href="stylesheet.css">
	<link rel="icon" href="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 100 100%22><text y=%22.9em%22 font-size=%2290%22>🌐</text></svg>">
</head>

<body>
  <table style="width:100%;max-width:800px;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
    <tr style="padding:0px">
      <td style="padding:0px">
        <table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
          <tr style="padding:0px">
            <td style="padding:2.5%;width:63%;vertical-align:middle">
              <p style="text-align:center">
                <name>Siddhant Garg</name>
              </p>
              <p>I am a <strong>Computer Science Master's student</strong> at the University of Massachusetts Amherst <strong>(UMass Amherst)</strong>. My studies focus on Machine Learning and AI. Here I have worked on 2D/3D Computer Vision, Natural Language Processing, and optimizing Deep Neural Networks to reduce computation (GFLOPs).
              </p>
              <p>
                I completed my undergraduate studies in the Department of Mathematics and Scientific Computing at the Indian Institute of Technology Kanpur <strong>(IIT Kanpur)</strong>, where I worked on Bayesian Statistics.
              </p>
              <p>
                After that, I worked at Samsung Research, Bengaluru, India for 2 years as a Machine Learning Engineer on On-Device AI solutions, where I published a paper and developed deep learning models for novel applications to be deployed on smartphones globally.
              </p>
              <p>
                During the summer of 2022, I worked as a <strong>Research Scientist Intern</strong> at <a href="https://research.adobe.com/">Adobe Research</a> with the Video Understanding Group.
              </p>
              <p style="text-align:center">
                <a href="data/Resume_Siddhant.pdf">Resume</a> &nbsp;/&nbsp;
                <a href="https://www.linkedin.com/in/sid-garg/">LinkedIn</a> &nbsp;/&nbsp;
                <a href="https://github.com/gargsid?tab=repositories">GitHub</a> &nbsp;/&nbsp;
                <a href="https://scholar.google.com/citations?user=UJ5xzYkAAAAJ&hl=en">Google Scholar</a> &nbsp;/&nbsp;
                <a href="mailto:siddhantgarg85@gmail.com">Email</a> 
              </p>
            </td>
            <td style="padding:2.5%;width:40%;max-width:40%">
              <a href="images/sidphoto.jpeg"><img style="width:100%;max-width:100%" alt="profile photo" src="images/sidphoto.jpeg" class="hoverZoomLink"></a>
            </td>
          </tr>
        </tbody></table>
        <table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
            <tr>
            <td style="padding:20px;width:100%;vertical-align:middle">
              <heading>Publications</heading>
            </td>
          </tr>
        </tbody></table>
        <table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
          <tr>
            <td style="padding:20px;width:25%;vertical-align:middle">
              <img src="images/cross-shape.png" width="150" height="60">
            </td>
            <td width="75%" valign="middle">
              <a href="https://marios2019.github.io/CSN/">
                <papertitle>Cross-Shape Attention for Part Segmentation of 3D Point Clouds</papertitle>
              </a>
              <br>
              <a href="https://marios2019.github.io/">Marios Loizou</a>, <strong> Siddhant Garg </strong>, <a href="https://lodurality.github.io/">Dmitry Petrov</a>, <a href="https://melinos.github.io/">Melinos Averkiou</a>, <a href="https://people.cs.umass.edu/~kalo/">Evangelos Kalogerakis</a>
              <br>
              <em>Symposium on Geometry Processing (SGP) 2023</em>
              <br>
              <a href="https://arxiv.org/pdf/2003.09053.pdf">PDF</a> / <a href="https://github.com/marios2019/CSN">Code</a> / <a href="https://marios2019.github.io/CSN/">Project Page</a>
              <p>We present a deep learning method that propagates point-wise feature representations across shapes within a collection for the
                purpose of 3D shape segmentation. We propose a cross-shape attention mechanism to enable interactions between a shape’s
                point-wise features and those of other shapes. The mechanism both assesses the degree of interaction between points and
                mediates feature propagation across shapes, improving the accuracy and consistency of the resulting point-wise feature
                representations for shape segmentation. </p>
            </td>
          </tr>
          
          <tr>
            <td style="padding:20px;width:25%;vertical-align:middle">
              <img src="images/pruning.png" width="150" height="60">
            </td>
            <td width="75%" valign="middle">
              <a href="https://arxiv.org/pdf/2304.06840.pdf">
                <papertitle>Structured Pruning for Multi-Task Deep Neural Networks</papertitle>
              </a>
              <br>
              <strong> Siddhant Garg </strong>, <a href="https://zhanglijun95.github.io/resume/">Lijun Zhang</a>, <a href="https://guanh01.github.io/">Prof. Hui Guan</a>
              <br>
              <em>UMass Amherst</em>
              <br>
              <a href="https://arxiv.org/pdf/2304.06840.pdf">PDF</a> / <a href="https://github.com/gargsid/MTLCosPrune">Code</a>
              <p>Deep Multi-Task models can be further optimized via model compression. In this work, we investigate the effectiveness of structured pruning on multi-task models. We show that, with careful hyper-parameter tuning, architectures obtained from different pruning methods do not have significant differences in their performances across tasks when the number of parameters is similar. </p>
            </td>
          </tr>

          <tr>
            <td style="padding:20px;width:25%;vertical-align:middle">
              <img src="images/smart-share-tilt.jpeg" width="90" height="150">
            </td>
            <!-- <td style="padding:20px;width:75%;vertical-align:middle"> -->
            <td width="75%" valign="middle">
              <papertitle>A Simple Approach to Image Tilt Correction with Self-Attention MobileNet for Smartphones</papertitle>
              <br>
              <strong> Siddhant Garg </strong>,
              <a href="https://scholar.google.com/citations?user=5EWG5EAAAAAJ&hl=en/">Debi Prasanna Mohanty</a>, 
							Siva Prasad Thota, Sukumar Moharana
              <br>
              <em>British Machine Vision Conference</em>, 2021
              <br>
              <a href="https://arxiv.org/abs/2111.00398">PDF</a> / <a href="data/BMVC_SAM_Tilt.pdf">PPT</a> / <strong>Patent Pending (USPTO)</strong>
              <p></p>
              <p>
              This work was done at <a href="https://research.samsung.com/sri-b">Samsung Research, Bengaluru</a>. We proposed a low-latency Self-Attention MobileNet for real-time image tilt correction for smartphones. 
              The model was also deployed on the Samsung Galaxy with <a href="https://www.samsung.com/us/apps/one-ui/">One UI</a> for the Tilt Correction feature on Samsung Smart Share Tray. 
              </p>
            </td>
          </tr>

        </tbody></table>
        
        <table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
          <tr>
          <td style="padding:20px;width:100%;vertical-align:middle">
            <heading>Academic Projects</heading>
            <!-- <p>
              I am a Computer Science Master's student at UMass Amherst with a focus on Machine Learning, Computer Vision, and Natural Language Processing.
            </p> -->
          </td>
        </tr>
      </tbody></table>
        <table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
          <tr>
            <td style="padding:20px;width:25%;vertical-align:middle">
              <img src="images/SERP.png" width="150" height="45">
            </td>
            <td width="75%" valign="middle">
              <a href="https://arxiv.org/pdf/2209.06067.pdf">
                <papertitle>SeRP: Self-Supervised Representation Learning Using Perturbed Point Clouds</papertitle>
              </a>
              <br>
              <strong> Siddhant Garg </strong>, <a href="https://muditchaudhary.github.io/">Mudit Chaudhary</a>
              <br>
              <em>Course: Intelligent Visual Computing, Instructor: Prof. Evangelos Kalogerakis, UMass Amherst </em>, 2022
              <br>
              <a href="https://arxiv.org/pdf/2209.06067.pdf">PDF</a> / <a href="https://github.com/gargsid/SERPNet-Point-Cloud-Representation-Learning">Code</a> / <a href="data/SERP_PPT.pdf">PPT</a>
              <p>We proposed a self-supervised method that learns embeddings by reconstructing 3D point clouds from noisy data with a Transformer encoder-decoder architecture.</p>
            </td>
          </tr>

          <tr>
            <td style="padding:20px;width:25%;vertical-align:middle">
              <img src="images/clipspark.png" width="150" height="45">
            </td>
            <td width="75%" valign="middle">
              <a href="data/CS532_final.pdf">
                <papertitle>Scalable Video Processing Using CLIP and PySpark</papertitle>
              </a>
              <br>
              <strong> Siddhant Garg </strong>, Sridhama Prakhya
              <br>
              <em>Course: Systems for Data Science, Instructor: Prof. Hui Guan, UMass Amherst </em>, 2022
              <br>
              <a href="data/CS532_final.pdf">PDF</a> / <a href="https://github.com/gargsid/Scalable-Video-Processing-Spark">Code</a> / <a href="data/532-ppt-spark-clip.pdf">PPT</a> / <a href="https://www.youtube.com/watch?v=4NMDOKa-R1k">Video</a>
              <p>We built a distributed video processing system for large video corpora using PySpark and CLIP, with faster inference, distributed storage, and faster retrieval.</p>
            </td>
          </tr>

          <tr>
            <td style="padding:20px;width:25%;vertical-align:middle">
              <img src="images/self-label.png" width="150" height="35">
            </td>
            <td width="75%" valign="middle">
              <a href="https://arxiv.org/pdf/2204.04545.pdf">
                <papertitle>Self-Labeling Refinement for Robust Self-Supervised Learning with Bootstrap Your Own Latent</papertitle>
              </a>
              <br>
              <strong> Siddhant Garg </strong>, Dhruval Jain
              <br>
              <em>Course: Neural Networks, Instructor: Prof. Erik Learned-Miller, UMass Amherst </em>, 2021
              <br>
              <a href="https://arxiv.org/pdf/2204.04545.pdf">PDF</a> / <a href="https://github.com/gargsid/Self-Supervised-Self-Labeling-Refinement-with-Bootstrap-Your-Own-Latent">Code</a>
              <p>We proposed a self-labeling refinement method for the self-supervised model Bootstrap Your Own Latent (BYOL) for more robust representation learning.</p>
            </td>
          </tr>

          <tr>
            <td style="padding:20px;width:25%;vertical-align:middle">
              <img src="images/xlm.png" width="150" height="45">
            </td>
            <td width="75%" valign="middle">
              <a href="data/CS_685_Final_Report_Hate_Speech_Detection.pdf">
                <papertitle>Multi-Lingual Hate Speech Detection using XLM-Transformers</papertitle>
              </a>
              <br>
              <strong> Siddhant Garg </strong>, Mudit Chaudhary, Sridhama Prakhya
              <br>
              <em>Course: Advanced Natural Language Processing, Instructor: Prof. Mohit Iyyer, UMass Amherst </em>, 2022
              <br>
              <a href="data/CS_685_Final_Report_Hate_Speech_Detection.pdf">PDF</a> / <a href="https://github.com/gargsid/Multi-Lingual-Hate-Speech-Detection">Code</a>
              <p>We used a multi-lingual hate speech dataset called OLID and applied various transformer models to identify the best approach for detecting hate speech across many languages. We used a multi-lingual transformer model called XLM-RoBERTa, as well as BERT models pretrained specifically on individual languages.</p>
            </td>
          </tr>

          <tr>
            <td style="padding:20px;width:25%;vertical-align:middle">
              <img src="images/lunar.png" width="150" height="150">
            </td>
            <td width="75%" valign="middle">
              <a href="data/687_ppo.pdf">
                <papertitle>Proximal Policy Optimization (Lunar Lander, Cartpole) </papertitle>
              </a>
              <br>
              <strong> Siddhant Garg </strong>, Andrew Teeter
              <br>
              <em>Course: Reinforcement Learning, Instructor: Prof. Bruno Castro da Silva, UMass Amherst </em>, 2021
              <br>
              <a href="data/687_ppo.pdf">PDF</a> / <a href="https://github.com/gargsid/Proximal-Policy-Optimization">Code</a>
              <p>Implementation of Proximal Policy Optimization (PPO) for Lunar Lander and Cartpole.</p>
            </td>
          </tr>

          <tr>
            <td style="padding:20px;width:25%;vertical-align:middle">
              <img src="images/lasso-enet.png" width="150" height="90">
            </td>
            <td width="75%" valign="middle">
              <a href="data/Lasso.pdf">
                <papertitle>Feature Selection using LASSO regression with Geometric Skew-Normal Distribution </papertitle>
              </a>
              <br>
              <strong> Siddhant Garg </strong>
              <br>
              <em>UG Project, Instructor: Prof. Debasis Kundu, IIT Kanpur </em>, 2019
              <br>
              <a href="data/Lasso.pdf">PDF</a> 
              <p>Proposed an Expectation Maximization algorithm to find the parameters of the Geometric Skew-Normal (GSN) distribution with the LASSO and Elastic-Net regression objectives.
                The learned distribution is multi-modal, and the regularization terms help in modeling high-dimensional datasets with very few data points effectively.
                For example, we were able to find the most representative covariates responsible for cancer prediction in the ARCENE dataset.
              </p>
            </td>
          </tr>

          <tr>
            <td style="padding:20px;width:25%;vertical-align:middle">
              <img src="images/bimodal-gsn.png" width="150" height="90">
            </td>
            <td width="75%" valign="middle">
              <a href="data/bayesian-geometric-mh.pdf">
                <papertitle>Metropolis-Hastings Algorithm for Geometric Skew-Normal Distribution</papertitle>
              </a>
              <br>
              <strong> Siddhant Garg </strong>
              <br>
              <em>UG Project, Instructor: Prof. Debasis Kundu, IIT Kanpur </em>, 2018
              <br>
              <a href="data/bayesian-geometric-mh.pdf">PDF</a> 
              <p>Developed a proposal distribution for the Metropolis-Hastings algorithm to estimate the parameters of the Geometric Skew-Normal distribution.
              </p>
            </td>
          </tr>
					
        </tbody></table>
        <table style="width:100%;border:0px;border-spacing:0px;border-collapse:separate;margin-right:auto;margin-left:auto;"><tbody>
          <tr>
            <td style="padding:0px">
              <p style="text-align:center;font-size:small;">
                Design and source code from <a href="https://jonbarron.info/">Jon Barron's website</a>.
              </p>
            </td>
          </tr>
        </tbody></table>
      </td>
    </tr>
  </tbody></table>
</body>

</html>