MapReduce Inverted Index Program

The MapReduce Inverted Index Program is a distributed application that builds an inverted index from a collection of documents using the MapReduce paradigm on Apache Hadoop.

Overview

The program processes text documents and builds an inverted index: a mapping from each unique word to the documents in which it occurs. With this index, finding the documents that contain a given word becomes a direct lookup rather than a scan of every document, which speeds up retrieval on large datasets.
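To make the idea concrete, here is a minimal single-machine sketch in plain Java (no Hadoop) of what an inverted index looks like. The class name `InvertedIndexSketch`, the `buildIndex` helper, and the tokenization rule (lowercase, split on non-letter characters) are assumptions made for illustration; they are not taken from this repository's code.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration; not part of the repository.
final class InvertedIndexSketch {

    // Maps each word to the sorted set of document names that contain it.
    static SortedMap<String, SortedSet<String>> buildIndex(Map<String, String> docs) {
        SortedMap<String, SortedSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            // Assumed tokenization: lowercase, then split on runs of non-letters.
            String[] words = doc.getValue().toLowerCase(Locale.ROOT).split("[^a-z]+");
            for (String word : words) {
                if (!word.isEmpty()) {
                    index.computeIfAbsent(word, k -> new TreeSet<>()).add(doc.getKey());
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1.txt", "the quick brown fox");
        docs.put("doc2.txt", "the lazy dog");
        // Looking up a word returns its documents without rescanning any text.
        System.out.println(buildIndex(docs).get("the"));
    }
}
```

The MapReduce job produces the same kind of mapping, but spreads the work across a cluster instead of holding everything in one in-memory map.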

Features

Word Mapping: Generates a mapping of words to the documents they appear in.
Scalability: Runs on Hadoop, so indexing work is distributed across the cluster and scales to large datasets.
Customizable: Input and output directories can be changed to suit different datasets and environments.
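The word-mapping feature follows the usual MapReduce shape: a map step emits (word, document) pairs, the framework groups the pairs by word, and a reduce step merges each group into one index entry. The sketch below simulates those three phases in plain Java so that it runs without a cluster. In a real job Hadoop performs the grouping itself; the method names `map`, `shuffle`, and `reduce`, the tokenization rule, and the tab-separated output format are illustrative assumptions, not the Hadoop API or this repository's classes.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical single-process simulation of the three MapReduce phases.
final class MapReducePhasesSketch {

    // Map phase: emit one (word, docId) pair per word occurrence.
    static List<Map.Entry<String, String>> map(String docId, String text) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, docId));
            }
        }
        return pairs;
    }

    // Shuffle phase (done by the framework in a real job): group values by key.
    static SortedMap<String, List<String>> shuffle(List<Map.Entry<String, String>> pairs) {
        SortedMap<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: drop duplicate document IDs and join the rest.
    // Assumed output format: word<TAB>doc1,doc2,...
    static String reduce(String word, List<String> docIds) {
        TreeSet<String> unique = new TreeSet<>(docIds);
        return word + "\t" + String.join(",", unique);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        pairs.addAll(map("doc1.txt", "to be or not to be"));
        pairs.addAll(map("doc2.txt", "to index or not"));
        shuffle(pairs).forEach((word, ids) -> System.out.println(reduce(word, ids)));
    }
}
```

Because each document is mapped independently and each word is reduced independently, both phases can be split across many machines, which is where the scalability comes from.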

Prerequisites

Before using the MapReduce Inverted Index Program, make sure the following are in place:

Java Development Kit (JDK) installed
Apache Hadoop configured and running
Maven installed
Text documents or datasets for indexing

Installation

Clone or download this repository to your local machine.
Configure Hadoop to run against your local environment or a Hadoop cluster.
Set up the project in your preferred Java Integrated Development Environment (IDE).

Usage

Prepare the text documents or datasets for indexing.
Update input and output paths in the code to point to your data directories.
Run the MapReduce job to generate the inverted index.
Access the output directory to view the generated inverted index.
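The steps above call for editing the input and output paths in the code. One alternative is to read both paths from the command line, so the same build can be pointed at different datasets. The following is a minimal sketch of that validation step in plain Java; the class `JobPathsSketch` and its `fromArgs` method are hypothetical and do not exist in this repository. Paths are kept as plain strings here, and a real driver would convert them to Hadoop path objects before configuring the job.

```java
// Hypothetical helper; not part of the repository.
final class JobPathsSketch {
    final String inputDir;
    final String outputDir;

    private JobPathsSketch(String inputDir, String outputDir) {
        this.inputDir = inputDir;
        this.outputDir = outputDir;
    }

    // Expects exactly two arguments: <input dir> <output dir>.
    static JobPathsSketch fromArgs(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("usage: <input dir> <output dir>");
        }
        String in = args[0].trim();
        String out = args[1].trim();
        if (in.isEmpty() || out.isEmpty()) {
            throw new IllegalArgumentException("paths must not be empty");
        }
        if (in.equals(out)) {
            throw new IllegalArgumentException("input and output directories must differ");
        }
        return new JobPathsSketch(in, out);
    }

    public static void main(String[] args) {
        // Example values only; substitute your own directories.
        JobPathsSketch paths = fromArgs(new String[] {"input/docs", "output/index"});
        System.out.println(paths.inputDir + " -> " + paths.outputDir);
    }
}
```

Note that Hadoop's file-based output formats refuse to write into an output directory that already exists, so choose a fresh output path for each run or remove the old one first.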

Project Structure

The project structure is organized as follows:

src/main/java: Contains Java source code files.
- org.example: Package containing MapReduce classes (InvertedIndexMapper, InvertedIndexReducer, InvertedIndexDriver).
pom.xml: Maven project configuration file.

License

This project is licensed under the MIT License.

alina-z7/MapReduce-Inverted-Index-Program
