Digital

This is an informative blog on new concepts and terms related to Digital, covering Digital Twin, AI, Robotics, Data Engineering, IoT, and Virtual Reality on the AWS Cloud Platform.

**********************************
Digital Twin
**********************************

A digital twin is a digital or virtual copy of a physical asset or product. The term was originally coined by Dr. Michael Grieves in 2002, and NASA was one of the first to use the technology for space exploration missions. Digital twins connect the real and virtual worlds by collecting real-time data from installed sensors. The collected data is either stored locally in a decentralized way or centrally in a cloud. The data is then evaluated and simulated in a virtual copy of the asset. After receiving the information from the simulation, the parameters are applied to the real asset. This integration of data between the real and virtual representations helps optimize the performance of the real asset. Digital twins can be used in various industries: manufacturing, automotive, construction, utilities, and healthcare.
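A toy Python sketch may make that feedback loop concrete. Everything here (the sensor reading, the twin's simulation rule) is a hypothetical stand-in, not a real digital-twin platform:

```python
# Toy sketch of the digital-twin feedback loop: real-time sensor data
# updates a virtual copy, the simulation suggests new parameters, and
# those parameters are applied back to the physical asset.

class VirtualTwin:
    def __init__(self):
        self.temperature = 0.0

    def update(self, reading):
        # Mirror the latest sensor reading in the virtual model.
        self.temperature = reading["temperature"]

    def simulate(self):
        # Trivial "simulation": if the asset runs hot, slow it down.
        return {"fan_speed": "high" if self.temperature > 80 else "normal"}

def sync_cycle(sensor_reading, twin, apply_to_asset):
    twin.update(sensor_reading)   # real world -> virtual world
    params = twin.simulate()      # evaluate in the virtual copy
    apply_to_asset(params)        # virtual world -> real world

twin = VirtualTwin()
sync_cycle({"temperature": 92.5}, twin, lambda p: print("applying", p))
```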

Digital twins are the next big thing in the Fourth Industrial Revolution for the development of new products and processes.

**********************************
AWS CONTROL TOWER
**********************************
The easiest way to set up and govern a secure multi-account AWS environment

If you have multiple AWS accounts and teams, cloud setup and governance can be complex and time consuming, slowing down the very innovation you’re trying to speed up. AWS Control Tower provides the easiest way to set up and govern a secure, multi-account AWS environment, called a landing zone. It creates your landing zone using AWS Organizations, bringing ongoing account management and governance as well as implementation best practices based on AWS’s experience working with thousands of customers as they move to the cloud. Builders can provision new AWS accounts in a few clicks, while you have peace of mind knowing that your accounts conform to company policies. Extend governance into new or existing accounts, and gain visibility into their compliance status quickly. If you are building a new AWS environment, starting out on your journey to AWS, or starting a new cloud initiative, AWS Control Tower will help you get started quickly with built-in governance and best practices.



**********************************
AWS IoT TwinMaker
**********************************
Optimize operations by easily creating digital twins of real-world systems

AWS IoT TwinMaker makes it easier for developers to create digital twins of real-world systems such as buildings, factories, industrial equipment, and production lines. AWS IoT TwinMaker provides the tools you need to build digital twins to help you optimize building operations, increase production output, and improve equipment performance. With the ability to use existing data from multiple sources, create virtual representations of any physical environment, and combine existing 3D models with real-world data, you can now harness digital twins to create a holistic view of your operations faster and with less effort.
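As a rough illustration, a TwinMaker workspace and an entity might be created with boto3 along these lines; the bucket ARN, role ARN, and names below are placeholders, and the exact parameters should be checked against the AWS IoT TwinMaker API documentation:

```python
import boto3

# Hedged sketch: create a TwinMaker workspace and an entity for one
# piece of equipment. Bucket, role ARN, and names are placeholders.
twinmaker = boto3.client("iottwinmaker")

twinmaker.create_workspace(
    workspaceId="factory-demo",
    s3Location="arn:aws:s3:::my-twinmaker-bucket",   # workspace data store
    role="arn:aws:iam::123456789012:role/TwinMakerRole",
)

twinmaker.create_entity(
    workspaceId="factory-demo",
    entityName="mixer-01",                           # a real-world machine
    description="Digital twin entity for mixer 01 on production line A",
)
```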


**********************************
Amazon SageMaker
**********************************
Build, train, and deploy machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows

Make ML predictions using a visual interface with SageMaker Canvas. Prepare data and build, train, and deploy models with SageMaker Studio. Deploy and manage models at scale with SageMaker MLOps.

Make ML more accessible: enable more people to innovate with ML through a choice of tools, from integrated development environments for data scientists to no-code visual interfaces for business analysts.

Prepare data at scale: access, label, and process large amounts of structured data (tabular data) and unstructured data (photos, video, and audio) for ML.

Accelerate ML development: reduce training time from hours to minutes with optimized infrastructure. Boost team productivity up to 10 times with purpose-built tools.

Streamline the ML lifecycle: automate and standardize MLOps practices across your organization to build, train, deploy, and manage models at scale.
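A minimal sketch of the build/train/deploy flow with the SageMaker Python SDK might look like the following; the container image, role ARN, and S3 paths are placeholders:

```python
import sagemaker
from sagemaker.estimator import Estimator

# Hedged sketch of the build/train/deploy flow. The training image,
# role ARN, and S3 paths below are placeholders.
session = sagemaker.Session()

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training:latest",
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/model-artifacts/",
    sagemaker_session=session,
)

estimator.fit({"train": "s3://my-bucket/train/"})    # launch a training job

# Deploy the trained model behind a managed HTTPS endpoint.
predictor = estimator.deploy(initial_instance_count=1,
                             instance_type="ml.m5.large")
```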

**********************************
Curation of Data
**********************************

Data curation is the process of creating, organizing, and maintaining data sets so they can be accessed and used by people looking for information. It involves collecting, structuring, indexing, and cataloging data for users in an organization, group, or the general public. The main reason to learn data curation is to improve communication about data on your team.

Data curation files consist of flowcharts, tables other than data tables, diagrams, cheat sheets, manuals, data dictionaries, survey documentation, warehouse documentation, and policies and procedures.

A data reduction diagram is a flow chart that shows how you systematically removed records from a larger dataset to reduce it to a smaller one.

Good Data --> Good ML Model

Awesome Data --> Awesome ML Model

Best practices to perform data curation:

1. Clear labels
2. Relevant data
3. Beware of biased data: a) collect more data, b) adjust weighting
4. Use consistent terms
5. Rule out data leakage
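A minimal pandas sketch of a few of these practices on a hypothetical sensor dataset (file name and columns are made up for illustration):

```python
import pandas as pd

# Hedged sketch of the best practices above on a hypothetical dataset.
df = pd.read_csv("sensor_readings.csv")              # hypothetical file

# 1. Clear, consistent labels: normalize free-text categories.
df["status"] = df["status"].str.strip().str.lower()

# 2. Relevant data: keep only the columns the model needs.
df = df[["temperature", "vibration", "runtime_hours", "status"]]

# 3. Beware of biased data: inspect class balance before training.
print(df["status"].value_counts(normalize=True))

# 5. Rule out data leakage: never keep features derived from the label
# (e.g. a "failure_report_id" column that only exists after a failure).
df = df.drop_duplicates().dropna()
```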

************
AWS Glue
************

It is a fully managed ETL (Extract, Transform, Load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores. It is serverless.

When do you use AWS Glue?

--> To build a data warehouse to organize, cleanse, validate, and format data
--> To run serverless queries against your Amazon S3 data lake
--> To create event-driven ETL pipelines
--> To understand your data assets

Benefits --> Less hassle --> More Power --> Cost Effective

Glue design (diagram)

Critical terminologies (diagram)
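A hedged boto3 sketch of a typical Glue workflow: crawl raw S3 data into the Data Catalog, then start an ETL job. Names, role ARN, and paths are placeholders:

```python
import boto3

glue = boto3.client("glue")

# Crawl an S3 path so its schema lands in the Glue Data Catalog.
glue.create_crawler(
    Name="raw-data-crawler",
    Role="arn:aws:iam::123456789012:role/GlueServiceRole",
    DatabaseName="analytics_db",
    Targets={"S3Targets": [{"Path": "s3://my-bucket/raw/"}]},
)
glue.start_crawler(Name="raw-data-crawler")

# Run a pre-defined ETL job that cleans and enriches the crawled tables.
run = glue.start_job_run(JobName="clean-and-enrich")
print(run["JobRunId"])
```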

Amazon S3

Amazon S3 or Amazon Simple Storage Service is a service offered by Amazon Web Services that provides object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its e-commerce network.
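A minimal boto3 sketch of storing and retrieving an object; the bucket and key are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Write an object, then read it back.
s3.put_object(Bucket="my-bucket", Key="reports/summary.json",
              Body=b'{"status": "ok"}')

obj = s3.get_object(Bucket="my-bucket", Key="reports/summary.json")
print(obj["Body"].read())
```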

Amazon EC2

Amazon Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the Amazon Web Services (AWS) Cloud. Using Amazon EC2 eliminates your need to invest in hardware up front, so you can develop and deploy applications faster.
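A hedged boto3 sketch of launching a single instance; the AMI ID is a placeholder, since real AMI IDs are region-specific:

```python
import boto3

ec2 = boto3.client("ec2")

resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
print(resp["Instances"][0]["InstanceId"])
```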

Amazon ECR

Amazon Elastic Container Registry (Amazon ECR) is an AWS managed container image registry service that is secure, scalable, and reliable. Amazon ECR supports private repositories with resource-based permissions using AWS IAM.

Amazon SQS

What is Amazon SQS used for? Amazon SQS is a message queue service used by distributed applications to exchange messages through a polling model, and can be used to decouple sending and receiving components.

Amazon SNS

What is an Amazon SNS topic? An Amazon SNS topic is a logical access point that acts as a communication channel. A topic lets you group multiple endpoints (such as AWS Lambda, Amazon SQS, HTTP/S, or an email address).
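A minimal boto3 sketch of the topic-as-communication-channel idea; the topic name and email address are placeholders:

```python
import boto3

sns = boto3.client("sns")

# Create a topic and attach an email endpoint to it.
topic = sns.create_topic(Name="equipment-alerts")
sns.subscribe(TopicArn=topic["TopicArn"],
              Protocol="email",
              Endpoint="ops-team@example.com")      # placeholder address

# Every subscribed endpoint receives this message.
sns.publish(TopicArn=topic["TopicArn"],
            Subject="Anomaly detected",
            Message="Vibration on mixer-01 exceeded threshold.")
```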

**********************************
Model Development & AI ML Curation
**********************************

AI ML Curation

What is data curation? As defined by TechRepublic, data curation is “the art of maintaining the value of data.” It is the process of collecting, organizing, labeling, cleaning, enhancing, and preserving data for use.

What is AI curation? AI-based curation refers to the use of algorithms to process huge volumes of data, deciphering meaning and patterns. AI works by analyzing user data to make sense of user intent, thereby helping marketers address consumers better.

ML Pipeline

One definition of an ML pipeline is a means of automating the machine learning workflow by enabling data to be transformed and correlated into a model that can then be analyzed to achieve outputs. This type of ML pipeline makes the process of inputting data into the ML model fully automated.
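A minimal scikit-learn sketch of that idea, chaining a data transformation and a model so the whole workflow runs as one automated unit:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The pipeline automates the workflow: data is transformed, then fed
# into the model, all behind a single fit/predict interface.
pipeline = Pipeline([
    ("scale", StandardScaler()),                  # transform
    ("model", LogisticRegression(max_iter=200)),  # correlate into a model
])
pipeline.fit(X_train, y_train)    # one call runs the full workflow
print("accuracy:", pipeline.score(X_test, y_test))
```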

Development Workbench

Notebook

Model

Container

https://medium.com/@thejasbabu/docker-the-mysterious-black-box-338ee3139bed
https://www.youtube.com/watch?v=2_yOif1JlW0
https://appfleet.com/blog/reverse-engineer-docker-images-into-dockerfiles-with-dedockify/

Train

**********************************
Quality Check
**********************************
Monitor Quality Check on Data

https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-data-quality.html

Monitor Quality Check on Model

https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality.html

Monitor Bias Drift

https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-model-monitor-bias-drift.html

Monitor Feature Attribution Drift

https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-model-monitor-feature-attribution-drift.html
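As a rough illustration of the data-quality check from the first link, the SageMaker Python SDK can suggest a baseline from training data against which live traffic is later compared; the role ARN and S3 paths below are placeholders, and details should be verified against the linked docs:

```python
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

# Hedged sketch: suggest a data-quality baseline from training data.
monitor = DefaultModelMonitor(
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

monitor.suggest_baseline(
    baseline_dataset="s3://my-bucket/train/train.csv",
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-bucket/monitor/baseline/",
)
```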

**********************************
Storing of Models (On-prem & Cloud)
**********************************
Amazon EC2

Amazon ECR

Docker Hub

Model Artefacts

Istio

LinkerD

AWS App Mesh

EKS

Fargate

**********************************
Microservices
**********************************
Kubernetes cluster (Red Hat OpenShift / Minikube / any on-premise vendor)

EKS Microservices (cloud Kubernetes engine)

Apache Airflow

Inference Engine

EventBridge

S3 Model Output
**********************************
Prediction Models
**********************************
Anomaly Engine

Inference Analytics

Clustering

Forecasting (time series)

Deep learning
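A small scikit-learn sketch tying two of these together on synthetic sensor data, KMeans for clustering and IsolationForest as a toy anomaly engine:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.cluster import KMeans

# Synthetic two-feature sensor data with a few injected outliers.
rng = np.random.default_rng(0)
readings = rng.normal(loc=50.0, scale=5.0, size=(200, 2))
readings[:3] += 40                     # inject a few obvious outliers

clusters = KMeans(n_clusters=2, n_init=10).fit_predict(readings)
anomalies = IsolationForest(random_state=0).fit_predict(readings)  # -1 = outlier

print("flagged as anomalous:", int((anomalies == -1).sum()))
```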

************************************
Ontology
************************************

In metaphysics, ontology is the philosophical study of being, as well as related concepts such as existence, becoming, and reality. Ontology addresses questions of how entities are grouped into categories and which of these entities exist on the most fundamental level.

In computer science and information science, an ontology encompasses a representation, formal naming, and definition of the categories, properties, and relations between the concepts, data, and entities that substantiate one, many, or all domains of discourse. More simply, an ontology is a way of showing the properties of a subject area and how they are related, by defining a set of concepts and categories that represent the subject.

Databases utilize entity-relationship diagrams, which represent the logic of the database, whereas ontologies are expressed in languages that can describe logic.

The fundamental focus of an ontology is to specify and share meaning. The fundamental focus for a database schema is to describe data. A relational database schema has a single purpose: to structure a set of instances for efficient storage and querying.

What is an ontology database?

An ontology database is a basic relational database management system that models an ontology plus its instances. To reason over the transitive closure of instances in the subsumption hierarchy, for example, an ontology database can either unfold views at query time or propagate assertions using triggers at load time.
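A minimal sqlite3 sketch of that idea, computing the transitive closure of an is-a (subsumption) hierarchy at query time with a recursive CTE; the hierarchy itself is a made-up example:

```python
import sqlite3

# An "ontology database": the subsumption (is-a) hierarchy lives in a
# plain relational table, and the transitive closure is computed at
# query time with a recursive CTE (rather than views or triggers).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE is_a (child TEXT, parent TEXT);
    INSERT INTO is_a VALUES
        ('Robot', 'Machine'),
        ('WeldingRobot', 'Robot'),
        ('Machine', 'Asset');
""")

# All ancestors of WeldingRobot, however deep the hierarchy goes.
rows = db.execute("""
    WITH RECURSIVE ancestors(name) AS (
        SELECT parent FROM is_a WHERE child = 'WeldingRobot'
        UNION
        SELECT is_a.parent FROM is_a
        JOIN ancestors ON is_a.child = ancestors.name
    )
    SELECT name FROM ancestors;
""").fetchall()
print(rows)   # [('Robot',), ('Machine',), ('Asset',)] (order may vary)
```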


gRPC

gRPC is a cross-platform open source high performance Remote Procedure Call framework. gRPC was initially created by Google, which has used a single general-purpose RPC infrastructure called Stubby to connect the large number of microservices running within and across its data centers for over a decade.
Why SQS?

Using SQS, you can send, store, and receive messages between software components at any volume, without losing messages or requiring other services to be available. Get started with SQS in minutes using the AWS Management Console, Command Line Interface or SDK of your choice, and three simple commands.
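A minimal boto3 sketch of that send/receive/delete cycle; the queue name is a placeholder:

```python
import boto3

sqs = boto3.client("sqs")
queue_url = sqs.create_queue(QueueName="orders")["QueueUrl"]

# Producer side: enqueue a message.
sqs.send_message(QueueUrl=queue_url, MessageBody="order-42 created")

# Consumer side: poll for messages, process, then delete.
resp = sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=5)
for msg in resp.get("Messages", []):
    print(msg["Body"])
    # Delete after successful processing so the message is not redelivered.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```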


Digital Twin

https://aws.amazon.com/what-is/digital-twin/

What is digital twin technology?

A digital twin is a virtual model of a physical object. It spans the object's lifecycle and uses real-time data sent from sensors on the object to simulate the behavior and monitor operations. Digital twins can replicate many real-world items, from single pieces of equipment in a factory to full installations, such as wind turbines and even entire cities. Digital twin technology allows you to oversee the performance of an asset, identify potential faults, and make better-informed decisions about maintenance and lifecycle.

How does a digital twin work?
A digital twin works by digitally replicating a physical asset in the virtual environment, including its functionality, features, and behavior. A real-time digital representation of the asset is created using smart sensors that collect data from the product. You can use the representation across the lifecycle of an asset, from initial product testing to real-world operating and decommissioning.

Digital twins use several technologies to provide a digital model of an asset. They include the following.

Internet of Things: the Internet of Things refers to a collective network of connected devices and the technology that facilitates communication between devices and the cloud, as well as between the devices themselves. Thanks to the advent of inexpensive computer chips and high-bandwidth telecommunication, we now have billions of devices connected to the internet. Digital twins rely on IoT sensor data to transmit information from the real-world object into the digital-world object. The data feeds into a software platform or dashboard where you can see it updating in real time.
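As a rough sketch, a sensor reading could be published to an MQTT topic through the AWS IoT data plane with boto3; the topic and payload below are hypothetical, and in practice you may need to pass your account's IoT endpoint URL:

```python
import json
import boto3

# Hedged sketch: an IoT sensor reading published to an MQTT topic,
# where a digital twin (or dashboard) can consume it. In practice,
# boto3.client("iot-data", endpoint_url=...) may be required.
iot = boto3.client("iot-data")

reading = {"device": "mixer-01", "temperature": 92.5, "vibration": 0.7}
iot.publish(
    topic="factory/line-a/mixer-01/telemetry",   # placeholder topic
    qos=1,
    payload=json.dumps(reading).encode(),
)
```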

Artificial intelligence: artificial intelligence (AI) is the field of computer science dedicated to solving cognitive problems commonly associated with human intelligence, such as learning, problem solving, and pattern recognition. Machine learning (ML) is an AI technique that develops statistical models and algorithms so that computer systems perform tasks without explicit instructions, relying on patterns and inference instead. Digital twin technology uses machine learning algorithms to process the large quantities of sensor data and identify data patterns. Artificial intelligence and machine learning (AI/ML) provide data insights about performance optimization, maintenance, emissions outputs, and efficiencies.

Digital twins compared to simulations: digital twins and simulations both rely on virtual models, but some key differences exist. Simulations are typically used for design and, in certain cases, offline optimization. Designers input changes to simulations to observe what-if scenarios. Digital twins, on the other hand, are complex virtual environments that you can interact with and update in real time. They are bigger in scale and application.

For example, consider a car simulation. A new driver can get an immersive training experience, learn the operations of various car parts, and face different real-world scenarios while virtually driving. However, the scenarios are not linked to an actual physical car. A digital twin of the car is linked to the physical vehicle and knows everything about the actual car, such as vital performance stats, the parts replaced in the past, potential issues as observed by the sensors, previous service records, and more.

What is Calico?

Calico is an open source networking and network security solution for containers, virtual machines, and native host-based workloads. Calico supports a broad range of platforms including Kubernetes, OpenShift, Mirantis Kubernetes Engine (MKE), OpenStack, and bare metal services.

Whether you opt to use Calico's eBPF data plane or Linux’s standard networking pipeline, Calico delivers blazing fast performance with true cloud-native scalability. Calico provides developers and cluster operators with a consistent experience and set of capabilities whether running in public cloud or on-prem, on a single node, or across a multi-thousand node cluster.

What is Rancher?


Rancher is a complete software stack for teams adopting containers. It addresses the operational and security challenges of managing multiple Kubernetes clusters, while providing DevOps teams with integrated tools for running containerized workloads.


Chaos and Resilience Engineering


Chaos engineering is the practice of subjecting applications and services to real world stresses and failures in order to build and validate resilience to unreliable conditions and missing dependencies.

A predictable system is a myth. System failures are inevitable but you can be prepared for failures by building resilient systems. We explore chaos engineering as a way to do exactly that.

What is chaos engineering?


Chaos engineering or chaos testing is a Site Reliability Engineering (SRE) technique that simulates unexpected system failures to test a system's behavior and recovery plan. Based on what is learned from these tests, organizations design interventions and upgrades to strengthen their technology.


Why do we need chaos engineering?

Let’s look at an instance where one of our e-commerce customers saw their applications terminating one after another during a Black Friday sale, yet there was no CPU or memory spike. Ultimately, it turned out that writing logs to a file inside the container had caused it to run out of disk space.

In the microservices world, it is not uncommon for one slow service to drag the latency up for the whole chain of systems.

In fact, today’s world of microservice architecture and ecosystems has moved us from a single point of failure in monolith systems to multi-point failures in distributed systems. To create scalable, highly available and reliable systems we need newer methods of testing.

How does chaos engineering work?

Chaos engineering is like a vaccine. A vaccine is usually a mild form of a disease or virus injected into the body so it learns to fight the actual disease. Chaos engineering puts the system and infrastructure under immense stress scenarios to prepare for better availability, stability, and resilience.

The most common problems that every application suffers are CPU or memory spikes, network latency, time changes during daylight saving time, reduced disk space, and application crashes. So, the first step is to make the infrastructure resilient enough to overcome these disasters at the application level.

Building resiliency with chaos engineering

There are four major steps when running any chaos test:

1. Define a steady state: before running chaos tests, define what an ideal system would look like. For instance, with a web application, the health check endpoint should return a 200 success response.

2. Introduce chaos: simulate a failure, such as a network bottleneck, disk fill, or application crash.

3. Verify the steady state: check that the system works as defined in step 1. Also verify that the corresponding alerts were triggered via email, SMS, Slack message, etc.

4. Roll back the chaos: the most crucial step, especially when running in production, is to roll back or stop the chaos that was introduced and ensure that the system returns to normal.
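A minimal Python sketch of those four steps; the health endpoint is hypothetical, and the fault injection is a placeholder to be replaced with a real experiment:

```python
import urllib.request

HEALTH_URL = "http://localhost:8080/health"   # hypothetical health endpoint

def steady_state():
    # Steps 1 and 3: the health check should return a 200 response.
    try:
        return urllib.request.urlopen(HEALTH_URL, timeout=2).status == 200
    except OSError:
        return False

def introduce_chaos():
    # Step 2: placeholder for a real fault injection (disk fill,
    # network bottleneck, killing a dependency, ...).
    print("injecting fault...")

def roll_back():
    # Step 4: stop the chaos so the system can return to normal.
    print("removing fault...")

assert steady_state(), "system must be healthy before the experiment"
introduce_chaos()
survived = steady_state()   # did the system hold its steady state?
roll_back()
print("resilient" if survived else "weakness found: fix it and rerun")
```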

If the application passes the test, that’s evidence the system is resilient. However, if the application fails the test, we’d recommend following the red-green testing cycle: once the weakness has been identified, fix it and rerun the test.


How to start chaos testing?

If teams have just begun adopting chaos engineering, we’d suggest using a simple shell script. However, it’s important to run a steady-state hypothesis with continuous monitoring in parallel. As the chaos testing practice matures, we'd recommend using one of the many open-source or commercial tools.

Gremlin is leading this space and covers most of the use cases.

Litmus chaos toolkit is Kubernetes-native, designed for k8s-based applications. You can read more about running a chaos test using this tool here.

Istio service mesh is great for network-related chaos such as network delays, errors, etc.

AWS Fault Injection Simulator is a toolkit that helps when conducting chaos experiments on applications deployed in AWS.

Ideally, chaos testing is best run in production. However, we recommend that you learn in a lower environment first and then conduct controlled experiments in production later. In one of Thoughtworks’ client projects, it took the team six months to learn and practice in a lower environment before everyone (including clients) had the confidence to run chaos tests in production.

Once teams get here, they can also automate chaos testing as scheduled jobs in deployment pipelines. Schedule it to run every week to verify that new changes to the software still meet availability and resiliency benchmarks, and let commits that pass these checks progress further in the pipeline toward production deployment.

What outcomes does chaos engineering deliver?

Increased availability and decreased mean time to resolution (MTTR) are the two most common benefits enterprises observe. Teams who frequently run chaos engineering experiments enjoy more than 99.9% availability. 23% of the teams reduced their MTTR to under one hour and 60% to under 12 hours with chaos engineering. For more on our experiences with chaos engineering, listen to our talk here.
