To create a centralized database that stores and manages all contact information for a college, ensuring secure and seamless ingestion of data via APIs and SFTP, automated processing, and efficient storage in an Azure SQL Database.
- College Administrators: Upload or update contact information via APIs or SFTP.
- Integration System: Handles data ingestion, processing, and storage.
- End Users: Faculty, staff, and students querying contact information.
- Centralization: Consolidates all contact information into a single source of truth.
- Flexibility: Supports multiple data ingestion methods (API and SFTP).
- Automation: Reduces manual effort with automated triggers and workflows.
- Scalability: Easily scales to handle increasing volumes of contact data.
- Security: Ensures secure access to APIs and SFTP endpoints using API keys.
- Real-time Reporting: Enables real-time updates and analysis of contact information.
graph TD
A[College Administrators] -->|API Upload| B[Azure Function App]
A -->|SFTP Upload| C[Azure Storage Account]
C -->|File Upload Event| D[Azure Event Grid]
B -->|Data Ingested| E[Azure Service Bus Queue]
D --> E
E -->|Message Trigger| F[Azure Function App]
F -->|Data Transformation| G[Azure SQL Database]
G -->|Query| H[End Users/Applications]
G -->|Data Visualization| I[Power BI]
- An administrator uploads a CSV file containing new student contacts via SFTP.
- Azure Storage triggers an Event Grid notification when the file is uploaded.
- Event Grid sends a message to the Service Bus queue.
- The Function App processes the file, validates the data, and inserts it into the SQL Database.
- Faculty members access the updated contact list through a web portal or an API integrated with the Azure SQL Database.
This solution ensures a robust, scalable, and automated system for managing college contact information effectively.
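As a rough illustration of the query side of this flow, the sketch below shows an HTTP-triggered Azure Function that reads contacts from the Azure SQL Database. The `Contacts` table, its columns, and the route are hypothetical; only the `DatabaseConnection` setting name comes from the configuration table later in this document.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Data.SqlClient;

public static class GetContacts
{
    // Illustrative sketch: returns contact rows as JSON.
    // Assumes a "Contacts" table with FirstName/LastName/Email columns (hypothetical)
    // and the "DatabaseConnection" app setting described later in this document.
    [FunctionName("GetContacts")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "get", Route = "contacts")] HttpRequest req)
    {
        var connectionString = Environment.GetEnvironmentVariable("DatabaseConnection");
        var contacts = new List<Dictionary<string, object>>();

        using var connection = new SqlConnection(connectionString);
        await connection.OpenAsync();

        using var command = new SqlCommand(
            "SELECT TOP (100) FirstName, LastName, Email FROM Contacts", connection);
        using var reader = await command.ExecuteReaderAsync();

        while (await reader.ReadAsync())
        {
            contacts.Add(new Dictionary<string, object>
            {
                ["firstName"] = reader["FirstName"],
                ["lastName"] = reader["LastName"],
                ["email"] = reader["Email"]
            });
        }

        return new OkObjectResult(contacts);
    }
}
```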
Here's a comparison table highlighting the key features, strengths, and use cases of Azure Data Factory (ADF), Azure Databricks, Azure Synapse Analytics, and a Custom Data Ingestion Service:
Feature | Azure Data Factory (ADF) | Azure Databricks | Azure Synapse Analytics | Custom Data Ingestion Service |
---|---|---|---|---|
Purpose | Data integration, ETL, and data orchestration | Big data analytics, data engineering, and machine learning | Analytics, data warehousing, and big data processing | Data integration, ETL, and data orchestration |
Key Components | Pipelines, Dataflows, Linked Services, Datasets | Workspaces, Notebooks, Clusters, Jobs | Dedicated SQL Pool, Serverless SQL Pool, Spark Pool, Data Integration | Dedicated SQL Pool, Serverless SQL, Workflows, DataFlows, Datasets, Linked Services (API) |
Data Integration | Extensive connectors for various data sources | Built-in integration with many data sources via notebooks | Integrated data ingestion and transformation | Connectors to Azure Table Storage, Azure SQL |
Data Processing | ETL/ELT processing using dataflows and pipelines | Advanced data processing and analytics with Apache Spark | Unified analytics, SQL-based data processing, Spark, and pipelines | ETL/ELT processing using orchestration workflows (JSON) and pipelines |
Programming Languages | JSON for pipeline definitions, Dataflow expressions | Scala, Python, R, SQL | T-SQL, Python, Scala, .NET | T-SQL, .NET |
Scalability | Automatically scalable | Highly scalable with auto-scaling clusters | Scalable from small to very large workloads | Scalable from small to very large workloads |
Compute Management | Managed compute for data movement and transformation | Customizable and managed Spark clusters | Managed pools for SQL and Spark, auto-scaling | Managed pools for Azure functions and SQL, auto-scaling |
Integration with Azure Services | Deep integration with Azure services like Azure SQL, Blob Storage | Strong integration with Azure storage, SQL, and AI services | Comprehensive integration with Azure ecosystem | Deep integration with Azure services like Azure SQL, Blob Storage |
Data Movement | Copy data activity for moving data across sources | Supports data movement through Spark jobs | Integrated data movement and transformation | Copy, move, slice, and transform data across Azure resources |
Orchestration | Advanced scheduling, event-based triggers, and monitoring | Workflow management via jobs and notebooks | Unified data and pipeline orchestration | Event-based triggers, monitoring, and workflow management |
Machine Learning | Limited ML capabilities | Advanced ML with integrated MLflow and Spark MLlib | Integrated ML capabilities with Synapse Studio | Integrated ML capabilities with C# and .NET |
User Interface | GUI for pipeline design and monitoring | Notebooks for interactive development and visualization | Unified web-based interface for SQL, Spark, and pipelines | Web-based interface for SQL |
Security | Built-in security features, VNET support, managed identity | Secure workspace, role-based access control (RBAC) | Comprehensive security features, RBAC, encryption | Built-in security features, VNET support, managed identity, RBAC |
Use Cases | ETL/ELT processes, data migration, data integration | Data engineering, big data analytics, machine learning workflows | Advanced analytics, data warehousing, big data processing, and data integration | ETL/ELT processes, data migration, data integration |
Cost Structure | Pay-as-you-go based on pipeline runs, data movement | Pay-as-you-go based on compute and storage usage | Pay-as-you-go for different pools, storage, and compute | Pay-as-you-go for different dedicated pools, storage, and compute |
Set Up Steps
Creating a serverless API using Azure that leverages Service Bus to communicate with an SQL Database involves several steps. Here's a high-level overview of how you can set this up:
- Set Up Azure SQL Database:
  - Create an Azure SQL Database instance.
  - Set up the necessary tables and schemas you'll need for your application.
- Create Azure Service Bus:
  - Set up an Azure Service Bus namespace.
  - Within the namespace, create a queue or topic (based on your requirement).
- Deploy Serverless API using Azure Functions:
  - Create a new Azure Function App.
  - Develop an HTTP-triggered function that will act as your API endpoint (see the first sketch after these steps).
  - In this function, when data is received, send a message to the Service Bus queue or topic.
- Deploy Service Bus Triggered Functions:
  - Create another Azure Function that is triggered by the Service Bus queue or topic (see the second sketch after these steps).
  - This function reads the message from the Service Bus and processes it. Processing might involve parsing the message and inserting the data into the Azure SQL Database.
- Deploy a Timer Triggered Function:
  - Create another Azure Function that runs on a schedule and picks up files dropped in a container (see the third sketch after these steps).
  - This function streams in each file, reads it, and places the contents on the Service Bus topic.
- Implement Error Handling:
  - Ensure that you have error handling in place. If there is a failure in processing a message and inserting it into the database, log the error or move the message to a dead-letter queue.
- Secure Your Functions:
  - Ensure that your HTTP-triggered function (API endpoint) is secured, for example with Azure Active Directory or function keys.
- Optimize & Monitor:
  - Monitor the performance of your functions using Azure Monitor and Application Insights.
  - Optimize performance, scalability, and cost by adjusting the Function App's plan (Consumption Plan, Premium Plan, etc.) and tweaking the configurations.
- Deployment:
  - Deploy your functions to the Azure environment. You can use CI/CD pipelines with tools like Azure DevOps or GitHub Actions for automated deployments.
By following these steps, you'll have a serverless API in Azure that uses Service Bus as a mediator to process data and store it in an SQL Database. This architecture ensures decoupling between data ingestion and processing, adding a layer of resilience and scalability to your solution.
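The first sketch covers the HTTP-triggered ingestion function: it accepts a JSON payload and forwards it to the Service Bus. The `ServiceBusConnectionString` setting and the `request` topic match tables later in this document; the function name, route, and per-invocation client creation are illustrative assumptions, not the project's exact code.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class IngestContacts
{
    // Illustrative sketch: accepts a JSON payload and forwards it to a Service Bus topic.
    // "ServiceBusConnectionString" and the "request" topic name come from the tables
    // later in this document; everything else is an assumption.
    [FunctionName("IngestContacts")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post", Route = "contacts")] HttpRequest req)
    {
        string payload = await new StreamReader(req.Body).ReadToEndAsync();

        var connectionString = Environment.GetEnvironmentVariable("ServiceBusConnectionString");
        await using var client = new ServiceBusClient(connectionString);
        ServiceBusSender sender = client.CreateSender("request");

        // Forward the raw payload; downstream functions parse and validate it.
        await sender.SendMessageAsync(new ServiceBusMessage(payload));

        return new AcceptedResult();
    }
}
```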
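The second sketch shows a Service Bus-triggered function that writes to Azure SQL. It assumes each message carries a single contact as JSON and that a `Contacts` table with the columns shown exists; both are assumptions for illustration. Letting exceptions bubble up gives the retry and dead-letter behaviour mentioned in the error-handling step.

```csharp
using System;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Data.SqlClient;
using Microsoft.Extensions.Logging;

public static class ProcessContactMessage
{
    // Illustrative sketch: reads a contact from the "sqlmessage" subscription on the
    // "request" topic and inserts it into a hypothetical Contacts table.
    [FunctionName("ProcessContactMessage")]
    public static async Task Run(
        [ServiceBusTrigger("request", "sqlmessage", Connection = "ServiceBusConnectionString")] string message,
        ILogger log)
    {
        var contact = JsonSerializer.Deserialize<JsonElement>(message);

        var connectionString = Environment.GetEnvironmentVariable("DatabaseConnection");
        using var connection = new SqlConnection(connectionString);
        await connection.OpenAsync();

        using var command = new SqlCommand(
            "INSERT INTO Contacts (FirstName, LastName, Email) VALUES (@first, @last, @email)",
            connection);
        command.Parameters.AddWithValue("@first", contact.GetProperty("firstName").GetString());
        command.Parameters.AddWithValue("@last", contact.GetProperty("lastName").GetString());
        command.Parameters.AddWithValue("@email", contact.GetProperty("email").GetString());

        await command.ExecuteNonQueryAsync();
        log.LogInformation("Inserted contact for {Email}", contact.GetProperty("email").GetString());

        // Let exceptions bubble up: the Functions runtime retries the message and,
        // after the maximum delivery count, moves it to the dead-letter queue.
    }
}
```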
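The third sketch covers the timer-triggered function that forwards files from the pickup container to the Service Bus. The container, topic, and setting names come from the tables in this document; reading each blob as text and deleting it afterwards are simplifying assumptions.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Azure.Storage.Blobs;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ForwardPickupFiles
{
    // Illustrative sketch: on a schedule (the TimerInterval setting), scan the "pickup"
    // container, push each file's contents to the "request" topic, then delete the blob.
    [FunctionName("ForwardPickupFiles")]
    public static async Task Run(
        [TimerTrigger("%TimerInterval%")] TimerInfo timer,
        ILogger log)
    {
        var storageConnection = Environment.GetEnvironmentVariable("AzureWebJobsStorage");
        var serviceBusConnection = Environment.GetEnvironmentVariable("ServiceBusConnectionString");

        var container = new BlobContainerClient(storageConnection, "pickup");
        await using var client = new ServiceBusClient(serviceBusConnection);
        ServiceBusSender sender = client.CreateSender("request");

        await foreach (var blob in container.GetBlobsAsync())
        {
            var blobClient = container.GetBlobClient(blob.Name);
            var content = await blobClient.DownloadContentAsync();

            await sender.SendMessageAsync(new ServiceBusMessage(content.Value.Content.ToString()));
            await blobClient.DeleteAsync();

            log.LogInformation("Forwarded {BlobName} to Service Bus", blob.Name);
        }
    }
}
```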
Key | Value | Comment |
---|---|---|
AzureWebJobsStorage | [CONNECTION STRING] | Recommendation: store in Azure Key Vault. |
ConfigurationPath | [CONFIGURATION FOLDER PATH] | Folder is optional |
ApiKeyName | [API KEY NAME] | Passed in the request header; it is the file name of the configuration to use. |
AppName | [APPLICATION NAME] | This is the name of the Function App, used in log analytics |
StorageAcctName | [STORAGE ACCOUNT NAME] | Example "AzureWebJobsStorage" |
ServiceBusConnectionString | [SERVICE BUS CONNECTION STRING] | Example "ServiceBusConnectionString". Recommended to store in Key Vault. |
DatabaseConnection | [DATABASE CONNECTION STRING] | Example "DatabaseConnection". Recommended to store in Key Vault. |
TimerInterval | [TIMER_INTERVAL] | Example "0 */1 * * * *" (every 1 minute) |
Note: Review the configuration files in the Config folder and create a table to record the information.
Note: The configuration files are located in the Function App, in the Config folder.
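These values are standard Function App application settings. Inside the functions they can be read with `Environment.GetEnvironmentVariable` (as in the sketches above) or referenced from trigger bindings via `%Setting%` expressions, for example `%TimerInterval%`. A minimal sketch, assuming the setting names in the table:

```csharp
using System;

public static class AppSettings
{
    // Minimal sketch: one place to read the settings listed above.
    // For secrets (Service Bus, database), the table recommends Key Vault; with a
    // Key Vault reference in the app setting, this code does not change.
    public static string StorageConnection =>
        Environment.GetEnvironmentVariable("AzureWebJobsStorage");

    public static string ServiceBusConnection =>
        Environment.GetEnvironmentVariable("ServiceBusConnectionString");

    public static string DatabaseConnection =>
        Environment.GetEnvironmentVariable("DatabaseConnection");

    public static string TimerInterval =>
        Environment.GetEnvironmentVariable("TimerInterval");
}
```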
FileName | Description |
---|---|
99F77BEF300E4660A63A939ADD0BCF68.json | Upload File --> Parse CSV file --> Write batched files to storage |
43EFE991E8614CFB9EDECF1B0FDED37A.json | File Parser --> Parse CSV file; files received from SFTP use this process |
43EFE991E8614CFB9EDECF1B0FDED37D.json | Upload File --> Parse JSON/CSV directly into the NoSQL DB |
43EFE991E8614CFB9EDECF1B0FDED37C.json | Service Bus Trigger for SQL DB |
43EFE991E8614CFB9EDECF1B0FDED37F.json | Service Bus Trigger for NoSQL DB |
43EFE991E8614CFB9EDECF1B0FDED37E.json | Blob Trigger --> Send parsed/sharded file to Service Bus |
43EFE991E8614CFB9EDECF1B0FDED37B.json | Search result from NoSQL DB |
43EFE991E8614CFB9EDECF1B0FDED37G.json | Search SQL DB and return the result set |
3FB620B0E0FD4E8F93C9E4D839D00E1E.json | Copy File from SFTP into the pickup folder |
3FB620B0E0FD4E8F93C9E4D839D00E1F.json | Create a new Record in NoSQL Database |
CC244934898F46789734A9437B6F76CA.json | Encode Payload Request |
6B427917E36A4DA281D57F9A64AD9D55.json | Get reports from DB |
Create the following blob containers, tables, and file share in Azure Storage.
ContainerName | Description |
---|---|
config | Location for the configuration files |
pickup | These are files that are copied from the SFTP share and dropped in the pickup container |
processed | These are files that have been parsed and dropped in the processed container |
Table | Description |
---|---|
csvbatchfiles | Track the CSV parsed files |
training[YYYYMMDD] | NoSQL data store |
Share | Description |
---|---|
training[YYYYMMDD] | Create a share location for SFTP to drop files |
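The containers, tables, and share can be created in the portal, with Azure Storage Explorer, or from code. The sketch below uses the Azure Storage SDKs with the names from the tables above; the connection-string handling and the date-stamped `training[YYYYMMDD]` naming are assumptions.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Data.Tables;
using Azure.Storage.Blobs;
using Azure.Storage.Files.Shares;

public static class ProvisionStorage
{
    // Illustrative sketch: creates the blob containers, tables, and file share named
    // in the tables above. How the connection string is supplied is an assumption.
    public static async Task RunAsync(string connectionString)
    {
        var blobService = new BlobServiceClient(connectionString);
        foreach (var name in new[] { "config", "pickup", "processed" })
        {
            await blobService.GetBlobContainerClient(name).CreateIfNotExistsAsync();
        }

        var tableService = new TableServiceClient(connectionString);
        await tableService.CreateTableIfNotExistsAsync("csvbatchfiles");

        // The NoSQL table and SFTP share both follow the training[YYYYMMDD] pattern.
        string datedName = $"training{DateTime.UtcNow:yyyyMMdd}";
        await tableService.CreateTableIfNotExistsAsync(datedName);
        await new ShareClient(connectionString, datedName).CreateIfNotExistsAsync();
    }
}
```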
Topic/Subscription Name | Description |
---|---|
request | Create a Topic |
nosqlmessage | Create a Subscription |
sqlmessage | Create a Subscription |
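The topic and subscriptions can likewise be created in the portal or programmatically. A minimal sketch using the Service Bus administration client, with the names from the table above (connection-string handling is an assumption):

```csharp
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus.Administration;

public static class ProvisionServiceBus
{
    // Illustrative sketch: creates the "request" topic and its two subscriptions.
    public static async Task RunAsync(string serviceBusConnectionString)
    {
        var admin = new ServiceBusAdministrationClient(serviceBusConnectionString);

        if (!(await admin.TopicExistsAsync("request")).Value)
        {
            await admin.CreateTopicAsync("request");
        }

        foreach (var subscription in new[] { "nosqlmessage", "sqlmessage" })
        {
            if (!(await admin.SubscriptionExistsAsync("request", subscription)).Value)
            {
                await admin.CreateSubscriptionAsync("request", subscription);
            }
        }
    }
}
```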
Use the following link to create an Azure Container Instance (ACI) for SFTP:
https://docs.microsoft.com/en-us/samples/azure-samples/sftp-creation-template/sftp-on-azure
Kusto Queries used for Application Insights
search "ReceiveMessageFromServieBus"
| summarize count() by bin(timestamp, 1h)
| order by timestamp desc
customEvents
| where isnotnull(customDimensions.ProcessName)
//| where customDimensions.ProcessName == 'ReceiveMessageFromServieBus'
| summarize count() by bin(timestamp, 1m), Key = tostring(customDimensions.ProcessName)
| order by timestamp desc
| render columnchart
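These queries assume telemetry is written with a `ProcessName` custom dimension. If you emit your own events, the following is a hedged sketch of how that dimension could be attached; the `TelemetryClient` wiring and the event name are assumptions, while the dimension name and example value mirror the queries above.

```csharp
using System.Collections.Generic;
using Microsoft.ApplicationInsights;

public static class Telemetry
{
    // Illustrative sketch: emit a custom event with a ProcessName dimension so the
    // Kusto queries above can filter and group on customDimensions.ProcessName.
    public static void TrackProcess(TelemetryClient telemetryClient, string processName)
    {
        telemetryClient.TrackEvent(
            "PipelineStep",   // hypothetical event name
            new Dictionary<string, string>
            {
                // Example value: "ReceiveMessageFromServieBus" (spelling as used in the queries).
                ["ProcessName"] = processName
            });
    }
}
```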
products | links | Comments |
---|---|---|
Azure Getting Started | https://azure.microsoft.com/en-us/free/ | Create free account + $200 in Credit |
Sample Data Sets | https://www.kaggle.com/datasets | Useful site for pulling sample payload |
Azure Storage Explorer | https://azure.microsoft.com/en-us/features/storage-explorer/#features | Useful for viewing and querying data in Azure Table Storage |
Postman | https://www.postman.com/downloads/ | Postman supports the web or desktop version |
VS Code | https://visualstudio.microsoft.com/downloads/ | Required extensions: Azure Functions, Azure Account |
VS Studio Community Edition | https://visualstudio.microsoft.com/downloads/ | Recommended. Nice integration with Azure; the software is free. |
Liquid Template | https://liquidjs.com/tutorials/intro-to-liquid.html | |
"Configuration-driven development," refers to an approach in software development where the behavior and functionality of an application are primarily controlled through configuration files, rather than writing code. It focuses on separating application logic from configuration parameters, allowing developers to easily modify the behavior of the software without the need for extensive code changes. Xenhey.BPM.Core.Net6 offers a orchrestration pipeline using configuration to drive the execution of business logic, providing a tailored solution for each Line Of Business(LOB). The following are some benefits.
- Flexibility: By using configuration files, developers can easily tweak and adjust the application's behavior without modifying the underlying code. It allows for quick customization and adaptation to different scenarios or client requirements.
- Maintenance: Separating configuration from code simplifies maintenance processes. Updates and modifications can be made by adjusting the configuration files, reducing the risk of introducing bugs or breaking existing functionality. It also facilitates version control and collaboration, as changes to configuration can be tracked separately from code changes.
- Scalability: Configuration-driven development promotes scalability by enabling the application to handle different environments, configurations, or user preferences. It allows for seamless deployment across multiple instances or environments with minimal code changes.
- Testing and Validation: Configuration-driven development enhances testing and validation processes. Since configuration changes are isolated from the codebase, it becomes easier to verify the impact of different configurations on the application's behavior. It also facilitates A/B testing or experimentation by quickly switching between different configurations.
- Domain-Specific Customization: Configuration-driven development enables domain-specific customization by providing options and settings tailored to specific use cases. This empowers non-technical users or administrators to configure the application according to their specific requirements without the need for coding expertise.
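The following is a generic, illustrative sketch of the idea, not the Xenhey.BPM.Core.Net6 API (whose details are not shown in this document): the sequence of steps to execute is read from configuration, so behaviour changes by editing configuration rather than code.

```csharp
using System;
using System.Collections.Generic;

// Generic illustration of configuration-driven execution; step and interface names
// are hypothetical and do not reflect the Xenhey.BPM.Core.Net6 API.
public interface IPipelineStep
{
    string Name { get; }
    void Execute(IDictionary<string, object> context);
}

public class ConfigurationDrivenPipeline
{
    private readonly IDictionary<string, IPipelineStep> _registeredSteps;

    public ConfigurationDrivenPipeline(IDictionary<string, IPipelineStep> registeredSteps)
    {
        _registeredSteps = registeredSteps;
    }

    // stepNames would come from a configuration file (for example, one JSON file per
    // Line of Business), so each LOB runs its own sequence against the same code base.
    public void Run(IEnumerable<string> stepNames, IDictionary<string, object> context)
    {
        foreach (var name in stepNames)
        {
            if (!_registeredSteps.TryGetValue(name, out var step))
            {
                throw new InvalidOperationException($"No step registered for '{name}'.");
            }
            step.Execute(context);
        }
    }
}
```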
For questions related to this project, please reach out via email: [email protected]