Feature: Added creating of Directed Acyclic Graphs (DAG) to existing DAG Driver #433

Merged
merged 1 commit into from
Dec 23, 2024

Conversation

NotAnAddictz
Contributor

Summary

Previously, PR #383 added sequential invocation of functions. This commit extends that feature: every function now acts as an entry point, a DAG is created for each entry function based on the width and depth distributions in data/traces/example/dag_structure.xlsx, and each DAG is invoked at the frequency of its entry function.

Implementation Notes ⚒️

  • Added various helper functions to create and manage the DAG structure.
  • Added functionality to download the sampled_150 folder containing the folders for each group of functions if required.
  • Tweaked the generation of specifications for each function to cover the highest possible invocation frequency per minute.
  • Wrapped the functions into Nodes to facilitate DAG generation.
  • Added parameter entriesWritten to functionsDriver to ensure all invocations are written to the output file.
  • Added a retry limit of 1 for each function in the DAG.
  • No DAG structure contains duplicate functions; each DAG is populated with functions chosen at random from the function list.
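
The node wrapping and the no-duplicates rule in the notes above can be illustrated with a minimal Go sketch. It is not the code from this PR: `Node`, `BuildDAG`, and `Functions` are hypothetical names, and the shape (one entry node fanning out into `width` parallel chains, `depth` levels in total) is an assumption chosen for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// Node wraps one function name (hypothetical type, not the loader's).
type Node struct {
	Function string
	Next     *Node   // next node on the same branch
	Branches []*Node // heads of the branches forking off this node
}

// BuildDAG returns an entry node followed by `width` parallel chains of
// depth-1 nodes each. Functions are drawn at random from pool without
// repetition. When depth is 1 the DAG is the entry node alone.
func BuildDAG(entry string, pool []string, width, depth int, seed int64) (*Node, error) {
	if width < 1 || depth < 1 {
		return nil, errors.New("width and depth must be at least 1")
	}
	// Drop the entry function and any repeated names from the pool.
	seen := map[string]bool{entry: true}
	candidates := make([]string, 0, len(pool))
	for _, f := range pool {
		if !seen[f] {
			seen[f] = true
			candidates = append(candidates, f)
		}
	}
	needed := width * (depth - 1)
	if needed > len(candidates) {
		return nil, fmt.Errorf("need %d distinct functions, only %d available", needed, len(candidates))
	}
	rng := rand.New(rand.NewSource(seed))
	rng.Shuffle(len(candidates), func(i, j int) {
		candidates[i], candidates[j] = candidates[j], candidates[i]
	})
	root := &Node{Function: entry}
	next := 0
	for b := 0; depth > 1 && b < width; b++ {
		var head, tail *Node
		for d := 1; d < depth; d++ {
			n := &Node{Function: candidates[next]}
			next++
			if head == nil {
				head = n
			} else {
				tail.Next = n
			}
			tail = n
		}
		root.Branches = append(root.Branches, head)
	}
	return root, nil
}

// Functions lists every function name reachable from root.
func Functions(root *Node) []string {
	var out []string
	for n := root; n != nil; n = n.Next {
		out = append(out, n.Function)
		for _, b := range n.Branches {
			out = append(out, Functions(b)...)
		}
	}
	return out
}

func main() {
	pool := []string{"f0", "f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8", "f9"}
	root, err := BuildDAG("f0", pool, 3, 3, 1)
	if err != nil {
		fmt.Println("generation failed:", err)
		return
	}
	fmt.Println(Functions(root))
}
```

Returning an error when the pool is too small, rather than reusing functions, keeps the no-duplicates guarantee explicit in this sketch.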

External Dependencies 🍀

  • N/A

Breaking API Changes ⚠️

  • N/A

Simply specify none (N/A) if not applicable.

@NotAnAddictz NotAnAddictz force-pushed the main branch 2 times, most recently from d9e9ef0 to 9161a77 Compare May 22, 2024 16:25
@cvetkovic
Contributor

@leokondrashov: Is this still relevant, or should we close this PR?

@wanghanchengchn
Contributor

@leokondrashov: Is this still relevant, or should we close this PR?

I apologize for the late reply. I will review this pull request!

@NotAnAddictz NotAnAddictz force-pushed the main branch 2 times, most recently from d75890a to fca5d95 Compare July 19, 2024 12:10
@wanghanchengchn
Contributor

Dear @cvetkovic,

The current version looks good to me. However, due to my limited experience in reviewing pull requests, I would greatly appreciate it if you could provide us with some feedback when you have a moment. Thank you very much!

cmd/loader.go (outdated, resolved)
cmd/loader.go (outdated, resolved)
data/traces/example/dag_structure.xlsx (outdated, resolved)
pkg/config/parser.go (outdated, resolved)
pkg/driver/trace_driver.go (outdated, resolved)
cvetkovic
cvetkovic previously approved these changes Oct 4, 2024
Contributor

@cvetkovic cvetkovic left a comment

LGTM. @leokondrashov If good, you can proceed with merging.

@cvetkovic
Contributor

@wanghanchengchn Just fix the errors the linter reports.

@wanghanchengchn
Contributor

Thank you! Dear @NotAnAddictz, could you please address the failed checks? Additionally, this branch is out-of-date with the base branch. Kindly rebase on the main branch and verify that all checks are passing. Thank you!

Contributor

@leokondrashov leokondrashov left a comment

Looks good to me codewise. But I'd like to have documentation for the feature in the repo, not only in the PR description.

I think the linter problem is not caused by you, but it's easy to fix by changing Fatalf to Fatal.

cmd/config.json (outdated, resolved)
docs/configuration.md (outdated, resolved)
docs/configuration.md (outdated, resolved)
Contributor

@leokondrashov leokondrashov left a comment

The code looks good to me. I have a couple of suggestions for tests: failure to generate, correct depth and width of bigger DAGs with multiple branches, reading and generating the sizes from trace file, creation of several DAGs.

@NotAnAddictz NotAnAddictz force-pushed the main branch 3 times, most recently from 734925b to 7729de4 Compare November 26, 2024 13:15
@NotAnAddictz
Contributor Author

Thanks for the suggestions! I have added tests for generation from a trace and for generating multiple bigger DAGs (width = 10, depth = 5).

Contributor

@leokondrashov leokondrashov left a comment

Please fix a couple of minor things.

Comment on lines 656 to 664
if d.Configuration.LoaderConfiguration.AsyncMode {
	sleepFor := time.Duration(d.Configuration.LoaderConfiguration.AsyncWaitToCollectMin) * time.Minute

	log.Infof("Sleeping for %v...", sleepFor)
	time.Sleep(sleepFor)

	d.writeAsyncRecordsToLog(globalMetricsCollector)
}

Contributor

Duplicated lines, look above

node = node.Next()
}
atomic.AddInt64(metadata.FunctionsInvoked, numberOfFunctionsInvoked)
if success {
atomic.AddInt64(metadata.SuccessCount, 1)
Contributor

This now counts successful branch execution, not DAG or functions. I don't think that would be the correct behaviour.

Contributor Author

@NotAnAddictz NotAnAddictz Nov 28, 2024

Noted, I have changed it to count successfully invoked functions.
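
The difference between counting successful branches and counting successful functions can be shown with a small Go sketch. It is an illustration under assumptions, not the PR's invoker: `Step`, `Counters`, `InvokeFunc`, `RunBranch`, and `RunBranches` are hypothetical names, and the retry limit of 1 from the PR description is passed in as a parameter.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Step is one function on a branch (hypothetical type).
type Step struct {
	Name string
	Next *Step
}

// Counters are shared by all branches and updated atomically.
type Counters struct {
	SuccessCount int64 // functions that succeeded
	FailureCount int64 // functions that failed after all retries
}

// InvokeFunc stands in for the real function invocation.
type InvokeFunc func(name string) bool

// RunBranch walks one branch. Each function gets 1+retries attempts and
// is counted once, as a success or as a failure. The branch stops at the
// first function that exhausts its attempts.
func RunBranch(head *Step, invoke InvokeFunc, retries int, c *Counters) bool {
	for s := head; s != nil; s = s.Next {
		ok := false
		for attempt := 0; attempt <= retries && !ok; attempt++ {
			ok = invoke(s.Name)
		}
		if !ok {
			atomic.AddInt64(&c.FailureCount, 1)
			return false
		}
		atomic.AddInt64(&c.SuccessCount, 1)
	}
	return true
}

// RunBranches runs every branch concurrently and waits for all of them.
func RunBranches(heads []*Step, invoke InvokeFunc, retries int, c *Counters) {
	var wg sync.WaitGroup
	for _, h := range heads {
		wg.Add(1)
		go func(h *Step) {
			defer wg.Done()
			RunBranch(h, invoke, retries, c)
		}(h)
	}
	wg.Wait()
}

func main() {
	// Two branches: a -> b and c -> d, where d always fails.
	heads := []*Step{
		{Name: "a", Next: &Step{Name: "b"}},
		{Name: "c", Next: &Step{Name: "d"}},
	}
	var c Counters
	RunBranches(heads, func(name string) bool { return name != "d" }, 1, &c)
	fmt.Printf("successes=%d failures=%d\n",
		atomic.LoadInt64(&c.SuccessCount), atomic.LoadInt64(&c.FailureCount))
	// prints "successes=3 failures=1"
}
```

A per-branch counter would report one success here (branch a -> b), while the per-function counter reports three, which is why a test condition on the success count is worth having alongside the one on the number of functions invoked.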

RecordOutputChannel: invocationRecordOutputChannel,
AnnounceDoneWG: announceDone,
}

announceDone.Add(1)
testDriver.invokeFunction(metadata)
if !(successCount == 1 && failureCount == 0) {
announceDone.Wait()
if !(functionsInvoked == 3 && failureCount == 0) {
Contributor

Consider adding the successCount test as well. I think it might be handled wrongly (see previous comment).

@@ -135,7 +136,7 @@ func TestInvokeFunctionFromDriver(t *testing.T) {

testDriver := createTestDriver()
var failureCountByMinute = make([]int64, testDriver.Configuration.TraceDuration)

var functionsInvoked int64
Contributor

Please also add it to the test conditions.

@@ -33,7 +33,10 @@
| MetricScrapingPeriodSeconds | int | > 0 | 15 | Period of Prometheus metrics scrapping |
| GRPCConnectionTimeoutSeconds | int | > 0 | 60 | Timeout for establishing a gRPC connection |
| GRPCFunctionTimeoutSeconds | int | > 0 | 90 | Maximum time given to function to execute[^5] |
| DAGMode | bool | true/false | false | Sequential invocation of all functions one after another |
| DAGMode | bool | true/false | false | Generates DAG workflows iteratively with functions in TracePath [^7]. Frequency and IAT of the DAG follows their respective entry function, while Duration and Memory of each function will follow their respective values in TracePath. |
| EnableDAGDataset | bool | true/false | true | Generate width and depth from data/traces/example/dag_structure.csv[^8] |
Contributor

Update this to say that generation takes the .csv file from the trace path, not from that specific file. That file is a sample, not data that should be used in real experiments.

@NotAnAddictz NotAnAddictz force-pushed the main branch 2 times, most recently from 51b9959 to 2dbc9a7 Compare November 28, 2024 09:32
Contributor

@leokondrashov leokondrashov left a comment

LGTM now. Thank you, YiShen, for patiently following all my suggestions.

@cvetkovic We would need to plan the merge around the merge of RPS mode. Can you assist in that?

@NotAnAddictz Unfortunately, this might require another rebase of the trace driver. But it should be pretty small because the RPS mode mostly changes the functionsDriver, while yours is mostly in the invoker part.

@NotAnAddictz NotAnAddictz force-pushed the main branch 3 times, most recently from 3959500 to 18c7792 Compare December 9, 2024 06:22
leokondrashov
leokondrashov previously approved these changes Dec 10, 2024
Contributor

@leokondrashov leokondrashov left a comment

Thanks for the timely update. It looks good to me, @cvetkovic please merge it if you don't have any further comments.

leokondrashov

This comment was marked as resolved.

@@ -0,0 +1,6 @@
Width,Width - Percentile,Depth,Depth - Percentile,Total Nodes,Total Nodes
Contributor

Please change the column names to Width,WidthPercentile,Depth,DepthPercentile,TotalNodes,TotalNodes. We might have parsing problems on different systems.
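
One way to make the parsing less sensitive to header details is to look columns up by name rather than by position. The Go sketch below does that with the standard `encoding/csv` package. It rests on several assumptions: `Bucket`, `ParseDistribution`, and `Sample` are hypothetical names, the percentile columns are assumed to hold cumulative fractions in [0, 1], and the rows in `main` are made-up illustration values, not data from this PR.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// Bucket pairs a value (a width or a depth) with its cumulative share.
type Bucket struct {
	Value      int
	Cumulative float64
}

// ParseDistribution reads CSV text and extracts one value column and its
// percentile column, both located by header name.
func ParseDistribution(data, valueCol, percentileCol string) ([]Bucket, error) {
	rows, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(rows) < 2 {
		return nil, fmt.Errorf("expected a header and at least one data row")
	}
	vi, pi := -1, -1
	for i, name := range rows[0] {
		name = strings.TrimSpace(name)
		if name == valueCol && vi < 0 {
			vi = i
		}
		if name == percentileCol && pi < 0 {
			pi = i
		}
	}
	if vi < 0 || pi < 0 {
		return nil, fmt.Errorf("columns %q and %q not both present", valueCol, percentileCol)
	}
	buckets := make([]Bucket, 0, len(rows)-1)
	for _, row := range rows[1:] {
		v, err := strconv.Atoi(strings.TrimSpace(row[vi]))
		if err != nil {
			return nil, err
		}
		p, err := strconv.ParseFloat(strings.TrimSpace(row[pi]), 64)
		if err != nil {
			return nil, err
		}
		buckets = append(buckets, Bucket{Value: v, Cumulative: p})
	}
	return buckets, nil
}

// Sample maps u in [0, 1] to the first bucket whose cumulative share
// covers it. buckets must be non-empty, as ParseDistribution guarantees.
func Sample(buckets []Bucket, u float64) int {
	for _, b := range buckets {
		if u <= b.Cumulative {
			return b.Value
		}
	}
	return buckets[len(buckets)-1].Value
}

func main() {
	// Made-up rows for illustration only.
	const data = "Width,WidthPercentile,Depth,DepthPercentile\n" +
		"1,0.5,1,0.6\n" +
		"2,0.9,2,0.9\n" +
		"4,1.0,3,1.0\n"
	widths, err := ParseDistribution(data, "Width", "WidthPercentile")
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println(Sample(widths, 0.4), Sample(widths, 0.7), Sample(widths, 0.95))
}
```

Looking up the first matching header also sidesteps the repeated `TotalNodes` name, since this sketch never reads that column.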

@cvetkovic
Contributor

@leokondrashov: Just a minor comment. After that we are ready to merge.

@leokondrashov leokondrashov merged commit 06e0651 into vhive-serverless:main Dec 23, 2024
13 checks passed