Performance #3
Comments
Thanks for your interest! Are the repos you refer to publicly available anywhere to test?
However, I've been toying with an alternative idea (I'll need a new name 😠 ) that's slightly less stateless, using the commit-embedded-in-a-tree format that submodules use, that would have identical performance to ordinary git usage. (I'd be happy to explain further if you want to hear about it. This idea is inspired by my friend jwmerrill's assertion that the only problem with submodules is the tooling.) Really though, the main problem is that since switching employers, I don't have a practical application to try out my ideas. Hopefully MathQuill will become mature enough to start splitting out subprojects. |
Dear Laughinghan, I would very much like to hear a further explanation of your idea (if you don't mind that I am much newer to Git than you ;) ) |
So basically there's 2 fundamental data model problems with
My alternative idea (current name:
Note that arguably this is purely a workflow/tooling change on top of existing submodules, and could in theory be implemented entirely on top of This also has the advantage over subhistory in that the path to the subproject could change over time without destroying subproject history (although that's also addressable by sacrificing the "statelessness" of subhistory). Open questions about subcommit:
|
Hello! Thank you for your reply. I got myself some nice time to read over Christmas and I am currently studying two projects that build on subtrees: subhistory and https://github.com/ingydotnet/git-subrepo/ I will start from the end of your post. Open questions about subcommit: I am actually surprised that it is possible to use submodules at all, but the con is that lots of wrappers would need to be coded, as you wrote. My opinion from the user's point of view:
How do we manage remotes: doing cd is, I think, the main thing that can be ugly. Imagine many sub-thingies in one repository and the need to cd every time. 2 fundamental problems
|
Please allow me a 'Performance'-related question so that I understand the whole picture better. When I make a commit to an uber-project that affects a sub-project, and I have never used subhistory, is it correct that I need to run subhistory split (to push and initialize the subhistory repo remotely)? (This is actually a problem that is present in the 'subrepo' solution. For bigger repos it gets annoying how time-consuming a full filter-branch can get; imagine running subhistory split again and again.) |
@Darthholi: gosh, thanks for actually reading that huge tome I wrote! I kind of spit it all out without spending that much time editing it down (though I originally wrote way more and moved all that into #4, haha)
Wow, subrepo is still active and maintained! You know, I saw it on HN a while ago but its README was soooo loooong and I couldn't (and still can't) find a succinct explanation of what it actually does, like my README's "what does this actually do" section with analogy to how Re: subcommit
I know, right? People who want to squash can always
Well it's only every time you have to do something with a remote for a subproject, not other stuff with subprojects. I have difficulty imagining why remotes for a whole bunch of subprojects would all change at once, right? But you're right it's not great. Re: subhistory
Well I'm not proposing automatically transforming your commit messages by default, you'd enable it if you choose (and you'd have to configure whether the prefix is like
Yes. Split and push to the new GitHub remote and boom, you've initialized a subproject repo.
No, not assimilate: that's for taking commits to the subproject that weren't in the uber-project, and merging them into the uber-project. To push these new uber-project commits that affect the sub-project, you would split again. The new Subhistory shouldn't need to filter-branch all of history though (I mean, it does, but it shouldn't need to), and I can't think of any other reason split must get slower as history gets longer. We should be able to cache a mapping from uber-project commits to sub-project commits, and therefore only need to filter-branch the uber-project commits that are new since the last split. Fascinating that subrepo has this performance problem, still wish I understood what it actually does!
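To make the caching idea concrete, here is a minimal sketch in Python (not the shell that git-subhistory is written in) over a toy in-memory history rather than real Git objects. Everything in it is an assumption for illustration: the `Commit` tuple shape, the `split_incremental` name, and the fake hash used as a Sub commit ID. The only point is that a second split touches just the commits missing from the cache.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# Toy model (hypothetical): (commit_id, parent_id or None, {path: content}).
Commit = Tuple[str, Optional[str], Dict[str, str]]


def subtree(tree: Dict[str, str], prefix: str) -> Dict[str, str]:
    """Keep only the files under `prefix`, with the prefix stripped."""
    return {p[len(prefix):]: c for p, c in tree.items() if p.startswith(prefix)}


def split_incremental(history: List[Commit], prefix: str,
                      cache: Dict[str, Optional[str]]) -> int:
    """Map Main commits to Sub commit IDs, reusing `cache` from earlier runs.

    `history` must list parents before children. Returns the number of
    Main commits that actually had to be rewritten on this run.
    """
    by_id = {cid: (parent, tree) for cid, parent, tree in history}
    rewritten = 0
    for cid, parent, tree in history:
        if cid in cache:
            continue  # already mapped by a previous split
        rewritten += 1
        sub = subtree(tree, prefix)
        parent_sub_id = cache[parent] if parent else None
        parent_sub = subtree(by_id[parent][1], prefix) if parent else {}
        if sub == parent_sub:
            # Subproject untouched: reuse the parent's Sub commit.
            cache[cid] = parent_sub_id
        else:
            # Stand-in for creating a real Sub commit object.
            payload = repr((parent_sub_id, sorted(sub.items()))).encode()
            cache[cid] = hashlib.sha1(payload).hexdigest()
    return rewritten
```

With a persistent cache, the cost of a split scales with the number of new Main commits rather than with the total length of history; where that cache would live (a file, refs, notes) is left open here.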
Funnily enough, I'm actually feeling pretty good about the algorithm for creating synthetic merge commits that I ended up with in #4, it feels like it might actually be pretty much "optimal" in the sense that in every case where there's an obvious right answer to a human about how to merge the Main tree of the synthetic merge commit, this algorithm would pick it, and it would also never pick any obviously wrong answers, all while maintaining the guarantee that, outside the subproject, the Main tree of the synthetic commit is always identical to that of some real, non-synthetic Main commit. Oh by the way, I thought of a 3rd fundamental data model problem with This is actually encouraging to me because the way that occurs to me to deal with this is to sacrifice the statelessness and have a mapping from subproject name to path; as long as no commit changes both name and path at the same time, either can change throughout Main repo's history and And if we're sacrificing statelessness, the mapping commit messages problem becomes completely tractable! (I'm thinking Together that means that maybe instead of 3 fundamental data model problems, perhaps This is really exciting to me, I really want to work on this now but I still have the problem of not personally having a real use case! Do you, @Darthholi, have a use case? What about you, @sergeylukin? Care to elaborate on what your use case(s) are? I'd love to work on this with you guys, you could even take over ownership and I could just advise with my midcore git knowledge. |
Hello! So, subrepo is actually doing the same as subhistory, from a new-person-like-me point of view. It splits the history, but it does have some 'statefulness' by introducing .gitrepo config files... I actually have a feeling that the features/issues you talk about are somehow solved there. But there is also one common problem for both projects: both filter-branch the whole project every time (subrepo has an interesting thread, ingydotnet/git-subrepo#142, that is about the same issues as this thread :P ). Longer answer:
Re: subcommit #4 and other things ... and use cases and my excited talking. Actually I would be happiest if you would look at subrepo (I suggest the wiki and the code), take inspiration from it or work out how it is different. I am a bash noob (look at my branch), but if you come up with really clever thingies (sed on commit messages is cool) for subhistory (or, hell, even for subrepo, I don't care) so that all our ideas in this thread would work, I think I can code them with your help. But I still think that you deserve to be the owner. If subhistory got some nice .subhist file, where we could save remotes and the other thingies you talk about, and track files moving and maybe, just maybe, track more than one folder in one subhist (possibly regexed subfolders), I would be happy. I do have a use case! Two already git-ed projects, with some common folders where the code is exactly the same. |
I have solved my bash problems and added a shortcut for future splits: subhistory/start/$newbranch. Now I need your wisdom to foresee whether there can be any problems with this strategy. (Also all the things from the previous post :) ) |
Cool! I'll respond here first, then take a deeper look at your branch. Less familiarity with Git and shell scripting is fine, just having a collaborator and user is already supremely helpful.
Git actually requires shell scripts to be written in a cross-compatible shell syntax that is mostly a subset of POSIX (which I think is a superset of Bourne, but is much smaller than Bash): https://github.com/git/git/blob/master/Documentation/CodingGuidelines I should probably mention that in the README or something somewhere.
That's actually no big deal, #4 is actually mostly about the edge case of merge conflicts in synthetic merge commits, which can only happen due to criss-crossed merges:
Here
Hmmm, can you explain further? I'm pretty hesitant about this. If you split out this subhistory, do you get multiple commit histories, one per folder, or are the folders merged somehow?
So I went ahead and read up on
The downside is that it necessitates reinventing a custom merge base algorithm, and using rebase to fix the broken history when pushing or pulling. Rebasing like that elides merge commits, for one thing, although it might not be that often you intentionally want to push a commit history including merges. The custom merge base algorithm, at least as currently implemented, can only find one merge base even if there should be multiple due to criss-cross merges as mentioned above, and merging based on that has problems compared to merging using Git's default "recursive" merge strategy: http://blog.plasticscm.com/2012/01/more-on-recursive-merge-strategy.html I don't think using my ideas to upgrade By the way, do you have a link to where they're discussing not squashing commits when merging? As far as I can tell they don't have anything like assimilate, they do the rebasing with Sub commits not Main commits, and they would need either something like assimilate or
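To illustrate why a single merge base is not enough after criss-cross merges, here is a small Python sketch over a hand-written commit graph. The graph and the function names are made up for illustration and this is not subrepo's or Git's actual code; it just computes the same set that `git merge-base --all` reports, namely the common ancestors that are not ancestors of another common ancestor.

```python
from typing import Dict, List, Set


def ancestors(parents: Dict[str, List[str]], commit: str) -> Set[str]:
    """All commits reachable from `commit`, including `commit` itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_bases(parents: Dict[str, List[str]], a: str, b: str) -> Set[str]:
    """Common ancestors of a and b that no other common ancestor descends from."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}


# Criss-cross: each branch merged the other branch's previous tip.
criss_cross = {
    "O": [],
    "A1": ["O"], "B1": ["O"],
    "A2": ["A1", "B1"],   # branch A merges B1
    "B2": ["B1", "A1"],   # branch B merges A1
}
print(merge_bases(criss_cross, "A2", "B2"))  # both A1 and B1 are merge bases
```

An algorithm that returns only one of `A1` or `B1` has to pick arbitrarily, whereas the recursive strategy first merges all the bases into a virtual ancestor and uses that.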
Oh my! That's definitely unacceptable.
Hmmm, my main concern would be what if the current branch is reset to not a fast-forward, but I'll try to understand your code and get back to you. |
Hi!
Cool! Clever!
Let's say that my library is defined not only by a subfolder but by a name scheme (/subfolder/* -> /sub * folder/ *). One superproject just needs to use the naming convention, but the library is essentially 2 subfolders.
Okay, I am happy that you see it! I saw only the shallow similarity. The squashing/non-squashing discussion is here: "regarding the squash/no-squash it might be goo to have that optional on a subrepo basis." - ingydotnet/git-subrepo#142
I tried to add some checks for whether we can do it all faster. If you see any cases where we cannot, please tell me :)
Actually now it is getting even more interesting! I need to share code in such a way that in repo A the subfolder has UTF-like encoding in .dfm files and in repo B the subfolder has not-utf encoding (c++ builder if you guessed). (sry for the absence of commas in this text I seem to have a broken keyboard) |
No, not unit test, I mean like in the Main repo outside the Sub folder. For example (this is upside-down from the criss-crossed merges example I gave above,
When creating the synthetic commit
I still don't understand—what does the separate repo for the subproject look like?
Hmm, I see. Do you think you can comment on your PR describing at a high level the changes you made in order to do this?
Well obviously you're gonna have to pick just one of those encodings for the shared Git commit history between A and B. But you can add a build step to one or both of them that re-encodes the files into whatever you need it to be, I guess?
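A rough sketch of what such a re-encoding build step could look like, in Python. The two encodings in the usage note (`utf-8` and `cp1252`) and the `*.dfm` pattern are assumptions, since the thread doesn't say which encodings the two repos actually use, and the function names are made up. It writes converted copies into a separate output folder so the tracked files keep the single shared encoding.

```python
from pathlib import Path


def reencode(data: bytes, src_enc: str, dst_enc: str) -> bytes:
    """Decode bytes from one encoding and re-encode them in another."""
    return data.decode(src_enc).encode(dst_enc)


def reencode_tree(src_root: Path, dst_root: Path, pattern: str,
                  src_enc: str, dst_enc: str) -> int:
    """Write re-encoded copies of matching files under dst_root.

    Returns the number of files converted. Raises UnicodeError if a file
    contains characters the target encoding cannot represent.
    """
    count = 0
    for path in sorted(src_root.rglob(pattern)):
        if not path.is_file():
            continue
        out = dst_root / path.relative_to(src_root)
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_bytes(reencode(path.read_bytes(), src_enc, dst_enc))
        count += 1
    return count
```

For example, `reencode_tree(Path("shared"), Path("build/shared"), "*.dfm", "utf-8", "cp1252")` would leave the committed files untouched and give the build the encoding it expects.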
|
Sorry, GitHub made my asterisk character look like formatting. So the idea was just that I might use it to make a subhistory of not only one subfolder, but two subfolders (it generalizes from /subfolder/asterisk files being included in the subhistory to /subfolder asterisk / asterisk ... in the case of subfolder1 and subfolder2 it would include both). They are not two separate subrepositories; it is just that these two folders actually contain the code for one library. This use case exists only because the company project manager refuses to copy the files from the two folders into one. So I will repeat that it is not that important; it can be handled as two separate subhistoried folders.
Ok. Thank you for the example, I see that merging these two branches would throw a conflict and so assimilating has a hard time too, but for me there is one thing that is misleading (and I think that I am unclear in expressing what exactly). |
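The pattern-based selection described above could be sketched like this in Python. This only shows which paths a pattern such as `subfolder*/*` would pick up; `select_paths` is a made-up helper, not anything subhistory provides, and it says nothing about how the selected folders would be laid out in the split history.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def select_paths(paths: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Return the paths matching at least one shell-style pattern.

    Note that fnmatch's `*` also matches `/`, so `subfolder*/*` covers
    nested files as well as direct children.
    """
    pats = list(patterns)
    return [p for p in paths if any(fnmatchcase(p, pat) for pat in pats)]


tree = [
    "subfolder1/a.cpp",
    "subfolder1/deep/d.cpp",
    "subfolder2/b.h",
    "other/c.cpp",
]
print(select_paths(tree, ["subfolder*/*"]))
```

Here both `subfolder1` and `subfolder2` are included and `other/c.cpp` is left out.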
Ok, reminding myself we are talking about synthetic commits. I will try to read it all once again :) |
Unfortunately a git commit history has to have just one root folder, although you could probably symlink the two folders into folders in the subhistory folder.
Oh, in that example the only synthetic commit is
No, there were two separate splits. Imagine that first, when the |
I'm still grokking your PR but I was thinking about the idea of caching a separate map between Main commits and Sub commits, and I was thinking about how Basically instead of being embedded in the tree, the Sub commit corresponding to the Main commit is in a ref, which we ensure is always pushed to and pulled from the remote whenever the corresponding Main commit is: #8 |
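As a very rough sketch of the ref-based mapping, here is one way it could be laid out, in Python that only builds the git command lines without running them. The `refs/subhistory/...` namespace and all function names are hypothetical and are not taken from #8; the only real Git pieces are `git update-ref` and the use of an explicit wildcard refspec with `git push` and `git fetch`.

```python
from typing import List

REF_NAMESPACE = "refs/subhistory"  # hypothetical namespace, not from #8


def map_ref(subproject: str, main_sha: str) -> str:
    """Name of the ref that points at the Sub commit for one Main commit."""
    return f"{REF_NAMESPACE}/{subproject}/{main_sha}"


def record_mapping_cmd(subproject: str, main_sha: str, sub_sha: str) -> List[str]:
    """Command that stores Main -> Sub as a ref in the local repository."""
    return ["git", "update-ref", map_ref(subproject, main_sha), sub_sha]


def sync_refspec() -> str:
    """Refspec that copies every mapping ref between local and remote."""
    return f"{REF_NAMESPACE}/*:{REF_NAMESPACE}/*"


def push_cmd(remote: str, branch: str) -> List[str]:
    """Push a Main branch together with all mapping refs."""
    return ["git", "push", remote, branch, sync_refspec()]


def fetch_cmd(remote: str) -> List[str]:
    """Fetch all mapping refs from the remote."""
    return ["git", "fetch", remote, sync_refspec()]
```

One ref per Main commit could mean a lot of refs in a long-lived repo, so this layout is only meant to show the shape of the idea, not to recommend it.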
Hi,
I like the idea of the subrepo being absorbed into the main repo that you followed in git-subhistory. I have been using git-subtree for a while to manage 1 shared repo among 3 independent ones. Before that I used git-submodule, but I didn't like it at all.
My shared repo has 560 commits and the independent repos have around 1-2k each. Since I passed 1k commits I have noticed a performance issue, and now pushing a shared commit from one of the main repos locally takes around 10 sec. Fetching updates from the shared repo is actually OK, around 1-2 sec. locally.
How about git-subhistory, is it any better performance-wise? I'm sorry for not checking it myself before asking. I will definitely set up a benchmark myself eventually.
Thanks