How to identify and substitute variables that are used in expressions in C code? #228
Comments
There are a few ways to approach this problem in ROSE. One is to use a synthesized attribute: as the traversal works from the leaves back up the tree, the attribute accumulates any references to global variables that you wish to turn into parameters. When the traversal reaches an SgFunctionDeclaration node, you test the synthesized attribute to see whether any descendants of the declaration reference such a global, and modify the parameter list of that SgFunctionDeclaration accordingly. A couple of the tutorial examples use the AstBottomUpProcessing class and demonstrate simple uses of synthesized attributes.

Here's a quick example that may help. It is a bit more verbose than necessary, but I think it makes the process clear. On your input example it produces the following code as output:

static int n;
int f(int x,int n)
{
return x * x + n;
}
int main()
{
return 0;
}

First, I define the synthesized attribute:

#include "rose.h"
class SynthesizedAttribute {
public:
std::vector<SgInitializedName*> globalStatics;
std::unordered_map<SgFunctionDefinition*, std::vector<SgInitializedName*> > worklist;
void accumulate(const SynthesizedAttribute& other) {
for (auto e: other.globalStatics) {
globalStatics.push_back(e);
}
for (auto kv : other.worklist) {
worklist.insert(kv);
}
}
};

This is used by the bottom-up traversal to accumulate the list of SgInitializedName objects corresponding to global static variables. The worklist is used to accumulate the set of SgFunctionDefinition objects that contain references to global static variables. The traversal to build this is defined as:

class visitorTraversal : public AstBottomUpProcessing<SynthesizedAttribute>
{
public:
// virtual function must be defined
virtual SynthesizedAttribute evaluateSynthesizedAttribute (SgNode* n, SynthesizedAttributesList childAttributes );
};
SynthesizedAttribute
visitorTraversal::evaluateSynthesizedAttribute ( SgNode* n, SynthesizedAttributesList childAttributes )
{
SynthesizedAttribute localResult;
for (SynthesizedAttribute child : childAttributes) {
localResult.accumulate(child);
}
if (SgVarRefExp *vre = isSgVarRefExp(n)) {
SgVariableSymbol *sym = vre->get_symbol();
SgInitializedName *name = sym->get_declaration();
if (isSgGlobal(name->get_scope()) && name->get_storageModifier().isStatic()) {
localResult.globalStatics.push_back(name);
}
} else if (SgFunctionDefinition *fundef = isSgFunctionDefinition(n)) {
if (localResult.globalStatics.size() > 0) {
localResult.worklist.insert({fundef, localResult.globalStatics});
std::cout << "FunDef: " << fundef->get_declaration()->get_name() << std::endl;
for (SgInitializedName *name : localResult.globalStatics) {
std::cout << " Referenced global static: " << name->get_name() << std::endl;
}
}
}
return localResult;
}

This visits each node in the AST in a bottom-up fashion. First, the visitor accumulates the results from the visits to the children of the current node, and then tests whether the current node is either a variable reference expression or a function definition. If it's a variable reference, it checks whether the initialized name meets the criteria of being global and static. If so, it stores it in the list of initialized names to pass up the tree. If it's a function definition, it stores the relation of the function definition to the list of global static variables that were found below it. If a function definition references no globals that we want to transform into parameters, it gets skipped.

The main program invokes this traversal to build up the worklist of things to transform. Given the result, it modifies the function definition by adding the parameter and storing the set of mappings from global static variables to the new parameters. This is then passed to a second traversal that walks the function body and rewrites variable references to the new parameter.

Here is the second traversal that updates variable references:

class UpdateVarRefs: public AstSimpleProcessing {
std::unordered_map<SgInitializedName*, SgInitializedName*> replacements;
public:
UpdateVarRefs(std::unordered_map<SgInitializedName*, SgInitializedName*> replacements): replacements(replacements) {}
virtual void visit(SgNode *n);
};
void UpdateVarRefs::visit(SgNode *n) {
namespace SB = SageBuilder;
if (SgVarRefExp *vre = isSgVarRefExp(n)) {
SgInitializedName *name = vre->get_symbol()->get_declaration();
if (replacements.find(name) != replacements.end()) {
std::cout << "Updating var ref: " << vre->get_symbol()->get_declaration()->get_name() << std::endl;
SgInitializedName *replacement = replacements[name];
if (SgVariableSymbol *sym = isSgVariableSymbol(replacement->search_for_symbol_from_symbol_table())) {
vre->set_symbol(sym);
}
}
}
}

And here is the main that brings it all together.

int main(int argc, char **argv) {
ROSE_INITIALIZE;
SgProject *project = frontend(argc, argv);
// Build the traversal object and call "traverse" member function
visitorTraversal exampleTraversal;
// the synthesized attribute will contain a worklist containing all functions to transform.
// we will need to transform all of the function definitions as well as all call sites to them.
SynthesizedAttribute results = exampleTraversal.traverse(project);
for (auto workitem : results.worklist) {
SgFunctionDefinition *fundef = workitem.first;
std::cout << "Rewriting parameters for " << fundef->get_declaration()->get_name() << std::endl;
std::vector<SgInitializedName*> vars = workitem.second;
SgFunctionParameterList *params = fundef->get_declaration()->get_parameterList();
std::unordered_map<SgInitializedName*, SgInitializedName*> replacements;
for (SgInitializedName* gvar : vars) {
std::cout << "Adding param : " << gvar->get_name() << std::endl;
SgInitializedName *newparam = SageBuilder::buildInitializedName(gvar->get_name(), gvar->get_type());
SageInterface::appendArg(params, newparam);
replacements.insert({gvar,newparam});
}
UpdateVarRefs(replacements).traverse(fundef->get_body(), postorder);
}
project->unparse();
return 0;
}

An important subtle note: in my code, I use SageInterface::appendArg to add the argument to the parameter list. In your code, you used the append_arg function on the parameter list object itself. The SageInterface call makes sure that the symbol that is created is placed in the right scoping unit for you. If you call the append_arg function on the parameter list directly, you'd need to handle that yourself. It's good practice to call the helpers in SageInterface, since it's easy to get that stuff wrong when creating the data structures yourself.

You can probably use the information that the synthesized attribute collects to do what I assume is the next step: rewriting the call sites of the functions that were transformed. In that case I'd write a similar traversal that looks for function call expressions and rewrites them to include the parameters. The only trick here is that you may need to do some call path analysis, since a function that does not access a global static variable may need to have it added to its parameter list if it calls a subsequent function that does reference a global.

Hopefully this helps. You should be able to paste the code blocks above in order into a C++ file and compile it with the current ROSE.
Thank you so much! The detailed analysis and explanation helps a ton! I ran the code on my example, and it works as expected, but it still fails with this other example:

Source:

#include <stdio.h>
static int n = 5;
int f(int x) {
n = x + 10;
return x*x + n;
}
int main() {
printf("Original n value: n = %d\n", n);
int f_val = f(2);
printf("Value of f(n): f(n) = %d\nNew value for n: n = %d\n", f_val, n);
return 0;
}

Expected output:

#include <stdio.h>
int f(int x, int n) {
n = x + 10;
return x*x + n;
}
int main() {
int n = 5;
printf("Original n value: n = %d\n", n);
int f_val = f(2, n);
printf("Value of f(n): f(n) = %d\nNew value for n: n = %d\n", f_val, n);
return 0;
}

Actual output:

#include <stdio.h>
static int n = 5;
int f(int x,int n,int n)
{
n = x + 10;
return x * x + n;
}
int main(n,n)
int n;
int n;
{
printf("Original n value: n = %d\n",n);
int f_val = f(2);
printf("Value of f(n): f(n) = %d\nNew value for n: n = %d\n",f_val,n);
return 0;
}

So, it seems that the program adds the global variable identifier multiple times if there is more than one use of the given variable inside a function. I've attempted to fix the problem by performing the following uniqueness check in the SynthesizedAttribute class:

class SynthesizedAttribute
{
public:
std::vector<SgInitializedName *> globalStatics;
std::unordered_map<SgFunctionDefinition *, std::vector<SgInitializedName *>> worklist;
void accumulate(const SynthesizedAttribute &other)
{
for (auto e : other.globalStatics)
{// check for uniqueness before appending
auto it = std::find(globalStatics.begin(), globalStatics.end(), e);
if (it == globalStatics.end()) {
globalStatics.push_back(e);
}
}
for (auto kv : other.worklist)
{
worklist.insert(kv);
}
}
};

With this fix, the program no longer appends the parameter more than once. I've also added a simple fix for the main function, which is the entry point, by adding an exception in the main traversal:

if ((localResult.globalStatics.size() > 0) && (fundef->get_declaration()->get_name() != "main"))
{
localResult.worklist.insert({fundef, localResult.globalStatics});
std::cout << "FunDef: " << fundef->get_declaration()->get_name() << std::endl;
for (SgInitializedName *name : localResult.globalStatics)
{
std::cout << " Referenced global static: " << name->get_name() << std::endl;
}
}

Also, there's the problem you pointed out as the next step: the call sites of the modified functions are still unaltered (for example, the call f(2) in the actual output above).

In this case, would it be possible to traverse the function call graph in a bottom-up approach (such that the leaf nodes are the most deeply nested inside the call stack)? That way, if I iteratively update the child nodes and properly add the parameters in their call sites, the parent nodes will directly require the global variable in question.

Finally, I'm publishing this code in a public repo here, with due credit, of course. As a disclaimer, I'm doing this for educational and research purposes only! If you feel uncomfortable with me putting the code in the repository, feel free to tell me and I'll remove it as soon as possible!

Again, many thanks! (and sorry for the very late reply)
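The bottom-up order asked about here amounts to a post-order walk of the call graph, so that callees are handled before their callers. The following is a minimal sketch of that ordering only, not ROSE code: the CallGraph alias, calleesFirstOrder, and visitCalleesFirst are hypothetical names, and plain strings stand in for the function declarations that a real ROSE call graph would provide.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a call graph: caller name -> names of its callees.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first walk that records a function only after all of its callees.
static void visitCalleesFirst(const std::string& fn, const CallGraph& calls,
                              std::set<std::string>& seen,
                              std::vector<std::string>& order) {
    // Skip functions already visited; this also stops recursion on cycles.
    if (!seen.insert(fn).second) return;
    auto it = calls.find(fn);
    if (it != calls.end()) {
        for (const std::string& callee : it->second) {
            visitCalleesFirst(callee, calls, seen, order);
        }
    }
    order.push_back(fn);  // all reachable callees are already in `order`
}

// Returns the functions reachable from `root`, most deeply nested first.
std::vector<std::string> calleesFirstOrder(const CallGraph& calls,
                                           const std::string& root) {
    std::set<std::string> seen;
    std::vector<std::string> order;
    visitCalleesFirst(root, calls, seen, order);
    assert(!order.empty());  // the root itself is always recorded
    return order;
}
```

Processing functions in this order means that by the time a caller is rewritten, every function it calls already has its final parameter list. Recursive cycles are only broken, not resolved, so mutually recursive functions would still need a fixed-point pass.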
Looks like it's my turn for the long delay - I missed the notification for your response. Regarding the problem related to the call graph. You might start looking at some documentation on call graph analysis with ROSE: WikiBooks CallGraph chapter. I am not sure if that code is still up to date with the latest ROSE. Given the call graph, I would approach the problem this way:
I believe it boils down to something like that - a couple of traversals combined with a couple of maps that map functions to name sets, where one traversal builds the initial map of sets, and the other performs the union of sets based on the call graph.

Regarding the public repo: no problem with that. Given some spare cycles, I can help out on that code if you try to implement what I described above and get stuck.
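The "map of sets plus union over the call graph" idea can be sketched without ROSE as follows. This is only an illustration under assumptions: strings stand in for SgFunctionDeclaration and SgInitializedName pointers, and the names NameSet, FunctionToNames, CallGraph, propagateGlobals, and exampleResult are hypothetical. The first map would come from a traversal like the one earlier in the thread; the loop below performs the union step until nothing changes.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins: in ROSE the keys and values would be AST node
// pointers rather than strings.
using NameSet = std::set<std::string>;
using FunctionToNames = std::map<std::string, NameSet>;
using CallGraph = std::map<std::string, std::vector<std::string>>;  // caller -> callees

// `direct` maps each function to the globals it references itself.
// The result maps each function to every global it needs, including those
// needed only because it calls a function that needs them.
FunctionToNames propagateGlobals(const FunctionToNames& direct,
                                 const CallGraph& calls) {
    FunctionToNames needed = direct;
    bool changed = true;
    while (changed) {  // iterate to a fixed point
        changed = false;
        for (const auto& edge : calls) {
            NameSet& callerSet = needed[edge.first];
            for (const std::string& callee : edge.second) {
                auto it = needed.find(callee);
                if (it == needed.end()) continue;  // callee needs nothing
                for (const std::string& name : it->second) {
                    // Union the callee's set into the caller's set.
                    if (callerSet.insert(name).second) changed = true;
                }
            }
        }
    }
    return needed;
}

// Small example: f uses n directly, g calls f, main calls g, h calls puts.
FunctionToNames exampleResult() {
    FunctionToNames direct;
    direct["f"].insert("n");
    CallGraph calls;
    calls["g"].push_back("f");
    calls["main"].push_back("g");
    calls["h"].push_back("puts");
    FunctionToNames result = propagateGlobals(direct, calls);
    assert(result.at("g").count("n") == 1);  // g inherits n from f
    return result;
}
```

In this example g and main end up needing n even though neither references it directly, while h ends up with an empty set. Whether main should receive a parameter or a local declaration instead (as in the expected output earlier in the thread) is a separate decision that this sketch does not make.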
Thank you! I'm going to look into the chapter you mentioned. Also, as a side note: I'd like to contribute to ROSE's documentation somehow, but I am unsure where to put it. I think it would be interesting (and useful) if the code in the repository got integrated in the ROSE examples, and I'd be willing to do a step-by-step explanation, once finished, for instance.
I can ask about how we'd integrate contributed documentation / examples. If you were to develop that example out to the point where you're happy with it and write up a short step-by-step explanation as a README.md that accompanies the code, it would be useful. For now, having it in its own repository would be fine. We might add a page to the Wikibook that can point at repositories containing examples like the one you're putting together. If you were to write other examples, you might have a repository for each or just have one "ROSE examples" repository that collects them together.
Right, yeah, I'd also have to think about how I'd do it, because I'm still not sure. I'm sure we'll figure it out as the repo is updated, though.
I want to transform static global variables into function arguments for functions that use said variables. For example:
Would get converted to:
This is the code that I came up with in an attempt to perform the transformation:
Unfortunately, my solution does not work. Some mistakes that I could identify:
if (funcDef->get_body()->lookup_variable_symbol(var->get_name()) == NULL)
doesn't work, because it does not traverse the AST searching for uses of the variable.

I searched the documentation and the tutorials for this information, but I still haven't found what I need to do the desired transformation. Any tips?