Commit cd949ff

VTA Callgraph (#794)
* iterative Tarjan Algorithm implemented
* SCCGeneric.h finished
* typeAssignmentGraph ported
* SCCGeneric.h, TypeTraits.h and TypeAssignmentGraph.h updated
* updated template types
* error corrections
* minor corrections
* SCCGeneric.h corrections
* SCC Tests added and VTACallGraphTest ported
* AnalysisConfig.h modified
* new test case in SCCGenericTest.cpp
* Update TAG, SCCs, and AdjacencyList to use TypedVector
* Update TAG to llvm::DIType instead of llvm::Type + fix VTACallGraphTest with opaque pointers
* Add call-graph tool to generate call-graphs for arbitrary LLVM IR files + compute CG statistics
* Add resolver-based TAG construction
* Fix SCCGenericTest
* Refine the concepts for GraphTraits + add some comments
* Some cleanup
* Add ground-truth to SCCGenericTest + fix error in Compressor introduced by merge + mino
* Some cleanup + some comments
* Adapt VTACallGraphTest to TestingSrcLocation + measure timing in call-graph tool + some cleanup
* Fix AdjacencyList with TypedVector
* Fix stack-use-after-scope in TypedVector::operator[], materialized in minimizeGraph()
* some cleanup
* Replace Tarjan's algorithm with Pearce's algorithm for computing SCCs. This lets us compute SCCs and topological sorting in a single pass over the graph and also gets rid of the recursion
* Also test recursive version of SCC computation
* Use AliasIterator in VTA call-graph analysis
* minors

---------

Co-authored-by: bulletSpace <erik.binder@hotmail.com>
1 parent 23ea6b9 commit cd949ff
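
The commit message notes that one pass over the graph can yield both the SCCs and a topological order of them. As a rough illustration of that idea only, here is a recursive Tarjan-style sketch. It is not Pearce's algorithm and not the code in `SCCGeneric.h`; the names `Graph`, `SccResult`, and `computeSccs` are made up for this example. Tarjan's algorithm completes SCCs in reverse topological order of the condensation, so the SCC ids double as an ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical adjacency-list graph: G[V] lists the successors of vertex V.
using Graph = std::vector<std::vector<std::size_t>>;

struct SccResult {
  // SccOf[V] is the id of the SCC containing V. Ids are handed out in
  // completion order, so for an edge U -> V that crosses SCCs we get
  // SccOf[U] > SccOf[V] (reverse topological numbering).
  std::vector<std::size_t> SccOf;
  std::size_t NumSccs = 0;
};

namespace detail {
constexpr std::size_t Unvisited = static_cast<std::size_t>(-1);

struct TarjanState {
  const Graph &G;
  std::vector<std::size_t> Index, LowLink, Stack;
  std::vector<bool> OnStack;
  std::size_t NextIndex = 0;
  SccResult Result;

  explicit TarjanState(const Graph &Input)
      : G(Input), Index(Input.size(), Unvisited), LowLink(Input.size(), 0),
        OnStack(Input.size(), false) {
    Result.SccOf.assign(Input.size(), Unvisited);
  }

  void visit(std::size_t V) {
    Index[V] = LowLink[V] = NextIndex++;
    Stack.push_back(V);
    OnStack[V] = true;
    for (std::size_t W : G[V]) {
      assert(W < G.size() && "edge target out of range");
      if (Index[W] == Unvisited) {
        visit(W);
        LowLink[V] = std::min(LowLink[V], LowLink[W]);
      } else if (OnStack[W]) {
        LowLink[V] = std::min(LowLink[V], Index[W]);
      }
    }
    if (LowLink[V] != Index[V]) {
      return; // V is not the root of its SCC
    }
    // V is the root of an SCC: pop all of its members off the stack.
    std::size_t W;
    do {
      W = Stack.back();
      Stack.pop_back();
      OnStack[W] = false;
      Result.SccOf[W] = Result.NumSccs;
    } while (W != V);
    ++Result.NumSccs;
  }
};
} // namespace detail

inline SccResult computeSccs(const Graph &G) {
  detail::TarjanState State(G);
  for (std::size_t V = 0; V < G.size(); ++V) {
    if (State.Index[V] == detail::Unvisited) {
      State.visit(V);
    }
  }
  return State.Result;
}
```

The recursion depth here grows with the longest DFS path, which is the kind of limitation the commit message says the Pearce-based version avoids.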

54 files changed, +4254 -375 lines changed

BreakingChanges.md

Lines changed: 3 additions & 0 deletions
@@ -2,6 +2,9 @@

 ## development HEAD

+- The `AdjacencyList` struct now has one more template argument to denote the integer-like `vertex_t` type. It is the second template argument (which previously was the EdgeType). The edge-type is now denoted by the *third* template argument.
+- The `AdjacencyList` switches from using `llvm::NoneType` as empty-node marker to `psr::EmptyType` for forward-compatibility with LLVM 16, which removes `llvm::NoneType`.
+
 - Removed `SpecialSummaries`.
 - Removed `Hexastore` and the corresponding database queries.
 - Removed `LLVMTypeHierarchy` (and `LLVMTypeHierarchyData`), which is superseded by `DIBasedTypeHierarchy`.

cmake/phasar_macros.cmake

Lines changed: 3 additions & 3 deletions
@@ -361,7 +361,7 @@ function(add_phasar_library name)
   endif()
 endfunction(add_phasar_library)

-macro(subdirlist result curdir)
+function(subdirlist result curdir)
   file(GLOB children RELATIVE ${curdir} ${curdir}/*)
   set(dirlist "")

@@ -371,5 +371,5 @@ macro(subdirlist result curdir)
     endif()
   endforeach()

-  set(${result} ${dirlist})
-endmacro(subdirlist)
+  set(${result} ${dirlist} PARENT_SCOPE)
+endfunction(subdirlist)

include/phasar/PhasarLLVM/ControlFlow.h

Lines changed: 1 addition & 0 deletions
@@ -20,5 +20,6 @@
 #include "phasar/PhasarLLVM/ControlFlow/Resolver/OTFResolver.h"
 #include "phasar/PhasarLLVM/ControlFlow/Resolver/RTAResolver.h"
 #include "phasar/PhasarLLVM/ControlFlow/Resolver/Resolver.h"
+#include "phasar/PhasarLLVM/ControlFlow/Resolver/VTAResolver.h"

 #endif // PHASAR_PHASARLLVM_CONTROLFLOW_H

include/phasar/PhasarLLVM/ControlFlow/EntryFunctionUtils.h

Lines changed: 3 additions & 0 deletions
@@ -26,6 +26,9 @@ getEntryFunctions(const LLVMProjectIRDB &IRDB,
 [[nodiscard]] std::vector<llvm::Function *>
 getEntryFunctionsMut(LLVMProjectIRDB &IRDB,
                      llvm::ArrayRef<std::string> EntryPoints);
+
+[[nodiscard]] std::vector<std::string>
+getDefaultEntryPoints(const LLVMProjectIRDB &IRDB);
 } // namespace psr

 #endif // PHASAR_PHASARLLVM_UTILS_ENTRYFUNCTIONUTILS_H

include/phasar/PhasarLLVM/ControlFlow/LLVMBasedCFG.h

Lines changed: 2 additions & 0 deletions
@@ -18,6 +18,8 @@
 #include "llvm/IR/InstrTypes.h"
 #include "llvm/IR/Instructions.h"

+#include "nlohmann/json.hpp"
+
 namespace llvm {
 class Function;
 } // namespace llvm

include/phasar/PhasarLLVM/ControlFlow/LLVMBasedCallGraphBuilder.h

Lines changed: 43 additions & 6 deletions
@@ -21,29 +21,66 @@ class DIBasedTypeHierarchy;
 class LLVMVFTableProvider;
 class Resolver;

+/// Constructs a call-graph using the given CGResolver to resolve indirect
+/// calls.
+///
+/// Uses a fixpoint iteration, if
+/// `CGResolver.mutatesHelperAnalysisInformation()` returns true and the
+/// soundness S is not Soundness::Unsound.
+///
+/// \param IRDB The IR code on which the call-graph should be based
+/// \param CGResolver The resolver to use for resolving indirect calls.
+/// \param EntryPoints The functions where the call-graph construction should
+/// start. The resulting call-graph will only contain functions that are
+/// (transitively) reachable from the entry-points.
+/// \param S The soundness level. May be used to trade soundness for
+/// performance.
 [[nodiscard]] LLVMBasedCallGraph
-buildLLVMBasedCallGraph(LLVMProjectIRDB &IRDB, CallGraphAnalysisType CGType,
+buildLLVMBasedCallGraph(const LLVMProjectIRDB &IRDB, Resolver &CGResolver,
                         llvm::ArrayRef<const llvm::Function *> EntryPoints,
-                        DIBasedTypeHierarchy &TH, LLVMVFTableProvider &VTP,
-                        LLVMAliasInfoRef PT = nullptr,
                         Soundness S = Soundness::Soundy);

+/// Constructs a call-graph using the given CGResolver to resolve indirect
+/// calls.
+///
+/// Uses a fixpoint iteration, if
+/// `CGResolver.mutatesHelperAnalysisInformation()` returns true and the
+/// soundness S is not Soundness::Unsound.
+///
+/// \param IRDB The IR code on which the call-graph should be based
+/// \param CGResolver The resolver to use for resolving indirect calls.
+/// \param EntryPoints Names of the functions where the call-graph construction
+/// should start. The resulting call-graph will only contain functions that are
+/// (transitively) reachable from the entry-points.
+/// \param S The soundness level. May be used to trade soundness for
+/// performance.
 [[nodiscard]] LLVMBasedCallGraph
 buildLLVMBasedCallGraph(const LLVMProjectIRDB &IRDB, Resolver &CGResolver,
-                        llvm::ArrayRef<const llvm::Function *> EntryPoints,
+                        llvm::ArrayRef<std::string> EntryPoints,
                         Soundness S = Soundness::Soundy);

+/// Kept for compatibility with LLVMBasedICFG. See the constructor of
+/// LLVMBasedICFG::LLVMBasedICFG(LLVMProjectIRDB *, CallGraphAnalysisType,
+/// llvm::ArrayRef<std::string>, DIBasedTypeHierarchy *, LLVMAliasInfoRef,
+/// Soundness, bool) for more information.
 [[nodiscard]] LLVMBasedCallGraph
 buildLLVMBasedCallGraph(LLVMProjectIRDB &IRDB, CallGraphAnalysisType CGType,
-                        llvm::ArrayRef<std::string> EntryPoints,
+                        llvm::ArrayRef<const llvm::Function *> EntryPoints,
                         DIBasedTypeHierarchy &TH, LLVMVFTableProvider &VTP,
                         LLVMAliasInfoRef PT = nullptr,
                         Soundness S = Soundness::Soundy);

+/// Kept for compatibility with LLVMBasedICFG. See the constructor of
+/// LLVMBasedICFG::LLVMBasedICFG(LLVMProjectIRDB *, CallGraphAnalysisType,
+/// llvm::ArrayRef<std::string>, DIBasedTypeHierarchy *, LLVMAliasInfoRef,
+/// Soundness, bool) for more information.
 [[nodiscard]] LLVMBasedCallGraph
-buildLLVMBasedCallGraph(const LLVMProjectIRDB &IRDB, Resolver &CGResolver,
+buildLLVMBasedCallGraph(LLVMProjectIRDB &IRDB, CallGraphAnalysisType CGType,
                         llvm::ArrayRef<std::string> EntryPoints,
+                        DIBasedTypeHierarchy &TH, LLVMVFTableProvider &VTP,
+                        LLVMAliasInfoRef PT = nullptr,
                         Soundness S = Soundness::Soundy);
+
 } // namespace psr

 #endif // PHASAR_PHASARLLVM_CONTROLFLOW_LLVMBASEDCALLGRAPHBUILDER_H
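
The doc comments in this header describe a reachability-driven construction that falls back to a fixpoint iteration when the resolver's answers can change during the build. The following toy model is one way to picture that control flow. It is a sketch under assumptions, not the PhASAR implementation: `ToyCall`, `ToyProgram`, `ToyResolver`, and `buildToyCallGraph` are hypothetical names, and the resolver is reduced to a callback that sees the set of functions reached so far.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model only; none of these names exist in PhASAR.
struct ToyCall {
  bool IsIndirect;    // true: Target is a call-site label; false: a callee name
  std::string Target;
};
using ToyProgram = std::map<std::string, std::vector<ToyCall>>;
using ToyCallGraph = std::map<std::string, std::set<std::string>>;

struct ToyResolver {
  // Plays the role of mutatesHelperAnalysisInformation(): if true, the
  // results of Resolve may grow while the call-graph is being built.
  bool MutatesHelperInfo = false;
  std::function<std::set<std::string>(const std::string &Label,
                                      const std::set<std::string> &Reached)>
      Resolve;
};

inline ToyCallGraph
buildToyCallGraph(const ToyProgram &Prog, const ToyResolver &Res,
                  const std::vector<std::string> &EntryPoints,
                  bool Unsound = false) {
  assert(Res.Resolve && "the resolver needs a Resolve callback");
  ToyCallGraph CG;
  std::set<std::string> Reached;
  std::vector<std::string> Worklist;
  std::vector<std::pair<std::string, std::string>> IndirectSites;

  auto Reach = [&](const std::string &Fun) {
    if (Reached.insert(Fun).second) {
      CG[Fun]; // create the node even if Fun has no outgoing edges
      Worklist.push_back(Fun);
    }
  };
  auto AddEdge = [&](const std::string &Caller, const std::string &Callee) {
    bool IsNew = CG[Caller].insert(Callee).second;
    Reach(Callee);
    return IsNew;
  };

  for (const auto &Entry : EntryPoints) {
    Reach(Entry);
  }

  const bool Iterate = Res.MutatesHelperInfo && !Unsound;
  bool Changed = false;
  do {
    Changed = false;
    // Only functions reachable from the entry points are ever scanned.
    while (!Worklist.empty()) {
      std::string Fun = std::move(Worklist.back());
      Worklist.pop_back();
      auto It = Prog.find(Fun);
      if (It == Prog.end()) {
        continue; // declaration only: no body to scan
      }
      for (const ToyCall &Call : It->second) {
        if (!Call.IsIndirect) {
          AddEdge(Fun, Call.Target);
          continue;
        }
        IndirectSites.emplace_back(Fun, Call.Target);
        for (const auto &Callee : Res.Resolve(Call.Target, Reached)) {
          AddEdge(Fun, Callee);
        }
      }
    }
    // Fixpoint step: re-resolve known indirect call sites until stable.
    if (Iterate) {
      for (const auto &Site : IndirectSites) {
        for (const auto &Callee : Res.Resolve(Site.second, Reached)) {
          Changed |= AddEdge(Site.first, Callee);
        }
      }
    }
  } while (Changed);
  return CG;
}
```

With `MutatesHelperInfo` set to false (or `Unsound` set to true) every indirect call site is resolved exactly once, which is the cheaper path the soundness parameter is meant to allow.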

include/phasar/PhasarLLVM/ControlFlow/LLVMBasedICFG.h

Lines changed: 5 additions & 3 deletions
@@ -64,10 +64,12 @@ class LLVMBasedICFG : public LLVMBasedCFG, public ICFGBase<LLVMBasedICFG> {
   /// \param EntryPoints The names of the functions to start with when
   /// incrementally building up the ICFG. For whole-program analysis of an
   /// executable use {"main"}.
-  /// \param TH The type-hierarchy implementation to use. Will be constructed
-  /// on-the-fly if nullptr, but required
+  /// \param TH The type-hierarchy implementation to use. Must be non-null, if
+  /// the selected call-graph analysis requires type-hierarchy information;
+  /// currently, this holds for the CHA and RTA algorithms.
   /// \param PT The points-to implementation to use. Will be constructed
-  /// on-the-fly if nullptr, but required
+  /// on-the-fly if nullptr but required by the selected call-graph analysis;
+  /// currently, this holds for the OTF and VTA algorithms.
   /// \param S The soundness level to expect from the analysis. Currently unused
   /// \param IncludeGlobals Properly include global constructors/destructors
   /// into the ICFG, if true. Requires to generate artificial functions into the

include/phasar/PhasarLLVM/ControlFlow/Resolver/PrecomputedResolver.h

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+/******************************************************************************
+ * Copyright (c) 2025 Fabian Schiebel.
+ * All rights reserved. This program and the accompanying materials are made
+ * available under the terms of LICENSE.txt.
+ *
+ * Contributors:
+ *     Fabian Schiebel and others
+ *****************************************************************************/
+
+#ifndef PHASAR_PHASARLLVM_CONTROLFLOW_RESOLVER_PRECOMPUTEDRESOLVER_H
+#define PHASAR_PHASARLLVM_CONTROLFLOW_RESOLVER_PRECOMPUTEDRESOLVER_H
+
+#include "phasar/PhasarLLVM/ControlFlow/LLVMBasedCallGraph.h"
+#include "phasar/PhasarLLVM/ControlFlow/Resolver/Resolver.h"
+#include "phasar/Utils/MaybeUniquePtr.h"
+
+namespace psr {
+/// \brief A Resolver that uses a pre-computed call-graph to resolve indirect
+/// calls.
+///
+/// \note We eventually may want the LLVMBasedCallGraph to *be* a Resolver. This
+/// requires the concept of resolvers to generalize beyond LLVM. See
+/// <https://github.com/fabianbs96/phasar/tree/f-ResolverCombinators> for
+/// reference
+class PrecomputedResolver : public Resolver {
+public:
+  PrecomputedResolver(const LLVMProjectIRDB *IRDB,
+                      const LLVMVFTableProvider *VTP,
+                      MaybeUniquePtr<const LLVMBasedCallGraph> BaseCG);
+
+  [[nodiscard]] bool
+  mutatesHelperAnalysisInformation() const noexcept override {
+    return false;
+  }
+
+  void resolveVirtualCall(FunctionSetTy &PossibleTargets,
+                          const llvm::CallBase *CallSite) override {
+    resolveFunctionPointer(PossibleTargets, CallSite);
+  }
+
+  void resolveFunctionPointer(FunctionSetTy &PossibleTargets,
+                              const llvm::CallBase *CallSite) override;
+
+  [[nodiscard]] std::string str() const override;
+
+private:
+  MaybeUniquePtr<const LLVMBasedCallGraph> BaseCG;
+};
+} // namespace psr
+
+#endif
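
`PrecomputedResolver` receives its base call-graph as a `MaybeUniquePtr`. Assuming that type is, as its name suggests, a pointer that may or may not own its pointee, the idea can be sketched as follows. `MaybeOwningPtr` and `Tracked` are hypothetical stand-ins written for this example; the real `phasar/Utils/MaybeUniquePtr.h` may differ in interface and representation.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical simplification of an optionally-owning pointer.
template <typename T> class MaybeOwningPtr {
public:
  MaybeOwningPtr() noexcept = default;
  // Borrowing: the caller keeps ownership and must outlive this object.
  MaybeOwningPtr(T *Borrowed) noexcept : Ptr(Borrowed) {}
  // Owning: takes over the object held by the unique_ptr.
  MaybeOwningPtr(std::unique_ptr<T> Owned) noexcept
      : Ptr(Owned.release()), Owns(Ptr != nullptr) {}

  MaybeOwningPtr(MaybeOwningPtr &&Other) noexcept
      : Ptr(std::exchange(Other.Ptr, nullptr)),
        Owns(std::exchange(Other.Owns, false)) {}
  MaybeOwningPtr &operator=(MaybeOwningPtr &&Other) noexcept {
    if (this != &Other) {
      reset();
      Ptr = std::exchange(Other.Ptr, nullptr);
      Owns = std::exchange(Other.Owns, false);
    }
    return *this;
  }
  MaybeOwningPtr(const MaybeOwningPtr &) = delete;
  MaybeOwningPtr &operator=(const MaybeOwningPtr &) = delete;
  ~MaybeOwningPtr() { reset(); }

  T *get() const noexcept { return Ptr; }
  bool owns() const noexcept { return Owns; }
  T &operator*() const {
    assert(Ptr && "dereferencing a null MaybeOwningPtr");
    return *Ptr;
  }
  T *operator->() const {
    assert(Ptr && "dereferencing a null MaybeOwningPtr");
    return Ptr;
  }

private:
  // Deletes the pointee only if this object owns it.
  void reset() noexcept {
    if (Owns) {
      delete Ptr;
    }
    Ptr = nullptr;
    Owns = false;
  }

  T *Ptr = nullptr;
  bool Owns = false;
};

// Demo-only helper that counts how often it is destroyed.
struct Tracked {
  int &Destroyed;
  explicit Tracked(int &Counter) : Destroyed(Counter) {}
  ~Tracked() { ++Destroyed; }
};
```

A design like this lets a resolver accept either a call-graph that lives elsewhere (passed as a raw pointer) or one built just for it (passed as a `std::unique_ptr`) through a single constructor parameter.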

include/phasar/PhasarLLVM/ControlFlow/Resolver/Resolver.h

Lines changed: 30 additions & 9 deletions
@@ -18,6 +18,7 @@
 #define PHASAR_PHASARLLVM_CONTROLFLOW_RESOLVER_RESOLVER_H_

 #include "phasar/PhasarLLVM/Pointer/LLVMAliasInfo.h"
+#include "phasar/Utils/MaybeUniquePtr.h"

 #include "llvm/ADT/DenseSet.h"
 #include "llvm/ADT/SmallVector.h"
@@ -109,12 +110,37 @@ class Resolver {
   [[nodiscard]] llvm::ArrayRef<const llvm::Function *>
   getAddressTakenFunctions();

-  [[nodiscard]] static std::unique_ptr<Resolver>
+  using BaseResolverProvider = llvm::function_ref<MaybeUniquePtr<Resolver>(
+      const LLVMProjectIRDB *IRDB, const LLVMVFTableProvider *VTP,
+      const DIBasedTypeHierarchy *TH, LLVMAliasInfoRef PT)>;
+
+  /// Factory function to create a Resolver that can be used to implement the
+  /// given call-graph analysis type.
+  ///
+  /// \param Ty Determines the Resolver subclass to instantiate
+  /// \param IRDB The IR code on which the Resolver should be based. Must not be
+  /// nullptr.
+  /// \param VTP A virtual-table-provider that is used to extract C++-VTables
+  /// from the IR. Must not be nullptr.
+  /// \param TH The type-hierarchy implementation to use. Must be non-null, if
+  /// the selected call-graph analysis requires type-hierarchy information;
+  /// currently, this holds for the CHA and RTA algorithms.
+  /// \param PT The points-to implementation to use. Will be constructed
+  /// on-the-fly if nullptr but required by the selected call-graph analysis;
+  /// currently, this holds for the OTF and VTA algorithms.
+  static std::unique_ptr<Resolver>
   create(CallGraphAnalysisType Ty, const LLVMProjectIRDB *IRDB,
          const LLVMVFTableProvider *VTP, const DIBasedTypeHierarchy *TH,
-         LLVMAliasInfoRef PT = nullptr);
+         LLVMAliasInfoRef PT = nullptr,
+         BaseResolverProvider GetBaseRes = nullptr);

 protected:
+  virtual void resolveVirtualCall(FunctionSetTy &PossibleTargets,
+                                  const llvm::CallBase *CallSite) = 0;
+
+  virtual void resolveFunctionPointer(FunctionSetTy &PossibleTargets,
+                                      const llvm::CallBase *CallSite);
+
   const llvm::Function *
   getNonPureVirtualVFTEntry(const llvm::DIType *T, unsigned Idx,
                             const llvm::CallBase *CallSite,
@@ -125,17 +151,12 @@ class Resolver {
     return psr::getNonPureVirtualVFTEntry(T, Idx, CallSite, *VTP, ReceiverType);
   }

+  // ---
+
   const LLVMProjectIRDB *IRDB{};
   const LLVMVFTableProvider *VTP{};
   std::optional<llvm::SmallVector<const llvm::Function *, 0>>
       AddressTakenFunctions{};
-
-protected:
-  virtual void resolveVirtualCall(FunctionSetTy &PossibleTargets,
-                                  const llvm::CallBase *CallSite) = 0;
-
-  virtual void resolveFunctionPointer(FunctionSetTy &PossibleTargets,
-                                      const llvm::CallBase *CallSite);
 };
 } // namespace psr

include/phasar/PhasarLLVM/ControlFlow/Resolver/VTAResolver.h

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
+/******************************************************************************
+ * Copyright (c) 2025 Fabian Schiebel.
+ * All rights reserved. This program and the accompanying materials are made
+ * available under the terms of LICENSE.txt.
+ *
+ * Contributors:
+ *     Fabian Schiebel and others
+ *****************************************************************************/
+
+#ifndef PHASAR_PHASARLLVM_CONTROLFLOW_RESOLVER_VTARESOLVER_H
+#define PHASAR_PHASARLLVM_CONTROLFLOW_RESOLVER_VTARESOLVER_H
+
+#include "phasar/PhasarLLVM/ControlFlow/LLVMBasedCallGraph.h"
+#include "phasar/PhasarLLVM/ControlFlow/Resolver/Resolver.h"
+#include "phasar/PhasarLLVM/ControlFlow/VTA/TypePropagator.h"
+#include "phasar/PhasarLLVM/Pointer/LLVMAliasInfo.h"
+#include "phasar/Utils/Compressor.h"
+#include "phasar/Utils/MaybeUniquePtr.h"
+#include "phasar/Utils/SCCGeneric.h"
+
+#include "llvm/ADT/STLFunctionalExtras.h"
+
+namespace psr {
+
+class LLVMProjectIRDB;
+
+/// \brief A Resolver that uses a variant of the Variable Type Analysis to
+/// resolve indirect calls.
+///
+/// Uses debug-information to achieve better results with C++ virtual calls.
+/// Uses alias-information as fallback mechanism for when types don't help or
+/// are not found, e.g., to resolve function-pointer calls.
+///
+/// Requires a base-call-graph or at least a base-resolver to resolve indirect
+/// calls while constructing the type-assignment graph.
+class VTAResolver : public Resolver {
+public:
+  struct DefaultReachableFunctions {
+    void operator()(const LLVMProjectIRDB &IRDB,
+                    llvm::function_ref<void(const llvm::Function *)> WithFun);
+  };
+
+  /// Constructs a VTAResolver with a given pre-computed call-graph and
+  /// alias-information
+  ///
+  /// Builds the type-assignment graph and propagates allocated types through
+  /// its SCCs.
+  explicit VTAResolver(const LLVMProjectIRDB *IRDB,
+                       const LLVMVFTableProvider *VTP, LLVMAliasIteratorRef AS,
+                       MaybeUniquePtr<const LLVMBasedCallGraph> BaseCG);
+
+  /// Constructs a VTAResolver with a given base-resolver (no base-call-graph)
+  /// and alias-information
+  /// Uses the optional parameter ReachableFunctions to consider only a subset
+  /// of all functions for building the type-assignment graph
+  ///
+  /// Builds the type-assignment graph and propagates allocated types through
+  /// its SCCs.
+  explicit VTAResolver(
+      const LLVMProjectIRDB *IRDB, const LLVMVFTableProvider *VTP,
+      LLVMAliasIteratorRef AS, MaybeUniquePtr<Resolver> BaseRes,
+      llvm::function_ref<void(const LLVMProjectIRDB &,
+                              llvm::function_ref<void(const llvm::Function *)>)>
+          ReachableFunctions = DefaultReachableFunctions{});
+
+  [[nodiscard]] std::string str() const override;
+
+  [[nodiscard]] bool
+  mutatesHelperAnalysisInformation() const noexcept override {
+    return false;
+  }
+
+private:
+  void resolveVirtualCall(FunctionSetTy &PossibleTargets,
+                          const llvm::CallBase *CallSite) override;
+
+  void resolveFunctionPointer(FunctionSetTy &PossibleTargets,
+                              const llvm::CallBase *CallSite) override;
+
+  MaybeUniquePtr<Resolver> BaseResolver{};
+  vta::TypeAssignment TA{};
+  SCCHolder<vta::TAGNodeId> SCCs{};
+  Compressor<vta::TAGNode, vta::TAGNodeId> Nodes;
+};
+} // namespace psr
+
+#endif // PHASAR_PHASARLLVM_CONTROLFLOW_RESOLVER_VTARESOLVER_H
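
The constructor comments say that allocated types are propagated through the SCCs of the type-assignment graph. A minimal sketch of such a propagation is given below, under several assumptions: an edge `U -> V` means that types flow from `U` to `V`, SCC ids are already computed and numbered in reverse topological order, and types are plain strings. `ToyGraph`, `TypeSet`, and `propagateTypes` are invented for this example and do not reflect `vta::TypeAssignment` or `SCCHolder`.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins; not the types used in the VTA implementation.
using ToyGraph = std::vector<std::vector<std::size_t>>;
using TypeSet = std::set<std::string>;

// Propagates the types in Allocated along the edges of G.
// Precondition (assumed): for every edge U -> V that crosses SCCs,
// SccOf[U] > SccOf[V], i.e. ids are in reverse topological order.
inline std::vector<TypeSet>
propagateTypes(const ToyGraph &G, const std::vector<std::size_t> &SccOf,
               std::size_t NumSccs, const std::vector<TypeSet> &Allocated) {
  assert(SccOf.size() == G.size() && Allocated.size() == G.size());

  // Step 1: all nodes of one SCC share one type set, so merge them up front.
  std::vector<TypeSet> PerScc(NumSccs);
  std::vector<std::vector<std::size_t>> Members(NumSccs);
  for (std::size_t V = 0; V < G.size(); ++V) {
    assert(SccOf[V] < NumSccs && "SCC id out of range");
    PerScc[SccOf[V]].insert(Allocated[V].begin(), Allocated[V].end());
    Members[SccOf[V]].push_back(V);
  }

  // Step 2: walk the SCCs in topological order (highest id first). When an
  // SCC is processed, all of its predecessors are done, so its set is final.
  for (std::size_t Scc = NumSccs; Scc-- > 0;) {
    for (std::size_t V : Members[Scc]) {
      for (std::size_t W : G[V]) {
        assert(W < G.size() && "edge target out of range");
        std::size_t Target = SccOf[W];
        if (Target == Scc) {
          continue; // edge inside the SCC: nothing to do
        }
        assert(Target < Scc && "SCC ids must be reverse-topological");
        PerScc[Target].insert(PerScc[Scc].begin(), PerScc[Scc].end());
      }
    }
  }

  // Step 3: expand the per-SCC sets back to per-node sets.
  std::vector<TypeSet> Result(G.size());
  for (std::size_t V = 0; V < G.size(); ++V) {
    Result[V] = PerScc[SccOf[V]];
  }
  return Result;
}
```

Because each SCC is visited once and each edge is inspected once, no repeated worklist passes are needed, which is presumably why a topologically ordered SCC decomposition is attractive for this kind of propagation.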
