
Commit 01b5d14

Merge: Backport KVM to 6.15
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-10/-/merge_requests/865

Backport common and x86 KVM code to upstream kernel 6.15

JIRA: https://issues.redhat.com/browse/RHEL-82917

Tested by:
* x86: run kvm unit tests and kernel selftests on amd (virtlab800.virt.eng.rdu2.dc.redhat.com) and intel machine (virtlab807.virt.eng.rdu2.dc.redhat.com)
* aarch64: run kvm unit tests and kernel selftests on hpe-apollo-cn99xx-06.khw.eng.rdu2.dc.redhat.com
* s390x/ppc64le: compile tested only.

Omitted-fix: 6a3d704 ("KVM: x86/mmu: Use kvm_x86_call() instead of manual static_call()")

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Approved-by: Rafael Aquini <raquini@redhat.com>
Approved-by: Jerry Snitselaar <jsnitsel@redhat.com>
Approved-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Approved-by: Julio Faracco <jfaracco@redhat.com>
Approved-by: Steve Best <sbest@redhat.com>
Approved-by: Lucas Zampieri <lzampier@redhat.com>
Approved-by: Eric Auger <eric.auger@redhat.com>
Approved-by: Sebastian Ott <sebott@redhat.com>
Merged-by: Julio Faracco <jfaracco@redhat.com>
2 parents 56d6c93 + 442623d commit 01b5d14

274 files changed, +7093 -4879 lines changed


Documentation/virt/kvm/api.rst

Lines changed: 478 additions & 461 deletions
Large diffs are not rendered by default.

Documentation/virt/kvm/locking.rst

Lines changed: 43 additions & 41 deletions
@@ -135,8 +135,8 @@ We dirty-log for gfn1, that means gfn2 is lost in dirty-bitmap.
 For direct sp, we can easily avoid it since the spte of direct sp is fixed
 to gfn. For indirect sp, we disabled fast page fault for simplicity.

-A solution for indirect sp could be to pin the gfn, for example via
-gfn_to_pfn_memslot_atomic, before the cmpxchg. After the pinning:
+A solution for indirect sp could be to pin the gfn before the cmpxchg. After
+the pinning:

 - We have held the refcount of pfn; that means the pfn can not be freed and
   be reused for another gfn.
@@ -147,54 +147,56 @@ Then, we can ensure the dirty bitmaps is correctly set for a gfn.

 2) Dirty bit tracking

-In the origin code, the spte can be fast updated (non-atomically) if the
+In the original code, the spte can be fast updated (non-atomically) if the
 spte is read-only and the Accessed bit has already been set since the
 Accessed bit and Dirty bit can not be lost.

 But it is not true after fast page fault since the spte can be marked
 writable between reading spte and updating spte. Like below case:

-+------------------------------------------------------------------------+
-| At the beginning:: |
-| |
-| spte.W = 0 |
-| spte.Accessed = 1 |
-+------------------------------------+-----------------------------------+
-| CPU 0: | CPU 1: |
-+------------------------------------+-----------------------------------+
-| In mmu_spte_clear_track_bits():: | |
-| | |
-| old_spte = *spte; | |
-| | |
-| | |
-| /* 'if' condition is satisfied. */| |
-| if (old_spte.Accessed == 1 && | |
-| old_spte.W == 0) | |
-| spte = 0ull; | |
-+------------------------------------+-----------------------------------+
-| | on fast page fault path:: |
-| | |
-| | spte.W = 1 |
-| | |
-| | memory write on the spte:: |
-| | |
-| | spte.Dirty = 1 |
-+------------------------------------+-----------------------------------+
-| :: | |
-| | |
-| else | |
-| old_spte = xchg(spte, 0ull) | |
-| if (old_spte.Accessed == 1) | |
-| kvm_set_pfn_accessed(spte.pfn);| |
-| if (old_spte.Dirty == 1) | |
-| kvm_set_pfn_dirty(spte.pfn); | |
-| OOPS!!! | |
-+------------------------------------+-----------------------------------+
++-------------------------------------------------------------------------+
+| At the beginning:: |
+| |
+| spte.W = 0 |
+| spte.Accessed = 1 |
++-------------------------------------+-----------------------------------+
+| CPU 0: | CPU 1: |
++-------------------------------------+-----------------------------------+
+| In mmu_spte_update():: | |
+| | |
+| old_spte = *spte; | |
+| | |
+| | |
+| /* 'if' condition is satisfied. */ | |
+| if (old_spte.Accessed == 1 && | |
+| old_spte.W == 0) | |
+| spte = new_spte; | |
++-------------------------------------+-----------------------------------+
+| | on fast page fault path:: |
+| | |
+| | spte.W = 1 |
+| | |
+| | memory write on the spte:: |
+| | |
+| | spte.Dirty = 1 |
++-------------------------------------+-----------------------------------+
+| :: | |
+| | |
+| else | |
+| old_spte = xchg(spte, new_spte);| |
+| if (old_spte.Accessed && | |
+| !new_spte.Accessed) | |
+| flush = true; | |
+| if (old_spte.Dirty && | |
+| !new_spte.Dirty) | |
+| flush = true; | |
+| OOPS!!! | |
++-------------------------------------+-----------------------------------+

 The Dirty bit is lost in this case.

 In order to avoid this kind of issue, we always treat the spte as "volatile"
-if it can be updated out of mmu-lock [see spte_has_volatile_bits()]; it means
+if it can be updated out of mmu-lock [see spte_needs_atomic_update()]; it means
 the spte is always atomically updated in this case.

 3) flush tlbs due to spte updated
@@ -210,7 +212,7 @@ function to update spte (present -> present).

 Since the spte is "volatile" if it can be updated out of mmu-lock, we always
 atomically update the spte and the race caused by fast page fault can be avoided.
-See the comments in spte_has_volatile_bits() and mmu_spte_update().
+See the comments in spte_needs_atomic_update() and mmu_spte_update().

 Lockless Access Tracking:
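
The locking.rst hunk above describes why an spte that can change outside mmu-lock must be updated with an atomic exchange rather than a plain read-then-write. Below is a minimal, self-contained user-space model of that race; it is not kernel code, and the bit layout and the SPTE_W/SPTE_DIRTY names are invented purely for illustration.

/*
 * Standalone model of the race above -- illustrative only, not the
 * kernel's mmu_spte_update().  Bit 0 of the "spte" is W (writable),
 * bit 1 is Dirty.  CPU 1 (the fast page fault path) can set W and then
 * Dirty at any point between CPU 0's read of the spte and its
 * write-back, so CPU 0 must use an atomic exchange to see every bit
 * that was set in the meantime.
 */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define SPTE_W     (1ull << 0)
#define SPTE_DIRTY (1ull << 1)

static _Atomic uint64_t spte;

int main(void)
{
        uint64_t old_spte, new_spte = 0;

        /* Lossy ordering: read, concurrent write, then blind store. */
        atomic_store(&spte, 0);                      /* W = 0, Accessed assumed set */
        old_spte = atomic_load(&spte);               /* CPU 0 reads */
        atomic_fetch_or(&spte, SPTE_W | SPTE_DIRTY); /* CPU 1 wins the race */
        atomic_store(&spte, new_spte);               /* CPU 0 writes back blindly */
        printf("plain store: Dirty seen = %d (lost)\n",
               !!(old_spte & SPTE_DIRTY));

        /* Atomic ordering: xchg returns whatever CPU 1 managed to set. */
        atomic_store(&spte, 0);
        atomic_fetch_or(&spte, SPTE_W | SPTE_DIRTY); /* CPU 1 wins again */
        old_spte = atomic_exchange(&spte, new_spte); /* CPU 0 uses xchg */
        printf("xchg: Dirty seen = %d (preserved)\n",
               !!(old_spte & SPTE_DIRTY));

        return 0;
}

The point mirrors the documentation change: only the value returned by the exchange can be trusted to carry any Accessed/Dirty bits that another CPU or the hardware set concurrently, which is why sptes that can be modified outside mmu-lock are always updated atomically.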

Documentation/virt/kvm/x86/errata.rst

Lines changed: 12 additions & 0 deletions
@@ -33,6 +33,18 @@ Note however that any software (e.g ``WIN87EM.DLL``) expecting these features
 to be present likely predates these CPUID feature bits, and therefore
 doesn't know to check for them anyway.

+``KVM_SET_VCPU_EVENTS`` issue
+-----------------------------
+
+Invalid KVM_SET_VCPU_EVENTS input with respect to error codes *may* result in
+failed VM-Entry on Intel CPUs. Pre-CET Intel CPUs require that exception
+injection through the VMCS correctly set the "error code valid" flag, e.g.
+require the flag be set when injecting a #GP, clear when injecting a #UD,
+clear when injecting a soft exception, etc. Intel CPUs that enumerate
+IA32_VMX_BASIC[56] as '1' relax VMX's consistency checks, and AMD CPUs have no
+restrictions whatsoever. KVM_SET_VCPU_EVENTS doesn't sanity check the vector
+versus "has_error_code", i.e. KVM's ABI follows AMD behavior.
+
 Nested virtualization features
 ------------------------------
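
For context on the errata note above, the sketch below shows how a VMM might drive KVM_SET_VCPU_EVENTS from user space so that the vector and the error-code flag stay consistent with the Intel constraint. It is illustrative only and assumes an already-created vCPU file descriptor; inject_gp() and the GP_VECTOR define are made up for the example and are not part of the KVM UAPI.

/*
 * Illustrative sketch: inject a #GP through KVM_SET_VCPU_EVENTS while
 * keeping the vector and "has_error_code" consistent, as required by
 * pre-CET Intel CPUs.  vcpu_fd is assumed to be a KVM vCPU fd.
 */
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

#define GP_VECTOR 13    /* #GP is an error-code-bearing vector */

static int inject_gp(int vcpu_fd, unsigned int error_code)
{
        struct kvm_vcpu_events events;

        memset(&events, 0, sizeof(events));
        events.exception.injected = 1;
        events.exception.nr = GP_VECTOR;
        events.exception.has_error_code = 1;    /* must match the vector */
        events.exception.error_code = error_code;

        return ioctl(vcpu_fd, KVM_SET_VCPU_EVENTS, &events);
}

In practice a VMM would typically read the current state with KVM_GET_VCPU_EVENTS first and modify only the exception fields before writing the structure back.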

MAINTAINERS

Lines changed: 6 additions & 6 deletions
@@ -12492,8 +12492,8 @@ F: arch/arm64/include/asm/kvm*
 F: arch/arm64/include/uapi/asm/kvm*
 F: arch/arm64/kvm/
 F: include/kvm/arm_*
-F: tools/testing/selftests/kvm/*/aarch64/
-F: tools/testing/selftests/kvm/aarch64/
+F: tools/testing/selftests/kvm/*/arm64/
+F: tools/testing/selftests/kvm/arm64/

 KERNEL VIRTUAL MACHINE FOR LOONGARCH (KVM/LoongArch)
 M: Tianrui Zhao <zhaotianrui@loongson.cn>
@@ -12564,8 +12564,8 @@ F: arch/s390/kvm/
 F: arch/s390/mm/gmap.c
 F: drivers/s390/char/uvdevice.c
 F: tools/testing/selftests/drivers/s390x/uvdevice/
-F: tools/testing/selftests/kvm/*/s390x/
-F: tools/testing/selftests/kvm/s390x/
+F: tools/testing/selftests/kvm/*/s390/
+F: tools/testing/selftests/kvm/s390/

 KERNEL VIRTUAL MACHINE FOR X86 (KVM/x86)
 M: Sean Christopherson <seanjc@google.com>
@@ -12582,8 +12582,8 @@ F: arch/x86/include/uapi/asm/svm.h
 F: arch/x86/include/uapi/asm/vmx.h
 F: arch/x86/kvm/
 F: arch/x86/kvm/*/
-F: tools/testing/selftests/kvm/*/x86_64/
-F: tools/testing/selftests/kvm/x86_64/
+F: tools/testing/selftests/kvm/*/x86/
+F: tools/testing/selftests/kvm/x86/

 KERNFS
 M: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arch/arm64/include/asm/kvm_host.h

Lines changed: 6 additions & 3 deletions
@@ -1166,7 +1166,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
 void kvm_arm_halt_guest(struct kvm *kvm);
 void kvm_arm_resume_guest(struct kvm *kvm);

-#define vcpu_has_run_once(vcpu) !!rcu_access_pointer((vcpu)->pid)
+#define vcpu_has_run_once(vcpu) (!!READ_ONCE((vcpu)->pid))

 #ifndef __KVM_NVHE_HYPERVISOR__
 #define kvm_call_hyp_nvhe(f, ...) \
@@ -1346,8 +1346,6 @@ static inline bool kvm_system_needs_idmapped_vectors(void)
         return cpus_have_final_cap(ARM64_SPECTRE_V3A);
 }

-static inline void kvm_arch_sync_events(struct kvm *kvm) {}
-
 void kvm_init_host_debug_data(void);
 void kvm_vcpu_load_debug(struct kvm_vcpu *vcpu);
 void kvm_vcpu_put_debug(struct kvm_vcpu *vcpu);
@@ -1555,4 +1553,9 @@ void kvm_set_vm_id_reg(struct kvm *kvm, u32 reg, u64 val);
 #define kvm_has_s1poe(k) \
         (kvm_has_feat((k), ID_AA64MMFR3_EL1, S1POE, IMP))

+static inline bool kvm_arch_has_irq_bypass(void)
+{
+        return true;
+}
+
 #endif /* __ARM64_KVM_HOST_H__ */

arch/arm64/include/uapi/asm/kvm.h

Lines changed: 0 additions & 3 deletions
@@ -43,9 +43,6 @@
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 #define KVM_DIRTY_LOG_PAGE_OFFSET 64

-#define KVM_REG_SIZE(id) \
-        (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
-
 struct kvm_regs {
         struct user_pt_regs regs; /* sp = sp_el0 */

arch/arm64/kvm/arm.c

Lines changed: 0 additions & 5 deletions
@@ -2712,11 +2712,6 @@ bool kvm_arch_irqchip_in_kernel(struct kvm *kvm)
         return irqchip_in_kernel(kvm);
 }

-bool kvm_arch_has_irq_bypass(void)
-{
-        return true;
-}
-
 int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
         struct irq_bypass_producer *prod)
 {

arch/arm64/kvm/guest.c

Lines changed: 6 additions & 8 deletions
@@ -1044,21 +1044,19 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
         }

         while (length > 0) {
-                kvm_pfn_t pfn = gfn_to_pfn_prot(kvm, gfn, write, NULL);
+                struct page *page = __gfn_to_page(kvm, gfn, write);
                 void *maddr;
                 unsigned long num_tags;
-                struct page *page;
                 struct folio *folio;

-                if (is_error_noslot_pfn(pfn)) {
+                if (!page) {
                         ret = -EFAULT;
                         goto out;
                 }

-                page = pfn_to_online_page(pfn);
-                if (!page) {
+                if (!pfn_to_online_page(page_to_pfn(page))) {
                         /* Reject ZONE_DEVICE memory */
-                        kvm_release_pfn_clean(pfn);
+                        kvm_release_page_unused(page);
                         ret = -EFAULT;
                         goto out;
                 }
@@ -1075,7 +1073,7 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
                         /* No tags in memory, so write zeros */
                         num_tags = MTE_GRANULES_PER_PAGE -
                                 clear_user(tags, MTE_GRANULES_PER_PAGE);
-                        kvm_release_pfn_clean(pfn);
+                        kvm_release_page_clean(page);
                 } else {
                         /*
                          * Only locking to serialise with a concurrent
@@ -1097,7 +1095,7 @@ int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
                         else
                                 set_page_mte_tagged(page);

-                        kvm_release_pfn_dirty(pfn);
+                        kvm_release_page_dirty(page);
                 }

                 if (num_tags != MTE_GRANULES_PER_PAGE) {

arch/arm64/kvm/mmu.c

Lines changed: 7 additions & 8 deletions
@@ -1475,6 +1475,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
         long vma_pagesize, fault_granule;
         enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
         struct kvm_pgtable *pgt;
+        struct page *page;
         enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;

         if (fault_is_perm)
@@ -1604,7 +1605,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,

         /*
          * Read mmu_invalidate_seq so that KVM can detect if the results of
-         * vma_lookup() or __gfn_to_pfn_memslot() become stale prior to
+         * vma_lookup() or __kvm_faultin_pfn() become stale prior to
          * acquiring kvm->mmu_lock.
          *
          * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
@@ -1613,8 +1614,8 @@
         mmu_seq = vcpu->kvm->mmu_invalidate_seq;
         mmap_read_unlock(current->mm);

-        pfn = __gfn_to_pfn_memslot(memslot, gfn, false, false, NULL,
-                write_fault, &writable, NULL);
+        pfn = __kvm_faultin_pfn(memslot, gfn, write_fault ? FOLL_WRITE : 0,
+                &writable, &page);
         if (pfn == KVM_PFN_ERR_HWPOISON) {
                 kvm_send_hwpoison_signal(hva, vma_shift);
                 return 0;
@@ -1627,7 +1628,7 @@
          * If the page was identified as device early by looking at
          * the VMA flags, vma_pagesize is already representing the
          * largest quantity we can map. If instead it was mapped
-         * via gfn_to_pfn_prot(), vma_pagesize is set to PAGE_SIZE
+         * via __kvm_faultin_pfn(), vma_pagesize is set to PAGE_SIZE
          * and must not be upgraded.
          *
          * In both cases, we don't let transparent_hugepage_adjust()
@@ -1734,15 +1735,13 @@
         }

 out_unlock:
+        kvm_release_faultin_page(kvm, page, !!ret, writable);
         kvm_fault_unlock(kvm);

         /* Mark the page dirty only if the fault is handled successfully */
-        if (writable && !ret) {
-                kvm_set_pfn_dirty(pfn);
+        if (writable && !ret)
                 mark_page_dirty_in_slot(kvm, memslot, gfn);
-        }

-        kvm_release_pfn_clean(pfn);
         return ret != -EAGAIN ? ret : 0;
 }

arch/loongarch/include/asm/kvm_host.h

Lines changed: 0 additions & 1 deletion
@@ -303,7 +303,6 @@ static inline bool kvm_is_ifetch_fault(struct kvm_vcpu_arch *arch)

 /* Misc */
 static inline void kvm_arch_hardware_unsetup(void) {}
-static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
 static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
