kubectl -> kube-apiserver -> etcd -> initializers -> control loops -> kubelet -> wrap up
kubectl
validation and generators -> api groups and version negotiation -> client auth
All attempts to access or change state in the Kubernetes system go through the API server, which in turn communicates with etcd.
What’s worth pointing out before we continue is that Kubernetes uses a versioned API that is categorised into “API groups”. An API group is meant to categorise similar resources so that they’re easier to reason about (for example, Deployments live in the apps group).
After kubectl generates the runtime object, it starts to find the appropriate API group and version for it and then assembles a versioned client that is aware of the various REST semantics for the resource. This discovery is automatic: kubectl queries kube-apiserver’s /apis discovery endpoints for the groups and versions it serves, and caches the result on disk so that subsequent invocations are cheap.
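As a concrete illustration (not kubectl’s actual code), here is a minimal sketch of the same negotiation using client-go’s discovery client, assuming a kubeconfig at the default ~/.kube/config path:

```go
package main

import (
	"fmt"
	"path/filepath"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	// Load the same kubeconfig kubectl would use.
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}

	// The discovery client hits the /apis endpoint on kube-apiserver,
	// the same endpoint kubectl uses to negotiate groups and versions.
	dc, err := discovery.NewDiscoveryClientForConfig(config)
	if err != nil {
		panic(err)
	}
	groups, err := dc.ServerGroups()
	if err != nil {
		panic(err)
	}
	for _, g := range groups.Groups {
		fmt.Printf("group=%q preferredVersion=%q\n", g.Name, g.PreferredVersion.GroupVersion)
	}
}
```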
kube-apiserver
authentication -> authorization -> admission control
As we’ve already mentioned, kube-apiserver is the primary interface that clients and system components use to persist and retrieve cluster state. To perform its function, it needs to be able to verify that the requester is who they say they are. This process is called authentication.
Whilst authorization is focused on answering whether a user has permission to perform an action, admission controllers intercept the request to ensure that it matches the wider expectations and rules of the cluster. A few examples (a ResourceQuota sketch follows this list):
- InitialResources: sets default resource limits on containers based on past usage
- LimitRanger: sets defaults for a container’s requests and limits, and enforces upper bounds for certain resources
- ResourceQuota: calculates and denies requests if the number of objects or the amount of resources consumed in a namespace would exceed its quota
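To make the last of these concrete, here is a rough sketch of creating a ResourceQuota with client-go; the quota name, namespace, and limit are illustrative:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// Once this quota exists, the ResourceQuota admission controller
	// rejects any create request that would push "default" past 10 Pods.
	quota := &corev1.ResourceQuota{
		ObjectMeta: metav1.ObjectMeta{Name: "pod-quota", Namespace: "default"},
		Spec: corev1.ResourceQuotaSpec{
			Hard: corev1.ResourceList{
				corev1.ResourcePods: resource.MustParse("10"),
			},
		},
	}
	if _, err := clientset.CoreV1().ResourceQuotas("default").Create(
		context.TODO(), quota, metav1.CreateOptions{},
	); err != nil {
		panic(err)
	}
}
```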
etcd
Well, there’s a pretty complicated series of setup steps that happens before any request is served:
- When the kube-apiserver binary is run, it creates a server chain, which allows apiserver aggregation. This is basically a way of supporting multiple apiservers (we don’t need to worry about this).
- When this happens, a generic apiserver is created that serves as a default implementation.
- The generated OpenAPI schema populates the apiserver’s configuration.
- kube-apiserver then iterates over all the API groups specified in the schema and configures a storage provider for each that serves as a generic storage abstraction. This is what kube-apiserver talks to when it accesses or mutates the state of a resource.
- For every API group it also iterates over each of the group versions and installs the REST mappings for every HTTP route. This allows kube-apiserver to map incoming requests and delegate to the correct logic once it finds a match.
- For our specific use case, a POST handler is registered, which in turn will delegate to a create resource handler (a toy sketch of this route registration follows this list).
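kube-apiserver builds its HTTP routes with the go-restful library. Here is a toy sketch of the route-to-handler idea, not the real kube-apiserver code; the path and handler body are simplified stand-ins:

```go
package main

import (
	"net/http"

	restful "github.com/emicklei/go-restful/v3"
)

// createHandler stands in for kube-apiserver's create resource handler:
// in the real thing it decodes the body, validates it, runs admission,
// and persists the object via the storage provider.
func createHandler(req *restful.Request, resp *restful.Response) {
	namespace := req.PathParameter("namespace")
	resp.WriteHeaderAndJson(http.StatusCreated,
		map[string]string{"namespace": namespace}, restful.MIME_JSON)
}

func main() {
	ws := new(restful.WebService)
	ws.Path("/apis/apps/v1")

	// Install a REST mapping: a POST to this route dispatches to createHandler.
	ws.Route(ws.POST("/namespaces/{namespace}/deployments").To(createHandler))

	restful.Add(ws)
	http.ListenAndServe(":8080", nil)
}
```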
Now let’s imagine our HTTP request has flown in:
- If the handler chain can match the request to a set pattern (i.e. to the routes we registered), it will dispatch the dedicated handler that was registered for the route. Otherwise it falls back to a path-based handler (this is what happens when you call /apis). If no handlers are registered for that path, a not-found handler is invoked, which results in a 404.
- Luckily for us, we have a registered route called createHandler! What does it do? Well, it will first decode the HTTP request and perform basic validation, such as ensuring the JSON the client provided correlates with our expectation of the versioned API resource.
- Auditing and final admission will occur.
- The resource will be saved to etcd by delegating to the storage provider. Usually the etcd key will be in the form <namespace>/<name>, but this is configurable (see the sketch after this list).
- Any create errors are caught and, finally, the storage provider performs a get call to ensure the object was actually created. It then invokes any post-create handlers and decorators if additional finalization is required.
- The HTTP response is constructed and sent back.
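If you have direct access to etcd, you can peek at this key layout yourself. Below is a sketch using etcd’s Go client, assuming an unauthenticated etcd reachable on localhost:2379 and the default /registry key prefix:

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// kube-apiserver stores objects under /registry/<resource>/<namespace>/<name>.
	resp, err := cli.Get(context.TODO(), "/registry/deployments/default/",
		clientv3.WithPrefix(), clientv3.WithKeysOnly())
	if err != nil {
		panic(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Println(string(kv.Key))
	}
}
```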
initializers
An initializer is a controller that is associated with a resource type and performs logic on the resource before it’s made available to the outside world.
InitializerConfiguration objects allow you to declare which initializers should run for certain resource types. After creating such a config, the apiserver will append custom-pod-initializer to every Pod’s metadata.initializers.pending field.
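Initializers were an alpha feature and were later removed (in Kubernetes 1.14) in favour of admission webhooks, but for illustration, here is roughly what such a configuration looked like using the old v1alpha1 Go types. The names mirror the example above and are illustrative:

```go
package example

import (
	admv1alpha1 "k8s.io/api/admissionregistration/v1alpha1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newPodInitializerConfig declares an initializer that must run for every
// Pod before it is made visible; the apiserver appends the initializer's
// name to each new Pod's metadata.initializers.pending field.
func newPodInitializerConfig() *admv1alpha1.InitializerConfiguration {
	return &admv1alpha1.InitializerConfiguration{
		ObjectMeta: metav1.ObjectMeta{Name: "custom-pod-initializer"},
		Initializers: []admv1alpha1.Initializer{{
			// Real initializer names had to be fully qualified,
			// e.g. "custom-pod-initializer.example.com".
			Name: "custom-pod-initializer",
			Rules: []admv1alpha1.Rule{{
				APIGroups:   []string{"*"},
				APIVersions: []string{"*"},
				Resources:   []string{"pods"},
			}},
		}},
	}
}
```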
Control loops
deployments controller -> replicaSets controller -> informers -> scheduler
When we think about it, a Deployment is really just a collection of ReplicaSets, and a ReplicaSet is a collection of Pods.
When our Deployment first becomes available, a handler registered by the Deployment controller will be executed, and it will start by adding the object to an internal work queue (a sketch of this enqueue pattern follows).
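A minimal sketch of that enqueue pattern with client-go’s workqueue package; the informer wiring around it is omitted:

```go
package example

import (
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
)

// queue holds "namespace/name" keys for objects that need reconciling.
var queue = workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

// enqueueDeployment is the kind of add-handler a controller registers:
// it derives a stable key for the object and puts it on the work queue,
// where worker goroutines later pick it up and synchronise state.
func enqueueDeployment(obj interface{}) {
	key, err := cache.MetaNamespaceKeyFunc(obj)
	if err != nil {
		return // can't derive a key; drop the object
	}
	queue.Add(key)
}
```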
When the object is picked up off the queue, the controller will begin a scaling process to start resolving state. It does this by rolling out (i.e. creating) a ReplicaSet resource, assigning it a label selector, and giving it the revision number of 1. A label selector is just a set of key/value labels that lets a controller identify which child resources belong to it.
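For instance, a ReplicaSet’s selector might look like this in Go types; the label values here are illustrative:

```go
package example

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// A ReplicaSet with this selector "owns" exactly the Pods whose labels
// include both app=nginx and the generated pod-template-hash value.
var selector = &metav1.LabelSelector{
	MatchLabels: map[string]string{
		"app":               "nginx",
		"pod-template-hash": "5d59d67564", // hash value is illustrative
	},
}
```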
At this point, the Deployment controller has created our Deployment’s first ReplicaSet, but we still have no Pods. This is where the ReplicaSet controller comes into play! Its job is to monitor the lifecycle of ReplicaSets and their dependent resources (Pods). Like most other controllers, it does this by triggering handlers on certain events.
Kubernetes enforces object hierarchies through Owner References (a field in the child resource that references the name and UID of its parent).
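Here is a sketch of reading that field back with client-go; the ReplicaSet name is hypothetical:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// "nginx-5d59d67564" is a hypothetical ReplicaSet created by a Deployment.
	rs, err := clientset.AppsV1().ReplicaSets("default").Get(
		context.TODO(), "nginx-5d59d67564", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	// Each owner reference points back at the parent's kind, name, and UID.
	for _, ref := range rs.OwnerReferences {
		fmt.Printf("owned by %s %q (uid=%s)\n", ref.Kind, ref.Name, ref.UID)
	}
}
```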
An informer is a pattern that allows controllers to subscribe to storage events and easily list resources they’re interested in.
Apart from providing an abstraction which is nice to work with, the informer also takes care of a lot of the nuts and bolts, such as caching. To be clear, what gets cached is not the event stream: the informer uses a watch to keep a local, always-up-to-date copy of the objects themselves, so lists and gets can be served from memory instead of round-tripping to kube-apiserver. This reduces unnecessary kube-apiserver connections and duplicate serialization costs on both the server and controller side.
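A minimal sketch of the informer pattern using client-go’s SharedInformerFactory; note that the final list is served from the informer’s local cache rather than by a round trip to kube-apiserver:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// Informers created from one factory share a watch connection and
	// an in-memory cache per resource type.
	factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)
	podInformer := factory.Core().V1().Pods()

	// Subscribe to storage events: these fire as the watch delivers changes.
	podInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { fmt.Println("pod added") },
		DeleteFunc: func(obj interface{}) { fmt.Println("pod deleted") },
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	// Served from the informer's local cache -- no kube-apiserver round
	// trip and no extra deserialization.
	pods, err := podInformer.Lister().Pods("default").List(labels.Everything())
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d pods in cache\n", len(pods))
	<-stop
}
```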
What’s interesting is that both predicate and priority functions are extensible and can be defined by using the --policy-config-file flag. This introduces a degree of flexibility into scheduling: administrators can choose which predicates and priorities run (and their weights) without recompiling the scheduler.
kubelet
pod sync -> CRI and pause containers -> CNI and pod networking -> inter-host networking -> container startup
The kubelet is an agent that runs on every node in a Kubernetes cluster and is responsible for, among other things, managing the lifecycle of Pods.
A useful way of thinking about the kubelet is again like a controller! It queries Pods from kube-apiserver every 20 seconds (this is configurable), filtering for the ones whose NodeName matches the name of the node the kubelet is running on. Once it has that list, it detects new additions by comparing against its own internal cache and begins to synchronise state if any discrepancies exist.
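That filter is a field selector on spec.nodeName. Here is a sketch of an equivalent query with client-go; the node name worker-1 is hypothetical:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// Ask kube-apiserver only for Pods bound to this node -- the same
	// filter the kubelet applies to find its workload.
	pods, err := clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
		FieldSelector: "spec.nodeName=worker-1",
	})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		fmt.Printf("%s/%s\n", p.Namespace, p.Name)
	}
}
```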
In this runtime, creating a sandbox involves creating a “pause” container. A pause container serves as a parent for all of the other containers in the Pod, since it hosts a lot of the pod-level resources that workload containers will end up using.
These pod-level resources are Linux namespaces (the network and IPC namespaces, for example): the “pause” container provides a way to host all of these namespaces and allow child containers to share them.