[algorithm] Think Least_Conn In Math
Abstract
least_conn is a load-balancing algorithm that sends each incoming request to the server with the fewest active connections.
- least_conn tries to quantify load by the number of connections, but that number also depends on how long each request takes to process
- Some basic rules:
  - if a real server's CPU usage is lower than it should be, then that server must have a performance issue: higher latency in other components, or lower CPU performance
Introduction
In Math
When distributing requests across n servers, least_conn tries to achieve one goal: every server is as busy as the others. The total time t that the n servers spend finishing their requests is:
$$
\begin{align}
&t = \sum_{i=1}^n t_i = \sum_{i=1}^n (t_{cpu\_i} + t_{other\_i}) \\
&{t_{i}}\text{ : the time the }{i^{th}} \text{ server takes to finish the requests distributed to it} \\
&{t_{cpu\_i}}\text{ : the CPU time the }{i^{th}}\text{ server spends on those requests} \\
&{t_{other\_i}}\text{ : the remaining (non-CPU) time the }{i^{th}}\text{ server spends on those requests}
\end{align}
$$
What the least_conn algorithm (with equal weights) wants to achieve is:
$$
\begin{align}
\\
t_i &\approx t_{i+1} \\
t_{cpu\_i} + t_{other\_i} &\approx t_{cpu\_i+1} + t_{other\_i+1} \\
\\
\end{align}
$$
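The selection rule behind this goal can be sketched in a few lines of Python. This is only a minimal illustration of the idea for equal weights, not the code of any real load balancer; the class name `LeastConnBalancer` and its `acquire`/`release` methods are hypothetical names chosen for this example.

```python
class LeastConnBalancer:
    """Toy least_conn picker for servers with equal weight (illustrative only)."""

    def __init__(self, servers):
        # Number of in-flight requests currently assigned to each server.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        """Pick the server with the fewest active connections and count the new one."""
        # min() returns the first minimal key, so ties go to the earliest-listed server.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Mark one request on `server` as finished."""
        self.active[server] -= 1


balancer = LeastConnBalancer(["s1", "s2", "s3"])
first_three = [balancer.acquire() for _ in range(3)]  # one request per server
balancer.release("s2")                                # s2 finishes its request
next_pick = balancer.acquire()                        # s2 now has the fewest connections
print(first_three, next_pick)
```

Because a server is only handed a new request once its connection count drops, a server that finishes requests slowly is automatically given fewer of them.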
Proof: a server with lower CPU usage must have a performance issue
Conditions:
- all servers have the same configuration, including software and hardware
- all servers have the same weight
Proof in math:
$$
\begin{align}
&\text{Proof:} \\
&\because \ \exists \ t_{cpu\_i}\downarrow \ \land \ t_{cpu\_i} + t_{other\_i} \approx t_{cpu\_i+1} + t_{other\_i+1} \\
&\therefore \ t_{other\_i}\uparrow \\
&\text{Q.E.D.}
\end{align}
$$
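The argument can also be checked with a small simulation. The sketch below is a simplified closed-loop model, not a measurement of a real balancer: a fixed number of clients each keep one request in flight, every request costs the same CPU time on every server, and only the non-CPU time differs per server. It does not model CPU contention. The function name `simulate_least_conn` and all parameter values are assumptions made for this illustration.

```python
import heapq


def simulate_least_conn(cpu_time, other_time, total_requests, concurrency):
    """Simulate least_conn over servers that differ only in non-CPU time.

    cpu_time:   CPU time per request (same on every server)
    other_time: list with the non-CPU time per request for each server
    Returns (requests served per server, CPU usage per server as a fraction).
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    n = len(other_time)
    active = [0] * n   # in-flight requests per server
    served = [0] * n   # completed requests per server
    events = []        # min-heap of (finish_time, sequence_number, server)
    dispatched = 0
    now = 0.0

    def dispatch(at):
        nonlocal dispatched
        # least_conn: fewest active connections, ties broken by server index.
        server = min(range(n), key=lambda s: (active[s], s))
        active[server] += 1
        finish = at + cpu_time + other_time[server]
        heapq.heappush(events, (finish, dispatched, server))
        dispatched += 1

    # Each client starts with one request in flight.
    for _ in range(min(concurrency, total_requests)):
        dispatch(now)

    # Every completion frees a client, which immediately sends the next request.
    while events:
        now, _, server = heapq.heappop(events)
        active[server] -= 1
        served[server] += 1
        if dispatched < total_requests:
            dispatch(now)

    cpu_usage = [served[s] * cpu_time / now for s in range(n)]
    return served, cpu_usage


# Server 2 has the same CPU cost but a much higher non-CPU latency.
served, cpu_usage = simulate_least_conn(
    cpu_time=1.0, other_time=[1.0, 1.0, 5.0], total_requests=600, concurrency=6
)
print(served, cpu_usage)
```

In this model the server with the larger non-CPU time holds the same number of connections as the others but completes fewer requests, so its CPU usage is lower. That is the relationship the proof describes, read in the other direction: low CPU usage on one server points at higher time spent elsewhere.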
Conclusion - Issues:
- CPU performance:
  - higher CPU steal: overselling in cloud or container environments
  - lower CPU frequency: caused by a bad power policy or other factors
  - CPU throttling: CFS/cgroup
- latency of other components:
  - network: bandwidth / pps / others
  - io: read/write performance issues
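To make the CPU steal item concrete, the following sketch computes the share of steal time between two samples of the aggregate cpu line from /proc/stat on Linux. The helper names `cpu_times` and `steal_percent` are hypothetical, and the two sample lines are made-up numbers used only to show the arithmetic. The guest fields are left out of the total on the assumption that they are already counted in user and nice.

```python
def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of counters."""
    fields = stat_line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def steal_percent(before, after):
    """Percentage of CPU time reported as steal between two samples."""
    deltas = [b - a for a, b in zip(cpu_times(before), cpu_times(after))]
    # Fields in order: user nice system idle iowait irq softirq steal (guest fields skipped).
    total = sum(deltas[:8])
    return 100.0 * deltas[7] / total if total else 0.0


# Illustrative samples, not taken from a real machine.
sample_before = "cpu 100 0 50 800 10 0 5 0 0 0"
sample_after = "cpu 200 0 100 1500 20 0 10 70 0 0"
print(steal_percent(sample_before, sample_after))
```

A value that stays well above zero on one VM while its peers report almost none would be consistent with the overselling case listed above.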
Conclusion - Troubleshooting:
- CPU performance:
  - VM: monitor CPU steal metrics
    - metrics: CPU steal, CPU frequency or power policy
    - tools (command): top or htop, sar or vmstat
  - Container: monitor the cgroup's cpu.stat statistics
    - metrics:
      - nr_periods: Number of enforcement intervals that have elapsed.
      - nr_throttled: Number of times the group has been throttled/limited.
      - throttled_time: The total time duration (in nanoseconds) for which entities of the group have been throttled.
      - nr_bursts: Number of periods in which a burst occurred.
      - burst_time: Cumulative wall-time (in nanoseconds) that any CPUs have used above quota in the respective periods.
    - tools (command): cat /sys/fs/cgroup/cpu,cpuacct/*/cpu.stat
- latency of other components:
  - network:
    - metrics: bandwidth, pps
  - io:
    - metrics: iops, avg queue size
    - tools (command): iostat, sar
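For the container case, the cpu.stat counters are easier to interpret as a ratio. The sketch below parses text in the "name value" format of cpu.stat and reports which fraction of enforcement periods were throttled. The function names `parse_cpu_stat` and `throttle_ratio` are hypothetical, and the sample text contains invented numbers rather than output from a real cgroup.

```python
def parse_cpu_stat(text):
    """Parse 'name value' lines, as found in a cgroup cpu.stat file, into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttle_ratio(stats):
    """Fraction of enforcement periods in which the group was throttled."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


# Illustrative content; real values come from the cpu.stat file of the cgroup.
sample = """nr_periods 200
nr_throttled 50
throttled_time 1500000000
"""
stats = parse_cpu_stat(sample)
print(throttle_ratio(stats), stats["throttled_time"] / 1e9)
```

Since these counters are cumulative, comparing two readings taken some time apart says more about current throttling than a single snapshot does.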