TarDiBa - Database Server
=========================

Operating System
----------------

Kernel
~~~~~~

Patches
^^^^^^^

- linux-vserver (included in vserver package and applied automatically)
- grsecurity (we have to use stable Linux-VServer + grsecurity patch)

Filesystem
~~~~~~~~~~

Filesystem Layout
^^^^^^^^^^^^^^^^^

==========  ======  ====
mountpoint  fstype  size
==========  ======  ====
/
/boot
/usr
/home
/data
==========  ======  ====

- filesystem, probably reiserfs (fast)
- tweaked boot for high performance
- tweaked kernel, highmem, disk access and stuff

Database System
---------------

- various maintenance scripts (for now we use them for backups
  and vacuuming)
- jump to pg8

Autovacuuming
~~~~~~~~~~~~~

- autovacuuming has to be disabled in postgresql.conf

Logging
~~~~~~~

Abstract
^^^^^^^^

PostgreSQL has a logging facility that writes every statement to the log
before it gets executed.
Under certain conditions those logs are of great interest, mainly for
legal and profiling reasons.

Problem
^^^^^^^

The problem is that if I enable it, the system runs about 10 times slower
because of the huge statements that have to be written to disk.

On a busy server the logs can easily reach tens of MB per day:
30-40 MB, but that is quite rare;
usually they stay under 10 MB.

Solution
^^^^^^^^

- implementing a logging facility that writes to RAM and flushes the
  logs to disk in the background

- usually I am only interested in those statements

We should perhaps consider flushing the logs from the ramdisk whenever
we restart pg.

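
The RAM-buffered logging idea could be sketched roughly as follows. This is only an illustration under assumptions, not how PostgreSQL logs: the class ``RamLogBuffer``, its methods, and the log path are hypothetical names. Callers only append to memory, a background thread appends the collected lines to disk at a fixed interval, and ``stop()`` performs the final flush that would be needed before restarting pg.

```python
import threading
from collections import deque


class RamLogBuffer:
    """Collect log lines in RAM and append them to a file in batches."""

    def __init__(self, path, interval=5.0):
        self.path = path              # hypothetical on-disk log file
        self.interval = interval      # seconds between background flushes
        self._lines = deque()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def log(self, line):
        # Cheap for the caller: only an in-memory append, no disk access.
        with self._lock:
            self._lines.append(line)

    def flush(self):
        # Swap the buffer out under the lock, then write without holding it.
        with self._lock:
            pending, self._lines = self._lines, deque()
        if pending:
            with open(self.path, "a", encoding="utf-8") as fh:
                fh.writelines(entry + "\n" for entry in pending)
        return len(pending)

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self.interval):
            self.flush()

    def stop(self):
        # Call this before restarting pg so nothing is left only in RAM.
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()
        self.flush()
```

The trade-off is the usual one for write-behind buffers: anything logged since the last flush is lost if the machine goes down, so the interval bounds how much log can disappear.
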

Tasks
^^^^^

- we need to test a postgresql.conf option that only logs statements
  that take longer than a specified period (e.g. 5 min)

- problem is that this option doesn't work too well, just like the
  option to kill idle statements

- problem is that pg does not recognize blocked processes like the
  ones which are idle in transaction. an idle-in-transaction process
  blocks the whole db (no selects, no updates, no nothing). the only way
  out of it is to kill -9 the process

- killing postgresql processes

  - postgresql has 2 kinds of processes (the parent postmaster
    process, which is always active, and child postmaster processes,
    which are forked from the parent when it receives a connection)
  - problem is that the children are forks, so whenever I kill a child,
    all children die, which sucks
  - so instead of killing children we usually restart the postmaster, but
    there has to be another solution; I was just unable to find it
  - is there any possibility to kill a forked child without killing all
    children? (maybe there are some kill options for that)
  - normally each child should have its own PID
  - I use kill -9 $PID
  - most probably the parent "thinks" there are problems if one
    child dies and kills them all
  - after that only the parent postmaster is alive. I couldn't get a
    conclusive answer from #postgresql. all that I got was
    'Don't kill -9 postmaster.'

- maybe instead of trying to kill processes we should add pgpool
  capabilities to TarDiBa, which would also bring other benefits
  like a fixed number of connections and the possibility to play
  with replication and load balancing.
- we have to implement somehow in the sv interface the ability to
  send a SIGINT signal to the backend. SIGINT forces a fast
  shutdown, even if clients are connected.

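
A minimal sketch of the signal handling discussed above, assuming the signal goes to the parent postmaster, whose PID is on the first line of ``postmaster.pid``. The PostgreSQL documentation maps SIGTERM, SIGINT and SIGQUIT to the smart, fast and immediate shutdown modes; the function names and the injectable ``send`` parameter are hypothetical and exist only for illustration.

```python
import os
import signal

# Shutdown modes documented for the postmaster (the parent process).
POSTMASTER_SIGNALS = {
    "smart": signal.SIGTERM,      # wait for clients to disconnect
    "fast": signal.SIGINT,        # roll back open transactions, disconnect clients
    "immediate": signal.SIGQUIT,  # abort without a clean shutdown
}


def shutdown_signal(mode):
    """Return the signal for a postmaster shutdown mode."""
    try:
        return POSTMASTER_SIGNALS[mode]
    except KeyError:
        raise ValueError("unknown shutdown mode: %r" % mode)


def read_postmaster_pid(pidfile):
    """Read the postmaster PID from the first line of postmaster.pid."""
    with open(pidfile, encoding="ascii") as fh:
        return int(fh.readline().strip())


def request_shutdown(pidfile, mode="fast", send=os.kill):
    """Send the shutdown signal; `send` is injectable so tests can fake it."""
    pid = read_postmaster_pid(pidfile)
    send(pid, shutdown_signal(mode))
    return pid
```

SIGKILL is deliberately absent from the table: when a backend dies from kill -9 the postmaster treats it as a crash and resets all other backends, which matches the "all children die" behaviour noted in the task list.
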

Backup System
-------------

- backup is done by shell scripts and cron (change? not likely)
- backups on other machines (over scp for other linux hosts, samba
  for `secure` Windows2000???, possibly a professional backup app, although I
  don't think it is necessary, except maybe for whole-system recovery)
- we should investigate synbak, which seems nice; even though it
  doesn't have PostgreSQL support, that doesn't look too hard
  to implement.

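
To make the dump-then-copy flow concrete, here is a small sketch that only builds the two command lines a cron job would run. It is not the existing shell scripts: the function names, the ``/data/backup`` directory and the file naming scheme are assumptions. ``pg_dump -F c -f FILE DBNAME`` and ``scp FILE HOST:DIR/`` are the standard invocations.

```python
import datetime


def backup_filename(dbname, when):
    """Name the dump after the database and a sortable timestamp."""
    return "%s-%s.dump" % (dbname, when.strftime("%Y%m%d-%H%M"))


def pg_dump_command(dbname, outfile):
    # -F c selects pg_dump's custom archive format, -f names the output file.
    return ["pg_dump", "-F", "c", "-f", outfile, dbname]


def scp_command(localfile, host, remotedir):
    # Copy the finished dump to another Linux host.
    return ["scp", localfile, "%s:%s/" % (host, remotedir)]


def backup_plan(dbname, host, remotedir, when=None, localdir="/data/backup"):
    """Return the commands to run, in order; localdir is an assumed path."""
    when = when or datetime.datetime.now()
    outfile = "%s/%s" % (localdir, backup_filename(dbname, when))
    return [pg_dump_command(dbname, outfile), scp_command(outfile, host, remotedir)]
```

Each returned list can be passed to ``subprocess.run(cmd, check=True)``, so a failed dump stops the job before anything is copied.
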

System Monitoring and Notification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Abstract
^^^^^^^^

In order to prevent catastrophic events we need proper information
about the state of our servers.
Our catastrophic events fall into three categories:

- hardware failures (for which we need early warning), especially hard drive
  failures (RAM testing would not be a bad idea, but I think we must pass on
  it due to the horrific time that a full RAM test takes).
- OS-related failures (which are not failures per se, but states of incapacity
  to fulfil the demand). In this category we need to monitor the hard drive
  usage of our backups and postgres runaway processes, and establish CPU and
  RAM usage patterns (i.e. CPU and RAM usage peaks and their causes).
- postgres-related failures, in which category we include unpredictable DB
  behaviours like shutdowns, inability to serve certain queries, and, most
  important, queries (i.e. processes) in a dubious state (e.g. IDLE IN TRANSACTION).

Our monitoring solution must gather information about all these issues. As
a bonus we would like to gather some information about DB patterns, like the most
used queries, the longest-running SQL statements, connection IPs and times,
and so on. Once this information is gathered it must be delivered in the form of a
report via mail or other means in a prioritized fashion (in order to easily
recognize the most important items). On the other hand we want that information stored
locally on the server too, in a DB, for profiling reasons.

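
The prioritized report could look something like the sketch below. The ``Event`` type, the three severity levels and the line format are all assumptions made for illustration; the only idea taken from the text is that the most important items come first.

```python
from dataclasses import dataclass

# Lower rank sorts first; the three levels are an assumed scheme.
SEVERITY_ORDER = {"critical": 0, "warning": 1, "info": 2}


@dataclass
class Event:
    category: str   # "hardware", "os" or "postgres"
    severity: str   # a key of SEVERITY_ORDER
    message: str


def build_report(events):
    """Return the report body with the most severe events first."""
    # sorted() is stable, so events of equal severity keep their input order.
    ranked = sorted(events, key=lambda e: SEVERITY_ORDER[e.severity])
    lines = ["[%s] %s: %s" % (e.severity.upper(), e.category, e.message)
             for e in ranked]
    return "\n".join(lines)
```

The same ``Event`` records could be inserted into the local profiling DB unchanged, so mail delivery and storage would share one data shape.
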

- monitoring disk space, backup file sizes, excerpts from
  certain log files, system status and stats (CPU usage, RAM, etc.), and
  pg sessions, because the internal pg monitor sucks; we need a good policy on
  killing idle sessions and runaway processes.

- monitoring network traffic is a good measurement as well. On the one hand it
  helps to find possible network-related bottlenecks; on the other it also
  provides valuable information for intrusion detection.

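
A policy for killing idle sessions can be expressed as a pure function over session records, for example rows read from ``pg_stat_activity``. The sketch below assumes hypothetical dictionary keys (``pid``, ``state``, ``since``), because the actual column names differ between PostgreSQL versions, and the 5 minute limit is only a placeholder.

```python
import datetime


def sessions_to_kill(sessions, now, max_idle=datetime.timedelta(minutes=5)):
    """Return PIDs of sessions idle in a transaction longer than max_idle.

    `sessions` is a list of dicts with the hypothetical keys "pid",
    "state" and "since" (when the session entered its current state).
    """
    victims = []
    for s in sessions:
        if s["state"] == "idle in transaction" and now - s["since"] > max_idle:
            victims.append(s["pid"])
    return victims
```

Keeping the decision separate from the action makes the policy easy to test and to report on before anything is actually terminated. For the action itself, SIGTERM to the single backend (or ``pg_terminate_backend`` in versions that provide it) would be preferable to kill -9.
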

Possible tools:

- sancp (prelude aware!)