								TarDiBa - Database Server

								=========================


								Operating System

								----------------


								Kernel

								~~~~~~


								Patches

								^^^^^^^


								- linux-vserver (included in vserver package and applied automatically)

								- grsecurity (we have to use stable Linux-VServer + grsecurity patch)


								Filesystem

								~~~~~~~~~~


								Filesystem Layout

								^^^^^^^^^^^^^^^^^


								mountpoint fstype size


								/

								/boot

								/usr

								/home

								/data


								- filesystem, probably reiserfs (fast)

								- tweaked boot for high performance

								- tweaked kernel, highmem, disk access and stuff


								Database System

								---------------


								- various maintenance scripts (for now used for backups

								and vacuuming)

								- jump to pg8


								Autovacuuming

								~~~~~~~~~~~~~


								- autovacuuming has to be disabled in postgresql.conf
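
A minimal sketch of how a maintenance script could verify this setting. It assumes a PostgreSQL version where autovacuum is an integrated `autovacuum` setting in postgresql.conf; the function names `read_settings` and `autovacuum_disabled` are hypothetical, and the parser only handles plain `name = value` lines, not include directives.

```python
import re

# name, then "=" or whitespace, then a quoted or bare value; comment lines never match
_SETTING = re.compile(
    r"^\s*([A-Za-z_][A-Za-z0-9_.]*)(?:\s*=\s*|\s+)('(?:[^']|'')*'|[^#\s]+)"
)


def read_settings(conf_text):
    """Return the last value given for each setting in postgresql.conf text."""
    settings = {}
    for line in conf_text.splitlines():
        match = _SETTING.match(line)
        if match:
            name, value = match.groups()
            settings[name.lower()] = value.strip("'")
    return settings


def autovacuum_disabled(conf_text):
    """True only if autovacuum is explicitly switched off in the file."""
    value = read_settings(conf_text).get("autovacuum")
    return value is not None and value.lower() in ("off", "false", "no", "0")
```

The check deliberately requires an explicit entry, because the built-in default differs between server versions.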


								Logging

								~~~~~~~


								Abstract

								^^^^^^^^


								postgresql has a logging facility that writes every statement to the log

								before it gets executed.

								under certain conditions those logs are of great interest,

								for example for legal and profiling reasons.


								Problem

								^^^^^^^


								the problem is that if i enable it the system gets 10 times slower,

								because of the huge statements that have to be written to disk.


								On a busy server the logs can easily reach tens of MB per day.

								30-40 MB is possible, but that is quite rare;

								usually it is under 10 MB.


								Solution

								^^^^^^^^


								- implement a logging facility that writes to RAM and flushes the logs

								  to disk in the background


								- usually i am only interested in those statements


								we should perhaps also flush the logs from the ramdisk whenever we

								restart pg
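
The background flush could be sketched roughly like this. It assumes the server writes rotated log files into a ramdisk directory and that the most recently modified file is the one still being written; the function name `flush_finished_logs` and both directory arguments are hypothetical.

```python
import os
import shutil


def flush_finished_logs(ram_dir, disk_dir, flush_all=False):
    """Move log files from ram_dir to disk_dir and return the moved names.

    The newest file is assumed to be the active log and is left in place,
    unless flush_all is set (for example right before restarting pg).
    """
    os.makedirs(disk_dir, exist_ok=True)
    names = [
        name for name in os.listdir(ram_dir)
        if os.path.isfile(os.path.join(ram_dir, name))
    ]
    # oldest first, so the last entry is the file most likely still in use
    names.sort(key=lambda name: (os.path.getmtime(os.path.join(ram_dir, name)), name))
    if not flush_all:
        names = names[:-1]
    for name in names:
        shutil.move(os.path.join(ram_dir, name), os.path.join(disk_dir, name))
    return names
```

Run from cron this keeps the ramdisk small; calling it with `flush_all=True` from the restart script would cover the restart case mentioned above.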


								Tasks

								^^^^^


								- we need to test the postgresql.conf option that only logs statements

								  that take longer than a specified period (log_min_duration_statement,

								  e.g. 5 min)


								- the problem is that this option doesn't work too well, much like the

								  option to kill idle statements


								- another problem is that pg does not recognize blocked processes like the

								  ones which are idle in transaction. an idle in transaction process

								  blocks the whole db (no selects, no updates, no nothing). the only way

								  out of it is to kill -9 the process


								- killing postgresql processes

									- postgresql has 2 kinds of processes (the parent postmaster

									  process, which is always active, and child postmaster processes,

									  which are forked from the master when it receives a connection)

									- the problem is that the children are forks, so whenever i kill a child

									  all children die, which sucks

									- so instead of killing children we usually restart the postmaster, but

									  there has to be another solution too, i just was unable to

									  find it

									- is there any possibility to kill a forked child without killing all

									  children? (maybe there are some kill options for that)

									- normally each child should have its own PID

									- i use kill -9 $PID

									- most probably the parent "thinks" there are problems if one

									  child dies and kills them all

									- after that only the master postmaster is alive. i

									  couldn't get a conclusive answer from #postgresql. all that i

									  got was 'Don't kill -9 postmaster'.

									- maybe instead of trying to kill processes we should add pgpool

									  capabilities to TarDiBa, which would bring other benefits

									  too, like a fixed number of connections and the possibility to play

									  with replication and load balancing.

									- we have to implement in the sv interface the ability to

									  send the SIGINT signal to the postmaster. SIGINT forces a fast

									  shutdown, even if clients are connected.
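
A hedged sketch of gentler alternatives to kill -9, based on the signal handling documented for PostgreSQL. A backend killed with SIGKILL is treated by the postmaster as a crash, so it resets all other backends; SIGINT to a single backend cancels its running query and SIGTERM ends that one session. Where available, the SQL functions pg_cancel_backend and pg_terminate_backend send the same signals. The helper names, the injected `kill` parameter and the `ps` line layout (`PID COMMAND...`, as from `ps -eo pid,args`) are assumptions for illustration.

```python
import os
import signal

# per-backend signals: cancel the running query, or end the whole session
BACKEND_SIGNALS = {"cancel": signal.SIGINT, "terminate": signal.SIGTERM}

# postmaster shutdown modes
SHUTDOWN_SIGNALS = {
    "smart": signal.SIGTERM,      # wait for clients to disconnect
    "fast": signal.SIGINT,        # roll back open transactions, disconnect clients
    "immediate": signal.SIGQUIT,  # abort without a clean shutdown
}


def idle_in_transaction_pids(ps_lines):
    """Return PIDs of backends whose process title ends in 'idle in transaction'."""
    pids = []
    for line in ps_lines:
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # header or malformed line
        pid, title = parts
        title = title.rstrip()
        if title.startswith("postgres:") and title.endswith("idle in transaction"):
            pids.append(int(pid))
    return pids


def signal_backend(pid, action="cancel", kill=os.kill):
    """Send the cancel or terminate signal to one backend process."""
    kill(pid, BACKEND_SIGNALS[action])


def shutdown_postmaster(data_dir, mode="fast", kill=os.kill):
    """Signal the postmaster whose PID is on the first line of postmaster.pid."""
    with open(os.path.join(data_dir, "postmaster.pid")) as pid_file:
        pid = int(pid_file.readline().strip())
    kill(pid, SHUTDOWN_SIGNALS[mode])
    return pid
```

An idle in transaction session has no running query to cancel, so for that case `signal_backend(pid, "terminate")` is the relevant call; whether terminating single backends is safe on the server version in use should be checked first.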


								Backup System

								-------------


								- backup is done by shell scripts and cron (change? not likely)

								- backups on other machines (over scp for other linux hosts, samba

								for `secure` Windows2000???, possibly a pro backup app although i

								don't think it is necessary, maybe for whole system recovery)

								- we should investigate synbak which seems nice and even if it

								doesn't have PostgreSQL capability that doesn't look too hard

								to implement.
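
For illustration, the dump and copy steps could look roughly like this. The sketch is in Python rather than the shell scripts mentioned above, and it assumes `pg_dump` and `scp` are on the PATH; the function names, the file naming scheme and the `remote` target format are hypothetical.

```python
import os
import subprocess
from datetime import datetime


def backup_commands(dbname, backup_dir, remote=None, now=None):
    """Build the commands for one timestamped dump and an optional scp copy."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    dump_path = os.path.join(backup_dir, f"{dbname}-{stamp}.dump")
    # -Fc selects pg_dump's custom archive format, -f names the output file
    commands = [["pg_dump", "-Fc", "-f", dump_path, dbname]]
    if remote:
        commands.append(["scp", dump_path, remote])
    return commands


def run_backup(dbname, backup_dir, remote=None):
    """Run the commands in order and stop at the first failure."""
    for command in backup_commands(dbname, backup_dir, remote):
        subprocess.run(command, check=True)
```

Keeping command construction separate from execution makes it easy to log or dry-run what cron would do.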


								System Monitoring and Notification

								~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


								Abstract

								^^^^^^^^


								in order to prevent catastrophic events we need proper information

								about the state of our servers.

								our catastrophic events fall into three categories:


								- hardware failures (for which we need early warning), especially hard drive

								  failures (RAM testing would not be a bad idea, but i think we must pass on

								  it due to the horrific time that a full RAM test takes).

								- OS related failures (which are not failures per se, but states of incapacity

								  to fulfill the demand); in this category we need to monitor the hard drive

								  usage of our backups, postgres runaway processes, and establish CPU and RAM usage

								  patterns (aka CPU and RAM usage peak points and their reasons).

								- postgres related failures, in which category we include unpredictable DB

								  behaviours like shutdowns, the inability to serve certain queries, and most

								  importantly queries (aka processes) in a dubious state (aka IDLE IN TRANSACTION).


								our monitoring solution must gather information about all these issues; plus,

								as a bonus, we would like to gather some info about DB patterns, like most used

								queries, longest running sql statements, some info about connections (IP, time

								and so on). Once this info is gathered it must be delivered in the form of a

								report via mail or other means in a prioritized fashion (in order to easily

								recognize the most important ones). On the other hand we want that info stored

								locally on the server too, in a DB, for profiling reasons.
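
A small sketch of what a prioritized report could look like, using disk usage as the only check. The priority levels, the thresholds and the names `Finding`, `disk_finding` and `render_report` are assumptions for illustration; mail delivery and storing the findings in a DB are left out.

```python
import shutil
from collections import namedtuple

Finding = namedtuple("Finding", "priority source message")

# lower number = more urgent; the three levels are an assumption
CRITICAL, WARNING, INFO = 0, 1, 2
LABELS = {CRITICAL: "CRITICAL", WARNING: "WARNING", INFO: "INFO"}


def disk_finding(path, warn_at=0.80, critical_at=0.95, usage=shutil.disk_usage):
    """Classify the fill level of the filesystem that holds path."""
    total, used, _free = usage(path)
    ratio = used / total
    if ratio >= critical_at:
        priority = CRITICAL
    elif ratio >= warn_at:
        priority = WARNING
    else:
        priority = INFO
    return Finding(priority, "disk", f"{path} is {ratio:.0%} full")


def render_report(findings):
    """Return the report text with the most urgent findings first."""
    ordered = sorted(findings, key=lambda finding: finding.priority)
    return "\n".join(
        f"[{LABELS[f.priority]}] {f.source}: {f.message}" for f in ordered
    )
```

Other checks (backup sizes, pg sessions, log excerpts) would only need to return `Finding` tuples to show up in the same report.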


								- monitoring disk space, backup file sizes, excerpts from

								certain log files, system status and stats (cpu usage, ram etc), monitoring

								pg sessions because the internal pg monitor sucks; we need a good policy on

								killing idle sessions and runaway processes.


								- monitoring network traffic is a good measurement as well. On the one hand it

								  helps to find possible network related bottlenecks, on the other it also

								  provides valuable information for intrusion detection.


								  Possible tools:

									- sancp (prelude aware!)