Discussion:
Problems after upgrade to 3.9.3
Sigurd Mytting
2011-12-07 09:29:31 UTC
Permalink
Hi,

I've run into trouble after upgrading to 3.9.3 via apt-source,
system is debian 6.0.3, this was not a problem with 3.9.2.

First the message during upgrade:

Setting up nav (2+3.9.3-1) ...
Installing new version of config file /etc/nav/sortedStats.conf ...
Installing new version of config file /etc/nav/smsd.conf ...
Installing new version of config file /etc/nav/report/report.conf ...
Traceback (most recent call last):
File "/usr/share/doc/nav/sql/syncdb.py", line 463, in <module>
main()
File "/usr/share/doc/nav/sql/syncdb.py", line 44, in main
sync.synchronize()
File "/usr/share/doc/nav/sql/syncdb.py", line 220, in synchronize
self.apply_changes()
File "/usr/share/doc/nav/sql/syncdb.py", line 305, in apply_changes
self.apply_change_script(version, script)
File "/usr/share/doc/nav/sql/syncdb.py", line 326, in apply_change_script
self.execute_sql_file(script)
File "/usr/share/doc/nav/sql/syncdb.py", line 352, in execute_sql_file
self.cursor.execute(sql)
psycopg2.extensions.QueryCanceledError: canceling statement due to statement timeout
CONTEXT: SQL statement "DELETE FROM ONLY "manage"."rrd_datasource" WHERE $1 OPERATOR(pg_catalog.=) "rrd_fileid""

and if I try adding a new device I get:

Something went wrong
Something went wrong while processing your request. Please try again later.
If the problems persists, contact your system administrator and/or report a bug on NAV bugtracker.

And an error when trying to delete a device:

Error: canceling statement due to statement timeout CONTEXT: SQL statement "UPDATE ONLY "manage"."cam" SET "netboxid" = NULL WHERE $1 OPERATOR(pg_catalog.=) "netboxid""

When digging into the postgres-log I find (everything from a database restart with two delete attempts):

2011-12-07 10:11:04 CET LOG: database system was shut down at 2011-12-07 10:10:55 CET
2011-12-07 10:11:04 CET LOG: database system is ready to accept connections
2011-12-07 10:11:04 CET LOG: autovacuum launcher started
2011-12-07 10:11:05 CET LOG: incomplete startup packet
2011-12-07 10:12:00 CET ERROR: canceling statement due to statement timeout
2011-12-07 10:12:00 CET CONTEXT: SQL statement "UPDATE ONLY "manage"."cam" SET "netboxid" = NULL WHERE $1 OPERATOR(pg_catalog.=) "netboxid""
2011-12-07 10:12:00 CET STATEMENT: DELETE FROM "netbox" WHERE "netboxid" IN (162)
2011-12-07 10:15:01 CET LOG: unexpected EOF on client connection
2011-12-07 10:18:22 CET ERROR: canceling statement due to statement timeout
2011-12-07 10:18:22 CET CONTEXT: SQL statement "UPDATE ONLY "manage"."cam" SET "netboxid" = NULL WHERE $1 OPERATOR(pg_catalog.=) "netboxid""
2011-12-07 10:18:22 CET STATEMENT: DELETE FROM "netbox" WHERE "netboxid" IN (162)
2011-12-07 10:20:02 CET LOG: unexpected EOF on client connection

Anyone having any idea on what's going on? Postgresql has been tuned a bit with allowing 1000 simultaneous connections, but should otherwise be standard.

Best regards,

-Sigurd
Sigurd Mytting
2011-12-07 20:39:34 UTC
Permalink
Post by Sigurd Mytting
Hi,
Hi :-)
Post by Sigurd Mytting
I've run into trouble after upgrading to 3.9.3 via apt-source,
system is debian 6.0.3, this was not a problem with 3.9.2.
Setting up nav (2+3.9.3-1) ...
I did a little digging and found the statement timeout was due to a 30
second time limit on transactions in postgresql.conf, changed that to
the default value of 0 (wait forever). After then trying to run
syncdb.py for seven hours I terminated it. Some more digging showed the
statement "DELETE FROM rrd_file WHERE netboxid IS NULL;" did not
terminate, but "select *" gave over 7700 lines. Deleting them one by
one using the rrd_fileid and "netboxid IS NULL" cleaned up the database
in a few minutes.

After adjusting the timeout value I can delete devices, but still get
errors when adding devices - however bulk import seems to work.

Don't know if it's related, but the postgres-log contains a lot of
"LOG: unexpected EOF on client connection".

cheers,

-Sigurd
Morten Brekkevold
2011-12-08 09:57:58 UTC
Permalink
Post by Sigurd Mytting
Post by Sigurd Mytting
Hi,
Hi :-)
Post by Sigurd Mytting
I've run into trouble after upgrading to 3.9.3 via apt-source,
system is debian 6.0.3, this was not a problem with 3.9.2.
Setting up nav (2+3.9.3-1) ...
I did a little digging and found the statement timeout was due to a 30 second
time limit on transactions in postgresql.conf, changed that to the default
value of 0 (wait forever).
Ouch, that could have been a problem with many long-running queries.
Post by Sigurd Mytting
After then trying to run syncdb.py for seven hours
I terminated it. Some more digging showed the statement "DELETE FROM rrd_file
WHERE netboxid IS NULL;" did not terminate, but "select *" gave over 7700
lines. Deleting them one by one using the rrd_fileid and "netboxid IS NULL"
cleaned up the database in a few minutes.
"DELETE FROM rrd_file WHERE netboxid IS NULL" should never take 7
hours. I'm guessing something was holding on to a database lock that the
statement needed to wait for.

Does a "ps axuw" listing show many postgres processes with the status
"idle in transaction"?
Post by Sigurd Mytting
After adjusting the timeout value I can delete devices, but still get errors
when adding devices - however bulk import seems to work.
What kind of errors?
Post by Sigurd Mytting
Don't know if it's related, but the postgres-log contains a lot of
"LOG: unexpected EOF on client connection".
It's not related.
--
Morten Brekkevold
UNINETT
Sigurd Mytting
2011-12-08 16:01:19 UTC
Permalink
Hi :-)
Post by Morten Brekkevold
"DELETE FROM rrd_file WHERE netboxid IS NULL" should never take 7
hours. I'm guessing something was holding on to a database lock that the
statement needed to wait for.
Does a "ps axuw" listing show many postgres processes with the status
"idle in transaction"?
I tried restarting postgres just before issuing the command, so it shouldn't
be any at the time.

For the last 30 minutes its been stable at 7. I noticed before upgrading I
had up to max_connections and pping crashed when max_connections was reached.
This behavior is gone now.
Post by Morten Brekkevold
Post by Sigurd Mytting
After adjusting the timeout value I can delete devices, but still get
errors
Post by Sigurd Mytting
when adding devices - however bulk import seems to work.
What kind of errors?
Webpage says:
"Something went wrong

Something went wrong while processing your request. Please try again later.

If the problems persists, contact your system administrator and/or report a bug on NAV bugtracker."

Apache access log:
nav.vegvesen.no:80 146.2.x.y - - [08/Dec/2011:16:56:39 +0100] "POST /seeddb/netbox/add/ HTTP/1.1" 200 2011 "http://nav.vegvesen.no/seeddb/netbox/add/" "Mozilla/5.0 (Ubuntu; X11; Linux x86_64; rv:8.0) Gecko/20100101 Firefox/8.0"
nav.vegvesen.no:80 146.2.x.y - - [08/Dec/2011:16:56:41 +0100] "POST /seeddb/netbox/add/ HTTP/1.1" 500 472 "http://nav.vegvesen.no/seeddb/netbox/add/" "Mozilla/5.0 (Ubuntu; X11; Linux x86_64; rv:8.0) Gecko/20100101 Firefox/8.0"

Error log:
/usr/lib/python2.6/dist-packages/mod_python/importer.py:32: DeprecationWarning: the md5 module is deprecated; use hashlib instead
import md5
(I guess not related)

Any other logs relevant?

Cheers,

-Sigurd
Morten Brekkevold
2011-12-09 12:30:13 UTC
Permalink
Post by Sigurd Mytting
Post by Morten Brekkevold
What kind of errors?
"Something went wrong
Something went wrong while processing your request. Please try again later.
Please see the FAQ entry at [1].
Post by Sigurd Mytting
/usr/lib/python2.6/dist-packages/mod_python/importer.py:32: DeprecationWarning: the md5 module is deprecated; use hashlib instead
import md5
(I guess not related)
No, that's just a "cosmetic" problem with the mod_python package in Debian.
Post by Sigurd Mytting
Any other logs relevant?
Follow the instructions from the FAQ entry and see what you get.



[1] http://metanav.uninett.no/navfaq#how_can_i_get_help_for_something_went_wrong_in_the_web_interface
--
Morten Brekkevold
UNINETT
Loading...