The Too-Many-Tabs Problem

I suffer from having too many open tabs.

I think I should start dumping links into webpages like this one and begin treating my computers more like the desktops of old. People used to shut down their systems entirely after every use; I have a friend who still does this with his very modern laptop.

Sleep modes and desktop restoration on my daily-driver operating systems have allowed me to enter a state of rot with respect to the number of open tabs I keep.

By forcing myself to go through them and take action, I should in theory be able to approach my computer with more focus and intention.

I think one of my problems now is that there is so much I could do that it's hard to pick the right thing at any given time, so often enough I just end up doing nothing at all.

Server issues

I tried to access one of my server pages only to find it completely unresponsive. I cannot tell what caused the load to climb starting early this morning; the rise was gradual but substantial. I was unable to run commands as basic as docker ps because something was locking up the machine.

I am unsure of the root cause (perhaps it was NFS; that is the current hypothesis), but what follows are the things I encountered while restoring service.

For reference, the services on this server include nginx, littlelink (several instances), apache2, and photoprism.

Of these, nginx, the littlelink containers, and apache2 failed to restart, and photoprism was inaccessible because I forgot to migrate it to Cloudflare after my recent clfx.cc DNS changes.

The simple stuff

The littlelink containers had an unless-stopped restart policy, so I'm not sure why they didn't come back up, but I restarted them manually. I thought about changing the policy to always, but I want to see whether they come back up on their own the next time I restart docker (they did; hopefully they will on reboot too).
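If I need to poke at this again, two commands are enough to check and change a policy from the CLI without touching the compose file (a sketch; the name littlelink here is a stand-in for whatever docker ps reports):

docker inspect -f '{{ .HostConfig.RestartPolicy.Name }}' littlelink   # show the current policy
docker update --restart unless-stopped littlelink                     # change it in place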

NFS

The NFS server I configured may have been at fault: DNS issues at home caused the outbound autossh tunnel connections to fail, and those tunnels are what register the NFS drive at home as a local one for the server. I managed to reboot the server using the DigitalOcean console, but then it failed to come back up.

What happened was that /etc/fstab kept trying and failing to mount the NFS drive (despite the line being commented out with a #), so I had to boot from a recovery ISO, edit the file by adding another # character in front of the NFS-related line, and only then was I able to boot from the hard drive and start diagnosing the server.
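For reference, the entry was of roughly this shape (host and paths are hypothetical). If I ever re-enable it, the standard nofail mount option should keep a failed mount from blocking boot the way it did here:

# /etc/fstab entry, host and paths hypothetical; nofail lets boot continue if the mount fails
10.0.0.5:/export/home-drive  /mnt/home-nfs  nfs  defaults,_netdev,nofail  0  0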

The restart went badly, though: most services did not come back up properly, including the most important one... nginx. But at least I had a responsive shell, working SSH access from my laptop, and docker ps worked again.

Incomplete Cloudflare Migration

Soon enough I was able to diagnose the next problem: the unfinished reconfiguration of services I had been migrating to Cloudflare. I had left things in an incomplete state, and as a result nginx failed to restart.

I registered px.clfx.cc and files.clfx.cc, pointed them at the new IP addresses, and brought those services back online through Cloudflare. I ended up using the docker-supplied IP address (from docker inspect <container_name> | grep IPAdd) in the configuration and pointing at the internal container ports, despite binding to external ports as well (I was debugging).
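As an aside, docker's template syntax can extract the address directly instead of grepping through the full inspect output (a sketch; <container_name> as above):

docker inspect -f '{{ range .NetworkSettings.Networks }}{{ .IPAddress }}{{ end }}' <container_name>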

Port 8080

However, in trying to bring up my apache web server on 8080 (to bind to localhost:8080 in Cloudflare), I found another unexpected bug...

For some reason port 8080 was suddenly occupied when I expected it to be free as well. I got this error:

ERROR: for webserver  Cannot start service server: driver failed programming external connectivity on endpoint webserver (<container_id>): Bind for 0.0.0.0:8080 failed: port is already allocated

Running doas netstat -tulpn | grep 8080 yielded nothing.

I also tried this:

mm@cloud:~/apache-server$ doas ss -lptn 'sport = :80'
doas (mm@cloud) password:
State               Recv-Q              Send-Q                           Local Address:Port                           Peer Address:Port              Process
LISTEN              0                   4096                                   0.0.0.0:80                                  0.0.0.0:*                  users:(("docker-proxy",pid=33534,fd=4))
LISTEN              0                   4096                                      [::]:80                                     [::]:*                  users:(("docker-proxy",pid=33540,fd=4))
mm@cloud:~/apache-server$ doas ss -lptn 'sport = :8080'
doas (mm@cloud) password:
State                  Recv-Q                  Send-Q                                   Local Address:Port                                     Peer Address:Port                  Process

I also found whatportis (pip install whatportis), which reports the "common usage" of ports. Neat, but not helpful here.

The mystery persists...

< some time later >

Since I could not find a service running on the port, I figured it couldn't hurt to probe it and see what was actually responding.

curl localhost:8080 led to a "Connection Refused" message, but curl localhost:80 showed me my expected homepage.

That gave me a hypothesis: perhaps nginx is installed on the system itself, and upon failing to start on port 80 (a docker container was bound to it), it fell back to 8080 as an alternative default HTTP port.

That hypothesis died quickly: curl localhost:<other port> also shows "Connection Refused", and neither nginx nor apache2 is installed on the host.
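Two more checks that would have been worth running at this point (a sketch): docker's own view of port allocations, since the "already allocated" error comes from docker rather than the kernel, and lsof as a second opinion on listeners:

docker ps -a --filter "publish=8080"   # any container, running or stopped, publishing 8080
doas lsof -nP -iTCP:8080               # anything at all touching the port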

Better Testing

I got tired of trying to start apache2 via docker-compose.yml and changed my debugging strategy:

docker run --rm -ti -p 8080:8080 python:3.9 python -m http.server 8080

And that worked... so my problem with the apache2 container must be something else. The binding I am using there is 8080:80, from the docker-compose file.

I also tried my debugging command on the same network as the container, via the --network=file-server flag.
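Concretely, that test is the same sketch as above, just pinned to the compose network:

docker run --rm -ti --network=file-server -p 8080:8080 python:3.9 python -m http.server 8080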

Resolution

I don't know exactly what worked, but suddenly I could bind to 8080 again... At some point I did restart docker, and I did try to kill processes found in htop by filtering for the port number. I wish I had thought of the better debugging step earlier, but oh well; everything seems to be working now. I also made sure the webserver (apache2) service does not have a restart policy.
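For the record, the two levers pulled here look like this as CLI commands (assuming a systemd host, which this droplet is; webserver is the container name from the error above):

doas systemctl restart docker          # full daemon restart; may be what cleared the stale allocation
docker update --restart no webserver   # keep the container down unless started by hand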

What went wrong?

I still want to go through the logs to figure out what was bogging down the server; checking the DigitalOcean dashboard reveals that things are looking okay now.

One thing I did disable was the rsync job from my laptop to the server. I want to keep that disabled and sync journal entries manually for the time being while keeping an eye on how the droplet continues to perform.

Graphs

Six Hour View

[droplet-status graph]

Twenty-Four Hour View

[droplet-status graph]

Back to Business

Some to-do's:

- set up overleaf on home server
- migrate photoprism to home server
- set up Cloudflare access via nginx (with htpasswd -c .htpasswd <username> to create and -v to verify; see the sketch after this list)
- have nginx direct to local services
- do the above for the services in mlden.com after migrating DNS to Cloudflare
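For the htpasswd item, the create/verify pair looks like this (a sketch; the /etc/nginx/ location is my assumption about where the file will live):

htpasswd -c /etc/nginx/.htpasswd <username>   # -c creates the file with a first user
htpasswd -v /etc/nginx/.htpasswd <username>   # -v verifies a password against the file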

Gitea Sync Problems

In looking at my log files from gitea (referenced below), I found that fenics and super stopped syncing because of expired GitHub tokens. Later that day I updated the page with the new token I generated (I was reminded when I came across the still-open GitHub tab while trying to clean up my browser at the end of the evening).

Changepoint Detection

If I recall correctly, I have been keeping logs of server access attempts on my io server for quite some time. I should save this data and do some analysis on it to see whether the migration to Cloudflare has resulted in a change in attacks.

Unfortunately, since the server restarted recently, I am not sure whether I actually still have the history of attempts.

docker logs gitea reveals that time-stamps are incomplete for SSH attempts.

The fact that time-stamps show up for some interactions means that I can at least get an inconsistent time series, with data going back to 2022-05-06. I captured the log today and hosted it on my server here.

[gitea-logs screenshot]
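For completeness, capturing a container's logs to a file looks like this (the filename matches the one parsed below; docker logs replays the container's stderr to stderr, hence the 2>&1):

docker logs gitea > .gitea-05-20.log 2>&1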

I can see that I do have daily data (at least one observation per day) for the two-week period by looking at the output of

cat .gitea-05-20.log | grep 2022/05/ | awk '{ print $1 }'
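Extending that pipeline slightly gives a per-day count of attempts, which is closer to the time series a changepoint model would want (same log file):

cat .gitea-05-20.log | grep 2022/05/ | awk '{ print $1 }' | sort | uniq -c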

I do observe attacks as recent as yesterday, so perhaps I need to close port 22 in my router configuration (apparently I already did, though I forgot to close my secondary SSH port, which was rendered useless anyway). That reminds me that I will no longer be able to access my studio server without a VPN, or without memorizing its IP address and re-opening the other SSH port (it is probably better to leave it closed to the world entirely).

If my studio server's IP address is no longer in DNS records, how is it that I can still receive attacks? (Presumably the scanners sweep IP ranges directly and never needed DNS in the first place.)

On my cloud server I have logs going back three months earlier, and I have not yet migrated that machine to Cloudflare. I went ahead and saved this file and uploaded it here.

I can develop the code on the smaller dataset and then apply it to the cloud server's logs, and if I make the change on June 2, I will have exactly four months of data prior to the changepoint of switching to Cloudflare.

PyScript Interactivity

I loved my first experiments with pyscript (mud research), but didn't get around to handling interactions between the DOM and Python. I finally looked it up and have saved an example from this Stack Overflow post, along with an interactive todo list that relies on a Python file to embed the code into more deeply nested HTML than the basic example I put together (it also looks like styling is possible!).