Storage Troubleshooting - NAS Case Study

Symptoms to Resolution

It's easy to state symptoms:
My BI server started crashing.
My recordings are missing.
No alerts are being sent.

Much harder to know how to investigate and self-correct or provide the needed information for support to help.

The articles here are frameworks on how to think about the issue and get to resolution.
Post Reply
varghesesa
Posts: 90
Joined: Thu Jul 11, 2019 9:52 pm

Storage Troubleshooting - NAS Case Study

Post by varghesesa »

Introduction

When there are issues with your BI Server, the log files are a good place to start to understand the problem. The purpose of this article is to provide an example of how we use the logs in order to get hints as to the underlying issue. This article provides guidance on how to examine the log files. There are many causes for a server crash or poor performance. This article should at least allow you to diagnose and self-correct or provide key information to help BI Support diagnose the issue.

There is a recorded webinar associated with this article if you prefer to listen rather than read. The Blue Iris status log.


Case Study


NAS Write Errors

I found this ticket interesting because the problem did not appear immediately. It took several days of running before the problem appears. This user had a small/medium sized server load with 14 cameras. The July log file shared for diagnosing the issue had 198,901 lines from July 1 - July 20.



So how do you approach the problem?

Start with the symptoms. The user reported:
Every few months (although not consistently) my primary drive gets full and goes past the limits (600G of 1T) and then BI stops moving files to the ‘stored’ network folder.


Analysis / Thought Process

From this information, I knew there is probably an issue with Global settings -> Clips and archiving. This means there should be Storage and/or Recording errors.

The logs happened to begin with camera connectivity errors. However, the ticket is in regards to storage, so I mentioned the issue in my analysis report (further below) but did not focus on this issue.

Code: Select all

0 	7/1/2021 12:00:00.449 AM	App                 	Today's sunrise: 7:11 AM, sunset: 10:05 PM
1 	7/1/2021 12:00:03.467 AM	HVAC                	Signal: network retry
1 	7/1/2021 12:00:26.291 AM	HVAC                	Signal: Failed to connect
1 	7/1/2021 12:00:33.612 AM	HVAC                	Signal: network retry
1 	7/1/2021 12:00:56.176 AM	HVAC                	Signal: Failed to connect
3 	7/1/2021 12:01:01.581 AM	Wine                	MOTION
1 	7/1/2021 12:01:03.756 AM	HVAC                	Signal: network retry
1 	7/1/2021 12:01:26.394 AM	HVAC                	Signal: Failed to connect
1 	7/1/2021 12:01:33.969 AM	HVAC                	Signal: network retry
1 	7/1/2021 12:01:56.624 AM	HVAC                	Signal: Failed to connect
The user stated the logs were created since the last crash. I went to the end of the logs and confirmed many, many Clip: Disk full (recording) errors at the end of the log. Thus, I knew these logs contained a point of time where the server was functional and crashed due to Storage & Recording issues as the user stated. The logs should tell me the problem.

Code: Select all

2 	7/20/2021 3:40:35.909 PM	Basketball          	Clip: Disk full
2 	7/20/2021 3:40:35.914 PM	Garage              	Clip: Disk full
2 	7/20/2021 3:40:36.237 PM	MudRoom             	Clip: Disk full
2 	7/20/2021 3:40:38.156 PM	BasementEntrance    	Clip: Disk full
2 	7/20/2021 3:40:38.342 PM	Driveway            	Clip: Disk full
2 	7/20/2021 3:40:39.445 PM	MudRoomSide         	Clip: Disk full
2 	7/20/2021 3:40:40.477 PM	Side                	Clip: Disk full
2 	7/20/2021 3:40:40.721 PM	HVAC                	Clip: Disk full
2 	7/20/2021 3:40:41.448 PM	DrivewayRight       	Clip: Disk full
2 	7/20/2021 3:40:41.589 PM	FrontDoorStation    	Clip: Disk full
2 	7/20/2021 3:40:41.702 PM	PoolEq              	Clip: Disk full
2 	7/20/2021 3:40:42.570 PM	Pool                	Clip: Disk full
2 	7/20/2021 3:40:44.314 PM	FrontDoor           	Clip: Disk full
2 	7/20/2021 3:40:45.788 PM	DrivewayLeft        	Clip: Disk full
2 	7/20/2021 3:40:45.924 PM	Garage              	Clip: Disk full
2 	7/20/2021 3:40:45.974 PM	Basketball          	Clip: Disk full
2 	7/20/2021 3:40:46.316 PM	MudRoom             	Clip: Disk full
2 	7/20/2021 3:40:48.202 PM	BasementEntrance    	Clip: Disk full
So then, I went back to the top of the log files. I started scrolling down, scanning for Storage / Record errors. After scrolling through several, several pages (remember logs have 200k lines), I saw the below common pattern repeatedly.

Code: Select all

0 	7/1/2021 3:47:40.393 AM	New                 	Move: 2 items 865.3M [600.8G/600.0G, 328.4G free] 80 locked
0 	7/1/2021 3:47:40.397 AM	Stored              	Delete: nothing to do [2.92T/2.92T, 1.99T free]
0 	7/1/2021 3:47:40.412 AM	Alerts              	Delete: nothing to do [992/4800 hrs, 625.4M/100.0G, 329.2G free]
The Clips and archiving settings were fairly simple: New -> Stored.
I also starting observing that New after the (successful) Moves would hover around 328GB free, i.e. BI is managing storage correctly.

Since the log file was so large, I simply started searching for the next Move: command. With search, I quickly pounded through all the Move commands observing whether New remained around 328 GB (sometimes 330 GB, sometimes 327 GB).

Even with search, getting through all the Move commands in just ONE day took several, several clicks. I then would scroll down to the next day, then search for the next Move: command. By doing, I was fairly quickly able to narrow down the issue to the day. Eventually (not going to say quickly), I found the first Move: error.

On July 19, 2021 I found the first Move error (the needle in the haystack). This was line 147,021 in the log file. Notice how free storage for the first time dropped below 327 GB. Looking back, I probably should have started from the bottom of the file and searched up.

Code: Select all

0 	7/19/2021 7:32:01.679 PM	New                 	Move: 2 items 3.66G [602.1G/600.0G, 326.2G free] 126 locked
0 	7/19/2021 7:32:01.788 PM	Stored              	Delete: 2 items 4.24G [2.93T/2.92T, 1.99T free]
0 	7/19/2021 7:32:01.791 PM	Alerts              	Delete: nothing to do [1439/4800 hrs, 895.5M/100.0G, 329.9G free]
1 	7/19/2021 7:32:01.860 PM	Clips               	MoveFile Error 2: F:\BlueIris\New\FrontDoor.20210719_080000.bvr
1 	7/19/2021 7:32:01.925 PM	Clips               	MoveFile Error 2: F:\BlueIris\New\FrontDoorStation.20210719_080000.bvr


Analysis Report

From here, it was apparent BI could not move files to Stored and would eventually fill up the New drive and crash. Analysis report below.
Everything seems to be running ok. Move 1 files to Stored. Delete 1 file in Stored to make space.

0 7/8/2021 5:23:43.700 PM New Move: 1 items 3.20G [602.6G/600.0G, 326.2G free] 100 locked
0 7/8/2021 5:23:43.919 PM Stored Delete: 1 items 1.38G [2.93T/2.92T, 1.99T free]
0 7/8/2021 5:23:43.923 PM Alerts Delete: nothing to do [1173/4800 hrs, 712.4M/100.0G, 329.4G free]


Continues working until 7/19 at 7:27 PM

0 7/19/2021 7:27:01.363 PM New Move: nothing to do [599.8G/600.0G, 328.5G free] 126 locked
0 7/19/2021 7:27:01.366 PM Stored Delete: nothing to do [2.92T/2.92T, 2.00T free]
0 7/19/2021 7:27:01.373 PM Alerts Delete: nothing to do [1439/4800 hrs, 895.5M/100.0G, 328.5G free]

On 7/19 @7:32 PM, first time BI cannot move files to the NAS.

0 7/19/2021 7:32:01.679 PM New Move: 2 items 3.66G [602.1G/600.0G, 326.2G free] 126 locked
0 7/19/2021 7:32:01.788 PM Stored Delete: 2 items 4.24G [2.93T/2.92T, 1.99T free]
0 7/19/2021 7:32:01.791 PM Alerts Delete: nothing to do [1439/4800 hrs, 895.5M/100.0G, 329.9G free]
1 7/19/2021 7:32:01.860 PM Clips MoveFile Error 2: F:\BlueIris\New\FrontDoor.20210719_080000.bvr
1 7/19/2021 7:32:01.925 PM Clips MoveFile Error 2: F:\BlueIris\New\FrontDoorStation.20210719_080000.bvr


From there on, drive containing New starts getting full until BI crashes. Drive containing New folder was steady at 329 GB free for many days. Now it is slowly depleting.
Below snapshot shows, 10+ GB have been eaten (316.6 GB free).

0 7/19/2021 7:52:03.882 PM New Move: 6 items 14.5G [611.8G/600.0G, 316.6G free] 126 locked
0 7/19/2021 7:52:03.885 PM Stored Delete: nothing to do [2.92T/2.92T, 1.98T free]
0 7/19/2021 7:52:03.895 PM Alerts Delete: nothing to do [1440/4800 hrs, 896.0M/100.0G, 331.2G free]
1 7/19/2021 7:52:03.966 PM Clips MoveFile Error 2: F:\BlueIris\New\Garage.20210719_080000.bvr
1 7/19/2021 7:52:04.025 PM Clips MoveFile Error 2: F:\BlueIris\New\BasementEntrance.20210719_080000.bvr

Last move before BI crashes. Only 4k left on hard drive.

0 7/20/2021 10:04:52.773 AM New Move: 129 items 329.6G [929.3G/600.0G, 4K free]
0 7/20/2021 10:04:52.779 AM Alerts Delete: nothing to do [1454/4800 hrs, 897.5M/100.0G, 329.6G free]
1 7/20/2021 10:04:52.800 AM Clips MoveFile Error 0: F:\BlueIris\New\MudRoom.20210719_080000.bvr

Question is why did BI stop communicating with the NAS on July 19?
This ticket is still not resolved. Ken added more details to the log to gather more information. The user is reaching out to the forum and the NAS vendor as well.

I closed the analysis report by including the other issues I found in the log files.
Other recommendations:
Your HVAC camera is always disconnecting. You could try and resolve that issue. See the No signal article.
Post Reply