Data Quality Check
This page allows you to analyze the data in your database. The script tries to detect suspicious values automatically; these require your attention and possibly correction. The tool can be very useful, but please read the information below so that you use it correctly, and in particular do not skip the last part, which explains what the "memory" parameter is and how it works.
Before you start the analysis, you need to set its parameters. First, choose the date interval you want to analyze. By default, the fields contain the interval for the current year. Dates are chosen using a pop-up calendar, and the selectable range corresponds to the data you have in the database.
Once you have chosen the interval, set the gap interval. One of the analysis functions looks for gaps in the data, and here you specify the minimum interval that should be considered a gap. Enter this interval as a number of minutes.
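To illustrate what the gap setting means, here is a minimal Python sketch (the actual script is PHP, and this is not its code). It assumes the records' timestamps are already sorted and treats a gap as two consecutive records that are at least the chosen number of minutes apart; whether the real check uses "at least" or "more than" is an assumption. The function name `find_gaps` is hypothetical.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, min_gap_minutes):
    """Return (start, end) pairs of consecutive records that are at least
    min_gap_minutes apart. Assumes timestamps are sorted ascending."""
    threshold = timedelta(minutes=min_gap_minutes)
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev >= threshold:
            gaps.append((prev, curr))
    return gaps

# Example data: a 5-minute database interval with one missing stretch.
records = [
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 0, 5),
    datetime(2024, 1, 1, 1, 0),  # 55 minutes after the previous record
    datetime(2024, 1, 1, 1, 5),
]
gaps = find_gaps(records, 30)  # one gap: from 00:05 to 01:00
```

With a gap interval of 60 minutes, the same data would report no gaps, because the longest pause is 55 minutes.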
The next section covers spikes. A spike in this case is a value that changes suddenly. The actual values can be within normal limits, but they are likely to be incorrect because the rate of change is too high. The number you specify is the minimum rate of change per 10 minutes that should be considered a spike. For example, if you enter 10 degrees Celsius for temperature, a spike is any value where temperature changed by 10 degrees or more per 10 minutes. Remember that the rate is per 10 minutes, so the threshold is adjusted to your database interval. If, for example, your database interval is 5 minutes, a spike is a value where the difference between consecutive records is 5 degrees or more. Likewise, if there is a longer gap between records, a larger difference is accepted.
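The scaling described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the script's actual code: it assumes the threshold scales linearly with the time between records (which matches the 10 degrees per 10 minutes versus 5 degrees per 5 minutes example) and that both rises and drops count as spikes. The function name `is_spike` is hypothetical.

```python
def is_spike(prev_value, curr_value, minutes_between, rate_per_10_min):
    """True if the change between two consecutive records reaches the spike
    threshold. The threshold is given per 10 minutes and is scaled linearly
    to the actual time between the records (an assumption)."""
    allowed_change = rate_per_10_min * minutes_between / 10
    # abs(): assume sudden drops count as spikes just like sudden rises
    return abs(curr_value - prev_value) >= allowed_change

# 10 degrees per 10 minutes with a 5-minute interval gives a 5-degree threshold
print(is_spike(12.0, 17.0, 5, 10))   # True: change of 5 degrees
print(is_spike(12.0, 16.0, 5, 10))   # False: change of 4 degrees
print(is_spike(12.0, 17.0, 20, 10))  # False: over 20 minutes the threshold is 20 degrees
```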
The last thing to set up is the outliers. Outliers are values outside the bounds you specify. Here you can set specific limits, i.e. a minimum and a maximum, for each parameter and each month. An outlier is then any value that is equal to or above the maximum, or equal to or below the minimum, set for that parameter and month.
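A per-parameter, per-month limits table can be pictured as in the Python sketch below. The layout of `LIMITS`, the parameter name, and the numbers in it are purely illustrative assumptions, not values or structures from the script; the comparison follows the rule above (equal to or beyond a limit counts as an outlier).

```python
# Hypothetical limits table: parameter -> month (1-12) -> (minimum, maximum).
# The numbers are made up for illustration only.
LIMITS = {
    "temperature": {1: (-30.0, 15.0), 7: (5.0, 45.0)},
}

def is_outlier(parameter, month, value, limits=LIMITS):
    """True if value is at or below the minimum, or at or above the maximum,
    set for this parameter and month. Returns False when no limits are set."""
    bounds = limits.get(parameter, {}).get(month)
    if bounds is None:
        return False
    minimum, maximum = bounds
    return value <= minimum or value >= maximum

print(is_outlier("temperature", 1, 20.0))  # True: above the January maximum
print(is_outlier("temperature", 7, 20.0))  # False: within the July limits
```

Keeping separate limits for each month lets a value that is normal in summer still be flagged in winter.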
The analysis also shows a section called "nonsense values". The script checks certain fixed rules, and any value that does not comply with them is reported as a nonsense value.
The following are considered nonsense values:
- humidity below 0%
- humidity above 100%
- temperature/apparent temperature below -100 degrees
- temperature/apparent temperature above 150 degrees
- pressure below 0
- pressure above 1100
- wind speed below 0
- wind gust below 0
- daily rain below 0
- wind direction degrees below 0
- wind direction degrees above 360
This is intended especially for situations where one of your sensors has a problem and reports a value such as -9999.
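The fixed rules in the list above can be expressed as a small table of allowed ranges. The Python sketch below is only an illustration; the field names are hypothetical, not the script's database column names, and `None` marks a side that the list does not check.

```python
# Fixed rules from the list: field -> (lowest allowed, highest allowed).
# None means that side is not checked. Field names are hypothetical.
NONSENSE_RULES = {
    "humidity": (0, 100),
    "temperature": (-100, 150),
    "apparent_temperature": (-100, 150),
    "pressure": (0, 1100),
    "wind_speed": (0, None),
    "wind_gust": (0, None),
    "daily_rain": (0, None),
    "wind_direction": (0, 360),
}

def nonsense_fields(record):
    """Return the names of the fields in record that break a fixed rule.
    Fields missing from the record are skipped."""
    bad = []
    for field, (low, high) in NONSENSE_RULES.items():
        value = record.get(field)
        if value is None:
            continue
        if (low is not None and value < low) or (high is not None and value > high):
            bad.append(field)
    return bad

reading = {"humidity": 55, "temperature": -9999, "pressure": 1013, "wind_direction": 400}
print(nonsense_fields(reading))  # ['temperature', 'wind_direction']
```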
The last field, whose meaning is probably not obvious, is the memory field. This needs a longer explanation. This script was originally part of the main template, where it was known as dbInfo. However, because people often ran into problems with PHP memory limits, I had to discontinue it. The problem is that PHP always has a maximum amount of memory that a script, including its arrays, is allowed to allocate. This can be set in the server settings. However, if you do not have your own server, this limit is preset by your web hosting provider and you cannot change it. There is a PHP command for increasing it, but in 99% of cases it cannot override the global setting made by your provider.
In practice, this meant that if you chose a longer interval, more data had to be fetched from the MySQL database, the memory was soon exhausted, and you just got an error.
I thought for a long time about how best to solve this and tried several approaches. I finally came up with a workaround, but please treat it as a still experimental feature. Basically, you can increase the memory setting. Then, before MySQL is queried, the interval is split into chunks, each chunk is processed separately, and finally the results are aggregated. It seemed to work fine in my testing, but I still need more people to confirm that it works.
The way to use it is as follows:
By default, always leave this at "1". If, however, you get an error saying that the memory was exhausted, increase the number by 1 at a time: first try "2", and if the error persists, try "3", and so on. The smaller the number, the better, so make sure you choose the lowest number that works.
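The chunking idea (split the interval, process each piece separately, then aggregate) can be sketched in Python as below. This is a hypothetical illustration, not the script's PHP code: the page does not say how the memory number maps to chunks, so `chunks` here is simply an assumed chunk count, and "processing" is reduced to counting records from an in-memory list where a real implementation would run one database query per chunk.

```python
from datetime import datetime

def split_interval(start, end, chunks):
    """Split [start, end) into `chunks` consecutive sub-intervals of equal
    length. Half-open intervals ensure no record falls into two chunks."""
    step = (end - start) / chunks
    bounds = [start + step * i for i in range(chunks)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))

def count_in_chunks(timestamps, start, end, chunks):
    """Process each chunk separately, then aggregate the partial results.
    Here each chunk's result is just a record count."""
    total = 0
    for chunk_start, chunk_end in split_interval(start, end, chunks):
        partial = sum(1 for t in timestamps if chunk_start <= t < chunk_end)
        total += partial  # aggregate: sum the per-chunk counts
    return total

day = [datetime(2024, 1, 1, hour) for hour in range(24)]
total = count_in_chunks(day, datetime(2024, 1, 1), datetime(2024, 1, 2), 3)
```

Counts aggregate trivially, but checks that compare consecutive records, such as gaps and spikes, would also need to look at the pair of records on either side of each chunk boundary, or they could miss a problem that spans two chunks.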