
All sensors unavailable for one minute, same time each day. #126

Open
Steve2017 opened this issue Aug 8, 2023 · 95 comments
Labels
FW Problem Problem in Envoy Firmware, can't fix in integration

Comments

@Steve2017

I mentioned this in another issue and it has been suggested it should be separated.

The issue is that all sensors created by the integration become unavailable for a short period - usually a minute - at the same time each day. In my case, it is 23:00 local time (13:00 GMT). On some days I also experience a dropout at approximately 06:35, also for a very short period.

I am using v 0.2.10 of Enphase Envoy Envoy_2_Core_Custom_Test which I gather is being folded into this repo.
(Happened with earlier versions too)
This is on Envoy S Metered with the Software Version: D7.6.175 (f79c8d)
HA 2023.8.xx (was happening in July too)

Although the times might vary a little, I am not the only one reporting this behaviour:
#122 (comment)
#122 (comment)
#122 (comment)

My dropouts are similar to the one in the first link above. Magnified view:
(graph: Production-Consumption dropout)
All sensors, including micro-inverters, demonstrate the same behaviour.

This is the log from an occurrence a few days ago, showing the system attempts a GET at 23:00:28 and times out about 30 seconds later. A new attempt then works:

> 2023-08-01 22:59:28.218 DEBUG (MainThread) [custom_components.enphase_envoy] Finished fetching envoy Envoy 1221xxxxx data in 1.092 seconds (success: True)
> 2023-08-01 23:00:28.127 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Checking Token value: ******REDACTED TOKEN DATA*****
> 2023-08-01 23:00:28.128 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Token is populated: ******REDACTED TOKEN DATA*****
> 2023-08-01 23:00:28.128 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Token expires at: 2024-07-30 14:21:48
> 2023-08-01 23:00:28.128 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] HTTP GET Attempt #1: https://192.168.1.144/production.json?details=1: Header:{'Authorization': 'Bearer ******REDACTED TOKEN DATA*****'}
> 2023-08-01 23:00:58.130 ERROR (MainThread) [custom_components.enphase_envoy] Timeout fetching envoy Envoy 1221xxxxxx data
> 2023-08-01 23:00:58.135 DEBUG (MainThread) [custom_components.enphase_envoy] Finished fetching envoy Envoy 1221xxxxxxx data in 30.008 seconds (success: False)
> 2023-08-01 23:01:58.126 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Checking Token value: 

From those production and consumption sensors in the graph, I use a template sensor to calculate excess production (i.e., power available for export). That too drops out (with a slight lag) for a short period.
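For reference, that template sensor is roughly of this shape - a minimal sketch with placeholder entity IDs, not the integration's actual ones:

    # Sketch only: substitute your own Envoy production/consumption entity IDs.
    template:
      - sensor:
          - name: "Solar Export Power"
            unique_id: solar_export_power
            unit_of_measurement: "W"
            device_class: power
            state: >
              {{ states('sensor.envoy_current_power_production') | float(0)
                 - states('sensor.envoy_current_power_consumption') | float(0) }}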

The issue for me is not so much the original sensor values and the template sensor, because they come back after a minute. The issue is that the template sensor is the source for a Riemann Sum sensor calculating export energy. That sensor also goes off-line, but does not return until solar production resumes at about 06:40. (Bear in mind this is at 11pm, so zero solar production.)
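That Riemann Sum sensor is set up along these lines (again a sketch; the names are placeholders, and the source must be an entity ID):

    sensor:
      - platform: integration
        source: sensor.solar_export_power   # entity ID of the template sensor above
        name: solar_export_energy
        unique_id: solar_export_energy
        unit_prefix: k      # W in, kWh out
        round: 3
        method: left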

Downstream from there I have a number of child sensors that give me data on daily energy exports, export earnings, net energy costs, etc. All of those become unavailable after the Riemann Sum sensor has been unavailable for one minute. They return when solar production resumes.

The dropouts at 06:35 are happening prior to solar production resuming, so I see nothing different there.

I have a similar set-up for measuring grid import power - just with the sensors reversed in the formula. It experiences a similar template sensor and Riemann Sum dropout, but the Riemann Sum sensor comes back on - I assume because the template sensor is returning new and changing values.

I have tried a number of suggested solutions for the downstream issue, such as removing state_class and device_class entries from the Template Sensor YAML or using the "available" function in the sensor YAML. None of those things worked.

To try to isolate something, I set up the REST/RESTful sensor integration for the Envoy. It works without dropouts. It uses the token, not the Enphase username-and-password method, to gain access to the Envoy.
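That RESTful sensor is roughly of this form - a sketch only: the token is pasted in by hand, and the value_template path is illustrative of the production.json layout on a metered Envoy:

    # Sketch only: token value and JSON path are illustrative.
    rest:
      - resource: "https://192.168.1.144/production.json?details=1"
        verify_ssl: false  # the Envoy serves a self-signed certificate
        scan_interval: 60
        headers:
          Authorization: "Bearer YOUR_ENVOY_TOKEN"
        sensor:
          - name: "Envoy Production Power (REST)"
            unit_of_measurement: "W"
            value_template: "{{ value_json.production[1].wNow | round(0) }}"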

@catsmanac
Collaborator

> I am using v 0.2.10 of Enphase Envoy Envoy_2_Core_Custom_Test which I gather is being folded into this repo.

Let me explain:

Envoy_2_Core_Custom_Test was my attempt to propose @briancmpbll's custom integration, with the recent additions by @posixx, as the successor to the current HA Core integration, which doesn't support the token mechanism. In that process I had to apply some corrections to the code to pass HA Core's fairly strict checks on code layout and components used. As a result, minor modifications to the code and communications were applied. These may play a role here; I'm not sure.

I ended the proposal effort with the announcement that the 2023.09 HA release will (most probably) contain an updated Enphase Envoy integration based on a different Enphase communication library than used before, and will support the new firmware (but drop support for very old firmware with only HTML pages). I plan to offer the functional changes from the proposal effort to @briancmpbll's version as well, if deemed beneficial, and to archive (make read-only) Envoy_2_Core_Custom_Test in the future, as I see no benefit in having a second one available.

So your issues are probably present in the current custom integration as well, but to be fair we should verify that using @briancmpbll's code, not the version that was on its way to becoming the new HA Core and is now at a dead end.

Having said that, the new HA Core Enphase Envoy using that new communication library may do better; that remains to be seen.

@catsmanac
Collaborator

> Timeout fetching envoy Envoy 1221xxxxxx data

This jogged my memory: I've seen this before in another custom integration, by @vincentwolsink, using installer accounts. They increased the timeout on getting data from the Envoy to 60 seconds.

You can try this by going into the /config/custom_components/enphase_envoy folder and editing envoy_reader.py. Look for '_async_fetch_with_retry', then a couple of lines down for 'timeout=30', and change it to timeout=60 or so. Then restart HA.

@catsmanac
Collaborator

@Steve2017, as for the Riemann integration you mentioned, mine does the same with the late-evening drop. It has to wait for a new timestamp/value before it picks up again when solar starts producing. It still works OK, but it shows gaps because of this. That's how the integration sensor works; the root issue is that no new timestamp/value arrives after the drop in communication.

Potential solution: a larger timeout, or a retry on timeout?

@Steve2017
Author

> So your issues are probably present in the current custom integration as well, but to be fair we should verify that using @briancmpbll's code, not the version that was on its way to becoming the new HA Core and is now at a dead end.

Would that be the v 0.1.3 version here?
If so, I think I have tested that and still had the issue, but I'm happy to try again.

I'll try the timeout change. There are four references to timeout=30. Do I change all four or just the first?
The first one comes at line 249 - 16 lines after the second mention of '_async_fetch_with_retry'.
Given that it comes not long after:

            _LOGGER.debug(
                "HTTP GET Attempt #%s: %s: Header:%s ",
                attempt + 1,

and the GET attempt is where the error appears in the log?

@Steve2017
Author

> That's how the integration sensor works; the root issue is that no new timestamp/value arrives after the drop in communication.

That's where I was hoping the use of 'available' might work - especially if it replaced the earlier 0 kW data with 1 W for that one attempt. It didn't help.

Restarting the Riemann Sum sensor might help, because an HA restart puts the Riemann Sum sensor back to its pre-dropout state.

@catsmanac
Collaborator

@Steve2017 it's the one below

 async def _async_fetch_with_retry(self, url, **kwargs):

which is indeed where the debug log for the attempt is written.

@catsmanac
Collaborator

> That's where I was hoping the use of 'available' might work - especially if it replaced the earlier 0 kW data with 1 W for that one attempt. It didn't help.

Seen this one?

@Steve2017
Author

> Seen this one?

No - but that is the same issue, and I have seen other attempts to fix it. While the dropout causes the initial problem, the ongoing problem seems to lie with the Riemann Sum, and as Tom_I indicates there were attempts to fix it... unsuccessfully.

Tom has made some other suggestions, but none worked for me.

I might have to wait until the integration is back in the official version, then try raising an issue in the core.

I'm not sure how to implement the last suggestion in that thread, but no-one is saying it works.

@catsmanac
Collaborator

Maybe something like

- platform: integration
  source:  {{ sensor.envoy_122302045041_inverter_122220009431 | float(default=0) }} 
  name: envoy_122302045041_inverter_122220009431_productie
  unique_id: sensor.envoy_122302045041_inverter_122220009431_productie
  unit_prefix: k
  round: 3
  method: left

@Steve2017
Author

I get "invalid key: " when I try to use this format {{ sensor.envoy_122302045041_inverter_122220009431 | float(default=0) }}
(using my sensors of course)

The message is:
Error loading /config/configuration.yaml: invalid key: "{'sensor.solar_export_power3 | float(default=0)': None}"

@catsmanac
Collaborator

You can try such an expression in Developer Tools / Template. Some {} could be wrongly positioned. That is not my strong suit, for sure.

@catsmanac
Collaborator

I tried this in my template test section

{{ states('sensor.envoy_122302045041_inverter_122220009431') | float(default=0) }}

and it yielded the value

@catsmanac
Collaborator

And I found this calculation to make kWh from Wh:

  - name: "envoy productie 7 dag teller kWh"
    unique_id: envoy_productie_7_dag_teller_kWh
    unit_of_measurement: "kWh"
    icon: mdi:solar-panel
    state: >
      {{ states('sensor.envoy_122302045041_last_seven_days_energy_production') | float(0) / 1000}}
    availability: "{{ states('sensor.envoy_122302045041_last_seven_days_energy_production') | is_number }}"

so run your input signal through this and then integrate?
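The whole chain could look roughly like this - just a sketch, with placeholder entity IDs, guarding the power signal first and then integrating it:

    # Sketch only: placeholder entity IDs.
    template:
      - sensor:
          - name: "solar export power guarded"
            unique_id: solar_export_power_guarded
            unit_of_measurement: "W"
            state: "{{ states('sensor.solar_export_power') | float(0) }}"
            availability: "{{ states('sensor.solar_export_power') | is_number }}"

    sensor:
      - platform: integration
        source: sensor.solar_export_power_guarded
        name: solar_export_energy_guarded
        unique_id: solar_export_energy_guarded
        unit_prefix: k
        round: 3
        method: left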

@Steve2017
Author

I also get a value in my test template section, but when I use that format as the source for the Riemann Sum sensor, it throws its hands up.

I'll wait until after 11pm (10 minutes away) to see if the longer timeout works.

Given that the 'last_seven_days_energy_production' sensor also drops out, I am guessing the Riemann Sum will still have issues there, but it's worth a try.

@Steve2017
Author

The longer timeout did not make any difference. It still drops out. I forgot to turn on debug, which is not helpful.

Why does the integration use the ID and password method to obtain a token? Why not just enter a token directly in the configuration/setup and skip those steps?

@aperezva

aperezva commented Aug 8, 2023

Hi, from what I can see, HA will announce a new release of the official Enphase integration?

Could you confirm?

I have the same disconnection problems, but it doesn't make sense to dig deeper if we will have a new integration.

@catsmanac
Collaborator

> Hi, from what I can see, HA will announce a new release of the official Enphase integration?

Yes; I'm not sure which version it will be. All the PRs for Enphase changes in Core are here (added to the development version) and here (pending).

This is the most important one: 'Refactor enphase_envoy to use pyenphase library'.

@catsmanac
Collaborator

> Why does the integration use the ID and password method to obtain a token? Why not just enter a token directly in the configuration/setup and skip those steps?

To allow for an automatic token refresh when it has expired.

@catsmanac
Collaborator

> The longer timeout did not make any difference. It still drops out.

Too bad. The only option I can think of is adding a retry on timeout. Considering that the next collection cycle one minute later works (right?), just retrying may work as well.

@Steve2017
Author

Steve2017 commented Aug 9, 2023

> To allow for an automatic token refresh when it has expired.

Too bad there is not an option to just use the token, given it has a 12-month expiry. I understand the reasoning behind regular token updates/checks, but the Enphase website is very slow and unresponsive at times.

I wonder if that might be related to the issue.

Having a choice between ID/password and token would be nice. There is already a routine to check whether the token is still valid. Even better would be having the Riemann Sum fixed by the HA dev team.

@Steve2017
Author

> Too bad. The only option I can think of is adding a retry on timeout. Considering that the next collection cycle one minute later works (right?), just retrying may work as well.

envoy_reader.py looks to already have retries, at line 233 in its code:

async def _async_fetch_with_retry(self, url, **kwargs):
        """Retry 3 times to fetch the url if there is a transport error."""
        for attempt in range(3):
            header = " <Blank Authorization Header> "
            if self._authorization_header:
                header = " <Authorization header with Token hidden> "

My debug log shows failure after one attempt, whereas the code indicates retries occur after a 401 - so maybe I am not even getting to a 401.

@catsmanac
Collaborator

> I wonder if that might be related to the issue.

I doubt it. The Enphase website is only contacted when HA starts. Checking whether the token is expired uses the date embedded in the token, compared to the current time. Only if the current time is beyond the token expiration time does it reach out to the Enphase website to get a new token.

The 0.2.10 dev-2-core one we tested for some time even stores the token locally, so it does not need to contact the Enphase website when HA starts - but that's not when your issue happens, and the issue happened with that one as well.

@catsmanac
Collaborator

> envoy_reader.py looks to already have retries, at line 233 in its code

Yes, but that's only for a 401, to trigger the cookies/token check. You are getting a timeout, which takes it to the bottom of the loop, where a test is present for httpx.TransportError. That should include timeouts as well, but it doesn't seem to work:

            except httpx.TransportError:
                if attempt == 2:
                    raise

You can try removing httpx.TransportError from that line (leave the colon); that should let it retry 3 times on any error. See if that captures the timeout:

            except:
                if attempt == 2:
                    raise

@Steve2017
Author

> You can try removing httpx.TransportError from that line

Still no joy.

It might be time to concentrate on how to make the Riemann Sum sensor become available again. As a test I set one up through the Helper GUI (so it has a unique ID), and it let me switch it back from unavailable to available.

Thanks for your patience. I think mine has run out.

There is still something fundamentally wrong with those sensors. When solar production was zero, I created a GUI one to measure solar energy export, and instead of returning 0 kWh it returned 'unknown'. From programming-language logic this might make sense, but from real-world logic we knew that solar power = 0... therefore solar energy export = 0. It was not "unknown"!

@catsmanac
Collaborator

Understood, @Steve2017, though I feel I'm not done with it yet. There are indeed two separate issues: the Riemann one and the communication one.

As for your attempt with the changed error handling: do you happen to have a log from it that can help me figure out how to really implement a retry on timeout?

As for the returned 'unknown', I'm not sure I fully comprehend what you did.

@Steve2017
Author

Steve2017 commented Aug 10, 2023

> As for your attempt with the changed error handling: do you happen to have a log from it that can help me figure out how to really implement a retry on timeout?

No - it was late at night and I forgot to turn on debug again. I'll try tonight.

> As for the returned 'unknown', I'm not sure I fully comprehend what you did.

Just to make sure there wasn't something different in the way Riemann Sum sensors are processed in the Helper GUI, I created a new energy sensor duplicating the ones I had made in YAML in configuration.yaml. (I think the only real difference is that the GUI assigns a unique ID.)

At the time I created the new GUI sensor there was no solar production, so solar_export_power was 0.0 kW. This meant the energy sensor should have been showing 0.0 kWh exported. I gather a Riemann Sum sensor will normally show 'unknown' until the first data reading is received; then, instead of 'unknown', it will show 0.12 kWh (or whatever the calculation returns).

Instead, it showed 'unknown' up until the Envoy integration sensors went off-line for a minute. When the Envoy source sensors came back online, it changed to 'unavailable', just like the YAML versions.

I can only assume the power sensor had been sending "no data" because there had been no new readings for several hours - and so the Riemann Sum sensor recorded that as unknown.

BTW: No dropout early this morning - only the regular 11pm event last night.

@catsmanac
Collaborator

> I'll try tonight.

I fully understand you've kind of had it with this, so no need to stay up late just for it.

> I can only assume the power sensor had been sending "no data"

Thanks for explaining. I think I know what is happening:

  1. The Riemann sensor needs two values in series with different timestamps to be able to integrate; until then it probably shows 'unknown'.
  2. The Envoy sends the last reported value and time. When production ends, that value and time no longer change. During that time the Riemann sensor is waiting for the next value/time.
  3. When the outage occurs, the Riemann sensor can't use that data point as the next one, and probably signals unavailable.
  4. When communication is restored, HA inserts the 0 with the current timestamp, I guess, while the Envoy's timestamp is still 7pm.
  5. The Envoy is still sending the zero value for 7pm - no newer data, as its timestamp is not newer than the inserted communication-restored point.
  6. Then in the morning, when production resumes and the first new value and time come in, the Riemann sensor has its awaited second value and can integrate again.

Hope it makes sense.
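For reference, with the left method each update adds $\Delta E = P(t_i)\,(t_{i+1} - t_i)$ to the running total, so until a sample with a newer timestamp $t_{i+1}$ arrives there is simply nothing to add.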

@Steve2017
Author

> 5. The Envoy is still sending the zero value for 7pm

I had been hoping that using the available function to return 1.0 or 2.0 when it drops out would help, if it then reverted to zero on the next report. That was not the case.

I'll turn debug on now. It'll be a large file, but at least it'll be on.

@Steve2017
Author

Steve2017 commented Aug 10, 2023

You can set your watch by it. All is good, then the GET attempt starts at 23:00:01, with 3 attempts. It times out 30 seconds later.
It tries again at 23:01:31.425. All good.

20230810 Debug Log Enphase Dev2.txt

The Riemann Sum sensor for solar export, and its child sensors, are now unavailable.

@catsmanac
Collaborator

Thanks for providing this, @Steve2017 - I realize it was late (or already early) over there.

2023-08-10 23:00:01.426 HTTP GET Attempt #1: https://192.168.1.144/production.json?details=1
2023-08-10 23:00:31.428 HTTP GET Attempt #2: https://192.168.1.144/production.json?details=1: 
2023-08-10 23:00:31.432 HTTP GET Attempt #3: https://192.168.1.144/production.json?details=1: 
2023-08-10 23:00:31.436 ERROR  Timeout fetching envoy Envoy  data
2023-08-10 23:00:31.439 Finished fetching envoy Envoy data in 30.014 seconds (success: False)

The error catching worked, as it looped 3 times. But as you can see from the timing, attempts 2 and 3 returned right away with a timeout. I think we need to take some additional action there, like starting a genuinely new connection. I need to give that a bit of thought.

Since it happens like clockwork, I would think it's something going on in the Envoy. But it has an ugly effect downstream. Although I don't think the resulting numbers are wrong, seeing all those gaps makes one think there is some other problem.

@Steve2017
Author

2023-08-19 22:59:46.118 DEBUG (MainThread) [custom_components.enphase_envoy] Finished fetching envoy Envoy 122135007569 data in 4.871 seconds (success: True)
2023-08-19 23:00:29.988 INFO (MainThread) [homeassistant.components.mqtt.discovery] Component has already been discovered: sensor Acurite-986-2F-43031 Acurite-986-2F-43031-B, sending update
2023-08-19 23:00:29.994 INFO (MainThread) [homeassistant.components.mqtt.discovery] Component has already been discovered: sensor Acurite-986-2F-43031 Acurite-986-2F-43031-F, sending update
2023-08-19 23:00:46.248 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Checking Token value: yJraWQiOi (Only first 10 characters shown)
2023-08-19 23:00:46.248 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Token is populated: yJraWQiOi (Only first 10 characters shown)
2023-08-19 23:00:46.248 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Token expires at: 2024-08-10 16:20:04
2023-08-19 23:00:46.248 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] HTTP GET Attempt #1: https://192.168.1.144/production.json?details=1: use owner token: True: Header: <Authorization header with Token hidden> 
2023-08-19 23:01:01.708 WARNING (MainThread) [custom_components.enphase_envoy.envoy_reader] Error in fetch_with_retry, try closing connection: 
2023-08-19 23:01:01.713 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] HTTP GET Attempt #2: https://192.168.1.144/production.json?details=1: use owner token: True: Header: <Authorization header with Token hidden> 
2023-08-19 23:01:17.325 WARNING (MainThread) [custom_components.enphase_envoy.envoy_reader] Error in fetch_with_retry, raising: 
2023-08-19 23:01:17.327 ERROR (MainThread) [custom_components.enphase_envoy] Error fetching envoy Envoy 122135007569 data: Error communicating with API: 
2023-08-19 23:01:17.327 DEBUG (MainThread) [custom_components.enphase_envoy] Finished fetching envoy Envoy 122135007569 data in 31.079 seconds (success: False)
2023-08-19 23:02:17.248 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Checking Token value: yJraWQiOi (Only first 10 characters shown)
2023-08-19 23:02:17.248 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Token is populated: yJraWQiOi (Only first 10 characters shown)
2023-08-19 23:02:17.248 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Token expires at: 2024-08-10 16:20:04
2023-08-19 23:02:17.248 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] HTTP GET Attempt #1: https://192.168.1.144/production.json?details=1: use owner token: True: Header: <Authorization header with Token hidden> 

@catsmanac
Collaborator

Thanks @Steve2017. Well, it confirms that the change in timing allows for an actual retry. Too bad the retry failed as well. It was done with the 2 attempts in 31 seconds, but neither the first attempt nor the retry spat out the actual error; both took the allotted 15 secs. In between, I tried closing the connection, but that clearly did not fix it.

The next try will be the 30-second wait between 2 attempts recommended by @cddu33.

@catsmanac
Collaborator

Just released 0.0.14+2 in another attempt. I maybe went a little overboard by making all the timing configurable...

By default this version will use a timeout of 12 seconds, then wait 30 seconds and try once more.

To fine-tune to what may work, use the settings option for the Envoy. Be aware that the bottom item, Overall timeout, is the time after which the whole data collection is cancelled. I.e., if you increase the retry count or the wait time between tries and exceed the overall timeout, all data will be considered timed out by this timer.

(screenshot: configurable timing settings)

I simulated a timeout by unplugging my Envoy from the network, using an 8-sec timeout and a 30-sec wait time. It's not an exact test for the problem, but it shows what the debug logs will look like:

2023-08-20 11:55:25.097 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] HTTP GET Attempt #1: https://192.168.3.112/api/v1/production: use owner token: True: Header: <Authorization header with Token hidden> 
2023-08-20 11:55:33.104 WARNING (MainThread) [custom_components.enphase_envoy.envoy_reader] HTTP Timeout in fetch_with_retry, waiting a bit: 
2023-08-20 11:56:03.136 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] HTTP GET Attempt #2: https://192.168.3.112/api/v1/production: use owner token: True: Header: <Authorization header with Token hidden> 
2023-08-20 11:56:11.144 WARNING (MainThread) [custom_components.enphase_envoy.envoy_reader] HTTP Timeout in fetch_with_retry, raising: 
2023-08-20 11:56:11.149 ERROR (MainThread) [custom_components.enphase_envoy] Error fetching envoy Envoy 122302045041 data: Error communicating with API: 
2023-08-20 11:56:11.150 DEBUG (MainThread) [custom_components.enphase_envoy] Finished fetching envoy Envoy 122302045041 data in 46.054 seconds (success: False)

@Steve2017
Author

Installed - will leave the config at the default:

Time between entity updates [s]: 60
Timeout for getting single Envoy data page [s], minimum 5: 10
How many retries in getting single Envoy data page, minimum 1: 1
Time between 2 retries to get single Envoy data page [s], minimum 0: 30
Overall timeout on getting all Envoy data pages [s], minimum 30: 55

@Steve2017
Author

Still off-line for 60 seconds. Riemann Sum sensors now unavailable. Lost cause???

2023-08-20 22:58:55.402 DEBUG (MainThread) [custom_components.enphase_envoy] Finished fetching envoy Envoy 122135007569 data in 1.216 seconds (success: True)
2023-08-20 22:59:55.186 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Checking Token value: yJraWQiOi (Only first 10 characters shown)
2023-08-20 22:59:55.187 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Token is populated: yJraWQiOi (Only first 10 characters shown)
2023-08-20 22:59:55.187 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Token expires at: 2024-08-10 16:20:04
2023-08-20 22:59:55.187 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] HTTP GET Attempt #1: https://192.168.1.144/production.json?details=1: use owner token: True: Header: <Authorization header with Token hidden> 
2023-08-20 23:00:07.479 WARNING (MainThread) [custom_components.enphase_envoy.envoy_reader] HTTP Timeout in fetch_with_retry, waiting a bit: 
2023-08-20 23:00:37.488 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] HTTP GET Attempt #2: https://192.168.1.144/production.json?details=1: use owner token: True: Header: <Authorization header with Token hidden> 
2023-08-20 23:00:50.180 WARNING (MainThread) [custom_components.enphase_envoy.envoy_reader] HTTP Timeout in fetch_with_retry, raising: 
2023-08-20 23:00:50.181 ERROR (MainThread) [custom_components.enphase_envoy] Error fetching envoy Envoy REDACTED data: Error communicating with API: 
2023-08-20 23:00:50.181 DEBUG (MainThread) [custom_components.enphase_envoy] Finished fetching envoy Envoy REDACTED data in 54.995 seconds (success: False)
2023-08-20 23:01:50.186 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Checking Token value: yJraWQiOi (Only first 10 characters shown)
2023-08-20 23:01:50.186 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Token is populated: yJraWQiOi (Only first 10 characters shown)
2023-08-20 23:01:50.187 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Token expires at: 2024-08-10 16:20:04
2023-08-20 23:01:50.187 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] HTTP GET Attempt #1: https://192.168.1.144/production.json?details=1: use owner token: True: Header: <Authorization header with Token hidden> 
2023-08-20 23:01:58.320 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Fetched (1 of 2) from https://192.168.1.144/production.json?details=1: <Response [200 OK]>: 

@catsmanac
Collaborator

catsmanac commented Aug 20, 2023

> Lost cause???

Yes - I'm out of ideas now.

Whatever we try within a single collection cycle seems fruitless. Since it will get data on the next cycle the effect is limited, but it is ugly with the Riemann sensor until new values come in. I'm open to any good suggestion, but I feel we have exhausted our options.

@Steve2017
Author

Close it as unresolved.

@Steve2017 reopened this Aug 22, 2023
@catsmanac
Collaborator

Any new ideas to try, @Steve2017? What you might try is increasing the wait time between tries. Make sure to increase the overall timeout accordingly.

@Steve2017
Author

I can try, but I keep wondering if we are trying to solve a problem that is programmed into the Envoy. (When I say "we" I really mean you and others, because my programming skills = 0.0.)

Until we see whether it is possible for this integration to take data from "ivp/meters/readings", that thought will remain. I just wonder whether Enphase resets the processing unit compiling the "/production" data each night. It might have always done this, but it only became apparent when the HA developers changed something in the Riemann Sum code a few months ago?

@catsmanac
Collaborator

I think you are right with your analysis. I was just checking whether you had had an aha moment, as you re-opened the issue. For me there's no need to retry, but if you have the courage again I wanted to suggest some options.

I hope to add ivp/meters/readings soon. The plan is to release 0.0.15 today or tomorrow with a fix for the battery issue in 0.0.14 and configurable timing (needed for slow Envoys as well; I'll probably change the defaults to something more generic), plus probably one other item. Then on to 0.0.16 with the ivp meters by @cddu33; I hope to have test versions in a few days.

@Steve2017
Author

No aha moment. I just wanted to make sure people could see it is not a new problem, and that it is not critical. Their system will still work, but Riemann Sum sensors (and their child sensors) will go off-line until fresh data arrives.

@catsmanac
Collaborator

@Steve2017, meters/readings is out for test. In HACS, install from https://github.com/catsmanac/home_assistant_custom_envoy, enable beta versions, and select V0.0.14-ImportExportMeters.

The readme there is updated with the new entities.

This version also has configurable timeouts, set to more relaxed defaults - but if you have specified these, it will use the values found in your config.

@Steve2017
Author

Steve2017 commented Aug 30, 2023

Hi @catsmanac, I'm assuming from one of the other posts that I should switch over to:
https://github.com/briancmpbll/home_assistant_custom_envoy/releases/tag/v0.0.17

I might have a workaround for the 23:00 dropouts. Two days ago I changed the scan period from 60 seconds to 150 seconds. I have just checked the history for the past two nights, and the longer scan period seems to overcome the dropouts on all Envoy-supplied sensors. Instead of the dropout on production (which caused the Riemann Sum issue), it reverted to showing phantom production.
(graph: Solar Production)
This is from two nights ago, where I got two blips. Last night there was one. Every other sensor I checked shows no dropouts at 23:00.

What I don't know is whether this avoids the Riemann Sum sensor issue, because I have disabled those sensors. Instead I have used the REST sensors with a utility meter sensor (see the sketch below). I have just re-enabled a Riemann Sum test sensor, so I will see tonight.
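For reference, the utility meter wrapper over the REST-based energy sensor is roughly as follows (a sketch; the source entity ID is a placeholder):

    # Sketch only: placeholder source entity.
    utility_meter:
      daily_solar_export:
        source: sensor.envoy_lifetime_energy_export
        cycle: daily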

Those two peaks last for about 2 minutes, so a shorter scan period might also work.

Of course, the downside of the 150-second interval is that you don't get as up-to-date net energy data.

@Steve2017
Author

Steve2017 commented Aug 30, 2023

I might have spoken too soon. While I checked all the Envoy-sourced sensors, the child sensors of those entities are still showing a dropout. The source sensors look okay, but not the ones downstream.

EDIT: Update at 23:25. The Riemann Sum sensor relying on solar power data did not become unavailable, as it would have in the past. So a longer scan period overcomes the problem of dropouts and the Riemann Sum. I'll try with a 120-second scan interval and see how that goes tomorrow.

@catsmanac
Collaborator

Hi @Steve2017, it remains a puzzle what is really going on. The blips make no sense either. The 150 sec could avoid that danger minute completely, but then why the blip? And the first blip seems to be at 23:04 - why later than the usual danger minute? Are those values actually coming from the Envoy? That would require a debug log to inspect. If the scan interval is 150, why are the blips 2 minutes wide with 1 minute in between?

The v0.0.17 version has the import/export added, from /ivp/meters/readings. But as far as timing is concerned, it has the same options as v0.0.15 and v0.0.16.

Be aware that HA Core will be releasing their updated version in September, and it uses rewritten communication that may or may not help with this too. The v0.0.17 import/export is not part of HA Core yet, and when it gets added in a future version it may differ from this one, which may cause the import/export history to start fresh on new entities/variables. So depending on how important that continuity is, you may want to hold back on installing. On the other hand, getting some data to see what it can bring is better than no data at all. Just so you are aware of this.

@Steve2017
Author

Steve2017 commented Aug 30, 2023

> Are those values actually coming from the Envoy?

I'm not sure if your question is rhetorical, but that is one of the native sensors in the integration. I'll run debug tomorrow. I was surprised at the two.

I got a single blip from the reading half an hour ago. The 1 watt equates to the normal power drain I see on the Envoy home page after sunset (hence "phantom production").

@Steve2017
Author

Back to the two-minute scan, and I can see a dropout - and the Riemann Sum sensor is unavailable.

2023-08-31 22:58:29.334 DEBUG (MainThread) [custom_components.enphase_envoy] Finished fetching envoy Envoy  -redacted-  data in 1.965 seconds (success: True)

2023-08-31 23:00:29.370 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Checking Token value: yJraWQiOi (Only first 10 characters shown)
2023-08-31 23:00:29.370 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Token is populated: yJraWQiOi (Only first 10 characters shown)
2023-08-31 23:00:29.370 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Token expires at: 2024-08-10 16:20:04
2023-08-31 23:00:29.371 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] HTTP GET Attempt #1 of 2: https://192.168.1.144/production.json?details=1: use token: True: Header: <Token hidden>  Timeout: 30 Holdoff: 0
2023-08-31 23:00:59.660 WARNING (MainThread) [custom_components.enphase_envoy.envoy_reader] HTTP Timeout in fetch_with_retry, waiting 0 sec: 
2023-08-31 23:00:59.665 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] HTTP GET Attempt #2 of 2: https://192.168.1.144/production.json?details=1: use token: True: Header: <Token hidden>  Timeout: 30 Holdoff: 0
2023-08-31 23:01:24.372 ERROR (MainThread) [custom_components.enphase_envoy] Timeout fetching envoy Envoy  -redacted-  data
2023-08-31 23:01:24.372 DEBUG (MainThread) [custom_components.enphase_envoy] Finished fetching envoy Envoy  -redacted-  data in 55.002 seconds (success: False)

2023-08-31 23:03:24.369 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Checking Token value: yJraWQiOi (Only first 10 characters shown)
2023-08-31 23:03:24.370 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Token is populated: yJraWQiOi (Only first 10 characters shown)
2023-08-31 23:03:24.370 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Token expires at: 2024-08-10 16:20:04
2023-08-31 23:03:24.370 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] HTTP GET Attempt #1 of 2: https://192.168.1.144/production.json?details=1: use token: True: Header: <Token hidden>  Timeout: 30 Holdoff: 0
2023-08-31 23:03:24.839 DEBUG (MainThread) [custom_components.enphase_envoy.envoy_reader] Fetched (1 of 2) in 0.5 sec from https://192.168.1.144/production.json?details=1: <Response [200 OK]>: {"production":

I'm giving up on this one.

@catsmanac
Collaborator

Fully understood; this remains mind-blowing...

The only thing I can currently think of, to determine whether it's the overall Envoy or just the production page, is to swap the order of the collection in envoy_reader.py,

from:

    async def _update_from_pc_endpoint(self):
        """Update from PC endpoint."""
        await self._update_endpoint(
            "endpoint_production_json_results", ENDPOINT_URL_PRODUCTION_JSON
        )
        await self._update_endpoint(
            "endpoint_ensemble_json_results", ENDPOINT_URL_ENSEMBLE_INVENTORY
        )

to:

    async def _update_from_pc_endpoint(self):
        """Update from PC endpoint."""
        await self._update_endpoint(
            "endpoint_ensemble_json_results", ENDPOINT_URL_ENSEMBLE_INVENTORY
        )
        await self._update_endpoint(
            "endpoint_production_json_results", ENDPOINT_URL_PRODUCTION_JSON
        )

and see if it stumbles on ensemble inventory or still on production.

But I understand if you abandon all attempts for now; I have given up before too.

@catsmanac
Collaborator

@Steve2017, with v0.0.18 installed you can now opt to not use the production page any more, using the configure option on the Envoy integration page, and then reloading the Envoy. It will make the today's and last-seven-days numbers unavailable, but it might solve the daily 11 PM dropout by avoiding the production page.

(screenshot: the "do not use production" option)

@catsmanac catsmanac added the FW Problem Problem in Envoy Firmware, can't fix in integration label Sep 21, 2023
@Steve2017
Author

Thanks. I have v0.0.18 installed using the default settings - and I still got the 23:00 dropout. I've ticked the check-box and reloaded, so I'll see how it goes tonight.

If I recall, v0.0.17 had the daily export and import data (called "index")? I liked that.

@catsmanac
Collaborator

In v0.0.18 the import and export are renamed to 'Lifetime Net Energy Consumption' and 'Lifetime Net Energy Production'. You can use these in the energy dashboard for grid import and export.

@Steve2017
Author

Thanks.

They are not the most intuitive labels (at least for me). Lifetime Net Energy Production is not producing any data for me - it is flat-lined at zero. I am getting data for Lifetime Net Energy Consumption, so that will work with a utility meter.

I've tried with the "do not use production" checkbox ticked, but I found the loss of today's consumption and production inconvenient, so I'll revert to using "production". That means the above sensors will not be available - not an issue for me.

@catsmanac
Collaborator

Label names are indeed not always intuitive in the same way for everyone, as I've learned with this one. HA allows you to assign your own labels, which will then show.

The lifetime net energy production and consumption come from the consumption CT in 'solar included' mode. If you don't have one installed it will be flat, I guess - or even bogus data. If you do have a consumption CT installed, then we may have an issue, unless all solar production is consumed directly by the house.

The net production and consumption numbers are also available when "do not use production" is unchecked, but in your case with the same result: flat-lined at zero. The only data used from the production page is Today's and Last 7 Days; all other data now comes from the ivp/meters/reports page, regardless of the checkbox.

@catsmanac
Collaborator

There are more reports of net production being zero; it seems newer firmware behaves differently from that of the original author.

@Steve2017
Author

It might depend on how the CT clamps are configured. I have a CT clamp on production, and that is straightforward (aside from the phantom production after sundown).

The consumption CT measures load with solar production, as below:

(diagram: consumption CT in "load with solar production" mode)
So anything else requires some calculation by the Envoy. The alternative is load-only:
(diagram: consumption CT in "load only" mode)

@catsmanac
Collaborator

Right, that's correct. The issue was reported for a metered Envoy with a production CT and a consumption CT in 'solar included' mode. The ivp/meters/readings page reported as before, but the ivp/meters/reports page changed its behavior over a firmware upgrade.

@catsmanac
Collaborator

I've updated the code to get the net-production and net-consumption data from ivp/meters/readings, so they both show as positive numbers for use with the energy dashboard, not as a negative net consumption. #164. No need to use the "do not use production" option for this.
