Skip to content
This repository has been archived by the owner on Sep 1, 2022. It is now read-only.

AddressBook + HTTP Proxy: store only unique subscription addresses #835

Merged
merged 2 commits into from
Apr 3, 2018

Conversation

coneiric
Copy link
Contributor

@coneiric coneiric commented Mar 14, 2018


By submitting this pull-request, I confirm the following:

  • I have read and understood the contributor guide in kovri-docs.
  • I have checked that another pull-request for this purpose does not exist.
  • I have considered and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used and that this pull-request may be closed by the will of the maintainer.
  • I give this submission freely under the BSD 3-clause license.

Fixes #814

edit: HTTP Proxy is changed indirectly through the Address Book singleton the proxy uses to save jump service addresses.

@coneiric coneiric force-pushed the 814/unique-addresses branch 3 times, most recently from 4db8e67 to 37a7b30 Compare March 17, 2018 06:14
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 17, 2018
New helper functions validate human readable hostnames,
and ensure only unique entries are loaded from subscriptions.

Refactor subscription validation + saving with new helpers.

Refs monero-project#835
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 17, 2018
Refactor test cases and fixture for added utility/clarity.

New test cases for validating only unique, well-formed entries are added
to the address book.

Refs monero-project#835
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 17, 2018
Save jump service addresses to in-memory address book, and remove
singleton used for directly storing to disk.

Add a URL parser, and refactor various components to use it.

Refs monero-project#835
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 17, 2018
Ensure that saving jump service addresses are handled properly.

Refs monero-project#835
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 17, 2018
@coneiric coneiric force-pushed the 814/unique-addresses branch from 37a7b30 to eec1345 Compare March 18, 2018 02:32
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 18, 2018
New helper functions validate human readable hostnames,
and ensure only unique entries are loaded from subscriptions.

Refactor subscription validation + saving with new helpers.

Refs monero-project#835
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 18, 2018
Refactor test cases and fixture for added utility/clarity.

New test cases for validating only unique, well-formed entries are added
to the address book.

Refs monero-project#835
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 18, 2018
Save jump service addresses to in-memory address book, and remove
singleton used for directly storing to disk.

Add a URL parser, and refactor various components to use it.

Refs monero-project#835
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 18, 2018
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 18, 2018
HTTP jump service requests save valid addresses to in-memory
address book, separating storage from HTTP message processing.

Refs monero-project#835
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 18, 2018
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 18, 2018
Update test fixture with a destination that has a valid length.
Previous destination fails identity validation checks, and throws a
length error when decoded from base64.

Refs monero-project#835
@coneiric coneiric force-pushed the 814/unique-addresses branch from eec1345 to 22f9202 Compare March 19, 2018 01:18
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 19, 2018
Refactor fixture for added utility + clarity.

New test cases for validating only unique, well-formed entries are added
to the address book.

Refs monero-project#835
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 19, 2018
Save jump service addresses to in-memory address book, and remove
singleton used for directly storing to disk.

Add a URL parser, and refactor various components to use it.

Refs monero-project#835
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 19, 2018
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 19, 2018
HTTP jump service requests save valid addresses to in-memory
address book, separating storage from HTTP message processing.

Refs monero-project#835 + monero-project#838
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 19, 2018
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 19, 2018
Rewrite to fix bugs uncovered by unit-tests.

Refs monero-project#835 + monero-project#838
@coneiric coneiric force-pushed the 814/unique-addresses branch from 22f9202 to 5944bdd Compare March 20, 2018 05:38
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 20, 2018
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 20, 2018
Rewrite to fix bugs uncovered by unit-tests.

Refs monero-project#835 + monero-project#838
Copy link
Collaborator

@anonimal anonimal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At 47 NACKs, I'll finish the review once the issues are resolved. Also look at your commit messages: they are inaccurate:

  • You didn't rewrite the unit-test: you did unnecessary refactoring and added test-cases.
  • You didn't rewrite address saving: you moved parsing from extraction and added a check.
  • "HTTPProxy: rewrite to fix uncovered bugs" doesn't describe the refactor, nor does it describe any of the bugs anywhere (message title or body).

@@ -165,7 +209,9 @@ BOOST_AUTO_TEST_CASE(PGPClearSign) {
"=P8Ug"
"-----END PGP SIGNATURE-----";
lines.push_back(line);
BOOST_CHECK(Validate());
//TODO(oneiric): PGP-signed subscriptions should pass validation
// mark true after implementing PGP signature verification
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implementing PGP signature verification

There's no real reason to do that. You probably mean ignoring pgp signature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't sure if PGP signatures are actually verified by AddressBook. Thought it might be a good idea for validating the shipped hosts.txt + other subscriptions.

If AddressBook ignores PGP signatures in subscriptions, is PGP verification a feature somewhere else?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If AddressBook ignores PGP signatures in subscriptions, is PGP verification a feature somewhere else?

Yes, the reason for this is because our monero publisher service pulls the default subscription directly from github (see the pkg directory). I PGP sign the subscription to prevent any tampering; either by github or the very rare potential "I broke your TLS" mitm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thank you, that answers my question completely.

file << host << "=" << ident.ToBase64() << '\n'; // TODO(anonimal): this is not optimal, especially for large subscriptions
// Add to address book
m_Storage->AddAddress(ident);
m_Addresses[host] = ident.GetIdentHash(); // TODO(anonimal): setter?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Add to address book
m_Storage->AddAddress(ident);
m_Addresses[host] = ident.GetIdentHash(); // TODO(anonimal): setter?

.....do I really have to say why removing this is a bad idea?...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you don't. This was left over from previous edits, and I re-added it to the try-catch block below.

Thanks for catching it.

LOG(warning) << "AddressBook: malformed address, skipping";
continue;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No way. Not only does this vastly decrease efficiency (you're inserting for every line!), but insertion should stay decoupled from validation. There was no reason to refactor this function nor setup a vector nor use boost::split.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was my mistake, I was trying to check if addresses were already loaded in memory during validation. This function is now unchanged.

std::string line;
// To ensure valid hostname
// Note: uncomment if this regexp fails on some locales (to not rely on [a-z])
//const std::string alpha = "abcdefghijklmnopqrstuvwxyz";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't remove this unless you've adequately tested across multiple locales.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-added the comment. Wasn't sure if this was leftover, and still don't really understand how the alpha string would help if the regex below fails.

@@ -332,16 +338,8 @@ bool AddressBook::SaveSubscription(
const std::map<std::string, kovri::core::IdentityEx>
AddressBook::ValidateSubscription(std::istream& stream) {
LOG(debug) << "AddressBook: validating subscription";
// Map host to address identity
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no functional reason to remove this comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it got caught up when I was trying to separate out host validation from subscription validation.

if (helper_ident.size() != 2)
throw std::length_error("HTTPProxyHandler: invalid number of jump service parameters");
return helper_ident;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overkill.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

m_Base64Destination = boost::network::uri::decoded(base64);
return true;
std::vector<std::string> const helper_ident = ParseJumpServiceQuery();
if (!helper_ident[1].empty())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needing to access like this exposes bad design. Don't repeat guzzi's mess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try not to. I feel infected by it.

pos = pos2;
std::vector<std::string> const helper_ident = ParseJumpServiceQuery();
for (const auto& helper : m_JumpService)
if (helper_ident[0] == helper)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needing to access like this exposes bad design. Don't repeat guzzi's mess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try not to. I feel infected by it.

return false;
}
if (m_Address.empty())
ParseURL();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By now, you shouldn't need to parse anything more than once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% agree, I went overkill on this. I'll clean up this area in the HTTP proxy rewrite.

ParseURL();
core::IdentityEx ident;
// Extra sanity check to ensure base64 destination is uri-decoded
ident.FromBase64(boost::network::uri::decoded(m_Base64Destination));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By now, you shouldn't need to parse anything more than once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I was considering not adding this extra URI-decoding step. Reverted, and will keep in mind for the HTTP proxy rewrite.

@anonimal
Copy link
Collaborator

Note: if your intent is only to AddressBook + HTTP Proxy: store only unique subscription addresses, then keep PRs atomic so unrelated work doesn't block flow.

coneiric added a commit to coneiric/kovri that referenced this pull request Mar 22, 2018
New helper checks if host + address already loaded into memory.

Refactor functions that store addresses to use the new helper.

Refs monero-project#835
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 22, 2018
New helper to validate I2P hostnames, and refactor subscription
validation to use it.

Refs monero-project#835
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 22, 2018
New test case to ensure address book only inserts unique entries.

Added struct + helper to test fixture for extracting an address book
entry from a subscription line.

Refs monero-project#835
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 22, 2018
Save jump service addresses to in-memory address book, and remove
singleton used for directly storing to disk.

Add a URL parser, and refactor various components to use it.

Refs monero-project#835
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 22, 2018
HTTP jump service requests save valid addresses to in-memory
address book, separating storage from HTTP message processing.

Refs monero-project#835 + monero-project#838
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 22, 2018
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 22, 2018
Rewrite to fix bugs uncovered by unit-tests.

Refs monero-project#835 + monero-project#838
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 22, 2018
Implement inserting an address to the in-memory address book.

Will replace the singleton used by HTTP proxy, and is generic to be used
by other member functions or external classes.

Refs monero-project#835
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 22, 2018
New test case to ensure address book only inserts unique entries.

Added struct + helper to test fixture for extracting an address book
entry from a subscription line.

Refs monero-project#835
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 22, 2018
Save jump service addresses to in-memory address book, and remove
singleton used for directly storing to disk.

Add a URL parser, and refactor various components to use it.

Refs monero-project#835
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 22, 2018
HTTP jump service requests save valid addresses to in-memory
address book, separating storage from HTTP message processing.

Refs monero-project#835 + monero-project#838
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 22, 2018
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 22, 2018
New setter for in-memory address book to insert unique entries.

Refactor functions that store addresses to use the new setter.

Refs monero-project#835
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 22, 2018
New test case to ensure address book only inserts unique entries.

Added struct + helper to test fixture for extracting an address book
entry from a subscription line.

Refs monero-project#835
@coneiric coneiric force-pushed the 814/unique-addresses branch from 5944bdd to 0bbd406 Compare March 23, 2018 01:10
@coneiric
Copy link
Contributor Author

coneiric commented Mar 23, 2018

keep PRs atomic so unrelated work doesn't block flow

Painful lesson. I get over-eager to do refactoring as I see the problem. Hard habit to break.

I moved my HTTP Proxy refactor attempts (the few parts worth saving), and your recommendations to a separate branch dedicated to an HTTP message rewrite.

Again, thank you for your patience, and our conversation in IRC about future refactors.

Apologize for any time wasted, or blood vessels popped.

@coneiric coneiric force-pushed the 814/unique-addresses branch from 0bbd406 to 75b5712 Compare March 23, 2018 03:52
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 23, 2018
New test case to ensure address book only inserts unique entries.

New test fixture struct to convert a subscription line into an address
book entry.

Refs monero-project#835
@coneiric coneiric force-pushed the 814/unique-addresses branch from 75b5712 to 40fd3da Compare March 29, 2018 17:17
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 29, 2018
New test case to ensure address book only inserts unique entries.

New test fixture struct to convert a subscription line into an address
book entry.

Refs monero-project#835
Copy link
Collaborator

@anonimal anonimal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making all the necessary changes.

{
const auto it = m_Addresses.find(host);
if (it != m_Addresses.end())
throw std::runtime_error("AddressBook: host already loaded");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const auto it = m_Addresses.find(host);
if (it != m_Addresses.end())
throw std::runtime_error("AddressBook: host already loaded");

So, not only is this incredibly expensive, but it's not necessary because of m_Addresses[host] = address

std::map::operator[] returns a reference to the value that is mapped to a key equivalent to key, performing an insertion if such key does not already exist.

A much better solution is to tackle file writing / address saving after the map is properly updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize std::map worked that way, pretty cool feature. Removed the quoted block, but kept the check for loaded addresses. Eventually, likely after multiple subscriptions PR, there should be a reverse address => hostname lookup, making the current naive lookup unnecessary.

A much better solution is to tackle file writing / address saving after the map is properly updated.

I will keep this in mind while working on the multiple subscription PR. Do we want to follow Java I2P naming for filenames, i.e. userhosts.txt, privatehosts.txt?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to follow Java I2P naming for filenames, i.e. userhosts.txt, privatehosts.txt?

There's no need, and userhosts.txt is also confusing.

struct AddressBookEntry
{
public:
AddressBookEntry() : m_Host(""), m_Address() {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initializing a member string like this is unnecessary. Initializing m_Address like this is unnecessary because we do that during Tag construction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't sure here about initialization, so I added it just in case. Removed now.

AddressBookEntry() : m_Host(""), m_Address() {}

explicit AddressBookEntry(const std::string& subscription_line)
: m_Host(""), m_Address()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initializing a member string like this is unnecessary. Initializing m_Address like this is unnecessary because we do that during Tag construction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't sure here about initialization, so I added it just in case. Removed now.

}
}

inline const std::string& host() const { return m_Host; }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inline

Not necessary as this is implicitly inlined. inlines are also suggestions which are not always observed by the compiler and may go ignored. The compiler may also inline when not explicitly stated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the inline suggestion, obvious now it doesn't help. Will look more into inline for future use.


inline const std::string& host() const { return m_Host; }

inline const kovri::core::IdentHash& address() const { return m_Address; }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inline

Not necessary as this is implicitly inlined. inlines are also suggestions which are not always observed by the compiler and may go ignored. The compiler may also inline when not explicitly stated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the inline suggestion, obvious now it doesn't help. Will look more into inline for future use.

inline const kovri::core::IdentHash& address() const { return m_Address; }

private:
/// @var m_Host
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// @var m_Host

Not necessary. In fact, with members, you can do std::string m_Host; ///< Human-readable I2P hostname.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So much cleaner! Updated now.

/// @var m_Host
/// @brief Human-readable I2P hostname
std::string m_Host;
/// @var m_Address
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// @var m_Address

Not necessary. In fact, with members, you can do kovri::core::IdentHash m_Address; ///< I2P address hash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So much cleaner! Updated now.

BOOST_AUTO_TEST_CASE(RejectDuplicateEntry)
{
// Ensure valid subscription line creates an entry
BOOST_CHECK_NO_THROW(AddressBookEntry const entry(subscription.front()));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AddressBookEntry const

const not necessary here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, guess it is a bit too pedantic given the context.

// Ensure valid subscription line creates an entry
BOOST_CHECK_NO_THROW(AddressBookEntry const entry(subscription.front()));

AddressBookEntry const entry(subscription.front());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AddressBookEntry const

const not necessary here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, guess it is a bit too pedantic given the context.

New setter for in-memory address book to insert unique entries.

Refactor functions that store addresses to use the new setter.

Refs monero-project#835
@coneiric coneiric force-pushed the 814/unique-addresses branch from 40fd3da to c691441 Compare March 30, 2018 17:49
coneiric added a commit to coneiric/kovri that referenced this pull request Mar 30, 2018
New test case to ensure address book only inserts unique entries.

New test fixture struct to convert a subscription line into an address
book entry.

Refs monero-project#835
@coneiric
Copy link
Contributor Author

Thanks for making all the necessary changes.

You're welcome, thanks for the thorough review.

New test case to ensure address book only inserts unique entries.

New test fixture struct to convert a subscription line into an address
book entry.

Refs monero-project#835
Copy link
Collaborator

@anonimal anonimal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimizations (design, not compiler) will eventually be needed because you want to iterate through the entire map for every single address in a subscription.

Works well enough for now.

@coneiric coneiric force-pushed the 814/unique-addresses branch from c691441 to a673466 Compare April 3, 2018 02:02
@anonimal anonimal merged commit a673466 into monero-project:master Apr 3, 2018
anonimal added a commit that referenced this pull request Apr 3, 2018
a673466 Tests: Address Book unit-test checks unique entry (oneiric)
05ee550 AddressBook: store only unique addresses (oneiric)
@coneiric coneiric deleted the 814/unique-addresses branch May 30, 2018 00:06
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HTTP Proxy / AddressBook: ensure that jump service doesn't save already-known addresses
2 participants