add origin_referrer_url and origin_url to the file attribute #1430

Open: 46 commits to be merged into base branch main
968e8c2
add modified registry.yaml and generated files
AsuNa-jp Sep 24, 2024
ec834e6
add changelog
AsuNa-jp Sep 25, 2024
160b7ee
reflect the feedback
AsuNa-jp Sep 27, 2024
37c9710
reflect the feedback
AsuNa-jp Sep 27, 2024
4a2fed9
format fix
AsuNa-jp Sep 27, 2024
481bd1c
Merge branch 'main' into file_originevents
AsuNa-jp Sep 30, 2024
a2b4b35
Merge branch 'main' into file_originevents
AsuNa-jp Sep 30, 2024
4b11936
Merge branch 'main' into file_originevents
AsuNa-jp Oct 3, 2024
a1a4867
Merge branch 'main' into file_originevents
AsuNa-jp Oct 10, 2024
e8256e6
add file.zone_identifier
AsuNa-jp Oct 10, 2024
c55cb2b
Merge branch 'file_originevents' of github.com:AsuNa-jp/semantic-conv…
AsuNa-jp Oct 10, 2024
130bf61
re-generated the docs
AsuNa-jp Oct 25, 2024
1f6663a
Merge branch 'main' into file_originevents
AsuNa-jp Oct 28, 2024
329df10
Merge branch 'main' into file_originevents
trisch-me Oct 29, 2024
ba508e4
Merge branch 'main' into file_originevents
AsuNa-jp Oct 30, 2024
f4ad750
Merge branch 'main' into file_originevents
AsuNa-jp Oct 30, 2024
d82f261
add note of zoneID
AsuNa-jp Oct 30, 2024
c6b8d78
Merge branch 'file_originevents' of github.com:AsuNa-jp/semantic-conv…
AsuNa-jp Oct 30, 2024
b0016f6
Merge branch 'main' into file_originevents
AsuNa-jp Oct 31, 2024
7d5331c
Merge branch 'open-telemetry:main' into file_originevents
AsuNa-jp Oct 31, 2024
ed796cc
fixed the typo
AsuNa-jp Oct 31, 2024
f4e241c
Merge branch 'main' into file_originevents
AsuNa-jp Nov 1, 2024
264fd13
Merge branch 'main' into file_originevents
AsuNa-jp Nov 1, 2024
b590595
Merge branch 'main' into file_originevents
AsuNa-jp Nov 7, 2024
0793ef9
Merge branch 'main' into file_originevents
AsuNa-jp Nov 8, 2024
17107e4
Merge branch 'main' into file_originevents
AsuNa-jp Nov 11, 2024
a3dfd75
Merge branch 'main' into file_originevents
AsuNa-jp Nov 11, 2024
daa081e
Merge branch 'main' into file_originevents
AsuNa-jp Nov 13, 2024
22d06ab
Merge branch 'main' into file_originevents
AsuNa-jp Nov 18, 2024
088e959
resolve merge conflicts
AsuNa-jp Nov 19, 2024
358f21a
Merge branch 'main' into file_originevents
AsuNa-jp Nov 19, 2024
71c4edf
Merge branch 'open-telemetry:main' into file_originevents
AsuNa-jp Nov 29, 2024
92ceab6
add file.open event
AsuNa-jp Nov 29, 2024
8eb5600
adjust the changelog
AsuNa-jp Nov 29, 2024
43ad199
Merge branch 'main' into file_originevents
AsuNa-jp Dec 3, 2024
b763b83
Merge branch 'main' into file_originevents
trisch-me Dec 3, 2024
581b0e0
Merge branch 'main' into file_originevents
AsuNa-jp Dec 5, 2024
bb17dcd
removed zone_identifier
AsuNa-jp Dec 5, 2024
e3fbf82
Update file_originevents.yaml
AsuNa-jp Dec 5, 2024
1ba436d
Merge branch 'main' into file_originevents
AsuNa-jp Dec 9, 2024
2493537
Merge branch 'main' into file_originevents
AsuNa-jp Dec 11, 2024
cd068b6
Merge branch 'main' into file_originevents
AsuNa-jp Dec 12, 2024
8972382
Merge branch 'main' into file_originevents
AsuNa-jp Dec 16, 2024
774b768
Merge branch 'main' into file_originevents
AsuNa-jp Dec 17, 2024
13c7f29
Merge branch 'main' into file_originevents
AsuNa-jp Dec 25, 2024
00e95bc
Merge branch 'file_originevents' of github.com:AsuNa-jp/semantic-conv…
AsuNa-jp Dec 25, 2024
23 changes: 23 additions & 0 deletions .chloggen/file_originevents.yaml
@@ -0,0 +1,23 @@
# Use this changelog template to create an entry for release notes.
#
# If your change doesn't affect end users you should instead start
# your pull request title with [chore] or use the "Skip Changelog" label.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the area of concern in the attributes-registry, (e.g. http, cloud, db)
component: file

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: This adds file.origin_referrer_url and file.origin_url attributes.
  In addition, it adds the file.open event in model/file/events.yaml.
AsuNa-jp marked this conversation as resolved.

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
# The values here must be integers.
issues: [1430]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:
10 changes: 8 additions & 2 deletions docs/attributes-registry/file.md
@@ -25,11 +25,13 @@ Describes file attributes.
| <a id="file-mode" href="#file-mode">`file.mode`</a> | string | Mode of the file in octal representation. | `0640` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="file-modified" href="#file-modified">`file.modified`</a> | string | Time when the file content was last modified, in ISO 8601 format. | `2021-01-01T12:00:00Z` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="file-name" href="#file-name">`file.name`</a> | string | Name of the file including the extension, without the directory. | `example.png` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="file-origin-referrer-url" href="#file-origin-referrer-url">`file.origin_referrer_url`</a> | string | The URL of the webpage that linked to the file. [7] | `http://example.com/article1.html` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="file-origin-url" href="#file-origin-url">`file.origin_url`</a> | string | The URL where the file is hosted. [8] | `http://example.com/imgs/article1_img1.jpg` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="file-owner-id" href="#file-owner-id">`file.owner.id`</a> | string | The user ID (UID) or security identifier (SID) of the file owner. | `1000` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="file-owner-name" href="#file-owner-name">`file.owner.name`</a> | string | Username of the file owner. | `root` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="file-path" href="#file-path">`file.path`</a> | string | Full path to the file, including the file name. It should include the drive letter, when appropriate. | `/home/alice/example.png`; `C:\Program Files\MyApp\myapp.exe` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="file-size" href="#file-size">`file.size`</a> | int | File size in bytes. | | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
- | <a id="file-symbolic-link-target-path" href="#file-symbolic-link-target-path">`file.symbolic_link.target_path`</a> | string | Path to the target of a symbolic link. [7] | `/usr/bin/python3` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| <a id="file-symbolic-link-target-path" href="#file-symbolic-link-target-path">`file.symbolic_link.target_path`</a> | string | Path to the target of a symbolic link. [9] | `/usr/bin/python3` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |

**[1] `file.accessed`:** This attribute might not be supported by some file systems — NFS, FAT32, in embedded OS, etc.

@@ -44,4 +46,8 @@ Describes file attributes.
**[6] `file.fork_name`:** On Linux, a resource fork is used to store additional data with a filesystem object. A file always has at least one fork for the data portion, and additional forks may exist.
On NTFS, this is analogous to an Alternate Data Stream (ADS), and the default data stream for a file is just called $DATA. Zone.Identifier is commonly used by Windows to track contents downloaded from the Internet. An ADS is typically of the form: C:\path\to\filename.extension:some_fork_name, and some_fork_name is the value that should populate `fork_name`. `filename.extension` should populate `file.name`, and `extension` should populate `file.extension`. The full path, `file.path`, will include the fork name.

- **[7] `file.symbolic_link.target_path`:** This attribute is only applicable to symbolic links.
**[7] `file.origin_referrer_url`:** This information comes from metadata or alternate data streams linked to the file. `file.origin_url` represents the URL from which the file was downloaded, and `file.origin_referrer_url` indicates the URL of the page where that URL was listed. There may be cases where both `file.origin_url` and `file.origin_referrer_url` exist, or only one of them is present. Note that the URL itself may contain sensitive information.

**[8] `file.origin_url`:** This information comes from metadata or alternate data streams linked to the file. `file.origin_url` represents the URL from which the file was downloaded, and `file.origin_referrer_url` indicates the URL of the page where that URL was listed. There may be cases where both `file.origin_url` and `file.origin_referrer_url` exist, or only one of them is present. Note that the URL itself may contain sensitive information.

**[9] `file.symbolic_link.target_path`:** This attribute is only applicable to symbolic links.
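The notes above say the origin information comes from metadata or alternate data streams. As a minimal sketch of one such source, the snippet below parses the text of a Windows `Zone.Identifier` stream, which is an INI-style block with a `[ZoneTransfer]` section. Mapping `HostUrl` to `file.origin_url` and `ReferrerUrl` to `file.origin_referrer_url` is an assumption made for illustration, not something this PR specifies, and the function name `origin_attributes` is hypothetical.

```python
import configparser
from typing import Dict


def origin_attributes(zone_identifier_text: str) -> Dict[str, str]:
    """Map the body of a Zone.Identifier stream to file.origin_* attributes.

    Assumption: HostUrl -> file.origin_url, ReferrerUrl -> file.origin_referrer_url.
    """
    # interpolation=None keeps '%' characters in URLs from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read_string(zone_identifier_text)
    except configparser.Error:
        # Not a well-formed INI block: report no origin information.
        return {}

    attributes: Dict[str, str] = {}
    host_url = parser.get("ZoneTransfer", "HostUrl", fallback=None)
    referrer_url = parser.get("ZoneTransfer", "ReferrerUrl", fallback=None)
    # Either value may be missing, so each attribute is set independently.
    if host_url:
        attributes["file.origin_url"] = host_url
    if referrer_url:
        attributes["file.origin_referrer_url"] = referrer_url
    return attributes


SAMPLE = """[ZoneTransfer]
ZoneId=3
ReferrerUrl=http://example.com/article1.html
HostUrl=http://example.com/imgs/article1_img1.jpg
"""

print(origin_attributes(SAMPLE))
```

Setting each attribute independently mirrors the note that both URLs may exist or only one of them.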
31 changes: 31 additions & 0 deletions model/file/events.yaml
@@ -0,0 +1,31 @@
groups:
  - id: event.file.open
    stability: experimental
    type: event
    name: file.open
Contributor

can we qualify what open means? Below you mention it's actually an access event (which could probably mean other things, like setting/getting metadata). Should it be called file.access then?

Does it intend to capture OS-level audit events like https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-10/security/threat-protection/auditing/event-4663 or https://github.com/linux-audit/audit-userspace?tab=readme-ov-file#events?

    brief: >
      A file is defined as a set of information that has been created on,
      or has existed on a filesystem.
      A file open event represents the action of a process accessing a file
      on the filesystem. It includes details such as the file's name,
      path, directory, size, extension, and metadata, including
      file access time, file origin information and more. In addition,
      it includes information about the process that accessed the file.
AsuNa-jp marked this conversation as resolved.
    attributes:
Contributor

let's add note on who/how/when should emit this event.

E.g. OTel instrumentations are usually run in a certain process and would usually 1) monitor things this process does (not OS-wide things) 2) set process attributes as resource attributes, not as event attributes (because resource attributes are shared for the process lifetime and it's much more efficient).

If it's an external observer that monitors something on the OS level, we should call it out.

If it's either, then we should explain how and if process attributes should be recorded.

Author

@AsuNa-jp AsuNa-jp Dec 25, 2024

Hi @lmolkova (CC: @trisch-me, @jsuereth)

Thank you for the feedback!

> not OS-wide things

As you pointed out, this was my misunderstanding: I was imagining OS-level events. (Since ECS can also handle that information, I ended up confusing the two.)

> let's add note on who/how/when should emit this event.
> E.g. OTel instrumentations are usually run in a certain process

I'm not very familiar with the internal structure of OTel's instrumentation, so to be honest, it's difficult for me to provide an answer at this point. If possible, could you please share the specific technical documentation for the OTel instrumentation you're referring to? Additionally, if OS-level events are not expected, who is expected to send the file.access or file.open events? Could you share any assumptions or scenarios you had in mind?

Contributor

@lmolkova lmolkova Dec 18, 2024

I can imagine a few scenarios for such events:

  • audit (and then we should probably describe how OS events are mapped to this one)
  • monitoring/observability - from within the process, I want to record an event every time a file is opened and know the IO operation details. For this one, we should record things like the file open mode, the access permissions provided, and the error if the open failed.
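
The in-process monitoring scenario above could be sketched roughly as follows. This is only an illustration under stated assumptions: `open_with_event`, `file_open_attributes`, and the `emit` callback are hypothetical names, no OpenTelemetry API is used, and using the exception class name as `error.type` is one possible choice rather than something this PR defines.

```python
import os
from typing import IO, Any, Callable, Dict, Optional

Event = Dict[str, Any]


def file_open_attributes(path: str, error: Optional[BaseException] = None) -> Dict[str, str]:
    """Build attributes for a hypothetical in-process file.open event."""
    abs_path = os.path.abspath(path)
    name = os.path.basename(abs_path)
    attributes = {
        "file.path": abs_path,
        "file.name": name,
        "file.directory": os.path.dirname(abs_path),
        # Extension without the leading dot.
        "file.extension": os.path.splitext(name)[1].lstrip("."),
    }
    if error is not None:
        # Assumption: the exception class name serves as error.type.
        attributes["error.type"] = type(error).__name__
    return attributes


def open_with_event(path: str, mode: str, emit: Callable[[Event], None]) -> IO[Any]:
    """Open a file and emit one event, whether the open succeeds or fails."""
    try:
        handle = open(path, mode)
    except OSError as exc:
        emit({"name": "file.open", "attributes": file_open_attributes(path, exc)})
        raise
    emit({"name": "file.open", "attributes": file_open_attributes(path)})
    return handle


if __name__ == "__main__":
    recorded = []
    try:
        open_with_event(os.path.join("no_such_dir_for_sketch", "report.txt"), "r", recorded.append)
    except OSError:
        pass
    print(recorded)
```

Process attributes are deliberately left out of the event here, following the point made earlier in the thread that in-process instrumentation would usually record them as resource attributes.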

      - ref: file.name
      - ref: file.path
      - ref: file.directory
AsuNa-jp marked this conversation as resolved.
      - ref: file.size
Contributor

Is the file size always known when the file is opened?

      - ref: file.extension
AsuNa-jp marked this conversation as resolved.
      - ref: file.accessed
Contributor

Should it be time it was accessed before it was opened this time? Is it available? Otherwise, it'd be the same as event timestamp and then it's not necessary.

      - ref: file.created
      - ref: file.owner.name
      - ref: file.owner.id
      - ref: file.origin_referrer_url
      - ref: file.origin_url
      - ref: process.pid
        brief: Process id of the process that accessed the file.
      - ref: process.user.name
        brief: Name of the user that owns the process that accessed the file.
      - ref: process.executable.name
        brief: Executable file name of the process that accessed the file.
Contributor

What if an attempt to open the file failed? Do we want to record this event? If so, we should add the error.type attribute to it.

24 changes: 24 additions & 0 deletions model/file/registry.yaml
@@ -112,6 +112,30 @@ groups:
          The user ID (UID) or security identifier (SID) of the file owner.
        stability: experimental
        examples: ["1000"]
      - id: file.origin_referrer_url
        type: string
        brief: >
          The URL of the webpage that linked to the file.
        note: >
          This information comes from metadata or alternate data streams linked to the file.
          `file.origin_url` represents the URL from which the file was downloaded, and `file.origin_referrer_url`
          indicates the URL of the page where that URL was listed. There may be cases where both `file.origin_url`
          and `file.origin_referrer_url` exist, or only one of them is present. Note that the URL itself may contain
          sensitive information.
        stability: experimental
        examples: ['http://example.com/article1.html']
      - id: file.origin_url
        type: string
        brief: >
          The URL where the file is hosted.
        note: >
          This information comes from metadata or alternate data streams linked to the file.
          `file.origin_url` represents the URL from which the file was downloaded, and `file.origin_referrer_url`
          indicates the URL of the page where that URL was listed. There may be cases where both `file.origin_url`
          and `file.origin_referrer_url` exist, or only one of them is present. Note that the URL itself may contain
          sensitive information.
        stability: experimental
        examples: ['http://example.com/imgs/article1_img1.jpg']
      - id: file.owner.name
        type: string
        brief: >
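
The notes for `file.origin_url` and `file.origin_referrer_url` warn that the URL itself may contain sensitive information. One possible mitigation, shown here only as a sketch, is to drop the user info, query string, and fragment before recording the value. The helper name `redact_url` is hypothetical, and which URL parts count as sensitive is an assumption that the PR does not settle.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_url(url: str) -> str:
    """Return the URL without user info, query string, or fragment."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literals need their brackets restored.
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Keep scheme, host (with port), and path; blank out query and fragment.
    return urlunsplit((parts.scheme, host, parts.path, "", ""))


print(redact_url("http://user:secret@example.com:8080/imgs/article1_img1.jpg?token=abc#top"))
```

Paths can also carry secrets (for example signed download links), so a stricter policy might keep only the scheme and host.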