SSTable max local deletion time allowing for missed deletion? #5
Comments
Thanks Jeff! I'll try setting the tombstone_compaction_interval to 21600 and unchecked_tombstone_compaction to true, along with increasing my concurrent_compactors. Also, here are my SSTables' details (the full listing is quoted in the reply below).
As you can see, they are not very regular. I have 8 HDDs and 12 cores, so now that I know about concurrent_compactors, that should get my buckets more in sync. Watching compactionstats showed I really needed more.
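For anyone following along, a minimal sketch of what applying those compaction options could look like from cqlsh. The keyspace/table name (kairosdb.data_points) and the daily window settings are placeholders for illustration, and the class name assumes the built-in TWCS shipped with Cassandra 3.x rather than the external jar from this repo:

# Sketch only: placeholder table name and window settings, assuming the
# built-in TimeWindowCompactionStrategy in Cassandra 3.x.
cqlsh -e "ALTER TABLE kairosdb.data_points WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '1',
    'tombstone_compaction_interval': '21600',
    'unchecked_tombstone_compaction': 'true'
  };"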
Hello Jeff,
I had a question on the same topic.
I referred to the actual GitHub repo for TWCS <https://github.com/jeffjirsa/twcs>, where you suggest targeting fewer than 50 buckets per table. After the window interval, all SSTables in a window are compacted into a single SSTable file, which is never compacted again and is eventually removed after the TTL. In that case, what is the reasoning behind the recommendation of fewer than 50 buckets, and how does the bucket count affect performance?
Could you please shed some light on this?
On Tue, Mar 21, 2017 at 5:57 PM, rdzimmer ***@***.*** wrote:
Thanks Jeff! I'll try setting the tombstone_compaction_interval to 21600
and unchecked_tombstone_compaction to true, along with increasing my
concurrent_compactors.
I should have noted that I did see the recommendation about ~30 buckets
and plan to follow that. I'm targeting a TTL of 32 days with daily buckets,
but had simply dropped my KairosDB TTL to 15 days to speed up this
particular test of TWCS. In theory I don't see any issue with that since
it's just the current and last day's bucket that matter for compaction.
It's the per bucket size/load that's more important, and I'm keeping that
the same. I'll probably set this test's TTL to 3 days to get a quicker
result.
Also, here are my SSTables' details:
for x in `ls -tr | grep Data.db` ; do echo `ls -l $x`: `/cassandra/tools/bin/sstablemetadata $x | egrep 'timestamp|local deletion time'`; done
-rw-r--r-- 1 root root 25775981776 Mar 1 03:14 mc-73178-big-Data.db: Minimum timestamp: 1488239981308 Maximum timestamp: 1488326378869 SSTable min local deletion time: 1489535982 SSTable max local deletion time: 1489622378
-rw-r--r-- 1 root root 25763526968 Mar 2 00:55 mc-76507-big-Data.db: Minimum timestamp: 1488326377931 Maximum timestamp: 1488412797870 SSTable min local deletion time: 1489622378 SSTable max local deletion time: 1489708797
-rw-r--r-- 1 root root 25774327437 Mar 3 04:16 mc-80736-big-Data.db: Minimum timestamp: 1488412797870 Maximum timestamp: 1488499188456 SSTable min local deletion time: 1489708798 SSTable max local deletion time: 1489795188
-rw-r--r-- 1 root root 25744791857 Mar 3 23:09 mc-83643-big-Data.db: Minimum timestamp: 1488499187529 Maximum timestamp: 1488585595207 SSTable min local deletion time: 1489795188 SSTable max local deletion time: 1489881595
-rw-r--r-- 1 root root 25674236129 Mar 4 23:21 mc-87457-big-Data.db: Minimum timestamp: 1488585594227 Maximum timestamp: 1488671974965 SSTable min local deletion time: 1489881595 SSTable max local deletion time: 1489967975
-rw-r--r-- 1 root root 25677519446 Mar 5 23:09 mc-91174-big-Data.db: Minimum timestamp: 1488671973975 Maximum timestamp: 1488758389913 SSTable min local deletion time: 1489967975 SSTable max local deletion time: 1490054389
-rw-r--r-- 1 root root 25707713475 Mar 7 00:15 mc-95079-big-Data.db: Minimum timestamp: 1488758389928 Maximum timestamp: 1488844797257 SSTable min local deletion time: 1490054390 SSTable max local deletion time: 1490140797
-rw-r--r-- 1 root root 25697918823 Mar 7 23:14 mc-98692-big-Data.db: Minimum timestamp: 1488844796272 Maximum timestamp: 1488931173843 SSTable min local deletion time: 1490140797 SSTable max local deletion time: 1490227174
-rw-r--r-- 1 root root 25722794972 Mar 8 23:13 mc-102428-big-Data.db: Minimum timestamp: 1488931172991 Maximum timestamp: 1489017581479 SSTable min local deletion time: 1490227174 SSTable max local deletion time: 1490313581
-rw-r--r-- 1 root root 25722114390 Mar 10 03:23 mc-106813-big-Data.db: Minimum timestamp: 1489017580639 Maximum timestamp: 1489103996566 SSTable min local deletion time: 1490313581 SSTable max local deletion time: 1490399996
-rw-r--r-- 1 root root 25722330473 Mar 11 06:03 mc-110829-big-Data.db: Minimum timestamp: 1489103995711 Maximum timestamp: 1489190381397 SSTable min local deletion time: 1490399996 SSTable max local deletion time: 1490486381
-rw-r--r-- 1 root root 25689467834 Mar 12 05:08 mc-114146-big-Data.db: Minimum timestamp: 1489190380467 Maximum timestamp: 1489276795947 SSTable min local deletion time: 1490486381 SSTable max local deletion time: 1490572796
-rw-r--r-- 1 root root 24963381582 Mar 13 04:39 mc-117761-big-Data.db: Minimum timestamp: 1489276795948 Maximum timestamp: 1489363176865 SSTable min local deletion time: 1490572797 SSTable max local deletion time: 1490659176
-rw-r--r-- 1 root root 25758520288 Mar 14 05:26 mc-121558-big-Data.db: Minimum timestamp: 1489363176865 Maximum timestamp: 1489449591652 SSTable min local deletion time: 1490659177 SSTable max local deletion time: 1490745591
-rw-r--r-- 1 root root 25680751154 Mar 15 04:19 mc-125048-big-Data.db: Minimum timestamp: 1489449590781 Maximum timestamp: 1489535981490 SSTable min local deletion time: 1490745591 SSTable max local deletion time: 1490831981
-rw-r--r-- 1 root root 25658943529 Mar 16 00:26 mc-128153-big-Data.db: Minimum timestamp: 1489535980629 Maximum timestamp: 1489622394923 SSTable min local deletion time: 1490831981 SSTable max local deletion time: 1490918395
-rw-r--r-- 1 root root 25666844623 Mar 17 01:54 mc-132129-big-Data.db: Minimum timestamp: 1489622394990 Maximum timestamp: 1489708780077 SSTable min local deletion time: 1490918395 SSTable max local deletion time: 1491004780
-rw-r--r-- 1 root root 25541794857 Mar 18 00:09 mc-135608-big-Data.db: Minimum timestamp: 1489708779090 Maximum timestamp: 1489795184466 SSTable min local deletion time: 1491004780 SSTable max local deletion time: 1491091184
-rw-r--r-- 1 root root 25450947515 Mar 19 04:49 mc-140068-big-Data.db: Minimum timestamp: 1489795183488 Maximum timestamp: 1489881575927 SSTable min local deletion time: 1491091184 SSTable max local deletion time: 1491177576
-rw-r--r-- 1 root root 25464485874 Mar 20 00:08 mc-143029-big-Data.db: Minimum timestamp: 1489881575928 Maximum timestamp: 1489967974089 SSTable min local deletion time: 1491177577 SSTable max local deletion time: 1491263974
As you can see they are not very regular. I have 8 HDDs and 12 cores, so
now that I know about the concurrent_compactors that should get my buckets
more in sync. Watching compactionstats showed I really needed more.
--
Regards
Hemalatha
I may be able to help a little on that. After generating the same data set and querying the entire timeframe (hitting all of the TWCS buckets), my response time was 11.5 seconds with 365 buckets but only 6.5 seconds with 52 buckets. I believe that while C* can very quickly identify which SSTable files it needs to read from, the actual act of reading non-cached data from more individual files slows performance, especially with HDDs. On the other hand, with too few buckets your compactions are larger and take more time/resources.
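A rough sketch of the arithmetic behind that comparison (the values simply restate the example above: one year of data in daily versus weekly windows):

# Bucket count is roughly retention / window size; numbers are illustrative.
ttl_days=365
for window_days in 1 7; do
  echo "window=${window_days}d -> ~$(( ttl_days / window_days )) buckets"
done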
In order to speed up the testing, I've tried running with 1-hour buckets and a 6-hour TTL. I adjusted tombstone_compaction_interval to 900 (15 minutes) and set unchecked_tombstone_compaction to true. I also set concurrent_compactors to 8, which has prevented me from running out of compactors (I'm usually using 2 to 4). My 12 cores are under 40% utilized and my disk array is under 8% utilization. I am ingesting ~900K datapoints per minute, or ~1GB of data per hour.
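For reference, a sketch of where the concurrent_compactors setting lives. The config path is an assumption (package installs commonly use /etc/cassandra/cassandra.yaml), and the node needs a restart for the change to take effect:

# Sketch only: path is an assumption; set concurrent_compactors and restart.
CONF=/etc/cassandra/cassandra.yaml
sudo sed -i 's/^#\? *concurrent_compactors:.*/concurrent_compactors: 8/' "$CONF"
grep '^concurrent_compactors:' "$CONF"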
It's possible I was looking at the wrong tombstone compaction property (tombstone_compaction_interval). I'm seeing whether changing gc_grace_seconds helps. This TWCS testing has been on a single node, although I also have multi-node clusters.
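In case it helps, a sketch of checking and lowering that property from cqlsh. Here kairosdb.data_points is a placeholder table name and 3600 is just an illustrative value for a single-node test; the default gc_grace_seconds is 864000 (10 days):

# Sketch only: placeholder table name and illustrative value.
cqlsh -e "SELECT gc_grace_seconds FROM system_schema.tables
          WHERE keyspace_name = 'kairosdb' AND table_name = 'data_points';"
cqlsh -e "ALTER TABLE kairosdb.data_points WITH gc_grace_seconds = 3600;"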
Hi,
I've been testing with TWCS and KairosDB. My KairosDB TTL for data is 15 days. Here is the SCHEMA (note the 'timestamp_resolution': 'MILLISECONDS'):
TWCS is working and creating daily SSTables, but the 2 compactors are usually very busy throughout the day. I believe I need to allocate more than the 2 default concurrent_compactors given my system size and load. I have extra disk IO and CPU capacity, so adding more should be okay. Unfortunately, I now have old SSTables that have expired but are not deleted. Instead of 15 days of daily SSTables I have 25 and growing. I didn't have any issues when testing with smaller loads, which is why I figured I needed more concurrent_compactors.
My issue is that when I stopped my incoming data, I expected the compactors to free up and clean up the old expired SSTables. However, the compactors are done and the expired SSTables are still there. Looking at the tables I see this:
I'm wondering if there is a reason for the "max local deletion time"? If that means what I think it does, my old SSTables have expired but will not be deleted since they missed the min/max local deletion time period. A quick Google search for "SSTable min local deletion time" showed it is frequently set to 2147483647. Please let me know if there is any other information I can provide. Sorry if I'm misunderstanding those values or have misconfigured TWCS/KairosDB. Thanks in advance!
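One note that may help interpret those fields: the min/max local deletion times printed by sstablemetadata are epoch seconds at which the TTL'd cells in that SSTable expire, and 2147483647 (INT32 max) is the sentinel for "no expiring or tombstoned cells". Roughly, a fully expired SSTable only becomes droppable once its max local deletion time plus gc_grace_seconds has passed (overlapping SSTables can still delay the drop). A small sketch, assuming GNU date (use date -r on BSD/macOS):

# Sketch only: the epoch value is copied from the listing earlier in this
# thread; 864000 is the default gc_grace_seconds.
max_ldt=1490227174
date -u -d "@${max_ldt}"                 # when the newest TTL'd cells expire
date -u -d "@$(( max_ldt + 864000 ))"    # earliest point the SSTable could be dropped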