cassandra

version

3.5

quick start

bin\cassandra
bin\cqlsh

configuration

vi conf/logback.xml
vi conf/cassandra.yaml
#data_file_directories is usually pointed at a larger disk, sized to the data volume
data_file_directories: g:/var/lib/cassandra/data
saved_caches_directory: g:/var/lib/cassandra/saved_caches
#commitlog_directory can go on a faster SSD
commitlog_directory: g:/var/lib/cassandra/commitlog

test drop/create keyspace/table, import data, count table, find data

#####create keyspace/table
# use system; drop keyspace test;
#CREATE KEYSPACE test WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };
CREATE KEYSPACE test WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
#describe keyspaces; describe keyspace test;
use test;
create table test(id varchar primary key, value text)
with bloom_filter_fp_chance=1;
alter table test
with bloom_filter_fp_chance=1;
desc test.test;
insert into test(id,value)values('testid','testvalue');
select * from test.test;
delete from test where id='testid';
select * from test.test;
#default 0.01, use 1 to disable bloomfilter and save memory
create table content (id varchar primary key, content text)
with bloom_filter_fp_chance=1;
#default 0.01, use 1 to disable bloomfilter and save memory
alter table content
with bloom_filter_fp_chance=1;
#disable compression
alter table content WITH compression = { 'sstable_compression' : '' };
alter table content WITH compression = { 'sstable_compression' : 'LZ4Compressor' };
alter table content WITH compression = { 'sstable_compression' : 'DeflateCompressor'};
alter table content WITH compression = { 'sstable_compression' : 'DeflateCompressor', 'chunk_length_kb' : 64 };
select * from content;
#select * from system.schema_keyspaces; (gone in 3.x; use system_schema.keyspaces instead)
desc keyspaces;
desc keyspace test;
describe tables;
describe table content;

#####import from file
copy content(id, content) from 'g:\t\test.content.data.csv';
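
COPY also accepts tuning options (these are standard cqlsh COPY options; the value below is only illustrative, and numprocesses is covered in the compaction-tuning notes further down):

copy content(id, content) from 'g:\t\test.content.data.csv' with numprocesses=1;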

#####count rows, find data
#the explicit max-int limit keeps count(*) from being cut short by the default query limit
use test; select count(*) from content limit 2147483647;
select * from content where id='01435a9f57718a8080e44358b3ff060b';

#####insert non-ASCII test data

insert into content (id,content) values('test','test中文許功蓋');

#####nodetool

#bin\nodetool flush [keyspace] [cfnames]
bin\nodetool flush test
bin\nodetool -h localhost -p 7199 flush
bin\nodetool -h localhost -p 7199 cfstats
bin\nodetool -h localhost -p 7199 status
bin\nodetool -h localhost -p 7199 gcstats
bin\nodetool -h localhost -p 7199 cleanup
bin\nodetool -h localhost -p 7199 clearsnapshot
bin\nodetool -h localhost -p 7199 compact
bin\nodetool -h localhost -p 7199 compactionhistory
bin\nodetool -h localhost -p 7199 compactionstats
bin\nodetool -h localhost -p 7199 describecluster
bin\nodetool -h localhost -p 7199 describering
bin\nodetool -h localhost -p 7199 info
bin\nodetool -h localhost -p 7199 join
bin\nodetool -h localhost -p 7199 netstats
bin\nodetool -h localhost -p 7199 repair
bin\nodetool -h localhost -p 7199 rebuild
bin\nodetool -h localhost -p 7199 ring
bin\nodetool -h localhost -p 7199 tpstats
bin\nodetool -h localhost -p 7199 version

bin\nodetool flush & bin\nodetool repair & bin\nodetool compact
bin\nodetool info & bin\nodetool cfstats drifty & bin\nodetool compactionstats
bin\nodetool info & bin\nodetool tpstats
bin\nodetool status
bin\nodetool info
bin\nodetool flush
bin\nodetool cleanup
bin\nodetool repair
bin\nodetool compactionhistory
bin\nodetool drain
bin\nodetool disableautocompaction drifty
bin\nodetool enableautocompaction drifty

When using bin\cqlsh to COPY FROM/TO CSV files, you must edit cassandra.yaml and cqlsh.py to raise several limits.

cassandra.yaml

< batch_size_warn_threshold_in_kb: 5
---
> batch_size_warn_threshold_in_kb: 5000
681c683
< batch_size_fail_threshold_in_kb: 50
---
> batch_size_fail_threshold_in_kb: 50000
731c733
< read_request_timeout_in_ms: 5000
---
> read_request_timeout_in_ms: 50000
733c735
< range_request_timeout_in_ms: 10000
---
> range_request_timeout_in_ms: 100000
735c737
< write_request_timeout_in_ms: 2000
---
> write_request_timeout_in_ms: 20000
737c739
< counter_write_request_timeout_in_ms: 5000
---
> counter_write_request_timeout_in_ms: 50000
746c748
< request_timeout_in_ms: 10000
---
> request_timeout_in_ms: 100000

cqlsh.py

import csv
import getpass

#avoid "field larger than field limit (131072)"
#(sys is already imported near the top of cqlsh.py)
csv.field_size_limit(sys.maxsize)

log timezone

Newer versions of Cassandra use logback, configured through conf/logback.xml.
By default no timezone is specified, so the logged times don't match local time.
The fix is simple: add a timezone to the %date conversion in conf/logback.xml.

<pattern>%-5level [%thread] %date{ISO8601,Asia/Taipei} %F:%L - %msg%n</pattern>
<pattern>%-5level %date{HH:mm:ss.SSS,Asia/Taipei} %msg%n</pattern>
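
For orientation, the %date pattern sits inside an appender's encoder element in conf/logback.xml, roughly like this (only the console appender is shown; the file appenders take the same change):

<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
  <encoder>
    <pattern>%-5level %date{HH:mm:ss.SSS,Asia/Taipei} %msg%n</pattern>
  </encoder>
</appender>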

JVM crash

3.5, 3.4, and 3.2.1 all crashed the JVM.
Downgrading to 3.0.6 finally stopped the JVM crashes.
2.2.6: no crash either.

java options

-Duser.language=en -Duser.timezone=Asia/Taipei -Djavax.net.ssl.trustStore=d:/jssecacerts -Xverify:none^

-Dfile.encoding=utf-8 can't be used; otherwise error messages printed to the console come out garbled.

-XX:+UseStringDeduplication -XX:+UseG1GC^

Java 8's G1GC has worked quite well so far, but it's not a good fit when Xmx is below 6G.
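
As a minimal sketch in the same option style, a matching G1 setup might look like this (the 6g heap figure is only an assumption following the note above, and -XX:MaxGCPauseMillis is a standard HotSpot flag with an illustrative value):

-Xms6g -Xmx6g^
-XX:+UseG1GC -XX:+UseStringDeduplication -XX:MaxGCPauseMillis=500^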

compaction tuning

  1. Cassandra's default settings are essentially written for clusters and high-end hardware; on a development machine they hurt and frequently cause problems.
  2. Set concurrent_compactors: 1 in cassandra.yaml. The default is too high and can leave the system hanging.
  3. Before bulk-importing data with cql's copy command, run nodetool disableautocompaction {keyspace} so compaction doesn't fight the import for IO; just run enableautocompaction again once the import finishes.
  4. Always set with numprocesses=1 when bulk-importing with cql's copy command (see the sketch after this list). A typical 7200rpm disk reaches about 2500 rows/s; uncapped it tops out around 4500 rows/s, but the IO is saturated. The default numprocesses is {machine cores - 1}, which ends in tragedy when the disk can't keep up.
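
Putting items 3 and 4 together, a bulk-import session looks roughly like this (keyspace, table, and path reuse the examples above; cqlsh's -e flag runs a single statement):

bin\nodetool disableautocompaction test
bin\cqlsh -e "copy test.content(id,content) from 'g:\t\test.content.data.csv' with numprocesses=1;"
bin\nodetool enableautocompaction test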

cqlsh is junk: it can't handle \n newline characters

Unlike MySQL's LOAD DATA, when using cqlsh to COPY FROM a CSV, the \n in the CSV must first be escaped to \\n, or the import breaks.
So when exporting MySQL data for cqlsh to import, use replace(content,'\n','\\n'), and then undo the escaping in application code after reading the data back: replace(content,"\\n","\n").
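
A minimal sketch of that escape/unescape round trip in Python (values are illustrative):

content = 'line1\nline2'                  # original value exported from mysql
escaped = content.replace('\n', '\\n')    # what goes into the CSV for cqlsh copy from
restored = escaped.replace('\\n', '\n')   # undo the escaping after reading the row back
assert restored == content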

If you access Cassandra programmatically through cassandra-driver, none of this extra handling is needed. It's entirely cqlsh being junk; the Python implementation just can't handle newline characters properly.
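
For comparison, a minimal sketch with the Python cassandra-driver, where a parameterized insert carries the newline through untouched (host and keyspace follow this page's examples):

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('test')
# the driver binds the value as-is; no \n escaping needed
session.execute("insert into content (id, content) values (%s, %s)",
                ('driver-test', 'line1\nline2'))
cluster.shutdown()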